Pep and Nom

debugging a ℕ𝕠𝕞 script

How to debug ℕ𝕠𝕞 scripts

Since pep/nom is a parsing and compiling system, 2 concurrent processes take place during the execution of a ℕ𝕠𝕞 script: the reduction of tokens on the grammar parse stack, and the “assembling” of the attributes of those tokens to create the translated/transpiled/compiled output text.

It is often necessary to debug scripts since the ℙ𝕖𝕡 virtual machine is not trivial and ℕ𝕠𝕞 is a relatively “low-level” language. See the script /eg/toyBNF.pss for the beginning of a “higher-level” language which compiles to nom.

Luckily there are a number of techniques to debug nom scripts which are detailed below.

watch the parse stack reductions

This is possibly the most useful debug technique to ensure that the grammar you have designed (for whatever language or pattern that you wish to recognise or transpile) is functioning properly.

As the world's leading (and possibly only) expert on the Nom language, this is the technique that I use most.

Because pep/nom is a “filter” style system (which writes output to “stdout”), we can print the stack and the line/character number after each reduction and watch the grammar in action.

The “print” statements are placed just after the “parse>” label. The 2 lines below should probably be included in every non-trivial script. When the script is working well the lines can be commented-out.

visualise the stack token reductions with line/character numbers


  parse>
    add "# "; lines; add ":"; chars; add " "; print; clear; 
    unstack; print; stack; add "\n"; print; clear; 
  

The “less” program makes it possible to search for any particular token by name, to watch it being reduced. We can also search by input line number.

watch token reduction and search for a particular reduction with less
 pep -f eg/script.pss file.txt | less
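
When paging through less is inconvenient, the same trace can be filtered non-interactively. The sketch below is only an illustration: it assumes that each trace line has the form "# LINE:CHARS token*token*" (which is what the two print lines after “parse>” would emit), and that token names are plain words. The helper names trace_for_token and trace_for_line and the sample trace are hypothetical, not part of pep/nom.

```shell
# Sketch only: filter a saved trace instead of paging through it.
# Assumes each trace line looks like "# LINE:CHARS token*token*".

# Hypothetical helper: keep lines whose stack holds token $1
# (given without its trailing "*").
trace_for_token() {
  grep -E "^# [0-9]+:[0-9]+ (.*[*])?${1}[*]"
}

# Hypothetical helper: keep lines emitted while reading input line $1.
trace_for_line() {
  grep -E "^# ${1}:"
}

# Made-up sample trace standing in for: pep -f eg/script.pss file.txt
sample_trace='# 1:3 article*
# 1:8 article*noun*
# 1:8 nounphrase*
# 2:4 pronoun*'

printf '%s\n' "$sample_trace" | trace_for_token noun   # only the article*noun* line
printf '%s\n' "$sample_trace" | trace_for_line 2       # only the line-2 reduction
```

With a real script, the sample would be replaced by the output of the pep command above, piped into either helper.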

the state command

The state command is an extremely useful nom command which displays the internal state of the pep machine at the moment that the command executes. This is possibly the second most useful debugging technique after watching the parse stack reductions.

The translation scripts at /tr/ also implement this command.

For some reason, I was going to remove this command from the Nom language. No idea why.

the interactive debugger

The ℙ𝕖𝕡 interpreter also includes a fully interactive debug mode which is activated with the -I switch. This interactive debugger has a whole list of commands to step through the script, or to run the script until a certain point and then view the state of the pep virtual machine.

This facility is the 'big mamma' of debug techniques and hopefully you will not need to use it too often. It is a bit like having to use gdb to debug a C program.

load a script and view/execute/step through it interactively
 pep -If someScript input.txt

interactively view how some script is being compiled by "asm.pp"
 pep -Ia asm.pp someScript 
 pep -a asm.pp someScript 

Now you can step through the compiled program “asm.pp” and watch as it parses and compiles “someScript”. Generally, use “rr” to run the whole script, and “rrw text” to run the script until the workspace is some particular text. This helps to narrow down where the asm.pp compiler is not parsing the input script correctly.

Once in an interactive “pep” session, there are many commands to run and debug a script. Type hh to get a full list of available commands. For example:

some commands in the interactive debugger


  -  count - execute the next instruction in the program (step)
  -  mark - view the state of the machine (stack/workspace/registers/tape/program)
  -  rrw <text> - run the script until the workspace is exactly some text.
  -  rre <text> - run script until the workspace ends with something
  -  rrc <num> - run script with <num> characters of input.
  -  rr  - run the whole script from the current instruction
  -  go.read - reset the virtual machine and input stream 
          (but not the compiled program)
  

If the script did not compile properly there will only be 1 instruction (quit). But this almost never happens these days (2025).

commenting out lines and printing

Probably the most primitive technique is just using the print command to show the contents of the workspace at a given time, and also commenting out problematic lines and blocks.

common script bugs and errors

not clearing the parse tokens before reducing


    "article*noun*" {
      # !!! no clear. 
      add "nounphrase*"; push; .reparse
    }
  

make sure to balance the "++" and "--" commands
 "sentence*" { get; ++; put; }  # unbalanced tape increment

Generally if you increment the tape pointer with ++ then you will have to decrement it with -- in the same block. There are exceptions to this rule, since you are free to write your scripts however you want, and to use the virtual machine in any way you wish.

make sure that you are pushing as many times as there are tokens.
 add "noun*verb*noun*"; push; push; # << error, 3 tokens, 2 pushes
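
A mismatch like this can be spotted mechanically. The following is a rough lint sketch, not a feature of pep/nom: the function name check_pushes is hypothetical, and it assumes that the add "...*" literal and its push commands sit on the same line, with at most one such literal per line. It counts the “*” characters in the literal (one per token) and compares that with the number of “push;” commands that follow.

```shell
# Sketch only: flag lines where an add "...*" literal and the "push;"
# commands after it on the same line disagree in number.
# Assumes one add literal per line, with its pushes on that same line.
check_pushes() {
  awk '
    match($0, /add "[^"]*\*"/) {
      literal = substr($0, RSTART, RLENGTH)     # the add "tok*tok*" text
      rest    = substr($0, RSTART + RLENGTH)    # everything after it
      tokens  = gsub(/\*/, "", literal)         # one "*" ends each token
      pushes  = gsub(/push;/, "", rest)         # count the push commands
      if (pushes > 0 && tokens != pushes)
        printf "line %d: %d tokens but %d pushes\n", NR, tokens, pushes
    }
  '
}

# Made-up input: the first line is the error shown above, the second is fine.
printf '%s\n' \
  'add "noun*verb*noun*"; push; push;' \
  'clear; add "nounphrase*"; push; .reparse' | check_pushes
# prints: line 1: 3 tokens but 2 pushes
```

Scripts that spread the pushes over several lines would need a smarter check; this only covers the single-line idiom used in the examples here.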

in a block, if you "push" the tokens back, you need to .reparse


    "article*noun*" {
      clear; add "nounphrase*"; push; 
      # error! no '.reparse' command
    }
  

The .reparse command is important for ensuring that all grammar reductions take place. It also acts as an if/else logic structure, because code in the same block after the .reparse command will not execute.

make sure there is at least one read command in the script
 "."{ clear; } print; clear; # << error: no read in script

Two “pop” commands do not guarantee that there are 2 tokens in the workspace. The stack may be empty, or may contain only 1 token.

check that the workspace has 2 tokens, and last is not a verb


    pop; pop;
    B"noun*".!"noun*".!E"verb*" {
       # process tokens here.
    }
  

Often we expect a certain order of tokens, without realising that an extra token has already been parsed and pushed onto the stack.

view compilation of a script

In extreme cases, you may wish to see how a script is “compiled” into the ℙ𝕖𝕡 assembly language. This would be only if you suspect that there is a bug in the compiler. The debugging techniques mentioned above are more practical.

see how a particular script is compiled to "assembler" format
 pep -f compile.pss script

The compiled script will be printed to stdout and saved in sav.pp or an error message will be displayed if the script has a syntax error.

Sometimes the line above is useful for finding errors in a script which are not caught during the script loading process.

check the syntax of a ℕ𝕠𝕞 script

Use the script eg/nom.reference.syntax.pss to check the syntax of the script.

miscellaneous techniques

get a unique list of tokens used during parsing
 pep -f eg/mark.latex.simple.pss pars-book.txt | sed '/%% ---/q;' | sed 's/^[^:]*: *//;s/\* *$//' | tr '*' '\n' | sort | uniq
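
To see what each stage of that pipeline does, here is the same chain wrapped in a function and run on made-up input. This is only a sketch under assumptions inferred from the sed expressions themselves: that each trace line is some prefix ending in a colon followed by the stack tokens, and that a line containing “%% ---” marks the end of the trace. The function name unique_tokens and the sample lines are hypothetical.

```shell
# Sketch only: the same sed/tr/sort/uniq stages, reading a trace on stdin.
# Assumes each trace line is "PREFIX: token*token*" and that a line
# containing "%% ---" ends the trace (inferred from the sed expressions,
# not checked against eg/mark.latex.simple.pss).
unique_tokens() {
  sed '/%% ---/q;' |                     # stop reading at the marker line
  sed 's/^[^:]*: *//;s/\* *$//' |        # drop the prefix and the final "*"
  tr '*' '\n' |                          # one token per line
  sort | uniq                            # sorted, unique token names
}

# Made-up sample trace; the last line comes after the marker and is ignored.
printf '%s\n' \
  'line 1 char 9: word*space*' \
  'line 1 char 14: word*' \
  '%% --- end of trace' \
  'line 2 char 3: never*seen*' | unique_tokens
```

Note that sed's q command prints the matching line before quitting, so the marker line itself appears in the result next to the token names; appending grep -v '%% ---' would remove it.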