How to debug ℕ𝕠𝕞 scripts
Since pep/nom is a parsing and compiling system there are 2 concurrent processes happening during the execution of a ℕ𝕠𝕞 script: the reduction of tokens on the grammar parse stack and the “assembling” of the attributes of those tokens to create the translated/transpiled/compiled output text.
It is often necessary to debug scripts since the ℙ𝕖𝕡 virtual machine is not trivial and ℕ𝕠𝕞 is a relatively “low-level” language. See the script /eg/toyBNF.pss for the beginning of a “higher-level” language which compiles to nom*.
Luckily there are a number of techniques to debug nom scripts which are detailed below.
Watching the parse-stack reductions is possibly the most useful debugging technique for ensuring that the grammar you have designed (for whatever language or pattern you wish to recognise or transpile) is functioning properly.
As the world's leading (and possibly only) expert on the Nom language, I use this technique the most.
Because pep/nom is a “filter” style system (which writes output to “stdout”), we can print the stack and line/character number after each reduction and watch the grammar in action. This is a very useful technique for debugging grammars and scripts.
The “print” statements are placed just after the “parse>” label. The 2 lines below should probably be included in every non-trivial script. When the script is working well, the lines can be commented out.
parse>
add "# "; lines; add ":"; chars; add " "; print; clear;
unstack; print; stack; add "\n"; print; clear;
The “less” program makes it possible to search for any particular token by name, to watch it being reduced. We can also search by input line number.
pep -f eg/script.pss file.txt | less
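As a non-interactive alternative to searching inside “less”, the same trace can be filtered with grep. The sketch below is not part of pep/nom: the helper names trace_token and trace_line are made up for illustration, and it assumes the trace format produced by the two print lines above, that is one line per reduction of the form “# LINE:CHAR token*token*”.

```shell
# Hypothetical helpers (not part of pep/nom) for filtering the stack trace.
# Assumed trace format, one line per reduction: "# LINE:CHAR token*token*..."

# Show only the trace lines in which a given token name appears.
# -F makes grep treat the pattern as a fixed string, so the "*" that
# ends every token name is matched literally.
trace_token() {
  grep -F -- "$1"
}

# Show only the trace lines printed while reading a given input line.
trace_line() {
  grep -- "^# $1:"
}

# Example use (assumes the two print lines sit just after "parse>"):
#   pep -f eg/script.pss file.txt | trace_token 'nounphrase*'
#   pep -f eg/script.pss file.txt | trace_line 12
```

Because token names end in “*”, a plain regular-expression search would need the asterisk escaped; the fixed-string match avoids that. Note that the match is a substring match, so searching for 'noun*' would also show lines containing a token such as 'pronoun*'.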
The state command is an extremely useful nom command which displays the
internal state of the pep machine at the moment that the command executes.
This is possibly the second most useful debugging technique after watching the
parse stack reductions.
The translation scripts at /tr/ also implement this command.
For some reason, I was going to remove this command from the Nom language. No idea why.
The ℙ𝕖𝕡 interpreter also includes a fully interactive debug mode which is activated with the -I switch. This interactive debugger has a whole list of different commands to step through the script, or to run the script until a certain point and then view the state of the pep virtual machine.
This facility is the 'big mamma' of debug techniques and hopefully you will not need to use it too often. It is a bit like having to use gdb to debug a C program.
pep -If someScript input.txt
pep -Ia asm.pp someScript
pep -a asm.pp someScript
(Now you can step through the compiled program “asm.pp” and watch as
it parses and compiles “someScript”. Generally, use “rr” to run the
whole script, and “rrw text” to run the script until the workspace
is some particular text. This helps to narrow down where the asm.pp
compiler is not parsing the input script correctly.)
Once in an interactive “pep” session, there are many commands to run and debug a script. Type hh to get a full list of available commands. For example:
- count - execute the next instruction in the program (step)
- mark - view the state of the machine (stack/workspace/registers/tape/program)
- rrw <text> - run the script until the workspace is exactly some text
- rre <text> - run the script until the workspace ends with some text
- rrc <num> - run the script with <num> characters of input
- rr - run the whole script from the current instruction
- go.read - reset the virtual machine and input stream (but not the compiled program)
If the script did not compile properly there will be only 1
instruction (quit). But this almost never happens these days (2025).
Probably the most primitive technique is just using the print
command to show the contents of the workspace at a given time, and
also commenting out problematic lines and blocks.
"article*noun*" {
# !!! no clear.
add "nounphrase*"; push; .reparse
}
"sentence*" { get; ++; put; } # unbalanced tape increment
Generally, if you increment the tape pointer with ++ then you
will have to decrement it with -- in the same block. There are
exceptions to this rule, since you are free to write your scripts
however you want, and to use the virtual machine in any way you wish.
add "noun*verb*noun*"; push; push; # << error, 3 tokens, 2 pushes
"article*noun*" {
clear; add "nounphrase*"; push;
# error! no '.reparse' command
}
The .reparse command is important for ensuring that all
grammar reductions take place. It also acts as an if/else logic
structure, because code in the same block after the .reparse
command will not execute.
"."{ clear; } print; clear; # << error: no read in script
Two “pop” commands do not guarantee that there are 2 tokens in the workspace. The stack may be empty, or may contain only 1 token.
pop; pop;
B"noun*".!"noun*".!E"verb*" {
# process tokens here.
}
Often we expect a certain order of tokens, without realising that an extra token has already been parsed and pushed onto the stack.
In extreme cases, you may wish to see how a script is “compiled” into the ℙ𝕖𝕡 assembly language. This would be only if you suspect that there is a bug in the compiler. The debugging techniques mentioned above are more practical.
pep -f compile.pss script
The compiled script will be printed to stdout and saved in sav.pp
or an error message will be displayed if the script has a syntax
error.
Sometimes the line above is useful for finding errors in a script which are not caught during the script loading process.
Use the script eg/nom.reference.syntax.pss to check the syntax
of the script.
pep -f eg/mark.latex.simple.pss pars-book.txt | sed '/%% ---/q;' | sed 's/^[^:]*: *//;s/\* *$//' | tr '*' '\n' | sort | uniq
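Reading the pipeline above stage by stage: the first sed stops at the first line containing “%% ---” (presumably the point where the trace ends in that script's output), the second sed strips everything up to the first colon and the final “*”, tr puts each token name on its own line, and sort and uniq reduce the result to a list of distinct token names. The sketch below mirrors the last three stages so they can be tried without pep. The function name unique_tokens and the sample trace lines are made up for illustration; the real trace format depends on the print lines in the script being debugged.

```shell
# Hypothetical helper mirroring the sed/tr/sort/uniq stages of the pipeline above.
unique_tokens() {
  sed 's/^[^:]*: *//;s/\* *$//' |  # drop text up to the first ":" and the final "*"
    tr '*' '\n' |                  # put each token name on its own line
    sort | uniq                    # keep each distinct name once
}

# Made-up trace lines, for illustration only.
printf '%s\n' \
  'line 1: article*noun*' \
  'line 2: nounphrase*verb*' \
  'line 3: article*noun*' | unique_tokens
```

With the sample input this prints the four names article, noun, nounphrase and verb, one per line. A list like this is a quick way to spot a misspelled or unexpected token name in a grammar.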