Pep & Nom

Alexander the Great: What wish can I grant you? Diogenes: Stand out of my light. Diogenes

documentation for the syntagma language

The following information was extracted from /tr/syntagma.pss. At the moment it is the best documentation available for the language.

begin blocks only execute once at the beginning of the script


  begin {
    # create a time variable. The names $line,$char,$counter and the 
    # $1,$2, etc are reserved. Variables must be declared in the 
    # begin block.
    var $time;
    var $server = 'ssh://etc';
  }

single line comments allowed

  #<star> multiline comments between these <star>#
  print 'hello'; exit 3; quit; delete;

at eof, delete the pattern space, print text and exit with code '4'

EOF { delete; print 'yes'; exit 4; }

delete all instances of 'green' in the pattern space text.

delete 'green';

ignore all whitespace (and delete it)

  ignore [:space:];
  ignore: [:space:];   # the same
  ig [:space:];        # the same
  delete: [:space:];   # the same

delete one char from the left of the pattern space

  ltrim;
  print "line: $line, char: $char" ;  # interpolate line number with $line etc

use the accumulator counter

print "counter is $counter" ;

non interpolating between single quotes

print ' $delimiter is a syntagma system variable';

concatenation of everything with the dot operator

user variables won't interpolate.

  begin { var $name := 'smooth'; }
  a = b c {
    @1 := "$1+$2" .$1.' and '.$counter.' name:'.$name;
  }

double quotes are allowed and interpolate variables

print "problem at line $line \n" ;

print text with a newline at the end

println "hi";

interpolate special variables in the print string, but only in double quotes.

println "the line count is $line" ;
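The quoting rules described above (double quotes interpolate special variables, single quotes do not, and user variables are never interpolated) can be modelled with a small Python sketch. This is only an illustration of the rule as stated in these notes, not how syntagma is implemented; `render`, `SPECIALS` and the values in it are hypothetical.

```python
import re

# Hypothetical table of special variables. The names come from the notes
# above; the values are made up for this demonstration.
SPECIALS = {"line": 12, "char": 340, "counter": 3}

def render(quoted: str, specials: dict) -> str:
    """Return the text a print statement would emit for a quoted literal."""
    body = quoted[1:-1]
    if quoted[0] == "'":
        return body  # single quotes: no interpolation at all

    def substitute(match):
        name = match.group(1)
        # Only special variables are replaced; anything else (for example
        # a user variable) is left exactly as written.
        return str(specials[name]) if name in specials else match.group(0)

    return re.sub(r"\$(\w+)", substitute, body)

print(render('"the line count is $line"', SPECIALS))
print(render("'the line count is $line'", SPECIALS))
```

The first call replaces `$line` with the assumed value, and the second prints the text unchanged because it is single quoted.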

make token 'capword' if the text begins with A-Z

[:alpha:]+ { capword: begins [A-Z]; }

star lexing for zero or more characters, with get class

[0-9]+ { get [.]*; ends "." { println "integer with dots" ; }}

different ways to make literal tokens

you have to define them before you can use them in rules

  lit: 'x';
  lit: [0-4]|'a'|'b'|';' ;
  literal: [(){}] ;   # braces as literal tokens

lex zero or more alphanumeric characters from the input stream

get [:alnum:]*;

lex zero or more non space characters from the input stream

get not [:space:]*;
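As a rough model of what `get` with a starred class does, the Python sketch below consumes zero or more characters from the input while they belong to a class. It is an assumption about the behaviour based on the descriptions above; the function name `get` and its signature are hypothetical, and `str.isalnum` / `str.isspace` only approximate `[:alnum:]` and `[:space:]`.

```python
def get(text: str, pos: int, in_class) -> tuple[str, int]:
    """Consume characters from text[pos:] while in_class(ch) is true.

    Returns the consumed text and the new position. Zero characters may
    be consumed, which mirrors the '*' (zero or more) in the notes.
    """
    end = pos
    while end < len(text) and in_class(text[end]):
        end += 1
    return text[pos:end], end

# get [:alnum:]*;
word, pos = get("abc123 rest", 0, str.isalnum)

# get not [:space:]*;
chunk, pos2 = get("a,b c", 0, lambda ch: not ch.isspace())
```

Here `word` is `"abc123"` and `chunk` is `"a,b"`; in both cases reading stops at the first character outside the class.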

make multicharacter literals and non-literals within a lex block

  [:=]+ {
    lit: ':'|'=';
    assign: ':=';
    equals: '==';
    { println "strange syntax on line $line" ; exit 1; }
  }

multicharacter literals

  [:alpha:]+ { lit: "while"|"if"|"begin"|"end"; }

  begins '<' and ends '>' { print "tag"; tag: *; }

make token 'word' for all alpha numeric sequences

  word: [:alnum:]+ ;
  name: * ;   # default lex rule

I think I will dispose of the 'match' keyword

  match empty { exit; }
  match 'abcd' { print 'hi'; exit; }
  match not empty { print 'Extra char on line $line'; exit 2; }

matching within text blocks


  [a-z]+ { 
    # match a,aa,aaa,aaaa etc, same as [a]
    alist: only 'a';
    # match a,ba,ab,aa,bb,bbb etc, same as [ab]
    ablist: only 'ab';
    ablist: [ab];    # same as above 
    list: [ab];      # the same
  }

  [:digit:]+ {
    # 'not only' doesn't work because it compiles to ![0] in nom.
    0number: begins '0' and not only [0]; 
  }
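The `only` and `begins` tests used in the blocks above can be pictured with the following Python sketch. It reflects the comments in the examples (`only 'ab'` matches a, ba, ab, aa, bb and so on) and nothing more; the helper names are hypothetical and this is not the code that syntagma generates.

```python
def only(text: str, chars: str) -> bool:
    """True when text is non-empty and every character is one of chars."""
    return len(text) > 0 and all(ch in chars for ch in text)

def begins(text: str, prefix: str) -> bool:
    """True when text starts with prefix."""
    return text.startswith(prefix)

# alist: only 'a';   matches a, aa, aaa ...
print(only("aaa", "a"))
# ablist: only 'ab'; matches a, ba, ab, aa, bb, bbb ...
print(only("abba", "ab"))
# a 'c' is outside the set, so this does not match
print(only("abc", "ab"))
```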

for all alphabetic sequences, if the text begins with '<' and ends with '>' then, if the text begins with '<a ' make a "link" parse token, and if not, make a "tag" parse token

  [:alpha:]+ {
    match begins '<' and ends '>' { link: begins '<a '; tag: ; }
  }

  match empty { print 'missing char at char $char'; exit 2; }
  punct: NOT [:alnum:]+ ;        # negated classes
  x: not 'a';
  register: '[' to ']' ;         # 1st item is only 1 char presently
  name: [.:] to '.' ;            # from '.' or ':' to the next '.'
  item: "/" to ":end" ;          # 2nd item can be a string
  item: '/' TO '/' ;             # ?? same but throws error if no end '/'
  file: '/' between [:space:]    # up to but not including any space char.
  [:alpha:]+ {
    keyword: 'is'|'to'|'go';
    name: 'tree';
    name: [:alpha:];             # this is the default, no plus required
  }
  [a-z]+ {
    num: 'one'|'two';
    # print an error message and quit if no matches
    print 'invalid word\n'; exit 2;
  }

negated class blocks

NOT [:space:]+ { key: '/find/'; print 'not a space'; exit; }

space: ' '; newline : '\n';

literal: [;:] ; # def of literal tokens (only in lex part)

------------------------

the parsing section - these rules must all come after the lexing rules above

check the value of the second word in this parse rule

  phrase = word word {
    "green" == $2 { ...}
    [:space:] == $2 { ...}
    not begins "the" == $story { ...}
  }

alternation with same length sequences

block = '[' statement ']' | '[' statementset ']' ;

alternation with same-length RHS sequences and a rule block

a b = '[' c ']' | '[' c ']' { print "alternation\n"; }

alternation with unequal length sequences, but no rule block.

a b = c d | e f g;
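One way to read a parse rule such as `a b = c d | e f g;` is as a reduction on the top of the parse stack: if the stack ends with one of the right-hand sequences, those tokens are replaced by the left-hand tokens. The Python sketch below illustrates that reading only. It assumes reductions happen at the top of the stack and ignores token attributes; `reduce_once` and the rule layout are hypothetical.

```python
def reduce_once(stack: list, rules: list) -> bool:
    """Apply the first rule whose right-hand side matches the stack top.

    rules is a list of (lhs, alternatives) pairs, where lhs is a list of
    token names and alternatives is a list of right-hand token lists.
    Returns True if a reduction was made.
    """
    for lhs, alternatives in rules:
        for rhs in alternatives:
            n = len(rhs)
            if 0 < n <= len(stack) and stack[-n:] == rhs:
                stack[-n:] = lhs  # replace the matched tokens in place
                return True
    return False

# a b = c d | e f g;
rules = [(["a", "b"], [["c", "d"], ["e", "f", "g"]])]

stack = ["x", "e", "f", "g"]
reduce_once(stack, rules)
print(stack)
```

After the call the stack holds `x a b`: the three tokens `e f g` were replaced by the two tokens `a b`, which is why unequal-length alternatives are worth noting.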

optionals between <...>

a = b < x y | p q > c;

look-behind syntax with +(...)

expression = +('/'|'*') number ;

look-behind syntax with +(...) and a rule-block. The attribute variable refers to the 1st token after the lookbehind.

expression = +('/'|'*') number { @1 = "($1)"; }

lookahead syntax with +(...)

a = ex '*' ex +('/'|'*') ;
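The lookahead form can be read as "reduce only when the next token is one of these, and leave that token on the stack". The Python sketch below models that reading for the rule above. It is an assumption drawn from the examples in these notes, not a description of the generated code; `reduce_with_lookahead` is a hypothetical helper and token attributes are ignored.

```python
def reduce_with_lookahead(stack: list, lhs: list, rhs: list, lookahead: set) -> bool:
    """Reduce rhs to lhs only if the token after rhs is in lookahead.

    The lookahead token is the last item on the stack and is kept there.
    Returns True if the reduction was made.
    """
    n = len(rhs)
    if len(stack) < n + 1:
        return False
    if stack[-1] not in lookahead:
        return False
    if stack[-(n + 1):-1] != rhs:
        return False
    stack[-(n + 1):-1] = lhs  # replace rhs, keep the lookahead token
    return True

# a = ex '*' ex +('/'|'*') ;
stack = ["ex", "*", "ex", "/"]
reduce_with_lookahead(stack, ["a"], ["ex", "*", "ex"], {"/", "*"})
print(stack)
```

With this input the stack becomes `a /`. If the last token were `;` instead, the function would return False and leave the stack untouched.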

look ahead with negative rules, but tokens must be quoted, which is silly unless we are dealing with literal tokens

  a b = c d +(not "f" and not "j");
  a = c d +(not ';' and not '.');

look ahead syntax with code block. The attributes of '.' ',' and x are automatically copied to their new positions on the stack.

  a b = x y z +('.' x | ',' x) {
    @1 := "$1 or $2" ;
    @2 := "$1 and $2" ;
    println "found xyz followed by .x or ,x" ;
    exit 1;
  }

lookahead with

  o = colour shape +(';' | block) ;
  option = name digit ';' ;    # use literal char token in parse rule
  object = colour ':' shape;   # lit token, but must define earlier
  () = space word;             # just delete tokens in the parse section

check if the stack contains only a list token at end of file

  eof { stack (list) { print "list found\n" ; }}
  eof { stack (a b | x y) { print "list found\n" ; }}
  eof { () = x y { println "at eof parse stack ends with 'x y'" ; }}

check if the parse stack is list or number or float.

  eof { parse (list|number|float) { print "list found\n" ; }}
  EOF: words = words word;   # only reduces at end of stream
  EOF { name = first second; }
