top | item 32953615

(no title)

benjaminjosephw | 3 years ago

> langcc is general enough that the tool is self-hosting: that is, one can express the "language of languages" in the "language of languages" itself

There's something deeply satisfying about recursive tools who's inputs can include the definition of the tool itself. Brilliant.

discuss

order

cakoose|3 years ago

That claim is sort misleading. The complexity of the specification's own grammar is largely unrelated to it's expressiveness.

I'd expect most parser generators in this category to be able to parse their own grammars.

Imagine a very weak parser that can only handle LR0. But if it uses a Lisp-like grammar language, it too is self-hosting.

kazinator|3 years ago

> The complexity of the specification's own grammar is largely unrelated to it's expressiveness.

E.g.:

  bitstring : bistring '1'
            | bitstring '0'
            | /* empty */
            ;
this expresses all that can be expressed. The rest is semantics.

kragen|3 years ago

The OG of recursive parser generators whose inputs can include the definition of the parser generator itself is D. Val Schorre's META-II, which compiles itself (or other compilers written in the same grammar language) to an assembly language for a parsing-oriented abstract machine. Thanks to the ACM's enlightened policies, the META-II paper is now freely available (though not yet, as far as I can tell, open access): https://dl.acm.org/doi/10.1145/800257.808896

I wrote an improved version called Meta5ix http://canonical.org/~kragen/sw/dev3/meta5ix.m5 (an interpreter for the virtual machine, with a complete description, is at http://canonical.org/~kragen/sw/dev3/meta5ixrun.py, with the precompiled grammar at http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm) and adapted it to produce C: http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (precompiled version at http://canonical.org/~kragen/sw/dev3/meta5ix2c.c).

Meta5ix written in itself is only 18 lines of code. Briefly, "" encloses expected input, {} encloses a line of output, [] encloses repeating constructs, commas separate alternatives (implemented without backtracking), fnord skips leading whitespace, $it copies the last <<>>-enclosed input token to the output, and other $variables interpolate locally-defined assembly labels.

    - program: ["-" name @it ":" terms {return}, "#" [:notin ""]]
    - terms: term ["," {continue $choice} term] @choice
    - term: (factor {else $seq}, output) [factor {assert}, output] @seq
    - factor: string {literal $it}
           , "(" terms ")"
           , "[" @many terms {continue $many} "]"
           , name {call $it}
           , "<<" {begin} terms ">>" {end}
           , ":fnord" {fnord} factor
           , ":notin" string {notin $it}
           , ":between" string {between $it}
    - output: (quasiquote, "@" {dedent} (var, quasiquote)) {writeline}
    - quasiquote: "{" [(<<ch [ch]>>, "\" <<:notin "">>) {say "$it"}, "$" var] "}"
    - ch: :notin "$}\"
    - var: "it" {copyinput}, name {gen $it}
    - string: :fnord <<'"' [:notin '"'] '"', "'" [:notin "'"] "'">>
    - name: :fnord <<letter [letter, :between "09"]>>
    - letter: :between "az", :between "AZ", :between "__"
Meta5ix, like my earlier peg-bootstrap https://github.com/kragen/peg-bootstrap/blob/master/peg.md (66 lines of code, compiles to JavaScript, supports backtracking), is not really something you want to write a parser for your application language in. It's a compiler-compiler you can extend to support the parsing constructs you actually want for your language, then recompile with itself. Dave Long described META-II as "a field-improvised lever", and I think that's true of these as well, but maybe an even better analogy is Archimedes's fixed point.

tempodox|3 years ago

What does “OG” stand for?

XorNot|3 years ago

I kind of hate systems like this though. What I want is easily bootstrappable compilers: how hard is it to get the system back from nothing but raw x86_64 machine code and some type of data storage.