Show HN: A PHP parser written in PHP

77 points | nikic | 14 years ago | github.com

39 comments

[+] aarondf|14 years ago|reply
This is not a snide comment; I'm truly curious. What is the point of "an [x] parser/interpreter/compiler written in [x]"? I've seen one now for JS and this one for PHP. I lean more toward the sponge learning [1] side of HN, so forgive me if this is super obvious.

[1] http://alexrosen.com/blog/2011/05/sponge-learning/

edit: grammar.

[+] mfonda|14 years ago|reply
The benefit is that you can use it to transform PHP code into an abstract syntax tree [1]. This allows you to do some really cool things like static analysis, code transformation, preprocessing, etc. in a convenient way.

The fact that it is written in PHP is nice because it allows you to use it in existing PHP projects in an environment you are already familiar with. I think the fact that it is written in PHP is less important than the fact that it can be used in PHP (e.g. it would be equally useful as a PHP extension written in C).

[1] http://en.wikipedia.org/wiki/Abstract_syntax_tree
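
As a rough illustration of the static analysis mentioned above, here is a minimal sketch in Python rather than PHP, chosen only because Python's standard library ships an AST module, so the example runs without installing anything. It is an analogy, not the API of the project being discussed; SOURCE and called_names are hypothetical names made up for this example.

```python
import ast

# Hypothetical input: a small program we want to inspect without running it.
SOURCE = """
def greet(name):
    print("hello", name)
    eval("1 + 1")
"""

def called_names(source):
    """Return the sorted, de-duplicated names of directly called functions."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        # A call such as eval(...) is a Call node whose func is a plain Name.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            names.add(node.func.id)
    return sorted(names)

print(called_names(SOURCE))  # ['eval', 'print']
```

A checker built this way could, for instance, flag every file that calls eval, without caring how the call is formatted.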

[+] jacquesm|14 years ago|reply
Line one in tfa: "It's purpose is to simplify static code analysis and manipulation."
[+] exim|14 years ago|reply
This doesn't necessarily apply to this specific posting. But in general, as with many open source projects out there, the point is to increase the author's visibility on the net and strengthen the author's CV when looking for a job.

And there is nothing wrong with this. It is just that many authors don't want to say so and come up with artificial rationales instead.

[+] bradt|14 years ago|reply
This would be a great way to create a PHP-based templating system that just uses a subset of PHP as syntax. Would be very fast.
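
One way to read "a subset of PHP as syntax" is: parse the template expression, then accept it only if every node in the tree is on a whitelist. The sketch below shows that idea in Python (used here purely because its AST module is in the standard library); the ALLOWED list and the is_template_safe helper are assumptions invented for this example, not part of any real templating system.

```python
import ast

# Hypothetical whitelist: the only node types a template expression may use.
ALLOWED = (
    ast.Expression, ast.Name, ast.Load, ast.Constant,
    ast.Attribute, ast.BinOp, ast.Add,
)

def is_template_safe(expr):
    """Return True if expr parses and uses only whitelisted node types."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED) for node in ast.walk(tree))

print(is_template_safe("user.name + '!'"))       # True
print(is_template_safe("__import__('os').sep"))  # False: contains a call
```

The check happens on the tree, so anything outside the subset (function calls, assignments, and so on) is rejected before the template is ever evaluated.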
[+] bithive123|14 years ago|reply
This must be a stupid question because I've noticed that it's commonplace, but why do PHP-based projects try to build templating systems in PHP? Do they not realize that the original use case for PHP was to turn plain HTML files into dynamic templates?

I saw this in Horde the other day: mixed in with the usual PHP tags, they had added their own XML tags for doing "if/else" type things in the template. What is the purpose of that?

[+] kemo|14 years ago|reply
Lithium's templating "engine" does something like that, using the tokenizer.
[+] amosrobinson|14 years ago|reply
Cool... Now define your Node datatype in Haskell deriving Show & Read, and pretty-print to that. Then you can (more easily) do some interesting analysis and transforms!
[+] jasonlotito|14 years ago|reply
How does this compare to PHP_CodeSniffer and its tokenizer? You're both using token_get_all under the hood, but PHPCS has support for CSS/JS as well.
[+] nikic|14 years ago|reply
PHP_CodeSniffer - as you already say - works with the source code at a token level. This is necessary, because it looks at the precise formatting of the code (like whitespace usage).

The parser is more for analyzers that are not interested in the precise formatting of the code, but want to look at the code from a higher-level perspective.

For example, if you want to do control flow analysis and type inference, working directly on the tokens would be really hard. An Abstract Syntax Tree makes this kind of work much easier, as you don't have to think about the tiny details of the language.
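
The token-level versus tree-level distinction can be sketched with Python's standard tokenize and ast modules. This is an analogy only: the PHP counterparts would be token_get_all and the parser from this post, and the names COMPACT, SPACED, token_strings and tree_dump are made up for the example.

```python
import ast
import io
import tokenize

# Two spellings of the same statement, differing only in layout and a comment.
COMPACT = "x=(1+2)*3\n"
SPACED = "x = ( 1 + 2 ) * 3   # spaced out\n"

def token_strings(source):
    """Token-level view: the text of every token, comments included."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)]

def tree_dump(source):
    """Tree-level view: structure only, without positions or comments."""
    return ast.dump(ast.parse(source))

# The token streams differ, because the comment shows up as a token.
print(token_strings(COMPACT) != token_strings(SPACED))  # True
# Both sources parse to the same tree.
print(tree_dump(COMPACT) == tree_dump(SPACED))  # True
```

A style checker needs the first view, since layout is exactly what it inspects; an analyzer doing control flow or type inference is better served by the second, where layout has already been discarded.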

[+] mgkimsal|14 years ago|reply
One of the interesting things about Groovy for me has been the runtime AST transformation stuff - annotating something as @Singleton, then having the engine make it into a singleton at compile time, etc.

Certainly this project doesn't get us there immediately, but it might give some neat ideas for future PHP versions to incorporate.

[+] nazar|14 years ago|reply
Can it compile itself then?
[+] cfdrake|14 years ago|reply
It's only a parser - it just transforms plain source code into an abstract syntax tree representation. However, if you wanted to, you could use this tree for a variety of things - including translating and generating compiled code.
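
To make the split between parsing and compiling concrete, here is a small Python sketch (Python only because its built-in compile() accepts a tree directly; nothing here reflects the API of the PHP project). The function name run_from_tree is hypothetical.

```python
import ast

def run_from_tree(source):
    """Parse, then compile and execute as separate steps."""
    # Step 1: parsing only. The result is a tree, not something runnable.
    tree = ast.parse(source, mode="exec")
    # Step 2: a separate consumer turns the tree into executable code.
    code = compile(tree, filename="<ast>", mode="exec")
    namespace = {}
    exec(code, namespace)
    return namespace

result = run_from_tree("answer = 6 * 7")
print(result["answer"])  # 42
```

The parser's job ends after step 1; compiling, translating, or analyzing the tree is up to whatever consumes it.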
[+] alexpak|14 years ago|reply
It's not a compiler, just a syntactic parser.
[+] icheishvili|14 years ago|reply
It seems like you could have saved yourself quite a bit of parsing/lexing work if you had used the parser that ships with PHP:

http://us3.php.net/manual/en/function.token-get-all.php
http://us3.php.net/manual/en/function.token-name.php

Very cool nonetheless.

[+] mfonda|14 years ago|reply
They are very different things. token_get_all just tokenizes the code, but this tool parses PHP code into an AST. If you look at the source of this project, you'll notice that it does indeed use token_get_all to handle the lexing.