item 3607248

Lisp: It's Not About Macros, It's About Read

158 points | jlongster | 14 years ago | jlongster.com

99 comments

[+] masklinn|14 years ago|reply
It's not really about read either, it's about this:

> Wait a second, if you take a look at Lisp code, it’s really all made up of lists:

Haskell has `read`, most of your data types can just derive Read and Show and they'll "magically" get a representation allowing you to `read` and `show` them.

But that only works for datatypes, you can't do that with code.

In Lisp you trivially can, because code is represented via basic datatypes (through a reader macro if needed).

It's not macros. It's not read. It's much more basic than that: it's homoiconicity. From that everything falls out, and without that you need some sort of separate, special-purpose preprocessor (whether it's a shitty textual transformer — as in C — or a more advanced and structured one — as with Camlp4 or Template Haskell — does not matter).

And I don't get why the author got the gist (and title) so wrong when he himself notes read is irrelevant and useless in and of itself:

> Think of read like JSON.parse in Javascript. Except since Javascript code isn’t the same as data, you can’t parse any code

`read` does not matter if the language isn't homoiconic, you can `read` all you want it won't give you anything.
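To make that concrete, here is a small Python sketch (my own illustration, not from the article): JSON text parses into the language's ordinary data structures, but Python code parses into special-purpose AST node classes that everyday list/dict-manipulating code can't operate on.

```python
import ast
import json

# JSON text parses into ordinary data structures...
data = json.loads('[1, [0], 1, [0], 0]')
print(type(data))                # <class 'list'>

# ...but Python code parses into special-purpose AST node objects.
# This is `read` without homoiconicity: you get a tree, but not one
# made of the language's everyday data types.
tree = ast.parse("x + 1")
print(type(tree))                # <class 'ast.Module'>
print(type(tree.body[0].value))  # <class 'ast.BinOp'>
```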

[+] jlongster|14 years ago|reply
Yep, you're absolutely right. I was mainly offering a different viewpoint that might be refreshing to those who have always heard the "code is data and data is code" statement, but never really understood it.

Focusing on `read` was a way to anchor my article, even if it truly isn't about read either. I tried to tie that together at the end.

[+] majmun|14 years ago|reply
Isn't every language that takes a string homoiconic? Because code is a string, and a string is also a valid data type in that language? (By the Wikipedia definition.)
[+] dpkendal|14 years ago|reply
I disagree with this article's perspective, but I understand what the author intended to say and to a certain extent I like how the idea is presented.

Yes, Lisp's power comes from embodying code and data together in one form, and the ability to treat them this way when writing code is good, but `read` is a by-product of that power, not a demonstration of how it is used. Macros are the method by which we harness the power of homoiconicity in an efficient, powerful manner.

[+] kenjackson|14 years ago|reply
As masklinn put it, it's really about the homoiconicity of Lisp. But with that said, I'm not convinced at how great that is. While macros in general can be useful, is homoiconicity generally a good thing?

It's rarely the case that I want a single representation for all my data -- and if we treat code as data, do I want a representation that is indistinguishable from all my other data?

For example, the distinction between data that specifies layout (html, xaml, etc...) and data that performs logical computation (javascript, c#, etc...) seems like a useful distinction to have.

While I can appreciate the AST form of s-exprs I also do like the richness of many standard languages -- and the semantic richness of their ASTs.

Lastly, treating code as data (and vice versa) has been the bane of many programmers of days past. Go back 40 years and you can find many developers who did treat code as data (it was all actually viewed as a sequence of bits by many), and this caused no end of problems. In most modern systems there are safeguards to separate data and code segments and ensure that you don't treat one as the other. While not completely analogous to Lisp macros, it does show that you tread dangerous ground when you attempt to treat all forms of data as indistinguishable.

Given the special purpose nature of code, I don't mind (and actually appreciate) a well thought through syntax, and a special set of functionality to interact with it -- as I do most special purpose forms of data.

[+] jhuni|14 years ago|reply
In emacs I represent everything in Lisp and I use separate color schemes for SXML documents and code. This effectively allows me to distinguish between these different classes of data. Furthermore, emacs has different sets of functionality for different types of data, so I feel that I have all the features you mentioned already.
[+] mnemonicsloth|14 years ago|reply
Can you provide a link to some Clojure/Scheme/CL code that you've written?

(curious about how you use the language, not interested in scoring points)

[+] _delirium|14 years ago|reply
Unfortunately, in practice, it's not really the case that "it’s really easy to parse Lisp code". Yes, you can do it in some cases, but to do it correctly in general, at least in CL, is the somewhat infamous "code-walker" problem, which needs to do all sorts of strange things:

Do you handle all varieties of lambda lists? recognize and descend into all the special forms? what do you do with macros? expand them (a mess)? try to walk into special-cased standard ones like 'loop' and ignore user-defined ones?

The closest you can come to doing it sanely is to use a code-walking library like the one in arnesi: http://common-lisp.net/project/bese/docs/arnesi/html/A_0020C...

[+] dpkendal|14 years ago|reply
These are all issues of compilation, not reading. Reading Lisp code into cons-pairs, symbols, numbers, etc. is not that hard at all: you can do it in about 100 lines of code. Lisp can be trivially parsed by a recursive-descent parser. "'Recursive-descent'," as the UNIX-HATERS Handbook says, "is computer science jargon for 'simple enough to write on a liter of Coke.'"
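As a sketch of how little code that takes, here is a toy s-expression reader in Python (my own illustration of the recursive-descent approach described above, minus symbol/string subtleties and error handling):

```python
def tokenize(text):
    """Split an s-expression string into parens and atoms."""
    return text.replace('(', ' ( ').replace(')', ' ) ').split()

def read(tokens):
    """Recursive descent: one token of lookahead, no backtracking."""
    token = tokens.pop(0)
    if token == '(':
        form = []
        while tokens[0] != ')':
            form.append(read(tokens))
        tokens.pop(0)  # discard the closing ')'
        return form
    try:
        return int(token)  # number atom
    except ValueError:
        return token       # symbol atom

print(read(tokenize("(1 (0) 1 (0) 0)")))  # [1, [0], 1, [0], 0]
```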
[+] akkartik|14 years ago|reply
> You can implement a macro system in 30 lines of Lisp. All you need is read, and it’s easy.

The linked pastebin isn't a macro system. It's merely a macroexpansion system; the result still needs to be evaluated. And it's not as simple as merely wrapping it in 'eval', because of subtleties in getting at the right lexical scope.

More generally, no fair claiming macros are easy because you managed to build them atop a lisp. You're using all the things other comments here refer to; claiming it's all 'read' is disingenuous.
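For what it's worth, the macroexpansion half really is short once forms are plain lists; the evaluation-in-the-right-lexical-scope half that akkartik points out is exactly what this sketch (my own; all names are illustrative) omits:

```python
MACROS = {}  # macro name -> expander function

def defmacro(name):
    """Decorator registering a function as the expander for `name`."""
    def register(fn):
        MACROS[name] = fn
        return fn
    return register

def macroexpand(form):
    # Expand the head repeatedly, then recurse into subforms.
    while (isinstance(form, list) and form
           and isinstance(form[0], str) and form[0] in MACROS):
        form = MACROS[form[0]](*form[1:])
    if isinstance(form, list):
        return [macroexpand(sub) for sub in form]
    return form

@defmacro('unless')
def unless(cond, body):
    return ['if', ['not', cond], body]

print(macroexpand(['unless', ['=', 'x', 0], ['print', 'x']]))
# ['if', ['not', ['=', 'x', 0]], ['print', 'x']]
```

This expands forms just fine; actually running the expansion with the caller's lexical bindings is where the real work hides.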

I'm still[1] looking for a straightforward non-lisp implementation of real macros. The clearest I've been able to come up with is an fexpr-based interpreter: http://github.com/akkartik/wart

[1] From nearly 2 years ago: http://news.ycombinator.com/item?id=1468345

[+] haberman|14 years ago|reply
Right now there is a divide among programmers. On one side you have people like the author who crave the power of code-as-data more than they care about nice syntax and therefore love Lisp. On the other side you have people who like conventional syntax more than they care about code-as-data and therefore don't love Lisp.

Neither side can understand the other: one side says "why do you resist ultimate power?" and the other side says "how can you possibly think that your code is readable?"

My belief (and what I am starting to consider my life's work) is that the gap can be bridged. Lisp's power comes from treating code as data. But all code becomes data eventually; turning code into data is exactly what parsers do, and every language has a parser. The author says "it's about read," but "read" (in his example) is just a parser.

The author asks "How would you do that in Python?" The answer is that it would be something like this:

  import ast
  
  class MyTransformer(ast.NodeTransformer):
    pass  # Implement transformation logic here.
  
  node = MyTransformer().visit(ast.parse("x = 1"))
  print(ast.dump(node))
This works alright, but what I'm after is a more universal solution. With syntax trees there's a lot of support functionality you frequently want: a way to specify the schema of the tree, convenient serialization/deserialization, and ideally a solution that is not specific to any one programming language.

My answer to this question might surprise some people, but after spending a lot of time thinking about this problem, I'm quite convinced of it. The answer is Protocol Buffers.

It's true that Protocol Buffers were originally designed for network messaging, but they turn out to be an incredibly solid foundation on which to build general-purpose solutions for specifying and manipulating trees of strongly-typed data without being tied to any one programming language. Just look at a system like http://scottmcpeak.com/elkhound/sources/ast/index.html that was specifically designed to store ASTs, and look how similar it is to .proto files.
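For concreteness, here is a hypothetical .proto sketch of how an expression AST might be specified (this schema is my own illustration; it is not taken from Elkhound, upb, or gazelle):

```proto
// Hypothetical AST schema; illustrative only.
message Expr {
  enum Kind { LITERAL = 1; VARIABLE = 2; BINOP = 3; }
  required Kind kind = 1;
  optional int64 literal_value = 2;  // set when kind == LITERAL
  optional string name = 3;          // variable name, or operator for BINOP
  repeated Expr operands = 4;        // children, when kind == BINOP
}
```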

(As an aside, programmers have spent the last 15 years or so attempting to use XML in this role of "generic language-independent tree structured serialization format," but it wasn't the right fit because most data is not markup. Protocol Buffers can deliver on everything people wanted XML to be).

Why should manipulating syntax trees require us to write in syntax trees? The answer is that it shouldn't, but this is non-obvious because of how inconvenient parsers currently are to use. One of my life's goals is to help change that. If you find this intriguing, please feel free to follow:

  https://github.com/haberman/upb/wiki
  https://github.com/haberman/gazelle
[+] nessus42|14 years ago|reply
> On one side you have people like the author who crave the power of code-as-data more than they care about nice syntax and therefore love Lisp.

I crave both the power of code-as-data and nice syntax, which is why I love Lisp.

[+] ScottBurson|14 years ago|reply
You can use the protocol buffer schema language to define your ASTs if you want, but I think that addresses only a relatively small part of the problem.

There are two larger problems in adding Lisp-style macros to non-Lisp languages, one social and one technical.

The social problem is that language designers must be persuaded to publish a specification of the internal representation of the AST of their language. This makes the AST a public interface, one which they are committed to and can't easily change. People don't like to do this without a good reason.

The technical problem is more difficult, though. To make a non-Lisp language as extensible as Lisp would require making the parser itself extensible. This is not too hard to implement, but perhaps not so easy to use. If you've ever tried to add productions to a grammar written by someone else, you know it can be nontrivial. You have to understand the grammar before you can modify it.

And if you overcome the difficulties of having one user in isolation add productions to the grammar, what happens when you try to load multiple subsystems written by different people using different syntax extensions which, together, make the grammar ambiguous?

I don't know that these problems are insurmountable, but a few people have taken a crack at them, and AFAIK no one has produced a system that any significant number of people want to use.

It's worth taking a look at how Lisp gets around these problems. Lisp has not so much a syntax as a simple, general metasyntax. Along with the well-known syntax rules for s-expressions, it adds the rule that a form is a list, and the meaning of the form is determined by the car of the list -- and if it's a macro, even the syntax of the form is determined thereby.

Add a package system like CL's, and you get pretty good composability of subsystems containing macros. You can get conflicts, but only when you explicitly create a new package and attempt to import macros from two or more existing packages into it.

Applying these ideas to a conventional language gives us, I think, the following:

() While the grammar is extensible, all user-added productions must be "left-marked": they must begin with an "extension keyword" that appears nowhere else in the grammar.

() Furthermore, those extension keywords are scoped: they are active only within certain namespaces; elsewhere they are just ordinary names. This requires parsing itself to be namespace-relative, which is a bit weird, but probably workable.

I think that by working along these lines it might be possible to add extensible syntax to a conventional language in a way that avoids both the grammatical difficulty and the composition problem. And if you do that, maybe you can then get the relevant committees or whoever to standardize the AST representation for the language.
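A toy illustration of the left-marked idea (my own sketch; every name here is hypothetical): the parser consults a table of extension keywords only at the head of a statement, so extensions can't make the base grammar ambiguous, and keyword collisions are caught at registration time rather than parse time.

```python
EXTENSIONS = {}  # extension keyword -> parse function

def register(keyword, parse_fn):
    """Claim an extension keyword; collisions fail loudly up front."""
    if keyword in EXTENSIONS:
        raise ValueError(f"extension keyword {keyword!r} already taken")
    EXTENSIONS[keyword] = parse_fn

def parse_statement(tokens):
    head = tokens[0]
    if head in EXTENSIONS:      # a left-marked extension production
        return EXTENSIONS[head](tokens)
    return ('expr', tokens)     # fall back to the base grammar

# A user extension: 'unless <cond> do <body>'
register('unless', lambda toks: ('if', ('not', toks[1]), toks[3]))

print(parse_statement(['unless', 'done', 'do', 'quit']))
# ('if', ('not', 'done'), 'quit')
```

Scoping the table per-namespace, as proposed above, would replace the global `EXTENSIONS` dict with one looked up relative to the current namespace.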

I've never taken a crack at all this myself, though, because I'm happy writing Lisp :-)

[+] moonchrome|14 years ago|reply
I don't know Lisp; I've only tried Clojure, but I think this applies. I don't think it's about code as data per se so much as the philosophy that there are no "special cases" outside the few special forms needed to bootstrap the language. This simplicity lets you shape the language the way you want without having to consider the impact of your modifications on existing code.

Consider the hell the C# or Java teams go through when they introduce things such as async, lambdas, or LINQ, and how those features interact with the existing language. Consider implementing pattern matching for C# and all the edge cases; even if you had an open compiler it would be difficult. Python is no better: it has a rigid class/type model that has been abused more than once to provide metadata, e.g. for ORMs. I once tried to extend ORM functionality that was built on metaclasses and multiple inheritance; the hell you go through with metaclass ordering is insane compared to writing a declarative DSL in Clojure with a macro and leveraging an existing ORM library. And when libraries overload operators to implement DSLs, you immediately run into problems with operator precedence. These problems don't go away even if you can redefine the precedence, because then it's not consistent with the rest of the language.

So what I'm saying is that complexity is significantly reduced when you have a small, consistent core. As for readability, I think Clojure improves things by providing distinct literals for vectors and maps, and those literals have consistent meaning in similar situations, so they provide nice visual cues. But immutability by default, clear scoping, and functional programming make things like significant whitespace and pretty conditional-expression syntax bikeshedding-level details.

[+] kabdib|14 years ago|reply
Protocol buffers are a reimplementation of some whizzy stuff we did at a messaging start-up circa 1994.

They were great for messaging . . . but we found ourselves using them /everywhere/. And since our stuff worked in many different environments (C++, Java, Visual Basic were the ones we directly supported), you could have your choice of language.

It's flattering to see this rediscovered, several times over :-)

[+] daeken|14 years ago|reply
I wrote a (batshit crazy) Python library to do exactly what you're talking about. It converts Python ASTs into S-expressions (just Python lists) and lets you apply 'macros' to methods simply by using decorators.
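The AST-to-list conversion step is small enough to sketch (this is not daeken's library, just a from-scratch illustration of the idea):

```python
import ast

def to_sexpr(node):
    """Convert a Python AST into nested lists headed by the node type."""
    if isinstance(node, ast.AST):
        return [type(node).__name__] + \
               [to_sexpr(value) for _, value in ast.iter_fields(node)]
    if isinstance(node, list):
        return [to_sexpr(item) for item in node]
    return node  # leaves: identifiers, numbers, strings, None

# ['Module', [['Assign', ...]], ...] -- exact fields vary by Python version
print(to_sexpr(ast.parse("x = 1")))
```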
[+] andreasvc|14 years ago|reply
I don't agree that "the other side" cares so much about conventional syntax. Syntax is a very superficial thing; I think it's rather that most of the time you don't want to have to deal with the power and complexity of something like Lisp. If you compare it with natural language: most of the time we don't speak in poetic or literary language, even though that might be the most beautiful. Rather, we naturally strive for efficiency and simplicity, using a much smaller vocabulary, reusing common expressions, being redundant, etc.
[+] dkersten|14 years ago|reply
I'm also in some kind of middle ground. I like the power Lisp provides and don't mind the syntax, but I also like and appreciate what languages with rich syntax provide. On the other hand, I don't particularly love either one. I think the only way I'll truly be happy is if the gap is bridged without really giving up on either side's advantages.

But there are other factors that would make me happier with a language than closing the gap between expressive power and great syntax. For example, I would love it if there were a language with nice syntax and good metaprogramming (e.g. Python) that also had an unambiguous visual representation (something like Max, say) that you could switch between at will. I don't know how realistic that would be without adding complexity or ambiguity, or ruining code formatting.

[+] 6ren|14 years ago|reply
TXL is the best I've seen for transforming ASTs (sample: http://www.txl.ca/tour/tour9.html). As you can see, it's a little complex, which is because the problem is complex. I think they do a really good job. (TXL homepage http://www.txl.ca/)

Also: I recall Matz said Ruby was lisp with friendlier syntax (but I can't find the quote right now, so maybe he didn't).

[+] andrewflnr|14 years ago|reply
I also think there is a way to have the best of both worlds, but I have so far taken a rather different approach. What do you think of what I've got here?

  https://github.com/andrewf/fern
It's very much a prototype, and my ideas have evolved a lot (towards Lisp), but I'm at least curious what you think of the ideas in the README.
[+] mattdeboard|14 years ago|reply
So, I have a question. After reading this article and thinking about homoiconicity and macros, I remembered the Python source for the `namedtuple` function: http://dpaste.com/704870/

Is this considered a macro? Is it homoiconic? It's code as data and using input variables to generate code based on that input. It struck me as weird the first time I read through it but figured since I'm pretty stupid that there's a good reason for it.

[+] andreasvc|14 years ago|reply
It's not as powerful as what you could do with Lisp. The code is not a first-class object, so you can't, for example, substitute variable a for variable b. The "data" in code-as-data refers to an abstract syntax tree, not just a string that happens to contain code. Python does give you access to abstract syntax trees, but because of the complexity of the language this is less useful than with Lisp.
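With the AST in hand, the substitute-a-for-b example is only a few lines (a sketch using the standard `ast` module; `ast.unparse` needs Python 3.9+):

```python
import ast

class Rename(ast.NodeTransformer):
    """Rename every occurrence of the variable `a` to `b`."""
    def visit_Name(self, node):
        if node.id == 'a':
            node.id = 'b'
        return node

tree = Rename().visit(ast.parse("a = a + 1"))
print(ast.unparse(tree))  # b = b + 1
```

The same substitution on the source string would have to worry about `a` inside string literals, comments, and longer identifiers; on the tree it's trivial.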
[+] yason|14 years ago|reply
Macros are a method to do programmatically what read can do lexically. Using macros avoids having to load strings into sexps, modify them, and build callable constructs out of the modifications. With macros, you just define the modification as a transform on s-expressions and let the macro facility do the rest. The point is to modify the AST, and while read can be abused to do that, macros are designed to do just that.
[+] chimeracoder|14 years ago|reply
Am I missing something, or should that be read-from-string, not read? My understanding is that read only operates on streams.
[+] jlongster|14 years ago|reply
Technically, you are correct. Some Schemes implement read as a simple function which takes a string. "Lisp" here isn't referring to any specific Lisp so much as the general idea of one.
[+] njharman|14 years ago|reply
I do a lot of data-driven development. I'm sure some of the time (20%?) it'd be nicer if the code/syntax were directly data. But the other 80% of the time it's much easier for me to deal with them being separate.

I'm willing to give up that 20-25% to enjoy and be happy writing the other 80%.

[+] lninyo|14 years ago|reply
I tried the example code in cygwin/clisp and it doesn't work. Way to come up with examples. At least mention which version it does work with?
[+] spacemanaki|14 years ago|reply
The examples aren't valid Lisp or Scheme. "read" takes an input port, not a string, as someone else mentioned, so you should replace it with "read-from-string" in Common Lisp, or with something like "(read (string->input-port "(1 (0) 1 (0) 0)"))" in Scheme. The second and third blocks of code are Scheme, not CL, so they won't work in clisp at all.