The Future of Markdown

[+] blasdel|13 years ago|reply

John Gruber's original Markdown.pl is one of the worst small programs I have ever read, completely riddled with outright bugs and misfeatures that continually bite its users in the ass. It's awful even by the already low standards of hand-written many-pass regex-based spaghetti-parsers.

Nobody should be using the original script, and unfortunately many of the other implementations out there are direct transliterations that replicate all of its absurd errors, like where if you mention the MD5 hash of another token in the document, the hash will be replaced with the token, because it uses that as an inline escaping mechanism! Reddit got hit with an XSS virus that got through their filters because of it: http://blog.reddit.com/2009/09/we-had-some-bugs-and-it-hurt-...
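
To make the hash-escaping failure concrete, here is a minimal Python sketch of the general technique. It is not Markdown.pl's actual code; the function names and the single protected token are assumptions made up for illustration. Protected tokens are swapped for their MD5 hex digest before processing and swapped back afterwards, so a digest the author typed by hand gets rewritten as well.

```python
import hashlib


def md5_hex(token):
    """Hex MD5 digest of a token, used here as its placeholder."""
    return hashlib.md5(token.encode("utf-8")).hexdigest()


def render_with_hash_escapes(text, protected_tokens):
    """Toy pipeline: hide tokens behind digests, process, then restore them."""
    placeholders = {md5_hex(tok): tok for tok in protected_tokens}

    # Step 1: swap every protected token for its digest.
    for digest, tok in placeholders.items():
        text = text.replace(tok, digest)

    # Step 2: the real formatting passes would run here (omitted).

    # Step 3: swap digests back. This cannot tell a placeholder from a
    # digest the author typed, which is the bug being illustrated.
    for digest, tok in placeholders.items():
        text = text.replace(digest, tok)
    return text


star_digest = md5_hex("*")
document = "The MD5 of an asterisk is " + star_digest
result = render_with_hash_escapes(document, ["*"])
print(result)
```

In this sketch the digest at the end of the sentence comes back as a bare `*`, even though the author never typed one. If the restored token were markup rather than an asterisk, it would reappear after any filtering done in the middle step.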

See the changelog for what started as a PHP transliteration and turned into a rewrite that squashed 125 (!) unacknowledged bugs: http://michelf.com/projects/php-markdown/

The worst part is that he outright refuses to either disclaim or fix his implementation, and so far he's repudiated everyone else's attempts to do so. He's a terrible programmer and a worse maintainer, he really still thinks the documentation on his site is comprehensive and canonical. As much as Jeff Atwood leaps at every chance to play the fool, there's no way his directorship can be anything but an improvement.

[+] dandelany|13 years ago|reply

I'm so tired of this mentality that says basically, if you release something for free on the internet, you are obligated to maintain and support it for the rest of your life. Gruber created this program, for free. You are under no obligation to use it. Don't like it? Here's your money back. It may be true that the code is shit. If you think so, don't use it.

Like other responders, I worry that this mentality causes fewer coders to release their projects, for fear of backlash like this post. Think about it: Your feelings toward Gruber are incredibly negative and hostile, and in fact, you would have better feelings toward him if he had kept Markdown to himself and never released it at all. Does that seem fair to you? If the ill will generated by people like yourself outweighs the good will generated by those who appreciate the code I release, or if I fear that it might, what motivation do I have to release my code?

[+] tptacek|13 years ago|reply

What an embarrassing post to be occupying the top of this thread. Blaming Markdown.pl for security flaws? I suppose the memory corruption bugs in the "optimized" C Markdown parsers are somehow his fault too?

He wrote a text-to-HTML parser with a particularly elegant little language design and got on with his life, which involves writing more than keeping up with bug reports in Perl scripts. Get over yourself; comments like this make us all look bad.

[+] cdmoyer|13 years ago|reply

This would be a lot stronger argument without the ad hominem bits. He's obviously not so terrible if he created this thing that has people so up in arms.

I think there's something to your post, but the tone makes me want to dismiss it. I know, stupid emotions.

This internet lynch mob mentality... I wonder how much this discourages people from releasing things. So, Gruber releases markdown.pl. People like it. People love it. People use it, people reuse it, people rewrite it. Next thing you know, he's being insulted on the internet because he released something he wrote to serve his needs and isn't passing on some sort of figurative mantle or blessing.

[+] squidsoup|13 years ago|reply

I find it disheartening that the top voted comment is so blatantly rude. I'm not sure what is gained by calling John Gruber a terrible programmer and maintainer. If you want to praise Jeff Atwood for taking over the stewardship of Markdown, great.

[+] atacrawl|13 years ago|reply

Get over yourself. The guy wrote something that suited his own needs and released it so that others could use it too if they wanted. Programmers ported it to other languages because they liked the idea and wanted to see it thrive -- I've used the PHP port in many homespun web apps over the years. Is it perfect? No, and no software is. But when I've needed a script that easily converts line breaks and hyphens into paragraphs and unordered lists (the normal use case I've taken advantage of), it's done the job every time.

[+] oemera|13 years ago|reply

This is one of the reasons Why left the programming community. People are not thankful. Even if you release a great idea - with obviously _not_ the best code - the one thing you get is criticism. Maybe also the words that "you are the worst programmer on earth" following the words that "you put your family in such a shame".

You know what programmers think because of such comments? "My code is so bad I can't release it. Even if the idea is good." And this, Sir, helps no one.

Stop this shit.

[+] Steko|13 years ago|reply

Gruber's Law: the highest upvoted comment for any Daring Fireball link or Markdown discussion on HN will tend to be a repulsive ad hominem whinge.

[+] Tyrannosaurs|13 years ago|reply

What I love here is that not just satisfied with running down Gruber, you also feel the need to get in a little swipe at Jeff ("as much as [he] leaps at every chance to play the fool"), a man who seems broadly in agreement with you (though is far more constructive in his approach).

Stay classy.

[+] kamaal|13 years ago|reply

>>John Gruber's original Markdown.pl is one of the worst small programs

Perl makes it look small. If you have to write something like this in Java or Python, multiply the LOC by at least 20. But I assure you it will be higher.

[+] dgreensp|13 years ago|reply

Wow, I wasn't expecting my email to Jeff to end up as a front-page blog post!

The point here is that Markdown doesn't have a spec, nor do any of its variants to my knowledge, so I was proposing to come up with some Markdown-like language that does have a spec. Under discussion here is the more ambitious (but also appealing) plan of writing an official spec for Markdown, the same way JavaScript got a spec in the form of ECMAScript that we now identify with JavaScript itself.

A spec is a long, tedious, human-readable document that explains the behavior of a system in unambiguous terms. Specs are important because they allow us to reason about a language like Markdown without reference to any particular implementation, and they allow people to write implementations (Markdown processors) independently that behave identically. The Markdown Syntax Documentation is not a spec (it's highly ambiguous), nor is any implementation (not human-readable; some behaviors are probably accidental or incidental and difficult to port perfectly). The hard part of writing a spec is codifying the details in English, and secondarily making decisions about what should happen in otherwise ambiguous or undefined cases.

My motivation for working on a Markdown spec is first and foremost avoiding "bit rot" of content, which happens when we write content against one Markdown implementation and then later process it with another. We don't have this concern with HTML, JSON, or JavaScript, or at least we know what bounds to stay within to write code that will work on any implementation. This is achieved through specs, even if only implementers ever read them.

I would love pointers to Markdown processors that are implemented in a more principled way than the original code, for example using standard-looking lexing and parsing passes, but that still handle nested blockquotes and bullet lists together with hard-wrapped paragraphs.

[+] greggman|13 years ago|reply

Specs are important but I'd argue a conformance test is equally or even more important.

With a conformance test, people can test their implementations, and when ambiguities arise new tests can be added or old tests fixed. Without conformance tests, different interpretations of a spec lead to divergent behavior. Once that behavior is out there long enough it becomes difficult to fix, as people have come to depend on it and/or its quirks.
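
A conformance suite of this kind can be very small. The sketch below is one possible shape, assuming a renderer is just a function from source text to HTML; the case names, the expected HTML strings, and `toy_render` are hypothetical stand-ins, not taken from any existing test suite.

```python
def run_conformance(render, cases):
    """Run each (name, source, expected_html) case and collect the failures."""
    failures = []
    for name, source, expected in cases:
        actual = render(source)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures


# Hypothetical cases; a real suite would pin down each ambiguity as a case.
CASES = [
    ("plain paragraph", "hello", "<p>hello</p>"),
    ("emphasis", "*hi*", "<p><em>hi</em></p>"),
]


def toy_render(source):
    """Stand-in implementation that only wraps the text in a paragraph."""
    return "<p>" + source + "</p>"


failures = run_conformance(toy_render, CASES)
for name, expected, actual in failures:
    print("FAIL %s: expected %r, got %r" % (name, expected, actual))
```

Here the stand-in renderer passes the paragraph case and fails the emphasis case. Any implementation, in any language, could be checked against the same table of cases through a thin wrapper.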

[+] eslaught|13 years ago|reply

You should look at Pandoc[1]. It's a Markdown-to-everything converter, and though I'm not super familiar with the code I believe it's well written. One of my favorite tools for writing.

[1]: http://johnmacfarlane.net/pandoc/

[+] buro9|13 years ago|reply

I agree with all of this.

I'd also like to note that some parts of Markdown from the user perspective are non-intuitive and clumsy.

Such as links and images (inline).

Markdown works so well because it is intuitive and appeals to those who once saw old word processors. They don't have to worry about syntax, and can just enter their text into a textarea (free from JavaScript WYSIWYG interference and the inherent troubles of running that on old and new mobile phones, their playstation, web browsers, etc)... and it just works.

Yet some parts of markdown are simply not intuitive. Links and images are two places where I see in usability testing that the end user will constantly refer to help documentation to figure out how to do it.

Beyond getting the code consistent, maintainable, and testable I'd love to see the language itself solve some of the papercuts that trouble the lay end user.

Realising that what I was planning to do for my project (discussion forums, tumblr for forums) was to create an alternative to markdown that would resolve some of these user issues as well as parser issues, I had already decided that I would not call it markdown and that I would educate my users in something new that hopefully solves their and my needs and would remain very stable due to a well-thought out and documented design in the first place. If what you're proposing is in this vein then consider my hat thrown in, what help I can give I will... take me to your git repository.

[+] bitcartel|13 years ago|reply

Why are people still interested in Markdown - what problem is it solving and for whom?

It's 2012 and we have HTML 5 compliant WYSIWYG text editors. We no longer have to write plain text littered with special codes, for the purpose of running through a parser, to produce HTML which looks nice on a web page. Maybe it made sense a decade ago when web forms had terrible editors, but not anymore. I think Joe Internet writing blog posts and forum replies would agree with me.

For developers, a README file in plain text looks great everywhere, and avoids any Github vs Bitbucket vs Assembla display issues. If you need to write structured documentation for a system, there's probably already a designated markup language you're supposed to use, so Markdown doesn't help there.

[+] halostatue|13 years ago|reply

https://github.com/jgm/peg-markdown and https://github.com/fletcher/peg-multimarkdown

I haven't looked at the implementations of these, but they are most certainly grammar-based, not regex-based.

[+] someone13|13 years ago|reply

Another good parser is python-markdown2 [1], which has a very extensive set of test cases, and the code is very readable and well-commented.

[1] https://github.com/trentm/python-markdown2

[+] arnarbi|13 years ago|reply

I do not agree that a spec has to be written in English. As with most natural languages, it is very easy to be ambiguous and to underspecify in English. Resolving this can make English just as human-unreadable as anything else.

Specs like this should be written as executable reference implementations in a well defined programming language. This can very well be human readable, and should be done without regard for efficient execution. It's less ambiguous, amenable to automated conformance testing, and is easier to evolve than a natural language document.
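
As a rough illustration of what an executable rule might look like, here is a deliberately naive Python sketch for a single construct. The rule itself (one to six leading `#` characters followed by a space make a heading) is a simplified assumption chosen for the example, not a statement of what any Markdown implementation does, and no HTML escaping is attempted.

```python
def heading_level(line):
    """Toy rule: 1 to 6 leading '#' followed by a space give the level, else 0."""
    count = 0
    while count < len(line) and line[count] == "#":
        count += 1
    if 1 <= count <= 6 and line[count:count + 1] == " ":
        return count
    return 0


def render_line(line):
    """Render one line as a heading or a paragraph, favoring clarity over speed."""
    level = heading_level(line)
    if level == 0:
        return "<p>" + line + "</p>"
    text = line[level + 1:].strip()
    return "<h%d>%s</h%d>" % (level, text, level)


for sample in ["## Title", "####### too deep", "#nospace"]:
    print(render_line(sample))
```

Every edge case has exactly one answer here (seven hashes and a missing space both fall back to a paragraph), which is the property the comment above is asking for. Whether such code is easier to read than careful English is the open question.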

[+] fiddlosopher|13 years ago|reply

[lunamark](https://github.com/jgm/lunamark/tree/master/lunamark) is another PEG-based implementation.

[+] uvtc|13 years ago|reply

I'm not so sure what's needed here is a spec.

What people really want is for all Markdown implementations to be basically the same so they don't have to learn any implementation-specific idiosyncrasies to switch from one to another.

The problem though, as I see it, tends to go away under certain circumstances. And the circumstances, I think, are these:

* If you've got a strong implementation, robust, actively-maintained, and runs fast,

* is easy to obtain, install, and use,

* is well tested and well documented, and

* has just the right blend of sensible additions to the syntax (for example, tables, def lists, LaTeX math, etc.) --- done tastefully,

then folks will just use that, model their own implementations after that, and just overall start considering that to be the standard.

I think this has been slowly and steadily happening with Pandoc.

And, aside from all that, two additional "killer features" that Pandoc seems to have over other implementations:

1. it can convert to/from other doc markup formats, thus making it easy to just convert your existing docs to pandoc-markdown and then use that as your master source format to generate other formats you might need; and

2. with its carefully-chosen set of additional features, it has been slowly proving itself capable of being a replacement for raw LaTeX for certain types of longer technical documents.

My understanding is that there's even some features in the works (for the next release) for converting between markdown dialects --- which would make it even easier to convert markdown files of various flavors into plain standard pandoc-markdown.

So, if you're looking for a standard, I'd suggest that it's for the most part already here. :)

[+] draegtun|13 years ago|reply

>I would love pointers to Markdown processors that are implemented in a more principled way than the original code, for example using standard-looking lexing and parsing passes...

Have a look at Markdent - An event-based Markdown parser toolkit.

https://metacpan.org/module/Markdent

[+] aleemb|13 years ago|reply

I hope some more thought goes into the usability as well. I find the syntax for links confuses a lot of users for whom I have enabled Markdown editing. Wiki style [[link text]] works great if you assume links have no space or maybe something similar. Similarly the syntax for images has also always bothered me.

[+] ExpiredLink|13 years ago|reply

IMO, for the spec the most important point is to add mandatory versioning information on the first line of markdown scripts, e.g.

    _markdown_version_ Rockdown_1.0

This would allow processor implementers to support more than one markdown version (for a transition period or in general).
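
A processor could act on such a header roughly as follows. This is only a sketch: the `_markdown_version_` marker and the `Rockdown_1.0` name come from the example line above, while the function names, the `"unversioned"` fallback, and the two placeholder renderers are assumptions for illustration.

```python
VERSION_MARKER = "_markdown_version_"


def split_version(document, default="unversioned"):
    """Return (version, body); fall back to `default` if no header is present."""
    first_line, _, rest = document.partition("\n")
    parts = first_line.split()
    if len(parts) == 2 and parts[0] == VERSION_MARKER:
        return parts[1], rest
    return default, document


def render(document, renderers, default="unversioned"):
    """Pick the renderer registered for the declared version and apply it."""
    version, body = split_version(document, default)
    if version not in renderers:
        raise ValueError("unsupported markdown version: " + version)
    return renderers[version](body)


# Placeholder renderers; real ones would be full processors per version.
renderers = {
    "Rockdown_1.0": lambda body: "v1:" + body,
    "unversioned": lambda body: "legacy:" + body,
}

print(render("_markdown_version_ Rockdown_1.0\nhello", renderers))
print(render("hello", renderers))
```

Documents without the header fall through to a legacy renderer in this sketch, which is one way to cover the transition period mentioned above.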

[+] Helianthus|13 years ago|reply

I don't know if Reddit's implementation of Markdown is principled, but as a syntax it's pretty ubiquitous so you might have the traction to make it standard.

[+] jamii|13 years ago|reply

> I would love pointers to Markdown processors that are implemented in a more principled way than the original code

This is almost what you want:

http://news.ycombinator.com/item?id=555153

EDIT: wrong link :(

[+] X-Istence|13 years ago|reply

I might be the only one, but I actually prefer Markdown's handling of a single "enter" without spaces at the end to mean that the paragraph is not finished. It makes writing blogs and various other stuff in Vim much simpler, and I can more easily reformat text to wrap at 80 characters, and have better control over it.

Could I soft-wrap in my editor? Sure, but that would mean that the text files sitting on my hard drive now have very long strings in them making it harder to grep, making it harder to add to git (change a single character, entire line is now a diff :-().

I hope that doesn't become the default.

[+] eob|13 years ago|reply

Lots of people are with you.

I think this behavior is the better route because it accommodates both crowds. The line-wrap folks can just press enter twice; no biggie. But the console and vim users of the world can continue using line-breaks the way that works best for their environment.

On the flip side, making a single enter start a new paragraph wouldn't really help the GUI users (what's the difference between one line break and two, really?) but it would really hurt the console users of the world.

[+] jomar|13 years ago|reply

I guess there is a distinction between

* people typing markdown in text files, where you want to split paragraphs into word-wrapped lines of a sensible length, and consecutive non-blank lines form paragraphs just as they always have in text files;

* and people typing markdown into text entry boxes on web pages, where you would like pressing the <enter> key to actually mean something.

These two situations probably prefer a different default.
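
The two defaults can be pictured as one flag on a paragraph renderer. The Python sketch below is a toy under stated assumptions: `render_paragraphs` and its `hard_breaks` parameter are hypothetical names, blank lines are assumed to be truly empty, and the two-trailing-spaces break syntax is ignored.

```python
import html


def render_paragraphs(text, hard_breaks=False):
    """Blank lines split paragraphs; single newlines become a space or <br>."""
    joiner = "<br>\n" if hard_breaks else " "
    paragraphs = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        # Drop empty lines left over from runs of three or more newlines.
        lines = [html.escape(line.strip()) for line in block.split("\n") if line.strip()]
        if lines:
            paragraphs.append("<p>" + joiner.join(lines) + "</p>")
    return "\n".join(paragraphs)


source = "one\ntwo\n\nthree"
print(render_paragraphs(source))                    # text-file style
print(render_paragraphs(source, hard_breaks=True))  # text-box style
```

With the flag off, hard-wrapped lines are joined into one paragraph; with it on, each press of enter survives as a `<br>`. The disagreement in this subthread is over which value should be the default.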

[+] halostatue|13 years ago|reply

If this is a change that Jeff really wants, he needs to fork Markdown. This change actively subverts one of the goals that Gruber set down for Markdown, which is that Normal People can use it.

Changing this, especially for people who implement Markdown parsers, is geek arrogance.

[+] RegEx|13 years ago|reply

I agree completely. Visually highlighting a paragraph that went a bit too wide and cleaning it up with 'gq' makes me a happy camper.

[+] pjscott|13 years ago|reply

Believe me, you're not the only one who prefers this behavior. It makes working with hard-wrapped text a hell of a lot easier.

[+] fiddlosopher|13 years ago|reply

I, too, would be strongly against "automatic return-based linebreaks." Given that markdown has constructions for lists and code blocks, one very rarely needs a hard line break anyway. Currently markdown works fine both for people who hard-wrap and people who soft-wrap. Let's keep it that way.

[+] raldi|13 years ago|reply

I'd also advocate for accepting reversed ()[]'s on links.

In other words, let the user type:

    [something](http://whatever.com)

or

    (something)[http://whatever.com]

...and have both work exactly the same.

It will save a lot of trouble -- and especially when linking to a Wikipedia page whose URL contains parentheses.
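
One way to accept both orders is a single pattern with two alternatives. The Python sketch below is a toy, not how any existing Markdown processor parses links; `LINK` and `render_links` are hypothetical names, and the pattern ignores titles, nesting, and escapes.

```python
import html
import re

# First alternative: [text](url). Second alternative: (text)[url].
LINK = re.compile(
    r"\[(?P<t1>[^\]]+)\]\((?P<u1>[^)\s]+)\)"
    r"|"
    r"\((?P<t2>[^)]+)\)\[(?P<u2>[^\]\s]+)\]"
)


def render_links(text):
    """Replace both link forms with an HTML anchor in a single pass."""
    def to_anchor(match):
        label = match.group("t1") or match.group("t2")
        url = match.group("u1") or match.group("u2")
        return '<a href="%s">%s</a>' % (html.escape(url, quote=True), html.escape(label))
    return LINK.sub(to_anchor, text)


print(render_links("[something](http://whatever.com)"))
print(render_links("(something)[http://whatever.com]"))
print(render_links("(Foo)[http://en.wikipedia.org/wiki/Foo_(bar)]"))
```

In the reversed form the URL sits inside square brackets, so parentheses in it no longer end the link early. With this toy pattern the usual form still stops at the first `)` of such a URL.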

[+] kaptain|13 years ago|reply

Why?

Why get all angry at John Gruber? As many have already noted, he created Markdown for himself and released it so that others could use it. AFAIK he didn't put any license/restrictions on it outside of calling himself BDFL. Whatever his skills as a programmer, writer, or his role as Mouthpiece of Apple, the vitriol is unnecessary (but absolutely fascinating to watch). My panties bunch up naturally, no need to allow my feelings regarding Gruber to bunch them further.

Why get his approval? In the same spirit that Gruber created something for himself, you should just create something for yourself. I find it hard to believe that Gruber was the first person that conceived the idea of user-friendly text-markup. The new standard could just be inspired by Markdown and that would be a win-win: a respectful nod towards Gruber as well as the ability to move towards something 'better'.

[+] dfc|13 years ago|reply

I really hope that they borrow a lot if not everything from pandoc[1]. My only real complaint with pandoc is the table formatting, but I think fiddlosopher is adding org-mode like table support.

If you have not taken pandoc for a spin I highly recommend you do so soon. In addition to being a great markdown dialect the pandoc tool set is the Swiss Army knife of text formatting. It is amazing how many formats pandoc can read and/or write.

[1] http://johnmacfarlane.net/pandoc/README.html

EDIT: I spoke too soon, Fiddlosopher continues to impress. I just checked the open issues and a little less than a month ago he added "limited org-table support." Based off of the rest of pandoc "limited" probably means something like 85% to 95% :)

https://github.com/jgm/pandoc/issues/519

[+] SeoxyS|13 years ago|reply

I'm the author of a Markdown text (prose) editor[1], and can attest to Jeff's statement that all Markdown's parsers suck. The official perl regex-based implementation is a joke. Sundown is great, but only works for cross-compilation to other markup languages; it doesn't work for syntax highlighting, which is what I'm more interested in.

I ended up writing my own in Objective-C. It's not very pretty, and it doesn't use a formal grammar (just a lexer + custom grammar code), but it does the trick. I took a few liberties with the spec: throwing in GitHub-flavored code blocks.

https://gist.github.com/29dabe4b6e762ee221df

[1]: http://getmacchiato.com/

[+] StavrosK|13 years ago|reply

I'm not that psyched about automatic return-based linebreaks. Everyone thinks they should use linebreaks to align their text, and the system should just ignore all single line breaks.

The current behavior of Markdown solves this problem very well. I don't want the newlines I enter for non-wrapping editors to remain in the generated HTML.

[+] wreel|13 years ago|reply

I found that I've moved on to reStructuredText. It doesn't seem to be marketed as much as Markdown (the only reason I know about it is because of Sphinx) but I feel that it's a bit more capable. Simple tables are exceptionally easy and it handles URLs with parens in it just fine (a common pain when trying to link to Wikipedia articles with Markdown).

[+] eob|13 years ago|reply

As a heavy LaTeX user (phd student; can't escape it), I'm convinced that there is a small enough subset of LaTeX that actually gets used day-to-day that someone could figure out a way to shim it into something like Markdown.

And then, for the LaTeX that you can't shim in, just have some escape hatch that sends fragments out to a renderer. If I could only have:

    * Math mode
    * Citations and Bib files
    * Labels and References

Then I'd be willing to go through a lot of extra pain to get all the weird tables and precise image placements that are inevitable in a 2-column ACM format.

EDIT: Having just investigated Pandoc, which many here are talking about, I realize this might be exactly what I've been looking for :)

[+] zrail|13 years ago|reply

(shameless plug) I wrapped Pandoc[1] in a web service and added on nice PDF exports and called it Docverter[2]. It will convert basically anything plain-text, including Markdown, into almost anything else: plain text, HTML, RTF or Docx. I also added rich PDF exports that go through an HTML intermediary.

If this gains some traction I'm sure I'll be adding support for it at some point.

[1]: a wonderful almost-everything-to-everything text converter http://johnmacfarlane.net/pandoc/

[2]: http://www.docverter.com

[+] engtech|13 years ago|reply

From the comments on the blog:

   "I'm reminded of the guy who decides that there should be 
    one standard because there are n divergent implementations. 

    So he goes and writes his own. Now there are n+1 divergent implementations."

That is probably the most likely outcome, but kudos to Jeff for trying.

The idea of Markdown is great, but I found the implementation of links is less than obvious. (haven't tried it in 4 years, so there were probably other issues that I had that I've forgotten)

The problem I inherently always end up having with "parses to HTML" syntax conventions is there are always warts where the syntax is harder to remember than the HTML it is supposed to parse to.

[+] antirez|13 years ago|reply

I love Markdown, and I hate Markdown.

I love it because the world needs an easy-for-humans way to format in pure ASCII without any tool. It is much simpler than using even the most well designed GUI. You can even write books with it, and you can focus on content.

But I hate Markdown. I hate it because it is superficially good: a lot of Markdown seems to make sense at first glance, but if you look at it more closely you see that a lot is broken in its design (IMHO the fact that the reference implementation is broken is the least of the issues).

It is surely possible to fix it. However it's better to have a broken Markdown now than no Markdown at all. The fact that Github and Stack Overflow and Reddit are using it makes it absolutely obvious how useful and great the concept is. The actual design, implementation, and specifications can be fixed now. So kudos to the original inventor, but it needs a second pass from people that can give it a more coherent shape, with clear behavior, minimal surprise, and parsing in mind.

[+] christiangenco|13 years ago|reply

Why not just move to Pandoc[1]?

1. http://johnmacfarlane.net/pandoc/

[+] _pdeschen|13 years ago|reply

A BNF grammar would be nice to start with.

IMHO, pandoc's markdown support is the mother of all implementations, featuring lots of goodies (tables and footnotes, to name two).

[+] starpilot|13 years ago|reply

It'd be nice if Markdown was added to HN, at least for a consistent way of quoting that's better than using the code tag (which frequently cuts off text for some reason in mobile Safari).

[+] kbd|13 years ago|reply

Here's hoping they can finally work natural _underline_ support in...

Edit: I've wondered whether the original Markdown didn't have underline support because <u> was deprecated/removed from HTML. FWIW, <u> is now back in HTML5.

[+] juliangamble|13 years ago|reply

What is the canonical implementation of markdown?

> The problem with writing my own Markdown parser in Clojure is that Markdown is not a well-specified language. There is no "official" grammar, just an informal "Here's how it works" description and a really ugly reference implementation in Perl. http://briancarper.net/blog/415/

http://stackoverflow.com/questions/7307480/what-is-the-canon...

[+] nickpresta|13 years ago|reply

I really like the Mou text editor for Markdown: http://mouapp.com/

Mou + the (built in) Github theme = best Markdown editing experience.

[+] dysoco|13 years ago|reply

As a non-web developer I cry every time I need to use HTML: It's really "ugly" in some way (And I'm used to ugly languages).

But I have learned to love Markdown too. I hope that in the future, the distant future, someone will create a language that integrates HTML and CSS into a nice Markdown-like language.

328 comments