Patoline: A modern digital typesetting system

192 points| ulrikrasmussen | 11 years ago |patoline.org | reply

80 comments

[+] jwr|11 years ago|reply
I'm very glad people are doing this. The local maximum created by TeX (and its surroundings), while producing impressive output, has kept many people from developing any typesetting software. It was usually easier to learn how to trick TeX into doing what you wanted, or build yet another overlay on top of it.

A similar local maximum exists with Emacs (you might also call it a local energy minimum, as it requires less energy to hack Emacs into doing what you want than to write a whole new system). This has also held back editor development for years (if not decades), but recently we started seeing attempts at breaking out (LightTable is one example).

This is an interesting phenomenon: neither TeX nor Emacs is bad; in fact, they are both impressive pieces of software. But we could do better now, and yet we usually don't even start, because it is such a daunting task, and it is easier to extend existing solutions.

Best of luck to authors and contributors to this project!

[+] mtdewcmu|11 years ago|reply
Your emacs analogy has issues -- vim is another programmable editor alongside emacs. And IDEs, especially ones like Eclipse, have similar complexity behind a graphical interface. Should languages be replaced, or is it better that they just grow and adapt? Look at the scale of the energy minimum around English.
[+] Argorak|11 years ago|reply
The about page gives me no idea of what it is, especially as someone who knows TeX quite well. It starts with the single mistake all of those pages make: "It's like A, but with B!" On top of that, B is already in A. Typesetting with scripting? LuaTeX (actually quite easy, if you know the typesetting surroundings). More layout control? ConTeXt.

The biggest hurdle in all those systems is understanding what typesetting actually is (in contrast to content editing or especially programming). It comes with a completely different vocabulary. What's a strut?[1] What's a quad?[2] What are the standard elements of typography and page layout? How are fonts measured?

Finally: how does the system manage forward compatibility (can I still compile my document from 5 years ago?) - scripting and modularity are features that have to be evaluated under all those regards.

Also, I am very surprised that code examples in the reference documentation look horrible. Isn't that what the system is there for?

That said: I love that there is competition in that space. TeX is great and will be around for a few decades, but that doesn't mean there can't be others.

[1] http://en.wikipedia.org/wiki/Strut_%28typesetting%29 [2] http://en.wikipedia.org/wiki/Quad_%28typography%29

[+] tekacs|11 years ago|reply
> Can I still compile my document from 5 years ago?

Well on the About page it does have the FAQ:

> Can I use Patoline for a huge, ten years long project that I'm starting now?

> Although one of the authors has written his PhD. thesis (120 pages, in computer science) with it, we don't recommend it now. The reason for this is that small adjustments and bugfixes are being made all the time, and working on such an unstable system can be frustrating. However, for documents that are not meant to last forever in time, we would be happy to help you with it: feel free to contact us for help.

> Stay tuned: we do plan to release a long term supported version of Patoline soon.

[+] tinco|11 years ago|reply
It uses the same archaic syntax as TeX :( I was hoping they would come up with something nice and modern like Markdown.

It seems like a cool project, OCaml is probably a fine choice although I've never used it. But there are a few smells. The first to me was him mentioning the project had 100.000 lines already. Now if this was C I could see why. But this is a modern functional language. There's no 10.000 line rolled out parsers, no hundreds of specialised general libraries.

Why is the project 100k lines? Is it an unmaintainable behemoth already?

[+] pling|11 years ago|reply
a) Markdown isn't expressive enough. In fact it's pretty horrendous to edit and separate content and layout IMHO. I'd go as far as to say I prefer DocBook over Markdown. It's fine for GitHub READMEs but not typesetting.

b) the problem is complicated. Complicated problems need lots of lines of code. 100kloc isn't much. The thing I'm working on has 5.2Mloc and took 25 people 20 years to write. To the layman it probably looks like it does less than this typesetter meaning it's a bad metric to use.

My MacTeX installation is about 1300Mb if that's any gauge.

[+] Derbasti|11 years ago|reply
I was immediately put off by the syntax, too.

We don't have to do strict Markdown, but even Markdown is transparently extensible in HTML. It is entirely feasible to write publishing quality documents in HTML/CSS3, and thus, Markdown.

I would love to see something like Markdown that is well suited to typesetting and has a transparent backend scripting language for more in-depth tasks. Scribble looks like a good start: http://docs.racket-lang.org/scribble/index.html

Currently, I use org-mode and LaTeX-export for this. It works well enough, but a native integrated environment would open this to a much wider audience.

[+] yannis|11 years ago|reply
To quote Knuth [1] describing TeX “Think thrice before extending,” because that may save a lot of work, and it will also keep incompatible extensions of TeX from proliferating.

Personally, I am of the opinion that the original TeX was well 'modularized' in 'Pascal Procedures' [1]. Because it is Turing complete, extensions such as the LaTeX format and the countless packages that followed helped it survive and prosper. Think of a macro that you define as an easier way than programming a module in JavaScript, and it doesn't need half a dozen tools to set it up.

    \def\#1{\TeX\ is alive says #1.}

Nevertheless, provided the lessons learned are incorporated, I applaud any new initiatives in more modern languages. Whatever is produced will, however, need to be able to parse TeX; otherwise it will not be easily adopted by the community.

[1] http://www.tug.org/texlive//devsrc/Master/texmf-dist/doc/gen...

[+] JadeNB|11 years ago|reply
> Whatever is produced will however, need to be able to parse TeX, otherwise it will not be easily adopted by the community.

This is a huge barrier to raise, since parsing TeX is the same as typesetting TeX—the meaning of a macro later on can be affected by a macro now. You can't even just execute the macros in a vacuum, since the meaning of a macro can depend on things like the current page number.
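
A toy illustration of that point, sketched in Python. This is not TeX's expansion algorithm; the token format and the `expand` function are invented for the example. It only shows that the same token can produce different output depending on which definitions were executed before it, so the stream cannot be interpreted without running it in order.

```python
def expand(tokens):
    """Process tokens left to right; definitions take effect as they are met.

    A tuple ("def", name, replacement) is a hypothetical stand-in for \\def.
    Any other token is replaced by its current definition, if it has one.
    """
    macros = {}
    output = []
    for token in tokens:
        if isinstance(token, tuple):
            _, name, replacement = token
            macros[name] = replacement  # later uses of `name` see this value
        else:
            output.append(macros.get(token, token))
    return output


# The token \x appears twice but means something different each time.
document = [("def", r"\x", "A"), r"\x", ("def", r"\x", "B"), r"\x"]
result = expand(document)
print(result)
```

In real TeX the state that influences expansion also includes things like the page number, which is only known once pages have actually been built.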

[+] todd8|11 years ago|reply
The insight (made here by jwr) that TeX and Emacs and Vim are local maximums is a great observation that explains why we are stuck with these three programs. I've used these programs for over a quarter of a century and often wished for shiny, new replacements. Younger HN readers may not realize just how old these programs are.

How in the world did Bill Joy come up with Vi in 1976? It lives on today as the velociraptor of editors, Bram Moolenaar's Vim. I use the T. Rex of editors, Stallman's Gnu Emacs, another dinosaur of software, yet unsurpassed in scope and capability. These are great at what they do and so flexible and extensible that it's difficult for any new project to catch up.

TeX is different. Knuth, one of the greatest computer scientists of all time, created TeX. His choices for development tools were meager, but with the help of /literate programming/, essentially invented by Knuth to write TeX, he wrote TeX using the Pascal programming language in the late 1970s.

TeX, like Emacs and Vi/Vim, has an extension language. TeX has a powerful macro system that allows it to be extended. LaTeX is a set of TeX macros that most users use to create documents. The number of macro packages written for TeX to support every imaginable kind of typesetting (chess notation and boards, music, etc.) is staggering. This accounts for the large size of TeX installations. One can easily download and install every package ever available and be ready to typeset anything. The core TeX program, however, is composed of 1376 extremely well documented paragraphs (small code fragments). It is a pleasure to read through Knuth's literate code, all available as a beautiful book (naturally typeset in TeX).

The features of TeX were effectively frozen in 1985. Knuth kept track of every error in TeX during its development. Since 1985 there have been fewer than 100 errors found in TeX [1].

The open-source communities around TeX (and Emacs and Vim) make it difficult for any new project to develop the features that make it worth switching. This is the uphill climb that is facing Patoline. Will it succeed? I'm not sure because many others have tried and failed. Lout was a really good attempt [2], but it seems to have lost steam [3].

[1] http://texdoc.net/texmf-dist/doc/generic/knuth/errata/errorl... [2] http://en.wikipedia.org/wiki/Lout_(software) [3] https://www.ohloh.net/p/lout

[+] hyperbovine|11 years ago|reply
I would argue it's not so much the community (I'll bet there are maybe a dozen people who understand the source well enough to improve it) so much as the fact that TeX is feature complete, essentially bug free, and written by a genius. It really is an amazing piece of software which has no equal, even if you wanted to spend thousands of dollars. Why are we intent on replacing it again? I understand the gripes about the language, the error messages, and so forth. So design something that compiles down to TeX, not unlike what the Java(Script) people have done. Reinventing TeX just seems like an exercise in futility. (Nice duck logo though.)
[+] taeric|11 years ago|reply
Is it a great observation, or just a good hypothesis? Even as a hypothesis, seems difficult to test.
[+] ShellfishMeme|11 years ago|reply
I'd recommend adding some more information on the index page. When I arrived I tried to scroll down hoping to find out what this is about, but instead I had to go to the 'about' page myself.

Then I tried to read the 'about' page but the text is very small and the lines very long, so it's hard to scan for relevant information. The bullet points make a couple of claims about what Patoline excels at, but still no code examples can be seen that could give me a feel for how Patoline actually works.

I had to go to 'documentation' and click a small insignificant looking link to actually get to see an explanation of Patoline and some code examples... in a PDF file.

I understand that it's cool to write the tool's manual using the tool itself, but if I come to the website and want to evaluate quickly whether this is interesting and could replace LaTeX for me, I need a quick overview directly on the index page, together with some examples similar to the one on the wikipedia LaTeX page (http://en.wikipedia.org/wiki/LaTeX#Examples).

[+] Argorak|11 years ago|reply
It's not only cool, the documentation is the primary test case for many of those projects.
[+] Signez|11 years ago|reply
Using OCaml and darcs won't help this project to find contributors outside French academic world. It's unfortunate, IMHO.
[+] silentvoice|11 years ago|reply
Will it work sanely with Make, handle modular documents better than TeX, etc? The biggest frustration with me and TeX isn't the quality of its output or its extensibility; it's simply that it is difficult for me to work it into a reproducible workflow without a lot of effort on my part.

With all of my tools that I use that somehow depend on each other I can always glue together the results of my work with one tool to the input of another tool via Make, minimizing human error in translating results over myself, but LaTeX (combined with tools like BibTeX) somehow always breaks this model, which can be very frustrating since LaTeX is such an important component of mathematics research.
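
One reason LaTeX fits Make's model poorly is that a single run is not always enough: cross-references and citations are written to auxiliary files and only settle after repeated passes. A minimal sketch of the usual workaround, rerunning until the auxiliary output stops changing, is below. The names `run_until_stable` and `make_fake_pass` are made up for this example, and the fake pass stands in for a real invocation of the typesetter.

```python
import hashlib
from typing import Callable


def run_until_stable(run_pass: Callable[[], bytes], max_passes: int = 5) -> int:
    """Call run_pass until its auxiliary output is identical twice in a row.

    run_pass performs one build pass and returns the bytes of the auxiliary
    file it produced. Returns the number of passes that were executed.
    """
    previous = None
    for count in range(1, max_passes + 1):
        current = hashlib.sha256(run_pass()).hexdigest()
        if current == previous:
            return count
        previous = current
    raise RuntimeError(f"auxiliary output still changing after {max_passes} passes")


def make_fake_pass() -> Callable[[], bytes]:
    """Simulated build: references are unresolved on pass 1, resolved afterwards."""
    outputs = iter([b"ref: ??", b"ref: 3", b"ref: 3"])
    last = b""

    def fake_pass() -> bytes:
        nonlocal last
        last = next(outputs, last)  # keep returning the final state once exhausted
        return last

    return fake_pass


passes_needed = run_until_stable(make_fake_pass())
print(passes_needed)
```

In a real setup `run_pass` would shell out to the typesetter and read back the auxiliary file; a Make rule can then call a script like this as a single step, so that Make sees one deterministic target.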

[+] waynecochran|11 years ago|reply
It's going to take da Vinci-like talent and a herculean effort to build anything 10% as good as TeX. Knuth, the pre-eminent computer scientist, devoted 10 years of his life to typesetting to create MetaFont and TeX. Being able to "write scripts" for it ain't enough. Anyway... prove me wrong.
[+] rst|11 years ago|reply
A lot of that time went into the development of not the code per se, but algorithms that newer code could reimplement, perhaps in simpler forms on newer machines. Conventional TeX has a hard time optimizing placement of page breaks because its data structures hold only one page at a time, and discard its contents before starting the next, due to memory limitations of computers from the 1980s. Those limits no longer apply, which could make it a lot easier to code strategies for eliminating "widows" and "orphans" -- situations in which only one line of a paragraph is stuck at the top or bottom of a page.
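
A rough sketch of what "whole document in memory" buys you, in Python. This is not TeX's or Patoline's algorithm: the cost model (squared unused lines per page, plus an arbitrary penalty of 1000 for stranding a single line of a paragraph) and all function names are assumptions made up for the example. Paragraphs are given as line counts, and a page holds a fixed number of lines.

```python
from typing import List

STRANDED_LINE_PENALTY = 1000  # assumed weight for a lone line at page top or bottom


def break_penalty(lines, j: int) -> int:
    """Penalty for ending a page after global line j (1-based)."""
    pos, length = lines[j - 1]
    if pos == length:  # the break falls between two paragraphs
        return 0
    penalty = 0
    if pos == 1:  # lone first line left at the bottom of this page
        penalty += STRANDED_LINE_PENALTY
    if pos == length - 1:  # lone last line pushed to the top of the next page
        penalty += STRANDED_LINE_PENALTY
    return penalty


def best_breaks(paragraphs: List[int], capacity: int) -> List[int]:
    """Dynamic programming over every line of the document at once."""
    lines = [(pos, n) for n in paragraphs for pos in range(1, n + 1)]
    total = len(lines)
    cost = [0] + [float("inf")] * total  # cost[j]: best cost with a break after line j
    prev = [0] * (total + 1)
    for j in range(1, total + 1):
        for i in range(max(0, j - capacity), j):  # page holds lines i+1 .. j
            slack = capacity - (j - i)
            page = 0 if j == total else slack ** 2 + break_penalty(lines, j)
            if cost[i] + page < cost[j]:
                cost[j], prev[j] = cost[i] + page, i
    breaks, j = [], total
    while j > 0:
        breaks.append(j)
        j = prev[j]
    return breaks[::-1]


def greedy_breaks(paragraphs: List[int], capacity: int) -> List[int]:
    """Fill every page completely, one page at a time, with no lookahead."""
    total = sum(paragraphs)
    return list(range(capacity, total, capacity)) + [total] if total else []


print(greedy_breaks([4, 4, 4], 5))
print(best_breaks([4, 4, 4], 5))
```

With three four-line paragraphs and five-line pages, the greedy version ends the first page one line into the second paragraph, while the global version leaves one line of slack on each page and keeps every paragraph whole.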
[+] seanmcdirmid|11 years ago|reply
I think if someone were to redo TeX, it would be really nice to see an incremental typesetting system. That means modularity not just at the code level, but also with respect to how the run-time data structures are dependent on each other, and the ability to "reflow" text on a change in a way that required minimal changes to the previous flow of the text. Then we could have something that was WYSIWYG, interactive, and produced output that looked fairly decent.
[+] Argorak|11 years ago|reply
"Fairly decent" is a fairly low bar for a typesetting system.
[+] pestaa|11 years ago|reply
Hm. The rendered patobook.pdf documentation wouldn't display in the browser; after downloading and opening it in Acrobat Reader, it said some embedded font couldn't be loaded completely, I guess all monospace formatted text was lost: see screenshot at http://i.imgur.com/MhkPqRR.png

I'm very glad to see this project happen but this wasn't a good first impression.

[+] cdash|11 years ago|reply
Seemed to work fine for me in Chrome on Windows.
[+] Al-Khwarizmi|11 years ago|reply
Same problem here. Windows 7 64-bit, Opera 12.17.
[+] oftenwrong|11 years ago|reply
Is "line" in this pronounced as in "gasoline" (lin) or "waistline" (laɪn) or "adrenaline" (lɪn)?

edit: Like gasoline. "Its name is to be pronounced like Pa-toe-leen, and it is the frenchification of the translation in portuguese of a joke in english"

[+] cevn|11 years ago|reply
I was ready to like this, but I ran into a lot of problems installing camlimages on os x. Anyone else? I'll keep working on it later.
[+] ajarmst|11 years ago|reply
Folks, stop trying to solve problems that have already been solved. Especially if it was Don Knuth who solved it. When that guy solves something, it stays solved.

In this case, you're up against something Don Knuth solved and then Leslie Lamport made more useable. You really want to compete with those two?

[+] sdp|11 years ago|reply
It's a mistake to think we can't improve on something because someone smart built it. In my academic community, everyone fights with LaTeX and I can only say with confidence that one person I know has mastered it. Everyone else just hacks at their document until it's close enough.
[+] pmeunier|11 years ago|reply
Thanks for your interest in our project, that hasn't released any version 0.1 in three years, and yet manages to concentrate a lot of (maybe necessary) love and hate.

First, about the technical points:

- OCaml has been backward compatible for the last 20 years, which is why we rely on it for backward/forward compatibility. After some time, we obviously hope to release a forward compatible version of Patoline. As a side note, I've got several papers written in LaTeX on my hard drive, that don't compile anymore after only eight years.

I imagine that debugging and improving packages written in TeX is hard enough that authors who manage to do it do not bother about compatibility. With the exception, of course, of those "who know TeX and LaTeX pretty well" (at least until the day they write their first package, like I did shortly before beginning Patoline).

Moreover, there is something called "a type system" that OCaml uses, that makes your code more likely than code in any non-functional language to remain stable through time. I know there are people who do not acknowledge the existence of this, and confuse it with older systems such as type checking in C, or who believe functional programming is a parenthesis writing competition. I would like not to use the kind of authority arguments I've seen in this page to convince you. Trying OCaml or Haskell is a good way, but you need to be willing to be convinced, which is usually not the case in this kind of discussion. At least it makes sure that the program cannot run into an "undefined behavior" without the author being aware of it, something that my own daily experience with programs such as svg2tex does not do.

- In our first project meeting about Patoline, it was decided to not choose a definitive language. This is probably the only design choice. Of course there is a default one (intended to be forward compatible, if you are still following), but you can change it. Like Markdown? Write a compiler to OCaml, it should not take more than a couple of hours, and you won't have to rewrite 20000 lines of code to handle the crappy font formats that Microsoft, Apple and Adobe have designed for you, nor 3000 to output reasonably portable PDF documents that most printers can print (maybe this is an explanation of the size).

- Knowing quads, struts, \expandafter and \futurelet is cool knowledge. Did you also know that TeX uses its own fixed-point algebra? While these are certainly "inventions of a genius", using 21st century numerical methods to adjust spaces is efficient use of science and technology, and that's what we do. By the way, have you heard of IEEE-754? It doesn't begin with a slash, I've seen it used at Caltech, probably Stanford knows about it too.
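
For readers unfamiliar with the fixed-point remark: TeX stores lengths as integer multiples of a "scaled point" (1 pt = 65536 sp), so sums of lengths are plain integer arithmetic. The Python sketch below contrasts that with IEEE-754 doubles. The helper names are invented here, and `round` is Python's rounding, used as a stand-in rather than a reproduction of TeX's own decimal conversion routine.

```python
SP_PER_PT = 65536  # one point is 2**16 scaled points in TeX's length representation


def to_sp(points: float) -> int:
    """Convert points to an integer number of scaled points (approximate rounding)."""
    return round(points * SP_PER_PT)


def to_pt(scaled_points: int) -> float:
    """Convert scaled points back to points for display."""
    return scaled_points / SP_PER_PT


# Binary floating point: 0.1 + 0.2 is not exactly the double nearest to 0.3.
float_sum_matches = (0.1 + 0.2 == 0.3)

# Fixed point: once lengths are integers, addition is exact.
fixed_sum_matches = (to_sp(0.1) + to_sp(0.2) == to_sp(0.3))

print(float_sum_matches, fixed_sum_matches)
```

Neither approach is free: fixed point gives results that do not depend on the floating-point unit but has limited range and resolution, while doubles make numerical optimisation methods more convenient.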

Now about other points:

- What is "feature-completeness"? I know "Turing-completeness", which is the ability to simulate any Turing machine. On the operating systems we have today, "Turing^OS-completeness" (the ability to simulate any Turing machine with the OS as an oracle) is probably a great feature too. This is something Patoline has, that TeX doesn't. Querying online bibliographic databases in Patoline is a matter of writing a few lines of OCaml. In TeX, it means writing Pascal code, for a variant of Pascal that can talk to the OS (web2c probably can, I'm sure, although it was not written by a genius).

- We also seek to provide a development platform for new typesetting algorithms. While Knuth may be regarded as "the man who invented dynamic programming", he was ten years old when Bellman discovered it. Today, we have other methods, such as approximation algorithms. We could even imagine learning good typographic choices using methods from machine learning. There is space for innovation on this planet, and although I use emacs and vim, and even pdflatex on a daily basis, I do not consider them a full stop to software.

[+] Argorak|11 years ago|reply
While this is half an answer to my post, I am incredibly put off by the tone and will not take a second look at your project.

Yes, I know what a modern type system is and what merits it has. I also know that TeX has its own fixed point math. But I am not convinced by your product either and this is certainly not helping. I do also now know that one of the authors is incredibly snobbish about his product, which I have a strong aversion against. And this is where I'll stop.

[+] gone35|11 years ago|reply
To go along with the Gospel of Matthew references [1], "No one can serve two masters" (Matt 6:24).

This is a lesson your arguably most relevant predecessor, the NTS project from the early 2000s [3,4], learned too well --they too found TeX82 hopelessly quaint and monolithic, frozen in time for the sake of backward compatibility [2]:

"The stated aims of the project are very simple: to continue the tradition of Donald Knuth's TeX by providing first-class typesetting software which is both portable and available free of charge. But whereas TeX is now frozen (Knuth no longer has either the time or the inclination to extend it), NTS is intended to remain flexible and extensible. Indeed, its very raison d'être is to provide a portable platform on which experiments and extensions can be easily layered."

And how could anyone find fault in their reasoning? It felt right after all: in the dawn of the 21st century, with modern programming paradigms and all that computing power, why get stuck with Pascal and outdated space concerns? And so they did [2]:

"NTS is written in Java; the group debated for some time which language was the ideal language for a complete re-implementation, and although the original desiderata stressed that it should be a modern, rapid-prototyping language, further introspection suggested that a modern, object-oriented, truly portable language was even more important. On that basis, Sun's Java was chosen, and experience during the first year has suggested that that decision was justified. Even though Java lacks something in terms of type declarations and static polymorphism, its genuinely portable nature ("compile once, run anywhere" is Sun's justifiable claim for this language), combined with its network-awareness and widespread availability, make it an ideal language for the task."

Suffice to say their efforts didn't pan out as they hoped. It turns out, efficiency is a thing --even with modern hardware [5,6]-- and, as it happens, high extensibility/modularity and efficiency are very, very hard to achieve in the general case.

[1] https://news.ycombinator.com/item?id=8026671

[2] http://nts.tug.org/

[3] http://en.wikipedia.org/wiki/New_Typesetting_System

[4] Or their immediate successors, the ExTeX project - http://www.extex.org/

[5] http://en.wikipedia.org/wiki/Wirth%27s_law

[6] http://en.wikipedia.org/wiki/Induced_demand

[+] deskamess|11 years ago|reply
I am still not clear what the output format is - is it a set of html pages that can be hosted on a server? A compiler produces a binary and the binary generates document(s)?

I am looking for an authoring tool where I can create the text of a tutorial but have Bret Victor like sections of interactiveness that illustrate the details of the text. The end format would be a set of web pages. I could code it all up with html+javascript+backend but I am wondering if there is an existing tool to do all this (some sort of task specific editor).

[+] mangecoeur|11 years ago|reply
It's about time someone gave this a bash - though to some extent this covers the same ground as Pandoc.

However, some of the technological choices seem... unwise if the aim is a lively community project. Version control for instance - whatever technical advantages Darcs may have are heavily outweighed by its much smaller user base (having to learn a new VCS just to contribute to a project is a massive pain!)

[+] jeffreyrogers|11 years ago|reply
What are the advantages of this over TeX/LaTeX? I already know LaTeX quite well as will most people who are looking for a typesetting system.
[+] a-nikolaev|11 years ago|reply
It seems Patoline's syntax resembles LaTeX syntax a lot, but it lets you embed code more easily. And you can use OCaml or some subset of it instead of TeX's \if, \loop, etc [1], which is probably a good thing.

   \begin{genumerate}(AlphaLower, fun s -> [tT (s^". ")])
     \item First item
     \item Second item
   \end{genumerate}
Which produces:

   a. First item
   b. Second item
[1] http://en.wikibooks.org/wiki/LaTeX/Plain_TeX#Conditionals
[+] adrusi|11 years ago|reply
I find it ironic that a website for a typesetting tool requires me to zoom in to be able to comfortably read it.

It looks like it could be an improvement over TeX. I like that modularity is a focus, because that's where TeX really suffers. Hopefully the typographical elements will actually be composable and extendable rather than having the loose facade of such functionality found in TeX.