Show HN: PDF parsing in Racket, my weekend project

101 points| gcr | 14 years ago |planet.racket-lang.org

28 comments

[+] jonathansizz|14 years ago|reply
This is excellent. Having recently discovered Racket, it has become my language of choice for personal projects. It has been a revelation to me after a lengthy false start with Haskell ultimately led nowhere.

Racket really deserves a lot more recognition and visibility: the language builds on the traditional strengths of Scheme and adds multiple compatible and innovative language dialects, first-rate documentation, extensive libraries, a great IDE and other tools, along with an active and enthusiastic community.

I really hope Racket grows in popularity and we keep seeing more stories like this, as the developers deserve praise for what they have created. And of course, HN is built on it too!

[+] dsrguru|14 years ago|reply
Racket really is an awesome language. When you want a LISP that you can optimize to get near-C performance, go with Common Lisp. When you want a LISP that does concurrency more concisely than Scala or Erlang, go with Clojure. But for literally anything else, you probably want Racket. Don't forget to add its ridiculously awesome continuation-based web server to your above description!
[+] Peaker|14 years ago|reply
What kind of difficulties did you encounter when you tried Haskell?
[+] prezjordan|14 years ago|reply
As a sort of beginner to functional programming, could you explain to me why you chose a functional language to do this sort of thing? Just for fun? I love it but I can't see the benefits of functional programming.
[+] gcr|14 years ago|reply
Believe it or not, this library grew out of a direct need of mine. :)

One thing that I've been using Racket for is to make research posters for conferences. Racket has an excellent library for functional picture/slideshow composition; you can read about that here (which doubles as a great intro to Racket in general): http://docs.racket-lang.org/quick/index.html

It's sort of like a "LaTeX for pictures"; where you can say

  (vc-append (square 10) (circle 10))
to have a 10px square sitting on top of a 10px circle (vertical, centered). Once you build your poster this way, you can save it as a PDF. This is great for having perfectly aligned blocks of text sitting in perfectly spaced columns, for example. It's much better than fiddling with the layout manually in PowerPoint.
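
For anyone who wants to try the snippet above, here is a minimal runnable sketch. It assumes the `pict` library that ships with current Racket distributions, and it assumes `square` is a user-defined helper (the Quick introduction defines it this way) rather than something the library provides. The name `stacked` is only for illustration.

```racket
#lang racket
(require pict)

;; Assumption: `square` is not part of the pict library, so define it
;; as a filled rectangle with equal sides.
(define (square n)
  (filled-rectangle n n))

;; vc-append stacks its arguments vertically and centers them horizontally.
(define stacked (vc-append (square 10) (circle 10)))

;; The combined pict is as wide as its widest part and as tall as the
;; heights of both parts added together.
(pict-width stacked)
(pict-height stacked)
```

Evaluating `stacked` in DrRacket's interactions window should display the picture itself.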

However, in designing my poster, I have to include PDF figures. Racket didn't include a way of rendering PDFs, so in my last poster, I had to use 600DPI bitmaps of my figures, which was slow and made the file terribly huge. This library binds to libpoppler, which is great because Racket's native pictures are Cairo-backed anyway, and Racket's FFI is top notch (once you can figure it out). Now I can use the usual functional composition to add these PDF figures to the rest of my poster.

[+] pavpanchekha|14 years ago|reply
It's not a functional programming language. It isn't Haskell or Coq. It's Racket, a derivative of Scheme, which happily supports mutable or immutable state, monads or continuations, imperative, procedural, functional, object-oriented, or logical programming.
[+] zem|14 years ago|reply
parsing is absolutely the sort of problem for which functional programming excels. if you think about it, there is no time dependence, no need to respond to external inputs, no concurrent access to mutable data, none of the things that would make pure functional programming a constraint rather than a help. all you are doing is writing one conceptual function of the form output = f(input), and that transformation is in turn made up of smaller transformations that can be written and tested independently, and then composed together to build up the solution.
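
a rough sketch of that "small transformations composed together" idea, in racket. the input format ("key=number" fields separated by ";") and the helper names `split-fields`, `parse-field` and `parse-record` are made up for illustration; none of this comes from the PDF library under discussion.

```racket
#lang racket
;; Hypothetical helpers, named for illustration only.

;; Step 1: break the raw string into "key=value" fields.
(define (split-fields str)
  (string-split str ";"))

;; Step 2: turn one "key=value" field into a (key . number) pair.
(define (parse-field field)
  (match (string-split field "=")
    [(list key value) (cons key (string->number value))]))

;; The whole parser is the composition of the two smaller steps.
(define (parse-record str)
  (map parse-field (split-fields str)))

(parse-record "a=1;b=2")
```

each step takes a value and returns a new one, so `split-fields` and `parse-field` can be tested on their own before they are combined.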

now it might still be a little harder to do this sort of thing in haskell, which is aggressively pure, simply because some algorithms are pure from the outside but have internal steps that involve mutating data for efficiency. but racket is not a pure functional language; if you need to, say, transform an array in place rather than take an array and return a new one, it will not stand in your way. the difference between racket and, say, java, is not that it enforces functional over imperative programming, but that it makes functional programming a lot easier, and fp has a lot of powerful tools in its toolbox for tackling this class of problem.
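
a small sketch of "pure from the outside, mutating on the inside", again in racket. the function name `running-totals` is hypothetical and chosen only to illustrate the point.

```racket
#lang racket
;; Pure from the caller's point of view: the input list is never modified.
;; Internally, a freshly allocated vector is updated in place.
(define (running-totals xs)
  (define v (list->vector xs))   ; private mutable copy of the input
  (for ([i (in-range 1 (vector-length v))])
    ;; Add the previous running total to the current element.
    (vector-set! v i (+ (vector-ref v i) (vector-ref v (- i 1)))))
  (vector->list v))

(running-totals '(1 2 3 4))
```

the mutation never escapes the function, so callers can still treat it as output = f(input).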

[+] exim|14 years ago|reply
This is great of course, but I'd be happier to see the actual PDF renderer with a more permissive license than GPL.
[+] gcr|14 years ago|reply
I would too, but

- To my knowledge, there are no sane C-based PDF renderers with a permissive BSD-like license

- Poppler renders to Cairo --- a _huge_ win because I don't actually have to do anything to convert Cairo surfaces to Racket's native drawing type

- I don't know / don't have the time to implement my own PDF reader/parser from scratch.