
(no title)

adelpozo | 1 year ago

It does not have any dependency on a PDF parsing library, correct? That's a cool way to learn the file format and be able to work around weird PDF files. But what was the motivation for not using a library to do the PDF parsing work? Is it the case that none is available? Nice work!


desgeeko|1 year ago

Correct, PDFSyntax implements everything at the lowest level. You can ignore the HTML visualization and use it as an API to access PDF objects. Why? Because I started a very small tool as a weekend project and got hooked reading the PDF Specification, so it is becoming a general-purpose PDF library for Python. I am not familiar with other libraries, but I have the impression that mine implements things that are often overlooked in others, like incremental updates.
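
For readers unfamiliar with incremental updates: a PDF can be modified by appending new objects, a new cross-reference section, and a new trailer after the previous %%EOF, so one file can hold several revisions. The sketch below is not PDFSyntax's API. The function names and the toy byte strings are made up for illustration, and it assumes %%EOF never occurs inside a stream, which a real parser cannot assume.

```python
import re

def split_revisions(data: bytes) -> list[bytes]:
    """Return one cumulative byte prefix per revision, each ending at an %%EOF marker.

    Simplification (assumption): every %%EOF is treated as the end of a revision.
    """
    ends = [m.end() for m in re.finditer(rb"%%EOF[ \t]*(?:\r\n|\r|\n)?", data)]
    return [data[:end] for end in ends]

def last_startxref(revision: bytes) -> int:
    """Return the byte offset announced by the last 'startxref' of a revision."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", revision)
    return int(matches[-1])

# Toy data: not a valid PDF, only the markers that matter for this sketch.
body = b"%PDF-1.7\n1 0 obj\n(original)\nendobj\n"
rev1 = (body + b"xref\ntrailer\n<< >>\nstartxref\n"
        + str(len(body)).encode() + b"\n%%EOF\n")

# An incremental update appends a new version of object 1 and a new xref/trailer.
update = b"1 0 obj\n(updated)\nendobj\n"
rev2 = (rev1 + update + b"xref\ntrailer\n<< >>\nstartxref\n"
        + str(len(rev1) + len(update)).encode() + b"\n%%EOF\n")

revisions = split_revisions(rev2)
print(len(revisions))                      # number of revisions found
print(last_startxref(revisions[-1]))       # offset of the most recent xref section
```

The earlier revision stays byte-for-byte intact inside the file, which is why a reader starts from the last startxref and follows the chain backwards.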