Show HN: PDF API – Generate, convert, and modify PDF documents

204 points| arkgil | 4 years ago

Hi HN,

Arek here. We’re super excited to officially launch PSPDFKit API [1].

PSPDFKit API is a collection of HTTP APIs that enable you to convert, generate, and edit documents without running any service on your infrastructure.

What differentiates our API from others is that you can chain together multiple “actions” as part of a single API request. For example, you can convert, OCR, watermark, edit, and flatten a document — all in one call.
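To make the chaining idea concrete, here is a minimal sketch of how one request body could describe several actions in order. The field names (`input`, `actions`, `type`) and the action options are hypothetical, chosen only for illustration; the real request schema is whatever the API documentation defines.

```python
import json


def build_chained_request(source_file, actions):
    """Describe one request that applies `actions` to `source_file` in order.

    The payload layout is a hypothetical illustration, not the real schema.
    """
    if not actions:
        raise ValueError("at least one action is required")
    return {
        "input": source_file,
        # Order matters: each action runs on the result of the previous one.
        "actions": [{"type": name, **options} for name, options in actions],
    }


payload = build_chained_request(
    "contract.docx",
    [
        ("convert", {"to": "pdf"}),
        ("ocr", {"language": "english"}),
        ("watermark", {"text": "DRAFT"}),
        ("flatten", {}),
    ],
)

# A single serialized body stands in for what would otherwise be four calls.
request_body = json.dumps(payload)
```

The point of the design is that the intermediate documents never leave the service, so there is one upload, one download, and one place where the call can fail.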

Available actions [2]:

- PDF Generator

- PDF Converter

- Image Converter

- OCR

- Watermark

- Merge

- Split

- Duplicate

- Delete

- Flatten

Our documentation includes sample code for JavaScript [3], Python [4], Java [5], C# [6], PHP [7], and the command line. We also have a Postman collection [8].

Let us know what you think or if you have any questions.

[1] https://pspdfkit.com/api/

[2] https://pspdfkit.com/api/documentation/tools-and-api/

[3] https://pspdfkit.com/api/tools/javascript/

[4] https://pspdfkit.com/api/tools/python/

[5] https://pspdfkit.com/api/tools/java/

[6] https://pspdfkit.com/api/tools/csharp/

[7] https://pspdfkit.com/api/tools/php/

[8] https://pspdfkit.com/api/documentation/getting-started/postm...

116 comments

lgvld|4 years ago

Very useful product, congratulations. ;-)

Quite expensive though. When I use an API, I usually assume (1) there will be some significant base volume, and (2) this volume has no upper bound, depending on my users' behavior. For ~750 € you can only process 1k documents during the month... hard-capped? The pricing schemes seem to target enterprises, but enterprises usually have bigger volumes than that. (But maybe I'm confusing API calls and document processing with your product?)

But it’s nice that you released SDKs in several common languages.

Good luck!

TheJoeMan|4 years ago

Wow, so if I’m reading this right, they charge about $1 USD per “document”?

I’ve been looking for an easy OCR solution, considering I have about 10,000 one-page documents a month (invoices). For comparison, Amazon Textract is ~$0.05/pg for key-value pairs, but it involves more programming to set up.
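As a rough worked comparison using only the approximate figures in this comment (about $1 per document versus about $0.05 per page, 10,000 one-page documents a month), the arithmetic can be sketched as follows. Prices are kept in integer cents to avoid floating-point noise; neither figure is an official quote.

```python
def monthly_cost_cents(units, price_per_unit_cents):
    """Total monthly cost in cents for a flat per-unit price."""
    return units * price_per_unit_cents


docs_per_month = 10_000
pages_per_doc = 1  # the invoices are described as one-page documents

# ~$1 per document (the commenter's reading of the pricing page).
per_document_cents = monthly_cost_cents(docs_per_month, 100)

# ~$0.05 per page (the commenter's figure for key-value extraction).
per_page_cents = monthly_cost_cents(docs_per_month * pages_per_doc, 5)

print(f"per-document pricing: ${per_document_cents / 100:,.2f} per month")
print(f"per-page pricing:     ${per_page_cents / 100:,.2f} per month")
```

With these inputs the per-document model comes out twenty times more expensive, and the gap narrows only as the page count per document grows.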

fullyforged|4 years ago

Thanks! This is Claudio, PSPDFKit's CTO.

At this point in time the price is per generated document, irrespective of how complicated the operation is.

Because you can combine operations in one HTTP call, you're incentivised to do that as opposed to performing separate calls, which increases the possibility of errors and the cost for all sides.

Happily taking feedback though - your comment around hard-cap is definitely sound, for example.

victop|4 years ago

For higher volume, but simpler operations (merge, watermark, encrypt/decrypt, etc.) you can try https://www.pdfblocks.com/api. You get 10K docs processed for $29/mo, and 1M for $99/mo. We don't have conversion, OCR, generation, or chained operations though.

Source: I work at PDF Blocks

gyulai|4 years ago

Not a good use case for an online API. To the extent that those PDFs could include sensitive information, there's a huge security/privacy headache there, with no real benefit when compared to performing these functions offline. It also seems to me a lot more expensive than alternative ways of doing the same thing.

chadash|4 years ago

> with no real benefit when compared to performing these functions offline

Most organizations these days are developing in the cloud. I assume you don't mean "offline" but rather performing these functions yourself.

I've tried, but it's a huge pain in the butt. PDFs are very quirky. Things work well 95% of the time and the 5% takes a lot of time to figure out.

When trying to do this myself in an app deployed to AWS, I've had many issues with getting all characters in different languages to work. Every few months, some new thing in a PDF file throws an error and the file won't generate. You get weird file size errors. And the quality of PDF generation varies a lot by language. I'd much rather have an API that I can just call from any of my code if it JUST WORKS.

Now, their pricing is strange and might be a dealbreaker for me. I'd like to see an option to pay per transaction with no cap without having to negotiate with their sales team.

> there's a huge security/privacy headache there

Eh, maybe. Some use cases don't require privacy. In my case, I'm mostly assembling PDFs from various sources with my company's documents. No, I don't want a vendor that is going to post my documents to Twitter, but I can sleep at night if I have some kind of assurance that they don't use or sell my data.

simion314|4 years ago

Would work if you want to publish the pdf anyway.

chrisseaton|4 years ago

The vast majority of organisations already store all their working documents and data in the cloud.

matchagaucho|4 years ago

FUD. Most companies are using Box, Dropbox, or some variant of cloud-based document storage today. Extending storage services with document transformations and conversions is a logical evolution.

yyyk|4 years ago

A few years ago, $COMPANY had similar needs for a client. I ended up creating an in-house solution, which has a surprisingly close API (well, there are only so many ways to do this).

So I asked myself: if this kit existed at the time, would we have used it? I don't think so. For all specified pricing plans, the document limit is way too low for what $COMPANY or its clients do. Judging by the progression of the costs, we'd have gone in-house instead of negotiating an Enterprise plan.

If you don't want to adjust pricing, perhaps you could add a consumption-based plan. The plan could have a much larger limit, but the client also pays per API call.

m12k|4 years ago

I recently discovered that search-replacing text in a PDF without changing the layout is much harder than I thought it would be (a customer forgot to change their billing address, and now that the invoice is finalized, Stripe won't let me edit anything, so down the PDF-editing rabbit hole I went). I would love it if I could just use an API for this.

zubspace|4 years ago

There are so many ways to lay out text on a PDF page that this is nearly impossible to implement for all scenarios. I don't know of a PDF editor which works in all cases.

Sometimes text is positioned absolutely relative to the page border, sometimes relative to other elements, where moving a word shifts all following elements around. There can be multiple matrices involved in positioning text elements. Sometimes text elements are all positioned independently, sometimes by using newlines with a custom size. Text elements can span multiple lines or words, but sometimes each letter is a single text element, where it is hard even to determine which letters go together or whether there's meant to be a space. Additionally, fonts can be subsetted, in which case it's impossible to use letters outside the subset without the original font. And then there can be OCR'ed PDFs, where an image of scanned text is overlaid on top of the real text. Oh, and there can be clipping paths: rectangles which erase all text below.
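To illustrate the "multiple matrices" point, here is a simplified sketch of how a text matrix and the current transformation matrix combine to give the final position of a piece of text. It uses PDF's six-number matrix notation `[a b c d e f]` with row vectors, and deliberately ignores font size, horizontal scaling, and text rise, which a real renderer also folds in. The example values are made up.

```python
def multiply(m1, m2):
    """Return m1 x m2 for PDF matrices (a, b, c, d, e, f): apply m1, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(matrix, x, y):
    """Transform the point (x, y) with a PDF matrix."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Made-up example: the text matrix moves the text origin to (100, 700),
# and the surrounding graphics state scales everything by a factor of 2.
text_matrix = (1, 0, 0, 1, 100, 700)
ctm = (2, 0, 0, 2, 0, 0)

combined = multiply(text_matrix, ctm)
origin_on_page = apply(combined, 0, 0)
```

An editor that wants to move or replace a word has to recover this chain for every text element, which is part of why in-place edits are fragile.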

And each PDF-Producer creates a different PDF structure.

For reading, PDFs are awesome. For editing, PDFs are a nightmare.

laurent123456|4 years ago

If it's just a one-off, I'd draw a white rectangle over the text that needs to be changed, then add the text on top of that.

danielrhodes|4 years ago

This isn't easy because PDF uses a PostScript-style imaging model, so text is laid out absolutely. You can make very small changes, but a larger change requiring a reflow of the text would break things. In some cases it is possible to convert the PDF to a Word document, make edits, and then save it back to a PDF.

martin_a|4 years ago

You only need Acrobat Pro for that.* That's daily business for me, although not with invoices but with printing data.

* (It becomes harder when the font is not embedded or only exists as a subset, but Acrobat lets you choose another font, so no big deal.)

citruscomputing|4 years ago

What I used for this exact problem was pdftk's `stamp` option, with a stamp pdf that was just a white rectangle with text on it, as a sibling commenter mentioned. Worked for several hundred documents!
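For anyone wanting to script this across many files, a minimal sketch of driving pdftk's `stamp` operation from Python might look like the following. The file names are placeholders, and the stamp PDF (the white rectangle with the corrected text) is assumed to have been prepared separately. `stamp` overlays the single stamp page on every page of the input.

```python
import shlex
import subprocess


def pdftk_stamp_command(input_pdf, stamp_pdf, output_pdf):
    """Build the argument list for `pdftk <in> stamp <stamp> output <out>`."""
    return ["pdftk", input_pdf, "stamp", stamp_pdf, "output", output_pdf]


def stamp(input_pdf, stamp_pdf, output_pdf):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(pdftk_stamp_command(input_pdf, stamp_pdf, output_pdf), check=True)


# Placeholder file names, shown as the shell command that would be executed.
cmd = pdftk_stamp_command("invoice.pdf", "corrected-address.pdf", "invoice-fixed.pdf")
print(shlex.join(cmd))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.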

voiper1|4 years ago

I recently went down the PDF rabbit hole for a project.

I had to use different OSS tools to do everything I wanted. I was able to access three from within Node.js without touching the disk:

1) LibreOffice CLI for converting doc/docx to PDF. It handled the formatting remarkably well. WARNING: you must have the fonts on the system doing the generating or it will substitute "similar" fonts! NPM: libreoffice-convert

2) NPM pdfjs-dist from Mozilla for extracting text and finding page numbers.

3) NPM pdf-lib for manipulating PDFs: deleting pages, adding pages from other PDF files (even to the middle of a PDF).

4) pdfjam on the command line for resizing a PDF: `pdfjam --keepinfo --outfile "${path}.resized.pdf" --paper letterpaper "${path}"`
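The first step can also be done by calling LibreOffice directly instead of going through the NPM wrapper. The sketch below is in Python rather than Node.js and does touch the disk, so it is an alternative to the setup described above, not a reproduction of it. The binary name `soffice` is an assumption and varies by platform and install.

```python
import subprocess
from pathlib import Path


def libreoffice_convert_command(source, out_dir, binary="soffice"):
    """Build a headless LibreOffice command that converts `source` to PDF."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(source)]


def expected_output_path(source, out_dir):
    """LibreOffice names the result after the source file, with a .pdf suffix."""
    return Path(out_dir) / (Path(source).stem + ".pdf")


def convert_to_pdf(source, out_dir):
    """Run the conversion and return the path of the generated PDF."""
    subprocess.run(libreoffice_convert_command(source, out_dir), check=True)
    return expected_output_path(source, out_dir)


command = libreoffice_convert_command("report.docx", "out")
target = expected_output_path("report.docx", "out")
```

The font warning applies here as well: the conversion uses whatever fonts are installed on the machine running it.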

danielrhodes|4 years ago

LibreOffice only does a mediocre job of rendering Word documents. There are a number of cases where it really mangles things. An example would be some types of bulleted lists or indentation.

ianhawes|4 years ago

Just an FYI but PSPDFKit has a very predatory sales model. Our organization received pricing that was generally very high and we pushed back on it because we were a startup and it was outside of our budget.

I've connected with several other customers of PSPDFKit over the years and they almost all have much more reasonable pricing.

Beware!

ethotool|4 years ago

I experienced the same. We ended up developing our own solution and no longer have to rely on any 3rd party framework.

user_7832|4 years ago

I've always wondered (as someone who hasn't made corporate purchases), wouldn't there be a point where it would be easier to teach all employees LibreOffice (Draw) instead of buying proprietary/expensive software that might break/has external dependencies?

upbeat_general|4 years ago

Maybe I’m missing something but how is that a “predatory sales model”?

Their prices may be too high, so you declined to pay. I don’t see that as predatory.

etothepii|4 years ago

How have you avoided the AGPL headache that comes with almost all the open source libraries for PDF editing?

Have you written your own code from scratch?

arkgil|4 years ago

Our engine is based on Google's PDFium, which is Apache licensed. We use it for rendering and reading the PDF object tree. Editing, annotations, etc. are all built on top of that.

koinexpert|4 years ago

Very cool!

Consider perhaps adding one more call, one to remove passwords. Not brute force, although that would be cool too, but just one that lets you specify the password and makes the PDF no longer require a password.

lgvld|4 years ago

Cool idea, yep. ;-)

throw03172019|4 years ago

It all sounds great until you get a quote. $15,000/yr for an API to sign a PDF. Come on.

smashah|4 years ago

I was contracted to make a legal document generation service for a client. I looked at all of these tools and decided to just use HTML/CSS and then print to PDF via puppeteer on a serverless cloud function.
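A stripped-down version of the same idea, using Chromium's own headless print-to-PDF switch instead of Puppeteer, could be sketched like this. It is a substitute technique, not the setup described above: the binary name `chromium` is an assumption (it may be `chromium-browser`, `google-chrome`, or a full path on your system), and this route gives far less control over page setup than Puppeteer does.

```python
import subprocess


def chromium_print_command(html_path_or_url, output_pdf, binary="chromium"):
    """Build a headless Chromium command that prints a page to a PDF file."""
    return [binary, "--headless", f"--print-to-pdf={output_pdf}", html_path_or_url]


def html_to_pdf(html_path_or_url, output_pdf, binary="chromium"):
    """Run Chromium; raises CalledProcessError if the browser exits with an error."""
    subprocess.run(chromium_print_command(html_path_or_url, output_pdf, binary), check=True)


# Placeholder input and output names.
print_cmd = chromium_print_command("contract.html", "contract.pdf")
```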

yashg|4 years ago

Congratulations. Last year I launched a PDF generator API here on HN and got zero upvotes; you have managed to get to the front page. Wish you all the success.

zihotki|4 years ago

Some time ago at a $Company we needed to generate PDFs and also OCR incoming documents. In order to release a product quickly we decided to use an online API from a $Vendor. The initial price was quite OK-ish, but a year or two later we saw a significant increase in price. At that time we also started moving to on-premise hosting to decrease latency and to address other GDPR stuff. Considering the high volume of documents, it was just too much for us, and the $Vendor didn't want to negotiate. In addition to that, we also needed to implement reports to show the $Vendor how many items we processed for on-premise licensing.

Long story short, instead of that we spent a few two-week sprints with a two-person team and were able to successfully fulfill our needs using open source software. $Company saved hundreds of thousands per year. We also tried to influence company to donate to OSS, that unfortunately never happened, but that's another story.

So please be aware of vendor lock-in and of possible price increases. Always think of a plan B.

tonyedgecombe|4 years ago

>We also tried to influence company to donate to OSS, that unfortunately never happened, but that's another story.

I don't blame you for going down that route. But it feels to me that open source is devaluing our work. PDF is a big and complex specification, there must be thousands of hours of work in the software you chose and yet you are getting all that value for free. Is there any other industry that does this to itself?

wfn|4 years ago

Very nice and useful product!

Question: do you have, or plan to add, support for PDF signatures? This may then be useful for us[1]: we issue qualified certificates and eIDAS-compliant qualified electronic signatures, which often then need to be embedded into PDFs.

[1]: https://www.zealid.com/en/

arkgil|4 years ago

Signing is something we'd like to explore, we often hear from folks who'd want to simplify their signing workflows. Thanks for the feedback!

brudgers|4 years ago

It looks interesting, but addresses a scale that I don't work on.

For my personal needs, I use pdftk from the command line.

vmchale|4 years ago

dawg why can't you just make a normal program I don't want a REST API

fullyforged|4 years ago

What languages do you need support for?

mdellavo|4 years ago

How well does this handle large tables that span pages? That seems to be a key differentiator for most PDF libs I sampled. I'd assume this works well if it's coming from Chromium

arkgil|4 years ago

Do you mean tables when converting HTML to PDF, or simply rendering the PDFs with tables in them?

SilentM68|4 years ago

Does this API allow for the generation of accessible documents, such as PDFs, which can then be read by blind persons using a screen reader such as JAWS or NVDA? These tools have the ability to bring up a dialog box (e.g. an elements list) listing the links, headings, form fields, buttons and landmarks present in documents (e.g. HTML, PDF, and so on) that blind people need in order to navigate a document.

fullyforged|4 years ago

We’ve done some tests in that area and while Chromium is technically able to generate tagged PDFs, which would be accessible for the most part, it’s far from perfect.

We have some work planned in that direction, but nothing close to release at this stage.

steerablesafe|4 years ago

From the pricing page, limits seem to be on number of documents, not number of pages. Is the number of pages per document also limited?

fullyforged|4 years ago

No, just number of created documents.

somehowadev|4 years ago

The pricing may need revisiting! Enterprises would probably be the keenest customers, but they could also find better alternatives for the cost.

kvz|4 years ago

> What differentiates our API from others is that you can chain together multiple “actions” as part of a single API request.

https://transloadit.com offers similar composable workflows in a single request, and supports more file types besides PDFs.

Disclosure, I am a founder :)

sideproject|4 years ago

Nice set of tools!! I recently launched a PDF-related project.

https://www.scholars.io

It's a tool for reading research papers (PDFs) together with your colleagues. You can read, annotate, comment etc.

Needless to say, it led me down quite deep into the PDF world and it.. was interesting.

wolverine876|4 years ago

Incidentally, I wonder if you can answer a question: I want my books in an electronic format that will be usable for the rest of my life or longer, and which preserves annotations.

As far as I know, PDF/A is the only format that fits the first two specs. I know annotations are in the PDF specs but is it reasonable to think that annotations I make today will be readable - and updatable - in (e.g.,) 30 years?

stevenminhhh|4 years ago

Do you have any plans in your roadmap to support different languages in the OCR feature? I'm specifically interested in recognizing and processing PDF files written in Japanese and Korean.

I am also dealing with some clients that are struggling with processing handwriting in their documents, but I guess that may be a little far-fetched.

BOOSTERHIDROGEN|4 years ago

I’m sorry if this is a stupid question. What kind of industry (one processing thousands of documents a month) or use case calls for what may be an expensive tool? If there is a use case, what is the manual process that usually happens? Thanks.

renato_casutt|4 years ago

Hi Arek...congrats on the launch.

Maybe PSPDFKit is interested in integrating Bionic Reading into their products? Take a look at the website (bionic-reading.com) to see if BR can add value to your users.

Let me know if you are interested and best regards from the Swiss Alps, Renato

jwillmer|4 years ago

We need to offline convert HTML to PDF. We created a small docker container with Chromium and Selenium and added a small HTTP API layer on top. Works like a charm and it is easy to keep it up to date.

gw67|4 years ago

Would be great to have a file upload button to test the OCR API from the UI, without having to run curl. Just to test how your API works from the UI.

martin_a|4 years ago

Which rendering engine are you using in the backend?

whylo|4 years ago

The metadata on an output PDF I tested says Skia (though I guess that could be being wrapped by another library)

kareemm|4 years ago

Is this used for creating pdfs from scratch (a docraptor alternative)?

Or filling out pdfs programmatically (a docspring alternative)?

fullyforged|4 years ago

You can create a PDF from scratch starting from HTML - see https://pspdfkit.com/api/documentation/developer-guides/pdf-.... Note that HTML generation has a few nice quality of life additions around headers/footers, logos and conversion of HTML forms to PDF forms, which are things that you don't normally get with the print to PDF workflow you would normally build from scratch.

Filling out forms is not supported, but I'll take a note. The engine can do it, but we haven't got it exposed via the API.

amluto|4 years ago

I find this utterly bizarre. Once upon a time, if you wanted to left pad a string, you would just do it. A while later, people discovered that you could use a library. (I’m joking a bit here, but libraries are genuinely useful.) With a library, you get to pick from various schemes and schedules for updating the library, but you have a degree of control.

But now apparently you’re supposed to use a web API and depend on an external service. This has all kinds of downsides: it has latency (and potentially tail latency). It has larger security issues. It doesn’t work in many sandboxes. It requires an asynchronous call. Callers have to handle timeouts and retries. (If you left pad a string with a normal library, it either works or it doesn’t. With a web service, it can fail transiently or give wrong answers transiently.) It updates on its own schedule, without notice, and cannot be rolled back. And it can charge an utterly outrageous per-call price, so instead of merely profiling and debugging slowness due to making too many calls, developers also have to worry about inadvertently spending hundreds of thousands of dollars.
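The "timeouts and retries" burden looks roughly like the sketch below: a wrapper that retries a remote call with exponential backoff. `TransientError` and `flaky_generate_pdf` are hypothetical stand-ins for whatever a real client raises and does; none of this is needed for a local library call.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)


# Stand-in for a remote call that fails twice before succeeding.
calls = []


def flaky_generate_pdf():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("service temporarily unavailable")
    return b"%PDF-1.7 ..."


delays = []  # record the backoff delays instead of actually sleeping
result = call_with_retries(flaky_generate_pdf, sleep=delays.append)
```

Even this small wrapper leaves open questions a library call never raises, such as whether a retried request is billed twice.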

Replace “left pad a string” with “generate a PDF” and you get this. Why is this desirable?

I suppose things like this may partially explain the stunning slowness of bank websites.

danielrhodes|4 years ago

This really does not resonate at all, and I have the scars to prove it.

I used to work on a browser-based document management system, and I would have used (or at least tried) all of these APIs without hesitation. PDFs are a pain, and the mishmash of poorly functioning tools that exist provides a constant headache.

1) OCR'ing of a PDF is difficult. The only good service is Google, but requires that you break it into pages as images to be performant. This would have simplified things greatly. Even if the PDF has text inside and is not an image, it can be wrong or not laid out in a linear way, so you have to OCR it. Command line tools do not get you very far. An example: if you OCR or text extract a PDF with multiple columns of text, does it handle the columns well?

2) People want searchable OCR'd PDFs where you can highlight the text, even when it's a bitmap underneath. This requires a technique where you overlay transparent text in the exact position of text in the bitmap. This does not come for free and I've only seen this done on proprietary Windows-only software. This alone would be worth it.

3) Office to PDF is an extremely standard need, especially if you want to display them online. But it's not easy. You have to hack together a headless OpenOffice to have it work at all, but it doesn't do a great job. It's difficult to do well because Office docs are like HTML pages in that it greatly depends on the renderer, not to mention the fonts. Microsoft does not offer a service to do this, unfortunately. If you think anything will do, it really won't: when people see their PDF looks very different than what they saw on Word, they get upset.

4) Table extraction APIs are super important, especially if you are trying to automatically extract data from PDFs (e.g. analyze financial disclosures). There have been whole startups dedicated to this.

5) HTML to PDF is also a pain: you have to set up an instance that is running headless Chromium, which can be quite slow. This has become the de facto standard for quickly creating complex PDFs. Having a simple API wrapper around this is just one less thing to manage.

The rest of the APIs, like the merging/splitting/watermarking etc., are pretty standard and you do not need APIs if you already have access to the PDF on a server. But if you were in a browser or on mobile, you might not.
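The transparent-text technique from point 2 can be sketched at the level of PDF content-stream operators: text drawn with rendering mode 3 is neither filled nor stroked, so it is invisible but still selectable and searchable. The function below only produces the operator text for one page. It assumes the OCR word positions are already in PDF user-space coordinates, that a font resource named `F1` exists on the page, and that the text is plain ASCII; embedding the stream into a real PDF is left out.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_layer(words, font_name="F1"):
    """Build a content stream that places invisible text over a scanned image.

    `words` is an iterable of (text, x, y, font_size) tuples in PDF user space.
    `font_name` is an assumed resource name that must exist on the page.
    """
    lines = ["BT", "3 Tr"]  # begin text object; mode 3 = invisible text
    for text, x, y, size in words:
        lines.append(f"/{font_name} {size} Tf")   # select font and size
        lines.append(f"1 0 0 1 {x} {y} Tm")       # position the word
        lines.append(f"({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


# Made-up OCR output: two words with their baseline positions and sizes.
stream = invisible_text_layer([("Total", 72, 700, 11), ("(EUR)", 110, 700, 11)])
print(stream)
```

The hard part in practice is upstream of this: getting OCR boxes accurate enough that the invisible text lines up with the bitmap when a user drags a selection.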

ho_schi|4 years ago

Same thought here. Let's say you have to create an invoice for a customer and your operations stop just because you're not using {Cairo, Skia, PoDoFo, JagPDF, Haru, whatever} in the local environment but relied on an external service which halted. This introduces a huge dependency chain across the web. But they don't provide anything which cannot be provided autonomously by a local library. Integrate with external services because you must and not because you can.

newlisp|4 years ago

Node.js forces this architecture (no, worker threads are not a solution; they are heavy and have too many restrictions). You don't want to slow down the event loop with heavy PDF processing.

jfk13|4 years ago

Taking a quick look at https://pspdfkit.com/api/documentation/tools-and-api/, I'm puzzled.... what distinguishes a "PDF Generator" from a "PDF Creator" from a "PDF Writer"? How would I know which one I want?

Oh, looks like they're the exact same thing: a webpage-to-PDF service.

Then there are a whole bunch of "PDF Converter" options, including "HTML > PDF", which seems to be yet another name for the same thing.

For me, all this has a whiff of SEO spam that I find quite distasteful. Just tell me what the product does. Don't try to list it under a collection of different titles in the hope of catching more search terms, it just makes you sound like a snake-oil salesman.

arkgil|4 years ago

I'm sorry you find it distasteful - that was never our intention. What we found was that users searching for a PDF generator, creator or writer are generally looking for the same solution - to create a PDF. So by repositioning our tool we were hoping to provide a better landing page experience for users that were searching for one of those specific keywords.

Of course the downside is as you've pointed out - it can be seen as distasteful and in some cases confusing to our users. We will review this on our side and see if it makes sense to remove some of those tools to reduce confusion.
