Paperless-Ngx v2.0.0

[+] ydant|2 years ago|reply

There was a pretty big discussion about paperless-ngx a couple of months ago:

https://news.ycombinator.com/item?id=37800951 (183 comments)

I tested it out then and am considering migrating from my current system (Google Drive) to using a self-hosted approach. Paperless seems to have a good approach for minimizing the mental overhead of ingesting and categorizing new documents - which is what ultimately leads me to stacking documents up for months before processing them. My initial pilot run was promising, but I haven't gotten around to switching yet.

From the changelog, it's not really clear to me what's notable about this release, especially as a new/potential user.

This page is a better introduction to the product, although it doesn't mention the v2 release yet:

https://docs.paperless-ngx.com/

[+] andrew_eu|2 years ago|reply

I've been using Paperless for several years now very happily and can recommend it over my previous system, also Google Drive. During the transition I found it helpful to set up a cron which (A) made an export of Paperless and (B) uploaded that export to a Google Drive folder.
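
A rough sketch of such an export-and-upload job, in Python so that cron can call it. It assumes the Docker Compose install, where `document_exporter` runs inside the `webserver` service, and it assumes rclone is already configured with a Google Drive remote. The remote name `gdrive:paperless-export` and both paths are placeholders, not details of the setup described above.

```python
import subprocess


def build_backup_commands(export_dir: str, remote: str) -> list[list[str]]:
    """Return the two commands: (A) export Paperless, (B) upload the export."""
    # (A) Run the exporter inside the webserver container. "../export" is the
    # in-container export path used in the Docker Compose setup.
    export_cmd = [
        "docker", "compose", "exec", "-T", "webserver",
        "document_exporter", "../export",
    ]
    # (B) Mirror the host-side export folder to the remote (placeholder name).
    upload_cmd = ["rclone", "sync", export_dir, remote]
    return [export_cmd, upload_cmd]


def run_backup(compose_dir: str, export_dir: str, remote: str) -> None:
    """Run both steps in order; stop at the first failure."""
    for cmd in build_backup_commands(export_dir, remote):
        # cwd matters for "docker compose", which looks for the compose file there.
        subprocess.run(cmd, cwd=compose_dir, check=True)
```

Calling `run_backup("/srv/paperless", "/srv/paperless/export", "gdrive:paperless-export")` from a nightly cron entry would cover both (A) and (B); `check=True` makes a failed export abort before anything is uploaded.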

One feature which seems to be quite a nice improvement (speculating as I haven't upgraded yet) is consumption templates [0]. My workflow involves an ADF scanner with an Android application, sharing the scanned PDF with Paperless Share [1] and then it's uploaded to the server via API. It seems that consumption templates will enable adjusting tags/sharing settings/permissions of a document at ingestion time based on where it's ingested from.

[0] https://github.com/paperless-ngx/paperless-ngx/pull/4196

[1] https://github.com/qcasey/paperless_share
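
For anyone curious what the "uploaded to the server via API" step involves, here is a minimal sketch using only the Python standard library. It targets the documented `POST /api/documents/post_document/` endpoint with token authentication; the base URL, token, and filename are placeholders, and this is not how Paperless Share itself is implemented.

```python
import urllib.request
import uuid


def build_upload_request(
    base_url: str, token: str, filename: str, pdf_bytes: bytes
) -> urllib.request.Request:
    """Build a multipart/form-data request that uploads one PDF."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # The file goes in the form field named "document".
        (
            'Content-Disposition: form-data; name="document"; '
            f'filename="{filename}"\r\n'
        ).encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/documents/post_document/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen(...)` sends the upload; the server then queues the document for consumption like any other ingested file.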

[+] ydant|2 years ago|reply

One feature I had been looking for, which isn't mentioned in this release's notes, was actually added in RC1 for 2.0.0:

https://github.com/paperless-ngx/paperless-ngx/releases/tag/...

    * Feature: Implement custom fields for documents @stumpylog (#4502)

[+] jdoss|2 years ago|reply

If anyone is looking to kick the tires of Paperless NGX quickly, check out my little pet project [1] for running it with Podman. I use it every week to scan papers from my Brother ADS2800w which will SFTP the PDFs into a directory for Paperless NGX to consume.

I just updated my install to v2.0.0 with a simple podman pull and a systemctl restart of my paperless pod and everything looks great. Hats off to the contributors of the project. Every update, even major ones like this have been really smooth.

1: https://github.com/jdoss/ppngx

[+] cwiggs|2 years ago|reply

How do you like this setup?

I've been thinking of moving from docker-compose to podman, specifically using podman-play-kube (https://docs.podman.io/en/v4.2/markdown/podman-play-kube.1.h...), but haven't gotten around to it.

I think Podman has a lot to offer for self-hosters, but it isn't popular (yet?)

[+] edward|2 years ago|reply

I love paperless-ngx but I wish it had a rotate button. Some of my document scans are upside down.

[+] diarrhea|2 years ago|reply

I don't think I'd be comfortable with it having elaborate editing functionality. PDF editing in a browser is finicky, and an enormous bug fest.

I do PDF editing offline, on the desktop, then re-upload to paperless. Not the most integrated flow, but much more bulletproof. I want the PDFs themselves to be immutable once on paperless. Only metadata should be editable.

[+] prometheanfire|2 years ago|reply

There is an issue about this, basically it's not going to happen because it is editing functionality. They suggest using another solution before import (build a pipeline).

[+] ndsipa_pomu|2 years ago|reply

It does have rotate clockwise/anticlockwise

[+] el_sinchi|2 years ago|reply

You can use an open-source tool for scanning, like NAPS2, which will let you rotate before you mail it to paperless-ngx.

[+] CommanderData|2 years ago|reply

I wish paperless-ngx included native advertising to printers for the "Sent to PC" feature.

Last I checked it doesn't, and I had to run a separate service to advertise the paperless endpoint to the printer.

[+] throwaway69123|2 years ago|reply

What service do you run for this?

[+] matrss|2 years ago|reply

I haven't been using it too much yet but I am really impressed by paperless-ngx so far. It just works(TM) and the auto-tagging functionality is surprisingly good, even with just a few documents in it.

Does anyone have a good scanner recommendation though? I am eyeing the Brother ADS-1700W since it seems to be recommended often, but I would really like to use the "scan to webhook" feature (it's 2023 after all) instead of SMTP or whatever else are the options I would have with the Brother.

[+] draugadrotten|2 years ago|reply

Recommendation: https://www.quickscanapp.com/

I am using an iPhone as a scanner and it automatically scans, OCRs, uploads and ingests to the paperless-ngx instance, even remotely using tailscale.

The iPhone camera is more than good enough for scanning documents.

[+] pintxo|2 years ago|reply

I am scanning from my Brother multi-function device to an SMB share, which paperless monitors for changes. Works like a charm. You can even bulk move files there using your local file manager.
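
The bulk move can also be scripted. A small sketch, assuming the scans land in some staging folder first and that the monitored share is mounted locally; both directory names are hypothetical.

```python
import shutil
from pathlib import Path


def move_scans(inbox: Path, consume_dir: Path) -> list[Path]:
    """Move every PDF from inbox into the folder Paperless monitors."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(inbox.glob("*.pdf")):
        target = consume_dir / pdf.name
        if target.exists():
            # Never overwrite a file that is still waiting to be consumed.
            continue
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

For example, `move_scans(Path("/srv/scans/inbox"), Path("/srv/paperless/consume"))` hands over everything in one go and leaves non-PDF files where they are.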

[+] tecleandor|2 years ago|reply

I'll start with Paperless NGX soon, and after looking around for lots of document scanners with autofeed (that are quite expensive) I found that in my office they were getting rid of a big multifunction HP printer that was sitting unused since COVID and remote work, and I got that for free.

I'll clean all the rollers and stuff next week and test it :P

[+] andrew_eu|2 years ago|reply

I've had great luck with an Epson Workforce scanner. Originally I got it to scan ~10k family photos -- took about 1 hour and entirely smooth.

In that case I scanned to a USB drive attached to the scanner (since each photo was a separate file). For Paperless I use the Epson Smart app, scan the document with whatever settings, remove/rotate pages as needed, and then share it to Paperless with Paperless Share [0].

Many network attached scanners can scan to SMB, no device needed, but I kind of like the human-in-the-loop aspect. Since my Paperless server runs on an HDD next to the scanner I can actually hear once the file lands which is quite satisfying.

[0] https://github.com/qcasey/paperless_share

[+] daveguy|2 years ago|reply

Paperless-ngx + ScanSnap iX1600. Works with a samba share that is very easy to set up in Linux these days. Fast, easy, and you can have different scan profiles to set the destination folder. Push a button for the type and a button to scan. Paperless-ngx automatically files and tags reliably. It is saving me hours per week in filing. Can't recommend it enough. This is a personal system -- not sure how it would scale to 100k - 1M+.

[+] WXLCKNO|2 years ago|reply

Almost 600 Canadian for that scanner. Is it mainly that it's incredibly fast and can go through a stack of pages?

[+] xattt|2 years ago|reply

I’ve got an ix500 and I’m suffering from the lack of SMB support.

The only thing that comes to mind is to either do a convoluted SnapScan Online -> Google Drive -> rclone -> Paperless or bite the bullet and figure out how to directly scan into the local box via USB.

[+] somehnguy|2 years ago|reply

Paperless is one of my favorite pieces of software. A few years ago I got fed up with my filing cabinet full of folders & tons of documents that didn't quite fit into any of the categories.

I installed Paperless on my home server & spent a night digitizing everything. After being comfortable with it for a few months I went back & shredded all my paper copies. Today my process is similar - when I get a document I would normally toss in that filing cabinet I just scan, upload to Paperless, and shred it. It's also really nice for storing large purchase receipts - I've previously had the writing on thermal paper receipts go invisible after a period of time, no longer an issue.

Searching for something specific is so easy now! Huge QOL improvement. Just make sure you have a solid backup strategy, losing my Paperless database & filestore would be devastating.

[+] itslennysfault|2 years ago|reply

Just curiosity... What does "ngx" mean in this context?

To me it means Angular (the web framework). So, I was surprised to learn this wasn't an Angular plugin. Angular is often referred to as ng for short and as such their plugins tend to have ngx as a prefix. For example, the angular wrapper for ChartJS is ngx-chartjs.

[+] georgehotelling|2 years ago|reply

Paperless started as "paperless" but the dev stopped work so another dev forked it to "paperless-ng" (for "next generation" I think). That dev, too, stopped work, so "paperless-ngx" was created.

The paperless-ngx core team focused on gathering a group of people to support it to avoid any burnout problems and keep the project sustainable.

[+] ydant|2 years ago|reply

I don't know if it has a specific meaning. There have been multiple forks:

paperless (https://github.com/the-paperless-project/paperless) -> paperless-ng (https://github.com/jonaswinkler/paperless-ng/) -> paperless-ngx (https://github.com/paperless-ngx/paperless-ngx/)

[+] __jonas|2 years ago|reply

As others said I'm not sure if the name relates to Angular but it's worth saying that the frontend is in fact Angular

https://github.com/jonaswinkler/paperless-ng/tree/master/src...

[+] jdoss|2 years ago|reply

Paperless was a project and then it died, so it got forked to Paperless NG (Next Generation). Paperless NG died off and it got forked again to Paperless NGX.

At least that is my understanding following the Paperless project over the years.

[+] lhl|2 years ago|reply

I set up paperless-ngx w/ a scanner attached to my nas and a bit of scripting to get the scan button working a while back, but then forgot about it.

For me, as someone who wants my docs on my own server, but well, doesn't care enough to want to constantly keep up with forks/changes/migration/updates, I've been looking for just something stable I can use for years (or maybe decades? E.g. part of the appeal of something like Obsidian is that it just falls back to .md text files).

Curious if there are any long-term active users of this (or other systems) for handling all their paper and what they think about maintainability/longevity?

[+] nitsua2|2 years ago|reply

I had the same concern as you when I started, and after roughly two years of use I’ve been impressed with how minimal the maintenance overhead has been.

So far I’ve probably updated the software ~5 times across various releases. Each time I’ve updated, it has been because there was a new feature I wanted rather than needing to pull in fixes (the software has been bug free for me). The update process is well documented and very straightforward if you are using their docker compose setup to run the application.

[+] Wool2662|2 years ago|reply

I have been using paperless for years now. There was one issue a while back when the original maintainer stopped and they had to fork it. But otherwise it's super stable. They keep to semver religiously and all your documents are neatly organised in original format on disk if you ever need them.

[+] sigwinch28|2 years ago|reply

I am in the process of getting this running on a Kubernetes cluster in my home. That’s where I throw all self-hosted containerised applications these days. But there’s a bit of friction.

Their entrypoint script makes a lot of assumptions and in their docker-compose example they use a single container running supervisord instead of multiple containers, each with a dedicated purpose (ingestion, consuming, web server). The setup is almost insistent on logging to a file instead of stdout. It also checks and tries to modify permissions of some folders(!!). This requires quite a bit of unpicking.

Getting it to follow what I consider “best practices” is doable but not frictionless, though I understand that it’s probably a mix of “easy for someone whose day job is not to be an infrastructure engineer” and “we were using supervisord for baremetal anyway”. Maybe a lot of it is personal preference but I do feel like the project is not taking containerisation fully to heart. Maybe being more user-friendly in their eyes is more important than being a containerisation purist.

Either way, I’ve got it nearly working with my Brother ADS-1700W, which has shortcuts for me, my wife, and “joint”, which uploads documents to different directories via SFTP which then automatically have their paperless-ngx owner set appropriately.
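
To illustrate the idea of deriving the owner from the upload directory: the sketch below is a plain-Python illustration of path-based matching, not Paperless-ngx's own implementation, and the folder and owner names are hypothetical stand-ins for the three shortcuts.

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical rules: glob over the path relative to the consume folder -> owner.
OWNER_RULES = [
    ("me/*", "me"),
    ("wife/*", "wife"),
    ("joint/*", "joint"),
]


def owner_for(relative_path: str) -> Optional[str]:
    """Return the owner for the first rule that matches, or None."""
    for pattern, owner in OWNER_RULES:
        if fnmatch(relative_path, pattern):
            return owner
    return None
```

A file uploaded as `joint/bill.pdf` would map to the `joint` owner, while anything outside the three folders gets no owner and would fall back to whatever the default is.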

[+] ornornor|2 years ago|reply

I finally switched from my ancient Mayan EDMS running an outdated version on an Ubuntu 16.04 VM that I couldn’t upgrade because the Mayan docs for that version are not available anymore. I’m not a huge user but I shred everything I can and have around 1000 documents.

I have zero regrets so far. Paperless ngx is so much more user friendly; the automatic date extraction from OCR, the auto tagging and document type classification, and the ease of backup and restore sold me. I highly recommend it.

[+] justsomehnguy|2 years ago|reply

> running an outdated version on an Ubuntu 16.04 VM that I couldn’t upgrade because the Mayan docs for that version are not available anymore

For years I was eyeing Mayan as one of the variants I could use. Not anymore.

[+] rmu09|2 years ago|reply

I recently migrated from another (more "enterprisey") open-source EDMS that shall remain unnamed to paperless-ngx. Can't praise this highly enough. Where the other system needed multiple clicks for the easiest things and had a bunch of UI antifeatures, paperless has a very intuitive and well thought-out UI and handles ~30k documents without issues.

[+] tobi1449|2 years ago|reply

Has any paperless user found a good way to "deskew" scanned pages? Sometimes, when scanning from my Brother printer through the ADF, the pages are skewed/rotated and it can be pretty jarring.

[+] tyingq|2 years ago|reply

There's this:

https://github.com/the-paperless-project/paperless/issues/20

I don't know if it made its way into this fork.

[+] KennyBlanken|2 years ago|reply

Deskew is on by default unless you disabled it?

[+] cgeier|2 years ago|reply

I'd love for this to be able to use something like S3 as a backend and to support (tax) audit-proof archiving.

[+] trallnag|2 years ago|reply

There are various FUSE-based file systems that use S3 under the hood.

72 comments