I tested it out then and am considering migrating from my current system (Google Drive) to using a self-hosted approach. Paperless seems to have a good approach for minimizing the mental overhead of ingesting and categorizing new documents - which is what ultimately leads me to stacking documents up for months before processing them. My initial pilot run was promising, but I haven't gotten around to switching yet.
From the changelog, it's not really clear to me what's notable about this release, especially as a new/potential user.
This page is a better introduction to the product, although it doesn't mention the v2 release yet:
I've been using Paperless for several years now very happily and can recommend it over my previous system, also Google Drive. During the transition I found it helpful to set up a cron which (A) made an export of Paperless and (B) uploaded that export to a Google Drive folder.
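A minimal sketch of such a job, for anyone wanting to try the same thing. It assumes a Docker Compose install whose service is called `webserver`, an export path of `../export` inside the container, and an rclone remote pointing at Google Drive; none of these names come from the comment above, so treat them as placeholders. It relies on Paperless's `document_exporter` management command and on `rclone copy`.

```python
import subprocess
from pathlib import Path

# Hypothetical names: adjust to your own install.
COMPOSE_SERVICE = "webserver"  # assumed service name in docker-compose.yml
EXPORT_TARGET = "../export"    # assumed export path inside the container


def build_backup_commands(export_dir: Path, remote: str) -> list[list[str]]:
    """Return the commands for step (A) export and step (B) upload."""
    export_cmd = [
        "docker", "compose", "exec", "-T", COMPOSE_SERVICE,
        "document_exporter", EXPORT_TARGET,
    ]
    # export_dir is the host directory that the container's export path maps to.
    upload_cmd = ["rclone", "copy", str(export_dir), remote]
    return [export_cmd, upload_cmd]


def run_backup(compose_dir: Path, export_dir: Path, remote: str) -> None:
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_backup_commands(export_dir, remote):
        subprocess.run(cmd, cwd=compose_dir, check=True)
```

Calling `run_backup(Path("/opt/paperless"), Path("/opt/paperless/export"), "gdrive:paperless-export")` from a script scheduled by cron would cover (A) and (B); the paths and the remote name are made up for illustration.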
One feature which seems to be quite a nice improvement (speculating as I haven't upgraded yet) is consumption templates [0]. My workflow involves an ADF scanner with an Android application, sharing the scanned PDF with Paperless Share [1] and then it's uploaded to the server via API. It seems that consumption templates will enable adjusting tags/sharing settings/permissions of a document at ingestion time based on where it's ingested from.
If anyone is looking to kick the tires of Paperless NGX quickly, check out my little pet project [1] for running it with Podman. I use it every week to scan papers from my Brother ADS2800w which will SFTP the PDFs into a directory for Paperless NGX to consume.
I just updated my install to v2.0.0 with a simple podman pull and a systemctl restart of my paperless pod, and everything looks great. Hats off to the contributors of the project. Every update, even a major one like this, has been really smooth.
I don't think I'd be comfortable with it having elaborate editing functionality. PDF editing in a browser is finicky, and an enormous bug fest.
I do PDF editing offline, on the desktop, then re-upload to paperless. Not the most integrated flow, but much more bulletproof. I want the PDFs themselves to be immutable once on paperless. Only metadata should be editable.
There is an issue about this. Basically, it's not going to happen because it is editing functionality. They suggest using another solution before import (i.e. building a pipeline).
I haven't been using it too much yet but I am really impressed by paperless-ngx so far. It just works(TM) and the auto-tagging functionality is surprisingly good, even with just a few documents in it.
Does anyone have a good scanner recommendation though? I am eyeing the Brother ADS-1700W since it seems to be recommended often, but I would really like to use the "scan to webhook" feature (it's 2023 after all) instead of SMTP or whatever else are the options I would have with the Brother.
I am scanning from my Brother multi-function device to an SMB share, which paperless monitors for changes. Works like a charm. You can even bulk move files there using your local file manager.
I'll start with Paperless NGX soon. After looking around at lots of document scanners with autofeed (which are quite expensive), I found that my office was getting rid of a big multifunction HP printer that had been sitting unused since COVID and remote work, and I got it for free.
I'll clean all the rollers and stuff next week and test it :P
I've had great luck with an Epson Workforce scanner. Originally I got it to scan ~10k family photos -- it took about 1 hour and went entirely smoothly.
In that case I scanned to a USB drive attached to the scanner (since each photo was a separate file). For Paperless I use the Epson Smart app, scan the document with whatever settings, remove/rotate pages as needed, and then share it to Paperless with Paperless Share [0].
Many network attached scanners can scan to SMB, no device needed, but I kind of like the human-in-the-loop aspect. Since my Paperless server runs on an HDD next to the scanner I can actually hear once the file lands which is quite satisfying.
Paperless-ngx + ScanSnap iX1600. Works with a samba share that is very easy to set up in Linux these days. Fast, easy, and you can have different scan profiles to set the destination folder. Push a button for the type and a button to scan. Paperless-ngx automatically files and tags reliably. It is saving me hours per week in filing. Can't recommend it enough. This is a personal system -- not sure how it would scale to 100k - 1M+.
I’ve got an ix500 and I’m suffering from its lack of SMB support.
The only thing that comes to mind is either do a convoluted SnapScan Online -> Google Drive -> rclone -> Paperless or bite the bullet and figure out how to directly scan into the local box via USB.
Paperless is one of my favorite pieces of software. A few years ago I got fed up with my filing cabinet full of folders & tons of documents that didn't quite fit into any of the categories.
I installed Paperless on my home server & spent a night digitizing everything. After being comfortable with it for a few months I went back & shredded all my paper copies. Today my process is similar - when I get a document I would normally toss in that filing cabinet I just scan, upload to Paperless, and shred it. It's also really nice for storing large purchase receipts - I've previously had the writing on thermal paper receipts go invisible after a period of time, no longer an issue.
Searching for something specific is so easy now! Huge QOL improvement. Just make sure you have a solid backup strategy; losing my Paperless database & filestore would be devastating.
Just out of curiosity... What does "ngx" mean in this context?
To me it means Angular (the web framework). So, I was surprised to learn this wasn't an Angular plugin. Angular is often referred to as ng for short and as such their plugins tend to have ngx as a prefix. For example, the angular wrapper for ChartJS is ngx-chartjs.
Paperless started as "paperless" but the dev stopped work so another dev forked it to "paperless-ng" (for "next generation" I think). That dev, too, stopped work, so "paperless-ngx" was created.
The paperless-ngx core team focused on gathering a group of people to support it, to avoid burnout problems and keep the project sustainable.
Paperless was a project and then it died, so it got forked to Paperless NG (Next Generation). Paperless NG died off and it got forked again to Paperless NGX.
At least that is my understanding following the Paperless project over the years.
I set up paperless-ngx w/ a scanner attached to my nas and a bit of scripting to get the scan button working a while back, but then forgot about it.
For me, as someone who wants my docs on my own server but doesn't care enough to constantly keep up with forks/changes/migrations/updates, I've been looking for something stable that I can use for years (or maybe decades?). For example, part of the appeal of something like Obsidian is that it just falls back to .md text files.
Curious if there are any long-term active users of this (or other systems) for handling all their paper and what they think about maintainability/longevity?
I had the same concern as you when I started, and after roughly two years of use I’ve been impressed with how minimal the maintenance overhead has been.
So far I’ve probably updated the software ~5 times across various releases. Each time, it has been because there was a new feature I wanted rather than because I needed to pull in fixes (the software has been bug free for me). The update process is well documented and very straightforward if you are using their docker compose setup to run the application.
I have been using paperless for years now. There was one issue a while back, when the original maintainer stopped and the project had to be forked. But otherwise it's super stable. They keep to semver religiously, and all your documents are neatly organised in their original format on disk if you ever need them.
I am in the process of getting this running on a Kubernetes cluster in my home. That’s where I throw all self-hosted containerised applications these days. But there’s a bit of friction.
Their entrypoint script makes a lot of assumptions and in their docker-compose example they use a single container running supervisord instead of multiple containers, each with a dedicated purpose (ingestion, consuming, web server). The setup is almost insistent on logging to a file instead of stdout. It also checks and tries to modify permissions of some folders(!!). This requires quite a bit of unpicking.
Getting it to follow what I consider “best practices” is doable but not frictionless, though I understand that it’s probably a mix of “easy for someone whose day job is not to be an infrastructure engineer” and “we were using supervisord for baremetal anyway”. Maybe a lot of it is personal preference, but I do feel like the project is not taking containerisation fully to heart. Maybe being more user-friendly is more important in their eyes than being a containerisation purist.
Either way, I’ve got it nearly working with my Brother ADS-1700W, which has shortcuts for me, my wife, and “joint”. Each shortcut uploads documents to a different directory via SFTP, and the documents then automatically have their paperless-ngx owner set appropriately.
I finally switched from my ancient Mayan EDMS running an outdated version on an Ubuntu 16.04 VM that I couldn’t upgrade because the Mayan docs for that version are not available anymore. I’m not a huge user but I shred everything I can and have around 1000 documents.
I have zero regrets so far. Paperless ngx is so much more user friendly: the automatic date extraction from OCR, the auto tagging and document type classification, and the ease of backup and restore sold me. I highly recommend it.
I recently migrated from another (more "enterprisey") open-source EDMS system that shall remain unnamed to paperless-ngx. Can't praise this highly enough. Where the other system needed multiple clicks for the easiest things and had a bunch of UI antifeatures, paperless has a very intuitive and well thought-out UI and handles ~30k documents without issues.
Has any paperless user found a good way to "deskew" scanned pages?
Sometimes, when scanning from my Brother printer through the ADF, the pages are skewed/rotated and it can be pretty jarring.
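One possible approach is to deskew the file before Paperless consumes it. Paperless-ngx builds on OCRmyPDF, which has a `--deskew` option, and as far as I know the Paperless configuration exposes a matching `PAPERLESS_OCR_DESKEW` setting. The sketch below assumes the `ocrmypdf` command is installed and that the scan has no text layer yet; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def deskew_command(src: Path, dst: Path) -> list[str]:
    """Build an OCRmyPDF call that straightens slightly crooked pages."""
    return ["ocrmypdf", "--deskew", str(src), str(dst)]


def deskew(src: Path, dst: Path) -> None:
    """Write a deskewed (and OCRed) copy of src to dst; raises on failure."""
    subprocess.run(deskew_command(src, dst), check=True)
```

Running `deskew(Path("scan.pdf"), Path("consume/scan.pdf"))` would drop the straightened copy into a consume folder; both paths are placeholders.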
[+] [-] ydant|2 years ago|reply
https://news.ycombinator.com/item?id=37800951 (183 comments)
I tested it out then and am considering migrating from my current system (Google Drive) to using a self-hosted approach. Paperless seems to have a good approach for minimizing the mental overhead of ingesting and categorizing new documents - which is what ultimately leads me to stacking documents up for months before processing them. My initial pilot run was promising, but I haven't gotten around to switching yet.
From the changelog, it's not really clear to me what's notable about this release, especially as a new/potential user.
This page is a better introduction to the product, although it doesn't mention the v2 release yet:
https://docs.paperless-ngx.com/
[+] [-] andrew_eu|2 years ago|reply
One feature which seems to be quite a nice improvement (speculating as I haven't upgraded yet) is consumption templates [0]. My workflow involves an ADF scanner with an Android application, sharing the scanned PDF with Paperless Share [1] and then it's uploaded to the server via API. It seems that consumption templates will enable adjusting tags/sharing settings/permissions of a document at ingestion time based on where it's ingested from.
[0] https://github.com/paperless-ngx/paperless-ngx/pull/4196
[1] https://github.com/qcasey/paperless_share
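For readers curious what "uploaded to the server via API" can look like, here is a rough sketch using only the Python standard library. It assumes the `/api/documents/post_document/` endpoint and token authentication from the Paperless-ngx REST API; the host name, token, and function name are placeholders, and this is not how Paperless Share itself is implemented.

```python
import urllib.request
import uuid


def build_upload_request(base_url: str, token: str, filename: str,
                         pdf_bytes: bytes) -> urllib.request.Request:
    """Build a multipart POST that submits one PDF for consumption."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "document" is the form field that carries the file itself.
        f'Content-Disposition: form-data; name="document"; '
        f'filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/documents/post_document/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen(...)` would send the upload. The request carries nothing about where the scan came from.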
[+] [-] ydant|2 years ago|reply
https://github.com/paperless-ngx/paperless-ngx/releases/tag/...
[+] [-] jdoss|2 years ago|reply
I just updated my install to v2.0.0 with a simple podman pull and a systemctl restart of my paperless pod, and everything looks great. Hats off to the contributors of the project. Every update, even a major one like this, has been really smooth.
1: https://github.com/jdoss/ppngx
[+] [-] cwiggs|2 years ago|reply
I've been thinking of moving from docker-compose to podman, specifically using [podman-play-kube](https://docs.podman.io/en/v4.2/markdown/podman-play-kube.1.h...), but haven't gotten around to it.
I think Podman has a lot to offer for self-hosters, but it isn't popular (yet?).
[+] [-] edward|2 years ago|reply
[+] [-] diarrhea|2 years ago|reply
I do PDF editing offline, on the desktop, then re-upload to paperless. Not the most integrated flow, but much more bulletproof. I want the PDFs themselves to be immutable once on paperless. Only metadata should be editable.
[+] [-] prometheanfire|2 years ago|reply
[+] [-] ndsipa_pomu|2 years ago|reply
[+] [-] el_sinchi|2 years ago|reply
[+] [-] CommanderData|2 years ago|reply
Last I checked it doesn't, and I had to run a separate service to advertise the paperless endpoint to the printer.
[+] [-] throwaway69123|2 years ago|reply
[+] [-] matrss|2 years ago|reply
Does anyone have a good scanner recommendation though? I am eyeing the Brother ADS-1700W since it seems to be recommended often, but I would really like to use the "scan to webhook" feature (it's 2023 after all) instead of SMTP or whatever else are the options I would have with the Brother.
[+] [-] draugadrotten|2 years ago|reply
I am using an iPhone as a scanner, and it automatically scans, OCRs, uploads, and ingests to the paperless-ngx instance, even remotely using Tailscale.
The iPhone camera is more than good enough for scanning documents.
[+] [-] pintxo|2 years ago|reply
[+] [-] tecleandor|2 years ago|reply
I'll clean all the rollers and stuff next week and test it :P
[+] [-] andrew_eu|2 years ago|reply
In that case I scanned to a USB drive attached to the scanner (since each photo was a separate file). For Paperless I use the Epson Smart app, scan the document with whatever settings, remove/rotate pages as needed, and then share it to Paperless with Paperless Share [0].
Many network attached scanners can scan to SMB, no device needed, but I kind of like the human-in-the-loop aspect. Since my Paperless server runs on an HDD next to the scanner I can actually hear once the file lands which is quite satisfying.
[0] https://github.com/qcasey/paperless_share
[+] [-] daveguy|2 years ago|reply
[+] [-] WXLCKNO|2 years ago|reply
[+] [-] xattt|2 years ago|reply
The only thing that comes to mind is either do a convoluted SnapScan Online -> Google Drive -> rclone -> Paperless or bite the bullet and figure out how to directly scan into the local box via USB.
[+] [-] somehnguy|2 years ago|reply
I installed Paperless on my home server & spent a night digitizing everything. After being comfortable with it for a few months I went back & shredded all my paper copies. Today my process is similar - when I get a document I would normally toss in that filing cabinet I just scan, upload to Paperless, and shred it. It's also really nice for storing large purchase receipts - I've previously had the writing on thermal paper receipts go invisible after a period of time, no longer an issue.
Searching for something specific is so easy now! Huge QOL improvement. Just make sure you have a solid backup strategy; losing my Paperless database & filestore would be devastating.
[+] [-] itslennysfault|2 years ago|reply
To me it means Angular (the web framework). So, I was surprised to learn this wasn't an Angular plugin. Angular is often referred to as ng for short and as such their plugins tend to have ngx as a prefix. For example, the angular wrapper for ChartJS is ngx-chartjs.
[+] [-] georgehotelling|2 years ago|reply
The paperless-ngx core team focused on gathering a group of people to support it, to avoid burnout problems and keep the project sustainable.
[+] [-] ydant|2 years ago|reply
paperless (https://github.com/the-paperless-project/paperless) -> paperless-ng (https://github.com/jonaswinkler/paperless-ng/) -> paperless-ngx (https://github.com/paperless-ngx/paperless-ngx/)
[+] [-] __jonas|2 years ago|reply
https://github.com/jonaswinkler/paperless-ng/tree/master/src...
[+] [-] jdoss|2 years ago|reply
At least that is my understanding following the Paperless project over the years.
[+] [-] lhl|2 years ago|reply
For me, as someone who wants my docs on my own server but doesn't care enough to constantly keep up with forks/changes/migrations/updates, I've been looking for something stable that I can use for years (or maybe decades?). For example, part of the appeal of something like Obsidian is that it just falls back to .md text files.
Curious if there are any long-term active users of this (or other systems) for handling all their paper and what they think about maintainability/longevity?
[+] [-] nitsua2|2 years ago|reply
So far I’ve probably updated the software ~5 times across various releases. Each time, it has been because there was a new feature I wanted rather than because I needed to pull in fixes (the software has been bug free for me). The update process is well documented and very straightforward if you are using their docker compose setup to run the application.
[+] [-] Wool2662|2 years ago|reply
[+] [-] sigwinch28|2 years ago|reply
Their entrypoint script makes a lot of assumptions and in their docker-compose example they use a single container running supervisord instead of multiple containers, each with a dedicated purpose (ingestion, consuming, web server). The setup is almost insistent on logging to a file instead of stdout. It also checks and tries to modify permissions of some folders(!!). This requires quite a bit of unpicking.
Getting it to follow what I consider “best practices” is doable but not frictionless, though I understand that it’s probably a mix of “easy for someone whose day job is not to be an infrastructure engineer” and “we were using supervisord for baremetal anyway”. Maybe a lot of it is personal preference, but I do feel like the project is not taking containerisation fully to heart. Maybe being more user-friendly is more important in their eyes than being a containerisation purist.
Either way, I’ve got it nearly working with my Brother ADS-1700W, which has shortcuts for me, my wife, and “joint”. Each shortcut uploads documents to a different directory via SFTP, and the documents then automatically have their paperless-ngx owner set appropriately.
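The routing idea can be illustrated with a few lines of Python. This is only a sketch of the matching logic, not how paperless-ngx is configured (there, it would presumably be done with the consumption templates mentioned earlier in the thread). The directory layout and the user names `alice` and `bob` are invented.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rules: glob pattern on the upload path -> owner user name.
# None stands for "no single owner", as for the shared "joint" directory.
OWNER_RULES: list[tuple[str, Optional[str]]] = [
    ("*/sftp/me/*", "alice"),
    ("*/sftp/wife/*", "bob"),
    ("*/sftp/joint/*", None),
]


def owner_for(path: str) -> Optional[str]:
    """Return the owner for an uploaded file based on its directory."""
    for pattern, owner in OWNER_RULES:
        if fnmatchcase(path, pattern):
            return owner
    # Files outside the known directories also get no owner.
    return None
```

With these made-up rules, a scan landing in `/consume/sftp/wife/` would be assigned to `bob`, while anything in `/consume/sftp/joint/` stays unowned.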
[+] [-] ornornor|2 years ago|reply
I have zero regrets so far. Paperless ngx is so much more user friendly: the automatic date extraction from OCR, the auto tagging and document type classification, and the ease of backup and restore sold me. I highly recommend it.
[+] [-] justsomehnguy|2 years ago|reply
For years I was eyeing Mayan as one of the variants I could use. Not anymore.
[+] [-] rmu09|2 years ago|reply
[+] [-] tobi1449|2 years ago|reply
[+] [-] tyingq|2 years ago|reply
https://github.com/the-paperless-project/paperless/issues/20
I don't know if it made its way into this fork.
[+] [-] KennyBlanken|2 years ago|reply
[+] [-] cgeier|2 years ago|reply
[+] [-] trallnag|2 years ago|reply