For me the problem is stability and future-proofing. Technology changes very quickly. If the maintainer loses interest, the software may rot away as its dependencies change, etc.
Important documents often need to be stored for 5, 10, or even 20 years. Why put everything into this shiny new software when it may change in a year or two?
I think it's best to just put scanned PDFs in folders organized by year and topic. Those can be easily and transparently backed up and searched.
But on a timescale of a few months, this software could be useful.
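A minimal sketch of what "searched" can mean for a plain folder tree, assuming a hypothetical layout of `root/<year>/<topic>/<name>.pdf`. It only matches folder and file names; searching inside the PDFs would additionally need the OCR text layer and a text extractor, which are not shown.

```python
from pathlib import Path


def index_archive(root):
    """Map (year, topic) -> sorted list of PDF paths.

    Assumes the hypothetical layout root/<year>/<topic>/<name>.pdf;
    anything whose first folder is not a four-digit year is skipped.
    On Linux the glob is case-sensitive, so "*.PDF" files are not matched.
    """
    index = {}
    for pdf in Path(root).glob("*/*/*.pdf"):
        year, topic = pdf.parent.parent.name, pdf.parent.name
        if not (len(year) == 4 and year.isdigit()):
            continue
        index.setdefault((year, topic), []).append(pdf)
    for paths in index.values():
        paths.sort()
    return index


def search(index, keyword, year=None):
    """Return PDFs whose file name or topic folder contains `keyword`.

    Matching is case-insensitive; pass `year` to restrict to one year.
    """
    kw = keyword.lower()
    hits = []
    for (y, topic), paths in sorted(index.items()):
        if year is not None and y != str(year):
            continue
        hits.extend(p for p in paths if kw in p.name.lower() or kw in topic.lower())
    return hits
```

Something like `search(index_archive("Scans"), "insurance", year=2015)` would then list that year's matching files. Because the index is rebuilt from the folders each time, there is no separate database to back up or keep in sync.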
Funnily enough, I use a commercial app with the same name (Paperless) that does exactly that. It scans the documents, applies OCR, and saves the PDFs in a folder that, in my case, is automatically synced with Dropbox and backed up to a local NAS.
It doesn't have search functionality (well, it does, but it's basically useless), but it lets you set categories and tags, which is more than enough for me.
There's an added issue with this kind of solution: in most cases you still need to keep the original. Having documents scanned is great for record keeping and for communicating with your own accountant, but if there is a problem (a tax audit, proving ownership, etc.) you'll have to produce the paper original.
That's why I archive my document scans as 1-bit-per-pixel PNGs. They end up at 20-50 KB per page at 150 PPI. I figure there will always be a way to get the pixels out of a PNG; PDF is a more complex and dynamic standard.
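For scale, a US Letter page at 150 PPI is 1275 × 1650 pixels, about 263 KB uncompressed at one bit per pixel, so 20-50 KB per page corresponds to roughly 5-13× compression. The "always a way to get the pixels out" argument is easy to sanity-check: a 1-bit greyscale PNG is just a signature, a header, zlib-compressed rows of packed bits, and CRC-checked chunks, so it can be written with nothing but a standard library. A minimal sketch, assuming the page has already been scanned into 8-bit greyscale rows; the fixed threshold of 128 and the function names are illustrative choices, not any particular tool's behaviour.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_1bit_png(gray_rows, threshold=128, ppi=150):
    """Encode non-empty, equal-length rows of 0..255 greyscale values as a 1-bit PNG.

    Pixels >= `threshold` become white (bit 1), the rest black (bit 0).
    """
    height, width = len(gray_rows), len(gray_rows[0])
    raw = bytearray()
    for row in gray_rows:
        raw.append(0)  # filter type 0 (None) before every scanline
        byte, nbits = 0, 0
        for value in row:
            byte = (byte << 1) | (1 if value >= threshold else 0)
            nbits += 1
            if nbits == 8:
                raw.append(byte)
                byte, nbits = 0, 0
        if nbits:  # pad the row's last partial byte; leftmost pixel stays in the high bit
            raw.append(byte << (8 - nbits))
    # IHDR: width, height, bit depth 1, colour type 0 (greyscale),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    # pHYs stores pixels per metre (unit 1); one inch is 0.0254 m.
    ppm = round(ppi / 0.0254)
    phys = struct.pack(">IIB", ppm, ppm, 1)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"pHYs", phys)
            + _chunk(b"IDAT", zlib.compress(bytes(raw), 9))
            + _chunk(b"IEND", b""))
```

Writing the result is just `open("page.png", "wb").write(encode_1bit_png(rows))`. A fixed threshold suits black-on-white text; photos and faint pencil marks lose detail at one bit per pixel.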
> Those can be easily and transparently backed up and searched.
I was recently asked whether this is easy to do on Windows, especially the search part. What solution would you propose to someone who wants to index many PDF files already in such a folder structure?
I have been doing something similar for a couple of years.
My printer can scan to a shared drive on my home LAN, saving files as PDFs. These are then uploaded to Google Drive, where everything else happens automatically (e.g. if you search for something, it will find it in scanned PDFs automatically).
It's super useful, especially since the mobile clients for Drive are rock solid. I can be on the phone to someone and pull up basically any document I've had since the 90s in a couple of seconds, for free. It's kinda fun being on the phone to a call centre and being able to pull up data quicker than they can. Tax returns are an absolute doddle when everything is paperless.
The only thing missing for me from Google Drive is something like a "Knowledge Graph" for my own documents. I can search by keyword or filename, sure, but next I'd like some of the "intelligence" we're used to from Google Now, applied to my scanned docs: "show me my bank statements with a payment to Amazon in the last 3 months", etc.
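The scan-to-share-then-upload pipeline can be approximated with a small polling script. A minimal sketch, assuming the scanner drops PDFs into an `inbox` folder and that `archive` sits inside a folder a sync client (Google Drive, Dropbox, etc.) already uploads; the folder names, the per-year layout and the one-minute settle time are all hypothetical choices.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path


def sweep(inbox, archive, min_age=60.0):
    """Move finished PDFs from `inbox` into archive/<year>/ and return the new paths.

    Files modified less than `min_age` seconds ago are skipped, on the
    assumption that the scanner may still be writing them. The year comes
    from the file's modification time, in local time.
    """
    moved = []
    now = time.time()
    for pdf in sorted(Path(inbox).glob("*.pdf")):
        mtime = pdf.stat().st_mtime
        if now - mtime < min_age:
            continue
        dest_dir = Path(archive) / datetime.fromtimestamp(mtime).strftime("%Y")
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / pdf.name
        n = 1
        while dest.exists():  # never overwrite an earlier scan with the same name
            dest = dest_dir / f"{pdf.stem}-{n}{pdf.suffix}"
            n += 1
        shutil.move(str(pdf), str(dest))
        moved.append(dest)
    return moved


def watch(inbox, archive, interval=30.0):
    """Poll `inbox` forever, sweeping it every `interval` seconds."""
    while True:
        sweep(inbox, archive)
        time.sleep(interval)
```

Polling is cruder than filesystem notifications, but it behaves the same on a NAS share, Windows and macOS, and the settle time covers scanners that write a file in several passes.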
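Once each scan carries a little structured metadata, a query like that reduces to a filter; the hard part is extracting the metadata reliably. A minimal sketch, assuming a hypothetical `Document` record whose type, date and OCR text have already been extracted by some other step:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    kind: str     # hypothetical label, e.g. "bank_statement", from some classifier
    issued: date  # statement date, assumed already extracted from the scan
    text: str     # the OCR text layer


def statements_with_payee(docs, payee, today, days=90):
    """Bank statements issued in the last `days` days whose text mentions `payee`.

    "Last 3 months" is approximated as 90 days; matching is a plain
    case-insensitive substring test on the OCR text.
    """
    cutoff = today - timedelta(days=days)
    needle = payee.lower()
    return [
        d for d in docs
        if d.kind == "bank_statement"
        and cutoff <= d.issued <= today
        and needle in d.text.lower()
    ]
```

For example, `statements_with_payee(docs, "amazon", today=date.today())` returns only the recent statements that mention the payee.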
If you don't want to buy a document scanner, just use your mobile phone for this.
I personally use Scanbot for this, it automatically recognizes, crops and OCRs documents (on the device) and stores them as PDF with the extracted text in the location of your choosing. Works well enough.
I use Google Docs for it. You can upload scanned documents to Google Docs; they are automatically OCRed, you can search them by keyword, and you can still access the original image.
Disclaimer: I work at Google, although not on the Google Docs team.
Last time I checked, it was much cheaper to get a document scanner with an ADF built into a laser printer than to buy a standalone one. I was quite surprised.
I can't really back this up with empirical evidence, but in my experience the ADFs on consumer all-in-ones tend to be a bit crap compared to a standalone one.
Nice combination of technologies to solve a problem -- could be very useful for a business that needs to be able to archive and access paper records.
But for a household -- there are very few documents you need to keep long term. Better to just keep those in a fireproof file box, and shred and discard everything else rather than devote any resources or mental energy to keeping them around in either paper or digital form.
I bought a high-speed scanner with OCR a few years ago. MacOS automatically indexes PDFs, so I can easily search through my scanned documents in Finder.
A magic folder system, like Dropbox or Syncplicity, makes sure that the pdfs are safely backed up for me.
You can use Docady's scanner, which also does OCR and recognizes the content. It then stores your documents and encrypts them. At the moment it's available on iOS, but it should be available on Android soon too.
There's lots I don't like about Acrobat X (and now DC), but ClearScan is an awesome format for scanning and retaining PDF documents. I wish (though don't expect) Adobe would open source it.
It seems somewhat ironic to me that someone built this whole paper-to-OCR system and then says "hey, use it with a scanner like X", which has OCR capabilities (producing searchable PDFs) built in.
OP, great job. I have been trying to solve this very same problem for over a year now, and have a business plan based on it. Is there a way I can PM you to get some clarifications? Thanks.
Take a picture with your stock camera, or use an app that will crop, apply OCR and convert your image to a document format? The former lacks features unless you rely on another program. The latter is only good for people who infrequently need something scanned. Most people can probably get by using a phone app, but this is for people with lots of paper documents.
bonoboTP | 10 years ago
loopbit | 10 years ago
upofadown | 10 years ago
lucaspiller | 10 years ago
chm | 10 years ago
mattdlondon | 10 years ago
stephenr | 10 years ago
/facepalm
tmaly | 10 years ago
cstuder | 10 years ago
jkmcf | 10 years ago
Scannable works really fast and Evernote indexes PDFs.
If only Evernote's editor didn't make me want to switch away every time I use it...
kozikow | 10 years ago
leni536 | 10 years ago
thenipper | 10 years ago
ams6110 | 10 years ago
payne92 | 10 years ago
I disagree. While I'm a huge fan of purging, there are many, many cases where you need/want documents.
Theft/fire/casualty: old receipts prove ownership and value.
Maintenance: who worked on the furnace 4 yrs ago?
Warranty: our windows have a 20 year warranty (and we're using it!)
Basis for home improvements: when you sell your home, if you can document improvements, you can raise your basis and lower your capital gains.
Repair: where's the part number & diagram for the faucet that's leaking?
School records for your children.
Etc.
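A simplified worked example of the basis point, with hypothetical figures and ignoring selling costs, depreciation and any home-sale exclusion that may apply:

```latex
\begin{aligned}
\text{gain} &= \text{sale price} - (\text{purchase price} + \text{documented improvements})\\
\text{without records:}\quad &\$400{,}000 - (\$300{,}000 + \$0) = \$100{,}000\\
\text{with records:}\quad &\$400{,}000 - (\$300{,}000 + \$40{,}000) = \$60{,}000
\end{aligned}
```

Every documented dollar of improvement lowers the taxable gain by a dollar, which is why the receipts are worth keeping for as long as you own the house.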
DannoHung | 10 years ago
zellyn | 10 years ago
epaulson | 10 years ago
https://camlistore-review.googlesource.com/#/c/5416/
gwbas1c | 10 years ago
avirambm | 10 years ago
Demo: https://www.youtube.com/watch?v=cN_Zw6xoUaw
App: https://itunes.apple.com/US/app/id921250909?mt=8
(Full Disclosure: I work at Docady and part of its team)
petemc_ | 10 years ago
atourgates | 10 years ago
pbhjpbhj | 10 years ago
stephenr | 10 years ago
mayoff | 10 years ago
ictaot | 10 years ago
Chris2048 | 10 years ago
I have an HP Envy I'd like to glue to the cloud.
hendry | 10 years ago
ddd1600 | 10 years ago
[deleted]
nickthemagicman | 10 years ago
noxToken | 10 years ago