I just dealt with this crap storm of a project last month. We had 40k photos scattered across three laptops, old hard drives, and SD cards. First I crudely copied all of the folders onto an external hard drive and ran a freeware duplicate remover, which cleaned out about 20% of them. Then I used a Python script to go through this giant pile of pics and copy them into folders by year and month based on the created date. It also added yyyy-mm-dd to the beginning of each file name. Now we are slowly going through month by month and adding simple tags to the file names (event, location, names). It’s far from perfect, but I didn’t want to deal with keeping everything synced in a database or being locked into a certain OS or app, plus it should still be searchable in 15 years when we are all running Windows 30 and Mac OS Ozarks or whatever.
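For anyone wanting to try the same thing, here is a rough sketch of what a script like that could look like. It is not the actual script: the names `organize`, `dated_destination`, and `PHOTO_EXTENSIONS` are made up for illustration, and it assumes the file's modification time is a good enough stand-in for the created date (true creation time and EXIF dates are platform- and camera-dependent, so adjust as needed). It copies rather than moves, and it skips anything that would overwrite an existing file.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Assumed set of extensions to treat as photos; extend as needed.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic"}


def dated_destination(src: Path, dest_root: Path, taken: datetime) -> Path:
    """Return dest_root/YYYY/MM/YYYY-MM-DD_<original name> for a photo."""
    prefix = taken.strftime("%Y-%m-%d")
    return dest_root / f"{taken:%Y}" / f"{taken:%m}" / f"{prefix}_{src.name}"


def organize(src_root: Path, dest_root: Path) -> list[Path]:
    """Copy photos under src_root into year/month folders under dest_root."""
    copied = []
    # sorted() walks the whole tree before any copying starts.
    for src in sorted(src_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in PHOTO_EXTENSIONS:
            continue
        # Assumption: modification time (local time zone) approximates the created date.
        taken = datetime.fromtimestamp(src.stat().st_mtime)
        dest = dated_destination(src, dest_root, taken)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists():
            continue  # never overwrite something already in the library
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

Calling `organize(Path("/mnt/dump"), Path("/mnt/library"))` (both paths hypothetical) would leave the original pile untouched and build the dated tree next to it, so a bad run can simply be deleted and redone.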
mceachen|6 years ago
Be careful with overwriting your originals. Many years ago I used jpegtran to rotate losslessly, but didn't realize it was removing all the metadata as well.
I added a bunch of heuristics to PhotoStructure to infer missing tags based on sibling files, specifically because I'd borked so many of my own photos.
FWIW, I've tried to make design decisions that will hopefully allow libraries to be very long-lived. PhotoStructure can copy unique (by SHA) originals into a dated subdirectory, and has what may be the most advanced duplicate image detection around (just added in the newest version). Libraries are cross-platform: one can be stored on your NAS, created on your Mac, then opened on your Windows box, and everything just works. The SQLite database uses a straightforward schema.
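To illustrate the "unique by SHA" idea in general terms, here is a minimal sketch. It is not PhotoStructure's code: the function names are hypothetical, and SHA-256 is an assumption since the comment only says "SHA". It only catches byte-for-byte identical copies, not resized or re-encoded versions of the same image, which is the harder problem the duplicate image detection mentioned above is aimed at.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos and videos are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_by_sha(root: Path) -> dict[str, Path]:
    """Map each distinct content hash to the first file (in sorted order) that has it."""
    unique: dict[str, Path] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # setdefault keeps the first path seen and ignores later duplicates.
            unique.setdefault(sha256_of(path), path)
    return unique
```

Anything under `root` that does not appear in the returned values is a byte-identical duplicate of a file that does.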
jdmcnugent|6 years ago
rajesh-s|6 years ago