Perkeep – Open-source data modeling, storing, search, sharing and synchronizing

[+] mastax|8 years ago|reply

Since people are confused about what this is, I'll write a summary (from old memory, so it's probably 80% correct).

It is a consumer-oriented storage system that is:

- Content addressable

- Indexed

- Tag-oriented (vs. hierarchical)

- Equipped with permissions, encryption, compression, sharing, etc.

- Spans storage across machines and clouds

- FUSE mountable

- Has CLI and Web interfaces built-in

The intent is to be a personal data dumpster that you can throw all of your files and other data (tweets, etc.) into for search and backup.

The website could be better organized to convey this information quickly.

[+] bradfitz|8 years ago|reply

Camlistore (renamed to Perkeep) author here.

It is true that the website needs some love & updated docs. We've been working on Camlistore for 8 years now (with a few drier spells) but our focus has never been marketing. If anything, we didn't want too many non-nerd users for a number of years because it wasn't ready for non-developer usage. That's starting to change.

We have pretty good docs for configuration and such, but we lack some concise high-level text about what the project is and why.

I'll prioritize that.

[+] nayuki|8 years ago|reply

Some previous threads on Camlistore/Perkeep:

* [2014 Jun] https://news.ycombinator.com/item?id=7842629

* [2011 Jan] https://news.ycombinator.com/item?id=2156374

[+] euske|8 years ago|reply

The thing is that nothing is good enough to keep data for a lifetime. Hardware might break, a supply might be discontinued, and a software maintainer might disappear. You'll need to keep refreshing the data from one device to another for the rest of your life. That said, I'm curious how easily this system can handle porting from one device or service to another, in varying formats and architectures. The only way to stay relevant is to constantly keep changing/adapting to new things.

[+] bradfitz|8 years ago|reply

A huge focus of the project is on human-readable schemas and formats. Even if all specs & source code of the project are lost, the data should still be recoverable by a curious archaeologist.

Between replicating across several companies as well as your own hardware & having friends & family mirror your stuff (encrypted or not), the idea is that some copies will continue to exist.

Hardware failures are a given. Companies failing and friends & family dying are also a given. Natural disasters too. The only option seems to be trusting nothing and replicating all your data to lots of places, in future-friendly formats, and that's what Perkeep aims to do. And then a ton of tooling on top of that.

[+] davidbanham|8 years ago|reply

Looks like there's been some nice progress since I last looked at Camlistore! The importers from cloud services like Twitter look really interesting.

[+] natural219|8 years ago|reply

Camlistore & Brad Fitzpatrick's original writings are what initially got me into decentralized web advocacy. Since then, I've moved on from this project, since it seems to move at a very slow pace and the authors do not seem very interested in widespread user adoption.

With this name change, I'm slightly more interested again. We'll have to see in the coming months whether they become ready to displace actual large social media platforms or whether it remains a toy project.

[+] nerdponx|8 years ago|reply

How does this work?

[+] jamestomasino|8 years ago|reply

That was my first question, too. After clicking through a few links and even opening up an intro presentation I was left unsatisfied and closed the tab. This project desperately needs an FAQ or overview video up-front.

[+] emmelaich|8 years ago|reply

It's content addressable storage - as used by git and plan9's fossil/venti.

https://perkeep.org/doc/prior-art

https://en.wikipedia.org/wiki/Fossil_(file_system)
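For anyone unfamiliar with the term, here is a minimal sketch of what "content addressable" means. It is a toy in-memory store, not Perkeep's actual API: the `BlobStore` class, its method names, and the choice of SHA-256 with a `sha256-` prefix are all assumptions made for illustration.

```python
import hashlib


class BlobStore:
    """Toy in-memory content-addressed store (illustrative, not Perkeep's API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the bytes themselves, not from a file name.
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
ref_a = store.put(b"hello world")
ref_b = store.put(b"hello world")  # same bytes, so same reference
```

Because the reference is a hash of the content, storing identical bytes twice yields one blob, and a reader can verify a blob by re-hashing it.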

[+] random3|8 years ago|reply

https://perkeep.org/doc/overview

[+] linsomniac|8 years ago|reply

I've been watching Camlistore for a few years. I peek in on it every once in a while, long enough between that I usually can't remember the name. I like the look of it, but haven't been convinced to go from my decade old ZFS setup to Camlistore.

I feel like OwnCloud is more compelling, from a glance. Anyone use one or both and able to comment?

[+] bradfitz|8 years ago|reply

Camlistore author here.

If you only store files, sure, use ZFS.

Perkeep (Camlistore) doesn't write to a block device. It has storage backends for a filesystem (which can be ZFS) and any number of cloud object storage providers (S3, GCS, etc).

Perkeep's main value over a fancy POSIX filesystem is storing nameless things (tweets, other social media content + interactions, bookmarks) in common schemas, and permitting search over it all, and then having a variety of ways to browse it (CLI, FUSE, API, web UI, etc).

It's also good at sync to & from things any which way without merge conflicts.
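One plausible way to picture conflict-free sync, assuming blobs are immutable and keyed by a hash of their content: syncing reduces to copying whichever keys the other side lacks. The sketch below uses plain dicts as stand-in stores; `blob_ref` and `sync` are hypothetical helpers, not Perkeep functions.

```python
import hashlib


def blob_ref(data: bytes) -> str:
    # Hypothetical reference format: hash of the content.
    return "sha256-" + hashlib.sha256(data).hexdigest()


def sync(src: dict, dst: dict) -> int:
    """Copy blobs that dst is missing from src; return how many were copied."""
    missing = src.keys() - dst.keys()
    for ref in missing:
        dst[ref] = src[ref]
    return len(missing)


laptop = {blob_ref(b): b for b in (b"photo", b"tweet")}
server = {blob_ref(b): b for b in (b"tweet", b"bookmark")}

copied_up = sync(laptop, server)    # laptop -> server
copied_down = sync(server, laptop)  # server -> laptop
```

Two blobs with the same key necessarily hold the same bytes, so there is never a case where both sides have "different versions" of one key to merge.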

[+] _m8fo|8 years ago|reply

How is this any better than just burning your data to a Blu-ray, which lasts centuries when stored under proper conditions (theoretically, anyway)? I need to give this a closer look.

[+] gf263|8 years ago|reply

This is such a classic hacker news comment

[+] ams6110|8 years ago|reply

Not having to worry if there will be any Blu-Ray readers available in a century.

[+] milcron|8 years ago|reply

M-DISC is even better. Burnable discs use an organic dye which oxidizes over time. M-DISC uses a "glassy carbon" layer that is inert to oxidation.

They adhere to DVD-R, BD-R, and BD-XL standards, so they're readable in standard disc drives. You need a special drive to burn them, however (it requires a high-power laser).

[+] davidbanham|8 years ago|reply

It's different (better?) in that it doesn't rely on you remembering to actually burn that data, then store it safely. It comes with an app you can run on your phone to upload all your photos immediately, for instance. It has importers to archive all your tweets automatically, for example. It allows you to outsource the task of "Keep this blu-ray safe" to a cloud provider (or a friend) while encrypting your data to keep it private.

[+] stevekemp|8 years ago|reply

I've been keeping an eye on this project for years, because it seems well-designed, and the authors are very capable developers.

The biggest problem I found was getting documentation on replication. Having two or more servers mirror each other across the internet seems like a good idea, given that otherwise you have a single point of failure as you import all your media/files.

[+] teddyh|8 years ago|reply

I’d be interested in a system for converting existing stuff from, for example, the Firefox “ScrapBook” plugin, to this format. (The ScrapBook plugin is not compatible with Firefox 57’s plugin API, so anyone who upgrades to Firefox 57 immediately loses all their saved ScrapBook pages.)

[+] sp332|8 years ago|reply

I have no idea how compatible this is, but someone is working on a new version. https://addons.mozilla.org/en-US/firefox/addon/scrapbookq/

[+] andrepd|8 years ago|reply

The perfect tool for a digital hoarder like myself. Will follow this with attention.

[+] didibus|8 years ago|reply

So, it's just a document server that can be run over multiple computers? I was expecting something peer-to-peer. If I understand correctly, you can think of this as a Dropbox that you can self-host?

[+] kindfellow92|8 years ago|reply

What is the target audience of this? What are the intended use cases?

Is this supposed to be used directly by users or as an API for a user-facing application? How is this different from a document DB like MongoDB?

[+] flarg|8 years ago|reply

Long-time follower of the project here... So far it's been aimed at geeks who want to archive their content from the cloud, e.g. tweets, but it also stores files. Because of the way it is designed, I've always thought there is a compelling use case for it as a file and object store for organizations where auditing of data records is expected and sharing of data is a requirement.

[+] brotherjerky|8 years ago|reply

So is this ready for prime time yet? I used to follow camlistore, and it was still a little rough even for CLI nerds.

[+] gh02t|8 years ago|reply

So I just downloaded it and played around and as far as I can tell there is no way to delete files. Or, more specifically there is a way but it's not implemented or otherwise accessible as far as I can figure from the rather sparse documentation.

If someone would like to explain to me how (if?) the garbage collection works I'd appreciate it, because I like the concept and kinda want to use this, but deleting stuff is a rather important feature for me. All I could find searching was a post by the devs saying it was already mostly implemented but not finished and not a priority...

https://github.com/camlistore/camlistore/issues/792

Like, I understand that this is a spare time project (I think) but not considering deleting/pruning files to be an important feature is really confusing to me. In its current state, if I accidentally upload the wrong file, am I now stuck with it forever?

Edit: ok I figured out how to at least delete things in the UI (clicking the check mark opens a side menu apparently, `camput delete` doesn't seem to do anything), but as far as I can tell it doesn't actually delete them from the database without running a garbage collect, which isn't implemented so it just hangs around in purgatory.

[+] j7ake|8 years ago|reply

Is this possibly a Dropbox replacement? Do I have to host the files on my own server?

[+] tradersam|8 years ago|reply

Alternatively: "Hard-drives let you permanently keep your stuff, for life"

[+] melq|8 years ago|reply

Hard drives are an especially bad choice for lifetime reasons, and SSDs don't solve the problem either :P

[+] milcron|8 years ago|reply

Check out M-DISC https://en.wikipedia.org/wiki/M-DISC

[+] passwordqq2|8 years ago|reply

Question if anybody gets to this: I'm taking a break from work and computers for a year. How would you guys suggest I store my kdbx data securely, in a failsafe manner, without worrying about forgetting passwords or losing paper chits or USB keys?

Edit: after seeing some good suggestions about physical storage, I've decided to increase the difficulty of the question, hard mode- How would you do this without physical stuff? (more, new answers about physical welcome too)

[+] jacquesm|8 years ago|reply

For something on the timescale of a year I would just keep the system that you already have up and running. If it were much longer than that I'd go with a bank vault that contains the access keys and something like tarsnap and yet another backup with another cloud provider.

[+] Dylan16807|8 years ago|reply

> Edit: after seeing some good suggestions about physical storage, I've decided to increase the difficulty of the question, hard mode- How would you do this without physical stuff? (more, new answers about physical welcome too)

Store one copy in a gmail account, and another on imgur.

> assuming [...] memory goes away. (to be safe)

And tattoo the site+username+pass on your thigh.

[+] quickthrower2|8 years ago|reply

I wonder if a system like this would be good for your general problem:

Generate a random seed sentence of so many words. From the secret seed + site domain name, generate a password.

Store a piece of paper with:

- Algorithm (could be public on GitHub too)

- Seed word

- Site names
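A rough sketch of the seed + domain idea, under my own assumptions: PBKDF2-HMAC-SHA256 as the derivation function, the domain as the salt, 200,000 iterations, and URL-safe base64 for the output. The function name `derive_password` and the example seed are made up for illustration, and this is not a vetted password scheme.

```python
import base64
import hashlib


def derive_password(seed: str, domain: str, length: int = 20) -> str:
    """Deterministically derive a site password from a secret seed and a domain."""
    key = hashlib.pbkdf2_hmac(
        "sha256",
        seed.encode("utf-8"),            # the secret seed sentence
        domain.lower().encode("utf-8"),  # the site domain acts as the salt
        200_000,                         # iteration count (an assumption)
    )
    # 32 derived bytes become 44 base64 characters; keep the first `length`.
    return base64.urlsafe_b64encode(key).decode("ascii")[:length]


seed = "correct horse battery staple"  # example seed, not a real secret
pw_example = derive_password(seed, "example.com")
pw_other = derive_password(seed, "example.org")
```

The same seed and domain always give the same password, so only the seed, the algorithm, and the list of sites need to survive. The trade-off is that anyone who finds the paper can regenerate every password.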

[+] simcop2387|8 years ago|reply

For a year? a burned CD in a safe deposit box. Also a USB key there for convenience. Basically paying for physical security of the devices/data.

[+] zyxzkz|8 years ago|reply

I was gonna say, this sounds like Camlistore.

[+] DiThi|8 years ago|reply

Because it is! (edit: oh I see it's in the header)

105 comments