top | item 21581444

Local-first software: you own your data, in spite of the cloud

748 points | godelmachine | 6 years ago | blog.acolyer.org

239 comments

[+] herf|6 years ago|reply
I spent a lot of time with photos (Picasa) trying to do this peer to peer - this is what we built in the 2002 era. Here are a few issues:

1. Identity is hard to do on the LAN, so any sharing ends up using the cloud to figure out who has access. Similarly, identity is hard to move around, so representing your Facebook comments feed outside Facebook is difficult to do.

2. Any time you have a "special" server that handles identity, merging, or any other task, it ends up in the effective role of master, even if the rest of the parts of your system are peers. You want all your collaborations in Dropbox to survive their infrastructure being offline? It's tough to do.

3. p2p stalled a bit in the mid-2000s when storage and bandwidth got much cheaper--in a period of just two years (2002-2004), it became 100x cheaper to run a cloud service. But what continued to stall p2p was mobile. Uploading and syncing needs to run in the background, and if you're on a limited bandwidth client or a battery-limited device like iOS, sync can effectively diverge for months because the applications can't run in the background. So changes you "thought" you made don't show up on other devices.

4. For avoiding mass surveillance, what we are missing from this older time is the ability to make point to point connections (between peers) and encrypt them with forward secrecy, without data at rest in the cloud. Even systems that try to do some encryption for data at rest (e.g., iMessage) keep keys essentially forever, so data can be decrypted if you recover a private key later on. A system that only makes direct connections between peers does not have this issue.

5. Anytime you have multiple peers, you have to worry about old versions (bugs), or even attackers in the network, so it's fundamentally harder than having a shared server that's always up to date and running the latest code.
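
A toy sketch of the forward-secrecy idea from point 4: each session derives its key from an ephemeral Diffie-Hellman exchange, and the private halves are discarded afterwards, so a key recovered later cannot re-derive old session keys. (Deliberately tiny parameters for illustration only; real systems use X25519 or RFC 3526 groups via a vetted crypto library.)

```python
import secrets, hashlib

# Toy group parameters -- far too small for real use. The point here is
# the key *lifecycle*: nothing long-lived can reconstruct a past session.
P, G = 4294967291, 5  # largest 32-bit prime, and a small generator

def new_session():
    """Each peer generates a fresh ephemeral keypair per session."""
    priv = secrets.randbelow(P - 2) + 1
    pub = pow(G, priv, P)
    return priv, pub

def shared_key(my_priv, their_pub):
    """Both sides derive the same 32-byte session key."""
    secret = pow(their_pub, my_priv, P)
    return hashlib.sha256(str(secret).encode()).digest()

# One session: Alice and Bob agree on a key...
a_priv, a_pub = new_session()
b_priv, b_pub = new_session()
assert shared_key(a_priv, b_pub) == shared_key(b_priv, a_pub)

# ...then throw the private keys away. Compromising a device later
# reveals nothing that can decrypt this session's traffic.
del a_priv, b_priv
```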

[+] masukomi|6 years ago|reply
SSB (Secure Scuttlebutt)[1] has solved all of these problems except encrypting data with the same key forever.

The apps all currently use one identity per device, but it's not actually hard to use the same identity on two devices, and it's common for multiple apps to share the same identity.

Old versions of software could be a concern, but that's what versioned APIs are for. It's a solved problem.

The only notable problem with local-first dev that I'm aware of currently is storage requirements running up against the limited size of SSDs on most people's computers.

[1]: https://scuttlebutt.nz/

[+] koheripbal|6 years ago|reply
With regards to identity management, maybe there should be a formalized integration between browsers and password managers, such that the concept of "registration" goes away and new logins just automatically create user accounts with default permissions, keyed to the email address.
[+] 3xblah|6 years ago|reply
"I spent a lot of time with photos (Picasa) trying to do peer to peer..."

As I recall, one of Google's many "failed" projects was a means for transferring large numbers of photos person to person called "HELLO". I reckon this existed for a short time post-2004, then AFAIK disappeared from public view (or morphed into something else). I could be remembering the details incorrectly.

[+] mceachen|6 years ago|reply
Thanks for taking the time to share this!

The issue I face with PhotoStructure is that people's home networks frequently have throttled upstream rates. I'd love to provide a caching CDN, but I want all content encrypted. I don't want my pipes to see any data from my users.

How would you do perfect forward secrecy when only the library owner's key is available at upload time? Is it possible? If not, it seems that every bit of shared content would have to be re-encrypted and re-uploaded when someone new is granted access to that content.
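
One pattern that addresses the re-upload half of this question is envelope encryption: encrypt the content once under a random content key, and grant access by wrapping that small key for each member, so the big blob never moves. (A hypothetical sketch: the toy XOR stream stands in for AES-GCM and public-key wrapping, and this does not provide forward secrecy for data at rest.)

```python
import secrets, hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Expand a key into n pseudo-random bytes (toy stand-in for a real cipher)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(data: bytes, key: bytes) -> bytes:
    """Symmetric toy cipher: applying it twice with the same key decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Upload once: encrypt the content under a throwaway content key.
content = b"family photo bytes ..."
content_key = secrets.token_bytes(32)
blob = xor(content, content_key)  # this opaque blob is what the CDN stores

# Grant access: wrap the 32-byte content key for each member (stand-in
# for wrapping with each member's public key).
members = {"alice": secrets.token_bytes(32), "bob": secrets.token_bytes(32)}
wrapped = {name: xor(content_key, k) for name, k in members.items()}

# Bob decrypts: unwrap the content key, then the blob. Adding a member
# later means wrapping 32 bytes, not re-encrypting the content.
bob_key = xor(wrapped["bob"], members["bob"])
assert xor(blob, bob_key) == content
```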

[+] nihonde|6 years ago|reply
This is fantastic first-hand feedback. Thanks for sharing this.
[+] jka|6 years ago|reply
As far as possible, I'm following a local-first methodology for a recipe search, meal planner, and shopping list application:

https://www.reciperadar.com

There's a 'collaboration' mode which allows peer-to-peer sharing of a session via CRDTs over IPFS. My partner and I select our meals for the week, and then when one of us is doing the shopping, we can mark ingredients as found -- the other person's view reflects those updates in near-real-time.

If either of us loses connectivity, we can continue to use the app, and when data access is restored, those changes are synced to the shared session (with automatic conflict resolution). All data in the shared session is encrypted, and the collaboration link contains the keys.
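
The merge step at the heart of a session like this can be sketched as a minimal last-writer-wins map, one simple flavor of state-based CRDT (real libraries use richer types; this only shows why the order peers sync in doesn't matter). The data below is made up for illustration.

```python
def merge(a: dict, b: dict) -> dict:
    """Keep the entry with the higher (counter, replica_id) stamp.

    The stamp makes the merge deterministic: every replica picks the
    same winner no matter which order updates arrive in.
    """
    out = dict(a)
    for key, (stamp, value) in b.items():
        if key not in out or stamp > out[key][0]:
            out[key] = (stamp, value)
    return out

# Two offline peers edit their shopping list independently...
phone  = {"eggs": ((1, "phone"), "found"), "milk": ((1, "phone"), "needed")}
laptop = {"milk": ((2, "laptop"), "found")}

# ...and converge to the same state whichever way they sync.
assert merge(phone, laptop) == merge(laptop, phone)
assert merge(phone, laptop)["milk"] == ((2, "laptop"), "found")
```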

Much of this functionality is thanks to peer-base, which is an experimental but extremely useful library:

https://github.com/peer-base/peer-base

A side-benefit of this approach is that all user data can be stored locally (in browser localStorage) - there are no cookies used in communication between the app and server.

[+] jmathai|6 years ago|reply
I've been trying to document my "local-first" approach to managing photos. I've made it a ways through but am not sure when I'll finish. Posting here since it is relevant.

A Pragmatic Photo Archiving Solution: https://docs.google.com/document/d/1JzqT-DJFlS2e8ZC00HrsQITq...

It's the culmination of software I've written [1] + a workflow that's resulted from it [2, 3, 4, 5].

[1] Elodie - https://github.com/jmathai/elodie

[2] Understanding my need for an automated photo workflow - https://medium.com/vantage/understanding-my-need-for-an-auto...

[3] Introducing Elodie; Your Personal EXIF-based Photo and Video Assistant - https://medium.com/vantage/understanding-my-need-for-an-auto...

[4] My Automated Photo Workflow using Google Photos and Elodie - https://medium.com/@jmathai/my-automated-photo-workflow-usin...

[5] One Year of Using an Automated Photo Organization and Archiving Workflow - https://artplusmarketing.com/one-year-of-using-an-automated-...

[+] neLrivVK|6 years ago|reply
Just read through your Google doc, interesting! But what about additional family members, with their own cameras, and no interest in any clever workflow activities :)

I'm currently using Google Photos as my main service, and it's working well enough for now: each family member has the Google Photos app, which uploads pics automatically to their own account. We all share our Google Photos with each other. This way I (as main curator) have access to everyone's pics without anyone having to do anything. Google lets you store the original-size pics, which is great (unlike iCloud, which resizes all pics!). Google also adds face recognition, which is very practical, and provides a good interface for everyone to view the pictures.

Regarding safekeeping: I use the Google Drive interface to back up all my photos to my local Linux storage (a combination of rsync and https://github.com/astrada/google-drive-ocamlfuse to mount Google Drive). This way I always have all original photos locally. Finally, I back up everything offsite using Backblaze.

All this relies heavily on Google Photos, but I have my own local backup of all original files. So if I need to change service, it should just be a one-time effort to migrate.

[+] geolgau|6 years ago|reply
This looks great and feels a lot like beets [1] for music, except that beets uses a database. I'll try it when I have the time to re-organize ~20 years of photos.

[1] http://beets.io/

[+] emilburzo|6 years ago|reply
elodie looks amazing! I definitely need to try it out

it was just the thing I was thinking of building recently as it was getting really tiring to manually organize photos

one extra idea that I had: there's a cool project that would enable offline geocoding[1], which would help get rid of API limits while making the reverse geocode queries almost instant

(the included dataset is pretty limited, but it's not hard to extend from an openstreetmap planet dump)

[1] https://pypi.org/project/reverse_geocoder/
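
The idea behind offline reverse geocoding, in miniature: keep a local table of places and return the nearest one, so there are no API calls, rate limits, or round-trips. reverse_geocoder does this at scale with a K-D tree over the GeoNames dataset; the three hard-coded cities below are purely illustrative.

```python
from math import radians, sin, cos, asin, sqrt

# Tiny stand-in for a real place table (GeoNames has millions of rows).
CITIES = [
    (48.8566, 2.3522, "Paris"),
    (51.5074, -0.1278, "London"),
    (40.7128, -74.0060, "New York"),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def reverse_geocode(lat, lon):
    """Return the name of the nearest known place -- entirely offline."""
    return min(CITIES, key=lambda c: haversine_km(lat, lon, c[0], c[1]))[2]

assert reverse_geocode(48.85, 2.35) == "Paris"  # e.g. photo EXIF coordinates
```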

[+] atoav|6 years ago|reply
When I select software these are among the list of things I am looking for generally:

- file formats that won’t lock you in or are even openly hackable (allows you to automate things)

- no clouds that will break the software once it is gone

- local storage with custom syncing or backup options

- strictly no weird data collection or “We own the rights to your data”-Type of terms

So if I get the slightest feeling of a lock in or unnecessary data collection you are scaring me away, because mentally I would then already look at the time after you decide to scrap your cloud or abandon your file formats. The data collection bit shows me your users aren’t front and center but something else is which makes your product even less of a good choice.

If your product runs on the web, allowing for self-hosted solutions is also a big plus.

[+] jwr|6 years ago|reply
While I fully agree with your selection criteria, please consider the other side of the equation, because engineering (and the world) is all about compromises.

I am the author of a SaaS app (https://partsbox.io/). I export in open formats (JSON), there is no lock-in, and it's easy to get all of your data at any time. But the app is online and will remain so. Why? Economics. Maintaining a self-hosted solution is an enormous cost, which most people forget about. You need to create, document, maintain, and support an entirely different version of your software (single-user, no billing/invoicing, different email sending, different rights system, etc.). And then every time you change something in your software, you have to worry about migrations, not just in your database, but in all your clients' databases.

I am not saying it's impossible, it's just expensive, especially for companies which are built to be sustainable in the first place (e.g. not VC-funded). Believe me, if you don't have VC money to burn, you will not be experimenting with CRDTs and synchronizing distributed data from a multitude of versions of your application.

I regularly have to explain why there is no on-premises version of my app. The best part is that many people think that an on-premises version should be less expensive than the online version, and come without a subscription.

[+] mbalex99|6 years ago|reply
Martin Kleppmann is a major inspiration for our startup, Ditto.

We take the local-first concept and p2p to the next level with CRDTs and replication. But what we really do is leverage things like AWDL, mDNS, and/or Bluetooth Low Energy to sync mobile database instances with each other even without internet connectivity. www.ditto.live

Check it out in action!

https://youtu.be/1P2bKEJjdec https://youtu.be/ITUrk_rjnvo

We found that CRDTs, local-first, and eventual consistency REALLY shine on mobile phones, since they constantly experience network partitions.

[+] Jyaif|6 years ago|reply
Very interesting. Will the upcoming server support be end-to-end encrypted? In other words, will the server be unable to read the data?
[+] mamcx|6 years ago|reply
I make "traditional" enterprise-style apps for small businesses and have tried to crack data sync several times.

Is there a resource on how to leverage that tech for boring stuff like inventory, invoices, etc.? Hopefully without a total change of stacks (I use PostgreSQL and SQLite as DBs, and need to integrate with 12+ different DB engines).

[+] radium3d|6 years ago|reply
https://github.com/syncthing/syncthing

Syncthing solves a large part of syncing data between devices using your own VPS, server(s), etc. If your VPS provider goes out of business, you can then just fire up a new VPS and hook it back up to your local machine(s).

[+] Fnoord|6 years ago|reply
Cryptomator [1]. Cross-platform, allows you to encrypt your data in the cloud, and access it transparently.

Thing is, like Syncthing, it lacks a collaborative feature. Nextcloud has one, but only if the Nextcloud instance is accessible (I want to host on my LAN only). Something like IPFS (or Tor) is a solution to that problem.

[1] https://cryptomator.org/

[+] mackrevinack|6 years ago|reply
There's also Resilio Sync, which gives you the option to have the node/folder on your VPS be encrypted. Hopefully Syncthing will add that feature at some point in the future.
[+] josteink|6 years ago|reply
Sadly Syncthing still lacks iOS support. As a result I'm running Nextcloud instead.
[+] daleharvey|6 years ago|reply
I may be biased as the maintainer of PouchDB but you can do all this today (and for the last 5+ years) with PouchDB.

I'm not really certain how the comment about CouchDB and the "difficulty of getting application-level conflict resolution right" applies. You don't have to handle conflicts in Pouch/CouchDB if you don't want to: there is a default model of last write (actually most edits) wins, but you can handle them if needed.
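
That default winner pick can be modeled roughly like this (a simplified model of CouchDB-style deterministic revision ordering, not the actual implementation): among conflicting leaf revisions, the one with the most edits wins, with ties broken by comparing the revision strings, so every replica independently agrees on the same winner.

```python
def pick_winner(leaf_revs):
    """Deterministically pick a winner among conflicting revisions.

    A rev like "3-a1b2c3" is a generation number (edit count), a dash,
    and a hash suffix. Longest edit history wins; ties fall back to a
    string comparison so all replicas agree without coordinating.
    """
    def sort_key(rev):
        gen, _, suffix = rev.partition("-")
        return (int(gen), suffix)
    return max(leaf_revs, key=sort_key)

assert pick_winner(["2-def", "3-abc"]) == "3-abc"  # more edits wins
assert pick_winner(["3-abc", "3-abd"]) == "3-abd"  # tie: compare suffixes
```

The losing revisions aren't discarded; they remain available as conflicts for the application to resolve later if it cares.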

[+] adamwiggins|6 years ago|reply
Hi Dale, I'm one of the local-first paper coauthors. I'm a fan of PouchDB (thanks for that) and the whole CouchDB lineage--the CouchDB book[1] was an early inspiration in my exploration of next-gen storage and collab models.

I've been down the CouchDB/PouchDB path several times with several different engineering teams. Every time we were hopeful and every time we just couldn't get it to work.

As one example, I worked with a small team of engineers to implement CouchDB syncing for the Clue Android and iOS mobile apps a few years back. Some of my experience is written down here[2]. After investing many months of engineering time, including some onsite help from Jan Lehnardt[3] we abandoned this architecture and went with a classic HTTPS/REST API.

Other times and with different teams we've tried variations of Couch including PouchDB with web stack technology including Electron and PWA-ish HTML apps. None of these panned out either. Wish I could give better insights on why--we just can't get it to be reliable, or fast, or find a good way to share data between two users (the collaboration thing is kind of the whole point).

[1]: https://guide.couchdb.org/

[2]: https://medium.com/wandering-cto/mobile-syncing-with-couchba...

[3]: https://neighbourhood.ie/couchdb-support/

[+] knubie|6 years ago|reply
I've been using a combo of PouchDB/CouchDB in my app[0] for the past few months, and I find it a hard combo to beat at the moment. I just haven't found anything else that works as seamlessly. While going through the article I found I was able to tick most of the boxes thanks to PouchDB.

[0] https://mochi.cards/

[+] have_faith|6 years ago|reply
> you OWN YOUR data

Like most people here I'm fairly hard-line when it comes to personal data abuses, but I still struggle with the concept of owning data about yourself. It's a confusion I see amongst less technically literate people: a well-meaning person explains the importance of some latest data breach, and they try to understand the idea that they owned this data, that it was theirs, but now it has been "stolen" or abused in some way.

I would go as far as to say that framing the data as owned by you is a bad approach, but maybe I'm just being pedantic about the language. Company A does have data about me, but I don't own it; they have responsibilities to protect it (or delete it if requested), but I don't see any ownership in the equation, especially when the nature of the data can become quite abstract while still maintaining some reference to you.

Not to take away from the intention or sentiment of framing it that way though, I'm just musing.

[+] ricg|6 years ago|reply
"It should support data access for all time." - This is key for me, after having had to convert my notes between formats more than once when the original apps went extinct (beloved Circus Ponies Notebook).

That's why I'm designing any new apps around a file format that can be accessed even without the app.

I have a "local-first" Kanban/Trello-style app, "Boards" (http://kitestack.com/boards/), that uses zipped HTML files (to support rich text with images). No collaboration and cross-device support just yet, but it works without a network and saves everything locally.

[+] milansuk|6 years ago|reply
>It should be fast. We don’t want to make round-trips to a server to interact with the application.

Cloud apps are not slow only because of moving data; there is also the problem that an average server is fast (16-core CPU + 64 GB RAM), but if it's shared by, say, 100 users, each user gets only 0.16 cores + 0.64 GB of memory. So an average laptop (4 cores/4 GB) or phone (4 cores/1 GB) is way faster. Basically, people buy billions of transistors only to use them as a terminal to the cloud. Not to mention the privacy risks.

A week ago, I did a Show HN for skyalt.com. It's a locally accessible database (+ analytics, which is coming soon). I'm still blown away by how fast it is: you can put tens of millions of rows into a single table with many columns on consumer hardware, and you don't pay for scale or attachments.
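
For a quick feel of why local beats round-trips, even the standard library's SQLite ingests bulk rows with no network in the loop. (Numbers vary by hardware, and skyalt's internals are its own; this just shows the local baseline anyone can reproduce.)

```python
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER, name TEXT, value REAL)")

rows = [(i, f"row{i}", i * 0.5) for i in range(100_000)]
start = time.perf_counter()
with db:  # a single transaction, which is crucial for bulk-insert speed
    db.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
elapsed = time.perf_counter() - start

(count,) = db.execute("SELECT COUNT(*) FROM t").fetchone()
assert count == 100_000
print(f"inserted {count} rows locally in {elapsed:.2f}s")
```

A cloud app making even a 50 ms round-trip per interaction can't compete with queries answered from the same machine.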

[+] ksml|6 years ago|reply
> If it's used by, let's say, 100 users, it means one user has only 0.16 cores + 0.64 GB memory. So an average laptop (4 cores/4 GB) or phone (4 cores/1 GB) is way faster.

This is overly simplistic. You're pretending that cores/memory are "allocated" to users, but really, a user might only make a few tens of requests, and the server only needs to spend a second or two servicing each one. On a server with only 100 users, it could very well be the case that a user has all 16 cores + 64 GB available at the time they make a request. Also, as another commenter pointed out, you could use a large chunk of that memory for shared resources, and then each request might only need a few MB of memory.

[+] jdnenej|6 years ago|reply
That's not a fair comparison, since most of the memory usage is just loading the app into memory, and everyone shares the same already-loaded app. Web apps don't have to be nearly as slow as they are; it's just easier to make a slow app than a fast one. Also, desktop apps are becoming super slow and bloated now thanks to Electron.
[+] gobengo|6 years ago|reply
The blog post doesn't mention this, so I thought I'd point this out. One of the paper's authors is Martin Kleppmann, who wrote the very good https://dataintensive.net/ book.
[+] ignoramous|6 years ago|reply
Kleppmann co-created / was a major contributor to Apache Kafka, along with Jay Kreps and Neha Narkhede. He also co-founded Rapportive, a YC company acquired by LinkedIn, along with Rahul Vohra, who is presently the CEO of Superhuman.
[+] ninkendo|6 years ago|reply
Nearly all of Apple’s first party software works this way.

Notes, Reminders, Pages, Numbers, Keynote, Voice memos, etc

All using iCloud APIs to synchronize what is essentially local-first software.

You could even count Mail, Contacts, and Calendar, although they rely on more established protocols to sync.

[+] davecap1|6 years ago|reply
Apple Health and Activity also seem to work this way, although they also sync to iCloud.
[+] TAForObvReasons|6 years ago|reply
Today's SaaS world is largely economically opposed to the idea of data ownership. It's a lot easier to make money by renting people access to their data.

The problem is not inherently technical. The solution must address the fact that software businesses favor cloud solutions and other systems that make it difficult to stop spending money.

[+] brynb|6 years ago|reply
I've been working for a few months on a database called Redwood that's intended to make it easier to build this kind of software. Having spent much of the past couple of years working with libp2p, IPFS, Dat, and similar technologies, I was curious to see what would result if I started from the ground up.

https://github.com/brynbellomy/redwood

So far, the model seems promising. It's fully peer-to-peer and supports decentralized identity, configurable conflict resolution, read/write access control, and asset storage, and it currently runs across 3 different transports:

- HTTP (augmented with extensions proposed by the Braid project [1][2])

- Libp2p

- WebRTC

I've included two simple demos, a collaborative document editor (well, it's just a textarea at the moment), and a chat app. Would appreciate any feedback or participation that folks are willing to give.

[1] https://github.com/braid-work/braid-spec

[2] https://groups.google.com/forum/#!forum/braid-http

[+] krzepah|6 years ago|reply
Hi everyone, I've been working on this subject for a few months already.

Thank you OP, your work is wonderful to read, and even though I've spent a few months on the idea already, I hadn't thought of reusing Dropbox or similar. I think exciting things are about to come :)

I'd like to submit a Working Group proposal to the IETF.

Why would we need an RPC for Independent Apps?

Independent Apps are surfacing as a solution to the lack of control over our own data. The OAuth framework has allowed a more secure web, but even though it distinguishes between an identity provider and a resource host, it conflates the resource host and the service host.

Independent Apps should NOT be claimed by a lone company, let's make it something that the web owns.

How would it be structured?

I personally believe there should be multiple subjects treated by the IWA Framework: one being the qualities of independent apps, and the second being how data is accessed. Both of these are currently Topics of Interest for the IETF: https://ietf.org/topics/ - However, the way this Working Group would proceed should be discussed and decided by its members.

Why not submit a single person draft?

I could propose a draft, but it wouldn't carry the same weight as one drafted by a Working Group. As individuals, we are motivated by our own agendas, and the quality of the draft wouldn't be the same. I'm volunteering, but I'd like other people to be able to join in as well.

You can add your email here: https://forms.gle/igNdd6rH4MnPK8rb8 . On December 6 I will send the Working Group proposal to the IETF with the people gathered; if accepted, I believe it should remain open for anybody to join.

[+] thawaway1837|6 years ago|reply
Isn’t Office 365 the platonic ideal of a local first software (suite) by this definition?

High quality desktop apps, data saved in discrete documented file formats, optional ability to save in the cloud, the presence of collaborative editing, privacy is protected if you’re using it locally only, etc.

[+] mwilcox|6 years ago|reply
any marginally successful "local-first" app is going to go and raise $10m in vc, switch to software as a service, and add an enterprise mode that requires user permissions and data access to be managed on the server
[+] 0xCMP|6 years ago|reply
I don't see what's wrong with that. Local First really just means distributed, fault tolerant, and eventually consistent but designed for user devices instead of a cloud "scale" service.

Why couldn't an enterprise run a "device" (a server) which others can easily sync to ("sync.enterprise.com") and which also only allows authorized users to access data which they're allowed to access? Maybe using Macaroons or something and devices can still sync locally via Bluetooth, Wifi, or whatever.

Now you have a full backup of everything on that server, which IT could more easily ensure is backed up, secured, etc.

Not to mention the same idea could be used by a normal person just running a NAS at home or a server in DO/AWS/GCS/etc.

[+] LeftHandPath|6 years ago|reply
Sure, any one company probably will - but there’s a whole market.

As soon as that one company abandons the local-first model, a gap opens, which will (usually, eventually) be filled by a new company offering local-first until that new company does the same.

As long as the companies don’t band together and agree to end it, there should be a company offering that model somewhere somehow.

[+] rapsey|6 years ago|reply
Because local-first is not a viable business model compared to the cloud. Software goes where the money is.
[+] tannhaeuser|6 years ago|reply
You don't have to invent entire new paradigms such as CRDTs for this. Unix is all about site autonomy, no-BS tooling, simplicity, and portability. So for your next project, consider Unix/Linux as the deployment target during development, and only then deploy to a cloud-hosted Unix cluster, with a local-first but cloud-hostable DB such as PostgreSQL and standardized middleware such as AMQP/RabbitMQ/Qpid rather than provider-specific solutions; or at least use de-facto standard protocols such as S3 and MongoDB (if needed) that are supported by multiple clouds. Many people are prematurely committing to k8s and "microservices". In my experience, even though k8s as such isn't intended as a lock-in strategy, it absorbs so much energy in projects (with devs more than happy to spend their time setting up auth, load balancing, and automation rather than business functionality), and still ends up as a non-portable, incomprehensible mess of configs and deploy scripts, that it just isn't worth it.
[+] dijksterhuis|6 years ago|reply
Just wanted to point out that iTunes has had a local-focused setup since inception, using an XML format for a library's database.

That seems to still exist with the introduction of Apple Music. So all library data (play counts, skips, file locations, etc.) is stored locally, but streaming files are hosted remotely.

Although whether this was by accident or design I have no idea.
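
That XML library file is a plist that Python's standard library can read directly, which is exactly the "accessible without the app" property discussed upthread. (The tiny synthetic library below stands in for a real "iTunes Music Library.xml"; field names like "Play Count" follow the iTunes convention.)

```python
import plistlib, io

# A miniature stand-in for an exported iTunes library file.
library_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Tracks</key>
  <dict>
    <key>1001</key>
    <dict>
      <key>Name</key><string>Example Song</string>
      <key>Play Count</key><integer>42</integer>
    </dict>
  </dict>
</dict>
</plist>"""

# plistlib parses the XML into plain dicts -- no iTunes required.
library = plistlib.load(io.BytesIO(library_xml))
track = library["Tracks"]["1001"]
assert track["Play Count"] == 42
```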