
A Social Filesystem

510 points| icy | 1 month ago |overreacted.io

238 comments


swyx|1 month ago

> Apps may come and go, but files stay—at least, as long as our apps think in files.

yes: https://www.swyx.io/data-outlasts-code-but

all lasting work is done in files/data (can be parsed permissionlessly, still useful if partially corrupted), but economic incentives keep pushing us to keep things in code (brittle, dies basically when one of maintainer|buildtools|hardware substrate dies).

when standards emerge (forcing code to accept/emit data) that is worth so much to a civilization. a developer ecosystem tipping the incentive scales such that companies like the Googl/Msft/OpenAI/Anthropics of the world WANT to contribute/participate in data standards rather than keep things proprietary is one of the most powerful levers we as a developer community collectively hold.

(At the same time we should also watch out for companies extending/embracing/extinguishing standards... although honestly outside of Chrome I struggle to think of a truly successful example)

zahlman|1 month ago

Indeed. My first reaction was:

> Files are the source of truth—the apps would reflect whatever’s in your folder.

Now that the "app" is a web site that supports itself with advertising revenue, it has no incentive whatsoever to work this way.

willtemperley|1 month ago

> a developer ecosystem tipping the incentive scales such that companies like the Googl/Msft/OpenAI/Anthropics of the world WANT to contribute/participate

I think Apache Arrow has achieved exactly that [1]. It's also very file-friendly, in that Arrow IPC files are self describing, zero-copy, and capable of representing almost any data structure.

[1] https://insights.linuxfoundation.org/project/apache-arrow/co...

danabramov|1 month ago

Nice to see you :) I didn't know the "indirection" law, that's funny.

isodev|1 month ago

> At the same time we should also watch out for companies extending/embracing/extinguishing standards

Is ATProto actually a standard? But regardless, nothing prevents Bluesky from enshittifying.

I’m somewhat concerned that the “file system” or the storage where all of our things are supposed to be stored is now suddenly in the cloud. We actually have a real file system … it backs itself up on iPhone even.

It feels like the entire pitch is based on some FOMO factor “oh but my posts” - do people really care that much about their short form outbursts? I mean the whole point of twitter was to post and forget but maybe not for everybody.

bigyabai|1 month ago

I think that's an overly charitable take. Giving Google/MSFT/OpenAI/Anthropic what they want does not guarantee a return on dividends. Standards are nice, but Apple is a giant testament to the fact that all the standards in the world won't move an adequately entrenched business.

skybrian|1 month ago

This article goes into a lot of detail, more than is really needed to get the point across. Much of that could have been moved to an appendix? But it's a great metaphor. Someone should write a user-friendly file browser for PDS's so you can see it for yourself.

I'll add that, like a web server that's just serving up static files, a Bluesky PDS is a public filesystem. Furthermore it's designed to be replicated, like a Git repo. Replicating the data is an inherent part of how Bluesky works. Replication is out of your control. On the bright side, it's an automatic backup.

So, much like with a public git repo, you should be comfortable with the fact that anything you put there is public and will get indexed. Random people could find it in a search. Inevitably, AI will train on it. I believe you can delete stuff from your own PDS but it's effectively on your permanent record. That's just part of the deal.

So, try not to put anything there that you'll regret. The best you could do is pick an alias not associated with your real name and try to use good opsec, but that's perilous.

danabramov|1 month ago

My goal with writing is generally to move things out of my head in the shape that they existed in my head. If it's useful but too long, I trust other people to pick what they find valuable, riff on it, and so on.

>Someone should write a user-friendly file browser for PDS's so you can see it for yourself.

You can skip to the end of the article where I do a few demos: https://overreacted.io/a-social-filesystem/#up-in-the-atmosp.... I suggest a file manager there:

>Open https://pdsls.dev. [...] It’s really like an old school file manager, except for the social stuff.

And yes, the paradigm is essentially "everyone is a scraper".

seridescent|1 month ago

> Someone should write a user-friendly file browser for PDS's so you can see it for yourself.

https://pdsls.dev/ can serve this purpose IMO :) it's a pretty neat app, open source, and is totally client-side

edit: whoops, pdsls is already mentioned at the end of the article

extraduder_ire|1 month ago

You can use pdsfs[0] to mount a user's pds locally using FUSE read-only. It's mentioned in the blogpost. I remember seeing a tool posted for mounting them read-write, if you sign in, but can't remember where to find it.

0: https://tangled.org/oppi.li/pdsfs

swyx|1 month ago

> So, much like with a public git repo, you should be comfortable with the fact that anything you put there is public and will get indexed.

whats the sota on atproto encryption dan? just publish encrypted stuff with sha 256 and thats it?

verdverm|1 month ago

Private data will come to ATProto, it's not a finished protocol

DustinBrett|1 month ago

I think that is the general style of overreacted.io posts.

theturtletalks|1 month ago

POSSE and AT Protocol can be understood as interoperable marketplaces. Platforms like Reddit and Instagram already function this way: the product is user content, the payment is attention, and the platform’s cut is ads or behavioral data. Dan argues that this structure is not inevitable. If social data is treated as something people own and store themselves, applications stop being the owners of social graphs and become interfaces that read from user-controlled data instead.

I am working on a similar model for commerce. Sellers deploy their own commerce logic such as orders, carts, and payments as a hosted service they control, and marketplaces integrate directly with seller APIs rather than hosting sellers. This removes platform overhead, lowers fees, and shifts ownership back to the people creating value, turning marketplaces into interoperable discovery layers instead of gatekeepers.

dharmatech|1 month ago

Was going to ask this but then I found openship in your profile. Will check it out. Thanks!

articsputnik|1 month ago

not sure if you understood the article, isn't the whole point to own your data as "it's just a filesystem". Reddit, Instagram, etc. are the total opposite.

motoxpro|1 month ago

I've always thought walled gardens are the effect of consumer preferences, not the cause.

The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.

I am so happy that my IG posts aren't available on my HN or that my IG posts aren't being easily cross-posted to a service I don't want to use like Truth Social. If you want it to be open, just post it to the web.

I think I don't really understand the benefit of data portability in the situation. It feels like in crypto when people said I want to use my Pokemon in game item in Counterstrike (or any game) like, how and why would that even be valuable without the context? Same with a Snap post on HN or a HN post on some yet-to-be-created service.

dameis|1 month ago

>I am so happy that my IG posts aren't available on my HN or that my IG posts aren't being easily cross-posted to a service I don't want to use like Truth Social.

ATProto apps don't automatically work like this and don't support all types of "files" by default. The app's creator has to build support for a specific "file type". My app https://anisota.net supports both Bluesky "files" and Leaflet "files", so my users can see Bluesky posts, Leaflet posts, and Anisota posts. But this is because I've designed it that way.

Anyone can make a frontend that displays the contents of users' PDSs.

Here's an example...

Bluesky Post on Bluesky: https://bsky.app/profile/dame.is/post/3m36cqrwfsm24

Bluesky Post on Anisota: https://anisota.net/profile/dame.is/post/3m36cqrwfsm24

Leaflet post on Leaflet: https://dame.leaflet.pub/3m36ccn5kis2x

Leaflet post on Anisota: https://anisota.net/profile/dame.is/document/3m36ccn5kis2x

I also have a little side project called Aturi that helps provide "universal links" so that you can open ATProto-based content on the client/frontend of your choice: https://aturi.to/anisota.net

jrowen|1 month ago

I agree. I don't understand the driving force here.

I have all of the raw image files that I've uploaded to Instagram. I can screenshot or download the versions that I created in their editor. Likewise for any text I've published anywhere. I prefer this arrangement, where I have the raw data in my personal filesystem and I (to an extent) choose which projections of it are published where on the internet. An IG follow or HN upvote has zero value to me outside of that platform. I don't feel like I want this stuff aggregated in weird ways that I don't know about.

jrv|1 month ago

> I think I don't really understand the benefit of data portability in the situation.

Twitter was my home on the web for almost 15 years when it got taken over by a ... - well you know the story. At the time I wished I could have taken my identity, my posts, my likes, and my entire social graph over to a compatible app that was run by decent people. Instead, I had to start completely new. But with ATProto, you can do exactly that - someone else can just fork the entire app, and you can keep your identity, your posts, your likes, your social graph. It all just transfers over, as long as the other app is using the same ATProto lexicon (so it's basically the same kind of app).

danabramov|1 month ago

>The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.

I actually agree with that. See from the post:

>For some use cases, like cross-site syndication, a standard-ish jointly governed lexicon makes sense. For other cases, you really want the app to be in charge. It’s actually good that different products can disagree about what a post is! Different products, different vibes. We’d want to support that, not to fight it.

AT doesn't make posts from one app appear in all apps by default, or anything like that. It just makes it possible for products to interoperate where that makes sense. It is up to whoever's designing the products to decide which data from the network to show. E.g. HN would have no reason to show Instagram posts. However, if I'm making my own aggregator app, I might want to process HN stuff together with Reddit stuff. AT gives me that ability.

To give you a concrete example where this makes sense. Leaflet (https://leaflet.pub/) is a macroblogging platform, but it ingests Bluesky posts to keep track of quotes from the Leaflets on the network, and display those quotes in a Leaflet's sidebar. This didn't require Leaflet and Bluesky to collaborate, it's just naturally possible.

Another reason to support this is that it allows products to be "forked" when someone is motivated enough. Since data is on the open network, nothing is stopping a product fork from being perfectly interoperable with the original network (meaning it both sees "original" data and can contribute to it). So the fork doesn't have to solve the "convince everyone to move" problem, it just needs to be good enough to be worth running and growing organically. This makes the space much more competitive. To give an example, Blacksky is a fork of Bluesky that takes different moderation decisions (https://bsky.app/profile/rude1.blacksky.team/post/3mcozwdhjo...) but remains interoperable with the network.

hahahahhaah|1 month ago

Posts feel less valuable as files as they are of the moment. I don't need this comment as a docx, thank you all the same.

So I agree.

Maybe building it that way opens possibilities I can't see?

But for functional apps it seems useful to have a file you can download and use somewhere else. Images and movies are a great example. I can record a video in Loom and chuck it on Youtube. Youtube can't lock me on that video. it is my file. Imagine if that was not the case. It would suck.

jimbokun|1 month ago

Ever heard of the World Wide Web?

It was great. Anyone could host web pages, anyone could access and read anyone’s web pages with a tool called a web browser, of which there used to be several compatible implementations.

extraduder_ire|1 month ago

If truth social didn't remove all the federation code, posts from mastodon and many other ActivityPub sites would have appeared there.

christophilus|1 month ago

I’ve been reading “The Unix Programming Environment”. It’s made me realize how much can be accomplished with a few basic tools and files (mostly plain text). I want to spend some time thinking of what a modern equivalent would look like. For example, what would Slack look like if it was file (and text) oriented and UNIXy? Well, UNIX had a primitive live chat in the form of live inter-user messaging. I’d love to see a move back to simpler systems that composed well.

zahlman|1 month ago

Unix gave the world a lot of good ideas about architecture, but I think it really hamstrung itself by treating all the data as plain text and resisting the idea of having any kind of structured, formatted data passed within a pipeline. It's nice to be able to serialize to something human-readable and -editable, but constantly re-parsing and re-formatting it becomes a real pain.

japanuspus|1 month ago

> Identity -- This is a difficult problem.

My hope is that in 5 years, I will not have anything in my feeds that has not been signed in a way that I can assign a trust level.

Here in the Nordics, we are already seeing messaging apps such as [hudd] that require government issued ID to sign in. I want this to spread to everything from podcasts and old-school journalism to the soccer-club newsletter, so that I can always connect a piece of information back to a responsible source.

[hudd]: https://about.hudd.dk/

wink|1 month ago

So you're simply not interested in reading any random website by random people who don't see a benefit in establishing any form of trust, especially if it should not be connected to their official government IDs?

Or to put it differently: Where should this come from, and which issuer would you trust? And why should anyone else agree with you that this is good?

trizuz|1 month ago

[deleted]

skeledrew|1 month ago

I've been thinking of this for some time, conceptually, but perhaps from a more fundamental angle. I think the idea of "files" is pretty dated and can be thrown out. Treat everything as data blobs (inspired by PerKeep[0]) addressed by their hashes and many of the issues described in the article just aren't even a thing. If it really makes sense, or for compatibility sake, relevant blobs can be exposed through a filesystem abstraction.
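A minimal sketch of the "blobs addressed by their hashes" idea, in Python. This is not Perkeep's storage format or API: `BlobStore`, `put`, `get` and the `names` mapping are made up for illustration, and SHA-256 is an assumed hash choice.

```python
import hashlib


class BlobStore:
    """Toy in-memory content-addressed store (hypothetical, not Perkeep's API)."""

    def __init__(self):
        self._blobs = {}   # hex digest -> bytes
        self.names = {}    # optional filesystem-like view: path -> hex digest

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so identical data dedupes.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Anyone holding the address can check the bytes were not altered.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("blob does not match its address")
        return data


store = BlobStore()
ref = store.put(b"life update: moved to Lisbon")
store.names["/updates/2024.txt"] = ref  # compatibility layer over the blobs
```

The `names` dict plays the role of the filesystem abstraction mentioned above: paths are just labels pointing at hashes, and the blobs themselves never need a location.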

Also, users don't really want apps. What users want are capabilities. So not Bluesky, or YouTube for example, but the capability to easily share a life update with interested parties, or the capability to access yoga tutorial videos. The primary issue with apps is that they bundle capabilities, but many times particular combinations of capabilities are desired, which would do well to be wired together.

Something in particular that's been popping up fairly often for me is I'm in a messaging app, and I'd like to lookup certain words in some of the messages, then perhaps share something relevant from it. Currently I have to copy those words over to a browser app for that lookup, then copy content and/or URL and return to the messaging app to share. What I'd really love is the capability to do lookups in the same window that I'm chatting with others. Like it'd be awesome if I could embed browser controls alongside the message bubbles with the lookup material, and optionally make some of those controls directly accessible to the other part(y|ies), which may even potentially lead to some kind of adhoc content collaboration as they make their own updates.

It's time to break down all these barriers that keep us from creating personalized workflows on demand. Both at the intra-device level where apps dominate, and at the inter-device level where API'd services do.

[0] https://perkeep.org/

danabramov|1 month ago

I'm using filesystem more as a metaphor than literally.

I picked this metaphor because "apps" are many-to-many to "file formats". I found "file format" to be a very powerful analogy for lexicons so I kind of built everything else in the explanation around that.

You can read https://atproto.com/specs/repository for more technical details about the repository data structure:

The repository data structure is content-addressed (a Merkle-tree), and every mutation of repository contents (eg, addition, removal, and updates to records) results in a new commit data hash value (CID). Commits are cryptographically signed, with rotatable signing keys, which allows recursive validation of content as a whole or in part. Repositories and their contents are canonically stored in binary DAG-CBOR format, as a graph of data objects referencing each other by content hash (CID Links). Large binary blobs are not stored directly in repositories, though they are referenced by hash (CID).
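To illustrate why every mutation produces a new root hash, here is a rough Python sketch. It is heavily simplified and not the real format: canonical JSON plus SHA-256 stands in for DAG-CBOR and CIDs, a flat sorted list stands in for the Merkle tree, signing is left out, and the record keys (`aaa`, `bbb`) are invented.

```python
import hashlib
import json


def content_hash(value) -> str:
    # Canonical JSON + SHA-256 stands in for DAG-CBOR + CID here.
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def repo_root(records: dict) -> str:
    # Hash each record, then hash the sorted (path, record hash) pairs.
    leaves = sorted((path, content_hash(rec)) for path, rec in records.items())
    return content_hash(leaves)


records = {
    "app.bsky.feed.post/aaa": {"text": "hello", "createdAt": "2024-01-01T00:00:00Z"},
}
before = repo_root(records)

# Adding a record yields a different root hash.
records["app.bsky.feed.post/bbb"] = {"text": "world", "createdAt": "2024-01-02T00:00:00Z"}
after_add = repo_root(records)

# Editing an existing record changes the root as well.
records["app.bsky.feed.post/aaa"]["text"] = "hello, edited"
after_edit = repo_root(records)
```

Because the root depends only on content, two copies of the same repository agree on the hash regardless of the order records were inserted in.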

Re: apps, I'd say AT is actually post-app to some extent because Lexicons aren't 1:1 to apps. You can share Lexicons between apps and I totally can see a future where the boundaries are blurring and it's something closer to what you're describing.

mike_hearn|1 month ago

It's bold to say users don't want apps when the expressed preferences are so strongly for apps that "App Store" is a well known brand.

The word app is short for application, which in this context is just a synonym for capability. People have tried breaking apps apart into more fine grained capabilities before and it never worked (OpenDoc and OLE being two well known examples). Brand names matter, they have value to people. YouTube isn't an abstract collection of decomposable capabilities, it's a brand that summarizes numerous unarticulated pieces of information, like its commenting culture, what kind of content is allowed and what isn't, etc.

> I'd like to lookup certain words in some of the messages, then perhaps share something relevant from it.

This is already a built-in feature on macOS. If I right click in your message "Look up PerKeep" is literally the first command, and if I click it I get a nice info window telling me what it is that I can copy/paste from. There is nothing to solve here.

Jonovono|1 month ago

I can’t remember how many times I’ve read an article and enjoyed it so much and then looked and saw it was written by Dan ;) always a pleasure !

camgunz|1 month ago

I'm skeptical of these kind of like, self-describing data models. Like, I generally like at proto--because I like IPFS--but I think the whole "just add a lexicon for your service and bickety bam, clients appear" is a leap too far.

For example, gaze upon dev.ocbwoy3.crack.defs [0] and dev.ocbwoy3.crack.alterego [1]. If you wanted to construct a UI around these, realistically you're gonna need to know wtf you're building (it's a twitter/bluesky clone); there simply isn't enough information in the lexicons to do a good job. And the argument can't be "hey you published a lexicon and now people can assume your data validates", because validation isn't done on write, it's done on read. So like, there really is no difference between this and like, looking up the docs on the data format and building a client. There are no additional guarantees.
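To make "validated on read" concrete, here is a sketch of a reader that treats every incoming record as untrusted and filters as it goes. The check is a hypothetical stand-in, far smaller than a real Lexicon validator, and the field names are just a post-like example.

```python
def is_valid_post(record) -> bool:
    # Hypothetical minimal check; a real Lexicon validator covers far more.
    return (
        isinstance(record, dict)
        and isinstance(record.get("text"), str)
        and isinstance(record.get("createdAt"), str)
    )


def read_posts(raw_records):
    # Nothing upstream guarantees shape, so the reader filters as it goes.
    return [r for r in raw_records if is_valid_post(r)]


incoming = [
    {"text": "hi", "createdAt": "2024-01-01T00:00:00Z"},
    {"text": 42},                       # wrong type, missing createdAt
    "not even an object",
]
posts = read_posts(incoming)
```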

Maybe there's an argument for moving towards some kind of standardization, but... do we really need that? Like are we plagued by dozens of slightly incompatible scrobbling data models? Even if we are, isn't this the job of like, an NPM library and not a globally replicated database?

Anyway, I appreciate that, facially, at proto is trying to address lock in. That's not easy, and I like their solution. But I don't think that's anywhere near the biggest problem Twitter had. Just scanning the Bluesky subreddit, there's still problems like too much US politics and too many dick pics. It's good to know that some things just never change I guess.

[0]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...

[1]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...

danabramov|1 month ago

Not sure I fully get you... In your example, isn't the problem that nobody cares about this data? So there is no motivation to build a client. Whereas if these were beloved notes or minisites or whatever that got wiped out by the latest acquisition (e.g. see https://bento.me/ shutting down), people would know exactly what those are, and there would be incentive for someone to compete for the userbase.

E.g. Blento (https://blento.app/) is atproto Bento that I only saw a couple of days ago. But the cool thing is that if it shuts down, not only someone else can set it up again (it's open source), but they're also gonna be able to render all of the users' existing content. I think that's a meaningful step forward for this use case.

Yes, there's gonna be tons of stuff on the network that's too niche, but then there's no harm in it either. Whereas wherever there is enough interest, someone can step in and provide the code for the data.

echoangle|1 month ago

Is there anything stopping me from backdating my own records? Since the createdAt is just an arbitrary field, I can just write whatever I want in there, right? Is there a way for the viewing application to verify when the record was created (and not modified since), maybe by looking at the mentioned signing?

danabramov|1 month ago

You can indeed backdate records. Since the application knows when it has first seen (i.e. indexed) your record, it can decide what to do with that information. If there's a difference, the Bluesky app, for example, shows its own indexing time, but also shows a separate panel saying the post is backdated to some other date. Other apps could choose to show something else.

It is possible to create links asserting a specific version by making a "strong ref" which includes content hash.
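A rough Python sketch of both ideas: comparing the self-reported `createdAt` with the time the app first indexed the record, and pinning a version with a strong ref. Assumptions are loud here: SHA-256 over canonical JSON stands in for a real CID, the five-minute tolerance is arbitrary, the `did:example:...` URI is invented, and the `uri`/`cid` field names reflect my understanding of what a strong ref contains.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def content_hash(record) -> str:
    # SHA-256 over canonical JSON stands in for a real CID.
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def is_backdated(record, indexed_at, tolerance=timedelta(minutes=5)) -> bool:
    # createdAt is self-reported; indexed_at is when this app first saw the record.
    created_at = datetime.fromisoformat(record["createdAt"])
    return indexed_at - created_at > tolerance


def strong_ref(uri, record):
    # A strong ref pins both the location and the exact content.
    return {"uri": uri, "cid": content_hash(record)}


def ref_matches(ref, current_record) -> bool:
    return ref["cid"] == content_hash(current_record)


post = {"text": "I called it", "createdAt": "2020-01-01T00:00:00+00:00"}
seen = datetime(2024, 6, 1, tzinfo=timezone.utc)
ref = strong_ref("at://did:example:alice/app.bsky.feed.post/aaa", post)
edited = dict(post, text="I called it (edited)")
```

What to do with a backdated record is a product decision; the sketch only surfaces the discrepancy.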

James_K|1 month ago

AT Proto seems very overengineered. We already have websites with RSS feeds, which more or less covers the publishing end in a way far more distributed and reliable than what AT offers. Then all you need is a kind of indexer to provide people with notifications and discovery and you're done. But I suppose you can't sell that to shareholders because real decentralised technology probably isn't going to turn as much of a profit as a Twitter knockoff with a vague decentralised vibe to it that most users don't understand or care about.

danabramov|1 month ago

Why so much cynicism? The people working there genuinely care about this stuff. Maybe you disagree with technical decisions but why start by projecting your fantasies about their motivations?

RSS is OK for what it does, but it isn't realtime, isn't signed, and doesn't support arbitrary structured data. Whereas AT is signed, works with any application-defined data structures, and lets you aggregate over millions of users in real time with subsecond end-to-end latency.

sroerick|1 month ago

Interesting - I just spent all day on this on an app which I'm using. My architecture is a little different (probably worse).

The app lives on a single OpenBSD server. All user data is stored in /srv/app/[user]. Authentication is done by accessing OpenBSD Auth helper functions.

Users can access their data through the UI normally. Or they can use a web based filesystem browser to edit their data files. Or, alternately, they can ssh into the server and have full access to their files with all the advantages this entails. Hopefully, this raises the ceiling a bit for what power users of the system can accomplish.

I wanted to unify the OS ecosystem and the web app ecosystem and play around with the idea of what happens if those things aren't separate. I'm sure I'm introducing all kinds of security concerns which I'm not currently aware of.

Another commenter brought up Perkeep, which I think is very interesting. Even though I love Plan 9 conceptually, I do sort of wonder if "everything is a file" was a bit of a wrong turn. If I had my druthers, I think building on top of an OS which had DB and blob storage as the primary concept would be interesting and perhaps better.

If anybody cares, it's the POOh stack: Postgres, OCaml, OpenBSD, and htmx

dannyfritz07|1 month ago

Is there a community or forum where questions about ATProtocol can be asked?

Can you spread your identity across multiple PDS repositories? Was thinking about creating a PDS that can store binary blobs to download, but not all PDS would like that amount of binary data stored.

UPDATE: found it: https://discourse.atprotocol.community

clnhlzmn|1 month ago

Seems similar to remoteStorage [0]. What happened to that anyway?

[0]: https://remotestorage.io/

danabramov|1 month ago

This doesn't look similar to me.

remoteStorage seems aimed at apps that don't aggregate data across users.

AT aims to solve aggregation, which is when many users own their own data, but what you want to display is something computed from many of them. Like social media or even HN itself.
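As a sketch of what "computed from many of them" can look like: an app keeps its own index as records from many users arrive, instead of scanning every repository at view time. `LikeIndex` and its method names are hypothetical, the URIs are invented, and the `subject.uri` shape is my assumption about how a like record points at a post.

```python
from collections import Counter


class LikeIndex:
    """Toy aggregator: keeps a per-post like count as records stream in."""

    def __init__(self):
        self.counts = Counter()
        self._seen = {}  # like record URI -> subject post URI

    def on_create(self, like_uri, record):
        subject = record["subject"]["uri"]
        if like_uri not in self._seen:  # ignore replays of the same record
            self._seen[like_uri] = subject
            self.counts[subject] += 1

    def on_delete(self, like_uri):
        subject = self._seen.pop(like_uri, None)
        if subject is not None:
            self.counts[subject] -= 1


index = LikeIndex()
post = "at://did:example:alice/app.bsky.feed.post/aaa"
index.on_create("at://did:example:bob/app.bsky.feed.like/111", {"subject": {"uri": post}})
index.on_create("at://did:example:carol/app.bsky.feed.like/222", {"subject": {"uri": post}})
index.on_delete("at://did:example:bob/app.bsky.feed.like/111")
```

Each like still lives in the liker's own repository; the count is derived data that any app can rebuild from the network.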

Vinnl|1 month ago

remoteStorage is still occasionally getting updates. https://solidproject.org is a somewhat newer, similar project backed by Tim Berners-Lee. (With its own baggage.)

I think of those projects as working relatively well for private data, but public data is kinda awkward. ATProto is the other way around: it has a lot of infra to make public data feasible, but private data is still pretty awkward.

It's a lot more popular though, so maybe has a bigger chance of solving those issues? Alternatively, Bluesky keeps its own extensions for that, and starts walling those bits off more and more as the VCs amp up the pressure. That said, I know very little about Bluesky, so this speculation might all be nonsense.

metabagel|1 month ago

How does this relate to the SOLID project?

https://solidproject.org/

danabramov|1 month ago

I'd say some of the worldview is shared but the architecture and ethos is very different. Some major differences:

- AT tries to solve aggregation of public data first. I.e. it has to be able to express modern social media. Bluesky is a proof that it would work in production. AFAIK, Solid doesn't try to solve aggregation, and is focused on private data first. (AT plans private data support but not now.)

- AT embraces "apps describe their own formats" (Lexicons). Solid uses RDF which is a very different model. My impression is RDF may be more powerful but is a lot more abstract. Lexicon is more or less like *.d.ts for JSON.

hollowonepl|1 month ago

Interesting concept for all new social platforms that already live in federated, distributed environments that share communication protocols and communication data formats.

I bet it's more difficult to push existing commercial platforms to even consider it.

That would make marketing tools for managing social communications and posting across popular social media much easier. Nevertheless, social marketing tools have already invented a similar analogy, just to keep control over one's own content and feedback across instances and networks.

We still live in a world where some would say BSKY, some would say Mastodon is the future… while everybody still has Facebook and Instagram, and youngsters TikTok too. Those are closed platforms where only tools to hack them persist, not standards.

xk3|1 month ago

Makes me think of Syncthing... I wrote a wrapper for it that attempts to make it easier for people to use:

https://github.com/chapmanjacobd/syncweb-py

https://github.com/chapmanjacobd/syncweb-ui

Unfortunately, mesh storage systems are very different conceptually so it is difficult for people to think about permissions and access. You can bolt on something familiar but then it really limits the usefulness of mesh storage and you may as well just be using HTTP servers.

nonethewiser|1 month ago

But how do you get people to actually want this? This stuff is pretty niche even within tech.

danabramov|1 month ago

Bluesky is not huge, but 40M users is not nothing either. You don't get people to want this, you just try to build better products. The hope is that this enables us all to build better products by making them more interoperable by default. Whether this pans out remains to be seen.

heyitsaamir|1 month ago

I think most people do want this. They want to own their data. If you ask someone who posts on IG whether they or IG should own those posts, they'll tell you it's them.

The hard problem IMO is how do you incentivize companies to adopt this, since walled gardens help reduce competition.

johanneskanybal|1 month ago

I think the opportunity is huge. It's inevitable that a new social media without the current flaws will emerge, and this could be part of the implementation, but for sure there's no real universal success yet. I think it probably needs to come from outside of the US to have any credibility.

jimbokun|1 month ago

“You know how much it sucked when Elon bought Twitter and it became a cesspool? Wouldn’t it have been great if you could easily take your entire post history to some other platform?”

itmitica|1 month ago

To share is to lose control. You can't undo it: once shared, it can't be undone. You can't retract a published novel. You can't retract a broadcast music or show. What makes you think you can do it over the internet?

danabramov|1 month ago

I don't think my article makes any claims that one can undo sharing. What I'm saying is that we benefit collectively from being able to untether data from applications. It's the same logic as https://stephango.com/file-over-app but applied to the aggregating web applications.

geokon|1 month ago

This was a nice intro to AT (though I feel it could have been a bit shorter)

The whole thing seems a bit over-engineered with poor separation of concerns.

It feels like it'd be smarter to flatten the design and embed everything in the Records. And then other layers can be built on top of that

Make every record include the author's public key (or signature?). Anything you need to point at you'd either just give its hash, or hash + author-public-key. This way you completely eliminate this goofy filesystem hierarchy. Everything else is embedded in the Record.

Lexicons/Collections are just a field in the Record. Reverse looking up the hash to find what it is, also a separate problem.

evbogue|1 month ago

Yes. SSB and ANProto do this. We actually can simply link to a hash of a pubkey+signature which opens to a timestamped hashlink to a record. Everything is a hash lookup this way and thus all nodes can store data.

danabramov|1 month ago

I'm not sure I understand your proposal. Do you mind taking my example (a Twitter post) and showing how it would be stored in your system?

extraduder_ire|1 month ago

You mention having a "self" record in the app.bsky.actor.profile lexicon to store profile information, is there any reason to have more records of that type in your repository?

I've seen a few people make other ones when examining their accounts with pdsls, but they seem to be there for "just because I can" reasons.

danabramov|1 month ago

Different apps can have different notions of a profile so you'd probably have one per app.

ahussain|1 month ago

It seems like the biggest downside of this world is iteration speed.

If the AT Instagram wants to add a new feature (e.g. posts now support video!), can they easily update their "file format"? How do they update it in a way that is compatible with every other company that depends on the same format, without the underlying record becoming a mess?

danabramov|1 month ago

That's a great question!

Adding new features is usually not a problem because you can always add optional fields and extend open unions. So, you just change `media: Link | Picture | unknown` to `media: Link | Picture | Video | unknown`.
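
To make the open-union point concrete, here is a minimal TypeScript sketch. The `$type` strings and type names are made up for illustration, and real Lexicons are JSON schemas rather than TypeScript types. It shows an older reader that only knows `Link` and `Picture` still coping when it meets a newer `Video` record:

```typescript
// Hypothetical record shapes; the "example.media#..." identifiers are invented.
type Link = { $type: "example.media#link"; url: string };
type Picture = { $type: "example.media#picture"; src: string };
type Video = { $type: "example.media#video"; src: string };

// The "unknown" arm of the open union: any object that carries a $type.
type Unknown = { $type: string };

// What an older app was written against, before Video existed.
type MediaV1 = Link | Picture | Unknown;

function renderV1(media: MediaV1): string {
  if (media.$type === "example.media#link") {
    return `link: ${(media as Link).url}`;
  }
  if (media.$type === "example.media#picture") {
    return `picture: ${(media as Picture).src}`;
  }
  // Anything newer falls through to a graceful placeholder.
  return `[unsupported media: ${media.$type}]`;
}

// A record written by a newer app that already knows about Video.
const video: Video = { $type: "example.media#video", src: "clip.mp4" };
const rendered = renderV1(video);
```

The older reader degrades to a placeholder instead of rejecting the record, which is what makes adding a union member a non-breaking change.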

You can't remove things, true, so records do get some deprecated fields.

Re: updating safely, the rule is that you can't change which records a lexicon would consider valid after it gets used in the wild. So you can't change whether some field is optional or required, you can only add new optional fields. The https://github.com/bluesky-social/goat tool has a linting command that instantly checks whether your changes pass the rules. In general it would be nice if lexicon tooling matured a bit, but I think with time it should get really good because there's explicit information the tooling can use.

If you have to make a breaking change, you can make a new Lexicon. It doesn't have to cause tech debt because you can make all your code deal with a new version, and convert it during ingestion.
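
A rough sketch of what "convert it during ingestion" could look like, in TypeScript. The two record shapes and the `com.example.*` names are hypothetical, not real Lexicons. The idea is that only the ingestion step knows about the old format, and everything downstream sees the new one:

```typescript
// Hypothetical old and new shapes for the same kind of record.
type PostV1 = { $type: "com.example.post"; text: string; image?: string };
type PostV2 = { $type: "com.example.postv2"; text: string; media: string[] };

// Normalize whatever arrives into the newest shape.
function upgradePost(record: PostV1 | PostV2): PostV2 {
  if (record.$type === "com.example.postv2") {
    return record; // already in the new format
  }
  return {
    $type: "com.example.postv2",
    text: record.text,
    // The single optional image becomes a list with zero or one entries.
    media: record.image === undefined ? [] : [record.image],
  };
}

// Ingestion converts once; the rest of the app only handles PostV2.
function ingest(records: Array<PostV1 | PostV2>): PostV2[] {
  return records.map(upgradePost);
}

const ingested = ingest([
  { $type: "com.example.post", text: "old", image: "cat.png" },
  { $type: "com.example.postv2", text: "new", media: [] },
]);
```

The old records stay untouched in users' repositories; only the app's local copy is converted.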

hahahahhaah|1 month ago

I have always thought open file format > open source. In my ideal web, everyone has their own web file storage (get it from anywhere, e.g. your email provider) and web apps use that to store things. Team collab etc. is built on top of that, e.g. sharing a file means a share-and-accept-edits type of flow. Everyone owns their files.

voidUpdate|1 month ago

How does this system determine the amount of likes a post has? Since there is no back reference on a post to people who have liked it, don't you have to iterate over every single person, iterate over their likes, to see if one of them is the post you are viewing, and add all them up?

galactus|1 month ago

In the ATProto architecture, this function is handled by the AppView, which monitors the full network and produces the corresponding aggregates.
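
As a toy illustration of that aggregation (not the actual AppView code; the event shape and names here are assumptions), an indexer can keep running counts as like records are created and deleted across the network, so nobody has to scan every repository at read time:

```typescript
// Hypothetical, simplified event as an indexer might see it from the network.
type LikeEvent = {
  action: "create" | "delete";
  uri: string;      // URI of the like record itself (it lives in the liker's repo)
  subject?: string; // URI of the liked post; expected on "create"
};

class LikeIndex {
  private subjectByLike = new Map<string, string>();
  private counts = new Map<string, number>();

  apply(event: LikeEvent): void {
    if (event.action === "create" && event.subject !== undefined) {
      if (this.subjectByLike.has(event.uri)) return; // ignore replayed events
      this.subjectByLike.set(event.uri, event.subject);
      this.counts.set(event.subject, this.countFor(event.subject) + 1);
    } else if (event.action === "delete") {
      const subject = this.subjectByLike.get(event.uri);
      if (subject === undefined) return; // never saw this like
      this.subjectByLike.delete(event.uri);
      this.counts.set(subject, this.countFor(subject) - 1);
    }
  }

  // Read-time lookup is a single map access.
  countFor(postUri: string): number {
    return this.counts.get(postUri) ?? 0;
  }
}

const index = new LikeIndex();
index.apply({ action: "create", uri: "at://alice/like/1", subject: "at://dril/post/1" });
index.apply({ action: "create", uri: "at://bob/like/1", subject: "at://dril/post/1" });
index.apply({ action: "delete", uri: "at://alice/like/1" });
```

The cost moves from read time to write time: each like is processed once when it appears, and the count for a post is then a lookup.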

yladiz|1 month ago

I know this is somewhat covered in another comment, but, the concepts described in the post could have been reduced quite a bit, no offense Dan. While I like the writing generally, I would consider writing and then letting it sit for a few days, rereading, and then cutting chaff (editing). This feels like a great first draft but without feedback, and could have greatly benefited from an editing process, and I think using the argument that you want to put out something for others to take and refine isn’t really a strong one… a bit more time and refinement could have made a big difference here (and given you have a decently sized audience I would keep in mind).

danabramov|1 month ago

From my perspective, there is no chaff. I've already read the entire thing from top to bottom over 20 times (as I usually do with my writing), I've done several full edit passes, and I've removed everything inessential that I could find. The rest is what I wanted to be included in this article.

I know my style is verbose but I try to include enough details to substantiate the argument at the level that I feel confident it fully stands for itself. If others find something useful in it, I trust that they can riff on those bits or simplify.

lanyard-textile|1 month ago

There is not much actionable here, as well intentioned as your comment is.

It's like saying this MR could use some work but not citing a specific example.

jrm4|1 month ago

The more I read and consider Bluesky and this protocol, the more pointless -- and perhaps DANGEROUS -- I find the idea.

It really feels like no one is addressing the elephant in the room of; okay, someone who makes something like this is interested in "decentralized" or otherwise bottom-up ish levels of control.

Good goal. But then, when you build something like this, you're actually helping build a perfect decentralized surveillance record.

This is why I say that most of Mastodon's limitations and bugs in this regard (by leaving everything to the "servers") are actually features. The ability to forget and delete et al is actually important, and this makes that HARDER.

I'm just kind of like, JUST DO MASTODON'S MODEL, like email. It's better and the kinks are more well thought about and/or solved.

danabramov|1 month ago

Author here. I think it's fair to say that AT protocol's model is "everyone is a scraper", including first party. Which has both bad and good. I share your concern here. For myself, I like the clarity of "treat everything you post as scraped" over "maybe someone is scraping but maybe not" security by obscurity. I also like that there is a way for me to at least guarantee that if I intentionally make something public, it doesn't get captured by the container I posted it into.

bee_rider|1 month ago

This seems like tensions between normal/practical and “opsec” style privacy thinking… Really, we can never be sure anything that gets posted on the internet won’t be captured by somebody outside our control. So, if we want to be full paranoid, we should act like it will be.

But practically lots of people have spent a long time posting their opinions carelessly on the internet. Just protected by the fact that nobody really has (or had) space to back up every post or time to look at them too carefully. The former has probably not been the case for a long time (hard drives are cheap), and the latter is possibly not true anymore in the LLM era.

To some extent maybe we should be acting like everything is being put into a perfect distributed record. Then, the fact that one actually exists should serve as a good reminder of how we ought to think of our communications, right?

skybrian|1 month ago

It's true that Mastodon is somewhat better if you don't want to be found, though it's hardly a guarantee. From a "seeing like a state" perspective, Bluesky is more "legible" and that has downsides.

But I think there's room for both models. There are upsides to more legibility too. Sometimes we want to be found. Sometimes we're even engaging in self-promotion.

Also, I'll point out that Hacker News is also very legible. Everything is immutable after the first hour and you can download it. We just live with it.

case0x|1 month ago

>helping build a perfect decentralized surveillance record

a record of what? Posts I wish to share with the public anyway?

mozzius|1 month ago

This is a line of thinking that just supposes we shouldn’t post things on the internet at all. Which, sure, is probably the right move if you’re that concerned about OPSEC, but just because ActivityPub has a flakier model doesn’t mean it isn’t being watched

iameli|1 month ago

what if I want to publish something publicly on the internet though

pcthrowaway|1 month ago

In theory it should be possible to allow users to upload ciphertext that they can then share a decryption key with their intended audience. I believe atproto has dissuaded against this with the argument that ciphertext shouldn't be in public view, but this seems to hinge on the idea that the cipher is insecure, or will be in the future. I don't see why using a post-quantum encryption scheme shouldn't provide the appropriate security, which may still not be foolproof, but it certainly would make indexing the data much more difficult

skeledrew|1 month ago

When it comes to the internet, tech is law. There is no way to publicly share something and maintain control over it. Even on the Fediverse, if either a client or server wants to ignore part of the protocol or model, it can. Like a system message to delete particular posts for anti-surveillance reasons can simply be ignored by any servers or clients that were designed/modified for surveillance. Ultimately the buck lies with the owner of some given data to not share that data in the first place if there's a chance of misuse.

plagiarist|1 month ago

Shouldn't the ability to forget and delete content that was ever public on the internet be considered fictional anyway?

eduction|1 month ago

Unpopular opinion: this should be done with xml, not json. XML can have types, be self describing, and be extended (the X in XML).

That said it’s a very elegant way to describe AT protocol.

danabramov|1 month ago

I'd be curious to see what that would look like!

aembleton|1 month ago

Why would we need to store the createdAt value in a file? The filesystem already stores this information. We could just store the text which would mean no Json would be needed.

danabramov|1 month ago

I'm using "filesystem" a bit loosely here.

The important parallel I was going for was "file format" as interface between apps (= lexicons being an interface between social apps).

If you want details on the actual data structures, check https://atproto.com/specs/repository.

air217|1 month ago

nostr protocol and the client/relay model is one simple way to separate apps (clients) from the data (relays)

diceduckmonk|1 month ago

Git is the API.

Github/Gitlab would be a provider of the filesystem.

The problem is app developers like Google want to own your files.

EGreg|1 month ago

As someone who explicitly designed social protocols since 2011, who met Tim Berners-Lee and his team when they were building SOLID (before he left MIT and got funded to turn it into a for-profit Inrupt) I can tell you that files are NOT the best approach. (And neither is SPARQL by the way, Tim :) SOLID was publishing ACLs for example as web resources. Presumably you’d manage all this with CalDAV-type semantics.

But one good thing did come out of that effort. Dmitri Zagidulin, the chief architect on the team, worked hard at the W3C to get departments together to create the DID standard (decentralized IDs) which were then used in everything from Sidetree Protocol (thanks Dan Buchner for spearheading that) to Jack Dorsey’s “Web5”.

Having said all this… what protocol is better for social? Feeds. Who owns the feeds? Well that depends on what politics you want. Think dat / hypercore / holepunch (same thing). SLEEP protocol is used in that ecosystem to sync feeds. Or remember scuttlebutt? Stuff like that.

Multi-writer feeds were hard to do and abandoned in hypercore but you can layer them on top of single-writer. That’s where you get into joint ownership and consensus.

ps: Dan, if you read this, visit my profile and reach out. I would love to have a discussion, either privately or publicly, about these protocols. I am a huge believer in decentralized social networking and build systems that reach millions of community leaders in over 100 countries. Most people don’t know who I am and I’m happy w that. Occasionally I have people on my channel to discuss distributed social networking and its implications. Here are a few:

Ian Clarke, founder of Freenet, probably the first decentralized (not just federated) social network: https://www.youtube.com/watch?v=JWrRqUkJpMQ

Noam Chomsky, about Free Speech and Capitalism (met him same day I met TimBL at MIT) https://www.youtube.com/watch?v=gv5mI6ClPGc

Patri Friedman, grandson of Milton Friedman on freedom of speech and online networks https://www.youtube.com/watch?v=Lgil1M9tAXU

danabramov|1 month ago

To be clear, I'm using files in a relatively loose sense to focus on the "apps : formats are many-to-many" angle. AT does not literally implement a full filesystem. As the article progresses, I restrict some freedoms in the metaphor (no directories except collections, everything is JSON, etc). If you're interested in the actual low-level repository format, it is described here: https://atproto.com/specs/repository

demux|1 month ago

FYI the CTO of Bluesky was an early dev of Secure ScuttleButt

nonethewiser|1 month ago

Ironically, DID is the perfect vehicle for age verification.

sneak|1 month ago

Losing private keys is much more common than losing domains.

danabramov|1 month ago

Yes, which is why by default, key management is done by your hosting. You log into your host with login/password or whatever mechanism your host supports.

Adding your own emergency rotation key in case your hosting goes rogue is supported, but is a separate thing and not required for normal usage. I'd like this to be more ergonomic though.

viraptor|1 month ago

I like the write-up of this idea. It's well presented. But I'd change one aspect: "We could leave author: 'dril' in the JSON but this is unnecessary too." - kind of. What the post lacks is a record of the identity at the time. What the user's username and avatar were at the time can change the meaning of the post entirely. To really preserve the message, you need to reference the displayed identity that was used to post it - not just the account id.

There's a number of famous accounts that do it continuously. For example popehat today is "Fucking Bitch Hat" but will change to something else soon that may be related to the current events.

danabramov|1 month ago

I think most people's mental model is that they should be able to change their handle / display name / avatar freely, and their posts would display the new versions. So those aren't a part of the post itself.

That said, you could create an AT app that displays a version of the post using the profile at the time. You'd just need to index all profile changes into a local database, and then query that database for the "profile at that time" to display the post. So what you're describing is possible—it just requires a different aggregation. The source of truth, however, should be denormalized and reflect most recent data.
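
A minimal sketch of that "profile at that time" lookup, in TypeScript. The types, the in-memory storage, and the `did:example:*` identifiers are assumptions for illustration; a real app would keep this in a database:

```typescript
// Hypothetical snapshot of a profile as the app observed it.
type ProfileVersion = { did: string; displayName: string; seenAt: number };

class ProfileHistory {
  private versions = new Map<string, ProfileVersion[]>();

  // Index every profile change the app observes, ordered by time.
  record(version: ProfileVersion): void {
    const list = this.versions.get(version.did) ?? [];
    list.push(version);
    list.sort((a, b) => a.seenAt - b.seenAt);
    this.versions.set(version.did, list);
  }

  // Latest version observed at or before `time`, or undefined if none.
  at(did: string, time: number): ProfileVersion | undefined {
    const list = this.versions.get(did) ?? [];
    let found: ProfileVersion | undefined;
    for (const version of list) {
      if (version.seenAt <= time) found = version;
      else break;
    }
    return found;
  }
}

const history = new ProfileHistory();
history.record({ did: "did:example:hat", displayName: "First Hat", seenAt: 100 });
history.record({ did: "did:example:hat", displayName: "Second Hat", seenAt: 200 });

// A post created at time 150 would be shown with the first display name.
const nameAtPostTime = history.at("did:example:hat", 150)?.displayName;
```

The post record itself stays unchanged; the historical view is just another aggregation built from the same public data.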

LoganDark|1 month ago

I did a double take at "DID as identity" because Dissociative Identity Disorder shares the same acronym

black_puppydog|1 month ago

The premise of this article resonates so much with me! I didn't see the angle on ATproto coming, and frankly this description of it is the first that makes me want to dig into it a bit.

The issue of "file-less" computing has been bothering me a lot. It's worst on iOS, where Apple is really pushing hard to have users never ever think in "files". Closely followed by macOS and only then Android, imho. "That thing over there? That's not a file! That's a photo! Very different thing, that. No, you can't use $app to view it or share it, sorry. And if you want to copy it, you have to go through our export functionality which is buggy and strips all kinds of info and generally works best through iCloud."

My mind's insistence that behind all the flashy apps must be a backing store, and that it's most likely file based, makes it even more infuriating when trying to get unprocessed photos off a relative's iPhone or such.

catapart|1 month ago

yeah yeah yeah, everyone get on the AT protocol, so that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money) while still maintaining the original, largest, and currently only portal to actually publish the content (which makes money[0]). let them profit from a technical "innovation" that is 6 levels of indirection to mimic activity pub.

if they were decent people, that would be one thing. but if they're going to be poisoned with the same faux-libertarian horseshit that strangled twitter, I don't see any value in supporting their protocol. there's always another protocol.

but assuming I was willing to play ball and support this protocol, they STILL haven't solved the actual problem that no one else is solving either: your data exists somewhere else. until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage. you're just going to change which massive organization is exploiting them.

I know, I know: hardware is a bitch and the type of device I'm even pitching seems like a costly boondoggle. but that's the business, and if you're not addressing it, you're not fomenting real change; you're patting yourself on the back for pretending we can algorithm ourselves out of late-stage capitalism.

[0] *potentially/eventually

lou1306|1 month ago

> until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage

Quite a few BSky users are publishing on their own PDS (Personal Data Server) right now. They have been for a while. There are already projects that automate moving or backing up your PDS data from BSky, like https://pdsmoover.com/

danabramov|1 month ago

>that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money)

That's not correct, actually hosting user data is cheap. Most users' repos are tiny. Bluesky doesn't save anything by having someone move to their own PDS.

What's expensive is stuff like video processing and large scale aggregation. Which has to be done regardless of where the user is hosting their data.

elbci|1 month ago

agree! Social-media contributions as files on your system: owned by you, served to the app. Just as the .svg specification allows editing the same file in Inkscape or Illustrator, a post on my computer would be portable to Mastodon or Bluesky or a fully distributed p2p network.

ninkendo|1 month ago

> When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already. Then they go up one more level: people send files, but web browsers also “send” requests for web pages. And when you think about it, calling a method on an object is like sending a message to an object! It’s the same thing again! Those are all sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it’s getting really vague and nobody really knows what they’re talking about any more.

https://www.joelonsoftware.com/2001/04/21/dont-let-architect...

its_ethan|1 month ago

Hi, so I generally actually agree with you and your criticisms of this blog post (in your thread with the author). I think there's something pretty true in the blog post you shared from Joel (true in that it applies to more than just the software world) and looked at some of his more recent posts.

https://www.joelonsoftware.com/2022/12/19/progress-on-the-bl...

This one in particular reads similar to what this comment section is about, it looks like Joel is basically becoming an architecture astronaut himself? Not sure if that's actually an accurate understanding of what his "block protocol" is, but I'm curious to hear from you what you think of that? In the 25 years since that post, has he basically become the thing he once criticized, and is that the result of just becoming a more and more senior/thinker within the industry?

danabramov|1 month ago

Author here! I grew up reading Joel's blog and am familiar with this post. Do you have a more pointed criticism?

I agree something like "hyperlinked JSON" maybe sounds too abstract, but so does "hyperlinked HTML". But I doubt you see web as being vague? This is basically web for data.

doctorflan|1 month ago

I was hoping this was literally just going to be some safe version of a BBS/Usenet sort of filesharing that was peer-based kind of like torrents, but just simple and straightforward, with no porn, infected warez, ransomware, crypto-mining, racist/terrorist/nazi/maga/communist/etc. crap, where I could just find old computing magazines, homebrew games, recipes, and things like that.

Why can’t we have nice things?

I guess that’s what Internet Archive is for.