top | item 4178487

Why Files Exist

183 points| brettcvz | 13 years ago |blog.filepicker.io | reply

90 comments

[+] ChuckMcM|13 years ago|reply
If you want to get existential, files don't exist. What exists is a way to name a non-volatile data set. Given the name, a non-volatile memory unit, and an algorithm for translating between that name and a memory-unit-specific representation of its internal structure, you can retrieve the data set. If the name is sufficiently portable, you could in theory hand it to another program/process/thread, and that thread could translate it to the actual data on the memory unit.

It sounds horribly abstract, but it is the actual reason file names, file systems, and file system APIs exist. Then there is a whole separate layer of semantic interpretation of the contents of files, whether that is a simple stream of UTF-8 encoded code points or an ELF file describing executable code that is ready to be loaded into memory and executed.
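
To make that "semantic interpretation" layer concrete, here is a minimal Python sketch that applies two well-known conventions to raw bytes: ELF files start with the magic bytes `\x7fELF`, and text is bytes that decode as UTF-8. The function name `interpret` is hypothetical, and real tools such as file(1) check many more signatures.

```python
def interpret(data: bytes) -> str:
    """Guess how to read a byte string using two common conventions."""
    # ELF executables and shared libraries begin with 0x7F 'E' 'L' 'F'.
    if data[:4] == b"\x7fELF":
        return "ELF binary"
    try:
        # Bytes that decode cleanly as UTF-8 can be treated as text.
        data.decode("utf-8")
        return "UTF-8 text"
    except UnicodeDecodeError:
        return "unknown binary"
```

The same bytes mean nothing until some program agrees on a convention like these.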

The OP decries the lack of an interchange format, which is simply a convention by which two programs can both interpret the contents of non-volatile memory that they have both accessed using a unique name. That lack is mostly due to iOS devices and the applications that have eschewed the idea of putting the names of their non-volatile data sets into a globally accessible namespace.

[+] quesera|13 years ago|reply
That's exactly right. Files don't exist. The "filesystem" is just a big fat KV store.

Somewhere along the way someone decided it'd be a useful abstraction to imagine a hierarchical organization system (directories) on top, so that was glommed on, but it's not real either.
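
A toy Python sketch of that view: the store below is flat, and "directories" exist only because listdir() groups keys on the '/' separator. The class and method names are hypothetical, chosen for illustration.

```python
class FlatStore:
    """A flat key-value store; '/' inside keys is only a naming convention."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]

    def listdir(self, prefix: str = "") -> list[str]:
        """Emulate a directory listing by grouping keys under a prefix."""
        prefix = prefix.strip("/")
        prefix = prefix + "/" if prefix else ""  # "" means the root
        children = set()
        for key in self._data:
            if key.startswith(prefix):
                # Keep only the next path component ("file" or "subdirectory").
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)
```

After storing "home/ann/notes.txt", "home/ann/photos/cat.jpg" and "home/bob/todo.md", listdir("home/ann") returns ["notes.txt", "photos"], even though no "photos" directory object was ever created.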

It was never a perfect system, but it worked well enough when most users were fairly technical and had a humanly comprehensible set of labelled data streams.

I appreciate the effort to insulate users from the complexity that has grown up underneath them. The simple fact is that most people fail at large scale taxonomy and organization. It's hard. And it's a lot of work to maintain even if you're good at it. See: library science. So I don't think there is another model that will succeed as well as "files" have.

iOS hides the filesystem, but it's obviously still there. So far all we've seen is insulation for those who need it, as a byproduct of a huge loss of control for everyone. The other (valuable) byproduct is security.

We haven't found the compromise yet. There might not be one.

[+] curiousfiddler|13 years ago|reply
Well, then there's the file metadata (the inode in Linux: http://www.linux-mag.com/id/8658/). It is a very useful chunk of properties attached to the memory unit and provides a base for a ton of essential facilities that make life easier.
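
On Unix-like systems much of that metadata is visible from Python's standard library via os.stat(). The helper `describe` below is hypothetical and just picks a few fields. Note that the file's name is not among them: names live in directory entries that point at inode numbers.

```python
import os
import stat


def describe(path: str) -> dict:
    """Return a few metadata fields the filesystem keeps for a file.

    On Unix-like systems these values come from the file's inode.
    """
    st = os.stat(path)
    return {
        "inode": st.st_ino,                   # inode number
        "size": st.st_size,                   # length in bytes
        "hard_links": st.st_nlink,            # directory entries pointing here
        "mode": stat.filemode(st.st_mode),    # e.g. '-rw-r--r--'
        "mtime": st.st_mtime,                 # last modification, epoch seconds
    }
```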
[+] _quora|13 years ago|reply
"What exists is a way to name a non-volatile data set."

   1. What if it's "code" not "data"? I thought "data" in this case was just 1s and 0s, i.e. whatever they physically represent, e.g. electric charges on a medium. I thought that how we interpret these charges is what makes them "code" or "data".

   2. What if the "file" is stored on a "RAM disk"? Is that non-volatile? If no, then does that mean this is not, by your definition, a "file"?
[+] brettcvz|13 years ago|reply
Interesting to think of it from that layer of abstraction. I definitely agree, but "Why Files Exist" fits better as a title.
[+] kahirsch|13 years ago|reply
There is one aspect that everybody here seems to be ignoring. Files can span boundaries of time, space, connectivity, bandwidth, and trust. They also span boundaries of architecture--CPU and OS.

I have files "in the cloud" that were born on systems that haven't been manufactured since before Google existed. Those files are self-contained units that I control and can move to whatever system I desire.

And, although people here say that end-users just don't know how to use files, I have relatives who are over 85 years old who still manage to attach photos--and PowerPoint presentations, for some reason--to emails and share them.

Saying that we only need an API is saying that it's okay for the data to die when the manufacturer goes out of business, or decides that it's time to shut down the DRM servers, or you just lose your phone.

Files are reifications of data that allow us to separate some concerns. Transporting and backing up files are orthogonal to the data that is in the files. We can compress and email and FTP any kind of data whatsoever.

That's not an insignificant thing.

[+] zanny|13 years ago|reply
You "could" in theory write apps for the iPhone that interface over localhost ports. Which would be awful. (Too bad sockets in Unix are file descriptors!)
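
For the curious, a minimal Python sketch of that localhost approach, with both ends in one process for brevity: a server socket bound to 127.0.0.1 on a port the OS picks (port 0), a client that connects and sends bytes, and the integer file descriptor behind each socket (on Windows it is a socket handle rather than a Unix fd). The function name is hypothetical.

```python
import socket


def localhost_roundtrip(message: bytes) -> tuple[bytes, int, int]:
    """Send `message` from a client to a server socket over 127.0.0.1.

    Returns the bytes received and the file descriptors of both ends.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            # The pending connection waits in the listen backlog, so accept()
            # returns right away even though this code is single-threaded.
            conn, _addr = server.accept()
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)  # tell the server we're done
                received = b""
                while chunk := conn.recv(4096):
                    received += chunk
                return received, client.fileno(), conn.fileno()
```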
[+] thoughtsimple|13 years ago|reply
I disagree that files are the only solution. Back in the '90s Apple had an OS that had fully interoperable data in applications, and that OS didn't have a file system.

It was Newton OS, and it used something known as soups for persistent storage. Soups were discoverable databases that intelligently handled the insertion and ejection of flash cards. Handling flash on removable media gracefully is still something that mobile OSes have trouble with to this day.

The OS could merge soups on different stores dynamically and detect if some data in a soup was currently in use on an ejected card and ask for the card back. This merging of soups on different storage devices is something I've never seen duplicated in the subsequent 20 years.
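
A toy Python model of the behaviour described: one logical soup merged from several stores, where reading an entry that lives on an ejected card asks for the card back instead of failing silently. This is an illustration under simplifying assumptions (for instance, the ejected store's index stays in memory), not the Newton's actual object-store API; every name is hypothetical.

```python
class CardEjectedError(Exception):
    """Raised when an entry lives on a store that is not currently mounted."""


class Store:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mounted = True
        self.entries: dict[str, dict] = {}


class UnionSoup:
    """Presents entries from several stores as one merged collection."""

    def __init__(self, *stores: Store) -> None:
        self.stores = list(stores)

    def query(self) -> list[dict]:
        # Only mounted stores contribute to the merged view.
        return [e for s in self.stores if s.mounted for e in s.entries.values()]

    def get(self, key: str) -> dict:
        for s in self.stores:
            if key in s.entries:
                if not s.mounted:
                    # Ask for the card back rather than pretending it's gone.
                    raise CardEjectedError(f"reinsert card '{s.name}' to read {key!r}")
                return s.entries[key]
        raise KeyError(key)
```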

Files are not the only way to achieve the requirements in the article. They are just the common solution.

[+] wrs|13 years ago|reply
Hey, someone remembers! (I did the Newton object store.)

I spent years of my life trying to get rid of treating direct user access to the filesystem as a foundational UI metaphor, at both Apple and Microsoft. As I liked to say, why is the UI based on a filesystem debugger? (If you can see /dev or C:\windows\system32, then yeah, you're running a debugger.)

Many people who aren't programmers don't seem to get deep hierarchy (deep meaning > 2 levels). Searching works, tags kind of work, but few people really know how to set up and use a folder hierarchy.

The reason it works to let the app deal with navigation is that the app knows how to do type-specific, contextual navigation. People like concrete things (whereas programmers like abstract things—a constant struggle). If you're trying to find a song, you want to have a UI that knows about songs: they come in albums, the same song may be on multiple albums, they have artists and composers, etc. Any attempt to represent that in a filesystem hierarchy can be nothing but a compromise.

This has nothing to do with defining standard formats for exchanging units of data. Just how you find them once you've stored them.

[+] kabdib|13 years ago|reply
We stored lots more than just "notecard entries" in the Newton object store. We had application packages (with demand paging of compressed code). We had "large binary" support, similarly demand-paged. And all of this stuff was hooked up to the garbage collector.

I don't know how well it would scale to a non-hand-held device, but it worked really well on the Newton.

Files are useful, but they are not necessary. We are used to them, but there can be better ways to do things.

[+] brettcvz|13 years ago|reply
Interesting solution. This is similar to what I was referring to in the last paragraph: a traditional folder-based filesystem isn't necessarily the only way, but a system-wide, abstract, and interoperable content wrapper is the key requirement.
[+] icebraining|13 years ago|reply
git-annex is a similar system. Files are tracked and can be on many storage devices (local, removable and even "cloud") simultaneously, and you can always know where they are from any machine.
[+] spdegabrielle|13 years ago|reply
True. Another approach was used by the OLPC project: its UI had a Journal (chronological) view of user-created documents. There must be other approaches?
[+] wes-exp|13 years ago|reply
Databases are often excellent for interoperability. This happens all the time in business environments; e.g. a reporting tool can access an application's data using the database as the middle man.

Files, on the other hand, often have peculiar formats. Is .xls "interoperable"?

[+] juriga|13 years ago|reply
IMHO, interoperability between apps is the main benefit of Android over iOS (see Intents in Android).

I can take any photo, URL or file and open/share it in/to another app. The OS and app developers take care of which apps support which resources, so as a user I'm always presented with a sensible list of apps currently available on my device.

I'd say the notion of a file with a specific file type is too abstract and technical for most use cases for casual users. The UI should group pieces of data as human-understandable resources (i.e. a "picture" can be a .jpg, .png etc.). With this level of abstraction, a user can be expected to understand when presented with a list of apps:

OS: "What do you want to do with this URL?"

User: "Share to Twitter/Facebook/My other browser"

[+] MatthewPhillips|13 years ago|reply
Don't you see the problem with this? By allowing the app, rather than the user or the system, to own the file you wind up with multiple copies.

App A shares data with App B. The user makes some changes and App B saves its new copy. It doesn't send the data back to the originating app. It could, but then the user would have to manually do that and save it in App A again.

This is horrendous! People are confusing the poor UI that we have for files (file pickers) with thinking that files themselves are a bad abstraction.

[+] sehugg|13 years ago|reply
I think the "here today, gone tomorrow" nature of app stores is impairing file interoperability. There's just little incentive to allow your productivity or creative app to play with others (unless that's the whole point of your app, like PlainText). I've given up on fancy note-taking apps, knowing there will always be a better one that's not compatible with my old data.

In another decade we'll have a whole lot of unreadable proprietary app data, inaccessible because the original app doesn't work on new hardware. Extracting it will be a tedious process of either reverse engineering or emulating the old hardware/software combination.

Not that we haven't been down this road before, but it just seems like it's worse this time. Even the word "file format" seems archaic, and not many (other than pirates) seem interested in reverse engineering and/or documenting them.

[+] guard-of-terra|13 years ago|reply
File systems have to evolve. These days, "file system" means two things: an application-independent API to access common documents, and hierarchical local storage. But it doesn't have to be that way.

The best thing I've seen in file system evolution is KDE's KIO: any KDE application can take any KIO URL and use it; all file operations are asynchronous (which is very nice even for big local files), and any program can use network resources as easily as local ones, with little to no effort.

But we should improve on that. A heterogeneous user file system should provide discoverability (e.g. your social network photos are automatically available in any program once you bind the account, and you know where to find them). Branches of the file system should restrict some operations on files or hint at their cost (scanning a huge photo bank is a very expensive operation; you can't access the contents of audio files inside a streaming service, but you can play them in a program). There should also be other ways to organize files than dumb hierarchies (imagine a search box in place of a folder: you enter a query and then browse the search results; or you could have a tag cloud in your file system).
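
The "search box in place of a folder" and "tag cloud" ideas can be sketched in Python: a query folder is a saved predicate re-evaluated each time it is opened, and a tag cloud is just a count of tags across files. FileRecord, QueryFolder, and tag_cloud are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileRecord:
    path: str
    tags: set[str] = field(default_factory=set)


@dataclass
class QueryFolder:
    """A 'folder' that is really a saved search, re-run on every open."""
    name: str
    predicate: Callable[[FileRecord], bool]

    def open(self, files: list[FileRecord]) -> list[str]:
        return sorted(f.path for f in files if self.predicate(f))


def tag_cloud(files: list[FileRecord]) -> dict[str, int]:
    """Count how many files carry each tag."""
    counts: dict[str, int] = {}
    for f in files:
        for t in f.tags:
            counts[t] = counts.get(t, 0) + 1
    return counts
```

A file tagged both "holiday" and "beach" then shows up in every query folder that matches either tag, with no need to decide which single parent directory it belongs to.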

There's a great deal of room for innovation here, and nobody seems to be working on it at the moment.

Sorry for mistakes 'cause I'm hurrying to go to bed :)

[+] paulsutter|13 years ago|reply
Files exist because decks of punched cards were cumbersome. A best practice was to make a diagonal stripe across the top with magic marker so that you could restore the ordering if you accidentally dropped the deck. Files in a filesystem eliminated the need for that. And the cards were heavy.

That's why files exist. Not sure what the article is trying to say, TL;DR

[+] agumonkey|13 years ago|reply
The diagonal trick reminds me of the CD-ROM's CRC error-detection mechanism (reversed, of course).
[+] ori_b|13 years ago|reply
The key isn't the specific file abstraction used today. The key is being able to name data. Whether that is through a traditional hierarchical file system, through an activity log, through URLs, through some hash-based key-value store, the requirement is being able to refer to data independent of the application that produced it.
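
The hash-based option is easy to sketch: name each blob by a cryptographic hash of its contents, so any program holding the name can fetch the data, and identical data always gets the same name. This Python sketch assumes SHA-256 and an in-memory dict; the ContentStore class is hypothetical (git names its objects in a similar content-addressed way).

```python
import hashlib


class ContentStore:
    """Names each blob by the SHA-256 hash of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha256(data).hexdigest()
        self._blobs[name] = data  # same content always maps to the same name
        return name

    def get(self, name: str) -> bytes:
        return self._blobs[name]
```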
[+] jules|13 years ago|reply
We will still need a way to pass information between applications, but that may be so different from the concept of files that it would be ridiculous to call it files. For example, applications on the internet exchange information via APIs. Microsoft is doing something similar with Windows 8: if you want to get a photo into an application, you show the user a menu to get a photo. The user gets a list of all his photos to pick from. Where this list comes from depends on which other applications are installed: if you have a Facebook application you can choose your photos from Facebook, if you have Picasa you can also choose photos from Picasa, etc. This works because each application that has photos is supposed to provide an API to the OS to access its photos.

Exchanging information directly via standardized APIs makes a lot more sense than exchanging it via an abstraction layer designed to operate on top of a hard disk. This is similar to the difference between Unix pipes and getting the output of one program, storing it on your hard disk, and then reading it in with another program. With the API model the disk loses its special status and instead becomes just one more data source/sink (FUSE turned on its head, if you will).
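
A minimal Python sketch of that provider model, assuming every photo-holding app registers an object with a name and one list_photos() method, and the OS picker simply aggregates them. All names are hypothetical; this is not the actual Windows 8 interface.

```python
from typing import Protocol


class PhotoProvider(Protocol):
    """What each photo-holding app would expose to the OS (hypothetical)."""
    name: str

    def list_photos(self) -> list[str]: ...


class PhotoPicker:
    """OS-side picker that aggregates every registered provider."""

    def __init__(self) -> None:
        self._providers: list[PhotoProvider] = []

    def register(self, provider: PhotoProvider) -> None:
        self._providers.append(provider)

    def choices(self) -> list[tuple[str, str]]:
        # The requesting app never talks to the Facebook or Picasa app directly.
        return [(p.name, photo) for p in self._providers for photo in p.list_photos()]


class InMemoryProvider:
    """Stand-in for an installed app such as a Facebook or Picasa client."""

    def __init__(self, name: str, photos: list[str]) -> None:
        self.name = name
        self._photos = photos

    def list_photos(self) -> list[str]:
        return list(self._photos)
```

Installing a new photo app only adds one register() call; no requesting app has to change.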
[+] aganek|13 years ago|reply
I love this post.

There is no doubt in my mind that the file system (as we know it) is dead. Daily workflows are becoming more and more integrated with the social graph. It's one thing to manage your own file set, but try keeping track of everyone's files... or even your own across multiple devices with different purposes, for that matter.

If I save files using one filtering scheme and someone else saves to the same shared drive using another scheme... both of our files eventually become lost in a mess.

Like others have posted, I believe the solution is search. Maybe not textbox search like Google, but certainly different ways to view lists of files. Can you imagine viewing the most recent files edited by a certain coworker, or the most recent files edited within range of a certain GPS location? I don't have an exact answer for how to sort the data, but in my mind there is a lot of additional data that can be used to help filter file presentations beyond just the file index and file attributes used today.
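
Both queries are straightforward once files carry that metadata. The sketch below assumes each file record stores who last edited it, when, and where; the field and function names are hypothetical, and distance uses the standard haversine great-circle formula with a 6,371 km mean Earth radius.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class FileMeta:
    name: str
    edited_by: str
    edited_at: float  # Unix timestamp of the last edit
    lat: float        # where the last edit happened, in degrees
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def recent_files(
    files: Iterable[FileMeta],
    editor: Optional[str] = None,
    near: Optional[Tuple[float, float]] = None,
    radius_km: float = 1.0,
    limit: int = 10,
) -> list[FileMeta]:
    """Most recently edited files, optionally filtered by editor and location."""
    hits = [
        f for f in files
        if (editor is None or f.edited_by == editor)
        and (near is None or haversine_km(f.lat, f.lon, *near) <= radius_km)
    ]
    return sorted(hits, key=lambda f: f.edited_at, reverse=True)[:limit]
```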

I'm in the bay area, working on a startup to address this shift. Message me if interested... I'm always looking for people to talk about it with.

[+] 7952|13 years ago|reply
"but in every OS there needs to be at least some user-facing notion of a file, some system-wide agreed upon way to package content and send it between applications."

This is what the World Wide Web does. DNS, HTTP, and MIME types solve these problems. The problem is that it is still too difficult to make things on a device into URLs.
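
The URL-plus-MIME-type half of that is easy with Python's standard library: pathlib can express a local path as a `file://` URL, and mimetypes guesses a type from the extension. This is only a sketch with a hypothetical function name; a `file://` URL is not reachable from other devices, which is the genuinely hard part.

```python
import mimetypes
import os
from pathlib import Path


def describe_for_web(path: str) -> tuple[str, str]:
    """Express a local file as a URL plus a MIME type, the way the web names things."""
    # as_uri() needs an absolute path; abspath does not touch the disk.
    uri = Path(os.path.abspath(path)).as_uri()
    mime, _encoding = mimetypes.guess_type(path)
    # Unknown extensions fall back to the generic binary type.
    return uri, mime or "application/octet-stream"
```

On a Unix-like system, describe_for_web("/tmp/photo.png") returns ("file:///tmp/photo.png", "image/png").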

[+] brettcvz|13 years ago|reply
But even on the web there is limited ability for applications to share data without explicitly working with each other's APIs. A central filesystem allows for "star network" integration rather than point-to-point.
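
The difference is easy to quantify. Assuming every pair of apps needs one integration in the point-to-point case, and each app needs exactly one integration with the shared hub in the star case, a quick Python helper (hypothetical name) gives:

```python
def integrations(n_apps: int) -> dict[str, int]:
    """Integrations needed so that n apps can all exchange data."""
    return {
        # Point-to-point: every pair of apps needs its own integration.
        "point_to_point": n_apps * (n_apps - 1) // 2,
        # Star: each app integrates once with the shared hub (e.g. a filesystem).
        "star": n_apps,
    }
```

For ten apps that is 45 integrations versus 10, and the gap grows quadratically with the number of apps.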
[+] tagx|13 years ago|reply
How many different file types do you typically use in a week?
[+] tlrobinson|13 years ago|reply

    $ find ~ -type f -atime -1w | awk -F/ '{print $NF}' | awk -F\. '{if (NF>2 || (NF>1 && $1!="")) print tolower($NF)}' | sort | uniq | wc -l
          83
Granted a lot of those are system files.
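
A portable Python approximation of the same count, for anyone without BSD/macOS find (the `-1w` unit suffix is not accepted by GNU find). It applies the awk rule from the command above; access times are only as fresh as the filesystem's mount options allow (e.g. relatime or noatime). The function name is hypothetical.

```python
import os
import stat
import time


def distinct_extensions(root: str, days: float = 7.0) -> set[str]:
    """Lower-cased extensions of regular files under `root` accessed in the
    last `days` days, using the same rule as the awk filter:
    ".bashrc" is skipped, ".foo.TXT" counts as "txt", "noext" is skipped."""
    cutoff = time.time() - days * 86400
    exts: set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):  # unreadable dirs are skipped
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or cannot be stat'ed
            # Like `find -type f`: regular files only, symlinks excluded.
            if not stat.S_ISREG(st.st_mode) or st.st_atime < cutoff:
                continue
            parts = name.split(".")
            if len(parts) > 2 or (len(parts) > 1 and parts[0] != ""):
                exts.add(parts[-1].lower())
    return exts
```

Calling len(distinct_extensions(os.path.expanduser("~"))) should give a number comparable to the `wc -l` output.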
[+] brettcvz|13 years ago|reply
Code (multiple types), images (multiple types), documents (multiple types), videos (multiple types), audio (multiple types)

And for all of them, I should be able to move them between services and applications as I please

[+] drostie|13 years ago|reply
Just today I worked with Markdown, LaTeX, PDF, Postscript, SVG, Python, GIF, and HTML. Most of that was in some way or another related to the Master's thesis I'm working on, but it's honestly pretty typical. (The HTML was not -- I decided to curl some spam sites linked from my email and read through their JavaScript, to trace down and see what sorts of deliciously painful exploits they were trying to install -- turned out it was just advertising crap.)
[+] tobyjsullivan|13 years ago|reply
The limitation of apps not saving to the iOS file system is not a bad thing. It is progress.

There is nothing preventing my shiny new iOS app from sharing files with other applications. Apple is just preventing those files from being stored on the device. Instead, if an app developer wants interoperability, they can have the app save a file to Dropbox or my Google Drive. Any other application can then access the file in that same cloud storage.

The beauty is we've moved beyond sharing between applications on a single device. Now EVERY application I run on EVERY device I have has the potential to share the same data seamlessly.

This is why iOS doesn't open its file system. It wants the app developers to use something a little more flexible and reliable.

[+] juriga|13 years ago|reply
> if an app developer wants interoperability

Implementing interoperability with all the possible cloud storage systems shouldn't be left to each app developer separately. This should be a feature of the operating system.

As an Android user, I'm genuinely interested if iOS users find the sharing options between apps too limited. Do you often end up requesting new sharing options from the developers of your favorite apps?

Also, not every piece of data is a file I'd want to save to Dropbox. For example, I share article URLs from Flipboard to 2cloud many times a day (2cloud opens the URL on my desktop browser). I'd hate to have an extra save/open step between the apps.

[+] colinsidoti|13 years ago|reply
We'll see how it plays out, but I imagine the notion of a file will continue to decline, and end up replaced by APIs.

APIs continue to provide interoperability, but instead of having the user select a file to upload, they select an image through the Facebook API. This should ultimately improve the user's experience, but there are some downsides during this transition period (e.g. Photoshop Touch's lack of sharing features).

While you could argue that FilePicker brings back the file, you could also hedge your bets the other way and work as much as possible to abstract away the file. Instead of grabbing Facebook photos as a set of files, what if I could easily grab the set of photos that contain me and a friend?

[+] guard-of-terra|13 years ago|reply
This way, if you start a new social network, then in addition to the network effect it would have a huge disadvantage relative to Facebook: no programs are willing to interoperate with it, because they only know the proprietary Facebook API.

Good for Facebook, bad for you and me.

[+] brettcvz|13 years ago|reply
The problem is that it requires each application to have to wrap each API. You end up with a huge amount of redundant work, and in general a dearth of integrations as developers are lazy/reprioritize. For example, there are apps that support integration with Dropbox, but don't support Google Drive, Box, SugarSync, etc.
[+] colourforth|13 years ago|reply
Forget about "filesystems" for a moment.

Files exist because the amounts of "stuff" users want to "store" do not always correlate well to block size.

To put it another way, block size is fixed. But the size of "stuff" is variable.
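
The mismatch is simple arithmetic. Assuming 4,096-byte blocks (common, but not universal), a variable-sized file occupies a whole number of blocks, and the filesystem records the true length so the unused tail of the last block is ignored. A Python sketch with a hypothetical helper name, treating an empty file as using no blocks:

```python
def block_usage(size: int, block_size: int = 4096) -> tuple[int, int]:
    """Blocks needed to store `size` bytes, and unused bytes in the last block."""
    if size == 0:
        return 0, 0
    blocks = -(-size // block_size)  # ceiling division without floats
    slack = blocks * block_size - size
    return blocks, slack
```

For example, a 10,000-byte file needs 3 blocks and leaves 2,288 bytes unused in the last one.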

OK, now you can go back to thinking in terms of "file systems".

[+] lucb1e|13 years ago|reply
Oh, I thought this was going to be about who invented files in the first place. Why'd you call something a file? How'd that idea arise? The only real-life "files" I know are these reports the police keeps on people, or I guess any dataset. But who invented filesystems?

Edit: 302 found http://en.wikipedia.org/wiki/Computer_file#History

[+] njharman|13 years ago|reply
Interfaces (read: APIs) and data types (PDF, PNG, JSON, Markdown, etc.) are much superior to files for consumer-level users. This is where iOS is heading, by evolution, it seems, not design.

Files are great when there is a competent, skilled user to provide the interface glue between apps. To automate, and have things just work, interfaces and data types are needed.

[+] AsylumWarden|13 years ago|reply
I remember the Palm Pilot didn't use files. I think I was using a Palm IIIxe, maybe. Developers, perhaps a little freaked out by the idea of no files, actually created an API to make it look like there were files. I thought it was pretty funny at the time.