
Erlang inventor: Why do we need modules and files?

252 points | kungfooguru | 15 years ago | erlang.org

97 comments

[+] stcredzero|15 years ago|reply
You don't need both modules and files. Decades of Smalltalk development experience backs this up.

Files should be an implementation detail. It should be just one back-end option out of many for persisting code.

There needs to be some way of conceptualizing a set of code. The programmer needs to perform operations on sets of code (merging, diffs, patches, etc.). Programmers need to store code and share it with other programmers. Files are good at all of these things, but none of them depends on some special inherent quality that only files possess.

Smalltalkers have been operating at the granularity of individual methods for quite a while. It brings a lot of nice flexibility.

I would suggest that Namespaces have enough overlapping functionality with other "set of code" type entities, that they can just subsume the role of those other entities. The Java community has shown that it's workable for every project to have its own namespace. Namespaces also resolve a lot of name collision problems that keep cropping up. (Isn't that dandy?)

EDIT: The entire set of libraries for a variety of programming languages is readily available. It would be a cool project to re-granulize the commonly used library corpus for a particular programming language at the level of individual functions. There would be a lot of small-granularity dependency information that wouldn't be available but some of it could be inferred. Statically typed languages could have more of it inferred than dynamically typed languages, but there could still be a lot for the latter.

[+] JulianMorrison|15 years ago|reply
Plain text files are a communication protocol.

Files allow you to give code to your friends without extraordinary means, to use generalized programs like git and grep, and to post a copy up on the web.

Files also mean that your IDE doesn't have to try to be an OS. (Still can if it wants to, hello Emacs.)

[+] fexl|15 years ago|reply
I took the "flat" approach in the Loom server code (see https://loom.cc/source). Formerly I used all the object-oriented module techniques, but finally I scrapped all that for direct calls to functions in a flat name space. I found it more flexible and easier to understand, and it didn't force me to think about artificial boundaries. This makes the Perl code similar to my C code. I don't even use many static functions in C, preferring everything to have a unique name.

I also wrote a functional language interpreter in C. The language is called Fexl (see http://fexl.com/code/). The C code obviously uses a uniquely named flat function space. However, the Fexl language itself has far more flexibility, since you can easily create functions within functions.

For example you can easily do things like this:

  \test_print =
  (
      \test1 = ...
      \test2 = ...
      ...
  )

  \test_read =
  (
      \test1 = ...
      \test2 = ...
      ...
  )
In that case you're not at all worried about the names test1 and test2 conflicting with anything else. It's very lightweight, just like the Fibonacci example shown on this thread.

If you really want to "export" functions declared inside a scope, you can do it like this:

  \handy_module =
  (
      ...
      \fun1 = ...
      \fun2 = ...

      \return return fun1 fun2
  )
Then to grab the functions you say:

  handy_module \fun1 \fun2 ... (now use fun1 and fun2)
You can change the names too, like this:

  handy_module \f1 \f2 ... (now use f1 and f2)
There's no extra magic in the language, you're just applying the module to a handler which grabs the exported functions.
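
For readers who don't know Fexl, the same export-by-handler pattern can be sketched in Python. This is only an illustration under assumptions: `handy_module`, `fun1`, and `fun2` mirror the placeholder names above, while the `double` helper and the function bodies are invented here.

```python
def handy_module(handler):
    """A 'module' is just a function that passes its exports to a handler."""
    # Private helper: visible only inside this scope (invented for the example).
    def double(x):
        return 2 * x

    # Hypothetical exports; the bodies are made up for illustration.
    def fun1(x):
        return double(x) + 1

    def fun2(x):
        return double(x) - 1

    # "Return" the exports by applying the handler to them.
    return handler(fun1, fun2)


# The caller picks whatever local names it likes (here f1 and f2).
result = handy_module(lambda f1, f2: (f1(10), f2(10)))
print(result)  # (21, 19)
```

Nothing outside `handy_module` can reach `double`, which is the lightweight scoping described above.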
[+] technomancy|15 years ago|reply
> You don't need both modules and files. Decades of Smalltalk development experience backs this up.

While decades of elisp development shows us that without modules, nobody will want to use your language to write programs spanning more than one file. =\

[+] Todd|15 years ago|reply
I understand his frustration, but a single global namespace is not the solution. This would be a regression to the place that PHP has been trying to escape from.

If modules aren't available, then people will invent pseudo namespace qualifiers (e.g., misc_foo). A global namespace will become like the .com TLD--a few early landgrabbers with the cool names like foo and bar, then a bevy of latecomers with fooo and baaar67.

The module/namespace concept isn't a 100% solution, but it's sufficient. People understand hierarchy. It's simple and effective. And it also solves the global-visibility/encapsulation issue that he sidestepped.

[+] ohyes|15 years ago|reply
I don't think it's really a single global 'namespace'.

It seems that instead of using actual names for the key in the k/v database, you would use something closer to an IP address. Give the function a unique id, and then give it a really good description. To find the function, you use the description, and then reference the unique id.

To reference it in your code, you would bind the unique id to the descriptive name.
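
A minimal Python sketch of that lookup-then-bind flow. Everything here is an assumption made for illustration: the `store` dictionary, the `publish`, `search`, and `bind` helpers, and in particular the choice to derive the id from a hash of the source text (the comment only asks for some unique id).

```python
import hashlib

# Hypothetical store: unique id -> (description, source text).
store = {}

def publish(source, description):
    """Store a function's source under an id derived from its content."""
    func_id = hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]
    store[func_id] = (description, source)
    return func_id

def search(term):
    """Return the ids whose description mentions the term."""
    return [fid for fid, (desc, _) in store.items() if term.lower() in desc.lower()]

def bind(func_id, name):
    """Compile the stored source and return the function called `name` in it."""
    namespace = {}
    exec(store[func_id][1], namespace)
    return namespace[name]

fid = publish("def f(s):\n    return s.lower()\n", "Lower-case a string")
# Find by description, then bind the id to a descriptive local name.
downcase = bind(search("lower-case")[0], "f")
print(downcase("ABC"))  # abc
```

With a content-derived id, publishing byte-identical source twice yields the same id, though near-duplicates would still get distinct ids.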

What I want to know is how you prevent massive duplication of functionality in this k/v database (with slightly different argument conventions or implementation details, for example).

[+] pwpwp|15 years ago|reply
Is misc.foo better than misc_foo?

My point, which Joe also raised, is that just "adding dots" doesn't solve anything.

[+] joelangeway|15 years ago|reply
The global namespace in PHP is awesome. I don't have to learn a framework to solve a problem, there's a function for it.

Namespace qualifiers in names are a fine solution to the collision problem. That's effectively what's there already except every darn function has to be in a module.

Names don't have to be treated like property, you can invoke policy on them and change bad names. His dreams about rich metadata probably include version tags and hashes that would prevent this from silently breaking anything or in a way that couldn't be repaired by a tool.

Modules don't solve the problem you think they do, they just make the name of all functions longer and harder to figure out. They're not hierarchical.

He addressed issues of visibility/encapsulation and gave a great example of where modules fail to solve the problem. I don't quite follow his suggested fix, though; it looks like it uses lexical scope to encapsulate fib/3, but my Erlang is a bit rusty.

[+] akeefer|15 years ago|reply
Where all these grand proposals break down is when you actually have to deal with real, domain-specific problems. It's not too hard to clearly express what the split_string() function or the multiply() function does, so namespace collisions aren't a problem. But what about the function that "assigns" an "activity" to a "user"? All of a sudden, every single person is going to mean something different by those three words, and it suddenly requires a huge amount of context to find that function.

Searching for something that assigns activities to users as some sort of global search is worthless; the only stuff you care about is the set of functions within your code base that deal with those concepts, or perhaps within some library that you understand and whose notions of users, activities, and assignment match what you need to do. So you need some unit to pull in that's larger than just a bag of functions; you need a group of functions that collectively operate on the same sets of data in consistent ways.

And that's just a fundamentally hard problem: finding the right split so that you have a set of functions that can be used together, that are reusable, and that don't rely on or expose additional concepts or libraries is all very difficult. Just flattening the function namespace, killing modules, and doing global searches doesn't do anything for that problem.

Again, as much as we all wish that the proper unit of reusability in programming were a single function, since it would make life easier, in most complicated cases that's simply not the case. That's why we have classes, or modules, or namespaces, or packages in the first place: they're all various attempts to group together functions that work together on the same kinds of data, that share the same understanding of that data, and that work towards the same goal. Just punting on that encapsulation problem doesn't make it go away.
[+] joelangeway|15 years ago|reply
He is probably imagining that the name of that function would be "nameofproject_assign_user_activity/2", and that you would typically use it along with some others from the same source, following some conventions. Seems that obviates modules to me. In Erlang you give the name of the module with a function anyway, so names like that harm nothing. It would be real nice to type "map" instead of "lists:map" though.
[+] bartonfink|15 years ago|reply
Just to play devil's advocate, you might be able to handle this by hacking types through Erlang's pattern matching syntax.

Thus, you might have a function assign_activity({"KeeferUser", UserNo}, {"KeeferActivity", ActivityNo}) and rely on pattern matching to check the "type" in the first field of each tuple you pass. It's not pretty, but it is possible.

[+] hxa7241|15 years ago|reply
This is significantly similar to part of a note I wrote a month or two ago, about 'Software as web' http://www.hxa.name/notes/note-hxa7241-20110314T2011Z.html

Functions calling (linking) each other, and being built out of available pieces, ideally at a global scale -- this is just like the structure of the web.

* Functions should have URLs

* Functions should be augmented with metadata, like their language/platform

* There should be more and more fully defined IMT/MIME data types for lots of data

(Although I am more casual about this idea, and am only thinking half-seriously.)

[+] toddh|15 years ago|reply
So you'll start prefixing all names with a string so they don't clash. Then you'll notice all your related code uses these strings, what a waste, so you'll create a "module" in order to be able to drop the string in a defined context. Thus namespaces/modules are reinvented. Function drives form.
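
That progression can be sketched in a few lines of Python. The `flat` table, the `acme_` prefix, and the `Prefix` class are all hypothetical names made up for this illustration.

```python
# A flat, global function table with hand-written prefixes (hypothetical names).
flat = {
    "acme_parse": lambda s: s.split(","),
    "acme_join": lambda parts: ",".join(parts),
}

class Prefix:
    """A view of the flat table that supplies the prefix for you."""
    def __init__(self, table, prefix):
        self._table = table
        self._prefix = prefix

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. the "exported" ones.
        try:
            return self._table[self._prefix + name]
        except KeyError:
            raise AttributeError(name) from None

acme = Prefix(flat, "acme_")
print(acme.join(acme.parse("a,b")))  # a,b
```

Once the prefix can be dropped inside a defined context, `acme` behaves like a module in everything but name.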
[+] iandanforth|15 years ago|reply
Let me rephrase this: Search, don't sort.

There cannot be a single organizational hierarchy that is intuitive to all users.

So let's try a flat namespace where every class and method can be found by name, metadata, comments, etc. Think of Chrome's universal search autocomplete.

Content should be addressable by more than any one namespace!

[+] danenania|15 years ago|reply
Isn't the problem more about access than storage?

We naturally recoil from the global namespace idea because we (rightly) anticipate huge issues with organization and duplication, so we want a hierarchical structure (files and modules) to keep our functions organized.

How about leaving our storage hierarchical for organizational purposes, but streamlining access? Instead of always using tedious import and require statements, simply call/use your functions, objects, gems, plugins, etc. directly and let the compiler and/or runtime infer from your usage which you are referring to in the case of ambiguity. Only if the ambiguity cannot be resolved in this manner would the programmer need to be explicit. Intelligent metadata and indexing could also add a lot of power to this sort of system.
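
A rough Python sketch of that resolution rule, assuming a prebuilt index from bare names to the modules that define them. The `index` contents, the `hints` parameter, and the `ResolutionError` type are assumptions made for the example, not features of any existing compiler or runtime.

```python
# Hypothetical index: bare name -> modules that define it.
index = {
    "sqrt": ["math"],
    "join": ["os.path", "shlex"],
}

class ResolutionError(Exception):
    """Raised when a bare name cannot be tied to exactly one module."""

def resolve(name, hints=()):
    """Return the single module providing `name`, using hints to break ties."""
    candidates = index.get(name, [])
    if len(candidates) > 1:
        # Narrow by modules already used nearby (the "usage" to infer from).
        narrowed = [m for m in candidates if m in hints]
        candidates = narrowed or candidates
    if len(candidates) != 1:
        # Only now does the programmer have to be explicit.
        raise ResolutionError(f"{name}: be explicit, candidates are {candidates}")
    return candidates[0]

print(resolve("sqrt"))                     # math
print(resolve("join", hints=["os.path"]))  # os.path
```

Calling `resolve("join")` with no hints raises `ResolutionError`, which is the case where an explicit import would still be needed.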

Currently even our best languages require a large amount of cruft and legwork that is only really necessary in those 5% of cases where ambiguity can't be automatically inferred away. Seems like optimizing for the edge case if I've ever seen it.

[+] dgreensp|15 years ago|reply
No one has mentioned that modules/packages hold together code that shares concepts.

Surely getting rid of Java packages and keeping the classes would be far less extreme than what Joe is suggesting for Erlang -- yet then we would have a zillion "meanings" of things like Image, Server, PDFFile, etc.

What about the Erlang functions that aren't "file2md5" and "downcase_char"? Is he way over-generalizing, or do Erlang programs typically just munge data in obvious ways?

[+] gnaritas|15 years ago|reply
You can package code without making the package a formal part of the namespace. Having a global namespace doesn't preclude packaging code, even if the two are traditionally mixed as one thing. Many Smalltalks have a single global namespace, and it works pretty well.
[+] Peaker|15 years ago|reply
Modules are units of encapsulation.

In my experience, if you have trouble deciding where a function "foo" needs to go to -- you probably have divided your code across arbitrary/poor boundaries.

Modules should have clean APIs that leak neither implementation details nor, of course, the implementation itself.

If you break down the barriers between modules, you are left with only implementation details.

A well-written module that's divided across meaningful boundaries can usually be read in isolation from the project it is a part of and actually understood as a whole.

All that said, I think storage of code could be done far better than serialization in text files, but that's another matter, and whatever code editing form we use, we should still have "modules".

[+] gnaritas|15 years ago|reply
Sounds like you didn't read the article, he specifically addressed encapsulation and an alternative way of doing it.
[+] limmeau|15 years ago|reply
From a collaboration point of view, the thought of programming with single-function modules sounds like a nightmare. Now you don't have to find and install matching versions of five libraries, you have to find suitable versions of a hundred functions.

Perhaps nice people step in to stop the suffering and provide packages of function versions which are mutually compatible and which are usually used together -- but that's modules again.

[+] stcredzero|15 years ago|reply
> Perhaps nice people step in to stop the suffering and provide packages of function versions which are mutually compatible and which are usually used together -- but that's modules again

Not exactly. You've sort of moved the large globs of stuff from the back-end to the front-end, where the grouping is more useful.

I'm reminded of difficulties, like the kind faced when purchasing 1.5" electrical tape. That stuff's mostly used only by pro electricians, so you won't find it retail, and it's even hard to find a particular brand from a contractor's supply outfit, and when you do, you have to buy it in big lots, like a whole box of 10 rolls. What if you just want to fix a drum and you need a particular make of tape made by Scotch and you only want one roll? Out of luck.

So say you need a particular function. You end up importing this entire module, which has dependencies that also are defined in large-granularity terms (other modules) that have their own dependencies. So to use one function, you get saddled with a whole heap of dependency overhead. You're really paying the price for "a whole box" where what you really need is just one particular thing.

But if everything was stored in easily accessible public repositories at the granularity of individual functions, this wouldn't be the case. You'd be able to pull the particular version of the function you need, and it would just pull the particular versions of the functions it depends on, and so on.

Things would be a whole lot more memory efficient. Another way to think of it: modules are a lot less modular than they really should be.
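
A small Python sketch of pulling one function version plus only what it transitively needs. The `registry` contents, the version strings, and the `closure` helper are invented for illustration; no such public repository is being described.

```python
# Hypothetical registry: (name, version) -> the (name, version) pairs it calls.
registry = {
    ("file2md5", "1.2"): [("read_file", "2.0"), ("md5", "1.0")],
    ("read_file", "2.0"): [],
    ("md5", "1.0"): [("bytes_to_hex", "1.1")],
    ("bytes_to_hex", "1.1"): [],
    ("downcase_char", "1.0"): [],  # never pulled unless something needs it
}

def closure(root):
    """Collect `root` plus every function version it transitively depends on."""
    needed, stack = set(), [root]
    while stack:
        item = stack.pop()
        if item not in needed:
            needed.add(item)
            stack.extend(registry[item])
    return needed

print(sorted(closure(("file2md5", "1.2"))))
```

Only four entries are pulled here; `downcase_char` stays in the repository because nothing in the closure calls it.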

[+] BruceForth|15 years ago|reply
> The more I think about it the more I think program development should viewed as changing the state of a Key-Value database.

This is pretty much how I view my programming in Common Lisp.

[+] stcredzero|15 years ago|reply
> The more I think about it the more I think program development should viewed as changing the state of a Key-Value database.

This is also how a lot of Smalltalkers operate.

[+] evangineer|15 years ago|reply
Can you say more about that or do you have a blog post on this somewhere that you can link to?
[+] daleharvey|15 years ago|reply
I think this just exacerbates the problem with Erlang's flat namespace. Right now it is impossible to have two versions of a library inside the same VM because they both exist in the same namespace. It is insane how many people have had very obscure errors because they happened to name a module "http.erl".

I would appreciate a solution that is a first-class solution 'in code' as opposed to some special case with module loaders, but I would like to see people talking about the problems they are solving before pontificating about solutions.

[+] mjs|15 years ago|reply
This is kinda off the point, but I'm surprised at the spelling and punctuation mistakes in that mail--generally high-profile hackers have excellent written English.

i.e. "but their isn't", "Do we need module's at all?", "do suggest alternative syntax's here."

[+] mishmash|15 years ago|reply
> spelling and punctuation mistakes

> excellent written English

Erlang's syntax, naming, and abbreviation conventions are straight schizophrenic.

Keywords, directives, method/module names, variables, and arguments are randomly spelled out or shortened, with underscores or CamelCase if you're lucky. There are some familiar C-style conventions, but they aren't widespread, and directory structure seems to be highly project dependent. The list goes on.

It's a very "cluttered" language and the lack of a strong proficiency in written English as you mention clearly shows (to me) in the language/framework.

[+] jsmcgd|15 years ago|reply
He may be dyslexic or he may just have rushed this to print and to hell with the typos.
[+] cbr|15 years ago|reply
I'm not sure his native language is English; he may be Swedish.
[+] malkia|15 years ago|reply
Emacs uses a single namespace, and people just contribute .el scripts here and there; put in the right place, they get loaded automatically. (I'm just a user of these .el scripts, mainly for Lisp/Lua development.)

Also C: a single namespace, and though it is sometimes very verbose, you can google a name and find a result (this has saved me many times when looking up the Win32 API, GTK, Cairo, the Lua API, etc.).

But what about data? Static data, vars, etc. Also, some languages/systems have initialization/deinitialization of the module (register/unregister, etc.).

But in general I like the idea. I had thought about it before, and now I'm thinking about it even more.

He talks about putting them in a database. Well, each database would have to have a name; maybe that's the name of the package (and you can rename it), and you can merge. And if the DB is, say, SQL, you can even handle merging databases far better than with "ar, lib (msvc), ranlib, etc." or whatever the language/runtime provides.

[+] abecedarius|15 years ago|reply
Erlang modules hold nothing but functions. In most popular languages there can be mutable data, too, either public or private; this extra complication is what would tip it over from interesting to silly in my mind. For Erlang it strikes me as worth exploring, and then maybe the result could be generalized to other languages.

He considers private implementation-only functions using letrec, but what about multiple public functions sharing the same privates? You can model them as an object (or rather a nullary function returning a tuple or some other representation of a table, since Erlang modules have functions only). Hurrah, we've reified modules, oh well.
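
The "nullary function returning a table" idea might look like this in Python. The names `make_counter_api`, `clamp`, `inc`, and `dec` are made up for the sketch, and the functions are kept pure to stay close to the Erlang setting.

```python
def make_counter_api():
    """A nullary function that returns a table of public functions."""
    # Private helper shared by both public functions; unreachable from outside.
    def clamp(n):
        return max(0, n)

    def inc(n):
        return clamp(n + 1)

    def dec(n):
        return clamp(n - 1)

    # The returned table is, in effect, a reified module.
    return {"inc": inc, "dec": dec}

api = make_counter_api()
print(api["inc"](0), api["dec"](0))  # 1 0
```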

I'll bet these points are addressed in the long followup thread.

(Obligatory mention of Zooko's triangle: http://www.zooko.com/distnames.html )

[+] Xurinos|15 years ago|reply
This issue of where to place functions is something I have been curious about, too.

In the Lisp community, what are the standards for where to store your generic methods as opposed to defined classes they work with? Rephrased, if I define classes Foo and Bar, and I write generic method foobar (accepts as params instances of Foo and Bar), where do I put foobar?

In the C++ world, where should I put my friend functions that suffer a similar lack of obvious home?

I have often seen solutions where some package/class is chosen arbitrarily as the "proper" home for these cross-class communicators, but I have long felt like this is a compromise rather than good organization. And yes, I recognize that there are at least two kinds of organizations: In what file is my code? In what namespace is my code? I am concerned with the namespace aspect.

[+] NickM|15 years ago|reply
Erlang's module system isn't just a namespace mechanism; it really lends itself well to stuff like gen_server where you need to provide a set of callbacks. I'm surprised he didn't even touch on this in his post, as it definitely seems like one of Erlang's strengths to me....
[+] pumpmylemma|15 years ago|reply
Joe Armstrong is a really interesting guy. I think he was my favorite interview in Coders at Work.
[+] bartonfink|15 years ago|reply
Armstrong doesn't seem to be saying anything about one of the more damning problems he points out - encapsulation. All functions in a module are available to each other, but hidden from code not in that module.

His solution of simply storing all code in a key-value DB only makes this worse in exchange for the dubious benefit of removing the question "where do I put this new function?". Since he makes a big deal of encapsulation, his solution seems questionable.

EDIT - Please don't upvote this. I'm wrong in an incredibly silly way, and my wrongness doesn't need to be rewarded. If you feel generous, upvote ericflo instead for pointing out just how badly I missed the point.

[+] ericflo|15 years ago|reply
Actually he addresses that and proposes syntax to address it.

    let fib = fun(N) -> fib(N, 1, 0) end
    in
       fib(N, A, B) when N < 2 -> A;
       fib(N, A, B) -> fib(N-1, A+B, A).
    end.
Where everything in the "in" block would be hidden.
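
The proposed syntax above is not valid Erlang today. As a rough analogue, the same hiding can be sketched with a closure in Python; `make_fib` and `fib3` are names invented for this sketch, and the tail recursion is written as a loop.

```python
def make_fib():
    # Hidden helper, playing the role of fib/3 in the "in" block.
    def fib3(n, a, b):
        while n >= 2:
            n, a, b = n - 1, a + b, a
        return a

    # The only name that escapes, playing the role of fib/1.
    return lambda n: fib3(n, 1, 0)

fib = make_fib()
print(fib(10))  # 55
```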
[+] pohl|15 years ago|reply
Perhaps I misunderstand, but couldn't one just as easily provide package scoping in a name/value store if the key was a hierarchical dotted-identifier?

Edit: I think I just answered my own question...the question of what exactly that dotted-identifier should be has just as much burden as the current "where do I put this" question.

[+] vilya|15 years ago|reply
It seems that this might make code easier to write (the first time around) at the expense of making it more difficult to read. That's totally counter to the prevailing wisdom.

I'd also be curious to know what metadata the functions could be tagged with to make them sufficiently easy to find. The module and project that a function or type belongs to provides a lot of contextual information about it; coming up with metadata which captures that information without simply duplicating it (and therefore reinventing modules under another guise) sounds fairly non-trivial.