
Why do we need modules at all? (2011)

151 points | matthews2 | 11 months ago | groups.google.com

88 comments


ludston|11 months ago

We need modules so that my search results aren't cluttered with contamination from code that is optimised to be found rather than designed to solve my specific problem.

We need them so that we can find all functions that are core to a given purpose, and have been written with consideration of their performance and a unified purpose, rather than also finding a grab bag of everybody's crappy utilities that weren't designed to scale for my use case.

We need them so that people don't have to have 80 character long function names prefixed with Hungarian notation for every distinct domain that shares the same words with different meanings.

sweezyjeezy|11 months ago

I agree, but also agree with the author's statement "It's very difficult to decide which module to put an individual function in".

Quite often coders optimise for searchability, so like there will be a constants file, a dataclasses file, a "reader"s file, a "writer"s file etc etc. This is great if you are trying to hunt down a single module or line of code quickly. But it can become absolute misery to actually read the 'flow' of the codebase, because every file has a million dependencies, and the logic jumps in and out of each file for a few lines at a time. I'm a big fan of the "proximity principle" [1] for this reason - don't divide code to optimise 'searchability', put things together that actually depend on each other, as they will also need to be read / modified together.

[1] https://kula.blog/posts/proximity_principle/

taeric|11 months ago

We could get that without a hierarchical categorization of code, though?

Makes me wonder what it would look like if you gave "topics" to code as you wrote it. Which topics would you choose? And how many functions would you have that are part of several topics?

bob1029|11 months ago

I feel like you are arguing more for namespaces than modules.

Having a hierarchical naming system that spans everything makes it largely irrelevant how the functions themselves are physically organized. This also provides a pattern for disambiguating similar products by way of prefixing the real world FQDNs of each enterprise.

AtlasBarfed|11 months ago

The article references the true granularity issue (actually the function names need a version number as well, not sure in my scan of the article if it was mentioned).

Modules being collections of types and functions obviously increases coarseness. I'm not a fan of most import mechanisms because it leaves versioning and namespace versioning (if it has namespaces at all...) out, to be picked up poorly by build systems and dependency graph resolvers and that crap.

norman784|11 months ago

Don't forget about encapsulation, there's most likely a lot of functions that aren't relevant outside the module.

Gurkenglas|11 months ago

just deduce the domain from text similarity :o)

rdtsc|11 months ago

I miss Joe, he left us too early. He always had wild ideas like that. For a while he had this idea of a git + bittorrent hybrid he called gittorrent, only to find out someone had already taken the name. I think it was a bit of an extension of this universal functions idea.

If you expand some of the comments below, he and other members of the community at the time have a nice discussion about hierarchical namespace.

I particularly like his "flat beer and chips" comment:

https://groups.google.com/g/erlang-programming/c/LKLesmrss2k

---

> I'd like to know if there will be hierarchial modules in Erlang, because tree of packages is a rather good idea:

No it's not - this has been the subject of long and heated discussion and is why packages are NOT in Erlang - many people - myself included - dislike the idea of hierarchical namespaces. The dot in the name has no semantics it's just a separator. The name could equally well be encoders.mpg.erlyvideo or mpg.applications.erlvideo.encoder - there is no logical way to organise the package name and it does not scale -

erlyvideo.mpegts.encoder erlyvideo.rtp.encoder

But plain module namespace is also ok. It would be impossible for me to work with 30K LOC with plain function namespace.

The English language has a flat namespace.

I'd like a drink.alcoholic.beer with my food.unhealthy.hamburger and my food.unhealthy.national.french.fries

I have no problem with flat beer and chips.

/Joe

---

hinkley|11 months ago

Software development is continually emotionally stunted by a lack of people with expertise in multiple other fields.

English absolutely has namespaces. Every in-group has shibboleths and/or jargon, words that mark membership in the group that have connotations beyond the many dictionary definitions of that word (in fact I wonder how many words with more than three definitions started out as jargon/slang words that achieved general acceptance).

You cannot correctly parse a sentence without the context in which it was written. It's a literary device some authors use: by letting the reader assume one interpretation of a prophetic sentence early on, the surprise the reader experiences when they discover a different interpretation at the end intensifies the effect.

twic|11 months ago

> The dot in the name has no semantics it's just a separator.

That's not true of all module systems. It's true in Java, but not in Rust, where it establishes a parent-child relationship, and in which context [1]:

> If an item is private, it may be accessed by the current module and its descendants.

[1] https://doc.rust-lang.org/reference/visibility-and-privacy.h...

auggierose|11 months ago

but we do have alcoholic beer, and non-alcoholic beer, and it is nice to be able to say which one you want. And yes, there is a separator here, too, it is called a space.

neongreen|11 months ago

> database of functions

This is exactly what Unison (https://www.unison-lang.org/) does. It’s kinda neat. Renaming identifiers is free. Uh… probably something else is neat (I haven’t used Unison irl)

brabel|11 months ago

A lot of things are neat because of this. Refactoring becomes trivial and safe. If you do not change the type of the refactored function, you can safely do a batch replace, and everywhere the old function was used, the new one will be used after that. If you do change the type, the compiler interface will guide you through an interactive flow where you have to handle the change everywhere the function was being used. You can stop in the middle and continue later... and once you're done you just commit and push... all the while the code continues to work. Even cooler, perhaps: no unit test is re-run if not affected. And given that the compiler knows the full AST of everything, it knows exactly when a test must run again.

jweir|11 months ago

I tried it out. Fascinating language and a completely different paradigm. The language itself is familiar, but the structure of the program is different: no files; all functions live in a database, along with their history. I found the language a bit difficult to navigate, but that is probably because of my experience of working with files, and having tools based on files.

anonzzzies|11 months ago

Makes me think of Unison [0]. I never used it but I found it interesting to read about.

[0] https://www.unison-lang.org

jonnycat|11 months ago

This is one of those things where I don’t agree with the argument, but know the person making it knows way more than I do on the subject and has given it way more thought. In these cases it’s usually best to sit back and listen a bit...

GrantMoyer|11 months ago

I think Hoogle[1] is proof this concept could work. Haskell has modules, of course, but even if it didn't, Hoogle would keep it still pretty usable.

The important piece here, which is mentioned but not much emphasized in TFA, is that Hoogle lets you search by metadata instead of just by name. If a function takes the type I have and transforms it to the type I want, and the docs say it does what I want, I don't really care what module or package it's from. In fact, that's often how I use Hoogle: finding the function I need across all Stack packages.

That said, while I think it could work, I'm not convinced it'd have any benefit over the status quo in practice.

[1]: https://hoogle.haskell.org/
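That type-directed search can be sketched in miniature with Python's type hints. This is a toy registry, not Hoogle's actual machinery, and all function names here are made up:

```python
from typing import get_type_hints

# Toy candidate functions; the names are made up for this sketch.
def parse_int(s: str) -> int:
    """Parse a decimal string into an integer."""
    return int(s)

def shout(s: str) -> str:
    """Upper-case a string."""
    return s.upper()

def search_by_type(funcs, arg_type, ret_type):
    """Return single-argument functions whose annotations match arg_type -> ret_type."""
    matches = []
    for f in funcs:
        hints = dict(get_type_hints(f))
        ret = hints.pop("return", None)
        if list(hints.values()) == [arg_type] and ret is ret_type:
            matches.append(f)
    return matches

# "I have a str and want an int" finds parse_int, regardless of module:
found = search_by_type([parse_int, shout], str, int)  # [parse_int]
```

The search key is the signature, not the module path, which is exactly why the module a function lives in stops mattering for discovery.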

kibwen|11 months ago

Hoogle works because of how richly-typed Haskell is, but Erlang is dynamically-typed.

VMG|11 months ago

Functions are not isolated values.

They are nodes in a graph, where the other nodes are the input types, output types and other functions.

It makes sense to cluster closely associated nodes, hence modules.

Etheryte|11 months ago

As with other similar proposals, doesn't this simply move the complexity around without changing anything else? Now instead of looking for the right module or whatnot, you'll be sifting through billions of function definitions, trying to find the very specific one that does what you need, buried between countless almost but not quite similar functions.

jasode|11 months ago

If there are no modules but a "flat" global namespace which requires every function name to be unique to avoid collisions... it means people would inevitably re-invent pseudo/fake "modules" and hierarchy in metadata tags in large non-trivial codebases.

Consider a function name: log()

Is it a function to log an event for audit history?

Or is it a function to get the mathematical natural logarithm of a number?

The global namespace forces the functions to be named differently (maybe using an underscore '_'): "audit_log()" and "math_log()". With modules, the names would be isolated by colons "::" or a period '.': Audit.log() and Math.log(). Audit and Math are isolated namespaces. You still have potential global namespace collisions, but they happen at the higher level of module names instead of the leaf function names. Coordinating naming at the level of modules to avoid conflicts is much less frequent and more manageable.

The same issue arises in OS file systems when someone proposes no folders/directories, only a flat global namespace with metadata tags. The filenames themselves would have embedded substrings with underscores to recreate fake folder names. People would reinvent hierarchy in tag names with concatenated substrings like "tag:docs_taxes_archive" to recreate pseudo folders/directories of "/docs/taxes/archive". Yes, some users could deliberately avoid hierarchies and only name tags at one level, such as "docs", "taxes", "archive"... but that creates new organizational problems, because some have "work docs" vs "personal docs"... which gravitates towards a hierarchical organization again.
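The log() collision above can be sketched in Python, contrasting the flat names with the namespaced ones. The classes here just stand in for modules:

```python
import math

# Flat global namespace: every function name must disambiguate itself.
def audit_log(event: str) -> str:
    """Record an event for audit history (hypothetical)."""
    return f"AUDIT: {event}"

def math_log(x: float) -> float:
    """Natural logarithm of x."""
    return math.log(x)

# With modules, the prefix moves out of the name and into the namespace.
# Plain classes stand in for modules in this sketch:
class Audit:
    @staticmethod
    def log(event: str) -> str:
        return f"AUDIT: {event}"

class Math:
    @staticmethod
    def log(x: float) -> float:
        return math.log(x)
```

Either way the disambiguating information exists; the question is only whether it lives inside the function name or one level up, where collisions are rarer and easier to coordinate.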

john2x|11 months ago

This is what Emacs Lisp has, and what indeed does happen with libraries

skydhash|11 months ago

The same thing happens to me with Bear.app (note taking). It only has tags, and the first thing I believe everyone does is go with a hierarchical structure again, because you need some tag, but also an additional specifier, which helps with grouping and location. (And Bear.app has support for that naming scheme and displays it as a tree.)

bionhoward|11 months ago

IMHO, aren’t modules necessary for big projects to limit the amount of complexity we have to deal with at any one time?

Our minds can (allegedly) only handle 7+/-2 concepts in working memory at once. Your whole codebase has way more than that, right? But one module could easily fit in that range.

TOGoS|11 months ago

Unison of course works this way, as has been mentioned.

I like Deno for similar reason. It's a coarser level of granularity, and not explicitly content-addressed, but you can import specific versions of modules that are ostensibly immutable, and if you want, you could do single-function modules.

I like the idea so much that I'm now kind of put off by any language/runtime that requires users of my app/library to do a separate 'package install' step. Python being the most egregious, but even languages that I am otherwise interested in, like Racket, I avoid because "I want imports to be unambiguous and automatically downloaded."

Having a one-step way to run a program where all dependencies are completely unambiguous might be my #1 requirement for programming languages. I am weird.

One reason not to do things this way is if you want to be able to upgrade some library independently of other components that depend on it, but "that's what dependency injection is for". i.e. have your library take the other library as an argument, with the types/APIs being in a separate one. TypeScript's type system in particular makes this work very easily. I have done this in Deno projects to great effect. From what I've heard from Rich Hickey[1] the pattern should also work well in Clojure

[1] something something union types being superior to what you might call 'sum types'; can't find the link right now. I think this causes some trouble in functional languages where instead of something being A|B it has to be a C, where C = C A | C B. In the former case an A is a valid A|B, but a C A is not an A, so you can't expand the set of values a function takes without breaking the API. Basically what union types require is that every value in the language extends some universal tagged type; if you need to add a tag to your union then it won't work.

bjourne|11 months ago

This is an all-time classic, but, sadly, most HN commenters just don't "get it". Perhaps because they have no experience with the Erlang VM, so they don't understand Joe's premises. The Erlang VM is best described as a dynamic process manager, and a "function" is just a callstack template. You want to fix a bug in your executing function without stopping it? Sure, no problem. Just reload the function and have the VM seamlessly upgrade the callstack to the new template. Since data is immutable, it mostly just works. Now, since functions form the basic unit of work in Erlang, modules are kind of irrelevant. Recompiling a module is the same as recompiling every function in the module. Hence, what use does the abstraction serve? The proliferation of "utils" or "misc" modules in not only Erlang but many other languages supports his point.

Btw, the more experienced I've gotten the more I've found that organizing code is mostly pointless. A 5000-line source file (e.g., module) isn't necessarily worse than five 1000-line files.
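The per-function reload described above can be imitated, palely, outside Erlang by dispatching every call through a registry, so that re-registering a name takes effect on the next call without restarting anything. (Erlang goes much further and upgrades live call stacks; all names in this sketch are hypothetical.)

```python
# Registry mapping flat global names to their current implementation.
REGISTRY = {}

def defn(name):
    """Register (or re-register) a function under a flat global name."""
    def wrap(f):
        REGISTRY[name] = f
        return f
    return wrap

def call(name, *args):
    """Late-bound call: look the function up at call time, not import time."""
    return REGISTRY[name](*args)

@defn("greet")
def greet_v1(who):
    return f"helo {who}"   # deliberate bug

# "Fix the bug in the running system" by re-registering the name:
@defn("greet")
def greet_v2(who):
    return f"hello {who}"
```

After the second registration, `call("greet", ...)` picks up the fix; nothing that holds a direct reference to `greet_v1` is affected, which is where this sketch falls short of what the Erlang VM actually does.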

skydhash|11 months ago

It's all related to naming. You can refer to a symbol as auth/guard/token/authenticate or auth_guard_token_authenticate, and sometimes what matters is just the number of characters you type. Also, you can have encapsulation with the first option.

Smalltalk has the same live experience, but does have modules, because they make editing easier and encapsulation is nice for readability and clarity.

igouy|11 months ago

Is a 5,000-line function worse than 500 10-line functions?

(Locality of reference.)

jiggawatts|11 months ago

I had a vaguely similar notion of a global proof database. Picture something like a blockchain (actually a "blockgraph") of Lean theorems built up from other theorems and axioms also on the same distributed global data structure.

A use-case could be optimising compilers. These need to search for alternative (faster) series of statements that are provably equivalent to the original given some axioms about the behaviour of the underlying machine code and basic boolean algebra and integer mathematics.

This could be monetised: Theorems along the shortest path from a desired proof to the axioms are rewarded. New theorems can be added by anyone at any time, but would generate zero income unless they improve the state-of-the-art. Shortest-path searches through the data structure would remain efficient because of this incentive.

Client tools such as compilers could come with monthly subscriptions and/or some other mechanism for payments, possibly reusing some existing crypto coin. These tools advertise desired proofs -- just like how blockchain clients advertise transactions they like to complete along with a fee -- and then the community can find new theorems to reach those proofs, hoping not just for the one-time payment, but the ongoing reward if the theorems are general and reusable for other purposes.

Imagine you're a FAANG and there's some core algorithm that uses 1% of your global compute. You could advertise a desire to improve the algorithm or the assembly code to be twice as efficient for $1M. Almost certainly, this is worth it. If no proof turns up, there's no payment. If a proof does turn up, a smart contract debits the FAANG's crypto account and they receive the chain of theorems proving that there's a more efficient algorithm, which will save them millions of USD in infrastructure costs. Maths geeks, AI bots, and whomever else contributed to the proof get their share of the $1M prize.

It's like... Uber for Fields medals, used for industrial computing.

Fully automated gig work for computer scientists and mathematicians.

LegionMammal978|11 months ago

The Metamath Proof Explorer (AKA the set.mm database) works on a similar principle, of all theorems forming a tree of backreferences that ultimately lead to the axioms [0].

Though it wouldn't make sense to build something like that on top of such a fast-moving, complex, and bug-prone target as Lean.

[0] https://us.metamath.org/mpeuni/mmset.html

rapjr9|11 months ago

Code reuse used to be thought of as the ultimate destiny for computer code: eventually you would never have to write new code, because it would all exist in some form and you would just reuse it. We may actually be getting something like that now through the use of AI for coding.

The problems with code reuse are many, though. A function written in C for a microcontroller often won't work in a Python program, especially not without all the include files, libraries, and a wrapper. How do you find the function that matches your needs? Coding language and function parameters offer some clues but are not sufficient. For example, one function may round its return values while another does not. Do you need a function optimized for speed, for object size, for time latency? Will the function work in your computing environment? For example, one function might use a loop, another might use recursion, so running on an MCU one might work and the other might crash your stack.

This might be a set of problems an LLM could sort out, but it would need some most excellent training data, not random stuff off the web. Even existing, working, industrial code bases would not be sufficient, since there are huge sections of code in them that are legacy and mostly no longer used or relevant. It would probably have to be a constantly curated training database. And there is an inherent ambiguity in code: who is to say one implementation is better than another if they both work? This could make debugging challenging if the coding style is different in all the code snippets.

lorentzoh|11 months ago

I sometimes want to save a useful python script, or a function, in github, but don't want to create a whole repository with README, name, description and so on. This model would be nice for that. But then the access control system (e.g. private/public repos) would be completely different. Perhaps you could have groups of users, and give group access to individual objects (classes/functions) in your repo. But giving access to each object one by one would probably be tedious, so you need some kind of modules, or some other way of grouping objects (maybe tags?).

I also thought of tags on one of my jobs as DevOps. We had a lot of types of repos that we needed to organize and enforce access control on:

- code repos like backend and frontend
- infrastructure repos with k8s manifests, helm charts etc.
- a lot of microservice/microfrontend repos

The team lead was trying to come up with a hierarchy for all this, but struggled. If github allowed for a flat namespace of repos, and tags-based access control, we could create tags like 'frontend', 'devops', 'backend', and give the tags to people and repos.

immibis|11 months ago

I think they already tried this "single flat key/value namespace of all functions" in the JavaScript ecosystem - it was called npm. It was a mess when someone claimed a function based on a trademark and someone else deleted the function for padding a string to a certain length with spaces on the left in retaliation.

layer8|11 months ago

Aside from grouping functions together that work together, for example working with data types/structures also defined in or by the module, modules also serve the purpose to hide implementation-detail code (“private” functions) shared between those functions. Modules provide a form of information hiding.

Furthermore, modules are the unit of versioning. While one could version each individual function separately, that would make managing the dependency graph with version compatibility considerably more complex.

There is the adage “version together what changes together”. That carries over to modules: “group together in a module what changes together”. And typically things change together that are designed together.

Namespaces are an orthogonal issue. You can have modules without namespaces, and namespaces without modules.

friendzis|11 months ago

Global namespace clobbering has huge implications. With modules/namespaces you have a well defined and limited blast radius: a change is limited to a module and calling code.

Now, imagine your environment of choice supported dynamic runtime loading of code where the code is just dropped to the global namespace. This screams "insecure" and "how do I know if I call the code I want to call?".

Now imagine the only mitigating mechanism was `include_once`. It would make sense that software written in this environment required its own CVE namespace, as new security vulns are discovered every second.

jghn|11 months ago

What he wound up arguing for was that everything would have a globally unique name.

kazinator|11 months ago

Modules are often about a single function. But the function has helper functions that are not part of the API. Then maybe some functions that are part of the API which help use that function. Sometimes functions beget families.

If a function is built on several helper functions, it may be that those same helper functions can also be used to make other, related things which round out the functionality area. Perhaps they provide an API that's easier to use for different scenarios or whatever.

MrBuddyCasino|11 months ago

We need modules because they demarcate social units of collaboration.

greener_grass|11 months ago

This could be achieved with a hierarchical namespacing scheme for functions, no?

    universe.mega_corp.finance_dept.team_alpha.foo
But to use `universe.mega_corp.finance_dept.team_alpha.foo` in your application, you don't import a module, just the function `foo`.

Who controls what goes into the namespace `universe.mega_corp.finance_dept.team_alpha`? That would be Team Alpha in the Finance Department of Mega Corp.

I guess this is like tree-shaking by default.

gatinsama|11 months ago

"Namespaces are one honking great idea -- let's do more of those!" Zen of Python

nsonha|11 months ago

This is the same as "why do we need inheritance", although less obvious. Grouping things is a human instinct, but it may not map nicely to reality, where everything belongs to more than one category.

wruza|11 months ago

That would be too useful. Imagine adding tags to functions and generally treat them as "items" which you can multi-categorize, search through, select, etc like with any dataset. Way too advanced.
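The tags-as-metadata idea reads as sarcasm, but it sketches easily: a decorator attaches arbitrary tags to functions, and a query treats them like rows in a dataset. All names here are hypothetical:

```python
# Registry of tags attached to functions; a function can carry many tags.
TAGS = {}

def tag(*labels):
    """Decorator that attaches arbitrary tags to a function."""
    def wrap(f):
        TAGS[f] = set(labels)
        return f
    return wrap

@tag("math", "pure")
def square(x):
    return x * x

@tag("io", "logging")
def log_line(msg):
    return f"[log] {msg}"

def find(label):
    """Query the function 'dataset' by tag."""
    return [f for f, labels in TAGS.items() if label in labels]
```

Unlike a module hierarchy, nothing here forces a function into exactly one category, which is the point being made about grouping.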

porkbrain|11 months ago

1. Have a global append-only function key-value store.

2. A key of a function is something like `keccak256(function's signature + docstring)`

3. A value is a list of the function's implementations (the index being the implementation's version) and some other useful metadata, such as the contributor's signature and preferred function name. (The compiler emits a warning, which needs to be explicitly silenced, if the preferred name is not used.)

4. IDE hints and the developer confirms to auto import the function from the global KV store.

5. Import hash can be prepended with a signers name that's defined in some config file. This makes it obvious in git diffs if a function changes its author. Additionally, the compiler only accepts a short hash in import statements if used with a signer.

package.toml

  [signers]
  mojmir = "mojmir's pubkey"
  radislava = "radislava's pubkey"

source.file

  // use publisher and short hash
  import "mojmir@51973ec9d4c1929b@1" as log_v1;
  // or full hash
  import "51973ec9d4c1929bdd5b149c064d46aee47e92a7e2bb5f7a20c7b9cfb0d13b39" as log_latest;
  import "radislava@c81915ad12f36c33" as ln;

  log_v1("Hello");
  log_latest(ln(0));
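Steps 1-4 above can be sketched as a toy in Python. sha256 stands in for keccak256 (keccak256 isn't in Python's stdlib), a dict stands in for the global append-only store, and everything here, names included, is hypothetical:

```python
import hashlib

# Global append-only store: content-derived key -> versions + metadata.
STORE = {}

def publish(signature: str, docstring: str, impl, author: str, name: str) -> str:
    """Append an implementation version under the content-derived key."""
    key = hashlib.sha256((signature + docstring).encode()).hexdigest()
    entry = STORE.setdefault(key, {"author": author, "name": name, "versions": []})
    entry["versions"].append(impl)
    return key

def import_fn(key: str, version: int = -1):
    """Resolve a key (and optional version index) to an implementation."""
    return STORE[key]["versions"][version]

key = publish("log(str)", "Write a line.", lambda s: f"v1:{s}",
              author="mojmir", name="log")
publish("log(str)", "Write a line.", lambda s: f"v2:{s}",
        author="mojmir", name="log")

log_v1 = import_fn(key, 0)      # pinned version, like mojmir@...@1 above
log_latest = import_fn(key)     # latest implementation
```

Because the key is derived from signature plus docstring, republishing a fixed implementation appends a version under the same key rather than minting a new identity, which is what makes the pinned and latest imports in the sketch diverge safely.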

Philpax|11 months ago

You've just invented Unison :)

andrewcl|11 months ago

Hard to see a world without modules as a means of compartmentalization, for various reasons. But you do have to appreciate the exercise of imagining what a world without them looks like, and the implications.

DarkNova6|11 months ago

Should have put on the (2011) label

unwind|11 months ago

Meta: this needs "(2011)" in the title, please.