top | item 30898803

A project with a single 11,000-line code file

498 points | todsacerdoti | 4 years ago | austinhenley.com | reply

329 comments

[+] userbinator|4 years ago|reply
I remember many years ago coming across a reimplementation of the server side for a popular MMORPG of the time, reverse-engineered from the client (which was Flash) by what was likely a teenager --- it was over 100k lines in a single file, written in Visual Basic. Global variables everywhere, short names, and not even indentation. All the account data was stored in flat files, there was no actual DB. No "best practices" at all. Yet, not surprisingly, it worked pretty well and was actually not difficult to modify --- Ctrl+F would easily get you to the right place in the code.

I guess the moral is, never underestimate what determination and creativity can do, and always be skeptical when someone says there's only one best way to do something.

[+] sillysaurusx|4 years ago|reply
Ironically, this is a description of hacker news itself. https://github.com/shawwn/arc/blob/arc3.1/news.arc

(HN has indentation, though.)

It’s important to realize that this is good design. It’s hard to separate yourself from the time you live in, but the rewards are worthwhile.

[+] gofreddygo|4 years ago|reply
Funnily enough, I recently had great success reversing the "best practices" of a distributed "microservices" application into a single big Java file.

The best practices were the usual suspects: DRY, IoC, SQL + NoSQL, separation of concerns, config files over code, composition over inheritance, inexplicable overlapping annotations, dozens of oversimplified components doing their own thing, and some $something_someone_read_on_a_medium_post.

The Single Java File was around 500 lines: no DB, lots of globals, a dozen or so classes and some interfaces, threads simulating event-based concurrency, and generous use of Java queues and stacks, though I deliberately kept it static, with zero dynamic hashmaps.

It actually runs in my IDE. I can understand what the hell the product is supposed to do and which component is doing more than it should, and, most valuably, I can predict what would break if I changed that value in the Helm chart from 5.0 to 5.1.

It is quite useful and pleasing: I can actually reason about things, and I have a newfound appreciation for type systems and compile errors. And I can write tests that run in under 3 seconds.

[+] formerly_proven|4 years ago|reply
It always seemed surprising to me how some of the big Oblivion and Skyrim mods got by with fairly few bugs, despite there being no way to write automated tests and some of them having 10k lines of scripting (much more in some cases) spread across dozens or hundreds of quests. (Quests in the CE engine are not just the quests you see as a player; there is also a huge number of invisible quests, because quest state machines and their associated scripting are how scripting works.)
[+] wefarrell|4 years ago|reply
You can get away with a lot on a single-developer project, and best practices aren't in place solely to make code functional.

That application would likely fall apart if multiple developers with diverse backgrounds had to maintain it and add new features.

[+] yoursunny|4 years ago|reply
Circa 2006, I was in college and got hired to write a webapp for a college department. I didn't know JavaScript could have classes and capture variables, so I built the app entirely with global variables and plain functions, combined with `eval`. It was over 2000 lines, and nobody after me could understand it.
[+] rezonant|4 years ago|reply
I suspect the game was Runescape. My brother used to be a fan of these custom servers.
[+] swayvil|4 years ago|reply
Ah, that takes me back. Commodore 64 freeware games written in Basic.

Ya, you could just go in there and mess with the code all over the place.

[+] weq|4 years ago|reply
me when i look back at the code i pumped out as a 15yr old writing a java servlet web app that could admin a quake1 tf game server
[+] cookiengineer|4 years ago|reply
Back in the days at Zynga, there was this ritual that new members of the STG (Shared Tech Group, which developed the game engine stack) had to try to refactor the road logic code.

Suffice it to say, it was a 28k-LOC file so bad that it could even hold up in court as evidence that a South American company had stolen the code of Zynga's -ville games. We could reproduce each and every bug and its effects 1:1 in their games, with all the crash scenarios that were easy to reproduce, hard to debug, and almost impossible to fix.

Once you dig into the hole of depth sorting and being smart by "just slicing" everything into squared ground tiles on the fly, there's no way out of that spaghetti code anymore.

Fun times. It was always a joy seeing people give up when faced with a single code file. The first step to enlightenment was always resignation :)

[+] kbrannigan|4 years ago|reply
I was so afraid of writing bad, spaghetti code that I ended up writing no code at all.

This thread made me realize that it's better to have a working profitable project with bad code, than a perfect unfinished project, with meticulously chosen design patterns.

Afraid of being judged for bad code, I could not start until I had the right architecture.

I'm glad I read this.

This is developer therapy.

[+] khazhoux|4 years ago|reply
I've sadly come to realize (after witnessing on many projects) that there's a pattern that goes like this:

* Team A writes code quickly. Not bad code, really, but they take shortcuts everywhere they can. They don't have the strongest tests, they don't generalize for all the known use cases, etc. Their code goes to beta and gets users and makes progress.

* Team B deliberates and deliberates. They try to avoid taking shortcuts. But in the end, even their code doesn't have the strongest tests, doesn't generalize for all the known use cases, etc. Team B never gets users or gains momentum, and their code+architecture was probably no better than Team A -- they just took 3x the time to get there.

[+] squeaky-clean|4 years ago|reply
The player controller in Celeste is a single 5600-line file that includes things like systems used only in the tutorial. I honestly don't think it's as bad as some of the criticism it received when the code was released makes it seem, but it certainly could be better-looking code.

But ultimately, Maddy Thorson isn't selling a block of code. They're selling a game and it has extremely satisfying control of the character. And that's all that really matters for a player controller.

Maybe better organization and design patterns could have made it faster to develop? I don't believe they would have.

But the type of product does matter here. Celeste had 2 programmers, so a lot of the things necessary for a team of 100 devs would just be harmful. If you're making a library or framework to be used and modded by others, architecture matters a lot more. If you're designing an enterprise application that you know will need business-logic customizations for 25 business customers, it matters more. It's all about knowing the scope of your project. But until you start getting that many customers, maybe the unsustainable method is what lets you reach those first few sales quickly enough to stay in business long enough to be bitten by the technical debt.

[+] heywire|4 years ago|reply
I am part of a small team that maintains a legacy point-of-sale system still used by thousands of stores around the world. It started life as a DOS application written in C with some ASM bits, and has since accumulated some C++ and C#. There are functions over 5000 lines long, files over 50,000. Globals all over the place. It can be a challenge sometimes, but after almost 30 years, it still brings in millions of dollars a year in maintenance and enhancements, and still processes millions of transactions for those retailers.
[+] wonderwonder|4 years ago|reply
Getting something working and out there is 90% of the battle, especially on small or single-person teams. I wrote a SaaS PHP app with vanilla HTML and JS that ran without issue for 8.5 years for a Fortune 1000 company. About twice a year I would return to it to add or modify a feature, and I had no idea how a lot of it worked; there were even duplicated or redundant files I was too afraid to delete. It worked, though, and I got paid every month for a very long time. Sometimes delivering a product is all it takes, and getting trapped in delivering 'clean' code is just a blocker. Not often, but sometimes :)
[+] Oras|4 years ago|reply
Been there. I worked in a company where we had a codebase like the one mentioned in the article and over the years we started developing microservices with 100% code coverage.

With the shiny new services, it took much longer to identify bugs and add new features because of the complexity of the design and the endless interfaces.

[+] 2143|4 years ago|reply
+100

I'm so afraid of creating programs in languages that don't enforce any structure, even though I know how to write everything from scratch and make it work.

If it's some framework, then it'll already be structured somewhat.

In the rare event that I do create something with no frameworks, I make sure there aren't many global variables.

[+] knome|4 years ago|reply
You sound like you should read this: [removed]

apparently jwz decided not to be linked from here :/

there's an archive.org link below.

[+] leetcrew|4 years ago|reply
happy medium: write shitty code with strong API boundaries. at least the damage is localized and every so often you can go back and clean up or replace components independently. or more likely, just leave it that way and make more poorly implemented features that make money.
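A minimal sketch of the "strong API boundaries" idea (module and function names are hypothetical): callers only ever see the narrow public surface, so the shoddy internals can be cleaned up or replaced wholesale later without touching any caller.

```python
# orders.py -- hypothetical module. The only public surface is place_order();
# everything prefixed with "_" is free to be as ugly as it needs to be.

_inventory = {"widget": 3}  # messy module-private state, hidden behind the API


def place_order(item: str, qty: int) -> bool:
    """Public API: returns True if the order succeeded."""
    return _reserve(item, qty)


def _reserve(item: str, qty: int) -> bool:
    # Shortcut-ridden internals; callers never see this and it can be
    # rewritten later without breaking them.
    if _inventory.get(item, 0) >= qty:
        _inventory[item] -= qty
        return True
    return False
```

The damage stays localized: as long as `place_order` keeps its contract, `_reserve` and `_inventory` can be swapped for a real database on any quiet afternoon.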
[+] bpicolo|4 years ago|reply
> it's better to have a working profitable project with bad code, than a perfect unfinished project, with meticulously chosen design patterns.

A lot of businesses were built on PHP this way

[+] 29athrowaway|4 years ago|reply
Restaurant industry version:

    I was so afraid to cook in a dirty kitchen, I ended up not cooking at all.

    This thread made me realize that it's better to sell food prepared on dirty surfaces with unrefrigerated ingredients half-eaten by rodents and roaches, food that makes people sick, than fresh food prepared on clean surfaces with clean utensils.

    I'm glad I read this.

    This is a restaurant worker story.
Construction industry version:

    I was so afraid of not using the right construction materials and not building code-compliant structures, I ended up not building at all.

    This thread made me realize that it's better to sell houses with structural problems and low-quality materials, houses that will be unsafe to live in, than houses built according to code.

    I'm glad I read this.

    This is a builder story.
In any other industry, a person would go to jail for saying that. You won't, because luckily for you, software development is not a regulated activity, and people with your mindset can make a happy living outside of jail. But hopefully one day some types of neglect in software development become illegal.

"Perfect is the enemy of good" is no excuse for spaghetti code or 50,000-line files. It means that good is sometimes more practical than perfect. Spaghetti code is not good to begin with.

[+] shakna|4 years ago|reply
I learned that not all text editors load file data with careful underlying data structures when I tried to open a 67K-LOC COBOL file on a 32-bit system a while back. (Sidenote: COBOL has a 999,999-LOC hard limit in the compiler spec.)

So very many editors just couldn't open it.

Some would use so much memory that the system would either freeze, or the OS would kill them.

Some would silently truncate at 65,535 lines.

Some would produce a load error.

Some would pop up with an error indicating the developer thought it was an unreachable state. e.g. "If you're seeing this error... Call me. And tell me how the fuck you broke it."

Others would manage to open it but were completely unusable, with cursor movement taking literal minutes.

There were exactly three editors I found at the time that worked (none of them graphical). And they worked without any added latency, which tells you the developers had thought through what they were doing: vim, emacs, nano.

(A few details, since people are probably curious: the vast majority of that single-file project was taxation formulae. It was National Mutual's repository of all tax calculations for every jurisdiction they operated in, internationally, across the entire several-hundred-year history of the company. They had simply transcribed all their tax-calculation records into COBOL.)

[+] lordnacho|4 years ago|reply
No doubt dozens of devs will throw in their own 10k LOC story here, and yes it's painful to watch so many people having professional cramps over it.

But don't forget that society itself is governed by OOM-larger bodies of text with no referential integrity, no machine to tell you if they're inconsistent, and no way to test anything other than making humans write more text to each other and occasionally show up in court. The law itself (even parts of it like the tax code) and the regulations on various areas are a melange of text and cultural understandings between lawyers, judges, and government. We collect the data for this machine in the form of contracts and receipts, and it piles up in mountains.

As with code, it's not just legal professionals who have to deal with law. It spills into everyone's life, and there's nothing to do about it other than either guess what to do or pay a pro to tell you what to do.

[+] cormacrelf|4 years ago|reply
You are wrong to say there's no way to test anything. Imagine an enormous AI generating test cases for you constantly, in an adversarial fashion, with built-in rewards for advancing a more correct understanding of the text. Lawyers call this "testing", rightly so. If you are interested in efficiency or cost-effectiveness, it leaves a lot to be desired. But if you are interested in the internal integrity of the document, this is better than almost anything developers have.

I hate these words as I type them but the law is also "agile" (ugh). It gets modified as it's used. It does not need high-assurance machine-verified "referential integrity". In my entire course of studying the law I don't think I've seen a single legal dispute over a problem of referential integrity. Mistakes, especially drafting mistakes, are corrected on the fly pretty much everywhere they appear, and then they disappear. For a dev, using the wrong variable name in a bad language could mean you introduce a huge security vulnerability and massive loss of trust. (Or if you write smart contracts, $100M down the drain.) For lawyers, referring to the wrong section has essentially zero consequences. Nobody cares. Maybe you get a funny look from a senior.

Finally re the 10k LOC tangent that this is supposed to be connected to, I'm not really sure what you're complaining about. You get "10kLOC" cases, but you also get well-organised practice guides & bench books. Laws in statute are typically very well organised, in my experience about 5-10x better than the average codebase. Laws are organising large swaths of the sum total of human endeavour, just as code does. I would say developers are behind overall, which makes sense for a discipline that's less than a century old.

[+] Fwirt|4 years ago|reply
The payroll check printer for my employer was once a couple thousand lines that generated raw PCL to be sent to a LaserJet that used magnetic toner to produce checks that had a working MICR number. It was rendered into spaghetti by multiple GOTOs that jumped to helpful labels like "666", and calls into other helper programs to generate more PCL that did things like change fonts and draw graphics. Of course none of it was commented, so you had to have a copy of the PCL spec on hand to know what any of it did. It was the product of a retired cowboy that had also written the rest of our custom payroll system over a number of years.

I attacked it by printing out and taping together each program into "scrolls" and tracing control flow with highlighters and sharpies. Had them all taped up on my office wall so I could refactor the whole thing from scratch, coworkers found that entertaining. Got a much more readable replacement working nicely. Then a couple years later HR bought a new system and we stopped printing our own checks. I was not sorry to see the whole thing go.

[+] briantkelley|4 years ago|reply
I worked on Word for years. Office has thousands of files over 10,000 lines with, uh, various degrees of test coverage and comprehensibility. After some time and experience, your mental model of the architecture ends up being way more important than simple metrics on source code organization.

IMO, organizing source code in files seems archaic. E.g. tracing the history of a function moved across files can be tedious even with the best tools. I’d like to see more discussion around different types of source storage abstraction.

There are benefits of large source files... When compiling hundreds of thousands of files (like Office), the overhead of spawning a compiler process, re-parsing or deserializing headers, and orchestrating the build is non-trivial. Splitting large files into smaller ones could add hours of just overhead to the full build time.

[+] JoeAltmaier|4 years ago|reply
I've refactored monolithic code several times in my career. It starts with a thorough going-over, making notes, identifying the state machines and drawing what was handled and what was not.

Then, reimplement it as a simple state machine, but this time fill in all the transitions (event + state => new state + action).

One was an Infiniband code base from the vendor, where a 'computer scientist' had written several layers to do what one or two could accomplish. Another was the Windows CE DHCP client (which went from seconds to choose an address to milliseconds). Then there was an HDLC modem protocol; by the time I was done, it was several times faster and no longer crashed.

I can't understand such code by just reading it. I had to make a road map of all the states, events, actions, and interfaces; design the new code; then make sure every function of the 'old' code was represented in the new code, line by line, so nothing got dropped.

Satisfying. But more like turning the crank and making sausages than design or architecture.
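The transition-table approach described above can be sketched like this (the states, events, and actions are illustrative, not from any of those codebases): every (state, event) pair gets an explicit entry, so an unhandled transition fails loudly instead of silently.

```python
# A tiny event/state machine: (state, event) -> (new_state, action).
# Filling in every cell makes missed transitions explicit.

def log(msg):
    return msg  # stand-in action; real code would do I/O, send packets, etc.

TRANSITIONS = {
    ("idle",    "connect"):    ("waiting", lambda: log("sent request")),
    ("idle",    "disconnect"): ("idle",    lambda: log("already idle")),
    ("waiting", "ack"):        ("active",  lambda: log("link up")),
    ("waiting", "disconnect"): ("idle",    lambda: log("gave up")),
    ("active",  "connect"):    ("active",  lambda: log("already connected")),
    ("active",  "disconnect"): ("idle",    lambda: log("closed")),
}

def step(state, event):
    # A KeyError here means a transition was never thought through.
    new_state, action = TRANSITIONS[(state, event)]
    return new_state, action()

state = "idle"
state, _ = step(state, "connect")  # now "waiting"
state, _ = step(state, "ack")      # now "active"
```

The table is the road map: reviewing it row by row against the old code is exactly the "nothing got dropped" check described above.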

[+] holoduke|4 years ago|reply
Better one well-organized file than hundreds of folders, subfolders, files, and symlinks. I have worked on projects where even after 2 years I didn't grasp the folder structure and just used search to locate files.
[+] civilized|4 years ago|reply
People love to complain about things that are simple, fast, and easy to complain about, without regard to whether the complaint is insightful or useful. It's sort of the dark twin of bikeshedding.

If you divide the single 11k-line file into a thousand 11-line files, it may become objectively much harder to understand, but it'll also receive much less flak, guaranteed.

I suspect this is also why Architecture Astronaut-ery can be so successful within a company. If code is chock-full of superficial signs of order and craftsmanship, such as hierarchy, abstraction, and Design Patterns(TM), it takes a lot of mental effort to criticize it, and most people won't.

[+] thedanbob|4 years ago|reply
I once inherited a mission-critical PHP project which had no version control, no tests, and no development environment (all edits were made directly on the server). It used a custom framework of the original author’s own devising which made extensive use of global variables and iframes and mostly lived in several enormous PHP files. I was able to clean it up somewhat, but there was one particularly important file that was so dependent on global variables and runtime state that I never dared touch it.

When I was finally able to retire the project several years later, I first replaced the home page with this picture: http://2.bp.blogspot.com/-6OWKZqvvPh8/UjBJ6xPxwjI/AAAAAAAAOv...

[+] choletentent|4 years ago|reply
If done well, single-file projects are not bad. They save a lot of boilerplate code, and it's easier to find things, since everything is in the same file.

EDIT: I'll go even further. Programmers who don't like long files are probably using the scrollbar to navigate around the file. Vim saves me from that bad habit.

[+] activitypea|4 years ago|reply
I don't get the obsession with file length. What's the benefit of having 100 files with one 50-line function each, over having a single 5000-line file with 100 functions? Obviously not counting extreme cases where the file size would break some editors' buffers.
[+] perlgeek|4 years ago|reply
Usually (but not always) a single, huge file points towards missing structure, missing abstractions, missing boundaries that aid with understanding.

If it were a huge, single file, with very understandable modularity within that file, likely nobody would've bothered to write a blog post about it :-)

[+] deergomoo|4 years ago|reply
Personally I find it much more difficult to keep n places in one giant file in my head than I do n individual files.

We have a few multi-kloc legacy monsters where I work and I quite often completely lose my place when working on them (and, by association, my train of thought), even though they’re actually structured somewhat reasonably.

[+] enneff|4 years ago|reply
I think the problem in this case was that the entire file was a script that ran top to bottom. It's not so much that the file was big, but that the function was huge and impossible to reason about.

I agree that obsessing over file length is its own kind of anti-pattern. I have had colleagues who insist on putting every little thing in a different file, and that is its own special kind of hell.

[+] montenegrohugo|4 years ago|reply
Try debugging a single 10k loc file versus fifty small modules where each takes care of a distinct part of the logic.
[+] yk|4 years ago|reply
The most strangely maintainable code I have ever seen (though I should probably put "maintainable" in scare quotes) was an astrophysics code that calculated the changes to a spectrum during interactions with background fields. That thing had two long nested loops: the outer loop calculated local backgrounds, and the inner loop was basically an Euler solver that used the backgrounds from the outer loop.

The outer loop was something like 4 kLOC, and consisted of blocks where there were first 20 lines of loadTable(filename) calls, then a call to calculateLosses( <all the tables just loaded> ) and then freeTable( <just loaded tables> ) calls. The inner loop was a little bit of setup and then a very long part where all those losses would be subtracted from the spectra.

The funny thing was that once you got the structure, the code was actually not that bad. However, I told my boss several times that the second something came along that didn't fit that pattern exactly, the entire thing would blow up, and I was always told that they had maintained that code for 15 years and it hadn't happened yet.
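The shape described above, translated into a toy Python sketch (all names and numbers are invented): an outer loop of load/compute/free blocks feeding an inner Euler-style loop that subtracts the computed losses from the spectrum.

```python
# Toy version of the structure: outer loop = local backgrounds via
# load/calculate/free blocks; inner loop = Euler steps applying the losses.

def load_table(name):
    return {"loss": 0.01}              # stand-in for reading a data file

def calculate_losses(*tables):
    return sum(t["loss"] for t in tables)

def free_table(table):
    table.clear()                      # mimics the explicit freeTable() calls

spectrum = [1.0, 2.0, 3.0]
for region in range(3):                # outer loop: one block per background
    t1, t2 = load_table("a"), load_table("b")
    loss = calculate_losses(t1, t2)
    free_table(t1)
    free_table(t2)
    for _ in range(10):                # inner loop: Euler solver
        spectrum = [max(s - loss, 0.0) for s in spectrum]
```

As long as every physical effect fits the load/calculate/free template, copy-pasting another block "works"; the fragility appears the moment an effect needs state that outlives one block.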

[+] dudeinjapan|4 years ago|reply
Ten years back, when I was first starting my company, we wanted to build a phone IVR system with Twilio to book tables at restaurants. It was fairly complex; it had to track many different aspects of call state, including the client entering things like date/time/party size via push-dial, and call our internal APIs. I assigned a recent college grad to the task.

In a week, she came back and said "OK I've finished the prototype." I thought no way and I asked her to demo. Try X, try Y, try X + Y, etc. -- it all worked.

Then I looked at the code.

She had written the API handler as one gigantic function, presumably because Twilio gives you a single API callback on an incoming call. It was a maze of nested if-statements going 10+ levels deep, with subroutines relentlessly copy-pasted inline throughout. Then she had manually tested it by dialing the phone hundreds of times, putting in hacks throughout the if-tree.

Her prototype ended up being pretty easy to refactor and was ultimately the basis (at least logic-wise) of what we put into production.
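One common way such an if-tree gets refactored is to make the call state explicit and dispatch to one small handler per step. A sketch with invented step names and prompts; this is not Twilio's actual API:

```python
# Flat dispatch instead of nested ifs: one tiny handler per call step.
# Each handler records the digits and says which step comes next.

def ask_date(digits, booking):
    booking["date"] = digits
    return "time", "Enter a time"

def ask_time(digits, booking):
    booking["time"] = digits
    return "party", "Enter party size"

def ask_party(digits, booking):
    booking["party"] = int(digits)
    return "done", "Thank you, your table is booked"

HANDLERS = {"date": ask_date, "time": ask_time, "party": ask_party}

def handle_callback(step, digits, booking):
    """Single entry point (as a lone webhook would be), but flat inside."""
    return HANDLERS[step](digits, booking)

booking = {}
step, prompt = handle_callback("date", "0412", booking)
step, prompt = handle_callback(step, "1900", booking)
step, prompt = handle_callback(step, "4", booking)  # step is now "done"
```

The logic from the if-tree survives intact, which is presumably why the prototype was easy to refactor: each branch becomes one named handler.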

[+] lkrubner|4 years ago|reply
I've never written a file with 11,000 lines of code, but I have often built Clojure projects like this, with everything in one file. I think I might once have had a file with 4,000 lines of code. Maybe 5,000? A complete system might be 5 apps working together, each made of one large file. It does help with some things. Especially when on-boarding another programmer: if they don't know Clojure very well, using one file means they never get tripped up by namespaces; instead, they just open one file, load it into the REPL, and start working. I would not recommend this style for every project, but it does offer a kind of simplicity for the projects I work on.
[+] throwaway71271|4 years ago|reply
i used to work on 40-50k line files with one function and a bunch of gotos in perl, in a multi billion dollar company

its fine

you just binary search your way into it: put print "AAAA" in the middle, see if it's printed, then put it in the half of the half and try again.

emacs couldn't even find the closing bracket of the if condition (not the block, the condition..). have you ever seen if conditions (again, not the block) that span your whole screen?

its not as bad as you think, it made me realize we take code very seriously, but its actually ok, 10k line file 100k line file, whatever.. its all the same
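The "print in the middle" tactic is just a manual binary search over the file. Mechanized as a toy (the step list and the sanity check are invented stand-ins for "run the script and look for AAAA"):

```python
# Find which of N sequential steps first corrupts state by probing midpoints,
# the way you'd move a print statement around a huge script.

def make_steps(bad_index, n=1000):
    # Each step adds 1 to an accumulator; the bad step zeroes it.
    return [(lambda acc: 0) if i == bad_index else (lambda acc: acc + 1)
            for i in range(n)]

def run_prefix(steps, k):
    acc = 0
    for s in steps[:k]:
        acc = s(acc)
    return acc

def first_bad(steps):
    lo, hi = 0, len(steps)            # invariant: the bug is in steps[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # The "print in the middle": is state still sane after `mid` steps?
        if run_prefix(steps, mid) == mid:   # sane iff acc == steps executed
            lo = mid                        # bug is later
        else:
            hi = mid                        # bug is in the first half
    return lo

bad = first_bad(make_steps(bad_index=613))  # locates step 613 in ~10 probes
```

Ten probes narrow 1000 suspects down to one, which is why the trick scales to 50k-line files just fine.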

[+] mbrodersen|4 years ago|reply
The size of a code file doesn’t matter. What matters is the amount of state the code in the file manipulates. For example, a 100k line code file with 500 pure functions not using any global state is fine. It is simple. However a 100k line code file with 500 functions that all manipulate 1000+ global variables is extremely complex and hard to maintain because of the undocumented global state invariants and hidden side effects.
[+] honkycat|4 years ago|reply
File length is a bit of a bike shed in my opinion. My main concern here would be separation of concerns and code quality.

I prefer many short files and folders structured hierarchically and grouped semantically. I have no proof this is better so I would probably just leave it to a vote with the team.

In the end, I think that is how a lot of this should be viewed until we get proper research. How do you WANT to code? TDD? No tests? One giant file? It should be a team and executive decision.

If you don't like the style on your team, and nobody wants to change it, move on or adapt.

Technical debt is like a superfund site. It renders the real estate worthless and poisons the rest of the company.

It does matter. My current gig is hemorrhaging money because we can't keep devs even though the pay and benefits are great. We cannot execute on mission critical initiatives.

We cannot adapt our product to meet the needs of the market in an agile way.

This is due to people saying "a working product is more important blah blah.." for years. I would argue there is a balance to strike and you can do both with a good team and realistic planning. But there is always the nay-sayer who is willing to step in and say whatever product wants to hear.

It is so bad we cannot train people to use the software anymore. The quality is too poor, and we can't on-board them before they decide to go elsewhere.

Everyone who knew anything has left and there is too much of it. So the remaining devs get overwhelmed, they leave... It is a vicious cycle.

The funny thing is the money machine works, but it is so frustrating to see all of the extra money we could be making and having to leave it on the table.