
It Can Happen to You

1683 points | mooreds | 5 years ago | mattkeeter.com | reply

403 comments

[+] johnfn|5 years ago|reply
Loving the progression here. Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle. A week from now, someone will prove P=NP because all the problems we thought were NP were just running strlen() on the whole input.
[+] CGamesPlay|5 years ago|reply
And maybe, in a decade or so, the man page for these functions will list their algorithmic complexity! That was the most interesting takeaway from this article, for me at least. I have only seen one or two libraries that actually list this in their documentation.
[+] TeMPOraL|5 years ago|reply
You're joking, but now I'm thinking about the XML we parse at work and the library we're using to do it. We parse a lot of it, but I've always had this vague feeling that it takes a bit too long (given the codebase is C++).

The XML library we use is rather well-known, so if someone found a bug like this there, I'd expect a general improvement in performance across the entire industry. The Efficient Market Hypothesis tells me it's unlikely the library has this problem, but then again, that's what I thought about AAA videogames, and then the GTA Online thing came out.

[+] ineedasername|5 years ago|reply
That's actually a very simple one. Just run a regex on "P != NP" to remove the "!" and you're good to go.
[+] fanf2|5 years ago|reply
Well, https://twitter.com/stephentyrone/status/1366573121365565444

>> So many years ago when I first took over the iOS string functions, I found that like 30% of boot time in debug builds was in strstr. <<

>> Needless to say, we did not fix that issue by writing a more efficient strstr. Removed the parser and then removed strstr from the environment where it had been in use =) <<

[+] specialist|5 years ago|reply
You kid. But truer things are said in jest.

> ...Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle.

My 2019 MacBook often pauses when I connect the charging cable. Sometimes it just seizes, requiring a hard bounce.

Clearly there's a contended lock buried deep. Something non-obvious.

I'm certain everything these days has dozens of hidden quadratics and contended locks.

Which is one of the reasons I'm excited for stuff like structured concurrency (Java's Project Loom) and retoolings like systemd becoming the norm.

Ages ago I worked on a kitchen sink app that had a very finicky startup. Any breeze would break it. Much consternation from mgmt. Apparently if we only clapped louder, Tinkerbell would fly. I couldn't take it any more. LARPing as a bulldozer, I replumbed the innards, changing from something like initd to something more like systemd, with some lazy loading for good measure. Voila!

Back to GTA. The failure here is the product owner didn't specify a max load time, and then hold the team to it. Devs will mostly do the work that's expected of them. If load time wasn't measured (and managed), no one is going to bother with expunging sscanfs.

[+] nightmunnas|5 years ago|reply
I don't think I've laughed this much since the pandemic started, well done.
[+] coolgeek|5 years ago|reply
I just got an infinite loop down to 8.6 seconds! And I'm not done yet!
[+] eru|5 years ago|reply
You joke, but there's actually lots of work going into what techniques will definitely NOT be enough to settle P=NP.

(I find it pretty exciting that this kind of negative result is possible. Ain't mathematics wonderful?)

[+] bsldld|5 years ago|reply
> A week from now, someone will prove P=NP because all the problems we thought were NP were just running strlen() on the whole input.

You owe me a keyboard!

[+] xconverge|5 years ago|reply
Agreed, also hey bud, hope you are doing well
[+] stainforth|5 years ago|reply
Next Hacktober should offer a free t-shirt to anyone who goes out and issues a PR to a repo containing sscanf
[+] mkeeter|5 years ago|reply
Blog author here! Thanks to HN for warning me about sscanf at exactly the right time – within a day of me trying to load some ASCII STLs and noticing it was slow...

Linked deep in the Twitter replies [1], there's an open glibc issue about this, dating back to 2014:

https://sourceware.org/bugzilla/show_bug.cgi?id=17577

C doesn't have any requirements on the complexity of sscanf, so it might not be a bug per se, but it's certainly... a pitfall.

[1] https://twitter.com/impraxical/status/1367194430835425283

[+] simias|5 years ago|reply
I think the really embarrassing part for Rockstar is that they didn't bother to investigate what took 5+ minutes to load in their star product; simple profiling would've made the issue obvious. So either they knew and didn't care, or they didn't know and didn't care.

That being said both for GTA and for TFA the issue is a very similar sscanf call:

     sscanf(data, "%f", &f);
I already posted a similar comment in the GTA story but I really want to emphasize it: scanf is almost never the right tool for the job, and it's definitely not the right tool in this situation. Just use strtof. That's literally what it's for. String to float. There. Done.

Scanf is crappy and, if it were up to me, would've been deprecated a while ago. I can sort of see using it for a quick one-off "script", for instance to parse user input, but seeing it in the middle of a program will always raise a huge red flag for me.

Use strtok_r if you need to split a string, then parse every entry individually. It's more robust, more flexible (you can parse custom types and formats that way) and allows for much better error handling and diagnostics. And of course it's also usually vastly faster.

Scanf is an antipattern in my opinion. I literally never use it and I'm better off for it. The last time I interviewed for a C coder position I managed to answer the full C test quiz except for the one question regarding scanf. That's how much I don't use it.

I think it's even worse for developers who come from higher level languages and (reasonably) expect to be able to deserialize data easily. You simply can't do that in C, the type system and general philosophy of the language won't let you, but scanf may convey the illusion that it's sort of possible. Don't believe its lies.
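To make the suggestion concrete, here is a minimal sketch of the strtok_r-plus-strtof approach described above (parse_floats and the delimiter set are my own illustration, not from TFA):

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r */
#include <stdlib.h>
#include <string.h>

/* Tokenize `line` in place with strtok_r, then convert each token
 * with strtof. Neither call rescans the whole buffer the way
 * sscanf("%f") can, and a bad token can be reported precisely.
 * Returns the number of floats parsed, stopping at the first
 * token that isn't a complete number. */
static size_t parse_floats(char *line, float *out, size_t max)
{
    size_t n = 0;
    char *save = NULL;
    for (char *tok = strtok_r(line, " \t\r\n", &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, " \t\r\n", &save)) {
        char *end = NULL;
        float f = strtof(tok, &end);
        if (end == tok || *end != '\0')  /* not (just) a number */
            break;
        out[n++] = f;
    }
    return n;
}
```

Note that strtok_r mutates its input, so this has to run on a writable copy of the data.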

[+] tgtweak|5 years ago|reply
I was thinking about it the other day when reading the original article, and this was the only plausible (and defensible) cause for it not being addressed:

When GTA Online was released 7 years ago, in 2013, the list of DLC items was probably much shorter, and it grew over time. The performance issue is aggravated quadratically by list length. The list growth was probably bell-curve shaped over the lifetime of the game.

This has an interesting dynamic when it comes to perceived performance:

In the beginning, on consoles and PCs - it was already a pretty long load time, but would have been 90s or so on an average gaming PC (I remember this from the early days playing it, on a modest gaming PC with an FX-8150 CPU). This is long, but tolerable for a game of this size. I'm certain that early complaints that it was sluggish to load were profiled and looked at, and at the time it wasn't a 4-minute ordeal to load the json and probably represented a fraction of the CPU time it takes today - not standing out as obviously as in OP's guerrilla profiling. Devs put a pin in it and say "this is netcode related, it is what it is".

Over time, the list gets longer and the loading time takes more cycles, BUT, PCs are getting progressively faster year over year as well, with many of those improvements happening at the instruction level - optimizing for things like, surprise, string scanning. Two console generations have been released since, masking the problem on that side. For comparison's sake, I just checked and I can load GTA Online in about 75s on my Ryzen 3900x. This CPU is probably 4-6x faster in single-core performance than the 8150 for most workloads. Again, it's slow but tolerable, and by this time it's "yeah GTA Online is just a big game and takes a while to load, it's always been that way". Complacency is the enemy of improvement, and things that regress slowly over time are hard for us to notice in general.

Don't take this as a "this is fine" comment, but instead the only reasonable justification I can think of as to why it might have flown under the radar all these years.

[+] lostcolony|5 years ago|reply
I think 'embarrassing' is too strong a word. AAA game development is rushed; the pressure is to ship. Something has to give. This is a user facing issue, but one that doesn't actually affect the gameplay. Assuming they had -time- to profile the load process, given that low a priority, seems extremely optimistic.
[+] froh|5 years ago|reply
scanf and printf have complementary format specifiers, which can make maintaining serialization and parsing of regular data a breeze...

the proper remedy is to simply wrap the string to parse with fmemopen(3), which makes the temporary FILE object explicit and persistent for the whole parse, and needs just one strlen call.

https://news.ycombinator.com/item?id=26343149
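A sketch of what that fmemopen route might look like (the format and function name here are my own illustration):

```c
#define _POSIX_C_SOURCE 200809L  /* for fmemopen */
#include <stdio.h>
#include <string.h>

/* Wrap the string in a FILE* once: strlen runs a single time, and
 * the stream tracks its own position across fscanf calls, instead
 * of sscanf re-scanning the buffer from the start on every call. */
static size_t scan_floats(const char *s, float *out, size_t max)
{
    FILE *fp = fmemopen((void *)s, strlen(s), "r");
    if (fp == NULL)
        return 0;
    size_t n = 0;
    while (n < max && fscanf(fp, "%f", &out[n]) == 1)
        n++;
    fclose(fp);
    return n;
}
```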

[+] IanNorris|5 years ago|reply
Not excusing this but there are likely a few mitigating factors here.

* Tight deadlines result in shipping code that's barely tested and may have resulted in minimal code reviews on it.

* The original article mentioned how the ids were always unique. It may have been intended to load content from multiple sources or to allow patching of content on disk (or repurposed entirely from a different game). Or it could well be an oversight/over-engineering.

* It may even be a general purpose json parser from another project that had never been tested with data of this size until after launch.

* It probably wasn't always this bad. Likely when the game launched the loading times were much more reasonable as the amount of in-app-purchases was an order of magnitude smaller.

Typically most of the IAPs will be added much later, so much of the profiling work would have been done with this code having a much smaller json block.

When the game shipped, the dev team will likely have shrunk significantly as the bulk of the team moved to a new project, leaving a smaller team focused more on the content itself; the engine team that would likely deal with and spot stuff like this will probably have had their attention elsewhere.

Don't work for R*, have shipped many high budget titles though including live services.

[+] SomeCallMeTim|5 years ago|reply
Agreed. My first instinct was the same: *scanf is never the right tool for pretty much any job.

I learned this 20+ years ago. As far as I'm concerned it should have been considered deprecated along with gets; it was considered dangerous in the early 90s and probably before. Not sure why people are still using it in the 2000s+.

[+] earthboundkid|5 years ago|reply
The Go standard library is pretty good, but unfortunately, it includes a scanf clone, so every once in a while you see a poor new developer posting to help forums trying to get it to work properly, and you have to break it to them that they're using the wrong tool for basically any job.
[+] _kst_|5 years ago|reply
There's a bigger potential problem with the *scanf() functions than performance. They are inherently unsafe for reading numeric input.

For example, if you do something like this:

    int n;
    sscanf("9999999999999999999999999", "%d", &n);
the behavior is undefined. As the C standard says:

> ... the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

You can control the appropriate type by writing the call directly, but you can't guarantee that the result can be represented unless you have control over the input.

Remember that in C "undefined behavior" doesn't mean that your program will fail, or will crash, or will tell you there was a problem. It means that a conforming implementation can do literally anything. In the worst case, it will do what you expect until it fails at the most inconvenient possible moment.

Now most implementations will probably do something sane, like setting a floating-point object to infinity or an integer object to some arbitrary value, but the language doesn't guarantee anything.

If you want to write safe code, you can extract a substring that represents a number and pass it to one of the strto*() functions, which do have well defined behavior on overflow. (But I couldn't tell you exactly what that behavior is without looking it up.)
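For reference, the strto*() overflow behavior is errno-based; a sketch of a checked integer parse (checked_parse_int is an illustrative name):

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a base-10 int from `s`, rejecting overflow and trailing
 * junk. On overflow, strtol is defined to return LONG_MIN or
 * LONG_MAX and set errno to ERANGE -- well-defined behavior,
 * unlike sscanf("%d") on out-of-range input. */
static bool checked_parse_int(const char *s, int *out)
{
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')   /* no digits, or trailing junk */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;               /* out of range for int */
    *out = (int)v;
    return true;
}
```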

[+] sicariusnoctis|5 years ago|reply
> Remember that in C "undefined behavior" [...] means that a conforming implementation can do literally anything. In the worst case, it will do what you expect until it fails at the most inconvenient possible moment.

Actually, the worst case possibility is that your program will become Skynet, enslave humanity for 10000 years, collapse all stars in the universe into black holes, and significantly accelerate processes such as the heat death of the universe.

[+] vlovich123|5 years ago|reply
You’re misreading the standard a bit, I think. It’s saying that undefined behavior arises when the format string (which you should control; it’s a common compiler warning if it’s not a literal) doesn’t match the types of the variables you pass. This is kind of obvious when you think about it: variadic C functions lose type information, so the format string is the source of that.

The “out-of-range” issue just means that the library isn’t going to mandate every implementation of this function is guaranteeing to provide the same overflow behavior (some might stop when you saturate, others might stop at the end of digits input and overflow, others might detect the overflow and saturate).

The Linux man page is clearer here IMO:

> If the number of conversion specifications in format exceeds the number of pointer arguments, the results are undefined. If the number of pointer arguments exceeds the number of conversion specifications, then the excess pointer arguments are evaluated, but are otherwise ignored.

That’s the only spot where the word “undefined” appears, and it doesn’t discuss overflow. My general impression is that the “undefined” problem largely only applies to language operations, or to user input causing a library to perform such an undefined operation. Older C functions with older documentation may be using “undefined” with a less strict meaning that also covers “implementation-defined”. The “undefined behavior” brouhaha only came up in the past 5-10 years, when compilers actually started leveraging it, breaking a lot of assumptions.

[+] imtringued|5 years ago|reply
Wow, the problems are so deep that no human should ever use sscanf again. It's just as bad as gets() because there is no way for the programmer to deal with the error case.
[+] ajkjk|5 years ago|reply
Obligatory: it is flabbergasting that a language with these qualities is still in active use.
[+] codetrotter|5 years ago|reply
I am writing an app for iOS in Swift and I have an array of structs with some 70,000 elements or thereabouts, and for some bizarre reason the compiler uses so much memory when I define it directly in the source that I run out of memory. So instead, as a workaround for now, I am storing the data as a JSON string that I parse at runtime. It’s very sad, but it’s the only option I had, because I have a ton of other code to write for this app and cannot afford to spend the time to even make a binary format for this data.

But I don’t understand why the Swift compiler decides to use so much RAM when compiling it in the first place. The string representation of the data itself is only ~3 MB. But when I tried to declare the data as an array of structs in Swift directly, it uses gigabytes of memory when I try to compile it, which causes the system to start swapping, and then the disk space runs out because I only have about ~20 GB of free space on the disk, so then the system can’t swap any more and is out of RAM as well.

And my struct is very simple it’s just

  struct Bazinga: Identifiable, Codable {
    let id: Int32
    let name: String
  }
And before I had to turn to JSON it used to be only Identifiable even. So it’s like one of the simplest possible structs, and the 70,000 items of data are only a few MB when written in the source. Yet compiling an array of these structs needs more GB of memory than I have RAM, and even exceeds the amount of disk space I have for swap. It’s super weird to me that this is even a problem, and it’s insane how many GB of memory it consumes trying to compile my code.
[+] saagarjha|5 years ago|reply
Looks like it's solving constraints in the typechecker:

  8140 swift::ASTVisitor<(anonymous namespace)::StmtChecker, void, swift::Stmt*, void, void, void, void>::visit(swift::Stmt*)  (in swift-frontend) + 125  [0x110560f9d]
    8140 (anonymous namespace)::StmtChecker::typeCheckASTNode(swift::ASTNode&)  (in swift-frontend) + 1043  [0x11055dcc3]
      8140 (anonymous namespace)::DeclChecker::visit(swift::Decl*)  (in swift-frontend) + 4497  [0x1104e0721]
        8140 swift::TypeChecker::typeCheckPatternBinding(swift::PatternBindingDecl*, unsigned int, swift::Type)  (in swift-frontend) + 250  [0x11049648a]
          8140 swift::TypeChecker::typeCheckBinding(swift::Pattern*&, swift::Expr*&, swift::DeclContext*, swift::Type, swift::PatternBindingDecl*, unsigned int)  (in swift-frontend) + 140  [0x1104962bc]
            8140 swift::TypeChecker::typeCheckExpression(swift::constraints::SolutionApplicationTarget&, swift::OptionSet<swift::TypeCheckExprFlags, unsigned int>)  (in swift-frontend) + 897  [0x110495e71]
              8140 swift::constraints::ConstraintSystem::solve(swift::constraints::SolutionApplicationTarget&, swift::FreeTypeVariableBinding)  (in swift-frontend) + 974  [0x11032cb1e]
                8140 swift::constraints::ConstraintSystem::solve(llvm::SmallVectorImpl<swift::constraints::Solution>&, swift::FreeTypeVariableBinding)  (in swift-frontend) + 52  [0x11032d8b4]
                  8140 swift::constraints::ConstraintSystem::solveImpl(llvm::SmallVectorImpl<swift::constraints::Solution>&)  (in swift-frontend) + 372  [0x11032aa14]
                    8135 swift::constraints::ComponentStep::take(bool)  (in swift-frontend) + 2911  [0x1103393af]
                    + 4015 swift::constraints::ConstraintSystem::finalize()  (in swift-frontend) + 5258,5080,...  [0x110325a7a,0x1103259c8,...]
                    + 1819 swift::constraints::ConstraintSystem::finalize()  (in swift-frontend) + 5291  [0x110325a9b]
[+] bobbylarrybobby|5 years ago|reply
Do you explicitly indicate the type of the array? E.g., `let array: [Bazinga] = [ … 70k elements …]` instead of `let array = [ … 70k elements …]`.
[+] Gibbon1|5 years ago|reply
Your story reminds me of watching a modern CAD program run out of memory and barf when trying to import a DXF file with a few thousand elements.
[+] bogwog|5 years ago|reply
I don't know why you're running into that issue, but...

> It’s very sad, but it’s the only option I had because I have a ton of other code to write too for this app and cannot afford to spend the time to even make a binary format for this data.

You should look into Flatbuffers (https://google.github.io/flatbuffers/flatbuffers_guide_use_s...). It's a tool that can generate an API for reading/writing binary data based on a schema file where you design the layout (similar to protocol buffers). The data is ready to read, so you don't have to do any parsing at all, AND the compiler includes a feature to convert JSON files into binary that matches your given schema.

It won't solve your compiler woes, but it will help you avoid having to store and parse JSON, and it's a tiny dependency.

[+] decasia|5 years ago|reply
It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.

Absent that, of course we can potentially read the source code and find out, but I think for the most part we tend to operate based on an informed assumption about what we imagine the algorithmic complexity of a given operation would be. Inevitably, sometimes the assumption is incorrect.

There's no way to develop software without making assumptions, some of which inevitably turn out to be incorrect, so I don't think there is any great shame in having that happen, in itself. But better docs could help us make better assumptions with less effort, at least.

[+] epr|5 years ago|reply
Keeping track of algorithmic complexity would be nice as a language and/or static analysis feature. If you wanted to be exact or do it for a language with complex metaprogramming I assume it would be a nightmare to implement. Absent those complications and especially if you always reduced it to O(1), O(n), O(log(n)), etc it might not even be that difficult given the potential advantages.
[+] beaconstudios|5 years ago|reply
personally, I think I wouldn't even bother to check the algorithmic complexity of every external function I call. I'd just use the logical choice (like sscanf) and only consider optimising if things started to slow down and profiling the application highlighted it as a bottleneck.
[+] tpetry|5 years ago|reply
I don't get the heat of this topic. Yes, they wrote some very slow code, because it's easy to shoot yourself in the foot with scanf. It's nothing new that most software could be heavily optimized just by benchmarking the slow parts. There is no reason for this shit storm other than to feel better than other developers.

The real problem is that they shipped a game with a loading screen that takes minutes without ever looking at whether they could optimize it. THAT is the real shit show!

[+] scaramanga|5 years ago|reply
"it will open a 97 MB binary STL file in about 165 milliseconds flat, on a 2013 Macbook Pro. This is blinding fast."

This actually sounds incredibly slow, that's nearly 1/5th of an entire second. What can it possibly be doing? :)

In case anyone else was wondering, I followed the link and clicked the description and this is actually based on the time to the first frame being rendered - not just the time to load the file (which should be essentially infinitely fast). So it's actually more impressive than it sounds.

[+] tux3|5 years ago|reply
The moral of the story, as far as I'm concerned: do NOT parse strings in C! Use a library, preferably in a higher-level language.

C string handling is a mess of viciously surprising APIs, juggling those particular footguns is almost certainly not your least bad option.

[+] tshaddox|5 years ago|reply
I didn’t follow the original story or comments about GTA, but based on the description in this article, I wouldn’t be surprised that this sort of problem could happen to any coder of any experience level and I wouldn’t give them any grief, but I would be surprised that the problem would be live in production for a very long time without ever having been profiled. Surely seeing JSON parsing taking more than 70% of the time would have made it onto someone’s radar?
[+] jchw|5 years ago|reply
It is a good opportunity to mention that you should not use strtod/strtol either if you can help it, since they are impacted by locale. Exactly what to use instead is a bit of a tough nut to crack; you could extract musl’s floatscan code, or implement the Clinger algorithm yourself. Or, of course, use programming languages that have a more reasonable option in the standard library...
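One workaround, where the extension is available: strtod_l and newlocale are POSIX-2008-era extensions found on glibc and the BSDs, not ISO C, so this sketch is not portable everywhere.

```c
#define _GNU_SOURCE  /* for strtod_l on glibc */
#include <locale.h>
#include <stdlib.h>

/* Parse a double with "." as the decimal separator regardless of
 * the process-wide locale, by passing an explicit "C" locale.
 * The lazy one-time init below is not thread-safe; guard it (or
 * build the locale_t up front) in threaded code. */
static double parse_double_c(const char *s, char **end)
{
    static locale_t c_loc = (locale_t)0;
    if (c_loc == (locale_t)0)
        c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    return strtod_l(s, end, c_loc);
}
```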
[+] tdeck|5 years ago|reply
This just makes me think that null-terminated strings are the bad gift that keeps on giving. If we were to design an OS, language, or standard library in 2021 (or even 1999) we probably wouldn't use them, but we're stuck with this relic of a former era.
[+] tom_mellior|5 years ago|reply
Years ago I was working on a compiler frontend that was capable of reading multiple files into one program representation, as opposed to compilers that compile each input file completely separately. At some point we were trying to compile some project made up of many small files, and our frontend got very slow. However, when we concatenated all those files into one big source file, everything went smoothly.

I investigated. It turned out that we were keeping some sort of global symbol table structure. It was updated after parsing each file, something like this (the original was C++, but this is pseudocode because I can't be bothered):

    list<symbol> global_symbol_table;

    void update_symbol_table(list<symbol> new_symbols) {
        global_symbol_table.add_all(new_symbols);
        // Keep sorted so we can do efficient lookups with binary search.
        sort(global_symbol_table);
    }
For N files, this meant N calls to sort() on a list that grew linearly in N, giving something like O(N^2 log N) complexity overall.

This has a few problems:

1. Even if you want to use a "sorted list" representation, it would suffice to sort the new symbols only (the global table is always sorted by its invariant) and do a merge of the sorted list.

2. But really, you want a set data structure that does the same thing better.

3. But also, could we maybe speed up the lookups in some other way? I looked around for the uses of this global symbol table and found... none. We were keeping data around and updating it in the least efficient manner imaginable, without ever using that data.

I deleted the above function and the global symbol table, and performance was back to expected levels.
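For the record, fix 1 might look something like this (sketched in C with int keys standing in for the original C++ symbols):

```c
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge `k` new symbols into a sorted table of `n` entries, which
 * must have capacity for n + k. qsort runs on the new batch only,
 * O(k log k), and the merge is O(n + k) -- instead of re-sorting
 * the entire table after every file. */
static void table_merge(int *table, size_t n, const int *new_syms, size_t k)
{
    int *batch = malloc(k * sizeof *batch);
    memcpy(batch, new_syms, k * sizeof *batch);
    qsort(batch, k, sizeof *batch, cmp_int);

    /* Merge from the back so it can be done in place: once the
     * batch is exhausted, the table's low prefix is already sorted. */
    size_t i = n, j = k, dst = n + k;
    while (j > 0) {
        if (i > 0 && table[i - 1] > batch[j - 1])
            table[--dst] = table[--i];
        else
            table[--dst] = batch[--j];
    }
    free(batch);
}
```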

[+] ineedasername|5 years ago|reply
Still, folks in the comments section generally agreed: they wouldn't write anything that silly.

Well, if you've never accidentally crashed a system running an unexpectedly (and unnecessarily) "non-performant" piece of code then you're either an absolute genius of a coder, or relatively inexperienced.

[+] geerlingguy|5 years ago|reply
So many times, in higher level code, it's seeing a foreach loop in a foreach loop, and the nested loop is calling an API or re-rerunning the same database call 5000 times.

Move things around or just use a cache... and instant 1000%+ speedup.

I've seen this too many times to count, often in apps and on sites that are in fairly heavy use.

The answer is often to scale up and pay for 10x more server capacity to handle the load, rather than spend some time optimizing the slow code paths.

[+] moonchild|5 years ago|reply
(Originally on lobsters[0].)

I maintain my original position that sscanf calculating the entire length of its input is absolutely ridiculous. Are *scanf difficult to use safely, not very robust, and somewhat baroque? Yes. Should sscanf("%f") be a correct (not performance-killing) way of reading floats? Also yes. (Though aside: the OP seems to be reading data from files, so they could have just used fscanf, which has correct performance already.)

Unfortunately, many libcs are guilty of this:

- glibc uses memchr (the trail is convoluted, but ends up at _IO_str_init_static_internal)

- freebsd libc (and thus also the apple and android libcs, as well as those of the other BSDs) use strlen

- uclibc and newlib are the same as freebsd (appear to be copied directly from it)

- Since the original bug was in GTA, which only runs on windows, I must presume msvcrt has the same problem

- musl has the correct behaviour, processing input in 128-byte chunks

- managarm doesn’t strlen but looks broken for unrelated reasons. (Assumes nul byte means eof.) Also has codebloat because of templates.

- serenityos tries to implement fscanf in terms of sscanf, not the other way around! Unfortunately that means it chomps a whole line of input at every call, so it doesn’t even work correctly. Horrifying.

- pdclib has ok performance, but with an interesting implementation: it duplicates code between sscanf and fscanf, though the heavyweight format parsing is shared.

- dietlibc and sortix have the sensible, simple implementation

0. https://lobste.rs/s/0obriy/it_can_happen_you#c_giuxfq

[+] stefs|5 years ago|reply
many years ago i slaved away in a low end (5 people) ecommerce web shop with a homebrew php cms. i hated working there, but was too vain to quit. one day our biggest customer by far complained about slow loading speeds and i was set upon the task. it was spaghetti code of the worst sort, but i managed to find the culprit quickly: converting a list of rows (for pages) from the database into the hierarchical menu tree structure; of course it was O(n²). i fixed it and the page generation time went down from 7 seconds to a few milliseconds again.

they didn't let me push the fix back into the main repo because "it only affects this one customer, so we'll only fix it here". luckily i was fired shortly after. fin.

[+] froh|5 years ago|reply
I wonder if then The Right Way of scanf-scanning potentially large strings is the fmemopen route?

  /* ... goal: scanf some looong char *s in some loop */

  FILE *s_as_FILE = fmemopen(s, strlen(s), "r");
  /* loop loop loop */ ... {
      fscanf(s_as_FILE, "FORMAT", &v);
  }
  fclose(s_as_FILE);
fmemopen has been around for a while now; it should work with most libc implementations these days.

?

[+] fireattack|5 years ago|reply
Speak of which, is there any official follow-up from R* yet?