Why fastDOOM is fast

[+] gwern|1 year ago|reply

> The MPV patch of v0.1 is without a doubt build 36 (e16bab8). The "Cripy optimization" turns status bar percentage rendering into a noop if they have not changed. This prevents rendering to a scrap buffer and blitting to the screen for a total of 2 fps boost. At first I could not believe it. I assume my toolchain had a bug. But cherry-picking this patch on PCDOOMv2 confirmed the tremendous speed gain.

Good example of how the bottlenecks are often not where you think they are, and why you have to profile & measure (which I assume Viti95 did in order to find that speedup so early on). The status bar percentage?! Maybe there's something about the Doom arch which makes that relatively obvious to experts, but I certainly would've never guessed that was a bottleneck a priori.

[+] robocat|1 year ago|reply

Example: "Our app was mysteriously using 60% CPU and 25% GPU. It turned out this was due to a tiny CSS animation [of an equaliser icon]"

https://www.granola.ai/blog/dont-animate-height

[+] inDigiNeous|1 year ago|reply

Reminds me of the performance optimization somebody discovered in Super Mario World for SNES, where displaying the player score in the status bar was very inefficient, taking about 1/6 of the allotted frame time.

"SMW is incredibly inefficient when it displays the player score in the status bar. In the worst case (playing as Luigi, both players with max score), it can take about a full 1/6 of the entire frame to do so, lowering the threshold for slowdown. Normally, the actual amount of processing time is roughly proportional to the sum of the digits in Mario's score when playing as Mario, and the to the sum of the digits in both players' scores when playing as Luigi. This patch optimizes the way the score is stored and displayed to make it roughly constant, slightly faster than even in the best case without."

https://www.smwcentral.net/?p=section&a=details&id=35746
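A cost "proportional to the sum of the digits" is what you would expect from converting a binary score to decimal by repeated subtraction. The sketch below illustrates that idea in Python; it is not the game's code or the patch's code, and the function names and the 6-digit width are assumptions made for the example.

```python
def digits_by_subtraction(score, width=6):
    """Convert a score to decimal digits using only compare and subtract.

    Returns (digits, subtractions). The number of subtractions equals the
    sum of the resulting digits, so the cost depends on the score's value.
    """
    digits = []
    subtractions = 0
    for place in range(width - 1, -1, -1):
        power = 10 ** place
        digit = 0
        while score >= power:      # one loop iteration per unit of this digit
            score -= power
            digit += 1
            subtractions += 1
        digits.append(digit)
    return digits, subtractions


def add_points(digits, points):
    """Alternative: store the score as decimal digits (most significant first).

    Adding points walks the digits once with a carry, and displaying the
    score is a plain copy, so the cost no longer depends on the value.
    A carry out of the top digit is dropped in this sketch.
    """
    carry = points
    for i in range(len(digits) - 1, -1, -1):
        carry, digits[i] = divmod(digits[i] + carry, 10)
    return digits


print(digits_by_subtraction(999999))  # worst case: 54 subtractions
print(digits_by_subtraction(100000))  # best non-zero case: 1 subtraction
print(add_points([0, 0, 0, 1, 9, 9], 1))
```

Whether the patch stores digits this way is a guess; the point is only that a fixed-length pass per frame replaces a value-dependent loop.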

[+] barbariangrunge|1 year ago|reply

As a gamedev, I see those slowdowns a lot. UI rendering can be a real killer because of transparency, layering, and having to redraw things, and especially when it triggers allocations. Comparing old vs. new values before allowing a redraw is really helpful. I found layers and transparency were a killer in CSS as well in one project, but the fix there was more about reducing layers.
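For anyone who hasn't seen the pattern, here is a minimal sketch of the old-vs-new comparison, in Python for brevity. It is not FastDOOM's or Crispy Doom's code; `StatusBarPercent` and the `draw` callback are made-up names standing in for the real render-and-blit path.

```python
class StatusBarPercent:
    """Redraws a percentage widget only when its value has changed."""

    def __init__(self, draw):
        self._draw = draw    # hypothetical callback doing the expensive render + blit
        self._last = None    # last value actually drawn; None forces the first draw

    def update(self, value):
        """Draw if the value changed. Returns True when a draw happened."""
        if value == self._last:
            return False     # unchanged: skip the render entirely
        self._draw(value)
        self._last = value
        return True


draw_calls = []
health = StatusBarPercent(draw_calls.append)

# Simulate six frames in which the value changes only once after the first draw.
for frame_value in [100, 100, 100, 87, 87, 87]:
    health.update(frame_value)

print(draw_calls)  # [100, 87]
```

Six frames produce two draws. The trade-off is one extra comparison per frame and a cached copy of the last value, which is usually cheap next to a blit.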

[+] RankingMember|1 year ago|reply

My favorite example of this is the GTA Online insane loading time issue that ended up being due to poor handling of a 10MB json file (and was finally tracked down by someone outside their org). Took a 6 minute load time down to just under 2 minutes:

https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
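As far as I recall from the linked write-up, one of the culprits was a duplicate check that scanned a flat array for every one of tens of thousands of parsed entries, which is quadratic overall. The sketch below shows that general shape in Python, not the game's actual code; both function names are invented for the example.

```python
def dedup_linear(items):
    """Quadratic: every insert scans the whole list looking for a duplicate."""
    seen = []
    for item in items:
        if item not in seen:      # O(len(seen)) scan on each iteration
            seen.append(item)
    return seen


def dedup_hashed(items):
    """Linear: a set answers 'already present?' in constant expected time."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:      # O(1) expected hash lookup
            seen.add(item)
            out.append(item)
    return out


# 8,000 entries with 2,000 distinct values; both versions keep first occurrences.
items = [i % 2000 for i in range(8000)]
print(dedup_linear(items) == dedup_hashed(items))
```

Both return the same result; only the cost of the membership test differs, and that difference grows with the square of the input size.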

[+] pjs_|1 year ago|reply

Reminds me of the incident where npm was 2X slower than it should have been because of the fancy terminal progress bar:

https://news.ycombinator.com/item?id=10974929

[+] ognarb|1 year ago|reply

I had a similar case when I worked on a Matrix client (NeoChat): the other devs and I were wondering why loading an account was so slow. Removing the loading spinner made it much faster, because the animation that rendered the spinner used 100% CPU.

[+] qingcharles|1 year ago|reply

The original Doom must have been heavily profiled by id though, surely? Obviously there are a bunch of things that were missed, but I was in game dev at Doom time and profiling was half the job back then.

[+] on_the_train|1 year ago|reply

I'm currently at the second job in my life where updating the progress bar takes up a tremendous percentage of the overall performance. Because our "engineers" have never used a profiler. At a large international tech giant :(

[+] aeyes|1 year ago|reply

He ported the optimization from the Crispy Doom fork. Since this is one of the first changes in the repo I bet that this was a known issue at the time.

[+] PlunderBunny|1 year ago|reply

Back at the turn of the century we found that a performance sensitive part of our WIN32 app was adversely affected by reading a setting from an ini file - in Windows 2000, it was significantly slower than on earlier versions of Windows. The setting was just to determine whether to enable logging for that particular part of the app.

[+] smat|1 year ago|reply

While this gives an impressive boost in performance, it also means that frametimes are around 10% longer when the status bar has to be updated.

Overall this can mean that in some situations the game feels not as smooth as before due to these variations.

Essentially, when considering real-time rendering, the slowest path is the most critical to optimize.
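The 10% figure can be checked with quick arithmetic. The sketch below assumes a 20 fps frame rate on frames that redraw the status bar; that baseline is an assumption picked for illustration, and only the 2 fps gain comes from the article.

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


slow_fps = 20.0             # assumed rate on frames that redraw the status bar
fast_fps = slow_fps + 2.0   # article's reported gain when the redraw is skipped

slow = frame_ms(slow_fps)   # 50 ms
fast = frame_ms(fast_fps)   # about 45.45 ms
extra = slow / fast - 1.0   # relative cost of a frame that has to redraw

print(f"redraw frame: {slow:.2f} ms, skipped frame: {fast:.2f} ms, extra: {extra:.0%}")
```

At a higher baseline the same 2 fps gain corresponds to a smaller relative spike, so how noticeable the variation is depends on how slow the machine is to begin with.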

[+] slavboj|1 year ago|reply

At one point the bottleneck to the Siri iOS client was rendering the animated glowy ball.

[+] yjftsjthsd-h|1 year ago|reply

> To get the big picture of performance evolution over time, I downloaded all 52 releases of fastDOOM, PCDOOMv2, and the original DOOM.EXE, wrote a go program to generate a RUN.BAT running -timedemo demo1 on all of them, and mounted it all with mTCP's NETDRIVE.

I'm probably not the real target audience here, but that looked interesting; I didn't think there were good storage-over-network options that far back. A little searching turns up https://www.brutman.com/mTCP/mTCP_NetDrive.html - that's really cool:)

> NetDrive is a DOS device driver that allows you to access a remote disk image hosted by another machine as though it was a local device with an assigned drive letter. The remote disk image can be a floppy disk image or a hard drive image.
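The batch-generation step from the first quote is easy to picture. The article's author wrote it in Go; the version below is a Python sketch of the same idea, and the directory and executable names are hypothetical. Only `-timedemo demo1` is taken from the quote.

```python
def make_run_bat(releases, demo="demo1"):
    """Return the text of a DOS batch file that runs a timedemo per release.

    `releases` maps a directory name to the executable inside it.
    """
    lines = ["@ECHO OFF"]
    for directory, exe in releases.items():
        lines.append(f"CD \\{directory}")           # enter the release's directory
        lines.append(f"{exe} -timedemo {demo}")     # run the benchmark demo
    lines.append("CD \\")                           # return to the drive root
    # DOS batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical layout: one directory per release on the mounted drive.
releases = {"FD010": "FDOOM.EXE", "PCDOOM": "DOOM.EXE"}
bat = make_run_bat(releases)
print(bat)
```

Writing `bat` to `RUN.BAT` on the disk image would then leave a single command to start on the DOS side.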

[+] jandrese|1 year ago|reply

> I didn't think there were good storage-over-network options that far back.

Back in school in the early 90s we had one computer lab where around 25 Mac Plus machines were daisy chained via AppleTalk to a Mac II. All of the Plus machines mounted their filesystem from the Mac II. It was painfully slow, students lost 5-10 minutes at the start of class trying to get the word processor started. Heck, the Xerox Altos also used network mounts for their drives.

If you have networking the first thing someone wants to do is copy files, and the most ergonomic way is to make it look just like a local filesystem.

DOS was a bit behind the curve because there was no networking built-in, so you had to do a lot of the legwork yourself.

[+] somat|1 year ago|reply

There is a neat trick where iPXE can netboot DOS from an iSCSI target. With no drivers or config, DOS gets read-write access to a network share (well, not a share, since sharing it corrupts it fast; a network block device). It feels magical, but I think iPXE is patching the BIOS to make disk access go over iSCSI.

[+] tetrisgm|1 year ago|reply

I’m curious: were there NASes or WebDAV mounts in the DOS era? Obviously there was FTP and telnet and such. Just curious whether remote mounts were a thing, or if the low bandwidth made them impossible.

[+] ndegruchy|1 year ago|reply

The linked GitHub thread with Ken Silverman is gold. Watching the FastDOOM author and Ken work through the finer points of arcane 486 register and clock cycle efficiencies is amazing.

Glad to see someone making sure that Doom still gets performance improvements :D

[+] kridsdale1|1 year ago|reply

I haven’t thought of KenS in ages but back in the 90s I was super active in the Duke3D modding scene. Scripting it was literally my first “coding”.

So in a way, I owe my whole career and fortune to KenS. Cool.

[+] ehaliewicz2|1 year ago|reply

Last year I emailed Ken Silverman about an obscure aspect of the Build Engine while working on a similar 2.5D rendering engine. He answered the question like he worked on it yesterday.

[+] phire|1 year ago|reply

There are some real gems in there.

I especially liked the idea of CR2 and CR3 as scratchpad registers when memory access is really slow (386SX and cacheless 386DXs). And the trick of using ESP as a loop counter without disabling interrupts (by making sure it always points to a valid stack location) is just genius.

[+] unleaded|1 year ago|reply

One feature of FastDOOM I haven't seen mentioned here is all the weird video modes. Some interesting examples:

- IBM MDA text mode: https://www.youtube.com/watch?v=Op2tr2lGK6Y

- EGA & Plantronics ColorPlus: https://www.youtube.com/watch?v=gxx6lJvrITk

- Classic blue & pink CGA: https://youtu.be/rD0UteHi2qM

- CGA, 320x200x16 with 'ANSI from Hell' hack: https://www.youtube.com/watch?v=ut0V1nGcTf8

- Hercules: https://www.youtube.com/watch?v=EEumutuyBBo

Most of these run worse than with VGA, presumably because of all the color remapping etc

[+] toast0|1 year ago|reply

> - EGA & Plantronics ColorPlus: https://www.youtube.com/watch?v=gxx6lJvrITk

Any love for Tandy Graphics Adapter? I'd hate to have to run in CGA :( would need a 286 build for my Tandy 1000 TL/2, if it was still alive.

[+] Cthulhu_|1 year ago|reply

That's awesome, just a great demonstration why these aspects of the game should be separated. It reminds me of the "modern" Clean Architecture for back-end applications.

[+] tecleandor|1 year ago|reply

The IBM MDA text mode is terrible... Love it!

[+] jakedata|1 year ago|reply

"IBM PS/1 486-DX2 66Mhz, "Mini-Tower", model 2168. It was the computer I always wanted as a teenager but could never afford"

Wow - by 1992 I was on my fourth homebuilt PC. The KCS computer shows in Marlborough MA were an amazing resource for tinkerers. Buy parts, build PC and use for a while, sell PC, buy more parts - repeat.

By the end of 1992 I was running a 486-DX3 100 with a ULSI 487 math coprocessor.

For a short period of time I arguably had the fastest PC - and maybe computer on campus. It outran several models of Pentium and didn't make math mistakes.

I justified the last build because I was simulating a gas/diesel thermal-electric co-generation plant in a 21 page Excel spreadsheet for my honors thesis. The recalculation times were killing me.

Degree was in environmental science. Career is all computers.

[+] wk_end|1 year ago|reply

"Wow"? Is it really necessary to give this guy a hard time for being unable to afford the kind of computers you had in 1992?

Anyway, there's no such thing as a "DX3". And the first 100MHz 486 (the DX4) came out in March of 1994, so I don't see how you were running one at the end of 1992.

My family's first computer - not counting a hand-me-down XT that was impossibly out-of-date when we got it in 1992 or so - was a 66MHz 486-DX2, purchased in early 1995.

I can't quite explain why, but as a matter of pride it's still upsetting - decades later - to see someone weirdly bragging about an impossible computer that supposedly outran mine despite a three year handicap.

[+] bpoyner|1 year ago|reply

That definitely brought back memories. Around '92, being a poor college student I took out a loan from my credit union for about $2,000 to buy a 486 DX2-50. For you younger people, that's about $4,000+ in today's money for a pretty basic computer. I dual booted DOS and Linux on that bad boy.

[+] antod|1 year ago|reply

A 486DX and a 487? I thought the 487 was only useful for the SX chips?

...looked it up, apparently the standard 487 was a full 486DX that disabled and replaced the original 486SX. Was this some sort of other unusually awesome coprocessor I hadn't heard of?

[+] ForOldHack|1 year ago|reply

"It outran several models of Pentium and didn't make math mistakes." Total bragging rights. Total. You owned them. Good job.

[+] mmphosis|1 year ago|reply

On top of releasing often, Viti95 displayed outstanding git discipline where one commit does one thing and each release was tagged.

https://fabiensanglard.net/fastdoom/#:~:text=one%20commit%20...

[+] kingds|1 year ago|reply

> I was resigned to playing under Ibuprofen until I heard of fastDOOM

i don't get the ibuprofen reference ?

[+] kencausey|1 year ago|reply

Guess: headache from low frame rate?

[+] sedatk|1 year ago|reply

If the author reads this: John Carmack's last name was mistyped as "Carnmack" throughout the document.

[+] fabiensanglard|1 year ago|reply

Thank you for taking the time to report it. It has now been fixed.

[+] anilgulecha|1 year ago|reply

One non-cynical take on why modern software is slow, and not containing optimizations such as these: The standardization/optimization hypothesis.

If something is/has become a standard, then optimization takes over. You want to be fastest and meet all of the standard's tests. Doom is similarly now a standard game to port to any new CPU, toaster, whatever. Similarly email protocol, or a browser standard (WebRTC, Quic, etc).

The reason your latest web app/ electron app is not fast is that it is exploratory. It's updated everyday to meet new user needs, and fast-enough-to-not-get-in-the-way is all that's needed performance wise. Hence we see very fast IRC apps, but slack and teams will always be slow.

[+] z3t4|1 year ago|reply

It's not trivial to go back through versions to check for improvements or regressions, because some optimizations introduce bugs that are only discovered later, or you introduce a vital feature that degrades performance... So you can make your life easier by having automatic performance tests that run before each release, and if you discover a performance issue you write a regression test as usual... What I'm trying to say is: Do performance testing!
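A minimal sketch of such a check, in Python: time a workload, compare it with a recorded baseline, and fail if it is slower by more than a tolerance. The function names, the 10% tolerance, and the toy workload are all assumptions for illustration; a real setup would load the baseline from the previous release's results.

```python
import time


def best_of(func, repeats=5):
    """Return the fastest wall-clock time of several runs, to damp scheduler noise."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def is_regression(baseline_s, measured_s, tolerance=0.10):
    """True if the measured time exceeds the baseline by more than the tolerance."""
    return measured_s > baseline_s * (1.0 + tolerance)


def workload():
    """Toy stand-in for the code path being guarded."""
    return sum(i * i for i in range(100_000))


measured = best_of(workload)
baseline = measured   # placeholder: normally read from the last release's record
print(f"measured {measured * 1000:.2f} ms, regression: {is_regression(baseline, measured)}")
```

Taking the minimum of several runs rather than the mean keeps one noisy run from failing the build; the tolerance still needs tuning per machine.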

[+] manoweb|1 year ago|reply

Unlike the author, back in the day I would have preferred a 486DX50 to the DX2-66. 50MHz bus interface (including to the graphics card) instead of 33MHz

[+] antod|1 year ago|reply

My first job was AutoCAD drafting on a DX50 with 16MB. Quite high specced in the early 90s. Not sure I would've noticed the difference compared with a DX2 though.

[+] rasz|1 year ago|reply

>Optimize R_DrawColumn for Mode Y

Seeing this made a difference makes it clear Fabien ran fastdoom in Mode Y

>One optimization that did not work on my machine was to use video mode 13h instead of mode Y.

13h should work on anything; it's the VBD mode that requires a specific VESA 2.0 feature (LFB *) to be enabled. VBR should also work without problems on this IBM.

Both 13h and VBR modes would probably deliver another ~10 fps on 486/66 with VESA CL5428.

* LFB = linear frame buffer, not available on most ISA cards. Somewhat problematic, as it required less than 16MB of RAM or a "15-16MB memory hole" enabled in the BIOS. On ISA Cirrus Logic cards, support depended on how the chip was wired to the bus: some vendors supported it, while others lazily copied the reference design and didn't. With VESA Cirrus Logic cards, lazy vendors continued to use the same basic reference design wiring, disabling LFB. No idea about the https://theretroweb.com/motherboards/s/ibm-ps-1-type-2133a,-... motherboard

[+] bee_rider|1 year ago|reply

From a quote in the article

> One of my goals for FastDoom is to switch the compiler from OpenWatcom v2 to DJGPP (GCC), which has been shown to produce faster code with the same source. Alternatively, it would be great if someone could improve OpenWatcom v2 to close the performance gap.

> - Conversation with Viti95

Out of curiosity, how hard is it to port from OpenWatcom to GCC?

Clearly the solution here is to write a Watcom llvm front end…

[+] fabiensanglard|1 year ago|reply

> how hard is it to port from OpenWatcom to GCC?

I don't think it is that hard but likely very time consuming.

In theory it should only be a matter of writing a new build script (based on a real `make` rather than `wmake`), and then working out the tiny flag/preprocessor/C compiler discrepancies.

[+] rob74|1 year ago|reply

Ah, that picture brings back memories - I used to have a successor of that machine in the mid nineties (PS/1000), it looked almost the same, except the handle was rounded and the power button was blue (a big blue button). And the CPU was IBM's very own "Blue Lightning" 486SX clone (75 MHz, but no FPU). It ran Doom great, but had to pass on Quake, which required an FPU for its polygon-based 3D graphics.

[+] ge96|1 year ago|reply

> I always wanted as a teenager but could never afford

Funny how that is, for me it was a Sony Alpha camera (~~flagship at the time~~) and 10 years later I finally buy it for $50.

[+] hyperman1|1 year ago|reply

I see the acronyms MVP and MPV in the post. Does someone know what they mean?

[+] fitsumbelay|1 year ago|reply

very nice website design

[+] cantrecallmypwd|1 year ago|reply

In high school, the fastest computer in the computer lab was an IBM-donated PS/1 486SX 25 all-in-one that also was used to play DOOM.

[+] dabeeeenster|1 year ago|reply

> I was resigned to playing under Ibuprofen until I heard of fastDOOM

WTH is Ibuprofen?!

[+] prox|1 year ago|reply

Is there a recommended place where I can play Doom in the browser?

If such a thing exists!

219 comments