I failed to recreate the 1996 Space Jam website with Claude

[+] wilsmex|3 months ago|reply

Well this was interesting. As someone who was actually building similar website in the late 90's I threw this into the Opus 4.5. Note the original author is wrong about the original site however:

"The Space Jam website is simple: a single HTML page, absolute positioning for every element, and a tiling starfield GIF background.".

This is not true, the site is built using tables, not positioning at all, CSS wasn't a thing back then...

Here was its one-shot attempt at building the same type of layout (table based) with a screenshot and assets as input: https://i.imgur.com/fhdOLwP.png

[+] thecr0w|3 months ago|reply

Thanks, my friend. I added a strike through of the error, a correction, and credited you.

I'm keeping it in for now because people have made some good jokes about the mistake in the comments and I want to keep that context.

[+] govping|3 months ago|reply

The failure mode here (Claude trying to satisfy rather than saying 'this is impossible with the constraints') shows up everywhere. We use it for security research - it'll keep trying to find exploits even when none exist rather than admit defeat. The key is building external validation (does the POC actually work?) rather than trusting the LLM's confidence.

[+] manbash|3 months ago|reply

Ah, those days, where you would slice your designs and export them to tables.

[+] alt227|3 months ago|reply

Off topic, but you have used imgur as your image hosting site, which cannot be viewed in the UK. If you want all readers to be able to see and understand your points, please could you use a more universally reachable host?

[+] johnebgd|3 months ago|reply

I cut my teeth developing for the web using GoLive and will never forget how they used tables to layout a page from that tool…

[+] thuttinger|3 months ago|reply

Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things. There are a lot of problems that are easy to get right for a junior web dev but impossible for an LLM. On the other hand, I was able to write a C program that added gamma color profile support to linux compositors that don't support it (in my case Hyprland) within a few minutes! A - for me - seemingly hard task, which would have taken me at least a day or more if I didn't let Claude write the code. With one prompt Claude generated C code that compiled on first try that:

- Read an .icc file from disk

- parsed the file and extracted the VCGT (video card gamma table)

- wrote the VCGT to the video card for a specified display via amdgpu driver APIs

The only thing I had to fix was the ICC parsing, where it would parse header strings in the wrong byte-order (they are big-endian).

[+] smoghat|3 months ago|reply

Ok, so here is an interesting case where Claude was almost good enough, but not quite. But I’ve been amusing myself by taking abandoned Mac OS programs from 20 years ago that I find on GitHub and bringing them up to date to work on Apple silicon. For example, jpegview, which was a very fast and simple slideshow viewer. It took about three iterations with Claude code before I had it working. Then it was time to fix some problems, add some features like playing videos, a new layout, and so on. I may be the only person in the world left who wants this app, but well, that was fine for a day long project that cooked in a window with some prompts from me while I did other stuff. I’ll probably tackle scantailor advanced next to clean up some terrible book scans. Again, I have real things to do with my time, but each of these mini projects just requires me to have a browser window open to a Claude code instance while I work on more attention demanding tasks.

[+] sqircles|3 months ago|reply

> The Space Jam website is simple: a single HTML page, absolute positioning for every element...

Absolute positioning wasn't available until CSS2 in 1998. This is just a table with crafty use of align, valign, colspan, and rowspan.

[+] thecr0w|3 months ago|reply

Thanks, my friend. I added a strike through of the error, a correction, and credited you.

I'm keeping it in for now because have made some good jokes about the mistake in the comments and I want to keep that context.

[+] DocTomoe|3 months ago|reply

Which would also render differently on every machine, based on browser settings, screen sizes, and available fonts.

Like the web was meant to be. An interpreted hypertext format, not a pixel-perfect brochure for marketing execs.

[+] sigseg1v|3 months ago|reply

Curious if you've tested something such as:

- "First, calculate the orbital radius. To do this accurately, measure the average diameter of each planet, p, and the average distance from the center of the image to the outer edge of the planets, x, and calculate the orbital radius r = x - p"

- "Next, write a unit test script that we will run that reads the rendered page and confirms that each planet is on the orbital radius. If a planet is not, output the difference you must shift it by to make the test pass. Use this feedback until all planets are perfectly aligned."

[+] yosito|3 months ago|reply

This has been my experience with almost everything I've tried to create with generative AI, from apps and websites, to photos and videos, to text and even simple sentences. At first glance, it looks impressive, but as soon as you look closer, you start to notice that everything is actually just sloppy copy.

That being said, sloppy copy can make doing actual work a lot faster if you treat it with the right about of skepticism and hand-holding.

It's first attempt at the Space Jam site was close enough that it probably could have been manually fixed by an experienced developer in less time than in takes to write the next prompt.

[+] jama211|3 months ago|reply

but my experience has also been that with every model they require less hand holding and the code is less sloppy. If I’m careful with my prompts, gpt codex 5.1 recently has been making a lot of typescript for me that is basically production ready in a way it couldn’t even 2 months ago

[+] charcircuit|3 months ago|reply

>I'd like to preserve this website forever and there's no other way to do it besides getting Claude to recreate it from a screenshot.

There are other ways such as downloading an archive and the preserving the file in one or more cloud storages.

https://archive.is/download/cXI46.zip

[+] a2128|3 months ago|reply

The sentence immediately after that would imply sarcasm

> Note: please help, because I'd like to preserve this website forever and there's no other way to do it besides getting Claude to recreate it from a screenshot. Believe me, I'm an engineering manager with a computer science degree. Please please please help (sad emoji)

[+] jonas21|3 months ago|reply

There was a response to this post on the front page earlier this morning that was able to get Claude to succeed simply by giving it access to Playwright so it could see what it was doing and telling it in the prompt that it needed to be pixel perfect:

https://news.ycombinator.com/item?id=46193412

As of right now, it seems to have been flagged into oblivion by the anti-AI crowd. I found both posts to be interesting, and it's unfortunate that one of them is missing from the conversation.

[+] thecr0w|3 months ago|reply

yeah that sucks. I also linked it to the top of my post so folks can take a look.

[+] Wowfunhappy|3 months ago|reply

Claude is not very good at using screenshots. The model may technically be multi-modal, but its strength is clearly in reading text. I'm not surprised it failed here.

[+] fnordpiglet|3 months ago|reply

Especially since it decomposes the image into a semantic vector space rather than the actual grid of pixels. Once the image is transformed into patch embeddings all sense of pixels is entirely destroyed. The author demonstrates a profound lack of understanding for how multimodal LLMs function that a simple query of one would elucidate immediately.

The right way to handle this is not to build it grids and whatnot, which all get blown away by the embedding encoding but to instruct it to build image processing tools of its own and to mandate their use in constructing the coordinates required and computing the eccentricity of the pattern etc in code and language space. Doing it this way you can even get it to write assertive tests comparing the original layout to the final among various image processing metrics. This would assuredly work better, take far less time, be more stable on iteration, and fits neatly into how a multimodal agentic programming tool actually functions.

[+] dcanelhas|3 months ago|reply

Even with text, parsing content in 2D seems to be a challenge for every LLM I have interacted with. Try getting a chatbot to make an ascii-art circle with a specific radius and you'll see what I mean.

[+] unknown|3 months ago|reply

[deleted]

[+] johnfn|3 months ago|reply

Context is king. The problem is that you are the one currently telling Claude how close it is and what to do next. But if you give it the tools to do that itself, it will make a world of difference.

Give Claude a way to iteratively poke at what it created (such as a playwright harness), and screenshot of what you want, and maybe a way to take a screenshot in Playwright and I think you will get much closer. You might even be able to one shot it.

I’ve always wondered what would happen if I gave it a screenshot and told it to iterate until the Playwright screenshot matched the mock screenshot, pixel perfect. I imagine it would go nuts, but after a few hours I think it would likely get it. (Either that or minor font discrepancies and rounding errors would cause it to give up…)

[+] alexandre_m|3 months ago|reply

The key is always feedback loop. If you give the AI the ability to verify itself, then it's able to iterate faster. Sure, it may take many iterations, but at least the iteration spans will be shorter than waiting for a human to validate each time.

I'd be curious to see how Antigravity compares for the same task with its automatic browser agentic validation logic.

[+] 999900000999|3 months ago|reply

Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close , it just can't get the order right which is something OP can manually fix.

I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. I'll have times where it'll get it right, and times when I have to try 3 or 4 times.

[+] GeoAtreides|3 months ago|reply

>which is something OP can manually fix

what if the LLM gets something wrong that the operator (a junior dev perhaps) doesn't even know it's wrong? that's the main issue: if it fails here, it will fail with other things, in not such obvious ways.

[+] smallnix|3 months ago|reply

That's not the point of the article. It's about Claude/LLM being overconfident in recreating pixel perfect.

[+] bigstrat2003|3 months ago|reply

> it just can't get the order right which is something OP can manually fix.

If the tool needs you to check up on it and fix its work, it's a bad tool.

[+] thecr0w|3 months ago|reply

ya, this is true. Another commenter also pointed out that my intention was to one-shot. I didn't really go too deeply into trying to try multiple iterations.

This is also fairly contrived, you know? It's not a realistic limitation to rebuild HTML from a screenshot because of course if I have the website loaded I can just download the HTML.

[+] soared|3 months ago|reply

I got quite close with Gemini 3 pro in AI studio. I uploaded a screenshot (no assets) and the results were similar to OP. It failed to follow my fix initially but I told it to follow my directions (lol) and it came quite close (though portrait mode distorted it, landscape was close to perfect.

“Reference the original uploaded image. Between each image in the clock face, create lines to each other image. Measure each line. Now follow that same process on the app we’ve created, and adjust the locations of each image until all measurements align exactly.”

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

[+] PaulHoule|3 months ago|reply

(1) Multi-modal is where a lot of these things go to die. You will hear people talk about the occasional striking success but so often I show Copilot an easily identifiable flower image and it gets it wrong even though Google Lens will get it right

(2) The kind of dialog he's having with Claude is a kind of communication pattern I've found never works with LLMs. Sure there is the kind of conversation that goes

   Do X

   ... that's pretty good except for Y

   Great!

but if it is

   Do X

and it comes back with something entirely wrong I'd assume the state of the thing is corrupted and it is never coming back and no matter how you interrogate it, encourage it, advise it, threaten it, whatever, you will go in circles.

[+] micromacrofoot|3 months ago|reply

I wouldn't call it entirely defeated, it got maybe 90% of the way there. Before LLMs you couldn't get 50% of the way there in an automated way.

> What he produces

I feel like personifying LLMs more than they currently are is a mistake people make (though humans always do this), they're not entities, they don't know anything. If you treat them too human you might eventually fool yourself a little too much.

[+] thecr0w|3 months ago|reply

As a couple other comments pointed out, it's also not fair to judge Claude based on a one shot like this. I sort of assume these limitations will remain even if we went back and forth but to be fair, I didn't try that more than a few times in this investigation. Maybe on try three it totally nails it.

[+] unknown|3 months ago|reply

[deleted]

[+] manlymuppet|3 months ago|reply

Couldn’t you just feed Claude all the raw, inspect element HTML from the website and have it “decrypt” that?

The entire website is fairly small so this seems feasible.

Usually there’s a big difference between a website’s final code and its source code because of post processing but that seems like a totally solvable Claude problem.

Sure LLMs aren’t great with images, but it’s not like the person who originally wrote the Space Jam website was meticulously messing around with positioning from a reference image to create a circular orbit — they just used the tools they had to create an acceptable result. Claude can do the same.

Perhaps the best method is to re-create, rather than replicate the design.

[+] blks|3 months ago|reply

What do you mean? Raw html is the original website source code.

Modern web development completely poisoned young generation

[+] personjerry|3 months ago|reply

If you have the raw HTML why would you need to do this at all?

[+] manlymuppet|3 months ago|reply

https://pastebin.com/raw/F2jxZTeJ

The HTML I'm referring to, copied from the website.

Only about 7,000 characters or just 2,000 Claude tokens. This is feasible.

[+] literalAardvark|3 months ago|reply

The space jam website used HTML tables for formatting and split images in each cell.

CSS didn't exist.

[+] pfix|3 months ago|reply

I checked the source of the original (like maybe many of you) to check how they actually did it and it was... simpler than expected. I drilled myself so hard to forget tables as layout... And here it is. So simple it's a marvel.

[+] COAGULOPATH|3 months ago|reply

And they do hacky things like space elements vertically using <br> tags.

[+] stared|3 months ago|reply

Just use Playwright Skill (https://github.com/lackeyjb/playwright-skill). It is a game changer. Otherwise it is Claude the Blind, as OP mentioned.

[+] pcwelder|3 months ago|reply

But that's cheating because it then has the source code containing the table and its styles.

I can confirm that this is what it does.

And if you ask it to not use tables, it cleverly uses div with the same layout as the table instead.

[+] ErrantX|3 months ago|reply

I just feel this is a great example of someone falling into the common trap of treating an LLM like a human.

They are vastly less intelligent than a human and logical leaps that make sense to you make no sense to Claude. It has no concept of aesthetics or of course any vision.

All that said; it got pretty close even with those impediments! (It got worse because the writer tried to force it to act more like a human would)

I think a better approach would be to write a tool to compare screenshots, identity misplaced items and output that as a text finding/failure state. claude will work much better because your dodging the bits that are too interpretive (that humans rock at and LLMs don't)

[+] sallveburrpi|3 months ago|reply

> vastly less intelligent than a human

I would more phrase it like that they are a completely alien “intelligence” that cant really be compared to human intelligence

[+] naet|3 months ago|reply

The blog frequently refers to the LLM as "him" instead of "it" which somehow feels disturbing to me.

I love to anthropomorphize things like rocks or plants, but something about doing it to an AI that responds in human like language enters an uncanny valley or otherwise upsets me.

[+] mxfh|3 months ago|reply

Everything feels wrong with that approach too me, starting with calling a perfectly time-appropriate website anachronistic.

Anachronistic would be something like creating an apparent flash website for a fictional 90s internet related movie.

[+] robomc|3 months ago|reply

He's using it correctly, in its secondary sense of "belonging or appropriate to an earlier period, especially so as to seem conspicuously old-fashioned or outdated."

[+] buchwald|3 months ago|reply

Claude is surprisingly bad at visual understanding. I did a similar thing to OP where I wanted Claude to visually iterate on Storybook components. I found outsourcing the visual check to Playwright in vision mode (as opposed to using the default a11y tree) and Codex for understanding worked best. But overall the idea of a visual inspection loop went nowhere. I blogged about it here: https://solbach.xyz/ai-agent-accessibility-browser-use/

[+] MagMueller|3 months ago|reply

Interesting read. Agree that GUI is super hard for agents. Did you see "skills" from browser-use? We directly interact with network requests now.

460 comments