I love clocks and I love finding the edges of what any given technology is capable of.
I've watched this for many hours and Kimi frequently gets the most accurate clock, but it also has the least variation and is the most boring. Qwen is often the most insane and makes me laugh. Which one is "better"?
Clock drawing is widely used as a test for assessing dementia. Sometimes the LLMs fail in ways that are fairly predictable if you're familiar with CSS and typical shortcomings of LLMs, but sometimes they fail in ways that are less obvious from a technical perspective but are exactly the same failure modes as cognitively-impaired humans.
I think you might have stumbled upon something surprisingly profound.
If you're keeping all the generated clocks in a database, I'd love to see a Facemash style spin-off website where users pick the best clock between two options, with a leaderboard. I want to know what the best clock Qwen ever made was!
Please make it show the last 5 (or some other number of) clocks for each model. It would be nice to see the deviation and variety for each model at a glance.
This is honestly the best thing I've seen on HN this month. It's stupid, enlightening, funny, and profound all at the same time. I have a strong temptation to pick some of these designs and build them in real life.
Could you please adjust the positions of the titles (like "GPT 5")? On Firefox Focus on iOS, the spacing is inconsistent (it seems to move due to the space taken by the clock). After one or two of them, I had to scroll all the way down to the bottom and come back up to work out which title is linked to which clock.
Watching this over the past few minutes, it looks like Kimi K2 generates the best clock face most consistently. I'd never heard of that model before today!
Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.
I’ve been using Kimi K2 a lot this month. Gives me Japanese->English translations at near human levels of quality, while respecting rules and context I give it in a very long, multi-page system prompt to improve fidelity of translation for a given translation target (sometimes markup tags need to be preserved, sometimes deleted, etc.). It doesn’t require a thinking step to generate this level of translation quality, making it suitable for real-time translation. It doesn’t start getting confused when I feed it a couple dozen lines of previous translation context, like certain other LLMs do… instead the translation actually improves with more context instead of degrading. It’s never refused a translation for “safety” purposes either (GPT and Gemini love to interrupt my novels and tell me certain behavior is illegal or immoral, and censor various anatomical words).
It could be that the prompt is accidentally (or purposefully) more optimised for Kimi K2, or that Kimi K2 is better trained on this particular data. LLMs need "prompt engineers" for a reason: to get the most out of a particular model.
>Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.
More like fell headfirst into the ground.
I'm disappointed with Gemini 2.5 (not sure Pro or Flash) -- I've personally had _fantastic_ results with Gemini 2.5 Pro building PWA, especially since the May 2025 "coding update." [0]
I'm a huge K2 fan. It has a personality that feels very distinct from other models (not sycophantic at all), and it's quite smart. Also pretty good at creative writing (tho not 100% slop-free).
K2 hosted on Groq is pretty crazy for intelligence/second. (Low rate limits still, tho.)
Interestingly, either I'm _hallucinating_ this, or DeepSeek started to consistently show a clock without failures and with good time, where it previously didn't. ...aaand as I was typing this, it barfed a train wreck. Never mind, move along... No, wait, it's good again, no, wait...
Since the first (good) image generation models became available, I've been trying to get them to generate an image of a clock with 13 instead of the usual 12 hour divisions. I have not been successful. Usually they will just replace the "12" with a "13" and/or mess up the clock face in some other way.
I'd be interested if anyone else is successful. Share how you did it!
I've noticed that image models are particularly bad at modifying popular concepts in novel ways (way worse "generalization" than what I observe in language models).
A normal (ish) 12h clock. It numbered it twice, in two concentric rings. The outer ring is normal, but the inner ring numbers the 4th hour as "IIII" (fine, and a thing that clocks do) and the 8th hour as "VIIII" (wtf).
> The farmer and the goat are going to the river. They look into the sky and see three clouds shaped like: a wolf, a cabbage and a boat that can carry the farmer and one item. How can they safely cross the river?
Most of them just give the answer to the well-known river-crossing riddle. Some "feel" that something is off, but still have a hard time figuring out that the wolf, boat, and cabbage are just clouds.
This is really cool. I tried prompting Gemini, but every time I got the same picture. I don't know how to share a session (as is possible with ChatGPT), but the prompts were:
If a clock had 13 hours, what would be the angle between two of these 13 hours?
Generate an image of such a clock
No, I want the clock to have 13 distinct hours, with the angle between them as you calculated above
This is the same image. There need to be 13 hour marks around the dial, evenly spaced
... And its last answer was
You are absolutely right, my apologies. It seems I made an error and generated the same image again. I will correct that immediately.
Here is an image of a clock face with 13 distinct hour marks, evenly spaced around the dial, reflecting the angle we calculated.
And the very same clock, with 12 hours, and a 13th above the 12...
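The geometry the model kept computing but never rendering is tiny. A minimal sketch in Python (radius and center values chosen arbitrarily for illustration):

```python
import math

HOUR_COUNT = 13          # the unusual dial being requested
RADIUS = 160             # assumed dial radius in px
CENTER = (200, 200)      # assumed dial center

ANGLE_PER_HOUR = 360 / HOUR_COUNT  # ~27.69 degrees, vs. 30 on a 12-hour dial

def mark_position(i: int) -> tuple[float, float]:
    """(x, y) of hour mark i, with index 0 at the 12 o'clock position.

    Screen coordinates: y grows downward, hence the subtracted cosine.
    """
    theta = math.radians(i * ANGLE_PER_HOUR)
    x = CENTER[0] + RADIUS * math.sin(theta)
    y = CENTER[1] - RADIUS * math.cos(theta)
    return x, y

print(round(ANGLE_PER_HOUR, 4))  # 27.6923
print(mark_position(0))          # (200.0, 40.0) -- top of the dial
```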
I was able to have AI generate an image like this, though not via diffusion or autoregression, but by having it write Python code to create the image.
ChatGPT made a nice looking clock with matplotlib that had some bugs that it had to fix (hours were counter-clockwise). Gemini made correct code one-shot, it used Pillow instead of matplotlib, but it didn't look as nice.
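The counter-clockwise-hours bug mentioned above usually comes down to a single sign: clock angles run clockwise from 12, math angles counter-clockwise from the x-axis. A sketch of the conversion (hypothetical helper names):

```python
def hour_angle_deg(hour: float) -> float:
    """Clockwise-from-12 angle of hour mark `hour`, in degrees."""
    return (hour % 12) * 30.0  # 360 degrees / 12 hours

def to_math_angle_deg(clock_deg: float) -> float:
    """Convert a clockwise-from-12 angle to counter-clockwise-from-east.

    Forgetting this conversion is the classic "hands go backwards" bug.
    """
    return (90.0 - clock_deg) % 360.0

# 3 o'clock: 90 degrees clockwise from 12, which is 0 degrees (due east)
# in the math convention most plotting libraries use.
print(hour_angle_deg(3), to_math_angle_deg(hour_angle_deg(3)))  # 90.0 0.0
```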
Weird, I never tried that. I tried all the usual tricks, including swearing at the model (which scarily works surprisingly well with LLMs), and nothing. I even tried going the opposite direction: a 6-hour clock.
That's because they literally cannot do that. Doing what you're asking requires an understanding of why the numbers on the clock face are where they are and what it would mean if there was an extra hour on the clock (ie that you would have to divide 360 by 13 to begin to understand where the numbers would go). AI models have no concept of anything that's not included in their training data. Yet people continue to anthropomorphize this technology and are surprised when it becomes obvious that it's not actually thinking.
I've been trying for the longest time and across models to generate pictures or cartoons of people with six fingers and now they won't do it. They always say they accomplished it, but the result always has 5 fingers. I hate being gaslit.
I've been struggling all week trying to get Claude Code to write code to produce visual (not the usual, verifiable, text on a terminal) output in the form of a SDL_GPU rendered scene consisting of the usual things like shaders, pipelines, buffers, textures and samplers, vertex and index data and so on, and boy it just doesn't seem to know what it's doing. Despite providing paragraphs-long, detailed prompts. Despite describing each uniform and each matrix that needs to be sent. Despite giving it extremely detailed guidance about what order things need to be done in. It would have been faster for me to just write the code myself.
When it fails a couple of times it will try to put logging in place and then confidently tell me things like "The vertex data has been sent to the renderer, therefore the output is correct!" When I suggest it take a screenshot of the output each time to verify correctness, it does, and then declares victory over an entirely incorrect screenshot. When I suggest it write unit tests, it does so, but the tests are worthless and only tests that the incorrect code it wrote is always incorrect in the same ways.
When it fails even more times, it gets into what I like to call "intern engineer" mode, where it just tries random things that I know are not going to work. And if I let it keep going, it ends up modifying the entire source tree with random "try this" crap. Each iteration, it confidently tells me: "Perfect! I have found the root cause! It is [garbage bullshit]. I have corrected it and the code is now completely working!"
These tools are cute, but they have a long way to go before they are actually useful for anything more than trivial toy projects.
I’m not sure if it's just me, but I've also noticed Claude becoming even more lazy. For example, I've asked it several times to fix my tests. It'll fix four or five of them, then start struggling with the next couple, and suddenly declare something like: "All done, fixed 5 out of 10 tests. I can’t fix the remaining ones", followed by a long, convoluted explanation about why that’s actually a good thing.
Have you tried using MCPs to provide documentation and examples? I always have to bring in docs, since I don't work in Python or TS+React (which it seems more capable at), and force it to review those in addition to any specification. e.g. Context7
I know this has been said many times before, but I wonder why this is such a common outcome. Maybe from negative outcomes being underrepresented in the training data? Maybe that plus being something slightly niche and complex?
The screenshot method not working is unsurprising to me: VLMs' visual reasoning is very bad with details because (as far as I understand) they don't really have access to those details, just the image embedding and maybe an OCR'd transcript.
Amazing. Some people who use LLMs for soft outcomes are so enamored with them that they disagree with me when I say to be careful, they're not perfect. This is such a great non-technical way to explain the reality I'm seeing when using them on hard-outcome coding/logic tasks. "Hey, this test is failing" -- LLM deletes test -- "FIXED!"
Something that struck me when I was looking at the clocks is that we know what a clock is supposed to look and act like.
What about when we don't know what it's supposed to look like?
Lately I've been wrestling with the fact that unlike, say, a generalized linear model fit to data with some inferential theory, we don't have a theory or model for the uncertainty about LLM products. We recognize when it's off about things we know are off, but don't have a way to estimate when it's off other than to check it against reality, which is probably the exception to how it's used rather than the rule.
> "Hey this test is failing", LLM deletes test, "FIXED!"
A nice continuation of the tradition of folk stories about supernatural entities like teapots or lamps that grant wishes and take them literally. "And that's why, kids, you should always review your AI-assisted commits."
Last year I wrote a simple system using Semantic Kernel, backed by functions inside Microsoft Orleans, which was, for the most part, an LLM-driven business-logic DSL processor. Your business logic was just text, and you gave it the operation as text.
Nothing could be relied upon to be deterministic, it was so funny to see it try to do operations.
Recently I re-ran it with newer models and it was drastically better, especially with temperature tweaks.
I'm having a hard time believing this site is honest, especially with how ridiculous the scaling and rotation of numbers is for most of them.
I dumped his prompt into ChatGPT to try it myself, and it did create a very neat clock face with the numbers in the correct positions and an animated second hand; it just got the exact time wrong, being a few hours off.
Edit: the time may actually have been perfect now that I account for my isp's geo-located time zone
LLMs can't "look" at the rendered HTML output to see if what they generated makes sense or not. But there ought to be a way to do that right? To let the model iterate until what it generates looks right.
Currently, at work, I'm using Cursor for something that has an OpenGL visualization program. It's incredibly frustrating trying to describe bugs to the AI because it is completely blind. Like I just wanna tell it "there's no line connecting these two points but there ought to be one!" or "your polygon is obviously malformed as it is missing a bunch of points and intersects itself" but it's impossible. I end up having to make the AI add debug prints to, say, print out the position of each vertex, in order to convince it that it has a bug. Very high friction and annoying!!!
Cursor has this with their "browser" function for web dev, quite useful
You can also give it an MCP setup so it can send a screenshot into the conversation, though I'm unsure if anyone has made an easy enough "take a screenshot of a specific window id" kind of MCP, so it may need to be built first
I guess you could also ask it to build that mcp for you...
You can absolutely do this. In fact, with Claude, Anthropic encourages you to send it screenshots. It works very well if you aren't expecting pixel-perfection.
YMMV with other models but Sonnet 4.5 is good with things like this - writing the code, "seeing" the output and then iterating on it.
I had some success providing screenshots to Cursor directly. It worked well for web UIs as well as generated graphs in Python. It makes them a bit less blind, though I feel more iterations are required.
Claude totally can, same with ChatGPT. Upload a picture to either one of them via the app and tell it there's no line where there should be. There’s some plumbing involved to get it to work in Claude code or codex, but yes, computers can "see". If you have lm-server, there's tons of non-text models you can point your code at.
Kinda - hand-waving over the question of whether an LLM can really "look", you can connect Cursor to a Puppeteer MCP server, which allows it to iterate with "eyes" by using Puppeteer to screenshot its own output. It still has issues, but it often solves really silly mistakes simply by having this MCP available.
Something I'm not able to wrap my head around is that Kimi K2 is the only model that produces a ticking second hand on every attempt while the rest of them are always moving continuously. What fundamental differences in model training or implementation can result in this disparity? Or was this use case programmed in K2 after the fact?
I'd say more like a blind programmer in the early stages of dementia. Able to write code, unable to form a mental image of what it would render as and can't see the final result.
Cool, and marginally informative on the current state of things, but kind of a waste of energy given that everything is re-done every minute to compare. We'd probably only need a handful from each to see the meaningful differences.
It's actually quite fascinating if you watch it for 5 minutes. Some models are overall bad, but others nail it in one minute and butcher it in the next.
It's perhaps the best example I have seen of model drift driven by just small, seemingly unimportant changes to the prompt.
Because a new clock is generated every minute, looks like simply changing the time by a digit causes the result to be significantly different from the previous iteration.
This is such a great idea! Surprisingly, Kimi K2 is the only one without any obvious problems. And it's not even the full K2 Thinking version. This made me reread this article from a few days ago:
I like Deepseek v3.1's idea of radially-aligning each hour number's y-axis ("1" is rotated 30° from vertical, "2" at 60°, etc.). It would be even better if the numbers were rotated anticlockwise.
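That radial alignment is just one rotation per number (plus a counter-rotation if you wanted the glyphs kept upright instead). A hypothetical sketch of the angles involved, assuming a standard 12-hour dial:

```python
def radial_rotation(n: int, hours: int = 12) -> int:
    """Degrees to rotate hour number n so its y-axis points at the center."""
    return n * (360 // hours)

# In CSS terms this would be something like, per number:
#   transform: rotate(Ndeg) translateY(-R) rotate(-Ndeg);
# where the trailing counter-rotation is what keeps a digit upright.
for n in (1, 2, 3):
    print(f"hour {n}: rotate {radial_rotation(n)}, counter-rotate {-radial_rotation(n)}")
```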
I'm not sure what Qwen 2.5 is doing, but I've seen similar in contemporary art galleries.
`single html file, working analog clock showing current time, numbers positioned (aligned) correctly via trig calc (dynamic), all three hands, second hand ticks, 400px, clean AF aesthetic R/Greenberg Associates circa 2017. empathy, hci, define > design > implement.`
The more I look at it, the more I realise the reason for the cognitive overload I feel when using LLMs for coding. The same prompt to the same model, for a pretty straightforward task, produces wildly different outputs. Now imagine how wildly different the code is when generating two different logical functions: the casing is different, the commenting is different, there's no semantic continuity. Maybe if I gave detailed prompts and asked it to follow them it would, but in my experience prompt adherence is not so great either. I'm at the stage where I just use LLMs as autocorrect rather than for any generation.
In any case those clocks are all extremely inaccurate, even if AI could build a decent UI (which is not the case).
Some months ago I published this site for fun: https://timeutc.com There's a lot of code involved to make it precise to the ms, including adjusting based on network delay, frame refresh rate instead of using setTimeout and much more. If you are curious take a look at the source code.
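The network-delay adjustment described above is essentially the classic NTP offset estimate; a minimal sketch (not the site's actual code), assuming a roughly symmetric network path:

```python
def clock_offset_ms(t0: float, t1: float, t2: float, t3: float) -> float:
    """NTP-style estimate (ms) of how far the local clock lags the server.

    t0: client send time, t1: server receive time,
    t2: server send time,  t3: client receive time.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0

def round_trip_delay_ms(t0: float, t1: float, t2: float, t3: float) -> float:
    """Time spent on the wire, excluding server processing time."""
    return (t3 - t0) - (t2 - t1)

# Client clock 500 ms behind the server, 40 ms each way, 10 ms processing:
t0, t1, t2, t3 = 1000.0, 1540.0, 1550.0, 1090.0
print(clock_offset_ms(t0, t1, t2, t3))      # 500.0
print(round_trip_delay_ms(t0, t1, t2, t3))  # 80.0
```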
This is cool, interesting to see how consistent some models are (both in success and failure)
I tried gpt-oss-20b (my go-to local) and it looks ok though not very accurate. It decided to omit numbers. It also took 4500 tokens while thinking.
I'd be interested in seeing it with some more token leeway, as well as comparing two or more similar prompts, like using "current time" instead of "${time}" and being more prescriptive about including numbers.
Security-wise, this is a website that takes the raw output of an AI and serves it for execution on their website.
I know, developers do the same, but at least they check it into Git, where they can notice their mistakes. Here is an opportunity for the AI to trigger a Google authentication prompt on you, or anything else.
Ask Claude or ChatGPT to write it in Python, and you will see what they are capable of. HTML + CSS has never been the strong suit of any of these models.
Why does that give better results? Is this phenomenon measurable? How would "you have a PhD in computer science" change its ability to interpret prose? Every interaction with an LLM seems like superstition.
This is great. If you think that the phenomenon of human-like text generation evinces human-like intelligence, then this should be taken to evince that the systems likely have dementia. https://en.wikipedia.org/wiki/Montreal_Cognitive_Assessment
Imagine if I asked you to draw pixels and operate a clock via HTML, or to create a JPEG with pencil and paper, and have it be accurate. I suspect your hand-coded work would be off by an order of magnitude in comparison.
What's going on with Kimi K2 being so reasonable/unique in so many of these benchmarks I've seen recently? I'll have to try it out further. Is it any good at programming?
I love that GPT-5 is putting the clock hands way outside the frame and just generally is a mess. Maybe we'll look back on these mistakes just like watching kids grow up and fumble basic tasks. Humorous in its own unique way.
What a wonderfully visual example of the crap LLMs turn everything into. I am eagerly awaiting the collapse of the LLM bubble. JetBrains added this crap to their otherwise fine series of IDEs and now I have to keep removing randomly inserted import statements and keep fixing hallucinated names of functions suggested instead of the names of functions that I have already defined in the same file. Lack of determinism where we expect it (most of the things we do, tbh) is creating more problems than it is solving.
Honestly, I think if you track the performance of each over time, since these get regenerated once in a while, you can then have a very, very useful and cohesive benchmark.
The "?" on the site shows the prompt: "Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting."
Limiting the model to 2000 tokens while also asking it to output ONLY HTML/CSS is just stupid. It's like asking a programmer to perform the same task with half their brain removed, having also forgotten their programming experience. This is a stupid and meaningless benchmark.
https://www.psychdb.com/cognitive-testing/clock-drawing-test
I applaud you for spending money to get it done.
It would be really cool if I could zoom out and have everything scale properly!
[0] https://blog.google/products/gemini/gemini-2-5-pro-updates/
I wonder if that is some type of fallback for errors querying the model, or if K2 actually created the HTML/CSS to display that.
Once companies see this starting to show up in the evals and criticisms, they'll go out of their way to fix it.
My working theory is that they were trained really hard to generate 5 fingers on hands but their counting drops off quickly.
My prompt to Grok:
---
Follow these rules exactly:
- There are 13 hours, labeled 1–13.
- There are 13 ticks.
- The center of each number is at angle: index * (360/13)
- Do not infer anything else.
- Do not apply knowledge of normal clocks.
Use the following variables:
HOUR_COUNT = 13
ANGLE_PER_HOUR = 360 / 13 // 27.692307°
Use index i ∈ [0..12] for hour marks:
angle_i = i * ANGLE_PER_HOUR
I want html/css (single file) of a 13-hour analog clock.
---
Output from grok.
https://jsfiddle.net/y9zukcnx/1/
Create an interactive artifact of an analog clock face that keeps time properly.
https://claude.ai/public/artifacts/75daae76-3621-4c47-a684-d...
no thinking: better clock but not current time (the prompt is confusing here though): https://imgur.com/a/kRK3Q18
Makes me think that LLMs are like people with dementia! Perhaps it's the best way to relate to an LLM?
https://entropytown.com/articles/2025-11-07-kimi-k2-thinking...
https://slate.com/human-interest/2016/07/martin-baas-giant-r...
Got it to work on gpt 3.5T w modified prompt (albeit not as good - https://pastebin.com/gjEVSEcJ)
It even made a Nietzsche clock (I saw one <body> </body> which was surprisingly empty).
It definitely wins the creative award.
9 AIs × 43,200 minutes = 388,800 requests/month
388,800 requests × 200 tokens = 77,760,000 tokens/month ≈ 78M tokens
Cost varies from 10 cents to $1 per 1M tokens.
Using the mid-price, the cost is around $50/month.
---
Hopefully, the OP has this endpoint protected - https://clocks.brianmoore.com/api/clocks?time=11:19AM
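The back-of-envelope math above checks out; here it is spelled out (the 200 tokens/request and mid-range $0.55/1M price are the commenter's assumptions):

```python
models = 9
minutes_per_month = 60 * 24 * 30          # 43,200 (assuming a 30-day month)
requests = models * minutes_per_month     # one clock per model per minute
tokens = requests * 200                   # assumed ~200 output tokens each

print(requests)                           # 388800
print(tokens)                             # 77760000, i.e. ~78M tokens
# At an assumed mid-range price of $0.55 per 1M tokens:
print(round(tokens / 1e6 * 0.55, 2))      # roughly $43/month
```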
AI-optimized <analog-clock>!
People expect perfection on first attempt. This took a brief joint session:
HI: define the custom element API design (attribute/property behavior) and the CSS parts
AI: draw the rest of the f… owl
The thing I always want from timezone tools is: “Let me simulate a date after one side has shifted but the other hasn’t.”
Humans do badly with DST offset transitions; computers do great with them.
Place a baby elephant in the green chair
I cannot unsee what I saw and it is 21:30 here so I have an hour or so to eliminate the picture from my mind or I will have nightmares.
I use 'Sonnet 4.5 thinking' and 'Composer 1' (Cursor) the most, so it would be interesting to see how such SOTA models perform in this task.
I'm not sure if this was the intent or not, but it sure highlights how unreliable LLMs are.
More seriously, I'd love to see how the models perform the same task with a larger token allowance.
This gives better results, at least for me.
Why is a new clock being rendered every minute? Or are AI models evolving and improving every minute?
Or regret: "why didn't we stop it when we could?"
Granted, it is not a clock - but it could be art. It looks like a Picasso. When he was drunk. And took some LSD.
Great experiment!