physix's comments

physix | 7 months ago | on: Ultrathin business card runs a fluid simulation

Back in the days when we submitted our CVs on paper, I always cut mine to a smaller size than letter and put it in a branded folder. People tend to stack things with the smaller items on top. I don't know if mine actually was on top of the stack, but I can say that I basically always got the contract.

physix | 7 months ago | on: GPT-5

For those who happen to have a subscription to The Economist, there is a very interesting Money Talks podcast where they interview Anthropic's boss Dario Amodei[1].

There were two interesting takeaways about AGI:

1. Dario remarks that the terms AGI/ASI are very misleading and dangerous. They are ill-defined, and it's more useful to understand that capabilities are simply growing exponentially at the moment. If you extrapolate that, he thinks it may just "eat the majority of the economy". I don't know if this is self-serving hype, and it's not clear where we will end up with all this, but it will be disruptive, no matter what.

2. The Economist moderators, however, note towards the end that this industry may well tend toward commoditization. At the moment these companies produce models that people want but others can't make. But as chip making starts to hit its limits and the information space becomes completely harvested, capability growth might taper off and others will catch up, with the quasi-monopoly profit potential melting away.

Putting that together, I think that although cognitive capabilities will most likely continue to accelerate, albeit not necessarily along the lines of AGI, the economics of all this will probably not lead to a winner-takes-all outcome.

[1] https://www.economist.com/podcasts/2025/07/31/artificial-int...

physix | 7 months ago | on: Study mode

I had Google Gemini 2.5 Flash analyse a log file and it quoted content that simply didn't exist.

It appears to me like a form of decoherence, and it's very hard to predict when things break down.

People tend to know when they are guessing. LLMs don't.

physix | 7 months ago | on: Study mode

A game changer in which respect?

Anyway, this makes me wonder whether LLMs can be appropriately prompted to indicate whether the information given is speculative, inferred, or factual, and whether they have the means to gauge the validity/reliability of their response and filter it accordingly.

I've seen prompts that instruct the LLM to make this transparent via annotations to their response, and of course they comply, but I strongly suspect that's just another form of hallucination.
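For what it's worth, here's a minimal sketch of the kind of annotation prompt I mean, with a parser for the tags. The prompt wording and the sample response are invented for illustration, and as said, the labels themselves may be no more reliable than the content they annotate.

```python
import re

# Hypothetical annotation prompt -- the wording is invented, not from any
# vendor's documentation.
PROMPT = (
    "Answer the question. After every sentence, append one tag: "
    "[FACT], [INFERRED], or [SPECULATIVE]."
)

# A made-up example of what a compliant response might look like.
sample_response = (
    "The service restarted at 09:14. [FACT] "
    "The restart was likely triggered by an OOM kill. [INFERRED] "
    "A kernel bug could also explain it. [SPECULATIVE]"
)

def parse_annotations(text):
    # Split the response into (claim, label) pairs.
    return re.findall(r"(.+?)\s*\[(FACT|INFERRED|SPECULATIVE)\]\s*", text)

for claim, label in parse_annotations(sample_response):
    print(label, "-", claim)
```

The parsing part is trivial; the open question is whether the model's self-assigned labels correlate with anything real.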

physix | 7 months ago | on: How Anthropic teams use Claude Code

Speaking of hilarious, we had a Close Encounter of the Hallucinating Kind today. We were having mysterious, simultaneous gRPC socket-closed exceptions on the client and server sides of services running in Kubernetes and talking to each other through an nginx ingress.

We captured debug logs and described the issue in detail to Gemini 2.5 Flash, giving it the nginx logs for the one second before and after an example incident (about 10k log entries).

It came back with a clear verdict, saying

"The smoking gun is here: 2025/07/24 21:39:51 [debug] 32#32: *5902095 rport:443 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.233.100.128, server: grpc-ai-test.not-relevant.org, request: POST /org.not-relevant.cloud.api.grpc.CloudEventsService/startStreaming HTTP/2.0, upstream: grpc://10.233.75.54:50051, host: grpc-ai-test.not-relevant.org"

and gave me a detailed action plan.

I was thinking this was cool, no need to use my head on this one, until I realized that the log entry simply did not exist. It was entirely made up.

(And yes I admit, I should know better than to do lousy prompting on a cheap foundation model)

physix | 7 months ago | on: How and where will agents ship software?

Having read the comments here so far, I find they are absolutely right to offer an AI layer that speeds up building apps on their DB.

Once built, the solution is plain-old-runnable-code (PORC :-), as long as the implemented business logic doesn't call out to an LLM. So I don't fret so much about the AI hype story here.

For anyone starting off building with new tech, an AI assistant is really helpful.

physix | 7 months ago | on: The upcoming GPT-3 moment for RL

Actually I didn't. Correct me if I am wrong, but my understanding is that RL is still an LLM tuning approach, i.e. an optimization of its parameter set, no matter if it's done at scale or via human feedback (RLHF).

physix | 7 months ago | on: Understanding Tool Calling in LLMs – Step-by-Step with REST and Spring AI

I've been actively working with Spring since about 2008. About 3-4 times a year, I cuss and curse some strange side effect that occurs during refactoring. And in some areas we've painted ourselves into a corner.

But all in all, it's a great set of frameworks in the enterprise Java/Kotlin space. I'd say it's that synergy which makes it worthwhile.

I'm curious, though. Is the use of dependency injection part of the portfolio of criticisms towards Spring?

physix | 7 months ago | on: The upcoming GPT-3 moment for RL

I'm not sufficiently familiar with the details on ML to assess the proposition made in the article.

From my understanding, RL is a tuning approach on LLMs, so the outcome is still the same kind of beast, albeit with a different parameter set.
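To make that concrete with a toy (decidedly non-LLM) example: a REINFORCE-style update is still just gradient ascent on the same parameter vector the model already has; before and after RL, it's the same kind of object. All names and numbers here are illustrative.

```python
import math
import random

# Toy illustration: RL fine-tuning is still an optimization of the existing
# parameter set. theta stands in for "model parameters" -- here just a
# 2-logit softmax policy over actions {0, 1}.
theta = [0.0, 0.0]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(theta, action, reward, lr=0.5):
    # REINFORCE: grad of log pi(a) is one_hot(a) - pi for a softmax policy.
    pi = softmax(theta)
    return [t + lr * reward * ((1.0 if i == action else 0.0) - p)
            for i, (t, p) in enumerate(zip(theta, pi))]

random.seed(0)
for _ in range(200):
    pi = softmax(theta)
    a = 0 if random.random() < pi[0] else 1
    r = 1.0 if a == 1 else 0.0   # stand-in "reward model" prefers action 1
    theta = reinforce_step(theta, a, r)

print(len(theta))  # still 2: same parameter set, just shifted values
```

The point being: nothing structurally new appears; the policy after training is the same beast with nudged parameters.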

So I'd actually have thought that the leading companies would already be strongly focused on improving coding capabilities, since this is where LLMs are very effective and where they have huge cash flows from token consumption.

So, either the motivation isn't there, or they're already doing something like that, or they know it's not as effective as the approaches they already have.

I wonder which one it is.

physix | 7 months ago | on: The upcoming GPT-3 moment for RL

I'd really like to know which use cases work and which don't. And when folks say they use agentic AI to churn through tokens to automate virtually the entire SDLC, are they just cherry picking the situations that turned out well, or do they really have prompting and workflow approaches that indeed increase their productivity 10-fold? Or, as you mention, is it possibly a niche area which works well?

My personal experience over the past five months has been very mixed. If I "let 'er rip", it's mostly junk I need to refactor or redo by micro-managing the AI. At the moment, at least for what I do, AI is like a fantastic calculator that speeds up your work, but you still need to push the buttons yourself.

physix | 7 months ago | on: Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model

That reminds me of a thought I had about the poachings.

The poaching was probably aimed more at hamstringing Meta's competition than at the hires themselves.

Because the disruption caused by them leaving in droves is probably more severe than the benefits of having them on board. Unless they are gods, of course.
