top | item 46154770

OP here. We realized there are a ton of limitations with backtesting and paper money, but we still wanted to do this experiment and share the results. By no means are these results statistically significant evidence of whether these models can beat the market in the long term. But wanted to give everyone a way to see how these models think about and interact with the financial markets.

anigbrowl|2 months ago

You should redo this with human controls. By a weird coincidence, I have sufficient free time.

apparent|2 months ago

> Grok ended up performing the best while DeepSeek came close to second.

I think you mean "DeepSeek came in a close second".

apparent|2 months ago

OK, now it says:

> Grok ended up performing the best while DeepSeek came close second.

"came in a close second" is an idiom that only makes sense word-for-word.

pottertheotter|2 months ago

Cool experiment.

I have a PhD in capital markets research. It would be even more informative to report abnormal returns (market/factor-adjusted) so we can tell whether the LLMs generated true alpha rather than just loading on tech during a strong market.
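To show what a market adjustment involves, here is a minimal sketch of the single-factor market model. It assumes you have per-period returns for a model's portfolio and for a benchmark index. The function name and the sample numbers are hypothetical and are not taken from the experiment. The sketch regresses portfolio excess returns on market excess returns: the intercept is Jensen's alpha, the slope is beta, and each period's abnormal return is the part that beta does not explain. A full factor adjustment (e.g. Fama-French) would add more regressors.

```python
def market_model(portfolio, market, risk_free=0.0):
    """Hypothetical helper: OLS of excess portfolio returns on excess market returns.

    Returns (alpha, beta, abnormal_returns). The mean of abnormal_returns equals alpha.
    """
    if len(portfolio) != len(market) or len(portfolio) < 2:
        raise ValueError("need two equal-length return series with at least 2 periods")
    y = [p - risk_free for p in portfolio]  # portfolio excess returns
    x = [m - risk_free for m in market]     # market excess returns
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    if var == 0:
        raise ValueError("market returns have zero variance")
    beta = cov / var                  # sensitivity to the market
    alpha = mean_y - beta * mean_x    # average return not explained by beta
    # Abnormal return per period: actual excess return minus the beta-implied part.
    abnormal = [yi - beta * xi for xi, yi in zip(x, y)]
    return alpha, beta, abnormal


# Made-up daily market returns, and a portfolio that is just 1.5x the market.
market = [0.01, -0.02, 0.015, 0.005, -0.01]
levered = [1.5 * m for m in market]
alpha, beta, _ = market_model(levered, market)
```

In a rising market, the levered portfolio above beats the index. The regression still reports a beta of about 1.5 and an alpha of about zero, which is the distinction between true alpha and simply loading on a strong sector.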

philipwhiuk|2 months ago

You're not really giving them any money and it's not actually trading.

There's no market impact to any trading decision they make.

joegibbs|2 months ago

I think it would be interesting to see how it goes in a scenario where the market declines or where tech companies underperform the rest of the market. In recent history they've outperformed the market, and that might bias the choices the LLMs make. Would they keep these positive biases if those stocks were performing badly?

gerdesj|2 months ago

These are LLMs: next-token guessers. They don't think at all, and I suggest you don't try to get rich quick with one!

LLMs are handy tools, but no more. Even a heavily quantised Qwen3-30B will make a passable effort at translating some Latin to English. It can whip up small games in a single prompt and much more, and with care it can deliver seriously decent results, but so can my drill driver! That model only needs a £500 second-hand GPU, which is impressive to me. The same goes for GPT-OSS and others.

Yes, you can dive in with the bigger models that need serious hardware, and they seem miraculous. A colleague recently had to "force" Claude to read some manuals until it realised it had made a mistake about something, and frankly I think "it" was only saying it had made a mistake. I must ask said colleague to grab the reasoning and analyse it.

DennisP|2 months ago

Transformers are general-purpose learning machines. Guessing the next token is what we initially train them to do, but then we add other training on things like giving answers that humans like, or doing math that's judged correct by automated systems. These days they're getting quite good at advanced mathematics.

this_user|2 months ago

I can almost guarantee you that these models will underperform the market in the long run, because they are simply not designed for this purpose. LLMs are designed to simulate a conversation, not predict forward returns of a time series. What's more, most of the widely disseminated knowledge out there on the topic is effectively worthless, because there is an entire cottage industry of fake trading gurus and grifters, and the LLMs have no ability to separate actual information from the BS.

If you really wanted to do this, you would have to train specialist trading models (not LLMs), which is what firms are doing, but those are strictly proprietary.

The only other option would be to train an LLM on actually correct information and then see if it can design the specialist model itself, but most of the information you would need for that purpose is effectively hidden and not found in public sources. It is also entirely possible that these trading firms have already been trying this: using their proprietary knowledge and data to attempt to train a model that can act as a quant researcher.

beezle|2 months ago

What were the risk-adjusted returns? Without knowing that, this is all kind of meaningless. Being high beta in a rising market doesn't equate to anything brilliant.
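One common risk-adjusted measure is the Sharpe ratio: mean excess return divided by the standard deviation of excess returns, annualised here by √252 on the assumption of daily returns. Below is a minimal sketch. The function name and sample returns are hypothetical, and it uses the sample standard deviation. With a zero risk-free rate, it also shows the point about leverage: doubling every return doubles the raw gain but leaves the Sharpe ratio unchanged.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Hypothetical helper: annualised Sharpe ratio from per-period returns."""
    excess = [r - risk_free_per_period for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation; needs >= 2 points
    if sd == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


# Made-up daily returns, and the same strategy run at 2x leverage.
daily = [0.01, -0.005, 0.02, 0.0, 0.007]
doubled = [2 * r for r in daily]
print(sharpe_ratio(daily), sharpe_ratio(doubled))  # equal up to float rounding
```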

irishcoffee|2 months ago

> But wanted to give everyone a way to see how these models think…

Think? What exactly did “it” think about?

cheeseblubber|2 months ago

You can click into the chart and see the conversation, as well as the reasoning it gave for each trade.

stoneyhrm1|2 months ago

"Pass the salt? You mean pass the sodium chloride?"