top | item 44700455

JonathanRaines | 7 months ago

I advise scepticism.

This work does have some very interesting ideas, specifically avoiding the costs of backpropagation through time.
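To make the cost concrete: vanilla backpropagation through time must store every intermediate state of the recurrence, while the preprint (as I read it) uses a one-step gradient approximation that only differentiates the final step. A toy pure-Python sketch of the difference, with a made-up scalar recurrence chosen purely for illustration, not the authors' actual method:

```python
import math

def forward(w, x, h0, T):
    """Run the recurrence h_{t+1} = tanh(w*h_t + x), storing every state."""
    hs = [h0]
    for _ in range(T):
        hs.append(math.tanh(w * hs[-1] + x))
    return hs

def bptt_grad(w, hs):
    """Full BPTT for loss L = h_T: needs all T stored states (O(T) memory)."""
    g = 0.0      # accumulated dL/dw
    carry = 1.0  # dL/dh_t, walking backwards from t = T
    for t in range(len(hs) - 1, 0, -1):
        d = 1.0 - hs[t] ** 2        # tanh' at step t's output
        g += carry * d * hs[t - 1]  # direct contribution of w at step t
        carry *= d * w              # propagate gradient to the earlier step
    return g

def one_step_grad(hs):
    """One-step approximation: differentiate only the last step, treating
    h_{T-1} as a constant. Needs just the final two states (O(1) memory)."""
    d = 1.0 - hs[-1] ** 2
    return d * hs[-2]

hs = forward(0.5, 0.3, 0.0, 50)
print(bptt_grad(0.5, hs), one_step_grad(hs))
```

Near a fixed point the full gradient is a geometric series whose first term is the one-step estimate, so the approximation is biased but cheap; that trade is the whole point.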

However, it does not appear to have been peer reviewed.

The results section is odd. It does not include details of how they performed the assessments, and the only numerical values are in the figure on the front page. The results for ARC2 are (contrary to that figure) not top of the leaderboard (currently 19% compared to HRM's 5% https://www.kaggle.com/competitions/arc-prize-2025/leaderboa...)

cs702|7 months ago

The authors' code is at https://github.com/sapientinc/HRM .

In fields like AI/ML, I'll take a preprint with working code over peer-reviewed work without any code, always, even when the preprint isn't well edited.

Everyone everywhere can review a preprint and its published code, instead of a tiny number of hand-chosen reviewers who are often overworked, underpaid, and on tight schedules.

If the authors' claims hold up, the work will gain recognition. If the claims don't hold up, the work will eventually be ignored. Credentials are basically irrelevant.

Think of it as open-source, distributed, global review. It may be messy and ad-hoc, since no one is in charge, but it works much better than traditional peer review!

smokel|7 months ago

I sympathize partially with your views, but how would this work in practice? Where would the review comments be stored? Is one supposed to browse Hacker News to check the validity of a paper?

If a professional reviewer spots a serious problem, the paper will not make it to a conference or journal, saving us a lot of trouble.

hodgehog11|7 months ago

Scepticism is almost always a good idea with ML papers. Once you start publishing regularly in ML conferences, you understand that there is no traditional form of peer review anymore in this domain. The volume of papers has meant that 'peers' are often students still coming to grips with the field, asked to review work that rarely aligns with their expertise. Conference peer review has become a 'vibe check' more than anything.

Real peer review is when other experts independently verify your claims in the arXiv submission through implementation and (hopefully) cite you in their followup work. This thread is real peer review.

naasking|7 months ago

> Once you start publishing regularly in ML conferences, you understand that there is no traditional form of peer review anymore in this domain.

Which is fine, because peer review is not a good proxy for quality or validity.

dleeftink|7 months ago

I appreciate this insight. It makes you wonder: why even publish a paper if review only amounts to a vibe check? If it's just the code we need, we can get that peer reviewed through other channels.

rapatel0|7 months ago

THIS is so true, but it's also not limited to ML.

Having been both an author and a reviewer across multiple engineering, science, and biomedical disciplines, I can say this occurs across academia.

d4rkn0d3z|7 months ago

Skepticism is best expressed by repeating the experiment and comparing results. I'm game, and I have 10 days off work next month. I wonder what's available from the authors in terms of full source code, data, etc.

diwank|7 months ago

I think that's too harsh a position solely for not being peer reviewed yet. Neither of the original Mamba-1 and Mamba-2 papers was peer reviewed. That said, strong claims warrant strong proofs, and I'm also trying to reproduce the results locally.

mitthrowaway2|7 months ago

Do you consider yourself a peer? Feel free to review it.

A peer reviewer will typically comment that some figures are unclear, that a few relevant prior works have gone uncited, or point out a followup experiment that they should do.

That's about the extent of what peer reviewers do, and basically what you did yourself.

halayli|7 months ago

The fact that you are expecting a paper that was just published to have already been peer reviewed tells me that you are likely not familiar with the process. The first step to having your work peer reviewed is to publish it.

frozenseven|7 months ago

>does not appear to have been peer reviewed

Enough already. Please. The paper + code is here for everybody to read and test. Either it works or it doesn't. Either people will build upon it or they won't. I don't need to wait 20 months for 3 anonymous dudes to figure it out.

riku_iki|7 months ago

> However, it does not appear to have been peer reviewed.

My observation is that peer reviewers never try to reproduce results or do a basic code audit to check, for example, that there is no data leakage into the training dataset.
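For what it's worth, a first-pass audit for exact or near-duplicate leakage is cheap to script. A toy sketch (whitespace/case-normalized hash overlap only; a real audit would also need fuzzy matching and semantic dedup):

```python
import hashlib

def fingerprint(example: str) -> str:
    """Hash a whitespace/case-normalized example so duplicates collide."""
    normalized = " ".join(example.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

def leaked(train, test):
    """Return test items whose fingerprint also appears in the training set."""
    train_fps = {fingerprint(x) for x in train}
    return [x for x in test if fingerprint(x) in train_fps]

train = ["The cat sat on the mat.", "2 + 2 = 4"]
test = ["the cat  sat on the mat.", "A brand new puzzle"]
print(leaked(train, test))  # flags the first test item as a near-duplicate
```

Hashing normalized text keeps the check O(train + test) instead of comparing every pair directly.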

sigmoid10|7 months ago

Skepticism is an understatement. There are tons of issues with this paper. Why are they comparing results of their expert model, trained from scratch on a single task, to general-purpose reasoning models? It is well established in the literature that you can still beat general-purpose LLMs on narrow domain tasks with specially trained, small models.

The only comparison that would have made sense is one to vanilla transformers with the same number of parameters, trained on the same input-output dataset. But the paper shows no such comparison. In fact, I would be surprised if it were significantly better, because such architecture improvements are usually very modest or not applicable in general.

And insinuating that this is some significant development to improve general-purpose AI by throwing in ARC is just straight-up dishonest. I could probably cook up a neural net in PyTorch in a few minutes that beats o3 on a hand-crafted single task it can't solve in an hour. That doesn't mean I made any progress towards AGI.

bubblyworld|7 months ago

Have you spent much time with the ARC-1 challenge? Their results on that are extremely compelling: close to the initial competition's SOTA (as of closing, anyway) with a tiny model, and without hacks like data augmentation and pretraining that all of the winning approaches leaned on heavily.

Your criticism makes sense for the maze solving and sudoku sets, of course, but I think it kinda misses the point (there are traditional algos that solve those just fine - it's more about the ability of neural nets to figure them out during training, and known issues with existing recurrent architectures).

Assuming this isn't fake news lol.