ARC Prize – a $1M+ competition towards open AGI progress
588 points | mikeknoop | 1 year ago | arcprize.org
ARC-AGI is (to our knowledge) the only eval which measures AGI: a system that can efficiently acquire new skills and solve novel, open-ended problems. Most AI evals measure skill directly rather than the ability to acquire new skills.
Francois created the eval in 2019, SOTA was 20% at inception, SOTA today is only 34%. Humans score 85-100%. 300 teams attempted ARC-AGI last year and several bigger labs have attempted it.
While most other skill-based evals have rapidly saturated to human level, ARC-AGI was designed to resist "memorization" techniques (e.g. LLMs).
Solving ARC-AGI tasks is quite easy for humans (even children) but impossible for modern AI. You can try ARC-AGI tasks yourself here: https://arcprize.org/play
ARC-AGI consists of 400 public training tasks, 400 public test tasks, and 100 secret test tasks. Every task is novel. SOTA is measured against the secret test set which adds to the robustness of the eval.
Solving ARC-AGI tasks requires no world knowledge, no understanding of language. Instead each puzzle requires a small set of “core knowledge priors” (goal directedness, objectness, symmetry, rotation, etc.)
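For concreteness, tasks in the public repo are distributed as JSON files with "train" demonstration pairs and held-out "test" pairs; grids are lists of rows whose cells are color indices 0-9. A minimal sketch (the tiny task and the `flip_both` rule here are invented for illustration):

```python
import json

# A toy ARC-style task in the public dataset's JSON layout: "train" holds
# demonstration pairs, "test" holds the held-out pair(s) to solve. Grids are
# lists of rows; each cell is a color index 0-9. (This task is invented.)
task = json.loads("""
{
  "train": [
    {"input": [[1, 0], [0, 0]], "output": [[0, 0], [0, 1]]},
    {"input": [[0, 2], [0, 0]], "output": [[0, 0], [2, 0]]}
  ],
  "test": [
    {"input": [[0, 0], [3, 0]], "output": [[0, 3], [0, 0]]}
  ]
}
""")

def flip_both(grid):
    """Hypothesized rule for this toy task: rotate the grid 180 degrees."""
    return [row[::-1] for row in grid[::-1]]

# A candidate solver must reproduce every demonstration output...
assert all(flip_both(p["input"]) == p["output"] for p in task["train"])
# ...and is then scored on the held-out test pair(s).
assert flip_both(task["test"][0]["input"]) == task["test"][0]["output"]
```

The point of the format: the rule is never stated, only demonstrated, so the solver has to infer it from two or three pairs.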
At minimum, a solution to ARC-AGI opens up a completely new programming paradigm where programs can perfectly and reliably generalize from an arbitrary set of priors. At maximum, unlocks the tech tree towards AGI.
Our goal with this competition is:
1. Increase the number of researchers working on frontier AGI research (vs tinkering with LLMs). We need new ideas and the solution is likely to come from an outsider!
2. Establish a popular, objective measure of AGI progress that the public can use to understand how close we are to AGI (or not). Every new SOTA score will be published here: https://x.com/arcprize
3. Beat ARC-AGI and learn something new about the nature of intelligence.
Happy to answer questions!
neoneye2|1 year ago
I'm collecting data on how humans solve ARC tasks, and have so far collected 4100 interaction histories (https://github.com/neoneye/ARC-Interactive-History-Dataset). Besides ARC-AGI, there are other ARC-like datasets; these can be tried in my editor (https://neoneye.github.io/arc/).
I have made some videos about ARC:
Replaying the interaction histories, you can see people have different approaches. It's 100 ms per interaction; IRL people don't solve tasks that fast. https://www.youtube.com/watch?v=vQt7UZsYooQ
When I'm manually solving an ARC task, it looks like this, and you can see I'm rather slow. https://www.youtube.com/watch?v=PRdFLRpC6dk
What is weird: the way I implement a solver for a specific ARC task is much different from the way I would manually solve the puzzle, since the solver has to deal with all kinds of edge cases.
Huge thanks to the team behind the ARC Prize. Well done.
salamo|1 year ago
If I can make one criticism/observation of the tests, it seems that most of them reason about perfect information in a game-theoretic sense. However, many if not most of the more challenging problems we encounter involve hidden information. Poker and negotiations are examples of problem solving in imperfect-information scenarios. Smoothly navigating social situations also involves the related problem of working with hidden information.
One of the really interesting things we humans are able to do is to take the rules of a game and generate strategies. While we do have some algorithms which can "teach themselves" e.g. to play go or chess, those same self-play algorithms don't work on hidden information games. One of the really interesting capabilities of any generally-intelligent system would be synthesizing a general problem solver for those kinds of situations as well.
com2kid|1 year ago
I swear, not enough people have kids.
Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.
One thing kids do is they'll ask for confirmation of their guess. You'll be reading a book you've read 50 times before and the kid will stop you, point at a dog in the book, and ask "dog?"
And there is a development phase where this happens a lot.
Also kids can get mad if they are told an object doesn't match up to the expected label, e.g. my son gets really mad if someone calls something by the wrong color.
Another thing toddlers like to do is play silly labeling games, which is different from calling something by the wrong name by accident; instead this is done on purpose, for fun. E.g. you point to a fish and say "isn't that a lovely llama!" at which point the kid will fall down giggling at how silly you are being.
The human brain develops really slowly[1], and a sense of linear time encoding doesn't really exist for quite a while (even at 3, everything is either yesterday, today, or tomorrow), so who the hell knows how things are being processed. What we do know is that kids gather information through a bunch of senses that are operating at an absurd data collection rate 12-14 hours a day, with another 10-12 hours of downtime to process the information.
[1] Watch a baby discover they have a right foot. Then a few days later figure out they also have a left foot. Watch kids who are learning to stand develop a sense of "up above me" after they bonk their heads a few times on a table bottom. Kids only learn "fast" in the sense that they have nothing else to do for years on end.
theptip|1 year ago
The optimization process that trained the human brain is called evolution, and it took a lot more than 10,000 examples to produce a system that can differentiate cats vs dogs.
Put differently, an LLM is pre-trained with very light priors, starting almost from scratch, whereas a human brain is pre-loaded with extremely strong priors.
pants2|1 year ago
A human that has never seen a dog or a cat could probably determine which is which based on looking at the two animals and their adaptations. This would be an interesting test for AIs, but I'm not quite sure how one would formulate an eval for this.
VirusNewbie|1 year ago
well, maybe. We view things in three dimensions at high fidelity: viewing a single dog or cat actually ends up being thousands of training samples, no?
AIorNot|1 year ago
https://youtu.be/UakqL6Pj9xo?si=iDH6iSNyz1Net8j7
fennecbutt|1 year ago
ML models are starting from absolute zero, single celled organism level.
woadwarrior01|1 year ago
Neither do machines. Lookup few-shot learning with things like CLIP.
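That few-shot recipe can be sketched without a real model: stand in toy vectors for CLIP image embeddings and classify by nearest class centroid under cosine similarity (the vectors and labels below are invented for illustration; a real system would get the embeddings from a pretrained encoder):

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# A few labeled "shots" per class; toy 4-d vectors standing in for embeddings.
support = {
    "cat": np.array([[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.1, 0.0]]),
    "dog": np.array([[0.1, 0.9, 0.1, 0.0], [0.0, 0.8, 0.2, 0.1]]),
}

# One centroid per class; no gradient updates anywhere.
centroids = {label: normalize(vecs.mean(axis=0)) for label, vecs in support.items()}

def classify(embedding):
    """Return the class whose centroid is most cosine-similar to the query."""
    embedding = normalize(embedding)
    return max(centroids, key=lambda label: float(embedding @ centroids[label]))

print(classify(np.array([0.85, 0.15, 0.05, 0.0])))  # cat
```

The "learning" here is just averaging a handful of examples, which is the sense in which such systems acquire a new category from very few samples.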
nextaccountic|1 year ago
Humans learn through a lifetime.
Or are we talking about newborn infants?
lacker|1 year ago
Would an intelligent but blind human be able to solve these problems?
I'm worried that we will need more than 800 examples to solve these problems, not because the abstract reasoning is so difficult, but because the problems require spatial knowledge that we intelligent humans learn with far more than 800 training examples.
modeless|1 year ago
Yann LeCun argues that humans are not general intelligence and that such a thing doesn't really exist. Intelligence can only be measured in specific domains. To the extent that this test represents a domain where humans greatly outperform AI, it's a useful test. We need more tests like that, because AIs are acing all of our regular tests despite being obviously less capable than humans in many domains.
> the problems require spatial knowledge that we intelligent humans learn with far more than 800 training examples.
Pretraining on unlimited amounts of data is fair game. Generalizing from readily available data to the test tasks is exactly what humans are doing.
> Would an intelligent but blind human be able to solve these problems?
I'm confident that they would, given a translation of the colors to tactile sensation. Blind humans still understand spatial relationships.
HarHarVeryFunny|1 year ago
I don't think there's any rules about what knowledge/experience you build into your solution.
nickpsecurity|1 year ago
To OP: I like your project goal. I think you should look at prior reasoning engines that tried to build common sense. Cyc and OpenMind are examples. You also might find use for the list of AGI goals in Section 2 of this paper:
https://arxiv.org/pdf/2308.04445
When studying intros to brain function, I also noted that many regions tie into the hippocampus, which might both store concepts in a sense-neutral way and build inner models (or approximations) of the external world. The former helps tie concepts together across various senses. The latter helps in planning, when we are imagining possibilities to evaluate and iterate on.
Seems like AGI should have these hippocampus-like traits and those in the Cyc paper. One could test if an architecture could do such things in theory or on a small scale. It shouldn't tie into just one type of sensory input either: at least two, with the ability to act on what exists in only one or in both.
Edit: Children also have an enormous amount of unsupervised training on visual and spatial data. They get reinforcement through play and supervised training by parents. A realistic benchmark might similarly require gigabytes of pretraining data.
andoando|1 year ago
There are two countries which both lay claim to the same territory. There is a set X that contains Y, and there is a set Z that contains Y. In the case that the common overlap is 3D and one is on top of the other, we can extend this: there is a set X that contains -Y and a set Z that contains Y, and just as you can only see one on top and not both depending on where you stand, we can apply the same property here and say sets X and Z cannot both exist; therefore if set X then -Y, and if set Z then Y.
If you pay attention to the language you use, you'll start to realize how much of it uses spatial relationships to describe completely abstract things. For example, one can speak of disintegrating hegemonic economies, i.e. turning things built on top of each other into nothing, back to where they came from.
We are after all, reasoning about things which happen in time and space.
And spatial != visual. Even if you were blind you'd have to reason spatially, because again, any set of facts are facts in space-time. What does it take to understand history? People in space, living at various distances from each other, producing goods from various locations on the earth using physical processes, and physically exchanging them. To understand battles you have to understand how armies are arranged physically, how moving supplies works, weather conditions, how weapons and their physical forms affect what they can physically do, etc.
Hell, LLMs, the largest advancement we've had in artificial intelligence, do what exactly? Encode tokens into multi-dimensional space.
CooCooCaCha|1 year ago
This is the wrong way to think about it IMO. Spatial relationships are just another type of logical relationship and we should expect AGI to be able to analyze relationships and generate algorithms on the fly to solve problems.
Just because humans can be biased in various ways doesn’t mean these biases are inherent to all intelligences.
dimask|1 year ago
Blind people can have spatial reasoning just fine. Visual =/= spatial [0]. Now, one would have to adapt the colour-based tasks to something that would be more meaningful for a blind person, I guess.
[0] https://hal.science/hal-03373840/document
Lerc|1 year ago
There may be (almost certainly will be) additional knowledge encoded in the solver to cover the spatial concepts etc. The distinction with the ARC-AGI test is the disparity between human and AI performance, and that it focuses on puzzles that are easier for humans.
It would be interesting to see a fine-tuned LLM just try to express the rule for each puzzle in English. It could have full knowledge of what ARC-AGI is and how the tests operate, but the proof of the pudding is simply how it does on the test set.
pmayrgundter|1 year ago
In it they question the ease of Chollet's tests: "One limitation on ARC’s usefulness for AI research is that it might be too challenging. Many of the tasks in Chollet’s corpus are difficult even for humans, and the corpus as a whole might be sufficiently difficult for machines that it does not reveal real progress on machine acquisition of core knowledge."
ConceptARC is designed to be easier, but then also has to filter ~15% of its own test takers for "[failing] at solving two or more minimal tasks... or they provided empty or nonsensical explanations for their solutions"
After this filtering, ConceptARC finds another 10-15% failure rate amongst humans on the main corpus questions, so they're seeing maybe 25-30% unable to solve these simpler questions meant to test for "AGI".
ConceptARC's main results show GPT-4 scoring well below the filtered humans, which would agree with a [Mensa] test result that its IQ=85.
Chollet and Mitchell could instead stratify their human groups to estimate IQ then compare with the Mensa measures and see if e.g. Claude3@IQ=100 compares with their ARC scores for their average human
[ConceptARC] https://arxiv.org/pdf/2305.07141
[Mensa] https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-10...
mikeknoop|1 year ago
> We found that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 84% of tasks solved per participant
salamo|1 year ago
I guess there might be a disagreement of whether the problems in ARC are a representative sample of all of the possible abstract programs which could be synthesized, but then again most LLMs are also trained on human data.
paxys|1 year ago
I'd also urge you to use a different platform for communicating with the public because x.com links are now inaccessible without creating an account.
mikeknoop|1 year ago
bongodongobob|1 year ago
"Endow circuitry with consciousness and win a gift certificate for Denny's (may not be used in conjunction with other specials)"
lxgr|1 year ago
AGI will take much more than that to build, and once you have it, if all you can monetize it for is a million dollars, you must be doing something extremely wrong.
elicksaur|1 year ago
However, I do disagree that this problem represents “AGI”. It’s just a different dataset than what we’ve seen with existing ML successes, but the approaches are generally similar to what’s come before. It could be that some truly novel breakthrough which is AGI solves the problem set, but I don’t think solving the problem set is a guaranteed indicator of AGI.
zug_zug|1 year ago
Imo there's no evidence whatsoever that nailing this task will be true AGI - (e.g. able to write novel math proofs, ask insightful questions that nobody has thought of before, self-direct its own learning, read its own source code)
Animats|1 year ago
That's a stretch. This is a problem at which LLMs are bad. That does not imply it's a good measure of artificial general intelligence.
After working a few of the problems, I was wondering how many different transformation rules the problem generator has. Not very many, it seems. So the problem breaks down into extracting the set of transformation rules from the data, then applying them to new problems. The first part of that is hard. It's a feature extraction problem. The transformations seem to be applied rigidly, so once you have the transformation rules, and have selected the ones that work for all the input cases, application should be straightforward.
This seems to need explicit feature extraction, rather than the combined feature extraction and exploitation LLMs use. Has anyone extracted the rule set from the test cases yet?
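That two-stage idea (extract candidate rules, then keep only those consistent with every training pair) can be sketched as program search over a toy, hand-rolled DSL. The five transforms below are illustrative; real ARC rules are compositional and far richer:

```python
# A tiny hypothetical DSL of whole-grid transforms. A "solver" enumerates the
# DSL and keeps every program that maps each training input to its output.
def identity(g):   return g
def flip_h(g):     return [row[::-1] for row in g]
def flip_v(g):     return g[::-1]
def rotate_180(g): return [row[::-1] for row in g[::-1]]
def transpose(g):  return [list(col) for col in zip(*g)]

DSL = [identity, flip_h, flip_v, rotate_180, transpose]

def fit_rules(train_pairs):
    """Return every DSL program consistent with all training pairs."""
    return [f for f in DSL
            if all(f(inp) == out for inp, out in train_pairs)]

# One demonstration pair whose output is a horizontal flip of the input.
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
rules = fit_rules(train)
print([f.__name__ for f in rules])  # only flip_h survives
```

With one pair, several rules can remain consistent (try a symmetric grid); more demonstration pairs prune the hypothesis space, which is roughly why each task ships a handful of them.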
elicksaur|1 year ago
The issue with that path is that the problems aren’t using a programmatic generator. The rule sets are anything a person could come up with. It might be as simple as “biggest object turns blue” but they can be much more complicated.
Additionally, the test set is private so it can’t be trained on or extracted from. It has rules that aren’t in the public sets.
[1] https://www.kaggle.com/competitions/abstraction-and-reasonin...
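A rule like "biggest object turns blue" is only a few lines once you have connected components. A sketch, assuming 4-connectivity, background color 0, and ARC's convention that blue is color index 1:

```python
from collections import deque

BLUE = 1  # blue is conventionally color index 1 in the ARC palette

def biggest_object_turns_blue(grid, background=0):
    """Find 4-connected same-color objects via BFS; recolor the largest."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    biggest = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or grid[r][c] == background:
                continue
            color, cells, queue = grid[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and grid[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cells) > len(biggest):
                biggest = cells
    out = [row[:] for row in grid]
    for y, x in biggest:
        out[y][x] = BLUE
    return out

grid = [[2, 2, 0],
        [2, 0, 0],
        [0, 0, 3]]
print(biggest_object_turns_blue(grid))  # the 3-cell object is recolored to 1
```

The hard part, as the parent says, is that the private set may compose ideas like this in ways no generator enumerates.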
levocardia|1 year ago
Defining intelligence as an efficiency of learning, after accounting for any explicit or implicit priors about the world, makes it much easier to understand why human intelligence is so impressive.
bigyikes|1 year ago
https://youtu.be/UakqL6Pj9xo
itissid|1 year ago
What about Theory of Mind, which concerns multiple agents in the real world acting together? Driving a car cannot be done right now without oodles of data, nor can any robot-human problem that requires the robot to model the human's goals and intentions.
I think the problem is the definition of general intelligence: intelligence in the context of what? How much effort (kWh, $$, etc.) is the human willing to amortize over the learning cycle of a machine to teach it what it needs to do, and how does that relate to a personally needed outcome (like build me a sandwich or construct a house)? Hopefully this should decrease over time.
I believe the answer is that the only intelligence that really matters is Human-AI cooperative intelligence and our goals and whether a machine understands them. The problems then need to be framed as optimization of a multi attribute goal with the attribute weights adjusted as one learns from the human.
I know a few labs working on this: one is at ASU (Kambhampati, Rao, et al.), and possibly Google, and now maybe OpenAI.
andoando|1 year ago
Take for example a simple auditory pattern like "clap clap clap". This has a very trivial mapping as a visual, like so:
x x x
- - -
house house house
whereas anyone would agree the sound of three equally spaced claps would not be analogous to say:
aa b b b
-- --- -- -- ---
This ability to relate or equate two entirely different senses should clue you in that there is a deeper framework at play.
ks2048|1 year ago
So you can view 100 per page instead of clicking through one-by-one: https://kts.github.io/arc-viewer/page1/
neoneye2|1 year ago
Idea for a metric:
- Number of pixels that stay the same between input/output.
- Histogram changes.
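Both metrics are a few lines; a sketch assuming same-sized input/output grids:

```python
from collections import Counter

def pixel_agreement(a, b):
    """Fraction of cells that stay the same between two same-sized grids."""
    cells = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in cells) / len(cells)

def histogram_delta(a, b):
    """Per-color change in cell counts from input to output."""
    ha = Counter(v for row in a for v in row)
    hb = Counter(v for row in b for v in row)
    return {color: hb[color] - ha[color] for color in ha.keys() | hb.keys()}

inp = [[0, 1], [1, 1]]
out = [[0, 2], [2, 2]]
print(pixel_agreement(inp, out))   # 0.25: only the top-left cell survives
print(histogram_delta(inp, out))   # three 1s became 2s; color 0 is unchanged
```

Tasks that change the grid size would need an alignment convention first (e.g. compare overlapping regions only).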
bigyikes|1 year ago
Is there something special about these questions that makes them resistant to memorization? Or is it more just the fact that there are 100 secret tasks?
btbuildem|1 year ago
1: https://www.crn.com/news/applications-os/220100498/researche...
dang|1 year ago
Francois Chollet: OpenAI has set back the progress towards AGI by 5-10 years - https://news.ycombinator.com/item?id=40652818 - June 2024 (5 comments)
nmca|1 year ago
https://manifold.markets/JacobPfau/will-the-arcagi-grand-pri...
0xDEAFBEAD|1 year ago
https://manifold.markets/Tossup/will-the-arcagi-grand-prize-...
Lerc|1 year ago
Not sure If I have the skills to make an entry, but I'll be watching at least.
visarga|1 year ago
This scales to 200M users and 1 billion sessions per month for OpenAI, which can interpret every human response as a feedback signal, implicit or explicit. Even more if you take multiple sessions of chat spread over days, that continue the same topic and incorporate real-world feedback. The scale of interaction is just staggering; the LLM can incorporate this experience to iteratively improve.
If you take a look at humans, we're very incapable alone. Think feral Einstein on a remote island - what could he achieve without the social context and language based learning? Just as a human brain is severely limited without society, LLMs also need society, diversity of agents and experiences, and sharing of those experiences in language.
It is unfair to compare a human immersed in society with a standalone model. That is why they appear limited. But even as a system of memorization+recombination they can be a powerful element of the AGI. I think AGI will be social and distributed, won't be a singleton. Its evolution is based on learning from the world, no longer just a parrot of human text. The data engine would be: World <-> People <-> LLM, a full feedback cycle, all three components evolve in time. Intelligence evolves socially.
8organicbits|1 year ago
Pay no attention to the man behind the curtain.
This type of thinking would claim that mechanical turk is AGI, or perhaps that human+pen and paper is AGI. While they are great tools, that's not how I'd characterize them.
logicallee|1 year ago
>Happy to answer questions!
1. Can humans take the complete test suite? Has any human done so? Is it timed? How long does it take a human? What is the highest a human who sat down and took the ARC-AGI test scored?
2. How surprised would you be if a new model jumped to scoring 100% or nearly 100% on ARC-AGI (including the secret test tasks)? What kind of test would you write next?
neoneye2|1 year ago
Humans can try the 800 tasks here. There is no time limit. I recommend not starting with the `expert` tasks, but instead go with the `entry` level puzzles. https://neoneye.github.io/arc/?dataset=ARC
If a model jumps to 100%, it may be a clever program, or the program may have been trained on the 100 hidden tasks. Fchollet has 100 more hidden tasks for verifying this.
mkl|1 year ago
Here's how I understand the rule: yellow blobs turn green then spew out yellow strips towards the blue line, and the width of the strips is the number of squares the green blobs take up along the blue line. The yellow strips turn blue when they hit the blue line, then continue until they hit red, then they push the red blocks all the way to the other side, without changing the arrangement of the red blocks that were in the way of the strip.
The first example violates the last bit. The red blocks in the way of the rightmost strip start as
but get turned into
Every other strip matches my rule.
tshadley|1 year ago
https://x.com/fchollet https://x.com/arcprize https://x.com/mikeknoop
Retr0id|1 year ago
The current batch of LLMs can be uncharitably summarized as "just predict the next token". They're pretty good at that. If they were perfect at it, they'd enable AGI - but it doesn't look like they're going to get there. It seems like the wrong approach. Among other issues, finite context windows seem like a big limitation (even though they're being expanded), and recursive summarization is an interesting kludge.
The ARC-AGI tasks seem more about pattern matching, in the abstract sense (but also literally). Humans are good at pattern matching, and we seem to use pattern matching test performance as a proxy for measuring human intelligence (like in "IQ" tests). I'm going to side-step the question of "what is intelligence, really?" by defining it as being good at solving ARC-AGI tasks.
I don't know what the solution is, but I have some idea of what it might look like - a machine with high-order pattern-matching capabilities. "high-order" as in being able to operate on multiple granularities/abstraction-levels at once (there are parallels here to recursive summarization in LLMs).
So what is the difference between "pattern matching" and "token prediction"? They're closely related, and you could use one to do the other. But the real difference is that in pattern matching there are specific patterns that you're matching against. If you're lucky you can even name the pattern/trope, but it might be something more abstract and nameless. These patterns can be taught explicitly, or inferred from the environment (i.e. "training data").
On the other hand, "token prediction" (as implemented today) is more of a probabilistic soup of variables. You can ask an LLM why it gave a particular answer and it will hallucinate something plausible for you, but the real answer is just "the weights said so". But a hypothetical pattern matching machine could tell you which pattern(s) it was matching against, and why.
So to summarize (hah), I think a good solution will involve high-order meta-pattern matching capabilities (natively, not emulated or kludged via an LLM-shaped interface). I have no idea how to get there!
janalsncm|1 year ago
That's the general pattern, although my description wasn't very good.
zurfer|1 year ago
rule 2: glue the left outer piece to the bottom
rule 3: overlap every now and then :D
rule 4: invert some of the pieces every now and then
visarga|1 year ago
If the AI is really AGI it could presumably do it. But not even the whole human society can do it in one go, it's a slow iterative process of ideation and validation. Even though this is a life and death matter, we can't simply solve it.
This is why AGI won't look like we expect, it will be a continuation of how societies solve problems. Intelligence of a single AI in isolation is not comparable to that of societies of agents with diverse real world interactions.
PontifexMinimus|1 year ago
Why doesn't a baby just run a marathon before it learns to walk? Because you've got to learn to walk before you can run.
> But not even the whole human society can do it in one go, it's a slow iterative process of ideation and validation.
So you break it down into little steps, which is what is being done here.
nojvek|1 year ago
I did a few human examples by hand, but gotta do more of them to start seeing patterns.
The human visual and auditory systems are impressive. Most animals see/hear and plan from that without having much language. Physical intelligence is the biggest leg up when it comes to evolution optimizing for survival.
skywhopper|1 year ago
Speaking of extraordinary claims. What evidence is there that LLMs have “proven economic utility”? They’ve drawn a ludicrous amount of investment thanks to claims of future economic utility, but I’ve yet to see any evidence of it.
curious_cat_163|1 year ago
However, why are the 100 test tasks secret? I don't understand how resisting "memorization" techniques requires it. Maybe someone can enlighten me.
neoneye2|1 year ago
Recently Michael Hodel reverse-engineered 400 of the tasks, so more tasks can be generated. Interestingly, it can generate Python programs that solve the tasks too.
https://github.com/michaelhodel/re-arc
gkamradt|1 year ago
https://arcprize.org/guide
Happy to answer any questions you have along the way
(I'm helping run ARC Prize)
neoneye2|1 year ago
There are many examples where the test is slightly OOD (out of distribution), so the solver will have to generalize.
abtinf|1 year ago
This is treating “intelligence” like some abstract, platonic thing divorced from reality. Whatever else solving these puzzles is indicative of, it’s not intelligence.
levocardia|1 year ago
Or instead, is there some underlying latent capability we call 'strength,' that is correlated with performance in a broad but constrained range of real-world tasks that humans encounter and solve, whose value is something we'd like to assess and, ideally, build machines that can surpass?
abtinf|1 year ago
> We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience.
I’m afraid that definition forecloses the possibility of AGI. The immediate basic question is: why build skills at all?
HarHarVeryFunny|1 year ago
Any useful definition of intelligence has to be totally general: to our brain, experience is just patterns of neural activation. Our brain has no notion of certain inputs being from the jungle and others from the blackboard or whatever.
montag|1 year ago
https://arcprize.org/guide
chairhairair|1 year ago
I bet you could use those puzzles as benchmarks as well.
ilaksh|1 year ago
Things like SORA and gpt-4o that use [diffusion transformers etc. or whatever the SOTA is for multimodal large models] seem to be able to generalize quite well. Have these latest models been tested against this task?
HarHarVeryFunny|1 year ago
1) Who is providing the prize money, and if it is yourself and Francois personally, then what is your motivation ?
2) Do you think it's possible to create a word-based, non-spatial (not crosswords or sudoku, etc) ARC test that requires similar run-time exploration and combination of skills (i.e. is not amenable to a hoard of narrow skills)?
montag|1 year ago
3. DIRECT LLM PROMPTING
In this method, contestants use a traditional LLM (like GPT-4) and rely on prompting techniques to solve ARC-AGI tasks. This was found to perform poorly, scoring <5%. Fine-tuning a state-of-the-art (SOTA) LLM with millions of synthetic ARC-AGI examples scores ~10%.
"LLMs like Gemini or ChatGPT [don't work] because they're basically frozen at inference time. They're not actually learning anything." - François Chollet
Additionally, keep in mind that submissions to Kaggle will not have access to the internet. Using a 3rd-party, cloud-hosted LLM is not possible.
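The direct-prompting setup amounts to serializing a task's demonstration pairs into text and asking the model to complete the final grid. A hypothetical serializer (the prompt wording here is invented; actual submissions varied):

```python
# Flatten an ARC-style task into a plain-text prompt for an LLM.
def grid_to_text(grid):
    """One grid row per line, cells space-separated."""
    return "\n".join(" ".join(str(v) for v in row) for row in grid)

def task_to_prompt(train_pairs, test_input):
    parts = ["Infer the transformation and produce the final output grid."]
    for i, (inp, out) in enumerate(train_pairs, 1):
        parts.append(f"Example {i} input:\n{grid_to_text(inp)}")
        parts.append(f"Example {i} output:\n{grid_to_text(out)}")
    parts.append(f"Test input:\n{grid_to_text(test_input)}")
    parts.append("Test output:")
    return "\n\n".join(parts)

prompt = task_to_prompt([([[0, 1], [1, 0]], [[1, 0], [0, 1]])], [[1, 1], [0, 0]])
print(prompt)
```

The low scores quoted above suggest the bottleneck is not the serialization but the model's inability to induce the rule at inference time.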
blendergeek|1 year ago
Is there a "color-blind friendly" mode?
PontifexMinimus|1 year ago
- annoying animated background
- white text on black background
- annoying font choices
Which is unfortunate because (as I found when I used Firefox reader mode) you're discussing important and interesting stuff.
djoldman|1 year ago
Anyone else share the suspicion that ML rapidly approaching 100% on benchmarks is sometimes due to releasing the test set?
ummonk|1 year ago
It's rather surprising to me that neural nets that can learn to win at Go or Chess can't learn to solve these sorts of tasks. Intuitively, I would have expected that, using a framework generating thousands of playground tasks similar to the public training tasks, a reinforcement learning solution would have been able to do far better than the actual SOTA. Of course, the training budget for this could very well be higher than the actual ARC-AGI prize amount...
s1k3s|1 year ago
:)
barfbagginus|1 year ago
I feel like a prize of a billion dollars would be more effective.
But even if it was me, and even if the prize was a hundred billion dollars, I would still keep it under wraps, and use it to advance queer autonomous communism in a hidden way, until FALGSC was so strong that it would not matter if our AGI got scooped by capitalist competitors.
breck|1 year ago
If you make your site public domain, and drop the (C), I'll compete.