Wow, I had hoped for a more productive discussion than these one-to-one comparisons of Bard vs. ChatGPT that I'm seeing everywhere. The model deployed with this version of Bard is clearly smaller than the biggest LaMDA/PaLM models Google has been working on for ages, which, according to their publications, show unprecedented results on _proof writing_ of all things (see Minerva). While their strategic decisions may be questionable (or they're just trying to quantize the model for mass deployment without burning billions per month in compute costs), it's almost silly to question Google's ability to build useful LLMs.
At the moment, unless we get more information about what metric it's supposed to be evaluated on, you could probably simplify the headline to just "Bard is much worse than ChatGPT" without any loss of accuracy.
It's not really realistic to expect people to give Google credit for these amazing models they've published results about but haven't let people play with. They've given people Bard, and people are evaluating it based on the criteria most obvious to them: a comparison to a very similar product that was just released.
They knew the war they were entering, they knew their enemies, they knew how they'd be evaluated, and they still decided to ship this model in its current state, leading to the conclusion: yes, this really is the best they can do, and it's much worse than the state of the art.
In any case, it's a massive marketing blunder: the public opinion formed within the last few hours was overwhelmingly "Bard sucks compared to ChatGPT."
Am I missing something? Most of TFA is about Bard failing to answer with rhyming words, but in the only prompts shown the author doesn't actually ask for rhyming words. He just gives the hint and the name of the puzzle.
Is this not simply: "Bard is worse than ChatGPT at having seen the 'how-to-play' page for my side project during its training"?
Clicking through to the link next to 'last week's text' and then to 'full rules', it looks like the author is starting the chat sessions with a full explanation that isn't included in the screenshots. (Also, the last screenshot shows the author explicitly asking about rhymes.)
People from Google have argued that's exactly why they're failing.
Personally, as someone who worked at a company whose stock was up over 500% during the pandemic, shipped absolutely nothing during that spike, and then deflated below its pre-pandemic price, I saw the folly of hiring smart people first hand.
It's not enough to hire the smartest people, and in fact it can be a competitive disadvantage. The smartest people often want their piece of the product to reflect their ingenuity no matter how ancillary it is to the core mission. Unfortunately that often precludes the kind of agility that businesses need to stay competitive.
OpenAI managed to poach Googlers by simply not having fiefdoms built by smart people™. I imagine if Google had built GPT-4, it wouldn't be having downtime today. Because it wouldn't be public. And it might never be public because it doesn't scale for Google scale yet, and the ethicists want their say, and we need to integrate it into Borg and the front end hasn't passed through enough layers of design and...
I think they just got lazy and entitled. Maybe ChatGPT will be the scare they need. It feels bad though; they almost don’t deserve the energy this fight will give them.
That's a clever game to get it to play. Today I asked ChatGPT to give me 1,000 Fibonacci numbers starting with the 2,000th number and it crashed. Later I gave it the same prompt, and it instead repeatedly gave me Python code to calculate the Fibonacci numbers.
I hope you understand that this is only because OpenAI's servers can't keep up with demand, or because of some issue in their backend code - the language models themselves can't "crash" like normal programs on some kind of input, because they "just" generate new tokens.
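For the curious, this is the kind of program the model tends to emit for that prompt - a minimal sketch, not ChatGPT's actual output; the 1-indexed convention (F(1) = F(2) = 1) and the function name are assumptions for illustration. Python's arbitrary-precision integers handle the roughly 420-digit values around F(2000) without any special effort:

```python
def fib_slice(start, count):
    """Return `count` Fibonacci numbers beginning at the `start`-th,
    using the 1-indexed convention F(1) = F(2) = 1."""
    a, b = 0, 1  # F(0), F(1)
    # Advance so that b holds F(start).
    for _ in range(start - 1):
        a, b = b, a + b
    result = []
    for _ in range(count):
        result.append(b)
        a, b = b, a + b
    return result

# The prompt in question: 1,000 numbers starting at the 2,000th.
numbers = fib_slice(2000, 1000)
print(len(numbers), len(str(numbers[0])))
```

Generating the tokens of these thousand huge numbers verbatim is exactly the kind of long, low-entropy output that strains a chat interface, whereas emitting the ten-line program above is cheap.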
From Chicago ... and with more than a thousand solves on that puzzle we've never received a single complaint about the rhyme on that one (plus the stats say users find it an extremely easy rhyme to solve)! Curious how you pronounce that one such that they don't rhyme?
If I understand how large language models work, they don't actually know about spelling... they're given tokens that represent words (or pieces of words) and can only infer things from the context of those tokens across the terabytes of data they're trained on.
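A toy sketch of that point - the three-entry vocabulary and the token IDs below are entirely hypothetical, not any real tokenizer, but the greedy longest-match splitting mimics what BPE tokenizers do with rarer words. The model receives only the integer IDs, so letter-level facts like spelling or rhyme are never directly visible to it:

```python
# Hypothetical vocabulary: "practice" has no entry of its own,
# so it must be split into subword pieces.
vocab = {"cactus": 4021, "pract": 988, "ice": 302}

def encode(word):
    """Greedy longest-prefix-match encoding against the toy vocabulary."""
    ids = []
    while word:
        for end in range(len(word), 0, -1):  # try longest prefix first
            piece = word[:end]
            if piece in vocab:
                ids.append(vocab[piece])
                word = word[end:]
                break
        else:
            raise ValueError("no matching piece for: " + word)
    return ids

print(encode("cactus"))    # one opaque ID - the letters are invisible
print(encode("practice"))  # split mid-word into two IDs
```

Note that "practice" gets split at "pract"/"ice", a boundary that has nothing to do with its syllables or its rhyme, which is why any rhyming ability has to be inferred statistically from context rather than read off the characters.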
It is amazing, but somewhat explicable as an emergent effect.
Don't forget that the model has seen all the poems and raps on the internet. It has built a latent space in which certain words always cluster together in the context of poems, and in particular positions. In this case it really has the best database available for saying which next word would slot in nicely here - as that is precisely what it was trained to do.
I find it more amazing, tbh, that you can ask for a poem about something and it then sticks to the plot, makes references to the start, etc., than the actual rhyming.
"Bard is much worse than ChatGPT at solving an obscure word game I invented" would have been a more honest title, but would probably generate fewer clicks for the author.
Bard may still be much worse than ChatGPT at solving all kinds of puzzles, but the article is clickbait for promoting the author's word game, not an actual investigation that warrants that conclusion.
Having read through the word game, I agree with others that it's good that the game is less likely to be in the corpus. I think rhyming, while a challenging task, may be a poor benchmark for ability. The author doesn't seem to understand rhyming too well (cactus/practice is a weak rhyme at best).
I completely disagree with the "hasty rhyming test" - "skeleton" and "gelatin" don't rhyme (-ton vs. -tin), and rhyme worse than "protein" and "poutine" do (-een vs. -een).
The use of novel puzzles is frankly awesome, because there's a much lower chance of contamination from previous puzzles, so we get a chance to see how much generalization they've achieved.
How do you navigate this blog to read the other articles? I couldn't find any way to read the one on gpt4 (clicking the underlined "wrote about" does nothing) and twofergoofer.com/blog goes to a 404.
Hah - we only made the blog over the weekend and don't have any nav or menu for now. But yep, we link the prior article a few times in this article, and that article goes into more detail!
Unless they release a model one can "use" to verify their claims, it's literally silly to make this statement.
It's almost silly to presume anything without proof. People are judging Google based on what Google has shown.
They're behaving like Yahoo did when Google took over.
http://www.paulgraham.com/microsoft.html
And yet one puzzle they hammer Bard for failing is "Cactus Practice". What accent do you have to have for that to be a perfect rhyme?
https://www.phind.com/
It is very fast and wins the search benchmarks here:
https://twitter.com/vladquant/status/1638305110869807104
Any rhyming done is an impressive result.
It's located at https://twofergoofer.com/blog/gpt-4