touringa | 3 years ago

If you're talking about Google Bard, Google was very clear in the LaMDA paper that the pre-training data came only from public sources:

"...from public dialog data and other public web documents..."

LaMDA paper: https://arxiv.org/abs/2201.08239

My overview of Google Bard including dataset: https://lifearchitect.ai/bard/

My overview of Google PaLM and Pathways family including dataset: https://lifearchitect.ai/pathways/

Compare with other models, including those that use DeepMind's MassiveWeb/MassiveText and EleutherAI's Pile datasets: https://lifearchitect.ai/whats-in-my-ai/
