I thought this would be inherent just from their training? There are orders of magnitude more Reddit posts than scientific papers or encyclopedia-type sources. Although I suppose the latter have their own biases as well.
I'd expect LLMs' biases to originate from the companies' system prompts rather than the volume of training data that happens to align with those biases.
I would expect the opposite. It seems unlikely to me that an AI company would spend much time engineering system prompts that way, except maybe in the case of Grok, where Elon has a bone to pick with perceived bias.
docmars|3 months ago
mrbombastic|3 months ago