(no title)
lamuswawir | 8 months ago
I am experimenting with the current SOTA multimodal LLMs, but performance is not there yet: they still hallucinate non-existent teeth. (As an aside, I have found a simple but very telling test. I have an image with only 4 teeth visible on top and 10 on the bottom, and I prompt the model to count them. None have been able to, though Gemini 2.5 Pro comes closest of the lot. When the counting test fails, the description is worse as well.)
I am going to try segmenting the image to see whether prompting the model to describe it segment by segment gives better results.
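A minimal sketch of what that segmentation step could look like, assuming a plain rows-by-columns grid with a small pixel overlap so a tooth sitting on a tile boundary is not cut in half. The function name `tile_boxes`, the grid layout, and the overlap value are all my own placeholders, not something I have validated on dental images. It only computes crop boxes as `(left, upper, right, lower)` tuples; the actual cropping and the per-segment prompt are left to whatever image library and model client you use.

```python
def tile_boxes(width, height, rows, cols, overlap=0):
    """Return crop boxes (left, upper, right, lower) covering a width x height image.

    Boxes are listed row by row, left to right. Each box is grown by
    `overlap` pixels on every side and clamped to the image bounds.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")

    boxes = []
    for r in range(rows):
        # Integer division keeps the tiles contiguous even when the
        # image size is not an exact multiple of the grid size.
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((
                max(0, left - overlap),
                max(0, top - overlap),
                min(width, right + overlap),
                min(height, bottom + overlap),
            ))
    return boxes


if __name__ == "__main__":
    # Hypothetical 100x60 image split into an upper and a lower row of two tiles.
    for box in tile_boxes(100, 60, rows=2, cols=2, overlap=5):
        print(box)
```

One caveat with the overlap: a tooth that falls in the shared strip can show up in two segments, so per-segment counts cannot simply be summed without some de-duplication.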