No, not at all. There is a transformer obsession that is quite possibly not supported by the actual facts (CNNs can still do just as well: https://arxiv.org/abs/2310.16764), and CNNs definitely remain preferable for smaller and more specialized tasks (e.g. computer vision on medical data).
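(A toy illustration, not from the thread: the reason CNNs stay competitive on small, specialized datasets is convolution's translation equivariance — the same shared weights are applied at every position, so a shifted input produces a correspondingly shifted output. The sketch below shows this with a hand-rolled 1-D convolution in plain Python; the signal and kernel values are made up.)

```python
# Toy sketch of convolution's translation equivariance, the inductive
# bias that lets CNNs work well without transformer-scale training data.

def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation with shared (position-independent) weights."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 1, 2, 1, 0, 0, 0]
kernel = [1, 2, 1]               # a small smoothing filter

shifted = [0] + signal[:-1]      # the same signal, moved one step right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

# Shifting the input shifts the output by the same amount:
assert out_shifted[1:] == out[:-1]
```

A plain transformer has no such built-in bias; it has to learn positional structure from data, which is part of why the data-hungry comparison comes up in threads like this.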
D-Machine|2 months ago
If you also get into more robust and/or specialized tasks (e.g. rotation-invariant computer vision models, graph neural networks, models working on point-cloud data, etc.), then transformers are not obviously the right choice at all, or even usable in the first place. So there are plenty of other useful architectures out there.

Using transformers does not rule out keeping other tools up your sleeve.

menaerus|2 months ago
What about DINOv2 and DINOv3, the 1B and 7B vision transformer models? This paper [1] suggests significant improvements over traditional YOLO-based object detection.

[1] https://arxiv.org/html/2509.20787v2

edge17|2 months ago
Is there something I can read to get a better sense of which types of models are most suitable for which problems? All I hear about nowadays are transformers, but for which types of problems are transformers actually the right architecture choice?