(no title)
tbalsam | 10 months ago
You don't remotely need RL for this use case. Image resolution pyramids are pretty normal though, and handling them well/efficiently is the big thing. Using RL for this would be like trying to use graphene to make a computer screen because it's new and flashy and everyone's talking about it. RL is inherently very sample-inefficient, and it is there to approximate when you don't have certain well-defined, informative components, which we do have in computer vision in spades. Cross-entropy losses (and the like) are (generally, IME/IMO) what RL losses try to approximate, only on a much larger (and more poorly defined) scale.
Please mark speculation as such. I've seen people read confident statements like this and spend a lot of time/man-hours on them (because they seem plausible). It is not a bad idea from a creativity standpoint, but practically it is most certainly not the way to go about it.
(That being said, you can try dynamic sparsity stuff. It has some painful tradeoffs that generally don't scale, but no way in Illinois do you need RL for that.)
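For anyone unfamiliar with the resolution pyramids mentioned above, here is a minimal sketch, assuming a grayscale image stored as a nested list and plain 2x2 average pooling between levels. The names `downsample_2x` and `build_pyramid` are made up for illustration, and real pipelines would typically use a proper low-pass filter and an array library instead.

```python
def downsample_2x(image):
    """Halve height and width by averaging each 2x2 block (odd edges are dropped)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(image, min_size=1):
    """Return [full-res, half-res, quarter-res, ...] until a side would drop below min_size."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample_2x(levels[-1]))
    return levels


# Toy 8x8 "image" whose pixel values are 0..63 (an assumption for the demo).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image)
shapes = [(len(level), len(level[0])) for level in pyramid]
print(shapes)                            # [(8, 8), (4, 4), (2, 2), (1, 1)]
print(sum(h * w for h, w in shapes))     # 85 pixels total, vs. 64 in the base image
```

With 2x downsampling, each level has a quarter of the pixels of the one before it, so the whole pyramid stays under about 1.33x the pixel count of the base image. That is part of why the efficiency question is mostly about how the levels are processed rather than how they are stored.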
hedgehog|10 months ago
tbalsam|10 months ago
Modern SSD/YOLO-style detectors use efficient feature pyramids; you need them to propose where things are in the image.
This sounds a lot like going back to the old-school object detection techniques, which generally end up being very compute-inefficient.
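A rough back-of-the-envelope sketch of where the compute goes in the old-school approach, assuming a classic sliding-window classifier run over an image pyramid. The 512x512 image, 64-pixel window, and stride of 16 are arbitrary illustrative numbers, and the function names are hypothetical.

```python
def sliding_window_count(height, width, window=64, stride=16):
    """Number of window positions a sliding-window classifier visits on one image."""
    if height < window or width < window:
        return 0
    rows = (height - window) // stride + 1
    cols = (width - window) // stride + 1
    return rows * cols


def pyramid_window_count(height, width, window=64, stride=16):
    """Total window positions across a 2x pyramid, stopping once the window no longer fits."""
    total = 0
    while height >= window and width >= window:
        total += sliding_window_count(height, width, window, stride)
        height //= 2
        width //= 2
    return total


print(sliding_window_count(512, 512))   # 841 windows at full resolution
print(pyramid_window_count(512, 512))   # 1036 windows across all levels
```

Under these assumptions, each of those windows is a separate classifier evaluation on a crop that heavily overlaps its neighbors, so the same pixels get processed many times. A single-shot detector with a feature pyramid instead computes shared features once per image and predicts for all locations and scales from them.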