I think parsing a whole blueprint with monolithic models is really difficult, but the constrained object detection/semantic segmentation problems are significantly more tractable. You can chain those CV models with VLMs to do things like get scale right. I'm always interested in novel HCI paradigms like voice!
No comments yet.