Okay, do you mean using it to speed up media I/O there? I don't know if that would work. Using YOLO and drawing bounding boxes should already work fine with the supervision integration; that's how the Colab notebook does it.
Yeah, to speed up media I/O. The thing is that you're feeding YOLO frame by frame. That's not ideal because you have to reimplement streaming and batching around YOLO yourself. Here's an example of what you can do: https://docs.ultralytics.com/modes/predict/#__tabbed_2_13
simlevesque|1 year ago
That gives YOLO more control over when it pulls frames and how it processes them. In the Colab example you can't do this.
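To make the difference concrete, here is a self-contained sketch of the streaming pattern (with a dummy detector and hypothetical names; the real call in the linked Ultralytics docs is roughly `model.predict(source, stream=True)`): the model wrapper pulls frames from the source and batches them itself, instead of the caller decoding and pushing frames one at a time.

```python
def dummy_detect(batch):
    # Stand-in for YOLO inference on a batch of frames.
    return [f"boxes_for_frame_{f}" for f in batch]

def predict_stream(source_frames, batch_size=4):
    # The wrapper controls when frames are pulled and how they are batched --
    # the caller never touches individual frames, mirroring stream-mode
    # prediction where results are yielded lazily, one per frame.
    batch = []
    for frame in source_frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield from dummy_detect(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield from dummy_detect(batch)

# The caller just iterates results, like `for r in model.predict(src, stream=True)`.
results = list(predict_stream(range(10), batch_size=4))
```

Because results are yielded lazily, the whole video never sits in memory, and the wrapper is free to prefetch or resize its batches internally.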
I get this error: "No server set for the cv2 frontend. Set VF_IGNI_ENDPOINT and VF_IGNI_API_KEY environment variables or use cv2.set_server() before use."
I tried to use set_server but I'm not sure what argument it needs.
Vidformer-py is a thin client library around a vidformer server. Details and install instructions here: https://ixlab.github.io/vidformer/install.html
It's possible to embed that into the Python library, but getting FFmpeg, OpenCV, Rust, and the Python build systems to all play nice across multiple operating systems is too big a task for me to take on.
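Going by the error message quoted above, a minimal way to point the cv2 frontend at a server is the two environment variables it names. The endpoint and key values below are placeholders, and the exact import path should be checked against the install docs:

```python
import os

# Placeholder values -- substitute your own vidformer server details.
# The variable names come straight from the error message above.
os.environ["VF_IGNI_ENDPOINT"] = "http://localhost:8080"
os.environ["VF_IGNI_API_KEY"] = "your-api-key"

# Set the variables *before* importing the cv2 frontend so it can pick them
# up, e.g. (exact module path per the vidformer docs):
# import vidformer.cv2 as cv2
```

The error also mentions cv2.set_server() as an alternative to the environment variables; per the reply above, the argument it expects is a client handle for a running vidformer server, so the install docs are the place to check its exact construction.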
dominikwin|1 year ago
I'm not sure vidformer is a great fit for this task, at least not in that way. It's better at creating and serving video results than at bulk processing. However, the data model does allow for something similar: you can take a video, serve a vidformer VOD stream on top of it, and run the model on segments as they are requested. Essentially, you can run CV models as you watch the video. Some of this code is still WIP, though.
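The serve-and-process-on-demand idea can be sketched without any vidformer specifics (all names below are hypothetical, not the vidformer API): inference runs lazily, only for the VOD segments a viewer actually requests, and repeat requests hit a cache.

```python
from functools import lru_cache

SEGMENT_FRAMES = 30  # frames per VOD segment (assumed for illustration)

def run_model(frame_index):
    # Stand-in for per-frame inference (e.g. a YOLO detector).
    return {"frame": frame_index, "boxes": []}

@lru_cache(maxsize=None)
def render_segment(segment_index):
    # Inference happens here, on first request for a segment; a player that
    # never seeks past the start never pays for the rest of the video.
    start = segment_index * SEGMENT_FRAMES
    return [run_model(i) for i in range(start, start + SEGMENT_FRAMES)]

# A player requesting segment 2 triggers inference for frames 60..89 only.
results = render_segment(2)
```

This is the "run CV models as you watch" shape described above: the expensive work is attached to segment requests rather than done up front over the whole file.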