Ask HN: Computer Vision Project Ideas?

16 points | sss111 | 4 years ago

We're a team of 5 senior undergrads taking a grad-level Machine Learning + Computer Vision class. If you have ever had an idea but never had time to implement it, let us know here!

23 comments

ArtWomb | 4 years ago
It seems like bar code scanning is a 1980s technology. The auto checkout kiosk could have product recognition, pricing & checkout built in ;)

The state of the art in CV remains video prediction: given N frames of patch input, generate the next frame.
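If you pick video prediction, a good first step is a trivial baseline your model has to beat. Below is a minimal numpy sketch, assuming grayscale frames stored as a float array of shape (N, H, W). The "copy last frame" and "constant global motion" predictors are naive baselines chosen for illustration, not state-of-the-art methods, and all function names are hypothetical.

```python
import numpy as np

def predict_copy_last(frames: np.ndarray) -> np.ndarray:
    """Baseline 1: assume nothing moves; the next frame equals the last one."""
    return frames[-1].copy()

def estimate_shift(prev: np.ndarray, curr: np.ndarray, max_shift: int = 3) -> tuple:
    """Brute-force the integer (dy, dx) translation that best maps prev onto curr."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps pixels around the border; acceptable for small shifts.
            err = np.mean((np.roll(prev, (dy, dx), axis=(0, 1)) - curr) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def predict_constant_motion(frames: np.ndarray, max_shift: int = 3) -> np.ndarray:
    """Baseline 2: assume the whole scene keeps moving by the last observed shift."""
    dy, dx = estimate_shift(frames[-2], frames[-1], max_shift)
    return np.roll(frames[-1], (dy, dx), axis=(0, 1))

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error, a common first metric for next-frame prediction."""
    return float(np.mean((pred - target) ** 2))

# Synthetic clip: a bright 4x4 square moving one pixel to the right per frame.
clip = np.zeros((6, 16, 16))
for t in range(6):
    clip[t, 4:8, 2 + t:6 + t] = 1.0
history, target = clip[:-1], clip[-1]
print("copy-last MSE:", mse(predict_copy_last(history), target))
print("constant-motion MSE:", mse(predict_constant_motion(history), target))
```

On real footage neither baseline will be close to perfect, which is the point: a learned model should clearly beat both before you trust it.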

If you are into space exploration, there are a lot of cool datasets like the "Spot the GEO" challenge

https://aiforspace.github.io/2021/

And if you get access to NVIDIA GPUs in the cluster, there's plenty of envelope-pushing stuff you can do with Omniverse: AI for rendering, light transport, physics simulations, etc.

http://cs348i.stanford.edu/

ShamelessC | 4 years ago
What sort of compute do you get access to? There's a lot of cool stuff you could do depending on whether you have decent GPUs and how much time you're allowed to experiment on them. Experimentation is fairly fundamental in practice.

There are a lot of cool pretraining tasks in vision/multimodal. Largely these are techniques introduced or refined by OpenAI and re-implemented as open-source PyTorch codebases, with varying degrees of success:

- Finetune your own CLIP https://github.com/mlfoundations/open_clip

- Train a (much smaller) DALLE https://github.com/lucidrains/DALLE-pytorch

- Train your own guided diffusion https://colab.research.google.com/drive/1javQRTkALBWLFWnx1K4... (pretty tough, may only be feasible on domain-specific data)

- Train a variational autoencoder (VAE)

- "VQGAN" from Heidelberg https://github.com/CompVis/taming-transformers

- "Discrete VAE", used as the backbone for OpenAI's DALL-E, reimplemented here (and other places) https://github.com/lucidrains/DALLE-pytorch

- "VQVAE2" https://github.com/tgisaturday/dalle-lightning
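If you go the CLIP route, the core training objective is a symmetric contrastive loss over a batch of matched image/text pairs. Below is a minimal numpy sketch of that loss, not the open_clip code itself. It assumes you already have image and text embeddings where row i of each belongs to the same pair. The fixed temperature of 0.07 is an assumption here (CLIP learns the temperature, starting from that value), and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def diagonal_cross_entropy(logits: np.ndarray) -> float:
    """Mean row-wise cross-entropy where the correct class for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def clip_loss(image_emb: np.ndarray, text_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; row i of image_emb and text_emb is a matched pair."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    # Average the image->text and text->image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))
```

In an actual fine-tuning loop you would write this in PyTorch so gradients flow into both encoders; the numpy version just makes the objective easy to inspect.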

Jefro118 | 4 years ago
Identifying the elements within a GUI image (e.g. this is a button, that's an input field, etc.). I want this myself for a tool I'm building to turn Figma designs into code, but it's also useful for things like automated testing. There are a bunch of papers on this already, but I can't find a good public implementation. Companies like UiPath probably already have a sophisticated version of this internally. If you could do this and turn it into an API, I think it would be quite valuable.
no_time | 4 years ago
Predict the outcome of a roulette spin in real time.

Write an aimbot for CS:GO that relies entirely on machine vision. Perhaps you could feed the mouse and keyboard input into the tracking algorithm to improve accuracy; you need to intercept the mouse input anyway to tamper with the game state.

Predict a coin flip in real time.

Write a program that retroactively searches security cam footage for a certain cat (I miss my cat). This is the one I actually attempted a while ago, using the dumbest method known to man: since it was an orange cat in mostly grey/green footage, I just defined a color range from dark brownish orange to light brownish orange and checked each frame of the recording against it. It didn't work that well without defining a lot of threshold rules.
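The color-range approach described above fits in a few lines of numpy. This is a minimal sketch, assuming frames arrive as RGB uint8 arrays of shape (H, W, 3). The orange bounds and the minimum-pixel-fraction threshold are hypothetical values you would have to tune on your own footage, which is exactly where the threshold rules start piling up.

```python
import numpy as np

# Hypothetical RGB bounds for "dark brownish orange" to "light brownish orange".
LOWER = np.array([120, 50, 0], dtype=np.uint8)
UPPER = np.array([255, 170, 90], dtype=np.uint8)

def orange_mask(frame: np.ndarray) -> np.ndarray:
    """Boolean (H, W) mask of pixels whose R, G and B all fall inside [LOWER, UPPER]."""
    return np.all((frame >= LOWER) & (frame <= UPPER), axis=-1)

def frames_with_cat(frames, min_fraction: float = 0.002) -> list:
    """Indices of frames whose orange-pixel fraction exceeds a (hypothetical) threshold."""
    hits = []
    for i, frame in enumerate(frames):
        if orange_mask(frame).mean() > min_fraction:
            hits.append(i)
    return hits
```

With OpenCV the same idea is usually done by converting to HSV with cv2.cvtColor and then masking with cv2.inRange, since hue tends to be less sensitive to lighting changes than raw RGB.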

There are quite a few deterministic carnival/arcade games you could cheat with a bit of machine vision magic :^) Stacker comes to mind for example

makersmasher | 4 years ago
I have spent a bit of time googling, but really can't find a whole lot. I am feeling around in the dark, and have no experience with CV (but would like to learn). Do you happen to have any resources relating to the roulette idea?
the_only_law | 4 years ago
I’m not super familiar with the domain or whether it’s trivial, but many years ago I had a budding idea for a stupid AR game: you would view the world around you through your phone’s camera, and the game would detect human faces in real time, drawing over them and turning them into “enemies”.

There's also the idea, which many have shared, of using CV to detect insects (say, a cockroach) and then attacking them with some sort of weapon. Everyone loves lasers, but a laser strong enough to kill an insect like that seems like it would introduce a significant risk of collateral damage, so I wonder if a jet of household pesticide could be used instead. A while back I wondered if those little Hexbug toys could be used for development.

ragebol | 4 years ago
Take your favorite board or card game, recognize the game state and suggest an optimal move.

Years ago I created a bot for the card game Set using classic computer vision. I should revisit it when I get my OAK-D Lite camera.
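For Set, once the cards on the table have been recognized, the "suggest a move" part is plain combinatorics. A minimal sketch, assuming the vision stage (not shown) has already turned each card into a tuple of four attributes (number, color, shading, shape), each encoded as 0, 1 or 2. Three cards form a set when every attribute is either all the same or all different across the three cards; the helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Card = Tuple[int, int, int, int]  # (number, color, shading, shape), each in {0, 1, 2}

def is_set(a: Card, b: Card, c: Card) -> bool:
    # For values in {0, 1, 2}, "all same or all different" is equivalent
    # to the three values summing to a multiple of 3.
    return all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))

def find_sets(cards: List[Card]) -> List[Tuple[Card, Card, Card]]:
    """Every valid set among the recognized cards (checks all 3-card combinations)."""
    return [trio for trio in combinations(cards, 3) if is_set(*trio)]
```

The modulo trick works because two equal values plus a different one can never sum to a multiple of 3 when values are limited to 0, 1 and 2.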

jobigoud | 4 years ago
I have one that should be relatively easy: get popular magazines in PDF format and train a network to predict whether a page is a full-page ad or actual content. Then rebuild the PDF without the ads.
judohacker | 4 years ago
I want to point my phone at a stack of poker chips and get a total of each chip color, e.g. 132 blue chips, 54 red chips, etc. Bonus points if I can then tell the app the value of each color and get the total.

I play cash games every week and it takes the host forever to count people's chips when they want to cash out.
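Detecting and counting the chips is the hard CV part; the "bonus points" part is simple arithmetic once you have one color label per chip. A minimal sketch, assuming a hypothetical detector has already produced those labels; the denominations in the example are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def chip_summary(labels: Iterable[str], values: Dict[str, float]) -> Tuple[Dict[str, int], float]:
    """Hypothetical helper: count chips per color and compute the stack's total value.

    labels: one color label per detected chip (output of an assumed detector).
    values: the value the host assigns to each color.
    Raises KeyError if a detected color has no value assigned.
    """
    counts = Counter(labels)
    total = sum(values[color] * n for color, n in counts.items())
    return dict(counts), total

# Example from the comment: 132 blue and 54 red chips, with made-up values $1 and $5.
counts, total = chip_summary(["blue"] * 132 + ["red"] * 54, {"blue": 1, "red": 5})
print(counts, total)
```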

atatatat | 4 years ago
Weigh them.
kordlessagain | 4 years ago
It would be great to have a model that extracts images from a screenshot of a webpage, then saves each one as its own image along with the location it came from on the page. I haven’t been able to find an existing solution, although I’ve been able to do it with a flattened color palette approach using OpenCV.
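One way to read the "flattened color palette" idea: quantize the screenshot's colors, then flag regions that still contain many distinct colors. Photos tend to be color-rich, while most page chrome (backgrounds, text, buttons) collapses to a handful of flat colors. Below is a minimal numpy sketch of that heuristic, not the commenter's actual OpenCV code. The tile size, quantization level and color-count threshold are assumptions to tune, and the function names are hypothetical.

```python
import numpy as np

def quantize(img: np.ndarray, levels: int = 8) -> np.ndarray:
    """Flatten the palette by snapping each RGB channel to `levels` buckets."""
    return (img // (256 // levels)).astype(np.int32)

def photo_tiles(img: np.ndarray, tile: int = 16, levels: int = 8, min_colors: int = 12) -> list:
    """(y, x) offsets of tiles containing at least `min_colors` distinct quantized colors."""
    q = quantize(img, levels)
    # Pack the three channel buckets into a single color id per pixel.
    ids = q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]
    h, w = ids.shape
    hits = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            if len(np.unique(ids[y:y + tile, x:x + tile])) >= min_colors:
                hits.append((y, x))
    return hits

def group_tiles(hits: list, tile: int = 16) -> list:
    """Merge 4-connected flagged tiles into (top, left, bottom, right) boxes."""
    remaining = set(hits)
    boxes = []
    while remaining:
        stack, group = [remaining.pop()], []
        while stack:
            y, x = stack.pop()
            group.append((y, x))
            for n in ((y - tile, x), (y + tile, x), (y, x - tile), (y, x + tile)):
                if n in remaining:
                    remaining.remove(n)
                    stack.append(n)
        ys = [p[0] for p in group]
        xs = [p[1] for p in group]
        boxes.append((min(ys), min(xs), max(ys) + tile, max(xs) + tile))
    return sorted(boxes)
```

Each box gives both a crop to save and its location on the page. Boxes are only tile-accurate, so you would probably refine the edges afterwards.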
high_byte | 4 years ago
I'd like to convert FaceMesh (TensorFlow) output to blend shapes (aka morphs), like the iPhone's LiveLink app does, but without an iPhone. I have some solutions and workarounds, but none are good enough.
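A common way to go from tracked mesh vertices to blend-shape weights is a linear least-squares fit: model the observed face as the neutral mesh plus a weighted sum of per-shape vertex offsets, then solve for the weights. Below is a minimal numpy sketch that assumes you already have a neutral mesh and blend-shape targets in the same topology and coordinate frame as the FaceMesh landmarks, with head rotation and scale already removed. Building those targets is the hard part and is not shown. Clipping the unconstrained solution to [0, 1] is a shortcut; a bounded solver such as scipy.optimize.lsq_linear would respect the bounds during the fit.

```python
import numpy as np

def fit_blendshape_weights(neutral: np.ndarray, targets: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Hypothetical helper: find w with observed ~= neutral + sum_k w_k * (targets[k] - neutral).

    neutral:  (V, 3) neutral-face vertices, same topology as the tracked mesh
    targets:  (K, V, 3) one fully posed mesh per blend shape
    observed: (V, 3) tracked vertices for the current frame (pose already removed)
    """
    k = targets.shape[0]
    # Each column holds one blend shape's flattened vertex offsets: shape (3V, K).
    deltas = (targets - neutral[None]).reshape(k, -1).T
    residual = (observed - neutral).reshape(-1)
    weights, *_ = np.linalg.lstsq(deltas, residual, rcond=None)
    # Shortcut: clip the unconstrained solution to the usual [0, 1] weight range.
    return np.clip(weights, 0.0, 1.0)
```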
Raed667 | 4 years ago
I have been going to the gym for the last couple of months. I would love to see an app that analyses my movements (dumbbell, barbell, kettlebell, etc.) and tells me how I can correct them.
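Form-checking prototypes typically start from a pose estimator that outputs 2D keypoints per frame, then reason about joint angles. A minimal sketch of that second step, assuming a hypothetical pose model has already given you (x, y) pixel coordinates for shoulder, elbow and wrist. The curl thresholds are made up and would need tuning per exercise.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at joint b (degrees) between segments b->a and b->c, e.g. shoulder-elbow-wrist."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints overlap; angle is undefined")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def check_curl_rep(elbow_angles: Sequence[float], top_max: float = 50.0, bottom_min: float = 160.0) -> List[str]:
    """Hypothetical rule: a full curl bends below top_max and straightens above bottom_min."""
    feedback = []
    if min(elbow_angles) > top_max:
        feedback.append("curl higher")
    if max(elbow_angles) < bottom_min:
        feedback.append("extend fully at the bottom")
    return feedback
```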
hellohntoday | 4 years ago
Could you please post a way for me to get in touch?
hellohntoday | 4 years ago
My company has some pretty unique global-shutter 360 camera footage, both indoors and outdoors, with precision GPS, going back about 7 years and covering all sorts of environments. It's quite a unique dataset and might be interesting to use for your project.
fragmede | 4 years ago
Read video from my dash cam and tell me whether my car will fit into a parallel parking space before I drive past it.
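Under a pinhole-camera model, an object's real-world length is roughly its length in pixels times its distance from the camera, divided by the focal length in pixels. A minimal sketch of the fit check, assuming the gap is roughly parallel to the image plane, its distance comes from some depth estimate, and the focal length comes from calibrating your dash cam. The 1.0 m maneuvering margin and the helper names are made-up assumptions.

```python
def gap_length_m(gap_px: float, distance_m: float, focal_px: float) -> float:
    """Pinhole approximation: real length = pixel length * depth / focal length."""
    return gap_px * distance_m / focal_px

def car_fits(gap_px: float, distance_m: float, focal_px: float,
             car_length_m: float, margin_m: float = 1.0) -> bool:
    """Hypothetical rule: the gap must exceed the car length plus a maneuvering margin."""
    return gap_length_m(gap_px, distance_m, focal_px) >= car_length_m + margin_m

# Example: a 600 px gap seen 8 m away with an 800 px focal length is about 6 m long.
print(gap_length_m(600, 8.0, 800), car_fits(600, 8.0, 800, car_length_m=4.5))
```

Most of the real work is in the inputs: finding the gap's endpoints in the frame and getting a reasonable distance estimate while driving.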

Read video from my dash cam and classify the vehicles around me as police cars or not.