blackjackfoe | 1 year ago

This project was really my first decent introduction to computer vision and machine learning, both for me and for those who helped me in various ways. (None of them wanted to be credited here, other than the guy who collected some of the data for me.)

It was definitely a successful learning exercise, and it's made me more confident about tackling some other problems I've had in mind for a while.

spookie|1 year ago

To help you out if you're interested:

- Smearing the image with a Gaussian along one axis, and separately with another along the other axis, can really help with segmenting chars and finding lines of text in OCR

- You can unshear chars by using the Radon or Hough transform to estimate the shear angle

Went through MNIST a few weeks ago and I agree it's interesting!
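
One possible reading of the Gaussian tip, as a minimal sketch rather than a definitive implementation: blurring a binary image along the vertical axis and then summing each row gives the same result as summing the ink per row first and smoothing that 1-D profile with a Gaussian, so the sketch does the latter. The function names, `sigma`, and `threshold` are all assumptions made for illustration, and the code is pure Python only to stay self-contained. Transposing the image and reusing the same function would look for gaps between chars instead of between lines.

```python
import math


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about 3 sigma each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, sigma):
    """Convolve a 1-D profile with a Gaussian; samples outside the range count as 0."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for i in range(len(profile)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(profile):
                acc += w * profile[j]
        out.append(acc)
    return out


def find_text_lines(image, sigma=1.0, threshold=2.0):
    """Find row ranges that contain text.

    image: list of rows, each a list of 0 (background) / 1 (ink).
    Returns (start, end) row index pairs, end exclusive.
    sigma and threshold are hypothetical tuning knobs, not recommended values.
    """
    row_ink = [sum(row) for row in image]   # ink per row (projection profile)
    smoothed = smooth(row_ink, sigma)       # the vertical "smear"
    lines, start = [], None
    for y, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = y                       # entering a text line
        elif value <= threshold and start is not None:
            lines.append((start, y))        # leaving a text line
            start = None
    if start is not None:
        lines.append((start, len(smoothed)))
    return lines


# Toy page: two text lines; the middle row of the first one is nearly empty.
INK = [0, 1, 1, 1, 0, 1, 1, 1, 0, 0]
THIN = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
BLANK = [0] * 10
page = [BLANK, INK, THIN, INK, BLANK, BLANK, BLANK, BLANK, INK, INK, INK, BLANK]
text_lines = find_text_lines(page)
```

Without the smoothing step, the nearly empty row inside the first line would fall below the threshold and split it in two; the smear bridges that gap while the wider gap between the two lines survives. On real images a library routine such as `scipy.ndimage.gaussian_filter` with a per-axis `sigma` would be the more usual choice.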

blackjackfoe|1 year ago

I am always interested! Thank you for the tips, I'll definitely research these.

sorenjan|1 year ago

Shearing is a linear operation that should be trivial for a NN to learn. Have you found that unshearing is actually useful? Was it to feed the image to an existing OCR program?
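
To make the "linear operation" point concrete, here is a minimal sketch: a horizontal shear is a 2x2 matrix applied to pixel coordinates, and unshearing is the same matrix with the factor negated. The helper names and the factor `k` are assumptions for illustration only.

```python
def shear_matrix(k):
    """2x2 matrix for a horizontal shear: x' = x + k * y, y' = y."""
    return ((1.0, k), (0.0, 1.0))


def apply(matrix, point):
    """Multiply a 2x2 matrix by a 2-D point given as (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


k = 0.5                                  # hypothetical shear factor
sheared = apply(shear_matrix(k), (3.0, 4.0))
restored = apply(shear_matrix(-k), sheared)   # inverse shear: negate k
```

Here `k` is the tangent of the shear angle, which is the quantity a Radon or Hough based estimate would have to supply.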

normie3000|1 year ago

How did this project help you to learn computer vision? I'd also like to write a basic captcha solver as an intro, but superficially this project just looks like a dump of generated code.

blackjackfoe|1 year ago

What do you mean by "generated code"? All of the code in the linked GitHub repo was written by me, with the assistance of a couple of friends who helped here and there but didn't ask to be credited.

I learned a lot because I had to do a ton of research and experimentation (fancy word for trial-and-error) to write the code and have it work as I expected.