Why not use existing OCR/document extraction tools [0]? There are a number of options, and even a custom implementation is probably a reasonable side project given some standardized structure.
The structure isn't standardized: it's an arbitrary check design placed on top of an invoice, which may be printed on any of a wide variety of printers at an arbitrary scale, and possibly photocopied multiple times.
WillAdams|2 years ago