evgpbfhnr | 3 months ago

To the author: code samples as images are great for syntax highlighting, but I wanted to play with the examples and got stuck trying to copy the content.

(also expected tesseract to do a bit better than this:

  $ wl-paste -t image/png | tesseract -l eng - -
  Estimating resolution as 199
  const std = @import("std");
  const expect = std.testing.expect;
  
  const Point = struct {x: i32, y: i32};
  
  test "anonymous struct literal" {
  const pt: Point = .{
  x = 13,
  -y = 67,
  33
  try expect (pt.x
  try expect(pt.y
  
  13);
  67);

)

porridgeraisin|3 months ago

tesseract does well for me...

    const std = @import("std");
    const expect = std.testing.expect;

    const Point = struct {x: i32, y: i32};

    test "anonymous struct literal" {

    const pt: Point = .{
    .x = 13,
    .y = 67,
    };
    try expect(pt.x == 13);
    try expect(pt.y == 67);

The trick is to preprocess the image a little bit like so:

    ocr () 
    { 
        magick - -monochrome -negate - | tesseract stdin stdout 2> /dev/null
    }

evgpbfhnr|3 months ago

Thank you!

Unfortunately, I get the same kind of garbage around closing curly braces, closing parentheses, and dots with this magick filter. It seems to do slightly better with an extra `-resize 400%`, but the result is still very far from what you're getting. (To be fair, the output of the monochrome filter is not pretty when inspected: it bleeds.)
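
For reference, here is a rough sketch of the variant with the extra resize. The function name `ocr_upscaled` is made up for illustration, and putting `-resize` before `-monochrome` is an assumption (upscaling before thresholding might reduce the bleeding, but I have not verified that it does):

```shell
# Sketch only: same pipeline as the ocr() function above, plus an upscale step.
# Assumption: -resize is applied before -monochrome; the ordering is a guess.
ocr_upscaled () {
    # Read an image from stdin, upscale it 4x, reduce it to black and white,
    # invert it, and pass the result to tesseract, which writes the
    # recognized text to stdout. tesseract's diagnostics are discarded.
    magick - -resize 400% -monochrome -negate - | tesseract stdin stdout 2> /dev/null
}
```

With an image on the Wayland clipboard, this would be used as `wl-paste -t image/png | ocr_upscaled`.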

I wonder what's different? (ImageMagick-7.1.1.47-1.fc42.x86_64 and tesseract-5.5.0-5.fc42.x86_64 here, no config, langpack(s) also from the distro.)

NewsaHackO|3 months ago

Yeah, he definitely should have used a code block for the examples. To the author, if you are trying to preserve code formatting and syntax highlighting, there are JS packages that will take care of all of that and produce clean, copyable, well-rendered, accessible code formatting for you.