souvik3333|8 months ago
Regarding the token limit, it depends on the text. We are using the qwen-2.5-vl tokenizer in case you are interested in reading about it.
You can run it very easily in a Colab notebook. This should be faster than the demo https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...
There are incorrect words in the extraction, so I would suggest waiting for the release of the handwritten-text model.
mdaniel|8 months ago
Apologies if there's some unspoken nuance in this exchange, but by "working correctly" did you just mean that it ran to completion? I don't even recognize some of the Unicode characters that it emitted (or maybe you're using some kind of strange font, I guess?)
Don't misunderstand me, a ginormous number of floating point numbers attempting to read that handwriting is already doing better than I can, but I was just trying to understand whether you thought that outcome was what was expected.
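One way to pin down which characters look unfamiliar is to list every non-ASCII character in the output together with its code point and official Unicode name. The sketch below is only an illustration: `describe_non_ascii` is a hypothetical helper and `sample` is a made-up string, not the model's actual output.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        # Keep first-seen order and skip repeats of the same character.
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in seen
    ]


# Hypothetical sample text (an assumption, not taken from the real extraction).
sample = "0.33\u03a9 \u00b1 5%"

for _char, code_point, name in describe_non_ascii(sample):
    # Print only the code point and name so this works on any terminal encoding.
    print(code_point, name)
```

Running this over the pasted extraction would show whether an odd glyph is a real character the model emitted or just a font or encoding problem on the reader's side.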
Eisenstein|8 months ago
Page# 8
Log: MA 6100 2.03.15
34 cement emitter resistors - 0.33R 5W 5% measure 0.29R 0.26R
35 replaced R436, R430 emitter resistors on R-chn P.O. brd w/new WW 5W .33R 5% w/ ceramic lead insulators
36 applied de-oxit d100 to speaker outs, card terminals, terminal blocks, output trans jacks
37 replace R-chn drivers and class A BJTs w/ BD139/146, & TIP31AG
38 placed boards back in
39 desoldered grnd lug from volume control
40 contact cleaner, Deoxit D5, faderlube on pots & switches teflon lube on rotor joint
41 cleaned ground lug & resoldered, reattached panel
souvik3333|8 months ago
Log: MA 6100 Z. O 3. 15
<table>
<tr> <td>34</td> <td>cement emitter resistors -</td> </tr>
<tr> <td></td> <td>0.33 R SW 5% measure</td> </tr>
<tr> <td></td> <td>0.29 R, 0.26 R</td> </tr>
<tr> <td>35</td> <td>replaced R'4 36, R4 30</td> </tr>
<tr> <td></td> <td>emitter resistor on R-44</td> </tr>
<tr> <td></td> <td>0.0. 3rd w/ new WW 5W .33R</td> </tr>
<tr> <td>36</td> <td>% w/ ceramic lead insulators</td> </tr>
<tr> <td></td> <td>applied de-oat d100 to Speak</td> </tr>
<tr> <td></td> <td>outs, card terminals, terminal</td> </tr>
<tr> <td></td> <td>blocks, output tran jacks</td> </tr>
<tr> <td>37</td> <td>replace &-clun diviers</td> </tr>
<tr> <td></td> <td>and class A BJTs w/ BD139/140</td> </tr>
<tr> <td></td> <td>& TIP37A2</td> </tr>
<tr> <td>38</td> <td>placed boards back in</td> </tr>
<tr> <td>39</td> <td>desoldered ground lus from volume</td> </tr>
<tr> <td></td> <td>(con 48)</td> </tr>
<tr> <td>40</td> <td>contact cleaner, Deox. t DS, facel/42</td> </tr>
<tr> <td></td> <td>on pots & switches</td> </tr>
<tr> <td></td> <td>· teflon lube on rotor joint</td> </tr>
<tr> <td>41</td> <td>reably cleaned ground lus &</td> </tr>
<tr> <td></td> <td>resoldered, reattatched panel</td> </tr>
</table>
You can paste it into https://markdownlivepreview.com/ to see the rendered extraction. This was produced with the Colab notebook I shared earlier.
Which Unicode characters are you referring to?
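To put a rough number on how far the extraction is from the hand-typed transcription, the table cells can be flattened to words and compared with the reference. This is a minimal sketch, assuming a word-level `difflib.SequenceMatcher` ratio is an acceptable similarity measure; it is not a metric anyone in this thread used. `CellText`, `table_words`, and `word_similarity` are hypothetical names. The two snippets are rows 38 and 39 as they appear in each version.

```python
import difflib
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def table_words(html):
    """Flatten an HTML table into a list of whitespace-separated words."""
    parser = CellText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.cells).split()


def word_similarity(reference, hypothesis_words):
    """Return a 0..1 similarity ratio and the word-level edit opcodes."""
    ref_words = reference.split()
    matcher = difflib.SequenceMatcher(None, ref_words, hypothesis_words, autojunk=False)
    return matcher.ratio(), matcher.get_opcodes()


# Rows 38-39 from the hand-typed transcription and from the model's table.
reference = "38 placed boards back in 39 desoldered grnd lug from volume control"
extracted = (
    "<table>"
    "<tr> <td>38</td> <td>placed boards back in</td> </tr>"
    "<tr> <td>39</td> <td>desoldered ground lus from volume</td> </tr>"
    "<tr> <td></td> <td>(con 48)</td> </tr>"
    "</table>"
)

hypothesis = table_words(extracted)
ratio, opcodes = word_similarity(reference, hypothesis)
print(f"similarity: {ratio:.2f}")
for op, i1, i2, j1, j2 in opcodes:
    if op != "equal":
        # Show each span where the extraction differs from the reference.
        print(op, reference.split()[i1:i2], "->", hypothesis[j1:j2])
```

A word-level comparison is deliberately strict: "lug" versus "lus" counts as a full miss, which matches how a reader would judge a parts log where a single wrong character changes a component number.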