(no title)
langcss | 1 year ago
You can work around this by the way by sending the output through a checking stage.
So picture -> gpt4o -> out1, picture -> tesseract -> out2, out1,out2 -> llm.
Might work for sound too.
langcss | 1 year ago
You can work around this by the way by sending the output through a checking stage.
So picture -> gpt4o -> out1, picture -> tesseract -> out2, out1,out2 -> llm.
Might work for sound too.
falcor84|1 year ago
langcss|1 year ago
schrodinger|1 year ago
killerstorm|1 year ago
Best speech to text is already NN transformer based anyway, so in theory it's only better to use a combined model