olup|1 year ago
I am interested, but why should I use this over Jina AI Reader (which is also free), Firecrawl, or the ten other Puppeteer + Readability + Turndown pipelines (or even an AWS Lambda doing the same)? This is not sarcasm; I am genuinely looking for something fresh in the field.
sachou|1 year ago
If yes, then DataFuel is the right choice. I'm adding this feature as we speak.
Please let me know :)
olup|1 year ago
Having developed a couple of page-to-Markdown converters myself, I think the bigger challenge is making sense of the many pages that rely on a spatial organisation of information that only makes sense to a human, or on the presence of images. One way to do it is to render the page as an image and extract the data with a vision LLM. But you do need a heuristic for when to use classic extraction and when to use vision, and you also have to get rid of cookie banners and overlays. This is more complex and costly, but it has real business value for whoever can pull it off.
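
To make the "when to use classic extraction, when to use vision" heuristic a bit more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the names `PageStats` and `choose_extractor`, the signals (visible text length and characters of text per image), and the thresholds are hypothetical, not something any of the tools mentioned above are known to do. A real version would need tuning on actual pages and would probably look at the rendered layout as well.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Collects rough counts from raw HTML (hypothetical helper)."""

    SKIP = {"script", "style", "noscript"}  # content that is not visible text

    def __init__(self):
        super().__init__()
        self.text_chars = 0   # characters of visible text
        self.images = 0       # <img>, <svg> and <canvas> elements
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in ("img", "svg", "canvas"):
            self.images += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def choose_extractor(html, min_text_chars=500, min_chars_per_image=200):
    """Return "classic" or "vision". Thresholds are illustrative guesses."""
    stats = PageStats()
    stats.feed(html)
    # Very little visible text: the content is probably in images or layout.
    if stats.text_chars < min_text_chars:
        return "vision"
    # Lots of images relative to text: layout likely carries the meaning.
    if stats.images and stats.text_chars / stats.images < min_chars_per_image:
        return "vision"
    return "classic"
```

The idea is to try the cheap path first and only pay for rendering plus a vision model when the cheap signals suggest the text alone will not carry the meaning. Cookie banners and overlays are not handled here; they would need their own step before either extractor runs.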