(no title)
tmpfs | 9 months ago
In the end I found that the Python trifatura library extracts the best-quality content, with accurate metadata.
You might want to compare your implementation to trifatura to see if there is room for improvement.
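A rough sketch of what such a comparison could look like, assuming the library meant here is trafilatura (installable with pip) and that your own extractor returns plain text. The helper names `reference_text`, `token_similarity` and `missing_tokens` are made up for illustration and are not part of any library; only `trafilatura.extract` is a real call.

```python
import difflib


def reference_text(html):
    """Return trafilatura's plain-text extraction of an HTML string.

    Returns None if trafilatura is not installed or if it finds no
    main content in the page.
    """
    try:
        import trafilatura  # third-party: pip install trafilatura
    except ImportError:
        return None
    return trafilatura.extract(html)


def token_similarity(candidate, reference):
    """Similarity in [0, 1] between two extractions, compared word by word."""
    a, b = candidate.split(), reference.split()
    # autojunk=False keeps frequent words from being ignored on long texts.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def missing_tokens(candidate, reference):
    """Words present in the reference extraction but absent from the candidate."""
    have = set(candidate.split())
    return [word for word in reference.split() if word not in have]


if __name__ == "__main__":
    # Hypothetical sample page and a stand-in for your own extractor's output.
    html = (
        "<html><body><nav>Home | About</nav><article><h1>Pasta</h1>"
        "<p>Bronze dies give extruded pasta a rough, porous surface "
        "that holds sauce well.</p></article></body></html>"
    )
    mine = "Pasta Bronze dies give extruded pasta a rough surface"
    ref = reference_text(html)
    if ref is None:
        print("no reference extraction available")
    else:
        print("similarity:", round(token_similarity(mine, ref), 3))
        print("missing:", missing_tokens(mine, ref))
```

A low similarity score or a long list of missing words on pages you care about would suggest where your own extractor drops content that the reference keeps. This is a crude word-level check, not a proper benchmark.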
acrophobic|9 months ago
If you're using Go, I maintain Go ports of Readability[0] and Trafilatura[1]. They're actively maintained, and for Trafilatura, the extraction performance is comparable to the Python version.
[0]: https://github.com/go-shiori/go-readability
[1]: https://github.com/markusmobius/go-trafilatura
fabmilo|9 months ago
for the curious: Trafilatura means "extrusion" in Italian.
| This method creates a porous surface that distinguishes pasta trafilata for its extraordinary way of holding the sauce.
Search for maccheroni trafilati vs maccheroni lisci :)
(btw I think you meant trafilatura not trifatura)