rynn | 2 months ago
Where did you get all the data? The justice.gov site didn’t have a mass download option that I could find.
slazien|2 months ago
https://www.jmail.world/about

"We compiled these Epstein estate emails from the House Oversight Committee release by converting the PDFs to structured text with an LLM"

and:

"Data Sources
Gmail emails: House Oversight Committee
Yahoo emails: DDoSecrets (brought to us by Drop Site News)
Technology
Document parsing and extraction powered by reducto"
dvrp|2 months ago
Yes. Also, many were PPM images (or encoded as such) inside PDFs, and then I used (cheap/light) multimodal LLMs to classify the documents from photos. It was surprisingly cheap: <$1 for a few thousand PDFs/images.