jeanloolz | 1 year ago
Blog post that explains the rationale behind the library: https://philippeoger.com/pages/can-we-rag-the-whole-web
Just pass your XML sitemap to a Python class, and it will handle the crawling, chunking, vectorizing, and storage in an SQLite file for you. It currently uses the SQLiteVSS integration with LangChain, but I'm thinking of moving away from it and integrating with the new sqlite-vec instead.
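To make the sitemap-to-SQLite flow concrete, here is a minimal, standard-library-only sketch of the same steps. It is not the library's actual API: the function names (`sitemap_urls`, `chunk_text`, `embed`, `index_pages`, `search`) and the `chunks` table are hypothetical, the hashed bag-of-words `embed` is a toy stand-in for a real embedding model, the crawl step is replaced by a dict of already-fetched page text, and search is a brute-force cosine scan rather than SQLiteVSS or sqlite-vec.

```python
import math
import sqlite3
import struct
import xml.etree.ElementTree as ET
import zlib

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str, dims: int = 64) -> list[float]:
    """Toy embedding (assumption): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def index_pages(db: sqlite3.Connection, pages: dict[str, str]) -> int:
    """Chunk, embed, and store each page; returns the number of chunks written."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, url TEXT, body TEXT, embedding BLOB)"
    )
    count = 0
    for url, text in pages.items():
        for chunk in chunk_text(text):
            vec = embed(chunk)
            blob = struct.pack(f"{len(vec)}f", *vec)  # float32 array as a BLOB
            db.execute(
                "INSERT INTO chunks (url, body, embedding) VALUES (?, ?, ?)",
                (url, chunk, blob),
            )
            count += 1
    db.commit()
    return count


def search(db: sqlite3.Connection, query: str, k: int = 3) -> list[tuple[float, str, str]]:
    """Brute-force cosine search; returns (score, url, chunk) tuples, best first."""
    q = embed(query)
    scored = []
    for url, body, blob in db.execute("SELECT url, body, embedding FROM chunks"):
        vec = struct.unpack(f"{len(blob) // 4}f", blob)
        scored.append((sum(a * b for a, b in zip(q, vec)), url, body))
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    )
    urls = sitemap_urls(sitemap)
    # Stand-in for the crawl step: text that would normally be fetched per URL.
    pages = {urls[0]: "sqlite can store vectors as blobs", urls[1]: "an unrelated page"}
    db = sqlite3.connect(":memory:")
    index_pages(db, pages)
    for score, url, body in search(db, "store vectors in sqlite", k=2):
        print(f"{score:.3f}  {url}  {body}")
```

Swapping the brute-force scan for a vector extension would mostly change `index_pages` and `search`; the sitemap parsing and chunking steps would stay the same.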
samstave|1 year ago
An idea: a relational crawler for a particular subject with nuanced, opaque, seemingly temporally unrelated connections that reveal a particular MIC pattern of conduct:
"Follow all the Congress members who have sat on a particular committee, track which Acts they signed or supported, and look at their investment history from open data (Quiver, etc.). Then surface language from any of their public speaking about conflicts and arms deals where their support for funding those conflicts can be traced to their Acts, committee seats, speaking engagements, and investment profits, and compare their stated net worth for each year against the gains reported in their investment filings. Apply this pattern to all of Congress and their public-profile orbit of people, without intruding on their otherwise private actions."
And give it a series of URLs with known content from which these nuances may be gleaned.
Or have a trainer bot that continuously consumes only this context from the open internet, so that you end up with a graph of the data over time...
In Python: run it all through txtai or your library, turn it into nodes, and ask questions of the data in real time?
(And it reminds me of the work of this fine person/it:
https://mlops.systems/#category=isafpr
https://mlops.systems/#category=afghanistan)
xrd|1 year ago
jeanloolz|1 year ago
I have not used sqlite-vec much because it was only available as an alpha release until now, but it was finally released a few days ago. I'm looking into integrating it and using it to make SQLite my go-to RAG database.