jeanloolz | 1 year ago

I built something similar as a Python library that does just that: https://github.com/philippe2803/contentmap

Blog post that explains the rationale behind the library: https://philippeoger.com/pages/can-we-rag-the-whole-web

Just pass your XML sitemap to a Python class, and it will handle the crawling, chunking, vectorizing, and storage in an SQLite file for you. It uses the SQLiteVSS integration with LangChain, but I'm thinking of moving away from it and integrating with the new sqlite-vec instead.
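For anyone curious what that sitemap-to-SQLite pipeline looks like in the small, here is a rough, stdlib-only sketch. It is not contentmap's actual API: the function names (`sitemap_urls`, `chunk_text`, `embed`, `build_index`, `search`) and the table layout are made up for illustration, the `fetch` callable stands in for a real crawler, `embed` is a toy hashed bag-of-words rather than a real embedding model, and the brute-force cosine scan is a placeholder for what a vector extension such as sqlite-vss or sqlite-vec would do.

```python
import hashlib
import math
import sqlite3
import struct
import xml.etree.ElementTree as ET
from collections.abc import Callable

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
DIM = 64  # toy embedding size (assumption, not a real model's dimension)


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract every <loc> URL from an XML sitemap string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(sitemap_xml: str, fetch: Callable[[str], str],
                db_path: str = ":memory:") -> sqlite3.Connection:
    """Crawl each sitemap URL via `fetch`, chunk, embed, and store in SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, url TEXT, content TEXT, embedding BLOB)"
    )
    for url in sitemap_urls(sitemap_xml):
        for chunk in chunk_text(fetch(url)):
            blob = struct.pack(f"{DIM}f", *embed(chunk))  # float32 blob
            conn.execute(
                "INSERT INTO chunks (url, content, embedding) VALUES (?, ?, ?)",
                (url, chunk, blob),
            )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str, k: int = 3) -> list[tuple[float, str, str]]:
    """Brute-force cosine similarity over all stored chunks, best first."""
    q = embed(query)
    scored = []
    for url, content, blob in conn.execute("SELECT url, content, embedding FROM chunks"):
        vec = struct.unpack(f"{DIM}f", blob)
        scored.append((sum(a * b for a, b in zip(q, vec)), url, content))
    scored.sort(reverse=True)
    return scored[:k]
```

To try it without touching the network, pass a dict lookup as `fetch`, e.g. `build_index(sitemap_xml, pages.__getitem__)` where `pages` maps each URL to its text, then call `search(conn, "your question")`. Swapping `embed` for a real model and `search` for a vector-extension query is where the actual library work would go.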

samstave|1 year ago

This is part of a dream of a tool I would like:

A relational crawler for a particular subject that surfaces nuanced, opaque, seemingly temporally unrelated connections showing a particular MIC pattern of acts:

"Follow all the congress members who have been part of a particular committee, track their signatures on or support for particular ACTs that have been passed, and look at their investment history from open data, quiver, etc. Then show language in any public speaking about conflicts and arms deals where their support for funding said conflicts is traceable to their ACTs, committee seat, speaking engagements, and investment profit, with their stated net worth for each year compared to the gains stated in their investment filings. Apply this pattern to all of congress, and their public-profile orbit of folks, without violating their otherwise private-related actions."

And give it a series of URLs with known content for which these nuances may be gleaned.

Or have a trainer bot that will constantly only consume this context from the open internet over time such that you can just have a graph over time for the data...

PYTHON: Run it all through txtai / your library -> nodes, and ask questions of the data in real time?

And it reminds me of the work of this fine person/it:

https://mlops.systems/#category=isafpr

https://mlops.systems/#category=afghanistan

xrd|1 year ago

I know sqlite-vss has been upgraded lately, but it was unstable for a while before that. Have you had good experiences with it?

jeanloolz|1 year ago

Actually, sqlite-vss has been untouched for quite some time, and the creator has officially announced that it is deprecated in favor of sqlite-vec, which has recently seen its first non-alpha release (v0.1.0). So I would embrace sqlite-vec now if I were you.

I have not used sqlite-vec much because it was only available as an alpha until now, but the first non-alpha release finally came out a few days ago. I'm looking into integrating it and using it to make SQLite my go-to RAG database.