jeadie's comments

jeadie | 9 months ago | on: Airport for DuckDB

This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai

jeadie | 1 year ago | on: Pinecone integrates AI inferencing with vector database

This is a common feature now. If anything, for being so early to vector databases, Pinecone was rather late to integrating embeddings.

Timescale most recently added it but, yes a bunch of others: Weaviate, Spice AI, Marqo, etc.

jeadie | 1 year ago | on: Ask HN: Who is hiring? (April 2024)

Spice AI | Senior Software Engineer | GMT+10 (e.g. Australia) through GMT-7 (e.g. Seattle/SF/LA) | Remote | Full Time

Spice AI provides building blocks for data- and AI-driven applications by composing real-time and historical time-series data, high-performance SQL query, and machine learning training and inference into a single, interconnected AI backend-as-a-service.

We just launched github.com/spiceai/spiceai, a unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

We're hiring experienced software engineers, ideally with Rust and/or Golang production experience. We're focused on large-scale data and distributed systems, so experience in these areas is important too. More details: https://spice.ai/careers#section-open-positions

jeadie | 2 years ago | on: GGML – AI at the Edge

I'm very glad that this has some added funding. I am building a serverless API on the Cloudflare edge network using GGML as the backbone --> tryinfima.com

jeadie | 2 years ago | on: PrivateGPT

I've tried both Chroma and Qdrant. I don't think Chroma lacks that much. It's definitely newer, but it's also a great product. I think cloud support is coming Q3 2023.

jeadie | 2 years ago | on: After All Is Said and Indexed – Unlocking Information in Recorded Speech

Most people who end up needing vector DBs, like me, want to use LLMs on a specific, often private, dataset or use case. Typically you start with something like unstructured JSON data, then need to pick and manage LLMs to create embeddings, then store both the embeddings and the original JSON data in a vector DB. The application is then some variety of CRUD operations plus search over both the original data and the embeddings.

Chroma, Pinecone, and I guess FAISS/HNSWlib/etc. only handle the vector operations. What I'd really want, and what Marqo does, is to handle everything end to end.
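The pipeline described above can be sketched in a few lines. This is a minimal, illustrative sketch only: a deterministic hash-based "embedding" stands in for a real LLM, a plain dict stands in for the vector DB, and all names (`ToyVectorStore`, `upsert`, `search`) are hypothetical rather than any particular product's API.

```python
# Toy end-to-end sketch: JSON docs -> embeddings -> store -> CRUD + search.
# The hash "embedding" is a stand-in for a real LLM embedding model.
import hashlib
import math


def embed(text: str, dim: int = 8) -> list[float]:
    """Toy deterministic embedding: hash the text into a unit vector."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class ToyVectorStore:
    """Keeps the original JSON document alongside its embedding, so CRUD
    and semantic search both work against a single record."""

    def __init__(self):
        self.records = {}  # id -> (document, vector)

    def upsert(self, doc_id: str, document: dict, text_field: str):
        self.records[doc_id] = (document, embed(document[text_field]))

    def delete(self, doc_id: str):
        self.records.pop(doc_id, None)

    def search(self, query: str, k: int = 3):
        qv = embed(query)
        scored = [(cosine(qv, vec), doc_id, doc)
                  for doc_id, (doc, vec) in self.records.items()]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [(doc_id, doc) for _, doc_id, doc in scored[:k]]


store = ToyVectorStore()
store.upsert("a1", {"title": "refunds", "body": "refunds within 30 days"}, "body")
store.upsert("a2", {"title": "shipping", "body": "ships worldwide in 5 days"}, "body")
results = store.search("refunds within 30 days", k=1)
```

A real setup would swap `embed` for an actual model and `ToyVectorStore` for Chroma, Qdrant, etc., but the shape of the problem (original data and vectors living side by side) stays the same.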

jeadie | 2 years ago | on: Do you need a vector database?

This is generally very context- and use-case-specific. In general, if a document is a `Dict[str, Any]`, then you either have one (or multiple) vector(s) per field, or you have to combine vectors across fields (and it's not self-evident how you'd best do that). That said, there are specific reasons to do this (and why I've done it in the past):

1. Chunking long text fields in documents so as to get a better semantic vector for them (you can also only fit so much into an LLM's context).

2. Separately from 1., chunking long text fields (or even chunking images, audio, etc.) is one way to perform highlighting. It helps answer the question: for a given document, what about it was the reason it was returned? You can then point to the area in the image/text/audio that was most relevant.

3. You may want to run different LLMs on different fields (perhaps a separate multi-modal LLM vs. a standard text LLM), or, like another comment said, have different transforms/representations of the same field.

Perhaps 100 vectors is non-standard, but definitely not unseen.
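Points 1 and 2 above can be sketched together: split a long field into chunks, give each chunk its own vector, and reuse the best-scoring chunk as the highlight. As before, this is a toy sketch; the deterministic hash "embedding" stands in for a real LLM, and the function names are illustrative.

```python
# Chunk a long text field into multiple vectors; the best-matching chunk
# doubles as the "highlight" explaining why the document was returned.
import hashlib
import math


def embed(text: str, dim: int = 8) -> list[float]:
    """Toy deterministic embedding; a real system would call an LLM here."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def chunk_text(text: str, size: int = 40) -> list[str]:
    """Split a long field into word-aligned chunks of roughly `size` chars."""
    words, chunks, cur = text.split(), [], []
    for w in words:
        cur.append(w)
        if len(" ".join(cur)) >= size:
            chunks.append(" ".join(cur))
            cur = []
    if cur:
        chunks.append(" ".join(cur))
    return chunks


def highlight(query: str, text: str) -> str:
    """Return the chunk most responsible for the document matching the query."""
    qv = embed(query)
    return max(chunk_text(text), key=lambda c: cosine(qv, embed(c)))
```

Each chunk's vector would be stored separately in the vector DB (hence many vectors per document), with a pointer back to the parent document and the chunk's offset for highlighting.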
