imaurer's comments

imaurer | 1 year ago | on: Prelude – a tiny CLI tool building context prompts from your code

I have a bunch of Makefile commands (pbcopy-api, pbcopy-ui, pbcopy-curr) that use some mishmash of git ls-files, grep, and xargs tail -n +1, piped into pbcopy.

Kitchen sink command: pbcopy-all: git ls-files | xargs tail -n +1 | pbcopy
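For anyone without a Makefile handy, here is a rough Python sketch of what `pbcopy-all` does. The function names (`tracked_files`, `read_files`, `format_context`, `copy_repo_context`) are made up for illustration. It assumes git is on the PATH and uses macOS's `pbcopy` for the clipboard step. The `==> path <==` headers imitate what `tail` prints when it is given several files.

```python
import subprocess
from pathlib import Path


def tracked_files(repo: str = ".") -> list[str]:
    """Return the paths `git ls-files` reports (requires git and a repo)."""
    result = subprocess.run(
        ["git", "ls-files"], cwd=repo, capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line]


def read_files(paths: list[str], repo: str = ".") -> dict[str, str]:
    """Read each path as text, replacing undecodable bytes (e.g. in binaries)."""
    return {
        p: (Path(repo) / p).read_text(encoding="utf-8", errors="replace")
        for p in paths
    }


def format_context(files: dict[str, str]) -> str:
    """Join file contents, each under a tail-style '==> path <==' header."""
    chunks = [f"==> {path} <==\n{text}" for path, text in files.items()]
    # tail prints a blank line between files; joining on "\n" approximates
    # that when each file ends with a newline.
    return "\n".join(chunks)


def copy_repo_context(repo: str = ".") -> None:
    """Rough equivalent of: git ls-files | xargs tail -n +1 | pbcopy (macOS only)."""
    context = format_context(read_files(tracked_files(repo), repo))
    subprocess.run(["pbcopy"], input=context, text=True, check=True)
```

Calling `copy_repo_context()` from a repo root should put roughly the same text on the clipboard as the pipeline. Like the original one-liner, it makes no attempt to skip binary files or submodule entries.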

Works like a charm in Q2 2024.

I’m sure this will be a very solved problem by 2025.

imaurer | 1 year ago | on: The one about the web developer job market

“Finding effective documentation, information, and training is likely to get harder, especially in specialised topics where LLMs are even less effective than normal.”

Who needs documentation with Claude and pbcopy?

imaurer | 1 year ago | on: Tech Debt: My Rust Library Is Now a CDO

I'm excited for the Michael Lewis version of the Rust library ecosystem.

imaurer | 2 years ago | on: OpenAI GPT-4 vs. Groq Mistral-8x7B

Groq will soon support function calling. At that point, you would want to describe your data specification and use function calling to do extraction. Tools such as Pydantic and Instructor are good starting points.
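Here is a minimal sketch of the "describe the spec, then validate what comes back" idea. It uses only the standard library instead of Pydantic or Instructor, which handle schema generation and validation for you. The tool definition follows the OpenAI-style function-calling format. Whether a given provider accepts exactly this shape is an assumption, and `record_person`, `Person`, and `parse_tool_arguments` are hypothetical names. No API call is made: the code only checks an `arguments` JSON string of the kind a model returns.

```python
import json
from dataclasses import dataclass

# Hypothetical tool definition in the JSON-Schema style used by
# OpenAI-compatible function-calling APIs; name and fields are illustrative.
RECORD_PERSON_TOOL = {
    "type": "function",
    "function": {
        "name": "record_person",
        "description": "Record a person mentioned in the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
}


@dataclass
class Person:
    name: str
    age: int


def parse_tool_arguments(arguments: str) -> Person:
    """Validate the JSON arguments string a model returned for record_person."""
    data = json.loads(arguments)
    if not isinstance(data, dict):
        raise ValueError("arguments must be a JSON object")
    required = RECORD_PERSON_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["name"], str):
        raise ValueError("name must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(data["age"], int) or isinstance(data["age"], bool):
        raise ValueError("age must be an integer")
    return Person(name=data["name"], age=data["age"])
```

The same schema dict is what you would send to the model, so the spec and the validation stay in one place. Libraries like Pydantic derive both from a single class definition.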

I am collecting these approaches and tools here: https://github.com/imaurer/awesome-llm-json

imaurer | 2 years ago | on: Inversion: Fast, Reliable Structured LLMs

Currently, LLMs are not state of the art at Named Entity Recognition. They are slower, more expensive, and less accurate than a fine-tuned BERT model.

However, they are much easier to get started with, thanks to in-context learning. Soon, they will be cheap enough, and probably fast enough, that training your own model will be a waste of time for 95% of use cases (probably more, because they will unlock use cases that wouldn’t break even, from a value perspective, with the old NLP approaches).
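To make the in-context learning point concrete, a hypothetical few-shot NER setup might look like the sketch below. The prompt wording, labels, and examples are illustrative assumptions, and the model call itself is left out. `parse_entities` only checks that each returned span actually appears in the input, which is a cheap guard against hallucinated entities.

```python
import json

# Hypothetical few-shot examples; in practice you would pick a handful
# from your own domain and label set.
FEW_SHOT = [
    (
        "Marie Curie worked in Paris.",
        [{"text": "Marie Curie", "label": "PERSON"}, {"text": "Paris", "label": "LOCATION"}],
    ),
]


def build_ner_prompt(text: str, examples=FEW_SHOT) -> str:
    """Few-shot prompt asking the model to answer with a JSON list of entities."""
    lines = ['Extract the named entities as a JSON list of {"text": ..., "label": ...} objects.', ""]
    for sample, entities in examples:
        lines += [f"Text: {sample}", f"Entities: {json.dumps(entities)}", ""]
    lines += [f"Text: {text}", "Entities:"]
    return "\n".join(lines)


def parse_entities(text: str, completion: str) -> list[dict]:
    """Parse the model's JSON answer, keeping only spans that occur in the text."""
    entities = json.loads(completion)
    return [
        e for e in entities
        if isinstance(e, dict) and e.get("text") and e["text"] in text
    ]
```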

This is why I am tracking LLM structured outputs here:

https://github.com/imaurer/awesome-llm-json

And I created an autocorrecting Pydantic library that can be used for named entity linking:

https://github.com/genomoncology/FuzzTypes
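FuzzTypes has its own API, which this sketch does not reproduce. To illustrate the underlying idea of snapping a messy mention to a known canonical entity, here is a standard-library version using `difflib.get_close_matches`. The entity table, the aliases, and the 0.8 cutoff are all hypothetical choices.

```python
import difflib
from typing import Optional

# Hypothetical canonical entities and their aliases, for illustration only.
ENTITIES = {
    "New York City": ["New York City", "NYC", "New York"],
    "Los Angeles": ["Los Angeles", "LA", "L.A."],
}


def link_entity(mention: str, entities=ENTITIES, cutoff: float = 0.8) -> Optional[str]:
    """Map a (possibly misspelled) mention to a canonical entity id, or None."""
    # Flatten aliases into a lowercase lookup table: alias -> canonical id.
    alias_to_id = {
        alias.lower(): entity_id
        for entity_id, aliases in entities.items()
        for alias in aliases
    }
    matches = difflib.get_close_matches(
        mention.lower(), list(alias_to_id), n=1, cutoff=cutoff
    )
    return alias_to_id[matches[0]] if matches else None
```

A validation layer could call something like `link_entity` and replace the raw value with the canonical id, or reject the input when it returns `None`.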

imaurer | 2 years ago | on: Show HN: We built the fastest spreadsheet

R2 support for egress $$ reasons?

imaurer | 2 years ago | on: IAC sold 17 apps to Bending Spoons. $100M deal, all 330 employees fired

How'd they get both Chainsmokers?

imaurer | 2 years ago | on: LuaX: A Lua Dialect with JSX

I don’t know anything about Lua other than that I want to try it out because of redbean [1]. I wonder if this project could work with it.

[1] https://redbean.dev/

imaurer | 2 years ago | on: Weaviate – Open-Source AI Native Vector Database

One feature I haven’t seen people write about is the ref2vec capability. I find this to be an interesting way to get some knowledge graph-like capabilities out of Weaviate.

Posting here to see if someone sees it by happenstance and writes an awesome article about it someday so I can read it.

https://weaviate.io/blog/ref2vec-centroid
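As the module name suggests, ref2vec-centroid gives an object a vector computed as the centroid (mean) of the vectors of the objects it references. Below is a toy version of that arithmetic with made-up product and user data. In Weaviate the module is meant to handle this itself, so the sketch only shows the calculation.

```python
def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of the referenced objects' vectors."""
    if not vectors:
        raise ValueError("an object with no references has no centroid")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical data: product vectors from some embedding model, and a user
# object that has no text of its own, only references to products it liked.
product_vectors = {
    "p1": [1.0, 0.0],
    "p2": [0.0, 1.0],
    "p3": [1.0, 1.0],
}
liked = ["p1", "p3"]

# The user's vector is derived from its references, so it can be searched
# against products like any other vector.
user_vector = centroid([product_vectors[p] for p in liked])  # [1.0, 0.5]
```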

imaurer | 2 years ago | on: Donut: OCR-Free Document Understanding Transformer

Two places I use it: Preview on my Mac, and Photos on my phone. Haven’t seen an API yet.

imaurer | 2 years ago | on: Which vector similarity metric should I use?

Yes

imaurer | 2 years ago | on: Which vector similarity metric should I use?

Yes, cosine distance works best in convex or normalized sets. Thinking about adding this caveat. Thanks for the question.
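Here is a quick numerical sketch of why normalization matters. Once vectors are scaled to unit length, cosine similarity equals the dot product, and squared Euclidean distance equals 2 - 2 * cos. All three metrics therefore rank neighbors the same way. The vectors are arbitrary examples.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(v: list[float]) -> float:
    return math.sqrt(dot(v, v))


def normalize(v: list[float]) -> list[float]:
    n = norm(v)
    return [x / n for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (norm(a) * norm(b))


a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

# On unit vectors, cosine similarity is just the dot product...
same_as_dot = math.isclose(cosine_similarity(a, b), dot(ua, ub))
# ...and squared Euclidean distance is 2 - 2 * cosine.
sq_euclidean = sum((x - y) ** 2 for x, y in zip(ua, ub))
same_as_euclid = math.isclose(sq_euclidean, 2 - 2 * cosine_similarity(a, b))
```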

imaurer | 2 years ago | on: What is a Vector Database? (2021)

Well, Weaviate has a GraphQL API, plus filtering and hybrid search. Hybrid search is a great feature that pg can’t fully support because it doesn’t have BM25.

https://weaviate.io/developers/weaviate/api/graphql/filters

https://weaviate.io/blog/hybrid-search-explained

I have a ChatGPT session where I asked it to do a hybrid search using filtering, pg FTS, and vector search. It looks reasonable; I just need to test it and write it up somewhere.
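One common way to merge a keyword result list and a vector result list is reciprocal rank fusion (RRF), which only needs each document's rank in each list. The sketch below is a generic RRF with the often-used constant k = 60. It adds an `allowed` set that stands in for a metadata filter. It is not Weaviate's implementation, whose fusion options and scoring may differ, and the document ids are hypothetical. In Postgres the two input lists could come from a full text search query and a pgvector query.

```python
from typing import Optional


def reciprocal_rank_fusion(
    rankings: list[list[str]],
    k: int = 60,
    allowed: Optional[set[str]] = None,
) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    `allowed` stands in for a metadata filter applied before fusion.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        if allowed is not None:
            ranking = [doc for doc in ranking if doc in allowed]
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Hypothetical result lists for the same query:
keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from Postgres full text search
vector_hits = ["doc_b", "doc_c", "doc_a"]   # e.g. from a pgvector nearest-neighbor query
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])  # ["doc_b", "doc_a", "doc_c"]
```

Because RRF ignores raw scores, it sidesteps the problem that keyword scores and vector distances live on different scales.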

imaurer | 2 years ago | on: What is a Vector Database? (2021)

AWS just added it yesterday. Hosting options are tracked here:

https://github.com/pgvector/pgvector/issues/54

imaurer | 2 years ago | on: What is a Vector Database? (2021)

I am bullish on pgvector because I am a “Postgres for everything” guy.

Current concerns are the scaling and recall performance.
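Recall here means the share of the true nearest neighbors that an approximate index returns. The sketch below shows a minimal way to measure it, assuming you can afford a brute-force exact search as ground truth. The function names and data are illustrative. In practice the approximate list would come from an ANN index such as pgvector's IVFFlat.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force ground truth: ids of the k vectors closest to `query`."""
    return sorted(vectors, key=lambda i: squared_distance(query, vectors[i]))[:k]


def recall_at_k(approx: list[str], exact: list[str], k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search also returned."""
    truth = set(exact[:k])
    return len(truth & set(approx[:k])) / len(truth)
```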

The author is looking at product quantization along with other ideas: https://github.com/pgvector/pgvector/issues/27

More details on product quantization: https://mccormickml.com/2017/10/13/product-quantizer-tutoria...
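For a feel of what product quantization does, here is a toy pure-Python version. It splits each vector into `m` sub-vectors, learns a small k-means codebook per sub-space, and stores each vector as `m` centroid ids. This is a teaching sketch under those assumptions, not pgvector's code or anything proposed in the linked issue. Real implementations use optimized k-means and typically compute distances directly on the codes.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Cut a vector into m equal-length sub-vectors (len(v) must divide by m)."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; centroids start at k randomly sampled points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_codebooks(vectors, m, k):
    """One k-centroid codebook per sub-space."""
    return [kmeans([split(v, m)[s] for v in vectors], k) for s in range(m)]


def encode(v, codebooks):
    """Store only m small integers: the nearest centroid id in each sub-space."""
    return [
        min(range(len(cb)), key=lambda j: squared_distance(sub, cb[j]))
        for sub, cb in zip(split(v, len(codebooks)), codebooks)
    ]


def decode(code, codebooks):
    """Approximate the original vector by concatenating the chosen centroids."""
    return [x for j, cb in zip(code, codebooks) for x in cb[j]]
```

The memory saving comes from `encode`: a vector of floats becomes `m` small integers, at the cost of the reconstruction error visible in `decode`.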

A nice repo that tracks the ANN relative performance of different indexes: https://mccormickml.com/2017/10/13/product-quantizer-tutoria...

Also, a shoutout to Weaviate, because they have great docs, are open source, and have a very informative YouTube channel.

https://weaviate.io/

imaurer | 2 years ago | on: In PostgreSQL, powerful Full Text Search is available out of the box

I hope someone implements BM25 and combines it with pgvector to bring hybrid search to Postgres. I feel like that will be the jsonb of the next couple of years.
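For reference, the Okapi BM25 score is simple enough to sketch in a few lines. This version assumes whitespace tokenization and the commonly used defaults k1 = 1.2 and b = 0.75. It uses the smoothed IDF log(1 + (N - df + 0.5) / (df + 0.5)), the variant Lucene uses. A Postgres extension would differ in tokenization and in how statistics are stored.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores
```

The resulting ranking could then be fused with a pgvector result list, for example by rank-based fusion.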

imaurer | 2 years ago | on: GitHub Accelerator: our first cohort and what’s next

The best part of Simon being part of it is that there will be a great record of it on his blog and in his TILs.

imaurer | 2 years ago | on: Collection of LLM resources that can be used to build products you can “own”

Hugging Face’s platform lets you experiment with, learn about, and leverage models and datasets, including LLMs and the instruction datasets used to train a chatbot.

For “merging”, I would learn about fine-tuning to see if that’s what you are looking for.

imaurer | 2 years ago | on: Collection of LLM resources that can be used to build products you can “own”

If Google Docs were the only way most people wrote text, then I think your analogy would indeed be apt. In this case, nearly all people using large language models are doing so through a web page (ChatGPT) or an API.

That's the inspiration behind the name, but I'm open to something better. I also considered "Edge", but was concerned that it would seem IoT/mobile specific.

imaurer | 2 years ago | on: The Coming of Local LLMs

Tracking repos and resources for running LLMs locally here:

https://github.com/imaurer/awesome-decentralized-llm