chartpath's comments

chartpath | 1 year ago | on: Your docs are your infrastructure

We should pivot the culture back to one that is pro-liberal arts. Those people generally know how to read and write better than STEM grads.

CS as the only path to programming was always too narrow, and often people with a broader education are better at creative solutions. With AI-assisted programming I'd argue they have an even clearer advantage now.

chartpath | 1 year ago | on: Fine-tuning now available for GPT-4o

So much focus on fine-tuning, when it can actively make performance on reasoning and planning benchmarks worse (over a baseline that's already worse than a coin toss).

Why not give us nice things for integrating with knowledge graphs and rules engines pretty please?

chartpath | 1 year ago | on: Setting up PostgreSQL for running integration tests

I've never had a problem working https://pgtap.org/ into CI.

I know the article title says "integration tests" but when a lot of functionality is done inside PostgreSQL then you can cover a lot of the test pyramid with unit tests directly in the DB as well.

The test database orchestration from the article pairs really well with pgTAP for isolation.

chartpath | 2 years ago | on: Doubts grow about the biosignature approach to alien-hunting

Kind of a tangent but I'm really interested in why statements like:

> if you have carbon-based life forms, you will have water and CO2.

...can lead to statements like:

> it is just way more likely than any other form

I totally agree with the observation, but what fascinates me is why a deductive statement can be taken to indicate likelihood in a probabilistic sense. It seems there is a bit of abductive reasoning going on behind the scenes that neither deductive logic nor inductive probability can really capture on its own.

chartpath | 2 years ago | on: After OpenAI's blowup, it seems pretty clear that 'AI safety' isn't a real thing

There's a third kind, which is when unscrupulous business managers or politicians use it to make decisions that they would not be capable of auditing for a rationale when otherwise required to know why such a decision was made.

It's more of an ethics and compliance issue, with the cost of BS and plausible deniability going to zero. As usual, it's what humans do with technology that has good or bad consequences. The tech itself is fairly close to neutral, as long as the training data wasn't chosen specifically to contain illegal content or acquired by way of copyright infringement (which isn't even the tech, it's the product).

chartpath | 2 years ago | on: Seven signs of ethical collapse (2012)

$10k is nothing for rich people. Just putting them in debt for it doesn't mean nothing of value was provided.

The reverse would be true where poor people could be ruined, unless the value provided is worth significantly more than the debt created, which seems doubtful.

chartpath | 2 years ago | on: YouTube cracks down on synthetic media with AI disclosure requirement

There is a lot of synthetic content about academic subjects on YT now, and it's very low quality. I used to search for lectures to listen to while walking or driving but now need to wade through tons of enshittified spam. Even if it's reading wikipedia or other long form articles, the voices and graphics are bad.

Actually I paid for Blinkist recently and really enjoyed it at first. They have a lot of "blinks" that state at the end that the voice was synthetic and I was legitimately surprised at the quality, having not even noticed until they told me.

This seems like a good move for YT to maintain a basic level of quality (which I'm amazed can actually get worse), but I suspect it's a pretext to avoid paying out to "illegitimate creators" for commercial reasons in a way that makes them look like they care about people.

chartpath | 2 years ago | on: ChatGPT Enterprise

I've been wondering this myself lately.

After using RAG with pgvector for the last few months with temperature 0, it's been pretty great with very little hallucination.

The small context window is the limiting factor.

In principle, I don't see the difference between fine-tuning on a bunch of prompts along the lines of "here is another context section: <~4k-n tokens of the corpus>" and the same content as it appears in a RAG prompt anyway.
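To make that equivalence concrete, here's a rough sketch (the section texts, prompt wording, and helper names are all made up for illustration): the fine-tuning examples and the RAG context are the same token stream, just packaged differently.

```python
# Hypothetical corpus sections, for illustration only.
SECTIONS = [
    "Widgets must be calibrated before first use.",
    "Calibration drift exceeding 2% voids the warranty.",
]

def finetune_examples(sections):
    """One training example per section, fine-tuning style."""
    return [f"here is another context section: {s}" for s in sections]

def rag_prompt(question, sections):
    """All retrieved sections stuffed into a single RAG prompt."""
    context = "\n".join(
        f"here is another context section: {s}" for s in sections
    )
    return f"{context}\n\nQuestion: {question}"

examples = finetune_examples(SECTIONS)
prompt = rag_prompt("When is the warranty voided?", SECTIONS)

# Every fine-tuning example appears verbatim inside the RAG prompt,
# which is the sense in which the two framings carry the same content.
assert all(e in prompt for e in examples)
```

The only real difference is where the model sees the text: baked into the weights at training time versus injected into the context window at inference time.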

Maybe the distinction of whether it is for "tone" or "context" is based on the role of the given prompts and not restricted by the fine-tuning process itself?

In theory, fine-tuning it on ~100k tokens like that would allow for better inference, even with the RAG prompt that includes a few sections from the same corpus. It would prevent issues where the vector search results are too thin despite their high similarity. E.g. picking out one or two sections of a book which is actually really long.

For example, I've seen some folks use arbitrary chunking of tokens in batches of 1k or so because it's easy to implement, but that totally breaks the semantic meaning of longer paragraphs, and those paragraphs might not come back grouped together from the vector search. My approach there has been manual curation of sections, allowing variation from 50 to 3k tokens, to keep the chunks more natural. It has worked well, but I could still see fine-tuning on the whole corpus as extra insurance against losing context.
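A minimal sketch of the paragraph-aware alternative to fixed 1k-token chunking (the token counter is a crude whitespace split standing in for a real tokenizer, and the 3k budget follows the curated range mentioned above): whole paragraphs are greedily merged up to the budget, so no paragraph is ever split mid-thought.

```python
# Assumed budget, matching the upper end of the 50-3k token range.
MAX_TOKENS = 3000

def n_tokens(text):
    """Crude stand-in for a real tokenizer."""
    return len(text.split())

def chunk_by_paragraph(corpus):
    """Greedily merge whole paragraphs until the token budget is
    reached, so paragraph boundaries are always respected."""
    chunks, current = [], []
    for para in corpus.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and n_tokens(" ".join(current + [para])) > MAX_TOKENS:
            chunks.append("\n\n".join(current))
            current = []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Toy corpus: 20 paragraphs of ~42 tokens each.
corpus = "\n\n".join(f"Paragraph {i}. " + "word " * 40 for i in range(20))
chunks = chunk_by_paragraph(corpus)

# Every chunk stays under the budget, and every paragraph survives intact.
assert all(n_tokens(c) <= MAX_TOKENS for c in chunks)
assert all(f"Paragraph {i}." in "\n\n".join(chunks) for i in range(20))
```

Chunks produced this way embed more coherently, at the cost of variable chunk sizes that the retrieval layer has to tolerate.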

chartpath | 2 years ago | on: ChatGPT Enterprise

I thought they already didn't use input data from the API for training; it was only the consumer-facing ChatGPT product whose data they'd use. Contributing inputs via the API is opt-in.

https://help.openai.com/en/articles/5722486-how-your-data-is...

That said, for enterprises that use the consumer product internally, it would make sense to pay to opt-out from that input being used.

chartpath | 3 years ago | on: Anyone else witnessing a panic inside NLP orgs of big tech companies?

Even if that were true, LLMs don't give any kind of "handles" on the semantics. You just get what you get and have to hope it is tuned for your domain. This is 100% fine for generic consumer-facing services where the training data is representative, but for specialized and jargon-filled domains where there has to be a very opinionated interpretation of words, classical NLU is really the only ethical choice IMHO.

chartpath | 3 years ago | on: Coworkers are less ambitious; bosses adjust to the new order

Thanks! Great points about leading with some of our own vulnerability as a way to create psychological safety. Refusing bad direction to protect the team is always hard, because it carries the burden of articulating how that protection is better for the company as a whole, which means politics and either CYA records or face-saving flattery that is somewhat degrading for the manager to have to do. You sound like a good person to work with.