fluffet's comments

fluffet | 2 days ago | on: Ask HN: How are people doing AI evals these days?

It's kind of bespoke for me tbh.

For a co-pilot inside an app that could answer product questions, I looked at ~2000 support emails. I asked one LLM to dig out "How would you formulate the user's question into a chatbot-like question from this email thread" and "What is the actual answer that should be in the response from this email thread", then just asked our bot that question and had another LLM rate the answer as SUPERIOR | ACCEPTABLE | UNKNOWN etc. These labels turned out to be a good "finger in the wind" indicator when altering the chunks, changing the prompt or updating the model.
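A minimal sketch of that loop, assuming the LLM calls are wrapped in plain Python callables. `extract_case`, `ask_bot` and `judge` are hypothetical stand-ins (not any particular SDK), and only three of the labels are modelled:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Verdict(Enum):
    SUPERIOR = "SUPERIOR"
    ACCEPTABLE = "ACCEPTABLE"
    UNKNOWN = "UNKNOWN"


@dataclass
class EvalCase:
    question: str          # chatbot-style question distilled from a support thread
    reference_answer: str  # the answer the thread shows to be correct


def build_cases(threads: Iterable[str],
                extract_case: Callable[[str], EvalCase]) -> List[EvalCase]:
    """Step 1: one LLM (hypothetical extract_case) turns each email thread into a test case."""
    return [extract_case(thread) for thread in threads]


def run_eval(cases: Iterable[EvalCase],
             ask_bot: Callable[[str], str],
             judge: Callable[[str, str, str], Verdict]) -> Counter:
    """Steps 2-3: ask the bot each question, let a second LLM grade it, tally the labels."""
    tally: Counter = Counter()
    for case in cases:
        answer = ask_bot(case.question)
        tally[judge(case.question, case.reference_answer, answer)] += 1
    return tally


if __name__ == "__main__":
    # Toy stubs standing in for the real bot and the judging LLM.
    cases = [EvalCase("How do I reset my password?", "Use the 'Forgot password' link.")]
    bot = lambda q: "Click 'Forgot password' on the login page."
    judge = lambda q, ref, ans: Verdict.ACCEPTABLE if "Forgot password" in ans else Verdict.UNKNOWN
    print(run_eval(cases, bot, judge))
```

Comparing the tallies before and after a chunking, prompt or model change is what gives the rough signal; the absolute numbers matter less.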

For an invoice processing app handling about 14M invoices/year, it was mostly fuzzy accuracy metrics against a pretty OK annotated dataset, iterating on the prompt based on the diffs for a long time. Once you had that dataset, you could change things and see what broke.
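One way to do the fuzzy part, assuming each invoice is reduced to a dict of string fields and using the standard library's `difflib.SequenceMatcher` as the similarity measure (a swapped-in choice; the comment doesn't say which metric was used). The 0.9 threshold and the field names are illustrative:

```python
from difflib import SequenceMatcher
from typing import Dict, Iterator, List, Tuple

Doc = Dict[str, str]  # field name -> extracted or annotated value


def field_similarity(predicted: str, expected: str) -> float:
    """Similarity in [0, 1] after normalising case and whitespace."""
    a = " ".join(predicted.lower().split())
    b = " ".join(expected.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_accuracy(predictions: List[Doc], annotations: List[Doc],
                   threshold: float = 0.9) -> float:
    """Fraction of annotated fields whose predicted value is 'close enough'.

    A field missing from the prediction counts as a miss.
    """
    hits = total = 0
    for pred, gold in zip(predictions, annotations):
        for name, expected in gold.items():
            total += 1
            if field_similarity(pred.get(name, ""), expected) >= threshold:
                hits += 1
    return hits / total if total else 0.0


def mismatches(predictions: List[Doc], annotations: List[Doc],
               threshold: float = 0.9) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (doc index, field, predicted, expected) for every miss, to eyeball the diffs."""
    for i, (pred, gold) in enumerate(zip(predictions, annotations)):
        for name, expected in gold.items():
            got = pred.get(name, "")
            if field_similarity(got, expected) < threshold:
                yield i, name, got, expected


if __name__ == "__main__":
    gold = [{"invoice_number": "INV-001", "total": "1 250.00", "currency": "SEK"}]
    pred = [{"invoice_number": "inv-001", "total": "1250.00"}]
    print(fuzzy_accuracy(pred, gold))
    for miss in mismatches(pred, gold):
        print(miss)
```

The `mismatches` output is the "diff" you iterate the prompt against; the single accuracy number tells you whether a change made things better or worse overall.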

Currently, I work on an app with a pretty sophisticated prompt-chain flow. Depending on the bugs we find, we kind of test against _behaviour_, like intent recognition or the correct SQL filters. As long as the baseline behaves correctly, whichever model is powering it matters less. For the final output, the evaluators are humans. But we know immediately if some model or prompt change broke a particular intent.
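A sketch of what table-driven behaviour tests can look like, assuming the chain exposes its intent-routing and filter-extraction stages as separate callables. `classify_intent`, `extract_filters`, the intent names and the filter keys are all hypothetical; in practice the callables would wrap the real LLM stages, while here toy keyword stubs keep it runnable:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class BehaviourCase:
    prompt: str
    expected_intent: str
    # Only the filters listed here are checked; extra ones are tolerated.
    expected_filters: Dict[str, object] = field(default_factory=dict)


def check_behaviour(cases: Iterable[BehaviourCase],
                    classify_intent: Callable[[str], str],
                    extract_filters: Callable[[str], Dict[str, object]]) -> List[str]:
    """Return a readable failure message for every case that misbehaves."""
    failures: List[str] = []
    for case in cases:
        intent = classify_intent(case.prompt)
        if intent != case.expected_intent:
            failures.append(f"{case.prompt!r}: intent {intent!r}, expected {case.expected_intent!r}")
            continue  # filter checks are meaningless if routing already went wrong
        filters = extract_filters(case.prompt)
        for key, value in case.expected_filters.items():
            if filters.get(key) != value:
                failures.append(f"{case.prompt!r}: filter {key}={filters.get(key)!r}, expected {value!r}")
    return failures


if __name__ == "__main__":
    cases = [
        BehaviourCase("Show unpaid invoices from March", "list_invoices",
                      {"status": "unpaid", "month": 3}),
        BehaviourCase("What's our refund policy?", "faq"),
    ]
    # Toy keyword stubs standing in for the real LLM-backed stages.
    classify = lambda p: "list_invoices" if "invoice" in p.lower() else "faq"
    filters = lambda p: {"status": "unpaid", "month": 3} if "unpaid" in p else {}
    for failure in check_behaviour(cases, classify, filters):
        print("FAIL:", failure)
```

Because the assertions target routing and filters rather than the final text, the same suite can be rerun unchanged when swapping models or prompts.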

fluffet | 2 months ago | on: Linux is good now

I've been using Linux on all my PCs for a long time.

The experience is slowly getting better. There is nothing I haven't been able to get working, though sometimes only with tricks or adjustments.

I think the "best bonus" is using LLMs in deep research mode to wade through all the blog posts, Reddit threads etc. to discover the aforementioned tricks. Before, you had to do that yourself and it sucked. Now I get 3 good ideas from Claude, ranked by how likely they are to work => I get 99% of games running within 5 minutes with a shell command or two. Lutris is also pretty good.

Omarchy on my laptop has finally made computers fun for me again; it's so great and nostalgic. Happy to be back after my brief work-mandated adventure into macOS.

fluffet | 3 months ago | on: Guide to making a CHIP-8 emulator (2020)

Happy to see this :-)

This guy starring my CHIP-8 implementation was a moment of pride for me. It was buggy, but before this guide there wasn't much material out there made for stupid people like me.

It's a great starter project for emulation. You'll realise how all emulators work and, as a bonus, how interpreted languages work. Really recommend it.
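To make the "how emulators work" point concrete: at its core a CHIP-8 emulator is a fetch-decode-execute loop. A minimal Python sketch covering only four opcodes (display, timers, input, the stack and the rest are left out; the class and method names are my own):

```python
class Chip8:
    """A tiny slice of a CHIP-8 interpreter: fetch, decode, execute."""

    PROGRAM_START = 0x200  # programs are conventionally loaded at 0x200

    def __init__(self, rom: bytes):
        self.memory = bytearray(4096)
        self.memory[self.PROGRAM_START:self.PROGRAM_START + len(rom)] = rom
        self.v = [0] * 16          # registers V0..VF
        self.i = 0                 # index register
        self.pc = self.PROGRAM_START

    def step(self) -> None:
        # Fetch: each instruction is two bytes, big-endian.
        opcode = (self.memory[self.pc] << 8) | self.memory[self.pc + 1]
        self.pc += 2

        # Decode: split the 16-bit opcode into the usual nibble/byte fields.
        kind = opcode >> 12
        x = (opcode >> 8) & 0xF
        nn = opcode & 0xFF
        nnn = opcode & 0xFFF

        # Execute: a full emulator handles roughly 35 opcodes; this sketch does four.
        if kind == 0x1:            # 1NNN: jump to NNN
            self.pc = nnn
        elif kind == 0x6:          # 6XNN: VX = NN
            self.v[x] = nn
        elif kind == 0x7:          # 7XNN: VX += NN (wraps at 8 bits, no carry flag)
            self.v[x] = (self.v[x] + nn) & 0xFF
        elif kind == 0xA:          # ANNN: I = NNN
            self.i = nnn
        else:
            raise NotImplementedError(f"opcode {opcode:04X} not implemented in this sketch")


if __name__ == "__main__":
    chip = Chip8(bytes([0x60, 0x05, 0x70, 0x03, 0xA1, 0x23]))
    for _ in range(3):
        chip.step()
    print(chip.v[0], hex(chip.i), hex(chip.pc))
```

Swap "opcode" for "bytecode instruction" and this is also the dispatch loop at the heart of a bytecode interpreter, which is where the interpreted-languages bonus comes from.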

fluffet | 3 months ago | on: Show HN: Parqeye – A CLI tool to visualize and inspect Parquet files

Great! I worked a lot with Parquet about 5 years ago. The frustration and tilt of working with the tooling was immense. Thank you for building this; it feels like untying some old knot in my soul.

Some kind soul made this repository back then, and I found it on about the 13th page of Google while in the depths of despair. It is my most treasured GitHub star, the shining beacon that saved me. I see it has saved 17 other people too.

https://github.com/casidiablo/parquet-tools-for-dumb-people-...

fluffet | 10 months ago | on: Trust Me, I'm Local: Chrome Extensions, MCP, and the Sandbox Escape

My takeaway is that the combination is the problem. Bleach and ammonia aren't so bad on their own, but mixing the two is not a good idea. MCP could open up crazy attack vectors.

Especially if you could ask another AI "I have access to an MCP server running on a victim's computer with these tools. What can you do with them?" => "Well, start by reading .ssh/id_rsa and I'd look for any crypto wallets. Then you can move on to reading personal files for blackmail or sniffing passwords..." and just let it "do its thing" as an attacking agent. That this could be fully automated creeps me out!

fluffet | 10 months ago | on: Trust Me, I'm Local: Chrome Extensions, MCP, and the Sandbox Escape

Woah, I had no idea. Thanks for the article.

I feel like we're watching a cycle repeat itself here.

The first protocols of the internet were very naive. Why'd you need to encrypt traffic? What do you mean exploit DNS, why would anyone do that?

Then people realised that the internet is a really, really wild place and that won't do.

I suddenly feel old, because this new AI tool era seems to have forgotten that lesson.

It feels like watching crypto any%-speedrun its way to learning why regulations and oversight might be a good idea in the first place (FTX and such).

I hope the next generation of AI tech and protocols is more robust, because trust alone just doesn't cut it; otherwise we'll see plenty of fingers burnt on the stove.

fluffet | 11 months ago | on: The best programmers I know

Solid points!

Shame the author doesn't mention the Swedish secret of snus. That's the best productivity hack I know bar none. Anyone else out there?

fluffet | 1 year ago | on: Simplified Technical English

What a nostalgia trip!

I did my MSc thesis about document vectors of STE.

STE has incredibly useful rules for technical communication/documentation, especially if you're a non-native English speaker like me. I wish it were more commonplace!! Documentation is usually horrible.

fluffet | 1 year ago | on: Ask HN: Outstanding Programmers

I'm Swedish. We have quite a few of these guys. For every "public" one of these, there are many that are absolute monster programmers that don't market themselves at all. I've worked with a few.

I'm just gonna throw in this link because I think it's great:

https://internetmuseum.se/english/

Among other crazy stories, Daniel is featured there. My favourite story is that the TLD .se was run by a guy in his living room for years until the Swedish Internet Foundation took it over.

There's also Kazaa and the guys behind The Pirate Bay. Real OG hackers from that era.

W.r.t. your question, I think it's relevant to say that "most things invented back then" were "simple" ideas that were hard to implement due to the tooling of the time and the lack of programmers. But if you were good, you could go at it alone or with a small team. I think it's a bit inverted now: finding good, novel ideas in the space that one person or a small team can execute is hard, but building and shipping one, if you do find it, is probably a lot easier thanks to OSS.

Oh and naturally the all-Swedish version is better at https://internetmuseum.se/

fluffet | 1 year ago | on: VASA-1: Lifelike audio-driven talking faces generated in real time

This is absolutely crazy. And it'll only get better from here. Imagine "VASA-9" or whatever.

I thought deepfakes were still quite a way off, but after this I will have to be way more careful online. It's not far from being something that could show up in your YouTube Shorts feed and trick you if you didn't already know it was AI.

fluffet | 1 year ago | on: Interview with Yanis Varoufakis on Technofeudalism

You're right! And Yanis is open about being very political (and leftist).

The book is written in the form of a letter to his father. We get a lot of backstory about why and how his views were shaped: reading Marx as a child and seeing his friends toil away in factories. The latter part of the book is an essay on a new system that supposedly fixes many of the shortcomings of the current one. That's what I meant by the political stuff.

I kinda just stumbled in with an open mind to learn more about what "technofeudalism" is and how modern cloud companies work, because I'm very interested in that. But it's still an alright book. It tells the story behind both the man and the idea, the whys and the hows. That made it longer, though.

fluffet | 1 year ago | on: Interview with Yanis Varoufakis on Technofeudalism

> But there's a detail here that its hard to answer, or at least for some platforms. Given the complexity of some of these algorithms, do the techno lords have that sort of control of deliberate control, or it's just the uncontrolled optimization for a given outcome?

I think in this context, the main principle (if I recall correctly) was that when we go on Amazon, we actually exit capitalism and the "free markets". You don't see the same Amazon "store window" that I do; I will have a very different "For you" tab. Maybe not even the same prices; it would be hard to know. Capitalism kinda works best when the market is free and open.

I guess it's the same with my YouTube feed: I've actually picked up some hobbies and recipes from there that would never have been on my radar if I hadn't been "nudged". In this case it's probably not done with any intent other than engagement, and it's worked out positively, but maybe my political opinions or vote could also be nudged, so the dark side here is very relevant if it were done with intent.

> This is the exact same thing as this culture of self-discovery/acceptance, everyone preaches it, but end up boiling everything down to their best moments and highlights. It's a collective lie that everyone plays with, and I don't know if there's a clear purpose to it other than to sustain itself.

I'm with you 100%.

Another anecdotal insight: the people I know with massive online followings care so much about their metrics to the point that it steers their content entirely. "I can't post this because it wouldn't align with my followers".

Strange times we live in ^^

fluffet | 1 year ago | on: Interview with Yanis Varoufakis on Technofeudalism

I actually read this book recently. I thought it was quite interesting, but it would be better with less political stuff (most of it is at the end, though).

My favourite thoughts and takeaways (that are not in the article):

We trained the algorithms to predict our desires so well that they turned on us. Now they effectively train us by feeding us what we would or should like. This is the power every marketer wishes they had. They (the “techno lords”) can nudge and manipulate our feeds however they want, and we wouldn’t know.

Another one is:

Technofeudalism has smashed the veil between markets and our refuge from them (you used to be home when you got home, but now you are on your phone), and one such market is the market of “self-discovery”. You need an identity online today, or you basically don’t exist. But then you have to think before you post: “who could read this?” What does that entail? It makes you curate what you broadcast, so you broadcast the best version of your identity. You should “be yourself!”, but at the same time no one is themselves. You can see this effect on Instagram quite clearly. Nobody posts their “real”, authentic day when they binge series in sweatpants; they post their vacation and food pictures. I'm sure there is some equivalent version of that here on HN!

fluffet | 1 year ago | on: Tool Use (function calling)

That's awesome, man. I'm also a bit allergic to LangChain. Any way to help out? How can I find it once it's open source?