timini's comments

timini | 4 months ago | on: Evaluating Control Protocols for Untrusted AI Agents

This paper evaluates three control strategies for untrusted agents: deferral to trusted models, resampling, and critical action deferral. Initial testing showed resampling and critical action deferral achieving 96% safety. However, adversarial testing revealed resampling crashes to 17% safety when attackers can detect resampling or simulate monitors, while critical action deferral remained robust against all attack strategies.
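The summary above does not say how the paper implements critical action deferral, so the following is only one possible reading, sketched in plain Python. It assumes that an untrusted model proposes an action, and that any proposal flagged as critical is discarded and replaced by the trusted model's action. The names `Action`, `CRITICAL_MARKERS`, `is_critical`, and `critical_action_deferral` are all hypothetical, and the keyword check is a toy stand-in for whatever classifier a real protocol would use.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical markers for "critical" shell commands; a real protocol
# would use a far more careful classifier than substring matching.
CRITICAL_MARKERS = ("sudo ", "rm ", "curl ", "chmod ")


@dataclass
class Action:
    command: str
    author: str  # "trusted" or "untrusted"


def is_critical(command: str) -> bool:
    """Toy check: treat a command as critical if it contains a marker."""
    return any(marker in command for marker in CRITICAL_MARKERS)


def critical_action_deferral(
    task: str,
    untrusted: Callable[[str], str],
    trusted: Callable[[str], str],
) -> Action:
    """Let the untrusted model act unless its proposal is critical.

    Assumption: on a critical proposal, the trusted model's action is
    used instead and the untrusted proposal is dropped.
    """
    proposal = untrusted(task)
    if is_critical(proposal):
        return Action(command=trusted(task), author="trusted")
    return Action(command=proposal, author="untrusted")


if __name__ == "__main__":
    # Stand-in "models": plain functions returning a shell command.
    risky = lambda task: "sudo rm -rf /tmp/build"
    safe = lambda task: "ls /tmp/build"
    print(critical_action_deferral("clean the build dir", risky, safe))
```

Under this reading the deferral decision depends only on the proposed action itself, not on a monitor score the attacker could try to predict or on repeated sampling.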

timini | 4 months ago | on: HaluMem: Evaluating Hallucinations in Memory Systems of Agents

HaluMem introduces the first benchmark for evaluating hallucinations in agent memory systems at the operation level. Through three evaluation tasks (memory extraction, updating, and question answering), it reveals that existing memory systems generate and accumulate hallucinations during early stages, which then propagate errors downstream. The benchmark uses two datasets spanning different context scales to systematically reveal these failure modes.

timini | 4 months ago | on: The OpenHands Software Agent SDK: Composable and Extensible

OpenHands SDK provides a complete architectural redesign for building production software development agents. It balances simplicity (few lines of code for basic agents) with extensibility (custom tools, memory management) while delivering seamless local-to-remote execution, integrated security, and connections to various interfaces (VS Code, command line, APIs).

timini | 4 months ago | on: Dynamic Tool Allocation for AI Agents (The Rats Pattern)

TL;DR

Problem: "Tool overload" is a critical bottleneck for AI agents. Providing an LLM with a large, static list of tools bloats the context window, degrading performance, increasing costs, and reducing accuracy.

Solution: Implement a "select, then execute" architectural pattern. Use a lightweight "router" agent to first retrieve a small, relevant subset of tools for a specific task. Then, a more capable "specialist" agent uses that curated set to execute the request.

Benefits: Lower latency and cost (fewer tokens), higher tool-selection precision, a scalable architecture for large tool catalogs, and improved reliability.

Pattern: This pattern is a form of Retrieval-Augmented Generation (RAG) applied to tools, often called Retrieval-Augmented Tool Selection (RATS). It can be combined with State-Based Gating for even greater precision.

How: This post provides a complete, production-aware implementation using Google's Agent Development Kit (ADK).
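As a rough illustration of the "select, then execute" idea, here is a minimal sketch in plain Python. It does not use ADK and is not the post's implementation. The `Tool` class, the `CATALOG` entries, and the `select_tools` and `execute` functions are all hypothetical, and keyword overlap stands in for whatever retrieval method (for example, embeddings) a real router would use.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(query: str, catalog: list[Tool], k: int = 2) -> list[Tool]:
    """Router step: keep at most k tools whose text overlaps the query."""
    q = _tokens(query)
    scored = [(len(q & _tokens(f"{t.name} {t.description}")), t) for t in catalog]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in ranked[:k] if score > 0]


def execute(query: str, tools: list[Tool]) -> str:
    """Specialist step (stand-in): call the top-ranked tool, if any.

    In a real system an LLM would see only this short list of tools.
    """
    if not tools:
        return "no relevant tool"
    return tools[0].run(query)


# Hypothetical catalog; real catalogs may hold hundreds of tools.
CATALOG = [
    Tool("get_weather", "look up the weather forecast for a city",
         lambda q: "weather: sunny"),
    Tool("convert_currency", "convert an amount of money between currencies",
         lambda q: "currency: 10 USD"),
    Tool("search_flights", "search flights between two airports",
         lambda q: "flights: 3 found"),
]

if __name__ == "__main__":
    question = "what is the weather forecast in Paris"
    shortlist = select_tools(question, CATALOG)
    print([t.name for t in shortlist])
    print(execute(question, shortlist))
```

The point of the split is that only the shortlist, not the whole catalog, would be placed in the specialist's context, which is where the token savings claimed in the TL;DR would come from.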

timini | 3 years ago | on: Bard and new AI features in Search

Are you sure that Google won't provide the link? If these chatbots could provide references for their answers, they could link back to websites, solving lots of the problems mentioned here.

timini | 4 years ago | on: Ask HN: Are most of us developers lying about how much work we do?

What practical things can I do to get better at my job? I've always been a procrastinator, but since the pandemic I've become like OP: very little work on the job, whole weeks where I don't do anything. It's a double-edged sword. I have rekindled some old hobbies, but I feel a lot of guilt about not being good at my job. I just about get by at work. Some people love my work; some people hate working with me.

I would love to learn from successful people like you. Is there anything you can recommend reading, or any course to take?

I'm in the middle of my life and feeling stuck and confused about working in tech. What's the remedy?
