jmanhype's comments
jmanhype | 8 days ago | on: Show HN: VAOS – A feedback loop that makes deployed agents less stupid
Threshold control — yes, that's the plan. Right now it's a single 0.8 cutoff, which is obviously too blunt. A social media agent and a client-facing email agent have completely different risk profiles. Building per-channel thresholds into the next release.
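The per-channel idea is roughly this — a sketch with made-up channel names and cutoffs, not the actual VAOS config:

```python
# Hypothetical per-channel cutoffs; values and keys are illustrative only.
CHANNEL_THRESHOLDS = {
    "social": 0.70,        # lower stakes: let more through unreviewed
    "client_email": 0.95,  # higher stakes: queue almost everything
}
DEFAULT_THRESHOLD = 0.80   # today's single global cutoff

def needs_review(confidence: float, channel: str) -> bool:
    """Flag a response for human review if it falls below its channel's cutoff."""
    return confidence < CHANNEL_THRESHOLDS.get(channel, DEFAULT_THRESHOLD)
```

Unknown channels fall back to the existing 0.8 knob, so one global default still works for users who never touch the per-channel settings.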
Cold start seeding — we actually do something close to this already. The rules system lets you pre-load corrections before the agent handles its first real conversation. But you're right that 5-10 reference outputs would be even better than corrections. That's a cleaner onboarding UX. Adding it to the backlog.
Corrections as structured context vs fine-tuning — glad someone else sees this the same way. The portability argument is the one that convinced me. If you can export your corrections as JSON and bring them to another provider, that kills the lock-in problem. We store them as structured records in Supabase right now.
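The export shape could be as simple as this — field names are illustrative, not the actual Supabase schema:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record shape for a single human correction.
@dataclass
class Correction:
    trace_id: str   # which conversation/trace it came from
    original: str   # what the agent said
    corrected: str  # what it should have said
    category: str   # e.g. "tone", "audience", "missing_context"

def export_corrections(corrections: list[Correction]) -> str:
    """Serialize corrections to provider-agnostic JSON so they can be
    re-imported anywhere -- the anti-lock-in property."""
    return json.dumps([asdict(c) for c in corrections], indent=2)
```

Anything that can parse JSON can re-ingest these as context for a different model or provider.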
Correction categories — this is smart and we're halfway there. The dashboard already groups corrections by trace/conversation, but not by pattern type (tone, audience, missing context). That's a better abstraction. Would make it much easier to spot systemic issues instead of whack-a-mole-ing individual responses.
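Grouping by pattern type instead of by trace is a small pivot over the same records — a sketch, assuming corrections carry a `category` field:

```python
from collections import defaultdict

def group_by_category(corrections: list[dict]) -> list[tuple[str, list[dict]]]:
    """Group corrections by pattern type (tone, audience, missing context)
    so systemic issues surface instead of one-off fixes."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for c in corrections:
        groups[c["category"]].append(c)
    # Most frequent categories first -- the systemic problems float to the top.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
```

If "tone" dominates the top of that list, you fix the system prompt once rather than correcting twenty individual responses.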
Appreciate the detailed feedback. Curious — are you running agents in production yourself?
jmanhype | 8 days ago | on: Show HN: VAOS – A feedback loop that makes deployed agents less stupid
The part I care about most: every 5 minutes, a loop scores each agent response on confidence. Low-confidence ones get flagged for you to review. When you correct something, that correction goes into the agent's context for future responses. Not fine-tuning -- just feeding corrections back as structured context. After a few days, the agent stops repeating the same bad answers.
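The loop in pseudocode terms — a minimal sketch, all names illustrative rather than actual VAOS internals:

```python
REVIEW_THRESHOLD = 0.8  # current single cutoff

def score_and_flag(responses: list[dict], review_queue: list[dict]) -> list[dict]:
    """Every few minutes: auto-approve high-confidence responses,
    queue the rest for human review."""
    approved = []
    for r in responses:
        (approved if r["confidence"] >= REVIEW_THRESHOLD else review_queue).append(r)
    return approved

def build_context(corrections: list[dict], limit: int = 20) -> str:
    """Feed past human corrections back into the agent's prompt as
    structured context -- no fine-tuning involved."""
    return "\n".join(
        f"Past correction: {c['original']!r} -> {c['corrected']!r}"
        for c in corrections[-limit:]
    )
```

The `build_context` output gets prepended to future prompts, which is why the agent stops repeating corrected mistakes without any weight updates.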
I'm dogfooding it with an agent called Scribe that posts to X for me. Scribe was terrible for the first ~80 interactions. Now it's mostly fine. The cold start period is real and I haven't figured out how to shorten it.
What works: Telegram responses in under 2 seconds. Swap between GPT-5.2, Claude Opus 4.6, and Gemini without reconfiguring. The feedback loop does what I wanted.
What doesn't: Discord and WhatsApp aren't hooked up. No way to export learned corrections (lock-in problem I need to solve). Observability dashboard exists but only I can see it right now.
$29/mo, $10 in AI credits included, 14-day trial. Stack is Node.js on Fly.io.
Curious about the confidence-scoring approach. Anything above 0.8 auto-approves, below gets queued for human review. Should I give users that threshold control, or is one knob enough?
jmanhype | 16 days ago | on: Show HN: LoRA gradients on Apple's Neural Engine at 2.8W
192 gradient dispatches, zero GPU fallbacks, converging loss, all at ~2.8W.
Three discoveries found through iteration on real hardware: (1) ANE's matmul op compiles but never executes — everything must be rewritten as 1x1 convolutions, (2) spatial dimensions must be multiples of 16, (3) the ANE compiler leaks handles and silently fails after ~119 compiles.
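For anyone wondering why (1) and (2) are workable at all: a matmul X(N,K) @ W(K,M) is exactly a 1x1 convolution if you treat K as channels and each output column as a (K,1,1) filter. A numpy sketch of that equivalence, plus the multiple-of-16 rounding from (2) — illustrative, not the repo's actual MIL kernel code:

```python
import numpy as np

def matmul_as_1x1_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Express x @ w as a 1x1 convolution: K becomes the channel axis,
    each column of w becomes a (K,1,1) filter -- the rewrite ANE requires."""
    n, k = x.shape
    k2, m = w.shape
    assert k == k2
    x_img = x.reshape(n, k, 1, 1)      # NCHW tensor with H = W = 1
    filters = w.T.reshape(m, k, 1, 1)  # M filters of shape (K,1,1)
    # A 1x1 conv at a single spatial position is a per-pixel dot product
    # over channels, which is exactly the matmul.
    return np.einsum("nchw,ochw->no", x_img, filters)

def pad_to_ane(h: int, w: int) -> tuple[int, int]:
    """Round spatial dims up to multiples of 16, per constraint (2)."""
    rnd = lambda d: ((d + 15) // 16) * 16
    return rnd(h), rnd(w)
```

The einsum sums over channel and (trivial) spatial axes, so the result is bit-for-bit the plain matmul; on real hardware the same reshape lets the conv engine do the work the dead matmul op can't.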
Built on maderix's ANE reverse engineering work. The repo includes the full MIL kernel generator, subprocess isolation for the compile limit, and integration with MLX for hybrid GPU+ANE training.