top | item 46971984


alameenpd | 19 days ago

Hey HN,

I'm Al-ameen. I've been running OpenClaw as my personal agent over Telegram/WhatsApp for a few weeks now. It's incredibly powerful for proactive stuff (calendar, emails, code tasks), but the token burn is brutal — heartbeats alone were eating $50–100/mo on Claude Opus before I started optimizing.

OpenClaw's native memory (SOUL.md, session history) works for short sessions but forgets across restarts or long threads, leading to repetitive prompting and higher costs. So I started hooking it up to Whisper (a context/memory API with hybrid search, knowledge graphs, and delta compression) via custom skills.

It's still hacky — no native integration, just a Whisper skill that queries compressed context/memories before feeding prompts to the agent. But early results align with what others report: noticeably fewer tokens in multi-turn convos, especially when combined with cheap model routing (Gemini Flash / DeepSeek for heartbeats).

What I did (basics)

Baseline tweaks first (these alone helped a lot). In ~/.openclaw/openclaw.json:

```json
{
  "agents": {
    "defaults": {
      "model": "gemini/flash-lite",
      "fallbacks": ["anthropic/claude-3-opus"],
      "contextTokens": 150000
    },
    "heartbeat": {
      "model": "deepseek/v3.2",
      "interval": "45m"
    }
  }
}
```
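For intuition on why routing heartbeats off Opus matters so much, here's a back-of-envelope sketch. The per-million-token prices and the 5k-tokens-per-heartbeat figure are hypothetical placeholders, not quoted rates — plug in current pricing for your models.

```typescript
// Hypothetical per-million-token input prices -- substitute current rates.
const PRICE_PER_MTOK = { opus: 15.0, deepseek: 0.3 };

// Heartbeats: one run per interval, each sending a roughly fixed prompt.
function monthlyHeartbeatCost(
  pricePerMtok: number,
  tokensPerBeat: number,
  intervalMin = 45,
): number {
  const beatsPerMonth = (30 * 24 * 60) / intervalMin; // 960 beats at 45m
  return (beatsPerMonth * tokensPerBeat * pricePerMtok) / 1_000_000;
}

// Assuming ~5k tokens per heartbeat:
const expensive = monthlyHeartbeatCost(PRICE_PER_MTOK.opus, 5000);  // ~$72/mo
const cheap = monthlyHeartbeatCost(PRICE_PER_MTOK.deepseek, 5000);  // well under $2/mo
console.log(expensive.toFixed(2), cheap.toFixed(2));
```

Under those assumptions the heartbeat line item alone lands in the $50–100/mo range I was seeing, and rerouting it is nearly free.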

→ Switched trivial stuff away from Opus.

Whisper skill setup

Created ~/.openclaw/workspace/skills/whisper/:

SKILL.md:

```md
# Whisper Context Fetch

Pulls delta-compressed context / memories to reduce prompt size.

Example invocation:

@whisper get "summarize auth changes in repo" --project openclaw-exp
```

index.ts (simplified):

```ts
import { Whisper } from '@usewhisper/sdk';

const whisper = new Whisper({
  apiKey: process.env.WHISPER_API_KEY,
  orgId: process.env.WHISPER_ORG_ID,
});

export async function getContext(args: { query: string; project?: string }) {
  const res = await whisper.query({
    project: args.project || 'openclaw-exp',
    query: args.query,
    compress: true,
    compression_strategy: 'delta',
    include_memories: true,
  });
  return {
    context: res.context,
    memories: res.memories?.map(m => `${m.type}: ${m.content}`) || [],
    meta: res.meta,
  };
}
```

Then in agent instructions (AGENTS.md or prompt templates): "If context/history is large, call @whisper get first and prepend the result."

Wrapper example (in a custom loop or cron):

```ts
// Crude heuristic standing in for real long-context detection --
// tune the threshold (or check session length) for your setup.
const needsContext = (m: string) => m.length > 2000;

async function runAgentWithContext(msg: string) {
  let enhanced = msg;
  if (needsContext(msg)) {
    const wc = await getContext({ query: msg });
    enhanced = `Context diff: ${wc.context}\nMemories:\n${wc.memories.join('\n')}\n\n${msg}`;
  }
  return openclaw.run(enhanced);
}
```

Observations so far:

- Delta compression seems to genuinely cut repeat context (e.g., codebase overviews drop from 10–15k to 1–3k tokens after the first message).
- Memory types (preferences, events, goals) help avoid re-explaining user prefs.
- Combined with routing, my rough tracking shows ~60–75% lower usage on mixed workloads (not the 97% some YouTube videos claim, but meaningful).
- Added latency (~200–500ms per query), but worth it for longer sessions.
- Still manual; over-indexing sources bloats the Whisper project if you're not careful.

Caveats: This is custom / early. Whisper costs $20/mo after the trial. OpenClaw itself has had security noise lately (skills gone rogue, etc.), so sandbox everything. Not production-ready for high-stakes use.

Curious from folks running OpenClaw:

- What's your biggest token sink (heartbeats, tools, long histories)?
- Tried context layers / vector DBs? Actual savings numbers?
- Better cheap fallbacks than Flash/DeepSeek for non-reasoning tasks?

Happy to share more snippets or debug if anyone's experimenting similarly.
