Saurabh_Kumar_'s comments
Saurabh_Kumar_ | 2 months ago | on: Ask HN: Salesforce, SAP, or ServiceNow: Which Is Most Ripe for Disruption?
Saurabh_Kumar_ | 2 months ago | on: Show HN: API that falls back to humans when AI is unsure
While building fintech apps, I realized that GPT-4 is great, but getting it to read complex, messy invoices reliably (99.9% accuracy) is a nightmare. A 5% error rate is fine for a chatbot, but fatal for Accounts Payable.
I got tired of writing RegEx wrappers and retry logic, so I built SyncAI – a 'Safety Layer' for AI Agents.
How it works technically:
We ingest the PDF and run it through a mix of OCR + LLMs.
We calculate a 'Confidence Score' for every field extracted.
If confidence is 95% or higher, the result goes straight to your webhook.
If it's below 95%, it routes to a Human-in-the-Loop (HITL) queue where a human verifies just that specific field.
Your Agent gets a strictly typed JSON 'Golden Record'.
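The routing step above can be sketched in a few lines. This is a minimal illustration, not the actual SyncAI API: the `ExtractedField` type, the `route()` helper, and the exact 0.95 cutoff are all assumptions for the sake of the example.

```python
# Hypothetical sketch of confidence-based routing between a webhook and a HITL queue.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.95  # fields at/above this skip human review


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0-1.0, produced by the OCR + LLM ensemble


def route(fields):
    """Split extracted fields into auto-approved vs. human-review buckets."""
    auto, review = [], []
    for f in fields:
        (auto if f.confidence >= CONFIDENCE_THRESHOLD else review).append(f)
    return auto, review


auto, review = route([
    ExtractedField("invoice_total", "1,240.50", 0.99),
    ExtractedField("vendor_tax_id", "GB??123", 0.62),
])
# invoice_total goes straight to the webhook; vendor_tax_id lands in the HITL queue
```

Once the review queue drains, both buckets merge back into the single typed "Golden Record" the agent consumes.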
Tech Stack: Python/FastAPI backend, React for the review dashboard, and we use a fine-tuned model for the routing logic.
The OCR Challenge: I know you guys are skeptical (as you should be). So I built a playground where you can upload your messiest, crumpled invoice to try it out without signing up: https://sync-ai-11fj.vercel.app/
Would love your feedback on the routing logic. I’ll be here answering questions all day!
Saurabh_Kumar_ | 3 months ago | on: AI agents break rules under everyday pressure
The issue isn't the prompt; it's the lack of a runtime guardrail. An LLM cannot be trusted to police itself when the context window gets messy.
I built a middleware API to act as an external circuit breaker for this. It runs adversarial simulations (PII extraction, infinite loops) against the agent logic before deployment. It catches the drift that unit tests miss.
Open-sourced the core logic here: https://github.com/Saurabh0377/agentic-qa-api — live demo of it blocking a PII leak: https://agentic-qa-api.onrender.com/docs
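The circuit-breaker idea can be sketched as a wrapper around the agent call. To be clear, this is an illustrative toy, not the repo's actual implementation: the regex PII checks, the `GuardrailTripped` exception, and the retry cap are all assumptions standing in for the real adversarial simulations.

```python
# Toy external "circuit breaker": blocks PII-like output and caps agent retries.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-shaped numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]


class GuardrailTripped(Exception):
    pass


def circuit_breaker(agent_fn, prompt, max_calls=3):
    """Run the agent, blocking PII in the output and bounding the call loop."""
    for _ in range(max_calls):
        out = agent_fn(prompt)
        if any(p.search(out) for p in PII_PATTERNS):
            raise GuardrailTripped("blocked: output contains PII-like content")
        if out:
            return out
    raise GuardrailTripped("blocked: agent looped without producing output")


# A deliberately leaky agent trips the guardrail before anything reaches the user.
leaky_agent = lambda _: "Sure! The customer's email is jane@example.com"
try:
    circuit_breaker(leaky_agent, "summarise the ticket")
except GuardrailTripped as e:
    print(e)
```

The point is that the check runs outside the model, so a messy context window can't talk it out of enforcing the rule.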
Saurabh_Kumar_ | 3 months ago | on: Agentic QA – Open-source middleware to fuzz-test agents for loops
It connects to your GitHub repo (OAuth, read-only), scans for issues, and generates a heatmap (red = urgent, yellow = watch, green = healthy). It quantifies debt in dollars/ROI (e.g., a potential $67k/qtr saved), with a basic calculator based on team size and salary.
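The dollar framing boils down to simple arithmetic. A back-of-the-envelope version, with the 20%-of-engineer-time-lost-to-debt fraction and the sample inputs being illustrative assumptions rather than the tool's real model:

```python
# Rough debt-in-dollars estimate: payroll per quarter * fraction lost to debt.
def debt_cost_per_quarter(team_size, avg_salary, debt_fraction=0.20):
    """Estimate quarterly cost of engineering time lost to technical debt."""
    quarterly_payroll = team_size * avg_salary / 4
    return quarterly_payroll * debt_fraction


cost = debt_cost_per_quarter(team_size=9, avg_salary=150_000)
# 9 * 150k / 4 * 0.20 = $67,500/qtr with these assumed inputs
```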
It's fully open source, free to use, and there's no sign-up for basic scans (waitlist for advanced reports). I'm a solo tech guy iterating on this and would love feedback from the HN community:
Does the dollar/ROI framing feel useful for convincing non-tech stakeholders, or is hours/grading scale better? What metrics/integration would make this more valuable (e.g., SonarQube ties, flaky test detection)?
Site/try it: cosmic-ai.pages.dev. Thanks! If it feels off or simplistic, be brutally honest; I'll use the feedback to improve (or pivot if needed).