
Show HN: Deterministic, machine-readable context for TypeScript codebases

2 points | AmiteK | 1 month ago | github.com

Hi HN,

I built a CLI that extracts a deterministic, structured representation of a TypeScript codebase (components, hooks, APIs, routes) directly from the AST.

The goal is to produce stable, diffable “codebase context” that can be used in CI, tooling, or reasoning workflows, without relying on raw source text or heuristic inference.

It supports incremental watch mode, backend route extraction (Express/Nest), and outputs machine-readable data designed for automation.
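For illustration, a per-folder bundle from such a tool might look like the following. This is a hypothetical shape, not LogicStamp's actual schema; all field names here are assumptions:

```typescript
// Hypothetical bundle shape for a deterministic codebase-context tool.
// These interfaces are illustrative guesses, not LogicStamp's real schema.
interface ContextEdge {
  from: string; // importing module
  to: string;   // imported module
}

interface ContextBundle {
  folder: string;
  components: string[]; // exported components found in the folder
  hooks: string[];      // exported custom hooks
  routes: string[];     // backend routes (e.g. Express/Nest handlers)
  graph: { edges: ContextEdge[] };
}

// A sample bundle as such a CLI might emit it for one folder.
const bundle: ContextBundle = {
  folder: "src/checkout",
  components: ["CheckoutForm"],
  hooks: ["useCart"],
  routes: ["POST /api/checkout"],
  graph: {
    edges: [
      { from: "src/checkout/CheckoutForm.tsx", to: "src/checkout/useCart.ts" },
    ],
  },
};
```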

Repo + docs: https://github.com/LogicStamp/logicstamp-context

Happy to answer questions or hear where this would (or wouldn’t) be useful.

21 comments


verdverm|1 month ago

From the readme

> Pre-processed relationships - Dependency graphs are explicit (graph.edges) rather than requiring inference

I suspect the opposite is actually true. Injecting an extra, non-standard format or syntax for expressing something requires more cycles for the LLM to understand. Models have seen a lot of TypeScript, so the inference overhead is minimal. This is similar to the difference between a chess grandmaster and a new player: the grandmaster, like the LLM, has specialized pathways dedicated to their domain (chess / TypeScript). A grandmaster does not think about how pieces move (what does "graph.edges" mean?); they see the board in terms of space control. Operational and minor details have been conditioned into low-level pathways, leaving more neurons free for higher-level tasks and reasoning.

I don't have evals to prove it one way or the other, but the research generally seems to suggest this pattern holds up, and it makes sense given how models are trained and the mathematics of it all.

Thoughts?

AmiteK|1 month ago

That’s a reasonable hypothesis, and I agree LLMs are very good at inferring structure from raw TypeScript within a local reasoning window. However, that inference has to be repeated as context shifts or resets.

The claim I’m making is narrower: pre-processed structure isn’t about helping the model understand syntax, it’s about removing the need to re-infer relationships every time. The output isn’t a novel language - it’s a minimal, explicit representation of facts (e.g. dependencies, exports, routes) that would otherwise be reconstructed from source.

Inference works well per session, but it doesn’t give you a persistent artifact you can diff, validate, or assert against in CI. LogicStamp trades some inference convenience for explicitness and repeatability across runs.

I don’t claim one dominates the other universally - they optimize for different failure modes.

verdverm|1 month ago

How does this work mid-chat if the agent changes code that would require these mappings to be updated?

I put this information in my AGENTS.md for similar goals. Why might I prefer the option you are presenting instead? It seems to ensure all code parts are referenced in a JSON object, but I heavily filter those down because most are unimportant. It does not seem like I can do that here, which makes me think this would be less token efficient than the AGENTS.md files I already have. Also, JSON syntax eats up tokens with the quotes, commas, and curlies.
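The JSON-overhead point can be made concrete with a toy comparison: the same dependency facts serialized as JSON versus a bare line-per-edge format (both encodings invented here for illustration):

```typescript
// Same facts, two encodings: structural JSON vs a terse line format.
// Both formats are made up for this comparison.
const edges = [
  { from: "a.ts", to: "b.ts" },
  { from: "b.ts", to: "c.ts" },
];

const asJson = JSON.stringify(edges);                              // quotes, commas, braces
const asLines = edges.map(e => `${e.from} -> ${e.to}`).join("\n"); // minimal syntax

// The JSON encoding is noticeably longer for identical information.
console.log(asJson.length, asLines.length);
```

The structural syntax roughly doubles the character count here, though the gap narrows once the payload strings dominate the delimiters.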

Another alternative: give your agents access to LSP servers so they can decide what to query. You should address this in the readme as well.

How is it deterministic? I searched the term in the readme and only found claims, no explanation

AmiteK|1 month ago

Adding a bit more context since I didn’t see your expanded comment at first:

AGENTS.md and LogicStamp aren’t mutually exclusive. AGENTS.md is great for manual, human-curated guidance. LogicStamp focuses on generated ground-truth contracts derived from the AST, which makes them diffable, CI-verifiable, and resistant to drift.

On token usage: the output is split into per-folder bundles, so you can feed only the slices you care about (or post-filter to exported symbols / public APIs). JSON adds some overhead, but the trade-off is reliable machine selectability and deterministic diffs.
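The post-filtering step could look like this in practice. This is a sketch over an assumed per-symbol entry shape; the `exported` flag and field names are my invention, not LogicStamp's schema:

```typescript
// Hypothetical per-symbol entry; the `exported` flag is assumed.
interface SymbolEntry {
  name: string;
  exported: boolean;
  kind: "component" | "hook" | "fn";
}

// Keep only the public surface before handing context to an agent,
// trading completeness for token budget.
function publicSurface(entries: SymbolEntry[]): SymbolEntry[] {
  return entries.filter(e => e.exported);
}

const entries: SymbolEntry[] = [
  { name: "useCart", exported: true, kind: "hook" },
  { name: "normalizePrice", exported: false, kind: "fn" },
];
```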

Determinism here means: same repo state + config ⇒ identical bundle output.
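One common way to get that property (my sketch, not necessarily how LogicStamp does it) is canonical serialization: recursively sort object keys before writing, so that incidental traversal order can't leak into the output bytes:

```typescript
// Canonical JSON: recursively sort object keys so that logically equal
// inputs always serialize to identical bytes. Illustrative sketch only.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => (a < b ? -1 : 1))
        .map(([k, v]) => [k, canonicalize(v)])
    );
  }
  return value;
}

const a = JSON.stringify(canonicalize({ edges: [], folder: "src" }));
const b = JSON.stringify(canonicalize({ folder: "src", edges: [] }));
// a === b: key order in the input no longer affects the output.
```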

AmiteK|1 month ago

Good question.

LogicStamp treats context as deterministic output derived from the codebase, not a mutable agent-side model.

When code changes mid-session, watch mode regenerates the affected bundles, and the agent consumes the latest output. This avoids desync by relying on regeneration rather than keeping long-lived agent state in sync.
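The regenerate-on-change loop can be sketched with Node's built-in watcher. The helper names below (`affectedBundle`, `regenerate`) and the one-bundle-per-top-level-folder mapping are assumptions for illustration; LogicStamp's actual watch implementation may differ:

```typescript
// Map a changed source file to the bundle that must be regenerated.
// Assumes one bundle per top-level folder, purely for illustration.
function affectedBundle(changedFile: string): string {
  const top = changedFile.split("/")[0];
  return `${top}/context.bundle.json`;
}

// Wiring sketch (commented out; `regenerate` is a hypothetical helper):
// import { watch } from "node:fs";
// watch("src", { recursive: true }, (_event, file) => {
//   if (file) regenerate(affectedBundle(file));
// });
```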