One of the key techniques is snapshotting the LLM (or any HTTP) request. This means that if the inputs won't change, the LLM will not be called. This will also snapshot /cache LLM verifications steps.
This doesn't only saves costs, but it's main goal was to force determinism and save time. Limited changes may need only the new/changed tests to be rerun with LLM. CI typically don't have LLM API keys and only rerun against snapshots with zero costs and delays
All LLM operations tend to be notoriously slow, and at least on our side: we are often more interested of how our code interacts with the LLM. Having the LLM being fully snapshopshotted does iterating the code delightfully fast.
If you want do sampling, this can be implemented in the test code. Booktest is a bit like pytest in the sense, that the actual testing logic heavylifting is left for the the developer. Lot of LLM test suites are more opinionated, but also more intrusive in that sense
arauhala|28 days ago
This doesn't only saves costs, but it's main goal was to force determinism and save time. Limited changes may need only the new/changed tests to be rerun with LLM. CI typically don't have LLM API keys and only rerun against snapshots with zero costs and delays
All LLM operations tend to be notoriously slow, and at least on our side: we are often more interested of how our code interacts with the LLM. Having the LLM being fully snapshopshotted does iterating the code delightfully fast.
If you want do sampling, this can be implemented in the test code. Booktest is a bit like pytest in the sense, that the actual testing logic heavylifting is left for the the developer. Lot of LLM test suites are more opinionated, but also more intrusive in that sense
clawsyndicate|28 days ago
[deleted]