ag8 | 5 months ago

A) You could add a field to the JSONL file that says which rubric to use. Your reward function could then read it via `kwargs["rubric"]` and return a reward based on that example's preferred rubric.

B) Currently, the deployed API is free, but startup takes a few minutes and it runs on a small GPU node, so it is not especially fast. If you would like production-level inference, email us at founders@runrl.com and we can set you up with something much faster (we'd charge per token, depending on model size).
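
For (A), here is a rough sketch of what a per-example rubric could look like. It assumes that extra JSONL fields reach the reward function as keyword arguments and that the function receives the completion as a string. The rubric names, the scorer functions, the `prompt` field, and the exact `reward_fn` signature are all made up for illustration, so check the actual interface before relying on it.

```python
import json

# Hypothetical rubric scorers; the names and scoring rules are illustrative only.
def score_concise(completion: str) -> float:
    # 1.0 for an empty completion, falling linearly to 0.0 at 200+ characters.
    return max(0.0, 1.0 - len(completion) / 200)

def score_polite(completion: str) -> float:
    # 1.0 if the completion contains "please" (case-insensitive), else 0.0.
    return 1.0 if "please" in completion.lower() else 0.0

RUBRICS = {"concise": score_concise, "polite": score_polite}

def reward_fn(completion: str, **kwargs) -> float:
    # Assumption: extra fields from the JSONL row arrive as keyword arguments.
    # Falls back to the "concise" rubric when the row has no "rubric" field.
    rubric = kwargs.get("rubric", "concise")
    return RUBRICS[rubric](completion)

# One line of the JSONL file, with the extra "rubric" field.
line = '{"prompt": "Ask for the salt.", "rubric": "polite"}'
example = json.loads(line)

# Pass every field except the prompt through as keyword arguments.
extra = {k: v for k, v in example.items() if k != "prompt"}
reward = reward_fn("Could you pass the salt, please?", **extra)
```

Dispatching through a dict keeps each rubric in its own function, so adding a new rubric means adding one scorer and one new value in the JSONL field.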
