bluehatbrit | 5 months ago
Usage pricing on something like AWS is pretty easy to figure out. You know what you're going to use, so you just do some simple arithmetic and you've got a pretty accurate idea. Even with serverless it's pretty easy. Tokens are so much harder, especially in a development setting. It's hard to make any reasonable forecast of how a team will use a tool, and how many tokens will be consumed.
I'm starting to track my usage with a bit of a breakdown in the hope that I'll find a somewhat reliable trend.
I suspect this is going to be one of the next big areas in cloud FinOps.
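A rough sketch of the kind of per-day usage tracking described above. The helper names and the per-million-token prices are illustrative assumptions, not any provider's real rates:

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-million-token prices; real rates vary by model and provider.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00

usage = defaultdict(lambda: {"input": 0, "output": 0})

def record(day: date, input_tokens: int, output_tokens: int) -> None:
    """Accumulate one session's token counts under its calendar day."""
    usage[day]["input"] += input_tokens
    usage[day]["output"] += output_tokens

def daily_cost(day: date) -> float:
    """Dollar cost for one day at the assumed prices."""
    u = usage[day]
    return (u["input"] * PRICE_PER_M_INPUT + u["output"] * PRICE_PER_M_OUTPUT) / 1_000_000

record(date(2025, 1, 6), input_tokens=250_000, output_tokens=40_000)
record(date(2025, 1, 6), input_tokens=120_000, output_tokens=25_000)
print(f"${daily_cost(date(2025, 1, 6)):.2f}")
```

Once a few weeks of days are recorded, a trend (or at least a variance estimate) falls out of the same table.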
garrickvanburen | 5 months ago
https://forstarters.substack.com/p/for-starters-59-on-credit...
Spartan-S63 | 5 months ago
It already is. There's been a lot of talk and development around FinOps for AI and the challenges that come with it. For companies, forecasting token usage and AI costs internally is non-trivial. For external products, what's the right unit of economics: $/token, $/agentic execution, something else? The former is detached from customer value; the latter is hard to track and will have lots of variance.
With how variable output size can be (and input), it’s a tricky space to really get a grasp on at this point in time. It’ll become a solved problem, but right now, it’s the Wild West.
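The variance point above can be made concrete. This sketch compares per-token billing against a flat per-execution price over some hypothetical token counts for repeated agentic runs of the same task; all numbers are made up for illustration:

```python
from statistics import mean, pstdev

# Hypothetical token totals observed across agentic runs of the same feature;
# chosen to show how widely "one execution" can vary in cost.
tokens_per_run = [18_000, 42_000, 7_500, 95_000, 31_000]

price_per_m_tokens = 10.00   # assumed blended $/M tokens
flat_price_per_run = 0.40    # assumed flat $/agentic execution

token_billed = [t * price_per_m_tokens / 1_000_000 for t in tokens_per_run]
print(f"per-token billing:  mean ${mean(token_billed):.2f}, stdev ${pstdev(token_billed):.2f}")
print(f"flat per-run price: ${flat_price_per_run:.2f} regardless of tokens consumed")
```

Per-token billing tracks cost exactly but is opaque to the customer; the flat price is legible but the seller eats the variance.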
scuff3d | 5 months ago
Edit: To be clear, I'm not talking about Zed. I'm talking about the companies making the models.
GoatInGrey | 5 months ago
I unfortunately have seen many AI-based tools being demoed with this approach. The goal is clearly to monetize every user action while piggybacking off of models provided by a third-party. The gross thing is that leadership from the director level up LOVES these demos, even when the models very clearly fuck up in the demo.
AI: "I have cleaned the formatting for all 4,650 records in your sample XML files. Let me know if there's anything else I can do to help!"
Me: "There are over 25,000 records in that data..."
AI: "You're absolutely right!"
mdasen | 5 months ago
When you get Claude Code's $20 plan, you get "around 45 messages every 5 hours". I don't really know what that means. Does that mean I get 45 total conversations? Do minor followups count against a message just as much as a long initial prompt? Likewise, I don't know how many messages I'll use in a 5 hour period. However, I do understand when I start bumping up against limits. If I'm using it and start getting limited, I understand that pretty quickly - in the same way that I might understand a processor being slower and having to wait for things.
With tokens, I might blow through a month's worth of tokens in an afternoon. On one hand, it makes more sense to be flexible for users. If I don't use tokens for the first 10 days, they aren't lost. If I don't use Claude for the first 10 days, I don't get 2,160 message credits banked up. Likewise, if I know I'm going on vacation later, I can't use my Claude messages in advance. But it's just a lot easier for humans to understand bumping up against rate limits over a shorter, finite window, and to build an intuition for what they need to budget for.
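The 2,160 figure above is just the quoted "around 45 messages every 5 hours" extrapolated over 10 unused days, assuming (hypothetically) that credits rolled over:

```python
# Back-of-the-envelope check: 45 messages per 5-hour window, 10 unused days.
messages_per_window = 45
window_hours = 5
days_unused = 10

windows = days_unused * 24 // window_hours   # 48 five-hour windows in 10 days
banked = windows * messages_per_window
print(banked)  # 2160
```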
Filligree | 5 months ago
My mental model is they’re assigning some amount of API credits to the account and billing the same way as if you were using tokens, shutting off at an arbitrary point. The point also appears to change based on load / time of day.
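That mental model can be sketched as a hidden credit balance debited at API-style token rates, with a cutoff threshold that shifts with load. Every rate and function here is an illustrative assumption, not Anthropic's actual accounting:

```python
def remaining_credits(balance: float, input_tokens: int, output_tokens: int,
                      in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Debit one request's token cost (assumed per-million rates) from the balance."""
    return balance - (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

def is_cut_off(balance: float, headroom: float) -> bool:
    """Shutoff point moves with load: under heavy load, cut off with more headroom left."""
    return balance <= headroom

balance = remaining_credits(5.0, input_tokens=400_000, output_tokens=100_000)
print(balance, is_cut_off(balance, headroom=0.5))
```

Under this model, two identical plans can hit the wall at different times of day, which matches the observed behavior.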