bluehatbrit | 5 months ago

Token based pricing generally makes a lot of sense for companies like Zed, but it sure does suck for forecasting spend.

Usage pricing on something like AWS is pretty easy to figure out. You know what you're going to use, so you just do some simple arithmetic and you've got a pretty accurate idea. Even with serverless it's pretty easy. Tokens are so much harder, especially in a development setting. It's hard to make any reasonable forecast of how a team will use them, and how many tokens will be consumed.
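To make the contrast concrete, here's a hedged back-of-envelope sketch. The server forecast is capacity times a unit price: one deterministic number. The token forecast multiplies several guesses together, and every one of them (prompts per day, tokens per prompt, price per token) has wide variance. All rates and usage figures below are made up for illustration.

```python
# Sketch: fixed-capacity forecasting vs. token forecasting.
# Every price and usage number here is a hypothetical assumption.

HOURS_PER_MONTH = 730

# AWS-style: capacity * unit price -> one deterministic number.
instance_hourly_rate = 0.096          # assumed on-demand rate, $/hr
instances = 3
server_cost = instances * instance_hourly_rate * HOURS_PER_MONTH

# Token-style: every term is a guess with wide variance.
developers = 10
prompts_per_dev_per_day = 40          # guess
workdays_per_month = 22
tokens_per_prompt = 15_000            # guess: context + output, varies wildly
price_per_million_tokens = 5.00      # vendor-controlled, can change

token_cost = (developers * prompts_per_dev_per_day * workdays_per_month
              * tokens_per_prompt / 1_000_000 * price_per_million_tokens)

print(f"server forecast: ${server_cost:,.2f}/mo")
print(f"token forecast:  ${token_cost:,.2f}/mo (times or divided by some large factor)")
```

The structural difference: the first forecast has one uncertain input (how many instances you need); the second has four, and they multiply.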

I'm starting to track my usage with a bit of a breakdown in the hope that I'll find a somewhat reliable trend.

I suspect this is going to be one of the next big areas in cloud FinOps.

discuss

order

garrickvanburen | 5 months ago

My rant on token-based pricing is primarily about the difficulty of consistently forecasting spend... and also that the ongoing value of a token is controlled by the vendor... "the house always wins."

https://forstarters.substack.com/p/for-starters-59-on-credit...

coder543 | 5 months ago

There are enough vendors that it's difficult for any one vendor to charge too much per token. There are also a lot of really good open-weight models that your business could self-host if the hosted vendors all conspire to charge too much per token. (I believe it's only economical to self-host big models if you're using a lot of tokens, so there is a breakeven point.)
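The break-even point mentioned above can be sketched in one line. Both numbers below are loose assumptions (a rented multi-GPU box at a flat monthly rate, and a blended hosted-API price), and the sketch ignores ops labor and utilization gaps, which push the real break-even higher.

```python
# Hedged back-of-envelope: tokens/month at which self-hosting an
# open-weight model starts to beat a hosted API. Both inputs are assumptions.

gpu_server_monthly = 8_000.0      # assumed: rented multi-GPU server, $/month
api_price_per_m_tokens = 3.00     # assumed blended hosted price, $/1M tokens

breakeven_m_tokens = gpu_server_monthly / api_price_per_m_tokens
print(f"break-even at roughly {breakeven_m_tokens:,.0f}M tokens/month")
```

Below that volume the flat GPU cost dominates and the API is cheaper; above it, self-hosting wins on paper, before accounting for the engineers who have to run it.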

prasoon2211 | 5 months ago

This is partially why, at least for LLM-assisted coding workloads, orgs are going with the $200 / mo Claude Code plans and similar.

jsheard | 5 months ago

Until the rug inevitably gets pulled on those as well. It's not in your interest to buy a $200/mo subscription unless you use >$200 of tokens per month, and long term it's not in their interest to sell you >$200 of tokens for a flat $200.

Spartan-S63 | 5 months ago

> I suspect this is going to be one of the next big areas in cloud FinOps.

It already is. There’s been a lot of talk and development around FinOps for AI and the challenges that come with it. For companies, forecasting token usage and AI costs is non-trivial for internal purposes. For external products, what’s the right economic unit? $/token, $/agentic execution, etc.? The former is detached from customer value; the latter is hard to track and will have lots of variance.

With how variable output size can be (and input), it’s a tricky space to really get a grasp on at this point in time. It’ll become a solved problem, but right now, it’s the Wild West.

scuff3d | 5 months ago

Also seems like a great way to create a business model where the companies aren't incentivised to provide the best product possible. Instead they'll want to create a product just useful enough to not drive away users, but just useless enough to tempt people to go up a tier: "I'm so close, just one more prompt and it will be right this time!"

Edit: To be clear, I'm not talking about Zed. I'm talking about the companies making the models.

GoatInGrey | 5 months ago

As well as gatekeep functionality behind the prompt box. Want to find and replace? Regex? Insert a new column? Add a line break? Have the AI do it and pay us for those tokens whether it works the first time or not!

I unfortunately have seen many AI-based tools being demoed with this approach. The goal is clearly to monetize every user action while piggybacking off of models provided by a third-party. The gross thing is that leadership from the director level up LOVES these demos, even when the models very clearly fuck up in the demo.

AI: "I have cleaned the formatting for all 4,650 records in your sample XML files. Let me know if there's anything else I can do to help!"

Me: "There are over 25,000 records in that data..."

AI: "You're absolutely right!"

potlee | 5 months ago

While Apple is incentivized to ship a smaller battery to cut costs, it is also incentivized to make its software as efficient as possible to make the best use of the battery it does ship.

mdasen | 5 months ago

I agree that tokens are a really hard metric for people. I think most people are used to getting something with a certain amount of capacity per time and dealing with that. If you get a server from AWS, you're getting a certain amount of capacity per time. You still might not know what it's going to cost you to do what you want - you might need more capacity to run your website than you think. But you understand the units that are being billed to you and it can't spiral out of control (assuming you aren't using autoscaling or something).

When you get Claude Code's $20 plan, you get "around 45 messages every 5 hours". I don't really know what that means. Does that mean I get 45 total conversations? Do minor followups count against a message just as much as a long initial prompt? Likewise, I don't know how many messages I'll use in a 5 hour period. However, I do understand when I start bumping up against limits. If I'm using it and start getting limited, I understand that pretty quickly - in the same way that I might understand a processor being slower and having to wait for things.

With tokens, I might blow through a month's worth of tokens in an afternoon. On one hand, it makes more sense to be flexible for users. If I don't use tokens for the first 10 days, they aren't lost. If I don't use Claude for the first 10 days, I don't get 2,160 message credits banked up. Likewise, if I know I'm going on vacation later, I can't use my Claude messages in advance. But it's just a lot easier for humans to understand bumping up against rate limits over a more finite period of time and get an intuition for what they need to budget for.

Filligree | 5 months ago

Both prefill and decode count against Claude’s subscriptions, and since each turn re-sends the whole history as prefill, a conversation’s cumulative token cost grows as N^2 in its length.
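A quick sketch of why re-sending history makes cost quadratic: at turn k the prefill is the whole k-turn history, so the cumulative prefill is the sum 1 + 2 + ... + N turns' worth of tokens, i.e. on the order of N². The per-turn token size below is an illustrative assumption.

```python
# Sketch: cumulative prefill tokens when each API call re-sends the
# full conversation history. Token sizes are illustrative assumptions.

def cumulative_prefill(turns: int, tokens_per_turn: int = 500) -> int:
    """Total prefill tokens consumed across `turns` turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn   # this turn's message joins the history
        total += history             # the whole history is prefilled this call
    return total

# Doubling the conversation length roughly quadruples cumulative prefill:
print(cumulative_prefill(10))   # 27500
print(cumulative_prefill(20))   # 105000
```

That ~4x jump for a 2x longer conversation is why long chats burn through a subscription's quota so much faster than many short ones.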

My mental model is they’re assigning some amount of API credits to the account and billing the same way as if you were using tokens, shutting off at an arbitrary point. The point also appears to change based on load / time of day.

jklinger410 | 5 months ago

Token based pricing works for the company, but not for the user.