yelite | 2 years ago | on: What's that touchscreen in my room?
yelite's comments
yelite | 2 years ago | on: Efficient Memory Management for Large Language Model Serving with PagedAttention
It does worsen the performance of the attention kernel compared with kernels that take keys and values in a contiguous memory layout.
> Wouldn't the speed improvements be coming from that instead? Don't put an expected short input and output in the same batch as a big input and big output?
Actually, it puts everything in the same batch. The reason for its high throughput is that a sequence is removed from the batch as soon as it finishes, and new sequences can be added on the fly whenever there is enough space in the KV cache. This is called continuous batching (https://www.anyscale.com/blog/continuous-batching-llm-infere...).
Paged attention and the "virtualized" KV cache play an important role in an efficient implementation of continuous batching. Text generation with an LLM is a dynamic process, and it's not possible to predict how long the output will be when scheduling incoming requests. Therefore a dynamic approach to KV cache allocation is needed, even though it hurts the performance of attention.
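To make the scheduling idea concrete, here is a toy simulation of continuous batching on top of a block-based KV cache. It is only a sketch under my own assumptions, not vLLM's actual scheduler: `BLOCK_SIZE`, `Sequence`, `blocks_for` and `simulate` are made-up names, each step generates exactly one token per running sequence, and preemption/swapping is left out (the sketch just raises if the cache runs dry).

```python
from collections import deque
from dataclasses import dataclass

BLOCK_SIZE = 16  # tokens per KV cache block (assumed value for illustration)


@dataclass
class Sequence:
    name: str
    prompt_len: int
    output_len: int  # a real scheduler cannot know this; used only to simulate finishing
    generated: int = 0


def blocks_for(tokens: int) -> int:
    """Number of fixed-size blocks needed to hold `tokens` tokens (ceiling division)."""
    return -(-tokens // BLOCK_SIZE)


def simulate(requests, total_blocks):
    """Return {sequence name: decoding step at which it finished}."""
    waiting = deque(requests)
    running = []
    held = {}  # sequence name -> number of blocks currently held
    free_blocks = total_blocks
    finished_at = {}
    step = 0
    while waiting or running:
        # Admit new sequences on the fly while the cache has room for
        # their prompt plus the first generated token.
        while waiting and blocks_for(waiting[0].prompt_len + 1) <= free_blocks:
            seq = waiting.popleft()
            held[seq.name] = blocks_for(seq.prompt_len + 1)
            free_blocks -= held[seq.name]
            running.append(seq)
        if not running:
            raise RuntimeError("head-of-queue request can never fit in the KV cache")

        step += 1
        still_running = []
        for seq in running:
            # Grow the allocation one block at a time, only when the next
            # token would not fit in the blocks already held.
            grow = blocks_for(seq.prompt_len + seq.generated + 1) - held[seq.name]
            if grow > free_blocks:
                raise RuntimeError("out of KV blocks; a real system would preempt here")
            held[seq.name] += grow
            free_blocks -= grow

            seq.generated += 1
            if seq.generated >= seq.output_len:
                # Finished: leave the batch immediately and free the blocks.
                free_blocks += held.pop(seq.name)
                finished_at[seq.name] = step
            else:
                still_running.append(seq)
        running = still_running
    return finished_at


if __name__ == "__main__":
    reqs = [Sequence("A", 16, 2), Sequence("B", 16, 40), Sequence("C", 16, 2)]
    print(simulate(reqs, total_blocks=4))
```

With 4 blocks, A and B start together and C has to wait. C is admitted as soon as A finishes and frees its blocks, instead of waiting for the long-running B as it would with a static batch.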
yelite | 2 years ago | on: Show HN: Willow – Open-source privacy-focused voice assistant hardware
yelite | 5 years ago | on: “Location-Based Pay” – Who are we to complain?
Now, why do companies still stick to location-based pay when many other companies are embracing remote work? I think that's just cultural inertia, and eventually software engineers will be paid without taking their location into account. But that's not a good thing for everyone, because the salary at that point will probably be much lower than what people get paid in the SF area today.
yelite | 6 years ago | on: Ways to reduce the costs of an HTTP(S) API on AWS
Although I have never run a business, I do believe this kind of optimization is quite meaningful, even though it will never be a business's top priority.
Those optimizations lower operational cost while being mostly maintenance-free (except the switch away from AWS Certificate Manager, which may add some effort when renewing certificates), risk-free (unlike refactoring a large legacy system), and cheap in engineering effort (maybe 10 engineering days from investigation to writing the blog post?).
In addition, the blog post itself brings intangible benefits for their branding, website ranking, and hiring.