top | item 46856818

(no title)

yz-yu | 28 days ago

Hi jbarrow, thanks for your feedback and the links you shared—they're great readings for me (and likely others too).

That said, I need to clarify: the content was not written by AI, and certainly not generated from a database in one shot. If there's some agent + prompt that can produce what I wrote, I'd love to learn it—it would've saved me two weekends :)

Before addressing your questions further, some context: I'm a developer with no ML background but plenty of Cloud Infra experience. I'm currently building an open-source AI Infra project, which is why I studied nano-vllm. So my writing reflects some gaps in ML knowledge.

To your specific points:

> it goes into (nano)vLLM internals and doesn't mention PagedAttention once

I didn't find any explicit "paged attention" naming in nano-vllm. After reading the first article you linked—specifically the "Paged KV Caching" section—I believe the block management logic and CPU/GPU block mapping it describes is exactly what I covered in both posts. It may not be the full picture of paged attention, but I interpreted what I saw in the code and captured the core idea. I think that's a reasonable outcome.

> Part 2 will cover dense vs MoE's, which is weird because nanovllm hardcodes a dense Qwen3 into the source

This reflects my learning approach and background. Same as point 1—I may not have realized the block design was the famous PagedAttention implementation, so I didn't name it as such. For point 2, seeing a dense Qwen3 naturally made me wonder how it differs from the xx-B-A-yy-B MoE models I'd seen on Hugging Face—specifically what changes in the decoder layers. That curiosity led me to learn about MoE and write it up for others with the same questions.

---

I completely understand that in this era, people care more about whether what they're reading is AI-generated—no one wants to waste time on low-effort slop with no human involvement.

But as I explained above—and as my hand-drawn Excalidraw diagrams show (I haven't seen an LLM produce diagrams with logic that satisfies me)—this is the result of learning shaped by my own knowledge background and preferences.

discuss

jacquesm|28 days ago

Funny, this reads even more AI written than the article itself.

marcellus23|28 days ago

It really doesn't.

lambda|28 days ago

One thing to keep in mind is that a lot of non-native English speakers use LLMs to translate to English, or to polish their English prose; they may not realize that it causes the translation to come out in a very LLM-style tone. Not sure if that's the case here, but it looks like OP is a native Chinese speaker so may be using tools to translate to English.

CodeMage|28 days ago

It does, but what does that say about the state of communication in our industry? I've seen a lot of writing that reads like an AI produced it in contexts where I could be pretty sure no AI was involved. We want to sound professional, so we sanitize how we write so much that it becomes... whatever this current situation is.

No offense intended to @yz-yu, by the way. I miss the times when more people wrote in an eccentric style -- like Steve Yegge -- but that doesn't detract from what you wrote.

yz-yu|28 days ago

Cool, humans hallucinate too. — AI

Juvination|28 days ago

The em dashes really aren't helping their case.

Der_Einzige|28 days ago

[deleted]