yarri | 4 months ago | on: Helion: A high-level DSL for performant and portable ML kernels
yarri's comments
yarri | 7 months ago | on: Steve Wozniak: Life to me was never about accomplishment, but about happiness
I saw Woz on Northbound 280 “driving” his cherry red Model S, using FSD. He was looking down at the screen the whole time I watched him. Swear he had ssh’d into it.
yarri | 9 months ago | on: Cloud Run GPUs, now GA, makes running AI workloads easier for everyone
[0] https://cloud.google.com/billing/docs/how-to/disable-billing...
yarri | 10 months ago | on: AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
yarri | 10 months ago | on: AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
yarri | 1 year ago | on: Tokyo released point cloud data of the entire city for free
https://github.com/tokyo-digitaltwin/roadmap_v1.0/blob/main/...
yarri | 1 year ago | on: YC is wrong about LLMs for chip design
Would appreciate the collective energy being spent instead towards adding to and/or refining Garry’s request.
yarri | 1 year ago | on: Sohu – first specialized chip (ASIC) for transformer models
yarri | 1 year ago | on: Sohu – first specialized chip (ASIC) for transformer models
## How can we fit so much more compute on the silicon?
The NVIDIA H200 has 989 TFLOPS of FP16/BF16 compute without sparsity. This is state-of-the-art (more than even Google’s new Trillium chip), and the GB200 launching in 2025 has only 25% more compute (1,250 TFLOPS per die).
Since the vast majority of a GPU’s area is devoted to programmability, specializing on transformers lets you fit far more compute. You can prove this to yourself from first principles:
It takes 10,000 transistors to build a single FP16/BF16/FP8 multiply-add circuit, the building block for all matrix math. The H100 SXM has 528 tensor cores, and each has $4 \times 8 \times 16$ FMA circuits. Multiplying tells us the H100 has 2.7 billion transistors dedicated to tensor cores.
*But an H100 has 80 billion transistors! This means only 3.3% of the transistors on an H100 GPU are used for matrix multiplication!*
This is a deliberate design decision by NVIDIA and other flexible AI chips. If you want to support all kinds of models (CNNs, LSTMs, SSMs, and others), you can’t do much better than this.
By only running transformers, we can fit way more FLOPS on our chip, without resorting to lower precisions or sparsity.
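The first-principles estimate quoted above is easy to sanity-check. Here is a minimal Python sketch that just multiplies out the quoted figures. Every constant is taken from the text as-is (10,000 transistors per FMA circuit, 528 tensor cores, 4 × 8 × 16 FMA circuits per core, 80 billion transistors total). The constant and function names are my own, and none of these numbers are independently verified.

```python
# Back-of-the-envelope check of the tensor-core transistor estimate.
# All inputs are the figures quoted in the text above, not measured values.

TRANSISTORS_PER_FMA = 10_000          # quoted cost of one multiply-add circuit
TENSOR_CORES = 528                    # quoted H100 SXM tensor core count
FMAS_PER_TENSOR_CORE = 4 * 8 * 16     # quoted FMA circuits per tensor core
TOTAL_TRANSISTORS = 80_000_000_000    # quoted H100 transistor count


def tensor_core_transistors() -> int:
    """Transistors spent on FMA circuits, under the quoted assumptions."""
    return TENSOR_CORES * FMAS_PER_TENSOR_CORE * TRANSISTORS_PER_FMA


def tensor_core_share() -> float:
    """Fraction of all transistors that sit in FMA circuits."""
    return tensor_core_transistors() / TOTAL_TRANSISTORS


if __name__ == "__main__":
    print(f"FMA transistors: {tensor_core_transistors():,}")
    print(f"Share of die:    {tensor_core_share():.1%}")
```

With these inputs the product comes out at roughly 2.7 billion transistors and a share of about 3.4% of the die, close to the 3.3% quoted. The result is only as good as the 10,000-transistors-per-FMA assumption, which the text states without a source.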
## Isn’t memory bandwidth the bottleneck on inference?
For modern models like Llama-3, no!
yarri | 3 years ago | on: F-15s Scrambled from Portland Air National Guard Base
yarri | 3 years ago | on: Ask HN: What are your predictions for 2023?
yarri | 3 years ago | on: Ask HN: Anyone else feel trapped in FANG? How did you get out?
yarri | 3 years ago | on: Apple's director of machine learning resigns due to return to office work
yarri | 5 years ago | on: Welcome Yari: MDN Web Docs has a new platform
yarri | 5 years ago | on: Welcome Yari: MDN Web Docs has a new platform
yarri | 9 years ago | on: How not to create traffic jams: Don’t let people park for free
Would the inverse of these punitive solutions, i.e., encouraging carpooling / ridesharing, not also work? It always amazes me how relatively underutilized the HOV lanes are.
yarri | 9 years ago | on: A right to repair: Why Nebraska farmers are taking on John Deere and Apple
yarri | 9 years ago | on: Japanese village creates field-sized 3D paintings made of coloured rice shoots
yarri | 10 years ago | on: Show HN: SnapRides – Carpool scheduling with a panic button
> the problems with your flow are clearly posting calls to actions on the page
Sigh, agreed. I'd like to move away from a wizard-based flow and try either a) using a calendar to drag & drop the schedule, or b) creating a WYSIWYG "signup page builder" flow.
> I’d just ask for where is the ultimate destination, maybe a date and a list of emails.
So try to get quicker to the step where a signup page is shared with users, right?
> If I have time, I’ll play around with this on my phone later.
Thanks!
yarri | 10 years ago | on: Show HN: SnapRides – Carpool scheduling with a panic button
Ironically, she works at Redpoint VC, but had trouble attracting developers to her idea. I built the MVP, conducting UserTesting.com focus groups to gather feedback from a busy, educated “soccer mom” demographic, and iterated on a “wizard based” signup flow. Users reported it being better than the email / spreadsheet / shared calendar approach they use now, with some interest in paying for the service on a subscription basis. The key request was a native mobile app version, and, confusingly, a Craigslist-like rideshare social matching service.
MVP is built on Parse.com / Backbone / Bootstrap, waiting for the right time to build out an iOS app; likely the start of a new school year.
Obviously the growth opportunities are in service-based, market-making transportation providers like Uber, BlaBlaCar/Carpooling.com, whereas this is more of a scheduling app with a long history of similar but failed approaches (as a quick search of ‘carpool’ on HN will show!)
Still, I like the feeling that I'm helping make other people's lives easier.
Feedback & ideas for good mobile app design most welcome!