yarri's comments

yarri | 7 months ago | on: Steve Wozniak: Life to me was never about accomplishment, but about happiness

Many of us grew up in the PLD era: k-maps, etc. Woz pushed early HW to the limit, with SW APIs that delivered real value. Woz made astute design trade-offs based on full-stack knowledge that his peers lacked. The world’s moved on to the GPU (low-precision, accelerated parallel compute?) era, but the Woz viewpoint still holds. You can see it in AI kernel optimizations and in rematerialization methods that push GPU HW to its new limits, where trade-offs still have to be made. GPU HW for 4-bit QAT, or even 2-bit, will dramatically affect the SW (AI) of this era. What trade-offs do you make?
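For anyone who hasn't met QAT (quantization-aware training): the core idea is to round weights onto a low-bit integer grid during the forward pass, so the model learns to tolerate the rounding error that 4-bit or 2-bit hardware would impose. Below is a minimal sketch, assuming symmetric per-tensor scaling. `fake_quantize` is a hypothetical helper, not any framework's API, and real QAT setups typically use per-channel or per-group scales plus a straight-through estimator for gradients, neither of which is shown here. It makes one trade-off concrete: each bit you drop roughly doubles the grid step, and with it the worst-case rounding error.

```python
def fake_quantize(xs: list[float], bits: int = 4) -> list[float]:
    """Hypothetical helper: symmetric per-tensor fake quantization.

    Maps each value onto a signed `bits`-bit integer grid and immediately
    dequantizes it back to float, which is what a QAT forward pass does.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed grid")
    qmax = 2 ** (bits - 1) - 1       # e.g. 7 for 4-bit, 1 for 2-bit
    qmin = -(2 ** (bits - 1))        # e.g. -8 for 4-bit, -2 for 2-bit
    max_abs = max((abs(x) for x in xs), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in xs]     # all-zero tensor: nothing to scale
    scale = max_abs / qmax           # one float scale shared by the whole tensor
    out = []
    for x in xs:
        # Round to nearest (Python's round() breaks ties to even), then clamp.
        q = min(qmax, max(qmin, round(x / scale)))
        out.append(q * scale)        # dequantize back to float
    return out


if __name__ == "__main__":
    weights = [0.9, -0.35, 0.12, -0.8, 0.05, 0.6]  # made-up sample weights
    for b in (8, 4, 2):
        deq = fake_quantize(weights, b)
        err = max(abs(w - d) for w, d in zip(weights, deq))
        print(f"{b}-bit: max abs error {err:.4f}")
```

Running it prints the worst-case error for the same weights at 8, 4, and 2 bits. The 2-bit grid has only four levels, which is part of why low-bit schemes commonly use finer-grained (per-group) scales.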

I saw Woz on Northbound 280 “driving” his cherry red Model S, using FSD. He was looking down at the screen the whole time I watched him. Swear he had ssh’d into it.

yarri | 1 year ago | on: YC is wrong about LLMs for chip design

Please don’t do this, Zach. We need to encourage more investment in the overall EDA market, not less. Garry’s pitch is meant for the dreamers; we should all be supportive. It’s a big boat.

Would appreciate the collective energy being spent instead on adding to and/or refining Garry’s request.

yarri | 1 year ago | on: Sohu – first specialized chip (ASIC) for transformer models

Details from their technical memo at https://www.etched.com/announcing-etched

## How can we fit so much more compute on the silicon?

The NVIDIA H200 has 989 TFLOPS of FP16/BF16 compute without sparsity. This is state-of-the-art (more than even Google’s new Trillium chip), and the GB200 launching in 2025 has only 25% more compute (1,250 TFLOPS per die).

Since the vast majority of a GPU’s area is devoted to programmability, specializing on transformers lets you fit far more compute. You can prove this to yourself from first principles:

It takes 10,000 transistors to build a single FP16/BF16/FP8 multiply-add circuit, the building block for all matrix math. The H100 SXM has 528 tensor cores, and each has $4 \times 8 \times 16$ FMA circuits. Multiplying tells us the H100 has 2.7 billion transistors dedicated to tensor cores.

*But an H100 has 80 billion transistors! This means only 3.3% of the transistors on an H100 GPU are used for matrix multiplication!*
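The memo's arithmetic is easy to check. A short sketch using only the figures quoted above (the 10,000-transistors-per-FMA number is the memo's own estimate, not something verified here): 528 × 512 = 270,336 FMA circuits, or about 2.7 billion transistors. Dividing by 80 billion gives roughly 3.4%, so the memo's 3.3% looks truncated rather than rounded; the conclusion is the same either way.

```python
# Back-of-envelope check of the tensor-core transistor estimate quoted above.
# All inputs are the memo's own figures, not independently verified numbers.

TRANSISTORS_PER_FMA = 10_000   # memo's estimate for one FP16/BF16/FP8 multiply-add
TENSOR_CORES = 528             # H100 SXM tensor cores, per the memo
FMAS_PER_CORE = 4 * 8 * 16     # 4 x 8 x 16 FMA circuits per tensor core
TOTAL_TRANSISTORS = 80e9       # H100 transistor count, per the memo


def tensor_core_share() -> tuple[int, float]:
    """Return (transistors in tensor-core FMAs, fraction of the whole die)."""
    fma_transistors = TENSOR_CORES * FMAS_PER_CORE * TRANSISTORS_PER_FMA
    return fma_transistors, fma_transistors / TOTAL_TRANSISTORS


if __name__ == "__main__":
    used, share = tensor_core_share()
    # prints "2.70 billion transistors, 3.4% of the die"
    print(f"{used / 1e9:.2f} billion transistors, {share:.1%} of the die")
```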

This is a deliberate design decision by NVIDIA and other flexible AI chips. If you want to support all kinds of models (CNNs, LSTMs, SSMs, and others), you can’t do much better than this.

By only running transformers, we can fit way more FLOPS on our chip, without resorting to lower precisions or sparsity.

## Isn’t memory bandwidth the bottleneck on inference?

For modern models like Llama-3, no!

yarri | 9 years ago | on: How not to create traffic jams: Don’t let people park for free

The rise of punitive solutions is real. I was involved in discussions with local municipalities that placed (private) local schools under restrictions for not providing sufficient carpool coverage, levying fines based on the percentage of families carpooling.

Would the inverse of these punitive solutions, i.e., encouraging carpooling / ridesharing, not also work? It always amazes me how relatively underutilized the HOV lanes are.

yarri | 9 years ago | on: A right to repair: Why Nebraska farmers are taking on John Deere and Apple

Depends on the industry. The auto insurance industry had facilitated this, but was then reprimanded for using generic parts to repair damaged cars. There was some irony in that US parts manufacturers claimed 3rd parties were importing "foreign" parts, while many of the US manufacturers also subcontracted overseas. Thorny issue.

yarri | 10 years ago | on: Show HN: SnapRides – Carpool scheduling with a panic button

Thanks, Kevin, for the time & the detailed feedback.

> the problems with your flow are clearly posting calls to actions on the page

Sigh, agreed. I'd like to move away from a wizard-based flow and try either a) using a calendar to drag & drop the schedule, or b) creating a WYSIWYG "signup page builder" flow.

> I’d just ask for where is the ultimate destination, maybe a date and a list of emails.

So try to get quicker to the step where a signup page is shared with users, right?

> If I have time, I’ll play around with this on my phone later. Thanks!

yarri | 10 years ago | on: Show HN: SnapRides – Carpool scheduling with a panic button

Hi, a suburban mother of school-aged kids asked me to help with a problem she has: coordinating with other parents to arrange rides that shuttle their kids to school & sporting events. She wanted a simple scheduling service with a panic button, a way for her to let others know if she was running late.

Ironically, she works at Redpoint VC, but had trouble attracting developers to her idea. I built the MVP, conducted UserTesting.com focus groups to gather feedback from a busy, educated “soccer mom” demographic, and iterated on a “wizard-based” signup flow. Users reported it was better than the email / spreadsheet / shared calendar approach they use now, and some expressed interest in paying for the service on a subscription basis. The key requests were a native mobile app version and, confusingly, a Craigslist-like rideshare social matching service.

The MVP is built on Parse.com / Backbone / Bootstrap. I'm waiting for the right time to build out an iOS app, likely the start of a new school year.

Obviously the growth opportunities are in service-based, market-making transportation providers like Uber and BlaBlaCar/Carpooling.com, whereas this is more of a scheduling app, a category with a long history of similar but failed approaches (as a quick search of ‘carpool’ on HN will show!).

Still, I like the feeling that I'm helping make other people's lives easier.

Feedback & ideas for good mobile app design most welcome!
