yarri's comments | WingNews

yarri | 4 months ago | on: Helion: A high-level DSL for performant and portable ML kernels

Pallas is the Triton equivalent in JAX land. There are some old auto tuning prototypes if you search for Pallas, like this https://github.com/jax-ml/jax-triton/pull/108

yarri | 7 months ago | on: Steve Wozniak: Life to me was never about accomplishment, but about happiness

Many of us grew up in the PLD era, k-maps, etc. Woz pushed early HW to the limit, with SW APIs that delivered real value. Woz made astute design trade-offs based on full stack knowledge that his peers lacked. The world’s moved on to the GPU (low precision, accelerated parallel compute?) era, but the Woz viewpoint still holds. You can see it in the AI kernel optimizations, or rematerialization methods that push GPU HW to new limits, where trade-offs need to be made. GPU HW for 4-bit QAT or even 2-bit will dramatically affect the SW (AI) of this era. What trade-offs do you make?

I saw Woz on Northbound 280 “driving” his cherry red Model S, using FSD. He was looking down at the screen the whole time I watched him. Swear he had ssh’d into it.

yarri | 9 months ago | on: Cloud Run GPUs, now GA, makes running AI workloads easier for everyone

[edit - Gabe responded]. See this Cloud Run spending cap recommendation [0] to disable billing, which potentially irreversibly deletes resources but does cap spend!

[0] https://cloud.google.com/billing/docs/how-to/disable-billing...

yarri | 10 months ago | on: AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

Not sure what “official” means, but I would direct you to the GCP MaxText [0] framework. It is not what this GDM paper is referring to, but the repo contains various attention implementations in MaxText/layers/attentions.py

[0] https://github.com/AI-Hypercomputer/maxtext

yarri | 10 months ago | on: AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

I assume the Gemini results are JAX/PAX-ML/Pallas improvements for TPUs so would look there for recent PRs

yarri | 1 year ago | on: Tokyo released point cloud data of the entire city for free

Background on the Tokyo government’s digital twin program, including sourcing and maintenance efforts

https://github.com/tokyo-digitaltwin/roadmap_v1.0/blob/main/...

yarri | 1 year ago | on: YC is wrong about LLMs for chip design

Please don’t do this, Zach. We need to encourage more investment in the overall EDA market not less. Garry’s pitch is meant for the dreamers, we should all be supportive. It’s a big boat.

Would appreciate the collective energy being spent instead towards adding to and/or refining Garry’s request.

yarri | 1 year ago | on: Sohu – first specialized chip (ASIC) for transformer models

This is a datacenter chip. HVAC requirements are more interesting IMO; they seem to be targeting air-cooled edge deployments with that card. They’ll probably wind up with a baseboard design similar to the early v4i TPUs.

https://ieeexplore.ieee.org/document/9499913

yarri | 1 year ago | on: Sohu – first specialized chip (ASIC) for transformer models

Details from their technical memo at https://www.etched.com/announcing-etched

## How can we fit so much more compute on the silicon?

The NVIDIA H200 has 989 TFLOPS of FP16/BF16 compute without sparsity. This is state-of-the-art (more than even Google’s new Trillium chip), and the GB200 launching in 2025 has only 25% more compute (1,250 TFLOPS per die).

Since the vast majority of a GPU’s area is devoted to programmability, specializing on transformers lets you fit far more compute. You can prove this to yourself from first principles:

It takes 10,000 transistors to build a single FP16/BF16/FP8 multiply-add circuit, the building block for all matrix math. The H100 SXM has 528 tensor cores, and each has $4 \times 8 \times 16$ FMA circuits. Multiplying tells us the H100 has 2.7 billion transistors dedicated to tensor cores.

*But an H100 has 80 billion transistors! This means only 3.3% of the transistors on an H100 GPU are used for matrix multiplication!*
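The memo's arithmetic can be checked in a few lines. This is only a sketch that takes the quoted figures at face value (10,000 transistors per FMA, 528 tensor cores, $4 \times 8 \times 16$ FMAs per core, 80 billion transistors total); the variable names are illustrative and not from the memo.

```python
# Figures as quoted in the memo; names are hypothetical, chosen for readability.
TRANSISTORS_PER_FMA = 10_000            # memo's estimate per multiply-add circuit
TENSOR_CORES = 528                      # H100 SXM, per the memo
FMAS_PER_TENSOR_CORE = 4 * 8 * 16       # 512 FMA circuits per tensor core
H100_TOTAL_TRANSISTORS = 80_000_000_000

# Transistors spent on tensor-core FMA circuits under these assumptions.
tensor_core_transistors = (
    TENSOR_CORES * FMAS_PER_TENSOR_CORE * TRANSISTORS_PER_FMA
)

# Fraction of the whole die that figure represents.
share = tensor_core_transistors / H100_TOTAL_TRANSISTORS

print(f"{tensor_core_transistors:,} transistors in tensor cores")
print(f"{share:.2%} of all transistors")
```

Under these inputs the product is about 2.7 billion transistors and the share comes out near 3.4%, which the memo states as 3.3%.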

This is a deliberate design decision by NVIDIA and other flexible AI chips. If you want to support all kinds of models (CNNs, LSTMs, SSMs, and others), you can’t do much better than this.

By only running transformers, we can fit way more FLOPS on our chip, without resorting to lower precisions or sparsity.

## Isn’t memory bandwidth the bottleneck on inference?

For modern models like Llama-3, no!

yarri | 3 years ago | on: F-15s Scrambled from Portland Air National Guard Base

Likely this one… https://en.wikipedia.org/wiki/Scrambling_(military)

yarri | 3 years ago | on: Ask HN: What are your predictions for 2023?

- Zuck will spin out FB & Instagram and merge with Twitter : TwiGramFace

yarri | 3 years ago | on: Ask HN: Anyone else feel trapped in FANG? How did you get out?

Take small bets, explain the value you are attempting to deliver and basically learn how to sell. Especially to skip levels. Be willing to fail.

yarri | 3 years ago | on: Apple's director of machine learning resigns due to return to office work

We all still do a lot of lab work

yarri | 5 years ago | on: Welcome Yari: MDN Web Docs has a new platform

The name is cool. That logo, not so much.

yarri | 5 years ago | on: Welcome Yari: MDN Web Docs has a new platform

The name choice is so close...

yarri | 9 years ago | on: How not to create traffic jams: Don’t let people park for free

The rise of punitive solutions is real. I was involved with discussions with local municipalities placing (private) local schools under restrictions for not providing sufficient carpool coverage -- levy fines based on percent of families carpooling.

Would the inverse of these punitive solutions, i.e., encouraging carpool / ridesharing, not also work? It always amazes me how relatively unutilized the HOV lanes are.

yarri | 9 years ago | on: A right to repair: Why Nebraska farmers are taking on John Deere and Apple

Depends on the industry. The auto insurance industry had facilitated this, but was then reprimanded for using generic parts to repair damaged cars. There was some irony in that US parts manufacturers claimed 3rd parties were importing "foreign" parts, but many of the US manufacturers also subcontracted overseas. Thorny issue.

yarri | 9 years ago | on: Japanese village creates field-sized 3D paintings made of coloured rice shoots

Civic engagement in Japanese agriculture was a challenge for rural communities when I was living in Japan. This village's activity seems similar to 4H clubs in the US.

yarri | 10 years ago | on: Show HN: SnapRides – Carpool scheduling with a panic button

Thanks, Kevin, for the time & the detailed feedback.

> the problems with your flow are clearly posting calls to actions on the page

Sigh, agreed. I'd like to move away from a wizard-based flow and try either a) using a calendar to drag & drop the schedule, or b) creating a WYSIWYG "signup page builder" flow.

>I’d just ask for where is the ultimate destination, maybe a date and a list of emails.

So try to get quicker to the step where a signup page is shared with users, right?

> If I have time, I’ll play around with this on my phone later.

Thanks!

yarri | 10 years ago | on: Show HN: SnapRides – Carpool scheduling with a panic button

Hi, a suburban mother of school aged kids asked me to help her solve the problem she has coordinating with other parents to arrange rides to shuttle their kids to school & sporting events. She wanted a simple scheduling service, with a panic button — a way for her to let others know if she was running late.

Ironically, she works at Redpoint VC, but had trouble attracting developers to her idea. I built the MVP, conducting UserTesting.com focus groups to gather feedback from a busy, educated “soccer mom” demographic, and iterated on a “wizard based” signup flow. Users reported it being better than the email / spreadsheet / shared calendar approach they use now, with some interest in paying for the service on a subscription basis. Key requests were a native mobile app version and, confusingly, a Craigslist-like rideshare social matching service.

MVP is built on Parse.com / Backbone / Bootstrap, waiting for the right time to build out an iOS app; likely the start of a new school year.

Obviously the growth opportunities are in service-based, market-making transportation providers like Uber, BlaBlaCar/Carpooling.com, whereas this is more of a scheduling app with a long history of similar but failed approaches (as a quick search of ‘carpool’ on HN will show!)

Still, I like the feeling that I'm helping make other people's lives easier.

Feedback & ideas for good mobile app design most welcome!