nserrino
|
6 months ago
|
on: Speeding up PyTorch inference on Apple devices with AI-generated Metal kernels
Hey, thanks for the thoughtful comments. A lot of big claims have been made in this area so skepticism is the right default reaction. tl;dr: agree that we should provide the kernels and benchmark suite so this can be evaluated by others, will follow up with that.
A few clarifications:
1. Baselines - We didn't compare to torch.compile because as of PyTorch 2.7, torch.compile doesn't support the MPS backend, and we ran into some issues on many of the problems when using it. GitHub issue: https://github.com/pytorch/pytorch/issues/150121. Once it's supported, it will be the obvious baseline.
2. Methodology - We followed KernelBench’s protocol to establish a baseline on Metal, adding more correctness checks. Warmup and synchronization were done. We recognize the limitations here and are expanding the validation suite.
3. Optimizations - Right now most of the optimizations are fusions, but there is some use of Metal-specific primitives/optimizations. We expect as we make the supervisor more sophisticated, the novelty of the optimized kernels will also increase.
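To make point 2 concrete, here is a minimal sketch of the kind of timing harness the methodology describes: warm up first, then synchronize around the timed region so queued GPU work is actually counted. The function names and structure here are illustrative assumptions, not the project's actual benchmark code; on Apple GPUs you would pass `torch.mps.synchronize` as the `sync` argument.

```python
import time

def bench(fn, arg, sync=lambda: None, warmup=10, iters=50):
    """Time fn(arg), excluding warmup; sync drains async GPU work.

    Illustrative sketch only. For PyTorch on Metal, pass
    sync=torch.mps.synchronize so timing waits for the GPU.
    """
    for _ in range(warmup):
        fn(arg)            # warmup: trigger compilation and caches
    sync()                 # drain any queued work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn(arg)
    sync()                 # wait for all timed work to actually finish
    return (time.perf_counter() - start) / iters
```

Without the synchronize calls, asynchronous backends like MPS can report misleadingly small times because the kernel launches return before the work completes.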
Overall the goal here is to get some % of the benefit of a human expert in kernel engineering, without developer effort. Compiler-based optimizations are great, but hand-tuned implementations are still common for performance-critical models. The hope is that we can automate some of that process.
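To illustrate the fusion mentioned in point 3: the win comes from collapsing several elementwise passes over memory into one. This toy pure-Python sketch (an assumption for illustration, not one of the project's Metal kernels) shows the shape of the transformation; the fused version makes a single pass and allocates no temporaries.

```python
def scale_add_relu_unfused(xs, a, b):
    # Three separate passes over the data, two temporary buffers --
    # analogous to launching three elementwise GPU kernels.
    t1 = [x * a for x in xs]
    t2 = [t + b for t in t1]
    return [max(t, 0.0) for t in t2]

def scale_add_relu_fused(xs, a, b):
    # One pass, no temporaries -- the shape a fused Metal kernel takes,
    # cutting memory traffic and kernel-launch overhead.
    return [max(x * a + b, 0.0) for x in xs]
```

On a GPU the unfused form pays for three kernel launches and three round trips through memory; fusion removes both costs, which is why it dominates the early optimizations.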
nserrino
|
6 months ago
|
on: Speeding up PyTorch inference on Apple devices with AI-generated Metal kernels
PyTorch is the baseline because that's what people prototype in, and it's the most common reference point. The aim here is to show that you can start from prototype code and automatically produce lower-level kernels (in this case, Metal) that are more usable in real deployments, without additional work from the developer. Frontier models are capable of generating efficient Metal kernels automatically and immediately, and they will only get better. We expect to see significant improvements as we refine the approach, but it's enough to show this seems to be a tractable problem for AI.
nserrino
|
11 months ago
|
on: They Might Be Giants Flood EPK Promo (1990) [video]
They are just incredible live ... Flood was the soundtrack of my childhood. It's great to see how many fans on HN they have. If you have a chance to check them out in concert, do it!
nserrino
|
1 year ago
|
on: We were wrong about GPUs
What kind of models are you deploying and what type of problems are you having with deploying them?
nserrino
|
1 year ago
|
on: Alphabet to invest another $5B into Waymo
I've mostly switched over to Waymo, but had to take Ubers twice in the last month:
* The first one, the driver made multiple racist remarks about different groups he observed as we drove.
* The second one, the driver talked at length about UFOs and how they are real, for the entire 50 minute drive.
Most drivers are totally normal and don't do things like that, but the tail end of negative experiences can be quite bad. Dirty cars, loud radios, body odor, and unsafe driving are all relatively common with human drivers. A Lyft driver I was riding with a few years back almost ran over a man in a wheelchair who had the right of way.
Wait times are also more reliable so far with Waymo. It's not uncommon for an Uber/Lyft driver to accept a ride but then not drive toward you for 5+ minutes. Waymo has the advantage of predictability - both in terms of arrival time and overall travel time (whereas there is variance among human drivers).
Sometimes I've had Waymos get stuck, but it usually resolves within 10-15 seconds.
Given how smooth, predictable, and safe Waymos are, I don't see a strong reason to risk a negative experience with a human driver (beyond ideological reasons). That said, I hope another strong provider enters the market soon to give them some competition.
nserrino
|
3 years ago
|
on: Programming Burnout
The stories we often see about burnout in tech are interesting. Much of the discussion is about burning out from programming itself, or from the hours. Of course, long hours will eventually burn you out in almost any profession. Less commonly discussed is that most people I know actually burn out from bureaucracy. It's far more exhausting and demotivating to make no forward progress and waste time in meetings all day. That said, some meetings are necessary.
nserrino
|
4 years ago
|
on: The meteoric rise and ongoing demise of Blue Apron (2019)
For me, the issue was that the recipes took too long. The huge amounts of packaging didn’t help, either. It was way too much work overall.
nserrino
|
4 years ago
|
on: Ask HN: Good examples of Kubernetes operators codebase?
nserrino
|
4 years ago
|
on: Attach VSCode Debugger to Native Rust in an Electron App
Thanks for the guide. I don't love using Java, but I recall that Java IDEs from 2010 provided a smoother experience for this functionality than VSCode seems to provide for non-Java apps today. I wonder how long it will take to reach parity there, or whether there is something fundamentally harder about the problem for non-JVM languages.
nserrino
|
4 years ago
|
on: “A calorie is a calorie” violates the second law of thermodynamics (2004)
I guess this is like saying "A Joule is a Joule" regarding different types of energy (gravitational potential, nuclear, etc). It may be true in an abstract sense, but the conversion efficiency makes a large impact in a practical sense.
nserrino
|
4 years ago
|
on: Open sourcing Pixie under Apache 2.0 license
Correct, this project is unaffiliated with that one.
nserrino
|
4 years ago
|
on: Open sourcing Pixie under Apache 2.0 license
Pixie maintainer / OP here. There's a lot of debate right now about licensing in the open source community. We made the call to go with Apache 2.0, and the lead of our project wrote in the post about why we made that decision.
nserrino
|
5 years ago
|
on: New Relic to open-source Pixie’s eBPF observability platform
/s noted (but point also taken). For those unfamiliar: APM = application performance monitoring. Essentially, figuring out why your application is slow or broken.
nserrino
|
5 years ago
|
on: New Relic to open-source Pixie’s eBPF observability platform
Engineer from Pixie here. Pixie is an APM tool for Kubernetes leveraging eBPF for automatic data collection. You can use it to monitor and debug your application performance, without code or configuration changes.
nserrino
|
5 years ago
|
on: Sherwin-Williams Fires Employee Over Their Popular Paint Themed TikTok Account
This is unfortunate for him but the publicity may actually be worth it in the long run. Hopefully Lowe’s, Benjamin Moore, or Home Depot are smart enough to scoop him up.
nserrino
|
5 years ago
|
on: Snowflake more than doubles in market debut, largest ever software IPO
Have you used both? I used Snowflake at a former company. Snowflake has top-class support for semi-structured data, for example. I found it really easy to use and highly flexible and full-featured compared to Redshift.
nserrino
|
5 years ago
|
on: Being OK with not being extraordinary
A fixation on being extraordinary tends to indicate too much self-absorption and a lack of perspective. Another antidote seems to be focusing on the impact you want to have on the world and those around you, even if no one ever knows about it.
nserrino
|
5 years ago
|
on: Launch HN: Speedscale (YC S20) – Automatically create tests from actual traffic
This is cool, is the traffic curated in any way? Like if the database isn't initialized, do you start with create requests before moving on to GETs for those IDs? Also does this only support HTTP or does it support other protocols as well?
nserrino
|
5 years ago
|
on: Etcd, or, why modern software makes me sad
I have not found etcd to be 'an absolute pleasure to work with', to put it lightly. It has been a plague of stability issues, and it sometimes seems better to roll one's own than to continue tracking down issue after issue in etcd (yes, I realize DIY has its own set of probably bigger problems :) ). I don't know whether this experience is due to the changes to the original etcd implementation that the author describes.
nserrino
|
5 years ago
|
on: Ok Google – it's time you discovered cyclists
> My "solution" to this problem is to carefully but fully occupy the bike lane before making such a turn.
In California, that's not only legal, it's the law. It's on the driver's test. It's kind of counterintuitive at first, but it seems much safer and more consistent with the way turns are handled in any normal lane.