tmostak's comments

tmostak | 1 month ago | on: Waymo robotaxi hits a child near an elementary school in Santa Monica

Evidence of this? I own a Tesla (HW4, latest FSD) and have taken many Waymo rides, and I've found both to react well to unpredictable situations (e.g. a car unexpectedly turning in front of you), far more quickly than I would expect most human drivers to react.

This certainly may have been true of older Teslas with HW3 and older FSD builds (I had one, and yes you couldn't trust it).

tmostak | 4 months ago | on: Prefix sum: 20 GB/s (2.6x baseline)

Even without NVLink C2C, on a GPU with 16x PCIe 5.0 lanes to the host you have 128 GB/s of theoretical (and 100+ GB/s of practical) bidirectional bandwidth, i.e. half that in each direction, so you still come out ahead with pipelining.

Of course, prefix sums are often used within a series of other operators, so if those inputs are already computed on the GPU, you come out further ahead still.
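A back-of-envelope sketch of the pipelining argument above (the bandwidth figures are the assumed numbers from the comment, not measurements):

```python
# With a 3-stage pipeline (host->device transfer, scan, device->host
# transfer), the in and out copies each use one PCIe direction, so they
# overlap with each other and with compute. Steady-state throughput is
# then just the slowest stage.

def pipelined_throughput_gbs(pcie_dir_gbs: float, scan_gbs: float) -> float:
    """Steady-state GB/s of a fully overlapped transfer/scan/transfer pipeline."""
    return min(pcie_dir_gbs, scan_gbs)

# A GPU scanning at, say, 500 GB/s behind a ~50 GB/s-per-direction PCIe 5.0
# x16 link is PCIe-bound, but ~50 GB/s still exceeds the 20 GB/s CPU
# result in the post's title.
print(pipelined_throughput_gbs(50.0, 500.0))
```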

tmostak | 9 months ago | on: Modern Minimal Perfect Hashing: A Survey

We've made extensive use of perfect hashing in HeavyDB (formerly MapD/OmniSciDB), and it has definitely been a core part of achieving strong group by and join performance.

You can use perfect hashes not only for the usual suspects of contiguous integer and dictionary-encoded string ranges, but also for use cases like binned numeric and date ranges (epoch seconds binned per year need a perfect hash range of only one bin per year to cover a very wide range of timestamps), and you can even handle arbitrary expressions if you propagate the ranges correctly.

Obviously you need a good "baseline" hash path to fall back to, but it's surprising how many real-world use cases you can profitably cover with perfect hashing.
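To make the year-binning example concrete, here is a hypothetical sketch (my own illustration, not HeavyDB's actual code) of a perfect hash over epoch-second timestamps binned per year:

```python
import datetime

# Binning epoch seconds by year makes (year - min_year) a perfect hash:
# every timestamp in [MIN_YEAR, MAX_YEAR] maps to a unique slot, so the
# group-by table needs no collision handling at all. Out-of-range values
# would go to the baseline hash path instead.
MIN_YEAR, MAX_YEAR = 1970, 2069  # assumed range: 100 slots total

def year_bin_perfect_hash(epoch_seconds: int) -> int:
    year = datetime.datetime.fromtimestamp(
        epoch_seconds, tz=datetime.timezone.utc).year
    if not (MIN_YEAR <= year <= MAX_YEAR):
        raise ValueError("out of range: fall back to a baseline hash table")
    return year - MIN_YEAR  # slot index, collision-free by construction

print(year_bin_perfect_hash(0))              # 1970-01-01 -> slot 0
print(year_bin_perfect_hash(1_700_000_000))  # a 2023 timestamp -> slot 53
```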

tmostak | 1 year ago | on: Show HN: TabPFN v2 – A SOTA foundation model for small tabular data

This looks amazing!

Just looking through the code a bit, it seems that the model supports a (custom) attention mechanism both between features and between rows (the code uses the term "items")? If so, does the attention between rows significantly improve accuracy?

Generally, for standard regression and classification use cases, rows (observations) are assumed to be independent, but I'm guessing cross-row attention might help the model see the gestalt of the data in some way that improves accuracy even when the independence assumption holds?
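A toy sketch of the two-axis attention idea being asked about (my own minimal construction, not TabPFN's actual code): attend across features within each row, then across rows within each feature.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attend(x):
    """Plain (unparameterized) self-attention over axis -2 of x: (..., n, d)."""
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

rng = np.random.default_rng(0)
table = rng.normal(size=(8, 5, 16))  # 8 rows, 5 features, embedding dim 16

# Attention between features: each row mixes its own feature embeddings.
across_features = self_attend(table)
# Attention between rows: swap axes so rows become the attended axis,
# letting each cell see the same feature across all other observations.
across_rows = self_attend(table.swapaxes(0, 1)).swapaxes(0, 1)
print(across_features.shape, across_rows.shape)  # both (8, 5, 16)
```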

tmostak | 1 year ago | on: All You Need Is 4x 4090 GPUs to Train Your Own Model

You should be able to train/full fine-tune (i.e. full weight updates, not LoRA) a much larger model with 96GB of VRAM. I have generally been able to do a full fine-tune (which is equivalent to training a model from scratch) of 34B-parameter models at full bf16 on 8xA100 servers (640GB of VRAM) if I enable gradient checkpointing, meaning a 96GB VRAM box should be able to handle models of up to 5B parameters. Of course, if you use LoRA, you should be able to go much larger than this, depending on the rank.
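The sizing above can be sanity-checked with a rough per-parameter byte count for bf16 training with Adam (assumed accounting, and activations are mostly ignored thanks to gradient checkpointing):

```python
# Assumed memory per parameter for full bf16 fine-tuning with Adam:
#   bf16 weights: 2 B, bf16 grads: 2 B,
#   fp32 master weights: 4 B, fp32 Adam m and v states: 8 B
BYTES_PER_PARAM = 2 + 2 + 4 + 8  # = 16

def max_params_billion(vram_gb: float, headroom: float = 0.85) -> float:
    """Largest model (billions of params) that fits, reserving ~15% of
    VRAM for activations, buffers, and fragmentation."""
    return vram_gb * headroom / BYTES_PER_PARAM

print(f"{max_params_billion(640):.1f}B")  # 8xA100: ~34B, matching the comment
print(f"{max_params_billion(96):.1f}B")   # 96GB box: ~5B
```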

tmostak | 1 year ago | on: GPT-4.5 or GPT-5 being tested on LMSYS?

Are you measuring tokens/sec or words per second?

The difference matters: in my experience, Llama 3, by virtue of its giant vocabulary, generally tokenizes text with 20-25% fewer tokens than something like Mistral. So even if it's 18% slower in terms of tokens/second, it may, depending on the text content, actually output a given body of text faster.
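The arithmetic behind that claim, using the (assumed) percentages from the comment:

```python
# Time to emit a passage scales as tokens_needed / tokens_per_sec, so
# effective text speed of model A relative to model B is the ratio of
# their token rates divided by the ratio of tokens each needs.

def relative_text_speed(tok_rate_ratio: float, tokens_needed_ratio: float) -> float:
    """>1.0 means model A finishes the same passage faster than model B."""
    return tok_rate_ratio / tokens_needed_ratio

# 18% slower in tokens/sec (0.82x) but only ~0.78x as many tokens needed:
print(round(relative_text_speed(0.82, 0.78), 3))  # ~1.05x faster on text
```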

tmostak | 1 year ago | on: Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B

But it's likely to be much slower than what you'd get with a backend like llama.cpp on CPU (particularly if you're running on a Mac, but I think on Linux as well), and it doesn't support features like CPU offloading.

tmostak | 2 years ago | on: Ask HN: Who is hiring? (February 2024)

HEAVY.AI | SQL Analyst/Wrangler | Part-time or Full-time | Remote

HEAVY.AI builds a GPU-accelerated analytics platform that allows users to interactively query and visualize billions of records of data in milliseconds.

We’re looking for someone who really knows SQL. If you can decipher schemas, figure out what’s wrong with SQL statements and correct them, as well as generate queries in response to user questions, we'd love to talk to you.

The work would initially be on contract, but could lead to full-time employment. Geospatial analytics, data science background, and Python programming skills would be very useful to have as well, but are not absolute requirements.

If interested please reach out to pey.silvester@heavy[dot]ai.

tmostak | 2 years ago | on: Fine-tuning GPT-3.5-turbo for natural language to SQL

It wasn't clear to me which evaluation method was being used. The chart in the blog says Execution Accuracy, but the numbers appear to correlate with Spider's "Exact Set Match" metric (comparing the SQL itself) rather than "Execution With Values" (comparing result-set values). For example, DIN-SQL + GPT-4 achieves an 85.3% "Execution With Values" score. Is that what is being used here?

See the following for more info:

https://yale-lily.github.io/spider
https://github.com/taoyds/spider/tree/master/evaluation_exam...
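A minimal illustration (hypothetical mini-schema, and a naive string comparison as a stand-in for Spider's component-wise Exact Set Match) of why the two metrics can diverge:

```python
import sqlite3

# Two queries that are textually different but semantically identical:
# an SQL-comparison metric flags a mismatch, while an execution-based
# metric (comparing result sets) counts the prediction as correct.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(name TEXT, age INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 30), ("b", 40)])

gold = "SELECT name FROM t WHERE age > 35"
pred = "SELECT name FROM t WHERE NOT age <= 35"

sql_match = gold == pred  # naive textual proxy for Exact Set Match
execution_match = (conn.execute(gold).fetchall()
                   == conn.execute(pred).fetchall())
print(sql_match, execution_match)  # False True
```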

tmostak | 2 years ago | on: Fine-tuning GPT-3.5-turbo for natural language to SQL

I agree that Spider queries are not necessarily representative of the SQL you might see in the wild from real users, but from some analysis I did of the dataset, around 43% of the queries had joins, and a number had 3-, 4-, or 5-way joins.

tmostak | 4 years ago | on: Getting to the bottom of web map performance

You could try OmniSci: it's a database, rendering engine, and interactive analytics frontend (or any combination of the above), and it can easily query and render millions to tens of billions of points interactively while allowing for things like tooltips on the data. See omnisci.com/demos for some live examples.

tmostak | 5 years ago | on: OmniSci launches free edition of platform for interactive visual analytics

OmniSci can run on any Nvidia GPU with sufficient RAM (we'd generally recommend >= 8GB), including a 3080. (I have two 3090s myself!) It can also run purely (and performantly) on CPU, and with Intel's help we're further optimizing our capabilities on x86. Note, however, that you currently can't use our rendering engine without a GPU, though there is some initial support to run on CPU and render on an Nvidia GPU if you're interested, and soon enough we hope to support AMD and Intel integrated/discrete GPUs for rendering as well.