hundredwatt | 8 months ago | on: Extending That XOR Trick to Billions of Rows
hundredwatt's comments
hundredwatt | 8 months ago | on: Extending That XOR Trick to Billions of Rows
1. Find a batch with 1 missing element
2. Delete that element from its other assigned partitions
3. Repeat, as the modified batches may now be recoverable
This iterative process (surprisingly!) succeeds with very high probability as long as the number of partitions is at least about 1.22x the number of missing elements, with k=3 hash functions.
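A minimal Python sketch of the peeling loop described above. The hashing scheme (`cell_indices`, built on BLAKE2b with one cell per region), the function names, and the use of plain integers as elements are all assumptions for illustration, not the article's implementation. Each partition stores a count and an XOR of the missing elements assigned to it.

```python
import hashlib

K = 3  # number of hash functions, as in the comment


def cell_indices(x, m):
    """Map element x to K distinct partitions out of m (hypothetical scheme).

    The table is split into K equal regions and one cell is picked per
    region, so the K indices for one element never collide.
    """
    region = m // K
    out = []
    for i in range(K):
        digest = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
        out.append(i * region + int.from_bytes(digest, "big") % region)
    return out


def build(missing, m):
    """Simulate the per-partition state left over by the missing elements."""
    counts = [0] * m
    xors = [0] * m
    for x in missing:
        for j in cell_indices(x, m):
            counts[j] += 1
            xors[j] ^= x
    return counts, xors


def peel(counts, xors):
    """Repeatedly recover elements from partitions holding exactly one."""
    m = len(counts)
    counts, xors = counts[:], xors[:]  # work on copies
    recovered = []
    queue = [j for j in range(m) if counts[j] == 1]
    while queue:
        j = queue.pop()
        if counts[j] != 1:
            continue  # partition changed since it was queued
        x = xors[j]  # step 1: a partition with one missing element
        recovered.append(x)
        for i in cell_indices(x, m):  # step 2: delete it everywhere
            counts[i] -= 1
            xors[i] ^= x
            if counts[i] == 1:  # step 3: this partition is now recoverable
                queue.append(i)
    complete = all(c == 0 for c in counts)
    return recovered, complete
```

`peel` returns the recovered elements plus a flag saying whether every partition was emptied. When the flag is false, peeling got stuck on a cycle and more partitions are needed.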
hundredwatt | 8 months ago | on: Extending That XOR Trick to Billions of Rows
The "bloom filter" name is misleading in regard to this.
hundredwatt | 9 months ago | on: That XOR Trick (2020)
For every normalized link id x:
  y = (x << k) | h(x)  # append a k-bit hash to the id
  acc ^= y
If acc is zero, all links are reciprocal (same guarantee as before). If acc is non-zero, split it back into (x', h'):
* Re-compute h(x').
* If it equals h', exactly one link is unpaired and x' tells you which one (or an astronomically unlikely collision). Otherwise there are >= 2 problems.
This has collision-resistance like the parent comment and adds the ability to pinpoint a single offending link without a second pass or a hash table.
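A small Python sketch of the scheme above, under stated assumptions: the hash `h` is truncated SHA-256, the hash width is 32 bits, and the names `tag` and `check` are hypothetical. Link ids are taken to be non-negative integers that are already normalized.

```python
import hashlib

K_BITS = 32  # width of the appended hash (assumed value)


def h(x):
    """k-bit hash of an id (assumption: truncated SHA-256)."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[: K_BITS // 8], "big")


def tag(x):
    """Append the k-bit hash to the id."""
    return (x << K_BITS) | h(x)


def check(link_ids):
    """XOR all tagged ids and classify the result."""
    acc = 0
    for x in link_ids:
        acc ^= tag(x)
    if acc == 0:
        return ("ok", None)
    # Split the accumulator back into (x', h').
    x_prime = acc >> K_BITS
    h_prime = acc & ((1 << K_BITS) - 1)
    if h(x_prime) == h_prime:
        return ("one_unpaired", x_prime)
    return ("multiple", None)
```

For example, `check([5, 9, 5, 9])` reports `"ok"`, while `check([5, 9, 5])` reports a single unpaired link and names id 9. With two or more unpaired links, the recomputed hash almost never matches, so the result falls through to `"multiple"`.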
hundredwatt | 11 months ago | on: A faster way to copy SQLite databases between computers
Nice tricks in the article, but you can more easily use the builtin utility now :)
I blogged about how it works in detail here: https://nochlin.com/blog/how-the-new-sqlite3_rsync-utility-w...
hundredwatt | 1 year ago | on: Ask HN: What are you working on? (February 2025)
Not planning to open source, working on a commercial offering but haven’t launched anything publicly yet.
Would love to hear any more thoughts on the concepts here; my email is in my bio.
hundredwatt | 1 year ago | on: Ask HN: What are you working on? (February 2025)
Most existing solutions only validate at the destination (dbt tests, Great Expectations), rely on aggregate comparisons (row counts, checksums), or generate too much noise (alert fatigue from observability tools). My tool:
* Validates every row and column directly between source and destination
* Handles live source changes without false positives
* Eliminates noise by distinguishing in-flight changes from real discrepancies
* Detects even the smallest data mismatches without relying on thresholds
* Performs efficiently with an IO-bound, bandwidth-efficient algorithm
If you're dealing with data integrity issues in ELT workflows, I'd love to hear about your challenges!
hundredwatt | 1 year ago | on: Ask HN: Books about people who did hard things
hundredwatt | 1 year ago | on: Kuvasz-streamer: open-source CDC for Postgres for low latency replication
hundredwatt | 1 year ago | on: A High-Level Technical Overview of Homomorphic Encryption
Anyone have any examples of these applications?
hundredwatt | 6 years ago | on: Ask HN: What does your BI stack look like?
For our team, using an ELT architecture (as opposed to ETL) [1] for managing our data warehouse has greatly reduced the complexity of our data processes. Instead of creating ETLs for every table we want to load into the data warehouse, we create the minimum necessary setup to copy the table into our data warehouse. Then, we write transforms, which are simply SQL statements, to generate wide-column tables that our non-technical users can use to explore data without worrying about joins or having to learn esoteric naming conventions.
Custom EL Scripts -> Redshift -> Transform Statements -> Redshift -> Metabase supports the data needs of all our departments with no dedicated data team members.
[1] https://www.dataliftoff.com/elt-with-amazon-redshift-an-over...
hundredwatt | 9 years ago | on: Ask HN: The habit adopted in 2016 that had the greatest impact on your health?
These exercises work to move your body back toward perfect posture, undoing the damage caused by sitting, typing, etc.
I stopped using these in November due to travel. My back and shoulder pain returned. It took about 7 days of consistent stretching for the pain to go away again.
hundredwatt | 14 years ago | on: Hacked: commit to rails master on GitHub
Here's the file: https://gist.github.com/1975167, just add to lib/generators in your Rails 3 app, then do rails g mass_assignment_security -h
Hopefully others find this helpful
hundredwatt | 14 years ago | on: Ask HN: Freelancer? Seeking freelancer? (December 2011)
GaggleAMP is hiring part-time software developers and UX designers to help us extend our social amplification platform. On the frontend, we use jQuery and HTML5/CSS3 via HAML templates. Our web application's backend stack is Ruby on Rails 3 with MySQL and Redis.
We'll consider hackers with any experience level, intern and up. If interested, send an email with a brief bio and one or more links to past work to jason AT gaggleamp DOT com.
hundredwatt | 14 years ago | on: Ask HN: Who is Hiring? (December 2011)
GaggleAMP is hiring part-time software developers and UX designers to help us extend our social amplification platform. On the frontend, we use jQuery and HTML5/CSS3 via HAML templates. Our web application's backend stack is Ruby on Rails 3 with MySQL and Redis.
We'll consider hackers with any experience level, intern and up. If interested, send an email with a brief bio and one or more links to past work to jason AT gaggleamp DOT com.
hundredwatt | 14 years ago | on: Ask HN: Freelancer? Seeking freelancers? (October 2011)
GaggleAMP is looking for freelance web designers to work on a per project basis.
Projects will range from adding dynamic elements to landing pages to creating the UI for new application features.
Ability to code HTML/CSS and JavaScript is a huge plus.
Send portfolio to jason at gaggleamp dot com if interested
hundredwatt | 14 years ago | on: Ask HN: Should I work for a startup who's sole purpose is to be flipped?
Just remember to set your expectations correctly about cash/equity incentives: http://www.bothsidesofthetable.com/2010/09/06/how-to-discuss..., http://www.bothsidesofthetable.com/2009/11/04/is-it-time-for...
hundredwatt | 14 years ago | on: Ask HN: Is it feasible to use redis as the only datastore?
EDIT: If it meets your application's logical and scaling needs, there's no reason you couldn't use it
hundredwatt | 14 years ago | on: One of Google’s Self-Driving Cars Gets into an Accident
Also, computers can't get drunk...
I'd love to have these brought to the masses. But, no matter what the statistical benefits are (safer and time saving), I can't picture a time in the near future when normal people will be accepting of automated vehicles. Just like this incident almost was, 1 death caused by a self-driving vehicle is going to overshadow 100,000s of deaths by human-driven vehicles. Even once proved feasible, I can't see something like this being adopted in a matter of years or even decades.
Anyone know of any historical examples of a similarly disruptive, but untrusted technology radically changing people's lives? How long did it take for adoption?
hundredwatt | 14 years ago | on: How our SaaS startup got 1000+ signups in just 7 days, without getting Crunched.
If you're a programmer and not a designer, just think of your design as another optimization point.
Whenever I worry about the design/branding of my current startup, I visit the historical timeline of Amazon's logo as a reminder that we'll be able to make it better eventually, but there are more important things to work on now: http://www.kokogiak.com/gedankengang/2004/07/amazoncom-logo-...
The false decodes can be detected. During peeling, deleting a false decode inserts a new element with a count of the opposite sign. Later, you decode this second false element and end up with the same element in both the A / B and B / A result sets (as long as decoding completes without encountering a cycle).
So, after decode, check for any elements present in both A / B and B / A result sets and remove them.
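That post-decode cleanup can be sketched in a few lines of Python. The function name and the use of plain sets for the two result sets are assumptions for illustration.

```python
def drop_false_decodes(a_minus_b, b_minus_a):
    """Remove elements that were decoded into both result sets.

    An element appearing on both sides is treated as a false decode
    that was later cancelled by its opposite-sign twin.
    """
    a_minus_b, b_minus_a = set(a_minus_b), set(b_minus_a)
    false_decodes = a_minus_b & b_minus_a
    return a_minus_b - false_decodes, b_minus_a - false_decodes
```

This only applies when decoding ran to completion. If peeling stopped on a cycle, the twin of a false decode may never have been decoded.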
--
Beyond that, you can also use the cell position for additional checksum bits in the decode process without increasing the data structure's bit size. i.e., if we attempt to decode the element x from a cell at position m, then one of the index hash functions h_i(x) should return m.
There's even a paper about a variant of IBFs that has no checksum field at all: https://arxiv.org/abs/2211.03683. It uses the cell position among other techniques.
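The cell-position check can be sketched as follows. The index hashing (`cell_indices`, BLAKE2b with one cell per region) and the function names are assumptions for illustration, and this is not the construction from the linked paper.

```python
import hashlib

K = 3  # number of index hash functions (assumed)


def cell_indices(x, num_cells):
    """Hypothetical h_1..h_K: one cell per region of the table."""
    region = num_cells // K
    out = []
    for i in range(K):
        digest = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
        out.append(i * region + int.from_bytes(digest, "big") % region)
    return out


def passes_position_check(x, position, num_cells):
    """A candidate x decoded at `position` is plausible only if one of
    its own index hashes points back at that position."""
    return position in cell_indices(x, num_cells)
```

A candidate that fails this check can be rejected before it is peeled, which acts like extra checksum bits at no storage cost.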