item 46764454

Show HN: SheetSage – A Linter for the Most Dangerous Programming Language

1 point | CherishRoby | 1 month ago | sheetsage.co

I built SheetSage because "Silent Failures" in spreadsheets are a massive unmanaged risk in finance and ops. Most tools just find broken references (#REF!), but the real killers are logical errors like a VLOOKUP defaulting to approximate match on unsorted data, returning a plausible but wrong value.
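
To see why this is so dangerous, here is a minimal sketch in plain JavaScript (not SheetSage code) that models approximate match as the binary search it assumes, over a sorted versus an unsorted lookup column. The data and helper name are illustrative; real spreadsheet engines leave results on unsorted data effectively undefined, which is exactly the risk:

```javascript
// Approximate match assumes the first column is sorted ascending and
// searches for the largest value <= the key. On unsorted data the
// search still "succeeds" -- with a plausible but wrong answer.
function vlookupApprox(key, table) {
  let lo = 0, hi = table.length - 1, ans = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (table[mid][0] <= key) { ans = mid; lo = mid + 1; }
    else { hi = mid - 1; }
  }
  return ans === -1 ? "#N/A" : table[ans][1];
}

const sorted   = [[100, "A"], [200, "B"], [300, "C"]];
const unsorted = [[100, "A"], [300, "C"], [200, "B"]];

console.log(vlookupApprox(250, sorted));   // "B" -- correct
console.log(vlookupApprox(250, unsorted)); // "A" -- plausible but wrong
```

No error, no #N/A: the workbook keeps computing on a silently wrong value.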

The Technical Implementation:

Locale-aware parsing: Since Google Sheets doesn’t provide an AST for formulas, I had to build a conservative parser that tracks quotes, parens, and braces to extract function calls without getting poisoned by strings or array literals. It handles localized argument separators (, vs ;) and decimal separators (, vs .) based on the spreadsheet's locale.
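
A stripped-down sketch of that idea (illustrative, not the actual SheetSage parser): walk the formula character by character, tracking string and brace state, so a function name inside a string literal is never extracted. Full argument splitting would additionally need the locale's separator (`,` vs `;`), which this sketch omits:

```javascript
// Extract top-level function names from a formula without being fooled
// by string literals ("" escapes a quote) or array literals ({...}).
function extractFunctionCalls(formula) {
  const calls = [];
  let inString = false;
  let ident = "";
  for (let i = 0; i < formula.length; i++) {
    const ch = formula[i];
    if (inString) {
      // '""' is an escaped quote inside a string literal
      if (ch === '"') {
        if (formula[i + 1] === '"') { i++; } else { inString = false; }
      }
      continue;
    }
    if (ch === '"') { inString = true; ident = ""; continue; }
    if (ch === "{" || ch === "}") { ident = ""; continue; }
    // '.' allowed for names like CEILING.MATH
    if (/[A-Za-z_.]/.test(ch)) { ident += ch; continue; }
    if (ch === "(" && ident) calls.push(ident.toUpperCase());
    ident = "";
  }
  return calls;
}

extractFunctionCalls('=IF(A1>0, VLOOKUP(A1, B:C, 2), "SUM(")');
// → ["IF", "VLOOKUP"] -- the "SUM(" inside the string is ignored
```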

R1C1 Clustering: To avoid UI noise, I don't treat every cell as a unique finding. I normalize formulas using getFormulasR1C1() to identify templates that have been copied down. This allows the fix-all engine to refactor thousands of cells in one batch.
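
The clustering step can be sketched like this (in Apps Script the grid would come from Range.getFormulasR1C1(); here it's hard-coded, and the function name is mine, not SheetSage's):

```javascript
// Group cells by their R1C1-normalized formula so a template copied
// down a column counts as one finding instead of thousands.
function clusterByTemplate(formulasR1C1) {
  const clusters = new Map(); // template -> list of [row, col]
  formulasR1C1.forEach((rowArr, r) => {
    rowArr.forEach((f, c) => {
      if (!f) return; // empty cell: no formula
      if (!clusters.has(f)) clusters.set(f, []);
      clusters.get(f).push([r, c]);
    });
  });
  return clusters;
}

// =B2*C2 copied down three rows is a single template in R1C1 notation:
const grid = [
  ["=R[0]C[1]*R[0]C[2]"],
  ["=R[0]C[1]*R[0]C[2]"],
  ["=R[0]C[1]*R[0]C[2]"],
];
clusterByTemplate(grid).size; // 1 finding covering 3 cells
```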

Systemic soft-cap scoring: standard per-finding penalty metrics often under-react to widespread errors. I implemented a continuous soft-cap model. It calculates union coverage for risks: if a critical error covers 40% of your workbook, your health score is soft-capped regardless of how many other healthy cells you have.
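
One way to sketch such a model (the exponential ceiling and its constant are my illustration, not SheetSage's actual scoring):

```javascript
// Union coverage: count the distinct cells touched by any critical
// finding, then cap the health score with a ceiling that falls
// smoothly as coverage grows.
function healthScore(totalCells, criticalFindings) {
  const covered = new Set();
  for (const f of criticalFindings) {
    for (const cell of f.cells) covered.add(cell);
  }
  const coverage = covered.size / totalCells;  // 0..1
  const base = 100 * (1 - coverage);           // naive per-cell score
  const cap = 100 * Math.exp(-3 * coverage);   // soft cap on the score
  return Math.min(base, cap);                  // widespread risk dominates
}

// One critical finding covering 40 of 100 cells: the naive score
// would be 60, but the soft cap pulls it down to roughly 30.
```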

Snapshot & Rollback: Since I’m mutating user data, I implemented a SnapshotService that writes original formulas to a hidden SheetSage_SNAPSHOT sheet before any bulk fix. This provides a native "Undo" even after the Apps Script execution finishes.
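
The pattern can be simulated with a plain object standing in for the hidden SheetSage_SNAPSHOT sheet (in Apps Script the same shape would read formulas via getFormulas() and write them to a hidden sheet before mutating anything; the class and data here are illustrative):

```javascript
// Capture original formulas before a bulk fix; rollback restores them.
class SnapshotService {
  constructor() { this.snapshots = new Map(); } // a1Notation -> formula
  capture(sheet, a1) { this.snapshots.set(a1, sheet[a1]); }
  rollback(sheet) {
    for (const [a1, formula] of this.snapshots) sheet[a1] = formula;
  }
}

const sheet = { A1: "=VLOOKUP(B1, D:E, 2)" }; // stand-in for a real sheet
const snap = new SnapshotService();
snap.capture(sheet, "A1");
sheet.A1 = "=VLOOKUP(B1, D:E, 2, FALSE)";     // bulk fix mutates the cell
snap.rollback(sheet);
sheet.A1; // original formula restored
```

Because the snapshot lives in the spreadsheet itself rather than in script memory, it survives the end of the Apps Script execution.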

Privacy: No spreadsheet data ever leaves the Google environment. The audit engine runs entirely in Apps Script. The only external call is a signed HMAC request to a Vercel/Next.js billing service to verify subscription entitlements via a stable clientId.

I'd love to discuss the heuristics I'm using to distinguish magic numbers from legitimate constants (like 24 for hours), and how I'm handling LockService to prevent race conditions during bulk refactoring.

3 comments


JustinXie|1 month ago

The R1C1 normalization is smart. Treating 5k copied formulas as one "finding" is the only way to avoid alert fatigue.

Re: magic numbers, have you considered checking column headers as a signal? E.g., if a header contains "Rate" or "Months", a hardcoded number is likely a valid constant. If it's just "Total", * 1.2 is probably a hidden risk. How do you handle cases where the context is ambiguous?

CherishRoby|1 month ago

Great question! I am using column headers as context signals. If a column is named 'Rate', 'Price', 'Percentage', or 'Count', I'm more lenient with constants in formulas referencing it. For ambiguous cases like 'Total', I currently flag it and let the user decide—which isn't ideal. I've been considering a confidence score system where:

High-confidence whitelist: 24, 60, 7, 365 (time conversions)

Context-dependent: numbers near column headers with semantic meaning

Always flag: arbitrary numbers like 1.2, 847, etc., unless they're in a 'Constants' or 'Assumptions' section

The hardest edge case is something like Revenue * 0.15 where 0.15 might be a legitimate tax rate OR a hardcoded assumption that should be in a named cell. Right now I flag it as medium priority. How would you approach this?
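
For what it's worth, those tiers could be sketched as a small classifier (the whitelist, header keywords, and priority labels mirror the comment above; everything else is illustrative, not SheetSage's actual heuristic):

```javascript
// Classify a hardcoded constant using the whitelist, the column-header
// context signal, and a fallback tier for fractional values like 0.15
// that might be legitimate rates or hidden assumptions.
const TIME_WHITELIST = new Set([7, 24, 60, 365]);
const SEMANTIC_HEADERS = /rate|price|percentage|count|months/i;

function classifyConstant(value, columnHeader) {
  if (TIME_WHITELIST.has(value)) return "ok";           // high confidence
  if (SEMANTIC_HEADERS.test(columnHeader)) return "ok"; // context signal
  if (value > 0 && value < 1) return "medium";          // rate or assumption?
  return "flag";                                        // arbitrary magic number
}

classifyConstant(24, "Total");     // "ok"
classifyConstant(0.15, "Revenue"); // "medium"
classifyConstant(847, "Total");    // "flag"
```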

CherishRoby|1 month ago

Spreadsheets are the only programming language where approximate string matching is the default behavior (VLOOKUP with range_lookup=TRUE). I can't think of another language where fuzzy matching happens silently unless you explicitly opt out. Is this the most dangerous design decision in computing history?