Show HN: SheetSage – A Linter for the Most Dangerous Programming Language
1 point | CherishRoby | 1 month ago | sheetsage.co
The Technical Implementation:
Locale-aware parsing: Google Sheets doesn’t expose an AST for formulas, so I built a conservative parser that tracks quotes, parens, and braces to extract function calls without being thrown off by string or array literals. It handles localized argument separators (, vs ;) and decimal separators (, vs .) based on the spreadsheet's locale.
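To make the parsing idea concrete, here is a minimal sketch in Python rather than Apps Script. It is not SheetSage's actual parser: the function name `extract_calls`, the `arg_sep` parameter, and the decision to skip everything inside array braces are assumptions made for illustration.

```python
import re

# Identifier that ends exactly where the slice ends (i.e. right before a "(").
_IDENT_AT_END = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*$")

def extract_calls(formula, arg_sep=","):
    """Return (FUNCTION_NAME, top_level_arg_count) pairs, innermost call first.

    Hypothetical helper: text inside "..." strings and {...} array literals
    is never interpreted as a paren or an argument separator.
    """
    calls = []
    stack = []            # one [name, separators_seen, has_content] per open paren
    in_string = False
    brace_depth = 0
    i, n = 0, len(formula)
    while i < n:
        ch = formula[i]
        if in_string:
            if ch == '"':
                if i + 1 < n and formula[i + 1] == '"':
                    i += 1            # doubled quote is an escaped quote
                else:
                    in_string = False
            i += 1
            continue
        if stack and ch != ")" and not ch.isspace():
            stack[-1][2] = True       # the enclosing paren has some content
        if ch == '"':
            in_string = True
        elif ch == "{":
            brace_depth += 1
        elif ch == "}":
            brace_depth = max(0, brace_depth - 1)
        elif brace_depth == 0:
            if ch == "(":
                m = _IDENT_AT_END.search(formula[:i])
                stack.append([m.group(0).upper() if m else None, 0, False])
            elif ch == ")" and stack:
                name, seps, has_content = stack.pop()
                if name:              # plain grouping parens have no name
                    calls.append((name, seps + 1 if has_content else 0))
            elif ch == arg_sep and stack:
                stack[-1][1] += 1
        i += 1
    return calls
```

With a semicolon locale, `extract_calls('=IF(A1>0; SUM(B1:B3); "a;b(c)")', arg_sep=";")` reports `SUM` with one argument and `IF` with three, because the `;` and `(` inside the string are ignored. The decimal comma in `=ROUND(1,5; 0)` is likewise not counted as a separator when `arg_sep` is `;`.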
R1C1 Clustering: To avoid UI noise, I don't treat every cell as a unique finding. I normalize formulas using getFormulasR1C1() to identify templates that have been copied down. This allows the fix-all engine to refactor thousands of cells in one batch.
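The clustering step can be illustrated with a short Python sketch. It assumes the input is the 2D array of strings that getFormulasR1C1() returns (empty string for cells without a formula); the name `cluster_by_template` is hypothetical and the grouping key is simply the raw R1C1 text.

```python
from collections import defaultdict

def cluster_by_template(formulas_r1c1):
    """Group cells by their R1C1 formula text.

    A formula copied down a column has the same R1C1 form in every row,
    so each dictionary key is one template and its value lists the
    1-based (row, column) positions that share it.
    """
    clusters = defaultdict(list)
    for r, row in enumerate(formulas_r1c1, start=1):
        for c, formula in enumerate(row, start=1):
            if formula:               # skip cells that hold no formula
                clusters[formula].append((r, c))
    return dict(clusters)

# Column C multiplies A by B in three rows; C4 holds a one-off formula.
grid = [
    ["", "", "=RC[-2]*RC[-1]"],
    ["", "", "=RC[-2]*RC[-1]"],
    ["", "", "=RC[-2]*RC[-1]"],
    ["", "", "=SUM(R[-3]C:R[-1]C)"],
]
clusters = cluster_by_template(grid)
```

Here `clusters` has two keys: the copied-down template covering three cells and the single SUM cell, so a UI built on it would show two findings at most instead of four.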
Systemic soft-cap scoring: Standard penalty-per-thousand metrics often under-react to widespread errors, so I implemented a continuous soft-cap model. It calculates union coverage for risks: if a critical error covers 40% of your workbook, your health score is soft-capped regardless of how many healthy cells you have.
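The post does not give the scoring formula, so the following Python sketch only shows the general shape of a coverage-based cap. The linear cap `100 * (1 - coverage)` and the names `union_coverage` and `health_score` are assumptions, not SheetSage's model.

```python
def union_coverage(findings, total_cells):
    """Fraction of cells touched by at least one finding.

    `findings` is an iterable of cell collections, one per risk. Taking
    the set union means a cell hit by two risks is counted once.
    """
    covered = set()
    for cells in findings:
        covered |= set(cells)
    return len(covered) / total_cells if total_cells else 0.0

def health_score(base_score, critical_coverage):
    """Cap the base score by how much of the workbook is critically affected.

    Assumed cap: linear in coverage, so 40% coverage caps the score at 60.
    min() keeps the result continuous in both inputs.
    """
    cap = 100.0 * (1.0 - critical_coverage)
    return min(base_score, cap)

# Two critical risks overlap on cell "C3"; the workbook has 10 formula cells.
coverage = union_coverage([{"C1", "C2", "C3"}, {"C3", "C4"}], total_cells=10)
score = health_score(base_score=95.0, critical_coverage=coverage)
```

In this example four distinct cells are affected, so coverage is 0.4 and a base score of 95 is capped near 60. A per-finding penalty would have counted "C3" twice and ignored how much of the workbook the risks span.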
Snapshot & Rollback: Since I’m mutating user data, I implemented a SnapshotService that writes original formulas to a hidden SheetSage_SNAPSHOT sheet before any bulk fix. This provides a native "Undo" even after the Apps Script execution finishes.
Privacy: No spreadsheet data ever leaves the Google environment. The audit engine runs entirely in Apps Script. The only external call is an HMAC-signed request to a Vercel/Next.js billing service to verify subscription entitlements via a stable clientId.
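For readers unfamiliar with HMAC-signed requests, here is a minimal Python sketch of the general pattern using the standard library. The message layout (`clientId.timestamp`), SHA-256, the five-minute skew window, and the function names are all assumptions; the post does not describe the actual wire format.

```python
import hashlib
import hmac

def sign_request(secret: bytes, client_id: str, timestamp: int) -> str:
    """Client side: sign the client id and a timestamp with a shared secret."""
    message = f"{client_id}.{timestamp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify_request(secret: bytes, client_id: str, timestamp: int,
                   signature: str, now: int, max_skew: int = 300) -> bool:
    """Server side: reject stale timestamps, then compare in constant time."""
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_request(secret, client_id, timestamp)
    return hmac.compare_digest(expected, signature)

# Hypothetical values for illustration only.
secret = b"shared-secret"
sig = sign_request(secret, "client-123", 1_700_000_000)
ok = verify_request(secret, "client-123", 1_700_000_000, sig, now=1_700_000_060)
```

Note that only the identifier and timestamp are signed here, which fits the claim that no spreadsheet contents are sent. Including the timestamp limits replay of a captured request to the skew window.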
I'd love to discuss the heuristics I'm using to distinguish magic numbers from legitimate constants (like 24 for hours), and how I'm handling LockService to prevent race conditions during bulk refactoring.
JustinXie|1 month ago
Re: magic numbers, have you considered checking column headers as a signal? E.g., if a header contains "Rate" or "Months", a hardcoded number is likely a valid constant. If it's just "Total", * 1.2 is probably a hidden risk. How do you handle cases where the context is ambiguous?
CherishRoby|1 month ago
High-confidence whitelist: 24, 60, 7, 365 (time conversions)
Context-dependent: numbers near column headers with semantic meaning
Always flag: arbitrary numbers like 1.2, 847, etc., unless they're in a 'Constants' or 'Assumptions' section
The hardest edge case is something like Revenue * 0.15 where 0.15 might be a legitimate tax rate OR a hardcoded assumption that should be in a named cell. Right now I flag it as medium priority. How would you approach this?
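One way to read the tiers discussed in this thread as code is the Python sketch below. It is a guess at the shape of the heuristic, not the real rule set: the header hints come from the "Rate"/"Months" examples above, and treating values strictly between 0 and 1 as rate-like "medium" findings is an assumption added so that the Revenue * 0.15 case has somewhere to land.

```python
import re

TIME_CONSTANTS = {7, 24, 60, 365}             # high-confidence whitelist
HEADER_HINTS = {"rate", "months"}             # example hints from the thread
SAFE_SECTIONS = {"constants", "assumptions"}  # sections where literals are expected

def classify_literal(value, header="", section=""):
    """Return "ok", "low", "medium", or "high" for a hardcoded number.

    `header` is the column header above the formula and `section` the
    name of the sheet region it sits in; both default to unknown.
    """
    if section.strip().lower() in SAFE_SECTIONS:
        return "ok"
    if value in TIME_CONSTANTS:
        return "ok"
    header_words = set(re.findall(r"[a-z]+", header.lower()))
    if header_words & HEADER_HINTS:
        return "low"       # header suggests the number is a real parameter
    if 0 < value < 1:
        return "medium"    # assumption: looks like a rate, intent is ambiguous
    return "high"
```

Under these assumptions `classify_literal(1.2, header="Total")` is "high", `classify_literal(1.2, header="Tax Rate")` is "low", and `classify_literal(0.15, header="Revenue")` is "medium". Matching whole header words rather than substrings avoids treating a header like "Generated" as if it contained "rate".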
CherishRoby|1 month ago