
Introducing Guesstimate, a Spreadsheet for Things That Aren’t Certain

559 points | freefrancisco | 10 years ago | medium.com

94 comments

[+] ozgooen|10 years ago|reply
I wasn't expecting this to go on hnews yet, but happy to take any questions!
[+] jasoncrawford|10 years ago|reply
Hey, I tried to send you something through Intercom on the site, but something was broken and it wouldn't send. Here's what I tried to write:

Cool idea. Found this from a @worrydream tweet. Some comments after playing with it for a few minutes:

* A bit disappointing that I can only have uniform or Gaussian distributions. At minimum I'd like a binary distribution (coin flip, probably biased coin flip). A lot of things I would want to model need this (e.g., will we close this sale? will we close this investor? that kind of thing.)

* I'm really confused by the arbitrary two-letter codes assigned to things for formulas. Makes the formulas impossible to read. Why not just use the names I give to the cells, or something derived from the names?

Really nice start though! I'm co-founder & CEO of fieldbook.com, another spreadsheet-like tool, so I love information tools and anything that expands the mind's capacity. Best of luck and let me know how I can help!

[+] cpitman|10 years ago|reply
Any plans to add different probability distributions? I do this kind of analysis by hand for things like project timelines and cost estimates, but I model each variable with a double triangular distribution (http://www.mhnederlof.nl/doubletriangular.html). You provide a best case, worst case, and then the most likely case. The distribution approximates the long tails on many kinds of distributions.
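The double-triangular distribution cpitman links is easy to sample with inverse-CDF code; a rough Python sketch (the 10/15/40 estimates are invented for illustration):

```python
import random

def sample_double_triangular(low, mode, high):
    """Draw from a double-triangular distribution: two right triangles
    meeting at the mode, each carrying 50% of the probability mass."""
    u = random.random()
    if u < 0.5:
        # inverse CDF of the rising left triangle
        return low + (mode - low) * (2 * u) ** 0.5
    # inverse CDF of the falling right triangle
    return high - (high - mode) * (2 * (1 - u)) ** 0.5

# e.g. a project estimate: best case 10 days, most likely 15, worst case 40
samples = [sample_double_triangular(10, 15, 40) for _ in range(100_000)]
```

By construction the median lands exactly on the most-likely case, while the long right tail pulls the mean above it, which is the behavior you want for schedule and cost estimates.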
[+] pridkett|10 years ago|reply
Have you seen or were you at all inspired by Analytica from Lumina Decision Systems (http://www.lumina.com/why-analytica)? It's been around for about twenty years and is often used in the field of policy analysis when you need to quantify messy things like human lives in dollars in the face of massive uncertainties.
[+] Kinnard|10 years ago|reply
This is awesome! Have you heard of Augur? It's a decentralized prediction market: http://augur.net I wonder if the guys there would be interested in this.
[+] Smerity|10 years ago|reply
I love the tool and tweeted about it yesterday. It's brilliant, and I'm excited to try some more complex probability distributions.

Only request would be to allow for private spreadsheets. I can download and run the code locally but this would help many people who are less tech savvy.

Great product - looking forward to seeing how it evolves!

[+] Too|10 years ago|reply
It's nice to see a web app with keyboard support. Just please don't hijack the keys when any modifier keys are held down; those are usually for the browser, e.g. Alt+Left to go back.
[+] mdlincoln|10 years ago|reply
<3 this! Are there any plans for some type of export utility, e.g. some type of JSON serialization of a finished model?
[+] hackaflocka|10 years ago|reply
I love the simplicity. Overall, a fantastic product.
[+] aj7|10 years ago|reply
I would be much more interested if this product were "choose probability density function centric." Then, the Monte Carlo engine would gain much more interest. Being able to choose or specify arbitrary distributions, and then run simulations, would be valuable.

Of special interest are non-continuous distributions. How often has normal-distribution reasoning failed in finance? Put another way, a user should be able to model a distribution himself.

[+] ozgooen|10 years ago|reply
Very good to know. Right now you can choose between normal, uniform, and a few very simple discrete distributions, but not others.

When I built this, my first goal was to make any distribution run quickly. At this point I believe adding other distribution types will be quite doable, expect them shortly.
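Fully user-specified distributions of the kind aj7 asks for can be handled with inverse-transform sampling over a piecewise-linear CDF; a rough Python sketch (the breakpoints here are invented):

```python
import bisect
import random

# A distribution specified directly by the user as CDF breakpoints:
# "50% chance of <= 10, 60% of <= 12, long tail out to 100"
points = [(0, 0.0), (10, 0.5), (12, 0.6), (100, 1.0)]
values, cdf = zip(*points)

def sample():
    # inverse-transform sampling: map a uniform draw through the CDF
    u = random.random()
    i = bisect.bisect_right(cdf, u)
    x0, x1 = values[i - 1], values[i]
    c0, c1 = cdf[i - 1], cdf[i]
    # linear interpolation between adjacent CDF points
    return x0 + (x1 - x0) * (u - c0) / (c1 - c0)

draws = sorted(sample() for _ in range(50_000))
```

The same trick works for any shape the user can sketch, including the fat-tailed and multimodal cases where normal-distribution reasoning breaks down.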

[+] blablabla123|10 years ago|reply
Statistical reasoning in general has failed in finance, especially 2008+... I don't think statistics or normal distributions are to blame. Rather the mindset that the risk scenarios are something that is avoidable with certainty. That's almost religious...
[+] stdbrouw|10 years ago|reply
Guesstimate is napkin math. It makes no sense to spend an inordinate amount of time fine-tuning the distribution when both the distribution and its parameters are just best guesses. The important part is the propagation of uncertainty across many dependent variables, and the normal distribution is often good enough for that purpose. Whenever it isn't, for me that'd be a sign to use a proper statistical model instead. MCMC was invented to do inference on models of arbitrary complexity, with however much or little data you might have.
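The propagation of uncertainty stdbrouw describes is just repeated sampling through the model's formula; a toy Python sketch with invented numbers (not Guesstimate's actual implementation):

```python
import random

N = 5000  # samples per cell, roughly what Guesstimate-style tools use

# napkin model: profit = visitors * conversion * price - costs
visitors   = [random.gauss(10_000, 2_000) for _ in range(N)]
conversion = [random.uniform(0.01, 0.03)  for _ in range(N)]
price      = [random.gauss(50, 5)         for _ in range(N)]
costs      = [random.gauss(8_000, 1_000)  for _ in range(N)]

profit = sorted(v * c * p - k
                for v, c, p, k in zip(visitors, conversion, price, costs))
low, high = profit[N // 20], profit[-N // 20]  # empirical 90% interval
```

The point is that even with crude per-variable guesses, the combined interval (here spanning a loss to a healthy profit) tells you something no single-point estimate can.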
[+] perlgeek|10 years ago|reply
Why would you need Monte Carlo? Can't you combine probability density functions through convolutions (or other tricks with integrals, like Fourier or Laplace transforming and then using straight arithmetic)?
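Convolution does work exactly for sums of independent variables; a small discrete example in Python:

```python
import itertools

# PMFs of two independent quantities (value -> probability)
a = {1: 0.2, 2: 0.5, 3: 0.3}
b = {0: 0.6, 1: 0.4}

# exact distribution of a + b via convolution
total = {}
for (x, px), (y, py) in itertools.product(a.items(), b.items()):
    total[x + y] = total.get(x + y, 0.0) + px * py
```

But convolution only covers sums (and, via log transforms, products); once a model mixes max(), divisions, conditionals, or reuses a variable in several formulas, Monte Carlo handles everything through the same sampling path, which is presumably why tools like this default to it.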
[+] imh|10 years ago|reply
I'm convinced that an excel sort of lay-person's computing platform is where probabilistic programming will really take off. This seems really cool!
[+] p4bl0|10 years ago|reply
It's not really related but it made me think of a friend's PhD thesis on uncertain data. If the subject interests you, be sure to checkout the summary of his (impressive) work: http://a3nm.net/blog/phd_summary.html.
[+] krmmalik|10 years ago|reply
I like it. I ran a strategy session with a client a couple of weeks ago, and we needed to estimate how much the strategy was likely to cost over the next few months. We had quite a few variables to work with, though. This would have been handy in such a scenario, I presume? We knew what our components and their ranges were.
[+] ozgooen|10 years ago|reply
I believe it would be handy, but it depends on the size. Right now I think it's reasonably fast and intuitive for models of around 3 to 40 metrics (variables). If you have more it could get slower, especially if many of them have to be recalculated at once.

I suggest trying it out. If nothing else, you may be able to begin with very simple models of the most important variables.

[+] sundarurfriend|10 years ago|reply
'Fuzzy logic' seems to be an ex-buzzphrase nowadays, but this seems pretty close to that territory. A variable/cell/logical-unit containing not a single value, but a distribution (often between bounds), and getting combined with other similar variables/cells/logical-units in ways that understand and respect the probability distributions.

Perhaps that field can provide a potential source of new names, when you decide to market this as a company.

[+] evanb|10 years ago|reply
I was watching the "Total time spent watching this video" video and had a basic question.

How does one tell Guesstimate that there's a hard lower bound on a quantity, i.e. that Video Length is at least 0, because negative watch times are unphysical? I know the specified distribution in this case is very narrow (the video lasting between -1 and 0 minutes has probability ~0.000032). But the answer does come out to be 26±32, which includes a substantial unphysical region.

And, if I give a hard lower bound on Video Length, can it propagate that knowledge into an asymmetric error on Total time?

[+] ozgooen|10 years ago|reply
Good catch!

Right now the main distribution types are normal and uniform. In the video, I showed normal distributions, which have long tails in both directions.

In this case, a normal distribution isn't really correct, because, as you noted, being less than 0 is exceedingly unlikely.

I believe the correct way to deal with this is to use a lognormal distribution or something that has 0 chance of being less than 0. I don't yet have a simple way of doing this, but it's definitely on the agenda.
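A lognormal of the kind described can be pinned to a 90% interval so every draw stays positive; a Python sketch (the 5-to-60-minute interval is invented for illustration):

```python
import math
import random

Z95 = 1.6448536269514722  # 95th percentile of the standard normal

def lognormal_sampler(low, high):
    """Sampler for a lognormal whose 5th/95th percentiles are `low`/`high`
    (both must be positive), so every draw stays above zero."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z95)
    return lambda: math.exp(random.gauss(mu, sigma))

sample = lognormal_sampler(5, 60)  # e.g. video length in minutes
draws = sorted(sample() for _ in range(100_000))
```

Unlike a normal with the same 90% interval, this distribution assigns exactly zero probability to negative values, fixing the unphysical region the parent comment points out.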

[+] jakespencer|10 years ago|reply
Awesome! I use Crystal Ball (http://www.oracle.com/us/products/applications/crystalball/o...) with triangular distributions and Monte Carlo for software project cost estimation. Crystal Ball costs thousands of dollars so I will be following this with interest.
[+] jasonshen|10 years ago|reply
I think this is super cool! We're so bad at estimating probabilities (think Han Solo's "never tell me the odds") that this helps visualize the distribution of outcomes
[+] jeffehobbs|10 years ago|reply
This is really cool. Can anyone recommend any particularly good/cogent Simple Caveman explanations of how Bayesian theory/Monte Carlo simulation work?
[+] Mauricio_|10 years ago|reply
It seems that using Monte Carlo in Excel is pretty common, but I'm not sure if it can be done without add-ons.
[+] marcusgarvey|10 years ago|reply
[+] coldtea|10 years ago|reply
I always find it strange that people found that thing Rumsfeld said dumb, bizarre or (worse) his own invention [1].

Those are standard epistemological distinctions, known (and written about) since at least the times of Aristotle.

[1] (I mean the philosophical essence of what he said -- not that he didn't try to use it as an excuse for BS).

[+] tunesmith|10 years ago|reply
What fun - I did a Monte Carlo estimate a few years back when trying to determine what purchase price of a house my girlfriend and I could afford. It depended on the probable interest rate, how much my old house would sell for, etc. It'd be interesting to see how simply it could be modeled in this.
[+] ozgooen|10 years ago|reply
That seems like a great fit.

A few weeks back I used the tool to help a friend decide which mortgage option to take for his house. One option had a slightly lower APR than the other, but a higher assistance fee.

After fiddling with it, it looked like the one with the lower assistance fee was the better option. But perhaps more importantly, it didn't seem to make a big difference; perhaps around $200 after 10 years. This was a good indication that the choice didn't really matter, and that it wasn't worth spending more than a few hours worrying about.

http://getguesstimate.com/models/100

[+] miguelrochefort|10 years ago|reply
Isn't this how all software should be written? Expressions that represent a set of all possible values, effectively replacing the need for types.

Surely, such a platform would make building an app 100 times easier. Not that building apps is a good use of our resources.

[+] tinco|10 years ago|reply
A type is an expression that represents the set of all possible values.

If you don't like making types explicit, you could use implicit typing like in Haskell, or have values carry their types like in Ruby.

[+] borplk|10 years ago|reply
I can kinda imagine it but can you elaborate on the part about getting rid of types?
[+] conservajerk|10 years ago|reply
Interesting idea - but you could certainly do this in any spreadsheet application by using multiple cells to represent ranges, etc. I think the probability-estimation issue is more a practices issue than a tools issue.
[+] netghost|10 years ago|reply
This is a really great interface, and cool idea.

You might consider upping the run count, or maybe narrowing your bins for the visualization. Either way, it's great to see more tools embracing probability and uncertainty like this.

[+] ozgooen|10 years ago|reply
In practice, 5000 was basically the largest count that wouldn't slow things down; sampling represented around 20-30% of the rendering time (React components were the main bottleneck, surprisingly, though I could still optimize them more).

I think that this works fine for small models, which is much of what exists now. As models get larger, I'd eventually like to offload calculations to AWS Lambda or something similar, so we can do far more.

[+] minimaxir|10 years ago|reply
5,000 tests is more than enough for most general use cases (i.e., data that would fit into browser memory anyway).
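For context on why 5,000 is usually plenty: Monte Carlo error shrinks only as 1/sqrt(n), so extra samples buy surprisingly little precision. A quick sketch using a standard-normal toy example (not Guesstimate's code):

```python
import random
import statistics

def mc_mean(n):
    # Monte Carlo estimate of the mean of a standard normal (true value: 0)
    return statistics.fmean(random.gauss(0, 1) for _ in range(n))

# The spread of the estimate shrinks like 1/sqrt(n):
# quadrupling the sample count only halves the noise.
spreads = {}
for n in (5_000, 20_000):
    spreads[n] = statistics.stdev(mc_mean(n) for _ in range(200))
```

At n=5,000 the estimate of the mean is already within about ±0.014 standard deviations; going to 20,000 samples (4x the work) only halves that, which is why bumping the run count mostly costs rendering time without visibly changing the histograms.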