item 7803830

You shouldn’t use a spreadsheet for important work

245 points | ot | 12 years ago | lemire.me | reply

195 comments

[+] seanstickle|12 years ago|reply
Felienne Hermans has done some very interesting work examining how to refactor spreadsheets. (http://www.felienne.com/publications)

Her PhD dissertation on the subject "Analyzing and Visualizing Spreadsheets" (http://www.felienne.com/archives/2534) is particularly detailed.

Much though some people might think that spreadsheets are not for serious work, they're being used for that right now, and that is unlikely to change. Better to figure out how to improve the tooling, rather than undertake the quixotic quest to get business people to abandon their perfectly viable programming tool.

There is an excellent video with her on InfoQ, explaining her spreadsheet formula refactoring tool Bumblebee (and the F# code behind it): http://www.infoq.com/presentations/spreadsheet-refactoring

[+] bane|12 years ago|reply
What's interesting to me is that software spreadsheets are more or less just an evolved version of what VisiCalc apparently got right on the first try. Sure, software spreadsheets are basically just parroting the physical ledgers that came before them, but turning them into software almost immediately opened up a kind of dynamic that didn't exist before. The latest Excel is basically just an organically evolved VisiCalc. There really hasn't been a huge paradigm shift since then.

It's not clear that there should be one either; spreadsheets work really well, efforts to move them into databases or whatever be damned. A spreadsheet is not a database table, except maybe in the most abstract possible sense in some cases. They really are different things, and I agree that issues with spreadsheets should be solved by improving the tools, not by trying to paradigm-shift spreadsheet users into an inappropriate environment that also happens to require a very long and highly skilled setup process.

[+] dragonwriter|12 years ago|reply
> Much though some people might think that spreadsheets are not for serious work, they're being used for that right now, and that is unlikely to change. Better to figure out how to improve the tooling, rather than undertake the quixotic quest to get business people to abandon their perfectly viable programming tool.

It's largely used by business people because they (1) have inadequate IT support to have things developed using proper tooling, and (2) have been forbidden from using better tooling themselves by IT. Since both of these are direct products of IT policy, I don't think it's a quixotic quest to fix it -- however, trying to fix it by convincing them to just give it up is the wrong approach. The people who need to be convinced to change the situation are CIOs.

[+] nucleardog|12 years ago|reply
I was watching a show the other day where they discussed their mapping of Benjamin Franklin's social network (who he communicated with, where they were in the world, etc). They showed some of the process and it appeared to involve some Excel workbooks with hundreds of sheets and terribly normalized data.

I can only imagine the amount of work involved in trying to extract any sort of useful information from the raw data in that format. I run into stuff like this all the time.

All I could think was that given a day and a Postgres+PostGIS server someone could have probably saved them entire man-months of their time...

[+] nhebb|12 years ago|reply
I developed a set of utilities for Excel (http://www.breezetree.com/spreadspeed/), and I started adding some auditing features to it. After spending a few weeks on the auditing tools, I decided that I should do a little market research before investing more time on auditing features. Well, I'm glad I did because I found out that the market is saturated with spreadsheet auditing tools. Most are simple and inexpensive (like mine), but there are some fairly sophisticated tools out there. So my takeaway from this is not that there aren't sufficient auditing tools, but that the market needs a non-programmer, user-friendly way to build robust yet malleable models.
[+] piokoch|12 years ago|reply
Very often Excel is a "programmer in a hostile environment"'s last hope. By programmer I mean someone who works in sales, business analysis, finance, etc. but knows how to write programs.

In a typical non-IT company, someone who needs to automate something has to wait for the IT department to purchase software that will do the task. And very often it turns out that in the purchasing process the purpose of the software is lost, requirements come from the wrong person, and in the end the software is pretty useless.

It is not that someone can just download Python or Ruby or Java and start coding. No, no, company IT would never allow that, because they worry about security, patents, licenses, etc. (and it does not matter that these doubts are unjustified; very often it is just forbidden, and that's all).

But chances are that such a company has MS Office. Excel is just an interface to a better or worse programming language. That's better than nothing.

[+] panarky|12 years ago|reply
JP Morgan's $6.2 billion "London Whale" trading loss:

  An unnamed overworked staffer in London copied and pasted the
  wrong figures into an Excel spreadsheet, throwing the firm's
  risk models out of whack.
http://finance.fortune.cnn.com/2013/01/18/jpmorgan-london-wh...

Spreadsheet errors are reaching epidemic proportions globally. What we need is a transnational organization to contain the threat with research, best practices and conferences!

http://www.eusprig.org/

Love their compendium of horror stories. Did you know the US Securities and Exchange Commission has weak accounting because they rely on spreadsheets?

http://www.eusprig.org/horror-stories.htm

  As we have reported in prior audits, SEC's general ledger system
  and certain software applications and configurations are not
  designed to provide the accurate, complete, and timely
  transaction-level data needed to accumulate and readily report
  reliable financial information...

  Many of the agency’s financial reporting processes are still
  manual in nature and reliant on spreadsheets and databases to both
  initiate transactions and perform key control functions.
[+] twistedpair|12 years ago|reply
I used to work for Deutsche Bank. All financial modeling ran through Excel, though admittedly via a massive C++ library that is a big Excel macro plugin. These sheets get massive, and there are a few clusters of blades that do nothing but run Excel macros day and night.

But why is this madness? It works for them. The iBanking world is the madness. You can put effective change controls and auditing in place with any system. However, if people just want it done and don't care how, then regardless of whether the calculations are in Python, Mathematica, or Fortran, garbage in will still be garbage out. So let's not wantonly scapegoat Excel for a process problem.

[+] jacquesm|12 years ago|reply
Nobody ever caused a bug in a piece of custom software that caused an expensive problem?

Knight Capital Group?

Users make mistakes, users given the power to work with computers that can cause billions of $ to be traded can cause billions of $ in losses. The solution would seem to be to limit the exposure, to put a cap on the amount of $ that a user could trade without going through some form of verification.

Spreadsheets may not be ideal, but they're the poor man's gate to programming, and I suspect they don't have a bug incidence much greater than regular software developed by programmers.

Of course the programmers would love to believe otherwise.

[+] bane|12 years ago|reply
I guess it's a good thing we don't have errors in software then, because that would totally eliminate these kinds of issues.

Oh wait, a user can enter a bad number into any tool, regardless of whether it's Excel or some custom cobbled-together monstrosity that does less than the same work but costs 100x as much.

[+] jasode|12 years ago|reply
(Since the blog's website is not responding, I had to read the article from google's cache[1])

The author has well-intentioned advice about avoiding MS Excel but it's misguided. The criticism fails to accommodate the reason why MS Excel was used. MS Excel is the lingua franca of non-programmers. Thomas Piketty is a trained economist, not a programmer. It's not realistic to expect Mr. Piketty to set aside months (years) of his time to master C++/Python/Scala/Fortran/etc to avoid using Excel. It's more realistic for an economist to use MS Excel to back his thesis than for a Python programmer to write a bestselling book about economics.

If we then tweak the advice to be, "if Piketty is not a programmer, he should have hired a compsci graduate student as a programmer", well... you've only shifted the (human source of) errors somewhere else. There are plenty of examples where software not written in Excel had fatal errors: Therac-25[2], Mars Climate Orbiter[3].

Lastly, some of Piketty's errors were transcription errors. In other words, GIGO (Garbage In Garbage Out). Therefore, using Python while there were GIGO issues isn't going to solve the data problem.

[1]http://webcache.googleusercontent.com/search?q=cache:1r99Ioj...

[2]http://en.wikipedia.org/wiki/Therac-25

[3]http://en.wikipedia.org/wiki/Mars_Climate_Orbiter#Cause_of_f...

[+] Iftheshoefits|12 years ago|reply
It strikes me as rather odd that people in a discipline that purports to be on solid analytical ground would eschew learning to write their research and modeling code in Fortran, C, C++, Scala, Python, etc. I can't think of a single colleague of mine, when I was in academia, who did not know how to write code in at least one of the languages you mentioned. That's because such knowledge is necessary to do rigorous, proper numerical computation in an analytical discipline like math, a variety of sciences and, yes, economics.

The "GIGO" issue you note is irrelevant, but even if it weren't, GIGO is compounded and amplified by the copy-and-paste paradigm of spreadsheet "programming", so if anything bringing up GIGO is even more damning of the use of spreadsheets for analytical work.

[+] bkfunk|12 years ago|reply
There are plenty of much better tools that many economists use -- for example, Stata. Stata lets you view your data at any point in time in a nice table, but also manipulate it using a very easy-to-learn (and yet very expressive) language that you can write interactively as well as in .do files. The economist I worked for wanted the whole process -- cleaning the data, running the regressions, outputting the tables, etc. -- to be in a set of .do files so we could hand over the original data files and all the code for review and replication. While building the final analysis, though, you could also work interactively to figure out what you wanted the final product to be, and of course you could log everything you did interactively so there would always be a record.
[+] adrusi|12 years ago|reply
Economists are expected to know how to program, and they do. They just have a tendency to reserve writing programs for complex models and prefer spreadsheets for what they consider "simple" data analysis.

On the point about transcription errors: using Python could have avoided some of them. If the error was in copying a single constant from a source into the program, then in Python it might be buried as an unlabeled magic number deep in the source tree, but it might also be labeled as a top-level constant. In the former case it would have the same problem as in a spreadsheet, but in the latter, a typo would likely be caught by someone reading over the code at some point. Or if some of the data came in a file format that Excel couldn't automatically import, maybe he could have used a Python library instead of manually transcribing 50 data points. It's hard to think of a situation where using Python would have led to more transcription errors.
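
A rough Python illustration of the labeled-constant point (the name, value, and source comment below are invented for this sketch, not taken from Piketty's data):

```python
# Hypothetical example: a transcribed figure kept as a named, documented
# top-level constant is easy to check against its source during review.
# (The name, value, and citation are made up for illustration.)
US_TOP_DECILE_INCOME_SHARE = 0.474  # transcribed from the source dataset's table

def top_decile_income(total_income):
    """Income accruing to the top decile, using the labeled constant."""
    return total_income * US_TOP_DECILE_INCOME_SHARE

# Versus the spreadsheet-style alternative, where the same figure hides
# inside a formula as an unlabeled magic number:
def top_decile_income_magic(total_income):
    return total_income * 0.474  # what is this? copied from where?

print(top_decile_income(1000.0))
```

A reviewer diffing the named constant against the source table can catch the typo once, in one place; the magic-number version has to be hunted down formula by formula, just as in a spreadsheet.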

[+] yodsanklai|12 years ago|reply
"It's not realistic to expect Mr. Piketty to set aside months (years) of his time to master C++/Python/Scala/Fortran/etc to avoid using Excel."

I tend to disagree. Mr Piketty graduated in Mathematics (from a highly selective institution). It is realistic to expect him to be able to use those tools if needed.

[+] monochr|12 years ago|reply
"Lastly, some of Piketty's errors were transcription errors. In other words, GIGO (Garbage In Garbage Out). Therefore, using Python while there were GIGO issues isn't going to solve the data problem."

More importantly, the people claiming there are errors in his work are making an even clearer error than Piketty by claiming that inequality hasn't risen. It has, and it shows up in dozens of measures. That they are trying to fight something as well established in mainstream economics as evolution is in biology leads me to believe that in this case it is ideological windmill fighting more than honest critique.

[+] thatthatis|12 years ago|reply
Spreadsheets ARE software; anyone arguing differently has insufficient experience in at least one of the fields to have an informed opinion. I've worked professionally in both, and the argument in this article is bunk.

Spreadsheets are a different kind of software with advantages and disadvantages compared to compiled/scripted languages.

How do you treat a spreadsheet as real software:

* One operation per cell

* Inputs, processing, and display separated into distinct sheets

* Every formula checked twice (or three times)

* Write row and column check sums

* Treat row and column locking ($F4 or F$4) as the footguns they are.

* Name constants
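
As a rough sketch of the check-sum idea above (a toy Python analogy, not a real auditing tool): maintain row and column totals independently of the cells, and cross-check them, so a one-cell slip makes the sheet visibly disagree with itself.

```python
def audit_grid(grid, reported_row_totals, reported_col_totals, tol=1e-9):
    """Cross-check separately maintained row/column totals against the raw
    cells, the way row and column check sums cross-check each other in a
    well-built sheet. Returns a list of discrepancy messages."""
    errors = []
    for i, row in enumerate(grid):
        if abs(sum(row) - reported_row_totals[i]) > tol:
            errors.append(f"row {i}: cells sum to {sum(row)}, "
                          f"but the stated total is {reported_row_totals[i]}")
    for j, col in enumerate(zip(*grid)):
        if abs(sum(col) - reported_col_totals[j]) > tol:
            errors.append(f"col {j}: cells sum to {sum(col)}, "
                          f"but the stated total is {reported_col_totals[j]}")
    return errors

grid = [[10.0, 20.0],
        [30.0, 40.0]]
# The column totals below contain a deliberate one-unit slip (61.0 vs 60.0):
print(audit_grid(grid, [30.0, 70.0], [40.0, 61.0]))
```

The point is not the arithmetic but the redundancy: two independent paths to the same number turn a silent error into a visible disagreement.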

Testing spreadsheets requires a different set of tools and behaviors, but to say that "quick and dirty" spreadsheets are less durable than "well engineered software" is really to say that "quick and dirty analysis" is less durable than "well engineered analysis."

Spreadsheets are remarkably easy to learn to use, and to use correctly, compared to scripted/compiled software. Not the least benefit is that they force abstraction in a way that is extremely intuitive to most people: spatial context, with the labels and state simultaneously displayed, imposes a much, much lower context-loading cost (for the human brain) than keeping all the abstract variables in your head while writing code.

Mr. Lemire seems to treat the tool as a toy and then claim it is only good as a toy. Flub and non-Flub languages alike can be run quite easily without unit tests, in which case they are no more resistant to errors than Excel (and arguably less so, as they don't display intermediate state).

[+] dragonwriter|12 years ago|reply
> Spreadsheets are remarkably easy to learn to use and to use correctly compared to scripted/compiled software.

I don't think this is true. IME, people who aren't primarily programmers/developers and have been using Excel for years tend to have worse practices developing Excel tools than the same people have with much less time learning, e.g., Python. Excel is much easier to learn to use to get some kind of result, perhaps, but not easier to learn to use correctly. (Part of it, I think, is because the cost of incorrect use is somewhat more deferred, which is also part of what makes it more attractive to use initially in many cases.)

[+] twistedpair|12 years ago|reply
> … but I will not use Microsoft Excel to run a bank or to compute the trajectory of the space shuttle

Does the author realize that Excel is not used to run banks or to track the space shuttle? Sure, some traders might base their models on Excel. But the actual exchanges, trade routing, and account general ledgers all run on very tightly controlled systems, usually mainframes in the case of ledgers. To boot, NASA does not use a VBA Excel plugin to schedule burns and execute reentry. (Excel 2007 won't even run on an 8086 chip.)

[+] barrkel|12 years ago|reply
Billions are made in the finance industry from Excel spreadsheets. Whole funds have been run from a single spreadsheet making buy / sell calculations, with a whole IT infrastructure designed to automate the execution of that single spreadsheet file.

Sure, it may be stupid, but it's democratized programming. That's why spreadsheets work.

[+] jimmcslim|12 years ago|reply
I remember working on a system at an investment bank in London before the GFC that valued exotic derivative portfolios on a grid of machines all running an instance of Excel; each trade was represented as a separate spreadsheet with well known locations for market data to be inserted.... madness!
[+] jimmcslim|12 years ago|reply
There have been a number of efforts to build a better spreadsheet. One that I was quite familiar with (I bought a license!) was Resolver One [1], which was trying to build a spreadsheet with support for Python at the cell level. Unfortunately it didn't take off, and the team moved on to PythonAnywhere [2]. Excel clearly has strong network effects!

I loved this article [3] from 2008, that argues that Excel and VBA were responsible for the credit crunch, that later possibly lead to Reinhart/Rogoff's flawed research using Excel... its spreadsheets all the way down!

[1] http://www.resolversystems.com/products/resolver-one/ (but the link appears to be dead)

[2] https://www.pythonanywhere.com

[3] http://www.theregister.co.uk/2008/01/21/vba_office_victory/

[+] patrickmclaren|12 years ago|reply
I stumbled across Pyspread the other day, a more recent attempt at a spreadsheet using Python at the cell-level. See http://manns.github.io/pyspread/.

  Pyspread expects Python expressions in its grid cells, which makes a spreadsheet
  specific language obsolete. Each cell returns a Python object that can be accessed
  from other cells. These objects can represent anything including lists or matrices.
I haven't used it yet, most of my personal spreadsheets are in Google Sheets and I'm not too keen to pull them out of Google Drive.
[+] joe_the_user|12 years ago|reply
Ah,

I can believe "flawed products" were constructed by spreadsheet, but that seems a kind of shallow analysis.

Clearly, the housing bubble involved a rush to buy whatever; spreadsheets might seem the "immediate cause", but "wishful thinking" would clearly be the larger cause, with various inherent tendencies of the financial system behind that.

[+] maxerickson|12 years ago|reply
> I loved this article [3] from 2008, that argues that Excel and VBA were responsible for the credit crunch

That's a rather aggressive phrasing of what the article says, which is that many of the calculations involved in the credit crunch were done in Excel/VBA.

[+] filearts|12 years ago|reply
The beauty of Excel is its ability to do many things 'well enough' in a way that is accessible to a lot of people.

My dad put together his architectural plans for a new house in Excel because it is a tool with which he felt comfortable. Some people do calendars in Excel while others try and recreate a full General Ledger system. Because it works well enough, people don't see the need to invest the time in learning a new application and instead invest their time in pushing the limits of the tool.

As someone who builds financial models and who audits those built by others (and is a competent programmer by night), I think that the key source of risk in Excel models is that the tool has no knowledge of intentions. What I mean by this is that if I'm building a cash flow model or if I'm doing a pixel drawing, Excel doesn't care; as users, we are forced to create our own structure and build in our own checks and balances. If I make a balance sheet in Excel that doesn't balance, Clippy won't show up and let me know that things are broken.
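
A toy sketch of the kind of domain-aware check being described, in Python (the account names and figures are invented for illustration): a tool that knows it is looking at a balance sheet can refuse one that doesn't balance, instead of happily recalculating it.

```python
def check_balance_sheet(assets, liabilities, equity, tol=0.005):
    """Enforce the accounting identity: assets = liabilities + equity.
    Raises loudly instead of silently displaying a broken model."""
    total_assets = sum(assets.values())
    total_claims = sum(liabilities.values()) + sum(equity.values())
    if abs(total_assets - total_claims) > tol:
        raise ValueError(
            f"Balance sheet does not balance: assets {total_assets:.2f} "
            f"vs liabilities + equity {total_claims:.2f}"
        )
    return total_assets

assets = {"cash": 120.0, "receivables": 80.0}
liabilities = {"payables": 50.0}
equity = {"retained earnings": 150.0}
print(check_balance_sheet(assets, liabilities, equity))
```

Excel can compute either side of that identity, but only a layer that knows what a balance sheet *is* can insist the two sides agree.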

I've often thought that it would be really amazing if a semantic layer could be built that uses Excel as the calculation backend. This sort of tool could understand the sorts of concepts of financial statements, projections, time-series and other concepts that often show up in financial models. It would have a built-in understanding of the domain-specific models that would let it leverage that understanding to reduce risk in the building of financial models. If I told it that I wanted to add a revenue stream that is tied to the output of production, the tool would connect the dots between the production schedule, any inflation and/or foreign exchange assumptions and would feed changes in working capital according to the associated collection terms, etc...

Before I get too carried away, the point is that this type of semantic layer would be much better at preventing and detecting anomalies and potential errors in the development of a high-risk financial model. Does anyone have experience with any such tools?

[+] taylodl|12 years ago|reply
That's why I always thought it'd be a good idea for Microsoft to expose such a semantic layer via a .NET API. When you can program an Excel sheet using the .NET languages and have access to the broader .NET libraries then all kinds of interesting possibilities start to open up. The fact is business users love Excel and a great number of CRUD applications could be easily built on top of Excel if only we had a reasonable programming environment (keep the VBA, you could create a VBA implementation on top of .NET). That would be win/win for everyone!
[+] opium_tea|12 years ago|reply
In my experience most civil engineering organisations use spreadsheets for the majority of design calculations. Some large organisations even still insist on engineers writing out calcs by hand. For example, a calculation that determines whether or not a retaining wall is of sufficient size will more often than not be completed in Excel. Ditto the calculation that checks the weight-bearing capability of a column or beam in a large building.

Bespoke tools will be bought for tasks too complex or important for Excel: finite element analysis, problems involving non-linear springs, etc. Some time-consuming, repetitive tasks may also be deemed worthy of more automated tools, but on the whole the engineering industry is very much in the dark ages when it comes to modern software approaches. I often wonder what you'd end up with if you introduced a team of computer scientists into a civil/structural engineering company and told them to assist with analysis. I imagine you'd get some pretty innovative approaches to concept screening, design, cost optimisation, etc.

[+] sanarothe|12 years ago|reply
This is what I'm hoping to do, taking a software background into more traditional engineering.

It's just a little off-putting when people talk about the tools they have to use -- when I first started, I would never have expected Excel to be so prominent, and Matlab so absent.

[+] notacoward|12 years ago|reply
Also, you shouldn't use an unstable-at-load web server for important writing.
[+] eggoa|12 years ago|reply
> Yet spreadsheets do not allow testing.

This is an absurd claim. Any good spreadsheet will contain multiple self-tests, either in live formulas or via macros.

[+] Pxtl|12 years ago|reply
Honestly, I don't have a problem with the fact that Excel is used in so many cases - programming for non-programmers is important.

My problem is that Excel is terrible. Its formula system is painful and the formulas are invisible.

It's not that a light user DB/spreadsheet program is a bad idea, it's that Excel made a lot of terrible decisions 20 years ago and now they're married to them.

[+] wikwocket|12 years ago|reply
Let me get this straight. It is hard to review, test and maintain code in Excel workbooks, so we should instead write custom software for these tasks?!

Custom software is, to a non-hacker, the definition of "difficult to review, test, and maintain." You may as well suggest they input their data into a mystery box with some pixies inside who will crunch the numbers for them. :)

Remember, Excel (and spreadsheets in general) ARE custom software, written for this exact need, and with decades of polish and bug-fixing so that even non-wizards can use them.

I agree that we should not fly the space shuttle from an Excel macro. But I feel that suggesting economists should write software instead of use spreadsheets is not only throwing out the baby with the bathwater, but also throwing out the bathtub and the whole bathroom too. Specialty tools exist for this very use case. Let's discuss how to improve them instead of indulging our hacker instinct and reinventing a hammer every time we need to drive a nail.

[+] laurencei|12 years ago|reply
I'm pretty sure patio11 says on one of his blogs that anywhere anyone is using an Excel file for important work, there is a SaaS waiting to be born.

edit: found it:

"My favorite symptom of an unmet need for software is any Excel spreadsheet which is ever updated by one employee, sent to a second employee, updated, and then sent back. Every time that happens a SaaS angel gets its wings."

https://training.kalzumeus.com/newsletters/archive/validatin...

[+] danielweber|12 years ago|reply
It makes programmers' heads spin, but the business world is totally used to doing all its stuff in Excel.

If Excel code is hard to audit, that means someone could write a tool to show all the calculations being performed to get to a result. I'm giving this idea away for free because I'm pretty sure someone has already done it.
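
A rough sketch of what such a tool could look like, using a made-up mini formula format rather than real Excel parsing (real auditing tools read the workbook itself): resolve each cell's references recursively and show every intermediate result on the way to the answer.

```python
import re

# Toy "sheet": cells hold either numbers or tiny "=..." formulas.
# (This formula syntax is invented for the sketch.)
CELLS = {
    "A1": 100.0,
    "A2": 250.0,
    "B1": "=A1+A2",
    "B2": "=B1*0.1",
}

def trace(cell, depth=0):
    """Evaluate a cell, printing every calculation along the way."""
    value = CELLS[cell]
    if isinstance(value, str) and value.startswith("="):
        expr = value[1:]
        # Resolve each referenced cell recursively, showing its work first.
        for ref in re.findall(r"[A-Z]+[0-9]+", expr):
            expr = expr.replace(ref, repr(trace(ref, depth + 1)))
        result = eval(expr)  # fine for a toy; never eval untrusted input
        print("  " * depth + f"{cell} {value} -> {result}")
        return result
    print("  " * depth + f"{cell} = {value}")
    return value

trace("B2")
```

Even this toy version makes the audit trail explicit: every number on screen comes with the chain of formulas that produced it, which is exactly what's hard to see in a grid of computed values.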

Writing a new program from scratch has issues, too.

[+] mgkimsal|12 years ago|reply
The biggest issue I've had with Excel and spreadsheets in general has been "put this on the web". People spend hours/days/weeks building a spreadsheet tool that does all their calculations, then say "put this on the web" to me, and I can't do it. Not in their budget. To them, it should just be some sort of magic, but... I don't know of tools that just 'put' something like that on the web. They want people to be able to interact with it, but 'not download it'. They won't use Google spreadsheets or any Office 365 online stuff.

Maybe there are some tools that can take an xlsx file and make it secure, interactive, and loggable, and are also free (or nearly so)? Am I missing something?

[+] kubiiii|12 years ago|reply
Maybe the problem with Excel is that errors are more likely to stay under the radar. You can refer to the wrong cell, or extend formulas with a relative reference where it should be absolute (or the opposite), without noticing. When programming, logical errors will more often lead to fatal errors when the program runs.

If anything, it's more a matter of using Excel correctly. Tools like Slate for Excel make it easier to audit spreadsheets, especially other people's.

[+] ghshephard|12 years ago|reply
I'll buy that -- the fact that the pretty essential VLOOKUP function defaults to guessing when it can't find a value, rather than defaulting to "couldn't find the value you were looking for", has been the source of endless errors.

Being explicit really reduces the number of errors you run into, and there's a lot of implicit behavior going on in a spreadsheet.
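
The hazard is easy to reproduce outside Excel. A rough Python analogy (toy data, invented tier names) of approximate versus exact matching: with approximate matching, a missing key silently returns the nearest smaller entry instead of failing loudly.

```python
import bisect

# Toy lookup table; keys must be sorted for the approximate mode to make sense.
KEYS = [100, 200, 300]
VALUES = ["bronze", "silver", "gold"]

def lookup_approx(key):
    """Approximate-match style: returns the largest key <= the one requested,
    even when the exact key is absent -- a plausible-looking wrong answer."""
    i = bisect.bisect_right(KEYS, key) - 1
    if i < 0:
        raise KeyError(key)
    return VALUES[i]

def lookup_exact(key):
    """Exact-match style: fail loudly when the key is absent."""
    i = bisect.bisect_left(KEYS, key)
    if i == len(KEYS) or KEYS[i] != key:
        raise KeyError(key)
    return VALUES[i]

print(lookup_approx(250))   # silently returns "silver" for a key that isn't there
# lookup_exact(250) would raise KeyError instead
```

The approximate mode is occasionally what you want (tax brackets, tiered pricing), which is presumably why it exists; the trouble is making it the default, so the silent behavior is what you get unless you remember to ask for the loud one.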

[+] filearts|12 years ago|reply
I think you'll find that VLOOKUP is not used much by people deeply invested in the modelling field. It is dangerous in the same way as referring to cell addresses in macros. (This is where named ranges are particularly useful.)

As soon as the structure of your lookup table changes, your VLOOKUP formulas risk being invalidated. After being burned a bunch of times, I've switched to INDEX or an INDEX/MATCH combination to accomplish the same sort of thing. With INDEX/MATCH, you are required to be explicit in selecting both the range of the data and the range of the key.

[+] jackbravo|12 years ago|reply
The errors are not produced by using Excel, according at least to the Financial Times, but by manipulation of the data.

http://www.ft.com/cms/s/2/e1f343ca-e281-11e3-89fd-00144feabd...

Other people have praised the statistical work done on the book:

- http://www.nytimes.com/2014/03/24/opinion/krugman-wealth-ove...

- http://www.telegraph.co.uk/culture/books/bookreviews/1081616...

So given the controversial nature of the subject, I wonder if we could point to an objective analysis of the work.