top | item 24070385

Scientists rename human genes to stop MS Excel from misreading them as dates

621 points| virde | 5 years ago |theverge.com | reply

506 comments

order
[+] LatteLazy|5 years ago|reply
It's honestly amazing that Excel hasn't fixed this issue. It's pisses off an enormous number of users especially in basically any non-US country (even if 01/02 is a date, it isn't the second of January in most of the world...)
[+] T-hawk|5 years ago|reply
Excel is stuck. It can't change this behavior if it cares about backwards compatibility. There are uncountably many recorded macros and all sorts of scripts and data tools everywhere, that implicitly and inadvertently depend on behavior like this. Like Javascript's warts, you have to declare these as features and live with them.

Of course Microsoft has thought about this, it's silly to think they never have and we're smarter. Yes, this behavior is annoying and US-centric, they know that. But they also know that breaking compatibility with all these scripts and macros would be the worse problem in the larger picture. That picture is huge, it would be on the order of the scope of the Y2K effort to modify every code everywhere that's ever touched Excel dates.

Would you want Excel to introduce a "quirks mode" to handle this sort of thing?

[+] owl57|5 years ago|reply
I don't even know why anyone still uses this notation in today's connected world. 2020-08-06 is ideal. Russian standard (06.08.2020) is uglier (digits are not sorted by significance) but still unambiguous. But that slash horror? Why would anyone continue inflicting this on themselves and their peers!?
[+] nikk1|5 years ago|reply
There are dozens of auto-formatting "features" that piss me off each and every day across all of Microsoft's products.
[+] danso|5 years ago|reply
I personally dislike this behavior in Excel, as I'm accustomed to working with plaintext data and using pandas/strftime to get datetimes as I need them. But I also figure that my situation as a technical user is likely in the minority, compared to people who are using spreadsheets every day to do manual data entry.

That said, I completely agree with the tangential issue of U.S. dates being misleadingly different in format compared to non-U.S. Always an issue when teaching data/spreadsheets to a class with at least one non-American – but also a good reason to teach them the value of ISO8601 :)

[+] pintxo|5 years ago|reply
For Germany this converts to the first of Feb, and this way of entering dates is just so convenient, as you can type the whole date just using the keys from the num block. We do actually support this format in our app for date entry, simply because it's so convenient.
[+] vsareto|5 years ago|reply
>It's honestly amazing that Excel hasn't fixed this issue.

I'm surprised there hasn't been a dotfile option added yet

[+] mattnewton|5 years ago|reply
Why hasn’t the field standardized around another tool? Is the list of excel plugins that something libreoffice couldn’t replace it, where they could just make add the options they need? My guess is the field is mostly lacking people interested in working on tooling like this?
[+] stjohnswarts|5 years ago|reply
But you can select from dozens of formats in the cell setup? Not sure I understand the issue?
[+] airstrike|5 years ago|reply
Just type '01/02 or ="01/02" and get on with your life
[+] qayxc|5 years ago|reply
That's because it's not an issue at all.

It's people using a tool without knowing said tool. You can disable auto-formatting (or even better yet - set the column data type) with a simple click.

[+] javier123454321|5 years ago|reply
Honestly though, dates make more sense being month first, that way if you are trying to discover which one comes first, 10th of february or 10th of january, you only have to go to the first parameter in format 01/10 vs 02/10 instead of 10/01, 10/02. Logically, the most broad data item should go first.
[+] Duller-Finite|5 years ago|reply
Excel isn't the program of choice for most scientists and computational biologists, who typically use R, python, or command line tools. However, we often get data from other scientists or reanalyze data from other groups that can have these errors. It's so frequent of a problem that there are scientific papers about it [1].

[1] https://genomebiology.biomedcentral.com/articles/10.1186/s13...

[+] aden1ne|5 years ago|reply
This ignores the reality that one will ultimately have to interact with people who have no understanding of any of these tools. Unless you happen to work in a pure computational biology group, one _will_ have to interact with lab workers, biologists with no training (or understanding of) in R or python, doctors, etc. All these people will know excel.
[+] Dumblydorr|5 years ago|reply
I had to use Excel for 100% of my publications and posters in medical services research. Either the data is in Excel or Excel is a tidy place to put data dictionaries. While I'd love to use only R, most of my collaborators wouldn't be able to use it, it's niche, whereas Excel is the lingua franca of data analysis.
[+] prepend|5 years ago|reply
Those tools aren’t exclusive.

I use python almost exclusively for data analysis but still open files in excel to view them for lots of reasons.

Nothing in excel is reproducible, but it’s still on all my computers.

Did an analysis a few years ago of the programs run At my organization by job series and scientists run a lot of excel.

[+] xkgt|5 years ago|reply
I spent 30 minutes today convincing a Data scientist why she shouldn't use excel to store her interim data and that even an untyped data forms such as csv or json would be a better medium compared to an excel document.
[+] noobermin|5 years ago|reply
I feel that excel should still do the right thing which is annoying, but the change also helps people avoid confusing the names with dates, for example, in the excel sheet has dates in it well.
[+] mcv|5 years ago|reply
> "Why, exactly, in a fight between Microsoft and the entire genetics community, was it the scientists who had to back down?"

Back down? Or pick a better tool. If Excel proves to be an unreliable tool for your job, use a better one. Alternatives exist, ranging from Google Docs, and LibreOffice, to simpler light-weight spreadsheets. Or possibly more specialist tools.

Why does everything always have to be put in Excel if Excel is such a poor tool for so many things?

[+] acid__|5 years ago|reply
Excel may have failed in this specific task, but let’s not pretend like its functionality doesn’t run circles around Google Docs and LibreOffice.

Excel is a “pretty darn good” tool for 95% of tasks. If your work has highly varied workflows, then that flexibility more than makes up for its failures on the last 5%.

If you have very specific workflows on the other hand, you may find value in replacing Excel with a specialist tool. But let’s not pretend that specialist tools don’t also have their own shortcomings; at best they’ll achieve 99.9% coverage of tasks.

[+] otherme123|5 years ago|reply
LibreOffice AND Google Docs fall in the same exact spot than Excel here. Try it. Type "SEPT-2" or "SEPT2" in any of them and see what happens.
[+] tqi|5 years ago|reply
Characterizing this as a "fight between Microsoft and the entire genetics community" is... a choice. But I guess Tech is this season's "big bad", so every subplot has to tie back.
[+] toast0|5 years ago|reply
Better tools for managing genomic information are probably out there, but if everyone picks Excel (or workalikes that all do automatic date formating too), then hoping people switch is tilting at windmills.

Unilaterally declaring you will not use names that confuse Excel will fix the problem (until we get new month names, anyway), and not require anyone outside their sphere of influence to comply.

I had the same situation at work, as an aquired company we had servers named devX, the new parent had servers named devX and pushed ssh config files to our laptops that made it impossible to connect to our servers; we asked them to fix it the first couple times, but eventually renamed our servers to vedX to avoid the issue. It wasn't the right solution, but it was a stable solution and let us get on with our lives instead of fighting with IT.

[+] asdff|5 years ago|reply
The vast majority of people who use tabular data a lot in biology are already just using python or R primarily, and most do this work on a cluster. This article is what happens when you send your .csv to a PI who hasn't bothered to brush up on any computer literacy after somehow mastering punch card programming, or the undergrad whose breadth of understanding beyond the web browser starts with MS office and ends with g suite.
[+] HiFaraz|5 years ago|reply
But they did pick a better tool: they changed the name. Nomenclature is also technology.
[+] hermitcrab|5 years ago|reply
Excel is an amazing tool. But it also has some significant shortcomings:

* A well known tendency to mangle date and gene data under the guide of being 'helpful'. * Easy to making mistakes when cutting and pasting cells. * Difficult to see what is going on in a spreadsheet. * Poor handling of CSV files.

Some of these shortcoming are inherent to spreadsheets. Others are specific to Excel, but hard to overcome due to the weight of backward compatibility.

I have written a product for transforming and analysing tabular data (https://www.easydatatransform.com) that tries to overcome these issues:

* Doesn't change your input data file. * Doesn't re-interpret your data, unless you ask it to. * See changes as a visual data flow. * Operations happen on a whole table or column. * Good handling of CSV files.

Also it doesn't try to do everything Excel does.

It is a fairly new tool. Would appreciate some feedback.

[+] gverrilla|5 years ago|reply
I'm a business guy, so I don't know much about it, but looks very useful and easy! I have been off the game for 5 years now, but I used to run a magento ecommerce, and I think your software might have helped managing products listings and stuff like that at the time. You might wanna take a look at this costumer segment, particularly small and medium businesses. They probably have a process for this already, but your software might be a good replacement, even though they won't be actively looking for it because they already have something that works. Good luck!
[+] tucaz|5 years ago|reply
Website looks great. Will try shortly and I'm more than willing to give you my money if it delivers. We need a product like this.

Edit: great call out to 7 days of non consecutive use

[+] zuno|5 years ago|reply
Thanks. I just checked the video. Looks neat! Excited to try it out.
[+] leecarraher|5 years ago|reply
Final Jeopardy : Trebeck: You know what, how about you just write down a number, any number at all. Could be a 1 or a 2, perhaps 3... and excel you answered; A smiley face emoji, simply stunning.

Excel is no longer motivated by the original intention of a spreadsheet, and now caters to the lowest common denominator, a piece of graph paper. As such MS has shifted focus from doing calculation to text and graphics layout tool. white text copied from a terminal : white text white background you got it! comma separated numbers : default a long string with commas in it want a plot : it is in the insert menu for some reason, since plots and numbers are no longer excels raison d'être

[+] rkachowski|5 years ago|reply
My mind is blown that Excel's usability is so bad that the representation of the human genome itself has to adapt around it's undesired behaviour.

As in, the history of genetics research is now irreversibly linked with the shortcomings of this one software product, which just happens to be incapable of describing the genetics of the organisms that created it.

[+] mmcgaha|5 years ago|reply
I hate to sound like a salty old IT guy, but here we go. It is not the fault of Excel that people are using it wrong. They have the ability to import the data as text but they skip that step all together. If the user does not say up front what the column is, Excel has to guess. If Excel didn't try to guess, someone would be making a comment on how bad usability is when an obvious date field was getting interpreted as text.
[+] throwaway936482|5 years ago|reply
Excels usability isn't bad though. Which is why everyone uses it. It falls down in this (and lots of other cases) but if you want to see what good usability looks like, look at what people use.
[+] ep103|5 years ago|reply
My favorite is still that Excel can't handle dates before 1900
[+] Cactus2018|5 years ago|reply
In a previous version of MS Excel, after opening a CSV file, Excel would silently write the 'interpreted' version to disk. So much trouble, simply from previewing a CSV.
[+] glofish|5 years ago|reply
Infuriatingly,

the paper announcing the new guidelines of renaming genes, a work of fundamental importance to all scientists in the world, cannot be read without an expensive subscription to the journal.

https://www.nature.com/articles/s41588-020-0669-3

Thanks science (sarcasm!)

[+] jordigh|5 years ago|reply
To people asking, "why do they use Excel?" that's like asking "why must we be subjected to gravity?"

The whole world's data ultimately comes from or ends up in an Excel spreadsheet. Sure, we might use other intermediate data storage methods, but in the end it's going to go into some scientist's or some politician's computer, and by golly it's gonna be in Excel. Trying to rally against Excel is like trying to rally against fundamental forces of nature.

This is just an example of that fundamental force winning.

[+] lordnacho|5 years ago|reply
The problem is that your average business user of Excel thinks type safety is a bug, not a feature. If Excel enforced types instead of guessing on your behalf, a lot of people would complain.

And it would be really hard to explain why it is sometimes useful to have constraints to someone who normally sees Excel's flexibility as its main strength.

I don't think I've ever seen a non coder use Excel in a sensible way: maintainable, easy to change, consistent meanings of entities, simple to understand. It's always a ball of spaghetti, even for pretty small projects. Loads of VLOOKUPs, external DLLs, buttons everywhere, vba files galore. Plus they lay out the cells haphazardly.

[+] jrott|5 years ago|reply
Yeah excel produces hairballs so easily. From a coder perspective I've only seen excel workbooks that are bad or terrifying.

On the other hand a ton of people that don't write software for a living and have no interest in code manage to produce things that help them do their job and automate a ton of tedious stuff.

[+] yummypaint|5 years ago|reply
Why in the world is excel the application of choice? Last time I tried to use it (for much less complicated things than genetics), it choked and became unusable when the filesize exceeded about 6MB. I have yet to encounter a spreadsheet oriented task that isn't better implemented in gnumeric. Maybe we should also shorten all publications so adobe reader can display them without crashing?
[+] c3534l|5 years ago|reply
I've always stood by the principle that a computer should never "correct" human input without asking and this is a great example of that. There should be a prompt asking if you want to convert the data, and there should always remain a way to undo "corrections," but it should never happen without the user knowing about it or wanting it to happen.
[+] mnw21cam|5 years ago|reply
The problem is it isn't just dates. If you load a list of genomic variants (read: mutations) into Excel, then the standard way to describe whether a person has the variant or not is to use "0/0" for no, "0/1" for yes heterozygous, and "1/1" for yes homozygous. Guess what gets auto-converted into the first of January.
[+] alphanumeric0|5 years ago|reply
Research software dev here. This came as a huge shock to me when I started at my job. I work with very smart, dedicated people performing cancer research, why would they put up with this affecting their productivity? Humans really are adaptable creatures.

After a few months of working there my boss handed me 3 or 4 Excel spreadsheets to compare to ensure a recent change I made hadn't affected our data (we don't have much in the way of automated tests either). As a software developer, this was a deeply troubling request.

One option was to load them in to database tables so that I could perform SQL queries against the data (Postgres has COPY that works with CSVs), which isn't hard and probably the path most people should take, but I didn't want to write table definitions.

I ended up using https://github.com/BurntSushi/xsv (I am not affiliated with the project in any way). It's a command-line tool written in Rust that performs queries/joins/manipulation/basic analysis against CSV/TSV files. While not as analytically powerful as Excel or Postgres, I was able to verify the data was good and pipe out results into another file without writing any custom code, and without opening a single file.

[+] t-c-h|5 years ago|reply
As a bioinformatician I'm not too fond of this shift. Software should be sculpted around our needs, not the other way around. It's basically submitting to the fact that we've stubbed our toes hundreds of times to the exact same rock and never learned.

But then again, I use Excel rarely, usually only at the very end of some analysis (even then I prefer R/Python libraries for visuals). So I do have sympathy for wet-lab researchers who rely heavily on Excel.

[+] foresto|5 years ago|reply
Somewhat tangential:

If you use software in American English but dislike American date and time formats, you might see if your OS respects the en_DK locale setting. It works on most recent linux systems I've tried.

$ LC_TIME=en_US.UTF-8 date '+%x %X'

08/06/2020 01:00:00 PM

$ LC_TIME=en_DK.UTF-8 date '+%x %X'

2020-08-06 13:00:00

[+] numpad0|5 years ago|reply
Excel’s problem is it destroys the original keystrokes.

Anyone know why? It makes little sense to me.

[+] Brett_S|5 years ago|reply
If the scientists had asked someone who knew Excel well, then they would have been told to prevent autocorrect from running enter ’MARCH1 with the apostrophe at the start.
[+] zw123456|5 years ago|reply
There is a super simple work around for this and it is all over the web if you search for it. Open a blank sheet, go to data, click on From Text, that allows you to import data identifying the type for each column. Yes, it is a couple more clicks but for 99% of people the auto-formatting is probably nice. I have shown this to people a million times at work and they are always amazed. But honestly a quick google search provides the solution.
[+] csours|5 years ago|reply
Excel also clobbers long numbers, which has caused all manner of confusion for serial number audits.

To prevent this, set your whole sheet to Text before any other steps.