12423gsd | 11 years ago

The RStudio guys have really made R a pleasure to use. Thank you guys!

The core language is still a confusing mess (I'm never sure when to use a matrix, a data frame, or a list), but if you use their tools you can ignore it for the most part.

In under 10 lines you can massage data and generate fantastic graphics.

A little off topic, but does anyone know what their business model is? Are they going to run out of money and burn out in a year or two?

hadley|11 years ago

If you're confused about R's data structures, please read http://adv-r.had.co.nz/Data-structures.html and let me know if it doesn't help.

And no, we're not planning on burning out. We currently sell three things:

* RStudio Server Pro. A commercial version of the open-source server that provides stuff that corporate IT wants (e.g. monitoring, more auth options, ...)

* Shiny Server Pro. A more flexible version of the open-source shiny server that offers more configurability (e.g. number of R processes per app), and again other stuff that corporate IT wants.

* A license to use the RStudio desktop IDE, for companies that don't want to use AGPL software.

smachlis|11 years ago

Here's how I think of it, which has been working for me:

matrix - If you have data that would make sense to be in a spreadsheet-type format and all your data are numbers.

dataframe - If you have data that would make sense to be in a spreadsheet-type format and some columns are numbers but other columns are something else (character strings, dates, TRUE/FALSE); but each column is only one thing. That is, you have one column that's all dates, another column that's all numbers, yet another column that's all character strings, etc.

list - if you need to mix data types within a certain entity (vector or column of data).
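A minimal sketch of those three cases in R, in case it helps to see them side by side. The variable names and values are made up for illustration and are not from the comment above.

```r
# Matrix: a rectangular grid in which every cell has the same type (numbers here).
m <- matrix(1:6, nrow = 2)

# Data frame: spreadsheet-like; types differ between columns but not within one.
df <- data.frame(
  day    = as.Date(c("2014-11-24", "2014-11-25")),
  visits = c(120, 95),
  source = c("search", "direct"),
  stringsAsFactors = FALSE
)

# List: each element can have a different type (and a different length).
l <- list(42, "forty-two", TRUE, c(1.5, 2.5))

dim(m)             # rows and columns of the matrix
sapply(df, class)  # one class per column
sapply(l, class)   # one class per element
```

A data frame is itself a list of equal-length columns, which is why `sapply` works the same way on `df` and `l`.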

hadley|11 years ago

Unless you're doing linear algebra (or really care about memory usage), you almost never need to use a matrix in R.

jowiar|11 years ago

To piggyback on what hadley said a bit, I find thinking of a data frame as a "collection of records", and a matrix as "two dimensional data" to be a bit better.

One useful heuristic is to ask, "Does it make sense to sort this data by something?" If so, you have a data frame. If instead you want to perform matrix math on it (inverting it, multiplying it by another matrix, reducing it, etc.), you have a matrix. Things that I use a matrix for can generally also be expressed as a data frame with columns rowId, colId, and value. If it doesn't make sense in that format, a matrix is generally not the appropriate structure.
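A rough sketch of the rowId / colId / value idea, assuming a small dense numeric matrix. The column names follow the comment; everything else (`m`, `long`, `back`) is made up for illustration.

```r
m <- matrix(c(2, 0, 1, 3), nrow = 2)   # filled column by column

# One record per cell: row index, column index, and the cell's value.
long <- data.frame(
  rowId = as.vector(row(m)),
  colId = as.vector(col(m)),
  value = as.vector(m)
)

# Sorting records is a data frame operation...
long[order(-long$value), ]

# ...while inversion and multiplication are matrix operations.
solve(m)   # inverse
m %*% m    # matrix product

# Rebuild the matrix from the records (assumes every cell is present).
back <- matrix(0, nrow = max(long$rowId), ncol = max(long$colId))
back[cbind(long$rowId, long$colId)] <- long$value
```

If the round trip from `m` to `long` and back feels natural for your data, a matrix is probably a reasonable fit.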

grayclhn|11 years ago

I'd amend that a little: use a matrix when you're actually calculating statistics (internally to the function). Clean your data so it always fits in a data frame when you load it. Lists are for representing things like data scraped from html before converting it to a data frame.
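As a hedged illustration of that workflow: the sketch below starts from a list (standing in for scraped data), cleans it into a data frame, and only switches to matrices inside the function that computes the statistic. The records and the `ols` helper are hypothetical, and least squares is just one example of a statistic.

```r
# Hypothetical scraped records: a list of lists, one per scraped item.
scraped <- list(
  list(name = "a", x = 1, y = 2.1),
  list(name = "b", x = 2, y = 3.9),
  list(name = "c", x = 3, y = 6.2)
)

# Clean once, at load time, into a data frame.
df <- do.call(rbind, lapply(scraped, function(r) {
  data.frame(name = r$name, x = r$x, y = r$y, stringsAsFactors = FALSE)
}))

# Inside the statistical routine, work with matrices:
# ordinary least squares via the normal equations.
ols <- function(data) {
  X <- cbind(1, data$x)           # design matrix with an intercept column
  y <- data$y
  solve(t(X) %*% X, t(X) %*% y)   # coefficients as a 2 x 1 matrix
}

beta <- ols(df)
```

The caller only ever sees the data frame going in and the coefficients coming out; the matrices stay internal to `ols`.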

wesleyy|11 years ago

It's always great when you spend 10 hours trying to debug something and then find out from a mailing list that it's actually a bug in R. :(