gaha | 4 years ago

The way I love doing these kinds of things is by literally using three lines of R code:

  library(sqldf)
  tab1 = read.csv("file1.csv")
  sqldf("select * from tab1")

sqldf [1] is a library that lets you query R data frames as if they were SQL tables (it uses SQLite under the hood). I don't have much experience with programming in R itself, but you barely need any. This approach has the advantage of being very flexible, and you get the power of ggplot, (IMHO) the best plotting library, along with it. Of course you can also do more complicated things like joining multiple data frames, and basically everything else SQLite supports, and then store the results back in R data frames.

This workflow works best with an IDE that lets you execute single lines or blocks of code, like RStudio. That also gets you a nice GUI, but there are plugins for Vi/Emacs that work very well too.

[1] https://cran.r-project.org/web/packages/sqldf/index.html
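To make the "joining multiple dataframes" point concrete, here is a minimal sketch using in-memory data frames instead of CSV files (the table and column names are made up for illustration):

  # sqldf treats any data frame in scope as a SQL table
  library(sqldf)

  orders    <- data.frame(id = 1:3, customer_id = c(1, 2, 1), amount = c(10, 20, 30))
  customers <- data.frame(customer_id = 1:2, name = c("Ann", "Bob"))

  # Plain SQLite SQL; the result comes back as an ordinary data frame
  totals <- sqldf("
    select c.name, sum(o.amount) as total
    from orders o
    join customers c on o.customer_id = c.customer_id
    group by c.name
  ")
  totals

Because `totals` is just a data frame, you can pass it straight to ggplot or write it back out with write.csv.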

EDIT: code formatting

RobinL|4 years ago

Check out the R bindings for DuckDB[0]. You should find that it does the same thing (i.e. run SQL against a dataframe/file on disk) much faster for many SQL operations.

[0] https://duckdb.org/docs/api/r
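A minimal sketch of the same idea with the duckdb package (the data frame here is made up; duckdb_register exposes it to DuckDB without copying):

  library(duckdb)  # also loads DBI

  con <- dbConnect(duckdb())

  tab1 <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))
  duckdb_register(con, "tab1", tab1)

  res <- dbGetQuery(con, "select avg(y) as avg_y from tab1")

  dbDisconnect(con, shutdown = TRUE)

DuckDB can also query CSV or Parquet files on disk directly by naming the file in the FROM clause, which is where the speedup over sqldf's row-by-row SQLite loading really shows.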

legg0myegg0|4 years ago

Came here just to recommend DuckDB! :-) Huge fan. It's unreasonably fast for how easy it is to use.

nojito|4 years ago

If you’re going to use R you might as well learn a bit of dplyr or data.table.

You get that autocomplete goodness, and the vast majority of SQL can be reduced down to single lines.
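For example, a GROUP BY aggregation collapses to one piped line in dplyr (the data frame and column names below are made up for illustration):

  library(dplyr)

  tab1 <- data.frame(grp = c("a", "a", "b"), value = c(1, 2, 3))

  # SQL: select grp, sum(value) as total from tab1 group by grp
  tab1 %>% group_by(grp) %>% summarise(total = sum(value))

And because grp and value are real columns of a known data frame, your editor can autocomplete them as you type.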