
Rsuite – R development and data science platform

97 points| wjak | 6 years ago |github.com | reply

113 comments

[+] amirmasoudabdol|6 years ago|reply
I’ve been using R for a while in my current position, alongside some other programming languages, Python and C++. R is by far the hardest to predict and read. Rstudio is terrible. It’s a wrapper around a “web app” and that simply doesn’t work well for something as complicated as an IDE. To give an example, Rstudio does only one thing at the time, you are running a code, you cannot open a data frame even to look at it. Rstudio doesn’t behave at all like any other IDE that you’ve seen either. Try to increase the font size and the whole IDE scales up!

R by itself is a mess, and I don’t think I have to say much about that. R community is big and that’s good and bad. It’s good because amazing people are developing amazing packages for it. It’s bad because there are a lot of bad packages. It’s a lot like JavaScript community. I have a feeling the community has started to reward “having a package”, and everyone has a package.

Besides the quality of R packages and R being a strange programming language, R gets the job done. However, if your job is anything beyond some statistics and data processing, then good luck. I’m not saying that you cannot achieve what you want to achieve using R; however, good luck reading R code. I found it extremely hard to read R code, and so far 90% of the code that I’ve encountered has little to no comments.

[+] epistasis|6 years ago|reply
> R being a strange programming language

I'm probably an outlier, but I have to say that the language itself is one of my favorite things about R.

Vector based, super powerful indexing of vectors, functional programming basics, lazy parameter evaluation, super convenient parameter matching and defaults, all these things make it super productive for me and let me deal with data far better than other languages. Matlab is similar in its ability to deal with data, but that's a language that feels far clunkier to me. Python has caught up with some of its packages, but it definitely feels bolted on instead of native to the language.

[+] psv1|6 years ago|reply
> To give an example, Rstudio does only one thing at the time, you are running a code, you cannot open a data frame even to look at it.

This isn't specific to R or RStudio. Start running a slow process in your Python IDE of choice, and while it's running try to execute df.head() to view some data frame - you won't be able to see it regardless of the language or IDE (and for a good reason).

[+] CreRecombinase|6 years ago|reply
Even if your job is beyond statistics and data processing, you're probably using R because the core of what your team is doing is statistics and/or data processing. If that's the case, and if you're the computationally sophisticated member of the team, then shouldn't the onus be on you to understand/adapt to what your less "sophisticated" peers are using?
[+] RA_Fisher|6 years ago|reply
There are a lot of really fantastic packages, too. Many are the only implementation of a certain stats tool in the world.
[+] ekianjo|6 years ago|reply
> It’s a lot like JavaScript community.

Erm, not at all. You don't need a hundred packages to run a trivial application. Even R-base is reasonably powerful, and it goes to a completely different level once you use tidyverse as a layer to code everything for R.

[+] sxv|6 years ago|reply
Felt the same way for years coming from the python world. The R for Data Science book[0] was a game changer in making R enjoyable for me.

[0] https://r4ds.had.co.nz/

[+] mslip|6 years ago|reply
If you want to look at data frames as you run sections of your code, you will have to use R Markdown chunks.
[+] laichzeit0|6 years ago|reply
How do you guys get R predictive models into production? Last I used Plumber to put a REST API in front of it then discovered R is a single threaded runtime so effectively you can only go 1 request at a time. I guess the only option is to containerize and run many instances with a load balancer in front? I develop on a Mac so I can’t go the Microsoft R server route and I don’t want to embed myself into some commercial solution, e.g. Rsuite. You can trivially do this with the Python ecosystem.

My feeling is that R is great for anything that doesn’t need to be operationalized into production (monitoring, security, logging, scaling, performance, etc). There are so many good ML/stats libraries in R and most books seem to use R (when written by academics) but it feels like these people have never had to put anything into production.

[+] CapmCrackaWaka|6 years ago|reply
It depends on what you mean by 'production'. I've had great success setting up my data collection, engineering and predictions in batch processes. I agree though, I would never try to use R with a REST API, but I don't think it was ever designed for that.

As a general rule of thumb, if something needs real time predictions or I need deep learning libraries, I use Python. R is for anything else.

[+] meztez|6 years ago|reply
R is like any other language; we have a few REST APIs in production for live prediction. We use the rocker Docker image with xgboost and plumber, and data.table to do pre-prediction data wrangling. Hosted on GCP Kubernetes, using 0.25 CPU and 250 mem, the API is able to do around 40 requests per second per pod. Multiple models; both have more than 1,000 trees.
[+] demirev|6 years ago|reply
I can highly recommend RestRserve [0] for bringing R models into production (it forks every request so scaling up is easier than with Plumber). I use it regularly for various projects and I have had minimal issues with it.

[0] https://restrserve.org/

[+] wjak|6 years ago|reply
R is single threaded. The same goes for Python. We use Kubernetes for scaling. But it is not for all applications, of course. R can be put into production. Rsuite is one of the solutions that helps with that.
[+] proverbialbunny|6 years ago|reply
ymmv, but many of the libraries R uses run on multiple languages, so you can take the models built in R and run them in another language (usually Java).

Python is single threaded as well. Like Python, R can be made multi threaded, and like Python, R can be productionized without having to convert it into another language.

One possible implementation is a pool of R workers. Each request calls an R worker. So if your pool is 100 and you get 20 requests from 20 different users at once, all 20 will be run simultaneously. Likewise, many tasks can and should be cached. Consider Memcached or similar.
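
The worker-pool-plus-cache idea above can be sketched roughly as follows. This is a minimal illustration in Python, not anyone's production setup: `predict` is a hypothetical stand-in for a model call (in a real deployment each worker would more likely be a separate R process or container), and the pool size of 4 and the cache size are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


def predict(features: tuple) -> float:
    """Hypothetical stand-in for a model call; a real worker would invoke the model here."""
    return sum(features) / len(features)


@lru_cache(maxsize=1024)
def cached_predict(features: tuple) -> float:
    """Cache results so repeated identical requests can skip the model call."""
    return predict(features)


def serve(requests: list, pool_size: int = 4) -> list:
    """Hand each request to a worker from a fixed-size pool; results keep request order."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(cached_predict, requests))


# Three requests, two of them identical; inputs are tuples so they can be cache keys.
requests = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (1.0, 2.0, 3.0)]
results = serve(requests)
```

Threads keep the sketch short. For CPU-bound work in Python the pool would need to be process-based (for example `ProcessPoolExecutor`), and a shared cache such as Memcached would replace the in-process `lru_cache` once there is more than one worker process.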

[+] kusmi|6 years ago|reply
I always used NiFi.
[+] glofish|6 years ago|reply
R, unfortunately, is also one of the most ill-designed yet popular programming languages in existence. I would strongly recommend people to steer away from it. If you cherish your sanity stay away from using R!

Moreover, after seeing what my colleagues publish as scientific R programs, I came to believe that science itself is bottlenecked by the large-scale adoption of R and the sloppy, inconsistent and bug-infested programming practice that it encourages.

R does a few things well - cross-platform, plotting works on all platforms, packaging works well. But for actually programming it is atrocious.

[+] CreRecombinase|6 years ago|reply
What is "actually programming"? Is fighting with your package manager (I'm looking at you, Python) "actually programming"? Is re-implementing functionality that exists elsewhere in the hipster language du jour (e.g. Rust, Julia) "actually programming"? I totally concede that R itself is a fairly unremarkable lispy mostly functional programming language. What makes it stand out is its emphasis on immutable, in-memory, array-based data structures. This means that 1) it's very straightforward to wrap highly performant C/C++/Fortran libraries 2) despite being a dynamically typed language, it's usually quite straightforward to reason about the type/shape of the inputs and outputs of a function 3) individual functions from one package can often be easily combined with functions from another package. I totally get it if any of this isn't your thing, but to write off a whole ecosystem as "ill-designed" (without literally any argument besides "my co-workers, and scientists in general, are stupid") is pretty lazy.
[+] curiousgal|6 years ago|reply
What are you on about? R has a specific use, which is statistics and data science. For those purposes it reigns supreme. Even for developing dashboards, R-Shiny is a breeze compared to Python's Dash. R is awesome.

Also, to quote a comment[0] of yours from 2 days ago:

What you are saying is that since you prefer something everyone should do the same.

And what you prefer is the correct choice for everyone ...

[0] https://news.ycombinator.com/item?id=21774917

[+] Mikeb85|6 years ago|reply
R is a scripting language, most of the underlying infrastructure is written in Fortran, C and C++. R is also designed for stats, not writing software. Of course you're going to have a hard time if you treat it like a real programming language. That's why R provides easy interop with other languages.

But R also makes a lot of the tasks you do in data science far easier than it would be in a 'real language'.

[+] whyhow|6 years ago|reply
This is silly. I bet your colleagues are making poor R programs because they are not well versed in programming, not because R is inherently worse than anything else.

My experience is that people who make bad R programs also make bad python programs. I don't think you should blame a tool for issues caused by the programmer.

[+] wodenokoto|6 years ago|reply
R is a great language with a powerful, but very dated standard library.

But don’t worry about R bringing science down - the scientific community can also write terrible python code.

[+] closed|6 years ago|reply
I strongly disagree. And in general, it seems like poor quality code in science is often not because of the language, but because scientists rarely lose their jobs when code breaks.

There are many books that cover how to develop in R in detail, and they are no less thorough than treatments of the subject in other languages (e.g. Hadley's books are as good as any I've read for python).

Many issues around inconsistency, etc, in language design (mostly how base functions / data types behave) have very clean, consistent implementations in libraries like rlang.

The main differences I see when comparing R vs python package code, that affect style are...

1. Most R operations are immutable.

2. R often uses single dispatch, rather than putting methods on a class object.

3. In R, vectorised behavior is often the norm.

4. R functions can choose to use lazy evaluation (it is usually very clear when this happens in e.g. tidyverse packages).

These issues are covered in detail in books like Hadley's Advanced R.
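
For readers coming from Python, the four points above can be loosely mimicked without leaving Python. This is only an analogy under stated assumptions, not R semantics: `describe`, `scale` and `first_or` are hypothetical names, `functools.singledispatch` stands in for R-style dispatch on the first argument, a list comprehension stands in for vectorised arithmetic, and a zero-argument lambda stands in for a lazily evaluated argument.

```python
from functools import singledispatch


@singledispatch
def describe(x) -> str:
    """Generic function: the implementation is chosen by the type of the first argument."""
    return f"object of type {type(x).__name__}"


@describe.register
def _(x: int) -> str:
    return f"integer {x}"


@describe.register
def _(x: list) -> str:
    return f"list of length {len(x)}"


def scale(values: list, factor: float) -> list:
    """Return a new list instead of modifying the input (immutable style),
    applying the operation to every element at once (vectorised style)."""
    return [v * factor for v in values]


def first_or(values: list, default_thunk):
    """Lazy-style argument: the default is only computed if it is actually needed."""
    return values[0] if values else default_thunk()


original = [1, 2, 3]
scaled = scale(original, 2)   # original is left untouched
calls = []                    # records whether the default was ever evaluated
picked = first_or(original, lambda: calls.append("evaluated") or 0)
```

Here `describe` is extended by registering new functions rather than by adding methods to a class, which is the stylistic difference point 2 describes.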

[+] ineedasername|6 years ago|reply
This comes across as just another comment on why one language is "bad" when, as always, it all comes down to trade offs & preference when choosing a tool for a job.

The thing is, it's easy to write bug-ridden sloppy code in any language. Bemoaning R as a language because of these flaws, occurring due to rapid adoption, ignores the reasons why R has seen wide-scale adoption.

R has had an extreme democratizing effect on access to tools that facilitate data science. Previously, tools for data science were either massively expensive or had a prohibitively high price tag attached.

This means that many non-programmers are coming to R, and I maintain that the problems the parent post sees with R stem from that fact. As a result, any language that achieved that sort of layman (to programming) appeal would have the exact same bug or sloppy code fallout. You cannot have the momentum that led to such an accessible tool without also having the same consequences. Rather than demonize the tool for this, we should recognize the positive dynamic at play and simply help guide users to better practices or improvements that would fix the issues.

[+] clircle|6 years ago|reply
I'm on the opposite side of the fence, but I'd love to hear some specifics on how R is ill-designed and encourages buggy programs.
[+] kgwgk|6 years ago|reply
> one of the most ill-designed yet popular programming languages in existence

What would the others be? Python is one, I guess.

[+] minimaxir|6 years ago|reply
Base R is bad; R augmented by other packages (e.g. tidyverse and data.table) is just as performant/easy-to-use as other data science tools, if not more so.
[+] CapmCrackaWaka|6 years ago|reply
> But for actually programming it is atrocious.

I really hope you are not using R for anything outside data science, physics, or other analysis. It was developed to do these things, not 'actual programming', which I imagine you define as creating some framework or application.

Most of the people that don't like R seem to want to use it outside of its use cases, and get frustrated when they fail.

[+] ekianjo|6 years ago|reply
> R does a few things well - cross-platform, plotting works on all platforms, packaging works well. But for actually programming it is atrocious.

Yet R is a lot more expressive than Python + Pandas for data related applications. It was never made as a universal language to develop any kind of applications, but it's pretty good at what it does with data manipulation.

[+] stewbrew|6 years ago|reply
It depends. R excels at backward compatibility and at interactive data analysis, which is what it's made for. But you're right in so far that you probably shouldn't use (much) R code in production.
[+] syrahshiraz|6 years ago|reply
Disclosure: I work at RStudio

Took a quick look at the docs. If you're looking for dependency management there's renv[1] and you can (obviously) use git for source control. If you actually have enterprise use cases for library curation or air-gapped deployments, you can check out RStudio Package Manager[2]. Among other things, it provides precompiled binaries for packages, which Rsuite doesn't improve on, per docs[3]:

> Now you are ready to install dependencies. Beware that it will take a lot of time because of compilation. You install dependencies with the following command:

[1]: https://github.com/rstudio/renv

[2]: https://rstudio.com/products/package-manager/

[3]: https://rsuite.io/RSuite_Tutorial.php?article=rsuite_binary_...

[+] wjak|6 years ago|reply
Rsuite supported binary pkgs about a year before RStudio did. You have not read the docs to the end. Rsuite has been used in enterprise settings. It works great. And it is open-source. Moreover, it brings a proper definition of an R project, which RStudio is still missing.
[+] psv1|6 years ago|reply
After a couple of minutes on their website I still can't figure out what advantage this offers over using RStudio as an IDE and/or running scripts with the default CRAN R installation.
[+] wjak|6 years ago|reply
Hi, I am one of the creators. From the GitHub page: R Suite is an R package which, together with the R Suite CLI tool, enables you to design a deployment workflow that fits you and makes R your primary data science platform. It has been developed by the WLOG Solutions company to make their data science development and deployment process robust.

R Suite gives answers to the following challenges for any R based software and data science solution:

- Isolated and reproducible projects with controlled dependencies and configuration.

- Separation of business, infrastructural and domain logic.

- Package based solution development.

- Management of custom CRAN-alike repositories.

- Automation of deployment package preparation.

- Flawless integration with Docker.

- Development process integrated with version control system (currently git and svn).

- Working in internetless environments.

[+] wodenokoto|6 years ago|reply
Only read the headlines. My understanding is that they have version control of packages (something to the effect of a virtual environment, but maybe with a completely different approach)
[+] williamstein|6 years ago|reply
After a few minutes I can't even tell what problems this is supposed to solve or if it is even related to solving an IDE problem... the actual site starts with bullet points that describe the product, but not what problems it solves:

" - Open source with Enteprise [sic] support.

- Designed to separate..."

The very first bullet point has a typo so maybe this isn't very mature yet?

[+] ngcc_hk|6 years ago|reply
Not yet tried but open source ... free ...
[+] arminiusreturns|6 years ago|reply
I see a lot of people hating on R or on R-studio. For those people, I'm curious what you would posit as an alternative?

I have liked R because I use it simply, inside emacs org-mode code source blocks which use either R to generate plots or gnuplot. Based on comments, now I am afraid I will reach some ceiling in R. What else is there? Octave? Sage? Julia?

[+] malshe|6 years ago|reply
If you read the comments, it’s just one guy giving his opinion on every comment without any supporting evidence.
[+] xvilka|6 years ago|reply
Not hating on R, but from what you listed Julia comes the closest and is the best designed of all. Octave is bound to be MATLAB compatible, which prevents language innovation. Sage is bound by being "a middle ground" for all third-party languages and frameworks it incorporates. The Julia language is cleaner.
[+] anthony_doan|6 years ago|reply
I'm going to buck the trend and state that I love using R for modeling and statistics.

The R packages for these domains are some of the best I've seen.

As for R in production, I would wrap it using https://www.rplumber.io/.

[+] vhhn|6 years ago|reply
Hi Wit, you guys do a great job to make R ready for deployment in production.

What do you think of the new renv package?

[+] wjak|6 years ago|reply
Its goal is different.

We started with a reproducible project definition. Then we implemented rsuite to help manage the project. It includes dependency management, which is what renv solves. The biggest difference is that our project consists of possibly many pkgs that are local to it. This allows you to create complex solutions. Moreover, the deployment PKG is a zip file, and to use it you only need R. No PKG installation on prod.

[+] ngcc_hk|6 years ago|reply
I heard of R when I was fond of XLispStat. The older language is good, but it is Lisp. Hence, people moved on to the mess. I just use R in a pragmatic manner. It is very hard if you take the language too seriously. Just use it. And if you can, compare your results a bit with other tools you are familiar with, like old SPSS, as the result is quite programmer dependent.