melondonkey | 1 year ago | on: Amazon is filled with garbage e-books, this is how they get made
Usually I'm pretty scam-savvy, but I dropped my guard and bought an absolute garbage AI translation of The Little Prince on Amazon. Now I research anything before buying.
melondonkey | 1 year ago | on: NPR suspends veteran editor as it grapples with his public criticism
I honestly think it's annoying how they feel they have to add, parenthetically, every time something is a lie or untrue. While their intention is good, I think it does a service to no one and underestimates the intelligence of their listeners.
Almost every story also gets tied to either identity politics or climate change, which gets tiresome even for those who agree. It's like watching a movie with too much expository dialogue.
melondonkey | 1 year ago | on: A rudimentary simulation of the three-body problem
Looks like the Pokémon Jirachi.
melondonkey | 1 year ago | on: Ask HN: Is a masters in ML worth it?
The cultural divide between ML engineers and "girls and gays" in data science is very real and, in my experience, getting worse. It's good, but rare, when the two styles can be brought together.
melondonkey | 1 year ago | on: AutoBNN: Probabilistic Time Series Forecasting
Damn, this is like the fifth time series framework posted this week.
This one seems theoretically more interesting than some of the others but practically less useful. For one, who wants to work in TensorFlow anymore, let alone TensorFlow Probability? TFP has had ample time to prove its worth, and from what I can tell almost no one uses it, because of a worst-of-both-worlds problem: the DL community prefers PyTorch and the stats community prefers Stan.
I'm starting to feel like time series and forecasting research is going off the rails, as every company tries to jump on the DL/LLM hype train and tell us that neural nets somehow know something we don't about predicting the future from the past.
melondonkey | 1 year ago | on: Math writing is dull when it neglects the human dimension
It's hard to meet everyone where they are and, at the same time, give them a relevant practical application for their own lives. Good learners just soak it up and look for the application later, but that doesn't fit everyone either. It's hard to write even a pop song that everyone likes, so math education that appeals to all is almost impossible.
melondonkey | 1 year ago | on: DBRX: A new open LLM
Data scientist here who's also tired of the tools. We put so much effort into educating the DSes at our company to get away from notebooks and use IDEs like VS or RStudio, and Databricks has been a step backwards because we didn't get the integrated version.
melondonkey | 1 year ago | on: Moirai: A time series foundation model for universal forecasting
One detail I don't really understand is the low-variance normal component of the target mixture. I'd be curious to see from the mixture weights how often it was actually used.
melondonkey | 1 year ago | on: Moirai: A time series foundation model for universal forecasting
I know. Here I am modeling my data generating process like a chump.
melondonkey | 1 year ago | on: Ask HN: How to find a fullfilling career after a data science job?
They just need a less engineering-oriented DS role and they'll be fine. Consulting is a good way to work in lots of industries and try things on.
melondonkey | 1 year ago | on: Chronos: Learning the Language of Time Series
I guess I just mean I’m a data scientist—someone who uses models like these in practice as opposed to someone who develops them.
I'm not sure what to even make of a term like "foundational time series". Does that just mean it's widely used and known? You have to earn a role like that; you can't just declare yourself one.
melondonkey | 1 year ago | on: Chronos: Learning the Language of Time Series
As a practitioner, the most impactful library for time series for me has been brms, which basically gives you syntactic sugar for building statistical models in Stan. It checks all the boxes, including probabilistic forecasts and multiple likelihood families: Wiener, gamma, Gaussian, Student-t, binomial, plus zero-inflated and hurdle models. It also has autoregressive terms and ordinal predictors, and you actually learn something from your data.
I find a lot of these ML and DL libraries harder to troubleshoot beyond blind hyperparameter tuning, whereas with stats I can tweak the model, modify the likelihood, and so on. There are also a lot of high-value problems with few data points, while these libraries tend to want at least daily data.
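To give a flavor of that syntactic sugar, here is a minimal sketch of a brms model with an autoregressive term. The data frame `weekly_sales` and its columns (`sales`, `week`, `promo`) are hypothetical; `brm`, the `ar()` term, and the `student()` family are real brms API.

```r
library(brms)

# Hypothetical weekly data: outcome `sales`, time index `week`,
# predictor `promo`. AR(1) structure plus a Student-t likelihood
# for robustness to outliers.
fit <- brm(
  sales ~ promo + ar(time = week, p = 1),
  data   = weekly_sales,
  family = student()
)

# Probabilistic forecasts fall out of the posterior draws.
predict(fit, newdata = future_weeks)
```

Behind that one formula line, brms generates and compiles the full Stan program, which is what makes it easy to swap likelihoods or add structure without hand-editing Stan code.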
melondonkey | 1 year ago | on: RStudio: Integrated development environment (IDE) for R
Weird: one minute it feels like the internet is screaming that I'm an out-of-touch dinosaur for using R, and the next a simple link to its most popular IDE makes the front page of HN.
melondonkey | 2 years ago | on: Why do tree-based models still outperform deep learning on tabular data? (2022)
This is interesting. Are BART models differentiable? I haven't looked closely at them, but I would have thought that for posterior sampling they'd have to be. BART has been around for a while, too.
melondonkey | 2 years ago | on: Why do tree-based models still outperform deep learning on tabular data? (2022)
What? Can you explain the mechanism by which a NN can "extrapolate" an invoice where a tree model couldn't? That's all down to how the modeler builds the features.
Also, every model is a "mean of the subgroup of the data": the prediction is by definition the conditional mean as a function of the input values.
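To make the conditional-mean point concrete, here is a toy depth-1 regression tree (a "stump") with made-up numbers. Its prediction for each side of the split is literally the mean of the training targets landing in that leaf:

```python
# Training data: (x, y) pairs; the stump splits at x < 5.
data = [(1, 10.0), (2, 12.0), (3, 11.0), (7, 30.0), (8, 34.0)]

left = [y for x, y in data if x < 5]
right = [y for x, y in data if x >= 5]

def stump_predict(x):
    """Predict by returning the mean of the leaf the input falls into."""
    leaf = left if x < 5 else right
    return sum(leaf) / len(leaf)

print(stump_predict(2))  # mean of {10, 12, 11} = 11.0
print(stump_predict(9))  # mean of {30, 34} = 32.0
```

Deeper trees just carve finer subgroups; the leaf prediction is still the conditional mean of the targets in that region.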
melondonkey | 2 years ago | on: Spc-kit: A toolkit for statistical process control using SQL
More dashboards need this, I think. I've also added relative standard error values on aggregations before to serve as a reliability filter that doesn't even show users data when they slice it too thin.
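As a sketch of that kind of reliability filter (the 0.30 cutoff, the `reliable_groups` helper, and the segment data are all made up for illustration): the relative standard error of a group mean is (sd / sqrt(n)) / mean, and any slice above the cutoff simply isn't shown.

```python
from math import sqrt
from statistics import mean, stdev

def relative_standard_error(values):
    """RSE of the sample mean: (sd / sqrt(n)) / mean."""
    n = len(values)
    return (stdev(values) / sqrt(n)) / mean(values)

def reliable_groups(groups, max_rse=0.30):
    """Keep only slices whose estimate is stable enough to display."""
    return {
        name: mean(vals)
        for name, vals in groups.items()
        if len(vals) >= 2 and relative_standard_error(vals) <= max_rse
    }

slices = {
    "big_segment": [102, 98, 101, 99, 100, 103],  # stable estimate
    "tiny_segment": [5, 90],                      # too noisy to show
}
print(reliable_groups(slices))  # only big_segment survives the filter
```

The same check translates directly to SQL as `STDDEV(x) / SQRT(COUNT(*)) / AVG(x)` in a HAVING clause on the aggregation.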
melondonkey | 2 years ago | on: Why do tree-based models still outperform deep learning on tabular data? (2022)
At this point I wish every junior DS would read this paper and not come into every problem with the bright new idea that they're going to beat XGBoost with their DL architecture. Free promotion if they never say the words "latent subspace".
melondonkey | 2 years ago | on: Why do tree-based models still outperform deep learning on tabular data? (2022)
This explanation doesn't make sense to me. What do you mean by "linearize your data"? Tree methods assume no linear form and aren't even monotonically constrained. Classification isn't done by drawing planes but by probability estimation plus a cost function.