cye131's comments

cye131 | 8 months ago | on: Time Series Forecasting with Graph Transformers

I'm not a fan of this blog post: it passes off a method that is not accepted as a good or standard time series methodology (graph transformers) as though it were the norm. Transformers perform poorly on time series, and graph deep learning performs poorly on tasks that lack real behavioral/physical edges (physical space, molecules, social graphs, etc.), so it's unclear why combining them would produce anything useful for "business applications" of time series like sales forecasting.

For those interested in transformers for time series, I recommend reading this paper: https://arxiv.org/pdf/2205.13504. There is also plenty of other research showing that transformer-based time series models generally underperform much simpler alternatives like boosted trees.

After looking further it seems like this startup is both trying to publish academic research promoting these models as well as selling it to businesses, which seems like a conflict of interest to me.

cye131 | 9 months ago | on: Gemini 2.5: Our most intelligent models are getting even better

The new 2.5 Pro (05-06) definitely does not have any sort of meaningful 1-million-token context window, as many users have pointed out. It does not even remember to generate its reasoning block at 50k+ tokens.

Their new Pro model seems to have traded off fluid intelligence and creativity for performance on closed-ended coding tasks (and hence benchmarks), which unfortunately seems to be a general pattern in LLM development now.

cye131 | 10 months ago | on: A flat pricing subscription for Claude Code

I'm curious whether anyone's actually using Claude Code successfully. I tried it at release and found it to be negative value for tasks other than spinning up generic web projects. On existing codebases of even moderate size, it burns through cash writing code that is always slightly wrong and requires more tuning than writing the code myself would.

cye131 | 10 months ago | on: Qwen3: Think deeper, act faster

These performance numbers look absolutely incredible. The MoE outperforms o1 with 3B active parameters?

We're really getting close to the point where local models are good enough to handle practically every task that most people need to get done.

cye131 | 11 months ago | on: Big Book of R

R, especially dplyr/tidyverse, is so underrated. Working in ML engineering, I see a lot of my coworkers suffering through pandas (or occasionally polars, or even base Python without dataframes) to do basic analytics or debugging. It takes eons and gets complex so quickly that only the most rudimentary checks get done. Anyone doing data-adjacent engineering work would benefit from having R/dplyr in their toolkit.
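As a rough illustration of the kind of basic grouped check being described, here's a sketch in pandas; the dataframe and column names are hypothetical. The dplyr equivalent would be roughly `df |> group_by(region) |> summarise(mean(sales, na.rm = TRUE), sum(is.na(sales)))`.

```python
import pandas as pd

# Hypothetical toy data standing in for a real table being debugged
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "sales": [10.0, 12.0, 7.0, None, 9.0],
})

# A typical rudimentary check: group-level mean plus missing-value count
summary = df.groupby("region")["sales"].agg(["mean", lambda s: s.isna().sum()])
print(summary)
```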

cye131 | 1 year ago | on: Introducing deep research

Does anyone actually have access to this? The website says it's available for Pro users today, but I have Pro via my employer and see no "deep research" option in the message composer.

cye131 | 1 year ago | on: Emerging reasoning with reinforcement learning

Is it accurate to compare 8k-example RL with 8k-example SFT? RL with the same number of examples would take massively more compute than the SFT version (though how much more depends on how many rollouts they do per example).

RL is more data-efficient, but that may not be relevant now that we can just use DeepSeek-R1's responses as the training data.
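A back-of-envelope sketch of the compute gap raised above; every number here (rollouts per example, relative cost of a rollout versus one SFT forward/backward pass) is a made-up assumption for illustration, not a measurement.

```python
# Hypothetical normalized costs; all values are illustrative assumptions
examples = 8_000
sft_cost_per_example = 1.0     # one forward+backward pass (normalized to 1)
rollouts_per_example = 16      # hypothetical GRPO-style sampling count
rollout_cost = 3.0             # generation + scoring, relative to one SFT pass

sft_total = examples * sft_cost_per_example
rl_total = examples * rollouts_per_example * rollout_cost
print(rl_total / sft_total)  # ~48x the compute under these assumptions
```

The ratio scales linearly with the rollout count, which is why "same number of examples" says little about relative compute.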

cye131 | 1 year ago | on: Unlocking the power of time-series data with multimodal models

There is a surprisingly common use case for "quick and dirty" univariate time series forecasts that are basically equivalent to handing a small child a pencil and asking them to draw out the trendline. The now-deprecated Prophet model from Facebook (which was just a GAM) was often used for this, and Auto-ARIMA models, ETS, etc. are also still really commonly used. I also see people reach for boosted trees or deep learning models like DeepAR or N-BEATS, even though those are rarely appropriate for a 1k-datapoint univariate series, just because they give off the impression of serious methodological work.

There are a lot of use cases in business where what's needed is just some basic, reasonable-ish forecast. I actually think this new model is really neat because it completely dispenses with the pretense that we're doing something really serious and methodologically backed; we really just want a basic curve fit that looks reasonable to human intuition.
