
Yandex open sourced it's BI tool DataLens

268 points | SergeAx | 2 years ago | github.com

86 comments

[+] xnx|2 years ago|reply
Such a huge landscape of BI and data visualization tools. Are there any clear open source leaders? Apache Superset?
[+] RedShift1|2 years ago|reply
Right now I'm heavily into Grafana, but when it comes to BI it kinda falls flat: I regularly have to fall back to the Plotly plugin to create charts (though it's getting better; at least you can do a normal scatter or bar chart out of the box since version 8. Labeling the axes is still a problem, though). Navigation is also problematic: jumping to a source table takes a lot of effort to set up (basically you have to create a new dashboard and do some hyperlinking, instead of there being a ready-to-go "view source data" button). I feel like there's a lot of friction in getting Grafana to do BI, but I've also invested so much time in it that I'm afraid to jump ship...
[+] FridgeSeal|2 years ago|reply
I’d advise Metabase over Superset.

Superset looked good, but operating Superset quickly runs into the same Python issues all Python software suffers from.

Sometimes it would just break for no apparent reason. Configuring it was a nightmare of magic Python code and unclear settings. Trying to use plugins was equally painful: due to the poor boundary separating the application's dependencies from the plugins' dependencies, adding a DB connector could just bork the whole application.

[+] noughtme|2 years ago|reply
Also Metabase, which I found easier to deploy and use.
[+] doctorpangloss|2 years ago|reply
What do people use for an "analytics.js" for reporting events with common items like campaign data, user device and user profile measurements, and related from browsers and devices?
[+] zxspectrum1982|2 years ago|reply
Is there any open source BI tool that can be embedded in some other product so that users (not product developers but final users) can create their own dashboards?
[+] moltar|2 years ago|reply
QuickSight has a great embedding story.

Also CubeJS if you want a bit more flexibility.

[+] cvalka|2 years ago|reply
This is awesome! Let's not forget about their second generation SQL database YDB which is an open source alternative to TiDB and YugabyteDB.
[+] mdekkers|2 years ago|reply
Yandex consistently pumps out great software. ClickHouse is awesome, for starters, as is this.
[+] felixhummel|2 years ago|reply
Apache 2.0 licensed (from a cursory glance at the first few repos).
[+] totalhack|2 years ago|reply
Looks pretty neat. Having not fully investigated it yet, I will say the one thing I usually run into with this and other BI tools is a lack of flexibility in the UI for forming queries. It's sometimes limited to one table or view at a time. I wonder why more of them don't use more flexible querying techniques, perhaps just due to the risk of a bad query being formed?

My preferred approach is implemented in Zillion, which I use for BI at my company: https://github.com/totalhack/zillion

[+] dwheeler|2 years ago|reply
Nit on title:

it's => its

[+] pklausler|2 years ago|reply
Just be glad it wasn't " its' " (sic), which has been showing up more and more in my input streams.
[+] unixhero|2 years ago|reply
>Where can I find persistent application data storage?

>We use the .us-data folder to store PostgreSQL data permanently. You can delete this folder if you want, it will be recreated with the demo data after restarting the datalens-us container

Why "us"?

[+] lasermike026|2 years ago|reply
I'm giving up Power BI and I'm moving to Domo. I look at rolling my own occasionally.
[+] mritchie712|2 years ago|reply
(I'm the founder of a competitor to Domo)

I really like the concept of Domo. They have ETL, modeling, a warehouse and BI in one app ("data-stack-in-a-box"). I've interviewed 20 of their customers and the general sentiment was pretty bad. There's a long sales process, a longer process to get it set up, and they've built all the modeling and connectors themselves (vendor lock-in, none are best-in-class).

Definite (https://www.definite.app/) is a data-stack-in-a-box. We have a built-in modeling layer for core metrics and an AI assistant to answer any one-off questions.

A few ways we're different:

Built on open source - We run the data stack for you and give you a single app to manage and analyze your data, but it's all built on open source standards. So if you decide at any point you want to run it all yourself, the code is yours to lift and shift to your own infrastructure.

Battle tested connectors - We're using Meltano / Singer (open source library from Stitch) for our connectors, so they've been used heavily in production for years.

Self-serve that actually works - A lot of tools promise self-serve, but AI is making this real. We've invested heavily in making it possible to ask questions and get accurate answers. The AI queries a modeled view of your data that can answer questions that depend on well defined metrics (e.g. ARR, DAU, etc.).

[+] _boffin_|2 years ago|reply
We have Domo at work and it just seems overly complicated and insane. I'm wanting to learn it instead of just polling data sources and adding them to a local Opensearch instance, but... too verbose for me.
[+] RedShift1|2 years ago|reply
If you want to roll your own, maybe have a look at Plotly's Dash?
[+] mobileexpert|2 years ago|reply
Cool. Rolling your own BI seems fraught with peril at most orgs where I imagine the buy vs build decision is always buy. How many PowerBI or Tableau seats do you need before rolling your own internal BI platform starts to make sense?
[+] htrp|2 years ago|reply
I don't think it ever makes sense, because the large players will always be able to make new innovations (mobile apps, natural language querying, SSO integrations, etc.) that your open source solution won't have, short of large corps hiring BI teams to invest in the open source ecosystem around tools like Superset.
[+] totalhack|2 years ago|reply
I must like to live dangerously. In all seriousness though there are low cost alternatives to those mega BI tools that suit many use cases. If I wasn't rolling my own I'd probably start with Metabase or Superset. What I use: https://github.com/totalhack/zillion
[+] londons_explore|2 years ago|reply
Yandex makes some pretty cool tech; they clearly have a lot of smart engineers.

It's a shame that geopolitics means most of it will have to be reinvented by someone else before it'll see any use.

[+] efxhoy|2 years ago|reply
Yeah. I’d love to use ClickHouse, but Yandex's ties to the Russian government make me not want to.
[+] anonyfox|2 years ago|reply
Since the discussion started in the comments already, I have a similar question: any recommendations for a solution (don’t care if OSS or not) that has the best UX for nontechnical people to assemble some data and reports? I have Salesforce, some MariaDB/Postgres and (optionally) HubSpot as data sources.

I can buy or manually provision anything, no technical hurdles or policies from that side. My absolute focus is the raw UX for business people.

Suggestions?

[+] davidarenas|2 years ago|reply
Honestly, Metabase has given the best balance between allowing non-technical users to self-serve and letting technical users dig in and use raw SQL if that's what they want. Also, it has an OSS core, so you can self-host. It is super feature-rich and has almost everything in the OSS version, as long as you don't need enterprise features like SAML auth, audit logs, etc.
[+] arthurwu|2 years ago|reply
I'm a co-founder of Dataland.io where we're building a powerful dataset viewer + search engine that can work on top of your Postgres or data warehouse.

We designed it specifically to provide an excellent UX to business users while reducing BI burden on the data team. We find that most business users often just need to search, filter, and sort instead of looking at charts to make operational decisions.

UX-wise, what sets us apart are:

- <1s full-text search (even on billions of rows of data), feels like Cmd+F in Google sheets, but faster

- Performance: we stream billions of rows into the web browser, with seamless scrolling (no paging of 50 records at a time)

- Rich cells make tables easier to scan/read (enum strings => colored tags, numbers => color-coded based on value, booleans => checkboxes, timestamptz => clear date time pills)

If that fits what you need, happy to give you a demo.

arthur(at)dataland.io

Otherwise, I think the simplest BI (if charts are important) could be something like evidence.dev or Metabase.

But I also think it's going to require some curation on your part. Can you reasonably expect business users to navigate the entire schema/table tree across these three sources? That's where I think the bottleneck often lies -- if your BI tool allows engineering to just expose a subset of curated core tables.

[+] mritchie712|2 years ago|reply
I'm the founder of Definite (https://www.definite.app/). We do ETL, modeling, storage and BI in one app ("data-stack-in-a-box").

> has the best UX for nontechnical people to assemble some data

If they can use Excel / pivot tables, they can use Definite. They can also just ask in natural language and we generate the report for them.

> Salesforce, some mariadb/postgres and (optionally) hubspot as data sources

We have pipelines for all of these and can spin up a managed data warehouse to store all the data if you don't already have one.

Drop me a note at [email protected] if you're interested

[+] robertlagrant|2 years ago|reply
The problem is that developing the perfect UI for nontechnical people to assemble reports probably requires a bespoke frontend for your business, and one that likely lags behind the reality of its changes. Most businesses instead opt to just hire semitechnical people that can do a bit of data work and give answers to the report-writers, as they can accommodate business changes over time and understand how to construct new queries out of the overall business' data sources.

Maybe that'll change one day with AI, and when it does that will be bought by every big company in the world (-:

[+] tillvz|2 years ago|reply
Veezoo (https://www.veezoo.com) is built to make it as easy as possible for nontechnical users to get answers to their ad-hoc questions.

It has followed a conversational, "ChatGPT-like" approach since 2016.

Info: I'm one of the founders.

[+] riffic|2 years ago|reply
glaring misuse of it's in the title
[+] Logans_Run|2 years ago|reply
Well played ;-)

ps - I spotted the pedant that is technically correct, and I claim my 5 McFun bucks!

[+] culebron21|2 years ago|reply
Add "'s" for genitive case in English, they said.