Launch HN: Evidence (YC S21) – Web framework for data analysts
In Evidence, pages are markdown documents. When you write SQL inside that markdown, the SQL runs against your database (we support BigQuery, Snowflake, and Postgres - with more to come). You can reference the results of those queries using a simple templating syntax, which you can use to inline query results into text or to generate report sections from a query. Evidence also includes a component library that lets you do things like add charts and graphs (driven by your queries) by writing declarative tags like: <LineChart />
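To make the markdown-plus-SQL idea concrete, here is a rough sketch of what such a page might look like. The query-naming convention, the `{...}` templating, and the chart props below are illustrative assumptions based on the description above (not verbatim Evidence syntax), and the table and column names are made up:

````markdown
# Monthly Revenue

```sql revenue_by_month
select date_trunc('month', created_at) as month,
       sum(amount) as revenue
from orders
group by 1
order by 1
```

Revenue last month was {revenue_by_month[0].revenue}.

<LineChart data={revenue_by_month} x=month y=revenue />
````

The idea is that the query runs against your warehouse at build time, and both the inline value and the chart are driven by its result set.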
How is it different? Most BI tools use a no-code drag-and-drop interface. Analysts click around to build their queries, set up their charts etc., and then they drag them into place onto a dashboard. By analogy: if Evidence is Hugo, most BI tools are Squarespace. BI tools are built that way because they assume that data analysts are non-technical. In our experience, that assumption is no longer correct. Data analysts increasingly want tools that let them adopt software engineering practices like version control, testing, and abstraction.
When everything is under version control, you are less likely to ship an incorrect report. When you can write a for loop, you can show sections for each region, product-line etc., instead of asking your users to engage with a filter interface. When you can abstract a piece of analysis into a reusable component, you don’t have to maintain the same content in multiple places. Basically, we’re providing the fundamentals of programming in a way that analysts can easily make use of.
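As a sketch of the for-loop idea: since pages ultimately compile to Svelte components, a Svelte-style each block over a query result could generate one section per region. The block below is illustrative, and `regions` and its fields are hypothetical:

```markdown
{#each regions as region}

## {region.name}

Sales in {region.name} were {region.sales} this quarter.

{/each}
```

One template maintained in one place then renders a complete section for every row the query returns, with no filter UI required.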
Reporting tools have been around since COBOL, and have gone through many iterations as tech and markets have evolved. Our view is that it’s time for the next major iteration. We worked together for five years building the data science group at a private equity firm in Canada. We set up ‘the modern data stack’ (Fivetran, dbt, BigQuery etc.) at many of the firm’s portfolio companies and we were in the room during a lot of key corporate decisions.
In our experience, the BI layer is the weakest part of the modern data stack. The BI layer has a poor developer experience, and decision makers don't really like the outputs they get. It turns out, these two issues are closely related. The drag-and-drop experience is so slow and low-leverage that the only way to get all the content on the page is to push a lot of cognitive load onto the end user: global filters, drill-down modals, grids of charts without context. Like most users, business people hate that shit. And because the production process isn't in code, the outputs are hard to version control and test, so dashboards break, results are internally inconsistent, and so on, in just the way that software would suck if you didn't version control and test it.
As early adopters of the modern data stack, we saw the value in treating analytics more like software development, but we were consistently disappointed with the workflow and the quality of the outputs our team could deliver using BI tools and notebook products. Graphics teams that we admire at newspapers like the New York Times don’t use BI tools or Jupyter notebooks to present their work. They code their data products by hand, and the results are dramatically better than what you see in a typical BI deployment. That’s too much of an engineering lift for most data teams, but with a framework designed for their needs and their range of expertise, we think data teams could build products that come much closer to those high standards.
Evidence is built on Svelte and SvelteKit. This is the JS framework that the NYT has used to build some of their more recent data products, like their Covid risk maps. Sean and I fell in love with Svelte, and we owe a huge debt to that project. At this early stage, Evidence is really just a set of convenience features wrapped around SvelteKit to make it accessible to data analysts (the markdown preprocessor, db connections, chart library). The core framework will always be open source, and eventually we plan to launch a paid cloud version of our product, including hosting, granular access control, and other features that enterprises might pay for.
We would love to hear your thoughts, questions, concerns, or ideas about what we’re building - or about your experiences with business intelligence in general. We appreciate all feedback and suggestions!
saadatq|4 years ago
If I could upvote this 100 times, I would. I've felt this pain every day with Looker, Mode, Metabase, and every other BI tool that I've tried.
amcaskill|4 years ago
Aeolun|4 years ago
sails|4 years ago
Totally agree, very interested in trying this out. FWIW I've tried and been frustrated by Looker, Metabase, PowerBI, Superset, Redash.
I do think that while dbt does a great job with dimensional modelling, the BI layer is still required to provide some aspects of metric modelling. Is this something that Evidence is looking to solve? From what I've seen it looks more to be a pure frontend visualisation rather than a tool for managing business metrics. Looker and Metabase do some good work in this metric management space, Superset and PowerBI much less so.
amcaskill|4 years ago
You are exactly right: right now this is pure front-end. That's intentional, and there's a lot that we like about that approach, especially in light of the success of dbt.
We think dbt fundamentally changes what is needed from a BI tool, and that vendors who are maintaining really heavy built-in data transformation layers will basically be wasting resources over the coming years. The approach of modelling in your data warehouse is just so much more sensible that we think it's really a good thing to bet on.
That said, having some form of metric modelling in your BI tool is really nice -- it helps you keep your queries DRY, and makes it simpler to roll out changes. If we were to build something here, I think it would be very lightweight -- a config that basically let you define reusable SQL snippets, and maybe some constraints on them.
On the other hand, there are A LOT of startups building metrics layers, which look great. Usually these expose an API endpoint and some sort of SQL interface. We'd be just as happy to plug into one of those SQL interfaces and call it a day. I just wish one of those was open source, since the metrics layer is such a choke-hold on your data operation.
Maybe someone will build the 'dbt of metrics layers'. That would be great for the ecosystem. Maybe dbt will do it themselves. I think there's probably something interesting they could do there by treating stored procedures as materialization targets.
adithyasrin|4 years ago
[1] - https://modlr.co
bayesian_horse|4 years ago
This looks very promising, also with the inclusion of Svelte. I'd have to see how well it integrates with Python and R; as far as I can see, you'd need to export data from those to make it usable, which is probably easy as well.
krishvs|4 years ago
amcaskill|4 years ago
Absolutely, a great PDF export is on the roadmap, and it raises a whole raft of tricky issues.
Just an aside, but one of the other features Sean and I really liked in R markdown was the ability to export a single standalone HTML file that you could just email to someone. I’m not sure if we’ll build that, but there are a bunch of cases where you want to send the “real” thing, without actually deploying anything (if that makes sense). It also saves you from the challenges of pagination.
jrumbut|4 years ago
I have a couple questions.
1. I work in research, and we use a lot of strange databases and query languages. How hard would it be to add support for new databases (or alternative sources like CSVs or API calls), and to include multiple sources in the same report?
2. I had trouble telling from the docs how hard it is to drop in hand-coded components (say I have some D3 creation, or I have some requirement that breaks the model and requires JavaScript and CSS to change everything)?
amcaskill|4 years ago
1. Yes, if you have specific DBs, please feel free to create an issue on GitHub. We're also working on opening up the DB connector ecosystem so that people can add their own. I've also opened an issue for CSVs; I think we could support them pretty seamlessly.
2. Evidence is actually pretty slick in this regard. This is one of the benefits from starting from a web framework and working backwards towards the data analyst, rather than starting from Jupyter and trying to work forwards (if that makes sense).
The markdown documents compile to Svelte components, so you can just add a <script> tag and/or a <style> tag right into your .md file. D3 works pretty seamlessly in Svelte, so you can go nuts. The other neat thing is that the styles scope themselves to the component, so they won't leak into the rest of your project.
We haven't written the docs for this portion yet, but you can also add global Svelte components to your project, so if you wanted to write something reusable, you can write it that way and then just import it into your reports to use it as a component. In either case, you could call out to other APIs if you didn't want to retrieve data via a SQL query.
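A minimal sketch of what a hand-coded D3 fragment in a page might look like, assuming the .md file compiles to an ordinary Svelte component and d3 is installed as a dependency (the element binding and class names are illustrative):

````markdown
<script>
  import { onMount } from 'svelte';
  import * as d3 from 'd3';

  let el; // bound to the div below

  // run after the component mounts, once the DOM node exists
  onMount(() => {
    d3.select(el)
      .append('svg')
      .attr('width', 400)
      .attr('height', 200)
      .append('circle')
      .attr('cx', 200).attr('cy', 100).attr('r', 50);
  });
</script>

<style>
  /* scoped: only applies to elements in this file */
  .viz { border: 1px solid #eee; }
</style>

<div class="viz" bind:this={el}></div>
````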
As an example, an add-on component library we'd like to build is an interface to FRED data from the St Louis Fed. That way if you just need a quick chart of GDP, or you want to add recession shading to one of your charts, you can just drop it in without having to load that data into your database. <FredTimeSeries ref=gdp/>, that type of thing.
jart|4 years ago
If your app is built on Node, it's got an unwieldy number of dependencies which frequently have security issues. Something like Postgres is usually only viable as an online service you're self-hosting, and those things get hacked. So redbean is really a no-brainer if you want to protect data without making life difficult for the people who are authorized to look at it. We're also looking at integrating QuickJS soon, as an alternative to Lua, so there should be a painless migration path for Node folks.
amcaskill|4 years ago
dg4|4 years ago
amcaskill|4 years ago
That's a super promising line of thinking.
We really like how Vercel works with pull requests -- generating a preview, blocking the PR if there is a failure in the build process etc. -- and that's definitely where we'd like to go with the cloud service. We hadn't thought of providing executed SQL back into the review context, but of course that would be do-able and very useful.
There is a whole host of tooling that you can build around the artifacts when you move them into code. One example an early user suggested was scanning your entire project to find repeated blocks of SQL, and surfacing them to be refactored into your data warehouse (into your dbt project, for example). You could imagine a GitHub Action that periodically opens a PR with those suggested refactors.
Jugurtha|4 years ago
We used to build custom, turn-key machine learning products for enterprise. Recently, after playing with things like Voilà, Streamlit, and Superset, we made it possible for our data scientists and ML people to show prototypes and applications right from the platform, without worrying about creating a VM, setting up the environment, scp-ing stuff, creating an application, configuring a server, setting up authentication, sending a link to the client, etc.
I can envision doing something similar with Evidence. Given it's markdown, could we imagine having a Jupyter notebook containing markdown cells that somehow use Evidence? Could this be a JupyterLab extension?
I'm asking this because we have live collaborative notebook editing on the platform, with access to external data sources such as S3 as if they were filesystems, so several people can collaborate on the same notebook, see the cursors and selections of others, etc. Why not do that for Evidence work as well:
- I start a notebook. Add a Markdown cell. Some magic, I can do whatever it is I can do to generate reports with Markdown.
- Share the notebook with other users. We get together and work on that visualization/report.
Tangent: Something that kind of sucks is that some clients send us a database dump as a file, plus all other miscellaneous data. We have to create a MySQL database from that dump. It's not a big deal, but we don't like it.
bryik|4 years ago
SQLite support would be nice!
nonameiguess|4 years ago
I'm not trying to be a downer, but it seems like your product is duplicating the functionality of these existing products while doing less, since it only supports SQL and Markdown.
I guess you autogenerate charts, but it says you're targeting a technical audience that is presumably comfortable calling functions in Python and R for graphical data visualization.
This is nitpicky, and I'm sure you have some command line option to choose another port (though your "get started" doesn't show how), but mdbook also uses 3000. I'm sure they probably weren't the first to default to that, either.
I hope this doesn't come across as downplaying your product. It looks nice. I just don't see what you offer here that can't already be done with existing data ecosystem tools. I was using RMarkdown with knitr to generate all of my papers when I was an ML grad student years ago. It felt back then like I was the only person at Georgia Tech who realized these tools existed, and now it still feels that way.
amcaskill|4 years ago
I have written a lot of R Markdown over the years, and I agree wholeheartedly with most of what you're saying. The R ecosystem is phenomenal. Anyone who is excited about our project might be 10x more excited about learning R and writing a report with R Markdown.
A big part of why we are building Evidence is that my co-founder Sean and I felt like we lost a lot on the presentation side when we graduated from notebooks to primarily working with data warehouses, dbt & BI tools.
The thing is, we gained so much from that transition to 'the modern data stack' that we would never go back. So we're setting out to fix the presentation layer in a way that would have worked for us.
Undoubtedly, anything that you could accomplish in Evidence is going to be do-able within the R Markdown or Jupyter ecosystems, so I won't try to claim any truly unique features. It's maybe more of a vibe: what's easy in Evidence vs. what's tricky in a notebook?
If you're writing an ML paper, R Markdown is definitely the move. If you're trying to build a common, internally consistent understanding across hundreds (thousands) of people about how your business is doing, and what they might do about it, Evidence is going to be a better fit.
Here's a comment from awhile ago discussing the comparison with Jupyter: https://news.ycombinator.com/item?id=27363349
It only supports SQL and Markdown:
That constraint is part of the point.
In a large organization, a fair number of people are going to contribute to your reporting apparatus, and you want to keep it in a state where you can refactor useful abstractions up into your data warehouse. This gets a lot harder if your reporting is a swirl of R scripts and Python snippets and whatever else.
Some order of magnitude more people know SQL and markdown than R or Python. Every business I have been involved in has someone there who is cranking out analysis and data pulls using SQL. Very rarely would that person be comfortable working in R markdown.
You can't in-line an ML model into your reports:
Again, we think this constraint is basically a good thing. If you have a model that is profitable to your business, it should be governed and executed in a purpose-built environment and, where feasible, you should be storing the relevant outputs for posterity in your data warehouse.
We will add instructions on setting the port! :)
Aeolun|4 years ago
To me, this is a feature. Evidence sounds like it’s completely batteries included. Your example sounds like I have to learn a whole new toolchain.
lytefm|4 years ago
This looks more like it's made for an analyst who mainly uses SQL or Excel.
But if it makes me more productive than Jupyter notebooks for simple reports, I'll give it a try.
edusoftwerks|4 years ago
101008|4 years ago
amcaskill|4 years ago
There are two main cases of this idea that we have spent time thinking about.
1. Truly static report.
Here, you would need to condition your SQL queries so that they continue to return the same results over time. E.g. your `where` clause restricts the results to 'on or before' the day of writing. Evidence will continue to build the report on a schedule, but the results will never change so long as your historical data is constant. You can do this today.
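As a sketch, with a hypothetical `orders` table, freezing a report written at the end of June 2021 could look like:

```sql
select date_trunc('month', order_date) as order_month,
       sum(amount) as revenue
from orders
-- pin the report to the day it was written, so rebuilds never change it
where order_date <= '2021-06-30'
group by 1
order by 1
```

As long as historical rows are never mutated, every scheduled rebuild of this query returns identical results.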
In a future state, we've talked about rendering a snapshot of the report and checking that into version control, so that even if your underlying data is a moving target, you can hold onto what the report looked like at a moment in time.
We're kind of mixed on that idea of snapshotting reports themselves, though: it's just so much better to build your data warehouse such that it actually retains the historical data. But we recognize sometimes that's not practical. TBD on that functionality.
2. Recurring report with static commentary
Here, you have a recurring time-bounded report, and you occasionally want to mix in commentary that's only relevant for specific time periods.
With Evidence (this part comes from SvelteKit), you can mix parameterized pages and static pages on the same route. So if you had a 'monthly MRR growth report', you could use a parameterized page to generate the report for every month into history and into the future, AND you could include versions of the report with hand-written commentary for any specific months where it was needed. So if someone navigates to the February 2021 page, they get the standard parameterized version, but if they go to January 2021, they get the handwritten January version that explains that there was an acquisition which drove the big pick-up in MRR.
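In SvelteKit terms, that mix could look something like the following file layout (the directory and file names here are illustrative; in SvelteKit, a static route takes precedence over a dynamic `[param]` route at the same path):

```
pages/
  mrr-growth/
    [month].md     # parameterized template, rendered for every month
    2021-01.md     # hand-written January 2021 version with commentary
```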
This one is a bit tricky to explain, but we will build some examples.
smashah|4 years ago
jahewson|4 years ago
unknown|4 years ago
[deleted]
louiskw|4 years ago
amcaskill|4 years ago
mrosett|4 years ago
amcaskill|4 years ago
wizwit999|4 years ago
amcaskill|4 years ago
Yes, definitely. We include a visualization library with Evidence.
You can write <LineChart .../> to add a line chart to your document, <Hist .../> for a histogram etc.
You can see the documentation for the chart types we have built under 'components' in our docs. Here's the histogram: https://docs.evidence.dev/components/hist
Designing this is one of the trickiest parts of the project, and is going to be one of the biggest areas of effort going forward. We're trying to build something that is very declarative, so that people don't have to spend a lot of time configuring their charts, and something that is composable, so that you can create more complex viz that include things like annotations.
swyx|4 years ago
Since you have some data processing inline with your Markdown, I am wondering if you explored using MdSvex? https://mdsvex.pngwn.io/
It would seem like a pure win for you, as you (your users) get reusable functions, on-document styles, local component state, and so on.
amcaskill|4 years ago
Absolutely, we are using mdsvex, and we owe you a beer!
You rock.
void_mint|4 years ago
amcaskill|4 years ago
thejosh|4 years ago
amcaskill|4 years ago
anyfactor|4 years ago
I am learning Tableau, but I have experience with PyGal (an SVG-based Python data visualizer) and a little Bokeh. Using enterprise-accepted data analytics tools seems like going backwards when you're coming from a developer role.