Launch HN: Evidence (YC S21) – Web framework for data analysts
In Evidence, pages are markdown documents. When you write SQL inside that markdown, the SQL runs against your database (we support BigQuery, Snowflake, and Postgres - with more to come). You can reference the results of those queries using a simple templating syntax, which you can use to inline query results into text or to generate report sections from a query. Evidence also includes a component library that lets you do things like add charts and graphs (driven by your queries) by writing declarative tags like: <LineChart />
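To make the markdown-plus-SQL idea concrete, here is a rough sketch of what such a page might look like. The query-naming convention, the `{...}` templating, and the chart props below are illustrative assumptions based on the description above (not verbatim Evidence syntax), and the table and column names are made up:

````markdown
# Monthly Revenue

```sql revenue_by_month
select date_trunc('month', created_at) as month,
       sum(amount) as revenue
from orders
group by 1
order by 1
```

Revenue last month was {revenue_by_month[0].revenue}.

<LineChart data={revenue_by_month} x=month y=revenue />
````

The idea is that the query runs against your warehouse at build time, and both the inline value and the chart are driven by its result set.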
How is it different? Most BI tools use a no-code drag-and-drop interface. Analysts click around to build their queries, set up their charts etc., and then they drag them into place onto a dashboard. By analogy: if Evidence is Hugo, most BI tools are Squarespace. BI tools are built that way because they assume that data analysts are non-technical. In our experience, that assumption is no longer correct. Data analysts increasingly want tools that let them adopt software engineering practices like version control, testing, and abstraction.
When everything is under version control, you are less likely to ship an incorrect report. When you can write a for loop, you can show sections for each region, product-line etc., instead of asking your users to engage with a filter interface. When you can abstract a piece of analysis into a reusable component, you don’t have to maintain the same content in multiple places. Basically, we’re providing the fundamentals of programming in a way that analysts can easily make use of.
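As a sketch of the for-loop idea: since pages ultimately compile to Svelte components, a Svelte-style each block over a query result could generate one section per region. The block below is illustrative, and `regions` and its fields are hypothetical:

```markdown
{#each regions as region}

## {region.name}

Sales in {region.name} were {region.sales} this quarter.

{/each}
```

One template maintained in one place then renders a complete section for every row the query returns, with no filter UI required.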
Reporting tools have been around since COBOL, and have gone through many iterations as tech and markets have evolved. Our view is that it’s time for the next major iteration. We worked together for five years building the data science group at a private equity firm in Canada. We set up ‘the modern data stack’ (Fivetran, dbt, BigQuery etc.) at many of the firm’s portfolio companies and we were in the room during a lot of key corporate decisions.
In our experience, the BI layer is the weakest part of the modern data stack. The BI layer has a poor developer experience, and decision makers don't really like the outputs they get. It turns out, these two issues are closely related. The drag-and-drop experience is so slow and low-leverage that the only way to get all the content on the page is to push a lot of cognitive load onto the end user: global filters, drill-down modals, grids of charts without context. Like most users, business people hate that shit. And because the production process isn't in code, the outputs are hard to version control and test, so dashboards break, results are internally inconsistent, and so on, in just the way that software would suck if you didn't version control and test it.
As early adopters of the modern data stack, we saw the value in treating analytics more like software development, but we were consistently disappointed with the workflow and the quality of the outputs our team could deliver using BI tools and notebook products. Graphics teams that we admire at newspapers like the New York Times don’t use BI tools or Jupyter notebooks to present their work. They code their data products by hand, and the results are dramatically better than what you see in a typical BI deployment. That’s too much of an engineering lift for most data teams, but with a framework designed for their needs and their range of expertise, we think data teams could build products that come much closer to those high standards.
Evidence is built on Svelte and SvelteKit. This is the JS framework that the NYT has used to build some of their more recent data products, like their Covid risk maps. Sean and I fell in love with Svelte, and we owe a huge debt to that project. At this early stage, Evidence is really just a set of convenience features wrapped around SvelteKit to make it accessible to data analysts (the markdown preprocessor, db connections, chart library). The core framework will always be open source, and eventually we plan to launch a paid cloud version of our product, including hosting, granular access control, and other features that enterprises might pay for.
We would love to hear your thoughts, questions, concerns, or ideas about what we’re building - or about your experiences with business intelligence in general. We appreciate all feedback and suggestions!
saadatq|4 years ago
If I could upvote this 100 times, I would. I've felt this pain every day with Looker, Mode, Metabase, and every other BI tool that I've tried.
amcaskill|4 years ago
Aeolun|4 years ago
sails|4 years ago
Totally agree, very interested in trying this out. FWIW I've tried and been frustrated by Looker, Metabase, PowerBI, Superset, Redash.
I do think that while dbt does a great job with dimensional modelling, the BI layer is still required to provide some aspects of metric modelling. Is this something that Evidence is looking to solve? From what I've seen it looks more to be a pure frontend visualisation rather than a tool for managing business metrics. Looker and Metabase do some good work in this metric management space, Superset and PowerBI much less so.
amcaskill|4 years ago
You are exactly right: right now this is pure front-end. That's intentional, and there's a lot that we like about that approach, especially in light of the success of dbt.
We think dbt fundamentally changes what is needed from a BI tool, and that vendors who are maintaining really heavy built-in data transformation layers will basically be wasting resources over the coming years. The approach of modelling in your data warehouse is just so much more sensible that we think it's really a good thing to bet on.
That said, having some form of metric modelling in your BI tool is really nice -- it helps you keep your queries DRY, and makes it simpler to roll out changes. If we were to build something here, I think it would be very lightweight -- a config that basically let you define reusable SQL snippets, and maybe some constraints on them.
On the other hand, there are A LOT of startups building metrics layers, which look great. Usually these expose an API endpoint and some sort of SQL interface. We'd be just as happy to plug into one of those SQL interfaces and call it a day. I just wish one of those was open source, since the metrics layer is such a choke-hold on your data operation.
Maybe someone will build the 'dbt of metrics layers'. That would be great for the ecosystem. Maybe dbt will do it themselves. I think there's probably something interesting they could do there by treating stored procedures as materialization targets.
adithyasrin|4 years ago
[1] - https://modlr.co
bayesian_horse|4 years ago
This looks very promising, also with the inclusion of Svelte. I'd have to see how well it integrates with Python and R; as far as I can see, you'd need to export data from those to make it usable, which is probably easy as well.
krishvs|4 years ago
amcaskill|4 years ago
Absolutely, a great PDF export is on the roadmap, and it raises a whole raft of tricky issues.
Just an aside, but one of the other features Sean and I really liked in R markdown was the ability to export a single standalone HTML file that you could just email to someone. I’m not sure if we’ll build that, but there are a bunch of cases where you want to send the “real” thing, without actually deploying anything (if that makes sense). It also saves you from the challenges of pagination.
jrumbut|4 years ago
I have a couple questions.
1. I work in research, and we use a lot of strange databases and query languages. How hard would it be to add support for new databases (or alternative sources like CSVs or API calls), and to include multiple sources in the same report?
2. I had trouble telling from the docs how hard it is to drop in hand-coded components (say I have some D3 creation, or I have some requirement that breaks the model and requires JavaScript and CSS to change everything)?
amcaskill|4 years ago
1. Yes, if you have specific DBs, please feel free to create an issue on GitHub. We're also working on opening up the DB connector ecosystem so that people can add their own. I've also opened an issue for CSVs; I think we could support them pretty seamlessly.
2. Evidence is actually pretty slick in this regard. This is one of the benefits from starting from a web framework and working backwards towards the data analyst, rather than starting from Jupyter and trying to work forwards (if that makes sense).
The markdown documents compile to Svelte components, so you can just add a <script> tag and/or a <style> tag right into your .md file. D3 works pretty seamlessly in Svelte, so you can go nuts. The other neat thing is that the styles scope themselves to the component, so they won't leak into the rest of your project.
We haven't written the docs for this portion yet, but you can also add global Svelte components to your project, so if you wanted to write something reusable, you can write it that way and then just import it into your reports to use it as a component. In either case, you could call out to other APIs if you didn't want to retrieve data via a SQL query.
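A minimal sketch of what a hand-coded D3 fragment in a page might look like, assuming the .md file compiles to an ordinary Svelte component and d3 is installed as a dependency (the element binding and class names are illustrative):

````markdown
<script>
  import { onMount } from 'svelte';
  import * as d3 from 'd3';

  let el; // bound to the div below

  // run after the component mounts, once the DOM node exists
  onMount(() => {
    d3.select(el)
      .append('svg')
      .attr('width', 400)
      .attr('height', 200)
      .append('circle')
      .attr('cx', 200).attr('cy', 100).attr('r', 50);
  });
</script>

<style>
  /* scoped: only applies to elements in this file */
  .viz { border: 1px solid #eee; }
</style>

<div class="viz" bind:this={el}></div>
````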
As an example, an add-on component library we'd like to build is an interface to FRED data from the St Louis Fed. That way if you just need a quick chart of GDP, or you want to add recession shading to one of your charts, you can just drop it in without having to load that data into your database. <FredTimeSeries ref=gdp/>, that type of thing.
jart|4 years ago
If your app is built on Node, it's got an unwieldy number of dependencies which frequently have security issues. Something like Postgres is usually only viable as an online service you're self-hosting, and those things get hacked. So redbean is really a no-brainer if you want to protect data without making life difficult for the people who are authorized to look at it. We're also looking at integrating QuickJS soon, as an alternative to Lua, so there should be a painless migration path for Node folks.
amcaskill|4 years ago
dg4|4 years ago
amcaskill|4 years ago
That's a super promising line of thinking.
We really like how Vercel works with pull requests -- generating a preview, blocking the PR if there is a failure in the build process etc. -- and that's definitely where we'd like to go with the cloud service. We hadn't thought of providing executed SQL back into the review context, but of course that would be do-able and very useful.
There is a whole host of tooling that you can build around the artifacts when you move them into code. One example an early user suggested was scanning your entire project to find repeated blocks of SQL, and surfacing them to be refactored into your data warehouse (into your dbt project, for example). You could imagine a GitHub Action that periodically opens a PR with those suggested refactors.
Jugurtha|4 years ago
We used to build custom, turn-key machine learning products for enterprise. Recently, after playing with things like Voilà, Streamlit, and Superset, we made it possible for our data scientists and ML people to show prototypes and applications right from the platform, without worrying about creating a VM, setting up the environment, scp-ing stuff, creating an application, configuring a server, setting up authentication, sending a link to the client, etc.
I can envision doing something similar with Evidence. Given it's markdown, could we imagine having a Jupyter notebook containing markdown cells that somehow use Evidence? Could this be a JupyterLab extension?
I'm asking this because we have live collaborative notebook editing on the platform, with access to external data sources such as S3 as if they were filesystems, so several people can collaborate on the same notebook, see the cursors and selections of others, etc. Why not do that for Evidence work as well:
- I start a notebook. Add a Markdown cell. Some magic, I can do whatever it is I can do to generate reports with Markdown.
- Share the notebook with other users. We get together and work on that visualization/report.
Tangent: Something that kind of sucks is that some clients send us a database dump as a file, plus all other miscellaneous data. We have to create a MySQL database from that dump. It's not a big deal, but we don't like it.
bryik|4 years ago
SQLite support would be nice!
nonameiguess|4 years ago
I'm not trying to be a downer, but it seems like your product is duplicating the functionality of these existing products while doing less, since it only supports SQL and Markdown.
I guess you autogenerate charts, but it says you're targeting a technical audience that is presumably comfortable calling functions in Python and R for graphical data visualization.
This is nitpicky, and I'm sure you have some command line option to choose another port (though your "get started" doesn't show how), but mdbook also uses 3000. I'm sure they probably weren't the first to default to that, either.
I hope this doesn't come across as downplaying your product. It looks nice. I just don't see what you offer here that can't already be done with existing data ecosystem tools. I was using RMarkdown with knitr to generate all of my papers when I was an ML grad student years ago. It felt back then like I was the only person at Georgia Tech who realized these tools existed, and now it still feels that way.
amcaskill|4 years ago
I have written a lot of R Markdown over the years, and I agree wholeheartedly with most of what you're saying. The R ecosystem is phenomenal. Anyone who is excited about our project might be 10x more excited about learning R and writing a report with R Markdown.
A big part of why we are building Evidence is that my co-founder Sean and I felt like we lost a lot on the presentation side when we graduated from notebooks to primarily working with data warehouses, dbt & BI tools.
The thing is, we gained so much from that transition to 'the modern data stack' that we would never go back. So we're setting out to fix the presentation layer in a way that would have worked for us.
Undoubtedly, anything that you could accomplish in Evidence is going to be do-able within the R Markdown or Jupyter ecosystems, so I won't try to claim any truly unique features. It's maybe more of a vibe: what's easy in Evidence vs. what's tricky in a notebook?
If you're writing an ML paper, R Markdown is definitely the move. If you're trying to build a common, internally consistent understanding across hundreds (thousands) of people about how your business is doing, and what they might do about it, Evidence is going to be a better fit.
Here's a comment from awhile ago discussing the comparison with Jupyter: https://news.ycombinator.com/item?id=27363349
It only supports SQL and Markdown:
That constraint is part of the point.
In a large organization, a fair number of people are going to contribute to your reporting apparatus, and you want to keep it in a state where you can refactor useful abstractions up into your data warehouse. This gets a lot harder if your reporting is a swirl of R scripts and Python snippets and whatever else.
Some order of magnitude more people know SQL and markdown than R or Python. Every business I have been involved in has someone there who is cranking out analysis and data pulls using SQL. Very rarely would that person be comfortable working in R markdown.
You can't in-line an ML model into your reports:
Again, we think this constraint is basically a good thing. If you have a model that is profitable to your business, it should be governed and executed in a purpose-built environment and, where feasible, you should be storing the relevant outputs for posterity in your data warehouse.
We will add instructions on setting the port! :)
Aeolun|4 years ago
To me, this is a feature. Evidence sounds like it’s completely batteries included. Your example sounds like I have to learn a whole new toolchain.
lytefm|4 years ago
This looks more like it's made for an analyst who mainly uses SQL or Excel.
But if it makes me more productive than Jupyter notebooks for simple reports, I'll give it a try.
edusoftwerks|4 years ago
101008|4 years ago
amcaskill|4 years ago
There are two main cases of this idea that we have spent time thinking about.
1. Truly static report.
Here, you would need to condition your SQL queries so that they continue to return the same results over time. E.g. your `where` clause restricts the results to 'on or before' the day of writing. Evidence will continue to build the report on a schedule, but the results will never change so long as your historical data is constant. You can do this today.
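As a sketch, with a hypothetical `orders` table, freezing a report written at the end of June 2021 could look like:

```sql
select date_trunc('month', order_date) as order_month,
       sum(amount) as revenue
from orders
-- pin the report to the day it was written, so rebuilds never change it
where order_date <= '2021-06-30'
group by 1
order by 1
```

As long as historical rows are never mutated, every scheduled rebuild of this query returns identical results.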
In a future state, we've talked about rendering a snapshot of the report and checking that into version control, so that even if your underlying data is a moving target, you can hold onto what the report looked like at a moment in time.
We're kind of mixed on that idea of snapshotting reports themselves, though: it's just so much better to build your data warehouse such that it actually retains the historical data. But we recognize sometimes that's not practical. TBD on that functionality.
2. Recurring report with static commentary
Here, you have a recurring time-bounded report, and you occasionally want to mix in commentary that's only relevant for specific time periods.
With Evidence (this part comes from SvelteKit), you can mix parameterized pages and static pages on the same route. So if you had a 'monthly MRR growth report', you could use a parameterized page to generate the report for every month into history and into the future, AND you could include versions of the report with hand-written commentary for any specific months where it was needed. So if someone navigates to the February 2021 page, they get the standard parameterized version, but if they go to January 2021, they get the handwritten January version that explains that there was an acquisition which drove the big pick-up in MRR.
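In SvelteKit terms, that mix could look something like the following file layout (the directory and file names here are illustrative; in SvelteKit, a static route takes precedence over a dynamic `[param]` route at the same path):

```
pages/
  mrr-growth/
    [month].md     # parameterized template, rendered for every month
    2021-01.md     # hand-written January 2021 version with commentary
```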
This one is a bit tricky to explain, but we will build some examples.
smashah|4 years ago
jahewson|4 years ago
unknown|4 years ago
[deleted]
louiskw|4 years ago
amcaskill|4 years ago
mrosett|4 years ago
amcaskill|4 years ago
wizwit999|4 years ago
amcaskill|4 years ago
Yes, definitely. We include a visualization library with Evidence.
You can write <LineChart .../> to add a line chart to your document, <Hist .../> for a histogram etc.
You can see the documentation for the chart types we have built under 'components' in our docs. Here's the histogram: https://docs.evidence.dev/components/hist
Designing this is one of the trickiest parts of the project, and is going to be one of the biggest areas of effort going forward. We're trying to build something that is very declarative, so that people don't have to spend a lot of time configuring their charts, and something that is composable, so that you can create more complex viz that include things like annotations.
swyx|4 years ago
Since you have some data processing inline with your Markdown, I am wondering if you explored using MdSvex? https://mdsvex.pngwn.io/
It would seem like a pure win for you, as you (your users) get reusable functions, on-document styles, local component state, and so on.
amcaskill|4 years ago
Absolutely, we are using mdsvex, and we owe you a beer!
You rock.
void_mint|4 years ago
amcaskill|4 years ago
thejosh|4 years ago
amcaskill|4 years ago
anyfactor|4 years ago
I am learning Tableau, but I have experience with PyGal (an SVG-based Python data visualizer) and a little Bokeh. Using enterprise-accepted data analytics tools seems like going backwards when you're coming from a developer role.