top | item 16615739


meow1032 | 8 years ago

It's been a while since I've done database stuff, so maybe this is a naive question, but how reusable do you find it to use primarily SQL/Redshift to answer these sorts of questions?

I remember getting these sorts of questions and having to write the kind of queries you just wrote, but then having to write ten other ones to solve related questions with subtle differences (for example, the event_type would change or something).
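The "ten near-identical queries" problem often comes down to a value like event_type being hard-coded. A minimal sketch of the alternative, using Python's sqlite3 with a made-up events table (the schema and event names here are invented for illustration, not from the thread):

```python
import sqlite3

# Hypothetical events table standing in for a warehouse events log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "signup"), (2, "signup"), (2, "purchase"), (3, "purchase")],
)

def count_users(event_type):
    # One query template; the event_type that would otherwise force a
    # copy-pasted query variant becomes a bound parameter.
    row = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM events WHERE event_type = ?",
        (event_type,),
    ).fetchone()
    return row[0]

print(count_users("signup"))    # 2
print(count_users("purchase"))  # 2
```

Each "subtly different" question becomes a different argument rather than a different query.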


aceregen|8 years ago

Hello, you may like to try out what we've been working on at Holistics (www.holistics.io)! We make it easy for you to retrieve and share your SQL data.

We address two use cases that create the "duplicative" overhead of writing the same SQL query.

1. Where you just need to adjust specific parameters/values in your query. With Holistics, you can insert custom variables into your SQL query to make it dynamic, addressing the subtle differences without repeating yourself. See an example here: https://docs.holistics.io/guides/adding-filters/

2. Where you have chunks of SQL that are reused across multiple questions. These can be long `CASE` expressions or CTEs/sub-queries that you use across multiple reports. You can define a template snippet (a chunk of SQL syntax, for example) that can be reused across multiple questions. You can find more information here: https://docs.holistics.io/query-templates/
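The snippet idea in use case 2 can be illustrated generically (this is not Holistics' actual templating syntax, just a sketch of the pattern with invented table and column names):

```python
# A shared chunk of SQL lives in one place instead of being copy-pasted
# into every report that needs it.
STATUS_CASE = """CASE
  WHEN status IN ('paid', 'shipped') THEN 'complete'
  ELSE 'pending'
END"""

def report_query(table, snippet):
    # Each report interpolates the same reusable snippet.
    return f"SELECT {snippet} AS order_state, COUNT(*) AS n FROM {table} GROUP BY 1"

q = report_query("orders", STATUS_CASE)
print(q)
```

If the CASE logic changes, it changes once, and every report that uses the snippet picks it up.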

meow1032|8 years ago

Thanks for the response. I'm more on the research side of things now, so I don't have a current use case, but I'm thinking about moving back to industry, and this is something I remember having trouble with when I was last there.

scapecast|8 years ago

Dashboard tools like Chartio and Looker make self-service analytics "easy", so you don't need to write 10 different flavors of the same query. We see this happening every day: our biggest customer has 1,200 analysts on Looker.

The key is how you set up your data warehouse, e.g. Amazon Redshift. What we see is that data engineering teams provide "data services" to their company via a set of common schemas/tables.

At the very high level:

- Set up two different schemas: (1) a raw schema into which you dump all your event-level data, which only data engineers are allowed to access, and (2) an ad-hoc schema that analysts can use to run their queries.

- Move data from the raw schema to the ad-hoc schema with scheduled transformations/aggregations. Airflow, Luigi, Pinball, and dbt are popular tools for this purpose. The tables in the ad-hoc schema need to be well documented so analysts can understand what data is available.

- Give every analyst a dashboard seat and access to the ad-hoc schema. Give them access to "SQL playbooks" they can re-use. If you're adventurous, allow them to create their own tables.

This approach scales from a few GBs to TBs and more.
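The raw-to-ad-hoc pattern above can be sketched in miniature. Here sqlite's attached databases stand in for Redshift schemas, and the table and column names are invented; a real setup would run the transformation on a schedule with Airflow, Luigi, or dbt rather than calling it by hand:

```python
import sqlite3

# Two "schemas": raw (engineers only) and adhoc (analyst-facing).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH ':memory:' AS raw")
conn.execute("ATTACH ':memory:' AS adhoc")

conn.execute("CREATE TABLE raw.events (user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO raw.events VALUES (?, ?)",
    [(1, "click"), (1, "click"), (2, "click")],
)

def run_transformation():
    # The scheduled job: aggregate raw events into a documented,
    # analyst-facing rollup table so analysts never scan raw events.
    conn.execute("DROP TABLE IF EXISTS adhoc.daily_clicks")
    conn.execute("""
        CREATE TABLE adhoc.daily_clicks AS
        SELECT user_id, COUNT(*) AS clicks
        FROM raw.events
        WHERE event_type = 'click'
        GROUP BY user_id
    """)

run_transformation()
rows = conn.execute(
    "SELECT user_id, clicks FROM adhoc.daily_clicks ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 2), (2, 1)]
```

Analysts query the small, documented rollup; only the transformation job touches the raw event data.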

aceregen|8 years ago

Yes, as scapecast mentioned, the key to writing and executing SQL well is to organize and transform your data. You don't want to be running queries across large raw event tables.

Last week we published a blog post with some tips for analysts starting their first data warehouse project.

https://blog.holistics.io/the-analyst-guide-to-designing-a-m...

teej|8 years ago

Set up a "SQL-in, charts-out" tool. There are a number of open-source ones now. Apache Superset is one example, though I don't necessarily recommend it specifically. This is very low-cost to do, and it solves this problem exactly.

The next layer of the cake is more complex and costly: setting up a SaaS BI tool like Periscope, Chartio, or Looker. This only really becomes useful when you need dynamic pivoting and grouping.