Impressive! An entire article about semantic layers that artfully avoids ever defining what a semantic layer is.
Sounds good to me! Semantic layers help expose a more user-friendly view of the data, so it is easier to ask business questions and get accurate results. More technically, it brings modularity and reusability to SQL. Things like joins, aggregate functions, and dimensional expressions are encapsulated as new fields/objects. Typically this logic is rendered at query time rather than pre-computed and materialized. The advantage of that is more flexible iteration and composability. In essence they are highly glorified SQL templating engines.
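To make the "glorified SQL templating engine" description concrete, here is a minimal sketch, not any particular product's API. The `Dimension`/`Measure` classes, the `render` helper, and the `orders`/`stores` tables are all made up for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    expr: str  # row-level SQL expression


@dataclass(frozen=True)
class Measure:
    name: str
    expr: str  # aggregate SQL expression


# Hypothetical model: the join is defined once and reused by every query.
SOURCE = "orders o JOIN stores s ON s.id = o.store_id"
DIMENSIONS = {
    "store": Dimension("store", "s.name"),
    "order_month": Dimension("order_month", "substr(o.ordered_at, 1, 7)"),
}
MEASURES = {
    "revenue": Measure("revenue", "SUM(o.amount)"),
    "order_count": Measure("order_count", "COUNT(*)"),
}


def render(measures, dimensions):
    """Render SQL at query time from the requested measure and dimension names."""
    dims = [DIMENSIONS[d] for d in dimensions]
    meas = [MEASURES[m] for m in measures]
    select = [f"{d.expr} AS {d.name}" for d in dims]
    select += [f"{m.expr} AS {m.name}" for m in meas]
    sql = f"SELECT {', '.join(select)} FROM {SOURCE}"
    if dims:
        # Group and order by the dimension columns, referenced by position.
        positions = ", ".join(str(i + 1) for i in range(len(dims)))
        sql += f" GROUP BY {positions} ORDER BY {positions}"
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, store_id INTEGER,
                     amount REAL, ordered_at TEXT);
INSERT INTO stores VALUES (1, 'Berlin'), (2, 'Zurich');
INSERT INTO orders VALUES (1, 1, 10.0, '2024-01-05'),
                          (2, 1, 5.0, '2024-02-01'),
                          (3, 2, 7.5, '2024-01-20');
""")

# The same definitions answer two differently shaped questions.
rows_by_store = conn.execute(render(["revenue"], ["store"])).fetchall()
rows_total = conn.execute(render(["revenue", "order_count"], [])).fetchall()
print(rows_by_store)  # [('Berlin', 15.0), ('Zurich', 7.5)]
print(rows_total)     # [(22.5, 3)]
```

Nothing is materialized here: the join and the aggregate expressions live in one place, and each request is rendered into SQL only when it is asked.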
I love a semantic layer as much as the next guy...
The flip side is, you often don’t know what needs to be reusable until you’ve had some iterations. Wrong abstractions can be way worse, and also gain their own momentum.
Yeah, minimizing the gap between semantic layer authoring and ad hoc work is what you need to do to close that. There has to be a progressive model both for consumption (take this semantic layer, slightly extend/tweak it in an ad hoc fashion) and for organically promoting ad hoc work up into the layer.
That tracks. The semantic layer is like a capital investment that pays off over time. So it can be hard to justify the initial investment if people don't grok the payoff.
We built a transformation library[0] (think a simpler, more performant dbt) for duckdb and I'd really like to create a semantic layer as an extension for it at some point.
It's a lot more. A SQL VIEW is just a saved query, whereas a semantic layer defines the shared meaning of the data and helps enforce consistent metrics, joins, and logic across tools. You'd be surprised at how many ways "active customer" can be represented as SQL.
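As a small illustration of the "active customer" point, here is a sketch with three plausible definitions run against the same toy data. The schema, the 90-day window, and the definition names are assumptions made up for this example, not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT, last_login TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, ordered_at TEXT);
INSERT INTO customers VALUES
  (1, 'active',  '2024-06-01'),
  (2, 'active',  '2023-01-15'),
  (3, 'churned', '2024-01-20'),
  (4, 'active',  '2024-02-10');
INSERT INTO orders VALUES
  (1, 1, '2024-05-30'),
  (2, 3, '2024-05-25'),
  (3, 2, '2022-12-01');
""")

AS_OF = "2024-06-30"  # hypothetical reporting date

# Three hypothetical, equally defensible definitions of "active customer".
DEFINITIONS = {
    "status_flag":
        "SELECT COUNT(*) FROM customers WHERE status = 'active'",
    "ordered_last_90d":
        "SELECT COUNT(DISTINCT customer_id) FROM orders "
        "WHERE ordered_at >= date(:as_of, '-90 days')",
    "logged_in_last_90d":
        "SELECT COUNT(*) FROM customers "
        "WHERE last_login >= date(:as_of, '-90 days')",
}

counts = {
    name: conn.execute(sql, {"as_of": AS_OF}).fetchone()[0]
    for name, sql in DEFINITIONS.items()
}
print(counts)
# {'status_flag': 3, 'ordered_last_90d': 2, 'logged_in_last_90d': 1}
```

Three queries, three different "active customer" counts from the same tables. A semantic layer would pin one of these (or name all three explicitly) so every tool reports the same figure.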
Nothing to do with linear, meaningful projections on embedding spaces, and everything to do with efficient maintenance of legacy data reporting systems.
I think Common Logic ( https://en.m.wikipedia.org/wiki/Common_Logic - ISO/IEC 24707:2007) would be a good addition to any effort trying to add a semantic layer to any database.
Yeah, I think it's great that there are ARD formats and that you can access bytes via a low-level, S3-like protocol. This enables interesting tools like DuckDB, which can abstract away some stuff and be fastish and "serverless". However, clearly there is also some kind of marketing hype train and jargon built around it, and it seems like a concerted movement to displace some other "boring" and "uncool" products and technologies. I actually think it's great to displace proprietary services with open formats and protocols. I hope it takes out "data lakes" and co, but I'd love to keep MVC and not invent completely new terms, APIs and ORMs for things that have been working fine for a long time.
hey all, another perspective that I have been thinking about is whether semantic layers are like an ORM, but for BI dashboards. Actually, I think it's more than BI dashboards, since a similar idea applies to Features. Features in ML land are nothing but a Measure + Entity metadata + TTL. So, really it's about higher-order semantics, and as we move up the stack, we need richer expression to describe our world.
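One way to read "Feature = Measure + Entity metadata + TTL" is as plain data. The sketch below is only that reading; the class and field names are hypothetical and are not the bsl or Xorq API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Measure:
    name: str
    expr: str  # aggregate expression, as in a semantic layer


@dataclass(frozen=True)
class Feature:
    measure: Measure   # what is computed
    entity: str        # key the value is attached to, e.g. a customer id column
    ttl: timedelta     # how long a computed value may be served

    def is_fresh(self, computed_at: datetime, now: datetime) -> bool:
        """A stored value is usable while its age does not exceed the TTL."""
        return now - computed_at <= self.ttl


# Hypothetical feature: order count per customer, valid for 24 hours.
orders_30d = Measure("orders_30d", "COUNT(*)")
feature = Feature(measure=orders_30d, entity="customer_id", ttl=timedelta(hours=24))

computed_at = datetime(2024, 6, 1, 0, 0)
fresh = feature.is_fresh(computed_at, now=datetime(2024, 6, 1, 12, 0))
stale = feature.is_fresh(computed_at, now=datetime(2024, 6, 3, 0, 0))
print(fresh, stale)  # True False
```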
I’m familiar with Xorq. One of the features of the Xorq library that I find interesting is that it catalogs data processing (compute) expressions as it compiles, along with call lineage. That makes reuse easier for both SQL and non-SQL processing.
btbuildem|6 months ago
Let me take a swipe at it: a semantic layer helps express queries and their results in terms the end-consumers will care about / prefer to reason in, instead of whatever extremely correct and efficient atrocities the database nerds came up with.
Did I get that right?
anon84873628|6 months ago
refset|6 months ago
> A semantic layer, also known as a metrics layer, lies between business users and the database, and lets those users compose queries in the concepts that they understand. It also governs access to the data, manages data transformations, and can tune the database by defining materializations.
There's also now a paper: https://arxiv.org/pdf/2406.00251
articsputnik|6 months ago
> There's a lot of information out there, including from myself about the history and rise [2022], comparing it to an MVC-like approach, or explaining its capabilities. That's why in this article I focus on the why and showcase how to use it in a practical example in the next chapter.
[1] https://www.ssp.sh/blog/rise-of-semantic-layer-metrics/
[2] https://cube.dev/blog/exploring-the-semantic-layer-through-t...
[3] https://cube.dev/blog/universal-semantic-layer-capabilities-...
My one line definition that I use atm:
> A semantic layer acts as an intermediary, translating complex data into understandable user business concepts. It bridges the gap between raw data in databases (such as sales data with various attributes) and actionable insights (such as revenue per store or popular brands). This layer helps business users access and interpret data using familiar terms without needing deep technical knowledge. https://www.ssp.sh/brain/semantic-layer#semantic-layer-defin...
Edit: I'm the OP.
seedless-sensat|6 months ago
sschnei8|6 months ago
Pivoting a decent sized BI shop toward using one instead of splashing the same SQL all over the place is *tough*. It's one of those: "the analyst could have been building important report for director and you want them to create re-usable logic??? we'll do that later, get report done now. Just copy/paste that SQL over here"
This is how you end up with the 1000-model, "the numbers don't match up", hot mess situations that gain momentum and are hard to slow down.
halfcat|6 months ago
efromvt|6 months ago
Right now a lot of semantic tools introduce a big discontinuity in both workflows that keeps the two worlds separate.
anon84873628|6 months ago
mritchie712|6 months ago
we took the same approach when we started https://www.definite.app/.
mritchie712|6 months ago
Limiting support to only duckdb would make some really useful features trivial to implement. e.g. duckdb has a `json_serialize_sql` function that would handle a lot of the tedious parts of building a semantic layer.
0 - https://github.com/definite-app/crabwalk
datadrivenangel|6 months ago
cryptonector|6 months ago
aszen|6 months ago
Semantic Layer is about decomposing views into dimensions and aggregates, then letting downstream apps/users compose their own views on top without having to redefine/re-calculate business level metrics.
This makes data analysis more flexible than SQL views, which are hardcoded to particular groupings.
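A quick sketch of why a view hardcoded to one grouping is less flexible: rolling up a pre-aggregated average gives a different number than recomputing the measure at the new grain. The table, the view, and the `avg_order_value` helper are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (store TEXT, month TEXT, amount REAL);
INSERT INTO orders VALUES
  ('Berlin', '2024-01', 10.0),
  ('Berlin', '2024-01', 20.0),
  ('Berlin', '2024-01', 30.0),
  ('Berlin', '2024-02', 100.0);

-- A view fixed to the (store, month) grouping.
CREATE VIEW avg_order_by_store_month AS
  SELECT store, month, AVG(amount) AS avg_order
  FROM orders GROUP BY store, month;
""")

# Rolling the view up to store level averages the monthly averages:
# (20 + 100) / 2 = 60, which is not the average order value.
from_view = conn.execute(
    "SELECT AVG(avg_order) FROM avg_order_by_store_month WHERE store = 'Berlin'"
).fetchone()[0]


def avg_order_value(group_by):
    """Hypothetical measure: the aggregate is defined once, the grouping is a parameter."""
    cols = ", ".join(group_by)
    return f"SELECT {cols}, AVG(amount) AS avg_order FROM orders GROUP BY {cols}"


# Recomputing at store level from the base rows: 160 / 4 = 40.
from_measure = conn.execute(avg_order_value(["store"])).fetchall()

print(from_view)     # 60.0
print(from_measure)  # [('Berlin', 40.0)]
```

Sums roll up safely from a view, but averages, ratios, and distinct counts generally do not, which is where keeping the aggregate separate from the grouping pays off.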
CharlesW|6 months ago
Frotag|6 months ago
aszen|6 months ago
Semantic Layer needs proper language and tooling support which Malloy provides.
articsputnik|6 months ago
I curate some more here, in case of interest: https://www.ssp.sh/brain/data-modeling-languages.
kermatt|6 months ago
12ian34|6 months ago
kovezd|6 months ago
cool_dude85|6 months ago
As one of the consumers of a "semantic layer" for many years now, I am firmly convinced that a "single source of truth" must either be useless or a lie.
Ok, the DBA has produced some joins that I can count up to decide how many "customers" we have. We immediately have the issue that a "customer count" from the semantic layer cannot always be the meaningful or relevant figure. In my experience, outside of the explicit context it was written in, it cannot be the correct figure. So, I have my single source of truth customer count, but my revenue per customer needs to use a different count that's slightly off. Another analyst needs to produce customer calls to our call center and that uses a slightly different definition. And so on, until the semantic layer is just a special database for pre-defined executive KPI dashboards and no more.
whitten|6 months ago
This is a good write up that doesn’t require DuckDB as it isn’t specific to a particular database.
Demiurge|6 months ago
LargoLasskhyfv|6 months ago
mousematrix|6 months ago
Feature stores explored here: https://www.xorq.dev/blog/featurestore-to-featurehouse
I think my key takeaway from building this is that we need better expression systems, and Ibis is a great foundation to build yours on. Maybe you want to build a language for some other domain, etc.
PS: I am one of the authors of bsl and co-founder of Xorq.
secondrow|6 months ago