Last week, I shared an article that tracked an ongoing battle between dbt and Google over the metrics layer. And it opened a huge can of worms. Folks complained about the feature-incompleteness of dbt’s semantic layer, quibbled over APIs and semantics, extolled the virtues of alternatives, and even questioned the value-add of a semantic layer (a reasonable take!).
My takeaway: few of us actually know what we want, let alone how we should be thinking about the semantic layer. So before we argue endlessly and/or dismiss the whole endeavor, I want to offer an opinion on how to think about the space.
In what follows:
To start, I’ll consider the canonical product-building question: What problem does the semantic layer solve?
I’ll then consider a key question that divides the solution space: Do we care about depth or accessibility?
I’ll discuss two products that represent the ends of the accessibility/depth spectrum: dbt and Looker.
At the end of all this, I hope you’ll have a clearer lens by which to gauge dbt, Looker, and other semantic layers moving forward. Let’s dive in.
What problem are we solving?
The main problems solved by the semantic layer can be categorized as follows:
Things that make us more effective
🛠 We need a single source of truth (SSOT) for defining and using business concepts (e.g. metrics).
Data people waste a lot of time redefining and disseminating business logic. Our current processes (spreadsheets, dbt models, tables) are messy, error-prone, and difficult for other tools to leverage. We needlessly duplicate work across BI tools, experimentation tools, Amplitude/Mixpanel, etc.
Things that make the rest of the company more effective
🔋 We want everyone to be able to self-serve more easily.
“Can you cut this metric by Y?” is one of the most common requests from stakeholders. Stakeholders can’t handle even the simplest requests themselves, and have to rely on analysts as gatekeepers of data.
🔌 We want everyone to be able to self-serve more deeply.
We have too many questions. Let’s create a system so most ad hoc requests can be self-served.
The key tradeoff: depth vs. accessibility
In general, there’s consensus on the “things that make us more effective” side: there’s a clear benefit to having a hub-and-spoke model for business concepts (as with any widely accessed asset). The most minimalist of us might object to having yet another tool in our modern vendor stack, but it’s a losing battle to argue that this world won’t come one day. The Bezos mandate has finally reached the rat’s nest of SQL logic we’ve been tucking away in spreadsheets.
On the other hand, the world is clearly split on what’s important in the “things that make the rest of the company more effective” category, particularly with respect to self-service.
Is the purpose of our semantic layer to enable a subset of sufficiently resolute stakeholders to answer nearly all of their own questions? Or is our objective to enable as many stakeholders as possible to answer many (but not all) of their questions? Do we care more about accessibility or depth?
While we would ideally want all of our stakeholders to be able to do anything, unfortunately there’s no free lunch. Where a tool falls on this spectrum will determine its shape and usage patterns.
What does depth look like? What does accessibility look like?
If we care about depth, we want a semantic layer that maximizes expressiveness. If we care about accessibility, we need an intuitive semantic representation where business entities, not SQL abstractions, are first class citizens. To understand what this might look like, let’s return to Looker and dbt, as they represent the different ends of this spectrum.
LookML sacrifices accessibility for depth.
LookML has always been about building a [practically-speaking] nearly complete semantic model of SQL. For those unfamiliar with LookML, the basic idea is this: you define foreign key relationships between tables, then define some metrics (and dimensions). And then you can do 99% of what you might want to do in SQL. You just have to learn how to click and drag things around in the Looker GUI.
.model file with explore
explore: table_1 {
  join: table_2 {
    type: left_outer
    sql_on: ${table_1.id} = ${table_2.id} ;;
  }
}
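.view file with dimensions and measures
view: table_1 {
  # id must be declared as a dimension for ${id} in the measure to resolve
  dimension: id {
    sql: ${TABLE}.id ;;
  }
  dimension: is_liked_by_robert {
    sql: ${TABLE}.is_liked_by_robert ;;
  }
  measure: distinct_ids {
    type: count_distinct
    sql: ${id} ;;
  }
}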
Concepts are defined in a hierarchy that’s pretty close to bare-metal SQL: tables and the relationships between them (explores), aggregations (measures), and columns (dimensions). Self-service is certainly possible and effective, but you’ll miss folks that lack the patience to invest in learning how to create Looker explores.
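To make that hierarchy concrete, here’s a rough Python sketch of how an explore-style model could be turned into SQL. This is not Looker’s actual query generator; the `View`, `Join`, and `Explore` classes and the `category` column on `table_2` are hypothetical names I’m introducing purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """A table plus its column-level (dimension) and aggregate (measure) SQL."""
    name: str
    dimensions: dict[str, str]  # field name -> SQL expression
    measures: dict[str, str]    # field name -> aggregate SQL expression


@dataclass
class Join:
    view: View
    sql_on: str
    type: str = "LEFT OUTER"


@dataclass
class Explore:
    base: View
    joins: list[Join] = field(default_factory=list)

    def to_sql(self, dimensions: list[str], measures: list[str]) -> str:
        """Build a query from qualified field names like 'table_1.distinct_ids'."""
        views = {self.base.name: self.base}
        views.update({j.view.name: j.view for j in self.joins})
        select, used = [], {self.base.name}
        for ref in dimensions:
            view, name = ref.split(".")
            select.append(f"{views[view].dimensions[name]} AS {name}")
            used.add(view)
        for ref in measures:
            view, name = ref.split(".")
            select.append(f"{views[view].measures[name]} AS {name}")
            used.add(view)
        lines = ["SELECT " + ", ".join(select), f"FROM {self.base.name}"]
        # Design choice in this sketch: only join views the selected fields need.
        for j in self.joins:
            if j.view.name in used:
                lines.append(f"{j.type} JOIN {j.view.name} ON {j.sql_on}")
        if dimensions:
            # Dimensions come first in SELECT, so group by their positions.
            positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
            lines.append(f"GROUP BY {positions}")
        return "\n".join(lines)


table_1 = View(
    "table_1",
    dimensions={"is_liked_by_robert": "table_1.is_liked_by_robert"},
    measures={"distinct_ids": "COUNT(DISTINCT table_1.id)"},
)
# 'category' is a made-up column, used only to show a cross-table cut.
table_2 = View("table_2", dimensions={"category": "table_2.category"}, measures={})
explore = Explore(table_1, [Join(table_2, "table_1.id = table_2.id")])

print(explore.to_sql(["table_2.category"], ["table_1.distinct_ids"]))
```

The point of the sketch: once joins are part of the model, a measure on one table can be cut by a dimension on another. That’s the depth. The cost is that consumers have to think in views, joins, and qualified field names to get there.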
dbt sacrifices depth for accessibility.
dbt emphasizes metrics (and metrics only, for now) as a first-class citizen[1]. By pushing for metrics to be a first-class citizen, they’ve made a bet on accessibility. Not only in naming metrics “metrics” (something with a clear business meaning), but also in the prominence of this entity at the top of the config (not nested under or conflated with tables). It might seem trivial, but the convention choice here will shape the API and so the access patterns. Whatever is top-of-mind for semantic layer producers will determine what’s presented to semantic layer consumers.
Sample file for a metric:
metrics:
  - name: distinct_ids
    model: ref('table_1')
    type: count_distinct
    time_grains: [day, week, month, year]
    dimensions:
      - is_liked_by_robert
The most common complaint about this system: dbt doesn’t have a sense of foreign key relationships, massively reducing its self-service use cases. But this is something they could easily build out to get to Looker parity. That said, I don’t think this is actually what we want, nor what they will do. Foreign key logic doesn’t quite seem to solve the primary problems we want to solve with the semantic layer: more ergonomic definition and access. And to that end, we need business concepts to be first-class citizens, not foreign key relationships[2].
From what I can tell, dbt is being careful about passing through any one-way doors. There are decisions to be made here around how to build something that has just enough semantic coverage to be able to do most of what people want, while still preserving comprehensibility, accessibility, and usability by keeping clear business concepts as first-class citizens.
Which one is better?
Having worked with both Looker and Airbnb’s Minerva (another accessibility-first system), I can definitively say this:
The majority of non-data people will not build their own Looker explores.
In contrast, almost everyone at Airbnb, at one point or another, did their own dimensional exploration of metrics in Minerva.
Still, I did have PMs asking me for help debugging their queries. For those unicorn stakeholders who seek deeper self-service analytics, Looker certainly could have unlocked doors beyond what Minerva was able to do.
That said, this is all a long-winded way of saying: I’m not sure. I can only guess as to how these pathways directly translate to business value. Does it even matter if non-data people can look at metrics by themselves? I mean, yes I think so, but that’s really hard to quantify.
Still, as I’ve already said, nothing is stopping dbt from steadily crawling towards feature parity. If it’s needed, technical depth will get there eventually (and they have a solid track record of getting there, given enough time). And it’s a lot easier to go from something small and usable to something fully functional than it is to start with something fully functional and turn it into something usable (granted, Looker isn’t that bad).
Does it matter?
Probably not, and I imagine other considerations (the power of bundling, the power of the open-source community, etc.) will be more critical factors. But hopefully I’ve provided you a rationale for why dbt’s view of the world makes sense: a more accessible non-technical experience may not sound compelling, but it’s key in unlocking adoption.
At minimum, I hope this post has made you rethink the purpose of the semantic layer. In general, we have this tendency as technical folks to focus on the technical aspects of a problem[3], but the semantic layer, in spite of its potential for theoretical rigor, does not solve an entirely technical problem. Conceptual elegance only has a place in our discussions insofar as it furthers our primary objective (whatever we decide that is). And when usability takes a backseat to intellectual rigor, products fail.
Contrary to the signaling of their premature rebrand.
This is… a lot like how we tend to over-focus on technical aspects of our work, as analysts. It’s really turtles all the way down.
Great post!
As an outsider take, there's no reason M can't handle all of this, better than SQL-first solutions. Power Query on M is pretty bad ass; M just needs more TLC.