LLMs, the semantic layer, and 2023
An overview of my predictions for the next year (and a review of my recent talk)
👋 Hello! I’m Robert, CPO of Hyperquery and former data scientist. Welcome to Win With Data, where we talk weekly about maximizing the impact of data. As always, find me on LinkedIn or Twitter — I’m always happy to chat. 🙂
A few weeks ago, I gave a talk at Sharpen (the inaugural SharpestMinds conference) on Analytics trends for 2023. I thought I’d consolidate the high-level takeaways here.
Dun dun dun, get ready for some completely unhinged guesses.
Setting the stage with my favorite analogy: cooking
Let’s set the stage, first. Broad brush, analytics is a lot like cooking. You grow raw ingredients (ETL), you store them in a fridge (warehouse), you cut and prep them (dbt, transformations, testing), you cook them into a presentable meal (notebooks, dashboards), served to customers (stakeholders). I have a history of running with this analogy, so let’s use this to set a contextual base to make it a little easier to understand the predictions I’ll propose. Yes, this means I’m going to make a bunch of cooking analogy predictions. Drumroll, please:
1. The rise of Blue Apron 💙🧑‍🍳.
Ah yes, Blue Apron — the darling of the pandemic. I remember my first Blue Apron delivery clearly. I could finally cook with raw ingredients I knew of but had never dared to use — fennel? With couscous? And Cornish hen? What sorcery is this? The biggest barrier to cooking interesting meals for me was always a lack of guidance around ingredients that fell outside of my standard garlic, pepper, salt repertoire. Blue Apron was my chaperone (and greatly preferred to the random walks through my local Star Market [1] that I'd grown accustomed to).
The semantic layer is a lot like Blue Apron, providing a curated experience for those of us lacking any real culinary (SQL) skills. It brings restaurant-quality, fresh-made food (data, insights) to our own homes.
Let’s make this clearer by way of example. One might ask:
By how much did visitors to my homepage drop in the US last week?
Ordinarily, this is a cumbersome SQL query. But within a semantic layer, a small technical group of analysts can carefully decompose this SQL query into higher-level business abstractions.
count(distinct id) where page = 'homepage' becomes a metric: visitors to my homepage.
country = 'us' becomes the country dimension.
And suddenly, BI tools, notebooks, consumptive interfaces can expose this logic, rather than a raw SQL interface, to consumers, enabling unprecedented access for non-technical folks. Want to get some data? You need only specify what data.
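To make the decomposition above a little more concrete, here is a minimal Python sketch of how a metric and a dimension might be defined once and then compiled back into SQL on request. Everything here is an assumption for illustration: the `Metric` and `Dimension` classes, the `compile_query` helper, and the `page_views` table name are hypothetical and are not the API of dbt or any other semantic layer. The "last week" part of the question would need a time dimension, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Metric:
    """A named aggregate that analysts define once (hypothetical structure)."""
    name: str
    expression: str                # SQL aggregate expression
    table: str                     # source table (assumed name)
    filter: Optional[str] = None   # condition baked into the metric


@dataclass(frozen=True)
class Dimension:
    """A named attribute that consumers can filter by."""
    name: str
    column: str


# The two abstractions from the example in the text.
HOMEPAGE_VISITORS = Metric(
    name="homepage_visitors",
    expression="count(distinct id)",
    table="page_views",            # hypothetical table name
    filter="page = 'homepage'",
)
COUNTRY = Dimension(name="country", column="country")


def compile_query(metric: Metric,
                  filters: Iterable[Tuple[Dimension, str]] = ()) -> str:
    """Turn a metric plus dimension filters into a SQL string."""
    conditions = [metric.filter] if metric.filter else []
    for dimension, value in filters:
        escaped = value.replace("'", "''")  # naive quoting, fine for a sketch
        conditions.append(f"{dimension.column} = '{escaped}'")
    sql = f"SELECT {metric.expression} AS {metric.name} FROM {metric.table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# The consumer only says *what* they want: a metric and a dimension value.
print(compile_query(HOMEPAGE_VISITORS, [(COUNTRY, "us")]))
```

The point of the design is that the SQL lives in one reviewed place, and every BI tool or notebook asks for `homepage_visitors` by name instead of re-deriving it.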
I’ve written about this concept a few times, but things are coming to a head with the commotion brought on by dbt’s efforts: dbt Labs is pushing hard to get the semantic layer off the ground. They've pushed for partnerships with BI vendors, they've made huge announcements, they even recently acquired the team that made Airbnb's metrics platform. If they can't brute force this vision into reality, I'd actually be very surprised.
2. The rise of the robot chef.
Up next: I am certain AI will change analytics. A few weeks ago, we announced the private beta of HyperAI. And while I'm certainly excited about the prospects for our own platform, if you consider the macroscopic repercussions for a second, it’s not difficult to envision a future where LLM-generated SQL is not novel, but a standard access pattern. And it'll only get better as we augment it with validated documentation and queries, further fine-tuning of the algorithm, etc.
Don't ask me about the picture above. I think it's a robot frying some peanut butter pretzels, but it's meant to represent AI cooking, and I ran out of Dall-E credits. The irony is not lost on me.
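For a rough sense of what "augmenting with validated documentation and queries" could look like in practice, here is a minimal Python sketch that assembles a prompt for a text-to-SQL model. This is purely illustrative and is not how HyperAI or any particular product works: the `build_sql_prompt` function, the prompt wording, and the sample table and queries are all hypothetical, and the sketch stops short of calling any model.

```python
from typing import Dict, List, Tuple


def build_sql_prompt(question: str,
                     table_docs: Dict[str, str],
                     validated_queries: List[Tuple[str, str]]) -> str:
    """Combine table docs and known-good queries into a single prompt string."""
    lines = ["You write SQL for an analytics warehouse.", "", "Tables:"]
    for table, doc in table_docs.items():
        lines.append(f"- {table}: {doc}")
    lines.append("")
    lines.append("Validated example queries:")
    for example_question, example_sql in validated_queries:
        lines.append(f"Q: {example_question}")
        lines.append(f"SQL: {example_sql}")
    lines.append("")
    # The new question goes last, so the model completes the final "SQL:" line.
    lines.append(f"Q: {question}")
    lines.append("SQL:")
    return "\n".join(lines)


prompt = build_sql_prompt(
    question="How many distinct visitors hit the homepage in the US?",
    table_docs={"page_views": "one row per page view, with id, page, country"},
    validated_queries=[
        ("How many page views are there in total?",
         "SELECT count(*) FROM page_views"),
    ],
)
print(prompt)
```

The better curated the docs and example queries are, the less the model has to guess about what your tables mean, which is where the "it'll only get better" part comes from.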
3. We’ll do more cooking, less microwaving.
And as the first two trends take off, analytics will become more and more about analyses, not data. Over the last decade, we've been leveraging a comically wide range of culinary skills to cook meals with an analogously wide range of skill requirements. Sometimes we build dashboards (akin to microwaving food). Other times, we prepare elegant narrative analyses (more like proper cooking).
And my prediction here is nearly self-evident: as self-service capabilities improve (whether through the semantic layer, through LLM superpowers, or simply through general industry-wide enlightenment), self-serviceable questions will become an ever smaller part of our work. The questions that we as analysts and data scientists solve are going to be the ones that are not answerable by self-service.
4. We’ll get nicer kitchens.
My final prediction is that tooling will start to reflect a greater emphasis on user experience. The rise of the kitchen. We are poorly equipped to deal with analytics work in a post-semantic layer, post-LLM [let alone post-dbt!] world. Tools like Jupyter, SQL IDEs, BI tools, etc. -- these have been the state of the art, but they weren't built for modern analytics.
Jupyter-based tools and SQL IDE variants are relics of an era where data science reigned. But using these for analytics is like using a Bunsen burner to cook meals. It works, but this is not optimal for anyone but graduate students [2]. On the other hand, BI tools are like microwaves. They act as reasonable complements to the SQL IDE [3] for many folks and, again, it can work. But you can only do so much with a microwave. You'll likely just get pigeonholed into making microwave meals (dashboards). The patterns reinforced by this toolchain are suboptimal.
What we need: proper kitchenware, kitchens, etc. We need tools that are built for the primary deliverable of analytics: analyses.
While we certainly have our own horse in the race, the trend towards UX here is inevitable. History has repeatedly followed this pattern. In the data warehousing world, for instance, we started with Hadoop and Redshift, but later ended up with warehouses built from the ground up for the quick access needs and reduced overhead costs of analytics: BigQuery, Snowflake, MotherDuck. In the transformation space, we started with Airflow as a general-purpose tool. But the industry quickly got behind dbt, which vastly improved the developer experience for the SQL world (and more recently, Mage for code-based pipelines specifically around data). So while I can't say what we're building is going to be the kitchen to end all kitchens, I can say tooling will change.
Analytics is going to get interesting.
As I’ve said before, 2023 is going to mark the rise of the analyst. The tectonic shifts in our domain are going to upset standards, but also present unprecedented opportunities for how we can provide value to our organizations. Analytics is going to get interesting.
[1] What up, Boston.
[2] Just kidding, obviously. What grad student can afford fresh ingredients? I just stole food out of group meetings and symposiums, obviously.
Another parallel for us Robert. I love food metaphors. I grew up in a restaurant family. In fact, I've been thinking about the platforms I build as kitchens pretty much forever.
Given what you said above....food prep will take less time, we'll have nicer kitchens, people will eat better...does this create even more incentive for a high performing food supply chain? Reversing the metaphor and going back to data.....if LLMs are consuming data, do real time pipelines finally make sense?