👋 Hello! I’m Robert, CPO of Hyperquery and former data scientist. Welcome to Win With Data, where we talk weekly about maximizing the impact of data.
This time, there is absolutely no relevance to Hyperquery in this article, but you should still check us out. As always, find me on LinkedIn or Twitter — I’m always happy to chat. 🙂
Google’s data stack is ready for disruption. Data Studio is clunky. Looker’s self-service ambitions have been largely unrealized. Dataform, while a promising product, has little chance of overtaking the obvious 800-pound gorilla in the transformation space. But in typical Google fashion, we tolerate the product deficiencies because the bundling appeal is so strong.
But it’s precarious. We practitioners will tolerate only so much experiential anguish before the bundling alchemy holding its homunculus data stack together is finally dispelled. And aside from some (albeit powerful) infrastructural components, the only truly delightful piece of data tooling Google has is BigQuery.
So the natural question here is: how could Snowflake win this fight?
My big caveat: all of this is woefully broad-brush, but I’m putting it out into the world so I can point to this article in case it actually happens. Take it with a cup of salt.
Why should Snowflake even care?
Let me set the record straight: Snowflake might not have to care about this battle. Snowflake has a lot of points in their favor already:
They’re cross-cloud, which makes Snowflake easier to set up and govern than GCP (which would require pipes into GCP from other cloud vendors).
They have a stronger partnership ecosystem that purportedly decreases time to value — something all data teams seem to be yearning for these days.
They have a stronger marketing and sales team.
The list goes on…
In general, I’m long on Snowflake — while costs can be high, the pricing is transparent, and they have a strong track record of investing in their developer experience. But my fundamental hypothesis is that Google, if left unchecked, could slowly win the bottom of the market and, over many years, might topple the king.
There are two strong draws for smaller companies:
BigQuery’s bottoms-up pricing is, in typical Google fashion, predatory (it’s completely free in the early days of your company), and costs seem comparable at scale.
The Google analytics suite is a compelling value prop: you can avoid the mindshare cost of having to make tooling decisions and set up connections to Snowflake. Given a choice between a warehouse with basic partnerships (Snowflake) and a warehouse with an ecosystem of deeply integrated tools (BigQuery), a lot of folks will choose the latter, even if they have to cross clouds to do it.
Want Sheets integration? You should use BigQuery.
Want Looker or Data Studio integration? You should probably use BigQuery.
Need a transformation tool? Well, Google has one! Snowflake on the other hand…
How does Snowflake fight back? By acquiring dbt.
Snowflake will likely be hard pressed to fight Google on bottoms-up pricing — it’s their tried and true corporate MO, after all. But they could fight the ecosystem bundling. They’ve already done quite a bit to broker partnerships in this regard, but I’d contend that their greatest opportunity surrounds the weak points in the Google data stack: transformations and semantics.
On the transformational side of things, dbt is, for the time being, king. While there are certainly reasons why dbt might fall, it seems unlikely that any sins they commit could break their momentum. They’re a wildcard, positioned directly in the middle of Google’s stack. It’s tasty real estate: the first thing you usually want to do after setting up a data warehouse is either (a) set up core dashboards or (b) make some data models to unblock your dashboard creation. While there are certainly upstream inefficiencies here that dbt is capitalizing on, it’s hard to argue against the idea that basic data manipulation is better done in a version-controlled system, and, for the time being, dbt is the obvious option. We’re at a point in Modern Data Stack history, after all, where you don’t really get fired for choosing dbt.
Now, if you’ll let me return to the semantic layer for just a minute, the case becomes even more compelling. The semantic layer is what will ultimately bridge raw data and consumption in a way that’ll be accessible to the rest of the business. In a previous article, I discussed how dbt is going to win the war for the semantic layer because they got one unshakeable thing right: accessibility. And while we all have qualms about their present-day execution of the semantic layer, they certainly seem to be moving in the right direction. If they win this battle, what happens next?
Well, suddenly the Looker world looks much less appealing. One of Google’s potential siphons away from Snowflake shuts down. That puts at risk the stranglehold bundling strategy Google used to dominate search, email, and productivity.
And so there’s an opportunity to dethrone the king — to dismantle the Google data stack. Snowflake needs its own siphons, and its latest acquisitions frankly aren’t going to cut it, but dbt certainly would.
So Snowflake should acquire dbt, if they can. I imagine it’d actually be in the best interests of both parties. dbt faces a long climb up the tail of the Gartner hype cycle to grow into their valuation, and Snowflake could give dbt the opportunity it needs to move beyond the transformation and semantic layer. And to be honest, it’d be so nefariously synergistic. Then again, it all hinges on the Modern Data Stack continuing to reign as king, and sentiments are rightly changing.
For you PowerBI stans, Microsoft is certainly relevant here. But they’ve gone off on their own. They’ve nailed a lot of the key requirements needed by data folks. But in typical Microsoft fashion… you really need Windows. There’s a different battle going on there, something larger and more cohesive, in which the data stack is only a foot soldier in the infinite war for hardware mindshare. And I imagine it’s fighting that war that has necessitated nudges that have made their data stack substantially less modular, which I suspect will mean they won’t win the longer-term mindshare battle.
Bruhhh you clickbaited me!!!
Interesting article. Who is going to win the EL space in the long run? I don't see that Snowflake or even Google has any great integration tools for data ingestion.
I am exploring ELT tools and none seems scalable enough to handle my 10,000+ sources.