If our escapades in the Modern Data Stack have taught us anything, it’s that we love to standardize things¹. We incessantly collect tools and processes and frameworks, then with our newfound conviction, talk about them. But one aspect of the data world has evaded clear consensus on how standardization ought to occur: analytics. Let’s talk about why analytics — and in particular, good analytics — can look so different.
Industry-specific immutable characteristics
I recently asked folks what factors might be strong predictors of what your life will be like as a data person. It occurred to me that all of the answers could be usefully segmented into one of two categories: cultural (mutable) factors vs. immutable factors. While cultural factors point to problems that are generally pervasive across all data teams (documentation culture, how the data team is viewed, reactivity/buy-in), immutable factors are rigid properties of the industry you’re working in. To name a few: inherent variability in the data (human-generated vs. deterministically generated), the provenance and quantity of data, and the amount of regulation you have to deal with.
The existence of these immutable factors ensures that good analytics will always look different. Bad analytics shares common themes: analysts hired without data engineers, an overly reactive culture of mad-lib analytics. But good analytics maximizes impact for your business, and because businesses look different, so will good analytics. A clickstream-heavy tech company might care primarily about A/B tests, while this may not be practical in a heavily regulated industry where regional fluctuations dominate effect sizes. Or if your revenue comes from B2B sales, your data might be too human-sourced to drive any valuable deep-dive insights, though such efforts could form the bread and butter of a clickstream-first tech organization. Any productive discourse around analytics best practices, then, needs to keep this in mind.
Some examples
I want to give you a few examples illustrating how this divergence can manifest. For data leaders, I hope this gives you some solace if you’ve set up the Modern Data Stack but still find yourself drowning in dashboards, wading through ad hoc requests, struggling to set up a proper experimentation system — overall, feeling like you’ve failed as a newly minted head of data. Don’t beat yourself up. As long as you’re maximizing the ROI of your data efforts with business impact in mind, you’re probably doing alright.
Example 1: your business is heavily regulated
You’re probably dealing with crypto or weed or healthcare or some other industry that subjects you to an unending nightmare of extremely specific and variegated government policies. You have some key dashboards, but full self-service is a pipe dream — you have far too many questions that require bespoke deep dives. You probably live all over the place (spreadsheets, dashboards, notebooks, docs) because your heterogeneous world warrants heterogeneous solutions. Data comes from everywhere, and you need to stitch it together to craft a story. An ETL tool? Sure, for the one or two standard sources you have. Maybe if a vendor ever connects to 50 different government APIs, you’ll see deeper usage one day.
Chaos is your biggest problem. It is far too easy to let your analyses become an unorganized pile of reactive garbage, so you’re constantly thinking about how to re-design your org, re-design your processes, re-think your tooling to rein in the sprawl.
Example 2: you’re a tech company, with a bunch of data-driven PMs
You have an oddly high proportion of product managers who used to be data scientists. Consequently, data literacy is deep in the fabric of your culture. Your stakeholders try — and fail² — to run queries. You have dreams of semantics, of metrics, of self-service, and these might actually be largely realizable (or have been realized already). You see the potential of modern data tooling to take everything you ever wanted to do with data and make it a reality.
You've built out rudimentary self-service capabilities that help everyone use data... but with data has come a huge influx of important follow-up questions. Why did that metric change? Root causes are rarely determinable from a dashboard. You need to look at all the levers, think about the touch-points, think causally. Beyond core dashboards, data models, and experimentation tooling, the highest value-add thing you can do as a head of data is establish best practices for the exploratory work that takes up 90% of your team's time.
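To make "look at all the levers" concrete, here is a minimal sketch of the first pass an analyst might take when a topline metric moves: decompose the change into per-segment contributions and see which lever actually shifted. The segments and numbers below are invented for illustration, and the one-at-a-time decomposition is deliberately crude; it's the triage step before any real causal digging, not a method anyone in particular prescribes.

```python
# A minimal sketch of "why did the metric change?" triage: decompose a
# week-over-week change in a topline conversion rate into per-segment
# contributions. All segment names and numbers are made up.

last_week = {  # segment -> (users, conversions)
    "organic":  (10_000, 900),
    "paid":     (4_000, 200),
    "referral": (1_000, 120),
}
this_week = {
    "organic":  (10_500, 945),
    "paid":     (6_000, 300),   # big traffic jump, flat conversion rate
    "referral": (900, 110),
}

def rate(book):
    users = sum(u for u, _ in book.values())
    convs = sum(c for _, c in book.values())
    return convs / users

print(f"overall conversion moved {rate(this_week) - rate(last_week):+.2%}")

# Contribution of each segment: how much the overall rate would have moved
# if only that segment had changed (a crude one-at-a-time decomposition).
for segment in last_week:
    counterfactual = dict(last_week)
    counterfactual[segment] = this_week[segment]
    print(f"{segment:>9}: {rate(counterfactual) - rate(last_week):+.2%}")
```

Even this toy version surfaces the usual answer: in the made-up numbers above, the topline dropped because a low-converting segment (paid) grew, not because any segment's own rate got worse, which is exactly the kind of nuance a static dashboard won't volunteer.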
Example 3: you’re a sales-driven B2B organization
Your business follows the lead of a prolific sales org. You probably have an ML team driving lead scoring, because the efficiency of your sales team is critical. You don’t run many A/B tests, because you seldom have enough traffic (and far too much variability) for them to converge in any meaningful amount of time. Insights-based data science is also rarely a priority. You may have attempted to use data to find leaks in your sales process, but there’s so much heterogeneity in human-driven sales, and you have relatively little data to work with. If you’re small, you probably rely on your domain tools for analysis, looking at touch-points and conversion rates in Salesforce or wherever your marketing campaigns live, rather than in SQL.
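To see why those tests never converge, here is a rough back-of-the-envelope sample-size calculation using the standard two-sample normal approximation. Every input (baseline revenue, variance, lift, account volume) is an invented B2B-flavored number, not data from any real company.

```python
# Back-of-the-envelope: how many accounts per arm would an A/B test need?
# Standard normal-approximation sample-size formula for comparing two means;
# all of the inputs below are invented for illustration.
from scipy.stats import norm

alpha, power = 0.05, 0.80
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

baseline = 5_000        # avg. quarterly revenue per account ($), hypothetical
sd = 8_000              # per-account std. dev.: human-driven sales are noisy
lift = 0.05 * baseline  # the smallest effect we'd care about detecting

n_per_arm = 2 * (z * sd / lift) ** 2
print(f"~{n_per_arm:,.0f} accounts per arm")               # roughly 16,000

new_accounts_per_quarter = 300                             # B2B-scale traffic
quarters = 2 * n_per_arm / new_accounts_per_quarter
print(f"~{quarters:.0f} quarters to reach significance")   # roughly 107
```

A test that needs a hundred-plus quarters to finish is a test you don't run; lead scoring and pipeline hygiene pay off long before an experiment like this would.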
There is some value in blending sales data with marketing data, but any analysis that blends this pool of information requires a strong infrastructural backbone, so you probably rely heavily on ETL pipelines moving data from Salesforce, Hubspot, Marketo, etc. into a centralized data warehouse (standard connectors, so you can use a vendor). However, because large amounts of context live in the heads of stakeholders rather than data people, transferring it takes a lot more overhead than in other industries. Self-service and dashboards are therefore critical for minimizing context transfer — domain-specific teams and their embedded analysts are going to be the ones doing any custom data pulls.
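Once that backbone exists, the blended analysis itself can be simple. As a toy illustration of the kind of question that only works with sales and marketing data in one place, here is a sketch computing win rate by first-touch campaign; the table shapes, column names, and campaign labels are all hypothetical, not any vendor's actual schema.

```python
# Toy sketch: blend CRM opportunities with marketing touch data to get
# win rate by first-touch campaign. All tables and columns are invented;
# in practice these frames would come from warehouse tables your ETL
# vendor populates (e.g. loaded via pd.read_sql against the warehouse).
import pandas as pd

opportunities = pd.DataFrame({
    "account_id":    [1, 2, 3, 4, 5],
    "is_closed_won": [True, False, True, False, False],
})
touches = pd.DataFrame({
    "account_id": [1, 2, 3, 4, 5],
    "first_touch_campaign": ["webinar", "webinar", "cold_outbound",
                             "cold_outbound", "conference"],
})

blended = opportunities.merge(touches, on="account_id", how="left")
print(blended.groupby("first_touch_campaign")["is_closed_won"].mean())
```

The hard part isn't the ten lines of analysis; it's the pipelines, identity resolution, and context transfer that make those two frames line up on account_id in the first place.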
The moral
Don’t read too much into the specific details of the examples I’ve given. Looking at the broad brush strokes, I hope you agree with the core lesson: what is needed from analytics and what is possible with analytics are both highly industry-dependent. Certainly there are common best practices and data needs across all industries — we all need some core dashboards, some core data models, a tool for generating insights, systems to share knowledge. But the conditions within which analytics occurs are rarely so predictable. As much as we like to bash the all-too-common trashboard epidemic, maybe a swarm of dashboards is exactly what your organization needs.
Think about what your business needs and build your analytics organization accordingly. Hire people who can address those needs, build processes that are optimized for them, and then — and only then — choose tools that give your people and processes maximal leverage.
1. Let’s agree on this aspect of human nature, independent of what you might think about the MDS.
2. Cue: “hey, why is this query taking forever to run?” We still love them for trying, though. ❤️
I said "yes" aloud three times while reading this. I worked in healthcare for years (one other complicating factor there is interoperability, or lack thereof) and now I work in ecommerce. TOTALLY different set of problems on the immutable list.