Hello! I'm Robert, CPO of Hyperquery and former data scientist + analyst. Welcome to Win With Data, where we talk weekly about maximizing the impact of data. As always, find me on LinkedIn or Twitter; I'm always happy to chat. And if you enjoyed this post, I'd appreciate a follow/like/share.
Our culture loves speed. We praise kids who go to college early, we boast about the number of books we read in a year, we seek out new productivity tools to eke more out of our days, we even indulge ourselves at 2x.
And of course, when it comes to data, we let this ethos creep into our way of working. We pride ourselves on our ability to Minority Report the shit out of our ad hoc requests. But in much the same way, we are only hastening our own demise. Let's talk about why we should be slower.
The case for being slower, in general.
To start, I want to make a wider cultural point. The examples I gave at the start of this article are all traps: while there's certainly something to be admired in raw speed, we are quick to over-value it. Speed can be awe-inspiring, but there are two chief problems with it:
Speed itself is rarely the objective. Optimizing for it is often just another example of confusing correlation with causation.
It's also fiendishly alluring. It's a vanity metric, and as with all vanity metrics, it's tempting to optimize against it rather than against the primary objective.
Going to college early does not make you more prodigious. Reading books more quickly does not mean you've learned more. And fitting more into your day is a trap: fastidious prioritization means you get more useless things done, whereas ruthless prioritization means you get the right things done. But, of course, all of these things look good on paper, and that's the trap. We overvalue speed so heavily that we lose sight of the actual objective, at once ignorant of speed's causal impotence and enamored with the cultural praise it attracts.
We'll get into data in a second, but a final point I want to make here is this: you'd do well to consider what you actually care about in any human endeavor, because rarely is it speed. We're rarely limited by speed. We are, on the other hand, frequently limited by our capacity for attention. You can finish a book without paying attention to it. But attention is how you gain knowledge from the book. You can watch a show, idly letting the colors glaze over your pupils. But attention is how you really enjoy the experience, letting entire worlds wash over you. You can check off a long list of tasks you had to do for the day, but have you truly done what is important to you? Attention is what you seek.
The case for being slower, in data.
Alright, let's bring it back down to earth. While I certainly believe there's value in moving slower and more intentionally in general, I'd wager it's paramount in data work. Let's talk about three reasons in particular:
Data work is error-prone.
Fast data work reinforces your role as a data retrieval interface.
Speed makes us miss the point.
Data work is error-prone
Data work is highly error-prone. I've been trying to understand why, and I suspect it's because, unlike software development, data work is usually one-off. You build logic that performs some sort of analysis, then you act on that logic. We build libraries and data models to circumvent this, of course, but the set of code you'll ever re-use is much smaller than in software.
As a result, every analysis we construct adds a greater surface area of novel code, leaving us more exposed to errors. This, plus the fact that we generally don't have rigorous testing procedures for data work, means it's easy to find ourselves in a tightrope walk over our stakeholders' trust.
One of the most harrowing things that'll happen in an IC's data career is having to recant their findings. I was fortunate enough to experience this quite early on, and in a forgiving environment: I excitedly brought a plot to my graduate school advisor, and after one glance, he told me definitively, "You made a mistake." The plot looked too smooth; years of experience had etched into his brain the ability to spot that without even looking at my source code. I was deflated, of course, but he was right. And I learned my lesson: always triple-check data work, because it's easy to make mistakes, and trust is built in drops but lost in buckets.
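Concretely, a "triple-check" can be as lightweight as a few assertions run over a result before anyone sees the plot. Here's a minimal sketch in Python with pandas; the frame and column names (user_id, revenue, event_date) are hypothetical, not from the analysis in the anecdote.

```python
# A minimal sketch of cheap sanity checks over an analysis result.
# The column names here are hypothetical.
import pandas as pd

def sanity_check(result: pd.DataFrame) -> pd.DataFrame:
    # An empty frame usually means a filter or join silently dropped everything.
    assert len(result) > 0, "empty result: check filters and join keys"
    # Duplicated users usually mean a join is fanning out and inflating counts.
    assert not result["user_id"].duplicated().any(), "duplicate user_id: join fan-out?"
    # Impossible values (negative revenue, future dates) are the classic silent errors.
    assert (result["revenue"] >= 0).all(), "negative revenue: sign or refund handling?"
    assert result["event_date"].max() <= pd.Timestamp.today(), "future dates: timezone or parse error?"
    return result
```

Checks like these cost seconds to write, and failing loudly at this stage is far cheaper than recanting a finding later.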
Fast data work reinforces your role as a data retrieval interface
I've written about this before, so I won't harp on it again. But my second point is this: if you operate as a speedy data retrieval interface, that is how stakeholders will perceive you. Your non-technical capabilities (your interpretive skills, your business acumen, your intellectual honesty) will fall by the wayside.
Speed makes us miss the point
Finally, speed makes us miss the point. An overzealous focus on speed makes us lose sight of the objective of our work, and there always is an underlying strategic objective. Without asking why before doing data work, you're not only curtailing your impact, but also dooming yourself to the Sisyphean task of answering new, headless questions over and over and over. We have a strong tendency to dumpster dive, but in our celerity, we forget that data work isn't actually about data. It's about the things that happen after.
Final comments
I've found myself working as an analyst lately: I set up our data pipelines at Hyperquery, and so I've become the de facto person to answer questions like "Can you plot account growth for our customers, aligned by first sign-up?" and "Can you build a cohort analysis, but broken out by their Typeform responses?" And for once, I'm intimately aware of the objective: we need to give investor updates, we need to fundraise, we need to understand our market.
And with that knowledge, I've been giving what feel like painfully long estimates to my cofounder. "How long will it take you to do this analysis?" Where I'd have said "15 minutes" before, I'll now estimate "a couple of hours." As you might expect, the generous time allotment has given me the bandwidth to be generally more careful. More specifically, it's given me the bandwidth to run checks against all my assumptions, document them, and come up with answers I can have far more confidence in. All in all, it's given me space to construct a more deliberate workflow.
There's a bit of process change that's been quite helpful for me, so I thought I'd quickly share what it looks like: I've been using Hyperquery header toggles to document and hide all these extraneous checks. An example of this from a recent analysis I built is shown below. In the past, I'd discard my checks as soon as I ran them, but by establishing a solid workflow and storage mechanism for them, I've grown more explicit about my assumptions, and this has helped tremendously in reducing my error rate.
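Independent of Hyperquery's header toggles, here's a minimal sketch of what keeping assumption checks next to the analysis they protect might look like in code, using a cohort analysis aligned by first sign-up as the example. The frame and column names (user_id, signup_date, active_date) are hypothetical.

```python
# A minimal sketch of documenting assumption checks alongside the analysis,
# rather than discarding them once they pass. Column names are hypothetical.
import pandas as pd

def cohort_table(activity: pd.DataFrame) -> pd.DataFrame:
    """Distinct active users per signup cohort, aligned by months since first sign-up."""
    # --- Assumption checks, kept with the analysis instead of thrown away ---
    # Each user should have exactly one signup date; otherwise cohorts double-count.
    assert (activity.groupby("user_id")["signup_date"].nunique() == 1).all(), \
        "conflicting signup dates per user"
    # Activity should never precede signup; otherwise alignment by first sign-up is wrong.
    assert (activity["active_date"] >= activity["signup_date"]).all(), \
        "activity before signup: check the join or the event timestamps"

    # --- The analysis itself ---
    cohort = activity["signup_date"].dt.to_period("M")
    offset = (activity["active_date"].dt.to_period("M") - cohort).map(lambda d: d.n)
    return (
        activity.assign(cohort=cohort, months_since_signup=offset)
        .groupby(["cohort", "months_since_signup"])["user_id"]
        .nunique()
        .unstack(fill_value=0)
    )
```

In a notebook or doc, a check block like this can live under a collapsed header so stakeholders see only the table, while the assumptions stay on the record.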
In any case, this is all to say that a modicum of process can help you stay disciplined about working more deliberately. But independent of that idea, I hope at this point we can all agree that Minority Report is not the goal, however alluring the aesthetic.
We also see systems engineering and product design better when we slow down. Connections are easier to make when we slow down, and that speeds up our understanding of the underlying reasons and potential solutions, especially when there's a roadmap that depends on them. A focus on speed can put us on the path of building something without laying the foundation.
"This, plus the fact that we generally donāt have rigorous testing procedures for data work, means itās easy to find ourselves in a tightrope walk over our stakeholdersā trust."
I can't take away a single word from this passage. It's so beautifully crafted.
I love how you mentioned the lack of rigorous testing procedures in data work. It's night and day compared to software work.
The worst-case scenario when a software engineer makes a mistake (at a typical tech company these days) is "Oops, I didn't pass test coverage. Well, I can just delete this branch, re-clone, and start over."
In data work, the worst-case scenario is a stakeholder coming to you and saying, "Hey, I think you used the wrong calculation for our revenue number. It confused all of us during the meeting. Have you double-checked it?"
Incidentally, this is also the reason I bet it's quite hard for AI to eat analytics work.