Before I condemn the IDE, I want to say that I do look back fondly on my days writing in SQL IDEs. The process was spare and brutally technical in a way that nourished my obsession with Unix fundamentalism. While I was at Airbnb, I even wrote an open source library to turn my CLI setup into a proper SQL IDE, data discovery and all.
But it’s high time I admit something I’ve always known deep in my shell-script heart:
The IDE was not built for analytics.
Let’s talk about why.
Note: if you’re looking for my bias, I am a co-founder of Hyperquery, where we’re building a notebook for analytics. You’ll find that the thrust of this article pushes towards a notebook shape as the optimal solution. This is not meant to be a sales pitch but an argument (one that just happens to be the motivation behind what we’ve built). I’ll try to stay measured.
Why the IDE isn’t for analytics: analytics isn’t about development.
Well, the IDE is not for analytics, by definition. The IDE is an integrated development environment. It consolidates the needs of the development workflow.
But analytics is not about development. To the extent that we use a programming language at all, it is not to develop an application but to serve as a layer for accessing and manipulating data. Everything else that happens in analytics revolves around interpreting the resulting data payload, not hardening the code into a codebase.
Analytics is primarily about alignment, interpretation, communication — the non-SQL behaviors that enable us to establish an interface between data and impact. While our scripting chops open the door to a world of data inaccessible to the rest of the business, our subsequent behaviors unlock the value therein.
What the solution should look like: the notebook, where data and interpretation mix.
So what’s the solution?
We don’t need an integrated development environment, because analytics isn’t primarily about development. We need an integrated analytics environment that addresses the needs of analytics, not just SQL. It’s time we stopped co-opting an interface from another field when our needs are different.
We need a proper analytics notebook.
Why?
1. Notebooks fit the analytics workflow better.
While SQL IDEs push you towards consolidation (one final query), notebooks push you towards exploration. The latter is the preferred pattern for analytics: your queries are rarely ends in and of themselves. They always deserve, at minimum, a line or two of explanation and context. Notebooks are better for this.
2. Notebooks reinforce better behaviors.
It might seem that dumpster diving into your IDE is the fastest way to get going, but it’s seldom the best solution; it’s the fast food of analytics work. It may work in a pinch, but relying on it for the bulk of your work will only reinforce bad habits and degrade the quality of your work in the long term. Work should always be aligned before the technical work begins and interpreted after it ends.
3. Notebooks elevate data to knowledge, and that’s what we care about.
Notebooks represent knowledge, and knowledge (not data!) is the currency that analytics teams should peddle to the business. SQL queries deal in data. Orienting around the thing that matters aligns all ancillary processes to it more coherently. Knowledge should be organized, not data. Knowledge should be shared, not data.
Final comments
A few caveats:
There are certainly workflows where development is appropriate: building pipelines, data models, etc. But these fall within the realm of data engineering and analytics engineering. While these are often within the scope of analytics work, they are not analytics.
Some of you may be chanting “Jupyter” or its derivatives at this point, but I don’t think this is the optimal solution. It’s not built from first principles for analytics, meaning its shape will inevitably bear fundamental shortcomings and clumsy vestiges. But that’s another post for another time.
All that said, I’m certainly biased. We have a lot of sunk cost here. But I hope you find the underlying reasoning sound (and if not, let me know; there is precious little I care about more than having this line of reasoning challenged).
Notebook or not, an upheaval is overdue. Not everything is a nail. We deserve a tool purpose-built for analytics, not another re-purposed development tool.
Just gonna jump in and pile on the hot-takes here...
First off, I don’t know anything about hyperquery so I’m staying out of that part of it.
But analysis both is and isn’t development.
Analysts write code, and that code should be repeatable and correct. In other words, it is development. There is nothing fundamentally incompatible or suboptimal about using an IDE for this. Depending on the IDE, of course.
But analysts don’t ship code. Not in the same way. I mean, you could set up a CI system to generate and publish a report on merges, but that’s quite fanatical. I tried. It was short-lived. So we default to thinking of the notebook as the solution, because it’s code but it has plots and isn’t an IDE.
RStudio is an interesting case study here. It is unquestionably an IDE. I have shown it to colleagues in IT and they all agree it is an IDE. It has all the debugging features. The git integration. The testing framework integrations. But it is also definitely a tool for analysis. It displays plots. It produces reports. It publishes interactive visualizations. It can even help you write to Word if you need to.
So where does this leave us? A lot of IDEs aren’t good IDEs for analysts. DataGrip (or whatever other SQL client) isn’t, and afaik never attempted to be. Same with IntelliJ, even though someone has tried to duct-tape a “show plots” feature onto it. But there are IDEs for analysis. In addition to RStudio, there is DataSpell (which I haven’t used in a while, and which had a very “notebook with stuff glued on” feeling to it, if memory serves). Most seem to be Python-first, though. SQL is left behind.
I have left out any discussion of new-ish entrants like Hex (and Hyperquery) because I don’t know enough about them. But it feels like we are at a Henry Ford kind of moment: everyone is so used to notebooks that it is the only thing they can think to ask for. The next winner in this space will introduce something new that is neither a notebook nor IntelliJ. But they will have to teach people how to use it.