Before I condemn the IDE, I want to say that I do look back fondly on my days writing in SQL IDEs. The process was spare and brutally technical in a way that nourished my obsession with Unix fundamentalism. While I was at Airbnb, I even wrote an open-source library that turned my CLI setup into a proper SQL IDE, data discovery and all.
But it's high time I admit something I've always known deep in my shell-script heart:
The IDE was not built for analytics.
Let's talk about why.
Note: if you're looking for my bias, I am a co-founder of Hyperquery, where we're building a notebook for analytics. You'll find that the thrust of this article pushes toward a notebook shape as the optimal solution. This is not meant to be a sales pitch but to share an argument (one that just happens to be the motivation behind what we've built). I'll try to stay measured.
Why the IDE isn't for analytics:
analytics isn't about development.
Well, the IDE is not for analytics, by definition. The IDE is an integrated development environment. It consolidates the needs of the development workflow.
But analytics is not about development. To the extent that we use a programming language at all, it is not to develop an application but as a layer for accessing and manipulating data. Everything else that happens in analytics revolves around interpreting the resulting data payload, not hardening the code into a codebase.
Analytics is primarily about alignment, interpretation, and communication: the non-SQL behaviors that establish an interface between data and impact. While our scripting chops open the door to a world of data inaccessible to the rest of the business, it's what we do afterward that unlocks the value therein.
What the solution should look like:
the notebook, where data and interpretation mix.
So what's the solution?
We don't need an integrated development environment, because analytics isn't primarily about development. We need an integrated analytics environment that addresses the needs of analytics, not just SQL. It's time we stopped co-opting an interface from another field when our needs are different.
We need a proper analytics notebook.
Why?
1. Notebooks fit the analytics workflow better.
While SQL IDEs push you toward consolidation (one final query), notebooks push you toward exploration. And the latter is the preferred pattern for analytics: your queries are rarely ends in and of themselves. They always deserve at least a line or two of explanation and context. Notebooks are better for this.
2. Notebooks reinforce better behaviors.
It might seem that dumpster diving into your IDE is the fastest way to get going, but it's seldom the best solution; it's the fast food of analytics work. It may work in a pinch, but relying on it for the bulk of your work will only reinforce bad habits and degrade the quality of your work in the long term. Technical work should always be bracketed by alignment before it and interpretation after it.
3. Notebooks elevate data to knowledge, and that's what we care about.
Notebooks represent knowledge, and knowledge (not data!) is the currency that analytics teams should peddle to the business. SQL queries deal in data. Orienting around the thing that matters aligns every ancillary process to it more coherently. Knowledge should be organized, not data. Knowledge should be shared, not data.
Final comments
A few caveats:
There are certainly workflows where development is appropriate: building pipelines, data models, etc. But these fall within the realm of data engineering and analytics engineering. While these are often within the scope of analytics work, they are not analytics.
Some of you may be chanting "Jupyter" or its derivatives at this point, but I don't think it is the optimal solution. It's not built from first principles for analytics, meaning its shape will inevitably bear fundamental shortcomings and clumsy vestiges. But that's another post for another time.
All that said, I'm certainly biased. We have a lot of sunk cost here. But I hope you find the reasoning sound (and if not, let me know: there is precious little I care about more than challenging this line of reasoning).
Notebook or not, an upheaval is overdue. Not everything is a nail. We deserve a tool purpose-built for analytics, not another repurposed development tool.
Just gonna jump in and pile on the hot takes here...
First off, I don't know anything about Hyperquery, so I'm staying out of that part of it.
But analysis both is and isn't development.
Analysts write code, and that code should be repeatable and correct. In other words, it is development. There is nothing fundamentally incompatible or suboptimal about using an IDE for this, depending on the IDE, of course.
But analysts don't ship code. Not in the same way. I mean, you could set up a CI system to generate and publish a report on merges, but that's rather fanatical. I tried. It was short-lived. So we default to thinking "notebook" is the solution, because it's code but has plots and isn't an IDE.
RStudio is an interesting case study here. It is unquestionably an IDE. I have shown it to colleagues in IT, and they all agree it is an IDE. It has all the debugging features. The git integration. The testing framework integrations. But it is also definitely a tool for analysis. It displays plots. It produces reports. It publishes interactive visualizations. It can even help you write to Word if you need to.
So where does this leave us? A lot of IDEs aren't good IDEs for analysts. DataGrip (or any other SQL client) isn't, and afaik never attempted to be. Same with IntelliJ, even though someone has tried to duct-tape a "show plots" feature onto it. But there are IDEs for analysis. In addition to RStudio, there is DataSpell (which I haven't used in a while, and which had a very "notebook with stuff glued on" feel to it, if memory serves). Most seem to be Python-first, though. SQL is left behind.
I have left out any discussion of newish entrants like Hex (and Hyperquery) because I don't know enough about them. But it feels like we are at a Henry Ford kind of moment: everyone is so used to notebooks that they are the only thing people can think to ask for. The next winner in this space will introduce something new that is neither a notebook nor IntelliJ. But they will have to teach people how to use it.