Just gonna jump in and pile on the hot-takes here...
First off, I don’t know anything about Hyperquery so I’m staying out of that part of it.
But analysis both is and isn’t development.
Analysts write code, and it should be repeatable and correct. In other words, it is development. There is nothing fundamentally incompatible or suboptimal about using an IDE for this. Depending on the IDE, of course.
But analysts don’t ship code. Not the same way. I mean, you could set up a CI system to generate and publish a report on merges, but that’s quite fanatical. I tried. It was short-lived. So we default to thinking of the “notebook” as the solution because it’s code but has plots and isn’t an IDE.
RStudio is an interesting case study here. It is unquestionably an IDE. I have shown it to colleagues in IT and they all agree it is an IDE. It has all the debugging features. The git integration. The testing framework integrations. But it is also definitely a tool for analysis. It displays plots. It produces reports. It publishes interactive visualizations. It can even help you write to Word if you need to.
So where does this leave us? A lot of IDEs aren’t good IDEs for analysts. DataGrip (or whatever other SQL client) isn’t, and afaik never attempted to be. Same with IntelliJ, even though someone has tried to duct-tape a “show plots” feature onto it. But there are IDEs for analysis. In addition to RStudio, there is DataSpell (which I haven’t used in a while, and which had a very “notebook with stuff glued on” feeling to it, if memory serves). Most seem to be Python-first though. SQL is left behind.
I have left out any discussion of new-ish entrants like Hex (and Hyperquery) because I don’t know enough about them. But it feels like we are at a Henry Ford kind of moment where everyone is so used to notebooks that it is the only thing they can think to ask for. The next winner in this space will introduce something new that is neither a notebook nor IntelliJ. But they will have to teach people how to use it.
Love this take, thanks Henning. Agree that there are aspects of it that are development (repeatability, correctness). I suppose my more measured take would've been: it's not development in the traditional sense... It's more akin to API testing -- there's code involved, but you really care about the inputs/outputs. So it warrants different tools (in the same way API work warrants a tool like Postman). That said, I think the lurking misconception is that the development part of analytics work is the most *important*, which is what I wanted to push back against by saying "it's not about development", but that's a post for another day. ;)
While I'm picking a fight here with the IDE, the Henry Ford idea is exactly what I had in mind - and most technical tools actually share this problem IMO, notebooks and R Markdown included. I love RStudio as a technical tool (R Markdown has always felt more ergonomic than Jupyter, e.g.), but to me it still falls short. It's probably the best tool we've got for the technical parts, but, like Jupyter, my qualm is that the rest of analytics work still feels off. I used to share notebooks and rendered R Markdown files all the time as a DS, but I couldn't shake the feeling that no one nontechnical ever read them...
We call Hyperquery a notebook, but it isn't quite the same, and it's our bet on making analytics work more impactful from the consumption end of things. I'm just hoping we're building a Model T, not some steam-powered monstrosity (https://journal.classiccars.com/2020/09/01/video-of-the-day-the-original-steam-car-cugnot-is-a-250-year-old-design/), but time will tell. 😬
Thanks! The API/Postman analogy is very good, and I’ll definitely steal it. Somehow it made me think of Test Driven Development, which would be incredibly nonsensical for analysis work. Analysis is more like QA-driven development, which I’m sure any backend developer would wince at. Of course, the developer and the QA tester can often be the same person, but still, conceptually these are different hats.
I love RStudio too, but now that I think of it, the one tool I have seen that tries to elevate SQL to an equal partner, instead of treating it as some necessary incantations just to get the data “out” of the database, is SAS. It’s got syntax highlighting, a native (albeit confused) SQL engine, and lets you juggle SQL and base SAS as you wish. Not that any of that is a recommendation, but it might serve as an inspiration.