Published: Sat 02 March 2024
By Bo Johnson
In Articles .
tags: python julia
The early days
I have to make a confession at the start here. I'm like a squirrel in the middle
of a field full of nuts when it comes to technology: I run around trying all
the latest things and never stick around on one piece of tech for long. This
not only include gadgets like phones or computers, but also operating systems
and programming languages. I think this stems from my earliest days learning to
program and doing research in undergrad. When you're first introduced to Linux,
which a lot of physicists tend to use because it's free, you're immediately
immersed in a world that offers tons of possibilities and choices. One day you can be on
Linux Mint and then the next Ubuntu, followed by Fedora or Arch. Not to mention
all the flavors and desktops you can try as well. Then there are the programming
language options. I learned C++ because of the intro computer science courses but
was also asked to learn Fortran with the first professor I worked with. Then
it's easy to see how much easier some tasks can be done in Matlab or Python or
Java or, or, or… And my personality is to want to learn all the things. So while
I spent time learning multiples of these I never got that great in any. I'd say
I was the most exposed and experienced with C++ because I took so many courses that taught it, but
for research and personal stuff I would hop around and try different things.
So when I started grad school I thought I would try to stick to whatever people
around me were using. In particle physics this tends to be C++ because ROOT.
I've certainly learned how to use that and feel comfortable in manipulating
ROOT files and creating ROOT scripts and plotting in ROOT. But honestly it feels
like you need to have mild psychosis to want to do serious analysis in ROOT, and
by extension, C++. It's just that it takes so much coding in order to accomplish
any small take such as plotting data. What would take two or three lines in Python
script requires multiples of that in C++. It's really painful. It's not elegant,
either. You really have to contort your thinking to the mold of C++ for the analysis
rather than the language you're using being molded to your analysis. Which is why
for a while I also tried to do things in Python. It's really nice to program in
Python because it's higher-level constructs like list comprehensions, generators,
and functional programming tools make it possible to shape the code to how you
envision the analysis unfolding.
The matchup
The issue with Python though is it's speed. Now any programming language is fast
by any means and for really any normal task I come across. (Admittedly, I'm not
a programmer by trade and so normal tasks for me are analyzing data, plotting,
fitting data, etc.). But when it comes to analyzing gigabytes of data that contain
hundreds of thousands of simulated physics interactions, for example, Python tends
to grind to a halt. So it seemed, when I started grad school and doing research,
the only options were Python for the simple things like plotting, but having to
use C++ for serious lifting.
Enter Julia. I learned a bit about Julia over a summer intership during undergrad.
I spent a lot of time learning it's ins and outs and think it's got some really
neat features. I was drawn to its mantra "Walks like Python, runs like C" because
that was the problem I had hoped to solve with Python and ROOT. When I was
running up against the wall with analysis, not wanting to use ROOT and waiting for
Python to finish running, I started accomplishing my work with Julia. I've even
contributed to a Julia package I used quite often. Once I started using Julia for
more things, I started to get hung up on little things here and there. The one
issue you have to deal with whenever you run code is waiting for it to compile. So
if you're wanting to do some plots, you have to wait for tens of seconds for it to
compile first. I know that issue has been one of the most pressing for the Julia
community and a lot of work has gone into bringing the so-called time to first plot
down, but it still isn't as nice as using Python for simple things there. Another
thing I'd run up against was the lack of mature packages. Julia is a lot less
established than Python and so looking for packages to do something like curve
fitting isn't as straight forward as using SciPy. The package universe in Julia
is more disparate and less well maintained, for the things I was hoping to get done.
It's really nice that the usual libraries in Python for scientific research have
been around a long time and are really well maintainted and tested, like NumPy,
SciPy, Matplotlib, etc. In Julia there's no standard for curve fitting or integration
or even plotting. So it's easy to look at any package askanse in Julia because you
don't know how battle-tested it is.
The result
With all these little things starting to add up, I decided to once again give Python
a try. That's because I (re)learned the benefits of using Numba
in conjunction wit Python. It's kind of like Julia in that you can compile Python
code, but it's unlike Julia in that you only compile what you really need to speed up.
So I can code and do analysis with nice-looking, easy to understand code wherever I
want, but when I need that extra speed boost, I put on my C++-style coding hat and
write some Python code that Numba is able to compile. Now things run as fast as I need
and there's no more bottleneck in Python. I get to use the nice Python packages that
have been around for a long time and know there's more helpful information out there
on the internet with any question I might have, too.
So right now, I'm trying to do the responsible thing and stick to a single programming
language for physics analysis. Python's actually been quite easy to pick up again and
I'm hoping my squirrel brain won't be too perturbed by all the shiny-looking languages
out there (looking at you Rust ).