Using Python (Efficiently)

Python is slow. I can't deny it. But Python, for better or for worse, has taken over the scientific programming ecosystem meaning there are a large set of packages that are open source, professionally maintained, and perform the common tasks that scientists want to do.

One aspect of Python that has enabled its widespread adoption (I think) is the relative ease of writing packages in another programming language. This means that most of our analysis code will not be using "pure Python", but instead be using compiled languages (C++ mostly) hidden behind some convenience wrappers. This is the strategy of many popular Python packages numpy, scipy, and the HEP-specific hist and awkward.

Biggest Performance Tip

My biggest tip for any new Python analyzers is to avoid writing the for loop. As mentioned, Python itself is slow, so only write for if you know you are only looping over a small number of things (e.g. looping over the five different plots you want to make). We can avoid for by using these packages with compiled languages under-the-hood. The syntax of writing these "vectorized" operations is often complicated to grasp, but it is concise, precise, and performant.

With those introductory ramblings out of the way, let's get started.

Set Up

The packages we use (and the python ecosystem in general) evolves relatively fast. For this reason, it will be helpful to use a newer python version such that you can access the newer versions of the packages. Personally, I use denv to pick new python without needing to compile it and then have my installed packages for that python be isolated to within my working directory.

While you can install the packages I'll be using directly, they (and a few other helpful ones) are availabe withing the scikit-hep metapackage allowing a quick and simple installation command. In addition, Jupyter Lab offers an excellent way to see plots while your constructing them and recieve other feedback from your code in an immediate way (via a REPL for those interested).


cd work-area
denv init python:3.11
denv pip install scikit-hep jupyterlab

In addition, you will need some data to play with. I generated the input events.root file using an ldmx-sw simulation, but I am using common event objects that exist in most (if not all) event files, so this example should work with others.

Generate Event File with ldmx-sw

Again, I used denv to pin the ldmx-sw version and generate the data file.


denv init ldmx/pro:v3.3.6 eg-gen
cd eg-gen
denv fire config.py
mv events.root path/to/work-area/

where config.py is the following python configuration script.


from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('test')
from LDMX.SimCore import simulator as sim
mySim = sim.simulator( "mySim" )
mySim.setDetector( 'ldmx-det-v14-8gev', True )
from LDMX.SimCore import generators as gen
mySim.generators = [ gen.single_8gev_e_upstream_tagger() ]
mySim.beamSpotSmear = [20.,80.,0.]
mySim.description = 'Basic test Simulation'
p.sequence = [ mySim ]
p.run = 1
p.maxEvents = 10_000
p.outputFiles = [ 'events.root' ]

import LDMX.Ecal.EcalGeometry
import LDMX.Ecal.ecal_hardcoded_conditions
import LDMX.Hcal.HcalGeometry
import LDMX.Ecal.digi as ecal_digi

p.sequence = [
    mySim,
    ecal_digi.EcalDigiProducer(),
    ecal_digi.EcalRecProducer()
]

Launch a jupyter lab with


denv jupyter lab

open the link in your browser and create a new notebook to get started.

Notebooks consist of "cells" which can contain "markdown" (a way to write write formatted text), code that will be executed, and raw text. The rest of these sections are the the result of exporting a jupyter notebook (a *.ipynb file) into "markdown" and then including it in this documentation here.


# load the python modules we will use
import uproot # for data loading
import awkward as ak # for data manipulation
import hist # for histogram filling (and some plotting)

import matplotlib as mpl # for plotting
import matplotlib.pyplot as plt # common shorthand
import mplhep # style of plots
%matplotlib inline
mpl.style.use(mplhep.style.ROOT) # set the plot style

Load Data

The first step to any analysis is loading the data. For this step in this analysis workflow, we are going to use the uproot package to load the data into awkward arrays in memory. Some helpful links to find more details on this


%%time
# while the uproot file is open, conver the 'LDMX_Events' tree into in-memory arrays
with uproot.open('events.root') as f:
    events = f['LDMX_Events'].arrays()
# show the events array
events


CPU times: user 3.63 s, sys: 220 ms, total: 3.85 s
Wall time: 3.85 s

Printout of `events` array

[{'SimParticles_test.first': [1, 2, 3, ..., 48, 49], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 61, 62], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 11, 12], ...},
 {'SimParticles_test.first': [1, 2, ..., 15, 1199], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 8, 9], ...},
 {'SimParticles_test.first': [1, 2, ..., 118, 119], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 10, 11], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 10, 11], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 7, 61], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 10, 11], ...},
 ...,
 {'SimParticles_test.first': [1, 2, 3, ..., 9, 10], ...},
 {'SimParticles_test.first': [1, 2, ..., 126, 127], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 9, 10], ...},
 {'SimParticles_test.first': [1, 2, ..., 85, 4702], ...},
 {'SimParticles_test.first': [1, 2, ..., 72, 6976], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 8, 93], ...},
 {'SimParticles_test.first': [1, 2, ..., 7120], ...},
 {'SimParticles_test.first': [1, 2, ..., 7921], ...},
 {'SimParticles_test.first': [1, 2, 3, ..., 8, 9], ...}]
---------------------------------------------------------------------
type: 10000 * {
    "SimParticles_test.first": var * int32,
    "SimParticles_test.second.energy_": var * float64,
    "SimParticles_test.second.pdgID_": var * int32,
    "SimParticles_test.second.genStatus_": var * int32,
    "SimParticles_test.second.time_": var * float64,
    "SimParticles_test.second.x_": var * float64,
    "SimParticles_test.second.y_": var * float64,
    "SimParticles_test.second.z_": var * float64,
    "SimParticles_test.second.endX_": var * float64,
    "SimParticles_test.second.endY_": var * float64,
    "SimParticles_test.second.endZ_": var * float64,
    "SimParticles_test.second.px_": var * float64,
    "SimParticles_test.second.py_": var * float64,
    "SimParticles_test.second.pz_": var * float64,
    "SimParticles_test.second.endpx_": var * float64,
    "SimParticles_test.second.endpy_": var * float64,
    "SimParticles_test.second.endpz_": var * float64,
    "SimParticles_test.second.mass_": var * float64,
    "SimParticles_test.second.charge_": var * float64,
    "SimParticles_test.second.daughters_": var * var * int32,
    "SimParticles_test.second.parents_": var * var * int32,
    "SimParticles_test.second.processType_": var * int32,
    "SimParticles_test.second.vertexVolume_": var * string,
    "TaggerSimHits_test.id_": var * int32,
    "TaggerSimHits_test.layerID_": var * int32,
    "TaggerSimHits_test.moduleID_": var * int32,
    "TaggerSimHits_test.edep_": var * float32,
    "TaggerSimHits_test.time_": var * float32,
    "TaggerSimHits_test.px_": var * float32,
    "TaggerSimHits_test.py_": var * float32,
    "TaggerSimHits_test.pz_": var * float32,
    "TaggerSimHits_test.energy_": var * float32,
    "TaggerSimHits_test.x_": var * float32,
    "TaggerSimHits_test.y_": var * float32,
    "TaggerSimHits_test.z_": var * float32,
    "TaggerSimHits_test.pathLength_": var * float32,
    "TaggerSimHits_test.trackID_": var * int32,
    "TaggerSimHits_test.pdgID_": var * int32,
    "RecoilSimHits_test.id_": var * int32,
    "RecoilSimHits_test.layerID_": var * int32,
    "RecoilSimHits_test.moduleID_": var * int32,
    "RecoilSimHits_test.edep_": var * float32,
    "RecoilSimHits_test.time_": var * float32,
    "RecoilSimHits_test.px_": var * float32,
    "RecoilSimHits_test.py_": var * float32,
    "RecoilSimHits_test.pz_": var * float32,
    "RecoilSimHits_test.energy_": var * float32,
    "RecoilSimHits_test.x_": var * float32,
    "RecoilSimHits_test.y_": var * float32,
    "RecoilSimHits_test.z_": var * float32,
    "RecoilSimHits_test.pathLength_": var * float32,
    "RecoilSimHits_test.trackID_": var * int32,
    "RecoilSimHits_test.pdgID_": var * int32,
    "HcalSimHits_test.id_": var * int32,
    "HcalSimHits_test.edep_": var * float32,
    "HcalSimHits_test.x_": var * float32,
    "HcalSimHits_test.y_": var * float32,
    "HcalSimHits_test.z_": var * float32,
    "HcalSimHits_test.time_": var * float32,
    "HcalSimHits_test.trackIDContribs_": var * var * int32,
    "HcalSimHits_test.incidentIDContribs_": var * var * int32,
    "HcalSimHits_test.pdgCodeContribs_": var * var * int32,
    "HcalSimHits_test.edepContribs_": var * var * float32,
    "HcalSimHits_test.timeContribs_": var * var * float32,
    "HcalSimHits_test.nContribs_": var * uint32,
    "HcalSimHits_test.pathLength_": var * float32,
    "HcalSimHits_test.preStepX_": var * float32,
    "HcalSimHits_test.preStepY_": var * float32,
    "HcalSimHits_test.preStepZ_": var * float32,
    "HcalSimHits_test.preStepTime_": var * float32,
    "HcalSimHits_test.postStepX_": var * float32,
    "HcalSimHits_test.postStepY_": var * float32,
    "HcalSimHits_test.postStepZ_": var * float32,
    "HcalSimHits_test.postStepTime_": var * float32,
    "HcalSimHits_test.velocity_": var * float32,
    "EcalSimHits_test.id_": var * int32,
    "EcalSimHits_test.edep_": var * float32,
    "EcalSimHits_test.x_": var * float32,
    "EcalSimHits_test.y_": var * float32,
    "EcalSimHits_test.z_": var * float32,
    "EcalSimHits_test.time_": var * float32,
    "EcalSimHits_test.trackIDContribs_": var * var * int32,
    "EcalSimHits_test.incidentIDContribs_": var * var * int32,
    "EcalSimHits_test.pdgCodeContribs_": var * var * int32,
    "EcalSimHits_test.edepContribs_": var * var * float32,
    "EcalSimHits_test.timeContribs_": var * var * float32,
    "EcalSimHits_test.nContribs_": var * uint32,
    "EcalSimHits_test.pathLength_": var * float32,
    "EcalSimHits_test.preStepX_": var * float32,
    "EcalSimHits_test.preStepY_": var * float32,
    "EcalSimHits_test.preStepZ_": var * float32,
    "EcalSimHits_test.preStepTime_": var * float32,
    "EcalSimHits_test.postStepX_": var * float32,
    "EcalSimHits_test.postStepY_": var * float32,
    "EcalSimHits_test.postStepZ_": var * float32,
    "EcalSimHits_test.postStepTime_": var * float32,
    "EcalSimHits_test.velocity_": var * float32,
    "TargetSimHits_test.id_": var * int32,
    "TargetSimHits_test.edep_": var * float32,
    "TargetSimHits_test.x_": var * float32,
    "TargetSimHits_test.y_": var * float32,
    "TargetSimHits_test.z_": var * float32,
    "TargetSimHits_test.time_": var * float32,
    "TargetSimHits_test.trackIDContribs_": var * var * int32,
    "TargetSimHits_test.incidentIDContribs_": var * var * int32,
    "TargetSimHits_test.pdgCodeContribs_": var * var * int32,
    "TargetSimHits_test.edepContribs_": var * var * float32,
    "TargetSimHits_test.timeContribs_": var * var * float32,
    "TargetSimHits_test.nContribs_": var * uint32,
    "TargetSimHits_test.pathLength_": var * float32,
    "TargetSimHits_test.preStepX_": var * float32,
    "TargetSimHits_test.preStepY_": var * float32,
    "TargetSimHits_test.preStepZ_": var * float32,
    "TargetSimHits_test.preStepTime_": var * float32,
    "TargetSimHits_test.postStepX_": var * float32,
    "TargetSimHits_test.postStepY_": var * float32,
    "TargetSimHits_test.postStepZ_": var * float32,
    "TargetSimHits_test.postStepTime_": var * float32,
    "TargetSimHits_test.velocity_": var * float32,
    "TriggerPad1SimHits_test.id_": var * int32,
    "TriggerPad1SimHits_test.edep_": var * float32,
    "TriggerPad1SimHits_test.x_": var * float32,
    "TriggerPad1SimHits_test.y_": var * float32,
    "TriggerPad1SimHits_test.z_": var * float32,
    "TriggerPad1SimHits_test.time_": var * float32,
    "TriggerPad1SimHits_test.trackIDContribs_": var * var * int32,
    "TriggerPad1SimHits_test.incidentIDContribs_": var * var * int32,
    "TriggerPad1SimHits_test.pdgCodeContribs_": var * var * int32,
    "TriggerPad1SimHits_test.edepContribs_": var * var * float32,
    "TriggerPad1SimHits_test.timeContribs_": var * var * float32,
    "TriggerPad1SimHits_test.nContribs_": var * uint32,
    "TriggerPad1SimHits_test.pathLength_": var * float32,
    "TriggerPad1SimHits_test.preStepX_": var * float32,
    "TriggerPad1SimHits_test.preStepY_": var * float32,
    "TriggerPad1SimHits_test.preStepZ_": var * float32,
    "TriggerPad1SimHits_test.preStepTime_": var * float32,
    "TriggerPad1SimHits_test.postStepX_": var * float32,
    "TriggerPad1SimHits_test.postStepY_": var * float32,
    "TriggerPad1SimHits_test.postStepZ_": var * float32,
    "TriggerPad1SimHits_test.postStepTime_": var * float32,
    "TriggerPad1SimHits_test.velocity_": var * float32,
    "TriggerPad2SimHits_test.id_": var * int32,
    "TriggerPad2SimHits_test.edep_": var * float32,
    "TriggerPad2SimHits_test.x_": var * float32,
    "TriggerPad2SimHits_test.y_": var * float32,
    "TriggerPad2SimHits_test.z_": var * float32,
    "TriggerPad2SimHits_test.time_": var * float32,
    "TriggerPad2SimHits_test.trackIDContribs_": var * var * int32,
    "TriggerPad2SimHits_test.incidentIDContribs_": var * var * int32,
    "TriggerPad2SimHits_test.pdgCodeContribs_": var * var * int32,
    "TriggerPad2SimHits_test.edepContribs_": var * var * float32,
    "TriggerPad2SimHits_test.timeContribs_": var * var * float32,
    "TriggerPad2SimHits_test.nContribs_": var * uint32,
    "TriggerPad2SimHits_test.pathLength_": var * float32,
    "TriggerPad2SimHits_test.preStepX_": var * float32,
    "TriggerPad2SimHits_test.preStepY_": var * float32,
    "TriggerPad2SimHits_test.preStepZ_": var * float32,
    "TriggerPad2SimHits_test.preStepTime_": var * float32,
    "TriggerPad2SimHits_test.postStepX_": var * float32,
    "TriggerPad2SimHits_test.postStepY_": var * float32,
    "TriggerPad2SimHits_test.postStepZ_": var * float32,
    "TriggerPad2SimHits_test.postStepTime_": var * float32,
    "TriggerPad2SimHits_test.velocity_": var * float32,
    "TriggerPad3SimHits_test.id_": var * int32,
    "TriggerPad3SimHits_test.edep_": var * float32,
    "TriggerPad3SimHits_test.x_": var * float32,
    "TriggerPad3SimHits_test.y_": var * float32,
    "TriggerPad3SimHits_test.z_": var * float32,
    "TriggerPad3SimHits_test.time_": var * float32,
    "TriggerPad3SimHits_test.trackIDContribs_": var * var * int32,
    "TriggerPad3SimHits_test.incidentIDContribs_": var * var * int32,
    "TriggerPad3SimHits_test.pdgCodeContribs_": var * var * int32,
    "TriggerPad3SimHits_test.edepContribs_": var * var * float32,
    "TriggerPad3SimHits_test.timeContribs_": var * var * float32,
    "TriggerPad3SimHits_test.nContribs_": var * uint32,
    "TriggerPad3SimHits_test.pathLength_": var * float32,
    "TriggerPad3SimHits_test.preStepX_": var * float32,
    "TriggerPad3SimHits_test.preStepY_": var * float32,
    "TriggerPad3SimHits_test.preStepZ_": var * float32,
    "TriggerPad3SimHits_test.preStepTime_": var * float32,
    "TriggerPad3SimHits_test.postStepX_": var * float32,
    "TriggerPad3SimHits_test.postStepY_": var * float32,
    "TriggerPad3SimHits_test.postStepZ_": var * float32,
    "TriggerPad3SimHits_test.postStepTime_": var * float32,
    "TriggerPad3SimHits_test.velocity_": var * float32,
    "EcalScoringPlaneHits_test.id_": var * int32,
    "EcalScoringPlaneHits_test.layerID_": var * int32,
    "EcalScoringPlaneHits_test.moduleID_": var * int32,
    "EcalScoringPlaneHits_test.edep_": var * float32,
    "EcalScoringPlaneHits_test.time_": var * float32,
    "EcalScoringPlaneHits_test.px_": var * float32,
    "EcalScoringPlaneHits_test.py_": var * float32,
    "EcalScoringPlaneHits_test.pz_": var * float32,
    "EcalScoringPlaneHits_test.energy_": var * float32,
    "EcalScoringPlaneHits_test.x_": var * float32,
    "EcalScoringPlaneHits_test.y_": var * float32,
    "EcalScoringPlaneHits_test.z_": var * float32,
    "EcalScoringPlaneHits_test.pathLength_": var * float32,
    "EcalScoringPlaneHits_test.trackID_": var * int32,
    "EcalScoringPlaneHits_test.pdgID_": var * int32,
    "HcalScoringPlaneHits_test.id_": var * int32,
    "HcalScoringPlaneHits_test.layerID_": var * int32,
    "HcalScoringPlaneHits_test.moduleID_": var * int32,
    "HcalScoringPlaneHits_test.edep_": var * float32,
    "HcalScoringPlaneHits_test.time_": var * float32,
    "HcalScoringPlaneHits_test.px_": var * float32,
    "HcalScoringPlaneHits_test.py_": var * float32,
    "HcalScoringPlaneHits_test.pz_": var * float32,
    "HcalScoringPlaneHits_test.energy_": var * float32,
    "HcalScoringPlaneHits_test.x_": var * float32,
    "HcalScoringPlaneHits_test.y_": var * float32,
    "HcalScoringPlaneHits_test.z_": var * float32,
    "HcalScoringPlaneHits_test.pathLength_": var * float32,
    "HcalScoringPlaneHits_test.trackID_": var * int32,
    "HcalScoringPlaneHits_test.pdgID_": var * int32,
    "TargetScoringPlaneHits_test.id_": var * int32,
    "TargetScoringPlaneHits_test.layerID_": var * int32,
    "TargetScoringPlaneHits_test.moduleID_": var * int32,
    "TargetScoringPlaneHits_test.edep_": var * float32,
    "TargetScoringPlaneHits_test.time_": var * float32,
    "TargetScoringPlaneHits_test.px_": var * float32,
    "TargetScoringPlaneHits_test.py_": var * float32,
    "TargetScoringPlaneHits_test.pz_": var * float32,
    "TargetScoringPlaneHits_test.energy_": var * float32,
    "TargetScoringPlaneHits_test.x_": var * float32,
    "TargetScoringPlaneHits_test.y_": var * float32,
    "TargetScoringPlaneHits_test.z_": var * float32,
    "TargetScoringPlaneHits_test.pathLength_": var * float32,
    "TargetScoringPlaneHits_test.trackID_": var * int32,
    "TargetScoringPlaneHits_test.pdgID_": var * int32,
    "TrigScintScoringPlaneHits_test.id_": var * int32,
    "TrigScintScoringPlaneHits_test.layerID_": var * int32,
    "TrigScintScoringPlaneHits_test.moduleID_": var * int32,
    "TrigScintScoringPlaneHits_test.edep_": var * float32,
    "TrigScintScoringPlaneHits_test.time_": var * float32,
    "TrigScintScoringPlaneHits_test.px_": var * float32,
    "TrigScintScoringPlaneHits_test.py_": var * float32,
    "TrigScintScoringPlaneHits_test.pz_": var * float32,
    "TrigScintScoringPlaneHits_test.energy_": var * float32,
    "TrigScintScoringPlaneHits_test.x_": var * float32,
    "TrigScintScoringPlaneHits_test.y_": var * float32,
    "TrigScintScoringPlaneHits_test.z_": var * float32,
    "TrigScintScoringPlaneHits_test.pathLength_": var * float32,
    "TrigScintScoringPlaneHits_test.trackID_": var * int32,
    "TrigScintScoringPlaneHits_test.pdgID_": var * int32,
    "TrackerScoringPlaneHits_test.id_": var * int32,
    "TrackerScoringPlaneHits_test.layerID_": var * int32,
    "TrackerScoringPlaneHits_test.moduleID_": var * int32,
    "TrackerScoringPlaneHits_test.edep_": var * float32,
    "TrackerScoringPlaneHits_test.time_": var * float32,
    "TrackerScoringPlaneHits_test.px_": var * float32,
    "TrackerScoringPlaneHits_test.py_": var * float32,
    "TrackerScoringPlaneHits_test.pz_": var * float32,
    "TrackerScoringPlaneHits_test.energy_": var * float32,
    "TrackerScoringPlaneHits_test.x_": var * float32,
    "TrackerScoringPlaneHits_test.y_": var * float32,
    "TrackerScoringPlaneHits_test.z_": var * float32,
    "TrackerScoringPlaneHits_test.pathLength_": var * float32,
    "TrackerScoringPlaneHits_test.trackID_": var * int32,
    "TrackerScoringPlaneHits_test.pdgID_": var * int32,
    "MagnetScoringPlaneHits_test.id_": var * int32,
    "MagnetScoringPlaneHits_test.layerID_": var * int32,
    "MagnetScoringPlaneHits_test.moduleID_": var * int32,
    "MagnetScoringPlaneHits_test.edep_": var * float32,
    "MagnetScoringPlaneHits_test.time_": var * float32,
    "MagnetScoringPlaneHits_test.px_": var * float32,
    "MagnetScoringPlaneHits_test.py_": var * float32,
    "MagnetScoringPlaneHits_test.pz_": var * float32,
    "MagnetScoringPlaneHits_test.energy_": var * float32,
    "MagnetScoringPlaneHits_test.x_": var * float32,
    "MagnetScoringPlaneHits_test.y_": var * float32,
    "MagnetScoringPlaneHits_test.z_": var * float32,
    "MagnetScoringPlaneHits_test.pathLength_": var * float32,
    "MagnetScoringPlaneHits_test.trackID_": var * int32,
    "MagnetScoringPlaneHits_test.pdgID_": var * int32,
    channelIDs_: var * uint32,
    samples_: var * uint32,
    numSamplesPerDigi_: uint32,
    sampleOfInterest_: uint32,
    version_: int32,
    "EcalRecHits_test.id_": var * int32,
    "EcalRecHits_test.amplitude_": var * float32,
    "EcalRecHits_test.energy_": var * float32,
    "EcalRecHits_test.time_": var * float32,
    "EcalRecHits_test.xpos_": var * float32,
    "EcalRecHits_test.ypos_": var * float32,
    "EcalRecHits_test.zpos_": var * float32,
    "EcalRecHits_test.isNoise_": var * bool,
    eventNumber_: int32,
    run_: int32,
    "timestamp_.fSec": int32,
    "timestamp_.fNanoSec": int32,
    weight_: float64,
    isRealData_: bool,
    "intParameters_.first": var * string,
    "intParameters_.second": var * int32,
    "floatParameters_.first": var * string,
    "floatParameters_.second": var * float32,
    "stringParameters_.first": var * string,
    "stringParameters_.second": var * string
}

Oh wow, that's a lot of branches! While this printout can be overwhelming, I'd encourage you to look through it. Each branch is named and contains the number in each event (var is used if the number varies) and the type of object (string, float32, bool to name a few). This printout is very helpful for familiarizing yourself with an awkward array and its structure so don't be shy to look at it and scroll through it!

Manipulate Data

This is probably the most complicated part not least because I'm not sure what analysis you actually want to do. As an example, and to show the power of "vectorization", I am going to calculate the total energy reconstructed within the ECal. These lines are often short but incredibly dense. Do not be afraid to try a few out and see the results, changing the input axis, as well as other branches of events. While it can be difficult to find the right expression to do the manipulation you wish, trying different expressions is an incredibly quick procedure and one that I would enourage you to do. Instead of assigning the result to a variable, you can even just print it directly to the console like events above so you can see the resulting shape and a few example values.

awkward's website is growing with documentation and has a detailed reference of the available commands. Nevertheless, you can find more help on numpy's website whose syntax and vocabulary is a main inspiration for awkward. Specifically, I would guide folks to What is NumPy and the NumPy Beginners Guide to be given helpful definitions of the relevant vocabulary ("vectorization", "array", "axis", "attribute" to name a few).


%%time
# add up the energy_ along axis 1, axis 0 is the "outermost" axis which is indexed by event
total_ecal_rec_energy = ak.sum(events['EcalRecHits_test.energy_'], axis=1)


CPU times: user 5.99 ms, sys: 2 ms, total: 7.99 ms
Wall time: 7.34 ms

Fill Histogram

Filling a histogram is where we condense a variable calculated for many events into an object that can be viewed and interpreted. While numpy and matplotlib have histogram filling and plotting that work well. The hist package gives a more flexible definition of histogram, allowing you to define histograms with more axes, do calculations with histograms (e.g. sum over bins, rebin, slicing, projection), and eventually plot them. We won't go into multi-dimensional (more than one axis) histograms or histogram calcuations here, but the links below give a helpful overview of available options.

hist
hist indexing (the stuff you can put between square brackets)


%%time
# construct the histogram defining a Regular (Reg) binning
# I found this binning after trying a few options
tere_h = hist.Hist.new.Reg(160,0,16,label='Total Reconstructed Energy in ECal [GeV]').Double()
# fill the histogram with the passed array
# I'm dividing by 1000 here to convert the MeV to GeV
tere_h.fill(total_ecal_rec_energy/1000.)


CPU times: user 1.18 ms, sys: 44 µs, total: 1.23 ms
Wall time: 1.26 ms

Regular(160, 0, 16, label='Total Reconstructed Energy in ECal [GeV]')

Double() Σ=10000.0

Plot Histogram

The final step is converting our histogram into an image that is helpful for us humans! hist connects to the standard matplotlib through a styling package called mplhep. Both of which have documentation sites that are helpful to browse so you can gain an idea of the available functions.

matplotlib is used extensively inside and outside HEP, I often find an immediate answer with a simple internet search (e.g. searching plt change aspect ratio tells you how to change the aspect ratio of your plot).
mplhep is a wrapper around matplotlib for HEP to do common things like plot histograms.


tere_h.plot()
plt.yscale('log')
plt.ylabel('Events')
plt.show()

png

Round-and-Round We Go

Now, in almost all of my analyses, one plot just immediately opens the door to new questions. For example, I notice that this total energy plot is rather broad, I want to see why that is. One idea is that it is simply a function of how many hits we observed in the ECal. There are many ways to test this hypothesis, but I'm going to do a specific method that showcases a specific tool that is extremely useful: boolean slicing (awkward's implementation is inspired by numpy's boolean slicing but it also interacts with the idea of broadcasting between arrays).

I also showcase a helpful feature of hist where if the first axis is a StrCategory, then the resulting plot will just overlay the plots along the other axis.

You'll notice that, while I'm manipulating the data, filling a histogram, and plotting the histogram, I do not need to re-load the data. This is by design. Jupyter Lab (and the notebooks it interacts with) use an interactive Python "kernel" to hold variables in memory while you are using them. This is helpful to avoid the time-heavy job of loading data into memory, but it can be an area of confusion. Be careful to not re-use variable names and create new notebooks often!


nhits_by_tere_h = (
    hist.Hist.new
    .StrCategory(['Below Beam Energy','Above Beam Energy'])
    .Reg(100,0,200,label='Number of Hits in ECal')
    .Double()
)
n_ecal_hits = ak.count(events['EcalRecHits_test.id_'], axis=1)
nhits_by_tere_h.fill(
    'Below Beam Energy',
    n_ecal_hits[total_ecal_rec_energy < 8000.]
)
nhits_by_tere_h.fill(
    'Above Beam Energy',
    n_ecal_hits[total_ecal_rec_energy > 8000.]
)
nhits_by_tere_h.plot()
plt.yscale('log')
plt.ylabel('Events')
plt.legend()
plt.show()

png