Using Python (Efficiently)
Python is slow. I can't deny it. But Python, for better or for worse, has taken over the scientific programming ecosystem meaning there are a large set of packages that are open source, professionally maintained, and perform the common tasks that scientists want to do.
One aspect of Python that has enabled its widespread adoption
(I think) is the relative ease of writing packages in another
programming language. This means that most of our analysis
code will not be using "pure Python", but instead be using
compiled languages (C++ mostly) hidden behind some convenience
wrappers. This is the strategy of many popular Python packages
numpy
, scipy
, and the HEP-specific hist
and awkward
.
My biggest tip for any new Python analyzers is to avoid writing
the for
loop. As mentioned, Python itself is slow, so only
write for
if you know you are only looping over a small number
of things (e.g. looping over the five different plots you want
to make). We can avoid for
by using these packages with compiled
languages under-the-hood. The syntax of writing these "vectorized"
operations is often complicated to grasp, but it is concise, precise,
and performant.
With those introductory ramblings out of the way, let's get started.
Set Up
The packages we use (and the python ecosystem in general) evolves relatively fast. For this reason, it will be helpful to use a newer python version such that you can access the newer versions of the packages. Personally, I use denv to pick new python without needing to compile it and then have my installed packages for that python be isolated to within my working directory.
While you can install the packages I'll be using directly, they (and a few other helpful
ones) are availabe withing the scikit-hep
metapackage allowing a quick and simple
installation command. In addition, Jupyter Lab offers an excellent way to see
plots while your constructing them and recieve other feedback from your code in an
immediate way (via a REPL
for those interested).
cd work-area
denv init python:3.11
denv pip install scikit-hep jupyterlab
In addition, you will need some data to play with. I generated the input events.root
file using an ldmx-sw simulation, but I am using common event objects that exist in
most (if not all) event files, so this example should work with others.
Generate Event File with ldmx-sw
Again, I used denv
to pin the ldmx-sw version and generate the data file.
denv init ldmx/pro:v3.3.6 eg-gen
cd eg-gen
denv fire config.py
mv events.root path/to/work-area/
where config.py
is the following python configuration script.
from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('test')
from LDMX.SimCore import simulator as sim
mySim = sim.simulator( "mySim" )
mySim.setDetector( 'ldmx-det-v14-8gev', True )
from LDMX.SimCore import generators as gen
mySim.generators = [ gen.single_8gev_e_upstream_tagger() ]
mySim.beamSpotSmear = [20.,80.,0.]
mySim.description = 'Basic test Simulation'
p.sequence = [ mySim ]
p.run = 1
p.maxEvents = 10_000
p.outputFiles = [ 'events.root' ]
import LDMX.Ecal.EcalGeometry
import LDMX.Ecal.ecal_hardcoded_conditions
import LDMX.Hcal.HcalGeometry
import LDMX.Ecal.digi as ecal_digi
p.sequence = [
mySim,
ecal_digi.EcalDigiProducer(),
ecal_digi.EcalRecProducer()
]
Launch a jupyter lab with
denv jupyter lab
open the link in your browser and create a new notebook to get started.
Notebooks consist of "cells" which
can contain "markdown" (a way to write write formatted text), code that will be executed, and raw text.
The rest of these sections are the the result of exporting a jupyter notebook (a *.ipynb
file)
into "markdown" and then including it in this documentation here.
# load the python modules we will use
import uproot # for data loading
import awkward as ak # for data manipulation
import hist # for histogram filling (and some plotting)
import matplotlib as mpl # for plotting
import matplotlib.pyplot as plt # common shorthand
import mplhep # style of plots
%matplotlib inline
mpl.style.use(mplhep.style.ROOT) # set the plot style
Load Data
The first step to any analysis is loading the data.
For this step in this analysis workflow, we are going to use the uproot
package
to load the data into awkward
arrays in memory.
Some helpful links to find more details on this
%%time
# while the uproot file is open, conver the 'LDMX_Events' tree into in-memory arrays
with uproot.open('events.root') as f:
events = f['LDMX_Events'].arrays()
# show the events array
events
CPU times: user 3.63 s, sys: 220 ms, total: 3.85 s
Wall time: 3.85 s
Printout of `events` array
[{'SimParticles_test.first': [1, 2, 3, ..., 48, 49], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 61, 62], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 11, 12], ...}, {'SimParticles_test.first': [1, 2, ..., 15, 1199], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 8, 9], ...}, {'SimParticles_test.first': [1, 2, ..., 118, 119], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 10, 11], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 10, 11], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 7, 61], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 10, 11], ...}, ..., {'SimParticles_test.first': [1, 2, 3, ..., 9, 10], ...}, {'SimParticles_test.first': [1, 2, ..., 126, 127], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 9, 10], ...}, {'SimParticles_test.first': [1, 2, ..., 85, 4702], ...}, {'SimParticles_test.first': [1, 2, ..., 72, 6976], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 8, 93], ...}, {'SimParticles_test.first': [1, 2, ..., 7120], ...}, {'SimParticles_test.first': [1, 2, ..., 7921], ...}, {'SimParticles_test.first': [1, 2, 3, ..., 8, 9], ...}] --------------------------------------------------------------------- type: 10000 * { "SimParticles_test.first": var * int32, "SimParticles_test.second.energy_": var * float64, "SimParticles_test.second.pdgID_": var * int32, "SimParticles_test.second.genStatus_": var * int32, "SimParticles_test.second.time_": var * float64, "SimParticles_test.second.x_": var * float64, "SimParticles_test.second.y_": var * float64, "SimParticles_test.second.z_": var * float64, "SimParticles_test.second.endX_": var * float64, "SimParticles_test.second.endY_": var * float64, "SimParticles_test.second.endZ_": var * float64, "SimParticles_test.second.px_": var * float64, "SimParticles_test.second.py_": var * float64, "SimParticles_test.second.pz_": var * float64, "SimParticles_test.second.endpx_": var * float64, "SimParticles_test.second.endpy_": var * float64, "SimParticles_test.second.endpz_": var * float64, "SimParticles_test.second.mass_": var * float64, "SimParticles_test.second.charge_": var * float64, "SimParticles_test.second.daughters_": var * var * int32, "SimParticles_test.second.parents_": var * var * int32, "SimParticles_test.second.processType_": var * int32, "SimParticles_test.second.vertexVolume_": var * string, "TaggerSimHits_test.id_": var * int32, "TaggerSimHits_test.layerID_": var * int32, "TaggerSimHits_test.moduleID_": var * int32, "TaggerSimHits_test.edep_": var * float32, "TaggerSimHits_test.time_": var * float32, "TaggerSimHits_test.px_": var * float32, "TaggerSimHits_test.py_": var * float32, "TaggerSimHits_test.pz_": var * float32, "TaggerSimHits_test.energy_": var * float32, "TaggerSimHits_test.x_": var * float32, "TaggerSimHits_test.y_": var * float32, "TaggerSimHits_test.z_": var * float32, "TaggerSimHits_test.pathLength_": var * float32, "TaggerSimHits_test.trackID_": var * int32, "TaggerSimHits_test.pdgID_": var * int32, "RecoilSimHits_test.id_": var * int32, "RecoilSimHits_test.layerID_": var * int32, "RecoilSimHits_test.moduleID_": var * int32, "RecoilSimHits_test.edep_": var * float32, "RecoilSimHits_test.time_": var * float32, "RecoilSimHits_test.px_": var * float32, "RecoilSimHits_test.py_": var * float32, "RecoilSimHits_test.pz_": var * float32, "RecoilSimHits_test.energy_": var * float32, "RecoilSimHits_test.x_": var * float32, "RecoilSimHits_test.y_": var * float32, "RecoilSimHits_test.z_": var * float32, "RecoilSimHits_test.pathLength_": var * float32, "RecoilSimHits_test.trackID_": var * int32, "RecoilSimHits_test.pdgID_": var * int32, "HcalSimHits_test.id_": var * int32, "HcalSimHits_test.edep_": var * float32, "HcalSimHits_test.x_": var * float32, "HcalSimHits_test.y_": var * float32, "HcalSimHits_test.z_": var * float32, "HcalSimHits_test.time_": var * float32, "HcalSimHits_test.trackIDContribs_": var * var * int32, "HcalSimHits_test.incidentIDContribs_": var * var * int32, "HcalSimHits_test.pdgCodeContribs_": var * var * int32, "HcalSimHits_test.edepContribs_": var * var * float32, "HcalSimHits_test.timeContribs_": var * var * float32, "HcalSimHits_test.nContribs_": var * uint32, "HcalSimHits_test.pathLength_": var * float32, "HcalSimHits_test.preStepX_": var * float32, "HcalSimHits_test.preStepY_": var * float32, "HcalSimHits_test.preStepZ_": var * float32, "HcalSimHits_test.preStepTime_": var * float32, "HcalSimHits_test.postStepX_": var * float32, "HcalSimHits_test.postStepY_": var * float32, "HcalSimHits_test.postStepZ_": var * float32, "HcalSimHits_test.postStepTime_": var * float32, "HcalSimHits_test.velocity_": var * float32, "EcalSimHits_test.id_": var * int32, "EcalSimHits_test.edep_": var * float32, "EcalSimHits_test.x_": var * float32, "EcalSimHits_test.y_": var * float32, "EcalSimHits_test.z_": var * float32, "EcalSimHits_test.time_": var * float32, "EcalSimHits_test.trackIDContribs_": var * var * int32, "EcalSimHits_test.incidentIDContribs_": var * var * int32, "EcalSimHits_test.pdgCodeContribs_": var * var * int32, "EcalSimHits_test.edepContribs_": var * var * float32, "EcalSimHits_test.timeContribs_": var * var * float32, "EcalSimHits_test.nContribs_": var * uint32, "EcalSimHits_test.pathLength_": var * float32, "EcalSimHits_test.preStepX_": var * float32, "EcalSimHits_test.preStepY_": var * float32, "EcalSimHits_test.preStepZ_": var * float32, "EcalSimHits_test.preStepTime_": var * float32, "EcalSimHits_test.postStepX_": var * float32, "EcalSimHits_test.postStepY_": var * float32, "EcalSimHits_test.postStepZ_": var * float32, "EcalSimHits_test.postStepTime_": var * float32, "EcalSimHits_test.velocity_": var * float32, "TargetSimHits_test.id_": var * int32, "TargetSimHits_test.edep_": var * float32, "TargetSimHits_test.x_": var * float32, "TargetSimHits_test.y_": var * float32, "TargetSimHits_test.z_": var * float32, "TargetSimHits_test.time_": var * float32, "TargetSimHits_test.trackIDContribs_": var * var * int32, "TargetSimHits_test.incidentIDContribs_": var * var * int32, "TargetSimHits_test.pdgCodeContribs_": var * var * int32, "TargetSimHits_test.edepContribs_": var * var * float32, "TargetSimHits_test.timeContribs_": var * var * float32, "TargetSimHits_test.nContribs_": var * uint32, "TargetSimHits_test.pathLength_": var * float32, "TargetSimHits_test.preStepX_": var * float32, "TargetSimHits_test.preStepY_": var * float32, "TargetSimHits_test.preStepZ_": var * float32, "TargetSimHits_test.preStepTime_": var * float32, "TargetSimHits_test.postStepX_": var * float32, "TargetSimHits_test.postStepY_": var * float32, "TargetSimHits_test.postStepZ_": var * float32, "TargetSimHits_test.postStepTime_": var * float32, "TargetSimHits_test.velocity_": var * float32, "TriggerPad1SimHits_test.id_": var * int32, "TriggerPad1SimHits_test.edep_": var * float32, "TriggerPad1SimHits_test.x_": var * float32, "TriggerPad1SimHits_test.y_": var * float32, "TriggerPad1SimHits_test.z_": var * float32, "TriggerPad1SimHits_test.time_": var * float32, "TriggerPad1SimHits_test.trackIDContribs_": var * var * int32, "TriggerPad1SimHits_test.incidentIDContribs_": var * var * int32, "TriggerPad1SimHits_test.pdgCodeContribs_": var * var * int32, "TriggerPad1SimHits_test.edepContribs_": var * var * float32, "TriggerPad1SimHits_test.timeContribs_": var * var * float32, "TriggerPad1SimHits_test.nContribs_": var * uint32, "TriggerPad1SimHits_test.pathLength_": var * float32, "TriggerPad1SimHits_test.preStepX_": var * float32, "TriggerPad1SimHits_test.preStepY_": var * float32, "TriggerPad1SimHits_test.preStepZ_": var * float32, "TriggerPad1SimHits_test.preStepTime_": var * float32, "TriggerPad1SimHits_test.postStepX_": var * float32, "TriggerPad1SimHits_test.postStepY_": var * float32, "TriggerPad1SimHits_test.postStepZ_": var * float32, "TriggerPad1SimHits_test.postStepTime_": var * float32, "TriggerPad1SimHits_test.velocity_": var * float32, "TriggerPad2SimHits_test.id_": var * int32, "TriggerPad2SimHits_test.edep_": var * float32, "TriggerPad2SimHits_test.x_": var * float32, "TriggerPad2SimHits_test.y_": var * float32, "TriggerPad2SimHits_test.z_": var * float32, "TriggerPad2SimHits_test.time_": var * float32, "TriggerPad2SimHits_test.trackIDContribs_": var * var * int32, "TriggerPad2SimHits_test.incidentIDContribs_": var * var * int32, "TriggerPad2SimHits_test.pdgCodeContribs_": var * var * int32, "TriggerPad2SimHits_test.edepContribs_": var * var * float32, "TriggerPad2SimHits_test.timeContribs_": var * var * float32, "TriggerPad2SimHits_test.nContribs_": var * uint32, "TriggerPad2SimHits_test.pathLength_": var * float32, "TriggerPad2SimHits_test.preStepX_": var * float32, "TriggerPad2SimHits_test.preStepY_": var * float32, "TriggerPad2SimHits_test.preStepZ_": var * float32, "TriggerPad2SimHits_test.preStepTime_": var * float32, "TriggerPad2SimHits_test.postStepX_": var * float32, "TriggerPad2SimHits_test.postStepY_": var * float32, "TriggerPad2SimHits_test.postStepZ_": var * float32, "TriggerPad2SimHits_test.postStepTime_": var * float32, "TriggerPad2SimHits_test.velocity_": var * float32, "TriggerPad3SimHits_test.id_": var * int32, "TriggerPad3SimHits_test.edep_": var * float32, "TriggerPad3SimHits_test.x_": var * float32, "TriggerPad3SimHits_test.y_": var * float32, "TriggerPad3SimHits_test.z_": var * float32, "TriggerPad3SimHits_test.time_": var * float32, "TriggerPad3SimHits_test.trackIDContribs_": var * var * int32, "TriggerPad3SimHits_test.incidentIDContribs_": var * var * int32, "TriggerPad3SimHits_test.pdgCodeContribs_": var * var * int32, "TriggerPad3SimHits_test.edepContribs_": var * var * float32, "TriggerPad3SimHits_test.timeContribs_": var * var * float32, "TriggerPad3SimHits_test.nContribs_": var * uint32, "TriggerPad3SimHits_test.pathLength_": var * float32, "TriggerPad3SimHits_test.preStepX_": var * float32, "TriggerPad3SimHits_test.preStepY_": var * float32, "TriggerPad3SimHits_test.preStepZ_": var * float32, "TriggerPad3SimHits_test.preStepTime_": var * float32, "TriggerPad3SimHits_test.postStepX_": var * float32, "TriggerPad3SimHits_test.postStepY_": var * float32, "TriggerPad3SimHits_test.postStepZ_": var * float32, "TriggerPad3SimHits_test.postStepTime_": var * float32, "TriggerPad3SimHits_test.velocity_": var * float32, "EcalScoringPlaneHits_test.id_": var * int32, "EcalScoringPlaneHits_test.layerID_": var * int32, "EcalScoringPlaneHits_test.moduleID_": var * int32, "EcalScoringPlaneHits_test.edep_": var * float32, "EcalScoringPlaneHits_test.time_": var * float32, "EcalScoringPlaneHits_test.px_": var * float32, "EcalScoringPlaneHits_test.py_": var * float32, "EcalScoringPlaneHits_test.pz_": var * float32, "EcalScoringPlaneHits_test.energy_": var * float32, "EcalScoringPlaneHits_test.x_": var * float32, "EcalScoringPlaneHits_test.y_": var * float32, "EcalScoringPlaneHits_test.z_": var * float32, "EcalScoringPlaneHits_test.pathLength_": var * float32, "EcalScoringPlaneHits_test.trackID_": var * int32, "EcalScoringPlaneHits_test.pdgID_": var * int32, "HcalScoringPlaneHits_test.id_": var * int32, "HcalScoringPlaneHits_test.layerID_": var * int32, "HcalScoringPlaneHits_test.moduleID_": var * int32, "HcalScoringPlaneHits_test.edep_": var * float32, "HcalScoringPlaneHits_test.time_": var * float32, "HcalScoringPlaneHits_test.px_": var * float32, "HcalScoringPlaneHits_test.py_": var * float32, "HcalScoringPlaneHits_test.pz_": var * float32, "HcalScoringPlaneHits_test.energy_": var * float32, "HcalScoringPlaneHits_test.x_": var * float32, "HcalScoringPlaneHits_test.y_": var * float32, "HcalScoringPlaneHits_test.z_": var * float32, "HcalScoringPlaneHits_test.pathLength_": var * float32, "HcalScoringPlaneHits_test.trackID_": var * int32, "HcalScoringPlaneHits_test.pdgID_": var * int32, "TargetScoringPlaneHits_test.id_": var * int32, "TargetScoringPlaneHits_test.layerID_": var * int32, "TargetScoringPlaneHits_test.moduleID_": var * int32, "TargetScoringPlaneHits_test.edep_": var * float32, "TargetScoringPlaneHits_test.time_": var * float32, "TargetScoringPlaneHits_test.px_": var * float32, "TargetScoringPlaneHits_test.py_": var * float32, "TargetScoringPlaneHits_test.pz_": var * float32, "TargetScoringPlaneHits_test.energy_": var * float32, "TargetScoringPlaneHits_test.x_": var * float32, "TargetScoringPlaneHits_test.y_": var * float32, "TargetScoringPlaneHits_test.z_": var * float32, "TargetScoringPlaneHits_test.pathLength_": var * float32, "TargetScoringPlaneHits_test.trackID_": var * int32, "TargetScoringPlaneHits_test.pdgID_": var * int32, "TrigScintScoringPlaneHits_test.id_": var * int32, "TrigScintScoringPlaneHits_test.layerID_": var * int32, "TrigScintScoringPlaneHits_test.moduleID_": var * int32, "TrigScintScoringPlaneHits_test.edep_": var * float32, "TrigScintScoringPlaneHits_test.time_": var * float32, "TrigScintScoringPlaneHits_test.px_": var * float32, "TrigScintScoringPlaneHits_test.py_": var * float32, "TrigScintScoringPlaneHits_test.pz_": var * float32, "TrigScintScoringPlaneHits_test.energy_": var * float32, "TrigScintScoringPlaneHits_test.x_": var * float32, "TrigScintScoringPlaneHits_test.y_": var * float32, "TrigScintScoringPlaneHits_test.z_": var * float32, "TrigScintScoringPlaneHits_test.pathLength_": var * float32, "TrigScintScoringPlaneHits_test.trackID_": var * int32, "TrigScintScoringPlaneHits_test.pdgID_": var * int32, "TrackerScoringPlaneHits_test.id_": var * int32, "TrackerScoringPlaneHits_test.layerID_": var * int32, "TrackerScoringPlaneHits_test.moduleID_": var * int32, "TrackerScoringPlaneHits_test.edep_": var * float32, "TrackerScoringPlaneHits_test.time_": var * float32, "TrackerScoringPlaneHits_test.px_": var * float32, "TrackerScoringPlaneHits_test.py_": var * float32, "TrackerScoringPlaneHits_test.pz_": var * float32, "TrackerScoringPlaneHits_test.energy_": var * float32, "TrackerScoringPlaneHits_test.x_": var * float32, "TrackerScoringPlaneHits_test.y_": var * float32, "TrackerScoringPlaneHits_test.z_": var * float32, "TrackerScoringPlaneHits_test.pathLength_": var * float32, "TrackerScoringPlaneHits_test.trackID_": var * int32, "TrackerScoringPlaneHits_test.pdgID_": var * int32, "MagnetScoringPlaneHits_test.id_": var * int32, "MagnetScoringPlaneHits_test.layerID_": var * int32, "MagnetScoringPlaneHits_test.moduleID_": var * int32, "MagnetScoringPlaneHits_test.edep_": var * float32, "MagnetScoringPlaneHits_test.time_": var * float32, "MagnetScoringPlaneHits_test.px_": var * float32, "MagnetScoringPlaneHits_test.py_": var * float32, "MagnetScoringPlaneHits_test.pz_": var * float32, "MagnetScoringPlaneHits_test.energy_": var * float32, "MagnetScoringPlaneHits_test.x_": var * float32, "MagnetScoringPlaneHits_test.y_": var * float32, "MagnetScoringPlaneHits_test.z_": var * float32, "MagnetScoringPlaneHits_test.pathLength_": var * float32, "MagnetScoringPlaneHits_test.trackID_": var * int32, "MagnetScoringPlaneHits_test.pdgID_": var * int32, channelIDs_: var * uint32, samples_: var * uint32, numSamplesPerDigi_: uint32, sampleOfInterest_: uint32, version_: int32, "EcalRecHits_test.id_": var * int32, "EcalRecHits_test.amplitude_": var * float32, "EcalRecHits_test.energy_": var * float32, "EcalRecHits_test.time_": var * float32, "EcalRecHits_test.xpos_": var * float32, "EcalRecHits_test.ypos_": var * float32, "EcalRecHits_test.zpos_": var * float32, "EcalRecHits_test.isNoise_": var * bool, eventNumber_: int32, run_: int32, "timestamp_.fSec": int32, "timestamp_.fNanoSec": int32, weight_: float64, isRealData_: bool, "intParameters_.first": var * string, "intParameters_.second": var * int32, "floatParameters_.first": var * string, "floatParameters_.second": var * float32, "stringParameters_.first": var * string, "stringParameters_.second": var * string }
Oh wow, that's a lot of branches! While this printout can be overwhelming, I'd encourage you
to look through it. Each branch is named and contains the number in each event
(var
is used if the number varies) and the type of object (string
, float32
, bool
to name
a few). This printout is very helpful for familiarizing yourself with an awkward
array and its structure
so don't be shy to look at it and scroll through it!
Manipulate Data
This is probably the most complicated part not least because I'm not sure what analysis you actually want to do.
As an example, and to show the power of "vectorization", I am going to calculate the total energy reconstructed within the ECal.
These lines are often short but incredibly dense. Do not be afraid to try a few out and see the results, changing the input axis
,
as well as other branches of events
. While it can be difficult to find the right expression to do the manipulation you wish,
trying different expressions is an incredibly quick procedure and one that I would enourage you to do. Instead of assigning
the result to a variable, you can even just print it directly to the console like events
above so you can see the resulting
shape and a few example values.
awkward's website is growing with documentation and has a detailed reference of the available commands.
Nevertheless, you can find more help on numpy's website whose syntax and vocabulary is a main inspiration for awkward
.
Specifically, I would guide folks to What is NumPy and the
NumPy Beginners Guide to be given helpful definitions of the relevant vocabulary
("vectorization", "array", "axis", "attribute" to name a few).
%%time
# add up the energy_ along axis 1, axis 0 is the "outermost" axis which is indexed by event
total_ecal_rec_energy = ak.sum(events['EcalRecHits_test.energy_'], axis=1)
CPU times: user 5.99 ms, sys: 2 ms, total: 7.99 ms
Wall time: 7.34 ms
Fill Histogram
Filling a histogram is where we condense a variable calculated for many events into an object that can be viewed and interpreted.
While numpy
and matplotlib
have histogram filling
and plotting that work well. The hist
package gives
a more flexible definition of histogram, allowing you to define histograms with more axes, do calculations with histograms
(e.g. sum over bins, rebin, slicing, projection), and eventually plot them.
We won't go into multi-dimensional (more than one axis) histograms or histogram calcuations here,
but the links below give a helpful overview of available options.
- hist
- hist indexing (the stuff you can put between square brackets)
%%time
# construct the histogram defining a Regular (Reg) binning
# I found this binning after trying a few options
tere_h = hist.Hist.new.Reg(160,0,16,label='Total Reconstructed Energy in ECal [GeV]').Double()
# fill the histogram with the passed array
# I'm dividing by 1000 here to convert the MeV to GeV
tere_h.fill(total_ecal_rec_energy/1000.)
CPU times: user 1.18 ms, sys: 44 µs, total: 1.23 ms
Wall time: 1.26 ms
Double() Σ=10000.0
Plot Histogram
The final step is converting our histogram into an image that is helpful for us humans! hist
connects to the standard matplotlib
through a styling package called mplhep
. Both of which have documentation sites that are helpful to browse so you can gain an idea of the available functions.
- matplotlib is used extensively inside and outside HEP, I often find an immediate answer with a simple internet search (e.g. searching
plt change aspect ratio
tells you how to change the aspect ratio of your plot). - mplhep is a wrapper around matplotlib for HEP to do common things like plot histograms.
tere_h.plot()
plt.yscale('log')
plt.ylabel('Events')
plt.show()
Round-and-Round We Go
Now, in almost all of my analyses, one plot just immediately opens the door to new questions. For example, I notice that this total energy plot is rather broad, I want to see why that is. One idea is that it is simply a function of how many hits we observed in the ECal. There are many ways to test this hypothesis, but I'm going to do a specific method that showcases a specific tool that is extremely useful: boolean slicing (awkward's implementation is inspired by numpy's boolean slicing but it also interacts with the idea of broadcasting between arrays).
I also showcase a helpful feature of hist
where if the first axis is a StrCategory
, then
the resulting plot
will just overlay the plots along the other axis.
You'll notice that, while I'm manipulating the data, filling a histogram, and plotting the histogram, I do not need to re-load the data. This is by design. Jupyter Lab (and the notebooks it interacts with) use an interactive Python "kernel" to hold variables in memory while you are using them. This is helpful to avoid the time-heavy job of loading data into memory, but it can be an area of confusion. Be careful to not re-use variable names and create new notebooks often!
nhits_by_tere_h = (
hist.Hist.new
.StrCategory(['Below Beam Energy','Above Beam Energy'])
.Reg(100,0,200,label='Number of Hits in ECal')
.Double()
)
n_ecal_hits = ak.count(events['EcalRecHits_test.id_'], axis=1)
nhits_by_tere_h.fill(
'Below Beam Energy',
n_ecal_hits[total_ecal_rec_energy < 8000.]
)
nhits_by_tere_h.fill(
'Above Beam Energy',
n_ecal_hits[total_ecal_rec_energy > 8000.]
)
nhits_by_tere_h.plot()
plt.yscale('log')
plt.ylabel('Events')
plt.legend()
plt.show()