Using ldmx-sw Directly
As you may be able to pick up based on my tone, I prefer the (efficient) python analysis
method given before using uproot
, awkward
, and hist
. Nevertheless, there are two
main reasons I've had for putting an analysis into the ldmx-sw C++.
- Longevity: While the python ecosystem evolving quickly is helpful for obtaining new features, it can also cause long-running projects to "break" -- forcing a refactor that is purely due to upstream code changes. Writing an analysis into C++, with its more rigid view on backwards compatibility, essentially solidifies it so that it can be run in the same way for a long time into the future.
- Non-Vectorization: In the previous chapter, I made a big deal about how most
analyses can be vectorizable and written in terms of these pre-compiled and fast
functions. That being said, sometimes its difficult to figure out how to write
an analysis in a vectorizable form. Dropping down into the C++ allows analyzers
to write the
for
loop themselves which may be necessary for an analysis to be understandable (or even functional).
The following example is stand-alone in the same sense as the prior chapter's
jupyter notebook. It can be run from outside of the ldmx-sw repository; however,
this directly contradicts the first reason for writing an analysis like this
in the first place. I would recommend that you store your analyzer source code
in ldmx-sw or the private ldmx-analysis. These repos also contain CMake infrastructure
so you can avoid having to write the long g++
command necessary to compile
and link the source code.
Set Up
I am going to use the same version of ldmx-sw that was used to generate
the input events.root
file. This isn't strictly necessary - more often
than not, newer ldmx-sw versions are able to read files from prior ldmx-sw
versions.
cd work-area
denv init ldmx/pro:v3.3.6
The following code block shows the necessary boilerplate for starting a C++ analyzer running with ldmx-sw.
#include "Framework/EventProcessor.h"
#include "Ecal/Event/EcalHit.h"
class MyAnalyzer : public framework::Analyzer {
public:
MyAnalyzer(const std::string& name, framework::Process& p)
: framework::Analyzer(name, p) {}
~MyAnalyzer() = default;
void onProcessStart() final;
void analyze(const framework::Event& event) final;
};
void MyAnalyzer::onProcessStart() {
// this is where we will define the histograms we want to fill
}
void MyAnalyzer::analyze(const framework::Event& event) {
// this is where we will fill the histograms
}
DECLARE_ANALYZER(MyAnalyzer);
And below is an example python config call ana-cfg.py
that I will
use to run this analyzer with fire
. It assumes to be in the same
place as the source file so that it knows where the library it needs
to load is.
from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('ana')
import os
# needs to match path to where compiled library is
# deduced automatically if built and installed alongside ldmx-sw
ldmxcfg.Process.addLibrary(f'{os.getcwd()}/libMyAnalysis.so')
class MyAnalysis:
def __init__(self):
self.instanceName = 'my-ana'
self.className = 'MyAnalyzer' # match class name in source file
self.histograms = []
p.sequence = [ MyAnalysis() ]
p.inputFiles = [ 'events.root' ]
p.histogramFile = 'hist.root'
A quick test can show that the code is compiling and running (although it will not print out anything or create any files).
$ denv 'g++ -fPIC -shared -o libMyAnalysis.so -lFramework -I$(root-config --incdir) MyAnalysis.cxx'
$ denv time fire ana-cfg.py
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete --------
---- LDMXSW: Starting event processing --------
---- LDMXSW: Event processing complete --------
1.99user 0.11system 0:02.10elapsed 100%CPU (0avgtext+0avgdata 320716maxresident)k
0inputs+0outputs (0major+58469minor)pagefaults 0swaps
Notice that we still took about 2s to run. This is because, even though MyAnalyzer
isn't
doing anything with the data, fire
is still looping through all of the events.
Load Data, Manipulate Data, and Fill Histograms
While these steps were separate in the previous workflow, they all share the same process
in this workflow. The data loading is handled by fire
and we are expected to write the
code that manipulates the data and fills the histograms.
I'm going to look at the total reconstructed energy in the ECal again. Below is the updated
code file. Notice that, besides the additional #include
at the top, all of the changes were
made within the function definitions.
#include "Framework/EventProcessor.h"
#include "Ecal/Event/EcalHit.h"
class MyAnalyzer : public framework::Analyzer {
public:
MyAnalyzer(const std::string& name, framework::Process& p)
: framework::Analyzer(name, p) {}
~MyAnalyzer() = default;
void onProcessStart() final;
void analyze(const framework::Event& event) final;
};
void MyAnalyzer::onProcessStart() {
getHistoDirectory(); // forget this -> silently no output histogram
histograms_.create(
"total_ecal_rec_energy",
"Total ECal Rec Energy [GeV]", 160, 0.0, 16.0
);
}
void MyAnalyzer::analyze(const framework::Event& event) {
const auto& ecal_rec_hits{event.getCollection<ldmx::EcalHit>("EcalRecHits")};
double total = 0.0;
for (const auto& hit : ecal_rec_hits) {
total += hit.getEnergy();
}
histograms_.fill("total_ecal_rec_energy", total/1000.);
}
DECLARE_ANALYZER(MyAnalyzer);
In order to run this code on the data, we need to compile and run the program. Again, putting your analyzer within ldmx-sw or ldmx-analysis gives you infrastructure that shortens how much you type during this compilation step.
$ denv 'g++ -fPIC -shared -o libMyAnalysis.so -lFramework -I$(root-config --incdir) MyAnalysis.cxx'
$ denv time fire ana-cfg.py
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete --------
---- LDMXSW: Starting event processing --------
---- LDMXSW: Event processing complete --------
2.03user 0.12system 0:02.20elapsed 97%CPU (0avgtext+0avgdata 321300maxresident)k
0inputs+40outputs (0major+57977minor)pagefaults 0swaps
Now there is a new file hist.root
in this directory which has the histogram we filled stored within it.
$ denv rootls -l hist.root
TDirectoryFile Apr 30 22:30 2024 my-ana "my-ana"
$ denv rootls -l hist.root:*
TH1F Apr 30 22:30 2024 my-ana_total_ecal_rec_energy ""
Plotting Histograms
Viewing the histogram with the filled data is another fork in the road.
I often switch back to the Jupyter Lab approach using uproot
to load histograms from hist.root
into hist
objects that I can then manipulate and plot with hist
and matplotlib
.
Accessing Histograms in Jupyter Lab
I usually hold the histogram file handle created by uproot
and then use the to_hist()
function
once I find the histogram I want to plot in order to pull the histogram into an object I am
familiar with. For example
f = uproot.open('hist.root')
h = f['my-ana/my-ana_total_ecal_rec_energy'].to_hist()
h.plot()
But it is also very common to view these histograms with one of ROOT's browsers.
There is a root browser within the dev images (via denv rootbrowse
or ldmx rootbrowse
),
but I also like to view the histogram with the online ROOT browser.
Below is a screenshot of my browser window after opening the hist.root
file and then
selecting the histogram we filled.