Using ldmx-sw Directly

As you may be able to tell from my tone, I prefer the (efficient) python analysis method described in the previous chapter using uproot, awkward, and hist. Nevertheless, there are two main reasons I've had for putting an analysis into ldmx-sw's C++.

  1. Longevity: While the quick evolution of the python ecosystem is helpful for obtaining new features, it can also cause long-running projects to "break" -- forcing a refactor purely due to upstream code changes. Writing an analysis in C++, with its more rigid view on backwards compatibility, essentially solidifies it so that it can be run in the same way long into the future.
  2. Non-Vectorization: In the previous chapter, I made a big deal about how most analyses can be vectorized and written in terms of pre-compiled, fast functions. That being said, sometimes it's difficult to figure out how to write an analysis in a vectorizable form. Dropping down into C++ allows analyzers to write the for loop themselves, which may be necessary for an analysis to be understandable (or even functional).

The following example is stand-alone in the same sense as the prior chapter's jupyter notebook: it can be run from outside of the ldmx-sw repository. However, that directly undercuts the first reason for writing an analysis this way, so I would recommend storing your analyzer source code in ldmx-sw itself or in the private ldmx-analysis repository. These repos also contain CMake infrastructure so that you can avoid writing the long g++ command needed to compile and link the source code.

Set Up

I am going to use the same version of ldmx-sw that was used to generate the input events.root file. This isn't strictly necessary; more often than not, newer ldmx-sw versions are able to read files produced by older versions.

cd work-area
denv init ldmx/pro:v3.3.6

The following code block shows the necessary boilerplate for starting a C++ analyzer running with ldmx-sw.

#include "Framework/EventProcessor.h"

#include "Ecal/Event/EcalHit.h"

class MyAnalyzer : public framework::Analyzer {
 public:
  MyAnalyzer(const std::string& name, framework::Process& p)
    : framework::Analyzer(name, p) {}
  ~MyAnalyzer() = default;
  void onProcessStart() final;
  void analyze(const framework::Event& event) final;
};

void MyAnalyzer::onProcessStart() {
  // this is where we will define the histograms we want to fill
}

void MyAnalyzer::analyze(const framework::Event& event) {
  // this is where we will fill the histograms
}

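// DECLARE_ANALYZER registers this class with the framework so that the python
// config can refer to it by its class name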
DECLARE_ANALYZER(MyAnalyzer);

And below is an example python config ana-cfg.py that I will use to run this analyzer with fire. It assumes it is being run from the same directory as the source file (and the compiled library) so that it knows where to find the library it needs to load.

from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('ana')
import os
# needs to match path to where compiled library is
# deduced automatically if built and installed alongside ldmx-sw
ldmxcfg.Process.addLibrary(f'{os.getcwd()}/libMyAnalysis.so')
class MyAnalysis:
    def __init__(self):
        self.instanceName = 'my-ana'
        self.className = 'MyAnalyzer' # match class name in source file
        self.histograms = []
p.sequence = [ MyAnalysis() ]
p.inputFiles = [ 'events.root' ]
p.histogramFile = 'hist.root'

A quick test can show that the code compiles and runs (although the analyzer itself will not print anything or create any files).

$ denv 'g++ -fPIC -shared -o libMyAnalysis.so -lFramework -I$(root-config --incdir) MyAnalysis.cxx'
$ denv time fire ana-cfg.py
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
---- LDMXSW: Event processing complete  --------
1.99user 0.11system 0:02.10elapsed 100%CPU (0avgtext+0avgdata 320716maxresident)k
0inputs+0outputs (0major+58469minor)pagefaults 0swaps

Notice that it still took about 2 seconds to run. This is because, even though MyAnalyzer isn't doing anything with the data, fire still loops through all of the events.
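
If you want an even quicker turn-around while testing, the Process configuration has a maxEvents parameter that can cap how many events are processed. I believe it also applies when reading input files, but double check that behavior for your ldmx-sw version. A minimal sketch of the addition to ana-cfg.py:

# cap the number of events processed for a quick test
# (assumption: maxEvents also limits events read from input files)
p.maxEvents = 100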

Load Data, Manipulate Data, and Fill Histograms

While these steps were separate in the previous workflow, they all happen in the same place in this one: the data loading is handled by fire, and we are expected to write the code that manipulates the data and fills the histograms.

I'm going to look at the total reconstructed energy in the ECal again. Below is the updated code file. Notice that, besides the additional #include at the top, all of the changes were made within the function definitions.

#include "Framework/EventProcessor.h"

#include "Ecal/Event/EcalHit.h"

class MyAnalyzer : public framework::Analyzer {
 public:
  MyAnalyzer(const std::string& name, framework::Process& p)
    : framework::Analyzer(name, p) {}
  ~MyAnalyzer() = default;
  void onProcessStart() final;
  void analyze(const framework::Event& event) final;
};

void MyAnalyzer::onProcessStart() {
  getHistoDirectory(); // if you forget this, the histograms are silently left out of the output file
  histograms_.create(
      "total_ecal_rec_energy",
      "Total ECal Rec Energy [GeV]", 160, 0.0, 16.0
  );
}

void MyAnalyzer::analyze(const framework::Event& event) {
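  // grab the collection of reconstructed ECal hits from the event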
  const auto& ecal_rec_hits{event.getCollection<ldmx::EcalHit>("EcalRecHits")};
  double total = 0.0;
  for (const auto& hit : ecal_rec_hits) {
    total += hit.getEnergy();
  }
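  // hit energies are in MeV, so divide by 1000 to match the histogram's GeV axis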
  histograms_.fill("total_ecal_rec_energy", total/1000.);
}

DECLARE_ANALYZER(MyAnalyzer);

In order to run this code on the data, we need to compile it into the shared library and run fire again. Again, putting your analyzer within ldmx-sw or ldmx-analysis gives you infrastructure that shortens how much you type during this compilation step.

$ denv 'g++ -fPIC -shared -o libMyAnalysis.so -lFramework -I$(root-config --incdir) MyAnalysis.cxx'
$ denv time fire ana-cfg.py
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete  --------
---- LDMXSW: Starting event processing --------
---- LDMXSW: Event processing complete  --------
2.03user 0.12system 0:02.20elapsed 97%CPU (0avgtext+0avgdata 321300maxresident)k
0inputs+40outputs (0major+57977minor)pagefaults 0swaps

Now there is a new file hist.root in this directory with the histogram we filled stored inside it.

$ denv rootls -l hist.root
TDirectoryFile  Apr 30 22:30 2024 my-ana  "my-ana"
$ denv rootls -l hist.root:*
TH1F  Apr 30 22:30 2024 my-ana_total_ecal_rec_energy  ""

Plotting Histograms

Viewing the filled histogram is another fork in the road. I often switch back to the Jupyter Lab approach, using uproot to load the histograms from hist.root into hist objects that I can then manipulate and plot with hist and matplotlib.

Accessing Histograms in Jupyter Lab

I usually keep the file handle opened by uproot and, once I find the histogram I want to plot, use its to_hist() function to pull the histogram into an object I am familiar with. For example

import uproot

f = uproot.open('hist.root')
h = f['my-ana/my-ana_total_ecal_rec_energy'].to_hist()
h.plot()
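
To keep a copy of the plot outside of the notebook, you can hand h.plot() an explicit matplotlib axis and then save the figure. A quick sketch (the axis label and output file name are just examples):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
h.plot(ax=ax)  # hist draws onto the given matplotlib axis
ax.set_ylabel('Events')
fig.savefig('total_ecal_rec_energy.png')  # example output name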

But it is also very common to view these histograms with one of ROOT's browsers. There is a ROOT browser within the dev images (via denv rootbrowse or ldmx rootbrowse), but I also like to view the histogram with the online ROOT browser. Below is a screenshot of my browser window after opening the hist.root file and selecting the histogram we filled.

[screenshot of the online ROOT browser (JSROOT) showing the filled histogram from hist.root]