# Development Image Build Context
The build context for the ldmx/dev images used for developing and running ldmx-sw.
There is a corresponding workflow in ldmx-sw that generates a production image using the image generated by this build context as a base image. This production image already has ldmx-sw built and installed on it and assumes the user wants to run the application.
## Usage

The image is designed to be used with denv,
which provides support for Docker, Podman, and Apptainer.
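For example, assuming denv is already installed and the tag shown is the image you want, a workspace can be set up and used like this sketch:

```shell
cd ~/ldmx                  # directory that will become the denv workspace
denv init ldmx/dev:latest  # configure this directory to use the dev image
denv gcc --version         # any command prefixed with denv runs inside the container
```
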
## Software in Image
| Software Package | Version | Construction Process |
|---|---|---|
| Ubuntu Server | 24.04 | Base Image |
| GCC | 13.3.0 | From Ubuntu Repos |
| Python | 3.12.3 | From Ubuntu Repos |
| cmake | 3.28.3 | From Ubuntu Repos |
| Boost | 1.83.0 | From Ubuntu Repos |
| XercesC | 3.3.0 | Built from source |
| LHAPDF | 6.5.5 | Built from source |
| Pythia8 | 8.313 | Built from source |
| nlohmann/json | 3.11.3 | From Ubuntu Repos |
| ROOT | 6.34.10 | Built from source |
| Geant4 | LDMX.10.2.3_v0.6 | Built from source |
| Eigen | 3.4.0 | Built from source |
| HEPMC3 | 3.3.0 | Built from source |
| GENIE Generator | 3.04.02-ldmx | Built from source |
| GENIE Reweight | 1.04.00 | Built from Source |
| Catch2 | 3.8.0 | Built from source |
| Acts | 36.0.0 | Built from source |
| ONNX Runtime | 1.15.0 | Download pre-built binaries |
| Clang | 18.1.3 | From Ubuntu Repos |
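As a quick sanity check of the table above, versions can be spot-checked from inside the container environment (assuming a denv workspace already configured to use an ldmx/dev image):

```shell
denv gcc --version     # compiler
denv python3 --version # python interpreter
denv cmake --version   # build configuration tool
denv root --version    # ROOT
```
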
More detailed documentation of the available software is available online, as is documentation on the workflow and runner used to build the image.
## Other Configuration

- SSL certificates that will be trusted by the container are in the
  `certs` directory.
## Other Packages
If you would like another package included in the development container, please open an issue in this repository for further discussion.
## Determining an Image's Version
Often it is helpful to determine an image's version. Sometimes this is as easy as
looking at the tag provided by `docker image ls` or written into the SIF file name,
but sometimes this information is lost. Since v4 of the container image, we've been
more generous with adding labels to the image, including the standard
`org.opencontainers.image.version` label, which (for our purposes) stores the release that
built the image.

We can inspect an image to view its labels.
You can find out which runner is being used by running `denv check`.
### Docker/Podman
For docker and podman, the `inspect` command returns JSON with all of the image manifest details.
The `jq` program just helps us parse this JSON for the specific label we are looking for,
but you could also just scroll through the output.

```shell
docker inspect ldmx/dev:latest \
  | jq 'map(.Config.Labels["org.opencontainers.image.version"])[]'
```

You can also avoid using `jq` and scrolling if you provide a Go template.

```shell
docker inspect ldmx/dev:latest \
  --format '{{ (index .Config.Labels "org.opencontainers.image.version") }}'
```
### Apptainer

`apptainer inspect` by default returns just the list of labels,
so we can use `grep` to select the line with the label we care about.
Similar to above, you can also just scroll through the output if you want.

```shell
apptainer inspect ldmx_dev_latest.sif | grep org.opencontainers.image.version
```

denv uses the OCI image tag to refer to images rather than the path to a SIF file,
which is what `apptainer inspect` needs in order to function.
Fortunately, you can find the full path to the cached SIF file using an environment
variable apptainer defines at runtime.

```shell
denv printenv APPTAINER_CONTAINER
```

Putting all this together, we can find the image version label with the following one-liner.

```shell
apptainer inspect $(denv printenv APPTAINER_CONTAINER) | grep org.opencontainers.image.version
```
# Using a Custom Geant4
Geant4 is our main simulation engine and it has a large effect on the products of our simulation samples. As such, it is very common to compare multiple different versions, patches, and tweaks to Geant4 with our simulation.
Make sure you have an image that is at least v4.2.0. You can check your version of the image by inspecting the image labels.
## Building Your Geant4
You can build your Geant4 in a similar manner as ldmx-sw. It does take much longer to compile than ldmx-sw since it is larger, so be sure to leave enough time for it.
You can only run this custom build of Geant4 with whatever image you are building it with, so make sure you are happy with the image version you are using.
```shell
cd path/to/ldmx/ldmx-sw # ldmx-sw you want to build with custom geant4
git clone git@github.com:LDMX-Software/geant4.git # or could be mainline Geant4 or an unpacked tar-ball
denv cmake -B geant4/build -S geant4 <cmake-options>
denv cmake --build geant4/build --target install
```
Building Geant4 from source has a lot of configuration options that can be used to customize how it is built. A few are highlighted below for how they interact with our use of containers.

- `CMAKE_INSTALL_PREFIX`: This should be set to a path accessible from the container so that the programs within the container can read from and write to this directory.
  - If the geant4 build directory is within ldmx-sw (like it is above), then you could do something like `-DCMAKE_INSTALL_PREFIX=geant4/install` when you run `denv cmake ...`.
  - If you are keeping Geant4 outside of ldmx-sw, then you may need to mount it into the ldmx-sw container image with `denv config mounts` if it is not already within a mounted directory.
- `GEANT4_INSTALL_DATADIR`: If you are building a version of Geant4 that has the same data files as the Geant4 version built into the container image, then you can tell the Geant4 build to use those data files with this option, saving build time and disk space. This is helpful if (for example) you are just re-building the same version of Geant4 but in Debug mode. You can see where the Geant4 data is within the container with `denv printenv G4DATADIR` and then use this value: `-DGEANT4_INSTALL_DATADIR=/usr/local/share/geant4/data`.
The following are the build options used when setting up the image and are likely what you want to get started.

- `-DGEANT4_USE_GDML=ON`: Enable reading geometries with the GDML markup language, which is used in ldmx-sw for all of our geometries.
- `-DGEANT4_INSTALL_EXAMPLES=OFF`: Don't install the Geant4 example applications (just to save space and compilation time).
- `-DGEANT4_USE_OPENGL_X11=ON`: Enable an X11-based GUI for inspecting geometries.
- `-DGEANT4_MULTITHREADED=OFF`: If you are building a version of Geant4 that is multithreaded by default, you will want to disable it. The dynamic loading used in ldmx-sw will often not work with a multithreaded build of Geant4.
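Putting these options together, a full configure-and-build might look like the following sketch (the `geant4/install` prefix is illustrative):

```shell
denv cmake -B geant4/build -S geant4 \
  -DCMAKE_INSTALL_PREFIX=geant4/install \
  -DGEANT4_USE_GDML=ON \
  -DGEANT4_INSTALL_EXAMPLES=OFF \
  -DGEANT4_USE_OPENGL_X11=ON \
  -DGEANT4_MULTITHREADED=OFF
denv cmake --build geant4/build --target install
```
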
## Concerns when building versions of Geant4 other than 10.2.3

For most use cases, you will be building a modified version of the same release of Geant4 that is used in the image (10.2.3). It is also possible to build and use later versions of Geant4, although this should be done with care. In particular:

- Different Geant4 release versions will require that you rebuild ldmx-sw for use with that version; it will not be sufficient to just source the custom Geant4's environment and pick up the shared libraries therein.
- Recent versions of Geant4 group the electromagnetic processes for each particle into a so-called general process for performance reasons. This means that many features in ldmx-sw that rely on the exact names of processes in Geant4 will not work. You can disable this by inserting something like the following in `RunManager::setupPhysics()`:

  ```cpp
  // Make sure to include G4EmParameters if needed
  auto electromagneticParameters {G4EmParameters::Instance()};
  // Disable the use of G4XXXGeneralProcess,
  // i.e. G4GammaGeneralProcess and G4ElectronGeneralProcess
  electromagneticParameters->SetGeneralProcessActive(false);
  ```
- Geant4 relies on being able to locate a set of datasets when running. For builds of 10.2.3, the ones that are present in the container will suffice, but other versions may need different versions of these datasets. If you run into issues with this, use `denv printenv` and check that the following environment variables are pointing to the right location:
  - `GEANT4_DATA_DIR` should point to `$LDMX_CUSTOM_GEANT4/share/Geant4/data`.
  - The following environment variables should either be unset or point to the correct location in `GEANT4_DATA_DIR`: `G4NEUTRONHPDATA`, `G4LEDATA`, `G4LEVELGAMMADATA`, `G4RADIOACTIVEDATA`, `G4PARTICLEXSDATA`, `G4PIIDATA`, `G4REALSURFACEDATA`, `G4SAIDXSDATA`, `G4ABLADATA`, `G4INCLDATA`, `G4ENSDFSTATEDATA`.
- When using CMake, ensure that the right version of Geant4 is picked up at configuration time (i.e. when you run `denv cmake`).
  - You can always check the version that is used in a build directory by running `denv ccmake .` in the build directory and searching for the Geant4 version variable.
  - If the version is incorrect, you will need to re-configure your build directory. If `cmake` isn't picking up the right Geant4 version by default, ensure that `CMAKE_PREFIX_PATH` is pointing to your version of Geant4.
- Make sure that your version of Geant4 was built with multithreading disabled.
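A quick way to confirm which Geant4 a build directory is configured against, without opening the `ccmake` GUI, is to search the CMake cache (this sketch assumes the build directory is `build/`):

```shell
# Geant4_DIR is cached by find_package and records which installation was found
grep '^Geant4_DIR' build/CMakeCache.txt
```
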
## Geant4 Data Duplication

The Geant4 datasets do not evolve as quickly as the source code that uses them. We have a copy of the data needed for the LDMX standard version within the container (v10.2.3 currently), and you can inspect the versions of the datasets that have changed between the version in the container image and the one you want to build to see which datasets you may need to install.

The file `cmake/Modules/Geant4DatasetDefinitions.cmake` in the Geant4 source code has these
versions for us (the name changed from `Geant4Data...` to `G4Data...` in v10.7.0), and we can
use this file to check manually which datasets need to be updated when running a newer version.
Below, I'm comparing Geant4 v10.3.0 and our current standard.
```shell
diff \
  --new-line-format='+%L' \
  --old-line-format='-%L' \
  --unchanged-line-format=' %L' \
  <(wget -q -O - https://raw.githubusercontent.com/LDMX-Software/geant4/LDMX.10.2.3_v0.5/cmake/Modules/Geant4DatasetDefinitions.cmake) \
  <(wget -q -O - https://raw.githubusercontent.com/Geant4/geant4/v10.3.0/cmake/Modules/Geant4DatasetDefinitions.cmake)
```
Output:

```diff
# - Define datasets known and used by Geant4
# We keep this separate from the Geant4InstallData module for conveniance
# when updating and patching because datasets may change more rapidly.
# It allows us to decouple the dataset definitions from how they are
# checked/installed/configured
#
# - NDL
geant4_add_dataset(
NAME G4NDL
VERSION 4.5
FILENAME G4NDL
EXTENSION tar.gz
ENVVAR G4NEUTRONHPDATA
MD5SUM fd29c45fe2de432f1f67232707b654c0
)
# - Low energy electromagnetics
geant4_add_dataset(
NAME G4EMLOW
- VERSION 6.48
+ VERSION 6.50
FILENAME G4EMLOW
EXTENSION tar.gz
ENVVAR G4LEDATA
- MD5SUM 844064faa16a063a6a08406dc7895b68
+ MD5SUM 2a0dbeb2dd57158919c149f33675cce5
)
# - Photon evaporation
geant4_add_dataset(
NAME PhotonEvaporation
- VERSION 3.2
+ VERSION 4.3
FILENAME G4PhotonEvaporation
EXTENSION tar.gz
ENVVAR G4LEVELGAMMADATA
- MD5SUM 01d5ba17f615d3def01f7c0c6b19bd69
+ MD5SUM 012fcdeaa517efebba5770e6c1cbd882
)
# - Radioisotopes
geant4_add_dataset(
NAME RadioactiveDecay
- VERSION 4.3.2
+ VERSION 5.1
FILENAME G4RadioactiveDecay
EXTENSION tar.gz
ENVVAR G4RADIOACTIVEDATA
- MD5SUM ed171641682cf8c10fc3f0266c8d482e
+ MD5SUM 994853b153c6f805e60e2b83b9ac10e0
)
# - Neutron XS
geant4_add_dataset(
NAME G4NEUTRONXS
VERSION 1.4
FILENAME G4NEUTRONXS
EXTENSION tar.gz
ENVVAR G4NEUTRONXSDATA
MD5SUM 665a12771267e3b31a08c622ba1238a7
)
# - PII
geant4_add_dataset(
NAME G4PII
VERSION 1.3
FILENAME G4PII
EXTENSION tar.gz
ENVVAR G4PIIDATA
MD5SUM 05f2471dbcdf1a2b17cbff84e8e83b37
)
# - Optical Surfaces
geant4_add_dataset(
NAME RealSurface
VERSION 1.0
FILENAME RealSurface
EXTENSION tar.gz
ENVVAR G4REALSURFACEDATA
MD5SUM 0dde95e00fcd3bcd745804f870bb6884
)
# - SAID
geant4_add_dataset(
NAME G4SAIDDATA
VERSION 1.1
FILENAME G4SAIDDATA
EXTENSION tar.gz
ENVVAR G4SAIDXSDATA
MD5SUM d88a31218fdf28455e5c5a3609f7216f
)
# - ABLA
geant4_add_dataset(
NAME G4ABLA
VERSION 3.0
FILENAME G4ABLA
EXTENSION tar.gz
ENVVAR G4ABLADATA
MD5SUM d7049166ef74a592cb97df0ed4b757bd
)
# - ENSDFSTATE
geant4_add_dataset(
NAME G4ENSDFSTATE
- VERSION 1.2.3
+ VERSION 2.1
FILENAME G4ENSDFSTATE
EXTENSION tar.gz
ENVVAR G4ENSDFSTATEDATA
- MD5SUM 98fef898ea35df4010920ad7ad88f20b
+ MD5SUM 95d970b97885aeafaa8909f29997b0df
)
```
As you can see, while only a subset of the datasets change, some of them do change. Unless you are planning to compare several different Geant4 versions that all share mostly the same datasets, it is easier just to have each Geant4 version have its own downloaded copies of the datasets.
## Running with your Geant4
The way we use different versions of Geant4 has changed over the years, so it depends on which version of the image you are using.
### >=5.1.1
Since we are using denv to interact with the development image, you have access to a local file that
can customize your development environment within the container image.
This file is `.profile`, located within the container's home directory.
To find the location of this file, run

```shell
denv printenv HOME
```

from the location where you want to use the custom Geant4.
The path output by this command is where the `.profile` that you will edit is located.
> **The system `.profile`:** A `.profile` file exists in many normal Linux (and macOS) systems.
> This is pointed out because, if you edit your system one (located at `~/.profile`)
> instead of the one located within the denv workspace, you will not get the changes
> to the container environment you want and you could break your system.

All of the following should go at the end of the workspace `.profile` so that you are "updating" the default
environment.
First, make sure to unset the image-specific versions of the Geant4 environment variables defining
the location of the data directories.
This list may not be complete depending on the version of Geant4 installed in the image; you can use
`denv printenv` to see the full list of environment variables within the container environment.

```shell
unset G4NEUTRONHPDATA
unset G4LEDATA
unset G4LEVELGAMMADATA
unset G4RADIOACTIVEDATA
unset G4PARTICLEXSDATA
unset G4PIIDATA
unset G4REALSURFACEDATA
unset G4SAIDXSDATA
unset G4ABLADATA
unset G4INCLDATA
unset G4ENSDFSTATEDATA
unset G4NEUTRONXSDATA
```
If you changed the location of the data directory when building Geant4, make sure to also use that
location here by defining `GEANT4_DATA_DIR` before sourcing the Geant4 environment script.

```shell
# only needed if changed data location when building geant4
export GEANT4_DATA_DIR=/full/path/to/custom/data/location
```
Then source the custom Geant4's environment script.

```shell
. /full/path/to/custom/geant4/bin/geant4.sh
# this stuff below is helpful to make sure a data directory is found
# and allows folks to have a debug build of Geant4 without re-downloading the data
# it goes _after_ the script because the script will define GEANT4_DATA_DIR if the
# build is configured with a specific data location
if [ -z "${GEANT4_DATA_DIR+x}" ]; then
  export GEANT4_DATA_DIR="${G4DATADIR}"
fi
```
And finally, update `CMAKE_PREFIX_PATH` so that ldmx-sw will prefer this custom Geant4 instead
of the one installed within the image.

```shell
export CMAKE_PREFIX_PATH="/full/path/to/custom/geant4/lib/cmake:${CMAKE_PREFIX_PATH}"
```
After these changes, you should be able to compile and run ldmx-sw from this environment using your custom build of Geant4 with the normal development commands.

```shell
just compile
just fire config.py
```

You can make sure your Geant4 was found and is being used by going into the build directory and inspecting the configuration.

```shell
cd build && denv ccmake .
```

You should see `Geant4_DIR` set to the path of your custom Geant4 instead of some path in `/usr/local/...`.
### Returning to Normal

If you want to return to the normal environment, you can comment out or remove your `.profile` changes
at the bottom of that file.
A nuclear option is to have the image re-copy a fresh `.profile` by removing the `.profile` file along with the
denv-internal file signalling that the profile has already been copied.

```shell
rm .profile .denv/skel-init
```
### <5.1.1,>=4.2.0
With release 4.2.0 of the ldmx/dev image, the entrypoint script checks the environment variable `LDMX_CUSTOM_GEANT4` for a path to a local installation of Geant4.
This allows the user to override the Geant4 that is within the image with one that is available locally. In this way, you can choose whichever version of Geant4 you want,
with whatever code modifications applied, with whatever build instructions you choose.
Just like with ldmx-sw, you can only run a specific build of Geant4 in the same image that you used to build it.

```shell
just setenv LDMX_CUSTOM_GEANT4=/path/to/geant4/install
```

If you followed the procedure above, the Geant4 install will be located at `${LDMX_BASE}/geant4/install` and you can use
this in the `setenv` command.

```shell
just setenv LDMX_CUSTOM_GEANT4=${LDMX_BASE}/geant4/install
```
By default, the container will produce a rather verbose warning when using a custom Geant4 build.
This is to avoid reproducibility issues caused by accidental use of the feature.
You can disable it by defining the `LDMX_CUSTOM_GEANT4_CONFIRM_DEV` environment variable in the container environment.

```shell
just setenv LDMX_CUSTOM_GEANT4=${LDMX_BASE}/geant4/install
denv ... # Some command
> Warning: You are relying on a non-container version of Geant4. This mode of operation can come with some reproducibility concerns if you aren't careful. # The actual warning is longer than this...
just setenv LDMX_CUSTOM_GEANT4_CONFIRM_DEV=yes # Can be anything other than an empty string
denv ... # No warning!
```
# Custom Acts

For similar reasons as Geant4, some developers may want to try building ldmx-sw with a different version of Acts than what resides within the current image. Luckily, using a custom Acts is a bit simpler than a custom Geant4 since there aren't as many pieces of data to consider (for example, Acts does not, to my knowledge, inspect environment variables for runtime configuration).
## Building Your Acts

You can only run this custom build of Acts with whatever image you are building it with, so make sure you are happy with the image version you are using.

```shell
cd path/to/ldmx # directory that contains ldmx-sw
git clone git@github.com:acts-project/acts.git
# checkout specific version or branch
denv cmake -B acts/build -S acts \
  -DCMAKE_INSTALL_PREFIX=acts/install \
  -DCMAKE_CXX_STANDARD=20
denv cmake --build acts/build --target install
```
The cmake options written above can be experimented with.
We currently use the C++20 standard in ldmx-sw, so it is helpful to use the same standard in Acts.
The `CMAKE_INSTALL_PREFIX` is the path where Acts will be installed; that path will need to
be inside a directory that is mounted into the container at runtime, and it is provided to ldmx-sw when
configuring its build.
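If the Acts install lives outside of a directory that is already mounted into the container, it can be added to the denv mounts (the path here is illustrative):

```shell
denv config mounts /path/to/ldmx/acts
```
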
## Running with Your Acts

Since there aren't other environment variables needed for Acts to function at runtime,
we just need to build ldmx-sw with specific cmake options pointing it to our new location of Acts.

```shell
just configure -DActs_DIR=/path/to/ldmx/acts/install
just build
```
Some other CMake options may be required depending on your version of Acts and the version of the dev
image you are using. Below are some options that we've come across while testing. They can be set on
the command line when running `just configure` with `-D<name>=<value>`, like `Acts_DIR` above.

- `CMAKE_FIND_DEBUG_MODE`: may need to be turned `OFF`.
- `nlohmann_json_DIR`: may need to be directed to the specific version that was installed with Acts,
  e.g. `-Dnlohmann_json_DIR=/path/to/ldmx/acts/install/lib/cmake/nlohmann_json/`.
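Combining these, a configure invocation with both options set might look like this sketch (paths are illustrative):

```shell
just configure \
  -DActs_DIR=/path/to/ldmx/acts/install \
  -Dnlohmann_json_DIR=/path/to/ldmx/acts/install/lib/cmake/nlohmann_json/
just build
```
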
# Using Parallel Containers

Sometimes users wish to compare the behavior of multiple dev images without changing the source code of ldmx-sw (or a related repository) very much, if at all. This page documents how to use two (or more) images in parallel.

Normally, when users switch images, they need to do a full re-build after fully cleaning out
all of the generated files (usually with `just clean`).
This method avoids coupling a full re-build to switching images, at the cost of extra complexity.
The best way to document this is by outlining an example; however,
please note that this can easily be expanded to any number of images you wish
(and could be done with software that is not necessarily ldmx-sw).
Let's call the two images we wish to use `alice` and `bob`,
both of which are already built.
## 1. Clean Up Environment

```shell
cd ~/ldmx/ldmx-sw # go to ldmx-sw
just clean
```
## 2. Build with Both Images

```shell
# going to build with alice first
just use ldmx/dev:alice
denv cmake -B alice/build -S . -DCMAKE_INSTALL_PREFIX=alice/install
denv cmake --build alice/build --target install
# now build with bob
just use ldmx/dev:bob
denv cmake -B bob/build -S . -DCMAKE_INSTALL_PREFIX=bob/install
denv cmake --build bob/build --target install
```
## 3. Run with an Image

The container run from an image looks at a specific path for libraries to link and executables to run
that were built by the user within the container. In current images (based on version 3
or newer), this path is `${LDMX_BASE}/ldmx-sw/install`.

```shell
# I want to run alice so I need its install in the location where
# the container looks when it runs (i.e. ldmx-sw/install)
ln -sf alice/install install
just use ldmx/dev:alice
just fire # runs ldmx-sw compiled with alice
ln -sf bob/install install
just use ldmx/dev:bob
just fire # runs ldmx-sw compiled with bob
```
# Contributing

Contributions of all kinds are welcome: from fixing typos in these documentation pages, to patching a bug in one of the dependencies, to adding a new Ubuntu or Python package you find useful.
Please reach out via GitHub issues or on the LDMX Slack to get started.

To contribute code to the project, you will need to create an account on GitHub if you don't have one already, and then request to be added to the LDMX-Software organization.
When adding new code, you should do this on a branch created by a command like `git checkout -b johndoe-dev` in order to make sure you don't apply changes directly to the master (replace "johndoe" with your user name). We typically create branches based on issue names in the GitHub bug tracker, so "Issue 1234: Short Description in Title" turns into the branch name `1234-short-desc`.
Then you would `git add` and `git commit` your changes to this branch.

If you don't already have SSH keys configured, look at the GitHub directions. This makes it easier to push/pull to/from branches on GitHub!
## Pull Requests

We prefer that any major code contributions are submitted via pull requests so that they can be reviewed before changes are merged into the master.
Before you start, an issue should be added to the issue tracker.
### Branch Name Convention

You should make a local branch from `trunk` using a command like `git checkout -b 1234-short-desc`, where `1234` is the issue number from the issue tracker and `short-desc` is a short description (using `-` as spaces) of what the branch is working on.

Once you have committed your local changes to this branch using the `git add` and `git commit` commands, push your branch to GitHub using a command like `git push -u origin 1234-short-desc`.
Finally, submit a pull request to integrate your changes by selecting your branch in the compare dropdown box and clicking the green buttons to make the PR. This should be reviewed and merged, or changes may be requested, before the code can be integrated into the master.
If you plan on starting a major (sub)project within the repository, like adding a new code module, you should give advance notice and explain your plans beforehand. :) A good way to do this is to create a new issue. This allows the rest of the code development team to see what your plan is and offer comments/questions.
# Ubuntu Packages

Here I try to list all of the installed Ubuntu packages and give an explanation of why they are included. Lots of these packages are installed into the official ROOT docker container, and so I have copied them into this image. I have looked into their purpose by a combination of googling the package name and looking at ROOT's reason for them.
In the Dockerfile, most packages are added when they are needed for the rest of
the build. Adding packages before they are needed means the container needs to
be rebuilt starting from the point you add them, so it is a good habit to avoid
doing so. There is a helper script installed in the container,
`install-ubuntu-packages`, that can be called directly in the Dockerfile with a
list of packages to install.

If you want to add additional packages that aren't necessary for building ldmx-sw, its dependencies, or the container environment, use the install command at the end of the Dockerfile.

If you are looking to add Python packages, prefer adding them to the python packages file rather than installing them from the Ubuntu repositories.
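As a sketch of what calling the helper looks like inside the Dockerfile (the package names here are placeholders, not real additions):

```dockerfile
# hypothetical: install packages immediately before the build stage that needs them
RUN install-ubuntu-packages \
    libexample-dev \
    example-tool
```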
## Extracting Package List from Dockerfile

We have settled into a relatively simple syntax for the packages in the Dockerfile,
and thus I've been able to write an awk script that can parse the Dockerfile
and list the packages we install from the Ubuntu repositories.
```awk
BEGIN {
  in_install=0;
}
{
  # check if this line is in an install command
  if (in_install && NF > 0) {
    # print out all entries on the line except the line continuation backslash
    for (i=1; i <= NF; i++) {
      if ($i != "\\") {
        print $i;
      }
    }
  }
  # update for next lines if we are opening an install command or closing
  if ($0 ~ /^RUN install-ubuntu-packages.*$/) {
    in_install=1;
  } else if (NF == 0 || $1 == "RUN" && $2 != "install-ubuntu-packages") {
    in_install=0;
  }
}
```
The script is stored in `docs/src` and can be run like:

```shell
awk -f docs/src/get-ubuntu-packages.awk Dockerfile
```
| Package | Reason |
|---|---|
| binutils | Adding PPA and linking libraries |
| cmake | Configuration of build system |
| gcc | GNU C Compiler |
| g++ | GNU C++ Compiler |
| gfortran | GNU Fortran Compiler |
| locales | Configuration of TPython and other python packages |
| make | Building system for dependencies and ldmx-sw |
| wget | Download source files for dependencies and ldmx-sw Conditions |
| python3-dev | ROOT TPython and ldmx-sw configuration system |
| python3-numpy | ROOT TPython requires numpy and downstream analysis packages |
| python3-pip | Downloading more python packages |
| python3-tk | matplotlib requires python-tk for some plotting |
| rsync | necessary to build Pythia8 |
| fonts-freefont-ttf | fonts for plots with ROOT |
| libafterimage-dev | ROOT GUI needs these for common shapes |
| libfftw3-dev | Discrete Fourier transforms in ROOT |
| libfreetype6-dev | fonts for plots with ROOT |
| libftgl-dev | Rendering fonts in OpenGL |
| libgif-dev | Saving plots as GIFs |
| libgl1-mesa-dev | MesaGL allowing 3D rendering using OpenGL |
| libgl2ps-dev | Convert OpenGL image to PostScript file |
| libglew-dev | GLEW library for helping use OpenGL |
| libglu-dev | OpenGL Utility Library |
| libjpeg-dev | Saving plots as JPEGs |
| liblz4-dev | Data compression in ROOT serialization |
| liblzma-dev | Data compression in ROOT serialization |
| libpng-dev | Saving plots as PNGs |
| libx11-dev | low-level window management (ROOT GUI) |
| libxext-dev | low-level window management (ROOT GUI) |
| libxft-dev | low-level window management (ROOT GUI) |
| libxml2-dev | XML reading and writing |
| libxmu-dev | low-level window management (ROOT GUI) |
| libxpm-dev | low-level window management (ROOT GUI) |
| libz-dev | Data compression in ROOT serialization |
| libzstd-dev | Data compression in ROOT serialization |
| nlohmann-json3-dev | JSON reading/writing in ROOT and ldmx-sw |
| srm-ifce-dev | srm-ifce client side access of distributed storage within ROOT |
| libgsl-dev | GNU Scientific Library for numerical calculations in ROOT MathMore (needed for GENIE) |
| liblog4cpp5-dev | C++ Logging Library used in GENIE |
| ca-certificates | Installing certificates to trust within container |
| clang-format | C++ Code Formatting for ldmx-sw |
| libboost-all-dev | C++ Utilities for Acts and ldmx-sw |
| libssl-dev | Securely interact with other computers and encrypt files |
| clang | C++ Alternative Compiler for ldmx-sw |
| clang-tidy | C++ Static Analyzer for ldmx-sw |
| clang-tools | Additional development tools for ldmx-sw |
| cmake-curses-gui | GUI for inspecting CMake configuration |
| gdb | GNU DeBugger for ldmx-sw development |
| libasan8 | Address sanitization for ldmx-sw |
| lld | alternative linker for ldmx-sw |
# Python Packages

Many Python packages are useful for late-stage analysis of data, so many are included within this image to make them readily available to downstream LDMX collaborators.
## Do you need to use ldmx/dev?
If you are just using Python and supporting libraries (like uproot, awkward, and hist)
for your analysis, then you do not need to use this large and heavy image.
Instead, it is recommended for you to craft your own environment which is lighter (making
it easier to reproduce and move around clusters) and more nimble (allowing you to upgrade
packages if you want to).
```shell
cd my-analysis
denv init python:3.12 # choose python version
denv pip install scikit-hep # install packages you need
denv pip freeze > requirements.txt # write packages you've used for later reproducibility
git add requirements.txt .denv/config # store the configuration in git with your analysis code
```
Then, to use the code in my-analysis on any other computer with denv installed, one would get started with:

```shell
git clone git@github.com:my-username/my-analysis.git
cd my-analysis
denv pip install -r requirements.txt
```

No `denv init` is required since the `.denv/config` is present in the my-analysis git repository.
The only time you need to use the Python packages within the ldmx/dev image
is if you are using PyROOT (`import ROOT`) or the ldmx-sw ROOT dictionary.
If you use the scikit-hep libraries, then you can move to this more nimble
workflow.
## What's installed?
Many python packages evolve at a faster rate than we build the ldmx/dev image, so many of the packages installed within the ldmx/dev image will be behind what has been most recently released.
You can get the full list and specific versions of the Python packages installed
within your ldmx/dev image by running denv pip freeze.
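For example, to check which version of one specific package made it into the image (uproot is used here purely as an illustration):

```shell
denv pip freeze | grep -i '^uproot'
```
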
The list below is just documentation of the packages we request to be installed
and not the full dependency tree.

- `scikit-hep`: meta-package holding the Scikit-HEP libraries
  - includes (among others) `uproot`, `mplhep`, `hist`, `awkward`, `pandas`, `vector`, and `pylhe`
- `pyhepmc`: IO for the HepMC data format
- `pip`, `wheel`, `setuptools`: helpful for packaging and downstream upgrading
- `Cython`: write C extensions for Python
- `numba`: pre-compile Python functions for performance improvements
- `scikit-learn`, `xgboost`: machine learning libraries
# Runtime Configuration

These images are built with all of the software necessary to build and run ldmx-sw; however, there are additional steps that need to be taken when a container is launched from the image in order to make running ldmx-sw easier.

The `ldmx-env-init.sh` script is a shell script that defines some necessary environment
variables for the user. This script is run after a container is initially constructed
and so it can access the directories that are mounted to a container at runtime.
We use denv to run these images; denv
uses a login shell to run the requested commands within the container while
mounting the denv workspace directory to the container's `${HOME}`.
Additionally, denv copies any files from `/etc/skel/`
(the "skeleton" area holding default initialization files for new user home directories)
into the container's `HOME` directory if they do not exist yet.
This enables us, as image creators, to update files within `/etc/skel/`
with our custom initialization.

If you are updating the image and find yourself needing to include environment
variables that need to know runtime information (like the full path to mounted directories),
then these environment variables should be defined in `ldmx-env-init.sh` or `/etc/skel/.profile`
so that users get them "automatically" with denv.
We do this by updating the `/etc/skel/.profile` file at the end of the Dockerfile,
defining `LDMX_BASE` to be the `HOME` directory if it isn't defined.
This mapping of `LDMX_BASE` to `HOME` is only necessary to support the legacy usage
of these images with the custom entrypoint script `entry.sh` and the `ldmx` suite
of bash functions defined within `ldmx-sw/scripts/ldmx-env.sh`.
Using denv with Images Prior to v4.2.2
Images before v4.2.2 did not update /etc/skel/.profile and so users need to
manually update the .profile in order to functionally use the image with denv.
1. Run a dummy command to copy over the initial `.profile`
2. Define `LDMX_BASE` as `HOME` in your `.profile`
3. Copy the `ldmx-env-init.sh` script into your `.profile`
For example
```shell
# 1.
denv true
# 2.
printf "%s\n" \
  "# make sure LDMX_BASE is defined for ldmx-env-init.sh" \
  "if [ -z \"\${LDMX_BASE+x}\" ]; then" \
  "  export LDMX_BASE=\"\${HOME}\"" \
  "fi" \
  >> .profile
# 3.
curl -s https://raw.githubusercontent.com/LDMX-Software/dev-build-context/refs/heads/main/ldmx-env-init.sh \
  >> .profile
```
GitHub Workflows for Development Image
The definitions of these workflows are located in `.github/workflows` in the repository.
The Continuous Integration (CI) workflow is split into three parts.
- Build: In separate and parallel jobs, build the image for each architecture we want to support. Push the resulting image (if successfully built) to DockerHub, referenced only by its sha256 digest.
- Merge: Create a manifest for the images built earlier that packages the different architectures together into one tag. Container managers (like docker and singularity) will then deduce from this manifest which image they should pull for the architecture they are running on.
- Test: Check that ldmx-sw can compile and run a basic simulation for various versions of ldmx-sw.
We only test after a successful build so, if the tests fail, users can pull the image and debug why the tests are failing locally.
ldmx-sw Test Versions
The CI can only test a finite number of versions of ldmx-sw - in general, we'd like to make sure the past few minor versions are supported (with the possibility of interop patches) by the new image build while also enabling support for the newest ldmx-sw.
The versions that will be tested are in ci/ldmx-sw-to-test.json
which is periodically updated by another workflow that checks for new ldmx-sw releases.
This workflow is not perfect, so it is helpful to keep the following in mind when a new ldmx-sw
release is posted and the workflow generates a PR updating the JSON file above.
- Test newest developments of ldmx-sw (`trunk`)
- Test the minimum version supported (currently set at `v3.3.0`)
- Test the highest patch number for each minor version in between (`v3.0.2`, `v3.1.13`, `v3.2.12`)
- Test additional releases specifically related to changes to the image
  - For example, `v4.2.15` has some patches to ldmx-sw enabling support for the newer GCC in v5 images.
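As a sketch of how such a JSON file can be consumed, the flat list-of-tags shape below is an assumption; check the real `ci/ldmx-sw-to-test.json` for its actual structure:

```shell
# Stand-in for ci/ldmx-sw-to-test.json -- structure is assumed.
cat > ldmx-sw-to-test.json <<'EOF'
["trunk", "v3.0.2", "v3.1.13", "v3.2.12", "v3.3.0", "v4.2.15"]
EOF
# list the versions the CI would test
python3 -c 'import json; print("\n".join(json.load(open("ldmx-sw-to-test.json"))))'
```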
GitHub Actions Runner
The image builds take a really long time since we are building many large packages from scratch and sometimes emulating a different architecture than the one doing the image building. For this reason, we needed to move to a self-hosted runner solution.
Pulling by Digest
You may want to pull an image by its digest because the manifest that creates a more helpful tag has not been created (e.g. the other architecture's build is still running or the merge step failed). You can do this by downloading the digest artifact from the workflow run (at the bottom of the CI Workflow "Summary" page).
The digest is stored as the name of an empty file in this artifact. We first copy this file name into a shell variable.
```shell
cd ~/Downloads
mkdir digest
cd digest
unzip ../digest-amd64.zip
export digest=$(ls *)
cd ..
rm -r digest digest-amd64.zip
```
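A slightly more robust variant avoids parsing `ls` output. Here we simulate the artifact's empty file with a made-up digest; with the real artifact, glob the unzipped contents instead:

```shell
# Simulate the artifact: an empty file whose name is the digest.
tmp=$(mktemp -d)
touch "$tmp/4f5c1a9be2d37086c1d24fa0b58e67391ab2c3d4e5f60718293a4b5c6d7e8f90"
# pick up the file name via globbing rather than `ls`
for f in "$tmp"/*; do digest=$(basename "$f"); done
printf '%s\n' "$digest"
```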
The builds referenced by digest are architecture specific. Grouping them together into a manifest allows the runner to choose the image based on the host computer's architecture. This means you must choose the digest artifact corresponding to your computer's architecture.
Next, we need to download the image using the 64-character digest stored in ${digest}.
Docker/Podman
Below, I use `docker`, but the same commands work with `podman` in place of `docker`.
```shell
docker pull ldmx/dev@sha256:${digest}
docker tag ldmx/dev@sha256:${digest} ldmx/dev:some-helpful-name
```
Apptainer
I have only ever needed to do this on my laptop with `docker` or `podman` installed; however, I'm pretty sure this will work.
```shell
apptainer build ldmx_dev_some-helpful-name.sif docker://ldmx/dev@sha256:${digest}
```
Legacy Interop
For some past versions of ldmx-sw, we need to modify the code slightly
in order for it to be able to be built by the newer containers.
For this reason, we have a set of patch files (in the `ci/interop` directory).
The patch files here are intended to patch older versions of ldmx-sw so that they can be built with newer images that have newer dependencies and compilers.
They are `git apply`ed within ldmx-sw before the configuration (cmake)
step so that they can modify the build configuration files if need be.
To create a patch file, there is a small script in `ci/interop`
that runs the appropriate git commands for you.
```shell
# inside of the ldmx-sw you have patched
path/to/dev-build-context/ci/interop/save-patch
```
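A self-contained sketch of the round-trip, using an illustrative throwaway repo (not ldmx-sw); the real `save-patch` script may capture the diff differently:

```shell
# Capture local edits as a patch, then apply it to a clean checkout,
# mirroring how the CI `git apply`s interop patches before cmake.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m init
echo 'old line' > CMakeLists.txt
git add CMakeLists.txt
git -c user.email=ci@example.com -c user.name=ci commit -q -m 'add file'
echo 'new line' > CMakeLists.txt
git diff > interop.patch        # roughly what a save-patch script captures
git checkout -- CMakeLists.txt  # back to a clean checkout
git apply interop.patch         # what the CI does before running cmake
cat CMakeLists.txt
```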
Many versions of ldmx-sw require the same patch and so instead of copying the same file, I have just symlinked a specific version's patch file to the previous version so that developers only need to update a patch file for the version where the (now breaking) change was introduced.
GitHub and Self-Hosted Runners
We've run into the main obstacle with using the free, GitHub-hosted runners - a single job is required to run for less than six hours. Instead of attempting to go through the difficult process of partitioning the image build into multiple jobs that each take less than six hours, we've chosen to attach a self-hosted runner to this repository which allows us to expand the time limit up to 35 days as we see fit. This document outlines the motivations for this change, how this was implemented, how to maintain the current self-hosted runner, and how to revert this change if desired.
Motivation
As mentioned above, the build step for this build context lasts longer than six
hours for non-native architecture builds. Most (if not all) of GitHub's runners
have the amd64 architecture, but we desire to also build an image for the arm64
architecture since several of our collaborators have arm-based laptops they are
working on (most prominently, Apple's M series). A build for the native architecture
takes roughly three hours while a build for a non-native architecture, which
requires emulation with qemu, takes about ten times as long (about 30 hours).
This emulation build is well over the six-hour time limit of GitHub runners and
would require some serious development of the Dockerfile and build process in
order to cut it up into sub-jobs, each a maximum of six hours long
(especially since some individual build steps were taking longer than six hours).
Putting all this information together, a natural choice was to move the building of these images to a self-hosted runner in order to (1) support more architectures besides amd64, (2) avoid an intricate (and potentially impossible) redesign of the build process, and (3) expand the job time limit to include a slower emulation build.
Implementation
While individual builds take a very long time, we do not do a full build of the image very frequently. In fact, besides the OS upgrade, there hasn't been a change in dependencies for months at a time. Thus, we really don't require a highly performing set of runners. In reality, we simply need a single machine that can host a handful of runners to each build a single architecture image one at a time in a single-threaded, single-core manner.
Once we enter the space of self-hosted runners, there is a lot of room to explore different customization options. The GitHub runner application runs as a user on the machine it is launched from, so we could highly specialize the environment of that machine so the actions it performs are more efficient. I chose not to go this route because I am worried about the maintainability of self-hosted runners for a relatively small collaboration like LDMX. For this reason, I chose to mimic a GitHub runner as much as possible in order to reduce the number of changes necessary to the workflow definition - allowing future LDMX collaborators to stop using self-hosted runners if they want or need to.
Workflow Definition
In the end, I needed to change the workflow definition in five ways.
1. `runs-on: self-hosted` - tell GitHub to use the registered self-hosted runners rather than their own
2. `timeout-minutes: 43200` - increase the job time limit to 30 days to allow the emulation build to complete
3. Add a `type=local` cache at a known location within the runner filesystem
4. Remove the `retention-days` limit since the emulation build may take many days
5. Add the `linux/arm64` architecture to the list of platforms to build on
The local cache is probably the most complicated piece of the puzzle and I will not attempt to explain it here since I barely understand it myself. For future workflow developers, I'd point you to Docker's cache storage backends and cache management with GitHub actions documentation. The current implementation stores a cache of the layers created during the build process on the local filesystem (i.e. on the disk of the runner). These layers need to be separated by platform so that the different architectures do not interfere with each other.
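For concreteness, the local cache backend amounts to flags like the following on the `docker buildx build` invocation; the paths and platform below are placeholders, not the workflow's actual values:

```shell
# Illustrative only: per-platform local layer cache for a buildx build.
docker buildx build \
  --platform linux/arm64 \
  --cache-from type=local,src=/home/github/cache/arm64 \
  --cache-to type=local,dest=/home/github/cache/arm64,mode=max \
  --tag ldmx/dev:local-test \
  .
```

Keeping a separate cache path per platform is what prevents the architectures from interfering with each other.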
Self-Hosted Runner
GitHub has good documentation on Self-Hosted Runners that you should look at if you want to learn more. This section is merely here to document how the runners for this repository were configured.
First, I should note that I put all of this setup inside a Virtual Machine on the computer I was using in order to keep it isolated. This is meant to provide some small amount of security, since GitHub points out that a malicious actor could fork this repository and run arbitrary code on the machine by making a PR.¹
- VM with 2 cores and ~3/4 of the memory of the machine
- Ubuntu 22.04 Minimal Server
  - Install OpenSSH so we can connect to the VM from the host machine
  - Have `github` be the username (so the home directory corresponds to the directory in the workflow)
  - Make sure `tmux` is installed so we can start up the runner and detach
- Get the IP of the VM with `hostname -I` so we can SSH into it from the host
  - I update the host's SSH config to give a name to these IP addresses so it's easier to remember how to connect.
  - From here on, I am just SSHing to the VM from the host. This makes it easier to copy in commands from the guides linked below.
- Install docker engine
  - Follow the post-install instructions to allow `docker` to be run by users
- Follow Add a Self-Hosted Runner, treating the VM as the runner and not the host
  - Add the `UMN` label to the runners during config so LDMX knows where they are
- Repeat these steps for each of the runners (isolating the runners from the host and each other)
  - We did attempt to have the runners share a VM and a layer cache, but this caused issues when two jobs tried to read from the same layer cache and one completed before the other (LDMX-Software/dev-build-context Issue #69)
I'd like to emphasize how simple this procedure was. GitHub has put a good amount of effort into making Self-Hosted runners easy to connect, so I'd encourage other LDMX institutions to contribute a node if the UMN one goes down.
Besides this isolation step, I further isolated this node by working with our IT department to take control of the node - separating it from our distributed filesystem hosting data as well as our login infrastructure.
Maintenance
The maintenance for this runner is also relatively simple. Besides the obvious step of periodically checking that it is up and running, someone at UMN² should log into the node periodically to check how much disk space is available within the VM. This is necessary because I have implemented the workflow and the runner to never delete the layer cache. We do not build the image very frequently, so we'd happily keep pre-built layers around for months or longer if it means adding a new layer at the end will build faster.
A full build of the image from scratch takes ~1.8GB of cache and we have allocated ~70GB to the cache inside the VM. This should be enough for the foreseeable future.
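That periodic check amounts to something like the following from inside the VM; the cache path is a guess, so adjust it to wherever your runner keeps its `type=local` cache:

```shell
# How full is the filesystem holding the runner's home (and cache)?
df -h "${HOME}"
# Size of the layer cache itself -- the path is an assumption.
du -sh "${HOME}/cache" 2>/dev/null || echo "no cache at ${HOME}/cache on this machine"
```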
Future improvements to this infrastructure could include adding a workflow whose job
is to periodically connect to the runner, check the disk space, and - if space
is running low - attempt to clean out layers that are unlikely to be used again.
`docker buildx` has cache maintenance tools that we could leverage if we specialize
the build action further by having the runner be configured with pre-built docker builders
instead of using the `setup-buildx` action as part of our workflow. I chose not to go
this route so that it is easier to revert back to GitHub-hosted building, where persisting docker
builders between workflow runs is not possible.
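If that cleanup workflow is ever written, `docker buildx prune` is the natural tool; an illustrative invocation (the thresholds below are made up, not tuned values):

```shell
# Illustrative only: drop cache entries unused for ~30 days,
# keeping total cache storage under 50GB.
docker buildx prune --filter until=720h --keep-storage 50GB --force
```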
For UMN folks, the username/password for the node and the VM within it are written down in the room the physical node is in. The node is currently in PAN 424 on the central table.
Revert
This is a somewhat large infrastructure change, so I made the conscious choice to leave reversion easy and accessible. If a self-hosted runner becomes infeasible, or GitHub changes its policies to allow longer job run times (perhaps through some scheme of a total job run time limit rather than an individual job run time limit), we can go back to GitHub-hosted runners for the building by updating the workflow.
- `runs-on: ubuntu-latest` - use GitHub's ubuntu-latest runners
- Remove `timeout-minutes: 43200`, which will drop the time limit back to whatever GitHub imposes (6 hrs right now)
- Remove the caching parameters (GitHub's caching system is too short-lived and too small at the free tier to be useful for our builds)
- Remove the `linux/arm64` architecture from the list of platforms to build on (unless GitHub allows jobs a longer run time)
  - Images can still be built for the arm architecture, but those builds would need to happen manually by the user with that computer or by someone willing to run the emulation and figure out how to update the manifest for an image tag