Development Container for ldmx-sw
Docker build context for developing and running ldmx-sw: Docker Hub
There is a corresponding workflow in ldmx-sw that generates a production docker container using the container generated by this build context as a base image. This production container already has ldmx-sw built and installed on it and assumes the user wants to run the application.
Use in ldmx-sw
In ldmx-sw, an environment script is defined in bash to set up the environment correctly for both docker and singularity. A description of this setup process is given for both docker and singularity later in this document if you desire more information.
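For example, a typical session starts by sourcing that script from the directory containing ldmx-sw and then prefixing commands with ldmx. The path below assumes the usual scripts/ldmx-env.sh location inside ldmx-sw; adjust it if your checkout differs.
cd <directory-containing-ldmx-sw>
source ldmx-sw/scripts/ldmx-env.sh   # defines the ldmx command and selects a container image
ldmx cmake ..                        # any command prefixed with ldmx runs inside the container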
Current Container Configuration
Direct Dependency of ldmx-sw | Version | Construction Process |
---|---|---|
Ubuntu Server | 22.04 | Base Image |
Python | 3.10.6 | From Ubuntu Repos |
cmake | 3.22.1 | From Ubuntu Repos |
XercesC | 3.2.4 | Built from source |
Pythia6 | 6.428 | Built from source |
ROOT | 6.22/08 | Built from source |
Geant4 | LDMX.10.2.3_v0.5 | Built from source |
Eigen | 3.4.0 | Built from source |
LHAPDF | 6.5.3 | Built from source |
GENIE | 3.02.00 | Built from source |
Catch2 | 3.3.1 | Built from source |
ONNX Runtime | 1.15.0 | Download pre-built binaries |
A detailed list of all packages installed from ubuntu repositories is given here, and documentation on the workflow and runner used to build the image is here.
Python Packages for Analyses
Installed in Python 3.
- pip
- Cython
- numpy
- uproot
- matplotlib
- xgboost
- sklearn
Other Configuration
- SSL certificates that will be trusted by the container are in the certs directory
Other Packages
If you would like another package included in the development container, please open an issue in this repository.
Container with Custom Geant4
Geant4 is our main simulation engine and it has a large effect on the products of our simulation samples. As such, it is very common to compare multiple different versions, patches, and tweaks to Geant4 with our simulation.
There are two different methods for using a custom Geant4. The first one listed is newer but more flexible and is the preferred path forward to prevent the proliferation of ldmx/dev images.
Locally Built Geant4
With release 4.2.0 of the ldmx/dev image, the entrypoint script now checks the environment variable LDMX_CUSTOM_GEANT4 for a path to a local installation of Geant4. This allows the user to override the Geant4 that is within the image with one that is available locally. In this way, you can choose whichever version of Geant4 you want, with whatever code modifications applied and whatever build instructions you choose.
Building Your Geant4
You can build your Geant4 in a similar manner as ldmx-sw. It does take much longer to compile than ldmx-sw since it is larger, so be sure to leave enough time for it.
You can only run this custom build of Geant4 with whatever container you are building it with, so make sure you are happy with the container version you are using.
cd ${LDMX_BASE}
git clone git@github.com:LDMX-Software/geant4.git # or could be mainline Geant4 or an unpacked tar-ball
cd geant4
mkdir build
cd build
ldmx cmake <cmake-options> ..
ldmx make install
Now building Geant4 from source has a lot of configuration options that can be used to customize how it is built. Below are a few that are highlighted for how we use containers and their interaction with the Geant4 build.
- CMAKE_INSTALL_PREFIX: This should be set to a path accessible from the container so that the programs within the container can read from and write to this directory. If the geant4 build directory is within LDMX_BASE (like it is above), then you could do something like -DCMAKE_INSTALL_PREFIX=../install when you run ldmx cmake within the build directory.
- GEANT4_INSTALL_DATADIR: If you are building a version of Geant4 that has the same data files as the Geant4 version built into the container image, then you can tell the Geant4 build to use those data files with this option, saving build time and disk space. This is helpful if (for example) you are just re-building the same version of Geant4 but in Debug mode. You can see where the Geant4 data is within the container with ldmx 'echo ${G4DATADIR}' and then use this value: -DGEANT4_INSTALL_DATADIR=/usr/local/share/geant4/data
The following are the build options used when setting up the container and are likely what you want to get started (a combined example is given after this list):
- -DGEANT4_USE_GDML=ON : Enable reading geometries with the GDML markup language, which is used in LDMX-sw for all our geometries
- -DGEANT4_INSTALL_EXAMPLES=OFF : Don't install the Geant4 example applications (just to save space and compilation time)
- -DGEANT4_USE_OPENGL_X11=ON : Enable an X11-based GUI for inspecting geometries
- -DGEANT4_MULTITHREADED=OFF : If you are building a version of Geant4 that is multithreaded by default, disable it with this option. The dynamic loading used in LDMX-sw will often not work with a multithreaded version of Geant4
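Putting these options together, a configuration command run inside the build directory might look like the following. This is only a sketch; adjust the install prefix and the options to your needs.
ldmx cmake \
    -DCMAKE_INSTALL_PREFIX=../install \
    -DGEANT4_USE_GDML=ON \
    -DGEANT4_INSTALL_EXAMPLES=OFF \
    -DGEANT4_USE_OPENGL_X11=ON \
    -DGEANT4_MULTITHREADED=OFF \
    ..
ldmx make install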
Concerns when building versions of Geant4 other than 10.2.3
For most use cases you will be building a modified version of the same release of Geant4 that is used in the container (10.2.3). It is also possible to build and use later versions of Geant4, although this should be done with care. In particular:
- Different Geant4 release versions will require that you rebuild LDMX-sw for use with that version; it will not be sufficient to set the LDMX_CUSTOM_GEANT4 environment variable and pick up the shared libraries therein.
- Recent versions of Geant4 group the electromagnetic processes for each particle into a so-called general process for performance reasons. This means that many features in LDMX-sw that rely on the exact names of processes in Geant4 will not work. You can disable this by inserting something like the following in RunManager::setupPhysics():
// Make sure the G4EmParameters header (G4EmParameters.hh) is included if needed
auto electromagneticParameters {G4EmParameters::Instance()};
// Disable the use of G4XXXGeneralProcess,
// i.e. G4GammaGeneralProcess and G4ElectronGeneralProcess
electromagneticParameters->SetGeneralProcessActive(false);
- Geant4 relies on being able to locate a set of datasets when running. For builds of 10.2.3, the ones that are present in the container will suffice, but other versions may need different versions of these datasets. If you run into issues with this, use ldmx env and check that the following environment variables are pointing to the right location:
  - GEANT4_DATA_DIR should point to $LDMX_CUSTOM_GEANT4/share/Geant4/data
    - You can define the LDMX_CUSTOM_GEANT4_DATA_DIR environment variable in the container environment to manually point it to a custom location
  - The following environment variables should either be unset or point to the correct location in GEANT4_DATA_DIR:
    - G4NEUTRONHPDATA
    - G4LEDATA
    - G4LEVELGAMMADATA
    - G4RADIOACTIVEDATA
    - G4PARTICLEXSDATA
    - G4PIIDATA
    - G4REALSURFACEDATA
    - G4SAIDXSDATA
    - G4ABLADATA
    - G4INCLDATA
    - G4ENSDFSTATEDATA
- When using CMake, ensure that the right version of Geant4 is picked up at configuration time (i.e. when you run ldmx cmake)
  - You can always check the version that is used in a build directory by running ldmx ccmake . in the build directory and searching for the Geant4 version variable
  - If the version is incorrect, you will need to re-configure your build directory. If cmake isn't picking up the right Geant4 version by default, ensure that CMAKE_PREFIX_PATH is pointing to your version of Geant4
- Make sure that your version of Geant4 was built with multithreading disabled
Geant4 Data Duplication
The Geant4 datasets do not evolve as quickly as the source code that uses them. We have a copy of the data needed for the LDMX standard version within the container (v10.2.3 currently) and you can inspect the versions of the datasets that have changed between the version in the container image and the one you want to build to see which datasets you may need to install.
The file cmake/Modules/Geant4DatasetDefinitions.cmake
in the Geant4 source code has these
versions for us (The name changed from Geant4Data...
to G4Data...
in v10.7.0) and we can
use this file to check manually which datasets need to be updated when running a newer version.
Below, I'm comparing Geant4 v10.3.0 and our current standard.
diff \
--new-line-format='+%L' \
--old-line-format='-%L' \
--unchanged-line-format=' %L' \
<(wget -q -O - https://raw.githubusercontent.com/LDMX-Software/geant4/LDMX.10.2.3_v0.5/cmake/Modules/Geant4DatasetDefinitions.cmake) \
<(wget -q -O - https://raw.githubusercontent.com/Geant4/geant4/v10.3.0/cmake/Modules/Geant4DatasetDefinitions.cmake)
Output
# - Define datasets known and used by Geant4
# We keep this separate from the Geant4InstallData module for conveniance
# when updating and patching because datasets may change more rapidly.
# It allows us to decouple the dataset definitions from how they are
# checked/installed/configured
#
# - NDL
geant4_add_dataset(
NAME G4NDL
VERSION 4.5
FILENAME G4NDL
EXTENSION tar.gz
ENVVAR G4NEUTRONHPDATA
MD5SUM fd29c45fe2de432f1f67232707b654c0
)
# - Low energy electromagnetics
geant4_add_dataset(
NAME G4EMLOW
- VERSION 6.48
+ VERSION 6.50
FILENAME G4EMLOW
EXTENSION tar.gz
ENVVAR G4LEDATA
- MD5SUM 844064faa16a063a6a08406dc7895b68
+ MD5SUM 2a0dbeb2dd57158919c149f33675cce5
)
# - Photon evaporation
geant4_add_dataset(
NAME PhotonEvaporation
- VERSION 3.2
+ VERSION 4.3
FILENAME G4PhotonEvaporation
EXTENSION tar.gz
ENVVAR G4LEVELGAMMADATA
- MD5SUM 01d5ba17f615d3def01f7c0c6b19bd69
+ MD5SUM 012fcdeaa517efebba5770e6c1cbd882
)
# - Radioisotopes
geant4_add_dataset(
NAME RadioactiveDecay
- VERSION 4.3.2
+ VERSION 5.1
FILENAME G4RadioactiveDecay
EXTENSION tar.gz
ENVVAR G4RADIOACTIVEDATA
- MD5SUM ed171641682cf8c10fc3f0266c8d482e
+ MD5SUM 994853b153c6f805e60e2b83b9ac10e0
)
# - Neutron XS
geant4_add_dataset(
NAME G4NEUTRONXS
VERSION 1.4
FILENAME G4NEUTRONXS
EXTENSION tar.gz
ENVVAR G4NEUTRONXSDATA
MD5SUM 665a12771267e3b31a08c622ba1238a7
)
# - PII
geant4_add_dataset(
NAME G4PII
VERSION 1.3
FILENAME G4PII
EXTENSION tar.gz
ENVVAR G4PIIDATA
MD5SUM 05f2471dbcdf1a2b17cbff84e8e83b37
)
# - Optical Surfaces
geant4_add_dataset(
NAME RealSurface
VERSION 1.0
FILENAME RealSurface
EXTENSION tar.gz
ENVVAR G4REALSURFACEDATA
MD5SUM 0dde95e00fcd3bcd745804f870bb6884
)
# - SAID
geant4_add_dataset(
NAME G4SAIDDATA
VERSION 1.1
FILENAME G4SAIDDATA
EXTENSION tar.gz
ENVVAR G4SAIDXSDATA
MD5SUM d88a31218fdf28455e5c5a3609f7216f
)
# - ABLA
geant4_add_dataset(
NAME G4ABLA
VERSION 3.0
FILENAME G4ABLA
EXTENSION tar.gz
ENVVAR G4ABLADATA
MD5SUM d7049166ef74a592cb97df0ed4b757bd
)
# - ENSDFSTATE
geant4_add_dataset(
NAME G4ENSDFSTATE
- VERSION 1.2.3
+ VERSION 2.1
FILENAME G4ENSDFSTATE
EXTENSION tar.gz
ENVVAR G4ENSDFSTATEDATA
- MD5SUM 98fef898ea35df4010920ad7ad88f20b
+ MD5SUM 95d970b97885aeafaa8909f29997b0df
)
As you can see, while only a subset of the datasets change, some of them do change. Unless you are planning to compare several different Geant4 versions that all share mostly the same datasets, it is easier just to have each Geant4 version have its own downloaded copies of the datasets.
Running with your Geant4
Just like with ldmx-sw, you can only run a specific build of Geant4 in the same container that you used to build it.
ldmx setenv LDMX_CUSTOM_GEANT4=/path/to/geant4/install
If you followed the procedure above, the Geant4 install will be located at ${LDMX_BASE}/geant4/install
and you can use
this in the setenv
command.
ldmx setenv LDMX_CUSTOM_GEANT4=${LDMX_BASE}/geant4/install
By default the container will produce a rather verbose warning when using a custom Geant4 build. This is to avoid reproducibility issues caused by accidental use of the feature. You can disable it by defining the LDMX_CUSTOM_GEANT4_CONFIRM_DEV environment variable in the container environment:
ldmx setenv LDMX_CUSTOM_GEANT4=${LDMX_BASE}/geant4/install
ldmx ... # Some command
> Warning: You are relying on a non-container version of Geant4. This mode of operation can come with some reproducibility concerns if you aren't careful. # The actual warning is longer than this...
ldmx setenv LDMX_CUSTOM_GEANT4_CONFIRM_DEV=yes # Can be anything other than an empty string
ldmx ... # No warning!
Remote Build
You could also build your custom version of Geant4 into the image itself. The container is allowed to build (almost) any release of Geant4, pulling either from the official Geant4 repository or pulling from LDMX's fork if "LDMX" appears in the tag name requested.
Most of the newer versions of Geant4 can be built the same as the current standard LDMX.10.2.3_v0.4, so to change the tag that you want to use in the container you simply need to change the GEANT4
parameter in the Dockerfile.
...a bunch of crap...
ENV GEANT4=LDMX.10.2.3_v0.4 #CHANGE ME TO YOUR TAG NAME
... other crap ...
Changing this parameter could be all you need, but if the build is not completing properly, you may need to change the RUN
command that actually builds Geant4.
Building the Image
To build a docker container, one would normally go into this repository and simply run docker build . -t ldmx/dev:my-special-tag. Since this container takes so long to build, however, if you are only making a small change you can simply create a new branch in this repository and push it up to the GitHub repository. There are repository actions that will automatically attempt to build the image for the container and push that image to DockerHub if it succeeds. Any non-main branches will be pushed to DockerHub under the name of the branch; for example, the branch geant4.10.5 contains the same container as our main but with a more recent version of Geant4:
$ git diff geant4.10.5 main
diff --git a/Dockerfile b/Dockerfile
index 9627e00..0fb31e8 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -131,7 +131,7 @@ RUN mkdir xerces-c && cd xerces-c &&\
# - G4DIR set to path where Geant4 should be installed
###############################################################################
-ENV GEANT4=geant4-10.5-release
+ENV GEANT4=LDMX.10.2.3_v0.4
LABEL geant4.version="${GEANT4}"
And this is enough to have a new container on DockerHub with Geant4 version 10.5 under the Docker tag ldmx/dev:geant4.10.5, so one would use this container by calling ldmx pull dev geant4.10.5.
Using Parallel Containers
Sometimes users wish to compare the behavior of multiple containers without changing the source code of ldmx-sw (or a related repository) very much if at all. This page documents how to use two (or more) containers in parallel.
Normally, when users switch containers, they need to do a full re-build after cleaning out all of the generated files (usually with ldmx clean src). This method removes the connection between a full re-build and switching containers at the cost of extra complexity.
The best way to document this is by outlining an example; however, please note that this can easily be expanded to any number of containers you wish
(and could be done with software that is not necessarily ldmx-sw).
Let's call the two containers we wish to use alice
and bob
,
both of which are already built (i.e. they are seen in the list returned by ldmx list dev
).
1. Clean Up Environment
cd ~/ldmx/ldmx-sw # go to ldmx-sw
ldmx clean src # make sure clean build
2. Build for Both Containers
ldmx use dev alice # going to build with alice first
ldmx cmake -B alice/build -S . -DCMAKE_INSTALL_PREFIX=alice/install
cd alice/build
ldmx make install
cd ../..
ldmx use dev bob # now lets build with bob
ldmx cmake -B bob/build -S . -DCMAKE_INSTALL_PREFIX=bob/install
cd bob/build
ldmx make install
cd ../..
3. Run with a container
The container looks at a specific path for libraries to link and executables to run
that were built by the user within the container. In current images (based on version 3
or newer), this path is ${LDMX_BASE}/ldmx-sw/install
. Note: Later images may move
this path to ${LDMX_BASE}/.container-install
or similar, in which case, the path that
you symlink the install to will change.
# I want to run alice so I need its install in the location where
# the container looks when it runs (i.e. ldmx-sw/install)
ln -sf alice/install install
ldmx use dev alice
ldmx fire # runs ldmx-sw compiled with alice
ln -sf bob/install install
ldmx use dev bob
ldmx fire # runs ldmx-sw compiled with bob
Contributing
All contributions are welcome, from fixing typos in these documentation pages to patching a bug in the event reconstruction code to adding a new simulation process. Please reach out via GitHub issues or on the LDMX Slack to get started.
To contribute code to the project, you will need to create an account on GitHub if you don't have one already and then request to be added to the LDMX-Software organization.
When adding new code, you should do this on a branch created by a command like git checkout -b johndoe-dev
in order to make sure you don't apply changes directly to the master (replace "johndoe" with your user name). We typically create branches based on issue names in the github bug tracker, so "Issue 1234: Short Description in Title" turns into the branch name 1234-short-desc
.
Then you would git add
and git commit
your changes to this branch.
If you don't already have SSH keys configured, look at the GitHub directions. This makes it easier to push/pull to/from branches on GitHub!
Pull Requests
We prefer that any major code contributions are submitted via pull requests so that they can be reviewed before changes are merged into the master.
Before you start, an issue should be added to the issue tracker.
Branch Name Convention
Then you should make a local branch from trunk using a command like git checkout -b 1234-short-desc where 1234 is the issue number from the issue tracker and short-desc is a short description (using - as spaces) of what the branch is working on.
Once you have committed your local changes to this branch using the git add and git commit commands, then push your branch to GitHub using a command like git push -u origin 1234-short-desc.
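For reference, the full local sequence might look like the following sketch; the issue number, file name, and commit message are only placeholders.
git checkout -b 1234-short-desc              # branch named after the issue
git add Recon/src/MyFix.cxx                  # stage your changes (hypothetical file)
git commit -m "short description of the fix" # commit them locally
git push -u origin 1234-short-desc           # publish the branch to GitHub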
Finally, submit a pull request to integrate your changes by selecting your branch in the compare dropdown box and clicking the green buttons to make the PR. This should be reviewed and merged or changes may be requested before the code can be integrated into the master.
If you plan on starting a major (sub)project within the repository, like adding a new code module, you should give advance notice and explain your plans beforehand. :) A good way to do this is to create a new issue. This allows the rest of the code development team to see what your plan is and offer comments/questions.
Ubuntu Packages
Here I try to list all of the installed Ubuntu packages and give an explanation of why they are included. Lots of these packages are installed into the official ROOT docker container, so I have copied them here. I have looked into their purpose by a combination of googling the package name and looking at ROOT's reason for them.
In the Dockerfile, most packages are added when they are needed for the rest of
the build. Adding packages before they are needed means the container needs to
be rebuilt starting from the point you add them, so it is a good habit to avoid
doing so. There is a helper script installed in the container
install-ubuntu-packages
that can be called directly in the Dockerfile with a
list of packages to install.
If you want to add additional packages that aren't necessary for building ldmx-sw, its dependencies, or the container environment, use the install command at the end of the Dockerfile.
Note: If you are looking to add python packages, prefer adding them to the python packages file rather than installing them from the ubuntu repositories.
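For example, a hypothetical addition at the end of the Dockerfile could look like the following; the package names here are purely illustrative.
# extra packages not needed for building ldmx-sw or its dependencies
RUN install-ubuntu-packages \
    vim \
    cowsay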
Package | Necessary | Reason |
---|---|---|
apt-utils | Yes | Necessary for distrobox support |
autoconf | Yes | Configuration of log4cpp build, needed for GENIE |
automake | Yes | Configuration of log4cpp build, needed for GENIE |
bc | Yes | Necessary for distrobox support |
binutils | Yes | Adding PPA and linking libraries |
ca-certificates | Yes | Installing certificates to trust in container |
clang-format | Yes | LDMX C++ code formatting |
cmake | Yes | Make configuration, v3.22.1 available in Ubuntu 22.04 repos |
curl | Yes | Necessary for distrobox support |
dialog | Yes | Necessary for distrobox support |
diffutils | Yes | Necessary for distrobox support |
davix-dev | No | Remote I/O, file transfer and file management |
dcap-dev | Unknown | C-API to the DCache Access Protocol |
dpkg-dev | No | Old Installation from PPA |
findutils | Yes | Necessary for distrobox support |
fish | Yes | Shell necessary for distrobox support |
fonts-freefont-ttf | Yes | Fonts for plots |
g++ | Yes | Compiler with C++17 support, v11 available in Ubuntu 22.04 repos |
gcc | Yes | Compiler with C++17 support, v11 available in Ubuntu 22.04 repos |
gdb | No | Supporting debugging LDMX-sw programs within the container |
gfortran | Yes | FORTRAN compiler; needed for compiling Pythia6, which in turn is needed for GENIE |
gnupg2 | Yes | Necessary for distrobox support |
git | No | Old Downloading dependency sources |
less | Yes | Necessary for distrobox support |
libafterimage-dev | Yes | ROOT GUI depends on these for common shapes |
libasan8 | No | Runtime components for the compiler based instrumentation tools that come with GCC |
libboost-all-dev | Yes | Direct ldmx-sw dependency, v1.74 available in Ubuntu 22.04 repos, v1.71 required by ACTS |
libcfitsio-dev | No | Reading and writing in FITS data format |
libfcgi-dev | No | Open extension of CGI for internet applications |
libfftw3-dev | Yes | Computing discrete fourier transform |
libfreetype6-dev | Yes | Fonts for plots |
libftgl-dev | Yes | Rendering fonts in OpenGL |
libgfal2-dev | No | Toolkit for file management across different protocols |
libgif-dev | Yes | Saving plots as GIFs |
libgl1-mesa-dev | Yes | MesaGL allowing 3D rendering using OpenGL |
libgl2ps-dev | Yes | Convert OpenGL image to PostScript file |
libglew-dev | Yes | GLEW library for helping use OpenGL |
libglu-dev | Yes | OpenGL Utility Library |
libgraphviz-dev | No | Graph visualization library |
libgsl-dev | Yes | GNU Scientific library for numerical calculations; needed for GENIE |
libjpeg-dev | Yes | Saving plots as JPEGs |
liblog4cpp5-dev | Yes | Dependency of GENIE |
liblz4-dev | Yes | Data compression |
liblzma-dev | Yes | Data compression |
libmysqlclient-dev | No | Interact with SQL database |
libnss-myhostname | Yes | Necessary for distrobox support |
libpcre++-dev | Yes | Regular expression pattern matching |
libpng-dev | Yes | Saving plots as PNGs |
libpq-dev | No | Light binaries and headers for PostgreSQL applications |
libpythia8-dev | No | Pythia8 HEP simulation |
libsqlite3-dev | No | Interact with SQL database |
libssl-dev | Yes | Securely interact with other computers and encrypt files |
libtbb-dev | No | Multi-threading |
libtiff-dev | No | Save plots as TIFF image files |
libtool | Yes | Needed for log4cpp build, in turn needed for GENIE |
libvte-2.9[0-9]-common | Yes | Necessary for distrobox support |
libvte-common | Yes | Necessary for distrobox support |
libx11-dev | Yes | Low-level window management with X11 |
libxext-dev | Yes | Low-level window management |
libxft-dev | Yes | Low-level window management |
libxml2-dev | Yes | Low-level window management |
libxmu-dev | Yes | Low-level window management |
libxpm-dev | Yes | Low-level window management |
libz-dev | Yes | Data compression |
libzstd-dev | Yes | Data compression |
lsof | Yes | Necessary for distrobox support |
locales | Yes | Configuration of TPython and other python packages |
make | Yes | Building dependencies and ldmx-sw source |
ncurses-base | Yes | Necessary for distrobox support |
passwd | Yes | Necessary for distrobox support |
pinentry-curses | Yes | Necessary for distrobox support |
procps | Yes | Necessary for distrobox support |
python3-dev | Yes | ROOT TPython and ldmx-sw ConfigurePython |
python3-pip | Yes | For downloading more python packages later |
python3-numpy | Yes | ROOT TPython requires numpy |
python3-tk | Yes | matplotlib requires python-tk for some plotting |
sudo | Yes | Necessary for distrobox support |
srm-ifce-dev | Unknown | Unknown |
time | Yes | Necessary for distrobox support |
unixodbc-dev | No | Access different data sources uniformly |
util-linux | Yes | Necessary for distrobox support |
wget | Yes | Download Xerces-C source and download Conditions tables in ldmx-sw |
Running these Images
These images are built with all of the software necessary to build and run ldmx-sw; however, the image makes a few assumptions about how it will be launched into a container so that it can effectively interact with the source code that resides on the host machine.
We can enforce other choices on users to make the usage of these images easier (see the ldmx-env.sh script in ldmx-sw for details), but there is a minimum of two requirements that must be met in order for the image to operate.
The following sections provide examples of implementing the running of these images in a few common container runners. The full implementation used by LDMX is written into the
ldmx-env.sh
environment script.
LDMX_BASE
The environment variable LDMX_BASE
must be defined for the container. This is then
mounted into the container as the workspace where most (if not all) of the files the
container must view on the host reside. In ldmx-sw, we choose to have LDMX_BASE
be
the parent directory of the ldmx-sw repository, but that is not necessary.
ldmx-sw install location
Within the entrypoint for the container, we add a few paths relative to $LDMX_BASE
to various PATH
shell variables so that the shell executing within the container
can find executables to run and libraries to link. All of the paths are subdirectories
of $LDMX_BASE/ldmx-sw/install
1 which implies that ldmx-sw must be installed to
this directory if users wish to run their developments within the container.
This install location is put into the ldmx-sw CMakeLists.txt so users will not need
to change it unless they wish to do more advanced development.
See LDMX-Software/docker Issue #38. We may change this assumption to make it clearer that other packages could be installed at this location besides ldmx-sw.
Determining an Image's Version
Often it is helpful to determine an image's version. Sometimes, this is as easy as
looking at the tag provided by docker image ls
or written into the SIF file name,
but sometimes this information is lost. Since v4 of the container image, we've been
more generous with adding labels to the image and a standard one is included
org.opencontainers.image.version
which (for our purposes) stores the release that
built the image.
We can inspect
an image to view its labels.
# docker inspect returns JSON with all the image manifest details
# jq just helps us parse this JSON for the specific thing we are looking for,
# but you could just scroll through the output of docker inspect
docker inspect ldmx/dev:latest | jq 'map(.Config.Labels["org.opencontainers.image.version"])[]'
# apptainer inspect (by default) returns just the list of labels
# so we can just use grep to select the line with the label we care about
apptainer inspect ldmx_dev_latest | grep org.opencontainers.image.version
Use Development Container with docker
Assumptions
- Docker engine is installed on your computer
- (For linux systems), you can manage
docker
as a non-root user
Environment Setup
- Decide what tag you want to use:
export LDMX_DOCKER_TAG="ldmx/dev:my-tag"
- Pull down desired docker image:
docker pull ${LDMX_DOCKER_TAG}
- Define a helpful alias:
alias ldmx='docker run --rm -it -e LDMX_BASE -v $LDMX_BASE:$LDMX_BASE ${LDMX_DOCKER_TAG} $(pwd)'
- Define the directory that ldmx-sw is in:
cd <path-to-directory-containing-ldmx-sw>
export LDMX_BASE=$(pwd -P)
Using the Container
Prepend any commands you want to run with ldmx-sw with the container alias you defined above.
For example, to configure the ldmx-sw build, ldmx cmake ..
(instead of just cmake ..
).
Detailed docker run
explanation
docker \ #base docker command
run \ #run the container
--rm \ #clean up container after execution finishes
-it \ #make container interactive
-e LDMX_BASE \ #pass environment variable to container
-v $LDMX_BASE:$LDMX_BASE \ #mount filesystem to container
-u $(id -u ${USER}):$(id -g ${USER}) \ #act as current user
${LDMX_DOCKER_TAG} \ #docker image to build container from
  $(pwd) \ #go to present directory inside the container
Display Connection
In order to connect the display, you need to add two more parameters to the above docker run
command.
When running docker inside of the Windows Subsystem for Linux (WSL),
you will also need to have an external X server running outside WSL.
Ubuntu has a good tutorial on how to get graphical applications running inside WSL.
- Define how to interface with the display.
  - For Linux: export LDMX_CONTAINER_DISPLAY=""
  - For MacOS: export LDMX_CONTAINER_DISPLAY="docker.for.mac.host.internal"
  - For WSL: export LDMX_CONTAINER_DISPLAY=$(awk '/nameserver /{print $2; exit}' /etc/resolv.conf 2>/dev/null)
- Define the DISPLAY environment variable for inside the container: -e DISPLAY=${LDMX_CONTAINER_DISPLAY}:0
- Mount the cache directory for the window manager for the container to share: -v /tmp/.X11-unix:/tmp/.X11-unix
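Putting these together with the alias defined during the environment setup, a display-enabled alias might look like the following sketch, which simply combines the flags listed above:
alias ldmx='docker run --rm -it -e LDMX_BASE -e DISPLAY=${LDMX_CONTAINER_DISPLAY}:0 -v /tmp/.X11-unix:/tmp/.X11-unix -v $LDMX_BASE:$LDMX_BASE ${LDMX_DOCKER_TAG} $(pwd)'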
Use Development Container with Singularity
Assumptions
- Singularity is installed on your computer
- You have permission to run
singularity build
andsingularity run
.
Environment Setup
- Decide what tag you want to use:
export LDMX_DOCKER_TAG="ldmx/dev:my-tag"
- Name the singularity image that will be built:
export LDMX_SINGULARITY_IMG="$(pwd -P)/ldmx_dev_my-tag.sif"
- Pull down desired docker image and convert it to singularity style:
singularity build ${LDMX_SINGULARITY_IMG} docker://${LDMX_DOCKER_TAG}
- You may need to point
singularity
to a larger directory using theSINGULARITY_CACHEDIR
environment variable
- You may need to point
- Define a helpful bash alias:
alias ldmx='singularity run --no-home --bind ${LDMX_BASE} --cleanenv --env LDMX_BASE=${LDMX_BASE} ${LDMX_SINGULARITY_IMG} $(pwd)'
- Define the directory that ldmx-sw is in:
cd <path-to-directory-containing-ldmx-sw>
export LDMX_BASE=$(pwd -P)
Using the Container
Prepend any commands you want to run with ldmx-sw with the container alias you defined above.
For example, to configure the ldmx-sw build, ldmx cmake ..
(instead of just cmake ..
).
Notice that using this container after the above setup is identical to using this container with docker.
Detailed singularity run
explanation
singularity
's default behavior is to mount the current directory into the container.
This means we go to the $LDMX_BASE
directory so that the container will have access to everything inside $LDMX_BASE
.
Then we enter the container there before going back to where the user was while inside the container.
singularity \ #base singularity command
run \ #run the container
--no-home \ #don't mount home directory (might overlap with current directory)
  --bind ${LDMX_BASE} \ #mount the directory containing all things LDMX
--cleanenv \ #don't copy the environment variables into the container
--env LDMX_BASE=${LDMX_BASE} \ #copy the one environment variable we need shared with the container
${LDMX_SINGULARITY_IMG} \ #full path to singularity image to make container out of
$(pwd) \ #go to the working directory after entering the container
Display Connection
I've only been able to determine how to connect the display when on Linux systems. The connection procedure is similar to docker.
- Pass the DISPLAY environment variable to the container: --env LDMX_BASE=${LDMX_BASE},DISPLAY=:0
- Mount the cache directory for the window manager: --bind ${LDMX_BASE},/tmp/.X11
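Combining these with the alias defined during the environment setup gives a display-enabled alias along the lines of the following sketch; only the two options listed above are changed:
alias ldmx='singularity run --no-home --bind ${LDMX_BASE},/tmp/.X11 --cleanenv --env LDMX_BASE=${LDMX_BASE},DISPLAY=:0 ${LDMX_SINGULARITY_IMG} $(pwd)'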
GitHub Workflows for Development Image
The workflow is split into three parts.
- Build: In separate and parallel jobs, build the image for the different architectures we want to support. Push the resulting image (if successfully built) to DockerHub only using its sha256 digest.
- Merge: Create a manifest for the images built earlier that packages the different architectures together into one tag. Container managers (like docker and singularity) will then deduce from this manifest which image they should pull for the architecture they are running on.
- Test: Check that ldmx-sw can compile and pass its tests at various versions for the built image.
We only test after a successful build so, if the tests fail, users can pull the image and debug why the tests are failing locally.
ldmx-sw Test Versions
The CI can only test a finite number of versions of ldmx-sw - in general, we'd like
to make sure the past few minor versions are supported (with the possibility of interop
patches) by the new container image build while also enabling support for the newest ldmx-sw.
This means we manually write the ldmx_sw_branch for the CI to test following these rules:
- Test the minimum version supported (currently set at v3.0.0)
- Test the newest developments of ldmx-sw (trunk)
- Test the highest patch number for each minor version in between (v3.0.2, v3.1.13, v3.2.12)
- Test additional releases specifically related to the container image
  - For example, v3.2.4 was the last release before necessary updates were made to support the newer GCC version in v4 images.
Legacy Interop
For some past versions of ldmx-sw, we need to modify the code slightly
in order for it to be able to be built by the newer containers. For
this reason, we have a set of interop scripts (the .github/interop
directory).
If there is a directory corresponding to the version being tested, then the
CI will run the scripts in that directory before attempting to build
and install ldmx-sw.
If there are interop patches, we assume that the testing is also not functional so neither the test program nor a test simulation are run.
GitHub Actions Runner
The image builds take a really long time since we are building many large packages from scratch and sometimes emulating a different architecture than the one doing the image building. For this reason, we needed to move to a self-hosted runner solution which is documented on the next page.
GitHub and Self-Hosted Runners
We've run into the main obstacle with using the free, GitHub-hosted runners - a single job is required to run for less than six hours. Instead of attempting to go through the difficult process of partitioning the image build into multiple jobs that each take less than six hours, we've chosen to attach a self-hosted runner to this repository which allows us to expand the time limit up to 35 days as we see fit. This document outlines the motivations for this change, how this was implemented, how to maintain the current self-hosted runner, and how to revert this change if desired.
Motivation
As mentioned above, the build step for this build context lasts longer than six
hours for non-native architecture builds. Most (if not all) of GitHub's runners
have the amd64 architecture, but we desire to also build an image for the arm64
architecture since several of our collaborators have arm-based laptops they are
working on (most prominently, Apple's M series). A build for the native architecture
takes roughly three hours while a build for a non-native architecture, which
requires emulation with qemu
, takes about ten times as long (about 30 hours).
This emulation build is well over the six hour time limit of GitHub runners and
would require some serious development of the Dockerfile and build process in
order to cut it up into sub-jobs each of which were a maximum of six hours long
(especially since some build steps were taking longer than six hours).
Putting all this information together, a natural choice was to move the building of these images to a self-hosted runner in order to (1) support more architectures besides amd64, (2) avoid an intricate (and potentially impossible) redesign of the build process, and (3) expand the job time limit to include a slower emulation build.
Implementation
While individual builds take a very long time, we do not do a full build of the image very frequently. In fact, besides the OS upgrade, there hasn't been a change in dependencies for months at a time. Thus, we really don't require a highly performing set of runners. In reality, we simply need a single machine that can host a handful of runners to each build a single architecture image one at a time in a single-threaded, single-core manner.
Once we enter the space of self-hosted runners, there is a lot of room to explore different customization options. The GitHub runner application runs as a user on the machine it is launched from, so we could highly specialize the environment of that machine so the actions it performs are more efficient. I chose to not go this route because I am worried about the maintainability of self-hosted runners for a relatively small collaboration like LDMX. For this reason, I chose to attempt to mimic a GitHub runner as much as possible in order to reduce the number of changes necessary to the workflow definition - allowing future LDMX collaborators to stop using self-hosted runners if they want or need to.
Workflow Definition
In the end, I needed to change the workflow definition in five ways.
- runs-on: self-hosted - tell GitHub to use the registered self-hosted runners rather than their own
- timeout-minutes: 43200 - increase the job time limit to 30 days to allow the emulation build to complete
- Add a type=local cache at a known location within the runner filesystem
- Remove the retention-days limit since the emulation build may take many days
- Add the linux/arm64 architecture to the list of platforms to build on
The local cache is probably the most complicated piece of the puzzle and I will not attempt to explain it here since I barely understand it myself. For future workflow developers, I'd point you to Docker's cache storage backends and cache management with GitHub actions documentation. The current implementation stores a cache of the layers created during the build process on the local filesystem (i.e. on the disk of the runner). These layers need to be separated by platform so that the different architectures do not interfere with each other.
Self-Hosted Runner
GitHub has good documentation on Self-Hosted Runners that you should look at if you want to learn more. This section is merely here to document how the runners for this repository were configured.
First, I should note that I put all of this setup inside a Virtual Machine on the computer I was using in order to attempt to keep it isolated. This is meant to provide some small amount of security since GitHub points out that a malicious actor could fork this repository and run arbitrary code on the machine by making a PR .1
- VM with 2 cores and ~3/4 of the memory of the machine
- Ubuntu 22.04 Minimal Server
- Install OpenSSH so we can connect to the VM from the host machine
- Have
github
be the username (so the home directory corresponds to the directory in the workflow) - Make sure
tmux
is installed so we can startup the runner and detach - Get the IP of the VM with
hostname -I
so we can SSH into it from the host- I update the host's SSH config to give a name to these IP addresses so its easier to remember how to connect.
- From here on, I am just SSHing to the VM from the host. This makes it easier to copy in commands copied from the guides linked below.
- Install docker engine
- Follow post-install instructions to allow
docker
to be run by users - Follow Add a Self-Hosted Runner
treating the VM as the runner and not the host
- add the
UMN
label to the runners during config so LDMX knows where they are
- add the
- Repeat these steps for each of the runners (isolating the runners from the host and each other)
- We did attempt to have the runners share a VM and a layer cache, but this was causing issues when two jobs were trying to read from the same layer cache and one was completing before the other LDMX-Software/docker Issue #69
I'd like to emphasize how simple this procedure was. GitHub has put a good amount of effort into making Self-Hosted runners easy to connect, so I'd encourage other LDMX institutions to contribute a node if the UMN one goes down.
Besides this isolation step, I further isolated this node by working with our IT department to take control of the node - separating it from our distributed filesystem hosting data as well as our login infrastructure.
Maintenance
The maintenance for this runner is also relatively simple. Besides the obvious steps of checking to make sure that it is up and running on a periodic basis, someone at UMN2 should log into the node periodically to check how much disk space is available within the VM. This needs to be done because I have implemented the workflow and the runner to not delete the layer cache ever. This is done because we do not build the image very frequently and so we'd happily keep pre-built layers for months or longer if it means adding a new layer on at the end will build faster.
A full build of the image from scratch takes ~1.8GB of cache and we have allocated ~70GB to the cache inside the VM. This should be enough for the foreseeable future.
Future improvements to this infrastructure could include adding a workflow whose job
is to periodically connect to the runner, check the disk space, and - if the disk space
is lacking space - attempt to clean out some layers that aren't likely to be used again.
docker buildx
has cache maintenance tools that we could leverage if we specialize
the build action more by having the runner be configured with pre-built docker builders
instead of using the setup-buildx
action as a part of our workflow. I chose to not go
this route so that it is easier to revert back to GitHub building where persisting docker
builders between workflow runs is not possible.
For UMN folks, the username/password for the node and the VM within it are written down in the room the physical node is in. The node is currently in PAN 424 on the central table.
Revert
This is a somewhat large infrastructure change, so I made the conscious choice to leave reversion easy and accessible. If a self-hosted runner becomes infeasible or GitHub changes its policies to allow longer job run times (perhaps through some scheme of total job run time limit rather than individual job run time limit), we can go back to GitHub-hosted runners for the building by updating the workflow.
runs-on: ubuntu-latest
- use GitHub's ubuntu-latest runners- remove
timeout-minutes: 43200
which will drop the time limit back to whatever GitHub imposes (6hrs right now) - remove caching parameters (GitHub's caching system is too short-lived and too small at the free tier to be useful for our builds)
- remove the
linux/arm64
architecture from the list of platforms to build on (unless GitHub allows jobs a longer run time)- Images can still be built for the arm architecture, but they would need to happen manually by the user with that computer or by someone willing to run the emulation and figure out how to update the manifest for an image tag