Event Skimming
Another feature of our event processing framework is selecting out only certain events to be written into the output file. This is colloquially called "skimming" and is helpful for many reasons, primarily for users to be able to not waste disk space on events that are not interesting for an analysis.
Since not writing events into the output file is an inherently destructive feature, event skimming will only operate if both the config and the processors being run by the config support it. Both play a part in how the choice to keep an event is made and so they are described below.
The StorageControl class in the processing framework is what handles the event skimming decisions and the input from the config and processors.
Processor
On the C++ side of things, processors can use their protected member function setStorageHint
to "hint" to the framework what that processor wants to happen to the event. The argument
to this function is a framework::StorageControl::Hint
value as well as a string that can
be used to give a more detailed "purpose".
Hints are then used to algorithmically decide if an event should be written to the output file (kept) or not (dropped). Only hints passing the "listening rules" (see below) are included in this algorithm.
Must*
: "must"-style hints are dominant. If any processor provides a "must" hint, then that decision is made.MustDrop
takes precendence overMustKeep
if both are provided.Should*
: "should"-style hints are less forceful and are just used to tally votes. A simple majority of should hints is used to make the decision if no must hints are provided.- If there is a tie (including if no hints are provided), then the default decision is made.
Config
The config script can alter which processors the framework will "listen" to when deciding what to do with an event by checking for specific processor names and (optionally) only specific "purpose" strings from that processor. In addition, the config script defines what the default decision is in the case of no hints or a tie in voting.
Listening Rules
The format of the string representing a listening rule is a bit complicated, so it is best
to just use the p.skimConsider
function when defining which processors you wish to "listen"
to (or "consider") when making a skimming decision. Most commonly, a user just has one processor
that they want to make the skimming decision and in this case you can just provide the processors
name.
p.skimConsider(my_processor.instanceName)
Technically, the listening rules use regular expressions for both the processor name and the purpose, so one could get quite fancy about which processors (and purposes) are selected. This more intricate skimming method has not been well tested.
Default Decision
The default storage decision is what will be used if no processors that are being listened
to provide a hint during processing. Since the default configuration is to listen to no
processors at all, the default configuration of the default decision is to keep i.e. without
any config changes, we keep all events that are processed. Without changing this,
you could have a processor that explicity "drops" events and update the config to consider
this processor in order to run over events and drop the "uninteresting" ones
(using p.skimConsider
like above).
On the other hand, we could set the default decision to drop all the events and instead have a processor specifically "keep" events that are interesting. In this case, you would want to add the following to your config.
p.skimDefaultIsDrop()
along with a p.skimConsider
so that something ends up in your output event file.
This style of event dropping is geared more towards readout, reconstruction, and analysis decisions where the event is already fully created in memory and we just want to make a decision about it.
In the simulation, we also allow for events to be "aborted" in the middle of creating it so that we can end the event early and save computing resources from being wasted on an event we've decided is not interesting (for whatever reason). This style of "dropping" events is more commonly called "filtering" the simulation and is more complicated than what is described here since it operates during the Geant4 simulation.