Falaise  3.3.0
SuperNEMO Software Toolkit
Working With Events in FLReconstruct

Introduction to the Event Record

The C++ type used to represent events in flreconstruct is the datatools::things class. Pipeline module classes inherit from dpp::base_module and thus must implement the pure virtual method dpp::base_module::process. This method is called for each event, and is passed a mutable reference to the datatools::things object representing the current event passing through the pipeline.

The datatools::things class implements an associative, hierarchical and heterogeneous collection of objects. In simpler terms, it provides a dictionary mapping string "keys" to object instances inheriting from the datatools::i_serializable pure abstract base class. It is the dictionary-like interface that provides the associativity, and the storage of pointer-to-base-class that provides the heterogeneity (many different concrete types). As datatools::things itself inherits from datatools::i_serializable, it is capable of storing other datatools::things instances, providing the possibility of arranging objects in a tree-like structure.

In this tutorial, we'll look at three basic aspects of working with the datatools::things instances provided to the process method of your custom pipeline module:

  1. Reading data from the datatools::things instance
  2. Writing builtin objects to the instance
  3. Implementing custom objects for storage in datatools::things

Reading Data from datatools::things Instances

To work with events in the pipeline, we first need to implement a pipeline module. The basics of how to do this are covered in a dedicated tutorial, and you should familiarize yourself with that material, as this tutorial builds on it.

First of all we implement our module, build it, and write a pipeline script to use it in flreconstruct. Note that we have stripped all comments except those relating to the process method, and that the module takes no configuration. If you require details on how to implement a basic flreconstruct module, please refer to the introductory tutorial first. We begin with the header:

#ifndef ACCESSTHINGSMODULE_HH
#define ACCESSTHINGSMODULE_HH
#include <bayeux/dpp/base_module.h>

class AccessThingsModule : public dpp::base_module {
 public:
  AccessThingsModule();
  virtual ~AccessThingsModule();
  virtual void initialize(const datatools::properties& myConfig,
                          datatools::service_manager& flServices,
                          dpp::module_handle_dict_type& moduleDict);
  virtual process_status process(datatools::things& workItem);
  virtual void reset();

 private:
  DPP_MODULE_REGISTRATION_INTERFACE(AccessThingsModule);
};
#endif // ACCESSTHINGSMODULE_HH

and now the implementation:

#include "AccessThingsModule.h"
#include <iostream>
#include <boost/foreach.hpp>
#include <bayeux/mctools/simulated_data.h>

DPP_MODULE_REGISTRATION_IMPLEMENT(AccessThingsModule,"AccessThingsModule");

AccessThingsModule::AccessThingsModule() : dpp::base_module()
{}

AccessThingsModule::~AccessThingsModule() {
  if (is_initialized()) this->reset();
}

void AccessThingsModule::initialize(const datatools::properties& /*myConfig*/,
                                    datatools::service_manager& /*flServices*/,
                                    dpp::module_handle_dict_type& /*moduleDict*/) {
  this->_set_initialized(true);
}

//! [AccessThingsModule::Process]
dpp::base_module::process_status AccessThingsModule::process(datatools::things& workItem) {
  // Print most basic information
  std::cout << "AccessThingsModule::process called!" << std::endl;
  std::cout << "[name] : " << workItem.get_name() << std::endl;
  std::cout << "[description] : " << workItem.get_description() << std::endl;

  // Extract list of keys stored by the object
  std::vector<std::string> workItemKeyList;
  workItem.get_names(workItemKeyList);

  // Iterate over keys, printing their name and the type of the object
  // they map to
  BOOST_FOREACH(std::string key, workItemKeyList) {
    std::cout << "- [key, serial_tag] : "
              << key
              << ", "
              << workItem.get_entry_serial_tag(key)
              << std::endl;
  }

  // Grab simulated data bank
  // Simulated data will only be present in simulation output files,
  // so wrap in a try block
  try {
    const mctools::simulated_data& simData = workItem.get<mctools::simulated_data>("SD");
    simData.tree_dump();
  } catch (std::logic_error& e) {
    std::cerr << "failed to grab SD bank : " << e.what() << std::endl;
    return PROCESS_INVALID;
  }

  // MUST return a status, see ref dpp::processing_status_flags_type
  return PROCESS_OK;
}
//! [AccessThingsModule::Process]

void AccessThingsModule::reset() {
  this->_set_initialized(false);
}

The key method to look at is AccessThingsModule::process which, as we've seen before, is passed a reference to the current event in the pipeline.

We begin working with the event by simply printing its name and description. This is a trivial demonstration that the datatools::things interface works, as for event data these are likely to be blank.

The second, more relevant, task is to extract the list of keys, and thus data banks, stored in the event. Here, we use the datatools::things::get_names method to fill a std::vector with the key names. We then iterate over this vector to print out each key name and, by using the datatools::things::get_entry_serial_tag method, the typename of the object it maps to.

// Extract list of keys stored by the object
std::vector<std::string> workItemKeyList;
workItem.get_names(workItemKeyList);
// Iterate over keys, printing their name and the type of the object
// they map to
BOOST_FOREACH(std::string key, workItemKeyList) {
std::cout << "- [key, serial_tag] : "
<< key
<< ", "
<< workItem.get_entry_serial_tag(key)
<< std::endl;
}

If we know the type of the key we wish to extract, we can use the datatools::things::get method to obtain a reference to it.

try {
const mctools::simulated_data& simData = workItem.get<mctools::simulated_data>("SD");
simData.tree_dump();
} catch (std::logic_error& e) {
std::cerr << "failed to grab SD bank : " << e.what() << std::endl;
}

We know that the "SD" (Simulated Data) entry should map to an instance of mctools::simulated_data, so we use the datatools::things::get method to obtain a const (i.e. read-only) reference to it. This method takes a template argument, which is the typename we want to extract, and a function argument, which is the name of the key to get. The method will throw an exception if either the key does not exist, or the key does not map to an object of the requested type.

We therefore wrap the extraction in a try-catch block to handle both of these potential errors. If we're able to get the reference to the object, then we can use it directly. In this example, we simply use the mctools::simulated_data::tree_dump method to dump some information on the object to screen. You should consult the documentation of the classes extracted to see what you can do with them. If an exception is thrown, then we report the error to the standard error stream, and return the dpp::base_module::PROCESS_INVALID flag. This will make the pipeline abort any further processing of the event and subsequent events, but other flags are available to handle a range of process errors.

To see the effect of this reading, we compile the above code into a shared library just as before using the following CMake script:

# - Basic CMake setup
# Check version meets our requirements
# Declare project, which will configure compiler for us
cmake_minimum_required(VERSION 3.3)
project(AccessThingsModule)
# Modules use Falaise, so we need to locate this or fail
# Locating Falaise will automatically locate all of its
# dependencies such as Bayeux, ROOT and Boost.
find_package(Falaise REQUIRED)
# Build a dynamic library from our sources
add_library(AccessThingsModule SHARED AccessThingsModule.h AccessThingsModule.cpp)
# Link it to the FalaiseModule library
# This ensures the correct compiler flags, include paths
# and linker flags are applied to our dynamic library.
target_link_libraries(AccessThingsModule PUBLIC Falaise::FalaiseModule)

and run flreconstruct with the following pipeline script:

#@description AccessThings Pipeline
#@key_label "name"
#@meta_label "type"
# - Custom modules
[name="flreconstruct.plugins" type="flreconstruct::section"]
plugins : string[1] = "AccessThingsModule"
AccessThingsModule.directory : string = "."
# - Pipeline configuration
[name="pipeline" type="AccessThingsModule"]

You should see output similar to the dump modules we ran in earlier tutorials.

Writing Data to datatools::things Instances

As the datatools::things instance is passed to pipeline modules by non-const reference, it is directly modifiable by your module. This means your module can store results of working with the event back into the event for later modules to use (you can of course also delete existing data, so be careful!).

Instances of datatools::things can only store objects that inherit from the datatools::i_serializable abstract base class, so this restricts the types your module can add. For now we will just look at how to store an existing concrete class of datatools::i_serializable in datatools::things, specifically, the datatools::properties class. The use case of adding your own concrete classes of datatools::i_serializable is deferred to a later tutorial in this guide.

We begin by refactoring the process method of our module into read and write parts, first the header

#ifndef ACCESSTHINGSMODULE_HH
#define ACCESSTHINGSMODULE_HH
#include <bayeux/dpp/base_module.h>

class AccessThingsModule : public dpp::base_module {
 public:
  AccessThingsModule();
  virtual ~AccessThingsModule();
  virtual void initialize(const datatools::properties& myConfig,
                          datatools::service_manager& flServices,
                          dpp::module_handle_dict_type& moduleDict);
  virtual process_status process(datatools::things& workItem);
  virtual void reset();

 private:
  // Read and write parts of the process method
  process_status read(datatools::things& workItem);
  process_status write(datatools::things& workItem);
  DPP_MODULE_REGISTRATION_INTERFACE(AccessThingsModule);
};
#endif // ACCESSTHINGSMODULE_HH

and then the implementation

#include "AccessThingsModule.h"
#include <boost/foreach.hpp>
DPP_MODULE_REGISTRATION_IMPLEMENT(AccessThingsModule,"AccessThingsModule");
AccessThingsModule::AccessThingsModule() : dpp::base_module()
{}
AccessThingsModule::~AccessThingsModule() {
this->reset();
}
void AccessThingsModule::initialize(const datatools::properties& /*myConfig*/,
                                    datatools::service_manager& /*flServices*/,
                                    dpp::module_handle_dict_type& /*moduleDict*/) {
  this->_set_initialized(true);
}
//! [AccessThingsModule::Process]
dpp::base_module::process_status AccessThingsModule::process(
datatools::things& workItem) {
process_status readStatus = this->read(workItem);
if (readStatus != PROCESS_OK) return readStatus;
process_status writeStatus = this->write(workItem);
// MUST return a status, see ref dpp::processing_status_flags_type
return writeStatus;
}
//! [AccessThingsModule::Process]
void AccessThingsModule::reset() {
this->_set_initialized(false);
}
dpp::base_module::process_status AccessThingsModule::read(datatools::things& workItem) {
// Print most basic information
std::cout << "AccessThingsModule::process called!" << std::endl;
std::cout << "[name] : " << workItem.get_name() << std::endl;
std::cout << "[description] : " << workItem.get_description() << std::endl;
// Extract list of keys stored by the object
std::vector<std::string> workItemKeyList;
workItem.get_names(workItemKeyList);
// Iterate over keys, printing their name and the type of the object
// they map to
BOOST_FOREACH(std::string key, workItemKeyList) {
std::cout << "- [key, serial_tag] : "
<< key
<< ", "
<< workItem.get_entry_serial_tag(key)
<< std::endl;
}
// Grab simulated data bank
// Simulated data will only be present in simulation output files,
// so wrap in a try block
try {
const mctools::simulated_data& simData = workItem.get<mctools::simulated_data>("SD");
simData.tree_dump();
} catch (std::logic_error& e) {
std::cerr << "failed to grab SD bank : " << e.what() << std::endl;
return PROCESS_INVALID;
}
return PROCESS_OK;
}
//! [AccessThingsModule::write]
dpp::base_module::process_status AccessThingsModule::write(datatools::things& workItem) {
// Add a new entry to the things
datatools::properties& atmProperties = workItem.add<datatools::properties>("ATMProperties");
atmProperties.set_description("Properties added by the AccessThings Module");
atmProperties.store("foo", "bar");
atmProperties.store("baz", 3.14);
return PROCESS_OK;
}

This separation is done for clarity in this example, but it illustrates that your process method need not be monolithic (and in fact shouldn't be except for trivial cases). The read method is exactly as we implemented earlier. In the write method, we use the datatools::things::add method to add a new data bank to the event holding a datatools::properties instance. We pass this method a template argument indicating the type of the data bank, and a string function argument indicating the key under which to store the new data bank. The method returns a reference to the newly created instance so it can be modified in place, as we do by setting the description and adding two properties.

To see the effect of this writing, we compile the above code into a shared library just as before using the following CMake script:

# - Basic CMake setup
# Check version meets our requirements
# Declare project, which will configure compiler for us
cmake_minimum_required(VERSION 3.3)
project(AccessThingsModule)
# Modules use Falaise, so we need to locate this or fail
# Locating Falaise will automatically locate all of its
# dependencies such as Bayeux, ROOT and Boost.
find_package(Falaise REQUIRED)
# Build a dynamic library from our sources
add_library(AccessThingsModule SHARED AccessThingsModule.h AccessThingsModule.cpp)
# Link it to the FalaiseModule library
# This ensures the correct compiler flags, include paths
# and linker flags are applied to our dynamic library.
target_link_libraries(AccessThingsModule PUBLIC Falaise::FalaiseModule)

To see the effect of writing new banks into the event, we use a pipeline script to sandwich the module between two dump modules as follows:

#@description AccessThings Pipeline
#@key_label "name"
#@meta_label "type"
# - Custom modules
[name="flreconstruct.plugins" type="flreconstruct::section"]
plugins : string[1] = "AccessThingsModule"
AccessThingsModule.directory : string = "."
# - Pipeline configuration
[name="pipeline" type="dpp::chain_module"]
modules : string[3] = "preprocess" "access_things" "postprocess"
[name="preprocess" type="dpp::dump_module"]
title : string = "PreProcess"
[name="postprocess" type="dpp::dump_module"]
title : string = "PostProcess"
[name="access_things" type="AccessThingsModule"]

You should see that the PostProcess stage results in output containing the information written into the ATMProperties bank.

Implementing Custom Objects for Storage in datatools::things

As discussed above, the datatools::things object can store instances of any type inheriting from datatools::i_serializable. If the builtin types provided by Falaise and Bayeux do not meet your needs, you can implement a new custom class derived from datatools::i_serializable.

In this example, we will implement a simple custom class and add it into the datatools::things event record. This class must inherit from the pure abstract base class datatools::i_serializable and hence must implement its serialization interface, in particular the pure virtual get_serial_tag method. As we will see below, the DATATOOLS_SERIALIZATION_DECLARATION macro declares this interface for us.

We therefore begin by writing the header file, which we'll name MyDataType.h:

//! \file MyDataType.h
//! \brief Example custom data type for use with datatools::things
//! \details Store an integer for later use
#ifndef MYDATATYPE_HH
#define MYDATATYPE_HH
// Standard Library
// Third Party
// - Bayeux
#include <bayeux/datatools/i_serializable.h>
// This Project
class MyDataType : public datatools::i_serializable {
public:
//! Construct type
MyDataType();
//! Destructor
virtual ~MyDataType();
//! Increment counter
void increment();
//! Return value of counter
int current_value() const;
//! Declare serialization interfaces for tagging and streaming
DATATOOLS_SERIALIZATION_DECLARATION();
private:
int mdtCounter_; //!< Stored counter
};
#endif // MYDATATYPE_HH

Note the inheritance from datatools::i_serializable and the use of the DATATOOLS_SERIALIZATION_DECLARATION macro, which declares the get_serial_tag method for us. We have also provided concrete methods to implement this type as a simple increment-only counter. Note also the use of Doxygen markup to document the file and methods. This is required for your data type to be integrated into the official mainline pipeline.

With the header in place we can create the implementation file, which we'll name MyDataType.cpp

#include "MyDataType.h"
//! Implement the serialization tag mechanism
DATATOOLS_SERIALIZATION_SERIAL_TAG_IMPLEMENTATION(MyDataType, "MyDataType")
//! Constructor
MyDataType::MyDataType() : datatools::i_serializable(), mdtCounter_(0) {
}
//! Destructor
MyDataType::~MyDataType() {
}
void MyDataType::increment() {
++mdtCounter_;
}
int MyDataType::current_value() const {
return mdtCounter_;
}

Here we've implemented the trivial constructor/destructor and the counter implementation, and added the DATATOOLS_SERIALIZATION_SERIAL_TAG_IMPLEMENTATION macro. This, together with the use of DATATOOLS_SERIALIZATION_DECLARATION in the header file, provides the minimal boilerplate allowing the class to be stored in datatools::things. Additional work is needed to make the type fully serializable to/from a file, and this is described in a later section.

To use this type in the pipeline, we update the implementation of the AccessThings module as follows for the header:

#ifndef ACCESSTHINGSMODULE_HH
#define ACCESSTHINGSMODULE_HH
#include <bayeux/dpp/base_module.h>

class AccessThingsModule : public dpp::base_module {
 public:
  AccessThingsModule();
  virtual ~AccessThingsModule();
  virtual void initialize(const datatools::properties& myConfig,
                          datatools::service_manager& flServices,
                          dpp::module_handle_dict_type& moduleDict);
  virtual process_status process(datatools::things& workItem);
  virtual void reset();

 private:
  DPP_MODULE_REGISTRATION_INTERFACE(AccessThingsModule);
};
#endif // ACCESSTHINGSMODULE_HH

and the implementation:

#include "AccessThingsModule.h"
#include <boost/foreach.hpp>
#include "MyDataType.h"
DPP_MODULE_REGISTRATION_IMPLEMENT(AccessThingsModule,"AccessThingsModule");
AccessThingsModule::AccessThingsModule() : dpp::base_module()
{}
AccessThingsModule::~AccessThingsModule() {
this->reset();
}
void AccessThingsModule::initialize(const datatools::properties& /*myConfig*/,
                                    datatools::service_manager& /*flServices*/,
                                    dpp::module_handle_dict_type& /*moduleDict*/) {
  this->_set_initialized(true);
}
//! [AccessThingsModule::Process]
dpp::base_module::process_status AccessThingsModule::process(
datatools::things& workItem) {
// Add our custom type to the item
MyDataType & atmCounter = workItem.add<MyDataType>("ATMCounter");
atmCounter.increment();
return PROCESS_OK;
}
//! [AccessThingsModule::Process]
void AccessThingsModule::reset() {
this->_set_initialized(false);
}

Note the inclusion of the MyDataType header. We use the datatools::things::add method to add a MyDataType bank to the event. To compile the new type into a loadable module, we simply add the header and implementation to the add_library call in the CMakeLists.txt script:

# - Basic CMake setup
# Check version meets our requirements
# Declare project, which will configure compiler for us
cmake_minimum_required(VERSION 3.3)
project(AccessThingsModuleCustom)
# Modules use Falaise, so we need to locate this or fail
# Locating Falaise will automatically locate all of its
# dependencies such as Bayeux, ROOT and Boost.
find_package(Falaise REQUIRED)
# Build a dynamic library from our sources
add_library(AccessThingsModule SHARED
AccessThingsModule.h
AccessThingsModule.cpp
MyDataType.h
MyDataType.cpp
)
# Link it to the FalaiseModule library
# This ensures the correct compiler flags, include paths
# and linker flags are applied to our dynamic library.
target_link_libraries(AccessThingsModule PUBLIC Falaise::FalaiseModule)

To see the effect of writing our own type, we compile the above code into a shared library using the above CMake script and then use the following pipeline script to sandwich the module between two dump modules

#@description AccessThings Pipeline
#@key_label "name"
#@meta_label "type"
# - Custom modules
[name="flreconstruct.plugins" type="flreconstruct::section"]
plugins : string[1] = "AccessThingsModule"
AccessThingsModule.directory : string = "."
# - Pipeline configuration
[name="pipeline" type="dpp::chain_module"]
modules : string[3] = "preprocess" "access_things" "postprocess"
[name="preprocess" type="dpp::dump_module"]
title : string = "PreProcess"
[name="postprocess" type="dpp::dump_module"]
title : string = "PostProcess"
[name="access_things" type="AccessThingsModule"]

You should see that the PostProcess stage results in output containing the information written into the ATMCounter bank.

Serializing Custom Objects to Persistent Files/Archives

As it stands, our custom MyDataType data object is storable in a datatools::things instance, but is not capable of being written to file, even though datatools::things is. If you try to run the AccessThingsModule in a pipeline and output to a file, e.g.

$ flreconstruct -i test.brio -p AccessThingsPipeline.conf -o test-reco.brio

an exception will be thrown when trying to write the datatools::things instance to the file (Mac OS X case shown):

...
libc++abi.dylib: terminating with uncaught exception of type boost::archive::archive_exception: unregistered class - derived class not registered or exported
Abort trap: 6
$

This occurs because the underlying Boost serialization system does not know how to persist the MyDataType class to file. To make MyDataType persistable by the serialization system, we need to use a couple of macros and one function implementation to register the class with that system and to tell it how to read and write the class's data members.

The first addition is to the MyDataType header, where we add a call to the BOOST_CLASS_EXPORT_KEY macro after the class declaration. We defer detailed explanation of this macro to the Boost documentation; suffice it to say that this is needed to ensure templated serialization code is instantiated and to register an identifier for the class in the serialization system.

//! \file MyDataType.h
//! \brief Example custom data type for use with datatools::things
//! \details Store an integer for later use
#ifndef MYDATATYPE_HH
#define MYDATATYPE_HH
// Standard Library
// Third Party
// - Bayeux
#include <bayeux/datatools/i_serializable.h>
// This Project
class MyDataType : public datatools::i_serializable {
public:
//! Construct type
MyDataType();
//! Destructor
virtual ~MyDataType();
//! Increment counter
void increment();
//! Return value of counter
int current_value() const;
//! Declare serialization interfaces for tagging and streaming
DATATOOLS_SERIALIZATION_DECLARATION();
private:
int mdtCounter_; //!< Stored counter
};
// Boost.Serialization class export definition
#include <boost/serialization/export.hpp>
BOOST_CLASS_EXPORT_KEY(MyDataType)
#endif // MYDATATYPE_HH

The argument to the macro should be the class name, including a fully qualified namespace if the class is placed inside one.

To isolate the serialization specific code from the main logic of the class, the serialization implementation is written into a dedicated source file:

// Serialization implementation for MyDataType
//
//----------------------------------------------------------------------
// In this first section, we implement the read/write "serialize" method.
// If, and only if, you expect MyDataType to be inherited from (and this
// is *not* recommended for data types), this section should go into a
// separate "MyDataType-xxx.ipp" header file. This allows it to
// be #included by derived classes' own "DerivedData-xxx.ipp" files,
// which is required to propagate the serialization correctly.
// Interface for the class we're serializing
#include "MyDataType.h"
// Implement the serialize method
// - Boost:
//#include <boost/serialization/base_object.hpp>
#include <boost/serialization/nvp.hpp>
// - Bayeux
#include <bayeux/datatools/i_serializable.ipp>
template<class Archive>
void MyDataType::serialize(Archive& ar, const unsigned int /*version_*/)
{
// Serialize the base class first
ar & DATATOOLS_SERIALIZATION_I_SERIALIZABLE_BASE_OBJECT_NVP;
// Now the concrete data members *in order*
// Note the use of Boost's "make_nvp" function, "nvp" being
// "name-value pair". This creates an effective map in the archive.
// It's a template method, so easy to use for most types.
ar & boost::serialization::make_nvp("mdtCounter", mdtCounter_);
}
//----------------------------------------------------------------------
// This second section adds the boilerplate for registering the class
// with the Boost Serialization core. It ensures that all needed code
// for serialization is instantiated (TODO: add clearer explanation of
// what's happening here, bottom line, allows use when loaded as
// a plugin or linked as library without exposing details to clients).
// Include the headers for the file formats we want code exported for.
// This must come before the BOOST_CLASS_EXPORT_IMPLEMENT expansion,
// then code export is automatic.
#include <bayeux/datatools/archives_instantiation.h>
// Boost.Serialization class export definition
BOOST_CLASS_EXPORT_IMPLEMENT(MyDataType)

As documented, this roughly splits into two sections:

  1. Implementing the required read/write MyDataType::serialize member function, templated on the type of format (e.g. XML, BRIO) the class will be serialized to. The important points to note are that any base class of MyDataType must be serialized first, followed by the data members in order. Boost's serialization library provides many helper functions to create the key-value pairs in the archive, and its documentation should be consulted for further details here.
  2. Calling the implementation counterpart of BOOST_CLASS_EXPORT_KEY, the BOOST_CLASS_EXPORT_IMPLEMENT macro. This must be called with the same argument as BOOST_CLASS_EXPORT_KEY, and must come after inclusion of Bayeux's bayeux/datatools/archives_instantiation.h header, which lists the supported file formats. This ordering ensures MyDataType can be serialized automatically to all supported file formats.

This source file is simply added to the inputs to the library:

# - Basic CMake setup
# Check version meets our requirements
# Declare project, which will configure compiler for us
cmake_minimum_required(VERSION 3.3)
project(AccessThingsModuleCustom)
# Modules use Falaise, so we need to locate this or fail
# Locating Falaise will automatically locate all of its
# dependencies such as Bayeux, ROOT and Boost.
find_package(Falaise REQUIRED)
# Build a dynamic library from our sources
add_library(AccessThingsModule SHARED
AccessThingsModule.h
AccessThingsModule.cpp
MyDataType.h
MyDataType.cpp
MyDataTypeSerialization.cpp
)
# Link it to the FalaiseModule library
# This ensures the correct compiler flags, include paths
# and linker flags are applied to our dynamic library.
target_link_libraries(AccessThingsModule PUBLIC Falaise::FalaiseModule)

Once compiled, we can run as in previous examples, but this time we will not see an exception being thrown as MyDataType is known to the serialization system and is written into the output file. However, if we try to confirm this by using flreconstruct's default dump-to-stdout, we see the exception again:

$ flreconstruct -i test-reco.brio
libc++abi.dylib: terminating with uncaught exception of type boost::archive::archive_exception: unregistered class
Abort trap: 6

This might have been expected as the code for serializing MyDataType is held in the AccessThingsModule library and this is not loaded by default in flreconstruct. When using custom data types, you must remember to add them to the list of libraries to be loaded, for example via a script:

#@description AccessThings Pipeline
#@key_label "name"
#@meta_label "type"
# - Custom modules
[name="flreconstruct.plugins" type="flreconstruct::section"]
plugins : string[1] = "AccessThingsModule"
AccessThingsModule.directory : string = "."
# - Pipeline configuration
[name="pipeline" type="dpp::dump_module"]

If we now run with this script, we can see that our custom type was indeed serialized to the output file:

$ flreconstruct -i test-reco.brio -p ../AccessThingsDump.conf
[notice:void datatools::library_loader::init():449] Automatic loading of library 'AccessThingsModule'...
|-- Bank 'ATMCounter' : "MyDataType"
`-- Bank 'SD' : "mctools::simulated_data"
...

This has only covered the basics of making your data objects serializable. More advanced topics are deferred to later tutorials.
