DataProcessor

class DataProcessor(input_key, data_actions=None)

A DataProcessor defines a sequence of operations to perform on experimental data. Calling an instance of DataProcessor applies this sequence to the input argument.

A DataProcessor is created with a list of DataAction instances. Each DataAction applies its _process method to the data and returns the processed data, and the output of one data action serves as the input of the next. The nodes in the DataProcessor may also perform data validation and some minor formatting.

DataProcessor.__call__(datum) usually takes an entry from the data property of an ExperimentData object (i.e. a dict containing metadata and memory keys and possibly counts, like the Result.data property) and produces the formatted data. It extracts the data stored in the given datum under DataProcessor._input_key, which is specified at initialization.
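The chaining described above can be illustrated with a minimal, dependency-free sketch. SketchDataAction, SketchDataProcessor, ProbabilityOfOne and Complement are hypothetical stand-ins written for this illustration, not the library's classes. They use plain Python lists where the real DataProcessor returns numpy arrays, and they skip validation and uncertainty propagation.

```python
from typing import Any, Dict, List, Optional, Union


class SketchDataAction:
    """Hypothetical base node: subclasses override _process."""

    def _process(self, data: List[Any]) -> List[Any]:
        raise NotImplementedError

    def __call__(self, data: List[Any]) -> List[Any]:
        return self._process(data)


class ProbabilityOfOne(SketchDataAction):
    """Turn each counts dict into the fraction of shots that measured '1'."""

    def _process(self, data):
        return [c.get("1", 0) / sum(c.values()) for c in data]


class Complement(SketchDataAction):
    """Map each probability p to 1 - p."""

    def _process(self, data):
        return [1 - p for p in data]


class SketchDataProcessor:
    """Hypothetical, simplified stand-in for DataProcessor."""

    def __init__(self, input_key: str, data_actions: Optional[List[SketchDataAction]] = None):
        self._input_key = input_key
        self._nodes = list(data_actions or [])

    def __call__(self, data: Union[Dict, List[Dict]]) -> List[Any]:
        # Accept a single datum or a list of them and pull out the value
        # stored under the input key of each one.
        datums = data if isinstance(data, list) else [data]
        values = [datum[self._input_key] for datum in datums]
        # The output of each node is the input of the next one; with no
        # nodes the extracted values are returned unchanged.
        for node in self._nodes:
            values = node(values)
        return values


experiment_data = [
    {"metadata": {"xval": 0.1}, "counts": {"0": 750, "1": 250}},
    {"metadata": {"xval": 0.2}, "counts": {"0": 500, "1": 500}},
]
processor = SketchDataProcessor("counts", [ProbabilityOfOne(), Complement()])
print(processor(experiment_data))  # [0.75, 0.5]
```

Constructing the sketch without data actions, e.g. SketchDataProcessor("counts"), simply returns the extracted counts dicts, mirroring the "unprocessed data" behavior described for data_actions.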

Create a chain of data processing actions.

Parameters:
  • input_key (str) – The initial key in the datum Dict[str, Any] under which the data processor will find the data to process.

  • data_actions (List[DataAction]) – A list of data processing actions to construct this data processor with. If nothing is given, the processor returns the unprocessed data.

Attributes

is_trained

Return True if all nodes of the data processor have been trained.
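The rule behind is_trained can be sketched as follows. The classes are hypothetical, and the sketch assumes that a node which needs no training always reports itself as trained, so the processor counts as trained exactly when every trainable node is.

```python
from typing import List


class SketchNode:
    """Hypothetical node; nodes that need no training report True."""

    @property
    def is_trained(self) -> bool:
        return True


class SketchTrainableNode(SketchNode):
    """Hypothetical node that only counts as trained after train() is called."""

    def __init__(self) -> None:
        self._trained = False

    @property
    def is_trained(self) -> bool:
        return self._trained

    def train(self, data) -> None:
        self._trained = True


class SketchProcessor:
    def __init__(self, nodes: List[SketchNode]) -> None:
        self._nodes = nodes

    @property
    def is_trained(self) -> bool:
        # True only when every node reports that it is trained
        # (vacuously True for an empty chain).
        return all(node.is_trained for node in self._nodes)


trainable = SketchTrainableNode()
processor = SketchProcessor([SketchNode(), trainable])
print(processor.is_trained)  # False
trainable.train([])
print(processor.is_trained)  # True
```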

Methods

__call__(data, **options)

Call self on the given datum. This method sequentially calls the stored data actions on the datum.

Parameters:
  • data (Dict | List[Dict]) – The data, typically from ExperimentData.data(...), that needs to be processed. This dict or list of dicts also contains the metadata of each experiment.

  • options – Run-time options given as keyword arguments that will be passed to the nodes.

Returns:

The data processed by the data processor: a numpy array of arbitrary shape whose entries may be ufloat objects carrying standard errors.

Return type:

ndarray
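The sketch below shows one way run-time options could reach the nodes: every node receives the same keyword arguments and reads only the ones it understands. Offset, Scale and OptionsProcessor, as well as the offset and scale option names, are hypothetical, and plain lists stand in for numpy arrays.

```python
from typing import Any, Dict, List, Union


class Offset:
    """Hypothetical node: subtracts an 'offset' run-time option (default 0)."""

    def __call__(self, data: List[float], **options: Any) -> List[float]:
        offset = options.get("offset", 0.0)
        return [x - offset for x in data]


class Scale:
    """Hypothetical node: multiplies by a 'scale' run-time option (default 1)."""

    def __call__(self, data: List[float], **options: Any) -> List[float]:
        scale = options.get("scale", 1.0)
        return [x * scale for x in data]


class OptionsProcessor:
    def __init__(self, input_key: str, nodes: List[Any]) -> None:
        self._input_key = input_key
        self._nodes = nodes

    def __call__(self, data: Union[Dict, List[Dict]], **options: Any) -> List[float]:
        datums = data if isinstance(data, list) else [data]
        values = [d[self._input_key] for d in datums]
        for node in self._nodes:
            # Every node sees the same options; each picks out the ones it uses.
            values = node(values, **options)
        return values


processor = OptionsProcessor("signal", [Offset(), Scale()])
data = [{"signal": 3.0}, {"signal": 5.0}]
print(processor(data))                         # [3.0, 5.0]
print(processor(data, offset=1.0, scale=0.5))  # [1.0, 2.0]
```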

append(node)

Append a new data action node to this data processor.

Parameters:

node (DataAction) – A DataAction that will process the data.
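A minimal sketch of append, assuming for brevity that any callable can act as a node. AppendableProcessor is a hypothetical class, not the library's DataProcessor. The appended node always runs after the nodes already in the chain.

```python
from typing import Any, Callable, List, Optional


class AppendableProcessor:
    """Hypothetical processor showing that append() adds a node at the end."""

    def __init__(self, input_key: str, data_actions: Optional[List[Callable]] = None) -> None:
        self._input_key = input_key
        self._nodes: List[Callable] = list(data_actions or [])

    def append(self, node: Callable) -> None:
        # New nodes go to the end of the chain.
        self._nodes.append(node)

    def __call__(self, datum: dict) -> Any:
        value = datum[self._input_key]
        for node in self._nodes:
            value = node(value)
        return value


processor = AppendableProcessor("x", [lambda v: v + 1])
processor.append(lambda v: v * 10)  # runs after the existing node
print(processor({"x": 2}))  # 30
```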

call_with_history(data, history_nodes=None)

Call self on the given datum. This method sequentially calls the stored data actions on the datum and also returns the history of the processed data.

Parameters:
  • data (Dict | List[Dict]) – The data, typically from ExperimentData.data(...), that needs to be processed. This dict or list of dicts also contains the metadata of each experiment.

  • history_nodes (Set) – The nodes, specified by their index in the data processing chain, to include in the history. If None is given, all nodes are included in the history.

Returns:

A tuple (processed data, history): the data processed by the processor and the intermediate state of the data at each specified node, respectively.

Return type:

Tuple[ndarray, List]
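The history mechanism can be sketched as recording the output of each requested node while the chain runs. HistoryProcessor and the three node functions are hypothetical, and each history entry here is an (index, output) pair chosen for illustration; the library's own history entries may be structured differently.

```python
from typing import Any, Callable, List, Optional, Set, Tuple


class HistoryProcessor:
    def __init__(self, input_key: str, nodes: List[Callable]) -> None:
        self._input_key = input_key
        self._nodes = nodes

    def call_with_history(
        self, datum: dict, history_nodes: Optional[Set[int]] = None
    ) -> Tuple[Any, List[Tuple[int, Any]]]:
        value = datum[self._input_key]
        history = []
        for index, node in enumerate(self._nodes):
            value = node(value)
            # Record this node's output if it was requested, or if all are.
            if history_nodes is None or index in history_nodes:
                history.append((index, value))
        return value, history


def add_one(v):
    return v + 1


def double(v):
    return v * 2


def negate(v):
    return -v


processor = HistoryProcessor("x", [add_one, double, negate])
result, history = processor.call_with_history({"x": 3})
print(result)   # -8
print(history)  # [(0, 4), (1, 8), (2, -8)]
result, history = processor.call_with_history({"x": 3}, history_nodes={1})
print(history)  # [(1, 8)]
```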

train(data)

Train the nodes of the data processor.

Parameters:

data (Dict | List[Dict]) – The data to use to train the data processor.
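One plausible training scheme, sketched below under stated assumptions, trains each untrained node on the data as it arrives at that node, i.e. after all preceding nodes have been applied, so that a normalization node learns its range from already transformed values. MinMaxNormalize, Square and TrainableProcessor are hypothetical, and a node counts as trainable here simply because it has a train method.

```python
from typing import Dict, List, Optional, Union


class MinMaxNormalize:
    """Hypothetical trainable node: learns min and max, then rescales to [0, 1]."""

    def __init__(self) -> None:
        self._low: Optional[float] = None
        self._high: Optional[float] = None

    @property
    def is_trained(self) -> bool:
        return self._low is not None

    def train(self, values: List[float]) -> None:
        # Assumes the training values are not all equal.
        self._low, self._high = min(values), max(values)

    def __call__(self, values: List[float]) -> List[float]:
        if not self.is_trained:
            raise RuntimeError("MinMaxNormalize must be trained before use")
        span = self._high - self._low
        return [(v - self._low) / span for v in values]


class Square:
    """Hypothetical node that needs no training."""

    is_trained = True

    def __call__(self, values: List[float]) -> List[float]:
        return [v * v for v in values]


class TrainableProcessor:
    def __init__(self, input_key: str, nodes: list) -> None:
        self._input_key = input_key
        self._nodes = nodes

    def _extract(self, data: Union[Dict, List[Dict]]) -> List[float]:
        datums = data if isinstance(data, list) else [data]
        return [d[self._input_key] for d in datums]

    def train(self, data: Union[Dict, List[Dict]]) -> None:
        values = self._extract(data)
        for node in self._nodes:
            # Train on the data as it arrives at this node.
            if hasattr(node, "train") and not node.is_trained:
                node.train(values)
            values = node(values)

    def __call__(self, data: Union[Dict, List[Dict]]) -> List[float]:
        values = self._extract(data)
        for node in self._nodes:
            values = node(values)
        return values


training_data = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
processor = TrainableProcessor("x", [Square(), MinMaxNormalize()])
processor.train(training_data)   # normalizer learns min 1.0, max 9.0
print(processor([{"x": 2.0}]))   # [0.375]
```

Because the normalizer is trained on squared values, its range is 1.0 to 9.0 rather than 1.0 to 3.0; training each node on the raw input instead would give it the wrong range.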