Curve Analysis: Fitting your data

For most experiments, we are interested in fitting our results to a pre-defined mathematical model. The Curve Analysis module provides the analysis base class for a variety of experiments with a single experimental parameter sweep. Analysis subclasses can override several class attributes to customize the behavior from data processing to post-processing, including providing systematic initial guesses for parameters tailored to the experiment. Here we describe how the Curve Analysis module works and how you can create new analyses that inherit from the base class.

Curve Analysis overview

The base class CurveAnalysis implements the multi-objective optimization on different sets of experiment results. A single experiment can define sub-experiments consisting of multiple circuits which are tagged with common metadata, and curve analysis sorts the experiment results based on the circuit metadata.

The following example shows the abstract data structure of a typical curve analysis experiment:

"experiment"
    - circuits[0] (x=x1_A, "series_A")
    - circuits[1] (x=x1_B, "series_B")
    - circuits[2] (x=x2_A, "series_A")
    - circuits[3] (x=x2_B, "series_B")
    - circuits[4] (x=x3_A, "series_A")
    - circuits[5] (x=x3_B, "series_B")
    - ...

"experiment data"
    - data[0] (y1_A, "series_A")
    - data[1] (y1_B, "series_B")
    - data[2] (y2_A, "series_A")
    - data[3] (y2_B, "series_B")
    - data[4] (y3_A, "series_A")
    - data[5] (y3_B, "series_B")
    - ...

"analysis"
    - "series_A": y_A = f_A(x_A; p0, p1, p2)
    - "series_B": y_B = f_B(x_B; p0, p1, p2)
    - fixed parameters {p1: v}

Here the experiment runs two subsets of experiments, namely series A and series B. The analysis defines the corresponding fit models \(f_A(x_A)\) and \(f_B(x_B)\). The data extraction function in the analysis creates two datasets from the experiment data: \((x_A, y_A)\) for series A and \((x_B, y_B)\) for series B. Optionally, the curve analysis can fix certain parameters during the fitting. In this example, \(p_1 = v\) remains unchanged during the fitting.
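The data extraction step described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the entry layout and the metadata keys "xval" and "series" are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical flattened experiment data: each entry pairs a measured value
# with the metadata of the circuit that produced it.
experiment_data = [
    {"metadata": {"xval": 0.1, "series": "series_A"}, "y": 0.95},
    {"metadata": {"xval": 0.1, "series": "series_B"}, "y": 0.05},
    {"metadata": {"xval": 0.2, "series": "series_A"}, "y": 0.80},
    {"metadata": {"xval": 0.2, "series": "series_B"}, "y": 0.21},
]


def split_by_series(data):
    """Group (x, y) pairs by the series label found in the circuit metadata."""
    datasets = defaultdict(lambda: {"x": [], "y": []})
    for entry in data:
        series = entry["metadata"]["series"]
        datasets[series]["x"].append(entry["metadata"]["xval"])
        datasets[series]["y"].append(entry["y"])
    return dict(datasets)


datasets = split_by_series(experiment_data)
```

After this step, `datasets["series_A"]` holds the \((x_A, y_A)\) pairs and `datasets["series_B"]` holds the \((x_B, y_B)\) pairs.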

The curve analysis aims at solving the following optimization problem:

\[\Theta_{\mbox{opt}} = \arg\min_{\Theta_{\rm fit}} \sigma^{-2} (F(X, \Theta)-Y)^2,\]

where \(F\) is the composite objective function defined on the full experiment data \((X, Y)\), with \(X = x_A \oplus x_B\) and \(Y = y_A \oplus y_B\). This objective function can be described by two fit functions as follows.

\[F(X, \Theta) = f_A(x_A, \theta_A) \oplus f_B(x_B, \theta_B).\]

The solver conducts least-squares curve fitting against this objective function and returns the estimated parameters \(\Theta_{\mbox{opt}}\) that minimize the reduced chi-squared value. The parameters to be evaluated are \(\Theta = \Theta_{\rm fit} \cup \Theta_{\rm fix}\), where \(\Theta_{\rm fit} = \theta_A \cup \theta_B\). Since series A and B share the parameters in this example, \(\Theta_{\rm fit} = \{p_0, p_2\}\), and the fixed parameters are \(\Theta_{\rm fix} = \{ p_1 \}\) as mentioned. Thus, \(\Theta = \{ p_0, p_1, p_2 \}\).

The experiment for each series can perform an individual parameter sweep over \(x_A\) and \(x_B\), and the experiment data yields outcomes \(y_A\) and \(y_B\), which might be of different sizes. Data processing functions may also compute \(\sigma_A\) and \(\sigma_B\), the uncertainties of the outcomes arising from sampling error or measurement error.
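As a minimal sketch of this optimization problem, the composite objective can be written as a single list of weighted residuals concatenated over both series. The model functions `f_a` and `f_b`, the value of the fixed parameter, and the synthetic data are all assumptions chosen for illustration; the real solver is provided by the fitting backend.

```python
import math


def f_a(x, p0, p1, p2):
    """Hypothetical model for series A."""
    return p0 * math.exp(-p1 * x) + p2


def f_b(x, p0, p1, p2):
    """Hypothetical model for series B; shares p0, p1 and p2 with f_a."""
    return p0 * math.exp(-2.0 * p1 * x) + p2


FIXED = {"p1": 0.5}  # Theta_fix: held constant during the fit


def weighted_residuals(fit_params, series):
    """Concatenate (F(X, Theta) - Y) / sigma over all series."""
    theta = {**fit_params, **FIXED}  # Theta is the union of fit and fixed sets
    residuals = []
    for model, xs, ys, sigmas in series:
        residuals += [(model(x, **theta) - y) / s for x, y, s in zip(xs, ys, sigmas)]
    return residuals


def cost(fit_params, series):
    """Sum of squared weighted residuals, the quantity to be minimized."""
    return sum(r * r for r in weighted_residuals(fit_params, series))


# Noise-free synthetic data; the two series have different sizes.
true_fit_params = {"p0": 1.0, "p2": 0.1}
x_a, x_b = [0.0, 1.0, 2.0], [0.0, 1.0]
y_a = [f_a(x, **true_fit_params, **FIXED) for x in x_a]
y_b = [f_b(x, **true_fit_params, **FIXED) for x in x_b]
series = [(f_a, x_a, y_a, [0.1] * 3), (f_b, x_b, y_b, [0.2] * 2)]
```

Only `p0` and `p2` are passed to `cost` as free parameters, while `p1` is merged in from `FIXED`, mirroring the split between \(\Theta_{\rm fit}\) and \(\Theta_{\rm fix}\).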

More specifically, the curve analysis defines the following data model.

  • Model: Definition of a single curve that is a function of a reserved parameter “x”.

  • Group: List of models. Fit functions defined under the same group must share the fit parameters. Fit functions in the group are simultaneously fit to generate a single fit result.

Once the group is assigned, a curve analysis instance builds a proper internal optimization routine. Finally, the analysis outputs a set of AnalysisResultData entries for important fit outcomes along with a single figure of the fit curves with the measured data points.

With this base class, a developer can avoid writing boilerplate code in various curve analysis subclasses and can quickly write up the analysis code for a particular experiment.

Defining new models

The fit model is defined by the LMFIT Model. If you are familiar with this package, you can skip this section. The LMFIT package manages complicated fit functions and offers several algorithms to solve non-linear least-square problems. Curve Analysis delegates the core fitting functionality to this package.

You can intuitively write the definition of a model, as shown below:

import lmfit

models = [
    lmfit.models.ExpressionModel(
        expr="amp * exp(-alpha * x) + base",
    )
]

Note that x is the reserved name to represent a parameter that is scanned during the experiment. In the above example, the fit function has three parameters (amp, alpha, base), and exp is a mathematical function that LMFIT makes available inside model expressions. Alternatively, you can pass a callable to define the model object.

import lmfit
import numpy as np

def exp_decay(x, amp, alpha, base):
    return amp * np.exp(-alpha * x) + base

models = [lmfit.Model(func=exp_decay)]

See the LMFIT documentation for a detailed user guide. LMFIT also provides preset models.

If the CurveAnalysis object is instantiated with multiple models, it internally builds a cost function to simultaneously minimize the residuals of all fit functions. The names of the parameters in the fit function are important since they are used in the analysis result, and potentially in your experiment database as a fit result.

Here is another example of how to implement a multi-objective optimization task:

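import lmfit

models = [
    lmfit.models.ExpressionModel(
        expr="amp * exp(-alpha1 * x) + base", name="my_experiment1"
    ),
    lmfit.models.ExpressionModel(
        expr="amp * exp(-alpha2 * x) + base", name="my_experiment2"
    ),
]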

In addition, you need to provide the data_subfit_map analysis option, which may look like this:

data_subfit_map = {
    "my_experiment1": {"tag": 1},
    "my_experiment2": {"tag": 2},
}

This option specifies the metadata of your experiment circuit that is tied to the fit model. If multiple models are provided without this option, the curve fitter cannot prepare the data for fitting. In this model, you have four parameters (amp, alpha1, alpha2, base), and the two curves share amp for the amplitude and base for the baseline of the exponential decay function. Here the experiment data is expected to contain two classes of data, with metadata "tag": 1 for my_experiment1 and "tag": 2 for my_experiment2.
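The matching between circuit metadata and model names can be pictured with the following sketch. The helper `model_name_for` is hypothetical and written only for this illustration; it assumes that a circuit belongs to a model when every key-value pair of the corresponding filter appears in the circuit metadata.

```python
data_subfit_map = {
    "my_experiment1": {"tag": 1},
    "my_experiment2": {"tag": 2},
}


def model_name_for(metadata, subfit_map):
    """Return the first model name whose filter matches the circuit metadata."""
    for name, filter_kwargs in subfit_map.items():
        if all(metadata.get(key) == value for key, value in filter_kwargs.items()):
            return name
    return None  # no model claims this circuit


# Hypothetical circuit metadata with an x value and a tag.
assigned = model_name_for({"xval": 0.3, "tag": 2}, data_subfit_map)
```

A circuit whose metadata matches no filter is not assigned to any model in this sketch, which is why the option is required when several models are defined.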

By using this model, you can flexibly set up your fit model. Here is another example:

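import lmfit

models = [
    lmfit.models.ExpressionModel(
        expr="amp * cos(2 * pi * freq * x + phi) + base", name="my_experiment1"
    ),
    lmfit.models.ExpressionModel(
        expr="amp * sin(2 * pi * freq * x + phi) + base", name="my_experiment2"
    ),
]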

You have the same set of fit parameters in the two models, but now you fit two datasets with different trigonometric functions.

Fitting with fixed parameters

You can also keep certain parameters unchanged during the fitting by specifying the parameter names in the analysis option fixed_parameters. This feature is useful especially when you want to define a subclass of a particular analysis class.

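import lmfit
from qiskit_experiments.curve_analysis import CurveAnalysis
from qiskit_experiments.framework import Options

class AnalysisA(CurveAnalysis):

    def __init__(self):
        super().__init__(
            models=[
                lmfit.models.ExpressionModel(
                    expr="amp * exp(-alpha * x) + base", name="my_model"
                )
            ]
        )

class AnalysisB(AnalysisA):

    @classmethod
    def _default_options(cls) -> Options:
        options = super()._default_options()
        # amp is held at 3.0 and excluded from the fit.
        options.fixed_parameters = {"amp": 3.0}

        return options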

The parameter specified in fixed_parameters is excluded from the fitting. This code gives you a fit model identical to the one defined in the following class:

class AnalysisB(CurveAnalysis):
    def __init__(self):
        model = lmfit.models.ExpressionModel(
            expr="3.0 * exp(-alpha * x) + base", name="my_model"
        )
        super().__init__(models=[model])

However, note that in the first example you can also inherit other features from the AnalysisA class, e.g. the algorithm to generate initial guesses for parameters. In the latter case, on the other hand, you need to manually copy and paste all the logic defined in AnalysisA.
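The equivalence between fixing a parameter and hard-coding its value can be checked with a small standalone sketch. It uses `functools.partial` as a stand-in for the fixed_parameters mechanism, which is an assumption made for illustration and not how the library implements the option.

```python
import math
from functools import partial


def exp_decay(x, amp, alpha, base):
    """Full model with three parameters."""
    return amp * math.exp(-alpha * x) + base


# Stand-in for the fixed_parameters option: amp is bound to a constant.
fixed_parameters = {"amp": 3.0}
reduced_model = partial(exp_decay, **fixed_parameters)


def hardcoded_model(x, alpha, base):
    """Model with the amplitude written directly into the expression."""
    return 3.0 * math.exp(-alpha * x) + base


value_from_fixed = reduced_model(0.5, alpha=2.0, base=0.1)
value_from_hardcoded = hardcoded_model(0.5, alpha=2.0, base=0.1)
```

Both functions depend only on alpha and base, so a fitter sees the same two-parameter problem in either case.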

Curve Analysis workflow

Typically curve analysis performs fitting as follows. This workflow is defined in the method CurveAnalysis._run_analysis().

1. Initialization

Curve analysis calls the _initialization() method, where it initializes some internal states and optionally populates analysis options with the input experiment data. In some cases it may train the data processor with fresh outcomes, or dynamically generate the fit models (self._models) with fresh analysis options. A developer can override this method to perform initialization of analysis-specific variables.

2. Data processing

Curve analysis calls the _run_data_processing() method, where the data processor in the analysis option is internally called. This consumes input experiment results and creates the CurveData dataclass. Then the _format_data() method is called with the processed dataset to format it. By default, the formatter takes the average of the outcomes in the processed dataset over the same x values and then sorts the data in ascending order of x values. This allows the analysis to easily estimate the slope of the curves when it creates algorithmic initial guesses for the fit parameters. A developer can inject extra data processing, for example, filtering, smoothing, or elimination of outliers for better fitting.
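The default formatting behavior described above can be sketched as follows. This is a simplified illustration that ignores uncertainties and series labels; the function name `average_and_sort` is hypothetical.

```python
from collections import defaultdict


def average_and_sort(xs, ys):
    """Average y values that share an x value, then sort by ascending x."""
    grouped = defaultdict(list)
    for x, y in zip(xs, ys):
        grouped[x].append(y)
    x_sorted = sorted(grouped)
    y_mean = [sum(grouped[x]) / len(grouped[x]) for x in x_sorted]
    return x_sorted, y_mean


# Two repeated measurements per x value, given in arbitrary order.
x_fmt, y_fmt = average_and_sort([0.2, 0.1, 0.2, 0.1], [0.5, 1.0, 0.75, 0.5])
```

With one y value per x and the x values in ascending order, quantities such as a finite-difference slope between neighboring points become straightforward to compute.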

3. Fitting

Curve analysis calls the _run_curve_fit() method, which is the core functionality of the fitting. Another method _generate_fit_guesses() is internally called to prepare the initial guess and parameter boundary with respect to the formatted data. Developers usually override this method to provide better initial guesses tailored to the defined fit model or type of the associated experiment. See Providing initial guesses for more details. Developers can also override the entire _run_curve_fit() method to apply custom fitting algorithms. This method must return a CurveFitResult dataclass.

4. Post processing

Curve analysis runs several postprocessing steps on the fit outcome. It calls _create_analysis_results() to create the AnalysisResultData class for the fitting parameters of interest. A developer can inject custom code to compute custom quantities based on the raw fit parameters. See Curve Analysis Results for details. Afterwards, figure plotting is handed over to the Visualization module via the plotter attribute, and a list of created analysis results and the figure are returned.

Providing initial guesses

Fitting without initial guesses for parameters often results in a bad fit. Users can provide initial guesses and boundaries for the fit parameters through the analysis options p0 and bounds. Each option is a dictionary keyed on the parameter name, and the list of parameter names is available from CurveAnalysis.parameters. Each boundary value can be a tuple of floats representing the minimum and maximum values.

Apart from user provided guesses, the analysis can systematically generate those values with the method _generate_fit_guesses(), which is called with the CurveData dataclass. If the analysis contains multiple model definitions, we can get the subset of curve data with CurveData.get_subset_of() using the name of the series. A developer can implement the algorithm to generate initial guesses and boundaries by using this curve data object, which will be provided to the fitter. Note that there are several common initial guess estimators available in curve_analysis.guess.
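As an illustration of what such an algorithm might look like, the sketch below estimates rough starting values for the model amp * exp(-alpha * x) + base from formatted data. It is a hypothetical estimator written for this example, not one of the functions in curve_analysis.guess, and it assumes a positive amplitude, x values sorted in ascending order starting near zero, and a curve that has almost fully decayed at the last point.

```python
import math


def guess_exp_decay(xs, ys):
    """Rough initial guesses for amp * exp(-alpha * x) + base."""
    base = ys[-1]       # assume the tail has reached the baseline
    amp = ys[0] - base  # assume the first point is close to x = 0
    # alpha is roughly 1 / x at the first point where the signal above the
    # baseline has dropped to amp / e or less.
    for x, y in zip(xs, ys):
        if x > 0 and (y - base) <= amp / math.e:
            return {"amp": amp, "alpha": 1.0 / x, "base": base}
    # Fallback when the decay is too slow to cross the threshold.
    return {"amp": amp, "alpha": 1.0 / xs[-1], "base": base}


# Synthetic noise-free data with amp=2.0, alpha=0.5, base=0.1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 20.0]
ys = [2.0 * math.exp(-0.5 * x) + 0.1 for x in xs]
guess = guess_exp_decay(xs, ys)
```

The estimate only needs to be close enough for the solver to converge. Inside _generate_fit_guesses() such values would typically be handed to the fit options rather than used directly.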

The _generate_fit_guesses() method also receives the FitOptions instance user_opt, which contains the user-provided guesses and boundaries. This is a dictionary-like object consisting of sub-dictionaries for initial guess .p0, boundary .bounds, and extra options for the fitter. See the API documentation for available options.

The FitOptions class implements the convenience method set_if_empty() to manage conflicts with user-provided values: user-provided values have higher priority, so systematically generated values cannot override them.

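def _generate_fit_guesses(self, user_opt, curve_data):

    # First candidate: the p1 guess and bounds apply only if the user set none.
    opt1 = user_opt.copy()
    opt1.p0.set_if_empty(p1=3)  # illustrative initial guess
    opt1.bounds.set_if_empty(p1=(0, 10))

    # Second candidate: same options with a different illustrative p1 guess.
    opt2 = user_opt.copy()
    opt2.p0.set_if_empty(p1=4)

    return [opt1, opt2]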

Here you created two options with different p1 values. If multiple options are returned like this, the _run_curve_fit() method attempts to fit with all provided options and finds the best outcome with the minimum reduced chi-squared value. When the fit model contains a parameter that cannot be easily estimated from the curve data, you can create multiple options by varying the initial guess to let the fitter find the most reasonable parameters to explain the model. This allows you to avoid analysis failures caused by poor initial guesses.
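The selection step can be pictured with the sketch below. The `CandidateFit` class is a hypothetical stand-in for a fit outcome, and discarding unsuccessful fits before the comparison is an assumption of this example rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class CandidateFit:
    """Hypothetical stand-in for the outcome of one fit attempt."""
    params: dict
    reduced_chisq: float
    success: bool = True


def select_best(candidates):
    """Pick the successful candidate with the smallest reduced chi-squared."""
    valid = [c for c in candidates if c.success]
    if not valid:
        return None
    return min(valid, key=lambda c: c.reduced_chisq)


# One candidate per set of fit options returned by the guess generator.
candidates = [
    CandidateFit(params={"p1": 3.1}, reduced_chisq=12.4),
    CandidateFit(params={"p1": 4.2}, reduced_chisq=1.3),
    CandidateFit(params={"p1": 0.0}, reduced_chisq=0.2, success=False),
]
best = select_best(candidates)
```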

Evaluate Fit Quality

A subclass can override the _evaluate_quality() method to provide an algorithm that evaluates the quality of the fit. This method is called with the CurveFitResult object, which contains the fit parameters and the reduced chi-squared value in addition to several other fit statistics. Qiskit Experiments often uses the empirical criterion reduced chi-squared < 3 to indicate a good fit.
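A minimal sketch of such a criterion is shown below, assuming the usual definition of the reduced chi-squared as the sum of squared weighted residuals divided by the number of degrees of freedom. The function names and the "good" and "bad" labels are illustrative.

```python
def reduced_chisq(ys, fit_ys, sigmas, n_fit_params):
    """Sum of squared weighted residuals divided by the degrees of freedom."""
    dof = len(ys) - n_fit_params
    if dof <= 0:
        raise ValueError("Need more data points than fit parameters.")
    chisq = sum(((y - f) / s) ** 2 for y, f, s in zip(ys, fit_ys, sigmas))
    return chisq / dof


def evaluate_quality(red_chisq, threshold=3.0):
    """Label the fit using an empirical threshold on the reduced chi-squared."""
    return "good" if red_chisq < threshold else "bad"


# Four data points, two fit parameters, uniform uncertainty of 0.5.
value = reduced_chisq(
    ys=[1.0, 2.0, 3.0, 4.0],
    fit_ys=[1.5, 2.0, 2.5, 4.0],
    sigmas=[0.5, 0.5, 0.5, 0.5],
    n_fit_params=2,
)
quality = evaluate_quality(value)
```

A subclass could extend this with extra checks, for example rejecting fits whose parameters sit on a boundary.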

Curve Analysis Results

Once the best fit parameters are found, the _create_analysis_results() method is called with the same CurveFitResult object.

If you want to create an analysis result entry for a particular parameter, you can override the analysis option result_parameters. By using the ParameterRepr representation, you can rename the parameter in the entry.

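from qiskit_experiments.curve_analysis import ParameterRepr

# Defined inside your analysis subclass.
@classmethod
def _default_options(cls) -> Options:
    options = super()._default_options()
    # Report fit parameter "p0" as "amp" with unit "Hz".
    options.result_parameters = [ParameterRepr("p0", "amp", "Hz")]

    return options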

Here the first argument p0 is the target parameter defined in the series definition, amp is the representation of p0 in the result entry, and Hz is the optional string for the unit of the value if available.

In addition to returning the fit parameters, you can also compute new quantities by combining multiple fit parameters. This can be done by overriding the _create_analysis_results() method.

from qiskit_experiments.framework import AnalysisResultData

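def _create_analysis_results(self, fit_data, quality, **metadata):

    # Let the base class create the standard entries first.
    outcomes = super()._create_analysis_results(fit_data, quality, **metadata)

    p0 = fit_data.ufloat_params["p0"]
    p1 = fit_data.ufloat_params["p1"]

    # Derived quantity; the entry name "p01" is illustrative.
    extra_entry = AnalysisResultData(
        name="p01",
        value=p0 * p1,
        quality=quality,
        extra=metadata,
    )
    outcomes.append(extra_entry)

    return outcomes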

Note that both p0 and p1 are UFloat objects consisting of a nominal value and an error value that is interpreted as the standard deviation. Since this object natively supports error propagation, you don’t have to manually recompute the error of the new value.
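For reference, the calculation that this automatic propagation spares you can be sketched by hand for the product p0 * p1. The sketch uses the standard first-order (linear) propagation formula; the function name and the explicit covariance argument are assumptions of this example.

```python
import math


def product_with_error(p0, s0, p1, s1, cov=0.0):
    """First-order error propagation for the product p0 * p1.

    s0 and s1 are standard deviations; cov is the covariance between
    p0 and p1 (zero if the two parameters are uncorrelated).
    """
    value = p0 * p1
    variance = (p1 * s0) ** 2 + (p0 * s1) ** 2 + 2.0 * p0 * p1 * cov
    return value, math.sqrt(variance)


# Uncorrelated example: p0 = 2.0 +/- 0.1 and p1 = 3.0 +/- 0.2.
value, error = product_with_error(2.0, 0.1, 3.0, 0.2)
```

Fit parameters are often correlated, so the covariance term matters in practice, which is one more reason to let the UFloat arithmetic handle the bookkeeping.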

See also

API documentation: Curve Analysis Module