Stats

Jackknife and bootstrap resampling for statistical error estimation, covariance matrices, and correlation matrices.

StatsBase

class labc.stats.StatsBase

StatsBase is the shared abstract base for all resampling objects. StatsType, StatsJack, and StatsBoot all inherit from it. In normal usage you never instantiate StatsBase directly — use StatsType.Jack() or StatsType.Boot() instead.

StatsType

class labc.stats.StatsType(statsID: str)

Bases: StatsBase

Base class for statistical analysis.

The standard usage is through the static methods StatsType.Jack() and StatsType.Boot(), which return fully initialised StatsJack and StatsBoot objects:

from labc.stats import StatsType

# raw_data: raw configurations, shape (num_config, T)
stats = StatsType.Jack(num_config=100)
mean, err, bins = stats.generate_stats(raw_data)

stats = StatsType.Boot(num_config=100, num_bins=500)
mean, err, bins = stats.generate_stats(raw_data)

StatsType can also be instantiated directly for a bins-only workflow, when only pre-computed bins are available and raw configurations are not. For example, with jackknife bins:

import numpy as np
from labc.stats import StatsType

stats = StatsType('Jack')
mean = np.mean(bins, axis=0)  # bins has shape (num_bins, T)
err = stats.err_func(mean, bins)

Parameters:
statsID : str

Identifier for the resampling method. Known values: 'Jack', 'Boot' (see _KNOWN_IDS).

Attributes:
ID : str

Resampling strategy identifier ('Jack' or 'Boot').

num_config : None

Always None in the bins-only workflow.

num_bins : None

Always None in the bins-only workflow.

seed : None

Always None in the bins-only workflow.

Methods

Boot(*, num_config, num_bins[, seed])

Bootstrap factory — see StatsBoot for full documentation.

Jack(*, num_config[, rebin])

Jackknife factory — see StatsJack for full documentation.

corr(data_x[, data_y])

Compute the correlation matrix of one or two DataStats objects.

cov(data_x[, data_y])

Compute the covariance matrix of one or two DataStats objects.

cov_blocks(*data)

Covariance matrix of multiple DataStats objects merged into one.

cov_blocks_diag(*data)

Block-diagonal covariance matrix of multiple DataStats objects.

err_func(array_mean, array_bins)

Compute the statistical error from resampled bins.

generate_bins(_array_raw)

Generate resampled bins from raw data.

generate_stats(array_raw)

Compute mean, error, and resampled bins from raw configurations.

Raises:
ValueError

If statsID is not in _KNOWN_IDS.

static Jack(*, num_config: int, rebin: int = 1) -> StatsJack

Jackknife factory — see StatsJack for full documentation.

Parameters:
num_config : int

Number of raw gauge configurations.

rebin : int, optional

Rebinning factor. Default is 1.

Returns:
StatsJack

static Boot(*, num_config: int, num_bins: int, seed: int = 0) -> StatsBoot

Bootstrap factory — see StatsBoot for full documentation.

Parameters:
num_config : int

Number of raw gauge configurations.

num_bins : int

Number of bootstrap samples to generate.

seed : int, optional

Random seed. Default is 0.

Returns:
StatsBoot

Jackknife

class labc.stats.StatsJack(num_config: int, rebin: int = 1)

Bases: StatsBase

Jackknife resampling for statistical error estimation.

Supports optional rebinning: neighbouring configurations are averaged into blocks of size rebin before the leave-one-out procedure, which reduces the autocorrelation between successive samples. Instantiate via StatsType.Jack().

Parameters:
num_config : int

Number of raw gauge configurations.

rebin : int, optional

Number of consecutive configurations to average into a single block before jackknifing. Must be between 1 and _MAX_REBIN (= 3) inclusive. Default is 1 (no rebinning).

Attributes:
num_config : int

Number of raw gauge configurations.

num_bins : int

Number of jackknife bins, ceil(num_config / rebin).

rebin : int

Rebinning factor applied before the leave-one-out procedure.

ID : str

Always 'Jack'.

Methods

corr(data_x[, data_y])

Compute the correlation matrix of one or two DataStats objects.

cov(data_x[, data_y])

Compute the covariance matrix of one or two DataStats objects.

cov_blocks(*data)

Covariance matrix of multiple DataStats objects merged into one.

cov_blocks_diag(*data)

Block-diagonal covariance matrix of multiple DataStats objects.

err_func(array_mean, array_bins)

Compute the statistical error from resampled bins.

generate_bins(array_raw)

Generate jackknife bins from raw configurations.

generate_stats(array_raw)

Compute mean, error, and resampled bins from raw configurations.

Raises:
ValueError

If rebin > _MAX_REBIN. For larger rebinning factors, pre-average your configurations manually before constructing this object.

Warns:
UserWarning

If rebin == _MAX_REBIN, since this is unusual and may indicate significant autocorrelations in the data.
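The checks on rebin described above could be sketched roughly as follows. This is an illustration only, not the library's code: the helper name check_rebin, the message texts, and the explicit lower-bound check are assumptions. Only the limit _MAX_REBIN = 3, the ValueError, and the UserWarning come from the text.

```python
import warnings

_MAX_REBIN = 3  # value quoted in the parameter description


def check_rebin(rebin: int) -> int:
    """Hypothetical helper mirroring the documented checks on rebin."""
    if rebin < 1:
        # Assumption: the documented range starts at 1.
        raise ValueError(f"rebin must be >= 1, got {rebin}")
    if rebin > _MAX_REBIN:
        raise ValueError(
            f"rebin={rebin} exceeds _MAX_REBIN={_MAX_REBIN}; "
            "pre-average the configurations manually instead"
        )
    if rebin == _MAX_REBIN:
        # Allowed, but flagged because it is unusual.
        warnings.warn(
            "rebin equals _MAX_REBIN; check for strong autocorrelations",
            UserWarning,
        )
    return rebin
```

With this sketch, check_rebin(2) passes silently, check_rebin(3) passes with a warning, and check_rebin(4) raises.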

Notes

The prefactor used in error and covariance estimation is \(f = N_\mathrm{bins} - 1\).

corr(data_x: DataStats, data_y: DataStats | None = None) -> ndarray

Compute the correlation matrix of one or two DataStats objects.

Parameters:
data_x : DataStats

First dataset.

data_y : DataStats, optional

Second dataset. If None, the correlation of data_x with itself is returned, in which case the diagonal entries are exactly 1.

Returns:
np.ndarray

Correlation matrix, shape (len(data_x), len(data_y)), or (len(data_x), len(data_x)) when data_y is None.

Notes

Slicing for fit ranges or thinning should be applied to the inputs before calling this method.
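As a rough illustration of what a correlation matrix built from resampled bins looks like, the sketch below normalises a covariance matrix by the square roots of its variances. It works on plain NumPy arrays because the internals of DataStats are not described here; the function name corr_from_cov and the toy data are assumptions, and the library may organise the computation differently.

```python
import numpy as np


def corr_from_cov(cov_xy, var_x, var_y):
    """rho_ij = C_ij / sqrt(var_x_i * var_y_j)."""
    return cov_xy / np.sqrt(np.outer(var_x, var_y))


rng = np.random.default_rng(2)
bins = rng.normal(size=(30, 4))      # stand-in bins, shape (num_bins, T)
dev = bins - bins.mean(axis=0)
cov = dev.T @ dev / len(bins)        # prefactor omitted: it cancels in rho

# Same dataset on both sides, so the diagonal comes out as 1.
corr = corr_from_cov(cov, np.diag(cov), np.diag(cov))
```

Because the same prefactor multiplies the numerator and both variances, the result does not depend on whether the bins are jackknife or bootstrap bins.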

cov(data_x: DataStats, data_y: DataStats | None = None) -> ndarray

Compute the covariance matrix of one or two DataStats objects.

Parameters:
data_x : DataStats

First dataset.

data_y : DataStats, optional

Second dataset. If None, the covariance of data_x with itself is returned.

Returns:
np.ndarray

Covariance matrix, shape (len(data_x), len(data_y)), or (len(data_x), len(data_x)) when data_y is None.

Notes

Slicing for fit ranges or thinning should be applied to the inputs before calling this method.
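A minimal sketch of a bin-based covariance, assuming it takes the same form as the error formula with the prefactor f applied to the product of deviations. The function cov_from_bins and its argument layout are hypothetical and operate on plain arrays rather than DataStats objects. The example also shows slicing applied to the inputs before the covariance is computed, as the note recommends.

```python
import numpy as np


def cov_from_bins(mean_x, bins_x, mean_y, bins_y, prefactor):
    """C_ij = f / N_bins * sum_b (x_bi - mean_x_i) * (y_bj - mean_y_j)."""
    dx = np.asarray(bins_x, dtype=float) - np.asarray(mean_x, dtype=float)
    dy = np.asarray(bins_y, dtype=float) - np.asarray(mean_y, dtype=float)
    return prefactor * (dx.T @ dy) / dx.shape[0]


rng = np.random.default_rng(1)
raw = rng.normal(size=(40, 5))                    # (num_config, T)
mean = raw.mean(axis=0)
bins = (raw.sum(axis=0) - raw) / (len(raw) - 1)   # jackknife bins, rebin = 1
f = len(bins) - 1                                 # jackknife prefactor

# Restrict data_x to a fit range first, then compute the covariance.
sl = slice(1, 4)
cov = cov_from_bins(mean[sl], bins[:, sl], mean, bins, f)   # shape (3, 5)
auto = cov_from_bins(mean, bins, mean, bins, f)             # shape (5, 5)
```

In this toy case the square roots of the diagonal of auto equal the standard error of the mean of raw.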

cov_blocks(*data: DataStats) -> ndarray

Covariance matrix of multiple DataStats objects merged into one.

All datasets are concatenated along the observable axis before the covariance is computed, preserving cross-correlations between them.

Parameters:
*data : DataStats

Datasets to merge.

Returns:
np.ndarray

Full covariance matrix of the concatenated dataset.
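The merge-then-covariance idea can be sketched on plain arrays as below. The name cov_blocks_sketch is hypothetical, the bins are centred on their own average for simplicity, and all inputs are assumed to share the same number of bins. None of this is confirmed to match the library's internals.

```python
import numpy as np


def cov_blocks_sketch(bins_list, prefactor):
    """Concatenate bins along the observable axis, then take the covariance."""
    merged = np.concatenate(
        [np.asarray(b, dtype=float) for b in bins_list], axis=1
    )
    dev = merged - merged.mean(axis=0)   # assumption: centre on the bin average
    return prefactor * (dev.T @ dev) / merged.shape[0]


rng = np.random.default_rng(3)
bins_a = rng.normal(size=(25, 3))
bins_b = bins_a[:, :2] + 0.1 * rng.normal(size=(25, 2))   # correlated with bins_a
full = cov_blocks_sketch([bins_a, bins_b], prefactor=1.0)
print(full.shape)   # (5, 5)
```

The off-diagonal 3 x 2 block of full carries the cross-correlations between the two datasets, which is the information a block-diagonal construction discards.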

cov_blocks_diag(*data: DataStats) -> ndarray

Block-diagonal covariance matrix of multiple DataStats objects.

Computes the covariance of each dataset independently and assembles them into a block-diagonal matrix, assuming no cross-correlations.

Parameters:
*data : DataStats

Datasets for each diagonal block.

Returns:
np.ndarray

Block-diagonal covariance matrix.
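For comparison, a block-diagonal assembly might look like the following sketch. It uses plain arrays, a hypothetical function name, and bins centred on their own average; the library's actual implementation is not shown in this page.

```python
import numpy as np


def cov_blocks_diag_sketch(bins_list, prefactor):
    """Per-dataset covariances placed on the diagonal, zeros elsewhere."""
    covs = []
    for b in bins_list:
        b = np.asarray(b, dtype=float)
        dev = b - b.mean(axis=0)
        covs.append(prefactor * (dev.T @ dev) / b.shape[0])

    size = sum(c.shape[0] for c in covs)
    out = np.zeros((size, size))
    start = 0
    for c in covs:
        stop = start + c.shape[0]
        out[start:stop, start:stop] = c   # fill one diagonal block
        start = stop
    return out


rng = np.random.default_rng(5)
bins_a = rng.normal(size=(20, 2))
bins_b = rng.normal(size=(20, 3))
block = cov_blocks_diag_sketch([bins_a, bins_b], prefactor=19.0)
```

Here block has shape (5, 5) and its off-diagonal blocks are exactly zero, reflecting the assumption of no cross-correlations.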

err_func(array_mean: ndarray, array_bins: ndarray) -> ndarray

Compute the statistical error from resampled bins.

Parameters:
array_mean : np.ndarray

Sample mean, shape (T,).

array_bins : np.ndarray

Resampled bins, shape (num_bins, T).

Returns:
np.ndarray

Statistical errors, shape (T,).

Notes

The error is computed as:

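\[\sigma_i = \sqrt{f \cdot \frac{1}{N_\mathrm{bins}} \sum_{b=1}^{N_\mathrm{bins}} \left(x_i^{(b)} - \bar{x}_i\right)^2}\]

where \(x_i^{(b)}\) is component \(i\) of bin \(b\) (a row of array_bins), \(\bar{x}_i\) is the corresponding component of array_mean, and \(f\) is the prefactor defined by the concrete subclass.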
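The formula above translates almost directly into NumPy. The sketch below is not the library's err_func; the name err_from_bins and the explicit prefactor argument are assumptions made for illustration. The toy check uses leave-one-out bins without rebinning, for which the jackknife prefactor f = N_bins - 1 reproduces the usual standard error of the mean.

```python
import numpy as np


def err_from_bins(mean, bins, prefactor):
    """sigma_i = sqrt(f / N_bins * sum over bins of (bin_i - mean_i)^2)."""
    bins = np.asarray(bins, dtype=float)
    num_bins = bins.shape[0]
    dev = bins - np.asarray(mean, dtype=float)   # broadcasts over the bin axis
    return np.sqrt(prefactor * np.sum(dev**2, axis=0) / num_bins)


rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 4))                    # (num_config, T)
mean = raw.mean(axis=0)
jack = (raw.sum(axis=0) - raw) / (len(raw) - 1)   # leave-one-out means

err = err_from_bins(mean, jack, prefactor=len(jack) - 1)
sem = raw.std(axis=0, ddof=1) / np.sqrt(len(raw))  # textbook standard error
```

The agreement between err and sem holds for the plain mean; for nonlinear functions of the data the two estimates generally differ.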

generate_bins(array_raw: ndarray) -> ndarray

Generate jackknife bins from raw configurations.

Parameters:
array_raw : np.ndarray

Raw configurations, shape (num_config, T).

Returns:
np.ndarray

Jackknife bins, shape (num_bins, T), where num_bins = ceil(num_config / rebin).

Notes

If num_config is not divisible by rebin, the last configuration is repeated to pad the data up to the next multiple of rebin. Rebinning then averages consecutive blocks of rebin configurations, and the leave-one-out jackknife is applied to the resulting num_bins block averages.
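The padding, rebinning, and leave-one-out steps described in these notes can be sketched as follows. This is an independent illustration on plain arrays, not the library's generate_bins: the function name jackknife_bins is hypothetical, and repeating the last configuration as many times as needed to fill the final block is an assumption about how the padding works.

```python
import numpy as np


def jackknife_bins(raw, rebin=1):
    """Rebin consecutive configurations, then form leave-one-out means."""
    raw = np.asarray(raw, dtype=float)
    num_config = raw.shape[0]

    pad = (-num_config) % rebin
    if pad:
        # Repeat the last configuration to reach a multiple of rebin.
        raw = np.concatenate([raw, np.repeat(raw[-1:], pad, axis=0)])

    # Average blocks of `rebin` consecutive configurations.
    blocks = raw.reshape(-1, rebin, *raw.shape[1:]).mean(axis=1)
    num_bins = blocks.shape[0]

    # Leave-one-out mean: drop block b and average the remaining blocks.
    return (blocks.sum(axis=0) - blocks) / (num_bins - 1)


raw = np.arange(14, dtype=float).reshape(7, 2)   # num_config = 7, T = 2
bins = jackknife_bins(raw, rebin=3)
print(bins.shape)   # (3, 2), i.e. ceil(7 / 3) bins
```

Note that padding by repetition gives the last configuration extra weight in the final block, which is one reason to prefer a num_config that is divisible by rebin.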

generate_stats(array_raw: ndarray) -> tuple[ndarray, ndarray, ndarray]

Compute mean, error, and resampled bins from raw configurations.

Parameters:
array_raw : np.ndarray

Raw configurations, shape (num_config, T).

Returns:
mean : np.ndarray

Sample mean, shape (T,).

err : np.ndarray

Statistical error, shape (T,).

bins : np.ndarray

Resampled bins, shape (num_bins, T).

Bootstrap

class labc.stats.StatsBoot(num_config: int, num_bins: int, seed: int = 0)

Bases: StatsBase

Bootstrap resampling for statistical error estimation.

Instantiate via StatsType.Boot().

Parameters:
num_config : int

Number of raw gauge configurations.

num_bins : int

Number of bootstrap samples to generate.

seed : int, optional

Seed for the random number generator. Default is 0.

Attributes:
num_config : int

Number of raw gauge configurations.

num_bins : int

Number of bootstrap samples.

seed : int

Random seed for reproducible resampling.

ID : str

Always 'Boot'.

Methods

corr(data_x[, data_y])

Compute the correlation matrix of one or two DataStats objects.

cov(data_x[, data_y])

Compute the covariance matrix of one or two DataStats objects.

cov_blocks(*data)

Covariance matrix of multiple DataStats objects merged into one.

cov_blocks_diag(*data)

Block-diagonal covariance matrix of multiple DataStats objects.

err_func(array_mean, array_bins)

Compute the statistical error from resampled bins.

generate_bins(array_raw)

Generate bootstrap bins from raw configurations.

generate_stats(array_raw)

Compute mean, error, and resampled bins from raw configurations.

Notes

The prefactor used in error and covariance estimation is \(f = 1\).

corr(data_x: DataStats, data_y: DataStats | None = None) -> ndarray

Compute the correlation matrix of one or two DataStats objects.

Parameters:
data_x : DataStats

First dataset.

data_y : DataStats, optional

Second dataset. If None, the correlation of data_x with itself is returned, in which case the diagonal entries are exactly 1.

Returns:
np.ndarray

Correlation matrix, shape (len(data_x), len(data_y)), or (len(data_x), len(data_x)) when data_y is None.

Notes

Slicing for fit ranges or thinning should be applied to the inputs before calling this method.

cov(data_x: DataStats, data_y: DataStats | None = None) -> ndarray

Compute the covariance matrix of one or two DataStats objects.

Parameters:
data_x : DataStats

First dataset.

data_y : DataStats, optional

Second dataset. If None, the covariance of data_x with itself is returned.

Returns:
np.ndarray

Covariance matrix, shape (len(data_x), len(data_y)), or (len(data_x), len(data_x)) when data_y is None.

Notes

Slicing for fit ranges or thinning should be applied to the inputs before calling this method.

cov_blocks(*data: DataStats) -> ndarray

Covariance matrix of multiple DataStats objects merged into one.

All datasets are concatenated along the observable axis before the covariance is computed, preserving cross-correlations between them.

Parameters:
*data : DataStats

Datasets to merge.

Returns:
np.ndarray

Full covariance matrix of the concatenated dataset.

cov_blocks_diag(*data: DataStats) -> ndarray

Block-diagonal covariance matrix of multiple DataStats objects.

Computes the covariance of each dataset independently and assembles them into a block-diagonal matrix, assuming no cross-correlations.

Parameters:
*data : DataStats

Datasets for each diagonal block.

Returns:
np.ndarray

Block-diagonal covariance matrix.

err_func(array_mean: ndarray, array_bins: ndarray) -> ndarray

Compute the statistical error from resampled bins.

Parameters:
array_mean : np.ndarray

Sample mean, shape (T,).

array_bins : np.ndarray

Resampled bins, shape (num_bins, T).

Returns:
np.ndarray

Statistical errors, shape (T,).

Notes

The error is computed as:

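\[\sigma_i = \sqrt{f \cdot \frac{1}{N_\mathrm{bins}} \sum_{b=1}^{N_\mathrm{bins}} \left(x_i^{(b)} - \bar{x}_i\right)^2}\]

where \(x_i^{(b)}\) is component \(i\) of bin \(b\) (a row of array_bins), \(\bar{x}_i\) is the corresponding component of array_mean, and \(f\) is the prefactor defined by the concrete subclass.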
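With the bootstrap prefactor f = 1, the formula is the root-mean-square deviation of the bins about the sample mean. The short numerical check below uses made-up stand-in arrays, not real bootstrap output. It illustrates that this quantity is the standard deviation of the bins combined in quadrature with any offset between the bin average and the sample mean.

```python
import numpy as np

rng = np.random.default_rng(4)
bins = rng.normal(loc=1.0, scale=0.2, size=(200, 3))   # stand-in bootstrap bins
mean = np.array([1.0, 1.05, 0.95])                      # stand-in sample mean

# f = 1: root-mean-square deviation of the bins from the sample mean.
err = np.sqrt(np.mean((bins - mean) ** 2, axis=0))

# Decomposition: spread of the bins plus the offset of their centre.
spread = bins.std(axis=0)            # population standard deviation (ddof = 0)
offset = bins.mean(axis=0) - mean
```

It follows that err**2 equals spread**2 + offset**2, so err is never smaller than the plain standard deviation of the bins.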

generate_bins(array_raw: ndarray) -> ndarray

Generate bootstrap bins from raw configurations.

Parameters:
array_raw : np.ndarray

Raw configurations, shape (num_config, T).

Returns:
np.ndarray

Bootstrap bins, shape (num_bins, T).

Raises:
ValueError

If the number of configurations in array_raw does not match self.num_config.

Notes

Each bootstrap sample is the mean of num_config configurations drawn with replacement. The random state is seeded with self.seed so results are fully reproducible.
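The resampling described in these notes can be sketched on plain arrays as below. The function name bootstrap_bins and the use of np.random.default_rng are assumptions. The library may use a different generator, so identical seeds need not give identical bins between this sketch and labc.

```python
import numpy as np


def bootstrap_bins(raw, num_bins, seed=0):
    """Each bin is the mean of num_config draws with replacement."""
    raw = np.asarray(raw, dtype=float)
    num_config = raw.shape[0]
    rng = np.random.default_rng(seed)   # assumption: the library's RNG may differ
    # One row of configuration indices per bootstrap sample.
    idx = rng.integers(0, num_config, size=(num_bins, num_config))
    return raw[idx].mean(axis=1)        # shape (num_bins, T)


raw = np.random.default_rng(10).normal(size=(100, 6))   # toy data
bins_1 = bootstrap_bins(raw, num_bins=500, seed=0)
bins_2 = bootstrap_bins(raw, num_bins=500, seed=0)
```

Seeding the generator inside the function means that two calls with the same seed return identical bins, which is the reproducibility property the notes describe.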

generate_stats(array_raw: ndarray) -> tuple[ndarray, ndarray, ndarray]

Compute mean, error, and resampled bins from raw configurations.

Parameters:
array_raw : np.ndarray

Raw configurations, shape (num_config, T).

Returns:
mean : np.ndarray

Sample mean, shape (T,).

err : np.ndarray

Statistical error, shape (T,).

bins : np.ndarray

Resampled bins, shape (num_bins, T).