Data

Data containers for statistical analysis in lattice QCD.

Provides DataBins, DataStats, and DataErr — containers that store mean values alongside resampled bins and propagate errors transparently through arithmetic operations. Standard NumPy functions work directly on these objects via the __array_ufunc__ and __array_function__ hooks.

The data submodule provides the interface for handling data. Once a StatsType object is initialized, the resampled bins can be stored in a DataStats object, and all mathematical operations can be performed directly on such objects. For the hooked NumPy functions, keep in mind that they act along axis=1 of an internal 2D NumPy array with shape=(1+num_bins, T), where ‘num_bins’ is the number of bins, the ‘1’ accounts for the mean value, and ‘T’ is the length of the data (typically the time extent of a correlator).

import numpy as np
from labc import stats
from labc import data as dt

num_config = 100
T = 64
statsjack = stats.StatsType.Jack(num_config=num_config)

# 2D array with shape=(num_config, T) containing the raw data
array_raw_in = np.random.normal(size=(num_config, T))
# resample
mean, err, bins = stats.generate_stats(array_raw_in)

# initialize correlator
corr = dt.DataStats(mean, bins, statsjack)
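
To make the resampling step above concrete, here is a minimal NumPy-only sketch of delete-one jackknife resampling. It is not labc's implementation: the helper name jackknife_stats is hypothetical, and the conventions (bins built by omitting one configuration, variance prefactor (N-1)/N, deviations taken from the sample mean) are the textbook ones, assumed here for illustration.

```python
import numpy as np

def jackknife_stats(raw):
    """Delete-one jackknife for raw data of shape (num_config, T).

    Hypothetical helper for illustration only; it is not part of labc.
    """
    raw = np.asarray(raw, dtype=float)
    n = raw.shape[0]
    mean = raw.mean(axis=0)                     # central value, shape (T,)
    # bin i is the average over all configurations except the i-th
    bins = (raw.sum(axis=0) - raw) / (n - 1)    # shape (num_config, T)
    # jackknife variance: (n-1)/n * sum_i (bin_i - mean)^2
    err = np.sqrt((n - 1) / n * np.sum((bins - mean) ** 2, axis=0))
    return mean, err, bins

rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 64))
mean, err, bins = jackknife_stats(raw)
```

For the plain average, this jackknife error coincides with the usual standard error of the mean, which is a convenient sanity check.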

DataBins

class labc.data.DataBins(mean: ndarray | float, bins: ndarray, *args, **kwargs)

Container for binned data supporting arithmetic error propagation.

Essentially a thin wrapper around a (1+num_bins, T) array: the first row stores the mean and the remaining rows store the resampled bins. It defines arithmetic operations (+, -, *, /) that act element-wise on both the mean and all bins simultaneously.

Parameters:
mean : np.ndarray or float

Central value(s). A scalar is promoted to a 1-element array.

bins : np.ndarray

Resampled bins, shape (num_bins, T).

*args, **kwargs

Forwarded to _make_class() when constructing derived objects.

Attributes:
mean : np.ndarray

Central value(s), shape (T,).

bins : np.ndarray

Resampled bins, shape (num_bins, T).

Methods

num_bins()

Return the number of resampled bins.

num_bins() -> int

Return the number of resampled bins.
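
The stacked layout described above can be illustrated with plain NumPy. The sketch below mimics the (1+num_bins, T) array by hand; the helper stack is hypothetical and the snippet does not use DataBins itself, so it only shows the idea of operating on the mean and all bins at once.

```python
import numpy as np

rng = np.random.default_rng(1)
num_bins, T = 10, 4

def stack(mean, bins):
    """Stack mean (row 0) and bins (rows 1..) into a (1+num_bins, T) array."""
    return np.vstack([np.atleast_1d(mean), bins])

bins_a = 2.0 + 0.1 * rng.normal(size=(num_bins, T))
bins_b = 1.0 + 0.1 * rng.normal(size=(num_bins, T))
a = stack(bins_a.mean(axis=0), bins_a)
b = stack(bins_b.mean(axis=0), bins_b)

# element-wise arithmetic touches the mean and every bin in one operation
ratio = a / b
ratio_mean, ratio_bins = ratio[0], ratio[1:]

# a reduction along axis=1 collapses the T direction only,
# leaving one number for the mean and one per bin
summed = a.sum(axis=1)    # shape (1+num_bins,)
```

Because every row is transformed in the same way, the spread of the output bins carries the propagated uncertainty without any explicit error formula.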

DataStats

class labc.data.DataStats(mean: np.ndarray | float, bins: np.ndarray, statsType: StatsBase)

Binned data with an associated statistical resampling strategy.

Extends DataBins with error, covariance, and correlation estimates computed via the attached StatsBase resampling object. All three quantities are cached on first access.

Parameters:
mean : np.ndarray or float

Central value(s), shape (T,) or scalar.

bins : np.ndarray

Jackknife or bootstrap bins, shape (num_bins, T).

statsType : StatsBase

Resampling strategy used to compute errors and covariances.

Attributes:
mean : np.ndarray

Central value(s), shape (T,).

bins : np.ndarray

Resampled bins, shape (num_bins, T).

statsType : StatsBase

Resampling strategy attached to this dataset.

err : np.ndarray

Statistical error, shape (T,).

cov : np.ndarray

Covariance matrix, shape (T, T).

corr : np.ndarray

Correlation matrix, shape (T, T).

Methods

num_bins()

Return the number of resampled bins.

print([num_digits, scientific])

Return a list of compact mean(err) strings, one per observable.

rel_diff(other)

Element-wise relative difference (self - other) / self.

rel_err()

Absolute relative error |err / mean|, shape (T,).

save(file_out, group, *args, **kwargs)

Save mean, error, and bins to an HDF5 file.

property err: ndarray

Statistical error, shape (T,).

Computed by err_func() of the attached statsType and cached after the first call.

rel_err() -> ndarray

Absolute relative error |err / mean|, shape (T,).

rel_diff(other: DataStats) -> DataStats

Element-wise relative difference (self - other) / self.

Parameters:
other : DataStats

Dataset to compare against. Must have the same T and compatible num_bins.

Returns:
DataStats

Relative difference as a new DataStats.

property cov: ndarray

Covariance matrix, shape (T, T).

Computed by cov() of the attached statsType and cached after the first call.

property corr: ndarray

Correlation matrix, shape (T, T).

Computed by corr() of the attached statsType and cached after the first call.
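
As a rough picture of how err, cov, and corr relate to the bins, the following NumPy sketch estimates a covariance matrix from resampled bins and normalizes it to a correlation matrix. The function names are hypothetical, and two conventions are assumed rather than taken from labc: deviations are measured from the stored mean, and the resampling prefactor is num_bins - 1 (jackknife-like).

```python
import numpy as np

def cov_from_bins(mean, bins, prefactor):
    """Covariance estimate: prefactor * average over bins of d d^T, d = bin - mean."""
    diff = bins - mean                      # shape (num_bins, T)
    return prefactor * diff.T @ diff / bins.shape[0]

def corr_from_cov(cov):
    """Normalize a covariance matrix to unit diagonal."""
    err = np.sqrt(np.diag(cov))
    return cov / np.outer(err, err)

rng = np.random.default_rng(2)
num_bins, T = 50, 3
mean = np.zeros(T)
bins = rng.normal(size=(num_bins, T))

cov = cov_from_bins(mean, bins, prefactor=num_bins - 1)   # assumed prefactor
err = np.sqrt(np.diag(cov))                               # shape (T,)
corr = corr_from_cov(cov)                                 # shape (T, T)
```

The error is the square root of the covariance diagonal, so the three quantities stay mutually consistent by construction.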

print(num_digits: int = 2, scientific: bool = False) -> list[str]

Return a list of compact mean(err) strings, one per observable.

Parameters:
num_digits : int, optional

Number of significant digits in the error. Default is 2.

scientific : bool, optional

If True, use scientific notation. Default is False.

Returns:
list[str]

One formatted string per element of self.
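
The compact mean(err) notation puts the leading digits of the error in parentheses, aligned with the last digits of the mean. The sketch below shows one way such a string could be built; format_mean_err is a hypothetical helper, the rounding rules are an assumption, and the scientific option is not covered, so the exact output of print() may differ.

```python
import math

def format_mean_err(mean, err, num_digits=2):
    """Format a value as mean(err) with num_digits significant error digits.

    Hypothetical helper; not the formatter used by labc.
    """
    if not (err > 0 and math.isfinite(err)):
        raise ValueError("err must be positive and finite")
    exponent = math.floor(math.log10(err))
    decimals = num_digits - 1 - exponent        # decimal places to keep
    err_digits = round(err * 10 ** decimals)
    if err_digits >= 10 ** num_digits:          # rounding reached the next decade
        decimals -= 1
        err_digits = round(err * 10 ** decimals)
    if decimals > 0:
        return f"{mean:.{decimals}f}({err_digits})"
    # no fractional digits left: round mean and error to the same power of ten
    scale = 10 ** (-decimals)
    return f"{round(mean / scale) * scale:.0f}({round(err / scale) * scale:.0f})"

means = [1.2346, 1234.0, 0.5]
errs = [0.0123, 56.0, 0.0996]
formatted = [format_mean_err(m, e) for m, e in zip(means, errs)]
```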

save(file_out: str, group: str, *args, **kwargs) -> None

Save mean, error, and bins to an HDF5 file.

Parameters:
file_out : str

Path to the output file. Only .h5 is currently supported.

group : str

HDF5 group name under which the data are stored.

*args, **kwargs

Forwarded to the writer’s add_mean / add_err / add_bins methods.

num_bins() -> int

Return the number of resampled bins.

DataErr

class labc.data.DataErr(mean: ndarray | float, err_or_cov: ndarray, *, seed: int | None = None)

Observable with Gaussian uncertainty defined by a mean and covariance.

Stores a central value and a covariance matrix and propagates errors analytically through arithmetic operations. When combined with a DataStats object, the covariance is converted on-the-fly to compatible resampled bins.

Parameters:
mean : np.ndarray or float

Central value(s), shape (T,) or scalar.

err_or_cov : np.ndarray

Either a 1-D error array of shape (T,) (sqrt of diagonal covariance) or a full 2-D covariance matrix of shape (T, T).

seed : int or None, optional

Seed for the random number generator used when sampling bins. Fixing the seed makes bin generation reproducible.

Attributes:
mean : np.ndarray

Central value(s), shape (T,).

err : np.ndarray

Diagonal errors sqrt(diag(cov)), shape (T,).

cov : np.ndarray

Covariance matrix, shape (T, T).

corr : np.ndarray

Correlation matrix, shape (T, T).

seed : int or None

Random seed for bin generation.

Methods

bins([num_bins, statsType])

Return sampled bins, shape (num_bins, T).

to_dataStats(num_bins, statsType)

Convert to a DataStats with num_bins bins.

print

property num_bins: int

Default number of bins used when sampling without a target DataStats.

bins(num_bins: int | None = None, statsType: StatsBase | None = None) -> np.ndarray

Return sampled bins, shape (num_bins, T).

Parameters:
num_bins : int, optional

Number of bins. Defaults to self.num_bins.

statsType : StatsBase, optional

If provided, bins are structured as jackknife/bootstrap samples via statsType.generate_bins.

Returns:
np.ndarray

Resampled bins, shape (num_bins, T).

to_dataStats(num_bins: int, statsType: StatsBase) -> DataStats

Convert to a DataStats with num_bins bins. If the given statsType already defines a number of bins (statsType.num_bins is not None), num_bins must equal statsType.num_bins.

Parameters:
num_bins : int

Number of bins to generate.

statsType : StatsBase

Resampling strategy for the output DataStats.

Returns:
DataStats

New DataStats with mean and binned representation of this object’s Gaussian uncertainty.
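
The conversion from a Gaussian covariance to resampled bins can be sketched with NumPy's multivariate normal sampler. This is an illustration under stated assumptions, not labc's code: as_cov and gaussian_bins are hypothetical helpers, and the bins are drawn with covariance cov / f, where f = num_bins - 1 is assumed to be the jackknife-like prefactor that the error estimate later multiplies back in.

```python
import numpy as np

def as_cov(err_or_cov):
    """Promote a 1-D error array to a diagonal covariance; pass 2-D input through."""
    arr = np.asarray(err_or_cov, dtype=float)
    return np.diag(arr ** 2) if arr.ndim == 1 else arr

def gaussian_bins(mean, cov, num_bins, prefactor, seed=None):
    """Draw bins around mean with covariance cov / prefactor."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov / prefactor, size=num_bins)

mean = np.array([1.0, 2.0])
cov = np.array([[0.04, 0.01], [0.01, 0.09]])
num_bins = 400
f = num_bins - 1                      # assumed prefactor
bins = gaussian_bins(mean, cov, num_bins, f, seed=7)

# the resampling estimate recovers the input covariance up to sampling noise
diff = bins - mean
cov_est = f * diff.T @ diff / num_bins
```

Fixing the seed makes the drawn bins reproducible, which matters when the same Gaussian input is combined with resampled data in several places.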

Utilities

labc.data.merge(*data_in: DataStats | DataErr) -> DataStats | DataErr

Concatenate multiple objects along the observable axis.

All inputs must be of the same type (DataStats or DataErr). For DataStats the statsType is taken from the first element; for DataErr the covariance is assembled as a block-diagonal matrix (no cross-correlations between inputs).

Parameters:
*data_in : DataStats or DataErr

Objects to concatenate. May also be passed as a single list or np.ndarray of objects.

Returns:
DataStats or DataErr

Concatenated object of the same type as the inputs.
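
The block-diagonal assembly used for DataErr inputs can be written in a few lines of NumPy. The helper merge_gaussian below is hypothetical and works on bare (mean, covariance) arrays rather than on DataErr objects.

```python
import numpy as np

def merge_gaussian(means, covs):
    """Concatenate means and place each covariance on the diagonal of a larger one."""
    mean = np.concatenate([np.atleast_1d(m) for m in means])
    cov = np.zeros((mean.size, mean.size))
    start = 0
    for c in covs:
        c = np.atleast_2d(c)
        stop = start + c.shape[0]
        cov[start:stop, start:stop] = c     # off-diagonal blocks stay zero
        start = stop
    return mean, cov

mean, cov = merge_gaussian(
    [np.array([1.0, 2.0]), np.array([3.0])],
    [np.array([[0.04, 0.01], [0.01, 0.09]]), np.array([[0.25]])],
)
```

The zero off-diagonal blocks encode the statement that the merged inputs are treated as uncorrelated with each other.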

labc.data.zeros(T: int, num_bins_or_statstype: int | StatsBase) -> DataBins | DataStats

Return a DataBins or DataStats filled with zeros.

Parameters:
T : int

Number of observables.

num_bins_or_statstype : int or StatsBase

Number of bins (returns DataBins) or a StatsBase object (returns DataStats).

labc.data.ones(T: int, num_bins_or_statstype: int | StatsBase) -> DataBins | DataStats

Return a DataBins or DataStats filled with ones.

Parameters:
T : int

Number of observables.

num_bins_or_statstype : int or StatsBase

Number of bins (returns DataBins) or a StatsBase object (returns DataStats).

labc.data.empty(T: int, num_bins_or_statstype: int | StatsBase) -> DataBins | DataStats

Return a DataBins or DataStats with uninitialized values.

Parameters:
T : int

Number of observables.

num_bins_or_statstype : int or StatsBase

Number of bins (returns DataBins) or a StatsBase object (returns DataStats).

labc.data.constant(const: float, num_bins_or_statstype: int | StatsBase) -> DataBins | DataStats

Return a length-1 object with all values equal to const.

Parameters:
const : float

Constant value for both mean and bins.

num_bins_or_statstype : int or StatsBase

Passed to ones().

labc.data.gaussian(T: int, statsType: StatsBase, mu: float = 0.0, sigma: float = 1.0) -> DataStats

Generate a DataStats with Gaussian bins, mean=mu, err=sigma.

Bins are drawn from \(\mathcal{N}(\mu,\,(\sigma/\sqrt{f})^2)\) where \(f\) is the statsType prefactor, so that err_func returns sigma regardless of the resampling strategy.

Parameters:
T : int

Number of observables.

statsType : StatsBase

Resampling strategy (jackknife or bootstrap).

mu : float, optional

Mean value. Default is 0.0.

sigma : float, optional

Target error. Default is 1.0.

Returns:
DataStats

Dataset with mean = mu and err = sigma.
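
The rescaling by the prefactor can be checked numerically. The sketch below assumes a jackknife-like convention in which the error is sqrt(f * average over bins of (bin - mean)^2) with f = num_bins - 1; both the value of f and the form of the error estimate are assumptions made for illustration, not statements about labc's err_func.

```python
import numpy as np

rng = np.random.default_rng(3)
num_bins, T = 500, 4
mu, sigma = 0.5, 2.0
f = num_bins - 1                      # assumed prefactor

# bins fluctuate around mu with the reduced width sigma / sqrt(f)
bins = rng.normal(loc=mu, scale=sigma / np.sqrt(f), size=(num_bins, T))
mean = np.full(T, mu)

# multiplying the bin variance by f brings the error back to about sigma
err = np.sqrt(f * np.mean((bins - mean) ** 2, axis=0))
```

With a finite number of bins the recovered error matches sigma only up to sampling fluctuations of relative size about 1/sqrt(2 * num_bins).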

labc.data.uniform(T: int, statsType: StatsBase, low: float = 0.0, high: float = 1.0) -> DataStats

Generate a DataStats with uniform bins and err=(high-low)/sqrt(12).

Bins are drawn from \(\mathrm{Uniform}(\mathrm{low},\mathrm{high})\) and then divided by \(\sqrt{f}\) (the statsType prefactor) so that err_func returns (high-low)/sqrt(12) for any resampling strategy.

Parameters:
T : int

Number of observables.

statsType : StatsBase

Resampling strategy (jackknife or bootstrap).

low : float, optional

Lower bound of the uniform distribution. Default is 0.0.

high : float, optional

Upper bound of the uniform distribution. Default is 1.0.

Returns:
DataStats

Dataset with mean = (low+high)/2 and err ≈ (high-low)/sqrt(12).

labc.data.Z2(T: int, statsType: StatsBase) -> DataStats

Generate a DataStats with \(\mathbb{Z}_2\) bins, mean=0, err≈1.

Each bin entry is drawn from {-1/sqrt(f), +1/sqrt(f)} with equal probability, where \(f\) is the statsType prefactor.

Parameters:
T : int

Number of observables.

statsType : StatsBase

Resampling strategy (jackknife or bootstrap).

Returns:
DataStats

Dataset with mean = 0 and err ≈ 1.
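
For the two-valued case the normalization is easy to verify by hand, since every bin entry has magnitude 1/sqrt(f). The sketch assumes f = num_bins - 1 and an error estimate of the form sqrt(f * average over bins of (bin - mean)^2), measured from the stored mean of zero; these conventions are assumptions, not taken from labc.

```python
import numpy as np

rng = np.random.default_rng(4)
num_bins, T = 200, 5
f = num_bins - 1                      # assumed prefactor

# each entry is -1/sqrt(f) or +1/sqrt(f) with equal probability
bins = rng.choice([-1.0, 1.0], size=(num_bins, T)) / np.sqrt(f)
mean = np.zeros(T)

# every squared deviation from zero equals 1/f, so the error is 1
err = np.sqrt(f * np.mean((bins - mean) ** 2, axis=0))
```

If the deviations were instead measured from the average of the bins, which is only close to zero, the result would be slightly below 1, in line with the approximate sign in the description.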

Decorators

labc.data.dataStats_args(func)

Decorator that lifts a scalar function to accept DataStats arguments.

The decorated function is called once on the means and once per bin, then the results are assembled into a new DataStats. Use dataStats_vectorized_args() when the function can broadcast over the stacked (1+num_bins, T) array for better performance.

labc.data.dataStats_vectorized_args(func)

Decorator that lifts a vectorized function to accept DataStats arguments.

The decorated function is called once on the stacked (1+num_bins, T) data array (mean in row 0, bins in rows 1…). This is faster than dataStats_args() when the underlying function supports broadcasting over the extra leading axis.
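
The difference between the two lifting strategies can be sketched on a bare stacked array. The decorators per_row and vectorized below are hypothetical stand-ins written for this illustration; they operate on NumPy arrays rather than DataStats objects and are not the labc decorators.

```python
import functools
import numpy as np

def per_row(func):
    """Call func once on the mean row and once per bin row, then restack."""
    @functools.wraps(func)
    def wrapper(stacked, *args, **kwargs):
        return np.stack([func(row, *args, **kwargs) for row in stacked])
    return wrapper

def vectorized(func):
    """Call func a single time on the whole (1+num_bins, T) array."""
    @functools.wraps(func)
    def wrapper(stacked, *args, **kwargs):
        return func(stacked, *args, **kwargs)
    return wrapper

def effective_mass(c):
    """log(C(t) / C(t+1)) along the last axis; broadcasts over leading axes."""
    return np.log(c[..., :-1] / c[..., 1:])

t = np.arange(8)
rng = np.random.default_rng(5)
# toy stacked data: 1 + 10 rows of a noisy exponential with T = 8
stacked = np.exp(-0.3 * t) * (1.0 + 0.01 * rng.normal(size=(11, 8)))

slow = per_row(effective_mass)(stacked)      # 11 separate calls
fast = vectorized(effective_mass)(stacked)   # one broadcast call
```

Both routes give the same (1+num_bins, T-1) result here; the vectorized one avoids the Python-level loop, which is where the performance gain mentioned above comes from.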

labc.data.dataStats_func(func)

Decorator for functions that return functions.