Data

Data containers for statistical analysis in lattice QCD.

Provides DataBins, DataStats, and DataErr: containers that store mean values alongside resampled bins and propagate errors transparently through arithmetic operations. Standard NumPy functions work directly on these objects via the __array_ufunc__ and __array_function__ hooks.

The submodule data provides the interface for handling data. Once a StatsType object is initialized, the resampled bins can be stored in a DataStats object and all mathematical operations can be performed directly on such objects. For the hooked NumPy functions, keep in mind that they act on axis=1 of an internal 2D NumPy array with shape=(1+num_bins, T), where ‘num_bins’ is the number of bins, ‘1’ accounts for the mean value, and ‘T’ is the length of the data (typically the time extent of a correlator).

import numpy as np

from labc import stats
from labc import data as dt

num_config = 100
T = 64
statsjack = stats.StatsType.Jack(num_config=num_config)

# 2D array with shape=(num_config, T) containing the raw data
array_raw_in = np.random.normal(size=(num_config, T))
# resample
mean, err, bins = stats.generate_stats(array_raw_in)

# initialize correlator
corr = dt.DataStats(mean, bins, statsjack)
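
To make the (1+num_bins, T) layout concrete, the following NumPy-only sketch builds such a stacked array by hand and shows what a reduction along axis=1 returns. It does not use labc: the names raw, stacked, and summed are illustrative, and the bins are built with the textbook leave-one-out jackknife formula, which is an assumption about what the resampling step produces.

```python
import numpy as np

rng = np.random.default_rng(0)
num_config, T = 100, 64
raw = rng.normal(size=(num_config, T))

# Leave-one-out jackknife (assumed): bin i averages all configs except i.
mean = raw.mean(axis=0)                              # shape (T,)
bins = (raw.sum(axis=0) - raw) / (num_config - 1)    # shape (num_config, T)

# Stacked layout: row 0 is the mean, rows 1.. are the bins.
stacked = np.vstack([mean, bins])                    # shape (1 + num_bins, T)

# Reducing along axis=1 collapses T and keeps one value per row,
# i.e. one result for the mean and one for each bin.
summed = stacked.sum(axis=1)                         # shape (1 + num_bins,)
print(stacked.shape, summed.shape)
```

The same reasoning applies to any hooked reduction: the per-row results can be read as a new mean (row 0) with its own bins (remaining rows).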

DataBins

class labc.data.DataBins(mean: ndarray | float, bins: ndarray, *args, **kwargs)

Container for binned data supporting arithmetic error propagation.

Essentially a thin wrapper around a (1+num_bins, T) array: the first row stores the mean and the remaining rows store the resampled bins. It defines arithmetic operations (+, -, *, /) that act element-wise on both the mean and all bins simultaneously.

Parameters:

mean (np.ndarray or float): Central value(s). A scalar is promoted to a 1-element array.
bins (np.ndarray): Resampled bins, shape (num_bins, T).
*args, **kwargs: Forwarded to _make_class() when constructing derived objects.

Attributes:

mean (np.ndarray): Central value(s), shape (T,).
bins (np.ndarray): Resampled bins, shape (num_bins, T).

Methods

`num_bins`()	Return the number of resampled bins.

num_bins() → int: Return the number of resampled bins.

DataStats

class labc.data.DataStats(mean: np.ndarray | float, bins: np.ndarray, statsType: StatsBase)

Binned data with an associated statistical resampling strategy.

Extends DataBins with error, covariance, and correlation estimates computed via the attached StatsBase resampling object. All three quantities are cached on first access.

Parameters:

mean (np.ndarray or float): Central value(s), shape (T,) or scalar.
bins (np.ndarray): Jackknife or bootstrap bins, shape (num_bins, T).
statsType (StatsBase): Resampling strategy used to compute errors and covariances.

Attributes:

mean (np.ndarray): Central value(s), shape (T,).
bins (np.ndarray): Resampled bins, shape (num_bins, T).
statsType (StatsBase): Resampling strategy attached to this dataset.
err (np.ndarray): Statistical error, shape (T,).
cov (np.ndarray): Covariance matrix, shape (T, T).
corr (np.ndarray): Correlation matrix, shape (T, T).

Methods

`num_bins`()	Return the number of resampled bins.
`print`([num_digits, scientific])	Return a list of compact `mean(err)` strings, one per observable.
`rel_diff`(other)	Element-wise relative difference `(self - other) / self`.
`rel_err`()	Absolute relative error `|err / mean|`, shape `(T,)`.
`save`(file_out, group, *args, **kwargs)	Save mean, error, and bins to an HDF5 file.

property err: ndarray

Statistical error, shape (T,).

Computed by err_func() of the attached statsType and cached after the first call.

rel_err() → ndarray: Absolute relative error |err / mean|, shape (T,).

rel_diff(other: DataStats) → DataStats

Element-wise relative difference (self - other) / self.

Parameters:

other (DataStats): Dataset to compare against. Must have the same T and compatible num_bins.

Returns:

DataStats: Relative difference as a new DataStats.
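
The two formulas above can be sketched on plain arrays. The example assumes the stacked layout (mean in row 0, bins below) and uses hypothetical helper names; it only illustrates the arithmetic, not the labc methods themselves.

```python
import numpy as np

def rel_diff_stacked(a, b):
    """(a - b) / a, applied row-wise to stacked (1 + num_bins, T) arrays."""
    return (a - b) / a

def rel_err(mean, err):
    """Absolute relative error |err / mean|."""
    return np.abs(err / mean)

# Row 0 is the mean, rows 1.. are two bins; T = 2.
a = np.array([[2.0, 4.0], [2.1, 3.9], [1.9, 4.1]])
b = np.array([[1.0, 4.0], [1.0, 4.0], [1.0, 4.0]])

d = rel_diff_stacked(a, b)
print(d[0])                                                   # means: [0.5, 0.0]
print(rel_err(np.array([-2.0, 4.0]), np.array([0.1, 0.2])))   # [0.05, 0.05]
```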

property cov: ndarray

Covariance matrix, shape (T, T).

Computed by cov() of the attached statsType and cached after the first call.

property corr: ndarray

Correlation matrix, shape (T, T).

Computed by corr() of the attached statsType and cached after the first call.
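
For orientation, here is how error, covariance, and correlation could be obtained from jackknife bins with plain NumPy. This is a sketch of the textbook jackknife estimator, with prefactor (N-1)/N; whether the attached statsType uses exactly this form is an assumption, and jackknife_cov is a hypothetical helper.

```python
import numpy as np

def jackknife_cov(mean, bins):
    """Textbook jackknife covariance (assumed form, not taken from labc)."""
    n = bins.shape[0]
    diff = bins - mean                    # shape (num_bins, T)
    return (n - 1) / n * diff.T @ diff    # shape (T, T)

rng = np.random.default_rng(1)
num_config, T = 200, 4
raw = rng.normal(size=(num_config, T))

mean = raw.mean(axis=0)
bins = (raw.sum(axis=0) - raw) / (num_config - 1)   # leave-one-out bins

cov = jackknife_cov(mean, bins)
err = np.sqrt(np.diag(cov))              # shape (T,)
corr = cov / np.outer(err, err)          # unit diagonal by construction
print(err)
```

For leave-one-out bins of a simple average, this err coincides with the usual standard error of the mean, which is a convenient sanity check.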

print(num_digits: int = 2, scientific: bool = False) → list[str]

Return a list of compact mean(err) strings, one per observable.

Parameters:

num_digits (int, optional): Number of significant digits in the error. Default is 2.
scientific (bool, optional): If True, use scientific notation. Default is False.

Returns:

list[str]: One formatted string per element of self.
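
The compact mean(err) notation can be sketched with the standard library alone. The helper below, compact, is hypothetical and follows one common convention (error digits quoted in units of the last shown decimal); it assumes err > 0, and the exact strings produced by print() may differ.

```python
import math

def compact(mean, err, num_digits=2):
    """Format a value and its error as 'mean(err)' (one common convention)."""
    # Decimal position of the leading significant digit of the error.
    exponent = math.floor(math.log10(err))
    # Number of decimals needed to show num_digits digits of the error.
    decimals = max(num_digits - 1 - exponent, 0)
    err_rounded = round(err, decimals)
    if decimals == 0:
        return f"{mean:.0f}({err_rounded:.0f})"
    # Express the error as an integer in units of the last shown decimal.
    err_digits = round(err_rounded * 10**decimals)
    return f"{mean:.{decimals}f}({err_digits})"

values = [(1.23456, 0.0123), (0.5012, 0.0034), (123.4, 12.0)]
print([compact(m, e) for m, e in values])
# ['1.235(12)', '0.5012(34)', '123(12)']
```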

save(file_out: str, group: str, *args, **kwargs) → None

Save mean, error, and bins to an HDF5 file.

Parameters:

file_out (str): Path to the output file. Only .h5 is currently supported.
group (str): HDF5 group name under which the data are stored.
*args, **kwargs: Forwarded to the writer’s add_mean / add_err / add_bins methods.

num_bins() → int: Return the number of resampled bins.

DataErr

class labc.data.DataErr(mean: ndarray | float, err_or_cov: ndarray, *, seed: int | None = None)

Observable with Gaussian uncertainty defined by a mean and covariance.

Stores a central value and a covariance matrix and propagates errors analytically through arithmetic operations. When combined with a DataStats object, the covariance is converted on-the-fly to compatible resampled bins.

Parameters:

mean (np.ndarray or float): Central value(s), shape (T,) or scalar.
err_or_cov (np.ndarray): Either a 1-D error array of shape (T,) (sqrt of diagonal covariance) or a full 2-D covariance matrix of shape (T, T).
seed (int or None, optional): Seed for the random number generator used when sampling bins. Fixing the seed makes bin generation reproducible.

Attributes:

mean (np.ndarray): Central value(s), shape (T,).
err (np.ndarray): Diagonal errors sqrt(diag(cov)), shape (T,).
cov (np.ndarray): Covariance matrix, shape (T, T).
corr (np.ndarray): Correlation matrix, shape (T, T).
seed (int or None): Random seed for bin generation.

Methods

`bins`([num_bins, statsType])	Return sampled bins, shape `(num_bins, T)`.
`to_dataStats`(num_bins, statsType)	Convert to a `DataStats` with num_bins bins.
`print`

property num_bins: int: Default number of bins used when sampling without a target DataStats.

bins(num_bins: int | None = None, statsType: StatsBase | None = None) → np.ndarray

Return sampled bins, shape (num_bins, T).

Parameters:

num_bins (int, optional): Number of bins. Defaults to self.num_bins.
statsType (StatsBase, optional): If provided, bins are structured as jackknife/bootstrap samples via statsType.generate_bins.

Returns:

np.ndarray: Resampled bins, shape (num_bins, T).
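
The idea of turning a mean and covariance into sampled bins, with a seed for reproducibility, can be sketched as follows. The function sample_bins is hypothetical and draws plain Gaussian samples with NumPy's Generator.multivariate_normal; any rescaling that a particular statsType applies so that its error estimate matches the input covariance is deliberately left out.

```python
import numpy as np

def sample_bins(mean, err_or_cov, num_bins, seed=None):
    """Draw Gaussian samples around mean (toy version, no resampling prefactor)."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    err_or_cov = np.asarray(err_or_cov, dtype=float)
    # A 1-D input is read as errors, i.e. the sqrt of a diagonal covariance.
    cov = np.diag(err_or_cov**2) if err_or_cov.ndim == 1 else err_or_cov
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=num_bins)   # (num_bins, T)

mean = np.array([1.0, 2.0, 3.0])
err = np.array([0.1, 0.2, 0.3])

bins_a = sample_bins(mean, err, num_bins=500, seed=42)
bins_b = sample_bins(mean, err, num_bins=500, seed=42)
print(bins_a.shape, np.array_equal(bins_a, bins_b))
```

With a fixed seed the two calls return identical arrays, which is the behaviour the seed parameter is described as providing.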

to_dataStats(num_bins: int, statsType: StatsBase) → DataStats

Convert to a DataStats with num_bins bins. If statsType already defines its own number of bins (statsType.num_bins is not None), num_bins must equal statsType.num_bins.

Parameters:

num_bins (int): Number of bins to generate.
statsType (StatsBase): Resampling strategy for the output DataStats.

Returns:

DataStats: New DataStats with mean and binned representation of this object’s Gaussian uncertainty.

Utilities

labc.data.merge(*data_in: DataStats | DataErr) → DataStats | DataErr

Concatenate multiple objects along the observable axis.

All inputs must be of the same type (DataStats or DataErr). For DataStats the statsType is taken from the first element; for DataErr the covariance is assembled as a block-diagonal matrix (no cross-correlations between inputs).

Parameters:

*data_in (DataStats or DataErr): Objects to concatenate. May also be passed as a single list or np.ndarray of objects.

Returns:

DataStats or DataErr: Concatenated object of the same type as the inputs.
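
The block-diagonal assembly described for DataErr inputs can be illustrated with plain arrays. The helper merge_mean_cov is hypothetical and works on (mean, cov) pairs rather than on labc objects.

```python
import numpy as np

def merge_mean_cov(parts):
    """Concatenate (mean, cov) pairs, assuming no cross-correlations."""
    means = [np.atleast_1d(m) for m, _ in parts]
    sizes = [m.size for m in means]
    total = sum(sizes)
    cov = np.zeros((total, total))
    start = 0
    for (_, c), n in zip(parts, sizes):
        # Each input fills its own diagonal block; off-diagonal blocks stay zero.
        cov[start:start + n, start:start + n] = c
        start += n
    return np.concatenate(means), cov

m1, c1 = np.array([1.0, 2.0]), np.array([[0.04, 0.01], [0.01, 0.09]])
m2, c2 = np.array([5.0]), np.array([[0.25]])

mean, cov = merge_mean_cov([(m1, c1), (m2, c2)])
print(mean)
print(cov)
```

The zero off-diagonal blocks are what "no cross-correlations between inputs" means in practice: the merged observables are treated as statistically independent across inputs.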

labc.data.zeros(T: int, num_bins_or_statstype: int | StatsBase) → DataBins | DataStats

Return a DataBins or DataStats filled with zeros.

Parameters:

T (int): Number of observables.
num_bins_or_statstype (int or StatsBase): Number of bins (returns DataBins) or a StatsBase object (returns DataStats).

labc.data.ones(T: int, num_bins_or_statstype: int | StatsBase) → DataBins | DataStats

Return a DataBins or DataStats filled with ones.

Parameters:

T (int): Number of observables.
num_bins_or_statstype (int or StatsBase): Number of bins (returns DataBins) or a StatsBase object (returns DataStats).

labc.data.empty(T: int, num_bins_or_statstype: int | StatsBase) → DataBins | DataStats

Return a DataBins or DataStats with uninitialized values.

Parameters:

T (int): Number of observables.
num_bins_or_statstype (int or StatsBase): Number of bins (returns DataBins) or a StatsBase object (returns DataStats).

labc.data.constant(const: float, num_bins_or_statstype: int | StatsBase) → DataBins | DataStats

Return a length-1 object with all values equal to const.

Parameters:

const (float): Constant value for both mean and bins.
num_bins_or_statstype (int or StatsBase): Passed to ones().

labc.data.gaussian(T: int, statsType: StatsBase, mu: float = 0.0, sigma: float = 1.0) → DataStats

Generate a DataStats with Gaussian bins, mean=mu, err=sigma.

Bins are drawn from \(\mathcal{N}(\mu,\,(\sigma/\sqrt{f})^2)\) where \(f\) is the statsType prefactor, so that err_func returns sigma regardless of the resampling strategy.

Parameters:

T (int): Number of observables.
statsType (StatsBase): Resampling strategy (jackknife or bootstrap).
mu (float, optional): Mean value. Default is 0.0.
sigma (float, optional): Target error. Default is 1.0.

Returns:

DataStats: Dataset with mean = mu and err = sigma.
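
The role of the prefactor f can be checked numerically. The sketch below assumes a jackknife-style error estimate, err^2 = (N-1)/N * sum_i (bin_i - mean)^2, for which f = N - 1; both the formula and the value of f are assumptions made for illustration and are not read from labc.

```python
import numpy as np

rng = np.random.default_rng(7)
num_bins, T = 2000, 3
mu, sigma = 1.5, 0.2

# Assumed jackknife prefactor: f = N - 1.
f = num_bins - 1

# Bins drawn from N(mu, (sigma / sqrt(f))^2), as described above.
bins = rng.normal(loc=mu, scale=sigma / np.sqrt(f), size=(num_bins, T))
mean = np.full(T, mu)

# Jackknife-style error around the stored mean.
err = np.sqrt((num_bins - 1) / num_bins * np.sum((bins - mean) ** 2, axis=0))
print(err)   # close to sigma, up to sampling noise
```

Shrinking the bin spread by sqrt(f) compensates for the factor that the error estimate multiplies back in, so the recovered error is close to sigma.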

labc.data.uniform(T: int, statsType: StatsBase, low: float = 0.0, high: float = 1.0) → DataStats

Generate a DataStats with uniform bins and err=(high-low)/sqrt(12).

Bins are drawn from \(\mathrm{Uniform}(\mathrm{low},\mathrm{high})\) and then divided by \(\sqrt{f}\) (the statsType prefactor) so that err_func returns (high-low)/sqrt(12) for any resampling strategy.

Parameters:

T (int): Number of observables.
statsType (StatsBase): Resampling strategy (jackknife or bootstrap).
low (float, optional): Lower bound of the uniform distribution. Default is 0.0.
high (float, optional): Upper bound of the uniform distribution. Default is 1.0.

Returns:

DataStats: Dataset with mean = (low+high)/2 and err ≈ (high-low)/sqrt(12).

labc.data.Z2(T: int, statsType: StatsBase) → DataStats

Generate a DataStats with \(\mathbb{Z}_2\) bins, mean=0, err≈1.

Each bin entry is drawn from {-1/sqrt(f), +1/sqrt(f)} with equal probability, where \(f\) is the statsType prefactor.

Parameters:

T (int): Number of observables.
statsType (StatsBase): Resampling strategy (jackknife or bootstrap).

Returns:

DataStats: Dataset with mean = 0 and err ≈ 1.
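
A quick numerical check of the Z2 construction, again assuming a jackknife-style error with prefactor f = N - 1 (an assumption for illustration, not read from labc) and computing the spread around the stored mean of zero:

```python
import numpy as np

rng = np.random.default_rng(3)
num_bins, T = 50, 4

# Assumed jackknife prefactor: f = N - 1.
f = num_bins - 1

# Each entry is -1/sqrt(f) or +1/sqrt(f) with equal probability.
bins = rng.choice([-1.0, 1.0], size=(num_bins, T)) / np.sqrt(f)
mean = np.zeros(T)

# Every squared entry equals 1/f, so the sum over bins is N/f.
err = np.sqrt((num_bins - 1) / num_bins * np.sum((bins - mean) ** 2, axis=0))
print(err)
```

Under these assumptions the error comes out as 1 up to floating-point rounding; an estimator that centres on the average of the bins instead of the stored mean would give a value slightly below 1.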

Decorators

labc.data.dataStats_args(func)

Decorator that lifts a scalar function to accept DataStats arguments.

The decorated function is called once on the means and once per bin, then the results are assembled into a new DataStats. Use dataStats_vectorized_args() when the function can broadcast over the stacked (1+num_bins, T) array for better performance.
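
The "once on the means, once per bin" pattern can be sketched on a plain stacked array. The decorator lift_per_bin and the example function effective_mass are hypothetical; the real decorator operates on DataStats arguments and returns a DataStats, which this toy version does not attempt.

```python
import functools
import numpy as np

def lift_per_bin(func):
    """Call func on the mean row, then once per bin row (toy version)."""
    @functools.wraps(func)
    def wrapper(stacked, *args, **kwargs):
        rows = [np.atleast_1d(func(row, *args, **kwargs)) for row in stacked]
        # Row 0: result on the mean; rows 1..: results on the bins.
        return np.vstack(rows)
    return wrapper

@lift_per_bin
def effective_mass(c):
    # Written for a single 1-D correlator.
    return np.log(c[:-1] / c[1:])

t = np.arange(6)
# Mean row plus two bins: exp(-0.5 t) with slightly different amplitudes.
stacked = np.exp(-0.5 * t) * np.array([[1.0], [1.01], [0.99]])   # shape (3, 6)

meff = effective_mass(stacked)
print(meff.shape)   # (3, 5)
```

The Python-level loop over rows is what makes this approach slower than a vectorized call when num_bins is large.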

labc.data.dataStats_vectorized_args(func)

Decorator that lifts a vectorized function to accept DataStats arguments.

The decorated function is called once on the stacked (1+num_bins, T) data array (mean in row 0, bins in rows 1…). This is faster than dataStats_args() when the underlying function supports broadcasting over the extra leading axis.
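
A vectorized counterpart can be sketched in the same spirit. The decorator lift_vectorized and the function effective_mass are hypothetical names, and the example works on a plain stacked array rather than on DataStats objects; the point is only that the wrapped function is written to broadcast over the leading (1+num_bins) axis.

```python
import functools
import numpy as np

def lift_vectorized(func):
    """Call func once on the whole stacked array (toy version)."""
    @functools.wraps(func)
    def wrapper(stacked, *args, **kwargs):
        return func(np.asarray(stacked), *args, **kwargs)
    return wrapper

@lift_vectorized
def effective_mass(c):
    # The ellipsis lets the slicing act on the last axis for any leading axes.
    return np.log(c[..., :-1] / c[..., 1:])

t = np.arange(6)
# Mean row plus two bins: exp(-0.5 t) with slightly different amplitudes.
stacked = np.exp(-0.5 * t) * np.array([[1.0], [1.01], [0.99]])   # shape (3, 6)

meff = effective_mass(stacked)
print(meff.shape)   # (3, 5)
```

A single call replaces the per-row loop, at the cost of requiring the function to index the last axis explicitly.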

labc.data.dataStats_func(func): Decorator for functions that return functions.