Data
Data containers for statistical analysis in lattice QCD.
Provides DataBins, DataStats, and DataErr:
containers that store mean values alongside resampled bins and propagate
errors transparently through arithmetic operations. Standard NumPy
functions work directly on these objects via the __array_ufunc__ and
__array_function__ hooks.
The data submodule provides the interface for handling data. Once a StatsType object is initialized,
the resampled bins can be stored in a DataStats object, and all mathematical operations can be performed
directly on such objects. Keep in mind that the hooked NumPy functions
act along axis=1 of an internal 2D NumPy array
with shape=(1+num_bins, T),
where 'num_bins' is the number of bins,
'1' accounts for the mean value,
and 'T' is the length of the data (typically the time extent of a correlator).
import numpy as np
from labc import stats
from labc import data as dt
num_config = 100
T = 64
statsjack = stats.StatsType.Jack(num_config=num_config)
# 2D array with shape=(num_config, T) containing the raw data
array_raw_in = np.random.normal(size=(num_config, T))
# resample
mean, err, bins = stats.generate_stats(array_raw_in)
# initialize correlator
corr = dt.DataStats(mean, bins, statsjack)
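The stacked layout described above can be illustrated with plain NumPy, independently of labc. The following is a minimal sketch, assuming leave-one-out jackknife bins; the variable names are illustrative, and the way labc builds its bins internally may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
num_config, T = 100, 64
raw = rng.normal(size=(num_config, T))

# Leave-one-out jackknife: bin i is the average over all configurations except i.
# (Assumption: this is the standard jackknife, not necessarily labc's exact recipe.)
mean = raw.mean(axis=0)                             # shape (T,)
bins = (raw.sum(axis=0) - raw) / (num_config - 1)   # shape (num_bins, T)

# Row 0 holds the mean, rows 1.. hold the bins.
stacked = np.vstack([mean, bins])                   # shape (1 + num_bins, T)

# A reduction along axis=1 collapses T and keeps one value for the mean
# and one value per bin.
summed = stacked.sum(axis=1)                        # shape (1 + num_bins,)
```

Reducing along axis=1 keeps the mean and every bin separate, which is what allows the error of the reduced quantity to be estimated afterwards.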
DataBins
- class labc.data.DataBins(mean: ndarray | float, bins: ndarray, *args, **kwargs)
Container for binned data supporting arithmetic error propagation.
Essentially a thin wrapper around a (1+num_bins, T) array: the first row stores the mean and the remaining rows store the resampled bins. It defines arithmetic operations (+, -, *, /) that act element-wise on both the mean and all bins simultaneously.
- Parameters:
  - mean (np.ndarray or float): Central value(s). A scalar is promoted to a 1-element array.
  - bins (np.ndarray): Resampled bins, shape (num_bins, T).
  - *args, **kwargs: Forwarded to _make_class() when constructing derived objects.
- Attributes:
  - mean (np.ndarray): Central value(s), shape (T,).
  - bins (np.ndarray): Resampled bins, shape (num_bins, T).
Methods
- num_bins(): Return the number of resampled bins.
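To make the wrapper idea concrete, here is a minimal sketch of a container with the same layout that routes arithmetic and NumPy ufuncs through `__array_ufunc__`. The class `MiniBins` is hypothetical and is not labc's implementation; it only assumes the (1+num_bins, T) layout described above and uses NumPy's `NDArrayOperatorsMixin` to derive the operators.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class MiniBins(NDArrayOperatorsMixin):
    """Hypothetical stand-in for a binned container (not labc.data.DataBins)."""

    def __init__(self, mean, bins):
        mean = np.atleast_1d(np.asarray(mean, dtype=float))
        # Row 0: mean, rows 1..: resampled bins.
        self.data = np.vstack([mean, np.asarray(bins, dtype=float)])

    @property
    def mean(self):
        return self.data[0]

    @property
    def bins(self):
        return self.data[1:]

    def num_bins(self):
        return self.data.shape[0] - 1

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain ufunc calls without an explicit `out` are handled here.
        if method != "__call__" or kwargs.get("out") is not None:
            return NotImplemented
        arrays = [x.data if isinstance(x, MiniBins) else x for x in inputs]
        out = ufunc(*arrays, **kwargs)  # acts on the mean and all bins at once
        return MiniBins(out[0], out[1:])


a = MiniBins([1.0, 2.0], [[1.1, 2.1], [0.9, 1.9]])
b = a * a + 1.0   # operators are provided by the mixin via __array_ufunc__
c = np.exp(a)     # NumPy ufuncs work directly on the object
```

Because every operation is applied to the mean row and to each bin row in the same call, the spread of the bins in the result carries the propagated uncertainty.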
DataStats
- class labc.data.DataStats(mean: np.ndarray | float, bins: np.ndarray, statsType: StatsBase)
Binned data with an associated statistical resampling strategy.
Extends DataBins with error, covariance, and correlation estimates computed via the attached StatsBase resampling object. All three quantities are cached on first access.
- Parameters:
  - mean (np.ndarray or float): Central value(s), shape (T,) or scalar.
  - bins (np.ndarray): Jackknife or bootstrap bins, shape (num_bins, T).
  - statsType (StatsBase): Resampling strategy used to compute errors and covariances.
- Attributes:
  - mean (np.ndarray): Central value(s), shape (T,).
  - bins (np.ndarray): Resampled bins, shape (num_bins, T).
  - statsType (StatsBase): Resampling strategy attached to this dataset.
  - err (np.ndarray): Statistical error, shape (T,).
  - cov (np.ndarray): Covariance matrix, shape (T, T).
  - corr (np.ndarray): Correlation matrix, shape (T, T).
Methods
- num_bins(): Return the number of resampled bins.
- print([num_digits, scientific]): Return a list of compact mean(err) strings, one per observable.
- rel_diff(other): Element-wise relative difference (self - other) / self.
- rel_err(): Absolute relative error |err / mean|, shape (T,).
- save(file_out, group, *args, **kwargs): Save mean, error, and bins to an HDF5 file.
- property err: ndarray
Statistical error, shape (T,). Computed by err_func() of the attached statsType and cached after the first call.
- property cov: ndarray
Covariance matrix, shape (T, T). Computed by cov() of the attached statsType and cached after the first call.
- property corr: ndarray
Correlation matrix, shape (T, T). Computed by corr() of the attached statsType and cached after the first call.
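How error, covariance, and correlation relate to the bins can be sketched with plain NumPy. The example below assumes the standard jackknife estimator with deviations taken about the mean; the helper names `jackknife_cov` and `cov_to_corr` are hypothetical, and the formulas used by a given statsType may differ (for instance for bootstrap).

```python
import numpy as np


def jackknife_cov(mean, bins):
    """Standard jackknife covariance (assumed convention, not labc's code)."""
    num_bins = bins.shape[0]
    dev = bins - mean                       # deviations of each bin from the mean
    return (num_bins - 1) / num_bins * dev.T @ dev


def cov_to_corr(cov):
    """Normalize a covariance matrix to a correlation matrix."""
    err = np.sqrt(np.diag(cov))
    return cov / np.outer(err, err)


rng = np.random.default_rng(1)
raw = rng.normal(size=(200, 4))
mean = raw.mean(axis=0)
bins = (raw.sum(axis=0) - raw) / (raw.shape[0] - 1)   # leave-one-out bins

cov = jackknife_cov(mean, bins)     # shape (T, T)
err = np.sqrt(np.diag(cov))         # shape (T,)
corr = cov_to_corr(cov)             # shape (T, T), unit diagonal
```

With leave-one-out bins, the error obtained this way coincides with the usual standard error of the mean, which is a convenient sanity check.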
- print(num_digits: int = 2, scientific: bool = False) list[str]
Return a list of compact mean(err) strings, one per observable.
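The compact mean(err) notation can be illustrated with a small formatter. This is a sketch of one common convention (the error is quoted with num_digits significant digits and the mean is rounded to the same decimal place); the helper `format_mean_err` is hypothetical, and the exact output format of print, including its scientific option, is not reproduced here.

```python
import math


def format_mean_err(mean, err, num_digits=2):
    """Format a value as mean(err); requires err > 0 (hypothetical helper)."""
    exponent = math.floor(math.log10(err))   # decimal exponent of the error
    decimals = num_digits - 1 - exponent     # decimal places to keep
    if decimals > 0:
        # Error digits are written without the leading zeros and decimal point.
        return f"{mean:.{decimals}f}({round(err * 10**decimals)})"
    # Error of order 10 or larger: round both to the same power of ten.
    return f"{round(mean, decimals):.0f}({round(err, decimals):.0f})"


strings = [format_mean_err(m, e) for m, e in [(1.23456, 0.0067), (1234.0, 12.0)]]
```

Applied to an array of means and errors, a list comprehension like the one above yields one string per observable.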
DataErr
- class labc.data.DataErr(mean: ndarray | float, err_or_cov: ndarray, *, seed: int | None = None)
Observable with Gaussian uncertainty defined by a mean and covariance.
Stores a central value and a covariance matrix and propagates errors analytically through arithmetic operations. When combined with a DataStats object, the covariance is converted on-the-fly to compatible resampled bins.
- Parameters:
  - mean (np.ndarray or float): Central value(s), shape (T,) or scalar.
  - err_or_cov (np.ndarray): Either a 1-D error array of shape (T,) (sqrt of diagonal covariance) or a full 2-D covariance matrix of shape (T, T).
  - seed (int or None, optional): Seed for the random number generator used when sampling bins. Fixing the seed makes bin generation reproducible.
- Attributes:
  - mean (np.ndarray): Central value(s), shape (T,).
  - err (np.ndarray): Diagonal errors sqrt(diag(cov)), shape (T,).
  - cov (np.ndarray): Covariance matrix, shape (T, T).
  - corr (np.ndarray): Correlation matrix, shape (T, T).
  - seed (int or None): Random seed for bin generation.
Methods
- bins([num_bins, statsType]): Return sampled bins, shape (num_bins, T).
- to_dataStats(num_bins, statsType): Convert to a DataStats with num_bins bins.
- print
- bins(num_bins: int | None = None, statsType: StatsBase | None = None) np.ndarray
Return sampled bins, shape (num_bins, T).
- Parameters:
  - num_bins (int, optional): Number of bins. Defaults to self.num_bins.
  - statsType (StatsBase, optional): If provided, bins are structured as jackknife/bootstrap samples via statsType.generate_bins.
- Returns:
  - np.ndarray: Resampled bins, shape (num_bins, T).
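The idea of turning a mean and a covariance into reproducible bins can be sketched as follows. This assumes plain multivariate Gaussian sampling with a seeded NumPy generator; the helpers `as_cov` and `sample_bins` are hypothetical, and any rescaling that labc applies to match a particular resampling strategy is not included.

```python
import numpy as np


def as_cov(err_or_cov):
    """Accept a 1-D error array or a 2-D covariance matrix (hypothetical helper)."""
    arr = np.asarray(err_or_cov, dtype=float)
    # A 1-D input is interpreted as sqrt(diag(cov)).
    return np.diag(arr**2) if arr.ndim == 1 else arr


def sample_bins(mean, err_or_cov, num_bins, seed=None):
    """Draw Gaussian bins of shape (num_bins, T) around the mean."""
    rng = np.random.default_rng(seed)
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    return rng.multivariate_normal(mean, as_cov(err_or_cov), size=num_bins)


mean = np.array([1.0, 2.0, 3.0])
err = np.array([0.1, 0.2, 0.3])
bins_a = sample_bins(mean, err, num_bins=50, seed=42)
bins_b = sample_bins(mean, err, num_bins=50, seed=42)   # same seed, same bins
```

Fixing the seed is what makes two calls return identical bins, which matters when the same DataErr object is combined with several DataStats objects.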
Utilities
- labc.data.merge(*data_in: DataStats | DataErr) DataStats | DataErr
Concatenate multiple objects along the observable axis.
All inputs must be of the same type (DataStats or DataErr). For DataStats the statsType is taken from the first element; for DataErr the covariance is assembled as a block-diagonal matrix (no cross-correlations between inputs).
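The block-diagonal assembly mentioned for DataErr can be sketched with plain NumPy. The helper `block_diag_cov` below is hypothetical and only illustrates the structure of the merged covariance, not the actual implementation of merge.

```python
import numpy as np


def block_diag_cov(*covs):
    """Place each covariance on the diagonal; off-diagonal blocks stay zero."""
    sizes = [c.shape[0] for c in covs]
    out = np.zeros((sum(sizes), sum(sizes)))
    start = 0
    for c, n in zip(covs, sizes):
        out[start:start + n, start:start + n] = c
        start += n
    return out


cov1 = np.diag([1.0, 4.0])
cov2 = np.array([[2.0, 0.5], [0.5, 3.0]])
merged = block_diag_cov(cov1, cov2)   # shape (4, 4)
```

The zero off-diagonal blocks encode the statement that the merged inputs are treated as uncorrelated with each other.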
- labc.data.constant(const: float, num_bins_or_statstype: int | StatsBase) DataBins | DataStats
Return a length-1 object with all values equal to const.
- labc.data.gaussian(T: int, statsType: StatsBase, mu: float = 0.0, sigma: float = 1.0) DataStats
Generate a DataStats with Gaussian bins, mean=mu, err=sigma.
Bins are drawn from \(\mathcal{N}(\mu,\,(\sigma/\sqrt{f})^2)\) where \(f\) is the statsType prefactor, so that err_func returns sigma regardless of the resampling strategy.
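The role of the prefactor can be checked numerically. The sketch below assumes a jackknife strategy with prefactor f = num_bins - 1 and the error estimator err² = (f / num_bins) Σ (bin - mean)²; both are assumptions about the convention, and the function names are hypothetical rather than labc's.

```python
import numpy as np


def jackknife_err(mean, bins):
    """Jackknife error with deviations about the mean (assumed convention)."""
    num_bins = bins.shape[0]
    return np.sqrt((num_bins - 1) / num_bins * np.sum((bins - mean) ** 2, axis=0))


def gaussian_bins(T, num_bins, mu=0.0, sigma=1.0, seed=None):
    """Draw bins from N(mu, (sigma / sqrt(f))^2) with f = num_bins - 1 (assumed)."""
    rng = np.random.default_rng(seed)
    f = num_bins - 1
    bins = rng.normal(loc=mu, scale=sigma / np.sqrt(f), size=(num_bins, T))
    mean = np.full(T, mu)   # assumption: the central value is set to mu exactly
    return mean, bins


mean, bins = gaussian_bins(T=8, num_bins=2000, mu=1.0, sigma=0.5, seed=0)
err = jackknife_err(mean, bins)   # close to sigma for every time slice
```

Shrinking the bin spread by sqrt(f) compensates for the factor f that the jackknife estimator multiplies back in, so the recovered error is sigma up to statistical fluctuations.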
- labc.data.uniform(T: int, statsType: StatsBase, low: float = 0.0, high: float = 1.0) DataStats
Generate a DataStats with uniform bins and err=(high-low)/sqrt(12).
Bins are drawn from \(\mathrm{Uniform}(\mathrm{low},\mathrm{high})\) and then divided by \(\sqrt{f}\) (the statsType prefactor) so that err_func returns (high-low)/sqrt(12) for any resampling strategy.
- Returns:
  - DataStats: Dataset with mean = (low+high)/2 and err ≈ (high-low)/sqrt(12).
Decorators
- labc.data.dataStats_args(func)
Decorator that lifts a scalar function to accept DataStats arguments.
The decorated function is called once on the means and once per bin, then the results are assembled into a new DataStats. Use dataStats_vectorized_args() when the function can broadcast over the stacked (1+num_bins, T) array for better performance.
- labc.data.dataStats_vectorized_args(func)
Decorator that lifts a vectorized function to accept DataStats arguments.
The decorated function is called once on the stacked (1+num_bins, T) data array (mean in row 0, bins in rows 1…). This is faster than dataStats_args() when the underlying function supports broadcasting over the extra leading axis.
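The difference between the two lifting strategies can be sketched on a bare stacked array. The decorators `per_row` and `vectorized` below are hypothetical simplifications that operate on a NumPy array instead of a DataStats object; they only illustrate the calling pattern, not labc's implementation.

```python
from functools import wraps

import numpy as np


def per_row(func):
    """Call func once on the mean row and once per bin row, then restack."""
    @wraps(func)
    def wrapper(stacked, *args, **kwargs):
        return np.stack([func(row, *args, **kwargs) for row in stacked])
    return wrapper


def vectorized(func):
    """Call func a single time on the whole (1 + num_bins, T) array."""
    @wraps(func)
    def wrapper(stacked, *args, **kwargs):
        return np.asarray(func(stacked, *args, **kwargs))
    return wrapper


@per_row
def eff_mass_rows(c):
    # Written for a single 1-D correlator of length T.
    return np.log(c[:-1] / c[1:])


@vectorized
def eff_mass_vec(c):
    # Written to broadcast over any leading axes.
    return np.log(c[..., :-1] / c[..., 1:])


rng = np.random.default_rng(2)
t = np.arange(16)
stacked = np.exp(-0.5 * t) * (1.0 + 0.01 * rng.normal(size=(11, 16)))
m_rows = eff_mass_rows(stacked)   # 11 separate calls
m_vec = eff_mass_vec(stacked)     # one call on the full array
```

Both routes give the same result; the vectorized one avoids the Python-level loop over bins, which is where the performance gain comes from.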
- labc.data.dataStats_func(func)
Decorator for functions that return functions.