scores.utils.TensorPCA

Tensor-based PCA.

Usage

scores.utils.TensorPCA()

Supports standard (linear) PCA and an optional Random Fourier Feature (RFF) mapping prior to PCA.

Operation Modes

  • linear: standard linear PCA.
  • rff: apply a Random Fourier Feature mapping before PCA.
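For readers unfamiliar with the mapping, here is a minimal sketch of a Random Fourier Feature map for the RBF kernel exp(-gamma * ||x - y||^2), using the common Rahimi and Recht construction. That TensorPCA uses this exact kernel and scaling is an assumption. The helper `rff_map` and its `seed` argument are hypothetical and not part of the library.

```python
import math
import torch

def rff_map(X, gamma, M, seed=0):
    """Hypothetical helper: map X of shape (N, D) to RFF features of shape (N, M)."""
    g = torch.Generator().manual_seed(seed)
    D = X.shape[1]
    # Frequencies drawn from N(0, 2 * gamma * I), matching exp(-gamma * ||x - y||^2).
    W = torch.randn(D, M, generator=g, dtype=torch.float64) * math.sqrt(2.0 * gamma)
    # Random phases drawn uniformly from [0, 2 * pi).
    b = torch.rand(M, generator=g, dtype=torch.float64) * 2.0 * math.pi
    return math.sqrt(2.0 / M) * torch.cos(X.to(torch.float64) @ W + b)

torch.manual_seed(0)
X = torch.randn(5, 3)
Phi = rff_map(X, gamma=0.5, M=2000)

# Inner products of the features approximate the exact RBF kernel matrix.
K_approx = Phi @ Phi.T
K_exact = torch.exp(-0.5 * torch.cdist(X.double(), X.double()) ** 2)
```

PCA is then run on `Phi` instead of `X`, which is why the preprocessed space has M dimensions in rff mode.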

Mode selection follows the constructor arguments: the RFF branch is inferred only when both gamma and M are provided, unless mode is set explicitly. Supplying only one of these values does not enable the RFF mapping by itself.
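The selection rule above can be sketched as a small standalone function. `infer_mode` is a hypothetical helper, not the library's code, and it assumes that a lone gamma or M falls back to linear mode rather than raising an error.

```python
def infer_mode(gamma=None, M=None, mode=None):
    """Hypothetical helper mirroring the documented mode-selection rule."""
    if mode is not None:
        # An explicit mode always wins over inference.
        return mode
    # RFF is inferred only when both gamma and M are provided.
    return "rff" if gamma is not None and M is not None else "linear"

print(infer_mode())                                  # linear
print(infer_mode(gamma=0.1))                         # linear
print(infer_mode(gamma=0.1, M=256))                  # rff
print(infer_mode(gamma=0.1, M=256, mode="linear"))   # linear
```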

Saving / Loading

Persist the module with the standard PyTorch state-dict API. The module registers persistent buffers for PCA and RFF state:

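# tpca is a fitted TensorPCA instance; save its buffers (PCA and RFF state).
torch.save(tpca.state_dict(), "tpca.pt")
# Rebuild with the same constructor arguments, then restore the saved state.
tpca2 = TensorPCA(n_components=..., gamma=..., M=..., mode=...)
sd = torch.load("tpca.pt")
tpca2.load_state_dict(sd)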

The custom _load_from_state_dict accepts placeholder or differently shaped tensors and sets or registers buffers as needed, so loading into a fresh instance does not raise size-mismatch errors.
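One way such a loader can work is sketched below on a toy module. `BufferHolder` and its single `mean` buffer are hypothetical stand-ins, and the real TensorPCA implementation may differ. The sketch relies only on the standard `nn.Module._load_from_state_dict` hook.

```python
import torch
from torch import nn

class BufferHolder(nn.Module):
    """Hypothetical module with one buffer whose shape is unknown until fit."""

    def __init__(self):
        super().__init__()
        # Empty placeholder; a fitted instance would hold a (D,) mean here.
        self.register_buffer("mean", torch.empty(0, dtype=torch.float64))

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        key = prefix + "mean"
        if key in state_dict:
            # Resize the placeholder so the default loader sees matching shapes.
            self.mean = torch.empty_like(state_dict[key])
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)

fitted = BufferHolder()
fitted.mean = torch.arange(4, dtype=torch.float64)  # stand-in for fitted state
fresh = BufferHolder()
fresh.load_state_dict(fitted.state_dict())
print(fresh.mean)  # tensor([0., 1., 2., 3.], dtype=torch.float64)
```

Without the override, loading a (4,) tensor into the empty placeholder would fail the default shape check.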

Notes

PCA internals are stored in float64 for numerical fidelity. During preprocessing, inputs are cast to float64 to match the stored mean.

See https://arxiv.org/pdf/2505.15284 for motivation behind RFF-PCA.

Examples

import torch
from seapig.scores.utils import TensorPCA
pca = TensorPCA(n_components=0.90)
X = torch.randn(100, 32)
pca.fit(X)
Z = pca.transform(X)        # projected to lower dimension
X_rec, err = pca.reconstruct(X)  # reconstruction and per-sample L2 error
print(err)
tensor([2.0058, 1.9291, 1.8213, 1.7371, 2.4389, 1.7318, 1.8956, 1.6139, 2.7999,
        1.3375, 1.8060, 1.5045, 1.0151, 1.5380, 1.7393, 2.3805, 1.9327, 1.2392,
        1.8813, 1.4295, 1.4910, 1.8490, 1.7813, 2.3088, 2.0619, 1.5390, 1.9259,
        1.9640, 1.4325, 1.7071, 1.9314, 2.0367, 1.1526, 2.0713, 0.5289, 1.2012,
        2.3030, 1.5801, 1.4331, 0.9826, 1.5877, 2.2835, 1.0568, 1.1616, 2.0296,
        2.1218, 2.2270, 1.6374, 1.6830, 2.2425, 1.7872, 2.3373, 1.6275, 1.1028,
        1.5290, 1.5754, 1.7815, 1.5694, 2.1446, 1.7456, 1.5061, 1.5564, 2.0463,
        1.7947, 1.5016, 1.9780, 1.7616, 1.9326, 2.1084, 0.9584, 1.8802, 2.0631,
        1.8875, 1.5057, 1.6701, 2.1497, 1.4630, 0.8322, 1.4700, 1.5116, 1.8499,
        1.5569, 1.9381, 1.4179, 1.2711, 1.7404, 1.1604, 1.6745, 2.1154, 2.3305,
        2.0704, 0.8755, 1.9547, 1.9037, 1.9707, 1.8827, 0.9674, 1.8882, 1.4635,
        1.7248], dtype=torch.float64)

Methods

Name                 Description
__init__()           Initialise TensorPCA.
finalize()           Finalize partial fit: compute covariance SVD and set PCA params.
fit()                Fit PCA on the input data X.
fit_transform()      Fit PCA on X and return the projected components.
inverse_transform()  Reconstruct samples from principal component scores.
partial_fit()        Process a single batch for incremental PCA.
reconstruct()        Reconstruct an input and return the L2 reconstruction error.
reset_partial()      Reset internal accumulators used for partial fitting.
transform()          Project input samples onto the retained principal components.

__init__()

Initialise TensorPCA.

Usage

__init__(n_components=0.9, gamma=None, M=None, mode=None)
Parameters
n_components: int or float = 0.90

If an int, the exact number of principal components to retain (must be > 0). If a float in (0, 1], the minimum cumulative explained variance to retain. Defaults to 0.90 (90% variance).

gamma: float or None = None

Bandwidth parameter for the RFF kernel. If provided together with M, RFF mode is enabled automatically.

M: int or None = None

Number of RFF random features (must be > input dimensionality D). If provided together with gamma, RFF mode is enabled automatically.

mode: (linear, rff) or None = None
Explicit mode override. When None (the default), the mode is inferred from gamma and M.
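To make the two meanings of n_components concrete, the sketch below picks the number of retained components from a list of eigenvalues sorted in descending order. `select_num_components` is a hypothetical helper, not the library's implementation. Clamping an int to the number of available components is an assumption.

```python
from itertools import accumulate

def select_num_components(explained_variance, n_components=0.90):
    """Hypothetical helper: number of components to keep from sorted eigenvalues."""
    if isinstance(n_components, int):
        if n_components <= 0:
            raise ValueError("an int n_components must be > 0")
        return min(n_components, len(explained_variance))
    if not 0.0 < n_components <= 1.0:
        raise ValueError("a float n_components must lie in (0, 1]")
    total = sum(explained_variance)
    # Keep the smallest q whose cumulative variance ratio reaches the threshold.
    for q, cum in enumerate(accumulate(explained_variance), start=1):
        if cum / total >= n_components:
            return q
    return len(explained_variance)

eigvals = [4.0, 3.0, 2.0, 1.0]                # descending eigenvalues, total 10
print(select_num_components(eigvals, 0.90))   # 3, since (4 + 3 + 2) / 10 = 0.9
print(select_num_components(eigvals, 2))      # 2
```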

finalize()

Finalize partial fit: compute covariance SVD and set PCA params.

Usage

finalize()

This method computes the overall mean and centred covariance from accumulated sums and performs SVD to extract principal components.
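As a worked form of that computation, suppose the accumulated statistics over all N samples are the sum s and the sum of outer products S. The normalisation by N - 1 (rather than N) is an assumption. The source does not state which one is used.

```latex
\begin{aligned}
s &= \sum_{i=1}^{N} x_i, \qquad S = \sum_{i=1}^{N} x_i x_i^{\top}, \\
\mu &= \frac{s}{N}, \\
C &= \frac{1}{N-1}\left(S - N\,\mu\mu^{\top}\right), \\
C &= U \Lambda U^{\top}.
\end{aligned}
```

The columns of U are the principal directions and the diagonal of Λ holds the explained variances, from which the retained components are chosen according to n_components.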


fit()

Fit PCA on the input data X.

Usage

fit(X, Y=None)

Convenience method that runs a single-batch partial_fit followed by finalize. For large datasets or streaming data, use the incremental partial_fit / finalize interface instead.

Parameters
X: torch.Tensor

Input data of shape (N, D).

Y: None = None
Ignored. Present for API compatibility.

fit_transform()

Fit PCA on X and return the projected components.

Usage

fit_transform(X, Y=None)
Parameters
X: torch.Tensor

Input data of shape (N, D).

Y: None = None
Ignored. Present for API compatibility.
Returns
torch.Tensor
Projected data of shape (N, q) where q is the number of retained components.

inverse_transform()

Reconstruct samples from principal component scores.

Usage

inverse_transform(Z)
Parameters
Z: torch.Tensor
Component scores of shape (N, q).
Returns
torch.Tensor
Reconstructed samples in the preprocessed space: shape (N, D) in linear mode, or (N, M) in RFF mode.

partial_fit()

Process a single batch for incremental PCA.

Usage

partial_fit(X)

This accumulates sufficient statistics (sum of samples and sum of outer products) which are later finalised in finalize() to produce the PCA decomposition.
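The accumulation can be illustrated without the library. The following sketch keeps a running count, sum, and sum of outer products over batches, then recovers the mean and covariance at the end. Variable names are illustrative, and dividing by n - 1 is an assumption about the normalisation.

```python
import torch

torch.manual_seed(0)
X = torch.randn(200, 6, dtype=torch.float64)

# Accumulators for the running sufficient statistics.
n = 0
sum_x = torch.zeros(6, dtype=torch.float64)
sum_outer = torch.zeros(6, 6, dtype=torch.float64)

for batch in X.split(50):           # four batches of 50 samples
    n += batch.shape[0]
    sum_x += batch.sum(dim=0)
    sum_outer += batch.T @ batch    # sum of outer products x x^T

# Finalisation: mean and centred covariance from the accumulated sums.
mean = sum_x / n
cov = (sum_outer - n * torch.outer(mean, mean)) / (n - 1)

# SVD of the symmetric covariance gives components and explained variances.
U, S, _ = torch.linalg.svd(cov)
```

Because only the sums are stored, memory use depends on D (or M in rff mode), not on the number of samples seen.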


reconstruct()

Reconstruct an input and return the L2 reconstruction error.

Usage

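reconstruct(X)

Returns the reconstruction of X together with the per-sample L2 reconstruction error, as in the class-level example (X_rec, err = pca.reconstruct(X)).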
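A minimal sketch of what a per-sample L2 reconstruction error means for linear PCA, written in plain PyTorch. It is not the library's code. It assumes the mean is added back after back-projection and does not cover rff mode, where the error would presumably be measured in the M-dimensional feature space.

```python
import torch

torch.manual_seed(0)
X = torch.randn(100, 8, dtype=torch.float64)

# Fit: mean and principal axes from the centred data (linear mode only).
mean = X.mean(dim=0)
_, _, Vh = torch.linalg.svd(X - mean, full_matrices=False)
V_q = Vh[:3].T                              # keep q = 3 components, shape (D, q)

Z = (X - mean) @ V_q                        # projection, shape (N, q)
X_rec = Z @ V_q.T + mean                    # back-projection, shape (N, D)
err = torch.linalg.norm(X - X_rec, dim=1)   # per-sample L2 error, shape (N,)
```

Samples that lie far from the retained subspace get a large error, which is what makes this quantity usable as a score.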

reset_partial()

Reset internal accumulators used for partial fitting.

Usage

reset_partial()

transform()

Project input samples onto the retained principal components.

Usage

transform(X)
Parameters
X: torch.Tensor
Input data of shape (N, D).
Returns
torch.Tensor
Projected data of shape (N, q) where q is the number of retained components.