scores.PCAScore

Returns confidence scores based on PCA reconstruction errors.

Usage

scores.PCAScore()

Computes confidence scores from PCA reconstruction errors. Low scores indicate samples that are reconstructed well from the principal components (likely inliers); high scores indicate samples with large reconstruction errors (likely outliers).

See https://arxiv.org/pdf/2402.02949v3 for the method description.
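
The idea of scoring by reconstruction error can be illustrated with a minimal sketch. This is not seapig's implementation (the defaults above point to an RFF-based TensorPCA); it is ordinary linear PCA with a single component, written in plain Python so it runs without dependencies. The helper names fit_first_component and reconstruction_error are hypothetical and exist only for this example.

```python
import math
import random


def fit_first_component(rows, iters=200):
    """Return (mean, unit vector) for the leading principal component of rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def reconstruction_error(x, mean, v):
    """Squared distance between x and its projection onto the fitted component."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    t = sum(ci * vi for ci, vi in zip(c, v))  # coordinate along the component
    return sum((ci - t * vi) ** 2 for ci, vi in zip(c, v))


# Toy training data scattered tightly around the line y = 2x.
rng = random.Random(0)
train = []
for _ in range(200):
    t = rng.uniform(-1.0, 1.0)
    train.append([t, 2.0 * t + rng.gauss(0.0, 0.01)])
mean, v = fit_first_component(train)

inlier_error = reconstruction_error([0.5, 1.0], mean, v)    # lies on the line
outlier_error = reconstruction_error([0.5, -1.0], mean, v)  # far off the line
```

A point that follows the training structure gets a small error, while a point away from it gets a large one, which matches the low-score/high-score reading described above.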

Parameters

pca: TensorPCA = TensorPCA(n_components=0.5, gamma=3.0, M=4096)

PCA configuration to use. Defaults to TensorPCA(n_components=0.5, gamma=3.0, M=4096), i.e. RFF-PCA retaining 50% of the explained variance.

Examples

import torch
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA

# Configure the PCA with n_components=0.90 instead of the default 0.5
score = PCAScore(pca=TensorPCA(n_components=0.90))
# Fit on training embeddings (X) and calibration embeddings (Y)
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))

Attributes

Name Description
ident Identifier string (str) for this score; set to "pca".

ident

ident="pca"

Methods

Name Description
fit() Train a confidence score based on sample embeddings.

fit()

Train a confidence score based on sample embeddings.

Usage

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

  1. Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
  2. On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.
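
The either/or rule above could be checked along the following lines. This is a hedged sketch, not the library's actual validation code: check_fit_args is a hypothetical helper, its error messages are invented for illustration, and the "train" key requirement is taken from the loaders parameter description.

```python
def check_fit_args(X=None, Y=None, model=None, loaders=None):
    """Hypothetical helper: return which fit() mode the arguments select."""
    has_embeddings = X is not None or Y is not None
    has_extraction = model is not None or loaders is not None
    # The two modes are mutually exclusive.
    if has_embeddings and has_extraction:
        raise ValueError("Pass either X/Y or model+loaders, not both.")
    if has_extraction:
        # On-the-fly extraction needs both a model and its loaders.
        if model is None or loaders is None:
            raise ValueError("model and loaders must be supplied together.")
        if "train" not in loaders:
            raise ValueError('loaders must contain a "train" key.')
        return "on-the-fly"
    # Precomputed mode: X is mandatory, Y stays optional.
    if X is None:
        raise ValueError("X is required when model and loaders are not used.")
    return "precomputed"
```

Checking for `is not None` rather than truthiness matters here, because evaluating a multi-element tensor as a boolean raises an error in PyTorch.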

# Mode 1: Precomputed embeddings
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters

X: torch.Tensor | None = None

A torch.Tensor with training sample embeddings. Required when not using model and loaders.

Y: torch.Tensor | None = None

A torch.Tensor with calibration sample embeddings. Optional.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None

A dict of DataLoader objects. Required key: "train". Optional key: "val". Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loaders.

q: bool | float = False

A float or bool indicating whether outliers from the training distribution should be filtered out before fitting. Defaults to False.

See Also