scores.PCAScore

Returns uncertainty scores based on PCA reconstruction errors.

Usage

scores.PCAScore()

Computes uncertainty scores from PCA reconstruction errors. Low scores indicate samples that are well reconstructed from the principal components (likely inliers); high scores indicate samples with large reconstruction errors (likely outliers).

See https://arxiv.org/pdf/2402.02949v3 for the method description.
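
The idea of a reconstruction-error score can be illustrated with a minimal sketch. It uses plain linear PCA in NumPy rather than seapig's TensorPCA, and the helper names fit_pca and reconstruction_error are hypothetical, chosen only for this illustration. It shows why a sample near the training subspace gets a low score and a sample far from it gets a high one.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit linear PCA; return the data mean and the top principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(X, mean, components):
    """Euclidean distance between each sample and its PCA reconstruction."""
    centered = X - mean
    projected = centered @ components.T      # coordinates in the subspace
    reconstructed = projected @ components   # mapped back to the input space
    return np.linalg.norm(centered - reconstructed, axis=1)

rng = np.random.default_rng(0)
# Training data that lies (almost) in a 2-D subspace of a 5-D space.
basis = rng.normal(size=(2, 5))
train = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 5))
mean, components = fit_pca(train, n_components=2)

inlier = rng.normal(size=(1, 2)) @ basis     # lies on the training subspace
outlier = 3.0 * rng.normal(size=(1, 5))      # generic point off the subspace

inlier_score = reconstruction_error(inlier, mean, components)[0]
outlier_score = reconstruction_error(outlier, mean, components)[0]
print(inlier_score < outlier_score)
```

The default TensorPCA is described as RFF-PCA, so the actual scores are presumably computed in a transformed feature space; the sketch only conveys the inlier/outlier intuition.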

Parameters

pca: TensorPCA = TensorPCA(n_components=0.5, gamma=3.0, M=4096)
TensorPCA object to use. Defaults to TensorPCA(n_components=0.5, gamma=3.0, M=4096), i.e. RFF-PCA retaining 50% of the explained variance.
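
A fractional n_components is described as the share of explained variance to retain. As a rough sketch of how such a fraction can be turned into a number of components, the hypothetical helper below picks the smallest k whose leading eigenvalues reach the requested share. Whether TensorPCA uses exactly this rule (for example "at least" versus "more than") is an assumption.

```python
def components_for_variance(eigenvalues, fraction):
    """Smallest k whose k largest eigenvalues explain at least `fraction` of the variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= fraction:
            return k
    return len(eigenvalues)

eigenvalues = [4.0, 3.0, 2.0, 1.0]  # illustrative covariance eigenvalues
k_half = components_for_variance(eigenvalues, 0.5)  # 4/10 is too little, 7/10 suffices
k_most = components_for_variance(eigenvalues, 0.8)  # 7/10 is too little, 9/10 suffices
print(k_half, k_most)
```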

Examples

import torch
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA

score = PCAScore(pca=TensorPCA(n_components=0.90))
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
print(result)
# Example output (exact values vary between runs because the inputs are random):
# {'score': tensor([4.6312, 3.3042, 4.7449, 4.9497, 3.8798, 4.2048, 3.8794, 4.1438, 3.6687,
#         5.4851], dtype=torch.float64), 'selected': tensor([ True,  True,  True,  True,  True,  True,  True,  True,  True, False])}

Methods

Name             Description
fit()            Train an uncertainty score based on sample embeddings.
score()          Compute uncertainty scores for query samples.
select()         Select samples for prediction based on their uncertainty score.
set_threshold()  Set a threshold based on a quantile of the available uncertainty scores.
get_threshold()  Get the current threshold value.
plot()           Plot densities for uncertainty scores.
plot_embs()      Visualize training, validation, and query embeddings in 2D.

fit()

Train an uncertainty score based on sample embeddings.

Usage

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

  1. Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
  2. On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.
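
The either/or rule can be sketched as a small argument check. This is an illustration only: resolve_fit_mode is a hypothetical helper, and the error messages and exact checks are assumptions, not seapig's actual validation code.

```python
def resolve_fit_mode(X=None, Y=None, model=None, loaders=None):
    """Return the usage mode selected by the arguments, or raise ValueError."""
    has_embeddings = X is not None or Y is not None
    has_model = model is not None or loaders is not None
    if has_embeddings and has_model:
        raise ValueError("Pass either X/Y or model+loaders, not both.")
    if has_embeddings:
        if X is None:
            raise ValueError("Y was given without X; training embeddings are required.")
        return "precomputed"
    if has_model:
        if model is None or loaders is None or "train" not in loaders:
            raise ValueError('model requires loaders with a "train" key.')
        return "on_the_fly"
    raise ValueError("Pass either X or model+loaders.")

print(resolve_fit_mode(X=[[0.0, 1.0]]))                         # precomputed
print(resolve_fit_mode(model=object(), loaders={"train": []}))  # on_the_fly
```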

# Mode 1: Precomputed embeddings
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters
X: torch.Tensor | None = None

A torch.Tensor with training sample embeddings. Required when not using model and loaders.

Y: torch.Tensor | None = None

A torch.Tensor with calibration sample embeddings. Optional.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None

A dict of DataLoader objects. The "train" key is required; the "val" key is optional. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loaders.

q: bool | float = False
A float or bool controlling whether outliers from the training distribution are filtered before fitting. Defaults to False (no filtering).

score()

Compute uncertainty scores for query samples.

Usage

score(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

# Mode 1: Precomputed embeddings
from seapig.scores import PCAScore
my_score = PCAScore()
my_score.fit(X=train_embeddings)  # fit on training embeddings before scoring
scores = my_score.score(X=test_embeddings)

# Mode 2: On-the-fly extraction
my_score = PCAScore()
my_score.fit(model=model, loaders={"train": train_dl})
scores = my_score.score(model=model, loader=test_dl)

Parameters
X: torch.Tensor | None = None

A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
torch.Tensor
1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers, high values indicate likely outliers.

select()

Select samples for prediction based on their uncertainty score.

Usage

select(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).
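
The selection rule can be sketched in a few lines of plain Python. The scores are copied from the example output at the top of this page; the threshold of 5.0 is a made-up value for illustration, select_by_threshold is a hypothetical helper, and treating a score exactly equal to the threshold as excluded is an assumption.

```python
def select_by_threshold(scores, threshold):
    """Mark each sample as selected when its score is below the threshold."""
    return {
        "score": list(scores),
        "selected": [s < threshold for s in scores],
    }

scores = [4.6312, 3.3042, 4.7449, 4.9497, 3.8798,
          4.2048, 3.8794, 4.1438, 3.6687, 5.4851]
result = select_by_threshold(scores, threshold=5.0)  # hypothetical threshold

# Keep only the samples that passed the check.
kept = [s for s, keep in zip(result["score"], result["selected"]) if keep]
print(result["selected"])
print(len(kept))
```

The returned mask can be used in the same way to filter the corresponding inputs or predictions.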

# Mode 1: Precomputed embeddings
from seapig.scores import PCAScore
my_score = PCAScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)

# Mode 2: On-the-fly extraction
my_score = PCAScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)

Parameters
X: torch.Tensor | None = None

A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
dict[str, torch.Tensor]
A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).

set_threshold()

Set a threshold based on a quantile of the available uncertainty scores.

Usage

set_threshold(q=0.99)

Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.

Parameters
q: float = 0.99
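Quantile in (0, 1) used to determine the threshold. Defaults to 0.99 (i.e., about 1% of the reference samples, calibration if provided and training otherwise, score above the threshold and would be rejected as outliers).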
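
A minimal sketch of quantile-based thresholding, assuming a linearly interpolated quantile (the default convention of numpy.quantile and torch.quantile). The helper names are hypothetical, and whether seapig uses this exact interpolation is an assumption. It also shows the fallback from calibration scores to training scores.

```python
import math

def quantile(values, q):
    """Linearly interpolated quantile of a non-empty sequence, for q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

def threshold_from_scores(train_scores, calib_scores=None, q=0.99):
    """Use calibration scores when available, otherwise fall back to training scores."""
    reference = calib_scores if calib_scores is not None else train_scores
    return quantile(reference, q)

train_scores = [float(i) for i in range(101)]        # 0, 1, ..., 100
calib_scores = [float(i) for i in range(0, 202, 2)]  # 0, 2, ..., 200

t_train = threshold_from_scores(train_scores, q=0.99)
t_calib = threshold_from_scores(train_scores, calib_scores, q=0.99)
print(t_train, t_calib)
```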

get_threshold()

Get the current threshold value.

Usage

get_threshold()

plot()

Plot densities for uncertainty scores.

Usage

plot(query_scores=None, bins=100)

By default, this method plots densities for the uncertainty scores. Optionally, it can also plot densities for query_scores.

Parameters
query_scores: torch.Tensor | None = None

A torch.Tensor representing query scores to include in the plot. Defaults to None.

bins: int = 100
An int indicating the number of bins to use for density estimation. Defaults to 100.
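
To make the role of bins concrete, here is a sketch of a histogram-based density estimate in NumPy with shared bin edges for reference and query scores. That plot() is built on histograms is an assumption suggested by the bins parameter, and the score values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
reference_scores = rng.normal(loc=4.0, scale=0.5, size=1000)  # synthetic scores
query_scores = rng.normal(loc=5.0, scale=0.5, size=200)       # synthetic scores

bins = 100
# Shared edges make the two densities directly comparable.
edges = np.histogram_bin_edges(
    np.concatenate([reference_scores, query_scores]), bins=bins
)
ref_density, _ = np.histogram(reference_scores, bins=edges, density=True)
query_density, _ = np.histogram(query_scores, bins=edges, density=True)

# With density=True each histogram integrates to 1 over the bin widths.
widths = np.diff(edges)
print(np.sum(ref_density * widths), np.sum(query_density * widths))
```

More bins give a finer but noisier estimate; fewer bins smooth the curve at the cost of detail.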

plot_embs()

Visualize training, validation, and query embeddings in 2D.

Usage

plot_embs(query_embeddings, method="tsne", method_args=None)

Parameters
query_embeddings: torch.Tensor | None

Embeddings of query samples to visualize.

method: (tsne, umap) = "tsne"

Dimensionality reduction method, either "tsne" or "umap". Defaults to "tsne".

method_args: dict[str, Any] | None = None
A dictionary of arguments to pass to the dimensionality reduction method. Defaults to None.