Returns uncertainty scores based on PCA reconstruction errors.
Computes reconstruction-error-based uncertainty scores. Low scores indicate samples that are well reconstructed from the principal components (likely inliers); high scores indicate samples with large reconstruction errors (likely outliers).
See https://arxiv.org/pdf/2402.02949v3 for the method description.
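As a rough illustration of the scoring idea, and not seapig's implementation, the sketch below fits plain linear PCA with torch.linalg.svd and scores each query by its squared reconstruction error. The function name pca_reconstruction_scores and the toy data are hypothetical; the default TensorPCA is an RFF-PCA, so it works in a random-feature space rather than directly on the raw embeddings.

```python
import torch


def pca_reconstruction_scores(train: torch.Tensor, query: torch.Tensor, k: int) -> torch.Tensor:
    """Squared reconstruction error of `query` under a rank-k linear PCA fitted on `train`.

    Hypothetical helper for illustration; not seapig's TensorPCA.
    """
    mean = train.mean(dim=0, keepdim=True)
    # Principal directions are the right singular vectors of the centred training data.
    _, _, vh = torch.linalg.svd(train - mean, full_matrices=False)
    components = vh[:k]  # shape (k, D)
    centred = query - mean
    # Project onto the principal subspace, then map back to the input space.
    recon = centred @ components.T @ components
    # Per-sample squared L2 error: small for inliers, large for outliers.
    return ((centred - recon) ** 2).sum(dim=1)


torch.manual_seed(0)
# Toy training data lying close to a 2-D subspace of R^8.
basis = torch.randn(2, 8)
train = torch.randn(500, 2) @ basis + 0.01 * torch.randn(500, 8)
inliers = torch.randn(5, 2) @ basis  # on the subspace
outliers = 3.0 * torch.randn(5, 8)  # generic directions
scores = pca_reconstruction_scores(train, torch.cat([inliers, outliers]), k=2)
print(scores)
```

The first five scores (on-subspace points) come out far smaller than the last five, which is the inlier/outlier ordering described above.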
Parameters
pca: TensorPCA = TensorPCA(n_components=0.5, gamma=3.0, M=4096)
-
TensorPCA object to use. Defaults to TensorPCA(n_components=0.50, gamma=3.0, M=4096), an RFF-PCA that keeps enough components to explain 50% of the variance.
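The sketch below shows one plausible reading of these defaults. It assumes, without confirmation from this page, that gamma is the bandwidth of an RBF kernel exp(-gamma * ||x - y||^2), that M is the number of random Fourier features, and that a float n_components means "keep the fewest components that explain at least this fraction of the variance" (the scikit-learn convention). The helpers rff_features and n_components_for_variance are hypothetical.

```python
import math

import torch


def rff_features(x: torch.Tensor, gamma: float, m: int, seed: int = 0) -> torch.Tensor:
    """Random Fourier features whose inner products approximate exp(-gamma * ||x - y||^2)."""
    g = torch.Generator().manual_seed(seed)
    d = x.shape[1]
    # Frequencies sampled from the RBF kernel's spectral density, N(0, 2 * gamma * I).
    w = torch.randn(d, m, generator=g, dtype=x.dtype) * math.sqrt(2.0 * gamma)
    # Random phases in [0, 2*pi).
    b = torch.rand(m, generator=g, dtype=x.dtype) * 2.0 * math.pi
    return math.sqrt(2.0 / m) * torch.cos(x @ w + b)


def n_components_for_variance(z: torch.Tensor, fraction: float) -> int:
    """Smallest k such that the top-k principal components explain >= `fraction` of the variance."""
    s = torch.linalg.svdvals(z - z.mean(dim=0, keepdim=True))
    ratio = torch.cumsum(s**2, dim=0) / (s**2).sum()
    # First index where the cumulative ratio reaches `fraction`, converted to a count.
    k = int(torch.searchsorted(ratio, torch.tensor([fraction], dtype=ratio.dtype)).item()) + 1
    return min(k, s.numel())


torch.manual_seed(0)
x = 0.1 * torch.randn(200, 16)  # toy embeddings
z = rff_features(x, gamma=3.0, m=4096)
k = n_components_for_variance(z, 0.5)
print(z.shape, k)
```

Under these assumptions, PCA is run on z (shape (N, M)) instead of on x, which lets a linear method capture non-linear structure in the embeddings.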
Examples
import torch
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA
score = PCAScore(pca=TensorPCA(n_components=0.90))
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
print(result)
{'score': tensor([4.6312, 3.3042, 4.7449, 4.9497, 3.8798, 4.2048, 3.8794, 4.1438, 3.6687,
5.4851], dtype=torch.float64), 'selected': tensor([ True, True, True, True, True, True, True, True, True, False])}
Methods
| Name | Description |
| --- | --- |
| fit() | Train an uncertainty score based on sample embeddings. |
| score() | Compute uncertainty scores for query samples. |
| select() | Select samples for prediction based on their uncertainty score. |
| set_threshold() | Set a threshold based on a quantile of the available uncertainty scores. |
| get_threshold() | Get the current threshold value. |
| plot() | Plot densities for uncertainty scores. |
| plot_embs() | Visualize training, validation, and query embeddings in 2D. |
fit()
Train an uncertainty score based on sample embeddings.
fit(
X=None,
Y=None,
model=None,
loaders=None,
outdir=None,
prefix=None,
q=False
)
This method supports two usage modes:
- Precomputed embeddings: supply training embeddings via X and, optionally, calibration embeddings via Y.
- On-the-fly extraction: supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.
You must use either embeddings (X/Y) or model + loaders, but not both.
# Mode 1: Precomputed embeddings
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(X=train_embs, Y=val_embs)
# Mode 2: On-the-fly extraction
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
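The extraction step of the on-the-fly mode can be pictured as below. This is a sketch under assumptions: extract_embeddings and ToyEncoder are hypothetical, model.embed() is assumed to map a batch to a (B, D) tensor, and each batch is assumed to be either a tensor or a dict carrying the inputs under the "image" key, as documented for the loader parameter of score().

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


@torch.no_grad()
def extract_embeddings(model: nn.Module, loader: DataLoader) -> torch.Tensor:
    """Hypothetical helper: run model.embed() over every batch and stack the results."""
    model.eval()
    chunks = []
    for batch in loader:
        # Batches may be plain tensors or dicts with the inputs under "image".
        x = batch["image"] if isinstance(batch, dict) else batch
        chunks.append(model.embed(x).cpu())
    return torch.cat(chunks, dim=0)


class ToyEncoder(nn.Module):
    """Hypothetical stand-in model exposing an .embed() method."""

    def __init__(self, in_dim: int = 32, emb_dim: int = 8) -> None:
        super().__init__()
        self.proj = nn.Linear(in_dim, emb_dim)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


model = ToyEncoder()
train_loader = DataLoader(torch.randn(100, 32), batch_size=16)  # yields tensors
val_loader = DataLoader([{"image": torch.randn(32)} for _ in range(40)], batch_size=16)  # yields dicts
train_embs = extract_embeddings(model, train_loader)
val_embs = extract_embeddings(model, val_loader)
print(train_embs.shape, val_embs.shape)
```

Embeddings obtained this way have the same form as the X and Y tensors used in the precomputed mode.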
Parameters
X: torch.Tensor | None = None
-
A torch.Tensor with training sample embeddings. Required when not using model and loaders.
Y: torch.Tensor | None = None
-
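A torch.Tensor with calibration sample embeddings (e.g., held-out validation samples). Optional; if provided, set_threshold() computes the threshold from these samples' scores instead of the training scores.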
model: torch.nn.Module | None = None
-
A torch.nn.Module with an .embed() method. Required when not using X.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None
-
A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.
outdir: Path | None = None
-
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.
prefix: str | None = None
-
A str used as filename prefix for saved embeddings. Only used with model and loaders.
q: bool | float = False
-
A float or bool indicating whether outliers should be filtered out of the training distribution before fitting. Defaults to False.
score()
Compute uncertainty scores for query samples.
score(X=None, model=None, loader=None, outdir=None, prefix=None)
This method supports two usage modes:
- Precomputed embeddings: supply query embeddings via X.
- On-the-fly extraction: supply a model with an .embed() method and a DataLoader to extract embeddings automatically.
You must use either embeddings (X) or model + loader, but not both.
# Mode 1: Precomputed embeddings
from seapig.scores import PCAScore
my_score = PCAScore()
my_score.fit(X=train_embeddings)  # the score must be fitted before scoring
scores = my_score.score(X=test_embeddings)
# Mode 2: On-the-fly extraction
my_score = PCAScore()
my_score.fit(model=model, loaders={"train": train_loader})
scores = my_score.score(model=model, loader=test_dl)
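The rule that X and model + loader are mutually exclusive can be expressed as a small validator. check_inputs is hypothetical, and the exception type seapig actually raises is not documented on this page.

```python
from typing import Optional

import torch
from torch import nn
from torch.utils.data import DataLoader


def check_inputs(
    X: Optional[torch.Tensor] = None,
    model: Optional[nn.Module] = None,
    loader: Optional[DataLoader] = None,
) -> str:
    """Return which usage mode the arguments select, or raise ValueError."""
    if X is not None and (model is not None or loader is not None):
        # Mixing precomputed embeddings with on-the-fly extraction is ambiguous.
        raise ValueError("Pass either X or model + loader, not both.")
    if X is not None:
        return "precomputed"
    if model is None or loader is None:
        # On-the-fly extraction needs both the model and the loader.
        raise ValueError("Pass X, or both model and loader.")
    return "on-the-fly"


print(check_inputs(X=torch.randn(4, 8)))  # precomputed
```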
Parameters
X: torch.Tensor | None = None
-
A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None
-
A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None
-
A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None
-
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None
-
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
torch.Tensor
-
1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers; high values indicate likely outliers.
select()
Select samples for prediction based on their uncertainty score.
select(X=None, model=None, loader=None, outdir=None, prefix=None)
This method supports two usage modes:
- Precomputed embeddings: supply query embeddings via X.
- On-the-fly extraction: supply a model with an .embed() method and a DataLoader to extract embeddings automatically.
You must use either embeddings (X) or model + loader, but not both.
Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).
# Mode 1: Precomputed embeddings
from seapig.scores import PCAScore
my_score = PCAScore()
my_score.fit(X=train_data, Y=val_data)
my_score.set_threshold(q=0.95)  # calibrate the threshold on the validation scores
result = my_score.select(X=test_data)
# Mode 2: On-the-fly extraction
my_score = PCAScore()
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
my_score.set_threshold(q=0.95)
result = my_score.select(model=model, loader=test_loader)
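A sketch of the selection rule described above, returning a dict shaped like the one select() returns. select_by_threshold is hypothetical, and treating a score exactly equal to the threshold as rejected is an assumption the text does not settle.

```python
import torch


def select_by_threshold(scores: torch.Tensor, threshold: float) -> dict[str, torch.Tensor]:
    """Mark samples whose uncertainty score is below `threshold` as selected."""
    # Assumption: a score exactly equal to the threshold counts as rejected.
    selected = scores < threshold
    return {"score": scores, "selected": selected}


result = select_by_threshold(torch.tensor([0.2, 1.5, 0.9, 3.0]), threshold=1.0)
print(result["selected"].tolist())  # [True, False, True, False]
```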
Parameters
X: torch.Tensor | None = None
-
A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None
-
A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None
-
A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None
-
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None
-
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
dict[str, torch.Tensor]
-
A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).
set_threshold()
Set a threshold based on a quantile of the available uncertainty scores.
set_threshold(q=0.99)
Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.
Parameters
q: float = 0.99
-
Quantile in (0, 1) used to determine the threshold. Defaults to 0.99, i.e., roughly 1% of the calibration (or training) samples score above the threshold and are rejected as outliers.
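The quantile rule can be sketched with torch.quantile. quantile_threshold is a hypothetical helper, and seapig may use a different interpolation scheme; the point is that, on the scores the threshold was computed from, about 1 - q of the samples land above it.

```python
import torch


def quantile_threshold(cal_scores: torch.Tensor, q: float = 0.99) -> float:
    """Threshold = q-quantile of the calibration (or training) uncertainty scores."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in the open interval (0, 1)")
    return torch.quantile(cal_scores, q).item()


torch.manual_seed(0)
cal_scores = torch.randn(10_000).abs()  # toy non-negative scores
threshold = quantile_threshold(cal_scores, q=0.99)
rejected = (cal_scores > threshold).float().mean().item()
print(f"threshold={threshold:.3f}, rejected={rejected:.2%}")
```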
get_threshold()
Get the current threshold value.
plot()
Plot densities for uncertainty scores.
plot(query_scores=None, bins=100)
By default, this method plots densities for the uncertainty scores. Optionally, it can also plot densities for query_scores.
Parameters
query_scores: torch.Tensor | None = None
-
A torch.Tensor representing query scores to include in the plot. Defaults to None.
bins: int = 100
-
An int indicating the number of bins to use for density estimation. Defaults to 100.
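The plotting backend behind plot() is not named here, so the sketch computes only the densities such a plot would display. It uses torch.histogram with density=True on shared bin edges so the reference and query curves are directly comparable. score_densities and the toy scores are hypothetical.

```python
import torch


def score_densities(ref: torch.Tensor, query: torch.Tensor, bins: int = 100):
    """Histogram densities of two score sets computed on shared bin edges."""
    lo = min(ref.min().item(), query.min().item())
    hi = max(ref.max().item(), query.max().item())
    # density=True normalizes each histogram so that sum(hist * bin_width) == 1.
    ref_hist, edges = torch.histogram(ref.float(), bins=bins, range=(lo, hi), density=True)
    query_hist, _ = torch.histogram(query.float(), bins=bins, range=(lo, hi), density=True)
    return edges, ref_hist, query_hist


torch.manual_seed(0)
ref_scores = torch.randn(1_000)
query_scores = torch.randn(300) + 2.0  # shifted: more uncertain on average
edges, ref_density, query_density = score_densities(ref_scores, query_scores, bins=50)
```

Sharing the edges is a design choice: with separate ranges per set, the two density curves would not line up on the same x-axis.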
plot_embs()
Visualize training, validation, and query embeddings in 2D.
plot_embs(query_embeddings, method="tsne", method_args=None)
Parameters
query_embeddings: torch.Tensor | None
-
Embeddings of query samples to visualize.
method: {"tsne", "umap"} = "tsne"
-
Dimensionality reduction method: "tsne" or "umap". Defaults to "tsne".
method_args: dict[str, Any] | None = None
-
A dictionary of arguments to pass to the dimensionality reduction method. Defaults to None.
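The preparation step behind plot_embs() can be sketched as stacking the embedding sets and tracking which set each row came from. To stay dependency-free, the sketch swaps in linear PCA (torch.pca_lowrank) for t-SNE/UMAP, so the resulting layout will differ from what those methods produce; project_embeddings_2d is hypothetical.

```python
from typing import List, Optional, Tuple

import torch


def project_embeddings_2d(
    train: torch.Tensor,
    val: Optional[torch.Tensor] = None,
    query: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, List[str]]:
    """Stack train/val/query embeddings, label each row, and project to 2-D with PCA."""
    parts: List[torch.Tensor] = []
    labels: List[str] = []
    for name, emb in (("train", train), ("val", val), ("query", query)):
        if emb is not None:
            parts.append(emb)
            labels.extend([name] * emb.shape[0])
    stacked = torch.cat(parts)
    # Linear PCA stand-in for t-SNE/UMAP: project onto the top-2 principal directions.
    _, _, v = torch.pca_lowrank(stacked, q=2, center=True)
    coords = (stacked - stacked.mean(dim=0, keepdim=True)) @ v[:, :2]
    return coords, labels


torch.manual_seed(0)
coords, labels = project_embeddings_2d(
    torch.randn(100, 16), torch.randn(30, 16), torch.randn(20, 16) + 3.0
)
```

Reducing all sets jointly, rather than one at a time, keeps the three groups in a shared coordinate system, which is what makes a shifted query cluster visible.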