scores.EuclideanScore

Scores samples by the Euclidean distance to their k nearest neighbors (KNN distance) among the training samples.

Usage

scores.EuclideanScore()

Computes Euclidean distance-based uncertainty scores where low scores indicate samples similar to the training distribution (low uncertainty) and high scores indicate samples deviating from the training distribution (high uncertainty).

Parameters

k: int = 1

Number of nearest neighbors to use.

stat: (max, mean, median, min) = "max"

Statistic to aggregate distances across the k neighbors.

pca: TensorPCA or None = None

Optional TensorPCA object for dimensionality reduction prior to scoring.

save_index: bool or Path = False

Whether (and where) to save the HNSW index to disk.

Examples

import torch
from seapig.scores import EuclideanScore
score = EuclideanScore(k=5)
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
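To make the roles of k and stat concrete, the following is a minimal sketch in plain Python, not the library's implementation. The helper name knn_distance_score and the AGGREGATORS table are hypothetical, and a brute-force search stands in for the HNSW index mentioned under save_index.

```python
import math
import statistics

# Hypothetical lookup table: maps the documented `stat` names to aggregators.
AGGREGATORS = {
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
}


def knn_distance_score(query, train, k=1, stat="max"):
    """Aggregate the Euclidean distances from `query` to its k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Brute-force search: compute every distance, then keep the k smallest.
    distances = sorted(math.dist(query, point) for point in train)
    return AGGREGATORS[stat](distances[:k])


train = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
query = (0.0, 0.0)

nearest_only = knn_distance_score(query, train, k=1)                 # 0.0
farthest_of_two = knn_distance_score(query, train, k=2, stat="max")  # 5.0
mean_of_two = knn_distance_score(query, train, k=2, stat="mean")     # 2.5
```

With k=1 every statistic returns the same value, so stat only matters once k is greater than 1.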

Methods

Name             Description
fit()            Train an uncertainty score based on sample embeddings.
score()          Compute uncertainty scores for query samples.
select()         Select samples for prediction based on their uncertainty score.
knn_search()     Compute the k-nearest-neighbor distances and indices for a set of query embeddings.
set_threshold()  Set a threshold based on a quantile of the available uncertainty scores.
get_threshold()  Get the current threshold value.
plot()           Plot densities for uncertainty scores.
plot_embs()      Visualize training, validation, and query embeddings in 2D.

fit()

Train an uncertainty score based on sample embeddings.

Usage

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

  1. Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
  2. On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters

X: torch.Tensor | None = None

A torch.Tensor with training sample embeddings. Required when not using model and loaders.

Y: torch.Tensor | None = None

A torch.Tensor with calibration sample embeddings. Optional.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None

A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loaders.

q: bool | float = False

A float or bool indicating whether outliers from the training distribution should be filtered before fitting. Defaults to False.
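The rule that the two modes are mutually exclusive can be sketched as a small argument check. This is an illustrative sketch only: resolve_fit_mode is a hypothetical helper, and the error messages and the exact checks the library performs are assumptions.

```python
def resolve_fit_mode(X=None, Y=None, model=None, loaders=None):
    """Return "embeddings" or "extraction", or raise if the arguments are inconsistent."""
    uses_embeddings = X is not None or Y is not None
    uses_extraction = model is not None or loaders is not None

    if uses_embeddings and uses_extraction:
        raise ValueError("Pass either X/Y or model+loaders, not both.")

    if uses_embeddings:
        # Y is optional, but X must always be present in this mode.
        if X is None:
            raise ValueError("X is required when passing precomputed embeddings.")
        return "embeddings"

    if model is None or loaders is None:
        raise ValueError("Pass X, or both model and loaders.")
    if "train" not in loaders:
        raise ValueError('loaders must contain a "train" key.')
    return "extraction"


# Placeholder inputs: a nested list stands in for a tensor, object() for a model.
mode_a = resolve_fit_mode(X=[[0.0, 1.0]])
mode_b = resolve_fit_mode(model=object(), loaders={"train": []})
```

Mixing the modes, for example passing X together with model, raises a ValueError in this sketch.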

score()

Compute uncertainty scores for query samples.

Usage

score(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_embeddings)  # fit first, as in the select() examples
scores = my_score.score(X=test_embeddings)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_embeddings)  # fit first, as in the select() examples
scores = my_score.score(model=model, loader=test_dl)

Parameters

X: torch.Tensor | None = None

A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

torch.Tensor

1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers, high values indicate likely outliers.

select()

Select samples for prediction based on their uncertainty score.

Usage

select(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)

Parameters

X: torch.Tensor | None = None

A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

dict[str, torch.Tensor]

A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).
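The selection rule and the shape of the returned dict can be illustrated with a short plain-Python sketch. select_by_threshold is a hypothetical helper, lists stand in for tensors, and treating a score exactly equal to the threshold as selected is an assumption, since the description above only covers scores strictly lower or higher than the threshold.

```python
def select_by_threshold(scores, threshold):
    """Build a dict with the documented keys 'score' and 'selected'."""
    # Assumption: ties (score == threshold) are kept.
    selected = [score <= threshold for score in scores]
    return {"score": list(scores), "selected": selected}


result = select_by_threshold([0.2, 0.9, 1.7, 0.4], threshold=1.0)

# Use the boolean mask to keep only the samples considered safe to predict on.
kept = [s for s, keep in zip(result["score"], result["selected"]) if keep]
```

Here the third sample (score 1.7) exceeds the threshold and is the only one excluded.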


set_threshold()

Set a threshold based on a quantile of the available uncertainty scores.

Usage

set_threshold(q=0.99)

Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.

Parameters

q: float = 0.99

Quantile in (0, 1) used to determine the threshold. Defaults to 0.99, i.e., the 1% of calibration (or training) samples with the highest scores fall above the threshold and would be rejected as outliers.
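As a rough illustration of how a quantile turns a set of scores into a threshold, here is a minimal plain-Python sketch. quantile_threshold is a hypothetical helper, and linear interpolation between neighboring scores is an assumption; the interpolation scheme the library uses is not stated here.

```python
import math


def quantile_threshold(scores, q=0.99):
    """Return the q-quantile of `scores` using linear interpolation."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    ordered = sorted(scores)
    # Fractional position of the quantile within the sorted scores.
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


# Toy calibration scores: 0.0, 1.0, ..., 100.0
calibration_scores = [float(i) for i in range(101)]
threshold = quantile_threshold(calibration_scores, q=0.99)

# Scores strictly above the threshold would be excluded from prediction.
rejected = [s for s in calibration_scores if s > threshold]
```

With these 101 evenly spaced scores, only the single largest one lies above the 0.99 quantile.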

get_threshold()

Get the current threshold value.

Usage

get_threshold()

plot()

Plot densities for uncertainty scores.

Usage

plot(query_scores=None, bins=100)

By default, this method plots densities for the uncertainty scores. Optionally, it can also plot densities for query_scores.

Parameters

query_scores: torch.Tensor | None = None

A torch.Tensor representing query scores to include in the plot. Defaults to None.

bins: int = 100

An int indicating the number of bins to use for density estimation. Defaults to 100.
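To show what bins controls, the sketch below computes a histogram-based density in plain Python. histogram_density is a hypothetical helper, and equal-width bins between the smallest and largest score are an assumption; the plotting backend may bin differently.

```python
def histogram_density(values, bins=100):
    """Return (densities, bin_width) so that sum(densities) * bin_width equals 1."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must not all be identical")
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    total = len(values)
    return [count / (total * width) for count in counts], width


scores = [0.0, 0.1, 0.2, 0.9, 1.0]
densities, bin_width = histogram_density(scores, bins=2)
area = sum(density * bin_width for density in densities)
```

More bins give a finer but noisier density curve; fewer bins smooth it out.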

plot_embs()

Visualize training, validation, and query embeddings in 2D.

Usage

plot_embs(query_embeddings, method="tsne", method_args=None)

Parameters

query_embeddings: torch.Tensor | None

Embeddings of query samples to visualize.

method: (tsne, umap) = "tsne"

Dimensionality reduction method, by default "tsne".

method_args: dict[str, Any] | None = None

A dictionary of arguments to pass to the dimensionality reduction method, by default None.