scores.CosineScore

Computes a K-nearest-neighbour (KNN) uncertainty score based on the cosine distance to the nearest training samples.

Usage

Source

scores.CosineScore()

Computes cosine distance-based uncertainty scores where low scores indicate samples similar to the training distribution (low uncertainty) and high scores indicate samples deviating from the training distribution (high uncertainty).

The cosine distance is computed as (1 - cosine_similarity), with a range of [0, 2] where 0 indicates identical vectors, 1 indicates orthogonal vectors, and 2 indicates opposite vectors.
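The three reference points of that range can be checked by hand. The following is a minimal sketch in plain Python, not the library's implementation; the helper name `cosine_distance` is hypothetical and assumes non-zero input vectors.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Vector length does not matter, only direction does.
identical = cosine_distance([1.0, 0.0], [2.0, 0.0])    # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])   # right angle
opposite = cosine_distance([1.0, 0.0], [-4.0, 0.0])    # opposite direction

print(identical, orthogonal, opposite)  # 0.0 1.0 2.0
```

Because the distance depends only on direction, two embeddings that differ only in magnitude receive a distance of 0.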

Parameters

k: int = 1

Number of nearest neighbors to use.

stat: (max, mean, median, min) = "max"

Statistic to aggregate distances across the k neighbors.

pca: TensorPCA or None = None

Optional TensorPCA object for dimensionality reduction prior to scoring.

save_index: bool or Path = False

Whether (and where) to save the HNSW index to disk.
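To illustrate how k and stat interact, here is a minimal sketch that finds the k nearest training vectors by brute force and aggregates their cosine distances. It is an assumption-laden illustration: the class itself appears to use an HNSW index (see save_index) rather than an exhaustive search, and the names `cosine_distance`, `AGGREGATORS`, and `knn_cosine_score` are hypothetical.

```python
import math
import statistics


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Aggregation functions mirroring the documented `stat` options.
AGGREGATORS = {
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
}


def knn_cosine_score(query, train, k=1, stat="max"):
    """Aggregate the cosine distances from `query` to its k nearest training vectors."""
    # Brute-force search: sort all distances and keep the k smallest.
    distances = sorted(cosine_distance(query, t) for t in train)
    return AGGREGATORS[stat](distances[:k])


train = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

score_k1 = knn_cosine_score(query, train, k=1)                    # nearest neighbour only
score_k2_max = knn_cosine_score(query, train, k=2)                # farthest of the two nearest
score_k2_mean = knn_cosine_score(query, train, k=2, stat="mean")  # average of the two nearest
```

With k=1 every stat option gives the same result, since there is only one distance to aggregate. For larger k, "max" is the most conservative choice because a query must be close to all k neighbours to receive a low score.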

Methods

Name             Description
fit()            Train an uncertainty score based on sample embeddings.
score()          Compute uncertainty scores for query samples.
select()         Select samples for prediction based on their uncertainty score.
knn_search()     Compute the K-nearest-neighbour distances and indices for a set of query embeddings.
set_threshold()  Set a threshold based on a quantile of the available uncertainty scores.
get_threshold()  Get the current threshold value.
plot()           Plot densities for uncertainty scores.
plot_embs()      Visualize training, validation, and query embeddings in 2D.

fit()

Train an uncertainty score based on sample embeddings.

Usage

Source

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

  1. Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
  2. On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.

# Mode 1: Precomputed embeddings
# train_embs and val_embs are torch.Tensor embeddings computed beforehand
from seapig.scores import CosineScore
my_score = CosineScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
# model must provide an .embed() method; the loaders yield the samples to embed
my_score = CosineScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters

X: torch.Tensor | None = None

A torch.Tensor with training sample embeddings. Required when not using model and loaders.

Y: torch.Tensor | None = None

A torch.Tensor with calibration sample embeddings. Optional.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None

A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loaders.

q: bool | float = False

A float or bool indicating if outliers from the training distribution should be filtered before fitting. Defaults to False.

score()

Compute uncertainty scores for query samples.

Usage

Source

score(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

# Mode 1: Precomputed embeddings
from seapig.scores import CosineScore
my_score = CosineScore()
my_score.fit(X=train_embeddings)  # fit on training embeddings before scoring
scores = my_score.score(X=test_embeddings)

# Mode 2: On-the-fly extraction
my_score = CosineScore()
my_score.fit(model=model, loaders={"train": train_loader})
scores = my_score.score(model=model, loader=test_dl)

Parameters

X: torch.Tensor | None = None

A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

torch.Tensor

1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers, high values indicate likely outliers.

select()

Select samples for prediction based on their uncertainty score.

Usage

Source

select(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).

# Mode 1: Precomputed embeddings
from seapig.scores import CosineScore
my_score = CosineScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)

# Mode 2: On-the-fly extraction
my_score = CosineScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)

Parameters

X: torch.Tensor | None = None

A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

dict[str, torch.Tensor]

A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).
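The relationship between the two returned entries can be sketched in plain Python. This is an illustration only: the real method returns torch.Tensors, the helper name `select_by_threshold` is hypothetical, and how a score exactly equal to the threshold is treated is an assumption (here it is not selected).

```python
def select_by_threshold(scores, threshold):
    """Pair each score with a mask entry that is True when the score is below the threshold."""
    return {
        "score": list(scores),
        "selected": [s < threshold for s in scores],
    }


result = select_by_threshold([0.05, 0.40, 0.12], threshold=0.2)

# Indices of the samples that would be passed on for prediction.
kept = [i for i, keep in enumerate(result["selected"]) if keep]
print(kept)  # [0, 2]
```

The mask has one entry per query sample, so it can be used to index the original inputs or predictions directly.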


set_threshold()

Set a threshold based on a quantile of the available uncertainty scores.

Usage

Source

set_threshold(q=0.99)

Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.

Parameters

q: float = 0.99

Quantile in (0, 1) used to determine the threshold. With the default of 0.99, the 1% of calibration (or training) samples with the highest scores lie above the threshold and are treated as outliers.
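As a rough sketch of how a quantile turns into a threshold, the example below computes a linearly interpolated quantile over a list of calibration scores. The interpolation rule and the helper name `quantile` are assumptions made for illustration; the library may compute the quantile differently.

```python
import math


def quantile(values, q):
    """Linearly interpolated quantile of `values` for q in [0, 1] (hypothetical helper)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)           # fractional index into the sorted scores
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# 101 evenly spaced calibration scores: 0.00, 0.01, ..., 1.00
calibration_scores = [i / 100 for i in range(101)]

threshold = quantile(calibration_scores, 0.99)
rejected = [s for s in calibration_scores if s > threshold]
```

Here the threshold lands at 0.99 and a single calibration score (1.00) lies above it, which is roughly 1% of the 101 samples. Lowering q rejects more samples and makes selection stricter.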

get_threshold()

Get the current threshold value.

Usage

Source

get_threshold()

plot()

Plot densities for uncertainty scores.

Usage

Source

plot(query_scores=None, bins=100)

By default, this method plots densities for the uncertainty scores. Optionally, it can also plot densities for query_scores.

Parameters

query_scores: torch.Tensor | None = None

A torch.Tensor representing query scores to include in the plot. Defaults to None.

bins: int = 100

An int indicating the number of bins to use for density estimation. Defaults to 100.

plot_embs()

Visualize training, validation, and query embeddings in 2D.

Usage

Source

plot_embs(query_embeddings, method="tsne", method_args=None)

Parameters

query_embeddings: torch.Tensor | None

Embeddings of query samples to visualize.

method: (tsne, umap) = "tsne"

Dimensionality reduction method. Defaults to "tsne".

method_args: dict[str, Any] | None = None

A dictionary of arguments to pass to the dimensionality reduction method. Defaults to None.