scores.CosineScore

Computes a k-nearest-neighbor (KNN) uncertainty score based on the cosine distance to the nearest training samples.

Usage

scores.CosineScore()

Computes cosine distance-based uncertainty scores where low scores indicate samples similar to the training distribution (low uncertainty) and high scores indicate samples deviating from the training distribution (high uncertainty).

The cosine distance is computed as (1 - cosine_similarity) and ranges over [0, 2]: 0 indicates vectors pointing in the same direction (regardless of magnitude), 1 indicates orthogonal vectors, and 2 indicates vectors pointing in opposite directions.
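The range described above can be checked with a minimal plain-Python sketch. The helper name cosine_distance is hypothetical and is not part of the seapig API; it only illustrates the formula (1 - cosine_similarity) and assumes non-zero input vectors.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine_similarity for two non-zero vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Same direction, different magnitude: distance 0
same = cosine_distance([1.0, 0.0], [2.0, 0.0])
# Perpendicular vectors: distance 1
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])
# Opposite directions: distance 2
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
```

Because the distance depends only on direction, rescaling an embedding does not change its score.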

Parameters

k: int = 1: Number of nearest neighbors to use.
stat: (max, mean, median, min) = "max": Statistic to aggregate distances across the k neighbors.
pca: TensorPCA or None = None: Optional TensorPCA object for dimensionality reduction prior to scoring.
save_index: bool or Path = False: Whether (and where) to save the HNSW index to disk.
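To illustrate how k and stat might interact, the following sketch aggregates the k smallest distances of a single query into one score. The function aggregate_knn and the AGGREGATORS mapping are hypothetical names chosen for this example; the actual implementation in the library may differ.

```python
import statistics

# Hypothetical mapping from the `stat` option to an aggregation function.
AGGREGATORS = {
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
}

def aggregate_knn(distances, k=1, stat="max"):
    """Reduce the k smallest distances of one query to a single score."""
    nearest = sorted(distances)[:k]  # keep only the k nearest neighbors
    return AGGREGATORS[stat](nearest)

# Distances from one query to four reference samples (made-up values).
dists = [0.40, 0.10, 0.25, 0.90]
score_max = aggregate_knn(dists, k=3, stat="max")        # farthest of the 3 nearest
score_min = aggregate_knn(dists, k=3, stat="min")        # closest neighbor
score_median = aggregate_knn(dists, k=3, stat="median")  # middle of the 3 nearest
score_mean = aggregate_knn(dists, k=3, stat="mean")      # average of the 3 nearest
```

With k=1 all four statistics coincide, since only a single distance is aggregated.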

Methods

Name	Description
fit()	Train an uncertainty score based on sample embeddings.
score()	Compute uncertainty scores for query samples.
select()	Select samples for prediction based on their uncertainty score.
knn_search()	Compute the K-nearest-neighbour distances and indices for a set of query embeddings.
set_threshold()	Set a threshold based on a quantile of the available uncertainty scores.
get_threshold()	Get the current threshold value.
plot()	Plot densities for uncertainty scores.
plot_embs()	Visualize training, validation, and query embeddings in 2D.

fit()

Train an uncertainty score based on sample embeddings.

Usage

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.

# Mode 1: Precomputed embeddings
from seapig.scores import CosineScore
my_score = CosineScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = CosineScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters

X: torch.Tensor | None = None: A torch.Tensor with training sample embeddings. Required when not using model and loaders.
Y: torch.Tensor | None = None: A torch.Tensor with calibration sample embeddings. Optional.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None: A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loaders.
q: bool | float = False: A float or bool indicating if outliers from the training distribution should be filtered before fitting. Defaults to False.

score()

Compute uncertainty scores for query samples.

Usage

score(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

Precomputed embeddings: Supply query embeddings via X.
On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

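# Mode 1: Precomputed embeddings
from seapig.scores import CosineScore
my_score = CosineScore()
my_score.fit(X=train_embeddings)  # the score must be fitted before scoring
scores = my_score.score(X=test_embeddings)

# Mode 2: On-the-fly extraction
my_score = CosineScore()
my_score.fit(X=train_embeddings)  # the score must be fitted before scoring
scores = my_score.score(model=model, loader=test_dl)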

Parameters

X: torch.Tensor | None = None: A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None: A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

torch.Tensor: 1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers, high values indicate likely outliers.

select()

Select samples for prediction based on their uncertainty score.

Usage

select(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

Precomputed embeddings: Supply query embeddings via X.
On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).

# Mode 1: Precomputed embeddings
from seapig.scores import CosineScore
my_score = CosineScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)

# Mode 2: On-the-fly extraction
my_score = CosineScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)

Parameters

X: torch.Tensor | None = None: A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None: A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

dict[str, torch.Tensor]: A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).
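The selection rule can be sketched in plain Python as follows. The function select_by_threshold is a hypothetical stand-in, not the library's implementation; it uses lists instead of tensors, and treating a score exactly equal to the threshold as excluded is an assumption made for this example.

```python
def select_by_threshold(scores, threshold):
    """Return scores plus a boolean mask; True means the sample is selected."""
    return {
        "score": list(scores),
        # Assumption: only scores strictly below the threshold are selected.
        "selected": [s < threshold for s in scores],
    }

# Made-up cosine-distance scores for three query samples.
result = select_by_threshold([0.05, 0.50, 1.20], threshold=0.60)
# Keep only the samples that passed the threshold.
kept_scores = [s for s, keep in zip(result["score"], result["selected"]) if keep]
```

The mask can then be used to restrict downstream predictions to the selected samples.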

knn_search()

Compute the K-nearest-neighbour distances and indices for a set of query embeddings.

Usage

knn_search(query, offset=0)

Parameters

query: torch.Tensor: A 2-D tensor of shape (N, D) containing the embeddings for which distances are to be computed.
offset: int = 0: Number of nearest neighbours to discard from the result. This is typically used to skip self-matching when the query points are drawn from the same set that built the index (e.g. offset=1).

Returns

distances: torch.Tensor: A tensor of shape (N, k) containing the KNN distances for each query point after discarding the first offset nearest neighbours.
indices: torch.Tensor: A tensor of shape (N, k) with the index positions of the nearest neighbours in the reference embedding set after discarding the first offset matches.

Notes

offset is useful when the query set is identical to the reference set, because the nearest neighbour would be the point itself (distance zero). Skipping it yields a meaningful distance to the second nearest neighbour.
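The effect of offset can be illustrated with a brute-force sketch. The function knn_search_sketch and its reference and k arguments are assumptions for this example only; the documented method takes just query and offset, and searches an HNSW index rather than sorting all pairwise distances.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine_similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_search_sketch(query, reference, k=1, offset=0):
    """Brute-force KNN: skip the first `offset` neighbors, then keep k."""
    distances, indices = [], []
    for q in query:
        # Rank all reference points by distance (ties broken by index).
        ranked = sorted((cosine_distance(q, r), i) for i, r in enumerate(reference))
        kept = ranked[offset:offset + k]
        distances.append([d for d, _ in kept])
        indices.append([i for _, i in kept])
    return distances, indices

reference = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Query set identical to the reference set: each point matches itself first.
d_self, i_self = knn_search_sketch(reference, reference, k=1, offset=0)
# offset=1 skips the self-match and returns the next nearest neighbor.
d_next, i_next = knn_search_sketch(reference, reference, k=1, offset=1)
```

With offset=0 every returned distance is (numerically) zero, which carries no information about the training distribution; with offset=1 the distances reflect the true nearest other sample.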

set_threshold()

Set a threshold based on a quantile of the available uncertainty scores.

Usage

set_threshold(q=0.99)

Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.

Parameters

q: float = 0.99: Quantile in (0, 1) used to determine the threshold. Defaults to 0.99 (i.e., 1% of samples are rejected as outliers).
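As a rough illustration of quantile-based thresholding, the sketch below computes a linearly interpolated quantile over a list of calibration scores. The helper quantile and the synthetic scores are assumptions for this example; the interpolation method used by the library is not stated in this documentation.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)        # fractional position in the sorted list
    lo = int(pos)                       # index just below the position
    hi = min(lo + 1, len(ordered) - 1)  # index just above, clamped to the end
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

# Synthetic calibration scores: 0.00, 0.01, ..., 1.00
calibration_scores = [i / 100 for i in range(101)]
threshold = quantile(calibration_scores, 0.99)
# Samples scoring above the threshold would be excluded from prediction.
rejected = [s for s in calibration_scores if s > threshold]
```

A lower q gives a stricter threshold that rejects a larger share of the calibration samples.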

get_threshold()

Get the current threshold value.

Usage

get_threshold()

plot()

Plot densities for uncertainty scores.

Usage

plot(query_scores=None, bins=100)

By default, this method plots densities for the uncertainty scores. Optionally, it can also plot densities for query_scores.

Parameters

query_scores: torch.Tensor | None = None: A torch.Tensor representing query scores to include in the plot. Defaults to None.
bins: int = 100: An int indicating the number of bins to use for density estimation. Defaults to 100.

plot_embs()

Visualize training, validation, and query embeddings in 2D.

Usage

plot_embs(query_embeddings, method="tsne", method_args=None)

Parameters

query_embeddings: torch.Tensor | None: Embeddings of query samples to visualize.
method: (tsne, umap) = "tsne": Dimensionality reduction method. Defaults to "tsne".
method_args: dict[str, Any] | None = None: A dictionary of arguments to pass to the dimensionality reduction method. Defaults to None.
