scores.EmbeddingScore

Base class for embedding-based uncertainty scores.

Usage

scores.EmbeddingScore(pca=None)

Embedding-based scores quantify deviation from the training distribution using latent-space embeddings. Low scores indicate samples similar to the training distribution (likely inliers), while high scores indicate samples deviating from the training distribution (likely outliers).
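To make the idea concrete, here is a minimal sketch of one possible embedding-based score: the mean Euclidean distance from a query embedding to its k nearest reference embeddings. It uses plain Python lists instead of torch tensors so it runs without dependencies. The function name `knn_distance_score` and the choice to average over k neighbours are assumptions for illustration, not the library's implementation.

```python
import math

def knn_distance_score(query, reference, k=2):
    """Mean Euclidean distance from each query point to its k nearest reference points.

    Hypothetical helper for illustration; not part of the documented API.
    """
    scores = []
    for q in query:
        # Distances from this query embedding to every reference embedding, ascending.
        dists = sorted(math.dist(q, r) for r in reference)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy 2-D "embeddings": a tight training cluster near the origin.
reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
query = [(0.05, 0.05), (3.0, 3.0)]  # one likely inlier, one likely outlier

scores = knn_distance_score(query, reference, k=2)
print(scores)  # the second score is much larger than the first
```

The query point inside the cluster gets a small score and the distant point a large one, which matches the low-score-inlier, high-score-outlier convention described above.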

Parameters

pca: TensorPCA or None = None: Optional TensorPCA object for dimensionality reduction prior to scoring. When provided, embeddings are projected onto the principal components before the score is computed.
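The projection step can be sketched as follows. The interface of TensorPCA is not shown in this reference, so this example does not use it. It estimates a single principal direction by power iteration in plain Python and projects centered embeddings onto it. The helper names `first_principal_component` and `project` are hypothetical.

```python
import math

def first_principal_component(X, n_iter=100):
    """Estimate the leading principal direction of X by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # fixed start vector
    for _ in range(n_iter):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(X, mean, v):
    """Project each centered row of X onto the unit direction v."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in X]

# Points spread mainly along the x-axis with a little variation in y.
X = [(-2.0, 0.1), (-1.0, -0.1), (0.0, 0.0), (1.0, 0.1), (2.0, -0.1)]
mean, v = first_principal_component(X)
z = project(X, mean, v)  # 1-D coordinates that a score would then operate on
print(v, z)
```

A score fitted after such a projection works on the reduced coordinates `z` instead of the full embeddings, which is the behaviour the `pca` parameter describes.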

Attributes

ref_embeddings: torch.Tensor or None: Embeddings of training samples used to fit the score.
cal_embeddings: torch.Tensor or None: Embeddings of validation/calibration samples. Optional.
scores: torch.Tensor or None: Uncertainty scores of the calibration (or training) samples.
threshold: torch.Tensor or None: Rejection threshold. Samples with scores above this value are excluded.

Attributes

Name	Description
train_required	Boolean flag; defaults to True.

train_required

train_required: bool = True

Methods

Name	Description
fit()	Train an uncertainty score based on sample embeddings.
plot_embs()	Visualize training, validation, and query embeddings in 2D.
score()	Compute uncertainty scores for query samples.
select()	Select samples for prediction based on their uncertainty score.
set_threshold()	Set a threshold based on a quantile of the available uncertainty scores.

fit()

Train an uncertainty score based on sample embeddings.

Usage

fit(X=None, Y=None, model=None, loaders=None, outdir=None, prefix=None)

This method supports two usage modes:

Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
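The rule that the two modes are mutually exclusive can be sketched as a small input resolver. This is an illustrative assumption about how such a check might look, not the library's code: `resolve_embeddings` and `ToyModel` are hypothetical, and plain lists of batches stand in for DataLoaders.

```python
def resolve_embeddings(X=None, Y=None, model=None, loaders=None):
    """Return (train_embs, cal_embs) from exactly one of the two input modes."""
    use_precomputed = X is not None
    use_model = model is not None or loaders is not None
    if use_precomputed == use_model:
        raise ValueError("Pass exactly one of: X (optionally Y), or model and loaders.")
    if use_precomputed:
        return X, Y
    if model is None or loaders is None or "train" not in loaders:
        raise ValueError("model and loaders['train'] are both required.")

    def embed_all(loader):
        # Embed every batch and flatten the per-batch results into one list.
        return [e for batch in loader for e in model.embed(batch)]

    train = embed_all(loaders["train"])
    cal = embed_all(loaders["val"]) if "val" in loaders else None
    return train, cal

class ToyModel:
    """Stand-in for a network exposing an .embed() method (hypothetical)."""
    def embed(self, batch):
        # Map each scalar input to a 2-D "embedding".
        return [(x, x * x) for x in batch]

loaders = {"train": [[1.0, 2.0], [3.0]], "val": [[4.0]]}
train, cal = resolve_embeddings(model=ToyModel(), loaders=loaders)
print(train, cal)
```

Passing both `X` and `model` raises a `ValueError`, mirroring the either-or requirement stated above.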

Parameters

X: torch.Tensor | None = None: A torch.Tensor with training sample embeddings. Required when not using model and loaders.
Y: torch.Tensor | None = None: A torch.Tensor with calibration sample embeddings. Optional.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None: A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loaders.

plot_embs()

Visualize training, validation, and query embeddings in 2D.

Usage

plot_embs(query_embeddings, method="tsne", method_args=None)

Parameters

query_embeddings: torch.Tensor | None: Embeddings of query samples to visualize.
method: Literal["tsne", "umap"] = "tsne": Dimensionality reduction method, either "tsne" or "umap". Defaults to "tsne".
method_args: dict[str, Any] | None = None: A dictionary of arguments to pass to the dimensionality reduction method, by default None.

score()

Compute uncertainty scores for query samples.

Usage

score(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

Precomputed embeddings: Supply query embeddings via X.
On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

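# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_embeddings)  # fit on training embeddings before scoring
scores = my_score.score(X=test_embeddings)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_embeddings)  # fit on training embeddings before scoring
scores = my_score.score(model=model, loader=test_dl)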

Parameters

X: torch.Tensor | None = None: A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None: A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

torch.Tensor: 1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers, high values indicate likely outliers.

select()

Select samples for prediction based on their uncertainty score.

Usage

select(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

Precomputed embeddings: Supply query embeddings via X.
On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)

Parameters

X: torch.Tensor | None = None: A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None: A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

dict[str, torch.Tensor]: A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).
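The selection rule and the shape of the returned dict can be sketched in plain Python. The function `select_by_threshold` is a hypothetical stand-in, and treating a score exactly equal to the threshold as selected is an assumption, since the description above only covers scores strictly below or above it.

```python
def select_by_threshold(scores, threshold):
    """Return the scores plus a boolean mask; True means the sample is kept."""
    return {
        "score": list(scores),
        # Assumption: ties with the threshold count as selected.
        "selected": [s <= threshold for s in scores],
    }

result = select_by_threshold([0.2, 0.9, 1.7], threshold=1.0)

# Use the mask to keep only the samples considered safe to predict on.
kept = [s for s, keep in zip(result["score"], result["selected"]) if keep]
print(result["selected"], kept)
```

With real tensors the mask would typically be used for boolean indexing into the query batch before running the prediction model.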

set_threshold()

Set a threshold based on a quantile of the available uncertainty scores.

Usage

set_threshold(q=0.99)

Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.

Parameters

q: float = 0.99: Quantile in (0, 1) used to determine the threshold. Defaults to 0.99, meaning roughly 1% of the calibration (or training) samples score above the threshold and would be rejected as outliers.
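As a worked illustration of the quantile rule, the sketch below computes a threshold from 100 toy calibration scores. The helper `quantile` is hypothetical, and linear interpolation between order statistics is an assumption; the library may use a different interpolation scheme.

```python
import math

def quantile(values, q):
    """q-th quantile with linear interpolation between order statistics."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted scores
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

cal_scores = [float(i) for i in range(1, 101)]  # toy scores 1.0 ... 100.0
threshold = quantile(cal_scores, 0.99)

# Scores above the threshold would be excluded from prediction.
rejected = [s for s in cal_scores if s > threshold]
print(threshold, len(rejected))
```

With `q=0.99`, exactly one of the 100 toy scores lies above the threshold, which is the 1% rejection rate the default is meant to produce.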
