scores.EmbeddingScore

Base class for embedding-based confidence scores.

Usage

scores.EmbeddingScore()

Embedding-based scores quantify deviation from the training distribution using latent-space embeddings. Low scores indicate samples similar to the training distribution (likely inliers), while high scores indicate samples deviating from the training distribution (likely outliers).
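
To make the low/high convention concrete, the following is a minimal, dependency-free sketch of one possible embedding-based score: the mean Euclidean distance from a query embedding to its k nearest training embeddings. The function name `knn_distance_score` and the use of plain Python lists instead of torch.Tensor are assumptions made for illustration; this is not the library's implementation.

```python
import math


def knn_distance_score(query, reference, k=2):
    """Mean Euclidean distance from each query embedding to its k nearest
    reference embeddings (hypothetical helper, not part of seapig)."""
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference embeddings")
    scores = []
    for q in query:
        # Distances to every reference embedding, smallest first.
        dists = sorted(math.dist(q, r) for r in reference)
        scores.append(sum(dists[:k]) / k)
    return scores


# Reference (training) embeddings form a unit square.
reference = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
# First query lies inside the square, second query lies far away from it.
query = [[0.5, 0.5], [5.0, 5.0]]
scores = knn_distance_score(query, reference, k=2)
```

The query inside the square receives a lower score than the distant one, matching the convention that low scores indicate likely inliers.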

Parameters

pca: TensorPCA or None = None

Optional PCA for dimensionality reduction prior to scoring. When provided, embeddings are projected onto the principal components before the score is computed.
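
As a rough illustration of the projection step, the sketch below centers each embedding and projects it onto a set of component vectors. The helper `project`, the precomputed `mean` and `components`, and the use of plain lists are assumptions for illustration only; how TensorPCA fits and stores its components is not shown here.

```python
def project(embeddings, mean, components):
    """Center each embedding and project it onto every component vector
    (hypothetical helper; components are assumed to be orthonormal)."""
    projected = []
    for emb in embeddings:
        centered = [x - m for x, m in zip(emb, mean)]
        projected.append(
            [sum(c * x for c, x in zip(comp, centered)) for comp in components]
        )
    return projected


mean = [1.0, 1.0, 0.0]
# Two orthonormal directions, assumed to have been computed beforehand.
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
reduced = project([[3.0, 5.0, 7.0]], mean, components)  # 3-D embedding -> 2-D
```

The score would then be computed on `reduced` rather than on the original embeddings.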

Attributes

ref_embeddings: torch.Tensor or None

Embeddings of training samples used to fit the score.

cal_embeddings: torch.Tensor or None

Embeddings of validation/calibration samples. Optional.

scores: torch.Tensor or None

Confidence scores of the calibration (or training) samples.

threshold: torch.Tensor or None

Rejection threshold. Samples with scores above this value are excluded.

Attributes

Name Description
train_required Class-level boolean flag. Defaults to True.

train_required

train_required: bool = True

Class-level boolean flag. Defaults to True.

Methods

Name Description
fit() Train a confidence score based on sample embeddings.
plot_embs() Visualize training, validation, and query embeddings in 2D.
score() Compute confidence scores for query samples.
select() Select samples for prediction based on their confidence score.
set_threshold() Set a threshold based on a quantile of the available confidence scores.

fit()

Train a confidence score based on sample embeddings.

Usage

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    *args,
    **kwargs
)

This method supports two usage modes:

  1. Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
  2. On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.

# Mode 1: Precomputed embeddings
# train_embs and val_embs are embedding tensors computed beforehand.
from seapig.scores import EuclideanScore
my_score = EuclideanScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
# model must expose an .embed() method; train_loader and val_loader are DataLoaders.
my_score = EuclideanScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters

X: torch.Tensor | None = None

A torch.Tensor with training sample embeddings. Required when not using model and loaders.

Y: torch.Tensor | None = None

A torch.Tensor with calibration sample embeddings. Optional.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None

A dict of DataLoader objects keyed by split name. The "train" key is required; the "val" key is optional. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loaders.
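
The on-the-fly mode can be pictured as the loop sketched below: every batch from a loader is passed through the model's .embed() method and the results are concatenated. `ToyModel`, `extract_embeddings`, and the list-based loaders are hypothetical stand-ins so the example runs without torch. Reading dict batches through the "image" key follows the loader description given for score() and is assumed to apply here as well.

```python
class ToyModel:
    """Hypothetical stand-in for a model exposing an .embed() method."""

    def embed(self, batch):
        # Map each sample (a list of floats) to a 2-D embedding: (sum, max).
        return [[sum(sample), max(sample)] for sample in batch]


def extract_embeddings(model, loader):
    """Embed every batch yielded by `loader` and concatenate the results."""
    embeddings = []
    for batch in loader:
        # A batch is either the inputs themselves or a dict holding them
        # under the "image" key.
        inputs = batch["image"] if isinstance(batch, dict) else batch
        embeddings.extend(model.embed(inputs))
    return embeddings


loaders = {
    "train": [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]],  # two plain batches
    "val": [{"image": [[0.0, 1.0]]}],  # one dict-style batch
}
model = ToyModel()
ref_embeddings = extract_embeddings(model, loaders["train"])
# The "val" loader is optional, so calibration embeddings may be absent.
cal_embeddings = extract_embeddings(model, loaders["val"]) if "val" in loaders else None
```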

plot_embs()

Visualize training, validation, and query embeddings in 2D.

Usage

plot_embs(query_embeddings, method="tsne", method_args=None)

Parameters

query_embeddings: torch.Tensor | None

Embeddings of query samples to visualize.

method: Literal["tsne", "umap"] = "tsne"

Dimensionality reduction method, either "tsne" or "umap". Defaults to "tsne".

method_args: dict[str, Any] | None = None

A dictionary of arguments to pass to the dimensionality reduction method. Defaults to None.

score()

Compute confidence scores for query samples.

Usage

score(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

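# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_embs, Y=val_embs)  # fit the score before scoring queries
scores = my_score.score(X=test_embeddings)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
scores = my_score.score(model=model, loader=test_dl)

Parameters

X: torch.Tensor | None = None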

A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

torch.Tensor

1-D tensor of shape (N,) with confidence scores. Low values indicate likely inliers, high values indicate likely outliers.
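
Because the two usage modes are mutually exclusive, an argument check along the following lines is one way to picture the rule. The function `resolve_mode` and its error messages are hypothetical; the checks the library actually performs, and the exceptions it raises, may differ.

```python
def resolve_mode(X=None, model=None, loader=None):
    """Return which usage mode the arguments select, or raise if the
    combination is ambiguous or incomplete (hypothetical helper)."""
    if X is not None:
        if model is not None or loader is not None:
            raise ValueError("Pass either X or model+loader, not both.")
        return "precomputed"
    if model is None or loader is None:
        raise ValueError("Pass X, or both model and loader.")
    return "on_the_fly"


mode_a = resolve_mode(X=[[0.1, 0.2]])              # embeddings only
mode_b = resolve_mode(model=object(), loader=[])   # model plus loader
```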

select()

Select samples for prediction based on their confidence score.

Usage

select(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

  1. Precomputed embeddings: Supply query embeddings via X.
  2. On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

Samples are selected based on their confidence score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_embs, Y=val_embs)
result = my_score.select(X=test_embs)
selected = result["selected"]  # boolean mask, True means the sample is selected

# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_embs, Y=val_embs)
result = my_score.select(model=model, loader=test_loader)

Parameters

X: torch.Tensor | None = None

A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None

A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

dict[str, torch.Tensor]

A dict with keys 'score' (confidence scores) and 'selected' (boolean mask where True means the sample is selected).
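
The selection rule and the shape of the returned dict can be sketched as follows. `select_by_threshold` is a hypothetical helper that works on plain lists instead of torch.Tensor. Treating a score exactly equal to the threshold as selected is an assumption, since the description only covers scores strictly below or above it.

```python
def select_by_threshold(scores, threshold):
    """Build a result dict with the scores and a boolean selection mask
    (hypothetical helper, not the library's implementation)."""
    return {
        "score": list(scores),
        # True means the sample is kept; scores above the threshold are excluded.
        "selected": [s <= threshold for s in scores],
    }


result = select_by_threshold([0.2, 1.0, 1.5], threshold=1.0)
# Keep only the indices of the selected samples.
kept = [i for i, keep in enumerate(result["selected"]) if keep]
```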

set_threshold()

Set a threshold based on a quantile of the available confidence scores.

Usage

set_threshold(q=0.99)

Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.

Parameters

q: float = 0.99

Quantile in (0, 1) used to determine the threshold. Defaults to 0.99, which places the threshold so that about 1% of the calibration (or training) scores lie above it and would be rejected as outliers.
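
A quantile-based threshold can be sketched as below. The helper `quantile_threshold` is hypothetical and uses linear interpolation between neighbouring sorted scores; which interpolation rule the library applies is not stated here, so treat that choice as an assumption.

```python
import math


def quantile_threshold(scores, q=0.99):
    """Return the q-quantile of `scores` using linear interpolation
    (hypothetical helper, not the library's implementation)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted scores
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# Calibration scores 0.0, 1.0, ..., 100.0 (training scores would be used
# instead if no calibration embeddings were supplied during fit).
cal_scores = [float(i) for i in range(101)]
threshold = quantile_threshold(cal_scores, q=0.99)
rejected = [s for s in cal_scores if s > threshold]
```

With q=0.99, roughly 1% of the scores used for calibration end up above the threshold.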

See Also