scores.EmbeddingScore
Base class for embedding-based confidence scores.
Usage
scores.EmbeddingScore(pca=None)
Embedding-based scores quantify deviation from the training distribution using latent-space embeddings. Low scores indicate samples similar to the training distribution (likely inliers), while high scores indicate samples deviating from the training distribution (likely outliers).
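To make the "low score = inlier, high score = outlier" convention concrete, here is a minimal sketch of one plausible embedding-based score: the mean Euclidean distance from a query embedding to its k nearest training embeddings. It is not seapig's implementation. The function name `knn_distance_score` is hypothetical, and plain Python lists stand in for `torch.Tensor` objects so the sketch has no dependencies.

```python
import math


def knn_distance_score(query, reference, k=2):
    """Mean Euclidean distance from each query vector to its k nearest reference vectors.

    Hypothetical helper for illustration only; not part of seapig.
    """
    scores = []
    for q in query:
        # Distances from this query point to every reference (training) point.
        dists = sorted(math.dist(q, r) for r in reference)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Four training embeddings on the unit square.
reference = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
# One query inside the training cloud, one far away from it.
query = [[0.5, 0.5], [5.0, 5.0]]

scores = knn_distance_score(query, reference, k=2)
print(scores)  # the first score is much smaller than the second
```

The query at the center of the training points receives a small score, while the distant query receives a large one, matching the convention described above.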
Parameters
pca: TensorPCA or None = None-
Optional PCA for dimensionality reduction prior to scoring. When provided, embeddings are projected onto the principal components before the score is computed.
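As a rough illustration of what "projected onto the principal components" means, the sketch below centers each embedding and takes its dot product with each component direction. It assumes the mean and the components have already been fitted; the internals of `TensorPCA` are not described here, so the `project` function and its arguments are hypothetical, and plain lists stand in for tensors.

```python
def project(embeddings, mean, components):
    """Center each embedding, then take its dot product with every component.

    Hypothetical helper for illustration only; not the TensorPCA API.
    """
    projected = []
    for x in embeddings:
        centered = [xi - mi for xi, mi in zip(x, mean)]
        projected.append(
            [sum(c * v for c, v in zip(comp, centered)) for comp in components]
        )
    return projected


mean = [1.0, 1.0, 1.0]
# Two orthonormal directions in a 3-D embedding space (assumed already fitted).
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
embeddings = [[2.0, 3.0, 4.0], [1.0, 1.0, 1.0]]

reduced = project(embeddings, mean, components)
print(reduced)  # each 3-D embedding becomes a 2-D vector
```

The score would then be computed on the reduced vectors instead of the original embeddings.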
Attributes
ref_embeddings: torch.Tensor or None-
Embeddings of training samples used to fit the score.
cal_embeddings: torch.Tensor or None-
Embeddings of validation/calibration samples. Optional.
scores: torch.Tensor or None-
Confidence scores of the calibration (or training) samples.
threshold: torch.Tensor or None-
Rejection threshold. Samples with scores above this value are excluded.
Attributes
| Name | Description |
|---|---|
| train_required | Boolean flag; defaults to True. |
train_required
train_required: bool = True
Methods
| Name | Description |
|---|---|
| fit() | Train a confidence score based on sample embeddings. |
| plot_embs() | Visualize training, validation, and query embeddings in 2D. |
| score() | Compute confidence scores for query samples. |
| select() | Select samples for prediction based on their confidence score. |
| set_threshold() | Set a threshold based on a quantile of the available confidence scores. |
fit()
Train a confidence score based on sample embeddings.
Usage
fit(
X=None,
Y=None,
model=None,
loaders=None,
outdir=None,
prefix=None,
*args,
**kwargs
)
This method supports two usage modes:
- Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
- On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.
You must use either embeddings (X/Y) or model + loaders, but not both.
# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)
# Mode 2: On-the-fly extraction
my_score = EuclideanScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
Parameters
X: torch.Tensor | None = None-
A torch.Tensor with training sample embeddings. Required when not using model and loaders.
Y: torch.Tensor | None = None-
A torch.Tensor with calibration sample embeddings. Optional.
model: torch.nn.Module | None = None-
A torch.nn.Module with an .embed() method. Required when not using X.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None-
A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.
outdir: Path | None = None-
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.
prefix: str | None = None-
A str used as filename prefix for saved embeddings. Only used with model and loaders.
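The mutual exclusivity of the two fit() usage modes can be pictured as a small argument check. The sketch below is an assumption about how such validation might look, not seapig's code: the helper name `resolve_fit_mode`, the returned mode strings, and the use of `ValueError` are all hypothetical.

```python
def resolve_fit_mode(X=None, Y=None, model=None, loaders=None):
    """Return which usage mode the arguments describe, or raise if they conflict.

    Hypothetical helper for illustration only; not part of seapig.
    """
    has_embeddings = X is not None or Y is not None
    has_model = model is not None or loaders is not None

    if has_embeddings and has_model:
        raise ValueError("Pass either X/Y or model + loaders, not both.")
    if has_model:
        if model is None or loaders is None:
            raise ValueError("model and loaders must be given together.")
        if "train" not in loaders:
            raise ValueError('loaders must contain a "train" key.')
        return "on_the_fly"
    if X is None:
        raise ValueError("X is required when model and loaders are not used.")
    return "precomputed"


print(resolve_fit_mode(X=[[0.0, 1.0]]))                          # precomputed
print(resolve_fit_mode(model=object(), loaders={"train": []}))  # on_the_fly
```

Mixing the modes, for example passing both X and model, raises an error in this sketch; how the library itself reports the conflict is not stated on this page.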
plot_embs()
Visualize training, validation, and query embeddings in 2D.
Usage
plot_embs(query_embeddings, method="tsne", method_args=None)
Parameters
query_embeddings: torch.Tensor | None-
Embeddings of query samples to visualize.
method: (tsne, umap) = "tsne"-
Dimensionality reduction method, either "tsne" or "umap". Defaults to "tsne".
method_args: dict[str, Any] | None = None-
A dictionary of arguments to pass to the dimensionality reduction method. Defaults to None.
score()
Compute confidence scores for query samples.
Usage
score(X=None, model=None, loader=None, outdir=None, prefix=None)
This method supports two usage modes:
- Precomputed embeddings: Supply query embeddings via X.
- On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.
You must use either embeddings (X) or model + loader, but not both.
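# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
# Fit before scoring; train_embeddings is a placeholder for your training embeddings
my_score.fit(X=train_embeddings)
scores = my_score.score(X=test_embeddings)
# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_embeddings)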
scores = my_score.score(model=model, loader=test_dl)
Parameters
X: torch.Tensor | None = None-
A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None-
A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None-
A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None-
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None-
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
torch.Tensor-
1-D tensor of shape (N,) with confidence scores. Low values indicate likely inliers, high values indicate likely outliers.
select()
Select samples for prediction based on their confidence score.
Usage
select(X=None, model=None, loader=None, outdir=None, prefix=None)
This method supports two usage modes:
- Precomputed embeddings: Supply query embeddings via X.
- On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.
You must use either embeddings (X) or model + loader, but not both.
Samples are selected based on their confidence score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).
# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)
# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)
Parameters
X: torch.Tensor | None = None-
A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None-
A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None-
A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None-
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None-
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
dict[str, torch.Tensor]-
A dict with keys 'score' (confidence scores) and 'selected' (boolean mask where True means the sample is selected).
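The shape of the select() result can be sketched as follows: scores are compared against a threshold and the outcome is returned together with the scores. This is an illustration under assumptions, not the library's code. The helper name `select_by_threshold` is hypothetical, lists stand in for tensors, and treating a score exactly equal to the threshold as selected is an assumption, since this page only states that scores above the threshold are excluded.

```python
def select_by_threshold(scores, threshold):
    """Build a result dict with the scores and a boolean selection mask.

    Hypothetical helper for illustration only; not part of seapig.
    """
    return {
        "score": list(scores),
        # Assumption: scores equal to the threshold are kept.
        "selected": [s <= threshold for s in scores],
    }


result = select_by_threshold([0.2, 0.9, 1.7], threshold=1.0)
print(result["selected"])  # True marks samples kept for prediction

# Keep only the scores of the selected samples.
kept = [s for s, keep in zip(result["score"], result["selected"]) if keep]
```

The mask can be used in the same way to filter the query samples themselves before passing them to a downstream model.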
set_threshold()
Set a threshold based on a quantile of the available confidence scores.
Usage
set_threshold(q=0.99)
Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.
Parameters
q: float = 0.99-
Quantile in (0, 1) used to determine the threshold. Defaults to 0.99 (i.e., about 1% of the calibration or training samples score above the threshold and would be rejected as outliers).
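The quantile rule described for set_threshold() can be sketched in a few lines: use the calibration scores when they exist, fall back to the training scores otherwise, and take the q-quantile. The helper names `quantile` and `pick_threshold` are hypothetical, and linear interpolation between neighbouring values is an assumption (it matches the default behaviour of `torch.quantile`, but this page does not say how seapig computes the quantile).

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def pick_threshold(train_scores, cal_scores=None, q=0.99):
    """Prefer calibration scores; fall back to training scores.

    Hypothetical helper for illustration only; not part of seapig.
    """
    source = cal_scores if cal_scores is not None else train_scores
    return quantile(source, q)


# 101 evenly spaced calibration scores: 0.0, 1.0, ..., 100.0
cal_scores = [float(i) for i in range(101)]
threshold = pick_threshold(train_scores=[0.0, 1.0], cal_scores=cal_scores, q=0.99)

# Without calibration scores, the training scores are used instead.
fallback = pick_threshold(train_scores=[0.0, 1.0], q=0.5)
```

With q=0.99, roughly the top 1% of the scores used for calibration lie above the resulting threshold.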
See Also
- seapig.scores.knn.EuclideanScore: Concrete KNN-based implementation.
- seapig.scores.pca.PCAScore: PCA reconstruction error implementation.
- seapig.scores.utils.TensorPCA: PCA utility used for dimensionality reduction.