scores.PyODScore
Uncertainty scores based on detectors supplied by PyOD.
Usage
scores.PyODScore()
Computes outlier scores using PyOD detectors, where low scores indicate samples similar to the training distribution (low uncertainty) and high scores indicate samples deviating from the training distribution (high uncertainty).
Parameters
detector: pyod.models.base.BaseDetector
A fitted or unfitted PyOD detector instance. Any detector from the pyod library that implements fit and decision_function is supported (e.g., pyod.models.knn.KNN).
pca: TensorPCA or None = None
Optional PCA for dimensionality reduction prior to scoring.
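The fit and decision_function contract can be illustrated with a toy detector. The class below is a hypothetical stand-in written for this example; it is not part of PyOD or seapig, and this page does not say whether PyODScore accepts objects that do not subclass BaseDetector. It only shows the shape of the two methods, with larger return values meaning "more anomalous".

```python
import math


class MeanDistanceDetector:
    """Hypothetical stand-in that mimics the fit/decision_function contract.

    Not a PyOD class: it only illustrates the two methods a detector exposes.
    """

    def fit(self, X):
        # Store the per-dimension mean of the training samples.
        n = len(X)
        self.center_ = [sum(col) / n for col in zip(*X)]
        return self

    def decision_function(self, X):
        # Larger distance from the training mean -> larger outlier score.
        return [math.dist(row, self.center_) for row in X]


train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
detector = MeanDistanceDetector().fit(train)
scores = detector.decision_function([[0.5, 0.5], [5.0, 5.0]])
print(scores)  # the second sample lies far from the training data
```

In practice a real detector such as pyod.models.knn.KNN would take the place of the stand-in.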
Notes
Requires the optional pyod dependency. Install with:
pip install pyod
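Because pyod is optional, calling code may want to check for it before constructing a score. A minimal sketch using only the standard library; the helper name pyod_available is an assumption made for this example and is not part of seapig.

```python
import importlib.util


def pyod_available():
    """Return True if the pyod package can be found in this environment."""
    return importlib.util.find_spec("pyod") is not None


if not pyod_available():
    print("pyod is not installed; run `pip install pyod` to use PyODScore.")
```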
Methods
| Name | Description |
|---|---|
| fit() | Train an uncertainty score based on sample embeddings. |
| score() | Compute uncertainty scores for query samples. |
| select() | Select samples for prediction based on their uncertainty score. |
| set_threshold() | Set a threshold based on a quantile of the available uncertainty scores. |
| get_threshold() | Get the current threshold value. |
| plot() | Plot densities for uncertainty scores. |
| plot_embs() | Visualize training, validation, and query embeddings in 2D. |
fit()
Train an uncertainty score based on sample embeddings.
Usage
fit(
X=None,
Y=None,
model=None,
loaders=None,
outdir=None,
prefix=None,
q=False
)
This method supports two usage modes:
- Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
- On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.
You must use either embeddings (X/Y) OR model+loaders, but not both.
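The either/or rule can be sketched as a small argument check. The helper resolve_fit_mode below is hypothetical and only mirrors the constraints stated on this page; the library's actual validation and error messages may differ.

```python
def resolve_fit_mode(X=None, Y=None, model=None, loaders=None):
    """Hypothetical helper: decide which fit() mode the arguments select."""
    has_embeddings = X is not None or Y is not None
    has_model = model is not None or loaders is not None
    if has_embeddings and has_model:
        raise ValueError("Pass either X/Y or model+loaders, not both.")
    if has_model:
        # On-the-fly extraction needs both a model and a "train" loader.
        if model is None or loaders is None:
            raise ValueError("model and loaders must be given together.")
        if "train" not in loaders:
            raise ValueError('loaders needs a "train" key.')
        return "on_the_fly"
    if X is None:
        raise ValueError("X is required when model and loaders are not used.")
    return "precomputed"


print(resolve_fit_mode(X=[[0.0, 1.0]]))
print(resolve_fit_mode(model=object(), loaders={"train": []}))
```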
# Mode 1: Precomputed embeddings
from pyod.models.knn import KNN
from seapig.scores.pyod import PyODScore
my_score = PyODScore(detector=KNN(n_neighbors=5))
my_score.fit(X=train_embs, Y=val_embs)
# Mode 2: On-the-fly extraction
my_score = PyODScore(detector=KNN(n_neighbors=5))
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
Parameters
X: torch.Tensor | None = None
A torch.Tensor with training sample embeddings. Required when not using model and loaders.
Y: torch.Tensor | None = None
A torch.Tensor with calibration sample embeddings. Optional.
model: torch.nn.Module | None = None
A torch.nn.Module with an .embed() method. Required when not using X.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None
A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.
outdir: Path | None = None
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.
prefix: str | None = None
A str used as filename prefix for saved embeddings. Only used with model and loaders.
q: bool | float = False
A float or bool indicating if outliers from the training distribution should be filtered before fitting. Defaults to False.
score()
Compute uncertainty scores for query samples.
Usage
score(X=None, model=None, loader=None, outdir=None, prefix=None)
This method supports two usage modes:
- Precomputed embeddings: Supply query embeddings via X.
- On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.
You must use either embeddings (X) OR model+loader, but not both.
# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
scores = my_score.score(X=test_embeddings)
# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
scores = my_score.score(model=model, loader=test_dl)
Parameters
X: torch.Tensor | None = None
A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None
A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None
A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
torch.Tensor
1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers, high values indicate likely outliers.
select()
Select samples for prediction based on their uncertainty score.
Usage
select(X=None, model=None, loader=None, outdir=None, prefix=None)
This method supports two usage modes:
- Precomputed embeddings: Supply query embeddings via X.
- On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.
You must use either embeddings (X) OR model+loader, but not both.
Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).
# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)
# Mode 2: On-the-fly extraction
my_score = EuclideanScore()
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)
Parameters
X: torch.Tensor | None = None
A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None
A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None
A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None
A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None
A str used as filename prefix for saved embeddings. Only used with model and loader.
Returns
dict[str, torch.Tensor]
A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).
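The selection rule and the shape of the returned dict can be sketched with plain Python lists in place of tensors. The helper select_by_threshold is hypothetical, and keeping a score that exactly equals the threshold is an assumption, since this page only describes scores strictly lower or higher than the threshold.

```python
def select_by_threshold(scores, threshold):
    """Hypothetical helper mirroring the documented return structure."""
    # Assumption: a score exactly equal to the threshold is kept.
    selected = [s <= threshold for s in scores]
    return {"score": list(scores), "selected": selected}


result = select_by_threshold([0.2, 0.9, 1.7, 0.4], threshold=1.0)
# Use the boolean mask to keep only the selected samples.
kept = [s for s, keep in zip(result["score"], result["selected"]) if keep]
print(result["selected"])  # [True, True, False, True]
print(kept)                # [0.2, 0.9, 0.4]
```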
set_threshold()
Set a threshold based on a quantile of the available uncertainty scores.
Usage
set_threshold(q=0.99)
Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.
Parameters
q: float = 0.99
Quantile in (0, 1) used to determine the threshold. Defaults to 0.99 (i.e., 1% of samples are rejected as outliers).
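A quantile-based threshold with the calibration-then-training fallback can be sketched in pure Python. Both helpers are hypothetical, and the linear-interpolation rule is an assumption; this page does not say which quantile estimator the library uses.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def pick_threshold(train_scores, calib_scores=None, q=0.99):
    """Hypothetical helper: prefer calibration scores, else training scores."""
    reference = calib_scores if calib_scores is not None else train_scores
    return quantile(reference, q)


train_scores = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
threshold = pick_threshold(train_scores, q=0.99)
# Count the reference scores that fall above the threshold.
rejected = sum(s > threshold for s in train_scores)
print(threshold, rejected)
```

With q=0.99 on these 101 evenly spaced scores, only the largest one exceeds the threshold, which is roughly the 1% rejection rate described above.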
get_threshold()
Get the current threshold value.
Usage
get_threshold()
plot()
Plot densities for uncertainty scores.
Usage
plot(query_scores=None, bins=100)
By default, this method plots densities for the uncertainty scores. Optionally, it can also plot densities for query_scores.
Parameters
query_scores: torch.Tensor | None = None
A torch.Tensor representing query scores to include in the plot. Defaults to None.
bins: int = 100
An int indicating the number of bins to use for density estimation. Defaults to 100.
plot_embs()
Visualize training, validation, and query embeddings in 2D.
Usage
plot_embs(query_embeddings, method="tsne", method_args=None)
Parameters
query_embeddings: torch.Tensor | None
Embeddings of query samples to visualize.
method: (tsne, umap) = "tsne"
Dimensionality reduction method, by default "tsne".
method_args: dict[str, Any] | None = None
A dictionary of arguments to pass to the dimensionality reduction method, by default None.