scores.PyODScore

Uncertainty scores based on detectors supplied by PyOD.

Usage

scores.PyODScore(detector, pca=None)

Computes outlier scores using PyOD detectors. Low scores indicate samples similar to the training distribution (low uncertainty); high scores indicate samples that deviate from it (high uncertainty).

Parameters

detector: pyod.models.base.BaseDetector: A fitted or unfitted PyOD detector instance. Any detector from the pyod library that implements fit and decision_function is supported (e.g., pyod.models.knn.KNN).
pca: TensorPCA or None = None: Optional PCA for dimensionality reduction prior to scoring.
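
The detector contract described above (a fit method plus a decision_function that returns higher values for more anomalous samples) can be illustrated without installing pyod. The sketch below is a hypothetical stand-in, not a PyOD class: ToyKNNDetector and its attributes are invented for illustration, and it scores each query by the distance to its k-th nearest training point.

```python
import math


class ToyKNNDetector:
    """Hypothetical stand-in for a PyOD-style detector (not part of pyod)."""

    def __init__(self, n_neighbors=2):
        self.n_neighbors = n_neighbors
        self.train_ = None

    def fit(self, X):
        # Store the training points; a real detector would build an index here.
        self.train_ = [tuple(x) for x in X]
        return self

    def decision_function(self, X):
        # Score each query by its distance to the k-th nearest training point.
        scores = []
        for x in X:
            dists = sorted(math.dist(x, t) for t in self.train_)
            scores.append(dists[self.n_neighbors - 1])
        return scores


train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
detector = ToyKNNDetector(n_neighbors=2).fit(train)

# One query inside the training cloud, one far away from it.
near, far = detector.decision_function([(0.05, 0.05), (5.0, 5.0)])
print(near < far)
```

The query inside the training cloud receives the lower score, matching the convention that low scores mean low uncertainty.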

Notes

Requires the optional pyod dependency. Install with:

pip install pyod

Methods

Name	Description
fit()	Train an uncertainty score based on sample embeddings.
score()	Compute uncertainty scores for query samples.
select()	Select samples for prediction based on their uncertainty score.
set_threshold()	Set a threshold based on a quantile of the available uncertainty scores.
get_threshold()	Get the current threshold value.
plot()	Plot densities for uncertainty scores.
plot_embs()	Visualize training, validation, and query embeddings in 2D.

fit()

Train an uncertainty score based on sample embeddings.

Usage

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.

# Mode 1: Precomputed embeddings
from pyod.models.knn import KNN
from seapig.scores.pyod import PyODScore
my_score = PyODScore(detector=KNN(n_neighbors=5))
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = PyODScore(detector=KNN(n_neighbors=5))
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
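
The either/or rule for the two modes can be sketched as a small argument check. This is only an illustration of the rule stated above, not the library's actual validation logic; resolve_fit_mode and its return values are hypothetical names.

```python
def resolve_fit_mode(X=None, Y=None, model=None, loaders=None):
    """Hypothetical helper: return 'embeddings' or 'extract', else raise ValueError."""
    has_embs = X is not None or Y is not None
    has_model = model is not None or loaders is not None

    # The two modes are mutually exclusive.
    if has_embs and has_model:
        raise ValueError("Pass either X/Y or model+loaders, not both.")

    if has_embs:
        # Y is optional, but X is required in this mode.
        if X is None:
            raise ValueError("X is required when using precomputed embeddings.")
        return "embeddings"

    # On-the-fly extraction needs a model and a 'train' loader.
    if model is None or loaders is None or "train" not in loaders:
        raise ValueError("model and loaders with a 'train' key are required.")
    return "extract"


# Placeholder objects stand in for real tensors, models, and DataLoaders.
print(resolve_fit_mode(X=[[0.0, 1.0]]))
print(resolve_fit_mode(model=object(), loaders={"train": []}))
```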

Parameters

X: torch.Tensor | None = None: A torch.Tensor with training sample embeddings. Required when not using model and loaders.
Y: torch.Tensor | None = None: A torch.Tensor with calibration sample embeddings. Optional.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None: A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loaders.
q: bool | float = False: A bool or float controlling whether outliers in the training distribution are filtered out before fitting. Defaults to False (no filtering).

score()

Compute uncertainty scores for query samples.

Usage

score(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

Precomputed embeddings: Supply query embeddings via X.
On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

# Mode 1: Precomputed embeddings
from pyod.models.knn import KNN
from seapig.scores.pyod import PyODScore
my_score = PyODScore(detector=KNN(n_neighbors=5))
my_score.fit(X=train_embs)
scores = my_score.score(X=test_embeddings)

# Mode 2: On-the-fly extraction (reusing the fitted score from Mode 1)
scores = my_score.score(model=model, loader=test_dl)

Parameters

X: torch.Tensor | None = None: A torch.Tensor with query embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None: A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

torch.Tensor: 1-D tensor of shape (N,) with uncertainty scores. Low values indicate likely inliers, high values indicate likely outliers.

select()

Select samples for prediction based on their uncertainty score.

Usage

select(X=None, model=None, loader=None, outdir=None, prefix=None)

This method supports two usage modes:

Precomputed embeddings: Supply query embeddings via X.
On-the-fly extraction: Supply a model with an .embed() method and a DataLoader to extract embeddings automatically.

You must use either embeddings (X) OR model+loader, but not both.

Samples are selected based on their uncertainty score relative to a threshold. Samples with scores lower than the threshold are selected, while samples with scores higher than the threshold are excluded. The threshold should be calibrated beforehand (e.g., on validation samples).

# Mode 1: Precomputed embeddings
from pyod.models.knn import KNN
from seapig.scores.pyod import PyODScore
my_score = PyODScore(detector=KNN(n_neighbors=5))
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(X=test_data)

# Mode 2: On-the-fly extraction
my_score = PyODScore(detector=KNN(n_neighbors=5))
my_score.fit(X=train_data, Y=val_data)
result = my_score.select(model=model, loader=test_loader)

Parameters

X: torch.Tensor | None = None: A torch.Tensor with query sample embeddings of shape (N, D). Required when not using model and loader.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loader: DataLoader[torch.Tensor | dict[str, torch.Tensor]] | None = None: A torch.utils.data.DataLoader returning torch.Tensors or dicts with the "image" key. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loader.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loader.

Returns

dict[str, torch.Tensor]: A dict with keys 'score' (uncertainty scores) and 'selected' (boolean mask where True means the sample is selected).
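
To make the shape of the return value concrete, here is a minimal sketch of the selection rule using plain Python lists in place of tensors. select_by_threshold is a hypothetical helper, and the strict "<" comparison is an assumption: the text above does not say how scores exactly equal to the threshold are treated.

```python
def select_by_threshold(scores, threshold):
    """Hypothetical helper mirroring the documented 'score'/'selected' keys."""
    # Assumption: strict "<" keeps a sample; ties are unspecified in the text.
    return {
        "score": list(scores),
        "selected": [s < threshold for s in scores],
    }


result = select_by_threshold([0.2, 0.9, 1.7], threshold=1.0)
print(result["selected"])  # [True, True, False]

# The mask can then be used to keep only the selected samples.
kept = [s for s, keep in zip(result["score"], result["selected"]) if keep]
```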

set_threshold()

Set a threshold based on a quantile of the available uncertainty scores.

Usage

set_threshold(q=0.99)

Samples with scores higher than the threshold are excluded from prediction. If calibration embeddings were provided during fit, the threshold is computed from their scores; otherwise the training sample scores are used.

Parameters

q: float = 0.99: Quantile in (0, 1) used to determine the threshold. Defaults to 0.99, which places the threshold so that the 1% of reference samples with the highest scores are rejected as outliers.
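
The quantile rule can be sketched in plain Python. This is an illustration only: quantile_threshold is a hypothetical helper, and the linear interpolation between neighbouring order statistics is an assumption about how the quantile is computed, not something the text above confirms.

```python
import math


def quantile_threshold(train_scores, val_scores=None, q=0.99):
    """Hypothetical helper: q-quantile of calibration scores, else training scores."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")

    # Prefer calibration scores when they are available.
    reference = val_scores if val_scores is not None else train_scores
    ordered = sorted(reference)

    # Assumption: linear interpolation between neighbouring order statistics.
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


train_scores = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
threshold = quantile_threshold(train_scores, q=0.99)

# Count how many reference samples would be excluded at this threshold.
rejected = sum(s > threshold for s in train_scores)
print(threshold, rejected)
```

With 101 evenly spaced scores, roughly 1% of the reference samples end up above the threshold, which is the behaviour the default q=0.99 describes.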

get_threshold()

Get the current threshold value.

Usage

get_threshold()

plot()

Plot densities for uncertainty scores.

Usage

plot(query_scores=None, bins=100)

By default, this method plots densities for the uncertainty scores. Optionally, it can also plot densities for query_scores.

Parameters

query_scores: torch.Tensor | None = None: A torch.Tensor representing query scores to include in the plot. Defaults to None.
bins: int = 100: An int indicating the number of bins to use for density estimation. Defaults to 100.

plot_embs()

Visualize training, validation, and query embeddings in 2D.

Usage

plot_embs(query_embeddings, method="tsne", method_args=None)

Parameters

query_embeddings: torch.Tensor | None: Embeddings of query samples to visualize.
method: (tsne, umap) = "tsne": Dimensionality reduction method, either "tsne" or "umap". Defaults to "tsne".
method_args: dict[str, Any] | None = None: A dictionary of arguments passed to the dimensionality reduction method. Defaults to None.
