----------------------------------------------------------------------
This is the API documentation for the seapig library.
----------------------------------------------------------------------

## Task & Metric Classes

Core classes for defining selective inference tasks and metrics.

### SelectiveInferenceTask

```python
SelectiveInferenceTask(
    task: lightning.pytorch.core.module.LightningModule,
    score: seapig.scores.base.ConfidenceScore,
    acc_test_outputs: bool = False,
    input_key: Optional[Literal['image', 'input', 'images', 'inputs', 'x']] = None,
    target_key: Optional[Literal['mask', 'label', 'masks', 'labels', 'targets', 'target', 'y', 'y_true']] = None,
    rc_metric: seapig.metric.RiskCoverageMetric | None = None,
) -> None
```

Wrap a trained `LightningModule` to attach selection results during inference.

The wrapper calls the wrapped model in inference mode and combines its predictions with selection outputs produced by a provided `ConfidenceScore`.

Key behavior:

- The wrapped task must provide an `.embed(x)` method. The wrapper calls `task.embed(x)` to produce embeddings used by the score.
- The wrapped task is copied and set to `eval()` during initialization to avoid accidental training side effects.
- If the wrapped task defines `test_metrics` (a `Metric` or `MetricCollection`), it will be wrapped by `SelectiveMetric` so metrics are computed only on selected examples.
- If `rc_metric` (a `RiskCoverageMetric`) is provided, the wrapper will update it during test steps; the final risk-coverage values are available via `get_risk_coverage_curve()`.

Parameters
----------
task
    A trained `LightningModule` whose `forward(x)` returns predictions. The module must implement `embed(x)` to produce embeddings for scoring.
score
    A seapig `ConfidenceScore` instance providing the `ConfidenceScore.select` method.
input_key
    Key used to extract inputs from an incoming batch. If `None` (default), the first element of the batch is used (positional index 0).
    When a string is given it must be one of: `'image'`, `'input'`, `'images'`, `'inputs'`, `'x'`.
target_key
    Key used to extract targets from an incoming batch. If `None` (default), the second element of the batch is used (positional index 1). When a string is given it must be one of: `'mask'`, `'label'`, `'masks'`, `'labels'`, `'targets'`, `'target'`, `'y'`, `'y_true'`.
acc_test_outputs
    If `True`, per-batch outputs (predictions merged with selection results) are accumulated in the `test_outputs` list for later inspection. If `False` (default), outputs are not accumulated and metrics are logged as usual.
rc_metric
    Optional `RiskCoverageMetric` that will be updated during testing.

Examples
--------
```python
from seapig import SelectiveInferenceTask
from seapig.scores import EuclideanScore

score = EuclideanScore()
# score.fit(X=train_embeddings)  # fit before wrapping
selective_task = SelectiveInferenceTask(task=model, score=score)
```

### SelectiveMetric

```python
SelectiveMetric(base: torchmetrics.metric.Metric | torchmetrics.collections.MetricCollection) -> None
```

Evaluate a metric on full, selected, and rejected subsets.

Wraps a `torchmetrics.Metric` or `torchmetrics.MetricCollection` and keeps three independent copies that are updated separately:

- `"full"`: all samples passed to `update`.
- `"selected"`: samples where the provided selection mask is true.
- `"rejected"`: samples where the selection mask is false.

The `compute` result is a flat `dict[str, torch.Tensor]` where each underlying metric name is prefixed with `full/`, `selected/`, or `rejected/`. If a submetric was never updated, its value is a zero `torch.Tensor`.

Parameters
----------
base : torchmetrics.Metric | torchmetrics.MetricCollection
    Metric (or collection) to wrap. Internally the object is deep-copied three times so each subset is tracked independently.

Notes
-----
- The selection mask may be boolean or numeric; numeric values `> 0` are treated as selected.
- Calls that contain no selected (or no rejected) rows do not update the corresponding internal metric for that call.

Examples
--------
```python
import torch
from torchmetrics import Accuracy

base = Accuracy(task="multiclass", num_classes=2)
m = SelectiveMetric(base)
preds = torch.tensor([[0.9, 0.1], [0.2, 0.8]])
target = torch.tensor([0, 1])
mask = torch.tensor([1, 0], dtype=torch.bool)
m.update(preds, target, mask)
results = m.compute()
# results contains keys like 'full/accuracy', 'selected/accuracy', ...
```

### RiskCoverageMetric

```python
RiskCoverageMetric(
    risk: str = 'generalized',
    n_bins: int = 100,
    error_fn: collections.abc.Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
) -> None
```

Build a risk-coverage curve from scores and per-sample errors.

Collects per-sample `scores` and per-sample `residuals` across multiple `update` calls and computes summary area-under-curve values using `seapig.risk_coverage.risk_coverage`.

Parameters
----------
risk : {'generalized', 'selective'}, default 'generalized'
    Which risk definition to use when computing the curve. Must be either `'generalized'` or `'selective'`.
n_bins : int, default 100
    Number of bins used to downsample the curve when computing AUC summaries.
error_fn : callable or None, default None
    Function `(preds, target) -> residuals` that reduces model predictions and targets to a 1-D tensor of per-sample residuals. If `None`, the default is per-sample mean absolute error.

Notes
-----
The `compute` method returns three tensors: `rc/auc_empirical`, `rc/auc_reference`, and `rc/auc_excess`. The last computed complete curve object (`RiskCoverage`) is available via `get_curve`.

See Also
--------
seapig.risk_coverage.risk_coverage : The underlying curve computation.
seapig.risk_coverage.RiskCoverage : Container for curve results.
Examples
--------
```python
import torch
from seapig.metric import RiskCoverageMetric

metric = RiskCoverageMetric(risk="generalized")
preds = torch.rand(50, 1)
target = torch.rand(50, 1)
scores = torch.rand(50)
metric.update(preds, target, scores)
result = metric.compute()
# result contains keys: 'rc/auc_empirical', 'rc/auc_reference', 'rc/auc_excess'
```

### RiskCoverage

```python
RiskCoverage(
    coverage: torch.Tensor,
    threshold: torch.Tensor,
    risk: torch.Tensor,
    reference: torch.Tensor,
    excess: torch.Tensor,
    risk_type: str,
    auc_empirical: torch.Tensor,
    auc_reference: torch.Tensor,
    auc_excess: torch.Tensor,
) -> None
```

Container for risk-coverage results.

Holds the coverage, score thresholds, empirical and reference risk curves, their difference (excess), and AUC metrics.

Attributes
----------
coverage : torch.Tensor
    Coverage values in `[0, 1]`.
threshold : torch.Tensor
    Sorted score thresholds used to compute coverage.
risk : torch.Tensor
    Empirical risk at each coverage level.
reference : torch.Tensor
    Reference (optimal) risk at each coverage level.
excess : torch.Tensor
    Excess risk (empirical - reference).
risk_type : str
    Either `'generalized'` or `'selective'`; see `risk_coverage`.
auc_empirical : torch.Tensor
    Area under the empirical risk curve (trapezoidal rule).
auc_reference : torch.Tensor
    Area under the reference risk curve (trapezoidal rule).
auc_excess : torch.Tensor
    Area under the excess risk curve (trapezoidal rule).

See Also
--------
seapig.risk_coverage.risk_coverage : Function that produces this container.
seapig.metric.RiskCoverageMetric : Metric wrapper for use with Lightning.

## Embedding-based Confidence Scores

Confidence scoring methods for embedding-based approaches.

### CosineScore

```python
CosineScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Returns the KNN distance based on the cosine distance to the nearest samples.
Computes cosine distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

The cosine distance is computed as `(1 - cosine_similarity)`, with a range of `[0, 2]` where `0` indicates identical vectors, `1` indicates orthogonal vectors, and `2` indicates opposite vectors.

Parameters
----------
k : int, default 1
    Number of nearest neighbors to use.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    Whether (and where) to save the HNSW index to disk.

See Also
--------
seapig.scores.knn.EuclideanScore : KNN score using Euclidean distance.
seapig.scores.knn.MahalanobisScore : KNN score using Mahalanobis distance.

### EuclideanScore

```python
EuclideanScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Returns the KNN distance based on the Euclidean distance to the nearest samples.

Computes Euclidean distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
k : int, default 1
    Number of nearest neighbors to use.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    Whether (and where) to save the HNSW index to disk.
Examples
--------
```python
import torch
from seapig.scores import EuclideanScore

score = EuclideanScore(k=5)
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
```

See Also
--------
seapig.scores.knn.CosineScore : KNN score using cosine distance.
seapig.scores.knn.MahalanobisScore : KNN score using Mahalanobis distance.

### MahalanobisScore

```python
MahalanobisScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Returns the Mahalanobis distance to the training samples distribution.

Computes Mahalanobis distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

The Mahalanobis distance accounts for correlations in the training data by whitening the embeddings with the Cholesky factor of the training covariance matrix prior to computing Euclidean nearest-neighbor distances.

Parameters
----------
k : int, default 1
    Number of nearest neighbors to use.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    Whether (and where) to save the HNSW index to disk.

See Also
--------
seapig.scores.knn.EuclideanScore : KNN score using Euclidean distance.
seapig.scores.knn.CosineScore : KNN score using cosine distance.

### PCAScore

```python
PCAScore(pca: seapig.scores.utils.TensorPCA = TensorPCA()) -> None
```

Returns confidence scores based on PCA reconstruction errors.

Computes reconstruction error-based confidence scores where low scores indicate samples that can be well-reconstructed from principal components (likely inliers) and high scores indicate samples with large reconstruction errors (likely outliers).
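The reconstruction-error idea can be sketched with plain `torch` operations. This is a simplified linear-PCA illustration, not the `TensorPCA` implementation itself (which also supports random Fourier features and explained-variance component selection):

```python
import torch

# Fit a plain linear PCA on some stand-in training embeddings.
X = torch.randn(200, 64)
mu = X.mean(dim=0)
# Top-8 principal directions via SVD of the centered data.
U, S, Vh = torch.linalg.svd(X - mu, full_matrices=False)
components = Vh[:8]  # (8, 64)

def reconstruction_error(Z: torch.Tensor) -> torch.Tensor:
    """Per-sample L2 error after projecting onto the principal subspace."""
    centered = Z - mu
    recon = centered @ components.T @ components
    return torch.linalg.norm(centered - recon, dim=1)

# High reconstruction error suggests a sample the principal
# components cannot explain, i.e. a likely outlier.
scores = reconstruction_error(torch.randn(10, 64))
```

Samples far from the training subspace receive large errors, which is why the error serves directly as an outlier score.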
See https://arxiv.org/pdf/2402.02949v3 for the method description.

Parameters
----------
pca : TensorPCA, optional
    PCA configuration to use. Defaults to `TensorPCA(n_components=0.50, gamma=3.0, M=4096)` (RFF-PCA retaining 50% explained variance).

Examples
--------
```python
import torch
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA

score = PCAScore(pca=TensorPCA(n_components=0.90))
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
```

See Also
--------
seapig.scores.utils.TensorPCA : PCA implementation used internally.
seapig.scores.knn.EuclideanScore : Alternative distance-based score.

### PyODScore

```python
PyODScore(
    detector: pyod.models.base.BaseDetector,
    pca: seapig.scores.utils.TensorPCA | None = None,
) -> None
```

Confidence scores based on detectors supplied by PyOD.

Computes outlier scores using PyOD detectors where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
detector : pyod.models.base.BaseDetector
    A fitted or unfitted PyOD detector instance. Any detector from the `pyod` library that implements `fit` and `decision_function` is supported (e.g., `pyod.models.knn.KNN`).
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.

Notes
-----
Requires the optional `pyod` dependency: `pip install pyod`.

See Also
--------
seapig.scores.knn.EuclideanScore : Built-in KNN-based score (no extra dependency).
seapig.scores.embed.EmbeddingScore : Base class for embedding-based scores.

## Logit-based Confidence Scores

Confidence scoring methods based on model logits.

### EnergyScore

```python
EnergyScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Energy-based confidence score.

Computes the free energy of the logit distribution. Lower energy (more negative) indicates higher confidence.
Supports multiclass, binary, and multilabel tasks.

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.

Examples
--------
```python
import torch
from seapig.scores.logits import EnergyScore

logits = torch.randn(2, 3)
EnergyScore().score(logits)
```

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based alternative.

### EntropyScore

```python
EntropyScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Entropy-based confidence score.

Computes the predictive entropy of the output distribution. Higher entropy indicates higher uncertainty (higher score). Supports multiclass, binary, and multilabel tasks.

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.

Examples
--------
```python
import torch
from seapig.scores.logits import EntropyScore

logits = torch.randn(2, 3)
EntropyScore().score(logits)
```

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based alternative.

### MarginScore

```python
MarginScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Top-two margin confidence score.

Computes the difference between the top two logits. A larger margin indicates higher confidence (lower score). Supports multiclass, binary (single/two-logit), and multilabel tasks.

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.
Examples
--------
```python
import torch
from seapig.scores.logits import MarginScore

logits = torch.randn(2, 3)
MarginScore().score(logits)
```

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based alternative.

### SoftmaxScore

```python
SoftmaxScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Maximum softmax probability confidence score.

Supports multiclass, binary (single/two-logit), and multilabel tasks. A higher maximum softmax probability indicates higher confidence (lower score).

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.

Examples
--------
```python
import torch
from seapig.scores.logits import SoftmaxScore

logits = torch.randn(2, 4)
SoftmaxScore().score(logits)
```

See Also
--------
seapig.scores.logits.EntropyScore : Entropy-based alternative.
seapig.scores.logits.EnergyScore : Energy-based alternative.
seapig.scores.logits.MarginScore : Margin-based alternative.

## Other Confidence Scores

Confidence scoring methods based on other principles.

### RandomScore

```python
RandomScore() -> None
```

Returns random confidence scores per sample.

This score assigns a random float in `[0, 1]` to each sample. It is useful as a baseline or for testing purposes. Low scores indicate likely inliers, high scores indicate likely outliers. By default, the threshold is set to `0.99`, so approximately 99% of samples are selected.

See Also
--------
seapig.scores.base.ConfidenceScore : Abstract base class.

## Abstract Classes

Abstract base classes.

### ConfidenceScore

```python
ConfidenceScore() -> None
```

Abstract Base Class for Confidence Scores.

Confidence scores quantify the deviation of query samples from the training distribution. Low scores indicate likely inliers (samples similar to training), while high scores indicate likely outliers (samples deviating from training).
Samples with scores exceeding the threshold are excluded from prediction.

Attributes
----------
trained : bool
    Whether the score has been trained. Defaults to `False`.
train_required : bool
    Whether training is required before scoring. Defaults to `False`.
cal_required : bool
    Whether calibration is required before selecting. Defaults to `False`.
calibrated : bool
    Whether the score has been calibrated. Defaults to `False`.
scores : torch.Tensor or None
    Confidence scores of the calibration samples. Low scores indicate likely inliers, high scores indicate likely outliers.
threshold : torch.Tensor or None
    Rejection threshold. Samples with scores higher than this value are excluded from prediction.
device : str
    Device to which internal tensors are put. Defaults to `"cpu"`.
ident : str
    String identifying the confidence score implementation.

See Also
--------
seapig.scores.knn.EuclideanScore : KNN-based score using Euclidean distance.
seapig.scores.knn.CosineScore : KNN-based score using cosine distance.
seapig.scores.pca.PCAScore : PCA reconstruction error score.
seapig.scores.logits.SoftmaxScore : Softmax probability score.

### EmbeddingScore

```python
EmbeddingScore(pca: seapig.scores.utils.TensorPCA | None = None) -> None
```

Base class for embedding-based confidence scores.

Embedding-based scores quantify deviation from the training distribution using latent-space embeddings. Low scores indicate samples similar to the training distribution (likely inliers), while high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring. When provided, embeddings are projected onto the principal components before the score is computed.

Attributes
----------
ref_embeddings : torch.Tensor or None
    Embeddings of training samples used to fit the score.
cal_embeddings : torch.Tensor or None
    Embeddings of validation/calibration samples. Optional.
scores : torch.Tensor or None
    Confidence scores of the calibration (or training) samples.
threshold : torch.Tensor or None
    Rejection threshold. Samples with scores above this value are excluded.

See Also
--------
seapig.scores.knn.EuclideanScore : Concrete KNN-based implementation.
seapig.scores.pca.PCAScore : PCA reconstruction error implementation.
seapig.scores.utils.TensorPCA : PCA utility used for dimensionality reduction.

### KNNScore

```python
KNNScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Abstract base class for KNN distance-based confidence scores.

Computes distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
k : int, default 1
    Number of nearest neighbors used to compute the distance score.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic applied to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    If `True`, the HNSW index is saved to a default file. If a `Path` is provided (must end in `.bin`), the index is saved there.

See Also
--------
seapig.scores.knn.EuclideanScore : Concrete score using Euclidean distance.
seapig.scores.knn.CosineScore : Concrete score using cosine distance.
seapig.scores.knn.MahalanobisScore : Concrete score using Mahalanobis distance.

### LogitScore

```python
LogitScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Base class for logit-based confidence scores.

Supports multiclass, binary (single/two-logit), and multilabel tasks. Handles temperature fitting and input normalization for all cases.

Parameters
----------
temperature : float or None, default None
    Optional temperature to apply to logits.
    If `None`, no temperature scaling is applied until :meth:`fit` is called.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Type of classification task. Determines score computation and temperature fitting loss.

Notes
-----
Input shapes and label formats by task:

- `multiclass`: logits `(N, C)`, labels `(N,)` long
- `binary` single-logit: logits `(N,)` or `(N, 1)`, labels `(N,)` float/long
- `binary` two-logit: logits `(N, 2)`, labels `(N,)` long
- `multilabel`: logits `(N, C)`, labels `(N, C)` float

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based score.
seapig.scores.logits.EntropyScore : Predictive entropy score.
seapig.scores.logits.EnergyScore : Energy-based score.
seapig.scores.logits.MarginScore : Top-two margin score.

Examples
--------
```python
import torch
from seapig.scores.logits import SoftmaxScore

logits = torch.randn(4, 3)
score = SoftmaxScore()
score.score(logits)
```

## Utilities

Helpers for progress and logging.

### track

```python
track(
    iterable: Iterable[T],
    total: int | None = None,
    desc: str = 'Working…',
    unit: str = 'it',
    leave: bool = True,
    colour: str | None = None,
    smoothing: float = 0.3,
    **kwargs: Any,
) -> Iterator[T]
```

Wrap an iterable with a progress bar.

This is the single entry point for progress display used throughout seapig. When progress is disabled, the iterable is returned as-is with zero overhead.

Parameters
----------
iterable:
    The iterable to wrap.
total:
    Total number of items (used by tqdm/rich to render a progress bar).
desc:
    Short description shown to the left of the bar.
unit:
    Unit label shown after the counter (tqdm only).
leave:
    Whether to keep the progress bar visible after completion (tqdm only; rich always removes it).
colour:
    Colour of the progress bar as a CSS colour string, e.g. `"green"` (tqdm only).
smoothing:
    Exponential moving-average smoothing factor for speed estimates (tqdm only).
**kwargs:
    Additional keyword arguments forwarded verbatim to the backend.
Yields
------
T
    Items from *iterable*, unchanged.

Examples
--------
>>> from seapig.utils.progress import track, disable
>>> disable()
>>> list(track([1, 2, 3], desc="items"))
[1, 2, 3]

### enable

```python
enable() -> None
```

Globally enable progress reporting.

This overrides both the `SEAPIG_PROGRESS` environment variable and the auto-detection logic. Call :func:`reset` to revert to auto mode.

### disable

```python
disable() -> None
```

Globally disable progress reporting.

This overrides both the `SEAPIG_PROGRESS` environment variable and the auto-detection logic. Call :func:`reset` to revert to auto mode.

### set_backend

```python
set_backend(backend: str) -> None
```

Select the progress-bar backend.

Parameters
----------
backend:
    `"tqdm"` (default) or `"rich"`. When `"rich"` is requested but the `rich` package is not installed, :func:`track` silently falls back to tqdm.

Raises
------
ValueError
    If *backend* is not one of the supported values.

### get_backend

```python
get_backend() -> str
```

Return the name of the currently selected backend.

Returns
-------
str
    Either `"tqdm"` or `"rich"`.

### get_logger

```python
get_logger(name: str | None = None) -> logging.Logger
```

Return a logger scoped to the seapig package.

Parameters
----------
name:
    Dotted module name, e.g. `__name__`. When `None` or when *name* does not start with `"seapig"`, the package-level logger `"seapig"` is returned.

Returns
-------
logging.Logger
    A :class:`logging.Logger` instance.

Examples
--------
>>> from seapig.utils.logging import get_logger
>>> logger = get_logger(__name__)

### configure_logging

```python
configure_logging(level: str | int = 'WARNING', handler: logging.Handler | None = None) -> None
```

Configure the seapig package logger.

Sets the log level and attaches *handler* (or a `logging.StreamHandler` writing to *stderr* when `None`) to the `"seapig"` logger. Any previously attached handlers are removed first. The `SEAPIG_LOG_LEVEL` environment variable, when set, overrides the *level* parameter.

Parameters
----------
level:
    Minimum log level, e.g. `"INFO"`, `"DEBUG"`, or an integer constant such as :data:`logging.INFO`.
    Defaults to `"WARNING"`.
handler:
    A custom `logging.Handler`. When `None`, a `logging.StreamHandler` (stderr) with a simple formatter is used.

Examples
--------
>>> import logging
>>> from seapig.utils.logging import configure_logging
>>> configure_logging(level="INFO")

----------------------------------------------------------------------
This is the User Guide documentation for the package.
----------------------------------------------------------------------

## Getting Started

### Quick Start

This document is WIP.
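Until this guide is fleshed out, the pieces from the API reference can be combined into an end-to-end sketch. This is an untested outline, not a verified recipe: it assumes a trained `LightningModule` named `model` that implements `embed(x)`, a tensor of training embeddings `train_embeddings`, a Lightning `Trainer`, and a `test_loader`; only the `EuclideanScore`, `SelectiveInferenceTask`, and `RiskCoverageMetric` calls are taken from the reference above.

```python
from seapig import SelectiveInferenceTask
from seapig.metric import RiskCoverageMetric
from seapig.scores import EuclideanScore

# 1. Fit a confidence score on training embeddings and calibrate a threshold.
score = EuclideanScore(k=5)
score.fit(X=train_embeddings)        # `train_embeddings`: assumed precomputed
score.set_threshold(q=0.95)          # keep ~95% of calibration samples

# 2. Wrap the trained model so predictions carry selection results.
rc_metric = RiskCoverageMetric(risk="generalized")
selective_task = SelectiveInferenceTask(task=model, score=score, rc_metric=rc_metric)

# 3. Run testing as usual; selective metrics are logged per subset,
#    and the risk-coverage curve is available afterwards.
trainer.test(selective_task, dataloaders=test_loader)
curve = selective_task.get_risk_coverage_curve()
```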