----------------------------------------------------------------------
This is the API documentation for the seapig library.
----------------------------------------------------------------------

## Task & Metric Classes

Core classes for defining selective inference tasks and metrics.

### SelectiveInferenceTask

```python
SelectiveInferenceTask(
    task: lightning.pytorch.core.module.LightningModule,
    score: seapig.scores.base.ConfidenceScore,
    acc_test_outputs: bool = False,
    input_key: Optional[Literal['image', 'input', 'images', 'inputs', 'x']] = None,
    target_key: Optional[Literal['mask', 'label', 'masks', 'labels', 'targets', 'target', 'y', 'y_true']] = None,
    rc_metric: seapig.metric.RiskCoverageMetric | None = None,
) -> None
```

Wrap a trained `LightningModule` to attach selection results during inference.

The wrapper calls the wrapped model in inference mode and combines its predictions with selection outputs produced by a provided `ConfidenceScore`.

Key behavior:

- The wrapped task must provide an `.embed(x)` method. The wrapper calls `task.embed(x)` to produce embeddings used by the score.
- The wrapped task is copied and set to `eval()` during initialization to avoid accidental training side effects.
- If the wrapped task defines `test_metrics` (a `Metric` or `MetricCollection`), it will be wrapped by `SelectiveMetric` so metrics are computed only on selected examples.
- If `rc_metric` (a `RiskCoverageMetric`) is provided, the wrapper will update it during test steps; the final risk-coverage values are available via `get_risk_coverage_curve()`.

Parameters
----------
task
    A trained `LightningModule` whose `forward(x)` returns predictions. The module must implement `embed(x)` to produce embeddings for scoring.
score
    A seapig `ConfidenceScore` instance providing the `ConfidenceScore.select` method.
input_key
    Key used to extract inputs from an incoming batch. If `None` (default), the first element of the batch is used (positional index 0).
    When a string is given it must be one of: `'image'`, `'input'`, `'images'`, `'inputs'`, `'x'`.
target_key
    Key used to extract targets from an incoming batch. If `None` (default), the second element of the batch is used (positional index 1). When a string is given it must be one of: `'mask'`, `'label'`, `'masks'`, `'labels'`, `'targets'`, `'target'`, `'y'`, `'y_true'`.
acc_test_outputs
    If `True`, per-batch outputs (predictions merged with selection results) are accumulated in the `test_outputs` list for later inspection. If `False` (default), outputs are not accumulated and metrics are logged as usual.
rc_metric
    Optional `RiskCoverageMetric` that will be updated during testing.

Examples
--------
```python
from seapig import SelectiveInferenceTask
from seapig.scores import EuclideanScore

score = EuclideanScore()
# score.fit(X=train_embeddings)  # fit before wrapping
selective_task = SelectiveInferenceTask(task=model, score=score)
```

### SelectiveMetric

```python
SelectiveMetric(base: torchmetrics.metric.Metric | torchmetrics.collections.MetricCollection) -> None
```

Evaluate a metric on full, selected, and rejected subsets.

Wraps a `torchmetrics.Metric` or `torchmetrics.MetricCollection` and keeps three independent copies that are updated separately:

- `"full"`: all samples passed to `update`.
- `"selected"`: samples where the provided selection mask is true.
- `"rejected"`: samples where the selection mask is false.

The `compute` result is a flat `dict[str, torch.Tensor]` where each underlying metric name is prefixed with `full/`, `selected/`, or `rejected/`. If a submetric was never updated, its value is a zero `torch.Tensor`.

Parameters
----------
base : torchmetrics.Metric | torchmetrics.MetricCollection
    Metric (or collection) to wrap. Internally the object is deep-copied three times so each subset is tracked independently.

Notes
-----
- The selection mask may be boolean or numeric; numeric values `> 0` are treated as selected.
- Calls that contain no selected (or no rejected) rows do not update the corresponding internal metric for that call.

Examples
--------
```python
import torch
from torchmetrics import Accuracy

base = Accuracy(task="multiclass", num_classes=2)
m = SelectiveMetric(base)
preds = torch.tensor([[0.9, 0.1], [0.2, 0.8]])
target = torch.tensor([0, 1])
mask = torch.tensor([1, 0], dtype=torch.bool)
m.update(preds, target, mask)
results = m.compute()
# results contains keys like 'full/accuracy', 'selected/accuracy', ...
```

### RiskCoverageMetric

```python
RiskCoverageMetric(
    risk: str = 'generalized',
    n_bins: int = 100,
    error_fn: collections.abc.Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
) -> None
```

Build a risk-coverage curve from scores and per-sample errors.

Collects per-sample `scores` and per-sample `residuals` across multiple `update` calls and computes summary area-under-curve values using `seapig.risk_coverage.risk_coverage`.

Parameters
----------
risk : {'generalized', 'selective'}, default 'generalized'
    Which risk definition to use when computing the curve. Must be either `'generalized'` or `'selective'`.
n_bins : int, default 100
    Number of bins used to downsample the curve when computing AUC summaries.
error_fn : callable or None, default None
    Function `(preds, target) -> residuals` that reduces model predictions and targets to a 1-D tensor of per-sample residuals. If `None`, the default is per-sample mean absolute error.

Notes
-----
The `compute` method returns three tensors: `rc/auc_empirical`, `rc/auc_reference`, and `rc/auc_excess`. The last computed complete curve object (`RiskCoverage`) is available via `get_curve`.

See Also
--------
seapig.risk_coverage.risk_coverage : The underlying curve computation.
seapig.risk_coverage.RiskCoverage : Container for curve results.
Examples
--------
```python
import torch
from seapig.metric import RiskCoverageMetric

metric = RiskCoverageMetric(risk="generalized")
preds = torch.rand(50, 1)
target = torch.rand(50, 1)
scores = torch.rand(50)
metric.update(preds, target, scores)
result = metric.compute()
# result contains keys: 'rc/auc_empirical', 'rc/auc_reference', 'rc/auc_excess'
```

### RiskCoverage

```python
RiskCoverage(
    coverage: torch.Tensor,
    threshold: torch.Tensor,
    risk: torch.Tensor,
    reference: torch.Tensor,
    excess: torch.Tensor,
    risk_type: str,
    auc_empirical: torch.Tensor,
    auc_reference: torch.Tensor,
    auc_excess: torch.Tensor,
) -> None
```

Container for risk-coverage results.

Holds the coverage, score thresholds, empirical and reference risk curves, their difference (excess), and AUC metrics.

Attributes
----------
coverage : torch.Tensor
    Coverage values in `[0, 1]`.
threshold : torch.Tensor
    Sorted score thresholds used to compute coverage.
risk : torch.Tensor
    Empirical risk at each coverage level.
reference : torch.Tensor
    Reference (optimal) risk at each coverage level.
excess : torch.Tensor
    Excess risk (empirical - reference).
risk_type : str
    Either `'generalized'` or `'selective'`; see `risk_coverage`.
auc_empirical : torch.Tensor
    Area under the empirical risk curve (trapezoidal rule).
auc_reference : torch.Tensor
    Area under the reference risk curve (trapezoidal rule).
auc_excess : torch.Tensor
    Area under the excess risk curve (trapezoidal rule).

See Also
--------
seapig.risk_coverage.risk_coverage : Function that produces this container.
seapig.metric.RiskCoverageMetric : Metric wrapper for use with Lightning.

## Embedding-based Confidence Scores

Confidence scoring methods for embedding-based approaches.

### CosineScore

```python
CosineScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Returns the KNN distance based on the cosine distance to the nearest samples.
Computes cosine distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

The cosine distance is computed as `(1 - cosine_similarity)`, with a range of `[0, 2]` where `0` indicates identical vectors, `1` indicates orthogonal vectors, and `2` indicates opposite vectors.

Parameters
----------
k : int, default 1
    Number of nearest neighbors to use.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    Whether (and where) to save the HNSW index to disk.

See Also
--------
seapig.scores.knn.EuclideanScore : KNN score using Euclidean distance.
seapig.scores.knn.MahalanobisScore : KNN score using Mahalanobis distance.

### EuclideanScore

```python
EuclideanScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Returns the KNN distance based on the Euclidean distance to the nearest samples.

Computes Euclidean distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
k : int, default 1
    Number of nearest neighbors to use.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    Whether (and where) to save the HNSW index to disk.
Examples
--------
```python
import torch
from seapig.scores import EuclideanScore

score = EuclideanScore(k=5)
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
```

See Also
--------
seapig.scores.knn.CosineScore : KNN score using cosine distance.
seapig.scores.knn.MahalanobisScore : KNN score using Mahalanobis distance.

### MahalanobisScore

```python
MahalanobisScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Returns the Mahalanobis distance to the training samples distribution.

Computes Mahalanobis distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

The Mahalanobis distance accounts for correlations in the training data by whitening the embeddings with the Cholesky factor of the training covariance matrix prior to computing Euclidean nearest-neighbor distances.

Parameters
----------
k : int, default 1
    Number of nearest neighbors to use.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    Whether (and where) to save the HNSW index to disk.

See Also
--------
seapig.scores.knn.EuclideanScore : KNN score using Euclidean distance.
seapig.scores.knn.CosineScore : KNN score using cosine distance.

### PCAScore

```python
PCAScore(pca: seapig.scores.utils.TensorPCA = TensorPCA()) -> None
```

Returns confidence scores based on PCA reconstruction errors.

Computes reconstruction error-based confidence scores where low scores indicate samples that can be well-reconstructed from principal components (likely inliers) and high scores indicate samples with large reconstruction errors (likely outliers).
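The reconstruction-error idea can be sketched with plain `torch` operations. This is a simplified linear-PCA illustration, not the `TensorPCA` implementation itself (which also supports random Fourier features and explained-variance component selection):

```python
import torch

# Fit a plain linear PCA on some stand-in training embeddings.
X = torch.randn(200, 64)
mu = X.mean(dim=0)
# Top-8 principal directions via SVD of the centered data.
U, S, Vh = torch.linalg.svd(X - mu, full_matrices=False)
components = Vh[:8]  # (8, 64)

def reconstruction_error(Z: torch.Tensor) -> torch.Tensor:
    """Per-sample L2 error after projecting onto the principal subspace."""
    centered = Z - mu
    recon = centered @ components.T @ components
    return torch.linalg.norm(centered - recon, dim=1)

# High reconstruction error suggests a sample the principal
# components cannot explain, i.e. a likely outlier.
scores = reconstruction_error(torch.randn(10, 64))
```

Samples far from the training subspace receive large errors, which is why the error serves directly as an outlier score.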
See https://arxiv.org/pdf/2402.02949v3 for the method description.

Parameters
----------
pca : TensorPCA, optional
    PCA configuration to use. Defaults to `TensorPCA(n_components=0.50, gamma=3.0, M=4096)` (RFF-PCA retaining 50% explained variance).

Examples
--------
```python
import torch
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA

score = PCAScore(pca=TensorPCA(n_components=0.90))
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
```

See Also
--------
seapig.scores.utils.TensorPCA : PCA implementation used internally.
seapig.scores.knn.EuclideanScore : Alternative distance-based score.

### PyODScore

```python
PyODScore(
    detector: pyod.models.base.BaseDetector,
    pca: seapig.scores.utils.TensorPCA | None = None,
) -> None
```

Confidence scores based on detectors supplied by PyOD.

Computes outlier scores using PyOD detectors where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
detector : pyod.models.base.BaseDetector
    A fitted or unfitted PyOD detector instance. Any detector from the `pyod` library that implements `fit` and `decision_function` is supported (e.g., `pyod.models.knn.KNN`).
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.

Notes
-----
Requires the optional `pyod` dependency: `pip install pyod`.

See Also
--------
seapig.scores.knn.EuclideanScore : Built-in KNN-based score (no extra dependency).
seapig.scores.embed.EmbeddingScore : Base class for embedding-based scores.

## Logit-based Confidence Scores

Confidence scoring methods based on model logits.

### EnergyScore

```python
EnergyScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Energy-based confidence score.

Computes the free energy of the logit distribution. Lower energy (more negative) indicates higher confidence.
Supports multiclass, binary, and multilabel tasks.

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.

Examples
--------
```python
import torch
from seapig.scores.logits import EnergyScore

logits = torch.randn(2, 3)
EnergyScore().score(logits)
```

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based alternative.

### EntropyScore

```python
EntropyScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Entropy-based confidence score.

Computes the predictive entropy of the output distribution. Higher entropy indicates higher uncertainty (higher score). Supports multiclass, binary, and multilabel tasks.

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.

Examples
--------
```python
import torch
from seapig.scores.logits import EntropyScore

logits = torch.randn(2, 3)
EntropyScore().score(logits)
```

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based alternative.

### MarginScore

```python
MarginScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Top-two margin confidence score.

Computes the difference between the top two logits. A larger margin indicates higher confidence (lower score). Supports multiclass, binary (single/two-logit), and multilabel tasks.

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.
Examples
--------
```python
import torch
from seapig.scores.logits import MarginScore

logits = torch.randn(2, 3)
MarginScore().score(logits)
```

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based alternative.

### SoftmaxScore

```python
SoftmaxScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Maximum softmax probability confidence score.

Supports multiclass, binary (single/two-logit), and multilabel tasks. A higher maximum softmax probability indicates higher confidence (lower score).

Parameters
----------
temperature : float or None, default None
    Optional initial temperature. If `None`, temperature is fitted if labels are provided to `fit`.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Task type for score computation.

Examples
--------
```python
import torch
from seapig.scores.logits import SoftmaxScore

logits = torch.randn(2, 4)
SoftmaxScore().score(logits)
```

See Also
--------
seapig.scores.logits.EntropyScore : Entropy-based alternative.
seapig.scores.logits.EnergyScore : Energy-based alternative.
seapig.scores.logits.MarginScore : Margin-based alternative.

## Other Confidence Scores

Confidence scoring methods based on other principles.

### RandomScore

```python
RandomScore() -> None
```

Returns random confidence scores per sample.

This score assigns a random float in `[0, 1]` to each sample. It is useful as a baseline or for testing purposes. Low scores indicate likely inliers, high scores indicate likely outliers. By default, the threshold is set to `0.99`, so approximately 99% of samples are selected.

See Also
--------
seapig.scores.base.ConfidenceScore : Abstract base class.

## Abstract Classes

Abstract base classes.

### ConfidenceScore

```python
ConfidenceScore() -> None
```

Abstract Base Class for Confidence Scores.

Confidence scores quantify the deviation of query samples from the training distribution. Low scores indicate likely inliers (samples similar to training), while high scores indicate likely outliers (samples deviating from training).
Samples with scores exceeding the threshold are excluded from prediction.

Attributes
----------
trained : bool
    Whether the score has been trained. Defaults to `False`.
train_required : bool
    Whether training is required before scoring. Defaults to `False`.
cal_required : bool
    Whether calibration is required before selecting. Defaults to `False`.
calibrated : bool
    Whether the score has been calibrated. Defaults to `False`.
scores : torch.Tensor or None
    Confidence scores of the calibration samples. Low scores indicate likely inliers, high scores indicate likely outliers.
threshold : torch.Tensor or None
    Rejection threshold. Samples with scores higher than this value are excluded from prediction.
device : str
    Device to which internal tensors are put. Defaults to `"cpu"`.
ident : str
    String identifying the confidence score implementation.

See Also
--------
seapig.scores.knn.EuclideanScore : KNN-based score using Euclidean distance.
seapig.scores.knn.CosineScore : KNN-based score using cosine distance.
seapig.scores.pca.PCAScore : PCA reconstruction error score.
seapig.scores.logits.SoftmaxScore : Softmax probability score.

### EmbeddingScore

```python
EmbeddingScore(pca: seapig.scores.utils.TensorPCA | None = None) -> None
```

Base class for embedding-based confidence scores.

Embedding-based scores quantify deviation from the training distribution using latent-space embeddings. Low scores indicate samples similar to the training distribution (likely inliers), while high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring. When provided, embeddings are projected onto the principal components before the score is computed.

Attributes
----------
ref_embeddings : torch.Tensor or None
    Embeddings of training samples used to fit the score.
cal_embeddings : torch.Tensor or None
    Embeddings of validation/calibration samples. Optional.
scores : torch.Tensor or None
    Confidence scores of the calibration (or training) samples.
threshold : torch.Tensor or None
    Rejection threshold. Samples with scores above this value are excluded.

See Also
--------
seapig.scores.knn.EuclideanScore : Concrete KNN-based implementation.
seapig.scores.pca.PCAScore : PCA reconstruction error implementation.
seapig.scores.utils.TensorPCA : PCA utility used for dimensionality reduction.

### KNNScore

```python
KNNScore(
    k: int = 1,
    stat: str = 'max',
    pca: seapig.scores.utils.TensorPCA | None = None,
    save_index: bool | pathlib.Path = False,
) -> None
```

Abstract base class for KNN distance-based confidence scores.

Computes distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).

Parameters
----------
k : int, default 1
    Number of nearest neighbors used to compute the distance score.
stat : {'max', 'mean', 'median', 'min'}, default 'max'
    Statistic applied to aggregate distances across the k neighbors.
pca : TensorPCA or None, default None
    Optional PCA for dimensionality reduction prior to scoring.
save_index : bool or Path, default False
    If `True`, the HNSW index is saved to a default file. If a `Path` is provided (must end in `.bin`), the index is saved there.

See Also
--------
seapig.scores.knn.EuclideanScore : Concrete score using Euclidean distance.
seapig.scores.knn.CosineScore : Concrete score using cosine distance.
seapig.scores.knn.MahalanobisScore : Concrete score using Mahalanobis distance.

### LogitScore

```python
LogitScore(temperature: float | None = None, task: str = 'multiclass') -> None
```

Base class for logit-based confidence scores.

Supports multiclass, binary (single/two-logit), and multilabel tasks. Handles temperature fitting and input normalization for all cases.

Parameters
----------
temperature : float or None, default None
    Optional temperature to apply to logits.
    If `None`, no temperature scaling is applied until :meth:`fit` is called.
task : {'multiclass', 'binary', 'multilabel'}, default 'multiclass'
    Type of classification task. Determines score computation and temperature fitting loss.

Notes
-----
Input shapes and label formats by task:

- `multiclass`: logits `(N, C)`, labels `(N,)` long
- `binary` single-logit: logits `(N,)` or `(N, 1)`, labels `(N,)` float/long
- `binary` two-logit: logits `(N, 2)`, labels `(N,)` long
- `multilabel`: logits `(N, C)`, labels `(N, C)` float

See Also
--------
seapig.scores.logits.SoftmaxScore : Softmax probability-based score.
seapig.scores.logits.EntropyScore : Predictive entropy score.
seapig.scores.logits.EnergyScore : Energy-based score.
seapig.scores.logits.MarginScore : Top-two margin score.

Examples
--------
```python
import torch
from seapig.scores.logits import SoftmaxScore

logits = torch.randn(4, 3)
score = SoftmaxScore()
score.score(logits)
```

## Utilities

Helpers for progress and logging.

### track

```python
track(
    iterable: Iterable[T],
    total: int | None = None,
    desc: str = 'Working…',
    unit: str = 'it',
    leave: bool = True,
    colour: str | None = None,
    smoothing: float = 0.3,
    **kwargs: Any,
) -> Iterator[T]
```

Wrap an iterable with a progress bar.

This is the single entry point for progress display used throughout seapig. When progress is disabled, the iterable is returned as-is with zero overhead.

Parameters
----------
iterable:
    The iterable to wrap.
total:
    Total number of items (used by tqdm/rich to render a progress bar).
desc:
    Short description shown to the left of the bar.
unit:
    Unit label shown after the counter (tqdm only).
leave:
    Whether to keep the progress bar visible after completion (tqdm only; rich always removes it).
colour:
    Colour of the progress bar as a CSS colour string, e.g. `"green"` (tqdm only).
smoothing:
    Exponential moving-average smoothing factor for speed estimates (tqdm only).
**kwargs:
    Additional keyword arguments forwarded verbatim to the backend.
Yields
------
T
    Items from *iterable*, unchanged.

Examples
--------
>>> from seapig.utils.progress import track, disable
>>> disable()
>>> list(track([1, 2, 3], desc="items"))
[1, 2, 3]

### enable

```python
enable() -> None
```

Globally enable progress reporting.

This overrides both the `SEAPIG_PROGRESS` environment variable and the auto-detection logic. Call :func:`reset` to revert to auto mode.

### disable

```python
disable() -> None
```

Globally disable progress reporting.

This overrides both the `SEAPIG_PROGRESS` environment variable and the auto-detection logic. Call :func:`reset` to revert to auto mode.

### set_backend

```python
set_backend(backend: str) -> None
```

Select the progress-bar backend.

Parameters
----------
backend:
    `"tqdm"` (default) or `"rich"`. When `"rich"` is requested but the `rich` package is not installed, :func:`track` silently falls back to tqdm.

Raises
------
ValueError
    If *backend* is not one of the supported values.

### get_backend

```python
get_backend() -> str
```

Return the name of the currently selected backend.

Returns
-------
str
    Either `"tqdm"` or `"rich"`.

### get_logger

```python
get_logger(name: str | None = None) -> logging.Logger
```

Return a logger scoped to the seapig package.

Parameters
----------
name:
    Dotted module name, e.g. `__name__`. When `None` or when *name* does not start with `"seapig"`, the package-level logger `"seapig"` is returned.

Returns
-------
logging.Logger
    A :class:`logging.Logger` instance.

Examples
--------
>>> from seapig.utils.logging import get_logger
>>> logger = get_logger(__name__)

### configure_logging

```python
configure_logging(level: str | int = 'WARNING', handler: logging.Handler | None = None) -> None
```

Configure the seapig package logger.

Sets the log level and attaches *handler* (or a `logging.StreamHandler` writing to *stderr* when `None`) to the `"seapig"` logger. Any previously attached handlers are removed first. The `SEAPIG_LOG_LEVEL` environment variable, when set, overrides the *level* parameter.

Parameters
----------
level:
    Minimum log level, e.g. `"INFO"`, `"DEBUG"`, or an integer constant such as :data:`logging.INFO`.
    Defaults to `"WARNING"`.
handler:
    A custom `logging.Handler`. When `None`, a `logging.StreamHandler` (stderr) with a simple formatter is used.

Examples
--------
>>> import logging
>>> from seapig.utils.logging import configure_logging
>>> configure_logging(level="INFO")

----------------------------------------------------------------------
This is the User Guide documentation for the package.
----------------------------------------------------------------------

## Getting Started

### Quick Start

This document is WIP.
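Until this guide is fleshed out, the pieces from the API reference can be combined into an end-to-end sketch. This is an untested outline, not a verified recipe: it assumes a trained `LightningModule` named `model` that implements `embed(x)`, a tensor of training embeddings `train_embeddings`, a Lightning `Trainer`, and a `test_loader`; only the `EuclideanScore`, `SelectiveInferenceTask`, and `RiskCoverageMetric` calls are taken from the reference above.

```python
from seapig import SelectiveInferenceTask
from seapig.metric import RiskCoverageMetric
from seapig.scores import EuclideanScore

# 1. Fit a confidence score on training embeddings and calibrate a threshold.
score = EuclideanScore(k=5)
score.fit(X=train_embeddings)        # `train_embeddings`: assumed precomputed
score.set_threshold(q=0.95)          # keep ~95% of calibration samples

# 2. Wrap the trained model so predictions carry selection results.
rc_metric = RiskCoverageMetric(risk="generalized")
selective_task = SelectiveInferenceTask(task=model, score=score, rc_metric=rc_metric)

# 3. Run testing as usual; selective metrics are logged per subset,
#    and the risk-coverage curve is available afterwards.
trainer.test(selective_task, dataloaders=test_loader)
curve = selective_task.get_risk_coverage_curve()
```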