scores.KNNScore

Abstract base class for KNN distance-based confidence scores.

Usage

Source

scores.KNNScore()

Computes distance-based confidence scores where low scores indicate samples similar to the training distribution (likely inliers) and high scores indicate samples deviating from the training distribution (likely outliers).
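As a rough illustration of this idea (a sketch, not the library's implementation), the score for a query can be computed by finding its k nearest training embeddings and aggregating their distances with the chosen statistic. The `knn_score` helper below is hypothetical:

```python
import numpy as np

# Minimal sketch of a KNN distance score: for each query point, find its
# k nearest training points by Euclidean distance, then aggregate those
# distances with a statistic such as "max" or "mean".
def knn_score(train, queries, k=1, stat="max"):
    # Pairwise Euclidean distances, shape (n_queries, n_train)
    dists = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=-1)
    # Distances to the k nearest training neighbors for each query
    nearest = np.sort(dists, axis=1)[:, :k]
    agg = {"max": np.max, "mean": np.mean, "median": np.median, "min": np.min}
    return agg[stat](nearest, axis=1)

train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
queries = np.array([[0.1, 0.0],   # near the training data -> low score
                    [5.0, 5.0]])  # far from the training data -> high score
scores = knn_score(train, queries, k=2, stat="mean")
```

Low scores flag likely inliers and high scores likely outliers, matching the convention described above.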

Parameters

k: int = 1

Number of nearest neighbors used to compute the distance score.

stat: Literal["max", "mean", "median", "min"] = "max"

Statistic applied to aggregate distances across the k neighbors.

pca: TensorPCA | None = None

Optional PCA for dimensionality reduction prior to scoring.

save_index: bool | Path = False

If True, the HNSW index is saved to a default file. If a Path is provided (must end in .bin), the index is saved there.

Attributes

Name Description
k (int) Number of nearest neighbors used to compute the distance score.

Methods

Name Description
fit() Train a confidence score based on sample embeddings.

fit()

Train a confidence score based on sample embeddings.

Usage

Source

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

  1. Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
  2. On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

You must use either embeddings (X/Y) OR model+loaders, but not both.

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters

X: torch.Tensor | None = None

A torch.Tensor with training sample embeddings. Required when not using model and loaders.

Y: torch.Tensor | None = None

A torch.Tensor with calibration sample embeddings. Optional.

model: torch.nn.Module | None = None

A torch.nn.Module with an .embed() method. Required when not using X.

loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None

A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.

outdir: Path | None = None

A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.

prefix: str | None = None

A str used as filename prefix for saved embeddings. Only used with model and loaders.

q: bool | float = False

A float or bool indicating whether outliers should be filtered from the training distribution before fitting. Defaults to False.
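One plausible reading of q (an assumption on my part; the exact filtering rule is not specified here) is quantile-based filtering: drop training points whose leave-one-out nearest-neighbor distance exceeds the q-th quantile before fitting. A NumPy sketch of that idea:

```python
import numpy as np

# Hypothetical quantile-based outlier filter: compute each training
# point's leave-one-out 1-NN distance and keep only points at or below
# the q-th quantile. The library's actual rule may differ.
def filter_outliers(train, q=0.95):
    dists = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)   # ignore each point's self-distance
    nn = dists.min(axis=1)            # leave-one-out 1-NN distance
    return train[nn <= np.quantile(nn, q)]

train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
filtered = filter_outliers(train, q=0.75)  # drops the isolated point
```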

See Also