scores.KNNScore

Abstract base class for KNN distance-based uncertainty scores.

Usage

scores.KNNScore()

Computes distance-based uncertainty scores where low scores indicate samples similar to the training distribution (low uncertainty) and high scores indicate samples deviating from the training distribution (high uncertainty).

Parameters

k: int = 1: Number of nearest neighbors used to compute the distance score.
stat: (max, mean, median, min) = "max": Statistic applied to aggregate distances across the k neighbors.
pca: TensorPCA or None = None: Optional TensorPCA object for dimensionality reduction prior to scoring.
save_index: bool or Path = False: If True, the HNSW index is saved to a default file. If a Path is provided (must end in .bin), the index is saved there.
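
To make the roles of k and stat concrete, here is a minimal brute-force sketch of a KNN distance score. It assumes Euclidean distance and uses only the standard library; the helper knn_score and the AGGREGATORS table are hypothetical names for illustration and are not part of the library, which builds an HNSW index rather than scanning every training point.

```python
import math
import statistics

# Hypothetical lookup table (not library code): maps each `stat` option
# to a function that reduces the k neighbor distances to one score.
AGGREGATORS = {
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
}


def knn_score(train, query, k=1, stat="max"):
    """Aggregate the distances from each query point to its k nearest training points."""
    aggregate = AGGREGATORS[stat]
    scores = []
    for q in query:
        # Sort all Euclidean distances and keep the k smallest.
        nearest = sorted(math.dist(q, t) for t in train)[:k]
        scores.append(aggregate(nearest))
    return scores


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
query = [(0.0, 0.0), (3.0, 4.0)]  # one in-distribution point, one far away
scores = knn_score(train, query, k=2, stat="max")
```

In this sketch the first query point coincides with a training point and receives a lower score than the second, which lies far from every training point.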

Attributes

Name	Description
k	Number of nearest neighbors used to compute the distance score.

k

k: int = k

Number of nearest neighbors used to compute the distance score.

Methods

Name	Description
fit()	Train an uncertainty score based on sample embeddings.
knn_search()	Compute the K-nearest-neighbour distances and indices for a set of query embeddings.

fit()

Train an uncertainty score based on sample embeddings.

Usage

fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False
)

This method supports two usage modes:

Precomputed embeddings: Supply training embeddings via X and optional calibration embeddings via Y.
On-the-fly extraction: Supply a model with an .embed() method and a dictionary of DataLoaders to extract embeddings automatically.

Use either precomputed embeddings (X, optionally with Y) or model together with loaders, never both.

# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: On-the-fly extraction
my_score = EuclideanScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})

Parameters

X: torch.Tensor | None = None: A torch.Tensor with training sample embeddings. Required when not using model and loaders.
Y: torch.Tensor | None = None: A torch.Tensor with calibration sample embeddings. Optional.
model: torch.nn.Module | None = None: A torch.nn.Module with an .embed() method. Required when not using X.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None: A dict with DataLoader objects. Required keys: ["train"]. Optional key: ["val"]. Required when using model.
outdir: Path | None = None: A pathlib.Path pointing to a directory for saving/loading embeddings. Only used with model and loaders.
prefix: str | None = None: A str used as filename prefix for saved embeddings. Only used with model and loaders.
q: bool | float = False: A float or bool indicating if outliers from the training distribution should be filtered before fitting. Defaults to False.
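
The constraints stated above (X or model+loaders but not both, loaders required with model, a mandatory "train" key) can be summarised as an argument check. The sketch below is an illustration only: resolve_fit_mode is a hypothetical helper, and the exception type and messages the library actually uses are not documented here.

```python
def resolve_fit_mode(X=None, Y=None, model=None, loaders=None):
    """Return "precomputed" or "on_the_fly"; raise ValueError for invalid combinations.

    Hypothetical helper mirroring the documented rules, not library code.
    """
    has_embeddings = X is not None or Y is not None
    has_model = model is not None or loaders is not None
    if has_embeddings and has_model:
        raise ValueError("Pass either X/Y or model+loaders, not both.")
    if has_model:
        # model and loaders are only meaningful as a pair.
        if model is None or loaders is None:
            raise ValueError("model and loaders must be given together.")
        if "train" not in loaders:
            raise ValueError('loaders must contain a "train" key.')
        return "on_the_fly"
    if X is None:
        raise ValueError("X is required when model and loaders are not used.")
    return "precomputed"


# Placeholder values stand in for real tensors, models and DataLoaders.
mode_a = resolve_fit_mode(X=[[0.0, 1.0]], Y=[[1.0, 0.0]])
mode_b = resolve_fit_mode(model=object(), loaders={"train": [], "val": []})
```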

knn_search()

Compute the K-nearest-neighbour distances and indices for a set of query embeddings.

Usage

knn_search(query, offset=0)

Parameters

query: torch.Tensor: A 2-D tensor of shape (N, D) containing the embeddings for which distances are to be computed.
offset: int = 0: Number of nearest neighbours to discard from the result. This is typically used to skip self-matching when the query points are drawn from the same set that built the index (e.g. offset=1).

Returns

distances: torch.Tensor: A tensor of shape (N, k) containing the KNN distances for each query point after discarding the first offset nearest neighbours.
indices: torch.Tensor: A tensor of shape (N, k) with the index positions of the nearest neighbours in the reference embedding set after discarding the first offset matches.

Notes

offset is useful when the query set is identical to the reference set, because the nearest neighbour would be the point itself (distance zero). Skipping it yields a meaningful distance to the second nearest neighbour.
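
The effect of offset can be seen in a small brute-force sketch. It assumes Euclidean distance and plain Python lists instead of tensors; the standalone function below only shares its name with the method for readability and is not the library's HNSW-based implementation.

```python
import math


def knn_search(reference, query, k=1, offset=0):
    """Brute-force search returning (distances, indices), each with k entries per query."""
    distances, indices = [], []
    for q in query:
        # Rank every reference point by its distance to the query point.
        ranked = sorted((math.dist(q, r), i) for i, r in enumerate(reference))
        # Drop the first `offset` matches, then keep the next k.
        kept = ranked[offset:offset + k]
        distances.append([d for d, _ in kept])
        indices.append([i for _, i in kept])
    return distances, indices


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

# Querying the reference set against itself: every point matches itself first.
d0, i0 = knn_search(points, points, k=1, offset=0)

# offset=1 skips the self-match and reports the true nearest other point.
d1, i1 = knn_search(points, points, k=1, offset=1)
```

With offset=0 every distance in d0 is zero because each point finds itself. With offset=1 the distances in d1 reflect the spacing between distinct points.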
