scores.KNNScore
Abstract base class for KNN distance-based uncertainty scores.
Usage
scores.KNNScore()
Computes distance-based uncertainty scores where low scores indicate samples similar to the training distribution (low uncertainty) and high scores indicate samples deviating from the training distribution (high uncertainty).
Parameters
k: int = 1
Number of nearest neighbors used to compute the distance score.
stat: (max, mean, median, min) = "max"
Statistic applied to aggregate distances across the k neighbors.
pca: TensorPCA or None = None
Optional TensorPCA object for dimensionality reduction prior to scoring.
save_index: bool or Path = False
If `True`, the HNSW index is saved to a default file. If a `Path` is provided (must end in `.bin`), the index is saved there.
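To make the roles of `k` and `stat` concrete, the following minimal sketch aggregates the distances to the k nearest neighbors with each supported statistic. It is a brute-force illustration in plain Python, not the library's implementation (which, judging by `save_index`, relies on an HNSW index). The helper name `knn_aggregate` and the use of Euclidean distance are assumptions made for this example.

```python
import math
import statistics

# Hypothetical mapping from each `stat` name to a reducer; not part of the library.
_REDUCERS = {
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
}

def knn_aggregate(query, reference, k=1, stat="max"):
    """Aggregate the Euclidean distances from `query` to its k nearest reference points."""
    if stat not in _REDUCERS:
        raise ValueError(f"stat must be one of {sorted(_REDUCERS)}, got {stat!r}")
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference points")
    # Brute force: sort all distances and keep the k smallest.
    nearest = sorted(math.dist(query, ref) for ref in reference)[:k]
    return _REDUCERS[stat](nearest)

reference = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
query = (0.0, 0.0)
# The three neighbor distances are 0, 5 and 10; each statistic reduces them differently.
scores = {s: knn_aggregate(query, reference, k=3, stat=s) for s in _REDUCERS}
print(scores)
```

With the default `k=1` only one distance is kept, so all four statistics return the same value; `stat` only matters once `k` is larger than 1.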
Attributes
| Name | Description |
|---|---|
| k | Number of nearest neighbors used to compute the distance score. |
k
Number of nearest neighbors used to compute the distance score.
k: int
Methods
| Name | Description |
|---|---|
| fit() | Train an uncertainty score based on sample embeddings. |
| knn_search() | Compute the K-nearest-neighbour distances and indices for a set of query embeddings. |
fit()
Train an uncertainty score based on sample embeddings.
Usage
fit(
X=None,
Y=None,
model=None,
loaders=None,
outdir=None,
prefix=None,
q=False
)
This method supports two usage modes:
- Precomputed embeddings: Supply training embeddings via `X` and optional calibration embeddings via `Y`.
- On-the-fly extraction: Supply a `model` with an `.embed()` method and a dictionary of `DataLoader` objects to extract embeddings automatically.
You must use either embeddings (X/Y) OR model+loaders, but not both.
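The either/or rule above can be sketched as a small argument check. This is only an illustration of the documented contract: the helper name `resolve_fit_mode`, its return values, and its error messages are hypothetical and are not taken from the library.

```python
def resolve_fit_mode(X=None, Y=None, model=None, loaders=None):
    """Return "embeddings" or "extraction", or raise if the arguments are mixed or incomplete."""
    uses_embeddings = X is not None or Y is not None
    uses_extraction = model is not None or loaders is not None
    if uses_embeddings and uses_extraction:
        raise ValueError("Pass either X/Y or model+loaders, not both.")
    if uses_embeddings:
        # Y is optional, but training embeddings are always needed in this mode.
        if X is None:
            raise ValueError("Y was given without X; training embeddings are required.")
        return "embeddings"
    if uses_extraction:
        if model is None or loaders is None:
            raise ValueError("model and loaders must be supplied together.")
        # The parameter reference lists "train" as the only required key.
        if "train" not in loaders:
            raise ValueError('loaders must contain a "train" key.')
        return "extraction"
    raise ValueError("Provide either X (and optionally Y) or model+loaders.")

# Placeholder objects stand in for real tensors, models and data loaders.
mode_a = resolve_fit_mode(X=[[0.0, 1.0]], Y=[[1.0, 0.0]])
mode_b = resolve_fit_mode(model=object(), loaders={"train": [], "val": []})
print(mode_a, mode_b)
```

The two calls mirror the two modes shown in the example that follows: embeddings passed directly, or a model plus loaders.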
# Mode 1: Precomputed embeddings
from seapig.scores import EuclideanScore
my_score = EuclideanScore(k=2)
my_score.fit(X=train_embs, Y=val_embs)
# Mode 2: On-the-fly extraction
my_score = EuclideanScore(k=2)
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
Parameters
X: torch.Tensor | None = None
A `torch.Tensor` with training sample embeddings. Required when not using `model` and `loaders`.
Y: torch.Tensor | None = None
A `torch.Tensor` with calibration sample embeddings. Optional.
model: torch.nn.Module | None = None
A `torch.nn.Module` with an `.embed()` method. Required when not using `X`.
loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None
A `dict` with `DataLoader` objects. Required keys: `["train"]`. Optional key: `["val"]`. Required when using `model`.
outdir: Path | None = None
A `pathlib.Path` pointing to a directory for saving/loading embeddings. Only used with `model` and `loaders`.
prefix: str | None = None
A `str` used as filename prefix for saved embeddings. Only used with `model` and `loaders`.
q: bool | float = False
A `float` or `bool` indicating if outliers from the training distribution should be filtered before fitting. Defaults to `False`.
knn_search()
Compute the K-nearest-neighbour distances and indices for a set of query embeddings.
Usage
knn_search(query, offset=0)
Parameters
query: torch.Tensor
A 2-D tensor of shape `(N, D)` containing the embeddings for which distances are to be computed.
offset: int = 0
Number of nearest neighbours to discard from the result. This is typically used to skip self-matching when the query points are drawn from the same set that built the index (e.g. `offset=1`).
Returns
distances: torch.Tensor
A tensor of shape `(N, k)` containing the KNN distances for each query point after discarding the first `offset` nearest neighbours.
indices: torch.Tensor
A tensor of shape `(N, k)` with the index positions of the nearest neighbours in the reference embedding set after discarding the first `offset` matches.
Notes
`offset` is useful when the query set is identical to the reference set, because the nearest neighbour would be the point itself (distance zero). Skipping it yields a meaningful distance to the second nearest neighbour.
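The effect of `offset` can be seen in a small brute-force sketch. It uses plain Python lists and Euclidean distance rather than tensors and an approximate index, and the function name `knn_search_sketch` is hypothetical; it only mimics the documented return shape of `(N, k)` distances and indices.

```python
import math

def knn_search_sketch(query, reference, k=1, offset=0):
    """Brute-force stand-in for a KNN search that skips the first `offset` matches."""
    distances, indices = [], []
    for q in query:
        # Rank every reference point by its Euclidean distance to the query point.
        ranked = sorted((math.dist(q, r), i) for i, r in enumerate(reference))
        # Drop the `offset` closest matches, then keep the next k.
        kept = ranked[offset:offset + k]
        distances.append([d for d, _ in kept])
        indices.append([i for _, i in kept])
    return distances, indices

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
# Querying the reference set against itself: with offset=0 each point matches itself.
d0, i0 = knn_search_sketch(points, points, k=1, offset=0)
# With offset=1 the self-match is skipped and the true nearest neighbour is returned.
d1, i1 = knn_search_sketch(points, points, k=1, offset=1)
print(d0, i0)
print(d1, i1)
```

With `offset=0` every distance is zero and each index points back at the query itself; with `offset=1` the distances become `1.0`, `1.0` and `4.0`, which is the information a distance-based score actually needs.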