scores.PCAScore
Returns confidence scores based on PCA reconstruction errors.
Usage
```python
scores.PCAScore()
```
Computes confidence scores from PCA reconstruction errors. Low scores indicate samples that are well reconstructed from the principal components (likely inliers). High scores indicate samples with large reconstruction errors (likely outliers).
See https://arxiv.org/pdf/2402.02949v3 for the method description.
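The sketch below illustrates the general idea. It scores samples by their squared reconstruction error under a plain linear PCA fitted with `torch.linalg.svd`. It is not seapig's implementation: it keeps a fixed number of components `k` and uses no random Fourier features, and the helper `pca_reconstruction_scores` is hypothetical.

```python
import torch


def pca_reconstruction_scores(train: torch.Tensor, test: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical helper: squared reconstruction error of `test` rows under rank-k PCA fit on `train`."""
    # Center everything with the training mean.
    mean = train.mean(dim=0, keepdim=True)
    # Rows of Vh are the principal directions of the centered training data.
    _, _, vh = torch.linalg.svd(train - mean, full_matrices=False)
    components = vh[:k]  # shape (k, d)
    centered = test - mean
    # Project onto the top-k subspace and map back to the input space.
    recon = centered @ components.T @ components
    # Large residuals mean the sample is poorly explained by the training subspace.
    return ((centered - recon) ** 2).sum(dim=1)


torch.manual_seed(0)
# Training data lies close to a 2-D subspace of R^16.
basis = torch.randn(2, 16)
train = torch.randn(500, 2) @ basis + 0.01 * torch.randn(500, 16)
inlier = torch.randn(1, 2) @ basis   # inside the subspace
outlier = 3.0 * torch.randn(1, 16)   # generic direction in R^16
errors = pca_reconstruction_scores(train, torch.cat([inlier, outlier]), k=2)
print(errors)  # the outlier's error is far larger than the inlier's
```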
Parameters
`pca: TensorPCA = TensorPCA(n_components=0.5, gamma=3.0, M=4096)`
PCA configuration to use. Defaults to `TensorPCA(n_components=0.5, gamma=3.0, M=4096)` (RFF-PCA, i.e. PCA on random Fourier features, retaining 50% of the explained variance).
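The `TensorPCA` arguments are not documented on this page. The sketch below assumes that `gamma` parameterizes an RBF kernel `exp(-gamma * ||x - y||^2)`, that `M` is the number of random Fourier features, and that a fractional `n_components` selects the smallest number of components reaching that share of explained variance (as in scikit-learn's `PCA`). Both helpers are hypothetical and only illustrate the technique, not `TensorPCA` itself.

```python
import math

import torch


def rff_features(x: torch.Tensor, gamma: float, m: int, seed: int = 0) -> torch.Tensor:
    """Hypothetical helper: random Fourier features for the RBF kernel exp(-gamma * ||x - y||^2)."""
    g = torch.Generator().manual_seed(seed)
    d = x.shape[1]
    # Frequencies ~ N(0, 2 * gamma * I), phases ~ U(0, 2 * pi) (Rahimi & Recht construction).
    w = torch.randn(d, m, generator=g) * math.sqrt(2.0 * gamma)
    b = torch.rand(m, generator=g) * 2.0 * math.pi
    # z(x) . z(y) approximates the kernel value k(x, y).
    return math.sqrt(2.0 / m) * torch.cos(x @ w + b)


def n_components_for_variance(feats: torch.Tensor, fraction: float) -> int:
    """Hypothetical helper: smallest k whose top-k components explain >= `fraction` of the variance."""
    centered = feats - feats.mean(dim=0, keepdim=True)
    s = torch.linalg.svdvals(centered)
    ratio = torch.cumsum(s**2, dim=0) / (s**2).sum()
    k = int((ratio < fraction).sum().item()) + 1
    return min(k, s.numel())


torch.manual_seed(0)
x = torch.randn(200, 64)
feats = rff_features(x, gamma=3.0, m=4096)
k = n_components_for_variance(feats, 0.5)
print(feats.shape, k)
```

PCA then runs on `feats` instead of `x`, which lets a linear method capture non-linear structure at a cost controlled by `M`.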
Examples
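```python
import torch
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA

# Keep enough components to retain 90% of the explained variance.
score = PCAScore(pca=TensorPCA(n_components=0.90))
# Training embeddings go in X, calibration embeddings in Y.
score.fit(X=torch.randn(200, 64), Y=torch.randn(50, 64))
score.set_threshold(q=0.95)
result = score.select(X=torch.randn(10, 64))
```
Attributes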
| Name | Description |
|---|---|
| ident | String identifier of this score. |
ident
`ident = "pca"`
Methods
| Name | Description |
|---|---|
| fit() | Train a confidence score based on sample embeddings. |
fit()
Train a confidence score based on sample embeddings.
Usage
```python
fit(
    X=None,
    Y=None,
    model=None,
    loaders=None,
    outdir=None,
    prefix=None,
    q=False,
)
```
This method supports two usage modes:
- Precomputed embeddings: supply training embeddings via `X` and optional calibration embeddings via `Y`.
- On-the-fly extraction: supply a `model` with an `.embed()` method and a dictionary of `DataLoader`s, and the embeddings are extracted automatically.
Use either precomputed embeddings (`X`/`Y`) or `model` together with `loaders`, but not both.
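For the on-the-fly mode, the only documented requirement on `model` is an `.embed()` method. The hypothetical `ToyEncoder` below shows one minimal shape for such a model, together with a `loaders` dictionary whose batches are plain tensors (the type hint also allows dict batches, which this sketch does not cover). The manual loop at the end is an assumption about roughly what on-the-fly extraction does; it yields a tensor usable as `X` in the precomputed mode.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


class ToyEncoder(nn.Module):
    """Hypothetical model: fit(model=..., loaders=...) only needs the .embed() method."""

    def __init__(self, d_in: int = 32, d_emb: int = 64) -> None:
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(d_in, d_emb), nn.ReLU())
        self.head = nn.Linear(d_emb, 10)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        # Penultimate features: one embedding row per input sample.
        return self.backbone(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(x))


model = ToyEncoder().eval()
# A tensor is a valid map-style dataset, so each batch is a (batch_size, 32) tensor.
loaders = {
    "train": DataLoader(torch.randn(200, 32), batch_size=50),
    "val": DataLoader(torch.randn(50, 32), batch_size=50),
}

# Assumed equivalent of on-the-fly extraction for the "train" split.
with torch.no_grad():
    train_embs = torch.cat([model.embed(batch) for batch in loaders["train"]])
print(train_embs.shape)  # torch.Size([200, 64])
```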
```python
from seapig.scores import PCAScore
from seapig.scores.utils import TensorPCA

# Mode 1: precomputed embeddings
# train_embs and val_embs are (n_samples, n_features) tensors computed beforehand.
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(X=train_embs, Y=val_embs)

# Mode 2: on-the-fly extraction
# model exposes .embed(); train_loader and val_loader are DataLoaders.
my_score = PCAScore(pca=TensorPCA(n_components=0.90))
my_score.fit(model=model, loaders={"train": train_loader, "val": val_loader})
```
Parameters
`X: torch.Tensor | None = None`
A `torch.Tensor` with training sample embeddings. Required when not using `model` and `loaders`.
`Y: torch.Tensor | None = None`
A `torch.Tensor` with calibration sample embeddings. Optional.
`model: torch.nn.Module | None = None`
A `torch.nn.Module` with an `.embed()` method. Required when not using `X`.
`loaders: dict[str, DataLoader[torch.Tensor | dict[str, torch.Tensor]]] | None = None`
A `dict` of `DataLoader` objects. Required key: `"train"`. Optional key: `"val"`. Required when using `model`.
`outdir: Path | None = None`
A `pathlib.Path` pointing to a directory for saving and loading embeddings. Only used with `model` and `loaders`.
`prefix: str | None = None`
A `str` used as the filename prefix for saved embeddings. Only used with `model` and `loaders`.
`q: bool | float = False`
A `float` or `bool` indicating whether outliers from the training distribution should be filtered out before fitting. Defaults to `False`.
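How `q` is interpreted is not specified here. One plausible reading, which the hypothetical helper below assumes, is that a float `q` acts as a quantile: after a first fit, training samples whose score exceeds the `q`-quantile of the training scores are dropped before refitting. The vector norm stands in for real first-pass scores.

```python
import torch


def filter_by_quantile(embs: torch.Tensor, scores: torch.Tensor, q: float) -> torch.Tensor:
    """Hypothetical helper: keep rows of `embs` whose score is <= the q-quantile of `scores`."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    # torch.quantile interpolates linearly between order statistics by default.
    cutoff = torch.quantile(scores, q)
    return embs[scores <= cutoff]


torch.manual_seed(0)
embs = torch.randn(1000, 8)
scores = embs.norm(dim=1)  # stand-in for first-pass outlier scores
kept = filter_by_quantile(embs, scores, q=0.95)
print(kept.shape)  # torch.Size([950, 8])
```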
See Also
- seapig.scores.utils.TensorPCA: PCA implementation used internally.
- seapig.scores.knn.EuclideanScore: Alternative distance-based score.