detoxai.cavs
Submodules
detoxai.cavs.cav submodule
Credit: https://github.com/frederikpahde/rrclarc
- detoxai.cavs.cav.compute_cav(vecs: ndarray, targets: ndarray, cav_type: str = 'svm') → tuple
Compute a concept activation vector (CAV) for a set of vectors and targets.
- Parameters:
vecs (np.ndarray) – Input vectors of shape (n_samples, n_features).
targets (np.ndarray) – Target labels of shape (n_samples,).
cav_type (str) – Type of CAV to compute. One of "svm", "ridge", "signal", "mean". (Default value = "svm")
- Returns:
torch.Tensor of shape (1, n_features)
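To make the idea of a CAV concrete, the following is a minimal NumPy sketch of one simple variant: the unit-norm difference between the mean activation of samples with target 1 and the mean activation of samples with target 0. The function name `mean_difference_cav` is hypothetical, and this is an assumption about what a "mean"-style CAV may look like, not the code behind `compute_cav`.

```python
import numpy as np


def mean_difference_cav(vecs: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit-norm difference of the two class means."""
    vecs = np.asarray(vecs, dtype=np.float64)
    targets = np.asarray(targets)
    pos_mean = vecs[targets == 1].mean(axis=0)  # samples labelled 1
    neg_mean = vecs[targets == 0].mean(axis=0)  # samples labelled 0
    direction = pos_mean - neg_mean
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("class means coincide; no direction to extract")
    return (direction / norm)[None, :]  # shape (1, n_features)


# Synthetic data: the "concept" shifts feature 3 for the positive class.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(200, 8))
targets = np.repeat([0, 1], 100)
vecs[targets == 1, 3] += 5.0

cav = mean_difference_cav(vecs, targets)
```

The sketch returns an array of shape (1, n_features) to mirror the documented return shape; the "svm", "ridge", and "signal" variants would fit a different estimator to the same inputs.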
detoxai.cavs.extract_activations submodule
- detoxai.cavs.extract_activations.get_all_layers(model: Module, prefix: str = '') → dict
Recursively get all layers from the model.
- Parameters:
model (nn.Module) – The PyTorch model.
prefix (str) – Prefix for the layer names (used during recursion). (Default value = "")
- Returns:
Dictionary mapping layer names to layer modules.
- Return type:
dict
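A recursive traversal of this kind can be sketched with PyTorch's `named_children()`. The helper `collect_layers` below is hypothetical and only illustrates how the `prefix` argument builds dot-separated names during recursion; whether `get_all_layers` also records container modules, as this sketch does, is an assumption.

```python
import torch.nn as nn


def collect_layers(model: nn.Module, prefix: str = "") -> dict:
    """Hypothetical stand-in: map dot-separated names to submodules."""
    layers = {}
    for name, child in model.named_children():
        full_name = f"{prefix}.{name}" if prefix else name
        layers[full_name] = child
        # Recurse, passing the accumulated name down as the new prefix.
        layers.update(collect_layers(child, full_name))
    return layers


model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sequential(nn.ReLU(), nn.Linear(8, 2)),
)
layers = collect_layers(model)
names = list(layers)
```

For the nested `nn.Sequential` above, the collected names are "0", "1", "1.0", and "1.1".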
- detoxai.cavs.extract_activations.get_layer_by_name(model: Module, layer_name: str) → Module
Retrieve a layer from the model by its name.
- Parameters:
model (nn.Module) – The PyTorch model.
layer_name (str) – Dot-separated name of the layer.
- Returns:
The layer module.
- Return type:
nn.Module
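Resolving a dot-separated name amounts to following one attribute per segment. The sketch below shows that lookup on a plain object tree so it runs without PyTorch; `resolve_dotted` is a hypothetical helper, not the library's code. The same pattern applies to an `nn.Module`, which exposes its submodules as attributes.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_dotted(root, dotted_name: str):
    """Hypothetical helper: follow 'a.b.c' one attribute at a time."""
    return reduce(getattr, dotted_name.split("."), root)


# Stand-in object tree playing the role of a model with nested submodules.
head = SimpleNamespace(kind="linear")
model = SimpleNamespace(encoder=SimpleNamespace(block=SimpleNamespace(head=head)))

layer = resolve_dotted(model, "encoder.block.head")
```

A name that does not exist raises `AttributeError` at the first missing segment. PyTorch also provides `nn.Module.get_submodule` for the same kind of lookup.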
- detoxai.cavs.extract_activations.load_activations(save_path: str) → dict[str, ndarray]
Load saved activations from save_path.
- Parameters:
save_path (str) – Path to the saved activations.
- Return type:
dict[str, ndarray]
- detoxai.cavs.extract_activations.extract_activations(model: Module, dataloader: DataLoader, experiment_name: str, save_dir: str, layers: list | None = None, device: str = 'cuda', use_cache: bool = True) → dict[str, ndarray]
Extract activations from all layers of a model for data from a dataloader.
- Parameters:
model (nn.Module) – The PyTorch model.
dataloader (DataLoader) – The PyTorch DataLoader.
experiment_name (str) – Name of the experiment.
save_dir (str) – Directory to save the activations.
layers (list | None) – List of layer names to extract activations from. (Default value = None)
device (str) – Device to run the model on. (Default value = "cuda")
use_cache (bool) – Whether to use cached activations. (Default value = True)
- Returns:
Dictionary mapping layer names to activations.
- Return type:
dict
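Per-layer activations are commonly gathered with PyTorch forward hooks, and the sketch below shows that pattern under simplifying assumptions. The helper `collect_activations` is hypothetical: it takes a plain list of input tensors instead of a DataLoader, runs on the CPU, and omits the saving and caching that `experiment_name`, `save_dir`, and `use_cache` suggest.

```python
import numpy as np
import torch
import torch.nn as nn


def collect_activations(model, batches, layer_names, device="cpu"):
    """Hypothetical sketch: gather per-layer outputs with forward hooks."""
    modules = dict(model.named_modules())
    store = {name: [] for name in layer_names}
    handles = []
    for name in layer_names:
        # Bind `name` as a default argument so each hook keeps its own key.
        def hook(_module, _inputs, output, name=name):
            store[name].append(output.detach().cpu().numpy())
        handles.append(modules[name].register_forward_hook(hook))

    model.to(device).eval()
    try:
        with torch.no_grad():
            for inputs in batches:
                model(inputs.to(device))
    finally:
        # Always detach the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    # Stack the per-batch chunks along the sample axis.
    return {name: np.concatenate(chunks, axis=0) for name, chunks in store.items()}


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
batches = [torch.randn(5, 4), torch.randn(3, 4)]
activations = collect_activations(model, batches, layer_names=["0", "2"])
```

The result maps each requested layer name to an array whose first dimension is the total number of samples across batches, which matches the documented `dict[str, ndarray]` return annotation.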
detoxai.cavs.mass_mean_probe submodule
- detoxai.cavs.mass_mean_probe.compute_mass_mean_probe(vecs: ndarray, targets: ndarray) → tuple[Tensor, Tensor]
Compute the mass mean probe from the activations of a model.
- Parameters:
vecs (np.ndarray) – Activations of the model, shape (samples, features).
targets (np.ndarray) – Target labels for the samples, shape (samples,).
- Returns:
A tuple containing:
mass_mean_probe (torch.Tensor) – The mass mean probe.
mean_activation_over_nonartifact_samples (torch.Tensor) – The mean activation over non-artifact samples.
mean_activation_over_artifact_samples (torch.Tensor) – The mean activation over artifact samples.
- Return type:
tuple
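The three returned quantities can be illustrated with a small NumPy sketch in which the probe is the difference between the two class means. The function `mass_mean_probe_sketch` is hypothetical. It assumes that a target of 1 marks an artifact sample and 0 a non-artifact sample, returns NumPy arrays instead of torch tensors, and applies no normalization; the library's choices on these points may differ.

```python
import numpy as np


def mass_mean_probe_sketch(vecs: np.ndarray, targets: np.ndarray):
    """Hypothetical sketch: probe = artifact mean minus non-artifact mean."""
    vecs = np.asarray(vecs, dtype=np.float64)
    targets = np.asarray(targets)
    mean_artifact = vecs[targets == 1].mean(axis=0)     # assumed: 1 = artifact
    mean_nonartifact = vecs[targets == 0].mean(axis=0)  # assumed: 0 = non-artifact
    probe = mean_artifact - mean_nonartifact
    return probe, mean_nonartifact, mean_artifact


vecs = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 3.0], [6.0, 3.0]])
targets = np.array([0, 0, 1, 1])
probe, mean_clean, mean_artifact = mass_mean_probe_sketch(vecs, targets)
```

For this toy input the non-artifact mean is [1, 1], the artifact mean is [5, 3], and the probe is their difference, [4, 2].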