detoxai.cavs

Submodules

detoxai.cavs.cav submodule

Credit: https://github.com/frederikpahde/rrclarc

detoxai.cavs.cav.compute_cav(vecs: ndarray, targets: ndarray, cav_type: str = 'svm') → tuple

Compute a concept activation vector (CAV) for a set of vectors and targets.

Parameters:
  • vecs (np.ndarray) – Array of shape (n_samples, n_features).

  • targets (np.ndarray) – Array of shape (n_samples,).

  • cav_type (str) – Type of CAV to compute. One of [“svm”, “ridge”, “signal”, “mean”]. (Default value = “svm”)

Returns:

torch.Tensor of shape (1, n_features)
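
As an illustration of what a CAV computation can look like, the sketch below implements one common definition of a "signal"-style CAV: the per-feature covariance between activations and concept labels, divided by the label variance. The function name signal_cav_sketch is hypothetical, and this is an assumption about the general technique, not a copy of what compute_cav does internally. It returns a NumPy array rather than a torch.Tensor.

```python
import numpy as np


def signal_cav_sketch(vecs, targets):
    """Hypothetical helper: covariance-based concept direction.

    vecs has shape (n_samples, n_features); targets has shape (n_samples,).
    Returns an array of shape (1, n_features).
    """
    vecs = np.asarray(vecs, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # Center activations per feature and center the labels.
    centered_x = vecs - vecs.mean(axis=0, keepdims=True)
    centered_t = targets - targets.mean()

    n = len(targets)
    # Sample covariance of every feature with the labels.
    covariance = (centered_x * centered_t[:, None]).sum(axis=0) / (n - 1)
    # Sample variance of the labels.
    variance = (centered_t ** 2).sum() / (n - 1)

    return (covariance / variance)[None, :]


# Toy data: feature 0 follows the label exactly, feature 1 is constant.
vecs = np.array([[0.0, 5.0], [0.0, 5.0], [1.0, 5.0], [1.0, 5.0]])
targets = np.array([0, 0, 1, 1])
cav = signal_cav_sketch(vecs, targets)
```

In this toy case the direction has weight only on the feature that varies with the label.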

detoxai.cavs.extract_activations submodule

detoxai.cavs.extract_activations.get_all_layers(model: Module, prefix: str = '') → dict

Recursively get all layers from the model.

Parameters:
  • model (nn.Module) – The PyTorch model.

  • prefix (str) – Prefix for the layer names (used during recursion). (Default value = “”)

Returns:

Dictionary mapping layer names to layer modules.

Return type:

dict
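
The recursion described above can be sketched roughly as follows. The helper collect_layers is hypothetical. Whether the library's version also records container modules (as this sketch does) or only leaf layers is not stated here, so treat that detail as an assumption.

```python
import torch.nn as nn


def collect_layers(model: nn.Module, prefix: str = "") -> dict:
    """Hypothetical helper: map dot-separated names to submodules."""
    layers = {}
    for name, child in model.named_children():
        # Build the full name from the parent's prefix.
        full_name = f"{prefix}.{name}" if prefix else name
        layers[full_name] = child
        # Recurse into the child, passing its name as the new prefix.
        layers.update(collect_layers(child, prefix=full_name))
    return layers


model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sequential(nn.ReLU(), nn.Linear(8, 2)),
)
layers = collect_layers(model)
layer_names = sorted(layers)
```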

detoxai.cavs.extract_activations.get_layer_by_name(model: Module, layer_name: str) → Module

Retrieve a layer from the model by its name.

Parameters:
  • model (nn.Module) – The PyTorch model.

  • layer_name (str) – Dot-separated name of the layer.

Returns:

The layer module.

Return type:

nn.Module
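
A dot-separated lookup of this kind can be sketched by walking the name one attribute at a time. The helper resolve_layer is hypothetical and only illustrates the idea. PyTorch's own nn.Module.get_submodule performs a similar lookup.

```python
import torch.nn as nn


def resolve_layer(model: nn.Module, layer_name: str) -> nn.Module:
    """Hypothetical helper: follow a dot-separated path to a submodule."""
    module = model
    for part in layer_name.split("."):
        # nn.Module exposes registered children as attributes,
        # including the numeric names used by nn.Sequential.
        module = getattr(module, part)
    return module


model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sequential(nn.ReLU(), nn.Linear(8, 2)),
)
inner_linear = resolve_layer(model, "1.1")
```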

detoxai.cavs.extract_activations.load_activations(save_path: str) → dict[str, ndarray]

Parameters:
  • save_path (str)

Return type:

dict[str, ndarray]

detoxai.cavs.extract_activations.extract_activations(model: Module, dataloader: DataLoader, experiment_name: str, save_dir: str, layers: list | None = None, device: str = 'cuda', use_cache: bool = True) → dict[str, ndarray]

Extract activations from all layers of a model for data from a dataloader.

Parameters:
  • model (nn.Module) – The PyTorch model.

  • dataloader (DataLoader) – The PyTorch DataLoader.

  • experiment_name (str) – Name of the experiment.

  • save_dir (str) – Directory to save the activations.

  • layers (list | None) – List of layer names to extract activations from. (Default value = None)

  • device (str) – Device to run the model on. (Default value = “cuda”)

  • use_cache (bool) – Whether to use cached activations. (Default value = True)

Returns:

Dictionary mapping layer names to activations.

Return type:

dict
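
One common way to collect per-layer activations in PyTorch is with forward hooks, sketched below. The helper collect_activations is hypothetical. It runs on the CPU and leaves out the caching, saving to save_dir, and device handling that the documented function exposes through its parameters, so it should be read as an illustration of the general technique only.

```python
import numpy as np
import torch
import torch.nn as nn


def collect_activations(model, batches, layer_names):
    """Hypothetical helper: gather layer outputs with forward hooks."""
    modules = dict(model.named_modules())
    store = {name: [] for name in layer_names}
    handles = []

    for name in layer_names:
        # Bind the layer name as a default argument so each hook
        # writes to its own list.
        def hook(_module, _inputs, output, name=name):
            store[name].append(output.detach().cpu().numpy())

        handles.append(modules[name].register_forward_hook(hook))

    model.eval()
    try:
        with torch.no_grad():
            for batch in batches:
                model(batch)
    finally:
        # Always remove hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()

    # Concatenate the per-batch outputs along the sample axis.
    return {name: np.concatenate(chunks, axis=0) for name, chunks in store.items()}


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
batches = [torch.randn(3, 4), torch.randn(5, 4)]
activations = collect_activations(model, batches, ["0", "2"])
```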

detoxai.cavs.mass_mean_probe submodule

detoxai.cavs.mass_mean_probe.compute_mass_mean_probe(vecs: ndarray, targets: ndarray) → tuple[Tensor, Tensor]

Compute the mass mean probe from the activations of a model.

Parameters:
  • vecs (np.ndarray) – Activations of the model, shape (samples, features).

  • targets (np.ndarray) – Target labels for the samples, shape (samples,).

Returns:

A tuple containing:
  • mass_mean_probe (torch.Tensor) – The mass mean probe.

  • mean_activation_over_nonartifact_samples (torch.Tensor) – The mean activation over non-artifact samples.

  • mean_activation_over_artifact_samples (torch.Tensor) – The mean activation over artifact samples.

Return type:

tuple
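
A mass-mean probe is usually taken to be the difference between the class means of the activations. The sketch below assumes that definition, assumes that label 1 marks artifact samples, and assumes the direction points from the non-artifact mean to the artifact mean. None of these details is confirmed by the text above. The helper mass_mean_probe_sketch is hypothetical and returns NumPy arrays rather than torch.Tensor objects.

```python
import numpy as np


def mass_mean_probe_sketch(vecs, targets):
    """Hypothetical helper: difference of class means.

    Assumes targets == 1 marks artifact samples and targets == 0
    marks non-artifact samples.
    """
    vecs = np.asarray(vecs, dtype=float)
    targets = np.asarray(targets)

    mean_artifact = vecs[targets == 1].mean(axis=0)
    mean_non_artifact = vecs[targets == 0].mean(axis=0)

    # Direction from the non-artifact mean to the artifact mean.
    probe = mean_artifact - mean_non_artifact
    return probe, mean_non_artifact, mean_artifact


vecs = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 2.0], [6.0, 2.0]])
targets = np.array([0, 0, 1, 1])
probe, mean_non_artifact, mean_artifact = mass_mean_probe_sketch(vecs, targets)
```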

Module contents