detoxai.methods.leace

Submodules

detoxai.methods.leace.leace submodule

class detoxai.methods.leace.leace.LEACE(model: Module | LightningModule, experiment_name: str, device: str, **kwargs)

Bases: ModelCorrectionMethod

extract_activations(dataloader: DataLoader, intervention_layers: list[str], use_cache: bool = True, save_dir: str = '/home/docs/.detoxai/activations') None
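Parameters:
  • dataloader (torch.utils.data.DataLoader) – DataLoader that supplies the batches from which activations are extracted.

  • intervention_layers (list[str]) – Names of the layers whose activations are extracted.

  • use_cache (bool) – Whether to use cached activations. (Default value = True)

  • save_dir (str) – Directory in which the activations are stored. (Default value = ACTIVATIONS_DIR, which appears in the signature above as '/home/docs/.detoxai/activations')

Returns:

None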
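The following is a minimal sketch of how per-layer activations can be gathered with PyTorch forward hooks. It is not the detoxai implementation: the helper collect_activations and the toy model are hypothetical, and caching to save_dir is left out. Only standard PyTorch calls are used (named_modules, register_forward_hook, and the handle's remove method).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def collect_activations(model, dataloader, layer_names):
    """Hypothetical helper: return {layer_name: outputs stacked over all batches}."""
    modules = dict(model.named_modules())
    collected = {name: [] for name in layer_names}
    handles = []
    for name in layer_names:
        # Bind `name` as a default argument so each hook writes to its own list.
        def hook(module, inputs, output, name=name):
            collected[name].append(output.detach().cpu())

        handles.append(modules[name].register_forward_hook(hook))

    model.eval()
    try:
        with torch.no_grad():
            for batch, _labels in dataloader:
                model(batch)
    finally:
        # Always detach the hooks, even if a forward pass fails.
        for handle in handles:
            handle.remove()
    return {name: torch.cat(chunks) for name, chunks in collected.items()}


# Toy setup (assumed for illustration): layers of nn.Sequential are named "0", "1", "2".
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
data = TensorDataset(torch.randn(10, 4), torch.zeros(10, dtype=torch.long))
activations = collect_activations(model, DataLoader(data, batch_size=4), ["0", "1"])
```

The layer names passed here play the role of intervention_layers; in a real model they would be the dotted names reported by model.named_modules().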

apply_model_correction(intervention_layers: list[str], use_n_examples: int = 15000, **kwargs) None

Apply the LEACE eraser to the specified layers of the model.

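Parameters:
  • intervention_layers (list[str]) – Names of the layers to which the eraser is applied.

  • use_n_examples (int) – Number of examples to use. (Default value = 15_000)

  • **kwargs – Additional keyword arguments.

Returns:

None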
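To illustrate what a linear concept eraser does to activations, here is a simplified sketch in plain Python. It is an assumption-laden stand-in, not the LeaceEraser used by this class: it removes only the single direction given by the cross-covariance between the activations and a scalar concept label, using an orthogonal projection. LEACE as published additionally whitens the activations so that the edit is minimal in a least-squares sense, and that step is omitted here. All function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cross_covariance(xs, zs):
    """Return the feature means and Cov(x_j, z) for every feature j."""
    n, dim = len(xs), len(xs[0])
    mu_x = [mean([x[j] for x in xs]) for j in range(dim)]
    mu_z = mean(zs)
    cov = [
        sum((x[j] - mu_x[j]) * (z - mu_z) for x, z in zip(xs, zs)) / n
        for j in range(dim)
    ]
    return mu_x, cov


def fit_projection_eraser(xs, zs):
    """Fit a function that projects out the cross-covariance direction."""
    mu_x, cov = cross_covariance(xs, zs)
    norm = sum(c * c for c in cov) ** 0.5  # assumes the concept is linearly present
    direction = [c / norm for c in cov]

    def erase(x):
        # Component of the centered activation along the concept direction.
        coeff = sum(d * (xj - m) for d, xj, m in zip(direction, x, mu_x))
        return [xj - coeff * d for xj, d in zip(x, direction)]

    return erase


# Toy activations (2 features) with a binary concept label.
xs = [[2.0, 1.0], [3.0, 2.0], [-2.0, 0.5], [-3.0, -0.5]]
zs = [1.0, 1.0, 0.0, 0.0]
erase = fit_projection_eraser(xs, zs)
erased = [erase(x) for x in xs]
_, residual = cross_covariance(erased, zs)  # covariance with the concept after erasure
```

After erasure, every feature has zero covariance with the concept label on the fitting data, so no linear readout of the erased activations correlates with the label on that data.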

add_clarc_hook(eraser: LeaceEraser, layer_names: list) None

Applies debiasing to the specified layers of the PyTorch model by registering hooks that use the provided LEACE eraser.

Parameters:
  • eraser (LeaceEraser) – The LEACE eraser that the hooks apply to the activations.

  • layer_names (list) – List of layer names (strings) to apply the hook on.

Returns:

A list of hook handles. Keep them to remove hooks later if needed.

Return type:

list
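The pattern of registering hooks and keeping their handles can be sketched with standard PyTorch calls. This is a hedged illustration, not the body of add_clarc_hook: the helper add_output_hooks and the stand-in transform zero_first_feature are hypothetical, and a real eraser would be applied in place of the transform.

```python
import torch
from torch import nn


def add_output_hooks(model, transform, layer_names):
    """Hypothetical helper: replace each named layer's output with transform(output)."""
    modules = dict(model.named_modules())
    handles = []
    for name in layer_names:
        def hook(module, inputs, output):
            # A value returned from a forward hook replaces the layer's output.
            return transform(output)

        handles.append(modules[name].register_forward_hook(hook))
    return handles


def zero_first_feature(activations):
    """Stand-in for an eraser: zero the first feature of the activations."""
    mask = torch.ones(activations.shape[-1])
    mask[0] = 0.0
    return activations * mask


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 3), nn.ReLU())
x = torch.ones(2, 3)
baseline = model(x)

handles = add_output_hooks(model, zero_first_feature, ["0"])
hooked = model(x)  # layer "0" output is transformed before the ReLU

# Removing the handles restores the original behaviour.
for handle in handles:
    handle.remove()
restored = model(x)
```

Keeping the handles is what makes the intervention reversible: once each handle's remove method is called, the model produces its original outputs again.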

Module contents