detoxai.methods.clarcs

Submodules

detoxai.methods.clarcs.aclarc submodule

class detoxai.methods.clarcs.aclarc.ACLARC(model: LightningModule, experiment_name: str, device: str, **kwargs)

Bases: CLARC

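apply_model_correction(cav_layers: list[str], *args, **kwargs)
Parameters:
  • cav_layers (list[str]) – Names of the layers whose CAVs are used for the correction.

  • *args – Additional positional arguments.

  • **kwargs – Additional keyword arguments.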

detoxai.methods.clarcs.clarc submodule

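detoxai.methods.clarcs.clarc.require_activations_and_cav(func)

Decorator for methods that require extracted activations and computed CAVs.

Parameters:
  • func – The method to wrap.

Returns:

The wrapped method.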
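As a rough illustration, a decorator of this kind could refuse to run a method until its prerequisites exist. The sketch below assumes the prerequisites are stored on the instance as `activations` and `cav`; those attribute names and the error messages are hypothetical, not detoxai's internals.

```python
import functools


def require_activations_and_cav(func):
    """Hypothetical sketch: block `func` until activations and CAVs exist."""

    @functools.wraps(func)  # keep the wrapped method's name and docstring
    def wrapper(self, *args, **kwargs):
        # `activations` and `cav` are assumed attribute names.
        if getattr(self, "activations", None) is None:
            raise RuntimeError("Call extract_activations() first.")
        if getattr(self, "cav", None) is None:
            raise RuntimeError("Call compute_cavs() first.")
        return func(self, *args, **kwargs)

    return wrapper


class Demo:
    activations = None
    cav = None

    @require_activations_and_cav
    def apply_model_correction(self, cav_layer: str) -> str:
        return f"corrected {cav_layer}"
```

Calling `Demo().apply_model_correction("layer4")` raises until both attributes are set, which makes an out-of-order call fail early instead of deep inside the correction code.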

class detoxai.methods.clarcs.clarc.CLARC(model: LightningModule, experiment_name: str, device: str)

Bases: ModelCorrectionMethod, ABC

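extract_activations(dataloader: DataLoader, layers: list, use_cache: bool = True, save_dir: str = '/home/docs/.detoxai/activations') → None
Parameters:
  • dataloader (torch.utils.data.DataLoader) – DataLoader providing the inputs whose activations are extracted.

  • layers (list) – Layers whose activations are recorded.

  • use_cache (bool) – Whether to reuse previously saved activations. (Default value = True)

  • save_dir (str) – Directory in which cached activations are stored. (Default value = ACTIVATIONS_DIR)

Return type:

None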
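A minimal sketch of activation extraction with PyTorch forward hooks, assuming layers are identified by the names reported by `model.named_modules()`. The function name `extract_activations_sketch` is hypothetical, and the sketch leaves out the on-disk caching controlled by `use_cache` and `save_dir`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def extract_activations_sketch(model: nn.Module, dataloader: DataLoader,
                               layers: list[str]) -> dict[str, torch.Tensor]:
    """Record the outputs of the named layers for every batch (no caching)."""
    modules = dict(model.named_modules())
    store: dict[str, list[torch.Tensor]] = {name: [] for name in layers}
    handles = []
    for name in layers:
        # The default argument binds the current layer name inside the closure.
        def hook(module, inputs, output, name=name):
            store[name].append(output.detach().cpu())
        handles.append(modules[name].register_forward_hook(hook))
    try:
        model.eval()
        for x, *_ in dataloader:
            model(x)
    finally:
        for h in handles:
            h.remove()  # always detach the hooks, even if a batch fails
    return {name: torch.cat(chunks) for name, chunks in store.items()}


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
data = TensorDataset(torch.randn(10, 4), torch.zeros(10))
acts = extract_activations_sketch(model, DataLoader(data, batch_size=4), ["0", "2"])
```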

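compute_cavs(cav_type: str, cav_layers: list[str]) → None
Parameters:
  • cav_type (str) – Identifier of the method used to compute the CAVs.

  • cav_layers (list[str]) – Names of the layers for which CAVs are computed.

Return type:

None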
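The accepted `cav_type` values are not listed here. As one illustrative option, a mass-mean CAV is the unit-normalized difference between the mean activations of artifact and clean samples. The sketch assumes convolutional activations are average-pooled over space so the CAV has shape (channels,); `mass_mean_cav` is a hypothetical helper, not part of detoxai.

```python
import torch


def mass_mean_cav(acts: torch.Tensor, is_artifact: torch.Tensor) -> torch.Tensor:
    """Unit-length difference of class means (a 'mass-mean' direction)."""
    # Assumed: pool (N, C, H, W) activations to (N, C) so the CAV is per channel.
    if acts.dim() > 2:
        acts = acts.flatten(start_dim=2).mean(dim=2)
    mu_artifact = acts[is_artifact].mean(dim=0)
    mu_clean = acts[~is_artifact].mean(dim=0)
    direction = mu_artifact - mu_clean
    return direction / direction.norm()
```

For example, if only channel 0 is active in the artifact samples, the resulting CAV points along channel 0.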

abstractmethod apply_model_correction(cav_layer: str) → None
Parameters:
  • cav_layer (str) – Name of the layer at which the correction is applied.

Return type:

None

detoxai.methods.clarcs.hooks submodule

detoxai.methods.clarcs.hooks.stabilize(x: Tensor, epsilon: float = 1e-08) → Tensor
Parameters:
  • x (torch.Tensor) – Input tensor.

  • epsilon (float) – Small constant used for numerical stability. (Default value = 1e-8)

Return type:

torch.Tensor
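A guard like this is often implemented by pushing every value away from zero by `epsilon` while keeping its sign, so that later divisions never hit zero. The sketch below assumes that behavior (treating exact zeros as positive); detoxai's actual rule may differ.

```python
import torch


def stabilize(x: torch.Tensor, epsilon: float = 1e-8) -> torch.Tensor:
    # Sign of each entry, with exact zeros mapped to +1 so the result is never 0.
    sign = torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))
    # Move each entry epsilon further from zero in its own direction.
    return x + epsilon * sign
```

A typical use is a denominator such as `a / stabilize(length)`, where `length` may contain zeros.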

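detoxai.methods.clarcs.hooks.mass_mean_probe_hook(probe: Tensor, alpha: float)

Creates a forward hook that adds the probe, scaled by alpha, to a layer's output.

Parameters:
  • probe (torch.Tensor) – The probe tensor to be added to the output.

  • alpha (float) – Scaling factor for the probe.

Returns:

A hook function to be registered with a PyTorch module.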

detoxai.methods.clarcs.hooks.add_mass_mean_probe_hook(model: Module, probe: Tensor, layer_names: list, alpha: float = 1.0) → list

Adds a probe to the specified layers of a PyTorch model.

Parameters:
  • model (nn.Module) – The PyTorch model to be probed.

  • probe (torch.Tensor) – The probe tensor to be added to the output.

  • layer_names (list) – List of layer names (strings) to apply the hook on.

  • alpha (float) – Scaling factor for the probe. (Default value = 1.0)

Returns:

A list of hook handles. Keep them to remove hooks later if needed.

Return type:

list
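The sketch below shows how probe hooks of this kind can be registered with `register_forward_hook` and later removed through the returned handles. It assumes `layer_names` are names from `model.named_modules()` and that each hook simply adds `alpha * probe` to the layer output; `add_probe_hooks` is a hypothetical name.

```python
import torch
from torch import nn


def add_probe_hooks(model: nn.Module, probe: torch.Tensor,
                    layer_names: list, alpha: float = 1.0) -> list:
    """Register a hook on each named layer that adds alpha * probe to its output."""
    modules = dict(model.named_modules())

    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the layer's output.
        return output + alpha * probe

    return [modules[name].register_forward_hook(hook) for name in layer_names]


model = nn.Sequential(nn.Linear(3, 3), nn.ReLU())
x = torch.randn(2, 3)
baseline = model(x)
handles = add_probe_hooks(model, probe=torch.ones(3), layer_names=["0"], alpha=0.5)
steered = model(x)
for h in handles:
    h.remove()  # the model behaves as before once the handles are removed
```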

detoxai.methods.clarcs.hooks.clarc_hook(cav: Tensor, mean_length: Tensor, alpha: float)

Creates a forward hook to adjust layer activations based on the CAV.

Parameters:
  • cav (torch.Tensor) – Concept Activation Vector of shape (channels,).

  • mean_length (torch.Tensor) – Desired mean alignment length.

  • alpha (float) – Scaling factor for the correction.

Returns:

A hook function to be registered with a PyTorch module.

Return type:

function
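One way such a hook could work is sketched below, assuming a P-ClArC-style correction. With v the unit-normalized CAV and m = mean_length, each activation vector a (taken along the channel dimension, at every spatial position) becomes a' = a - α(⟨a, v⟩ - m)v. With α = 1 the component along v is replaced by m, and directions orthogonal to v are left untouched. `clarc_hook_sketch` is a hypothetical name; detoxai's hook may compute the alignment differently.

```python
import torch


def clarc_hook_sketch(cav: torch.Tensor, mean_length: torch.Tensor, alpha: float = 1.0):
    v = cav / cav.norm()  # unit-length CAV, shape (channels,)

    def hook(module, inputs, output):
        # Reshape the CAV so it broadcasts over (N, C) or (N, C, H, W) outputs.
        v_b = v.view(1, -1, *([1] * (output.dim() - 2)))
        # Alignment of every channel vector with the CAV, shape (N, 1, ...).
        length = (output * v_b).sum(dim=1, keepdim=True)
        # Move the CAV component toward mean_length; alpha = 1 replaces it fully.
        return output - alpha * (length - mean_length) * v_b

    return hook
```

Because it follows the forward-hook signature, the returned function can be tested directly, e.g. `hook(None, None, acts)`, before registering it on a real layer.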

detoxai.methods.clarcs.hooks.add_clarc_hook(model: Module, cav: Tensor, mean_length: Tensor, layer_name: str, alpha: float = 1.0) → list

Applies debiasing to the specified layer of a PyTorch model using the provided CAV.

Parameters:
  • model (nn.Module) – The PyTorch model to be debiased.

  • cav (torch.Tensor) – The Concept Activation Vector, shape (channels,).

  • mean_length (torch.Tensor) – Mean activation length of the unaffected activations.

  • layer_name (str) – Name of the layer to apply the hook on.

  • alpha (float) – Scaling factor for the debiasing. (Default value = 1.0)

Returns:

A list of hook handles. Keep them to remove hooks later if needed.

Return type:

list
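Registering such a correction on a single named layer could look like the sketch below. It assumes the same projection rule as a P-ClArC-style hook (a' = a - α(⟨a, v⟩ - m)v, with v the unit CAV and m = mean_length) and that `layer_name` comes from `model.named_modules()`. `add_clarc_hook_sketch` is hypothetical.

```python
import torch
from torch import nn


def add_clarc_hook_sketch(model: nn.Module, cav: torch.Tensor, mean_length: torch.Tensor,
                          layer_name: str, alpha: float = 1.0) -> list:
    v = cav / cav.norm()

    def hook(module, inputs, output):
        v_b = v.view(1, -1, *([1] * (output.dim() - 2)))
        length = (output * v_b).sum(dim=1, keepdim=True)
        return output - alpha * (length - mean_length) * v_b

    layer = dict(model.named_modules())[layer_name]  # KeyError for unknown names
    return [layer.register_forward_hook(hook)]


model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=1), nn.ReLU())
x = torch.randn(2, 3, 5, 5)
cav = torch.tensor([1.0, 0.0, 0.0, 0.0])  # points at channel 0 of layer "0"
handles = add_clarc_hook_sketch(model, cav, torch.tensor(0.0), layer_name="0")
corrected = model(x)
for h in handles:
    h.remove()
```

Here the CAV selects channel 0 and mean_length is 0, so that channel is zeroed while the other channels pass through unchanged.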

detoxai.methods.clarcs.pclarc submodule

class detoxai.methods.clarcs.pclarc.PCLARC(model: LightningModule, experiment_name: str, device: str, **kwargs)

Bases: CLARC

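apply_model_correction(cav_layers: list[str], *args, **kwargs)
Parameters:
  • cav_layers (list[str]) – Names of the layers whose CAVs are used for the correction.

  • *args – Additional positional arguments.

  • **kwargs – Additional keyword arguments.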

detoxai.methods.clarcs.rrclarc submodule

class detoxai.methods.clarcs.rrclarc.RRMaskingPattern(value)

Bases: Enum

MAX_LOGIT = 'max_logit'
TARGET_LOGIT = 'target_logit'
ALL_LOGITS = 'all_logits'
ALL_LOGITS_RANDOM = 'all_logits_random'
LOGPROBS = 'logprobs'

class detoxai.methods.clarcs.rrclarc.RRLossType(value)

Bases: Enum

L2 = 'l2'
L1 = 'l1'
COSINE = 'cosine'

class detoxai.methods.clarcs.rrclarc.RRCLARC(model: LightningModule, experiment_name: str, device: str, rr_config: dict = {}, **kwargs)

Bases: CLARC

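apply_model_correction(cav_layers: list[str], *args, **kwargs)
Parameters:
  • cav_layers (list[str]) – Names of the layers whose CAVs are used for the correction.

  • *args – Additional positional arguments.

  • **kwargs – Additional keyword arguments.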

rr_clarc_hook() → Callable

masked_criterion(y_hat: Tensor, y: Tensor) → Tensor
Parameters:
  • y_hat (torch.Tensor) – Model outputs (logits).

  • y (torch.Tensor) – Target labels.

Return type:

torch.Tensor
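The `RRMaskingPattern` values suggest how the logits are reduced to one scalar per sample before gradients are taken. The sketch below is an assumed interpretation of each pattern; in particular, the random ±1 weighting for `all_logits_random` is a guess. `masked_criterion_sketch` is hypothetical and takes the pattern as a plain string matching the enum values.

```python
import torch
import torch.nn.functional as F


def masked_criterion_sketch(y_hat: torch.Tensor, y: torch.Tensor, pattern: str) -> torch.Tensor:
    """Reduce logits (N, K) to one scalar per sample (N,) according to `pattern`."""
    if pattern == "max_logit":
        return y_hat.max(dim=1).values
    if pattern == "target_logit":
        return y_hat.gather(1, y.view(-1, 1)).squeeze(1)
    if pattern == "all_logits":
        return y_hat.sum(dim=1)
    if pattern == "all_logits_random":
        # Assumed: a random +1/-1 weight for every logit.
        signs = torch.randint(0, 2, y_hat.shape, device=y_hat.device) * 2 - 1
        return (y_hat * signs).sum(dim=1)
    if pattern == "logprobs":
        return F.log_softmax(y_hat, dim=1).gather(1, y.view(-1, 1)).squeeze(1)
    raise ValueError(f"unknown pattern: {pattern}")
```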

rr_loss(gradient: Tensor) → Tensor
Parameters:
  • gradient (torch.Tensor) – Gradient from which the RR loss is computed.

Return type:

torch.Tensor
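The `RRLossType` values (`l2`, `l1`, `cosine`) suggest how the alignment between the gradient and the CAV is penalized. The sketch below assumes the loss measures the projection of each sample's gradient onto the unit CAV, with spatial dimensions averaged away; `rr_loss_sketch` and its `cav` and `loss_type` parameters are hypothetical.

```python
import torch
import torch.nn.functional as F


def rr_loss_sketch(gradient: torch.Tensor, cav: torch.Tensor, loss_type: str = "l2") -> torch.Tensor:
    """Penalize alignment between per-sample gradients (N, C, ...) and a CAV (C,)."""
    # Assumed: average spatial dims so each sample yields one (C,) gradient vector.
    g = gradient.flatten(start_dim=2).mean(dim=2) if gradient.dim() > 2 else gradient
    v = cav / cav.norm()
    if loss_type == "l2":
        return ((g @ v) ** 2).mean()
    if loss_type == "l1":
        return (g @ v).abs().mean()
    if loss_type == "cosine":
        return F.cosine_similarity(g, v.unsqueeze(0), dim=1).abs().mean()
    raise ValueError(f"unknown loss type: {loss_type}")
```

A gradient orthogonal to the CAV gives zero loss under all three variants, which is the "right reason" the penalty pushes toward.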

modified_training_step() → Callable
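To show how masking and an RR loss could combine in one training step, here is a sketch in the RR-ClArC spirit: a forward hook captures a layer's output, the target logit is differentiated with respect to it using `create_graph=True`, and an `l2` alignment penalty is added to the task loss. `rr_training_step`, `lam`, and the (N, C) activation shape are assumptions for illustration, not detoxai's API.

```python
import torch
from torch import nn
import torch.nn.functional as F


def rr_training_step(model: nn.Module, layer: nn.Module, cav: torch.Tensor,
                     x: torch.Tensor, y: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Task loss plus a right-reason penalty on gradients at `layer`."""
    captured = {}

    def hook(module, inputs, output):
        captured["acts"] = output  # keep the graph: no detach

    handle = layer.register_forward_hook(hook)
    try:
        y_hat = model(x)
    finally:
        handle.remove()
    task_loss = F.cross_entropy(y_hat, y)
    # Target-logit masking, then d(target logits) / d(activations).
    target = y_hat.gather(1, y.view(-1, 1)).sum()
    (grad,) = torch.autograd.grad(target, captured["acts"], create_graph=True)
    v = cav / cav.norm()
    rr = ((grad @ v) ** 2).mean()  # "l2" variant on (N, C) activations
    return task_loss + lam * rr


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
cav = torch.tensor([1.0, 0.0, 0.0])
loss = rr_training_step(model, model[0], cav, x, y, lam=10.0)
loss.backward()
```

`create_graph=True` is what lets the penalty, which is itself built from a gradient, be backpropagated into the model weights.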

Module contents