detoxai.methods.clarcs
Submodules
detoxai.methods.clarcs.aclarc submodule
detoxai.methods.clarcs.clarc submodule
- class detoxai.methods.clarcs.clarc.CLARC(model: LightningModule, experiment_name: str, device: str)
Bases: ModelCorrectionMethod, ABC
- extract_activations(dataloader: DataLoader, layers: list, use_cache: bool = True, save_dir: str = '/home/docs/.detoxai/activations') → None
- Parameters:
dataloader (torch.utils.data.DataLoader)
layers (list)
use_cache (bool) – Default value: True.
save_dir (str) – Default value: ACTIVATIONS_DIR.
- Returns:
None
detoxai.methods.clarcs.hooks submodule
- detoxai.methods.clarcs.hooks.stabilize(x: Tensor, epsilon: float = 1e-08) → Tensor
- Parameters:
x (torch.Tensor)
epsilon (float) – Default value: 1e-8.
- Return type:
torch.Tensor
- detoxai.methods.clarcs.hooks.mass_mean_probe_hook(probe: Tensor, alpha: float)
- Parameters:
probe (torch.Tensor)
alpha (float)
Returns:
- detoxai.methods.clarcs.hooks.add_mass_mean_probe_hook(model: Module, probe: Tensor, layer_names: list, alpha: float = 1.0) → list
Adds a probe to the output of the specified layers of a PyTorch model.
- Parameters:
model (nn.Module) – The PyTorch model to be probed.
probe (torch.Tensor) – The probe tensor to be added to the output.
layer_names (list) – List of layer names (strings) to apply the hook on.
alpha (float) – Scaling factor for the probe. Default value: 1.0.
- Returns:
A list of hook handles. Keep them to remove the hooks later if needed.
- Return type:
list
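The behaviour described for add_mass_mean_probe_hook can be illustrated with plain PyTorch forward hooks. The following is a minimal sketch, not the detoxai implementation: the helper names `make_probe_hook` and `add_probe_hooks` are hypothetical, and it assumes the hook simply adds `alpha * probe` to the layer output, as the parameter descriptions suggest.

```python
import torch
from torch import nn


def make_probe_hook(probe: torch.Tensor, alpha: float):
    # Hypothetical stand-in for mass_mean_probe_hook (assumed behaviour):
    # shift the layer output by alpha * probe. A forward hook that returns
    # a value replaces the module's output.
    def hook(module, inputs, output):
        return output + alpha * probe.to(output.device, output.dtype)

    return hook


def add_probe_hooks(model: nn.Module, probe: torch.Tensor,
                    layer_names: list, alpha: float = 1.0) -> list:
    # Look layers up by their dotted names and register one hook per layer.
    modules = dict(model.named_modules())
    handles = []
    for name in layer_names:
        handle = modules[name].register_forward_hook(make_probe_hook(probe, alpha))
        handles.append(handle)
    return handles


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
x = torch.randn(5, 4)
baseline = model(x)

# Hook the first Linear layer, which nn.Sequential names "0".
handles = add_probe_hooks(model, torch.ones(3), ["0"], alpha=0.5)
shifted = model(x)

# Removing the handles restores the original behaviour.
for handle in handles:
    handle.remove()
restored = model(x)
```

Keeping the returned handles is what makes the intervention reversible: calling `remove()` on each one detaches the hook without touching the model weights.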
- detoxai.methods.clarcs.hooks.clarc_hook(cav: Tensor, mean_length: Tensor, alpha: float)
Creates a forward hook to adjust layer activations based on the CAV.
- Parameters:
cav (torch.Tensor) – Concept Activation Vector of shape (channels,).
mean_length (torch.Tensor) – Desired mean alignment length.
alpha (float)
- Returns:
A hook function to be registered with a PyTorch module.
- Return type:
function
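To make the idea of a CAV-based forward hook concrete, here is a minimal sketch under stated assumptions. It is not the detoxai implementation: `sketch_clarc_hook` is a hypothetical name, activations are assumed to have shape (batch, channels), and the adjustment is assumed to be a projection-style shift that moves each activation's alignment with the normalised CAV toward `mean_length`, scaled by `alpha`.

```python
import torch
from torch import nn


def sketch_clarc_hook(cav: torch.Tensor, mean_length: torch.Tensor, alpha: float):
    # Assumption: the CAV is used as a unit direction.
    unit_cav = cav / cav.norm()

    def hook(module, inputs, output):
        v = unit_cav.to(output.device, output.dtype)
        target = mean_length.to(output.device, output.dtype)
        # Alignment of each sample with the CAV direction, shape (batch,).
        projection = output @ v
        # Move each sample along the CAV so its alignment approaches target.
        correction = (projection - target).unsqueeze(1) * v
        return output - alpha * correction

    return hook


torch.manual_seed(0)
layer = nn.Linear(4, 3)
x = torch.randn(5, 4)

cav = torch.tensor([1.0, 2.0, 2.0])
mean_length = torch.tensor(0.25)

handle = layer.register_forward_hook(sketch_clarc_hook(cav, mean_length, alpha=1.0))
out = layer(x)
handle.remove()

# With alpha = 1.0 every sample's alignment with the CAV equals mean_length.
aligned = out @ (cav / cav.norm())
```

In this sketch, `alpha = 0.0` leaves the activations unchanged and intermediate values interpolate between the original and the fully corrected alignment.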
- detoxai.methods.clarcs.hooks.add_clarc_hook(model: Module, cav: Tensor, mean_length: Tensor, layer_name: str, alpha: float = 1.0) → list
Applies debiasing to the specified layer of a PyTorch model using the provided CAV.
- Parameters:
model (nn.Module) – The PyTorch model to be debiased.
cav (torch.Tensor) – The Concept Activation Vector, shape (channels,).
mean_length (torch.Tensor) – Mean activation length of the unaffected activations.
layer_name (str) – Name of the layer to apply the hook on.
alpha (float) – Scaling factor for the debiasing. Default value: 1.0.
- Returns:
A list of hook handles. Keep them to remove the hooks later if needed.
- Return type:
list
detoxai.methods.clarcs.pclarc submodule
detoxai.methods.clarcs.rrclarc submodule
- class detoxai.methods.clarcs.rrclarc.RRMaskingPattern(value)
Bases: Enum
- MAX_LOGIT = 'max_logit'
- TARGET_LOGIT = 'target_logit'
- ALL_LOGITS = 'all_logits'
- ALL_LOGITS_RANDOM = 'all_logits_random'
- LOGPROBS = 'logprobs'
- class detoxai.methods.clarcs.rrclarc.RRLossType(value)
Bases: Enum
- L2 = 'l2'
- L1 = 'l1'
- COSINE = 'cosine'
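Because both classes derive from Enum and their members carry string values, a member can be obtained from its string by value lookup. The sketch below mirrors the two enums locally so that it runs without detoxai installed. The mirrored definitions copy the members listed above, and whether rr_config accepts such strings is an assumption not confirmed by this page.

```python
from enum import Enum


# Local mirrors of the documented enums, for illustration only.
class RRMaskingPattern(Enum):
    MAX_LOGIT = "max_logit"
    TARGET_LOGIT = "target_logit"
    ALL_LOGITS = "all_logits"
    ALL_LOGITS_RANDOM = "all_logits_random"
    LOGPROBS = "logprobs"


class RRLossType(Enum):
    L2 = "l2"
    L1 = "l1"
    COSINE = "cosine"


# Value lookup converts a configuration string into an enum member.
pattern = RRMaskingPattern("target_logit")
loss_type = RRLossType("cosine")

# Iterating an Enum yields its members in definition order.
valid_losses = [member.value for member in RRLossType]

# An unknown string raises ValueError, which surfaces typos early.
try:
    RRLossType("huber")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```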
- class detoxai.methods.clarcs.rrclarc.RRCLARC(model: LightningModule, experiment_name: str, device: str, rr_config: dict = {}, **kwargs)
Bases: CLARC
- apply_model_correction(cav_layers: list[str], *args, **kwargs)
- Parameters:
cav_layers (list[str])
*args
**kwargs
Returns: