detoxai.methods.clarcs

Submodules

detoxai.methods.clarcs.aclarc submodule

class detoxai.methods.clarcs.aclarc.ACLARC(model: LightningModule, experiment_name: str, device: str, **kwargs)[source]

Bases: CLARC

apply_model_correction(cav_layers: list[str], *args, **kwargs)

Parameters:

cav_layers – list[str]:
*args
**kwargs

Returns:

detoxai.methods.clarcs.clarc submodule

detoxai.methods.clarcs.clarc.require_activations_and_cav(func)[source]

Parameters:

func

Returns:

class detoxai.methods.clarcs.clarc.CLARC(model: LightningModule, experiment_name: str, device: str)[source]

Bases: ModelCorrectionMethod, ABC

extract_activations(dataloader: DataLoader, layers: list, use_cache: bool = True, save_dir: str = '/home/docs/.detoxai/activations') → None[source]

Parameters:

dataloader – torch.utils.data.DataLoader:
layers – list:
use_cache – bool: (Default value = True)
save_dir – str: (Default value = ACTIVATIONS_DIR)

Returns:

compute_cavs(cav_type: str, cav_layers: list[str]) → None[source]

Parameters:

cav_type – str:
cav_layers – list[str]:

Returns:

abstractmethod apply_model_correction(cav_layer: str) → None[source]

Parameters:

cav_layer – str:

Returns:

detoxai.methods.clarcs.hooks submodule

detoxai.methods.clarcs.hooks.stabilize(x: Tensor, epsilon: float = 1e-08) → Tensor[source]

Parameters:

x – torch.Tensor:
epsilon – float: (Default value = 1e-8)

Returns:

detoxai.methods.clarcs.hooks.mass_mean_probe_hook(probe: Tensor, alpha: float)[source]

Parameters:

probe – torch.Tensor:
alpha – float:

Returns:

detoxai.methods.clarcs.hooks.add_mass_mean_probe_hook(model: Module, probe: Tensor, layer_names: list, alpha: float = 1.0) → list[source]

Adds a probe to the specified layers of a PyTorch model.

Parameters:

model (nn.Module) – The PyTorch model to be probed.
probe (torch.Tensor) – The probe tensor to be added to the output.
layer_names (list) – List of layer names (strings) to apply the hook on.
alpha (float) – Scaling factor for the probe. (Default value = 1.0)

Returns:

A list of hook handles. Keep them to remove hooks later if needed.

Return type:

list
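
The description above can be illustrated with plain PyTorch forward hooks. This is a minimal sketch, not the detoxai implementation: the helper names `make_probe_hook` and `add_probe_hooks` are hypothetical, and it assumes the probe broadcasts against the layer output (for example a `(features,)` probe on a `(batch, features)` output).

```python
import torch
from torch import nn


def make_probe_hook(probe: torch.Tensor, alpha: float):
    """Hypothetical helper: build a forward hook that adds alpha * probe."""

    def hook(module, inputs, output):
        # A tensor returned from a forward hook replaces the layer output.
        return output + alpha * probe.to(output.device, output.dtype)

    return hook


def add_probe_hooks(model: nn.Module, probe: torch.Tensor,
                    layer_names: list, alpha: float = 1.0) -> list:
    """Hypothetical helper: register the hook on each named layer."""
    modules = dict(model.named_modules())
    return [
        modules[name].register_forward_hook(make_probe_hook(probe, alpha))
        for name in layer_names
    ]


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
x = torch.ones(1, 4)
baseline = model(x)      # full model output without hooks
before = model[0](x)     # output of layer "0" without hooks

probe = torch.tensor([1.0, -1.0, 0.5])
handles = add_probe_hooks(model, probe, ["0"], alpha=2.0)
after = model[0](x)      # same layer, now shifted by 2.0 * probe

# Removing the handles restores the original behaviour.
for handle in handles:
    handle.remove()
restored = model(x)
```

Keeping the returned handles is what makes the change reversible: once every handle is removed, the model computes the same output as before the hooks were added.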

detoxai.methods.clarcs.hooks.clarc_hook(cav: Tensor, mean_length: Tensor, alpha: float)[source]

Creates a forward hook to adjust layer activations based on the CAV.

Parameters:

cav (torch.Tensor) – Concept Activation Vector of shape (channels,).
mean_length (torch.Tensor) – Desired mean alignment length.
alpha (float) – Scaling factor for the debiasing.

Returns:

A hook function to be registered with a PyTorch module.

Return type:

function
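
One way to read "adjust layer activations based on the CAV" is as a projection: move each activation along the CAV direction until its alignment with the CAV equals `mean_length`. The sketch below works under that assumption and is not taken from the detoxai source. The name `make_cav_hook` is hypothetical, and the layer output is assumed to have shape `(batch, channels)`.

```python
import torch
from torch import nn


def make_cav_hook(cav: torch.Tensor, mean_length: torch.Tensor, alpha: float):
    """Hypothetical helper: shift activations along the unit CAV direction."""
    unit = cav / cav.norm()

    def hook(module, inputs, output):
        v = unit.to(output.device, output.dtype)
        # Signed length of each activation along the CAV, shape (batch,).
        length = output @ v
        # Move each sample so its length along v approaches mean_length.
        correction = (length - mean_length).unsqueeze(1) * v
        return output - alpha * correction

    return hook


torch.manual_seed(0)
layer = nn.Linear(4, 3)
cav = torch.tensor([3.0, 0.0, 4.0])
target = torch.tensor(0.5)

handle = layer.register_forward_hook(make_cav_hook(cav, target, alpha=1.0))
out = layer(torch.randn(5, 4))
aligned = out @ (cav / cav.norm())  # alignment of each corrected activation
handle.remove()
```

With `alpha=1.0` every corrected activation has alignment `mean_length` with the CAV. Smaller values of `alpha` apply only part of the correction.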

detoxai.methods.clarcs.hooks.add_clarc_hook(model: Module, cav: Tensor, mean_length: Tensor, layer_name: str, alpha: float = 1.0) → list[source]

Applies debiasing to the specified layer of a PyTorch model using the provided CAV.

Parameters:

model (nn.Module) – The PyTorch model to be debiased.
cav (torch.Tensor) – The Concept Activation Vector, shape (channels,).
mean_length (torch.Tensor) – Mean activation length of the unaffected activations.
layer_name (str) – Name of the layer to apply the hook on.
alpha (float) – Scaling factor for the debiasing. (Default value = 1.0)

Returns:

A list of hook handles. Keep them to remove hooks later if needed.

Return type:

list

detoxai.methods.clarcs.pclarc submodule

class detoxai.methods.clarcs.pclarc.PCLARC(model: LightningModule, experiment_name: str, device: str, **kwargs)[source]

Bases: CLARC

apply_model_correction(cav_layers: list[str], *args, **kwargs)

Parameters:

cav_layers – list[str]:
*args
**kwargs

Returns:

detoxai.methods.clarcs.rrclarc submodule

class detoxai.methods.clarcs.rrclarc.RRMaskingPattern(value)[source]

Bases: Enum

MAX_LOGIT = 'max_logit'

TARGET_LOGIT = 'target_logit'

ALL_LOGITS = 'all_logits'

ALL_LOGITS_RANDOM = 'all_logits_random'

LOGPROBS = 'logprobs'
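
The source does not describe what each masking pattern does. The sketch below shows one plausible reading, in which the pattern selects the scalar per sample that is later differentiated. The function `mask_output` and the behaviour of every branch are assumptions for illustration only and are not the detoxai implementation.

```python
import torch


def mask_output(y_hat: torch.Tensor, y: torch.Tensor, pattern: str) -> torch.Tensor:
    """Hypothetical helper: reduce logits (batch, classes) to one value per sample."""
    if pattern == "max_logit":
        return y_hat.max(dim=1).values
    if pattern == "target_logit":
        # Pick the logit of the labelled class for each sample.
        return y_hat.gather(1, y.unsqueeze(1)).squeeze(1)
    if pattern == "all_logits":
        return y_hat.sum(dim=1)
    if pattern == "all_logits_random":
        # Assumed reading: sum the logits with random +1/-1 signs.
        signs = torch.randint(0, 2, y_hat.shape, device=y_hat.device) * 2 - 1
        return (y_hat * signs).sum(dim=1)
    if pattern == "logprobs":
        # Assumed reading: log-probability of the labelled class.
        log_probs = torch.log_softmax(y_hat, dim=1)
        return log_probs.gather(1, y.unsqueeze(1)).squeeze(1)
    raise ValueError(f"unknown masking pattern: {pattern}")


y_hat = torch.tensor([[1.0, 3.0, 2.0], [0.0, -1.0, 4.0]])
y = torch.tensor([2, 0])

max_logit = mask_output(y_hat, y, "max_logit")
target_logit = mask_output(y_hat, y, "target_logit")
all_logits = mask_output(y_hat, y, "all_logits")
random_sum = mask_output(y_hat, y, "all_logits_random")
logprobs = mask_output(y_hat, y, "logprobs")
```

The string arguments match the enum values listed above, so an `RRMaskingPattern` member could be passed as `member.value`.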

class detoxai.methods.clarcs.rrclarc.RRLossType(value)[source]

Bases: Enum

L2 = 'l2'

L1 = 'l1'

COSINE = 'cosine'
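
The three loss types are not documented in the source. A common way to penalise a gradient's alignment with a concept direction is sketched below. It assumes the gradient has shape `(batch, channels)` and that the CAV is passed in explicitly. The function `alignment_loss` and its formulas are illustrative assumptions and do not reproduce the body of `rr_loss`.

```python
import torch
import torch.nn.functional as F


def alignment_loss(gradient: torch.Tensor, cav: torch.Tensor, loss_type: str) -> torch.Tensor:
    """Hypothetical helper: penalise alignment between gradients and a CAV."""
    if loss_type == "l2":
        # Mean squared dot product between each gradient and the CAV.
        return ((gradient @ cav) ** 2).mean()
    if loss_type == "l1":
        # Mean absolute dot product.
        return (gradient @ cav).abs().mean()
    if loss_type == "cosine":
        # Mean absolute cosine similarity, independent of gradient magnitude.
        expanded = cav.unsqueeze(0).expand_as(gradient)
        return F.cosine_similarity(gradient, expanded, dim=1).abs().mean()
    raise ValueError(f"unknown loss type: {loss_type}")


gradient = torch.tensor([[3.0, 4.0], [0.0, 2.0]])
cav = torch.tensor([1.0, 0.0])

l2 = alignment_loss(gradient, cav, "l2")
l1 = alignment_loss(gradient, cav, "l1")
cosine = alignment_loss(gradient, cav, "cosine")
```

In this sketch all three variants are zero when every gradient is orthogonal to the CAV. They differ in how strongly large alignments are penalised and in whether the gradient norm matters.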

class detoxai.methods.clarcs.rrclarc.RRCLARC(model: LightningModule, experiment_name: str, device: str, rr_config: dict = {}, **kwargs)[source]

Bases: CLARC

apply_model_correction(cav_layers: list[str], *args, **kwargs)

Parameters:

cav_layers – list[str]:
*args
**kwargs

Returns:

rr_clarc_hook() → Callable[source]

masked_criterion(y_hat: Tensor, y: Tensor) → Tensor[source]

Parameters:

y_hat – torch.Tensor:
y – torch.Tensor:

Returns:

rr_loss(gradient: Tensor) → Tensor[source]

Parameters:

gradient – torch.Tensor:

Returns:

modified_training_step() → Callable[source]

Module contents