detoxai.methods.posthoc

Submodules

detoxai.methods.posthoc.naive_threshold submodule

class detoxai.methods.posthoc.naive_threshold.NaiveThresholdOptimizer(model: Module | LightningModule, experiment_name: str, device: str, dataloader: DetoxaiDataLoader, outputs_are_logits: bool = True, **kwargs: Any)[source]

Bases: PosthocBase

Optimizes classification threshold using forward hooks.

apply_model_correction(last_layer_name: str, threshold_range: Tuple[float, float] = (0.05, 0.95), objective_function: Callable[[float, float], float] | None = None, threshold_steps: int = 100, metric: str = 'EO_GAP', **kwargs: Any) → None[source]

Applies threshold modification hook to model.

Parameters:

last_layer_name – str
threshold_range – Tuple[float, float] (Default value = (0.05, 0.95))
objective_function – Optional[Callable[[float, float], float]] (Default value = None)
threshold_steps – int (Default value = 100)
metric – str (Default value = 'EO_GAP')
**kwargs – Any

Returns: None
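
The parameters above suggest a grid search over candidate thresholds. The following is a minimal, dependency-free sketch of such a search and not the library's implementation: it works on plain lists of probabilities instead of a model with a forward hook. The function names `tpr` and `search_threshold`, the reading of 'EO_GAP' as the absolute difference in true positive rate between two groups, and the argument order `(gap, accuracy)` of the objective function are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def tpr(preds: Sequence[int], labels: Sequence[int]) -> float:
    """True positive rate; 0.0 when there are no positive labels."""
    hits = [p for p, y in zip(preds, labels) if y == 1]
    return sum(hits) / len(hits) if hits else 0.0


def search_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[int],
    threshold_range: Tuple[float, float] = (0.05, 0.95),
    threshold_steps: int = 100,
    # Assumed signature: (fairness gap, accuracy) -> score to maximize.
    objective_function: Callable[[float, float], float] = lambda gap, acc: acc - gap,
) -> Tuple[float, float]:
    """Return (best threshold, best score) over an evenly spaced grid."""
    lo, hi = threshold_range
    best_t, best_score = lo, float("-inf")
    for i in range(threshold_steps):
        t = lo + (hi - lo) * i / max(threshold_steps - 1, 1)
        preds = [int(p >= t) for p in probs]
        acc = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
        # Assumed metric: TPR gap between sensitive groups 0 and 1.
        rates = []
        for g in (0, 1):
            idx = [k for k, s in enumerate(groups) if s == g]
            rates.append(tpr([preds[k] for k in idx], [labels[k] for k in idx]))
        gap = abs(rates[0] - rates[1])
        score = objective_function(gap, acc)
        if score > best_score:  # keep the first threshold reaching the best score
            best_t, best_score = t, score
    return best_t, best_score


probs = [0.9, 0.8, 0.3, 0.2, 0.7, 0.6, 0.4, 0.1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
best_t, best_score = search_threshold(probs, labels, groups)
```

In the actual class, the chosen threshold would presumably be applied to the output of the layer named by last_layer_name through a forward hook; that step is omitted here.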

detoxai.methods.posthoc.posthoc_base submodule

class detoxai.methods.posthoc.posthoc_base.PosthocBase(model: Module | LightningModule, experiment_name: str, device: str, **kwargs)[source]

Bases: ModelCorrectionMethod, ABC

Abstract base class for binary post-hoc debiasing methods.

abstractmethod apply_model_correction() → None[source]
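
As a rough illustration of the pattern, the sketch below shows how a concrete method could subclass an abstract base that declares apply_model_correction. `PosthocSketch` and `FixedThreshold` are hypothetical stand-ins written for this example; they are not detoxai classes and do not inherit from ModelCorrectionMethod.

```python
from abc import ABC, abstractmethod
from typing import Any


class PosthocSketch(ABC):
    """Hypothetical stand-in for PosthocBase; not the library class."""

    def __init__(self, model: Any, experiment_name: str, device: str) -> None:
        self.model = model
        self.experiment_name = experiment_name
        self.device = device

    @abstractmethod
    def apply_model_correction(self) -> None:
        """Concrete methods modify or wrap self.model here."""


class FixedThreshold(PosthocSketch):
    """Hypothetical concrete method that only records that it was applied."""

    def __init__(self, model: Any, experiment_name: str, device: str,
                 threshold: float = 0.5) -> None:
        super().__init__(model, experiment_name, device)
        self.threshold = threshold
        self.applied = False

    def apply_model_correction(self) -> None:
        # A real method would change the model's decision rule here.
        self.applied = True


method = FixedThreshold(model=None, experiment_name="demo", device="cpu")
method.apply_model_correction()
```

Because apply_model_correction is abstract, instantiating the base directly raises TypeError; only subclasses that implement it can be created.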

detoxai.methods.posthoc.reject_option_classification submodule

class detoxai.methods.posthoc.reject_option_classification.ROCModelWrapper(base_model: Module, theta: float, L_values: Dict[int, int])[source]

Bases: Module

forward(input, sensitive_features)[source]

Parameters:

input
sensitive_features

Returns:

class detoxai.methods.posthoc.reject_option_classification.RejectOptionClassification(model: Module, experiment_name: str, device: str, dataloader: DetoxaiDataLoader, theta_range: Tuple[float, float] = (0.55, 0.95), theta_steps: int = 20, metric: str = 'EO_GAP', objective_function: Callable[[float, float], float] | None = None, **kwargs: Any)[source]

Bases: PosthocBase

Implements Reject Option Classification (ROC) for fairness optimization.

This class implements a post-hoc fairness optimization method that modifies model predictions based on a confidence threshold (theta). Predictions with confidence below theta are flipped to optimize for both accuracy and fairness.
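
The decision rule described above can be sketched on plain Python lists. This is an illustration under stated assumptions, not the library's code: `roc_predict` is a hypothetical helper, confidence is taken to be max(p, 1 - p) for a positive-class probability p, and L_values is assumed to map each sensitive group value to the label assigned when confidence falls below theta.

```python
from typing import Dict, List, Sequence


def roc_predict(
    probs: Sequence[float],
    sensitive_features: Sequence[int],
    theta: float,
    L_values: Dict[int, int],
) -> List[int]:
    """Apply a reject-option rule to positive-class probabilities."""
    preds = []
    for p, group in zip(probs, sensitive_features):
        confidence = max(p, 1.0 - p)
        if confidence < theta:
            # Uncertain region: use the label assigned to this group (assumed meaning).
            preds.append(L_values[group])
        else:
            # Confident region: keep the ordinary 0.5-cutoff prediction.
            preds.append(int(p >= 0.5))
    return preds


example = roc_predict(
    probs=[0.9, 0.55, 0.45, 0.1],
    sensitive_features=[0, 0, 1, 1],
    theta=0.6,
    L_values={0: 0, 1: 1},
)
```

Here the two middle samples have confidence 0.55, below theta = 0.6, so they receive their group's assigned label instead of the model's own prediction; the two confident samples are left unchanged.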

apply_model_correction(**kwargs) → Module[source]

Returns a wrapped model that applies ROC correction during inference.

Parameters: **kwargs

Returns: Module (the wrapped model)
