detoxai.methods.posthoc

Submodules

detoxai.methods.posthoc.naive_threshold submodule

class detoxai.methods.posthoc.naive_threshold.NaiveThresholdOptimizer(model: Module | LightningModule, experiment_name: str, device: str, dataloader: DetoxaiDataLoader, outputs_are_logits: bool = True, **kwargs: Any)

Bases: PosthocBase

Optimizes classification threshold using forward hooks.
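As an illustration of what moving a classification threshold means when outputs_are_logits is True, the sketch below relies on the fact that sigmoid(z) >= t is equivalent to z >= log(t / (1 - t)). It assumes a single positive-class output per sample. The helper names (logit, sigmoid, predict_with_threshold) are hypothetical and are not part of detoxai. How the library applies the threshold inside its forward hook is not stated on this page.

```python
import math
from typing import List


def logit(p: float) -> float:
    """Inverse of the sigmoid: maps a probability in (0, 1) to a logit."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_threshold(
    output: float, threshold: float, outputs_are_logits: bool = True
) -> int:
    """Hypothetical helper: binary decision at a custom probability threshold."""
    if outputs_are_logits:
        # Compare in logit space; equivalent to sigmoid(output) >= threshold.
        return int(output >= logit(threshold))
    # Output is already a probability.
    return int(output >= threshold)


logits: List[float] = [-2.0, -0.3, 0.4, 1.5]

# Threshold 0.5 corresponds to a logit cut-off of 0.
default = [predict_with_threshold(z, 0.5) for z in logits]

# A stricter threshold of 0.7 moves the cut-off to about 0.847 in logit space.
stricter = [predict_with_threshold(z, 0.7) for z in logits]

# The same decisions, computed on probabilities instead of logits.
via_probs = [int(sigmoid(z) >= 0.7) for z in logits]
```

Raising the threshold from 0.5 to 0.7 flips the third sample (logit 0.4) from positive to negative, and the logit-space and probability-space comparisons agree.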

apply_model_correction(last_layer_name: str, threshold_range: Tuple[float, float] = (0.05, 0.95), objective_function: Callable[[float, float], float] | None = None, threshold_steps: int = 100, metric: str = 'EO_GAP', **kwargs: Any) → None

Applies threshold modification hook to model.

Parameters:
  • last_layer_name – str:

  • threshold_range – Tuple[float, float]: (Default value = (0.05, 0.95))

  • objective_function – Optional[Callable[[float, float], float]]: (Default value = None)

  • threshold_steps – int: (Default value = 100)

  • metric – str: (Default value = “EO_GAP”)

  • **kwargs – Any:

Returns: None
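The parameters above suggest a grid search over candidate thresholds. The following is a minimal pure-Python sketch of such a search under stated assumptions, not the library's implementation. It assumes that 'EO_GAP' denotes an equalized-odds gap, taken here as the larger of the TPR and FPR differences between two groups, and that objective_function receives (performance, fairness gap) and returns a score to maximize. The helpers positive_rate, eo_gap, accuracy, and search_threshold, as well as the default objective, are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def positive_rate(preds, labels, groups, group: int, label: int) -> float:
    """Share of positive predictions among samples with this group and label."""
    selected = [p for p, y, g in zip(preds, labels, groups) if g == group and y == label]
    return sum(selected) / len(selected) if selected else 0.0


def eo_gap(preds, labels, groups) -> float:
    """Assumed definition: max of |TPR gap| and |FPR gap| between groups 0 and 1."""
    tpr_gap = abs(positive_rate(preds, labels, groups, 0, 1)
                  - positive_rate(preds, labels, groups, 1, 1))
    fpr_gap = abs(positive_rate(preds, labels, groups, 0, 0)
                  - positive_rate(preds, labels, groups, 1, 0))
    return max(tpr_gap, fpr_gap)


def accuracy(preds, labels) -> float:
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def search_threshold(
    probs: List[float],
    labels: List[int],
    groups: List[int],
    threshold_range: Tuple[float, float] = (0.05, 0.95),
    threshold_steps: int = 100,
    objective_function: Optional[Callable[[float, float], float]] = None,
) -> Tuple[float, float]:
    """Hypothetical grid search: return (best threshold, best objective score)."""
    if objective_function is None:
        # Assumed default: reward accuracy, penalize the fairness gap.
        objective_function = lambda acc, gap: acc - gap
    lo, hi = threshold_range
    best_t, best_score = lo, float("-inf")
    for i in range(threshold_steps):
        # Evenly spaced candidates, both endpoints included.
        t = lo + (hi - lo) * i / max(threshold_steps - 1, 1)
        preds = [int(p >= t) for p in probs]
        score = objective_function(accuracy(preds, labels), eo_gap(preds, labels, groups))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy data: positive-class probabilities, true labels, binary sensitive attribute.
probs = [0.9, 0.8, 0.35, 0.2, 0.7, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

best_t, best_score = search_threshold(probs, labels, groups)
```

Passing a different objective_function, for example one that weights the gap more heavily than accuracy, changes which threshold wins without changing the search loop.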

detoxai.methods.posthoc.posthoc_base submodule

class detoxai.methods.posthoc.posthoc_base.PosthocBase(model: Module | LightningModule, experiment_name: str, device: str, **kwargs)

Bases: ModelCorrectionMethod, ABC

Abstract base class for binary post-hoc debiasing methods.

abstractmethod apply_model_correction() → None

detoxai.methods.posthoc.reject_option_classification submodule

class detoxai.methods.posthoc.reject_option_classification.ROCModelWrapper(base_model: Module, theta: float, L_values: Dict[int, int])

Bases: Module

forward(input, sensitive_features)
Parameters:
  • input

  • sensitive_features

Returns:

class detoxai.methods.posthoc.reject_option_classification.RejectOptionClassification(model: Module, experiment_name: str, device: str, dataloader: DetoxaiDataLoader, theta_range: Tuple[float, float] = (0.55, 0.95), theta_steps: int = 20, metric: str = 'EO_GAP', objective_function: Callable[[float, float], float] | None = None, **kwargs: Any)

Bases: PosthocBase

Implements Reject Option Classification (ROC) for fairness optimization.

This class implements a post-hoc fairness optimization method that modifies model predictions based on a confidence threshold (theta). Predictions with confidence below theta are flipped to optimize for both accuracy and fairness.

apply_model_correction(**kwargs) → Module

Returns a wrapped model that applies ROC correction during inference.

Parameters:
  • **kwargs

Returns:
  Module – the wrapped model that applies ROC correction during inference.
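The decision rule described for Reject Option Classification can be sketched in pure Python as follows. This is an illustration under assumptions, not the wrapper's actual code: it assumes a binary classifier that outputs the positive-class probability, takes confidence as max(p, 1 - p), and interprets L_values as a mapping from sensitive-group id to the label assigned inside the low-confidence region. The function roc_predict is hypothetical.

```python
from typing import Dict, List


def roc_predict(
    probs: List[float],
    sensitive: List[int],
    theta: float,
    L_values: Dict[int, int],
) -> List[int]:
    """Hypothetical ROC rule: override low-confidence predictions per group."""
    preds = []
    for p, s in zip(probs, sensitive):
        confidence = max(p, 1.0 - p)
        if confidence < theta:
            # Inside the reject region: use the label assigned to this group
            # (assumed meaning of L_values).
            preds.append(L_values[s])
        else:
            # Confident prediction: keep the usual 0.5 decision.
            preds.append(int(p >= 0.5))
    return preds


probs = [0.95, 0.6, 0.45, 0.1, 0.55]
sensitive = [0, 1, 0, 1, 1]

# Group 0 receives label 1 and group 1 receives label 0 in the reject region.
preds = roc_predict(probs, sensitive, theta=0.7, L_values={0: 1, 1: 0})
```

With theta = 0.7, the samples with probabilities 0.6, 0.45, and 0.55 fall inside the reject region and take their group's assigned label, while the confident samples (0.95 and 0.1) keep their original decisions. A larger theta widens the reject region, so more predictions are overridden.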

Module contents