detoxai.core

Submodules

detoxai.core.evaluation submodule

detoxai.core.evaluation.evaluate_model(model: Module, dataloader: DataLoader, pareto_metrics: list[str] | None = None, device: str | None = None) → dict[source]

Evaluate the model on various metrics

Parameters:

model – Model to evaluate
dataloader – DataLoader for the dataset
pareto_metrics – List of metrics to include in the pareto front
device – Device to use for evaluation (“cpu” or “cuda”)

Returns:

detoxai.core.interface submodule

detoxai.core.interface.parse_methods_config(methods_config: dict) → dict[source]

Compare the passed configuration with the defaults and overwrite the default values with those that were passed

Parameters: methods_config (dict)

Returns:
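The override behaviour described above can be sketched as a per-method dictionary merge. This is a minimal illustration under stated assumptions, not the library's implementation: `DEFAULT_METHODS_CONFIG`, its keys, and `merge_methods_config` are hypothetical names.

```python
import copy

# Hypothetical defaults; the real default configuration lives inside detoxai.
DEFAULT_METHODS_CONFIG = {
    "method_a": {"alpha": 1.0, "layers": "last"},
    "method_b": {"steps": 10},
}


def merge_methods_config(user_config: dict, defaults: dict = DEFAULT_METHODS_CONFIG) -> dict:
    """Return a copy of `defaults` in which user-supplied values take precedence."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for method, overrides in user_config.items():
        # Known methods are updated key by key; unknown ones are added as-is.
        merged.setdefault(method, {}).update(overrides)
    return merged


merged_config = merge_methods_config({"method_a": {"alpha": 2.0}, "method_c": {"k": 3}})
```

Only the keys that were passed change; every other default is kept.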

detoxai.core.interface.debias(model: Module, dataloader: DetoxaiDataLoader | DataLoader, methods: list[str] | str = 'all', metrics: list[str] | str = 'all', methods_config: dict = {}, pareto_metrics: list[str] = ['balanced_accuracy', 'equalized_odds'], return_type: str = 'all', device: str = 'cpu', include_vanila_in_results: bool = True, test_dataloader: DetoxaiDataLoader | DataLoader = None, num_of_classes: int | None = None) → CorrectionResult | dict[str, CorrectionResult][source]

Run a suite of correction methods on the model and return the results

Parameters:

model – Model to run the correction methods on
dataloader – DetoxaiDataLoader object with the dataset
harmful_concept – Concept to debias, i.e. the protected attribute (not supported yet)
methods – List of correction methods to run
metrics – List of metrics to include in the configuration
methods_config – Configuration for each correction method
pareto_metrics – List of metrics to use for the pareto front and selection of best method
return_type (optional) – Type of results to return. Options are:
- “pareto-front”: Return the CorrectionResult objects only for results on the pareto front
- “all”: Return the results for all correction methods
- “best”: Return the result for the best correction method, chosen with the ideal point method from the pareto front
device (optional) – Device to run the correction methods on
include_vanila_in_results (optional) – Include the vanilla model in the results
test_dataloader (optional) – DataLoader for the test dataset. If not provided, the original dataloader is used
num_of_classes (optional) – Number of classes in the dataset. Default is None, which means the number of classes will be inferred from the dataloader

detoxai.core.interface.run_correction(method: str, method_kwargs: dict, pareto_metrics: list[str] | None = None) → CorrectionResult[source]

Run the specified correction method

Parameters:

method (str) – Correction method to run
method_kwargs (dict) – Arguments for the correction method
pareto_metrics (list[str] | None) – (Default value = None)

Returns:

detoxai.core.interface.get_supported_methods() → list[str][source]

Get a list of supported methods

Returns: List of supported methods
Return type: list[str]

detoxai.core.interface_helpers submodule

detoxai.core.interface_helpers.load_supported_tags() → dict[source]

Load the dictionaries from ./datasets/catalog/<dataset_name>/labels_mapping.yaml

detoxai.core.interface_helpers.construct_metrics_config(metrics: list[str] | str = 'all', types: str = 'GAP') → dict[source]

Construct the metrics configuration for the fairness and performance metrics

Parameters:

metrics (list[str] | str) – List of metrics to include in the configuration (Default value = “all”)
types (str) – Type of metric to use. Options are “GAP” or “RATIO” (Default value = “GAP”)

Returns:

detoxai.core.interface_helpers.resolve_layer(model, layer) → Module | None[source]

Resolve a layer name to a layer in the model

Parameters:

model
layer

Returns:

detoxai.core.interface_helpers.infer_layers(corrector, layers: list[str] | str) → list[str][source]

Infer the layers to use for the correction method

The following wildcards are available:
- ‘last’: Use the last layer
- ‘penultimate’: Use the penultimate layer

Otherwise, a list of actual layer names can be passed.

Parameters:

corrector – Correction method object
layers – Layer specification

Returns:
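The wildcard handling can be sketched as follows. This is an illustration only: the real function takes a corrector object, whereas the hypothetical `infer_layer_names` below assumes the model's layer names are already available as an ordered list.

```python
from typing import Union


def infer_layer_names(available: list[str], layers: Union[list[str], str]) -> list[str]:
    """Expand 'last' and 'penultimate' against an ordered list of layer names."""
    wildcards = {"last": -1, "penultimate": -2}
    requested = [layers] if isinstance(layers, str) else list(layers)
    resolved = []
    for name in requested:
        if name in wildcards:
            # Wildcards index from the end of the ordered layer list.
            resolved.append(available[wildcards[name]])
        elif name in available:
            resolved.append(name)
        else:
            raise ValueError(f"Unknown layer: {name!r}")
    return resolved


layer_names = ["conv1", "conv2", "fc"]  # hypothetical layer names
last_layer = infer_layer_names(layer_names, "last")
mixed = infer_layer_names(layer_names, ["penultimate", "conv1"])
```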

detoxai.core.mcda_helpers submodule

detoxai.core.mcda_helpers.is_pareto_efficient(costs: ndarray, return_mask: bool = True) → ndarray[source]

Find the pareto-efficient points

Parameters:

costs (np.ndarray) – An (n_points, n_costs) array
return_mask (bool) – True to return a mask (Default value = True)

Returns:

An array identifying the pareto-efficient points. If return_mask is True, this will be an (n_points, ) boolean array. Otherwise it will be an (n_efficient_points, ) integer array of indices.

Credit: https://stackoverflow.com/questions/32791911/fast-calculation-of-pareto-front-in-python
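A straightforward way to compute such a mask is sketched below. It is a simple quadratic-time illustration, not necessarily the implementation used by the library, and it assumes that lower cost is better; `pareto_mask` is a hypothetical name.

```python
import numpy as np


def pareto_mask(costs: np.ndarray) -> np.ndarray:
    """Boolean mask of pareto-efficient rows, assuming lower cost is better."""
    n_points = costs.shape[0]
    efficient = np.ones(n_points, dtype=bool)
    for i in range(n_points):
        # Point i is dominated if some point is no worse in every cost
        # and strictly better in at least one.
        no_worse = np.all(costs <= costs[i], axis=1)
        better = np.any(costs < costs[i], axis=1)
        if np.any(no_worse & better):
            efficient[i] = False
    return efficient


costs = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = pareto_mask(costs)       # boolean form, one entry per point
indices = np.flatnonzero(mask)  # integer form, one entry per efficient point
```

Here the point (3, 3) is dominated by (2, 2), so it is the only one excluded from the front.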

detoxai.core.mcda_helpers.filter_pareto_front(results: dict[str, CorrectionResult]) → dict[str, CorrectionResult][source]

Filter the results to only include those on the pareto front

Parameters:

results (dict[str, CorrectionResult]) – CorrectionResult objects to filter

Returns:

detoxai.core.mcda_helpers.select_best_method(results: dict[str, CorrectionResult]) → CorrectionResult[source]

Select the best correction method from the results using the ideal point method

Parameters:

results (dict[str, CorrectionResult]) – CorrectionResult objects to choose from

Returns:
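The ideal point method picks the candidate closest to a hypothetical point that is best in every metric. The sketch below illustrates the idea on plain lists of metric values rather than CorrectionResult objects. It assumes that every metric is "higher is better", that the ideal point is the per-metric maximum over the candidates, and that distance is Euclidean; the library may normalise or weight metrics differently.

```python
import numpy as np


def select_by_ideal_point(scores: dict[str, list[float]]) -> str:
    """Return the key whose metric vector is closest to the ideal point."""
    names = list(scores)
    matrix = np.array([scores[name] for name in names], dtype=float)
    ideal = matrix.max(axis=0)  # best observed value per metric
    distances = np.linalg.norm(matrix - ideal, axis=1)
    return names[int(np.argmin(distances))]


# Hypothetical (balanced_accuracy, fairness) scores for three methods.
candidates = {"a": [0.9, 0.5], "b": [0.8, 0.8], "c": [0.6, 0.9]}
best = select_by_ideal_point(candidates)
```

In this example the ideal point is (0.9, 0.9), and "b" is selected because it is the most balanced candidate.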

detoxai.core.model_wrappers submodule

class detoxai.core.model_wrappers.BaseLightningWrapper(model: ~torch.nn.modules.module.Module, criterion: ~torch.nn.modules.module.Module | None = CrossEntropyLoss(), performance_metrics: ~torchmetrics.collections.MetricCollection | None = None, learning_rate: float | None = 0.001, optimizer: ~torch.optim.optimizer.Optimizer | None = <class 'torch.optim.adam.Adam'>)[source]

Bases: LightningModule

training_step(batch, batch_idx)[source]

Parameters:

batch
batch_idx

Returns:

on_train_batch_end(outputs, batch, batch_idx)[source]

Parameters:

outputs
batch
batch_idx

Returns:

on_train_epoch_end()[source]

test_step(batch, batch_idx)[source]

Parameters:

batch
batch_idx

Returns:

on_test_batch_end(outputs, batch, batch_idx)[source]

Parameters:

outputs
batch
batch_idx

Returns:

on_test_epoch_end()[source]

configure_optimizers()[source]

forward(x)[source]

Parameters: x

Returns:

predict_step(batch)[source]

Parameters: batch

Returns:

class detoxai.core.model_wrappers.FairnessLightningWrapper(model: ~torch.nn.modules.module.Module, criterion: ~torch.nn.modules.module.Module | None = CrossEntropyLoss(), performance_metrics: ~torchmetrics.collections.MetricCollection | None = None, fairness_metrics: ~torchmetrics.collections.MetricCollection | None = None, learning_rate: float | None = 0.001, optimizer: ~torch.optim.optimizer.Optimizer | None = <class 'torch.optim.adam.Adam'>)[source]

Bases: BaseLightningWrapper

training_step(batch, batch_idx)[source]

Parameters:

batch
batch_idx

Returns:

on_train_batch_end(outputs, batch, batch_idx)[source]

Parameters:

outputs
batch
batch_idx

Returns:

on_train_epoch_end()[source]

test_step(batch, batch_idx)[source]

Parameters:

batch
batch_idx

Returns:

on_test_batch_end(outputs, batch, batch_idx)[source]

Parameters:

outputs
batch
batch_idx

Returns:

on_test_epoch_end()[source]

predict_step(batch, batch_idx, dataloader_idx=None)[source]

Parameters:

batch
batch_idx
dataloader_idx – (Default value = None)

Returns:

detoxai.core.results_class submodule

class detoxai.core.results_class.CorrectionResult(method: str, model: BaseLightningWrapper, metrics: dict)[source]

Bases: object

get_all_metrics() → dict[source]

get_metric(metric: str) → float[source]

Parameters: metric (str)

Returns:

get_model() → BaseLightningWrapper[source]

get_method() → str[source]

detoxai.core.xai submodule

class detoxai.core.xai.XAIMetricsCalculator(dataloader: DetoxaiDataLoader, lrphandler: LRPHandler)[source]

Bases: object

calculate_metrics(model: Module, rect_pos: tuple[int, int], rect_size: tuple[int, int], vanilla_model: Module = None, sailmap_metrics: list[str] = ['RRF', 'HRF', 'MRR', 'DET', 'ADR', 'DIF', 'RDDT'], batches: int = 2, condition_on: str = 'proper_label', verbose: bool = False, neutral_point: float = 0.5, abs_on_neutral: bool = True) → dict[str, float][source]

Calculate the metrics for the given model and sailmaps

Parameters:

model (nn.Module)
rect_pos (tuple[int, int])
rect_size (tuple[int, int])
vanilla_model (nn.Module) – (Default value = None)
sailmap_metrics (list[str])
batches (int) – (Default value = 2)
condition_on (str) – (Default value = ConditionOn.PROPER_LABEL.value)
verbose (bool) – (Default value = False)
source_range (#) – tuple[float, float]: (Default value = (0))
neutral_point (float) – (Default value = 0.5)
abs_on_neutral (bool) – (Default value = True)

Returns:

The calculated metrics where the key is the metric name and the value is the calculated metric

Return type:

dict[str, float]

class detoxai.core.xai.SailRectMetric[source]

Bases: ABC

calculate_batch(sailmaps: ndarray, rect_pos: tuple[int, int], rect_size: tuple[int, int], ret_format: tuple[str] = ('mean', 'std')) → dict[str, float][source]

Calculate the metric for a single batch of sailmaps

Parameters:

sailmaps (np.ndarray)
rect_pos (tuple[int, int])
rect_size (tuple[int, int])
ret_format (tuple[str]) – (Default value = ("mean", "std"))

Returns:

reduce(ret_format: tuple[str] = ('mean', 'std')) → dict[str, float][source]

Calculate the metric for already aggregated sailmaps

Parameters:

ret_format (tuple[str]) – (Default value = ("mean", "std"))

Returns:

aggregate(sailmaps: ndarray, rect_pos: tuple[int, int], rect_size: tuple[int, int], vanilla_sailmaps: ndarray = None)[source]

Aggregate sailmaps for later calculation

Parameters:

sailmaps (np.ndarray)
rect_pos (tuple[int, int])
rect_size (tuple[int, int])
vanilla_sailmaps (np.ndarray) – (Default value = None)

Returns:

structure_output(per_sample: ndarray[float], ret_format: tuple[str] = ('mean', 'std')) → dict[str, float][source]

Parameters:

per_sample (np.ndarray[float])
ret_format (tuple[str]) – (Default value = ("mean", "std"))

Returns:

class detoxai.core.xai.RRF(**kwargs)[source]

Bases: SailRectMetric

Rectangle Relevance Fraction

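\begin{equation}
\mathbf{RRF} = \frac{\displaystyle \sum_{(i,j) \in R} p_{ij}}{\displaystyle \sum_{i = 1}^N \sum_{j = 1}^M p_{ij}}
\end{equation}

Here, $p_{ij}$ is the relevance at pixel $(i,j)$ of an $N \times M$ saliency map and $R$ is the set of pixels inside the region of interest (ROI). $\mathbf{RRF}$ measures the fraction of total relevance that falls within the ROI.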

Args:

Returns:
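The RRF formula can be sketched in NumPy for a single saliency map. This is an illustration, not the class's implementation. It assumes that `rect_pos` is the (row, col) of the rectangle's top-left corner and `rect_size` is (height, width); the library's convention may differ.

```python
import numpy as np


def rrf(sailmap: np.ndarray, rect_pos: tuple[int, int], rect_size: tuple[int, int]) -> float:
    """Fraction of total relevance that falls inside the rectangle."""
    # Assumption: rect_pos = (row, col) of the top-left corner,
    # rect_size = (height, width).
    row, col = rect_pos
    height, width = rect_size
    inside = sailmap[row:row + height, col:col + width].sum()
    return float(inside / sailmap.sum())


sailmap = np.zeros((4, 4))
sailmap[0:2, 0:2] = 1.0  # 4 units of relevance inside the ROI
sailmap[3, 3] = 4.0      # 4 units of relevance outside the ROI
rrf_value = rrf(sailmap, rect_pos=(0, 0), rect_size=(2, 2))  # 4 / 8
```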

class detoxai.core.xai.HRF(epsilon: float = 0.05, **kwargs)[source]

Bases: SailRectMetric

High-Relevance Fraction (HRF)

\begin{equation}
\mathbf{HRF} = \displaystyle \frac{1}{\vert R \vert} \sum_{(i,j) \in R} \mathbbm{1}_{\{p_{ij} > \epsilon\}}
\end{equation}

$\mathbf{HRF}$ quantifies the proportion of pixels inside the ROI whose relevance exceeds a predefined threshold $\epsilon$, indicating how many pixels are highly important for prediction.

Args:

Returns:
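A minimal NumPy sketch of the HRF formula for one saliency map follows. It assumes `rect_pos` is the (row, col) of the top-left corner and `rect_size` is (height, width), which may not match the library's convention.

```python
import numpy as np


def hrf(sailmap: np.ndarray, rect_pos: tuple[int, int], rect_size: tuple[int, int],
        epsilon: float = 0.05) -> float:
    """Proportion of ROI pixels whose relevance exceeds epsilon."""
    # Assumption: rect_pos = (row, col), rect_size = (height, width).
    row, col = rect_pos
    height, width = rect_size
    roi = sailmap[row:row + height, col:col + width]
    return float(np.mean(roi > epsilon))


sailmap = np.zeros((4, 4))
sailmap[0, 0] = 0.9
sailmap[0, 1] = 0.01
sailmap[1, 0] = 0.2
hrf_value = hrf(sailmap, (0, 0), (2, 2))  # 2 of the 4 ROI pixels exceed 0.05
```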

class detoxai.core.xai.MRR(**kwargs)[source]

Bases: SailRectMetric

Mean Relevance Ratio (MRR)

\begin{equation}
\mathbf{MRR} = \frac{\displaystyle \frac{1}{\vert R \vert} \sum_{(i,j) \in R} p_{ij}}{\displaystyle \frac{1}{N M - \vert R \vert} \sum_{(i,j) \notin R} p_{ij}}
\end{equation}

$\mathbf{MRR}$ quantifies the ratio of the mean pixel value inside the ROI to the mean pixel value outside it. $\mathbf{MRR} = 1$ indicates that the mean values are equal, while $\mathbf{MRR} > 1$ indicates that the mean pixel intensity is higher inside the ROI.

Args:

Returns:
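The MRR formula can be sketched with a boolean ROI mask, so that the "outside" mean covers exactly the $NM - \vert R \vert$ remaining pixels. The rectangle convention (top-left (row, col) and (height, width)) is an assumption.

```python
import numpy as np


def mrr(sailmap: np.ndarray, rect_pos: tuple[int, int], rect_size: tuple[int, int]) -> float:
    """Mean relevance inside the ROI divided by mean relevance outside it."""
    # Assumption: rect_pos = (row, col), rect_size = (height, width).
    row, col = rect_pos
    height, width = rect_size
    mask = np.zeros(sailmap.shape, dtype=bool)
    mask[row:row + height, col:col + width] = True
    return float(sailmap[mask].mean() / sailmap[~mask].mean())


sailmap = np.full((4, 4), 0.2)
sailmap[0:2, 0:2] = 0.8
mrr_value = mrr(sailmap, (0, 0), (2, 2))  # 0.8 / 0.2
```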

class detoxai.core.xai.DET(**kwargs)[source]

Bases: SailRectMetric

Distribution Equivalence Testing (DET)

The goal of the statistical test is to determine whether the pixels \textit{inside} the rectangle have higher intensity than those \textit{outside} the rectangle. Since the number of pixels and their intensity distributions inside and outside the ROI can vary, a non-parametric, unpaired statistical Mann-Whitney-Wilcoxon test is used. This permutation test assesses whether the intensity values from one group (inside) tend to be higher than those from the other (outside).

The null hypothesis $H_0$ for the test is that the intensity distributions inside and outside the rectangle are equal:

\begin{equation}
\begin{split}
H_0: F_{\text{inside}}(x) &= F_{\text{outside}}(x) \\
H_1: F_{\text{inside}}(x) &> F_{\text{outside}}(x)
\end{split}
\end{equation}

To perform the test, all pixel intensities are ranked, and the sum of ranks for each group (inside and outside the ROI) is computed. The test then evaluates the probability that the intensity values inside the rectangle are statistically higher than those outside. The final outcome of the DET is a binary decision: \textbf{TRUE} indicates that the null hypothesis is rejected (i.e., there is statistically significant evidence that the pixels inside the rectangle have higher intensity), while \textbf{FALSE} signifies that we fail to reject the null hypothesis, meaning that the evidence is inconclusive regarding a higher intensity inside the rectangle.

Args:

Returns:
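The decision rule can be sketched with SciPy's one-sided Mann-Whitney U test. This is an illustration for a single saliency map, not the class's implementation. The significance level `alpha = 0.05` and the rectangle convention (top-left (row, col) and (height, width)) are assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def det(sailmap: np.ndarray, rect_pos: tuple[int, int], rect_size: tuple[int, int],
        alpha: float = 0.05) -> bool:
    """True if pixels inside the ROI are significantly more intense than outside."""
    # Assumption: rect_pos = (row, col), rect_size = (height, width).
    row, col = rect_pos
    height, width = rect_size
    mask = np.zeros(sailmap.shape, dtype=bool)
    mask[row:row + height, col:col + width] = True
    # One-sided, unpaired test: H1 is "inside tends to be greater than outside".
    result = mannwhitneyu(sailmap[mask], sailmap[~mask], alternative="greater")
    return bool(result.pvalue < alpha)


rng = np.random.default_rng(0)
sailmap = rng.uniform(0.0, 0.1, size=(8, 8))
sailmap[2:6, 2:6] += 0.5  # make the ROI clearly more intense
rejected = det(sailmap, (2, 2), (4, 4))
rejected_inverted = det(1.0 - sailmap, (2, 2), (4, 4))
```

With the ROI clearly brighter than its surroundings the null hypothesis is rejected; on the inverted map it is not.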

reduce(ret_format: tuple[str] = ('mean', 'std')) → dict[str, float][source]

Calculate the metric for already aggregated sailmaps

Parameters:

ret_format (tuple[str]) – (Default value = ("mean", "std"))

Returns:

class detoxai.core.xai.ADR(**kwargs)[source]

Bases: SailRectMetric

Average Difference in Region (ADR)

ADR measures the mean pixel-wise difference between vanilla and debiased saliency maps within the region of interest (ROI). A positive value indicates that vanilla saliency values are generally higher than debiased ones in the region.

Args:

Returns:
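As a minimal sketch, ADR for one pair of saliency maps is the mean of the vanilla-minus-debiased difference inside the ROI. The rectangle convention (top-left (row, col) and (height, width)) is an assumption.

```python
import numpy as np


def adr(vanilla: np.ndarray, debiased: np.ndarray,
        rect_pos: tuple[int, int], rect_size: tuple[int, int]) -> float:
    """Mean pixel-wise difference (vanilla - debiased) inside the ROI."""
    # Assumption: rect_pos = (row, col), rect_size = (height, width).
    row, col = rect_pos
    height, width = rect_size
    roi = (slice(row, row + height), slice(col, col + width))
    return float((vanilla[roi] - debiased[roi]).mean())


vanilla = np.full((4, 4), 0.6)
debiased = np.full((4, 4), 0.4)
adr_value = adr(vanilla, debiased, (0, 0), (2, 2))  # positive: vanilla is higher
```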

class detoxai.core.xai.DIF(eps: float = 0.001, **kwargs)[source]

Bases: SailRectMetric

Decreased Intensity Fraction (DIF)

DIF measures the ratio of pixels showing decreased intensity in the debiased model compared to the vanilla model. It represents the fraction of pixels inside a rectangle that significantly flipped their saliency value.

Args:

Returns:
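One plausible reading of DIF is sketched below: the fraction of ROI pixels whose saliency dropped by more than `eps` from the vanilla to the debiased model. The exact thresholding used by the class is not documented here, so both this interpretation and the rectangle convention (top-left (row, col) and (height, width)) are assumptions.

```python
import numpy as np


def dif(vanilla: np.ndarray, debiased: np.ndarray,
        rect_pos: tuple[int, int], rect_size: tuple[int, int], eps: float = 0.001) -> float:
    """Fraction of ROI pixels whose intensity decreased by more than eps."""
    # Assumption: rect_pos = (row, col), rect_size = (height, width).
    row, col = rect_pos
    height, width = rect_size
    roi = (slice(row, row + height), slice(col, col + width))
    decrease = vanilla[roi] - debiased[roi]
    return float(np.mean(decrease > eps))


vanilla = np.full((4, 4), 0.5)
debiased = vanilla.copy()
debiased[0, 0] = 0.1  # decreased
debiased[1, 1] = 0.6  # increased, so not counted
dif_value = dif(vanilla, debiased, (0, 0), (2, 2))  # 1 of 4 ROI pixels decreased
```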

class detoxai.core.xai.RDDT(**kwargs)[source]

Bases: SailRectMetric

Rectangle Difference Distribution Testing (RDDT)

Performs a Wilcoxon signed rank test to determine if pixels from the vanilla model have significantly higher intensity than those from the debiased model within the ROI. Returns 1 if the test rejects the null hypothesis (indicating vanilla has higher intensity), 0 otherwise.

Args:

Returns:
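The test can be sketched with SciPy's paired, one-sided Wilcoxon signed-rank test on the ROI pixels of one vanilla and one debiased map. The significance level `alpha = 0.05` and the rectangle convention (top-left (row, col) and (height, width)) are assumptions, and the class may aggregate across samples differently.

```python
import numpy as np
from scipy.stats import wilcoxon


def rddt(vanilla: np.ndarray, debiased: np.ndarray,
         rect_pos: tuple[int, int], rect_size: tuple[int, int], alpha: float = 0.05) -> int:
    """1 if vanilla ROI pixels are significantly more intense than debiased ones, else 0."""
    # Assumption: rect_pos = (row, col), rect_size = (height, width).
    row, col = rect_pos
    height, width = rect_size
    roi = (slice(row, row + height), slice(col, col + width))
    # Paired test on matching pixels: H1 is "vanilla - debiased tends to be positive".
    result = wilcoxon(vanilla[roi].ravel(), debiased[roi].ravel(), alternative="greater")
    return int(result.pvalue < alpha)


rng = np.random.default_rng(0)
vanilla = rng.uniform(0.5, 1.0, size=(8, 8))
debiased = vanilla - rng.uniform(0.1, 0.3, size=(8, 8))  # every pixel decreased
decision = rddt(vanilla, debiased, (2, 2), (4, 4))
decision_swapped = rddt(debiased, vanilla, (2, 2), (4, 4))
```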

reduce(ret_format: tuple[str] = ('mean', 'std')) → dict[str, float][source]

Parameters:

ret_format (tuple[str]) – (Default value = ("mean", "std"))

Returns:
