Harness

class genderbench.probing.harness.Harness(probes: list[Probe], log_dir: str = None, **kwargs)

A Harness is a predefined set of Probes that are meant to be run together to provide a comprehensive evaluation of a generator.

Parameters:
  • probes (list[Probe]) – Probes to run, each in status.NEW.

  • log_dir (str, optional) – Path to the logging directory. If None, the LOG_DIR environment variable is used instead.

  • **kwargs – Any of the following keyword arguments are applied to all probes: log_strategy, log_dir, calculate_cis, bootstrap_cycles, bootstrap_alpha. See Probe for details.
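The two parameter rules above (the LOG_DIR fallback and forwarding the listed keyword arguments to every probe) can be illustrated with a minimal, self-contained sketch. It does not import genderbench: StubProbe, resolve_log_dir, apply_shared_options and SHARED_PROBE_OPTIONS are hypothetical names, and storing options as plain probe attributes is an assumption about how the library applies them.

```python
import os
from typing import Optional

# Options the documentation lists as being set for all probes.
SHARED_PROBE_OPTIONS = {
    "log_strategy",
    "log_dir",
    "calculate_cis",
    "bootstrap_cycles",
    "bootstrap_alpha",
}


class StubProbe:
    """Hypothetical stand-in for a Probe that only holds attributes."""


def resolve_log_dir(log_dir: Optional[str] = None) -> Optional[str]:
    # Use the explicit path if given, otherwise fall back to $LOG_DIR (may be None).
    return log_dir if log_dir is not None else os.environ.get("LOG_DIR")


def apply_shared_options(probes: list, **kwargs) -> None:
    # Assumption: this sketch rejects keys outside the documented list.
    unknown = set(kwargs) - SHARED_PROBE_OPTIONS
    if unknown:
        raise TypeError(f"unsupported probe options: {sorted(unknown)}")
    # Copy every accepted option onto every probe.
    for probe in probes:
        for key, value in kwargs.items():
            setattr(probe, key, value)
```

Rejecting unknown keys is a design choice of this sketch that makes typos fail loudly. How the real Harness treats keys outside the list is not specified here.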

results

Stores the results of all probes. Keys are probe class names; values are dictionaries describing each probe's results.

Type:

dict[str, dict]

uuid

Universally unique identifier (UUID) of the harness.

Type:

uuid.UUID

log_results(probe_results)

Log calculated marks and metrics into a file.
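As a rough illustration of logging results to a file, the sketch below appends one JSON record per call to a JSONL file named after the harness UUID. The function name, the "<log_dir>/<uuid>.jsonl" file layout and the record keys are all hypothetical. The documentation does not specify the actual file name or format used by log_results.

```python
import json
import uuid
from pathlib import Path


def log_results(log_dir: str, run_id: uuid.UUID, probe_results: dict) -> Path:
    """Hypothetical logger: append one JSON record to <log_dir>/<run_id>.jsonl."""
    path = Path(log_dir) / f"{run_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    # UUID objects are not JSON-serializable, so store the string form.
    record = {"uuid": str(run_id), "results": probe_results}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path
```

Opening the file in append mode means repeated calls accumulate records in the same file rather than overwriting earlier ones.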

property marks

Dictionary of all the marks for individual probes.

Returns:

dict[str, dict]

property metrics

Dictionary of all the metrics for individual probes.

Returns:

dict[str, dict]
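One plausible way for the marks and metrics properties to relate to the results attribute is for them to be views computed from it. The sketch below assumes, hypothetically, that each probe's entry in results stores its marks under a "marks" key and its metrics under a "metrics" key. ResultsView is an illustrative name, not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class ResultsView:
    """Hypothetical holder mirroring Harness.results; keys are probe class names."""

    results: dict = field(default_factory=dict)

    @property
    def marks(self) -> dict:
        # Assumes each probe entry keeps its marks under "marks".
        return {name: entry.get("marks", {}) for name, entry in self.results.items()}

    @property
    def metrics(self) -> dict:
        # Assumes each probe entry keeps its metrics under "metrics".
        return {name: entry.get("metrics", {}) for name, entry in self.results.items()}
```

Because the properties are recomputed on every access, they always reflect the current contents of results.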

run(generator: Generator) → tuple[dict[str, dict], dict[str, float]]

Run all probes one after another and store the results in a JSONL file.

Parameters:

generator (Generator) – Evaluated text generator.

Returns:

A tuple containing:

  • Dictionary describing the calculated marks.

  • Dictionary with metrics and their values.

Return type:

tuple[dict[str, dict], dict[str, float]]
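The run loop described above can be sketched without the library. The following is a minimal sketch under several assumptions. The generator is modeled as a plain callable that maps a list of prompts to a list of answers. LengthProbe, ShortPromptProbe and run_harness are hypothetical. Each finished probe is written as one JSONL line. To match the documented dict[str, float] type of the second return value, per-probe metrics are flattened into "<ProbeClass>.<metric>" keys. That key scheme is an assumption, not the library's documented format.

```python
import json
from pathlib import Path
from typing import Callable, List

Generate = Callable[[List[str]], List[str]]  # hypothetical generator interface


class LengthProbe:
    """Hypothetical probe: sends one prompt and scores the answer length."""

    prompt = "Describe a typical engineer."

    def run(self, generate: Generate) -> dict:
        (answer,) = generate([self.prompt])
        return {
            "marks": {"non_empty": {"mark": int(bool(answer))}},
            "metrics": {"answer_length": float(len(answer))},
        }


class ShortPromptProbe(LengthProbe):
    prompt = "Hi"


def run_harness(probes: list, generate: Generate, log_file: Path) -> tuple:
    results = {}
    with log_file.open("a", encoding="utf-8") as f:
        for probe in probes:  # probes run one after another
            name = type(probe).__name__  # results are keyed by probe class name
            results[name] = probe.run(generate)
            # One JSONL record per finished probe.
            f.write(json.dumps({"probe": name, **results[name]}) + "\n")
    marks = {name: r["marks"] for name, r in results.items()}
    # Flatten per-probe metrics into "<ProbeClass>.<metric>" float values.
    metrics = {
        f"{name}.{key}": value
        for name, r in results.items()
        for key, value in r["metrics"].items()
    }
    return marks, metrics
```

Writing each probe's record as soon as it finishes keeps partial results on disk if a later probe fails. The trade-off is that the log may contain an incomplete run.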