Harness

class genderbench.probing.harness.Harness(probes: list[Probe], log_dir: str = None, **kwargs)

A Harness represents a predefined set of Probes that are meant to be run together to provide a comprehensive evaluation of a generator.

Parameters:

probes (list[Probe]) – Probes to run, each in status.NEW.
log_dir (str, optional) – Logging path. If None, the LOG_DIR environment variable is used instead.
**kwargs – Arguments from the following list will be set for all probes: log_strategy, log_dir, calculate_cis, bootstrap_cycles, bootstrap_alpha. See Probe for more details.
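
The **kwargs behaviour described above can be pictured with a minimal sketch. It uses stand-in code rather than genderbench itself: FakeProbe and apply_shared_settings are hypothetical names, the values are illustrative, and the actual mechanism inside Harness may differ.

```python
# Stand-in sketch, not genderbench code.
# The attribute names come from the **kwargs description above.
SHARED_ATTRIBUTES = (
    "log_strategy",
    "log_dir",
    "calculate_cis",
    "bootstrap_cycles",
    "bootstrap_alpha",
)


class FakeProbe:
    """Hypothetical placeholder for a Probe, carrying two shared attributes."""

    def __init__(self):
        self.calculate_cis = False
        self.bootstrap_cycles = 0


def apply_shared_settings(probes, **kwargs):
    """Set every recognised keyword argument on every probe.

    Ignoring unrecognised keys is an assumption of this sketch.
    """
    for probe in probes:
        for name in SHARED_ATTRIBUTES:
            if name in kwargs:
                setattr(probe, name, kwargs[name])


probes = [FakeProbe(), FakeProbe()]
apply_shared_settings(probes, calculate_cis=True, bootstrap_cycles=100)
```

After the call, both stand-in probes carry the same calculate_cis and bootstrap_cycles values, which is the effect the parameter description attributes to Harness.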

results

Stores all the results from the probes. Keys are probe class names, values are dictionaries with necessary information about the results of each probe.

Type: dict[str, dict]

uuid

UUID identifier.

Type: uuid.UUID

log_results(probe_results): Log calculated marks and metrics into a file.

property marks

Dictionary of all the marks for individual probes.

Returns: dict[str, dict]

property metrics

Dictionary of all the metrics for individual probes.

Returns: dict[str, dict]

run(generator: Generator) → tuple[dict[str, dict], dict[str, float]]

Run all probes one after another and store the results in a JSONL file.

Parameters:

generator (Generator) – Evaluated text generator.

Returns:

A tuple containing:

Dictionary describing the calculated marks.

Dictionary with metrics and their values.

Return type:

tuple[dict[str, dict], dict[str, float]]
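
A short sketch of how the returned tuple might be consumed. FakeHarness is a hypothetical stand-in so the snippet runs without genderbench installed; only the shape of the return value follows the documentation above, and the keys and numbers are placeholders rather than real results.

```python
# Stand-in sketch, not genderbench code.
class FakeHarness:
    """Hypothetical placeholder that mimics the documented return shape of run()."""

    def run(self, generator):
        # Placeholder contents; real keys and values depend on the probes used.
        marks = {"placeholder_metric": {"mark": 0}}  # dict[str, dict]
        metrics = {"placeholder_metric": 0.5}  # dict[str, float]
        return marks, metrics


# run() returns a 2-tuple, so it can be unpacked directly.
marks, metrics = FakeHarness().run(generator=None)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a real Harness, the generator argument would be a Generator instance rather than None.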