Harness
- class genderbench.probing.harness.Harness(probes: list[Probe], log_dir: str = None, **kwargs)
Harness represents a predefined set of Probes that are meant to be run together to provide a comprehensive evaluation of a generator.
- Parameters:
probes (list[Probe]) – Probes in status.NEW.
log_dir (str, optional) – A logging path. If set to None, the environment variable LOG_DIR is used instead.
**kwargs – Arguments from the following list will be set for all probes: log_strategy, log_dir, calculate_cis, bootstrap_cycles, bootstrap_alpha. See Probe for more details.
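The parameter handling described above can be sketched in plain Python. This is a minimal illustration, not the genderbench implementation: `ToyProbe`, `resolve_log_dir`, and `apply_shared_settings` are hypothetical names, and silently ignoring arguments outside the documented list is an assumption.

```python
import os

# Argument names copied from the parameter list above.
SHARED_KEYS = {
    "log_strategy",
    "log_dir",
    "calculate_cis",
    "bootstrap_cycles",
    "bootstrap_alpha",
}


class ToyProbe:
    """Hypothetical stand-in for a probe; not genderbench's Probe class."""

    def __init__(self):
        self.log_dir = None
        self.calculate_cis = False


def resolve_log_dir(log_dir=None):
    # Fall back to the LOG_DIR environment variable when no path is given.
    return log_dir if log_dir is not None else os.environ.get("LOG_DIR")


def apply_shared_settings(probes, log_dir=None, **kwargs):
    # Keep only the documented shared arguments (assumption: others are ignored).
    settings = {key: value for key, value in kwargs.items() if key in SHARED_KEYS}
    settings["log_dir"] = resolve_log_dir(log_dir)
    # Set the same values on every probe.
    for probe in probes:
        for name, value in settings.items():
            setattr(probe, name, value)
    return settings
```

In this sketch an explicit `log_dir` always takes precedence over the environment variable, which mirrors the "If set to None" wording above.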
- results
Stores all the results from the probes. Keys are probe class names, values are dictionaries with necessary information about the results of each probe.
- Type:
dict[str, dict]
- uuid
UUID identifier.
- Type:
uuid.UUID
- log_results(probe_results)
Log calculated marks and metrics into a file.
- property marks
Dictionary of all the marks for individual probes.
- Returns:
dict[str, dict]
- property metrics
Dictionary of all the metrics for individual probes.
- Returns:
dict[str, dict]
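One way to picture the marks and metrics properties is as read-only views derived from results. The sketch below is a hypothetical stand-in: the class `ToyHarness` and the assumed layout of each results entry (a dictionary with "marks" and "metrics" keys) are illustrative guesses, not the documented genderbench structure.

```python
class ToyHarness:
    """Hypothetical stand-in; not genderbench's Harness."""

    def __init__(self):
        # Assumed layout: {probe class name: {"marks": {...}, "metrics": {...}}}
        self.results = {}

    @property
    def marks(self):
        # One entry per probe, keyed by probe class name.
        return {name: result["marks"] for name, result in self.results.items()}

    @property
    def metrics(self):
        # Same keys as `marks`, but holding each probe's metrics.
        return {name: result["metrics"] for name, result in self.results.items()}
```

Because both properties are computed on access, they always reflect the current contents of `results` and need no separate bookkeeping.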
run(generator: Generator) → tuple[dict[str, dict], dict[str, float]]
Iteratively run all probes and store the results in a JSONL file.
- Parameters:
generator (Generator) – Evaluated text generator.
- Returns:
A tuple containing:
Dictionary describing the calculated marks.
Dictionary with metrics and their values.
- Return type:
tuple[dict[str, dict], dict[str, float]]
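The run-and-log pattern described for run can be sketched as follows. Everything here is a hypothetical stand-in written against the documented return shape only: `ToyProbe`, `run_all`, the record fields written to the file, and the merging of per-probe metrics into one flat dictionary are assumptions, not the library's actual behavior.

```python
import json
import os
import tempfile


class ToyProbe:
    """Hypothetical probe that derives toy marks and metrics from one generation."""

    def __init__(self, name, score):
        self.name = name
        self.score = score

    def run(self, generator):
        text = generator("Hello")  # a real probe would evaluate many generations
        metrics = {
            f"{self.name}_score": self.score,
            f"{self.name}_length": float(len(text)),
        }
        marks = {f"{self.name}_score": {"mark": 0 if self.score < 0.5 else 1}}
        return marks, metrics


def run_all(probes, generator, log_path):
    """Run probes in order, append one JSON line per probe, return (marks, metrics)."""
    all_marks, all_metrics = {}, {}
    with open(log_path, "a", encoding="utf-8") as handle:
        for probe in probes:
            marks, metrics = probe.run(generator)
            all_marks[probe.name] = marks   # dict[str, dict]
            all_metrics.update(metrics)     # flat dict[str, float] (assumed)
            record = {"probe": probe.name, "marks": marks, "metrics": metrics}
            handle.write(json.dumps(record) + "\n")
    return all_marks, all_metrics


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        path = os.path.join(log_dir, "harness.jsonl")
        marks, metrics = run_all([ToyProbe("a", 0.2)], str.upper, path)
        print(marks, metrics)
```

Writing one line per probe means a partially completed run still leaves a readable log, which is the usual reason for choosing JSONL over a single JSON document.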