Probe

class genderbench.probing.probe.Probe(evaluator: Evaluator, metric_calculator: MetricCalculator, num_repetitions: int = 1, sample_k: int | None = None, calculate_cis: bool = True, bootstrap_cycles: int = 1000, bootstrap_alpha: float = 0.95, random_seed: int = 123, log_strategy: Literal['no', 'during', 'after'] = 'after', log_dir: str = None)

A Probe orchestrates the entire probing pipeline that calculates metrics and marks for a text generator. Each Probe is designed to quantify one or more harmful behaviors that such text generators might manifest.

The lifecycle of Probe consists of four main steps:

  1. Creating ProbeItems and their Attempts.

  2. Running the generator on all the created Attempts.

  3. Evaluating the generated answers with the evaluator.

  4. Calculating metrics and marks based on the evaluations.
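The four steps above can be sketched with a small stand-in class. This is not the genderbench implementation: `ToyProbe`, its dictionary-based attempts, the "yes" evaluator and the `yes_rate` metric are all hypothetical and exist only to show how each step moves the status forward.

```python
from enum import Enum, auto


class Status(Enum):
    NEW = auto()
    POPULATED = auto()
    GENERATED = auto()
    EVALUATED = auto()
    FINISHED = auto()


class ToyProbe:
    """Hypothetical stand-in that only mirrors the four-step lifecycle."""

    def __init__(self, prompts):
        self.prompts = prompts
        self.status = Status.NEW
        self.attempts = []
        self.metrics = {}

    def create_probe_items(self):
        assert self.status == Status.NEW
        # One attempt per prompt here; a real probe may create several.
        self.attempts = [
            {"prompt": p, "answer": None, "evaluation": None} for p in self.prompts
        ]
        self.status = Status.POPULATED

    def generate(self, generator):
        assert self.status == Status.POPULATED
        for attempt in self.attempts:
            attempt["answer"] = generator(attempt["prompt"])
        self.status = Status.GENERATED

    def evaluate(self):
        assert self.status == Status.GENERATED
        # Toy evaluator (assumption): does the answer contain "yes"?
        for attempt in self.attempts:
            attempt["evaluation"] = "yes" in attempt["answer"].lower()
        self.status = Status.EVALUATED

    def calculate_metrics(self):
        assert self.status == Status.EVALUATED
        evaluations = [a["evaluation"] for a in self.attempts]
        self.metrics = {"yes_rate": sum(evaluations) / len(evaluations)}
        self.status = Status.FINISHED

    def run(self, generator):
        # Chain the four steps in lifecycle order.
        self.create_probe_items()
        self.generate(generator)
        self.evaluate()
        self.calculate_metrics()
        return self.metrics


probe = ToyProbe(["Is the sky blue?", "Is grass red?"])
metrics = probe.run(lambda prompt: "Yes" if "sky" in prompt else "No")
```

Calling a step out of order trips the corresponding assertion, which is one simple way to enforce the ordering described above.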

Parameters:
  • evaluator (Evaluator) – Evaluator used to evaluate generated answers in attempts.

  • metric_calculator (MetricCalculator) – MetricCalculator used to calculate metrics from the evaluated Attempts.

  • num_repetitions (int, optional) – How many Attempts are created for each Prompt. Useful to increase the precision of measurements. Defaults to 1.

  • sample_k (Optional[int], optional) – How many ProbeItems are sampled from the full dataset. When set to None, all the samples are used. Defaults to None.

  • calculate_cis (bool, optional) – Whether to calculate confidence intervals (via bootstrapping) for metrics or use the raw values. Defaults to True.

  • bootstrap_cycles (int, optional) – How many resamplings of ProbeItems are done for confidence interval calculations. Defaults to 1000.

  • bootstrap_alpha (float, optional) – The alpha level for confidence interval calculations. Defaults to 0.95.

  • random_seed (int, optional) – Random seed used when we create ProbeItems. Defaults to 123.

  • log_strategy (Literal["after", "during", "no"], optional) –

    How often the state of the probe is logged to a file as a JSON line:

    • "after" - After the entire run lifecycle.

    • "during" - After each of the 4 steps in the run lifecycle.

    • "no" - Never.

    Defaults to "after".

  • log_dir (str, optional) – Path to the logging directory. If None, LOG_DIR environment variable is used. Defaults to None.
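To illustrate how calculate_cis, bootstrap_cycles, bootstrap_alpha and random_seed could interact, here is a minimal percentile-bootstrap sketch. It rests on assumptions: the function `bootstrap_ci` is hypothetical, the percentile method may differ from what the library actually does, and the value 0.95 is read here as the confidence level of the interval.

```python
import random
import statistics


def bootstrap_ci(values, metric=statistics.mean, cycles=1000, alpha=0.95, seed=123):
    """Percentile bootstrap: resample with replacement, keep the central `alpha` mass."""
    rng = random.Random(seed)
    # Recompute the metric on `cycles` resamples of the same size as the data.
    estimates = sorted(
        metric(rng.choices(values, k=len(values))) for _ in range(cycles)
    )
    tail = (1 - alpha) / 2
    lower = estimates[int(tail * cycles)]
    upper = estimates[min(cycles - 1, int((1 - tail) * cycles))]
    return lower, upper


# Hypothetical per-item evaluations (1 = behavior observed, 0 = not observed).
evaluations = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
low, high = bootstrap_ci(evaluations)
```

With a fixed seed the interval is reproducible across runs, and more cycles give smoother interval endpoints at the cost of extra computation.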

metrics

Calculated metrics. Available only in status.FINISHED.

Type:

dict[str, float]

marks

Calculated marks. Available only in status.FINISHED.

Type:

dict[str, dict]

status

Current status of the Probe, one of status.NEW, status.POPULATED, status.GENERATED, status.EVALUATED, status.FINISHED. Status is changed after each of the four steps.

Type:

status

uuid

UUID identifier.

Type:

uuid.UUID

probe_items

List of current ProbeItems. Available starting from status.POPULATED.

Type:

list[ProbeItem]

property attempts: Generator[Attempt, None, None]

Generator of all the attempts that belong to this Probe.

Yields:

Attempt
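A property like this can be written as a generator that flattens the attempts of every item. The sketch below uses hypothetical `ToyAttempt` and `ToyProbeItem` dataclasses, and it assumes that each item stores its attempts in a list; the real classes may be organized differently.

```python
from dataclasses import dataclass, field
from typing import Generator


@dataclass
class ToyAttempt:
    prompt: str
    repetition_id: int


@dataclass
class ToyProbeItem:
    attempts: list[ToyAttempt] = field(default_factory=list)


def iter_attempts(probe_items: list[ToyProbeItem]) -> Generator[ToyAttempt, None, None]:
    # Lazily yield every attempt of every item, in item order.
    for item in probe_items:
        yield from item.attempts


num_repetitions = 2
items = [
    ToyProbeItem([ToyAttempt(prompt, r) for r in range(num_repetitions)])
    for prompt in ["prompt A", "prompt B"]
]
all_attempts = list(iter_attempts(items))
```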

calculate_marks()

Calculate marks and prepare output mark dictionary.

Returns:

Assessment of the mark based on the corresponding metric value.

Return type:

dict[str, dict]

calculate_metrics()

Calculate metrics and marks based on the results of evaluation. This is the fourth and final step in the run lifecycle. Moves the status from EVALUATED to FINISHED.

create_probe_items()

Populate probe_items with the corresponding prepared ProbeItems. This is the first step in the run lifecycle. Moves the status from NEW to POPULATED.

evaluate()

Use evaluator to evaluate the generated texts and populate the evaluation field in all the Attempts. This is the third step in the run lifecycle. Moves the status from GENERATED to EVALUATED.

classmethod from_json_dict(json_dict)

Create a new Probe object from a JSON-serializable dictionary representation.

Parameters:

json_dict (dict) – JSON-serializable dictionary. Generated by to_json_dict.

Returns:

Restored Probe object.

Return type:

Probe

classmethod from_log_file(log_file: str) Probe

Restore a Probe object from a log file.

Parameters:

log_file (str) – Path to a log file generated by log_current_state.

Returns:

Restored Probe object.

Return type:

Probe
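The relationship between to_json_dict, from_json_dict, log_current_state and from_log_file can be sketched as a JSON Lines round trip. Everything here is illustrative: `ToyState` is a hypothetical stand-in, the log file name is made up, and restoring from the last line of the file is an assumption about how the most recent state would be selected.

```python
import json
import os
import tempfile
import uuid


class ToyState:
    """Hypothetical object with the same four serialization hooks."""

    def __init__(self, status="NEW", identifier=None):
        self.status = status
        self.uuid = identifier or uuid.uuid4()

    def to_json_dict(self):
        return {"uuid": str(self.uuid), "status": self.status}

    @classmethod
    def from_json_dict(cls, json_dict):
        return cls(status=json_dict["status"], identifier=uuid.UUID(json_dict["uuid"]))

    def log_current_state(self, log_file):
        # Append one JSON document per line.
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(json.dumps(self.to_json_dict()) + "\n")

    @classmethod
    def from_log_file(cls, log_file):
        # Assumption: the most recent state is the last non-empty line.
        with open(log_file, encoding="utf-8") as f:
            lines = [line for line in f if line.strip()]
        return cls.from_json_dict(json.loads(lines[-1]))


with tempfile.TemporaryDirectory() as log_dir:
    log_file = os.path.join(log_dir, "toy_probe.jsonl")
    state = ToyState()
    state.log_current_state(log_file)  # state logged while NEW
    state.status = "FINISHED"
    state.log_current_state(log_file)  # state logged again after finishing
    restored = ToyState.from_log_file(log_file)
```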

generate(generator: Generator)

Use text generator to generate texts based on all the Attempts from this Probe, and populate their answer field. This is the second step in the run lifecycle. Moves the status from POPULATED to GENERATED.

Parameters:

generator (Generator) – Text generator that is being probed.

log_current_state()

Log the current state of the Probe to the log file.

property log_file: str

Path to the log file.

Returns:

str

run(generator: Generator) tuple[dict[str, dict], dict[str, float]]

Run the entire probing lifecycle on the generator to probe it for harmful behavior. This is the main entry point of a Probe.

Parameters:

generator (Generator) – Evaluated text generator.

Returns:

A tuple containing:

  • Dictionary describing the calculated marks.

  • Dictionary with metrics and their values.

Return type:

tuple[dict[str, dict], dict[str, float]]

sample(k: int) list[ProbeItem]

Sample k existing ProbeItems.

Parameters:

k (int) – How many ProbeItems are sampled.

Returns:

Sampled ProbeItems.

Return type:

list[ProbeItem]
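A seeded sampling step of this kind might look like the sketch below. The helper `sample_items` is hypothetical, and both sampling without replacement and reuse of the seed 123 are assumptions made for illustration.

```python
import random


def sample_items(items, k, seed=123):
    # A dedicated Random instance keeps the sample reproducible
    # without touching the global random state.
    rng = random.Random(seed)
    return rng.sample(items, k=k)


probe_items = [f"item-{i}" for i in range(10)]
first = sample_items(probe_items, k=3)
second = sample_items(probe_items, k=3)
```

Because the seed is fixed, both calls return the same three items, which keeps sampled runs comparable across generators.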

to_json_dict() dict

Prepare a JSON-serializable dictionary representation. Used for logging.

Returns:

JSON-serializable dictionary.

Return type:

dict