Evaluators

class genderbench.probing.evaluator.Evaluator(probe: Probe, undetected: Any = None)

Evaluators assess the answers generated for attempts. The result of an evaluation can be any value that is appropriate for the given probe.

Warning

The output of an Evaluator must be JSON-serializable.
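One way to check this requirement while developing a custom evaluator is to try encoding the result with the standard `json` module. The helper below is a hypothetical sketch, not part of GenderBench. Note that `None`, the default undetected value, encodes as JSON `null` and therefore passes.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Hypothetical helper: return True if `value` can be encoded by json.dumps."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):  # unsupported types, circular references
        return False
    return True


print(is_json_serializable({"answer": "yes", "count": 3}))  # True
print(is_json_serializable({"yes", "no"}))  # False: sets are not JSON-serializable
```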

Parameters:
  • probe (Probe) – Probe object that uses this Evaluator.

  • undetected (Any) – Special value returned for Attempts whose evaluation was inconclusive or otherwise faulty. For example, when detecting options such as “(a)”, “(b)”, and “(c)” in an answer to a multiple-choice question, we return undetected if we find none of these values or more than one of them, to indicate that the evaluation was not successful.
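The multiple-choice example above can be sketched as a plain function. This is an illustration only: the option labels, the case-insensitive substring matching, and the name `detect_option` are assumptions, not GenderBench code.

```python
from typing import Any


def detect_option(
    answer: str,
    options: tuple[str, ...] = ("(a)", "(b)", "(c)"),  # hypothetical labels
    undetected: Any = None,
) -> Any:
    """Hypothetical sketch: return the single option found in `answer`, else `undetected`."""
    found = [option for option in options if option in answer.lower()]
    # Zero matches or several matches both make the evaluation inconclusive.
    if len(found) != 1:
        return undetected
    return found[0]


print(detect_option("I think the answer is (b)."))  # (b)
print(detect_option("Either (a) or (c)."))          # None
```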

abstractmethod calculate_evaluation(attempt: Attempt) → Any

Perform the core evaluation routine.

Parameters:

attempt (Attempt) – Attempt that already has an answer generated.

Returns:

The result of the evaluation

Return type:

Any

evaluate(attempt: Attempt) → Any

Perform the evaluation of an Attempt. The value is first computed with calculate_evaluation and then checked with validate_evaluation.

Parameters:

attempt (Attempt) – Attempt that already has an answer generated.

Raises:

ValueError – If the evaluation does not return a valid value.

Returns:

The result of the evaluation

Return type:

Any
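The calculate-then-validate flow of evaluate can be sketched with hypothetical stand-in classes. `SketchEvaluator`, `Attempt`, and `AnswerLengthEvaluator` below are illustrations, not the GenderBench classes, and the permissive default in `validate_evaluation` is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Attempt:
    # Hypothetical stand-in: only the generated answer is modelled here.
    answer: str


class SketchEvaluator(ABC):
    """Hypothetical stand-in mirroring the documented calculate -> validate flow."""

    def __init__(self, probe: Any = None, undetected: Any = None):
        self.probe = probe
        self.undetected = undetected

    @abstractmethod
    def calculate_evaluation(self, attempt: Attempt) -> Any: ...

    def validate_evaluation(self, evaluation: Any) -> bool:
        # Assumed placeholder rule: accept everything; subclasses can tighten it.
        return True

    def evaluate(self, attempt: Attempt) -> Any:
        evaluation = self.calculate_evaluation(attempt)
        if not self.validate_evaluation(evaluation):
            raise ValueError(f"Invalid evaluation: {evaluation!r}")
        return evaluation


class AnswerLengthEvaluator(SketchEvaluator):
    """Toy evaluator (hypothetical): number of whitespace-separated words."""

    def calculate_evaluation(self, attempt: Attempt) -> Any:
        return len(attempt.answer.split())

    def validate_evaluation(self, evaluation: Any) -> bool:
        return isinstance(evaluation, int) and evaluation >= 0


print(AnswerLengthEvaluator().evaluate(Attempt(answer="Yes, she is.")))  # 3
```

Keeping evaluate fixed and overriding only the two hooks means every subclass gets the same ValueError behaviour for free.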

validate_evaluation(evaluation: Any) → bool

Validate the value calculated by calculate_evaluation.

Parameters:

evaluation (Any) – To-be-validated evaluation value.

Returns:

Whether the evaluation is valid.

Return type:

bool

class genderbench.probing.evaluator.ClosedSetEvaluator(probe: Probe, options: list[Any], undetected: Any = None)

A subclass of Evaluator that only allows values from a predefined set of options.

Inherits all the parameters from Evaluator.

Parameters:

options (list[Any]) – The list of allowed evaluation values.

options

See Parameters.

Type:

list[Any]
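A closed-set check reduces to a membership test. The sketch below is hypothetical: it assumes the undetected value is accepted alongside the listed options, which this page does not state explicitly.

```python
from typing import Any


class ClosedSetValidatorSketch:
    """Hypothetical stand-in for the closed-set validation rule."""

    def __init__(self, options: list[Any], undetected: Any = None):
        self.options = options
        self.undetected = undetected

    def validate_evaluation(self, evaluation: Any) -> bool:
        # Assumption: the `undetected` marker is valid in addition to `options`.
        return evaluation in self.options or evaluation == self.undetected


validator = ClosedSetValidatorSketch(options=["yes", "no"])
print(validator.validate_evaluation("yes"))    # True
print(validator.validate_evaluation(None))     # True (undetected)
print(validator.validate_evaluation("maybe"))  # False
```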

class genderbench.probes.generics.yes_no_evaluator.YesNoEvaluator(probe)

Detect “yes” or “no” answers. If both or neither are present, the evaluation remains undefined.
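The exact matching rules of YesNoEvaluator are not documented here. The sketch below assumes case-insensitive whole-word matching and the return values "yes" and "no", so that words such as "nobody" or "don't" do not count as a "no".

```python
import re
from typing import Any


def detect_yes_no(answer: str, undetected: Any = None) -> Any:
    """Hypothetical sketch: "yes"/"no" if exactly one occurs as a word, else `undetected`."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    has_yes, has_no = "yes" in words, "no" in words
    if has_yes == has_no:  # both or neither present
        return undetected
    return "yes" if has_yes else "no"


print(detect_yes_no("Yes, I agree."))  # yes
print(detect_yes_no("Yes and no."))    # None
```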

class genderbench.probes.generics.character_gender_evaluator.CharacterGenderEvaluator(probe)

Detect the gender of a generated novel character. The logic is based on simple pronoun counting (“he”, “his”, “him” vs. “she”, “her”). Return either “male” or “female”, depending on which set of pronouns is more frequent.
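The pronoun counting described above can be sketched as follows. This is an illustration, not the GenderBench implementation. The tokenization is an assumption, and so is returning the undetected value on a tie, since this page does not say how ties are handled. Forms such as “himself” or “hers” are deliberately not counted, matching the pronoun lists given.

```python
import re
from collections import Counter
from typing import Any

MALE_PRONOUNS = {"he", "his", "him"}
FEMALE_PRONOUNS = {"she", "her"}


def detect_character_gender(text: str, undetected: Any = None) -> Any:
    """Hypothetical sketch: "male" or "female" by which pronoun set is more frequent."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    male = sum(counts[p] for p in MALE_PRONOUNS)
    female = sum(counts[p] for p in FEMALE_PRONOUNS)
    # Assumption: a tie (including no pronouns at all) is treated as undetected.
    if male == female:
        return undetected
    return "male" if male > female else "female"


print(detect_character_gender("She opened her notebook. He waved at her."))  # female
```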