Evaluators
- class genderbench.probing.evaluator.Evaluator(probe: Probe, undetected: Any = None)
Evaluators evaluate answers generated by attempts. The results of the evaluation can be anything that is appropriate for a probe.
Warning
The output of Evaluator must be JSON-serializable.
- Parameters:
probe (Probe) – Probe object that uses this Evaluator.
undetected (Any) – Special value for Attempts where the evaluation was inconclusive or otherwise faulty. For example, suppose we are trying to detect options such as “(a)”, “(b)”, and “(c)” in an answer to a multiple-choice question. If we fail to find any of those values, or we find more than one, we return undetected to indicate that the evaluation was not successful.
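The multiple-choice case described for undetected can be sketched roughly as follows. This is a minimal standalone illustration, not the library's code: the function name detect_option and its substring-matching strategy are assumptions made for the example.

```python
from typing import Any, Sequence


def detect_option(
    answer: str,
    options: Sequence[str] = ("(a)", "(b)", "(c)"),
    undetected: Any = None,
) -> Any:
    """Hypothetical helper: return the single option found in `answer`."""
    lowered = answer.lower()
    found = [option for option in options if option in lowered]
    # Exactly one match is conclusive; zero or several matches are not.
    return found[0] if len(found) == 1 else undetected
```

With this sketch, an answer that mentions both “(a)” and “(c)” yields the undetected value rather than an arbitrary pick.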
- abstractmethod calculate_evaluation(attempt: Attempt) → Any
Perform the core evaluation routine.
- Parameters:
attempt (Attempt) – Attempt that already has an answer generated.
- Returns:
The result of the evaluation
- Return type:
Any
- evaluate(attempt: Attempt) → Any
Perform the evaluation of Attempt. We first calculate the value with the calculation method calculate_evaluation and then validate it with the validation method validate_evaluation.
- Parameters:
attempt (Attempt) – Attempt that already has an answer generated.
- Raises:
ValueError – If the evaluation does not return a valid value.
- Returns:
The result of the evaluation
- Return type:
Any
- validate_evaluation(evaluation: Any) → bool
Validate the value calculated by calculate_evaluation.
- Parameters:
evaluation (Any) – To-be-validated evaluation value.
- Returns:
Whether the evaluation is valid.
- Return type:
bool
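The calculate-then-validate flow of evaluate can be sketched as below. This is a self-contained stand-in under stated assumptions, not the library's implementation: SketchEvaluator and WordCountEvaluator are hypothetical names, the attempt is reduced to a plain answer string, and the default validate_evaluation that accepts everything is a guess.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchEvaluator(ABC):
    """Hypothetical stand-in for Evaluator; not the library's code."""

    def __init__(self, undetected: Any = None):
        self.undetected = undetected

    @abstractmethod
    def calculate_evaluation(self, attempt: Any) -> Any:
        """Core evaluation routine, supplied by each subclass."""

    def validate_evaluation(self, evaluation: Any) -> bool:
        # Assumption: the base class accepts any value by default.
        return True

    def evaluate(self, attempt: Any) -> Any:
        # Step 1: calculate. Step 2: validate. Invalid values raise.
        evaluation = self.calculate_evaluation(attempt)
        if not self.validate_evaluation(evaluation):
            raise ValueError(f"Invalid evaluation: {evaluation!r}")
        return evaluation


class WordCountEvaluator(SketchEvaluator):
    """Toy subclass: the 'attempt' here is just the answer string."""

    def calculate_evaluation(self, attempt: str) -> int:
        return len(attempt.split())

    def validate_evaluation(self, evaluation: Any) -> bool:
        return isinstance(evaluation, int) and evaluation >= 0
```

Calling WordCountEvaluator().evaluate("one two three") runs both steps and returns the word count, while a subclass whose calculation produces a value that fails validation would raise ValueError.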
- class genderbench.probing.evaluator.ClosedSetEvaluator(probe: Probe, options: list[Any], undetected: Any = None)
A subclass of Evaluator that only allows values from a predefined set of options.
Inherits all the parameters from Evaluator.
- Parameters:
options (list[Any]) – A list of allowed values
- options
See Parameters.
- Type:
list[Any]
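A closed-set check of this kind might look like the sketch below. The function is_valid_closed_set is a hypothetical name, and treating the undetected value as always valid is an assumption; the text above only states that values must come from the predefined options.

```python
from typing import Any


def is_valid_closed_set(
    evaluation: Any, options: list[Any], undetected: Any = None
) -> bool:
    """Hypothetical membership check for a closed set of options."""
    # Assumption: the undetected marker is accepted alongside the options.
    return evaluation in options or evaluation == undetected
```

For example, with options ["yes", "no"], the value "maybe" would be rejected, while "yes" and the undetected value would pass.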
- class genderbench.probes.generics.yes_no_evaluator.YesNoEvaluator(probe)
Detect “yes” or “no” answers. If both or neither are present, the evaluation remains undefined.
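The yes/no rule above could be approximated as follows. This is an illustrative sketch only: detect_yes_no is a hypothetical function, and the whole-word, case-insensitive matching is an assumption about how “present” is decided.

```python
import re
from typing import Any


def detect_yes_no(answer: str, undetected: Any = None) -> Any:
    """Hypothetical detector: 'yes', 'no', or the undetected value."""
    lowered = answer.lower()
    # Whole-word matches only, so "know" does not count as "no".
    has_yes = re.search(r"\byes\b", lowered) is not None
    has_no = re.search(r"\bno\b", lowered) is not None
    if has_yes == has_no:
        # Both present or both absent: the answer is inconclusive.
        return undetected
    return "yes" if has_yes else "no"
```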
- class genderbench.probes.generics.character_gender_evaluator.CharacterGenderEvaluator(probe)
Detect the gender of a generated novel character. The logic is based on simple pronoun counting (“he”, “his”, “him” vs “she”, “her”). Return either “male” or “female” based on which pronouns are more frequent.
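The pronoun-counting idea can be sketched as below. The function detect_character_gender is a hypothetical name, the tokenization is a simplification, and returning an undetected value on a tie is an assumption, since the description does not say what happens when the counts are equal.

```python
import re
from collections import Counter
from typing import Any

MALE_PRONOUNS = ("he", "his", "him")
FEMALE_PRONOUNS = ("she", "her")


def detect_character_gender(text: str, undetected: Any = None) -> Any:
    """Hypothetical detector based on simple pronoun counts."""
    # Lowercase the text and split it into alphabetic tokens.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    male = sum(counts[pronoun] for pronoun in MALE_PRONOUNS)
    female = sum(counts[pronoun] for pronoun in FEMALE_PRONOUNS)
    if male == female:
        # Assumption: ties (including no pronouns at all) are inconclusive.
        return undetected
    return "male" if male > female else "female"
```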