Open Bug 1890948 Opened 2 months ago Updated 2 months ago

LLM-based evaluator for the small models

Categories

(Core :: Machine Learning, enhancement)

People

(Reporter: tarek, Assigned: tarek)

References

Details

(Whiteboard: [genai])

This will be a fully autonomous project on GitHub, used to evaluate the quality of small models in an automated way, with an LLM acting as the evaluator.

Given a labeled dataset, the script will:

  • call the small model and run its inference on each example
  • send the inputs and outputs to an LLM and ask it to return an assessment of the quality

The first version will run against an HTTP endpoint serving llamafile + LLaVA + Mistral 7B (see the sketch below).
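A minimal sketch of what the evaluation loop could look like, assuming a JSONL dataset with `input`/`expected` fields, a placeholder `run_small_model` function for the small-model inference, and the judge (Mistral 7B) reached through llamafile's OpenAI-compatible chat completions endpoint on localhost. The dataset layout, port, model name, prompt, and scoring scale are all illustrative assumptions, not decisions from this bug:

```python
# Hypothetical sketch of the evaluation loop described above.
# Assumptions (not from the bug): JSONL dataset with "input"/"expected"
# fields, a local callable for the small model, and Mistral 7B served by
# llamafile's OpenAI-compatible endpoint on the default localhost port.

import json
import requests

LLAMAFILE_URL = "http://localhost:8080/v1/chat/completions"  # assumed llamafile default

JUDGE_PROMPT = """You are grading a small model's output.
Input: {input}
Expected (label): {expected}
Model output: {output}

Rate the output from 1 (wrong) to 5 (matches the label) and explain briefly.
Answer as JSON: {{"score": <int>, "reason": "<short explanation>"}}"""


def run_small_model(text: str) -> str:
    """Placeholder for the small-model inference call (to be implemented)."""
    raise NotImplementedError


def judge(input_text: str, expected: str, output: str) -> dict:
    """Ask the LLM to assess the small model's output against the label."""
    response = requests.post(
        LLAMAFILE_URL,
        json={
            "model": "mistral-7b",  # name as exposed by the llamafile server
            "messages": [
                {
                    "role": "user",
                    "content": JUDGE_PROMPT.format(
                        input=input_text, expected=expected, output=output
                    ),
                }
            ],
            "temperature": 0,  # deterministic grading
        },
        timeout=120,
    )
    response.raise_for_status()
    content = response.json()["choices"][0]["message"]["content"]
    # Assumes the judge returns valid JSON as instructed by the prompt.
    return json.loads(content)


def evaluate(dataset_path: str) -> list[dict]:
    """Run the small model on each labeled example and collect LLM assessments."""
    results = []
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            output = run_small_model(example["input"])
            assessment = judge(example["input"], example["expected"], output)
            results.append({**example, "output": output, **assessment})
    return results


if __name__ == "__main__":
    # Usage sketch: prints the mean judge score over the dataset.
    scores = evaluate("labeled_dataset.jsonl")
    print(sum(r["score"] for r in scores) / len(scores))
```

Temperature is pinned to 0 so repeated runs over the same dataset produce comparable scores.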

Example of using LLMs to evaluate SLMs: https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization
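The cookbook linked above scores each output on several criteria rather than a single grade; a similar rubric for our judge could look like the following (criteria names and wording are illustrative assumptions, not taken from this bug):

```python
# Illustrative multi-criteria rubric in the spirit of the linked cookbook;
# the criteria and descriptions below are assumptions, not decisions.
CRITERIA = {
    "relevance": "Does the output address the input and cover the key points?",
    "consistency": "Is the output factually consistent with the input and the label?",
    "fluency": "Is the output grammatical and readable?",
}

RUBRIC_PROMPT = """Evaluate the model output on the criterion below.

Criterion ({name}): {description}

Input: {input}
Label: {expected}
Output: {output}

Reply with a single integer score from 1 (poor) to 5 (excellent)."""


def rubric_prompts(input_text: str, expected: str, output: str) -> dict[str, str]:
    """Build one judge prompt per criterion."""
    return {
        name: RUBRIC_PROMPT.format(
            name=name, description=desc,
            input=input_text, expected=expected, output=output,
        )
        for name, desc in CRITERIA.items()
    }
```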

Whiteboard: [genai]

We should check existing projects, like https://github.com/confident-ai/deepeval or https://openpipe.ai/

Type: defect → enhancement
No longer blocks: 1883591