Closed
Bug 1890948
Opened 10 months ago
Closed 7 months ago
LLM-based evaluator for the small models
Categories
(Core :: Machine Learning, enhancement)
Tracking
Status: RESOLVED FIXED
People
(Reporter: tarek, Assigned: tarek)
References
Details
(Whiteboard: [genai])
This will be a fully autonomous project on GitHub that will be used to evaluate the quality of small models in an automated way using an LLM.
Given a labeled dataset, the script will:
- call the small model and run its inference
- send the inputs and outputs to an LLM and ask it to return an assessment of the quality (see the sketch below)
The first version will run against an HTTP endpoint serving llamafile + LLaVA + Mistral 7B.
Example of using LLMs to evaluate SLMs: https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization
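A rough sketch of what that loop could look like. The `run_inference` helper, the judge prompt, the 1-5 scoring scale, and the endpoint URL are all assumptions for illustration, not part of this bug; it assumes the LLM is served by llamafile through an OpenAI-compatible /v1/chat/completions endpoint.

```python
# Sketch of the LLM-based evaluation loop described above. Assumptions: the
# small model is wrapped in a local run_inference() helper, and the judge LLM
# is a llamafile server exposing an OpenAI-compatible chat completions
# endpoint on localhost:8080. Prompt wording and scoring scale are illustrative.
import json
import requests

LLM_ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed llamafile server

JUDGE_PROMPT = (
    "You are grading the output of a small model. Given the input, the expected "
    "label, and the model's output, reply with a JSON object "
    "{\"score\": <integer 1-5>, \"reason\": \"...\"}."
)


def run_inference(example: dict) -> str:
    """Placeholder for the small model call (e.g. an image-captioning model)."""
    raise NotImplementedError


def judge(example: dict, output: str) -> dict:
    """Send the input and the small model's output to the LLM and parse its assessment."""
    payload = {
        "model": "mistral-7b",  # whatever model the llamafile serves
        "temperature": 0,
        "messages": [
            {"role": "system", "content": JUDGE_PROMPT},
            {
                "role": "user",
                "content": f"Input: {example['input']}\n"
                           f"Expected label: {example['label']}\n"
                           f"Model output: {output}",
            },
        ],
    }
    resp = requests.post(LLM_ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    return json.loads(resp.json()["choices"][0]["message"]["content"])


def evaluate(dataset: list[dict]) -> float:
    """Run the small model on each labeled example and average the LLM judge scores."""
    scores = []
    for example in dataset:
        output = run_inference(example)
        assessment = judge(example, output)
        scores.append(assessment["score"])
    return sum(scores) / len(scores)
```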
Assignee
Updated•10 months ago
Whiteboard: [genai]
Assignee
Comment 1•10 months ago
We should check existing projects, like https://github.com/confident-ai/deepeval or https://openpipe.ai/
Assignee
Updated•10 months ago
Type: defect → enhancement
Assignee
Comment 2•7 months ago
Done in https://github.com/mozilla/checkvite/blob/main/checkvite/accuracy.py, and the labels built for GPT are in https://github.com/mozilla/distilvit/blob/main/distilvit/gpt4.py.
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED