Bug 1890948 (Open)
Opened 2 months ago
Updated 2 months ago
LLM-based evaluator for the small models
Categories: Core :: Machine Learning, enhancement
Status: NEW
People: Reporter: tarek; Assigned: tarek
Whiteboard: [genai]
This will be a fully autonomous project on GitHub, used to evaluate the quality of small models in an automated way with an LLM.
Given a labeled dataset, the script will:
- call the small model and run its inference
- send the inputs and outputs to an LLM and ask it to return an assessment of the quality (see the sketch below)
The first version will run against an HTTP endpoint serving llamafile + LLaVA + Mistral 7B.

Example of using LLMs to evaluate SLMs: https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization
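
A minimal sketch of the evaluation loop, assuming an OpenAI-compatible chat completions endpoint exposed by llamafile on localhost; the URL, dataset shape, and judge prompt here are illustrative assumptions, not decisions recorded in this bug:

import json
import urllib.request

# Hypothetical llamafile endpoint (OpenAI-compatible chat API); adjust as needed.
LLM_ENDPOINT = "http://localhost:8080/v1/chat/completions"

JUDGE_PROMPT = (
    "You are grading a small model. Given the input, the expected label, and "
    'the model output, reply with JSON: {"score": <1-5>, "reason": "<one sentence>"}.'
)

def ask_llm(messages):
    # POST a chat request to the endpoint and return the assistant's reply text.
    payload = json.dumps({"messages": messages, "temperature": 0}).encode()
    request = urllib.request.Request(
        LLM_ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

def evaluate(small_model, dataset):
    # Run the small model on each labeled example, then ask the LLM to grade
    # the (input, expected label, output) triple; return the mean score.
    scores = []
    for example in dataset:  # each example: {"input": ..., "label": ...}
        output = small_model(example["input"])
        verdict = ask_llm([
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": json.dumps({
                "input": example["input"],
                "expected": example["label"],
                "output": output,
            })},
        ])
        scores.append(json.loads(verdict)["score"])
    return sum(scores) / len(scores)

A real version would also need retries and tolerant parsing of the judge's reply, since the model will not always emit valid JSON.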
Comment 1 (Assignee) • 2 months ago
We should check existing projects, like https://github.com/confident-ai/deepeval or https://openpipe.ai/
Updated • 2 months ago
Type: defect → enhancement