Bug 1890948 Comment 0 Edit History

This will be a fully autonomous project on GitHub that will be used to evaluate the quality of small models in an automated way, using an LLM as the evaluator.

Given a labeled dataset, the script will:

- call the small model and run inference on each input
- send the inputs and outputs to an LLM and ask it to return an assessment of the output quality (minimal sketches of both steps follow below)

The first version will run against an HTTP endpoint served by llamafile (LLaVA + Mistral 7B).
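As a rough illustration of the first step, here is a minimal Python sketch that sends one dataset input to a locally running llamafile server. It assumes the server exposes the OpenAI-compatible chat completions API on the default port 8080; the URL, model name, and timeout are placeholders, not project decisions.

```python
import requests

# Assumed local llamafile server; by default it listens on port 8080 and
# exposes an OpenAI-compatible chat completions endpoint.
LLAMAFILE_URL = "http://localhost:8080/v1/chat/completions"

def run_small_model(prompt: str) -> str:
    """Send one dataset input to the small model and return its raw output."""
    response = requests.post(
        LLAMAFILE_URL,
        json={
            "model": "mistral-7b",  # placeholder; the server runs whatever weights it was started with
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,     # deterministic output keeps the evaluation repeatable
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```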

Example of using LLMs to evaluate SLMs: https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization
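And a minimal sketch of the second step, in the spirit of the linked cookbook's LLM-as-judge approach: the dataset input, the expected label, and the small model's output are sent to a judge LLM that returns a quality assessment. The OpenAI client, the judge model name, and the scoring rubric below are placeholder assumptions, not decisions made here.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are grading the output of a small language model.
Input:
{input}

Expected (label from the dataset):
{label}

Model output:
{output}

Score the output from 1 (wrong) to 5 (matches the label) and explain briefly.
Answer as JSON: {{"score": <int>, "reason": "<short explanation>"}}"""

def assess_quality(sample_input: str, label: str, model_output: str) -> str:
    """Ask the judge LLM for a quality assessment of one sample."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            input=sample_input, label=label, output=model_output)}],
        temperature=0.0,
    )
    return completion.choices[0].message.content
```

One evaluation pass over a labeled sample would then be run_small_model() on the input followed by assess_quality() on its output.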
