This will be a fully autonomous project on GitHub that will be used to evaluate the quality of small models in an automated way, using an LLM. Given a labeled dataset, the script will:
- call the small model and run its inference
- send the inputs and outputs to an LLM and ask it to send back an assessment of the quality

The first version will run against an HTTP endpoint (llamafile + Mistral 7B).

Example of using LLMs to evaluate SLMs: https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization
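A minimal sketch of what the evaluation loop could look like. This assumes the llamafile server exposes its OpenAI-compatible chat endpoint at http://localhost:8080/v1/chat/completions and that the labeled dataset is a JSONL file with "input" and "expected" fields; the endpoint URL, dataset format, field names, grading prompt, and the run_small_model stub are illustrative assumptions, not part of the plan above.

```python
import json
import sys

import requests

# Assumed local llamafile + Mistral 7B endpoint (OpenAI-compatible chat API).
LLM_ENDPOINT = "http://localhost:8080/v1/chat/completions"


def run_small_model(text: str) -> str:
    """Placeholder for the small model's inference; replace with the real call."""
    raise NotImplementedError


def ask_llm_for_assessment(inp: str, output: str, expected: str) -> str:
    """Send the input/output pair to the LLM and return its quality assessment."""
    prompt = (
        "You are grading a small model's output.\n"
        f"Input: {inp}\nExpected: {expected}\nModel output: {output}\n"
        "Reply with a score from 1 to 5 and a one-sentence justification."
    )
    resp = requests.post(
        LLM_ENDPOINT,
        json={
            "model": "mistral-7b",  # model name is informational for llamafile
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def main(dataset_path: str) -> None:
    # Assumed JSONL dataset: one {"input": ..., "expected": ...} record per line.
    with open(dataset_path) as f:
        dataset = [json.loads(line) for line in f]
    for record in dataset:
        output = run_small_model(record["input"])
        assessment = ask_llm_for_assessment(record["input"], output, record["expected"])
        print(json.dumps({"input": record["input"], "output": output, "assessment": assessment}))


if __name__ == "__main__":
    main(sys.argv[1])
```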