Open Bug 1890948 Opened 2 months ago Updated 2 months ago

LLM-based evaluator for the small models

Categories

(Core :: Machine Learning, enhancement)

People

(Reporter: tarek, Assigned: tarek)

References

Details

(Whiteboard: [genai])

This will be a fully autonomous project on GitHub, used to evaluate the quality of small models in an automated way, with an LLM acting as the evaluator.

Given a labeled dataset, the script will:

  • call the small model and run its inference on each example
  • send the inputs and outputs to an LLM and ask it to return an assessment of the quality

The first version will run against an HTTP endpoint serving llamafile + LLaVA + Mistral 7B (see the sketch below).
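A minimal sketch of what the evaluation loop could look like, assuming a JSONL dataset with `input`/`expected` fields, a placeholder `run_small_model` function for the small-model inference, and the judge (Mistral 7B) reached through llamafile's OpenAI-compatible chat completions endpoint on localhost. The dataset layout, port, model name, prompt, and scoring scale are all illustrative assumptions, not decisions from this bug:

```python
# Hypothetical sketch of the evaluation loop described above.
# Assumptions (not from the bug): JSONL dataset with "input"/"expected"
# fields, a local callable for the small model, and Mistral 7B served by
# llamafile's OpenAI-compatible endpoint on the default localhost port.

import json
import requests

LLAMAFILE_URL = "http://localhost:8080/v1/chat/completions"  # assumed llamafile default

JUDGE_PROMPT = """You are grading a small model's output.
Input: {input}
Expected (label): {expected}
Model output: {output}

Rate the output from 1 (wrong) to 5 (matches the label) and explain briefly.
Answer as JSON: {{"score": <int>, "reason": "<short explanation>"}}"""


def run_small_model(text: str) -> str:
    """Placeholder for the small-model inference call (to be implemented)."""
    raise NotImplementedError


def judge(input_text: str, expected: str, output: str) -> dict:
    """Ask the LLM to assess the small model's output against the label."""
    response = requests.post(
        LLAMAFILE_URL,
        json={
            "model": "mistral-7b",  # name as exposed by the llamafile server
            "messages": [
                {
                    "role": "user",
                    "content": JUDGE_PROMPT.format(
                        input=input_text, expected=expected, output=output
                    ),
                }
            ],
            "temperature": 0,  # deterministic grading
        },
        timeout=120,
    )
    response.raise_for_status()
    content = response.json()["choices"][0]["message"]["content"]
    # Assumes the judge returns valid JSON as instructed by the prompt.
    return json.loads(content)


def evaluate(dataset_path: str) -> list[dict]:
    """Run the small model on each labeled example and collect LLM assessments."""
    results = []
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            output = run_small_model(example["input"])
            assessment = judge(example["input"], example["expected"], output)
            results.append({**example, "output": output, **assessment})
    return results


if __name__ == "__main__":
    # Usage sketch: prints the mean judge score over the dataset.
    scores = evaluate("labeled_dataset.jsonl")
    print(sum(r["score"] for r in scores) / len(scores))
```

Temperature is pinned to 0 so repeated runs over the same dataset produce comparable scores.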

Example of using LLMs to evaluate SLMs: https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization
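The cookbook linked above scores each output on several criteria rather than a single grade; a similar rubric for our judge could look like the following (criteria names and wording are illustrative assumptions, not taken from this bug):

```python
# Illustrative multi-criteria rubric in the spirit of the linked cookbook;
# the criteria and descriptions below are assumptions, not decisions.
CRITERIA = {
    "relevance": "Does the output address the input and cover the key points?",
    "consistency": "Is the output factually consistent with the input and the label?",
    "fluency": "Is the output grammatical and readable?",
}

RUBRIC_PROMPT = """Evaluate the model output on the criterion below.

Criterion ({name}): {description}

Input: {input}
Label: {expected}
Output: {output}

Reply with a single integer score from 1 (poor) to 5 (excellent)."""


def rubric_prompts(input_text: str, expected: str, output: str) -> dict[str, str]:
    """Build one judge prompt per criterion."""
    return {
        name: RUBRIC_PROMPT.format(
            name=name, description=desc,
            input=input_text, expected=expected, output=output,
        )
        for name, desc in CRITERIA.items()
    }
```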

Whiteboard: [genai]

We should check existing projects, like https://github.com/confident-ai/deepeval or https://openpipe.ai/

Type: defect → enhancement
No longer blocks: 1883591