Open Bug 2001690 Opened 4 months ago Updated 4 months ago

Consumes excessive RAM when running Qwen3-0.6B-ONNX on onnx-native

Categories: Core :: Machine Learning: On Device (defect)
People: Reporter: thasan; Unassigned; NeedInfo
Whiteboard: [genai]

Performance Profile: https://share.firefox.dev/4rjmyNa
Model: onnx-community/Qwen3-0.6B-ONNX

Configuration:

"reasoning-text-generation": {
  modelId: "onnx-community/Qwen3-0.6B-ONNX",
  task: "text-generation",
  modelRevision: "main",
  modelHub: "huggingface",
  dtype: "int8",
  device: "cpu",
  backend: "onnx-native",
  inputArgs: [
    [
      {
        role: "system",
        content: "You are a helpful AI assistant that thinks step-by-step before answering.",
      },
      {
        role: "user",
        content: "Explain why the sky is blue to a 5-year-old.",
      },
    ],
  ],
  runOptions: {
    max_new_tokens: 512,
    temperature: 0.6,
    do_sample: false,
    return_full_text: false,
  },
  timeout: 120,
}
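
For triage, it may help to compare the profile against a rough baseline of what this configuration should need. The sketch below is a back-of-the-envelope estimate (weights plus KV cache), not a measurement. The parameter count is taken from the "0.6B" in the model name. The layer count, KV head count, head size, cache dtype, and prompt length are assumptions for illustration and should be checked against the model's config.json. `estimateMemoryBytes` is a hypothetical helper, not part of any Firefox or ONNX Runtime API.

```javascript
// Rough lower bound on memory for a decoder-only model: weights + KV cache.
// Ignores runtime overhead such as activations, arenas, and tokenizer data.
const BYTES_PER_DTYPE = { int8: 1, fp16: 2, fp32: 4 };

function estimateMemoryBytes({
  paramCount, weightDtype, layers, kvHeads, headDim, cacheDtype, tokens,
}) {
  const weights = paramCount * BYTES_PER_DTYPE[weightDtype];
  // Each token stores one key and one value vector per layer per KV head.
  const kvPerToken = 2 * layers * kvHeads * headDim * BYTES_PER_DTYPE[cacheDtype];
  const kvCache = kvPerToken * tokens;
  return { weights, kvCache, total: weights + kvCache };
}

const estimate = estimateMemoryBytes({
  paramCount: 0.6e9,   // approximate, from the "0.6B" in the model name
  weightDtype: "int8", // dtype from the configuration above
  layers: 28,          // assumed
  kvHeads: 8,          // assumed
  headDim: 128,        // assumed
  cacheDtype: "fp32",  // assumed
  tokens: 512 + 64,    // max_new_tokens plus an assumed 64-token prompt
});

const toMiB = (bytes) => (bytes / (1024 * 1024)).toFixed(0);
console.log(
  `weights ~${toMiB(estimate.weights)} MiB, ` +
  `KV cache ~${toMiB(estimate.kvCache)} MiB, ` +
  `total ~${toMiB(estimate.total)} MiB`
);
```

If the memory seen in the profile is several times larger than this kind of estimate, that would suggest the overhead comes from the runtime (for example, duplicated weight buffers or an unbounded cache) rather than from the model itself.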

The severity field is not set for this bug.
:tarek, could you have a look please?


Flags: needinfo?(tziade)
Severity: -- → S3