History Semantic Search suggests unrelated items for Chinese
Categories
(Firefox :: Address Bar, defect, P2)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox147 | --- | fixed |
People
(Reporter: lilydjwg, Assigned: mak)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sng])
Attachments
(2 files)
In the screenshot the search term is "Japanese input method" in Chinese, and the results are "untitled document" and "cannot complete the request" in Chinese.
I have a lot of other cases, but they contain private URLs. Whenever there are not a lot of results, two semantic search results appear, but they are usually not semantically related, especially for Chinese.
Originally posted as bug 1991590 comment 1, but I was advised to open a new bug for this case.
Comment 1•4 months ago
I am attempting to reproduce your issue on my system, but it would seem that I might need some help finding the cause.
These are the steps that I used to reproduce:
- I launched the latest Firefox Release v143.0.4 and Nightly v145.0a1 with the zh-CN language.
- I set the Traditional Chinese language as the display language inside the browser.
- Opened a new tab and input the same search term "日文输入法" in both URLbar and Searchbar.
Notice: The suggestions displayed below appear to be related because the characters displayed are also displayed in the search term.
This being said, I cannot seem to reproduce it on my Windows 10 system using the steps above.
Please answer some questions to try and narrow down the cause:
- Which operating system are you using?
- In which Firefox version did you reproduce this issue?
- Can you tell us which Add-ons you are using?
- Does it reproduce in safe mode?
- Does it reproduce in a newly created profile? (info here)
Thank you for your help! Please let me know if you need help with any of the instructions above.
> Notice: The suggestions displayed below appear to be related because the characters displayed are also displayed in the search term.
Good catch, I didn't realize that. But it's far from semantic anyway.
- I'm using Arch Linux
- I don't remember the exact version, but it might be 2025-09-28 judging from the date I took the screenshot
- Well, I'm using about 60 addons
- Yes. I disabled semantic search after this bug, and it took quite some time to reindex my history. The version is 2025-10-07 this time.
- No, semantic search does not seem to be populated (there is no actual data in places_semantic.sqlite).
It seems that Firefox is using the all-MiniLM-L6-v2 model, which is small but doesn't seem to understand Chinese semantically (and it is not listed in the Multilingual Models section). The BAAI/bge-m3 model could give relevant results, but it's big. Maybe Firefox could use an external text embedding API (e.g. via llama.cpp) instead and let the user choose and run a model. (Is a WebExtension already able to do that?)
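To illustrate why the model's language coverage matters: embedding-based search generally ranks entries by vector similarity to the query, so a plain top-k lookup still returns its "best" matches even when none is close. The following is only a minimal sketch under that assumption. The vectors are hand-made toy values, not real model output, and `cosine` and `top_k` are hypothetical helpers, not Firefox code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, items, k=2):
    """Return the k (score, title) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), title) for title, vec in items]
    scored.sort(reverse=True)
    return scored[:k]

# Toy embeddings (made up for illustration): neither entry points in the
# same direction as the query, yet both are still returned as the top 2.
query = [1.0, 0.0, 0.0]
history = [
    ("untitled document", [0.2, 0.9, 0.1]),
    ("cannot complete the request", [0.1, 0.2, 0.9]),
]

for score, title in top_k(query, history):
    print(f"{score:.2f}  {title}")
```

With a model that does not represent Chinese well, the similarity scores carry little meaning, so the two entries shown are effectively arbitrary.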
Comment 3•3 months ago
For now the local model we use is English-based. It doesn't mishandle Latin-based languages, but it's not ready for the rest of the world.
I'll implement a Region/Locale filter to limit the feature for now, and in the future we can extend it once we get better local models (either multi-language, or specific to the main languages in the user's history).
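As a rough illustration of what a locale gate could look like (a sketch only; the allowlist contents and the names `ALLOWED_LANGUAGES`, `primary_language`, and `semantic_search_enabled` are hypothetical, and the actual patch may work differently):

```python
# Hypothetical allowlist of primary language subtags the current model
# is assumed to handle acceptably. The real list is not given in this bug.
ALLOWED_LANGUAGES = frozenset({"en", "fr", "de", "es", "it"})

def primary_language(locale: str) -> str:
    """Extract the primary language subtag, e.g. 'zh-CN' -> 'zh'."""
    return locale.replace("_", "-").split("-", 1)[0].lower()

def semantic_search_enabled(locale: str, allowed=ALLOWED_LANGUAGES) -> bool:
    """Enable the feature only for locales whose language is allowlisted."""
    return primary_language(locale) in allowed

for tag in ("en-US", "fr", "zh-CN", "zh_TW"):
    print(tag, semantic_search_enabled(tag))
```

An allowlist (rather than a blocklist) keeps the feature off by default for any language the model has not been checked against.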
Comment 4•2 months ago
Comment 6•2 months ago