Tokenisation used by Gloda will find "way" or "18" when looking for "2-way" or "B18".
Categories
(Thunderbird :: Search, defect)
Tracking
(Not tracked)
People
(Reporter: evichlaettstar, Unassigned)
Details
Attachments
(10 files, 2 obsolete files)
- 10.80 KB, image/png
- 16.61 KB, image/png
- 9.65 KB, application/octet-stream
- 10.02 KB, application/octet-stream
- 16.86 KB, image/png
- 16.19 KB, image/png
- 16.11 KB, image/png
- 16.55 KB, image/png
- 15.34 KB, image/png
- 15.23 KB, image/png
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0
Steps to reproduce:
- I search for the term '2-way'.
- I search for the term 'B18'.
Actual results:
- Search result shows all e-mails containing the term 'way' and the e-mails containing the term '2-way'.
- Search result shows all e-mails containing the term '18' and the e-mails containing the term 'B18'.
Expected results:
- Search result should show only e-mails containing the term '2-way'.
- Search result should show only e-mails containing the term 'B18'.
Comment 1•6 years ago
That's the global search, right, the one from the box with <Ctrl+K>? Global search is not literal, it tokenises, also for verb forms, so if you look for "look" you'll find "looked", "looks", "looking", etc.
Comment 2 (Reporter)•6 years ago
(In reply to Jorg K (GMT+2) from comment #1)
> That's the global search, right, the one from the box with <Ctrl+K>? Global search is not literal, it tokenises, also for verb forms, so if you look for "look" you'll find "looked", "looks", "looking", etc.
The problem I described affects all searches in TB.
Tokenising is not related to it.
Rather, the search engine seems to truncate the search term if the term contains both letters and digits.
Comment 3•6 years ago
Let's see whether 2-way and B18 are found when doing a non-global "regular" body search.
Comment 4•6 years ago
As I said, "regular" search, right-click on folder, "Search Messages...", same as Ctrl+Shift+F or using the quick filter bar (QFB) does not tokenise. Looking for "2-way" or "B18" in the body, I only find the message generated from the previous comment, not messages only containing "way" or "18".
I think I'll close this bug since it works as designed.
Comment 5 (Reporter)•6 years ago
(In reply to Jorg K (GMT+2) from comment #4)
> As I said, "regular" search, right-click on folder, "Search Messages...", same as Ctrl+Shift+F or using the quick filter bar (QFB) does not tokenise. Looking for "2-way" or "B18" in the body, I only find the message generated from the previous comment, not messages only containing "way" or "18".
> I think I'll close this bug since it works as designed.
You are right if you try to reproduce it with a few simple test e-mails. However, the problem does exist. I am now trying to find the conditions that trigger it, maybe a minimum number of e-mails within a folder, or of characters per e-mail, or something else.
btw:
I already rebuilt the global-messages-db.sqlite. Problem still exists.
Comment 6•6 years ago
Please understand that TB has TWO search methods:
- "regular" search, right-click on folder, "Search Messages...", same as Ctrl+Shift+F or using the quick filter bar (QFB)
- Global search, the so-called Gloda.
"Regular" search doesn't tokenise and will match strings exactly, so looking for "B18" will NOT find "18".
Gloda search, which uses global-messages-db.sqlite, does tokenise. You can rebuild the database as many times as you like, that won't change how it works. "B18" will be tokenised somehow and will find "18". One reason why we do it that way is given in comment #1. Further reading:
https://developer.mozilla.org/en-US/docs/Mozilla/Thunderbird/gloda#Full-text_search
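The difference between the two methods can be illustrated with a minimal Python sketch. It is not Gloda's actual tokeniser: the `tokenise` function below, which splits on anything that is not an ASCII letter or digit, and the rule that every token of the search term must occur in the message are assumptions chosen only to show how a tokenised search for "2-way" can match a message that never contains "2-way" literally. The sketch does not reproduce the "B18" finding "18" case, since how that term is tokenised is not specified here.

```python
import re

def tokenise(text):
    """Hypothetical tokeniser: lower-case, split on non-alphanumeric ASCII."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def regular_search(term, body):
    """Literal substring match, as in "Search Messages" or the QFB.
    Case-insensitivity is an assumption of this sketch."""
    return term.lower() in body.lower()

def tokenised_search(term, body):
    """Assumed rule: every token of the term must be a token of the body."""
    body_tokens = set(tokenise(body))
    return all(tok in body_tokens for tok in tokenise(term))

body = "There is no way to fit 2 of them"
print(tokenise("2-way"))                # ['2', 'way']
print(regular_search("2-way", body))    # False
print(tokenised_search("2-way", body))  # True
```

Under these assumptions the literal search behaves like a plain substring test, which is also why a regular search for "way" can match inside a longer word or an e-mail address.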
Comment 7 (Reporter)•6 years ago
> https://developer.mozilla.org/en-US/docs/Mozilla/Thunderbird/gloda#Full-text_search
Thank you for the link. Unfortunately, I don't really understand this specialist material. It has too many technical terms that I cannot translate into my native language, German.
If I'm looking for "B18",
Ctrl+Shift+F or QFB finds "18" in my test folder, but it finds only "B18" in the sent folder. Strange.
I have now copied all e-mails from the test folder to test folder 2 and test folder 3. There it finds only "B18". The same e-mails are in all four folders. That is strange.
Ctrl+K finds only "B18" but only in the sent folder, not in the three test folders. That is strange too.
If I'm looking for "2-way",
Ctrl+Shift+F or QFB finds "way". That's bad.
Ctrl+K finds "way". That's bad.
Comment 8•6 years ago
Quick search for 2-way only finds 4 messages from this bug.
Comment 9•6 years ago
Looking for "way" finds many more. So I don't understand why this is behaving differently from yours.
Comment 10 (Reporter)•6 years ago
(In reply to Jorg K (GMT+2) from comment #8)
> Quick search for 2-way only finds 4 messages from this bug.
Maybe you find only 4 messages because the other messages contain only the word "way". But if the other messages contained "way" and, somewhere else, "2", you would get more hits.
Comment 11•6 years ago
OK, my bugmail folder has 129252 messages. Looking for "2-way" in the body finds six, looking for "way" finds 15549, also matches on "neil@parkwaycc.co.uk", and looking for "2 way" (with a space) finds 14776 messages.
So in your theory "2-way" should have found those 14776 messages?
Just for fun, since I'm one of the developers here, I added some debug here where the search string is matched against the body's content:
https://searchfox.org/comm-central/rev/630f951ef8efd45af34ef07382851a4ab3184d6c/mailnews/base/search/src/nsMsgSearchTerm.cpp#1002
Using QFB, I see "2-way" being compared, if I enter "2 way" with the quotes into the search box, it's actually looking for "2 way" and not "2" and "way" separately. As an aside, did you know that "2|way" works for "2" or "way"?
"Search Messages", Ctrl+Shift+F, has some other input rules, there you don't need to quote the string to get an exact match.
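The quick filter input rules described above can be sketched roughly as follows. This is a hedged illustration in Python, not Thunderbird's code: `parse_filter_query` and `matches` are hypothetical names, and the rule that unquoted, space-separated terms must all occur in the body is an assumption of the sketch.

```python
import shlex

def parse_filter_query(query):
    """Split a query into OR-alternatives, each a list of terms.

    "|" separates alternatives; double quotes keep a phrase together.
    Simplification: a "|" inside quotes is not handled, and an
    unbalanced quote makes shlex.split raise ValueError.
    """
    alternatives = []
    for part in query.split("|"):
        terms = shlex.split(part)
        if terms:
            alternatives.append(terms)
    return alternatives

def matches(query, body):
    """True if, for any alternative, all of its terms occur in the body
    as literal substrings (assumed combination rule)."""
    body = body.lower()
    return any(
        all(term.lower() in body for term in terms)
        for terms in parse_filter_query(query)
    )

print(parse_filter_query('"2 way"'))  # [['2 way']]
print(parse_filter_query('2 way'))    # [['2', 'way']]
print(parse_filter_query('2|way'))    # [['2'], ['way']]
print(parse_filter_query('2-way'))    # [['2-way']]
```

With these rules, `matches('"2 way"', "a 2-way street")` is false because the phrase with a space does not occur, while `matches('2 way', "a 2-way street")` is true because "2" and "way" are each found separately.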
So I really don't know how your statement from comment #7
> Ctrl+Shift+F or QFB finds "18" in my test folder, but it finds only "B18" in the sent folder. Strange.
> Ctrl+Shift+F or QFB finds "way". That's bad. [when looking for 2-way]
can be true.
I'm happy for you to attach your test folder, maybe zipped up, then I can take a further look.
Comment 12 (Reporter)•6 years ago
Test folder containing plain text of obviously more than 6 e-mails. But TB shows only 6 e-mails.
Comment 13 (Reporter)•6 years ago
Test folder containing 6 e-mails.
Comment 14 (Reporter)•6 years ago
(In reply to Jorg K (GMT+2) from comment #11)
> So in your theory "2-way" should have found those 14776 messages?
Yes, because that is the behaviour of my TB - just in some folders, as I presume now.
> Using QFB, I see "2-way" being compared, if I enter "2 way" with the quotes into the search box, it's actually looking for "2 way" and not "2" and "way" separately.
But in some of my folders it's looking for "2" and "way" separately.
> did you know that "2|way" works for "2" or "way".
This is new for me, but does not solve my problem.
> "Search Messages", Ctrl+Shift+F, has some other input rules, there you don't need to quote the string to get an exact match.
> So I really don't know how your statement from comment #7
> > Ctrl+Shift+F or QFB finds "18" in my test folder, but it finds only "B18" in the sent folder. Strange.
> > Ctrl+Shift+F or QFB finds "way". That's bad. [when looking for 2-way]
> can be true.
> I'm happy for you to attach your test folder, maybe zipped up, then I can take a further look.
I attached 2 zipped folders "testordner.zip" and "testordner 3.zip", wherein TB shows me the same 6 e-mails. But both zip archives have different sizes. I looked in the plain text of archive "testordner.zip" and see the text of more than 6 e-mails. What's going on in my TB 60.7.1?
Comment 15•6 years ago
You may want to repair those folders, right-click, Properties, "Repair Folder".
Testordner has 19 messages, "B18" isn't found, "18" is found various times, "2-way" is found in one message. Testordner_3 has three messages, "B18" is found once.
Overall, I can't see any problem.
Comment 16 (Reporter)•6 years ago
Test folder A for search filtering
Comment 17 (Reporter)•6 years ago
Test folder B for search filtering
Comments 18 to 23 (Reporter)•6 years ago
(no comment text)
Comment 24 (Reporter)•6 years ago
(In reply to Jorg K (GMT+2) from comment #15)
> You may want to repair those folders, right-click, Properties, "Repair Folder".
> Testordner has 19 messages, "B18" isn't found, "18" is found various times, "2-way" is found in one message. Testordner_3 has three messages, "B18" is found once.
> Overall, I can't see any problem.
My apologies, my test setup was not correct.
Unfortunately, it is now impossible to export the "Testordner" again. I tried several times with the ImportExportTools add-on, without success. All other folders can be exported. Weird.
Next try.
I created Test Folder A and moved all incoming test e-mails into it.
I created Test Folder B and copied all e-mails from Test Folder A into it.
As you can see in my 6 screenshots, the search filtering results differ between the two test folders. Can you reproduce this behaviour?
Comment 25•6 years ago
I really don't have more time to invest in this. As I said before, most likely your folders need repair. Furthermore, I don't trust ImportExportTools; it's better just to grab the folder file off the file system.
Since you gave me the folders without the .msf file, adding them to my profile rebuilds the index, which is what a repair would do.
I see this:
Folder A has six messages and folder B has 12 messages; in B they are all doubled up.
"B18" will hit "Testmail 2", once in A and twice in B. "2-way" hits "Noch ein Test 1", again, once in A and twice in B. I think this is the desired outcome.
Comment 26 (Reporter)•6 years ago
OK. Deleting all .msf files didn't help. So I deleted all messages in TB's local IMAP folders and then reloaded all messages from the IMAP server. Now the search filters work as you described.
Sorry for misattributing my problem. I now know that I apparently have to empty all my local IMAP folders from time to time in order to get correct search results. Thanks for your effort and for leading me to the solution of my search filter problem.