Open Bug 532636 Opened 15 years ago Updated 6 years ago

searching for text in quoted emails (replies and forwards) yields no search hits in results (Search all messages/Gloda Global Search fails)

Categories

(Thunderbird :: Search, defect)

x86
Windows 7
defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: haihai001, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: testcase, Whiteboard: [datalossy])

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2b5pre) Gecko/20091202 Namoroka/3.6b5pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1.5) Gecko/20091202 Lightning/1.0b1pre Shredder/3.0.1pre

If I search for text which was quoted in an email, nothing is found.

Reproducible: Always

Steps to Reproduce:
1. Reply to an old email that you sent to yourself. Send the reply to the same address.
2. You get the email with the old text as a quote.
3. Now try to search for something in the quoted text.
Haichen, could you be more precise, please? What exact words did you search for?
Can you share the email too?

How did you search: using Search all messages or via the message body filter?
Component: General → Search
QA Contact: general → search
testmail:
--------------
Am 03.12.2009 12:42, schrieb haichen:
>
> Suchtest currywurst
>
>

bratwurst
-------------
If you search for the word "bratwurst" with global search, it is found.
Neither "currywurst" nor "suchtest" is found.

This always happens with quoted mails. You can't find anything that is inside a quote with Search all messages (global search).

Hopefully my poor English makes clear what I want to say.
Attached file testcase
It happens to me too.

STR (steps to reproduce):
1. Add the email testcase to your inbox.
2. Search for the words as in comment #3.
3. Searching with filters succeeds; Search all messages doesn't work (I used GlodaQuilla, and the mail has a gloda ID).

Ludo, can you confirm too?

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091203 Lightning/1.0b1pre Shredder/3.0.1pre ID:20091203032015
Keywords: testcase
And did you search for both words, given that one starts with a capital letter?
1. searching Suchtest   (all messages): doesn't work
2. searching currywurst (all messages): doesn't work
3. searching bratwurst  (all messages): works

It seems related to Content-Transfer-Encoding: 7bit, because I can find words in my bugzilla folder, whose mails aren't Content-Transfer-Encoding: 7bit.
Another bug that describes a gloda failure with Content-Transfer-Encoding: 7bit is bug #481616, but this one isn't a dupe of it.

haichen, could you also confirm that this is related to Content-Transfer-Encoding: 7bit?

Anyway, this is confirmed here, and I can't find any dupe at the moment.
Status: UNCONFIRMED → NEW
Ever confirmed: true
No. I can find a mail whose quotes are unsearchable even though it does not use Content-Transfer-Encoding: 7bit:

To:  <...@hotmail.com>
Content-Type: multipart/alternative; boundary=001636498c8f97913c0464af8d9e
Return-Path: ...@googlemail.com
X-OriginalArrivalTime: 09 Mar 2009 13:24:26.0525 (UTC) FILETIME=[5E6BBCD0:01C9A0BA]

--001636498c8f97913c0464af8d9e
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Attachment #415872 - Attachment mime type: message/rfc822 → text/plain
asuth, did I dream, that you had told me quoted material intentionally isn't indexed?
That was real.  The time when I told you unicorns can teleport was a dream.
I tried it again on Mozilla/5.0 (Windows NT 6.1; rv:2.0b8pre) Gecko/20101129 Thunderbird/3.3a2pre and Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.14pre) Gecko/20101129 Lightning/1.0b3pre Lanikai/3.1.7pre.
Does not work.
Blocks: glodafailtracker
No longer blocks: qfasfailtracker
Summary: searching for text in quoted emails yields no result → searching for text in quoted emails yields no result (Search all messages/Gloda Global Search)
Am I reading this right?

This bug is having a long sleep because the failure to generate a result is intentional (per asuth).
(In reply to Matt from comment #12)
> This bug is having a long sleep because the failure to generate a result is
> intentional (Per asuth)

Looking back at my response, I realize that my response to Wayne definitely did not provide enough background.  I think this was because I've explained the rationale elsewhere before, but it bears repeating in case someone wants to address this bug since I am having trouble finding that:

Quoted text is intentionally stripped from what we insert into the full-text index, yes.  This was motivated by 3 desires; we wanted to:
1) Bound our disk usage.
2) Provide a snippet of what the author of a message wrote in their message.  The "experimental toolbar" that drove a lot of gloda development used this to display snippets of what a message contained, and it wanted what was written, not what was quoted.
3) Avoid having a message that is along the lines of "I agree" out-score the message it is in reply to because it is more recent and otherwise has the same score.

Desire #3 could be addressed by having the quoted text exist in a separate column that uses a decreased weighting factor.  Desire #1 could be partially addressed by only processing text quoted to some specific level of depth (like just 1 level).  There are also other possible solutions if the quoted message is available and a post-search integration step can look at messages in reply to that message.
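
To make the two ideas above concrete (a depth limit on quoted text, and a lower weight for quoted matches), here is a minimal Python sketch. It is not gloda's actual code, which is JavaScript; every name in it (`quote_depth`, `split_body`, `score`, `quoted_weight`) is hypothetical, and the scoring is a plain occurrence count rather than a real full-text ranking function. It also assumes quotes are marked with leading `>` characters, as in the plain-text testmail from this bug.

```python
import re

# Assumption: quoted lines start with one or more '>' markers.
_QUOTE_PREFIX = re.compile(r"^\s*((?:>\s?)+)")

# The testmail from this bug: one quoted line plus one newly written line.
TESTMAIL = (
    "Am 03.12.2009 12:42, schrieb haichen:\n"
    ">\n"
    "> Suchtest currywurst\n"
    ">\n"
    ">\n"
    "\n"
    "bratwurst\n"
)


def quote_depth(line: str) -> int:
    """Return the number of leading '>' markers on a line (0 if unquoted)."""
    match = _QUOTE_PREFIX.match(line)
    return match.group(1).count(">") if match else 0


def split_body(body: str, max_quote_depth: int = 0):
    """Split a body into (authored_text, quoted_text).

    Lines quoted deeper than max_quote_depth are dropped entirely.
    With max_quote_depth=0 all quoted text is dropped, which mirrors
    the behavior reported in this bug.
    """
    authored, quoted = [], []
    for line in body.splitlines():
        depth = quote_depth(line)
        if depth == 0:
            authored.append(line)
        elif depth <= max_quote_depth:
            # Keep the quoted words, without the '>' markers.
            quoted.append(_QUOTE_PREFIX.sub("", line, count=1))
    return "\n".join(authored), "\n".join(quoted)


def score(term: str, authored: str, quoted: str, quoted_weight: float = 0.25) -> float:
    """Toy score: authored occurrences count fully, quoted ones are down-weighted."""
    term = term.lower()
    return authored.lower().count(term) + quoted_weight * quoted.lower().count(term)


if __name__ == "__main__":
    for depth in (0, 1):
        authored, quoted = split_body(TESTMAIL, max_quote_depth=depth)
        for word in ("bratwurst", "currywurst"):
            print(depth, word, score(word, authored, quoted))
```

With a depth limit of 0 the quoted word never matches. With a depth limit of 1 it matches, but scores lower than the word the author actually wrote, so a short reply should not outrank the message it quotes. The attribution line ("... schrieb haichen:") is treated as authored text in this sketch; whether gloda does the same is not established here.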


In regards to your explicit question, I do not believe anyone is actively working on the global database as of late.  However, I am available to provide guidance and reviews if anyone wants to provide patches to improve any aspect of the global database.
I'm relatively new to using Thunderbird as my main e-mail client.  This feature is causing some usability issues for new users like me, in my opinion.  See the question that I recently sent regarding this topic:

https://support.mozilla.org/en-US/questions/991577

So to summarize, today there are 3 ways to search messages in Thunderbird, from most global, to most specific:

1. Edit/Find/Search Messages (CTRL-SHIFT-F)
2. Global Search (CTRL-K)
3. Quick Filter (CTRL-SHIFT-K)

The problem from a usability perspective, in my humble opinion, is that Global Search is very visible in the toolbar and the user's guide hints that it will search "everywhere" (https://support.mozilla.org/en-US/kb/global-search?esab=a&s=search&r=1&as=s).  It is not easy at all to find information about this "feature" that doesn't index quoted messages, and honestly, even if it was crystal clear from the doc, we shouldn't have to read documentation in the first place to understand how to use a search bar...

It saddens me to say that before I got my answer from my question above, I reluctantly had to go back to Outlook to find my message.  Outlook has a similar "global search" field, and it found the search string in quoted text immediately...

How about this?  Add a checkbox option next to the global search field to "include search in quoted text (slower)".  When enabled, instead of using indexed search, perform a conventional full-body search in the whole mailbox (equivalent to the search performed by Edit/Find/Search Messages).  That would not require solving the indexing challenges reported by Andrew in comment #13, and would shield the user from having to know about these technical limitations.  From an end-user perspective, it would "just work" and find messages...
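
The checkbox proposal above could behave roughly like the following Python sketch. This is only an illustration of the idea, not Thunderbird code: `Message`, `build_index`, `search`, and the `include_quoted` flag are all hypothetical names, and the "index" is a plain word-to-message dictionary standing in for the real full-text index.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    body: str  # full body, including quoted lines


def authored_text(body: str) -> str:
    """Drop lines that start with '>' (assumed quote marker)."""
    return "\n".join(
        line for line in body.splitlines() if not line.lstrip().startswith(">")
    )


def build_index(messages):
    """Map each lowercase word of the authored text to message positions."""
    index = {}
    for pos, msg in enumerate(messages):
        for word in authored_text(msg.body).lower().split():
            index.setdefault(word, set()).add(pos)
    return index


def search(term, messages, index, include_quoted=False):
    """Fast indexed lookup; optionally add a slow scan of the full bodies."""
    term = term.lower()
    hits = set(index.get(term, set()))
    if include_quoted:
        # Slow path: look at every full body, quoted text included.
        hits |= {
            pos for pos, msg in enumerate(messages) if term in msg.body.lower()
        }
    return sorted(hits)


if __name__ == "__main__":
    mails = [Message("Re: test", "haichen wrote:\n> Suchtest currywurst\n\nbratwurst\n")]
    idx = build_index(mails)
    print(search("currywurst", mails, idx))
    print(search("currywurst", mails, idx, include_quoted=True))
```

The point of the design is that the index stays as it is, and the unchecked default remains fast; only users who tick the box pay for the full scan.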
(In reply to yves.canty from comment #14)

> This feature is causing some usability issues for new users like me, in my opinion.

I agree, and not just for new users! The main problem is the 'help', which states "The search is performed in all fields". It doesn't say that quoted text isn't searched. It took me an hour or two to work out why the 'global search' was giving fewer results than the menu 'find'. Finding the problem was made more difficult by the fact that very large numbers of results are missed, without warning, while the database is being generated after activating 'global search', which took about 1.5 hours on my computer. This isn't mentioned in the 'help' either.
I was just unable to find a forwarded message that had content only in the forwarded piece :(

I can guess there are some classes of users for whom this might be frequent or even the majority of their messages. Their search experience would be really bad.

http://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/modules/connotent.js#119
Summary: searching for text in quoted emails yields no result (Search all messages/Gloda Global Search) → searching for text in quoted emails (replies and forwards) yields no search hits in results (Search all messages/Gloda Global Search fails)
Whiteboard: [datalossy]
(In reply to yves.canty from comment #14)
[...]
> 
> How about this?  Add a checkbox option next to the global search field to
> "include search in quoted text (slower)".  When enabled, instead of using
> indexed search, perform a conventional full-body search in the whole mailbox
> (equivalent to the search performed by Edit/Find/Search Messages).  That
> would not require solving the indexing challenges reported by Andrew in
> comment #13, and would shield the user from having to know about these
> technical limitations.  From an end-user perspective, it would "just work"
> and find messages...

I agree with your strategy. As for the precise solution, for Ctrl+Shift+F this could be fixed by replacing the "Body" option with "Body (excluding quotations)" and "Body (including quotations)". I suppose this will not be implemented immediately, but I strongly recommend renaming "Body" to "Body (excluding quotations)" ASAP.

It appears that this also affects the line introducing a quotation ("Foo Bar wrote:"). If so, I was most likely hit by this bug, since I used Search all messages to clean up my mailbox from mailing list threads in which I was not involved, by searching for mails sent to the mailing list in which I was not the sender or an explicit recipient, and which did not contain my names in their Body. I deleted the 10K mails found, so I agree with the "datalossy" attribute.
The issue I just reported may simply result from bug #1280840.