Open Bug 1501358 Opened 2 years ago

Improve nsMsgBodyHandler::StripHtml()

Categories

(Thunderbird :: Filters, enhancement)

Version: 38 Branch
Platform: x86, Windows Vista
Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

People

(Reporter: jorgk-bmo, Unassigned)

References

(Blocks 1 open bug)

Details

+++ This bug was initially created as a clone of Bug #1230815 +++

nsMsgBodyHandler::StripHtml() doesn't handle closing angle brackets inside quoted attribute values, for example:
<tag attr1="xx>" attr2="xxx">
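
To illustrate the kind of quote tracking that would be needed, here is a minimal sketch. It is not the existing StripHtml() code: StripTagsQuoteAware is a hypothetical standalone helper that works on a complete std::string, and it ignores comments, <script>/<style> content, entities, and malformed markup such as unbalanced quotes.

```cpp
#include <string>

// Hypothetical helper, not the actual nsMsgBodyHandler implementation.
// Removes tags while treating '>' inside a quoted attribute value as part
// of the tag rather than as its end.
std::string StripTagsQuoteAware(const std::string& html) {
  std::string out;
  bool inTag = false;  // true between '<' and the matching unquoted '>'
  char quote = '\0';   // quote character currently open inside a tag, or '\0'
  for (char c : html) {
    if (!inTag) {
      if (c == '<') {
        inTag = true;
      } else {
        out += c;  // ordinary text outside any tag is kept
      }
    } else if (quote != '\0') {
      if (c == quote) {
        quote = '\0';  // matching quote closes the attribute value
      }
    } else if (c == '"' || c == '\'') {
      quote = c;  // start of a quoted attribute value
    } else if (c == '>') {
      inTag = false;  // an unquoted '>' really does end the tag
    }
  }
  return out;
}
```

With this approach the "xx>" in the example above no longer terminates the tag early, so only the text outside the tags survives.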

It also doesn't collapse insignificant whitespace, so in
<body>
xyz  huhu
</body>
the search term "xyz huhu" (with a single space) isn't found.
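
A rough sketch of the whitespace compaction could look like the following. CollapseWhitespace is a hypothetical helper, not existing Thunderbird code, and it assumes that all whitespace in the stripped text is insignificant, which would be wrong for content inside <pre>; it also treats only ASCII whitespace and leaves non-breaking spaces alone.

```cpp
#include <string>

// Hypothetical helper: collapses each run of ASCII whitespace to a single
// space and drops leading and trailing whitespace.
std::string CollapseWhitespace(const std::string& text) {
  std::string out;
  bool pendingSpace = false;  // a whitespace run was seen after some output
  for (char c : text) {
    if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f') {
      pendingSpace = !out.empty();  // ignore whitespace before the first word
    } else {
      if (pendingSpace) {
        out += ' ';  // emit exactly one space for the whole run
      }
      pendingSpace = false;
      out += c;
    }
  }
  return out;  // a trailing run is never flushed, so it is dropped
}
```

Applied to the body text above, this yields "xyz huhu" with a single space, which the search term would then match.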

Instead of writing yet another partial HTML parser, we might consider passing the HTML to Gecko for parsing and then serialising it out again, possibly as plain text. Sample code here:
https://dxr.mozilla.org/comm-central/rev/2a29ee0adb310b54a6a2df72034953fed8f2b043/comm/mailnews/mime/src/mimeTextHTMLParsed.cpp#91

That is, of course, not going to make body search faster. It has also been a long-term plan to pass messages through MIME during the search, so if we do that, we could keep this issue in mind as well.