Closed
Bug 1509493
Opened 6 years ago
Closed 6 years ago
Parsing bug leads to duplication of characters in plain-text emails
Categories
(Core :: Networking, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla65
Tracking | Status | |
---|---|---|
firefox65 | --- | fixed |
People
(Reporter: fabian, Assigned: fabian)
Details
(Whiteboard: [necko-triaged])
Attachments
(4 files, 5 obsolete files)
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36 Steps to reproduce: View a plain-text mail with the following body content in Thunderbird's mail view: ``` johndoe@gmail.com)@gmail.com ``` Actual results: The content of the mail is shown as: ``` johnjohndoe@gmail.com)@gmail.com ``` where `john` is a hyperlink with target `mailto:johnjohndoe@gmail.com` and the rest of the body is a hyperlink with target `mailto:johnjohndoe@gmail.com)@gmail.com`. When inspecting the generated HTML markup with a debugger, the anchor tag for the first hyperlink is left dangling. Expected results: The plain-text part of the content as shown by Thunderbird should agree with the actual plain-text content of the mail. Thunderbird may add some hyperlink markup with mailto: links, but it should not change the textual representation of the mail. This behavior occurs because `mozTXTToHTMLConv.cpp::CalculateURLBoundaries` incorrectly calculates the boundaries of URLs/mail addresses that are nested (as in the example). In order to avoid loops, it calculates the boundaries of the prefix without taking URLs into account. Thus, when the second `@` character is encountered, it cuts into the HTML markup that had already been inserted when processing the address `johndoe@gmail.com`. Since the markup generated from this address ends in `johndoe@gmail.com</a>` and `</a>` is four characters long, this results in the first four characters of the address remaining together with the opening anchor tag.
Assignee | ||
Updated•6 years ago
|
OS: Unspecified → All
Hardware: Unspecified → Desktop
Comment 1•6 years ago
|
||
Confirmed. Grrr, that damn mozTXTToHTMLConv again :-( - That code is actually in Mozilla core. Since you looked at it, can you submit a fix? Also please add a unit test to test_mozTXTToHTMLConv.js (that's the easy bit).
Status: UNCONFIRMED → NEW
Component: Message Reader UI → Networking
Ever confirmed: true
Product: Thunderbird → Core
Version: 60 → 60 Branch
Assignee | ||
Comment 2•6 years ago
|
||
I think that the current single-pass implementation with its recursive calls is hard to get right and reason about. I would propose switching to the following two-pass algorithm, which should not require too many changes: 1. Iterate over the plain-text message as before and store start, end and replacement string, but don't make any changes yet. 2. Sort the triples of start, end and replacement string first by start, then by (end - start). 3. Go through the string and greedily apply non-overlapping replacements in the order given by 2. What do you think?
Comment 3•6 years ago
|
||
Not my code or responsibility. Original author: BenB, current reponsibility: Core::Netwerk people. You can attach a patch with a test case and someone will take a look in due course. I'm just the poor sod reading most bug reports and getting them channelled the right way (and fix some if I can).
Flags: needinfo?(ben.bucksch)
Comment 4•6 years ago
|
||
Bug confirmed. Minor severity, because this is malformed to start with. Garbage in, garbage out. Fabian, if you have a small and specific patch that does not change the existing logic (which might break other cases), then great.
Severity: normal → minor
Flags: needinfo?(ben.bucksch)
Comment 5•6 years ago
|
||
(It would be great if someone from the TB team just took the component and rewrote it with mozilla::Tokenizer)
Comment 6•6 years ago
|
||
That comment looks like a déjà vu ;-) - See bug 1274242 comment #22 from May 2016.
Assignee | ||
Comment 7•6 years ago
|
||
The bug can also be triggered by more natural strings like: (johndoe@gmail.com)johndoe@gmail.com Log messages containing such substrings showed up in log files of an email server I maintain, and led to plenty of confusion when sent to Thunderbird users. The CalculateURLBoundaries function contains a comment /* prevent loops */ which supposedly explains why ScanTXT has to be called without URL checks enabled. Does anybody have a concrete example for why this is needed? If not, then my first attempt at a fix would just be to call the full ScanTXT, but if this would cause a regression, it would likely be observed in terms of degraded performance and thus not caught by the tests.
Flags: needinfo?(ben.bucksch)
Assignee | ||
Comment 8•6 years ago
|
||
Assignee | ||
Comment 9•6 years ago
|
||
Depends on D13389
Assignee | ||
Comment 10•6 years ago
|
||
In mozTXTToHTMLConv, FindURL is not able to correctly calculate replaceBefore for nested email addresses/URLs such as john@doe.org}john@doe.org. As a workaround, we keep track of the end of the last URL HTML markup in the output string and skip any subsequent URLs whose replaceBefore would cut into this markup. Depends on D13390
Updated•6 years ago
|
Flags: needinfo?(ben.bucksch)
Attachment #9028597 -
Flags: review+
Comment 11•6 years ago
|
||
Comment on attachment 9028598 [details] Bug 1509493 - Do not include unmatched ')' in email addresses r+ after a comment change
Attachment #9028598 -
Flags: review+
Comment 12•6 years ago
|
||
Comment on attachment 9028599 [details] Bug 1509493 - Fix markup generation for nested email addresses r+ after a comment change
Attachment #9028599 -
Flags: review+
Comment 13•6 years ago
|
||
Sadly we don't have approval rights in Core::Netwerk. Valentin or Honza, can you please take a look.
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(honzab.moz)
Comment 14•6 years ago
|
||
You want to land this stuff? If so, then you only need to add the checkin-needed keyword.
Flags: needinfo?(honzab.moz)
Assignee | ||
Updated•6 years ago
|
Keywords: checkin-needed
Comment 15•6 years ago
|
||
Hold on, I can land it, but it needs review by a Core::Netwerk peer. Sure, BenB is the original author (and the patches look good to me as well), but formally they need approval by an M-C peer.
Flags: needinfo?(honzab.moz)
Keywords: checkin-needed
Updated•6 years ago
|
Assignee: nobody → fabian
Status: NEW → ASSIGNED
Comment 16•6 years ago
|
||
I've reviewed changes to this component since decades :).
Comment 17•6 years ago
|
||
you can land this. Thanks.
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(honzab.moz)
Comment 18•6 years ago
|
||
OK, but somehow Lando is declining landing this, something not accepted in https://phabricator.services.mozilla.com//D13391.
Comment 19•6 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #18) > OK, but somehow Lando is declining landing this, something not accepted in > https://phabricator.services.mozilla.com//D13391. For some reason it appears as one parent revision with two children. You can land https://lando.services.mozilla.com/D13390/ first, then then https://phabricator.services.mozilla.com/D13391/
Assignee | ||
Comment 20•6 years ago
|
||
My bad, I messed up the hierarchy. It should have been reverted to the default now.
Comment 21•6 years ago
|
||
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/integration/autoland/rev/5e4a5cf81c26 Provide basic test coverage for email address parsing r=BenB https://hg.mozilla.org/integration/autoland/rev/6498765e1d65 Do not include unmatched ')' in email addresses r=BenB https://hg.mozilla.org/integration/autoland/rev/05562b7d3eff Fix markup generation for nested email addresses r=BenB
Comment 22•6 years ago
|
||
Landed. Fabian, since you're an expert in this code now, maybe had over to bug 1505911. Feel free to adopt my test enhancements and fix the code, it shouldn't be hard.
Comment 23•6 years ago
|
||
Backed out for failing win xpcshell Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=214642922&revision=05562b7d3effc75bba30fb7f02175162f1484816 Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=214638891&repo=autoland&lineNumber=13011 Backout: https://hg.mozilla.org/integration/autoland/rev/a0e0ae690520f845febd99f243e7910fda287aad
Flags: needinfo?(fabian)
Assignee | ||
Comment 24•6 years ago
|
||
The test failure is on Windows 7: 17:43:10 INFO - Unexpected conversion by scanTXT: input=RFC1738: <URL:mailto:john.doe+test@mozilla.org> then, output=RFC1738: <URL:mailto:john.doe+test@mozilla.org> then, link= href="mailto:john.doe+test@mozilla.org"` The mailto: link is not recognized. The only potentially OS dependent part of the code seems to be ShouldLinkify, is it known to behave differently under Windows?
Flags: needinfo?(fabian) → needinfo?(ben.bucksch)
Comment 25•6 years ago
|
||
I think you're on very dangerous ground here. Looks like the test only failed on Win7 and not Win10. Running it locally in a TB binary it passes on Win10, but then TB also provides a mailto handler. For FF the handler should be external https://searchfox.org/mozilla-central/rev/e22c0d152060f4f8d4ca8904094f15f65a1b6f93/browser/app/profile/firefox.js#658 and who knows what |rv = externalHandler->ExternalAppExistsForScheme(scheme, &exists);| will return. I think your best bet is to exclude Windows from the test using AppConstants.platform == "win".
Comment 26•6 years ago
|
||
> The only potentially OS dependent part of the code seems to be ShouldLinkify, is it known to behave differently under Windows?
ShouldLinkify() checks whether the concrete URL scheme is valid and can be handled, based on which URL handlers are installed on the user's system. This is an important feature. This way, we don't offer links that won't work when clicked, and we can support syntactically valid URLs of schemes we don't know yet, be it tel: or skype: or what have you.
What's surprising that it fails for mailto:. That shouldn't fail, and I've never heard of that. It might be because of the build machine setup, that it doesn't have a mail app installed, and Thunderbird OS integration hasn't run, because this is just unit tests. So, this is not a realistic case.
Still, I think it would make sense to hardcode "mailto", "http" and "https" and OK these 3 schemes, and not look up URL scheme handlers for them, and just return OK for them. This would be in ShouldLinkify() after the call to ExtractScheme().
Flags: needinfo?(ben.bucksch)
Comment 27•6 years ago
|
||
(In reply to Ben Bucksch (:BenB) from comment #26) > Still, I think it would make sense to hardcode "mailto", "http" and "https" > and OK these 3 schemes, ... What happens if the code runs in FF and there is no mailto handler on the system?
Assignee | ||
Comment 28•6 years ago
|
||
> I think you're on very dangerous ground here. Looks like the test only failed on Win7 and not Win10. Running it locally in a TB binary it passes on Win10, but then TB also provides a mailto handler. This is probably because Win10 ships with a mail application, whereas Win7 does not. > I think your best bet is to exclude Windows from the test using AppConstants.platform == "win". I feel a bit uneasy about this. Why should we be able to assume that other platforms will continue to offer the "mailto:" scheme by default? > What happens if the code runs in FF and there is no mailto handler on the system? Based on my understanding of the code, this would simply amount to rendering a link with an unsupported scheme, which an HTML mail could do anyway. Would this be considered an issue in certain situations? When clicked, a link with an unregistered scheme leads to a "The address wasn’t understood" error message in Firefox, which sounds reasonably descriptive. > Still, I think it would make sense to hardcode "mailto", "http" and "https" and OK these 3 schemes, and not look up URL scheme handlers for them, and just return OK for them. This would be in ShouldLinkify() after the call to ExtractScheme(). I would prefer this solution, as it will guarantee consistent unit test behavior even in the face of potential platform changes in the future.
Assignee | ||
Comment 29•6 years ago
|
||
Assignee | ||
Comment 30•6 years ago
|
||
Assignee | ||
Comment 31•6 years ago
|
||
Updated•6 years ago
|
Attachment #9028910 -
Attachment is obsolete: true
Updated•6 years ago
|
Attachment #9028911 -
Attachment is obsolete: true
Updated•6 years ago
|
Priority: -- → P2
Whiteboard: [necko-triaged]
Comment 32•6 years ago
|
||
Please rebase all patches here. M-C reformatted all 10,000+ C++ files in the tree in bug 1511181 and also bug 1505911 caused conflicts on the test file.
Assignee | ||
Comment 33•6 years ago
|
||
Depends on D13511
Assignee | ||
Comment 34•6 years ago
|
||
Depends on D13644
Assignee | ||
Comment 35•6 years ago
|
||
In mozTXTToHTMLConv, FindURL is not able to correctly calculate replaceBefore for nested email addresses/URLs such as john@doe.org}john@doe.org. As a workaround, we keep track of the end of the last URL HTML markup in the output string and skip any subsequent URLs whose replaceBefore would cut into this markup. Depends on D13645
Assignee | ||
Comment 36•6 years ago
|
||
I have rebased the commits, creating new Phabricator diffs for the three commits that had already been submitted for landing previously. Possibly due to the changes in bug 1505911 (change to the signature of ScanTXT), moz build currently fails for me and I haven't been able to run the tests locally.
Comment 37•6 years ago
|
||
Valentin, in https://phabricator.services.mozilla.com//D13511 they suggest to add this code: nsAutoCString scheme; if (scheme == "http" || scheme == "https" || scheme == "mailto") { return true; I'm really surprised that that even compiles. Can you help my confusion? (In reply to Fabian Henneke from comment #36) > Possibly due to the changes in bug 1505911 (change to the signature of > ScanTXT), moz build currently fails for me and I haven't been able to run > the tests locally. Well, FF and TB compile so what's the problem if you rebased your patches properly?
Flags: needinfo?(valentin.gosu)
Comment 38•6 years ago
|
||
I've applied D13645.patch D13646.patch D13644.patch D13511.patch and it all compiles and |mach xpcshell-test netwerk/test/unit/test_mozTXTToHTMLConv.js| passes. I thinks it's good to go after we clarify the string comparison issue.
Assignee | ||
Comment 39•6 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #38) > I've applied > D13645.patch > D13646.patch > D13644.patch > D13511.patch > and it all compiles and |mach xpcshell-test > netwerk/test/unit/test_mozTXTToHTMLConv.js| passes. I thinks it's good to go > after we clarify the string comparison issue. Never mind, I forgot to also hg pull /comm. Builds and passes tests for me now. If you don't mind, please submit for a try run in my stead, as I don't have the relevant commit access yet.
Comment 40•6 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #37) > Valentin, in https://phabricator.services.mozilla.com//D13511 they suggest > to add this code: > nsAutoCString scheme; > if (scheme == "http" || scheme == "https" || scheme == "mailto") { > return true; > I'm really surprised that that even compiles. Can you help my confusion? I suspect one of the string implementations has an operator== defined, but I can't find it right now. But I did find another similar instance so this should be fine: https://searchfox.org/mozilla-central/rev/2edebf41a2b2e624859bf15ed5a98bdd002f73b4/dom/cache/DBSchema.cpp#2486
Comment 41•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c53f812b81ac263bd0cedc3afcca9907bbf5a8dd
Flags: needinfo?(valentin.gosu)
Comment 42•6 years ago
|
||
All green. Ben can you stick the last r+ on and I'll re-land. Fabian, I hope the dependencies are set correctly.
Updated•6 years ago
|
Attachment #9028883 -
Flags: review+
Updated•6 years ago
|
Attachment #9029207 -
Flags: review+
Comment 43•6 years ago
|
||
Comment on attachment 9029206 [details] Bug 1509493 - Provide basic test coverage for email address parsing Sorry that I missed one
Attachment #9029206 -
Flags: review+
Updated•6 years ago
|
Attachment #9029208 -
Flags: review+
Comment 44•6 years ago
|
||
Comment on attachment 9028597 [details] Bug 1509493 - Provide basic test coverage for email address parsing Fabian, you can mark supercedes patches with the "obsolete" flag in the attachment details, like I'm doing here.
Attachment #9028597 -
Attachment is obsolete: true
Updated•6 years ago
|
Attachment #9028598 -
Attachment is obsolete: true
Updated•6 years ago
|
Attachment #9028599 -
Attachment is obsolete: true
Comment 45•6 years ago
|
||
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/integration/autoland/rev/51e67c682e39 Always linkify common URL schemes in plain text r=BenB https://hg.mozilla.org/integration/autoland/rev/b4808c4e7d61 Provide basic test coverage for email address parsing r=BenB https://hg.mozilla.org/integration/autoland/rev/00d7ff2eeb0a Do not include unmatched ')' in email addresses r=BenB https://hg.mozilla.org/integration/autoland/rev/95e75223f6c7 Fix markup generation for nested email addresses r=BenB
Comment 46•6 years ago
|
||
Looking at this https://hg.mozilla.org/integration/autoland/rev/95e75223f6c7#l2.14 again: + input: "bug#1509493 (john@mozilla.org)john@mozilla.org test", + url: "mailto:john@mozilla.org", + text: "john@mozilla.org" Would it have been better to have two different e-mail addresses here? Which one does it pick for URL and text?
Comment 47•6 years ago
|
||
Not really. If it linkyfies only one of them, or both, that's both correct behavior. What's wrong is if it takes one of the () into the linked text. That was the bug, and that's what the test checks. Insofar, it's a good test for this specific bug, without producing false positives.
Comment 48•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/51e67c682e39 https://hg.mozilla.org/mozilla-central/rev/b4808c4e7d61 https://hg.mozilla.org/mozilla-central/rev/00d7ff2eeb0a https://hg.mozilla.org/mozilla-central/rev/95e75223f6c7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox65:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in
before you can comment on or make changes to this bug.
Description
•