Closed Bug 1310467 Opened 4 years ago Closed 4 years ago

Crash on viewing a particular message (possible memory corruption)

Categories

(MailNews Core :: LDAP Integration, defect)

defect
Not set
critical

Tracking

(thunderbird_esr4551+ fixed, thunderbird52+ fixed, thunderbird53 fixed, thunderbird54 fixed)

RESOLVED FIXED
Thunderbird 54.0
Tracking Status
thunderbird_esr45 51+ fixed
thunderbird52 + fixed
thunderbird53 --- fixed
thunderbird54 --- fixed

People

(Reporter: garming, Assigned: douglas.bagnall)

References

Details

(4 keywords)

Crash Data

Attachments

(5 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Build ID: 20161010144024

Steps to reproduce:

Default fresh install of Thunderbird 45.4 on Ubuntu 16.04 (or 14.04):

1. Receive (or import message) into Inbox
2. Click on dodgy message
3. Click on a different message
4. Click on the dodgy message a second time

In some cases clicking off the message or switching views is unnecessary in the crash. Other times it can take a few clicks. Forwarding with this message attached also triggers the crash.


Actual results:

Thunderbird dies with SIGILL or SIGSEGV, indicating some form of memory corruption, possibly a use after free.

gdb generates a number of different backtraces (mostly in GC, cycle collector code - example attached).

The critical part of the faulty message appears to be in the patch file attached. The crash appears to occur on viewing, and is independent of mail protocols. The crash can be reproduced with a just this message in a local folder.


Expected results:

Thunderbird should not have crashed.
Attached file gdb.txt
An example btfull while gdb attached
I could not reproduce using your testcase

A couple crashes reported with this signature - 
Bug 1280186 - Crash in DoMarking<T>
Bug 1226270 - crash in void DoMarking<T>

Frankly reminds me of bug Bug 1211845/Bug 1016524 - crash in MimeMultipart_parse_eof with multipart/related.
Severity: normal → critical
Keywords: crash
I've just wrangled another VM which just has Ubuntu 15.10 on it, installed https://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-central/thunderbird-52.0a1.en-US.linux-x86_64.tar.bz2 and the crash occurs upon clicking the dodgy message in a local folder on its own. Clicking off the message to another folder then clicking on it again reproduces the crash. So it looks like it still affects the most recent branches.
Double clicking the message can also sometimes trigger the crash.
When I open the attachment above ("Patch to pass a Windows RODC test.eml") in Thunderbird (as offered by Firefox) it instantly crashes.

Two crash reports say [@ @0x0 | CanonicalizeXPCOMParticipant ]
while a third says [ @ js::UncheckedUnwrap ].
(In reply to Douglas Bagnall from comment #5)
> When I open the attachment above ("Patch to pass a Windows RODC test.eml")
> in Thunderbird (as offered by Firefox) it instantly crashes.
> 
> Two crash reports say [@ @0x0 | CanonicalizeXPCOMParticipant ]
> while a third says [ @ js::UncheckedUnwrap ].

Thank you for commenting.  Please also post all your crash IDs (as text, not an image) listed at help | troubleshooting.
Thunderbird 45.3:
https://crash-stats.mozilla.com/report/index/ad582196-656e-4943-bf6f-32ce92161018

Thunderbird 52.0a1:
https://crash-stats.mozilla.com/report/index/1d50af95-18c6-424f-9221-eb29a2161018

I've got a box in our cloud with ssh access where you can view see the crash:

ssh -X ubuntu@150.242.43.127
Password: EoFvo2chVpoH

For the Ubuntu package Thunderbird 45.3:

`thunderbird Patch\ to\ pass\ a\ Windows\ RODC\ test.eml`

For an unpacked version of https://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-central/thunderbird-52.0a1.en-US.linux-x86_64.tar.bz2:

`./thunderbird/thunderbird Patch\ to\ pass\ a\ Windows\ RODC\ test.eml`
I notice the daily crash has MimeInlineText_parse_eof in its backtrace, so it does seem to be a multipart issue like you mentioned.
(In reply to Garming Sam from comment #8)
> I notice the daily crash has MimeInlineText_parse_eof in its backtrace, so
> it does seem to be a multipart issue like you mentioned.

indeed, bp-1d50af95-18c6-424f-9221-eb29a2161018 as that frame

thanks so much for the information in comment 7. In my case, I got 
* using File | Open Saved Message  ShouldMarkCrossCompartment in bp-191ceba9-2dd9-4e55-973b-a32082161018
* thunderbird Patch...   jemalloc_crash | arena_salloc | je_realloc | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | mozilla::dom::workers::WorkerPrivate::AddHolder bp-cb6094aa-9d72-48c0-873f-c5d202161018

Now, neither of these have MimeInlineText_parse_eof or anything to do with mime on the stack. But it reliably crashes and I think you are on to something.

Hopefully jorgk, kent or magnus can suggest a next step
Blocks: 1016524
Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: IMAP
Ever confirmed: true
Keywords: testcase
Product: Thunderbird → MailNews Core
Version: 45 Branch → 45
I had to use mobaxterm on Windows. After logging in I did 
  ./thunderbird/thunderbird Patch  (just use escape to let it autocomplete the file name)
Keywords: reproducible
Looking at the stack, I don't see how mime would really be the cause no. A string bug perhaps?
Really need to run this in an ASAN build (which I don't think we have for Thunderbird) or under Valgrind. Short of that, do any interesting assertions pop up on a debug build? Calling this sec-moderate because of the user-interaction required: memory corruption is bad but there isn't much chance for an attacker to groom memory in this scenario.
Keywords: sec-moderate
(In reply to Magnus Melin from comment #11)
> Looking at the stack, I don't see how mime would really be the cause no. A
> string bug perhaps?

could be.

my two crashes were
 ShouldMarkCrossCompartment   bp-191ceba9-2dd9-4e55-973b-a32082161018
 jemalloc_crash | arena_salloc | je_realloc | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | mozilla::dom::workers::WorkerPrivate::AddHolder bp-cb6094aa-9d72-48c0-873f-c5d202161018
> user-interaction required

I am not sure if that is exactly right. The bug can be triggered if you are sent this email.

I guess the extent to which you can groom memory depends on the extent to which this particular attachment is uniquely special rather than being a representative of a large class.

These are my crashes:

bp-7b79df9d-df4e-41ad-9c03-6607d2161017
bp-3dcc8257-9445-4bd9-ae9a-a5b7f2161017
bp-611599a1-fc68-4dfd-a15a-545792161017
bp-776e51ba-7bce-411f-98c0-2f34f2160920
bp-250b0f26-5941-47ff-80ed-b33822160920
(In reply to Daniel Veditz [:dveditz] from comment #12)
> user-interaction required
I would be just a little skeptical about this. There are particular cases where I've seen the number of unread junk mails surreptitiously go down one (although I guess that might be the use of an unrelated bug/feature). I would hope that the junk mail filter wouldn't trigger this reading through body text, but I suppose that also requires at least some configuration too.

If it is user-interaction required, I think it's quite fair to say it's quite a limited user-interaction. You don't necessarily need to click the email to view it, e.g. deleting the email above or possibly switching mail folders in some situations. It's also quite easy to look innocuous because you only have the sender and subject to rely on. 

I've now done some additional testing, and it does not seem to be the case that you need the attachment (so you couldn't simply think that any email with an attachment is not ok). You can send the patch contents inline and trigger the same crash, which appears to indicate a string bug. Simply writing a draft message with the contents, and click Save over and over again appears to be enough to trigger the crash. As long as Thunderbird is searching the message, or indexing it, I think this will probably cause an issue.

Regardless, I believe that how severe the issue depends mostly on how much the patch contents influences the crashes, not on how easy it is to trigger the crash (because it seems fairly trivial, and not outside the typical workflow of a user). If the contents do not have much sway, then I would easily concede that user interaction is a clear prerequisite.
Playing around with rr, Robert O'Callahan's reversible debugger, I've managed to isolate a cause from a particular trace with a null pointer fault.

Program received signal SIGSEGV, Segmentation fault.
0x00007feb82585f9a in NS_LogCOMPtrRelease (aCOMPtr=0x7feb523cf3c0, aObject=0x7feb51c3b900)
    at /home/garming/comm-central/mozilla/xpcom/base/nsTraceRefcnt.cpp:1262
1262	  void* object = dynamic_cast<void*>(aObject);
(rr) up
#1  0x00007feb8303b74a in nsCOMPtr<nsIContent>::~nsCOMPtr (this=0x7feb523cf3c0, __in_chrg=<optimized out>)
    at /home/garming/comm-central/obj-x86_64-pc-linux-gnu/dist/include/nsCOMPtr.h:402
402	    NSCAP_LOG_RELEASE(this, mRawPtr);

(rr) watch -l mRawPtr
Hardware watchpoint 1: -location mRawPtr
(rr) reverse-cont
Continuing.

Hardware watchpoint 1: -location mRawPtr

Old value = (nsIContent *) 0x7feb51c3b900
New value = (nsIContent *) 0x7feb51c3b9c0
nsldapi_hex_unescape (s=0x7feb523cf3c6 "", 
    s@entry=0x7feb523cf3a0 "gar-mas-0-0.", '\344' <repeats 14 times>, "\300\271\303Q\353\177\300\271\303Q\353\177")
    at /home/garming/comm-central/ldap/c-sdk/libraries/libldap/unescape.c:71
71		*p = '\0';
(rr) up
#1  0x00007feb905543ca in nsldapi_url_parse (url=0x7feb51c6be5f "gar-mas-0-0.%s/%s", ludpp=ludpp@entry=0x7ffe2bf46d98, 
    dn_required=dn_required@entry=1) at /home/garming/comm-central/ldap/c-sdk/libraries/libldap/url.c:246
246			nsldapi_hex_unescape( ludp->lud_host );
(rr) up
#2  0x00007feb905546cb in ldap_url_parse (url=<optimized out>, ludpp=0x7ffe2bf46d98)
    at /home/garming/comm-central/ldap/c-sdk/libraries/libldap/url.c:153
153		if (( rc = nsldapi_url_parse( url, ludpp, 1 )) == 0 ) {

It seems that the usage of the ldap url, particularly with the %s is the issue. Receiving an email from my colleague Douglas with only 'ldap://gar-mas-0-0.%s/%s' in the body triggers the fault.

According to the debugger, the pointer was somehow modified in ldap_url_parse, specially with the hex unescape routine (and possibly the null termination). It seems quite likely to be some form of length mismatch.
Attached file crash.eml
A much smaller email that causes the crash.
Attached file ldap_url_backtrace
Attached is a longer backtrace of the ldap_url_parse.
There are a few problems in ldap/c-sdk/libraries/libldap/unescape.c

Firstly, unhex:

static int
unhex( char c )
{
	return( c >= '0' && c <= '9' ? c - '0'
	    : c >= 'A' && c <= 'F' ? c - 'A' + 10
	    : c - 'a' + 10 );
}

will treat an unknown character as lying in the [a-f] range. 

Secondly, and this is the worst:

void
nsldapi_hex_unescape( char *s )
{
/*
 * Remove URL hex escapes from s... done in place.  The basic concept for
 * this routine is borrowed from the WWW library HTUnEscape() routine.
 */
	char	*p;

	for ( p = s; *s != '\0'; ++s ) {                /* 10 */
		if ( *s == '%' ) {                      /* 11 */
			if ( *++s != '\0' ) {           
				*p = unhex( *s ) << 4;  /* 13 */
			}
			if ( *++s != '\0' ) {           /* 15 */
				*p++ += unhex( *s );
			}
		} else {
			*p++ = *s;
		}
	}

	*p = '\0';
}

suppose (as is actually the case here) the url snippet is "%s". 

At line 10, *s is '%', we go into the loop.
At line 11, we take the unescape branch. *s is still '%'.
At line 12, we move *s to 's',
At line 13, we decode 's' as 448. We stick 448 into the byte, which truncates it to 192 (or -64, if the char is signed).
At line 15, we move *s to '\0', meaning we *don't* do line 16 (which in itself it would be harmless).
We return to the top of the loop.
At line 10, we ++s, so *s is now beyond the string.
Still at line 10, we compare *s to '\0'...

Interestingly enough, the first time this happens for the host part of the url, and we know that the dn part of the url lies just beyond, so we actually know that *s now points to another "%s" (because the full url is "%s/%s", which has been mangled to "%s\0%s".

The second time this is run, it is for the dn part of the url, which in this case was "%s", but will been changed ("decoded") by the host part overrunning. (I think -- I haven't actually followed it that far).







At line 14
I can write a patch tomorrow.
(In reply to Douglas Bagnall from comment #20)
> I can write a patch tomorrow.

That would be great. I can use your email and trace into the LDAP url routine, but I don't get a crash. Happy to look at a patch though.
Attached patch 20615.patchSplinter Review
This should fix it. It compiles, and we haven't crashed.

I found a similar thing in mozilla-central (Bug 1313244).
You'll want to request review - I'm not sure who that might be. Perhaps magnus can suggest.
(probably none of the Thunderbird folk can see Bug 1313244)
Crash Signature: [@ ShouldMarkCrossCompartment ] [@ js::UncheckedUnwrap ] [@ jemalloc_crash | arena_malloc | je_malloc | nsStringBuffer::Alloc ] [@ jemalloc_crash | arena_salloc | je_realloc | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | mozilla::dom::workers::…
From our analysis of the above bug, the usual scenario would appear to be either a crash or displaying an LDAP url with internal memory encoded into it (which when clicked could send this information). In terms of write capability, it's limited to hex unescaping adjacent memory, which either shifts memory across or inserts null bytes.
Comment on attachment 8804945 [details] [diff] [review]
20615.patch

Looks good to me.
Attachment #8804945 - Flags: review?(rkent)
Attachment #8804945 - Flags: feedback+
Please don't commit this until Bug 1313244 is resolved.

That refers to the same problem in Core/XPCOM, and the patches ought to be coordinated. The error seems to have been copied more widely.
Comment on attachment 8804945 [details] [diff] [review]
20615.patch

Review of attachment 8804945 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me too, thanks for the patch!

"Please don't commit this until Bug 1313244 is resolved.

That refers to the same problem in Core/XPCOM, and the patches ought to be coordinated. The error seems to have been copied more widely."

I don't have access to that bug, I suspect neither does Jorg. He will probably coordinate landing this, so could you please cc him on that bug?
Attachment #8804945 - Flags: review?(rkent) → review+
Douglas, thanks for granting us access to the other bug, looks familiar ;-) Please post a comment here when you want me to land this.
Whiteboard: DO NOT LAND THIS YET - see comment #26 and #27
OK, Core/XPCOM (Bug 1313244) is not affected after all.

Nevertheless, please hold off for a bit because Red Hat are wanting to fix 389-dsgw and Illumos should do something.
Bug 1313244 was closed as invalid. So where does that leave us here?
We have this bug in illumos (www.illumos.org) and I'd like to apply the patch there.
However before I do that, I'll need an illumos.org bug, and that will need to say _something_ about this problem.  So:

What can I say about this without causing problems for others?
How long does this information need to remain private?

Thanks,
Gordon Ross
Flags: needinfo?(douglas.bagnall)
(In reply to gordon.w.ross from comment #31)
> What can I say about this without causing problems for others?
> How long does this information need to remain private?

I don't know what kind of schedule the Thunderbird people have.

We're also waiting for Red Hat who have the same issue in 389-dsgw:

https://fedorahosted.org/389/browser/389-dsgw/fileurl.c#L231

They have responded positively but not definitively with regard to a timeframe.
(In reply to Jorg K (GMT+1) from comment #30)
> Bug 1313244 was closed as invalid. So where does that leave us here?

This bug is still real, producing a reliable segfault on (at least) Ubuntu Thunderbird 45.4.0 is the user looks at an email containing an offending ldap url. 

Bug 1313244 referred to code in core/XPCOM which looked similar but was protected by an earlier step. Its brief but distracting appearance is entirely my fault as I didn't read the code properly.
So now we have Illumos/OmniOS, Thunderbird, and 389-dsgw subscribed to this bug. As far as I can tell you guys have the largest and most alive projects with affected libldap forks. (It seems there are a few hundred more on github, but many of those are snapshot-forks or dead. Others have discovered and patched the bug independently).

Could you sort out the schedule amongst yourselves?
The way it works in illumos is typically we tell our security & distro contacts first, then push the fix to illumos-gate, cut our patches & make them available, THEN tell people about it.  That order might trip up some others here.  AFAIC, I could push this fix quickly (this week), and then get things out.  The prototype bug summary I have is "LDAP unescape needs to be stricter.", and typically that bug report will have little/none filled in.

I'm more worried about Thunderbird and 389-dsgw than myself.
Flags: sec-bounty?
huzaifas, is redhat ready to go?

danmcd, what about illumos?
Flags: needinfo?(huzaifas)
Flags: needinfo?(douglas.bagnall)
Flags: needinfo?(danmcd)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #36)
> huzaifas, is redhat ready to go?
> 
> danmcd, what about illumos?

Hi,

There are several other software pieces which use this particular implementation of hex_unescape() and may be affected. Clearly in each case the impact depends on what kind of data is passed to it. The above being said, while we are ok with fixing this issue publicly, i think doing this just before christmas is bad timing. I would personally like to do this after the holidays, so others have better time to respond.

Also it would be a good idea to post this to distros list (for the benefit of others), which also makes sense after the holidays.
Flags: needinfo?(huzaifas)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #36)
> huzaifas, is redhat ready to go?
> 
> danmcd, what about illumos?

I can likely have this ready in 24-48 hours. Good enough?
Flags: needinfo?(danmcd)
Perhaps we should try to set a target date.

We could further delay 45.6.0 to the week after Christmas, which isn't stretch given we don't yet have a fix for Bug 1323107.

Or wait 5-6 weeks for 45.7.0 in the vicinity of 2017-01-23, ref https://wiki.mozilla.org/RapidRelease/Calendar



Yet another issue, for Kent/jorg, is the question of risk and order/logistics of landing the patch on trunk, and branches, and version 45 and
Flags: needinfo?(rkent)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #39)
> Perhaps we should try to set a target date.
> 
> We could further delay 45.6.0 to the week after Christmas, which isn't
> stretch given we don't yet have a fix for Bug 1323107.
> 
I already see THUNDERBIRD_45_6_0_RELEASE and THUNDERBIRD_45_6_0_BUILD1  tags in the repo, so maybe fixing this in the next cycle makes better sense imo.
> I already see THUNDERBIRD_45_6_0_RELEASE and THUNDERBIRD_45_6_0_BUILD1  tags in the repo, so maybe fixing this in the next cycle makes better sense imo.

There is a big question mark at the moment as to whether that can ship this week.  A rebuild is not out of the question.
(I thought I answered the NI but I don't see it, must have forgot to save).

I suppose Wayne's question is whether I feel comfortable shipping the patch without prior end-user testing on other channels. That changes are very local, which means we can review that change without having to worry about unforessen interactions with other parts of the code. So yes I am comfortable landing this on esr45 without a prior landing on beta or aurora.

As to timing, I don't see anything that critical in Thunderbird 45.6.0, so once we can build it again it would be good to get this patch in, even if that means waiting a few days.
45.7.0 build should happen this week.


(In reply to Huzaifa Sidhpurwala from comment #37)
> ...
> Also it would be a good idea to post this to distros list (for the benefit
> of others), which also makes sense after the holidays.

Has this been done?  If not who does it?
Flags: needinfo?(huzaifas)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #43)
> 45.7.0 build should happen this week.
> 
> 
> (In reply to Huzaifa Sidhpurwala from comment #37)
> > ...
> > Also it would be a good idea to post this to distros list (for the benefit
> > of others), which also makes sense after the holidays.
> 
> Has this been done?  If not who does it?

I can do this today, if everyone is ok, what un-embargo date do you want to set? (1 week from today should be good, distros does not like long embargoes)
Flags: needinfo?(huzaifas)
Is 20615.patch still the canonical fix for this?  If so, I can push it into illumos-gate whenever.
Attachment #8804945 - Flags: approval-comm-esr45+
(In reply to danmcd from comment #45)
> Is 20615.patch still the canonical fix for this?

As far as I know.

Note that it doesn't address one bug described in comment #19, namely that a string with invalid escape characters will be interpreted as a valid escape (for example "%gg" will be read as '\0'). That is probably not serious as it doesn't change the range of the output string (i.e. you can also get '\0' with "%00"), and has been around for so long that somebody somewhere is probably depending on it.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Whiteboard: DO NOT LAND THIS YET - see comment #26 and #27
Target Milestone: --- → Thunderbird 54.0
Version: 45 → Trunk
Attachment #8804945 - Flags: approval-comm-beta?
Attachment #8804945 - Flags: approval-comm-beta+
Attachment #8804945 - Flags: approval-comm-aurora?
Attachment #8804945 - Flags: approval-comm-aurora+
Pardon my ignorance of the Mozilla source line, does this fix force an update of NSS or NSPR?
Group: mail-core-security → core-security-release
(In reply to danmcd from comment #49)
> Pardon my ignorance of the Mozilla source line, does this fix force an
> update of NSS or NSPR?

Never mind.  I found out myself.  "No".
Flags: sec-bounty? → sec-bounty+
If I understand comment #47 and comment #48 correctly, the patch is out for Thunderbird.

I assume that means Redhat and Illumos should stop waiting for any embargo, if that is what they are doing, and this bug can lose its secret flag.
Flags: needinfo?(huzaifas)
Flags: needinfo?(danmcd)
Can we make it lose its "secret" flag?
Flags: needinfo?(danmcd)
Flags: needinfo?(huzaifas)
Assignee: nobody → douglas.bagnall
Component: Networking: IMAP → LDAP Integration
You need to log in before you can comment on or make changes to this bug.