Closed Bug 1677338 (CVE-2020-26970) Opened 2 years ago Closed 2 years ago

Stack smashing in nsSmtpProtocol.cpp.

Categories

(Thunderbird :: Security, defect)

defect

Tracking

(thunderbird_esr78+ fixed)

RESOLVED FIXED
84 Branch
Tracking Status
thunderbird_esr78 + fixed

People

(Reporter: ishikawa, Assigned: ishikawa)

Details

(Keywords: sec-high)

Attachments

(2 files)

This is a serious security bug. It is not a drill.

While I ran xpcshell tests with ASAN version of DEBUG C-C TB
binary, I found a serious problem.
There is a memory overwrite that smashes stack one way or the other.

Luckily the culprit is identified and it is a single line of code.
But the security implication is serious.

Output of grep -i sanitizer: log1274-xpcshell.txt 

 0:08.69 pid:346406 ==346406==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdb325bb40 at pc 0x7f8c8bad9daa bp 0x7ffdb325b770 sp 0x7ffdb325b768
 0:09.88 pid:346406 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 0:47.54 pid:346883 ==346883==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd70313e40 at pc 0x7f5c8e195daa bp 0x7ffd70313a70 sp 0x7ffd70313a68
 0:48.73 pid:346883 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 1:02.46 pid:347101 ==347101==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffcfb9efa80 at pc 0x7f3463413daa bp 0x7ffcfb9ef6b0 sp 0x7ffcfb9ef6a8
 1:03.62 pid:347101 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 1:04.67 pid:347128 ==347128==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdec4320a0 at pc 0x7effbe12bdaa bp 0x7ffdec431cd0 sp 0x7ffdec431cc8
 1:05.83 pid:347128 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 1:08.26 pid:347177 ==347177==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd44e37f20 at pc 0x7f1f37688daa bp 0x7ffd44e37b50 sp 0x7ffd44e37b48
 1:09.45 pid:347177 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 1:10.50 pid:347203 ==347203==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe09142780 at pc 0x7ff5e7023daa bp 0x7ffe091423b0 sp 0x7ffe091423a8
 1:11.70 pid:347203 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 1:12.75 pid:347229 ==347229==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffddf1f1f60 at pc 0x7f7076971daa bp 0x7ffddf1f1b90 sp 0x7ffddf1f1b88
 1:14.05 pid:347229 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 1:33.80 pid:347484 ==347484==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffcf66e0a40 at pc 0x7f2494993daa bp 0x7ffcf66e0670 sp 0x7ffcf66e0668
 1:34.98 pid:347484 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 2:16.33 pid:347965 ==347965==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdbc966340 at pc 0x7fb7ef7badaa bp 0x7ffdbc965f70 sp 0x7ffdbc965f68
 2:17.50 pid:347965 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 2:32.50 pid:348179 ==348179==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffde9061480 at pc 0x7f2fe1807daa bp 0x7ffde90610b0 sp 0x7ffde90610a8
 2:33.70 pid:348179 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 2:34.84 pid:348206 ==348206==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff5d1317e0 at pc 0x7f339d7e3daa bp 0x7fff5d131410 sp 0x7fff5d131408
 2:36.03 pid:348206 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 2:38.70 pid:348254 ==348254==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe44b93d20 at pc 0x7ff301338daa bp 0x7ffe44b93950 sp 0x7ffe44b93948
 2:39.87 pid:348254 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 2:41.03 pid:348280 ==348280==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd0075b2c0 at pc 0x7f981c7b4daa bp 0x7ffd0075aef0 sp 0x7ffd0075aee8
 2:42.20 pid:348280 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt
 2:43.35 pid:348306 ==348306==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffd3723520 at pc 0x7f553b191daa bp 0x7fffd3723150 sp 0x7fffd3723148
 2:44.65 pid:348306 SUMMARY: AddressSanitizer: stack-buffer-overflow /NEW-SSD/NREF-COMM-CENTRAL/mozilla/nsprpub/pr/src/io/prscanf.c:288:51 in GetInt

I am attaching an excerpt from the xpcshelltes log with ASAN version of C-C TB with DEBUG flag.

Problem occurs here.
https://searchfox.org/comm-central/source/mailnews/compose/src/nsSmtpProtocol.cpp#563

     char codeClass, codeSubject, codeDetail;
---> if (PR_sscanf(line + chars_read, "%1u.%1u.%1u ", &codeClass, &codeSubject,
                  &codeDetail) == 3)
      m_responseCodeEnhanced = codeClass * 100 + codeSubject * 10 + codeDetail;

Note that |codeClass|, |codeSubject|, |codeDetail| are declared as |char| type.
However, the specifier has 'u' and not 'c'.
The intention seems to read a number to the varialbles.

This is the klux of the problem.

As I see it, PR_sscanf() processing of '%1u' is going through the hoops, but
eventually the scan function assumes that value's width is tagged as
(without specification such as 'h','l', 'll', 'L', etc.)

  state->sizeSpec = _PR_size_none; 

( https://searchfox.org/comm-central/source/mozilla/nsprpub/pr/src/io/prscanf.c#556 )

But then after the value is scanned and computed (for, %1u, that is %u), it is written back to the pointer of the variable that receives the value with the following statement.
This is in the call chain of Convert() -> GetInt().
https://searchfox.org/comm-central/source/mozilla/nsprpub/pr/src/io/prscanf.c#287

                unsigned long lval = strtoul(buf, NULL, base);

--->            if (state->sizeSpec == _PR_size_none) {
                    *va_arg(state->ap, PRUintn *) = lval;

PRUintn is unsigned int according to the following.:
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PRUintn
And it is indeed 4 bytes according to attached ASAN log.

So WE ARE WRITING 4 (FOUR) BYTES INTO CHARACTER VARIABLES ON THE STACK (!)
ASAN aborts the program on the first write, but all the three variables have memory overwrite issues.
Thus the code stomps on whatever memory value out there on the stack. Grrr...

DoS possibility.

I think this could have been exploited by malicious SMTP server to cause at least a denial of service attack on some unfortunate TB binaries (this depends on how the auto variables are placed on the stack. The layout differs from a compiler to the other, and optimization-level and debug flag may affect the layout and code. So it is hard to say definitely.)

Serious Exploit?

Whether this could have been used to exploit unsuspecting user's computer, I am not sure. But again it is up to the particular compiler and the particular optimization and debug level, and CPU-depdendent... There may have been unfortunate TB binaries.

Still this is a very serious bug.

I don't know why my earlier valgrind run of xpcshell tests did not reveal this bug. (Maybe, just maybe, the particular tests did not run under valgrind because of the timeout that are caused by x10-20 slowdown due to valgrind. I need to figure out how to extend xpctest timeout for valgrind run. At least, I should be able to reproduce the issue under valgrind with proper setting of timeout for the tests that triggered the issue reported here.)

Getting this fixed is the first priority, but I have a proposal.

I understand ASAN binary of TB is created for nightly these days.

Suggestion/Proposal

On the tryserver, we should run a mochitest and xpcshell test using the ASAN binary at least once a month for C-C, and/or as new nightly appears.
For running xpcshell tests, we should add "--serialize" and "--verbose" flag.
This is how I run my local memory tests of C-C TB.
Otherwise, the error messages may not be visible at all and the error messages get mixed up among different processes and hard to analyze.

I would also suggest a monthly valgrind run.
But, I have not been able to run C-C TB mochitest under valgrind at all since we switched to mochitest from |make mozmill| test suite. Tough.
Also, I have no idea why, but valgrind cannot run the big TB binary as ordinary user anymore on my Debian GNU/Linux OS. It seems that Valgrind+TB fails when the stack needs to be extended (i.e., new page needs to be mapped for the stack.).
Valgrind+C-C TB does run under superuser. However, mochitest has its own problem and I never get past the initial window or anything. I set the time out to more than 30 minutes, but eventually, the timeout is triggered before no tests are performed with mochitest.
|make mozmll| ran fine after my tweaking of timeout values...
xpcshelltests do run and I could find a few memory-related errors.

Simply declaring the three variables as "unsigned int" should fix the issue.
I am checking the fix locally now.

The simple fix solves the issue.
I am a bit surprised that this bug persisted for a long time (not sure how long, though.)

Assignee: nobody → ishikawa
Attachment #9187896 - Flags: review?(mkmelin+mozilla)
Attachment #9187896 - Flags: review?(mkmelin+mozilla) → review?(benc)
Status: NEW → ASSIGNED
Target Milestone: --- → 84 Branch

I am a bit surprised that this bug persisted for a long time (not sure how long, though.)

My theory as to why this bug hasn't caused more chaos is that the three char variables on the stack stomp over each other, but in order. So, by the time they've all been scanned, they all contain the right values. And the following three bytes on the stack will also have been blatted. But if the next variable on the stack isn't a char, it'll probably be shifted to the next 8-byte boundary (or 4-byte boundary? I forget the rules for padding on 32bit vs 64bit architectures...).
So those three extra destroyed bytes will just be harmless padding.
Would also have different (likely really bad and incorrect) results on big-endian architecture.

Doesn't make the bug any better, but explains why we might not have crashed every time we sent an email :-)

Comment on attachment 9187896 [details]
A fix. Verified by running ASAN version locally.

This looks like the right fix to me.

I don't think it's a bug in PR_sscanf(). My understanding is:
In the format string "%1u", the '1' refers to the max width of the input field (in characters) rather than any assumption about the destination storage. The 'u' requires a full unsigned int, even if you're only reading in a single digit. So PR_sscanf() is right to be storing out 32bit values (or even 64bit, depending on architecture int size).

Attachment #9187896 - Flags: review?(benc) → review+

Also, I think the parsing is wrong, and should handle 3-digit codes for the 2nd and third fields... see Bug 1677405.

(In reply to Ben Campbell from comment #3)

I am a bit surprised that this bug persisted for a long time (not sure how long, though.)

My theory as to why this bug hasn't caused more chaos is that the three char variables on the stack stomp over each other, but in order.

To be honest, the compiler can do anything to store the character variables as it desires, just like CLANG compiler does for ASAN compilation.
As Ben noted the direction of stack growth is also important.
So it is anyone's guess which version of TB was affected in the past (depending on compiler version, optimization flag, and used CPU, of course).

[I am afraid that the particular version of valgrind on my local PC is a bit buggy. It could not detect the problem somehow. I am looking into it.]

So, by the time they've all been scanned, they all contain the right values. And the following three bytes on the stack will also have been blatted.
But if the next variable on the stack isn't a char, it'll probably be shifted to the next 8-byte boundary (or 4-byte boundary? I forget the rules for padding on 32bit vs 64bit architectures...).
So those three extra destroyed bytes will just be harmless padding.

There is no guarantee that the three variables are stored on the stack in the mentioned order. (It could be reversed. Anything goes.)

So if this is how the char variables are stored back to back, we have a problem.

low <--- address ---> high
|codeDetail| codeSubject|codeClass|???|???...

codeClass is set by PR_sscanf().
But it is trashed when codeSubject is set.
Both codeSubject and codeClass are trashed when codeDetail is set.
(Oh, this is a scenario on a little-endian CPU.)

Please note that |???|???...| can contain either frame pointer or stored return PC or something related to function call mechanism on a CPU where stack grows downward. (Do we have canary for stack protection depending on compiler used?).
This part can be trashed when codeClass is set.
So there IS a chance either the frame pointer or return PC could be affected.
Of course, this is again CPU-dependent (whether byte-addressable or not also counts.)

Would also have different (likely really bad and incorrect) results on big-endian architecture.
Right.

Doesn't make the bug any better, but explains why we might not have crashed every time we sent an email :-)

I hope not. :-)
If it did, we would have noticed this bug earlier earlier.
The problem could have been a silent weird behavior, etc. which no one could explain.

Anyway, I am glad I could address this issue thanks to ASAN.

(In reply to Ben Campbell from comment #4)

Comment on attachment 9187896 [details]
A fix. Verified by running ASAN version locally.

This looks like the right fix to me.

I don't think it's a bug in PR_sscanf(). My understanding is:
In the format string "%1u", the '1' refers to the max width of the input field (in characters) rather than any assumption about the destination storage. The 'u' requires a full unsigned int, even if you're only reading in a single digit. So PR_sscanf() is right to be storing out 32bit values (or even 64bit, depending on architecture int size).

Thank you for the review. So how do we go about landing this?
Do we need to set checkin-needed-tb keyword or something?

I'll land it for today's nightly.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

(In reply to Magnus Melin [:mkmelin] from comment #9)

https://hg.mozilla.org/comm-central/rev/b7906efbd35b5e80179b6ed1433f1187931eec7b

Thank you for landing.

Regarding my proposal of running tests on tryserver with ASAN binary, where should I raise the issue?
I sent an e-mail to Maildev mailing list, but I am not sure whether that is enough.

As far as I know, ASAN-binary of C-C TB has been created for nightly, correct?
Or maybe I should create a new bugzilla for this?

BTW, I feel a bit awkward regarding the following:

https://firefox-source-docs.mozilla.org/bug-mgmt/processes/security-approval.html

Should we not patch the current ESR, too, with a minor update?
Maybe I misunderstood.

Status: NEW → ASSIGNED
status-thunderbird_esr78: --- → affected
tracking-thunderbird_esr78: --- → + <==== This means we will patch ESR eventually(?)
Target Milestone: --- → 84 Branch

For asan, Rob already replied there, I don't think anything more is needed.

Yes, we should fix this for 78 too, but let's have it on nightly/beta some days first.

Comment on attachment 9187896 [details]
A fix. Verified by running ASAN version locally.

[Approval Request Comment]
Potential sec issue, with a bit of limited vector: it would need to be your own SMTP attacking you.

Attachment #9187896 - Flags: approval-comm-esr78?

(In reply to Magnus Melin [:mkmelin] from comment #12)

For asan, Rob already replied there, I don't think anything more is needed.

Yes, we should fix this for 78 too, but let's have it on nightly/beta some days first.

Thank you. Got it.

Group: mail-core-security → core-security-release

Comment on attachment 9187896 [details]
A fix. Verified by running ASAN version locally.

[Triage Comment]
Approved for esr78

Kaie, this will need an advisory for 78.5.1

Flags: needinfo?(kaie)
Attachment #9187896 - Flags: approval-comm-esr78? → approval-comm-esr78+

Tom, could you please assign a CVE? Thanks in advance

Flags: needinfo?(tom)
Alias: CVE-2020-26970
Flags: needinfo?(tom)

(In reply to Magnus Melin [:mkmelin] from comment #13)

Comment on attachment 9187896 [details]
A fix. Verified by running ASAN version locally.

[Approval Request Comment]
Potential sec issue, with a bit of limited vector: it would need to be your own SMTP attacking you.

Magnus,
It seems that people's usual SMTP servers attack the users In some countries.
Sigh...
Oh well.
It is not great to think that the e-mail suddenly starts to behave erratically in times of emergency (civil unrest, etc.).
However, such disruption of communication channel has been a fact of life, though, over the last centuries.
Anyway, I am pleased to think it has landed and I have one less thing to worry about.

Sure, but if an SMTP server wants to DOS attack one of its users, it seems easier to just not let them log in. Same effect.

(In reply to Magnus Melin [:mkmelin] from comment #19)

Sure, but if an SMTP server wants to DOS attack one of its users, it seems easier to just not let them log in. Same effect.

What worried me was more sinister, i.e. the stack smashing could do something funny, but luckily, the format specifier was "u" and not 8 octets long even in 64-bit binary and so we were safe in that regard. Or are we? Who knows? I had never expected that a freed area previously obtained by malloc() could be abused to invoke a routine at an (semi-) arbitrary address until it was shown how to do it cicra 2000.

I am still puzzled why valgrind could not detect the issue, but I have been hampered by the failure to compile C-C TB with GCC, my choice of compiler on local PC.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98000

Summarizing the severity of this issue:

The bug causes attacker controlled values to be written to the call stack, which could lead to the execution of attacker controlled code.

This sounds like sec-high.

Flags: needinfo?(kaie)
Keywords: sec-high

drive-by comment as a Firefox security person:
Attackers have written code execution exploits with similarly small overwrite capabilities. I agree with Kai's interpretation and suggest treating this as a very serious issue indeed.

(In reply to Kai Engert (:KaiE:) from comment #21)

Summarizing the severity of this issue:

The bug causes attacker controlled values to be written to the call stack, which could lead to the execution of attacker controlled code.

This sounds like sec-high.

Thank you for your comment. The reason I used "Stack Smashing" in the title is exactly because I feared that bad guys can concoct something sinister for a particular binary on a particular CPU compiled by a certain version of a certain compiler. That may be only a portion of the overall TB users, but if you have 10 million users all over the world, that can be a lot.
It is not a matter of simple DoS. It may run a malware-supplied script or something. That was what I was worried about.
An organization with a big pocket may be able to do this. That is the reality in today's world.
(See Blackhat and other similar-minded conferences or publications. There are full of clever exploits and I suspect many of us come out very depressed after learning the exploits.)

One reason I try to run tests under valgrind after my patches are applied locally is to avoid such serious security issues.
(One of these days ASAN build is useful in the face of non-testability of mochitest under valgrind.)

Again, I am not sure why I had not caught this much sooner with valgrind. But that's another story.

(In reply to Frederik Braun [:freddy] from comment #22)

drive-by comment as a Firefox security person:
Attackers have written code execution exploits with similarly small overwrite capabilities. I agree with Kai's interpretation and suggest treating this as a very serious issue indeed.

I agree. That is why I wrote at the beginning of my original post.:

This is a serious security bug. It is not a drill.

I feel reading a few articles presented at BlackHat conferences of the last dozen years or so should be mandatory
for programmers. Of course, there are chances that some students are sucked into the netherworld from there, but
I think the general population ought to be aware of the exploits used by Blackhats and should feel wary of certain class of bugs so that ICT industry shapes up as a whole. TB or FF alone cannot cope with the issue adequately. Libraries such as gibc, compiler people such as GCC and clang , middleware people inclucing X-related gui software packages, OS people, and debug tools people (ASAN, valgrind, etc.) need to get involved.

Anything that smashes data on stack really needs to be handled with serious suspicion.
Blackhats are known to spend copious time on such issues and come up with wonderful exploits.

(In reply to Magnus Melin [:mkmelin] from comment #25)

https://hg.mozilla.org/comm-central/rev/93e224ddb6efa1c5d9b45ce6cd71e83403b216b2 (clang format this)

Magnus, are you sure this piece of change belongs to this bug?
The code does not seem to be related to the section of the source that the patch in this bugzilla modified (?)

Not sure, but maybe there had been a clang-format parameter change that would
invoke the rearrangement of some lines in mailnews/compose/src/nsSmtpProtocol.cpp if clang-format is applied.
But since that part is not related to this bugzilla entry, I am a bit confused.

I have a feeling that we may need to run clang-format on the source tree from time to time.: there have been cases I noticed somewhat unexpected changes (or unchanges, so to speak) in the source code which I thought went through formatting months ago.

Looks like it was unrelated, sorry. There was a a few formatting changes needed in mailnews so I tried to splice them up. Might have been due to clang-formatting upgrade then. Anyway, mailnews is once again up to speed.

(In reply to Magnus Melin [:mkmelin] from comment #27)

Looks like it was unrelated, sorry. There was a a few formatting changes needed in mailnews so I tried to splice them up. Might have been due to clang-formatting upgrade then. Anyway, mailnews is once again up to speed.

Thank you for taking care of keeping source tree in good shape.

TB needs more people like you.

Thank you and have a peaceful festive season.

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.