Closed Bug 1653884 Opened 4 years ago Closed 4 years ago

Crash in [@ Botan::`anonymous namespace'::RSA_Verify_Operation::verify_mr]

Categories

(MailNews Core :: Security: OpenPGP, defect, P1)

x86
Windows 10

Tracking

(thunderbird_esr78+ fixed, thunderbird82 fixed)

RESOLVED FIXED
83 Branch
Tracking Status
thunderbird_esr78 + fixed
thunderbird82 --- fixed

People

(Reporter: wsmwk, Assigned: rjl)

References

Details

(Keywords: crash, topcrash-thunderbird)

Crash Data

Attachments

(3 files)

This bug is for crash report bp-f777a3ac-fd81-4d38-988d-da04b0200718.

Top 10 frames of crashing thread:

0 kernelbase.dll RaiseException 
1 rnp.dll _CxxThrowException /builds/worker/workspace/obj-build/comm/third_party/rnp/f:/dd/vctools/crt/vcruntime/src/eh/throw.cpp:129
2 rnp.dll Botan::`anonymous namespace'::RSA_Verify_Operation::verify_mr comm/third_party/botan/src/lib/pubkey/rsa/rsa.cpp:603
3 rnp.dll Botan::PK_Ops::Verification_with_EMSA::is_valid_signature comm/third_party/botan/src/lib/pubkey/pk_ops.cpp:123
4 rnp.dll Botan::PK_Verifier::verify_message comm/third_party/botan/src/lib/pubkey/pubkey.cpp:328
5 rnp.dll Botan::KeyPair::signature_consistency_check comm/third_party/botan/src/lib/pubkey/keypair/keypair.cpp:77
6 rnp.dll Botan::RSA_PrivateKey::check_key const comm/third_party/botan/src/lib/pubkey/rsa/rsa.cpp:350
7 rnp.dll std::_Func_impl_no_alloc<`lambda at /builds/worker/workspace/obj-build/comm/third_party/botan/build/include/botan/internal/ffi_util.h:89:38', int>::_Do_call 
8 rnp.dll Botan_FFI::ffi_guard_thunk comm/third_party/botan/src/lib/ffi/ffi.cpp:93
9 rnp.dll Botan_FFI::apply_fn<Botan::Private_Key, 2140551262, `lambda at /builds/worker/checkouts/gecko/comm/third_party/botan/src/lib/ffi/ffi_pkey.cpp:148:11'> /builds/worker/workspace/obj-build/comm/third_party/botan/build/include/botan/internal/ffi_util.h:89

Nickolay, could you please change RNP to fail gracefully (with an error result) if you get an exception from Botan?

Do you understand why key generation is failing?

Flags: needinfo?(o.nickolay)

An Botan exception should NEVER trigger a crash.

All exceptions inside of RNP should be catched on FFI boundary, and error code must be returned instead.
Will try to reproduce the issue, it looks like definitely Botan's one (as it fails on validation of just-generated key).

Flags: needinfo?(o.nickolay)

Another example CxxThrowException | Botan::`anonymous namespace'::RSA_Verify_Operation::verify_mr bp-cd9232e5-02f9-4b5d-adf0-c64180200911

Flags: needinfo?(o.nickolay)

This was discussed here:
https://github.com/rnpgp/rnp/issues/1287

Botan and RNP both handle exceptions on FFI boundary, so most likely it is some sort of memory corruption.

Flags: needinfo?(o.nickolay)

Kai, the combined number of crashes of the two signatures now makes this a topcrash for 78.2.2 - approximately in the top 30.

Lev, can you provide Kai with a reproducible testcase? If so, you can describe it here, and if there is an example file please email it to him

Severity: -- → S2
Crash Signature: [@ Botan::`anonymous namespace'::RSA_Verify_Operation::verify_mr] → [@ Botan::`anonymous namespace'::RSA_Verify_Operation::verify_mr] [@ CxxThrowException | Botan::`anonymous namespace'::RSA_Verify_Operation::verify_mr ]
Flags: needinfo?(kaie)
Flags: needinfo?(blacklion)

I'll try to make keyring without any private keys which leads to crash

Kai, I've sent you GPG keyring which leads to crash.
Crash id is bp-a022cb5d-c3b6-4488-b3b2-487d60200920

Flags: needinfo?(blacklion)

Is a workaround possible?

Flags: needinfo?(o.nickolay)

I could try to write bisect script which will create keyring without half the keys, try to import it, rinse-n-repeat, to find key which cause crash. But not right now...
And, of course, it is possible to delete all public keys from keyring and start life with a clean slate :) All public keys must be on key servers!

I cannot reproduce the crash, I tried win32, win64, linux32, linux64, all work for me.

Lev, I wonder if the crash happens while importing your secret key, because you said it crashes shortly after you enter the password to unlock it.

You could try to backup your gnupg directory, then temporarily move the public key ring file away, so that only the secret keys are in your gnupg directory. Then start the migration. Does it crash?

Flags: needinfo?(kaie)

Well, it seems obvious that it's about a private key, when looking at the crash stack...

Wayne, does it look like all reported crashes are on Windows with a 32bit executable?

(In reply to Wayne Mery (:wsmwk) from comment #12)

Is a workaround possible?

We don't know what's happening.

Flags: needinfo?(o.nickolay)

Kai, crash happens in the middle of importing process, much later, than it asks for PK password. I
I've checked, that crash repeats when there is no PK in directory, exactly with same data as I sent you. Crash bp-a022cb5d-c3b6-4488-b3b2-487d60200920 is WITHOUT PK.
I'll try to import ONLY PK tomorrow, but, again, I've checked, that sent data cause crash, at least on my computer.
Maybe, I need to cleanup leftovers from previous trys somehow?
Maybe, I could run some "debug" build for more comprehensive diagnostics?

(PK is "private key" in previous comment, I've realized, that it could be Public key as well)

Also, I could try to remove public part of my key from keyring too, as now Thunderbird shows error about it if private key is not found (instead request for password).

For example, logging each key ID BEFORE import could be useful, as we will know key which leads to crash.

Good idea. You can use config editor to set preference extensions.enigmail.logDirectory to a directory of your choice to enable logging of the Enigmail migrator Add-on. In that directory a file enigdbug.txt will be created.
With that, I get log lines like this:
2020-09-23 10:11:26.481 [DEBUG] setupWizard2.js: importKeys: pubFile: /tmp/enig-exp/0x....asc
where ... is the key ID

If it crashes, I wonder if the newest ID will be flushed to file prior to the crash.
I can create a test version of that Add-on that adds flushing after each write to the logfile.

Lev, you can save this to file, then use add-on manager, settings wheel icon, and select import from file, then pick this one. Restart TB to be sure it's picked up. Then start migration again from the profile. It might be even slower than before, because of the many writes to file. But if you crash, the last line in enigdbug.txt will hopefully refer to the key that causes the crash.

Crashed with test version of Enigmail migration plugin.
Crash ID bp-90a2eae8-f830-4717-8c51-f5f230200923

The last line referred to key 7420DF86BCE15A458DCE997639278DA8109E6244.

Can you try to
gpg --export-keys --armor 7420DF86BCE15A458DCE997639278DA8109E6244 > key-7420DF86BCE15A458DCE997639278DA8109E6244.asc
with Thunderbird OpenPGP key manager, use file open public key, select that file.

Does that crash for you?

(I still don't crash.)

It's interesting that your public keyring is 19 MB in total, and just this one is 1.5 MB big.
It has 3296 signatures from other keys, and 1 signature is reported by gnupg as "bad".

Finally, I can reproduce the crash.

With my local Windows optimized build, we get an exception in botan rsa.cpp:

      BigInt public_op(const BigInt& m) const
         {
         if(m >= m_public->get_n())
-->         throw Invalid_Argument("RSA public op - input is too large");

         return m_public->public_op(m);
         }

The opt build doesn't have the variable contents, they are optimized away.
I will create a local debug build in the hope it will tell us more details.

... still experimenting with the opt build (debug build takes time)...

This is very confusing.

I have wrapped the above statement like this:

  try {
         if(m >= m_public->get_n())
-->         throw Invalid_Argument("RSA public op - input is too large");
         }
  } catch (Exception &e) {
    BigInt none; return none;
  } catch (...) {
    BigInt none; return none;
  }

Regardless, the Visual Studio debugger reports that the "throw" is unhandled.
I don't understand how this is possible.

Kai, as from discussion here: https://github.com/rnpgp/rnp/issues/1287#issuecomment-684971878, it looks like some sort of memory corruption.
After this call Botan's FFI wraps all exception, and afterwards our FFI also handles all exceptions.

I'm in the MSVS debugger, and I see ffi_guard_thunk is in the call stack a few levels above.

But the runtime code doesn't go there. Instead, it reports an unhandled exception and stops.
The exception handling doesn't work for some reason.

(In reply to Nickolay Olshevsky from comment #29)

Kai, as from discussion here: https://github.com/rnpgp/rnp/issues/1287#issuecomment-684971878, it looks like some sort of memory corruption.

I'm not yet convinced it's memory corruption.

I've changed RSA_Public_Operation::public_op() to return an empty BigInt on failure (instead of throwing), in the hope that this causes the operation to fail. (At least it continues.)

With this change, I'm able to successfull import the public key. And I'm able to perform lots of functionality afterwards. If we had a memory corruption, I'd expect that I'd soon crash afterwards when performing other operations. But that doesn't happen, no crash.

Maybe it is memory corruption. Maybe memory corruption causes exception handling to no work. But I think it's possible that simply the exception handling is broken or disabled or overridden in some way, that we cannot think of yet. Maybe as a consequence of using the Thunderbird build environment.

I've just learned that we haven't ran the botan/rnp tests suites using the binaries that are produced as part of the Thunderbird build.

I'd like to try building botan/rnp with their usual approach, to check if that fixes the crash.

Yes, there are some keys of FreeBSD security officers, for example, which have a tons of signatures.

OpenPGP Key Manager -> Import Public Key -> CRASH bp-77341c71-544e-4a8f-8872-504170200923

But at least I could remove this key from keyring and try again! :-)

And, I think, this key could be added as testcase into testsuite :-)

Thank you for all your work, and good luck with fixing this bug. If I could provide any additional information - let me know!

Nickolay, Rob, I think I have confirmed that my theory was correct. The crash is caused by non-working exception handling.

When building RNP and Botan as part of the Thunderbird build system, I'm guessing that some of the build flags disable the C++ exception handling system in a way that breaks Botan's and RNP's expectations.

I've manually created alternative binaries using the original library build rules, cross compiling on Fedora using MinGW for Windows.
With those binaries, it works fine, no crash.

Priority: -- → P1
Assignee: nobody → rob

In Windows distributions of Thunderbird, the directory that contains thunderbird.exe also contains a file rnp.dll - this is a file that bundles multiple libraries, RNP, Botan and json-c. That file is currently created as part of the Thunderbird build system.

I've manually created alternative binaries, that don't crash.
If you want to test them, I've uploaded them to my personal server.

Version information:
https://kuix.de/mozilla/openpgp/info-libs-20200923.txt

Windows 32 bit:
https://kuix.de/mozilla/openpgp/rnp-botan-32-libs-20200923.zip
https://kuix.de/mozilla/openpgp/rnp-botan-32-libs-20200923.zip.asc

Windows 64 bit:
https://kuix.de/mozilla/openpgp/rnp-botan-64-libs-20200923.zip
https://kuix.de/mozilla/openpgp/rnp-botan-64-libs-20200923.zip.asc

Files with extension .asc are detached signatures, to show they were really made by me
(key fingerprint 21D16E67E18398C8DA9DDF2E1C27423725007724).

Lev, if you want to test:

Find the directory that contains your thunderbird.exe
Rename file rnp.dll to a different filename, for example orig-rnp.dll
Download the archive that matches your Thunderbird binary, either 32 or 64 bit.
Extract the archive.
Copy the individual files to the thunderbird directory.
Run Thunderbird

(In reply to Kai Engert (:KaiE:) from comment #35)

Lev, if you want to test:
Yes, it works, thank you!

Clang-cl/MSVC default exceptions are SEH style and do not
propagate the same as standard c++ exceptions that RNP is
expecting. The /EHs flag sets the correct behavior.

I ran Botan's configure.py with MSVC set as the compiler and compared the compile flags set there against what is set in the moz.build files. The one that stood out is /EHs, which in MSVC allows for throwing structured exceptions.

https://docs.microsoft.com/en-us/cpp/build/reference/eh-exception-handling-model?view=vs-2017

The compiler always generates code that supports asynchronous structured exception handling (SEH). By default (that is, if no /EHsc, /EHs, or
/EHa option is specified), the compiler supports SEH handlers in the native C++ catch(...) clause. However, it also generates code that only partially supports C++ exceptions. The default exception unwinding code doesn't destroy automatic C++ objects outside of try blocks that go out of scope because of an exception. Resource leaks and undefined behavior may result when a C++ exception is thrown.

There's a lot there that I don't follow, but I understood that the default mode may result in undefined behavior when a c++ exception is thrown.

So the fix is change -Xclang -fcxx-exceptions to /EHs. I built locally with this and verified that there is no crash on trunk or on 78.2.0 (that's just what was on my Windows VM already).

Try build (trunk):
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=653f4dc92d67b8ed6b032163c7f0e326caa20e0d

Using /EHs fixes the issue in my local build.

Thanks for the explanation, didn't know about this key/difference in exception handling (oh, those happy times when I worked with single compiler for single OS..).
Some offtopic: Did you have a chance to test whether latest RNP is build ok under Windows/MSVC (without MinGW)?

(In reply to Nickolay Olshevsky from comment #40)

Thanks for the explanation, didn't know about this key/difference in exception handling (oh, those happy times when I worked with single compiler for single OS..).
Some offtopic: Did you have a chance to test whether latest RNP is build ok under Windows/MSVC (without MinGW)?

Not sure what you're asking?
Does it build on a Windows host?
I did not check that, but when this lands in comm-central (probably later today) we have a win64 build job that builds on a Windows host to catch those kinds of problems.

BTW, Thunderbird updated itself to 78.3.0 release, restored rpn.dll in process and crashes at start after that. As I missed update, it was scary expirience :)
Replacing rpn.dll again with special build from this bug helps.

(In reply to Rob Lemley [:rjl] from comment #41)

(In reply to Nickolay Olshevsky from comment #40)

Some offtopic: Did you have a chance to test whether latest RNP is build ok under Windows/MSVC (without MinGW)?

Not sure what you're asking?

Nickolay, we only used MinGW for my experimental build.

Usually the automated builds use clang-cl, cross-compiling to Windows.

We don't use the MSVC compiler, IIUC, as Rob has explained here:
https://github.com/rnpgp/rnp/issues/1287#issuecomment-685067756

We link to MSVC libraries, but we don't use MSVC compiler for building.

Regarding "latest RNP", we haven't tested newer builds than the 2020-09-13 yet.

I'll land this.

Pushed by kaie@kuix.de:
https://hg.mozilla.org/comm-central/rev/41caf6adc1ed
Use standard c++ exception handling in Botan/RNP. r=kaie

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

Comment on attachment 9177536 [details]
Bug 1653884 - Use standard c++ exception handling in Botan/RNP. r=kaie

[Approval Request Comment]
Regression caused by (bug #): none
User impact if declined: users crash when working with correspondents OpenPGP keys
Testing completed (on c-c, etc.): manually
Risk to taking this patch (and alternatives if risky): very little risk, build flags now aligned with upstream

Attachment #9177536 - Flags: approval-comm-esr78?
Attachment #9177536 - Flags: approval-comm-beta?

Comment on attachment 9177536 [details]
Bug 1653884 - Use standard c++ exception handling in Botan/RNP. r=kaie

[Triage Comment]
Approved by wsmwk via Matrix for 82.0b1

Attachment #9177536 - Flags: approval-comm-beta? → approval-comm-beta+

Rob, I meant that now it should be possible to build rnp on Windows natively, without MinGW/MSYS2.

Comment on attachment 9177536 [details]
Bug 1653884 - Use standard c++ exception handling in Botan/RNP. r=kaie

[Triage Comment]
Approved for esr78

Attachment #9177536 - Flags: approval-comm-esr78? → approval-comm-esr78+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: