Closed Bug 1300152 Opened 8 years ago Closed 7 years ago

Add nsIDebug2::rustPanic to allow triggering Rust panic for testing

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed
firefox55 --- fixed

People

(Reporter: ted, Assigned: ted)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

For testing purposes it would be really useful to be able to trigger a crash in Rust code. I think the only meaningful way to do this is to provide a function in the Rust code in libxul that will crash on-demand, and then expose it via XPCOM. nsIDebug2 already has abort, so that seems like an OK place to hang it off of.
Comment on attachment 8787692 [details]
bug 1300152 - add rust to mozinfo.json.

https://reviewboard.mozilla.org/r/76384/#review74472

::: python/mozbuild/mozbuild/mozinfo.py:85
(Diff revision 1)
> -    d['official'] = bool(substs.get('MOZILLA_OFFICIAL'))
> -    d['sm_promise'] = bool(substs.get('SPIDERMONKEY_PROMISE'))
>  
> +    # This is where all bool-like values go.
> +    for info_name, subst_name in [
> +            ('debug', 'MOZ_DEBUG'),

nit: Are these supposed to be indented only 1 more level instead of 2 more levels?
Attachment #8787692 - Flags: review?(mshal) → review+
Comment on attachment 8787691 [details]
bug 1300152 - Add nsIDebug2::rustPanic to allow triggering Rust panic for testing.

https://reviewboard.mozilla.org/r/76382/#review74824

WFM.  Your call on whether to add my suggestion to `test_crash_rust_panic.js`.

::: toolkit/crashreporter/test/unit/test_crash_rust_panic.js:8
(Diff revision 1)
>    do_crash(function() {
> -             crashType = CrashTestUtils.CRASH_MOZ_CRASH;
> +             Components.classes["@mozilla.org/xpcom/debug;1"].getService(Components.interfaces.nsIDebug2).rustPanic("OH NO");
> -             crashReporter.annotateCrashReport("TestKey", "TestValue");
>             },
>             function(mdump, extra) {
> -             do_check_eq(extra.TestKey, "TestValue");
> +             //TODO: check some extra things?

Can we do something like:

```
var did_crash = false;
do_crash(..., function(mdump, extra) { did_crash = true; /* TODO more checks */ ... });

do_check_true(did_crash);
```

I know we guard this to only execute on Rust-enabled builds, but it seems like we ought to be checking *something* here.
Attachment #8787691 - Flags: review?(nfroyd) → review+
Comment on attachment 8787691 [details]
bug 1300152 - Add nsIDebug2::rustPanic to allow triggering Rust panic for testing.

https://reviewboard.mozilla.org/r/76382/#review74824

> Can we do something like:
> 
> ```
> var did_crash = false;
> do_crash(..., function(mdump, extra) { did_crash = true; /* TODO more checks */ ... });
> 
> do_check_true(did_crash);
> ```
> 
> I know we guard this to only execute on Rust-enabled builds, but it seems like we ought to be checking *something* here.

The machinery of do_crash does check that we produced a minidump:
https://dxr.mozilla.org/mozilla-central/rev/dbe4b47941c7b3d6298a0ead5e40dd828096c808/toolkit/crashreporter/test/unit/head_crashreporter.js#88

The callback is for more specific checks. If we add an annotation with the Rust panic message, for example, we could check it here.
Ah, excellent!  That's what I get for not poking into the test machinery too deeply.
Comment on attachment 8787692 [details]
bug 1300152 - add rust to mozinfo.json.

https://reviewboard.mozilla.org/r/76384/#review74472

> nit: Are these supposed to be indented only 1 more level instead of 2 more levels?

This is the indentation that Emacs wants to give it. I'm not really sure what the proper indentation is here!
The mozinfo.py change means that it's no longer possible to disable one of these flags by setting them to 0, since bool('0') is True. 

Was that intentional? If yes, what's the correct way to unset such a flag? The change has caused some problems with e.g.
https://dxr.mozilla.org/comm-central/source/mail/config/mozconfigs/linux32/nightly#18
Flags: needinfo?(ted)
Perhaps we should explain why the Thunderbird developers like Aleth and myself are coming to this bug. This is causing a massive lot of oranges since all the test which are marked "run-if = addon_signing",
for example:
https://dxr.mozilla.org/comm-central/rev/77940cbf0c2a9f52c209fbbde5b2e7d4c74a1501/mozilla/toolkit/mozapps/extensions/test/xpcshell/xpcshell-shared.ini#252
are now running and failing in our Xpcshell test suite.
Backed out for permafailing test_crash_rust_panic.js on Windows 8 x64 pgo:

https://hg.mozilla.org/integration/mozilla-inbound/rev/dc0b1953a7f1b649309a5a370c3b0f845578a45b
https://hg.mozilla.org/integration/mozilla-inbound/rev/2ed4423183bde2957b5c12fb5f92b82723fb0aa8

Log of first occurrence of failure: https://treeherder.mozilla.org/logviewer.html#?job_id=35499035&repo=mozilla-inbound

17:01:49  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit/test_crash_rust_panic.js | xpcshell return code: 0
17:01:49     INFO -  TEST-INFO took 350ms
17:01:49     INFO -  >>>>>>>
17:01:49     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
17:01:49     INFO -  PROCESS | 1164 | thread '<unnamed>' panicked at 'OH NO', C:\\builds\\moz2_slave\\m-in-w64-pgo-00000000000000000\\build\\src\\toolkit\\library\\rust\\lib.rs:13
17:01:49     INFO -  PROCESS | 1164 | note: Run with `RUST_BACKTRACE=1` for a backtrace.
17:01:49     INFO -  No minidump found!
17:01:49     INFO -  C:/slave/test/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/head_crashreporter.js:handleMinidump:102
17:01:49     INFO -  C:/slave/test/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/head_crashreporter.js:do_crash:85
17:01:49     INFO -  C:/slave/test/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/test_crash_rust_panic.js:run_test:4
17:01:49     INFO -  C:\slave\test\build\tests\xpcshell\head.js:_execute_test:535
17:01:49     INFO -  -e:null:1
17:01:49     INFO -  exiting test
17:01:49     INFO -  <<<<<<<
(In reply to Jorg K (GMT+2, PTO during summer) from comment #13)
> Perhaps we should explain why the Thunderbird developers like Aleth and
> myself are coming to this bug. This is causing a massive lot of oranges
> since all the test which are marked "run-if = addon_signing",
> for example:
> https://dxr.mozilla.org/comm-central/rev/
> 77940cbf0c2a9f52c209fbbde5b2e7d4c74a1501/mozilla/toolkit/mozapps/extensions/
> test/xpcshell/xpcshell-shared.ini#252
> are now running and failing in our Xpcshell test suite.

Gotcha. Since this needs a fix to re-land anyway, I will fix the mozinfo bits to do `substs.get(var) == '1'` instead. We have historically treated most configure variables as "unset == false, set == true", but that hasn't been strictly enforced in most cases.

I'll have to look into the Win64 PGO failure here to see what's going on there.
Flags: needinfo?(ted)
Depends on: 1301751
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Turns out it was failing on Win64 opt as well, so that's good to know, it's not PGO-only:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&selectedJob=35621773
You do have a knack for the incomprehensible.

We've been having 10.10 opt e10s browser-chrome and devtools shutdown crashes without symbols for a couple of days, https://treeherder.mozilla.org/logviewer.html#?job_id=4927217&repo=mozilla-central https://treeherder.mozilla.org/logviewer.html#?job_id=4927223&repo=mozilla-central

Tonight I spent the hours of retriggering to track them down, only to find I was too late: treeherder requires some staring at to see it, since you landed on top of bustage and then we backed you out on top of bustage, but https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&tochange=8c9117de1e7f40af42b7cbce25bc3780c032fe45&fromchange=b0c4d7caec6f3941a1d273c82b090153ee278760&filter-searchStr=10.10%20opt%20e10s%20browser-chrome&group_state=expanded&selectedJob=35647916 is your landing where several of the also-permafailing bc3 runs and one bc5 run had the [@ XUL + 0x3e4d140] crash, and nothing before did, and then https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=10.10%20opt%20e10s%20browser-chrome&tochange=dc0b1953a7f1b649309a5a370c3b0f845578a45b&fromchange=bb033983fe8d5128bc3e01df3b56974898e25c57&selectedJob=35648880 is the backout, with several instances on pushes before the backout and then despite the tons of orange on the backout, none of it is that crash. What there is a lot of, though, is An Clue: you didn't cause a new shutdown crash, what you did was turn the too-frequent browser-chrome and devtools shutdown crash of bug 1294009 into a symbol-free crash. So, don't break our shutdown crash symbols next time, kthx :)
So if I read comment #18 correctly, this bug shouldn't have been backed out. We'd appreciate it however if it could be relanded *modified* as per comment #16 ;-)
No, comment 18 is another reason it should have been backed out: not only did it fail its own test frequently on Win8 and permanently on Win8 PGO, but it also somehow caused an existing shutdown crash to not have symbols in the stack, making it unrecognizable.
Sorry, misunderstood. Thanks for explaining it.
(In reply to Phil Ringnalda (:philor) from comment #18)
> You do have a knack for the incomprehensible.
> 
> We've been having 10.10 opt e10s browser-chrome and devtools shutdown
> crashes without symbols for a couple of days,

Just to close the loop here, this is tracked in bug 1301751. Apparently failing to dump symbols in that particular way does not cause the build to fail, which is exciting.

The Win8 test failures would have been easier to notice if we didn't have those tests disabled-by-default on try. :-(
Depends on: 1302078
No longer blocks: 1268328
Blocks: 1348896
Attachment #8787692 - Attachment is obsolete: true
Comment on attachment 8787691 [details]
bug 1300152 - Add nsIDebug2::rustPanic to allow triggering Rust panic for testing.

I'd like to re-land this. I was able to ditch the mozinfo patch because --disable-rust went away. Pretty minor tweaks here, with the exception of the Rust code moving from toolkit/library/rust/lib.rs to toolkit/library/rust/shared/lib.rs.

I've marked the test as failing on win64 until we get that other bug sorted out.
Attachment #8787691 - Flags: review+ → review?(nfroyd)
Comment on attachment 8787691 [details]
bug 1300152 - Add nsIDebug2::rustPanic to allow triggering Rust panic for testing.

https://reviewboard.mozilla.org/r/76382/#review124482

::: toolkit/library/rust/shared/lib.rs:19
(Diff revision 2)
> +use std::ffi::CStr;
> +use std::os::raw::c_char;
> +
> +/// Used to implement `nsIDebug2::RustPanic` for testing purposes.
> +#[no_mangle]
> +pub extern fn intentional_panic(message: *const c_char) {

I'm sad that we have to export this from libxul, but I don't see a good way to avoid doing that, short of hacky linker scripts or something.

Please make this `pub extern "C" fn`, just to be explicit.
Attachment #8787691 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/40eb77a98bcfbf70cffac94581b0d4ad869b3ded
bug 1300152 - Add nsIDebug2::rustPanic to allow triggering Rust panic for testing. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/dbe7d041a445ab9d0aca611cf84d49ff7df73654
bug 1300152 - Add nsIDebug2::rustPanic to allow triggering Rust panic for testing. r=froydnj
I mentioned this in bug 1275780, but I had put `fails-if` in the xpcshell.init, but the thing I needed was `fail-if`. I fixed that and re-landed.
Flags: needinfo?(ted)
https://hg.mozilla.org/mozilla-central/rev/dbe7d041a445
Status: REOPENED → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → FIXED
This can be triggered in a Nightly build by entering

> Cc["@mozilla.org/xpcom/debug;1"].getService(Ci.nsIDebug2).rustPanic("crash");

in the Browser Console. To enable the input box in the Browser Console menu, open the Web Console (Hamburger menu -> Web Developer -> Web Console or cmd/ctrl+shift+K) Then click the gear icon in the upper right of the devtools pane, and check "enable browser chrome and add-on debugging toolboxes" under Advanced Preferences. Then open the Browser Console and paste the above line into the newly-enabled input box.

Note that this input box has interface-level privileges, which are required to access the crash method. It can also do a lot of other nasty things, so don't paste code from untrusted sources there. This is why the input box is disabled by default.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: