Open Bug 1853913 Opened 1 year ago Updated 6 days ago

Perma macOS 11 shippable toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | xpcshell return code: 0

Categories

(Toolkit :: Crash Reporting, defect, P5)

Tracking

()

REOPENED
Tracking Status
firefox-esr115 --- unaffected
firefox118 --- unaffected
firefox119 --- wontfix
firefox120 --- wontfix

People

(Reporter: intermittent-bug-filer, Assigned: haik, NeedInfo)

References

(Regression)

Details

(Keywords: intermittent-failure, intermittent-testcase, regression)

Attachments

(1 file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=429668109&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/BA1rYtpSTJO1BXCM7KpoaA/runs/0/artifacts/public/logs/live_backing.log


[task 2023-09-19T12:29:43.469Z] 12:29:43     INFO -  TEST-START | toolkit/crashreporter/test/unit_ipc/test_content_phc3.js
[task 2023-09-19T12:29:43.758Z] 12:29:43  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | xpcshell return code: 0
[task 2023-09-19T12:29:43.758Z] 12:29:43     INFO -  TEST-INFO took 289ms

This started to fail with this merge https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=0a60f8be5517ddc928471d4a378b46228773fbc5
and fails only on mozilla-central.
Could be from https://hg.mozilla.org/mozilla-central/rev/8e3c8eba5d70b0f35e69af8b01b083ce4b75f246 or maybe https://hg.mozilla.org/mozilla-central/rev/330a48f1b6ad46d41bf24c85a6cc8aeb2d4621b5
Gabriele, could you have a look over it? Thank you.

Full list of test failing is this one:

7278	12:29:43 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | xpcshell return code: 0
7348	12:29:44 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc2.js | xpcshell return code: 0
7418	12:29:44 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc.js | xpcshell return code: 0
7548	12:29:44 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_annotation.js | xpcshell return code: 0
7630	12:29:45 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_large_annotation.js | xpcshell return code: 0
7700	12:29:45 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_exception_time_annotation.js | xpcshell return code: 0
7770	12:29:45 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_oom_annotation.js | xpcshell return code: 0
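For anyone triaging a similar log, the list above can be pulled out mechanically. The following is a minimal sketch that assumes only the `TEST-UNEXPECTED-FAIL | <test> | <reason>` line shape quoted in this comment; the function name `failing_tests` is made up for illustration and is not part of any Mozilla tooling.

```python
import re

# Matches the harness failure lines quoted above. The leading log-viewer
# line number and the timestamp are ignored because search() is unanchored.
FAIL_RE = re.compile(r"TEST-UNEXPECTED-FAIL \| (?P<test>\S+) \| (?P<reason>.+)$")


def failing_tests(log_lines):
    """Return the unique test paths reported as unexpected failures, in order."""
    seen = []
    for line in log_lines:
        match = FAIL_RE.search(line)
        if match and match.group("test") not in seen:
            seen.append(match.group("test"))
    return seen


# Sample lines copied from the log excerpts in this bug.
sample = [
    "7278\t12:29:43 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | xpcshell return code: 0",
    "12:29:43     INFO -  TEST-INFO took 289ms",
    "7348\t12:29:44 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc2.js | xpcshell return code: 0",
]

for test in failing_tests(sample):
    print(test)
```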
Flags: needinfo?(gsvelto)
Summary: Perma WebRender Shippable toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | single tracking bug → Perma macOS 11 shippable toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | single tracking bug
See Also: → 1853914
Summary: Perma macOS 11 shippable toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | single tracking bug → Perma macOS 11 shippable toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | xpcshell return code: 0

I suspect this was caused by bug 1593072. This line in the log is the smoking gun:

INFO - PID 7676 | JavaScript error: resource://test/CrashTestUtils.sys.mjs, line 53: Error: couldn't open library /opt/worker/tasks/task_169512460876493/build/tests/xpcshell/tests/toolkit/crashreporter/test/libtestcrasher.dylib: dlopen(/opt/worker/tasks/task_169512460876493/build/tests/xpcshell/tests/toolkit/crashreporter/test/libtestcrasher.dylib, 2): no suitable image found. Did find:

Haik, we build a special library to crash the xpcshell tests; it's called testcrasher and built here. Do I need to change it in some way to accommodate the new entitlements?

Flags: needinfo?(gsvelto)

Forgot the NI? see my question in comment 3.

Flags: needinfo?(haftandilian)

(In reply to Gabriele Svelto [:gsvelto] from comment #3)

I suspect this was caused by bug 1593072. This line in the log is the smoking gun:

INFO - PID 7676 | JavaScript error: resource://test/CrashTestUtils.sys.mjs, line 53: Error: couldn't open library /opt/worker/tasks/task_169512460876493/build/tests/xpcshell/tests/toolkit/crashreporter/test/libtestcrasher.dylib: dlopen(/opt/worker/tasks/task_169512460876493/build/tests/xpcshell/tests/toolkit/crashreporter/test/libtestcrasher.dylib, 2): no suitable image found. Did find:

Haik, we build a special library to crash the xpcshell tests; it's called testcrasher and built here. Do I need to change it in some way to accommodate the new entitlements?

Thanks for the background. This definitely looks like a regression from the entitlement fixes, and the fix is probably going to be to change how the executable doing the loading is signed.

With bug 1593072, most of our executables are not permitted to load third party libraries, but that restriction is not compatible with how we self-sign non-production try builds so we relax it on those builds. That is not happening here so we need to fix that. I'll take the bug and look into this some more.

In the full log, this part gives it away: "not valid for use in process using Library Validation: mapped file has no Team ID and is not a platform binary (signed with custom identity or adhoc?)".
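The two log fragments quoted in this thread (the `dlopen` failure and the Library Validation reason) can be detected together when scanning a full log. This is a rough sketch, not an existing tool: the `diagnose` helper and the `/hypothetical/path` prefix are assumptions for illustration, and the sample text is stitched together from only the fragments quoted here.

```python
import re

# Pattern for the CrashTestUtils error quoted in this bug.
DLOPEN_RE = re.compile(r"couldn't open library (?P<path>\S+): dlopen\(")
# Reason string quoted from the full log.
LIBRARY_VALIDATION_MARKER = "not valid for use in process using Library Validation"


def diagnose(log_text):
    """Summarize dlopen failures and whether Library Validation is mentioned."""
    libraries = sorted({m.group("path") for m in DLOPEN_RE.finditer(log_text)})
    return {
        "failed_libraries": libraries,
        "library_validation": LIBRARY_VALIDATION_MARKER in log_text,
    }


# Hypothetical, shortened path; the real worker path is much longer.
sample_log = (
    "INFO - PID 7676 | JavaScript error: resource://test/CrashTestUtils.sys.mjs, "
    "line 53: Error: couldn't open library /hypothetical/path/libtestcrasher.dylib: "
    "dlopen(/hypothetical/path/libtestcrasher.dylib, 2): no suitable image found.  Did find:\n"
    "not valid for use in process using Library Validation: mapped file has no "
    "Team ID and is not a platform binary (signed with custom identity or adhoc?)\n"
)

print(diagnose(sample_log))
```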

Assignee: nobody → haftandilian
Flags: needinfo?(haftandilian)
Keywords: regression
Regressed by: 1593072

Set release status flags based on info from the regressing bug 1593072

The problem here is that when the test is run on the official mozilla-central repo, it is run with our production cert and entitlements which prevent the parent process firefox executable and plugin-container executables from loading dylibs not signed by Mozilla or Apple.

The xpcshell executable being used by the test is not signed and therefore not subject to the entitlements, but xpcshell spawns a plugin-container executable which loads libtestcrasher.dylib and since libtestcrasher.dylib is not signed, the load is not permitted by macOS.

This test should pass for try pushes because those builds use our developer entitlements, which permit loading of third party libraries by plugin-container to work around other issues with signing.
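The three cases described above (unsigned xpcshell, production-signed plugin-container, and try builds with developer entitlements) can be captured in a toy model. This is a deliberately simplified sketch of the behavior described in this comment, not Apple's actual Library Validation implementation; the class names, the `may_load` function, and the `EXAMPLETEAM` team ID are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    team_id: Optional[str] = None   # None models an ad hoc signature with no Team ID
    platform_binary: bool = False   # True models a binary shipped by Apple


@dataclass(frozen=True)
class Process:
    name: str
    signature: Optional[Signature]  # None models an unsigned executable
    library_validation: bool        # enforced by production entitlements in this model


def may_load(process: Process, library: Optional[Signature]) -> bool:
    """Simplified rule: a validating process only loads same-team or Apple libraries."""
    if process.signature is None or not process.library_validation:
        return True
    if library is None:
        return False
    same_team = library.team_id is not None and library.team_id == process.signature.team_id
    return library.platform_binary or same_team


mozilla = Signature(team_id="EXAMPLETEAM")  # hypothetical team ID
xpcshell = Process("xpcshell", None, False)
plugin_container_central = Process("plugin-container", mozilla, True)
plugin_container_try = Process("plugin-container", Signature(), False)
unsigned_testcrasher = None

for proc in (xpcshell, plugin_container_central, plugin_container_try):
    print(proc.name, proc.library_validation, may_load(proc, unsigned_testcrasher))
```

In this model, signing libtestcrasher.dylib with the same team ID flips the mozilla-central case to allowed.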

Solutions I can think of (from least to most hacky):

  1. :gsvelto mentioned that this test could be refactored to not use libtestcrasher.dylib. That seems like the best approach.

  2. Have our signing infrastructure sign libtestcrasher.dylib which would allow it to be loaded by plugin-container on production builds.

  3. Tests that need to inject a dylib on production builds could first strip off the signature and entitlements from executables so that injection can work. (This can be done with the codesign command.) The downside of this is that the tests would not be as representative of real world use because the entitlements might have other impacts affecting test execution.

I'll look into these options.
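As a rough illustration of option 3, the sketch below only builds and prints `codesign` command lines; it does not run them. It assumes the standard `--remove-signature`, `--force`, `--sign -` (ad hoc) and `--entitlements` options of the macOS `codesign` tool. The helper names, file paths, and entitlements file name are hypothetical, and nothing here establishes that this would work inside the xpcshell harness.

```python
import shlex


def remove_signature_commands(executables):
    """Build one `codesign --remove-signature` invocation per executable."""
    return [["codesign", "--remove-signature", path] for path in executables]


def adhoc_sign_commands(paths, entitlements=None):
    """Build ad hoc re-signing invocations, optionally with an entitlements file."""
    commands = []
    for path in paths:
        cmd = ["codesign", "--force", "--sign", "-"]
        if entitlements is not None:
            cmd += ["--entitlements", entitlements]
        cmd.append(path)
        commands.append(cmd)
    return commands


# Hypothetical paths, for illustration only.
targets = ["dist/bin/plugin-container", "dist/bin/firefox"]
plan = remove_signature_commands(targets) + adhoc_sign_commands(
    targets, entitlements="developer.entitlements.xml"
)

for cmd in plan:
    print(shlex.join(cmd))
```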

Sorry we didn't catch this before landing bug 1593072. We tested with production certs, but didn't run this test.

Flags: needinfo?(haftandilian)

Haik, any updates on this one? Since it's failing permanently, should we consider disabling these tests on macOS 11?

Flags: needinfo?(haftandilian)
Flags: needinfo?(haftandilian)

@Cosmin, with the workaround fix just landed for bug 1856972, the test should stop failing. Eventually we'll have to revisit this problem and update the test. Once it is confirmed that the test is passing again, we can close this bug.

Flags: needinfo?(haftandilian)

Still failing after those changes reached central: https://treeherder.mozilla.org/jobs?repo=mozilla-central&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&searchStr=os%2Cx%2C11%2Cship%2Cxpc&revision=6769216667ff487b1442f654ee41dc5f12c23983&selectedTaskRun=APaw57pLTzaeIm8LBedSOg.0
List of tests:

7912	06:47:18 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit/test_oom_annotation.js | xpcshell return code: 0
7975	06:47:18 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit/test_crashreporter_crash.js | xpcshell return code: 0
8033	06:47:18 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_annotation.js | xpcshell return code: 0
8108	06:47:18 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_large_annotation.js | xpcshell return code: 0
8171	06:47:19 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_exception_time_annotation.js | xpcshell return code: 0
8234	06:47:19 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_oom_annotation.js | xpcshell return code: 0


6233	06:46:28 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc.js | xpcshell return code: 0
6296	06:46:28 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc2.js | xpcshell return code: 0
6359	06:46:28 WARNING - TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit_ipc/test_content_phc3.js | xpcshell return code: 0
Flags: needinfo?(haftandilian)

(In reply to Haik Aftandilian [:haik] from comment #11)

@Cosmin, with the workaround fix just landed for bug 1856972, the test should stop failing. Eventually we'll have to revisit this problem and update the test. Once it is confirmed that the test is passing again, we can close this bug.

Ah, thanks. My comment was incorrect. The fix for bug 1856972 affected the parent process executable only and this test is failing because of the plugin-container child process entitlements (per comment 7). I would rather not disable the test because it should still be passing for try pushes. If we could disable it only for mozilla-central pushes, that could be a good short term workaround. I will try to get to this next week.

Duplicate of this bug: 1859041

You're right. Since it does not affect try pushes, we will leave it as is until a fix is in place. Thanks for looking into it.

Duplicate of this bug: 1853914

I had thought it might be reasonable to add a workaround for this particular test involving stripping the signatures from our executables and re-signing in a way that allows loading of libtestcrasher.dylib. But after looking at xpcshell tests more, I don't see a way to do that in a minimal way.

@gsvelto, you had mentioned possibly rewriting this test to not depend on loading a dylib. Did you have a method in mind? It sounds like that would mean adding the crashing code in shipping builds.

Alternatively, we could have libtestcrasher.dylib be signed. @Heitor, could you comment on how complex that would be?

Flags: needinfo?(hneiva)
Flags: needinfo?(haftandilian)
Flags: needinfo?(gsvelto)

I would stuff the various crash methods into nsIDebug2; we already have some there, like this one. This would make the crash methods available to mochitests too, which would be nice.

Flags: needinfo?(gsvelto)

Update

There have been 40 total failures within the last 7 days, all of them on OS X 11 WebRender Shippable opt.

Recent log: https://treeherder.mozilla.org/logviewer?job_id=435024551&repo=mozilla-central&lineNumber=9073

Whiteboard: [stockwell disable-recommended] → [stockwell needwork:owner]

I was able to pull libtestcrasher.dylib from the build and sign it in a separate task. I poked around for a while at the test tasks but was unable to work out (A) where the signed file needs to be fetched and (B) how to unpack and use it.

Does anyone have more experience than me with getting around the test tasks?

Flags: needinfo?(hneiva)

Update

There have been 48 total failures within the last 7 days, all of them on OS X 11 WebRender Shippable opt.

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=437641779&repo=mozilla-central&lineNumber=5276

Whiteboard: [stockwell needwork:owner][stockwell disable-recommended][stockwell disable-recommended][stockwell disable-recommended][stockwell disable-recommended][stockwell disable-recommended] → [stockwell needwork:owner][stockwell disable-recommended]
Whiteboard: [stockwell needwork:owner][stockwell disable-recommended][stockwell disable-recommended][stockwell disable-recommended][stockwell disable-recommended][stockwell disable-recommended][stockwell disable-recommended] → [stockwell needwork:owner][stockwell disable-recommended]

There have been no occurrences of this failure in the last 2 weeks.

Whiteboard: [stockwell needwork:owner][stockwell disable-recommended]
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE

Setting needinfo for myself to resolve this so the test can continue to be run.

Flags: needinfo?(haftandilian)
Status: RESOLVED → REOPENED
Flags: needinfo?(haftandilian)
Resolution: INCOMPLETE → ---
Flags: needinfo?(haftandilian)