Closed Bug 1889839 Opened 1 year ago Closed 1 year ago

Intermittent UntrustedModulesFixture.Serialize | SEH exception with code 0xc0000005 thrown in the test body. @ (null):-1

Categories

(Toolkit :: Startup and Profile System, defect, P5)

defect

Tracking

()

RESOLVED FIXED
130 Branch
Tracking Status
firefox130 --- wontfix
firefox131 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gstoll)

References

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered][stockwell disable-recommended])

Attachments

(5 files)

Filed by: ctuns [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=453578758&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MBg2OYyrSOCeQdcyVQgsOg/runs/0/artifacts/public/logs/live_backing.log


[task 2024-04-05T01:07:33.231Z] 01:07:33     INFO -  TEST-PASS | TestDllServices.DoNotGetNestedMicrosoftCertificate | test completed (time: 2ms)
[task 2024-04-05T01:07:33.231Z] 01:07:33     INFO -  TEST-START | UntrustedModulesFixture.Serialize
[task 2024-04-05T01:07:33.370Z] 01:07:33     INFO -  Received data. (attempts=1)
[task 2024-04-05T01:07:33.396Z] 01:07:33     INFO -  Received data. (attempts=2)
[task 2024-04-05T01:07:33.417Z] 01:07:33     INFO -  Received data. (attempts=2)
[task 2024-04-05T01:07:33.436Z] 01:07:33     INFO -  Received data. (attempts=2)
[task 2024-04-05T01:07:33.449Z] 01:07:33     INFO -  Received data. (attempts=2)
[task 2024-04-05T01:07:33.462Z] 01:07:33  WARNING -  TEST-UNEXPECTED-FAIL | UntrustedModulesFixture.Serialize | SEH exception with code 0xc0000005 thrown in the test body. @ (null):-1
[task 2024-04-05T01:07:33.462Z] 01:07:33  WARNING -  TEST-UNEXPECTED-FAIL | UntrustedModulesFixture.Serialize | test completed (time: 230ms)
[task 2024-04-05T01:07:33.463Z] 01:07:33     INFO -  TEST-START | UntrustedModulesFixture.Backup

First occurrence, so far: backfill range and retriggers.

Whiteboard: [retriggered]

Update:
There have been 67 failures within the last 7 days:
• 19 failures on Windows 11 x86 22H2 WebRender opt
• 32 failures on Windows 11 x86 22H2 WebRender debug
• 15 failures on Windows 11 x64 22H2 CCov WebRender opt
• 1 failures on Windows 11 x64 22H2 WebRender opt
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=461585236&repo=mozilla-central&lineNumber=40082

[task 2024-06-09T11:27:42.232Z] 11:27:42     INFO -  TEST-START | UntrustedModulesFixture.Serialize
[task 2024-06-09T11:27:42.360Z] 11:27:42     INFO -  Received data. (attempts=1)
[task 2024-06-09T11:27:42.391Z] 11:27:42     INFO -  Received data. (attempts=2)
[task 2024-06-09T11:27:42.422Z] 11:27:42     INFO -  Received data. (attempts=2)
[task 2024-06-09T11:27:42.446Z] 11:27:42     INFO -  Received data. (attempts=2)
[task 2024-06-09T11:27:42.469Z] 11:27:42     INFO -  Received data. (attempts=2)
[task 2024-06-09T11:27:42.471Z] 11:27:42  WARNING -  TEST-UNEXPECTED-FAIL | UntrustedModulesFixture.Serialize | SEH exception with code 0xc0000005 thrown in the test body. @ (null):-1
[task 2024-06-09T11:27:42.471Z] 11:27:42  WARNING -  TEST-UNEXPECTED-FAIL | UntrustedModulesFixture.Serialize | test completed (time: 238ms)
[task 2024-06-09T11:27:42.472Z] 11:27:42     INFO -  TEST-START | UntrustedModulesFixture.Backup
[task 2024-06-09T11:27:42.485Z] 11:27:42     INFO -  Received data. (attempts=2)
[task 2024-06-09T11:27:42.507Z] 11:27:42     INFO -  Received data. (attempts=2)
<...>
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

Hi Dave! Can you please take a look at this? With this failure rate, the test was in the disable recommended list.
Thank you!

Flags: needinfo?(dtownsend)

I don't know anything about this test. Greg, looks like you've touched it some, can you take a look?

Flags: needinfo?(dtownsend) → needinfo?(gstoll)

Yes, I'll take a look, thanks!

Assignee: nobody → gstoll
Status: NEW → ASSIGNED
Flags: needinfo?(gstoll)
Keywords: leave-open
Pushed by gstoll@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/992040b8163c add print statements to narrow down crash on CI r=yjuglaret

Hi Greg, can you take another look at this? The failure rate is still pretty high.
Recent log: https://treeherder.mozilla.org/logviewer?job_id=465863213&repo=mozilla-central

Flags: needinfo?(gstoll)

Yeah, I was just looking at these. It's weird because it seems like the most common failure mode is now like the recent log you linked, which looks like a crash in NativeNtBlockSet_Write. I'll look into it.

Flags: needinfo?(gstoll)
See Also: → 1892522
Attachment #9413629 - Attachment description: Bug 1889839 - add more logging for intermittent UntrustedModulesFixture.Serialize test r=yjuglaret! → Bug 1889839 - possibly fix intermittent UntrustedModulesFixture.Serialize test r=yjuglaret!
Pushed by gstoll@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/abbc09c66740 possibly fix intermittent UntrustedModulesFixture.Serialize test r=yjuglaret

There are still occasional failures like this one. The last log before crashing is UntrustedModulesFixture::Setup bottom. So the failure point is still the following line: AutoJSContextWithGlobal cx(mCleanGlobal);. But since we moved this line, the failure point moved as well. Then the log shows the call stack for the failure:

#05: XRE_GetBootstrap[D:\task_172231404848826\build\application\firefox\gtest\xul.dll +0xaf2f061]
#06: mozilla_net_is_label_safe[D:\task_172231404848826\build\application\firefox\gtest\xul.dll +0x338a0ea]
#07: VR_RuntimePath[D:\task_172231404848826\build\application\firefox\gtest\xul.dll +0x8efdfbb]

Which, when applying the symbols for this job, yields:

#05: xul!JS::GetNonCCWObjectGlobal(JSObject*)+0x31
#06: xul!xpc::NativeGlobal(JSObject*)+0x1a
#07: xul!mozilla::dom::AutoJSAPI::Init(JSObject*)+0x1b

Looking at the disassembly there, the failure occurs as we access reinterpret_cast<const JS::shadow::Object*>(mCleanGlobal)->shape->immutableFlags (through js::IsProxy), we crash because the shape is an invalid pointer. Presumably a GC occured and collected the object behind mCleanGlobal.

So the issue is likely that a GC can still occur, this time during UntrustedModulesFixture::InitialModuleLoadOnce which is the only remaining code that gets executed between UntrustedModulesFixture::SetUp and the start of the test.

Pushed by gstoll@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/872fea3a084b move creation of JS object to right before the test starts to avoid GC r=yjuglaret DONTBUILD
Target Milestone: --- → 130 Branch
Blocks: 1892522
See Also: 1892522

The proposed fix has been in for a while with no failures on main, hooray!

Keywords: leave-open
Pushed by gstoll@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/18a7ff23e6f4 part 1: restructure TelemetryFixture to make it impossible for mCleanGlobal to be GC'd r=yjuglaret https://hg.mozilla.org/integration/autoland/rev/ba6f1bae31b9 part 2: remove print statements that were added for debugging r=yjuglaret

Sorry! Should have a fix.

Flags: needinfo?(gstoll)
Pushed by gstoll@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/88d802211b84 part 1: restructure TelemetryFixture to make it impossible for mCleanGlobal to be GC'd r=yjuglaret https://hg.mozilla.org/integration/autoland/rev/0add3e0d22f0 part 2: remove print statements that were added for debugging r=yjuglaret
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

The patch landed in nightly and beta is affected.
:gstoll, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox130 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(gstoll)
Flags: needinfo?(gstoll)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: