Closed Bug 1552099 Opened 4 years ago Closed 2 years ago

Crash in [@ EnqueuePromiseReactionJob] spiking from zh-cn locales since 2019-05-15

Categories

(Core :: JavaScript Engine, defect, P1)

66 Branch
defect

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox-esr60 --- wontfix
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox66 --- wontfix
firefox67 - wontfix
firefox67.0.1 --- wontfix
firefox68 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox87 --- wontfix
firefox88 + fixed
firefox89 + fixed

People

(Reporter: philipp, Assigned: arai)

References

Details

(Keywords: crash)

Crash Data

[Tracking Requested - why for this release]:

This bug is for crash report bp-14f8a344-f0c7-42b9-9a5a-f51fd0190516.

Top 10 frames of crashing thread:

0 xul.dll static bool EnqueuePromiseReactionJob js/src/builtin/Promise.cpp:1069
1 xul.dll static bool ResolvePromise js/src/builtin/Promise.cpp:1125
2 xul.dll static bool FulfillMaybeWrappedPromise js/src/builtin/Promise.cpp:1162
3 xul.dll static bool ResolvePromiseInternal js/src/builtin/Promise.cpp:862
4 xul.dll static bool ResolveOrRejectPromise js/src/jsapi.cpp:4038
5 xul.dll mozilla::dom::Promise::MaybeResolve dom/promise/Promise.cpp:298
6 xul.dll static void mozilla::dom::Promise::MaybeSomething<RefPtr<mozilla::dom::cache::Cache>&> dom/promise/Promise.h:251
7 xul.dll mozilla::dom::cache::CacheOpChild::Recv__delete__ dom/cache/CacheOpChild.cpp:158
8 xul.dll mozilla::dom::cache::PCacheOpChild::OnMessageReceived ipc/ipdl/PCacheOpChild.cpp:106
9 xul.dll void mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2087

this content crash signature is spiking up since 24 hours ago from users of zh-cn builds, so far there are about 700 submitted reports:

https://crash-stats.mozilla.com/signature/?useragent_locale=zh-cn&signature=EnqueuePromiseReactionJob&date=%3E%3D2019-05-01#graphs

urls in reports commonly show sites on the https://blog.csdn.net domain, so probably some change on the site there or from embedded ad scripts is starting to trigger this now

Type: task → defect

Jason, any chance you can look at this?

Flags: needinfo?(jorendorff)
OS: Windows → All

OK, this is probably bug 1447327. Lots of good info in that bug, including STR. We can fix this.

Depends on: 1447327
Flags: needinfo?(jorendorff)
Priority: -- → P1

Not tracking for 67, we only had a handful of crashes with this signature during the beta cycle and only 4 since we shipped 67 to release.

Hi there,
Some users reported that they encountered a crash when using a script to access bilibili.com. I traced it back to this bug.

A reproducible method is: use tampermonkey to install the script greasyfork.org/scripts/10357, and then visit bilibili.com, there is a high probability that the page will crash, while using other browsers to perform the same operation will not.

This is the crash report related to it:
bp-a5196111-f09d-4548-919d-b54670210320
bp-ee903d31-401d-4e5b-bd5c-a07c80210328
bp-7d449510-2242-4574-8109-b733e0210328
bp-6d38b51e-5478-4646-8c72-f5bb40210328

BTW I don't have access to see comments in crash report. I was wondering if there is any mention of url in the comments.

[Tracking Requested - why for this release]: There is a sharp increase of these crashes on March 11, almost all crashes are from zh-CN builds.

Probably the same as https://github.com/mozilla-mobile/fenix/issues/18155

EDIT: Or not, this one seems to be showing only Desktop crashes.

Flags: needinfo?(ryanvm)

(In reply to yxu from comment #4)

BTW I don't have access to see comments in crash report. I was wondering if there is any mention of url in the comments.

https://www.bilibili.com/ is indeed showing up a lot in the URL field of submitted crash reports. Steven, can you please help us find an owner for this big crash spike on release?

Flags: needinfo?(sdetar)

Tooru, would you be the right person to investigate this issue?

So far, most of the crashes are null pointer reads at this location.

This is a bit strange as this line is supposed to be guarded by the previous condition, unless the location are badly reported, in which case this could be within the nonCCWGlobal function, which would suggest that a realm got GC-ed?

Flags: needinfo?(sdetar) → needinfo?(arai.unmht)
Severity: critical → S2

(In reply to yxu from comment #4)

Hi there,
Some users reported that they encountered a crash when using a script to access bilibili.com. I traced it back to this bug.

A reproducible method is: use tampermonkey to install the script greasyfork.org/scripts/10357, and then visit bilibili.com, there is a high probability that the page will crash, while using other browsers to perform the same operation will not.

This is the crash report related to it:
bp-a5196111-f09d-4548-919d-b54670210320
bp-ee903d31-401d-4e5b-bd5c-a07c80210328
bp-7d449510-2242-4574-8109-b733e0210328
bp-6d38b51e-5478-4646-8c72-f5bb40210328

BTW I don't have access to see comments in crash report. I was wondering if there is any mention of url in the comments.

Thank you for the info.

Unfortunately I cannot reproduce the crash with the following steps:

  1. Install Firefox 87.0 zh-CN (https://ftp.mozilla.org/pub/firefox/releases/87.0/win64/zh-CN/Firefox%20Setup%2087.0.exe), on Windows 10 Pro 20H2
  2. Run Firefox 87.0 zh-CN, with a new clean profile
  3. Install Tampermonkey https://addons.mozilla.org/en-US/firefox/addon/tampermonkey/
  4. Install the the script greasyfork.org/scripts/10357
  5. Open https://www.bilibili.com/ in 10 tabs
  6. Click the up/down arrows added by scripts 4 times, for each tab

Does the crash happen for you with the above steps?
If it happens, how frequent is it?

If it doesn't, do you have any idea what the difference between your steps and the above?

Flags: needinfo?(yxu)

Interesting, I found that this crash only occurs after logging in bilibili account. Enabling the script on the bilibili.com homepage before logging in don't cause a crash, but it happens immediately after that. And the frequency is 100%. Tooru, do you have a bilibili account login for testing?

The crashes reported by me and other users is also under Windows, version 87.0. Judging from the increased time of feedback and crashes, it is reasonable to suspect that it happened from 86.0.1.

In addition, I found https://greasyfork.org/zh-CN/scripts/372673 will also cause the same type of crash.

Flags: needinfo?(yxu)

Thank you again!
Indeed, reproduced the crash after logging into a test account I just created.
will investigate the crash.

Assignee: nobody → arai.unmht
Status: NEW → ASSIGNED

The crash happens because reaction->getAndClearIncumbentGlobalObject() returns a CCW with mHasSecurityPolicy == true and mFlags == 0x01, that I suppose CrossCompartmentSecurityWrapper.
(not XrayWrapper because it doesn't have WrapperFactory::IS_XRAY_WRAPPER_FLAG),

https://searchfox.org/mozilla-central/rev/402802eb42ba7d905531fd8e3b089fc8b2292f66/js/src/builtin/Promise.cpp#1250

  Rooted<GlobalObject*> global(cx);
  if (JSObject* objectFromIncumbentGlobal =
          reaction->getAndClearIncumbentGlobalObject()) {
    objectFromIncumbentGlobal = CheckedUnwrapStatic(objectFromIncumbentGlobal);

https://searchfox.org/mozilla-central/rev/402802eb42ba7d905531fd8e3b089fc8b2292f66/js/src/proxy/Wrapper.cpp#382

JS_FRIEND_API JSObject* js::CheckedUnwrapStatic(JSObject* obj) {
  while (true) {
    JSObject* wrapper = obj;
    obj = UnwrapOneCheckedStatic(obj);
    if (!obj || obj == wrapper) {
      return obj;
    }
  }
}

JS_FRIEND_API JSObject* js::UnwrapOneCheckedStatic(JSObject* obj) {
...
  const Wrapper* handler = Wrapper::wrapperHandler(obj);
  return handler->hasSecurityPolicy() ? nullptr : Wrapper::wrappedObject(obj);
}

That results in returning nullptr from js::UnwrapOneCheckedStatic, and js::CheckedUnwrapStatic.

there seems to be a kind of race between "associating multiple globals with document.domain" and "a userscript's execution",
and that behaves badly around setting/getting incumbent global, in promise handling.

the "multiple globals" are top-level page and inline frame, hosted in different subdomain.

the inline frame's script tries to call a function in the top-level page's script, and receives a Promise object, and then calls then on it.
js::GetObjectFromIncumbentGlobal somehow creates a security wrapper in some case, and that results in failure case in CheckedUnwrapStatic above.
to my understanding, in such case the access to the top-level page's data should fail (and so the promise handling in then call won't happen), but it somehow succeeds.

in bilibili page, the security wrapper is created almost 100% of the time, but with simplified testcase, it's created only 10% or less of the time.
and in failure case, looks like the userscript's execution takes longer, and also it fails to access top-level page's property.
(the userscript accesses window.top.addEventListener, and fails)

the issue is either:

  • (a) the incumbent global object shouldn't be a security wrapper, but unexpectedly it becomes so
  • (b) the incumbent global object can be a security wrapper, and the code around that should handle the case

if (a), we should look into the code that causes the situation and somehow fix that (not yet sure the details).
if (b), we should throw either when getting incumbent global object before storing it into reaction (GetObjectFromIncumbentGlobal), or when getting incumbent global object from reaction (EnqueuePromiseReactionJob, or more later)

Depends on: 1702863
Flags: needinfo?(arai.unmht)

So far, the fix from bug 1702863 looks like it's working well on Nightly (no crashes reported since it landed). We'll keep an eye on this after 88.0rc1 ships and close it out if that trend continues.

No signs of this crash on 88.0rc1 either. Closing this out.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch
You need to log in before you can comment on or make changes to this bug.