Closed Bug 1182197 Opened 9 years ago Closed 9 years ago

crash in mozilla::dom::Promise::MaybeSettle or mozilla::dom::Promise::Settle

Categories

Product: Core
Component: DOM: Core & HTML
Version: 39 Branch
Hardware: All
OS: Windows
Type: defect
Priority: Not set
Severity: critical

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla45
Tracking Status
firefox39 --- unaffected
firefox40 + wontfix
firefox41 + fixed
firefox42 + fixed
firefox43 + fixed
firefox44 + fixed
firefox45 --- fixed

People

(Reporter: kairo, Assigned: bzbarsky)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(7 files)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is report bp-7cf1e3e2-1448-46e0-99ce-e2b282150709.
=============================================================

Top Frames:
0 	xul.dll 	mozilla::dom::Promise::MaybeSettle(JS::Handle<JS::Value>, mozilla::dom::Promise::PromiseState) 	dom/promise/Promise.cpp
1 	xul.dll 	mozilla::dom::Promise::ResolveInternal(JSContext*, JS::Handle<JS::Value>) 	dom/promise/Promise.cpp
2 	xul.dll 	mozilla::dom::Promise::JSCallback(JSContext*, unsigned int, JS::Value*) 	dom/promise/Promise.cpp
3 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
4 	xul.dll 	js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/DirectProxyHandler.cpp
5 	xul.dll 	js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/CrossCompartmentWrapper.cpp
6 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
7 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp
[...]

It looks like all of those have an address of 0x4, so this seems to be a near-null deref.

Those crashes are new to 40, and are 0.9% of overall 40.0b2 data.
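
As a side note, a crash address like 0x4 is exactly what a read through a null pointer looks like when the code touches a member a few bytes into the object. A minimal illustration (the struct here is made up, not the real Promise or its global):

struct Example {
  int mFirst;    // offset 0x0
  int mSecond;   // offset 0x4
};

Example* p = nullptr;
int v = p->mSecond;  // access violation at address 0x4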
Dup of bug 1178271?
Flags: needinfo?(kairo)
Crash Signature: [@ mozilla::dom::Promise::MaybeSettle(JS::Handle<T>, mozilla::dom::Promise::PromiseState)] → [@ mozilla::dom::Promise::MaybeSettle(JS::Handle<T>, mozilla::dom::Promise::PromiseState)] [@ mozilla::dom::Promise::MaybeSettle]
Flags: needinfo?(kairo)
(In reply to Boris Zbarsky [:bz] from comment #1)
> Dup of bug 1178271?

If MaybeSettle and Settle are the same, then it might very well be - but the signature here has more volume in the wild.
Flags: needinfo?(kairo)
Here's the full implementation of MaybeSettle:

void
Promise::MaybeSettle(JS::Handle<JS::Value> aValue,
                     PromiseState aState)
{
  // Promise.all() or Promise.race() implementations will repeatedly call
  // Resolve/RejectInternal rather than using the Maybe... forms. Stop SetState
  // from asserting.
  if (mState != Pending) {
    return;
  }

  Settle(aValue, aState);
}

so your signature just depends on what inlining happened.

Since bug 1178271 is the one with more information, marking dependent.
Depends on: 1178271
This was mentioned in the FF40 Beta stability issues in today's channel meeting.
Boris, KaiRo mentioned that we are hitting this crash on the Beta channel at 1%, which is still pretty high. Can you please help find an owner for this bug? I don't see much activity on bug 1178271 either and am therefore requesting your help. Thanks!
Flags: needinfo?(bzbarsky)
It's really not clear what's going on here, unfortunately.  I don't know who can own this because it's not clear where things are going wrong.

It's possibly our memory is just corrupt.  It's possible there are bugs in the GC or CC...

Do we have any useful data on when this started?  On whether it's correlated with anything?
Flags: needinfo?(bzbarsky)
On aurora, this was first seen in 20150515091916. The crash volume was pretty low at the time, so the issue may have been introduced a build or two before then. Here's a possible regression range: https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=d14cf14709f4&tochange=b03c5f1a77a1 It's mostly Pocket changes.

On nightly this was first seen in 20150430030201 but the daily volume was even lower, so it's really hard to say.

If there's no obvious fix in sight, can you think of any RELEASE_ASSERTs or other diagnostics that might help point to the problem?
One thing that might be handy is the JS stack when we're resolving this cced promise.  That would give us some idea of what code is involved that's keeping it around after it's been CCed, at least....

This stack would presumably contain URIs in it, though ideally only chrome:// ones, so I'm not sure what our policies are around collecting such a thing.
This bug is now wontfix for 40.

I see crash reports for every beta in the 40 cycle except beta8 and beta9. (beta9 was only released today.) Did we perhaps fix this in beta8?

https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3Adom%3A%3APromise%3A%3AMaybeSettle%28JS%3A%3AHandle%3CT%3E%2C+mozilla%3A%3Adom%3A%3APromise%3A%3APromiseState%29
I experienced this crash (cfce7fcf-f66e-470f-9515-dc0dd2150802).

I was seeing black drawing over the UI & Page. Scrolling and such would resolve it, but eventually it got too frustrating and I went to shut down the browser at which point it crashed.
David, Boris, I am trying to follow up on 41 tracked bugs. This crash is showing up predominantly on 40.0.2, followed by 41.0b2. Do we have additional crash dumps that could be investigated to help root-cause it further? I know you mentioned a regression range in comment 7 and I am wondering whether we may want to try out a speculative fix?
Flags: needinfo?(dmajor)
Flags: needinfo?(bzbarsky)
I defer to bz; I can't really speak to the next steps here without knowing the code.
Flags: needinfo?(dmajor)
We have no idea for a speculative fix so far.  It _looks_ like someone is holding on to a Promise after unlink and then exposing it to JS, but I have no idea what would be doing that, exactly...

We could try to modify the Promise code to gather the information I suggest in comment 8, as long as we're OK with shipping that information out in crash dumps...
Flags: needinfo?(bzbarsky)
I personally would be very happy if we put something in there that makes the crashes easier to investigate. It sounds like this could happen more often with Promises getting used more and more.
(In reply to Boris Zbarsky [:bz] from comment #8)
> One thing that might be handy is the JS stack when we're resolving this cced
> promise.  That would give us some idea of what code is involved that's
> keeping it around after it's been CCed, at least....
> 
> This stack would presumably contain URIs in it, though ideally only
> chrome:// ones, so I'm not sure what our policies are around collecting such
> a thing.

Flagging Benjamin to confirm, but AFAIK this is fine since all fields are restricted-access by default.
Flags: needinfo?(benjamin)
If somebody can guarantee that the stacks involved only contain chrome frames and never content frames, then this is absolutely fine (and we could probably expose this publicly).

If it's possible that this contains content frames, we need to talk more, since then it would probably need to be controlled by the submit-URLs submission checkbox.
Flags: needinfo?(benjamin)
We can't guarantee that they only contain chrome frames, sadly.  At least not with a simple patch.
Can we redact non-chrome frames?
Yes, with a not-simple patch.  We'd have to figure out how to define "chrome frame" for our purposes and then find them in there...
So, for purposes of deciding between "redact non-chrome frames" and "make this annotation respect the submit-URL checkbox":

How often do you expect to see non-chrome frames? How much would your debugging be hindered by not seeing them? Would you be willing to go without them in exchange for a higher rate of user submissions? How much work is the not-simple patch?
Are there any common URLs for the pages these are crashing on? Maybe we can reproduce something locally.
(In reply to Andrew McCreight [:mccr8] from comment #21)
> Are there any common URLs for the pages these are crashing on? Maybe we can
> reproduce something locally.

It's basically a typical distribution of web usage in general (facebook, gmail, youtube, twitter).
> How often do you expect to see non-chrome frames?

I don't know.  We _think_ this is correlated to Promise usage in Pocket, but maybe something else can trigger it too...  We just don't have much info.

> How much would your debugging be hindered by not seeing them?

Probably not much.

> Would you be willing to go without them in exchange for a higher rate of user
> submissions?

I think that as long as we get _one_ user submission here we'll likely have as much data as we can get from multiple such submissions.

> How much work is the not-simple patch?

My estimate is that doing it right is 1-2 days of work.  The simple patch is probably closer to 1-2 hours.
David, can we just get this into crash reports and see how things look?  I'm not sure what's involved in getting the XPCOM stackString into a crash report, exactly...
Flags: needinfo?(dmajor)
Crash Signature: [@ mozilla::dom::Promise::MaybeSettle(JS::Handle<T>, mozilla::dom::Promise::PromiseState)] [@ mozilla::dom::Promise::MaybeSettle] → [@ mozilla::dom::Promise::MaybeSettle(JS::Handle<T>, mozilla::dom::Promise::PromiseState)] [@ mozilla::dom::Promise::MaybeSettle] [@ mozilla::dom::Promise::Settle(JS::Handle<T>, mozilla::dom::Promise::PromiseState)] [@ mozilla::dom::Promise::Settle]
Flags: needinfo?(continuation)
No longer depends on: 1178271
#ifdef MOZ_CRASHREPORTER
#include "nsExceptionHandler.h"
#endif

#ifdef MOZ_CRASHREPORTER
CrashReporter::AnnotateCrashReport(NS_LITERAL_CSTRING("xpcom_stack"), ...)
#endif
Flags: needinfo?(dmajor)
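
For concreteness, a hedged sketch of how those two pieces could fit together (this is not the actual investigation patch in attachment 8658358; the helper name is made up, though "cced_promise_stack" is the annotation key that later shows up in the reports):

#ifdef MOZ_CRASHREPORTER
#include "nsExceptionHandler.h"
#endif

// Hypothetical helper: record the JS stack string captured for the CCed
// promise as a crash annotation.  AnnotateCrashReport takes narrow strings,
// hence the conversion (the "double conversion" mentioned in the review
// below).
static void
AnnotateCcedPromiseStack(const nsAString& aStackString)
{
#ifdef MOZ_CRASHREPORTER
  CrashReporter::AnnotateCrashReport(
      NS_LITERAL_CSTRING("cced_promise_stack"),
      NS_ConvertUTF16toUTF8(aStackString));
#endif
}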
Comment on attachment 8658358 [details] [diff] [review]
investigation patch.  Log the stack at promise fulfillment on a CCed promise into the crash reporter data

Review of attachment 8658358 [details] [diff] [review]:
-----------------------------------------------------------------

The double conversion of the string is kind of unfortunate but since we're about to crash anyway, it seems fine.

I don't really know how those various JS objects on the stack work (I'll leave that to nsm). Is there any chance of the callees doing a null pointer dereference or some such, given that we're already in a weird state?
Attachment #8658358 - Flags: review?(nsm.nikhil)
Attachment #8658358 - Flags: review?(dmajor)
Attachment #8658358 - Flags: review+
That was meant to be an r+ but apparently I mid-aired?
> The double conversion of the string is kind of unfortunate

Yeah...  One of those conversions is just a straight copy in the common case, though.  And yes, this is not exactly hot code!

> Is there any chance of the callees doing a null pointer dereference or some such, given
> that we're already in a weird state?

Chance, maybe.  I mean, the whole JS heap could be bonkers.  But none of those callees should use the promise or any of its members other than the stack member, at least...
Would it be worthwhile to change the null check of mGlobal in the ctor to be a RELEASE_ASSERT?

Also I noticed the regression range sort of lines up with bug 1058695, but I'm not sure how that would lead to this, so that's probably a red herring.

When we do these various PromiseCallback Calls, is mGlobal being unmark grayed? I forget how automatically that happens nowadays.
I assume we'll get some data from this on just nightly, right?  So no need to uplift anywhere?

> Would it be worthwhile to change the null check of mGlobal in the ctor to be a
> RELEASE_ASSERT?

You mean the MOZ_ASSERT?  The Promise ctor is protected; the only caller should be Promise::Create.  Promise::Create then immediately calls CreateWrapper, which tries to Init an AutoJSAPI with the global.  If it were null, the Init() call would return false, CreateWrapper would throw on the ErrorResult, and Create() would return null and let the Promise just go away.  So I'm not sure it's all that worthwhile.

> Also I noticed the regression range sort of lines up with bug 1058695

Well...  That's interesting.  That bug is what added the dereference of mGlobal to MaybeSettle, no?  Before that, MaybeSettle didn't touch mGlobal.  So the bug with mGlobal being null was preexisting, I bet, but the crash was in fact introduced by bug 1058695.

> When we do these various PromiseCallback Calls, is mGlobal being unmark grayed?

mGlobal is an nsIGlobalObject pointer, not a JS thing.
Blocks: 1058695
Whiteboard: [leave open]
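
To make that concrete, a hedged sketch of the shape of the problem (this is not the real Promise.cpp; Settle does more than this, and the exact check bug 1058695 added may differ in detail):

void
Promise::Settle(JS::Handle<JS::Value> aValue, PromiseState aState)
{
  // Added by bug 1058695: don't run resolution handlers for a dying global.
  // If something kept the promise alive past unlink, mGlobal can be null
  // here, and dereferencing it matches the near-null crash addresses above.
  if (mGlobal->IsDying()) {
    return;
  }

  SetResult(aValue);
  SetState(aState);
  // ... schedule the reaction callbacks (elided) ...
}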
I just noticed: should we get the stack prior to doing the reject, for symmetry?
We could, but we're not seeing crashes with the reject on the stack, right?  Since I assume the testing patch will just get backed out once we have the data we want, I didn't worry too much about it.
Comment on attachment 8658358 [details] [diff] [review]
investigation patch.  Log the stack at promise fulfillment on a CCed promise into the crash reporter data

Review of attachment 8658358 [details] [diff] [review]:
-----------------------------------------------------------------

I already r+ed
Attachment #8658358 - Flags: review?(nsm.nikhil)
(In reply to Boris Zbarsky [:bz] from comment #33)
> > When we do these various PromiseCallback Calls, is mGlobal being unmark grayed?
> 
> mGlobal is an nsIGlobalObject pointer, not a JS thing.

Sorry, I meant the mGlobal fields of the PromiseCallback subclasses (WrapperPromiseCallback, ResolvePromiseCallback, RejectPromiseCallback) which have type JS::Heap<JSObject*>. I was wondering because I see WrapperPromiseCallback::Call() in many (but not all) of these crash stacks, but maybe that's just the way we often invoke these promises.

Some promises are triggered from the observer service. Like this one:
https://crash-stats.mozilla.com/report/index/c72a5c7d-ec67-4046-9c48-bdf702150908

Note that this is spinning the event loop while we're notifying observers. It would be nice to know what is triggering those.

Some are being triggered from the storage CompletionNotifier:
https://crash-stats.mozilla.com/report/index/45c8ec4d-5a2c-4c8f-b8cc-0047c2150907

It doesn't seem too likely that there's something particular about the promise that is triggering this, but who knows. It does seem worth noting that this all appears to be chrome JS.
Flags: needinfo?(continuation)
Keywords: leave-open
Whiteboard: [leave open]
dmajor, could you look into the crash dumps for a few of these stacks? It would be nice to know what is triggering the observer service in these stacks:
https://crash-stats.mozilla.com/report/index/dce65f8b-18f7-4cba-abd5-1f0bc2150908
https://crash-stats.mozilla.com/report/index/c72a5c7d-ec67-4046-9c48-bdf702150908
https://crash-stats.mozilla.com/report/index/a48c08d2-9f0b-42ea-b584-2a2042150908

Also, it would be nice to get some more frames for these crashes that have CompletionNotifier near the end to see what is spinning the event loop:
https://crash-stats.mozilla.com/report/index/41a4f21f-9b63-4fbf-b81a-037c92150908
https://crash-stats.mozilla.com/report/index/81040734-b10d-448f-aa4c-f480b2150908
https://crash-stats.mozilla.com/report/index/36b1d90b-8f73-4765-bf4d-048b12150908

Thanks.
Flags: needinfo?(dmajor)
> Sorry, I meant the mGlobal fields of the PromiseCallback subclasses

Ah.  Those might want to unmarkgray mGlobal in Call, indeed...

> but maybe that's just always the way we often invoke these promises.

Right.  A promise chain will end up with those PromiseCallback things in the callstack.
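
A hedged sketch of what unmark-graying mGlobal in Call could look like (the signature and body here are simplified, not the real WrapperPromiseCallback):

nsresult
WrapperPromiseCallback::Call(JSContext* aCx, JS::Handle<JS::Value> aValue)
{
  // mGlobal is a JS::Heap<JSObject*>.  Exposing it to active JS unmarks it
  // gray, so a global that was only reachable via the cycle collector isn't
  // handed to running JS in a gray state.
  JS::ExposeObjectToActiveJS(mGlobal);
  JSAutoCompartment ac(aCx, mGlobal);

  // ... invoke the stored JS callback with aValue (elided) ...
  return NS_OK;
}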
dmajor looked at one of these observer crashes and the thing that was triggering the observer service was like this:
33 0022f908 515dca1e xul!nsObserverList::NotifyObservers+0x3e
34 0022f928 51715991 xul!nsObserverService::NotifyObservers+0xc6
35 0022f958 5146c4d2 xul!nsXREDirProvider::DoShutdown+0x72
36 0022f99c 5146837a xul!ScopedXPCOMStartup::~ScopedXPCOMStartup+0x51

So we're in shutdown, but not that late in shutdown.
(In reply to Andrew McCreight [:mccr8] from comment #39)
> dmajor, could you look into the crash dumps for a few of these stacks? It
> would be nice to know what is triggering the observer service in these
> stacks:
> https://crash-stats.mozilla.com/report/index/dce65f8b-18f7-4cba-abd5-1f0bc2150908
> https://crash-stats.mozilla.com/report/index/c72a5c7d-ec67-4046-9c48-bdf702150908
> https://crash-stats.mozilla.com/report/index/a48c08d2-9f0b-42ea-b584-2a2042150908

Those all have the same stack as the one you linked me to on IRC (in other words, comment 41).

> 
> Also, it would be nice to get some more frames for these crashes that have
> CompletionNotifier near the end to see what is spinning the event loop:
> https://crash-stats.mozilla.com/report/index/41a4f21f-9b63-4fbf-b81a-037c92150908
> https://crash-stats.mozilla.com/report/index/81040734-b10d-448f-aa4c-f480b2150908
> https://crash-stats.mozilla.com/report/index/36b1d90b-8f73-4765-bf4d-048b12150908
> 
> Thanks.

Below CompletionNotifier is:
10 0019f7bc 55429daf xul!mozilla::storage::`anonymous namespace'::CompletionNotifier::Run+0x17
11 0019f8cc 553cdc36 xul!nsThread::ProcessNextEvent+0x6e8
12 0019f8fc 553cf0ba xul!mozilla::ipc::MessagePump::Run+0x5f
13 0019f934 553ceac3 xul!MessageLoop::RunHandler+0x20
14 0019f954 554291ad xul!MessageLoop::Run+0x19
15 0019f960 55428cf4 xul!nsBaseAppShell::Run+0x32
16 0019f96c 55498db9 xul!nsAppShell::Run+0x1b
17 0019f97c 552e8a43 xul!nsAppStartup::Run+0x20
18 0019fb60 552e8c6c xul!XREMain::XRE_mainRun+0x4b9
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] from comment #42)
> Those all have the same stack as the one you linked me to on IRC (in other
> words, comment 41).
Thanks, I'll try looking through things that listen for that shutdown stuff and see what might be triggering JS.

> Below CompletionNotifier is:

Thanks. It is good to know this is not nested event loop spinning.
Depends on: 1203289
Summary: crash in mozilla::dom::Promise::MaybeSettle(JS::Handle<T>, mozilla::dom::Promise::PromiseState) → crash in mozilla::dom::Promise::MaybeSettle or mozilla::dom::Promise::Settle
Hoping that the investigative patch in Nightly will help us find a possible cause and fix for 42. It is too late for 41.
So I was thinking about this a bit more... given that this is fallout from bug 1058695, we could in fact consider adding a null-check for mGlobal and treating null as IsDying().  That might be ok for 41 if there's still time for that; it's certainly better than crashing...
Flags: needinfo?(rkothari)
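
A hedged sketch of the proposed guard (the real patch is the attachment reviewed below; only the shape is shown):

  // At the top of Promise::Settle: treat a missing global like a dying one,
  // so we bail out instead of crashing on the null dereference.
  if (!mGlobal || mGlobal->IsDying()) {
    return;
  }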
Boris, given that this crash is in the top 10 for 41 at 1.17%, we might be able to consider taking a fix in 41RC, depending on the complexity and risk associated. Would you be able to nominate a patch for uplift today? We gtb 41 RC Monday am PST.
Flags: needinfo?(rkothari) → needinfo?(bzbarsky)
I _think_ this is pretty safe, but it could use a second set of eyes.

Also, I'm out on Monday so if someone is going to uplift it then it won't be me...
Flags: needinfo?(bzbarsky)
Attachment #8660010 - Flags: review?(nsm.nikhil)
Resetting the status because there is a good chance we might be taking a likely fix for this in 41 RC.
Comment on attachment 8660010 [details] [diff] [review]
Patch would look something like this

Review of attachment 8660010 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine to me; I only read the bug report, but it definitely doesn't do any more harm. Is the patch supposed to do anything else eventually, considering the attachment description?
Attachment #8660010 - Flags: review?(nsm.nikhil) → review+
No, that's it.  I guess I'll try landing that on inbound and if it doesn't blow up the world people should consider uplifting it on Monday...
Nikhil, would you be able to nominate this patch for uplift to Beta? We will gtb 41 RC on Monday 9/14 by noon. Thanks!
Flags: needinfo?(nsm.nikhil)
Comment on attachment 8660010 [details] [diff] [review]
Patch would look something like this

Approval Request Comment
[Feature/regressing bug #]: bug 1058695
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: at least some of this code runs regularly
[Risks and why]: this should mostly restore existing behavior, but we don't entirely understand what is happening here
[String/UUID change made/needed]: none
Flags: needinfo?(nsm.nikhil)
Attachment #8660010 - Flags: approval-mozilla-beta?
Comment on attachment 8660010 [details] [diff] [review]
Patch would look something like this

[Triage Comment] This is another fix that we are taking in 41 RC. This fix is safe and low risk.
Attachment #8660010 - Flags: approval-mozilla-beta? → approval-mozilla-release+
While investigating something else, I happened to notice that a large number of these crashes are within the "OOM danger zone" for AvailableVirtualMemory.

Could it be related? Maybe high memory usage means more CC, which leads to the unlink problem?
Could be...

Do we by any chance have any reports with the stack instrumentation we added at this point?
No, and it's going to be difficult, because:
- This crash is pretty rare on nightly to begin with, and
- With the null check workaround, we'll no longer get a crash report for this issue. We'd have to look for cced_promise_stack annotations on machines that go on to crash somewhere else later.
Hmm.  I didn't realize this was rare on nightly.  Should have pushed the investigation patch to all channels before landing the workaround...

Also, the null check hasn't landed on nightly and aurora, right?  So there we would still get crashes (though as you say they might be rare).
The null-check landed on m-c in comment 52.

Perhaps you could undo the null check before merge day and let the logging go to aurora. We get some single-digit crashes per day there.
> The null-check landed on m-c in comment 52.

So it did.  I clearly can't remember what I did.

Undoing the null check on m-c makes sense.  I'll do that next time the tree is open.
Got some data from aurora 43. Most of the stacks are:

cced_promise_stack 	PromiseSet.prototype.add/<@resource://gre/modules/AsyncShutdown.jsm:145:9 Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:934:23 this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:813:7 Promise*this.PromiseWalker.scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:744:11 this.PromiseWalker.schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:776:7 this.PromiseWalker.completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:711:7 Spinner.prototype.observe@resource://gre/modules/AsyncShutdown.jsm:523:9 

But there are a couple others:

cced_promise_stack 	ConnectionData.prototype<._executeStatement/pending<.handleCompletion@resource://gre/modules/Sqlite.jsm:784:13 

cced_promise_stack 	i@http://platform.twitter.com/widgets.js:9:3662 o@http://platform.twitter.com/widgets.js:9:3692 promise callback*o/</<@http://platform.twitter.com/widgets.js:9:3736 o/<@http://platform.twitter.com/widgets.js:9:3709 o@http://platform.twitter.com/widgets.js:9:3599 o@http://platform.twitter.com/widgets.js:9:2844 a@http://platform.twitter.com/widgets.js:9:3032 @http://platform.twitter.com/widgets.js:1:1785 a<@http://platform.twitter.com/widgets.js:8:9002 @http://platform.twitter.com/widgets.js:1:1713 e@http://platform.twitter.com/widgets.js:1:406 @http://platform.twitter.com/widgets.js:1:1379 @http://platform.twitter.com/widgets.js:1:1 @http://platform.twitter.com/widgets.js:1:1 

By the way, it looks like we didn't land any changes on 42? If these stacks don't point to an obvious fix, could you land the bandaid on beta for now?
Flags: needinfo?(bzbarsky)
Bug 1203289 landed on 42. But I guess that did not fix it.
Hmm.  The Promise-backend stuff is under a nested event loop spin from an observer notification.... but clearly twitter is not likely to be managing that (modulo sync XHR perhaps).

I'll think a bit about this over the weekend, then probably land the null-check on Monday. :(
Boris, any news on this?
It seems that attachment 8660010 [details] [diff] [review], which landed only in 41, mitigated the crash. Most of the crashes are on beta (but release is still affected).
So Andrew and I walked through some of this code just now, and we think we see a possible hole in the GC/CC stuff in Promise code.  I filed bug 1213391 on it.

I will post a patch there and aim to get that onto Aurora and see whether that nixes the crashes, but in the meantime I have landed the bandaid on beta: https://hg.mozilla.org/releases/mozilla-beta/rev/e96fd3320ab4
Flags: needinfo?(bzbarsky)
Depends on: 1213391
The fix for bug 1213391 is now on m-c and aurora; we should see what crashstats says about this crash there....
Flags: needinfo?(dmajor)
bz, dmajor is on PTO. If you are looking for stats, e.g. how crash volume might change, it's better to ni? me, but this is so rare on Dev Edition and Nightly that those channels won't tell us much; I think only Beta will tell us whether this has really been fixed.
Flags: needinfo?(bzbarsky)
OK.  I guess we wait for 1213391 to ride the trains to beta, then....
Flags: needinfo?(bzbarsky)
OS: Windows NT → Windows
Hardware: x86 → All
Version: Trunk → 39 Branch
As we are going to build the 42 RC today, it is too late for this bug. Wontfix.
42 is "fixed" by the change mentioned in comment 69.

The question is whether this crash signature is visible in 43.
[Tracking Requested - why for this release]:

(In reply to Boris Zbarsky [:bz] from comment #77)
> The question is whether this crash signature is visible in 43.

It is. This is 1.8% of 43.0b1 crashes.
OK, so bug 1213391 didn't fix this.  :(

Andrew, should we just permanently add this null-check and give up? :(
Flags: needinfo?(continuation)
(In reply to Boris Zbarsky [:bz] from comment #79)
> Andrew, should we just permanently add this null-check and give up? :(

Yeah, it is going to be hard to figure out what is actually going on here. Maybe have an assertion in there with the null check.
Flags: needinfo?(continuation)
Tracking for 43 and 44 as this is still fairly high volume.
OK, let's just do this and then I guess backport to aurora/beta
Attachment #8686282 - Flags: review?(continuation)
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Comment on attachment 8686282 [details] [diff] [review]
workaround.  Add a null-check for mGlobal in Promise::Settle

Review of attachment 8686282 [details] [diff] [review]:
-----------------------------------------------------------------

Can we maybe assert if mGlobal is null? Though maybe any such failures will be intermittent and thus not of use to anybody. It might be nice to throw in a comment about how we don't really expect mGlobal to be null, but it happens in the wild.
Attachment #8686282 - Flags: review?(continuation) → review+
Comment on attachment 8686311 [details] [diff] [review]
workaround.  Add a null-check for mGlobal in Promise::Settle

Approval Request Comment
[Feature/regressing bug #]: Unknown.
[User impact if declined]: The crashes continue until morale improves.
[Describe test coverage new/current, TreeHerder]: We have yet to reproduce this.
[Risks and why]: Low risk.  We've been shipping this for several releases now in
   practice.
[String/UUID change made/needed]: None.
Attachment #8686311 - Flags: approval-mozilla-beta?
Attachment #8686311 - Flags: approval-mozilla-aurora?
Comment on attachment 8686311 [details] [diff] [review]
workaround.  Add a null-check for mGlobal in Promise::Settle

Possible crash fix, high volume crash, let's uplift it.
Attachment #8686311 - Flags: approval-mozilla-beta?
Attachment #8686311 - Flags: approval-mozilla-beta+
Attachment #8686311 - Flags: approval-mozilla-aurora?
Attachment #8686311 - Flags: approval-mozilla-aurora+
Fixed, see comment 87.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Flags: needinfo?(dmajor)
Component: DOM → DOM: Core & HTML