Closed
Bug 965527
Opened 11 years ago
Closed 11 years ago
Intermittent B2G desktop Shutdown "###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm"
Categories
(Toolkit Graveyard :: OS.File, defect)
Tracking
(firefox29 unaffected, firefox30 fixed, firefox31 fixed, firefox-esr24 unaffected, b2g-v1.4 fixed, b2g-v2.0 fixed)
RESOLVED
FIXED
mozilla31
Tracking | Status | |
---|---|---|
firefox29 | --- | unaffected |
firefox30 | --- | fixed |
firefox31 | --- | fixed |
firefox-esr24 | --- | unaffected |
b2g-v1.4 | --- | fixed |
b2g-v2.0 | --- | fixed |
People
(Reporter: RyanVM, Assigned: Yoric)
References
Details
(Keywords: intermittent-failure)
Attachments
(5 files, 1 obsolete file)
1.31 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
1.45 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
9.39 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
2.30 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
7.91 KB,
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
https://tbpl.mozilla.org/php/getParsedLog.php?id=33754620&tree=B2g-Inbound
b2g_ubuntu64_vm b2g-inbound opt test reftest on 2014-01-29 11:03:47 PST for push fbd02ed19b1a
slave: tst-linux64-ec2-352
11:09:08 INFO - REFTEST TEST-START | Shutdown
11:09:08 INFO - WARNING: A phase completion condition is taking too long to complete. Condition: OS.File: flush I/O queued before profile-before-change Phase: profile-before-change
11:09:08 INFO - ERROR: At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources. Conditions: OS.File: flush I/O queued before profile-before-change Phase: profile-before-change
11:09:08 INFO - [Parent 2336] ###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 390
11:09:08 INFO - [Parent 2336] ###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 390
11:09:08 WARNING - PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 5•11 years ago
|
||
I am in the process of instrumenting AsyncShutdown and OS.File to get better reporting on such things (bug 957123). In the meantime, there is nothing I can do.
Flags: needinfo?(dteller)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 35•11 years ago
|
||
philor, comment 34 seems to be a different bug. Could you file it and assign it to rvitillo?
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Whiteboard: [leave open]
Reporter | ||
Comment 37•11 years ago
|
||
Keywords: checkin-needed
Whiteboard: [leave open] → [leave open][fixed-in-fx-team]
Whiteboard: [leave open][fixed-in-fx-team] → [leave open]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 40•11 years ago
|
||
For some reason, we end up in a case in which the logging throws an error. Let's make it even more careful.
Attachment #8377050 -
Flags: review+
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Reporter | ||
Comment 41•11 years ago
|
||
Keywords: checkin-needed
Reporter | ||
Comment 42•11 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 44•11 years ago
|
||
This one is a little clearer.
1. OS.File (and only OS.File) has trouble shutting down;
2. the reset timer is on;
3. we don't know whether there is any message pending, because it's not a DEBUG build
This could be a bug in OS.File (most likely) or this could be some OS.File client spamming OS.File with requests. Working on further expanding the logging.
Assignee | ||
Comment 45•11 years ago
|
||
Yet more logging and as a bonus, a test on that logging.
Attachment #8378956 -
Flags: review+
Assignee | ||
Comment 46•11 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 51•11 years ago
|
||
Same try, after pull: https://tbpl.mozilla.org/?tree=Try&rev=4882894cfb96
So here we go for further logging.
Keywords: checkin-needed
Reporter | ||
Comment 52•11 years ago
|
||
Keywords: checkin-needed
Whiteboard: [leave open] → [leave open][fixed-in-fx-team]
Reporter | ||
Comment 53•11 years ago
|
||
Whiteboard: [leave open][fixed-in-fx-team] → [leave open]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 56•11 years ago
|
||
So, we now know that some call to OS.File.writeAtomic is placed a few seconds before the deadline and doesn't receive a reply within ~1 minute. On closer inspection of the code, we might have a race condition if the first call to OS.File is placed during a brief window during shutdown, which would explain the issue.
Hardening the process, let's see if it works.
Assignee: nobody → dteller
Attachment #8382941 -
Flags: review?(nfroyd)
Comment 57•11 years ago
|
||
Comment on attachment 8382941 [details] [diff] [review]
Hardening OS.File shutdown flag
Review of attachment 8382941 [details] [diff] [review]:
-----------------------------------------------------------------
Sure, makes sense.
::: toolkit/components/osfile/modules/osfile_async_front.jsm
@@ +205,5 @@
> + LOG("OS.File is not available anymore. Request has been rejected.");
> + throw new Error("OS.File has been shut down.");
> + }
> + return code();
> + }.bind(this));
Fat arrow function instead?
Attachment #8382941 -
Flags: review?(nfroyd) → review+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Reporter | ||
Comment 62•11 years ago
|
||
Keywords: checkin-needed
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 64•11 years ago
|
||
(In reply to TBPL Robot from comment #63)
This was on a push from after comment 62.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 67•11 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/dfd483bbeea6 at Yoric's request.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 82•11 years ago
|
||
I'm blocked by the lack of information on the file being written. Reworking the debugging information to ensure that we keep this data, without keeping data that is too expensive.
Attachment #8394126 -
Flags: review?(nfroyd)
Comment 83•11 years ago
|
||
Comment on attachment 8394126 [details] [diff] [review]
4. Yet more detailed logging
Review of attachment 8394126 [details] [diff] [review]:
-----------------------------------------------------------------
Is there a reason you're turning this on for non-DEBUG builds too? I worry that we're going to be collecting all this information in non-debug (i.e. release) builds and that's going to slow things down.
r=me if we can turn this back into mostly DEBUG-only logging before the next uplift.
::: toolkit/components/osfile/modules/osfile_async_front.jsm
@@ +139,5 @@
> + * Extract a shortened version of an object, fit for logging.
> + *
> + * This function returns a copy of the original object in which all
> + * long strings, Arrays, TypedArrays, ArrayBuffers are removed and
> + * replaced with plceholders. Use this function to sanitize objects
Nit: "placeholders"
@@ +160,5 @@
> + if (Array.isArray(obj)) {
> + if (obj.length > 32) {
> + return {"Long array": obj.length};
> + }
> + return [summarizeObject(k) for (k of obj)];
Perhaps it would be better to do something like (possibly not real JS):
return [summarizeObject(k) for (k of obj.slice(0, Math.min(obj.length, 32)))] + [(obj.length - 32) + " trailing elements omitted"];
WDYT?
@@ +203,5 @@
> + Debugging: {
> + /**
> + * The latest message sent and still waiting for a reply. In DEBUG
> + * builds, the entire message is stored, which may be memory-consuming.
> + * In non-DEBUG builds, only the method name is stored.
This comment is no longer accurate. (Might have been more straightforward if you had made the Debugging-as-object changes, then the summary+message # changes.)
@@ +211,5 @@
> + /**
> + * The latest reply received, or null if we are waiting for a reply.
> + * In DEBUG builds, the entire response is stored, which may be
> + * memory-consuming. In non-DEBUG builds, only exceptions and
> + * method names are stored.
Likewise.
@@ +326,3 @@
> reply = error;
> isError = true;
> + if (error instanceof PromiseWorker.WorkerError) {
I guess this refactoring makes this clearer...
Attachment #8394126 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 84•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #83)
> Comment on attachment 8394126 [details] [diff] [review]
> 4. Yet more detailed logging
>
> Review of attachment 8394126 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Is there a reason you're turning this on for non-DEBUG builds too? I worry
> that we're going to be collecting all this information in non-debug (i.e.
> release) builds and that's going to slow things down.
For some reason, the error doesn't seem to happen on DEBUG builds, so I need info on non-DEBUG builds.
> r=me if we can turn this back into mostly DEBUG-only logging before the next
> uplift.
> @@ +160,5 @@
> > + if (Array.isArray(obj)) {
> > + if (obj.length > 32) {
> > + return {"Long array": obj.length};
> > + }
> > + return [summarizeObject(k) for (k of obj)];
>
> Perhaps it would be better to do something like (possibly not real JS):
>
> return [summarizeObject(k) for (k of obj.slice(0, Math.min(obj.length,
> 32)))] + [(obj.length - 32) + " trailing elements omitted"];
>
> WDYT?
Actually, I don't think we are ever going to get an array with size > 5 in the current setting, so I wanted to go for something simple. What do you think about keeping such a refinement for later, if needed?
> @@ +326,3 @@
> > reply = error;
> > isError = true;
> > + if (error instanceof PromiseWorker.WorkerError) {
>
> I guess this refactoring makes this clearer...
Logging was making this piece of code too complicated for its own good. I took the opportunity to clean it up a little.
Comment 85•11 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from comment #84)
> > Is there a reason you're turning this on for non-DEBUG builds too? I worry
> > that we're going to be collecting all this information in non-debug (i.e.
> > release) builds and that's going to slow things down.
>
> For some reason, the error doesn't seem to happen on DEBUG builds, so I need
> info on non-DEBUG builds.
Ah, I see I was misled by the mis-starred failures above. OK, then, yes, we do need this in opt builds.
> > @@ +160,5 @@
> > > + if (Array.isArray(obj)) {
> > > + if (obj.length > 32) {
> > > + return {"Long array": obj.length};
> > > + }
> > > + return [summarizeObject(k) for (k of obj)];
> >
> > Perhaps it would be better to do something like (possibly not real JS):
> >
> > return [summarizeObject(k) for (k of obj.slice(0, Math.min(obj.length,
> > 32)))] + [(obj.length - 32) + " trailing elements omitted"];
> >
> > WDYT?
>
> Actually, I don't think we are ever going to get an array with size > 5 in
> the current setting, so I wanted to go for something simple. What do you
> think about keeping such a refinement for later, if needed?
That's fine. If you wanted to bump down the max size summarized for this, that'd be OK too.
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Reporter | ||
Comment 86•11 years ago
|
||
Keywords: checkin-needed
Whiteboard: [leave open] → [leave open][fixed-in-fx-team]
Assignee | ||
Comment 87•11 years ago
|
||
Try (after quickfix ex->error): https://tbpl.mozilla.org/?tree=Try&rev=3fe41b0017df
Assignee | ||
Comment 88•11 years ago
|
||
Talos xperf Try: https://tbpl.mozilla.org/?tree=Try&rev=bc4d65c55845
Assignee | ||
Comment 89•11 years ago
|
||
Oops, typo.
Attachment #8394126 -
Attachment is obsolete: true
Attachment #8394801 -
Flags: review+
Reporter | ||
Comment 90•11 years ago
|
||
Something in this checkin-needed push caused Win7 talos xperf regressions. It wasn't clear to me which was at fault, so I backed out the entire push. Please verify that your patch wasn't at fault and re-request checkin when ready :)
https://hg.mozilla.org/integration/fx-team/rev/6d0a341040a9
https://tbpl.mozilla.org/php/getParsedLog.php?id=36500476&tree=Fx-Team
Whiteboard: [leave open][fixed-in-fx-team] → [leave open]
Reporter | ||
Comment 91•11 years ago
|
||
Try runs look good, thanks :)
https://hg.mozilla.org/integration/fx-team/rev/8f768ced9020
Whiteboard: [leave open] → [leave open][fixed-in-fx-team]
Whiteboard: [leave open][fixed-in-fx-team] → [leave open]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 95•11 years ago
|
||
So, the latest DUP informs us that:
- it's not the first message;
- a reset is pending at the time of the crash;
- one call to writeAtomic remains unanswered.
I believe that it's the same issue as the OS.File part of bug 961317. There's a patch over there pending review.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 101•11 years ago
|
||
Tomcat, could you file a separate bug for this one and Cc bsmedberg and gps? The crash is part of experiments.jsm
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 103•11 years ago
|
||
Tomcat, the last two are a different bug.
Flags: needinfo?(cbook)
Comment 104•11 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on April 9th-16th) from comment #101)
> Tomcat, could you file a separate bug for this one and Cc bsmedberg and gps?
> The crash is part of experiments.jsm
yeah sorry, filed bug 996601 for it
Flags: needinfo?(cbook)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 106•11 years ago
|
||
Wrong bug, honzab, that should be bug 995027.
As a side-note, I believe that the current bug was fixed by the patch of bug 961317. Closing. Please reopen if needed.
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(honzab.moz)
Resolution: --- → FIXED
Comment 107•11 years ago
|
||
I believe you. Not sure that bug popped on me tho. It's hard, too similar ones one by one :)
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 108•11 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #107)
> I believe you. Not sure that bug popped on me tho. It's hard, too similar
> ones one by one :)
It shows up in the TBPL suggestion list and you can see the Experiments.jsm stuff in the parsed log immediately prior to the failure.
status-b2g-v1.4:
--- → fixed
status-b2g-v2.0:
--- → fixed
status-firefox29:
--- → unaffected
status-firefox30:
--- → fixed
status-firefox31:
--- → fixed
status-firefox-esr24:
--- → unaffected
Whiteboard: [leave open]
Target Milestone: --- → mozilla31
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 111•11 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on May 8th-12th) from comment #110)
> Gijs: Nope, unrelated bug.
Thanks! Sorry, meant to star bug 924622, not sure what went wrong there. :-\
Flags: needinfo?(gijskruitbosch+bugs)
Comment 112•11 years ago
|
||
Tweaking summary to avoid future mis-stars.
Summary: Intermittent B2G desktop Shutdown | application crashed [@ mozalloc_abort(char const*)] after "###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm" → Intermittent B2G desktop Shutdown "###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm"
Updated•1 year ago
|
Product: Toolkit → Toolkit Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•