Closed Bug 957123 Opened 10 years ago Closed 10 years ago

[Async Shutdown] Better error reporting

Categories

(Toolkit :: Async Tooling, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: Yoric, Assigned: Yoric)

References

Details

(Whiteboard: [Async:team])

Attachments

(4 files, 19 obsolete files)

10.43 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
11.93 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
7.07 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
87.62 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
      No description provided.
Debugging AsyncShutdown error messages (e.g. bug 948600) indicates that we need better error reporting. We should be able to register a mechanism used to indicate in which state a blocker is at the moment we print a warning or cause a crash.
Assignee: nobody → dteller
Attachment #8356600 - Flags: review?(nfroyd)
Same one, with the patch.
Attachment #8356601 - Attachment is obsolete: true
Attachment #8356601 - Flags: review?(nfroyd)
Attachment #8356604 - Flags: review?(nfroyd)
Comment on attachment 8356600 [details] [diff] [review]
Extending AsyncShutdown error reporting

Review of attachment 8356600 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, I think, needs a few tweaks.

::: toolkit/modules/AsyncShutdown.jsm
@@ +118,5 @@
> +  }
> +  try {
> +    // Evaluate state(), normalize the result into something that we can
> +    // safely stringify or upload.
> +    let string = JSON.stringify(state());

Judging by how |state| was defined in the OS.File patch, I think you should be clearer in the contract of |state|: does it return a string describing things, or an object?  (The current code would JSON.stringify twice; it's not obvious to me that was the desired behavior.)

@@ +119,5 @@
> +  try {
> +    // Evaluate state(), normalize the result into something that we can
> +    // safely stringify or upload.
> +    let string = JSON.stringify(state());
> +    let data = JSON.parse(data);

Wait, what?

@@ +234,5 @@
>       *       // No specific guarantee about completion of profileBeforeChange
>       * });
>       *
>       */
> +    addBlocker: function(name, condition, state = null) {

|state| needs documentation in the comment above.
Attachment #8356600 - Flags: review?(nfroyd) → feedback+
Applied feedback
Attachment #8356604 - Attachment is obsolete: true
Attachment #8356604 - Flags: review?(nfroyd)
Attachment #8357090 - Flags: review+
Clarified documentation, fixed |data|/|string| confusion.
Attachment #8356600 - Attachment is obsolete: true
Attachment #8357092 - Flags: review?(nfroyd)
Comment on attachment 8357092 [details] [diff] [review]
Extending AsyncShutdown error reporting, v2

Review of attachment 8357092 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/osfile/modules/osfile_unix_allthreads.jsm
@@ +42,5 @@
>  let libc;
> +let libc_candidates =  [ "a.out",
> +                         // This NSPR-specific special library name causes
> +                         // the system to use RTLD_DEFAULT, which is usually
> +                         // good. It will not work on Android, though.

Comments are better placed before the thing they're explaining.

@@ +44,5 @@
> +                         // This NSPR-specific special library name causes
> +                         // the system to use RTLD_DEFAULT, which is usually
> +                         // good. It will not work on Android, though.
> +
> +                         "libc.so",

Maybe note that this is the Android name?  Do we have to care about libc.so.6?  I guess Android probably doesn't care about library versioning...

::: toolkit/modules/AsyncShutdown.jsm
@@ +214,5 @@
>       * resulting promise is either resolved or rejected. If
>       * |condition| is not a function but another value |v|, it behaves
>       * as if it were a function returning |v|.
> +     * @param {function*} state Optionally, a function returning the
> +     * current state of the blocker as an object. Used for providing

Nit: I think this should say "information about the current state of the blocker..." for clarity's sake.
Attachment #8357092 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd (:froydnj) from comment #9)
> Comment on attachment 8357092 [details] [diff] [review]
> Extending AsyncShutdown error reporting, v2
> 
> Review of attachment 8357092 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/osfile/modules/osfile_unix_allthreads.jsm
> @@ +42,5 @@
> >  let libc;
> > +let libc_candidates =  [ "a.out",
> > +                         // This NSPR-specific special library name causes
> > +                         // the system to use RTLD_DEFAULT, which is usually
> > +                         // good. It will not work on Android, though.
Oh, looks like I folded the patches for two distinct bugs together somehow. My bad.
Removed the code that had nothing to do there.
Try: https://tbpl.mozilla.org/?tree=Try&rev=93fcbd961ba5
Attachment #8357092 - Attachment is obsolete: true
Attachment #8358331 - Flags: review+
Nothing to see, just a wrong commit message.
Attachment #8358329 - Attachment is obsolete: true
Attachment #8358339 - Flags: review+
Oops, fixed another typo.
Attachment #8358331 - Attachment is obsolete: true
Attachment #8358422 - Flags: review+
Cleaned-up the code.
Irving, Try indicates that this fails on a single test, which I believe is in your realm. Do you have an idea of what can be happening?

Try: https://tbpl.mozilla.org/?tree=Try&rev=e572a23bbf26
Attachment #8358339 - Attachment is obsolete: true
Flags: needinfo?(irving)
Ah, it's probably better if I don't post every message twice.
Attachment #8369371 - Attachment is obsolete: true
Ok, I believe that I have finally tracked down the issue.
test_locked.js and test_locked2.js both called file.close() as if worked synchronously. If we only use OS.File, this works. However, the tests also use nsIFile, so I suspect that this is the root of the problem.
Flags: needinfo?(irving)
Same patch, better layout.
Attachment #8370043 - Attachment is obsolete: true
Attachment #8370050 - Flags: review?(bmcbride)
Attachment #8370050 - Flags: feedback?(irving)
Attachment #8370050 - Flags: review?(bmcbride) → review+
Fixed typo, applied same treatment to another test.
Attachment #8370050 - Attachment is obsolete: true
Attachment #8370050 - Flags: feedback?(irving)
Attachment #8370737 - Flags: review+
Comment on attachment 8369637 [details] [diff] [review]
Providing additional details from OS.File shutdown, v7

Nathan, would you care to provide an additional review for this patch?
Summary of the changes:
- folded with the patch of bug 961665 that introduces Scheduler.queue;
- to avoid memory leaks, updating latestSent/latestReceived only if OS.Constants.Sys.DEBUG is true;
- slight cleanup.
Attachment #8369637 - Flags: review?(nfroyd)
Comment on attachment 8369637 [details] [diff] [review]
Providing additional details from OS.File shutdown, v7

Review of attachment 8369637 [details] [diff] [review]:
-----------------------------------------------------------------

I see mismatches between the description and the patch.

::: toolkit/components/osfile/modules/osfile_async_front.jsm
@@ +241,5 @@
>        options = methodArgs[methodArgs.length - 1];
>      }
> +    return this.push(() => Task.spawn(function*() {
> +      Scheduler.latestReceived = null;
> +      Scheduler.latestSent = [method, ...args];

I am concerned about the memory problem potential here.  I am also concerned that your summary said that we only touched these when OS.Constants.Sys.DEBUG, but I see no evidence of that constant being consulted.  Did you forget a qref?

@@ +1279,5 @@
> +    return {
> +      launched: Scheduler.launched,
> +      shutdown: Scheduler.shutdown,
> +      latestSent: Scheduler.latestSent,
> +      latestReceived: Scheduler.latestReceived,

Assuming that we're only ever going to update these in debug builds, is it really worth posting them in all builds?
Attachment #8369637 - Flags: review?(nfroyd) → review-
Oh, great, forgot to upload the updated patch.
Also, I just applied your feedback for function getDetails.
Attachment #8369637 - Attachment is obsolete: true
Comment on attachment 8370737 [details] [diff] [review]
Making a few of the tests more async, v3

Review of attachment 8370737 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/extensions/test/xpcshell/head_addons.js
@@ +1083,5 @@
>   *          Optional parameter to ignore add-ons that are incompatible in
>   *          aome way with the application
>   */
>  function installAllFiles(aFiles, aCallback, aIgnoreIncompatible) {
> +  let deferred = Promise.defer();

I'd prefer a separate promiseInstallAllFiles() rather than making this function a hybrid of both promise- and callback-based async. Or, since you only use this in one place, just put the Promise wrapper in line there.

@@ +1495,5 @@
> + */
> +function promiseAddonsByIDs(list) {
> +  let deferred = Promise.defer();
> +  AddonManager.getAddonsByIDs(list, (x) => do_execute_soon(deferred.resolve(x)));
> +  return deferred.promise;

do_execute_soon(deferred.resolve(...)) is (arguably) not necessary here; Promises guarantee that the stack will be unwound before .then() handlers are called. However, the toolkit Promise implementation doesn't always unwind all the way back to the event loop, so if the code is sensitive to whether others waiting on the event loop get a turn, the tests could still fail. Same for other places in the patch.

::: toolkit/mozapps/extensions/test/xpcshell/test_locked.js
@@ +158,5 @@
>                                 "addon3@tests.mozilla.org",
>                                 "addon4@tests.mozilla.org",
>                                 "addon7@tests.mozilla.org",
> +                               "theme2@tests.mozilla.org"]);
> +  let deferred = Promise.defer();

Nit: I like names that indicate what's being promised, for example

let pUpdateFinished = Promise.defer();

(similarly throughout this patch)

@@ -180,3 @@
>  
> -function end_test() {
> -  testserver.stop(do_test_finished);

I think we need to keep the testserver.stop (which is async) as part of the test shutdown.

@@ +255,5 @@
> +  do_check_true(isThemeInAddonsList(profileDir, t2.id));
> +
> +  // Open another handle on the JSON DB with as much Unix and Windows locking
> +  // as we can to simulate some other process interfering with it
> +  shutdownManager();

shutdownManager spins the event loop; if we're taskifying these tests it would be nice to have a promiseShutdownManager and promiseRestartManager that didn't - perhaps a follow-up bug for that?

::: toolkit/mozapps/extensions/test/xpcshell/test_locked2.js
@@ +70,5 @@
>  
>  const profileDir = gProfD.clone();
>  profileDir.append("extensions");
>  
> +add_task(function() {

in test_locked.js you use function*() - which one is better?
Attachment #8370737 - Flags: feedback+
Comment on attachment 8370749 [details] [diff] [review]
Providing additional details from OS.File shutdown, v8

Review of attachment 8370749 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/osfile/modules/osfile_async_front.jsm
@@ +189,5 @@
>      this.resetTimer = setTimeout(File.resetWorker, delay);
>    },
>  
> +  /**
> +   * Push a task on top of the queue.

s/on top/on the tail/

@@ +191,5 @@
>  
> +  /**
> +   * Push a task on top of the queue.
> +   *
> +   * @param {function} code A function returning a Promise.

Document that code() is invoked after the preceding task on the queue resolves or rejects.

@@ +200,5 @@
> +    let deferred = Promise.defer();
> +    let promise = Task.spawn(function* () {
> +      try {
> +        yield this.queue;
> +        return (yield code());

This won't execute code() if this.queue rejects; is this the behaviour we want? As long as the queue is always managed using push() the rejections get eaten, so it should be OK, but it's a bit fragile.

Either way, I don't think this benefits from Task.spawn; I think you get the same result from

push: function(code) {
  // invoke code() after the previous promise resolves
  // if we wanted to be paranoid we could put ()=>code() on the
  // reject handler too
  let afterCode = this.queue.then(() => code());

  // swallow any errors from code() on the queue
  this.queue = afterCode.then(null, () => undefined);

  return afterCode;
}


Would it be too extreme to boil this down to:

push: function(code) {
  let fn = () => code();
  return this.queue = this.queue.then(fn, fn);
}


(This is my revenge for when you pushed me to simplify my Promise use in DeferredSave.jsm ;-)
(In reply to :Irving Reid from comment #26)
> @@ +200,5 @@
> > +    let deferred = Promise.defer();
> > +    let promise = Task.spawn(function* () {
> > +      try {
> > +        yield this.queue;
> > +        return (yield code());
> 
> This won't execute code() if this.queue rejects; is this the behaviour we
> want?

We have an invariant « this.queue cannot reject », documented a bit above.

> push: function(code) {
>   // invoke code() after the previous promise resolves
>   // if we wanted to be paranoid we could put ()=>code() on the
>   // reject handler too
>   let afterCode = this.queue.then(() => code());
> 
>   // swallow any errors from code() on the queue
>   this.queue = afterCode.then(null, () => undefined);
> 
>   return afterCode;
> }

Indeed, the spawn is not critical. I need to think about the uncaught-error-reporting properties of each approach.

> Would it be too extreme to boil this down to:
> 
> push: function(code) {
>   let fn = () => code();
>   return this.queue = this.queue.then(fn, fn);
> }
> 
> 
> (This is my revenge for when you pushed me to simplify my Promise use in
> DeferredSave.jsm ;-)

Well, that one would make |queue| fallible, which I don't want because it might cause unwanted "uncaught rejection" messages.
Turns out that my OS.File patch turns bug 966709 into a permanent orange, so let's fix it while we're at it.
Attachment #8371346 - Flags: review?(rvitillo)
Comment on attachment 8371346 [details] [diff] [review]
3. Porting test_TelemetrySendOldPings to OS.File

Review of attachment 8371346 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8371346 - Flags: review?(rvitillo) → review+
Whiteboard: [Async:team]
Comment on attachment 8370749 [details] [diff] [review]
Providing additional details from OS.File shutdown, v8

Review of attachment 8370749 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, with Irving's documentation concerns addressed.
Attachment #8370749 - Flags: review?(nfroyd) → review+
Attachment #8358422 - Attachment description: Extending AsyncShutdown error reporting, v4 → 1. Extending AsyncShutdown error reporting, v4
Applied feedback.
Attachment #8370737 - Attachment is obsolete: true
Attachment #8372149 - Flags: review+
Attachment #8371346 - Attachment description: Porting test_TelemetrySendOldPings to OS.File → 3. Porting test_TelemetrySendOldPings to OS.File
Backed out for xpcshell failures on all platforms. Please run this through Try before requesting checkin again.
https://hg.mozilla.org/integration/fx-team/rev/cff6ebdce225

https://tbpl.mozilla.org/php/getParsedLog.php?id=34292232&tree=Fx-Team
Flags: in-testsuite+
Whiteboard: [Async:team][fixed-in-fx-team] → [Async:team]
Sorry about that, it seems that I forgot to upload an updated patch.
This failure showed up on the push after yours. Being that I'm about to head out the door and don't feel like leaving things to chance, I backed it out again.
https://tbpl.mozilla.org/php/getParsedLog.php?id=34313053&tree=Fx-Team

https://hg.mozilla.org/integration/fx-team/rev/cbaafa9a0dc3
Whiteboard: [Async:team][fixed-in-fx-team] → [Async:team]
Good call, this looks like an insufficiently-fixed race condition in a test. I'll try and fix this over the week-end.
Others like it also have happened, so it wasn't a one-off.
So, after three more days of debugging, I confirm that there is a race condition somewhere between XPIProvider, AddonManager and test_AddonRepository. For some reason, the cleanup routine that removes the directory once it is empty sometimes takes place after the test that ensures that the directory has been removed. This may be related to the hard-to-follow combination of nsIFile and OS.File in XPIProvider, or to some client of DeferredSave.

I am rather confident that my patches did not introduce this race condition, although they obviously made it more apparent, possibly by delaying evaluation of some OS.File stuff by two ticks instead of one.

I have not been able to locate the source of the race condition. However, it seems (pending confirmation) that the cleanup routine always eventually takes place. I am tempted to just loop at the end of the test until the cleanup is complete, call this a day, land this patch and concentrate on the dependent bugs.

Blair, would this be ok with you?
Flags: needinfo?(bmcbride)
Comment on attachment 8373645 [details] [diff] [review]
2bis. Waiting for cleanup before finishing test_AddonRepository.js

Review of attachment 8373645 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, but please do file a bug with all the info from comment 41 and comment 44 (and CC Irving) so we have something we can find/reference in the future.
Attachment #8373645 - Flags: review?(bmcbride) → review+
Flags: needinfo?(bmcbride)
Folding.
Attachment #8372149 - Attachment is obsolete: true
Attachment #8373645 - Attachment is obsolete: true
Attachment #8374473 - Flags: review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: