test_AddonRepository_cache.js and test_update_strictcompat.js is permafailing on Win64 Aurora40 since the uplift

RESOLVED FIXED in Firefox 40

Status

()

Toolkit
Add-ons Manager
--
blocker
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: RyanVM, Assigned: mossop)

Tracking

unspecified
mozilla41
x86_64
Windows 8
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox40+ fixed, firefox41 fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
https://treeherder.mozilla.org/logviewer.html#?job_id=822779&repo=mozilla-aurora
Flags: needinfo?(dtownsend)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 2

3 years ago
It looks like this test is failing from too much recursion. Apparently all kinds of random things affect the available stack space to JS but I'm not entirely sure why this only happens on aurora. I think there are ways I can reduce the recursion going on here a lot, but I don't know if it will be upliftable. But then if xpcshell is hitting this it seems likely that real users could hit it too.
Assignee: nobody → dtownsend
Flags: needinfo?(dtownsend)
(Reporter)

Comment 3

3 years ago
Interesting given bug 1158223 that hit recently.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Updated

3 years ago
Depends on: 1164935, 1164934
(Assignee)

Comment 9

3 years ago
It's not possible to use try to verify a fix here because PGO builds timeout on try. Waiting on a project branch to be reset so I can use it to test a fix I have.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
m-a is currently closed because of this bug. What are our options for fixing the bustage in order to reopen the tree? Rather than fix the issue, is there a change that we can backout? If the issue is tests only, are there tests that we should disable?

Note that we also currently have Aurora updates disabled and are scheduled to re-enable updates today.
Flags: needinfo?(dtownsend)
(Assignee)

Comment 15

3 years ago
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #14)
> m-a is currently closed because of this bug. What are our options for fixing
> the bustage in order to reopen the tree? Rather than fix the issue, is there
> a change that we can backout? If the issue is tests only, are there tests
> that we should disable?

I don't know what caused this bustage, it's likely unrelated to the add-ons manager since it seems to be something limiting JS stack space so unless shu has other ideas it could be anything in the aurora merge. I don't know why it is only happening on aurora. It also looks likely to affect users too, unless something about xpcshell means we have less JS stack space for some reason so disabling the test doesn't seem like a good idea.

I now have access to a project branch that reproduces the failure and I've pushed a likely fix there to see the results. Assuming it works I should have a patch up for review today.
Flags: needinfo?(dtownsend)
status-firefox40: --- → affected
tracking-firefox40: --- → +
(Reporter)

Comment 16

3 years ago
Fun, now the test_bug732665.xul failures are back on win32 and win64 as well. Dave, mind if I piggyback on cypress for playing with the stack size? Shu gave me pointers on how to do it.
Flags: needinfo?(dtownsend)
(Assignee)

Comment 17

3 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #16)
> Fun, now the test_bug732665.xul failures are back on win32 and win64 as
> well. Dave, mind if I piggyback on cypress for playing with the stack size?
> Shu gave me pointers on how to do it.

Well if we're both trying to solve the same problem on the same tree we might end up solving it and not know which is the right solution. But if you have a better general solution (my fix would only affect add-ons manager code) then I guess that is the better solution anyway.
Flags: needinfo?(dtownsend)
(Assignee)

Comment 18

3 years ago
Created attachment 8606566 [details] [diff] [review]
patch

This is kind of the start of bug 987512 but for now it only touches AddonManagerInternal.getAddonByID and AddonManagerInternal.getAddonsByIDs converting them both to be entirely promise based. The public API is unaffected for now. This vastly reduces the stack used by calling getAddonsByIDs and also has the benefit of actually making the async calls run in parallel so may speed some operations up at the cost of doing some work that sometimes wouldn't have happened before. Verified on cypress that this solves the test failures.
Attachment #8606566 - Flags: review?(paolo.mozmail)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 24

3 years ago
Alas, bumping the stack size didn't fix this issue.

Comment 25

3 years ago
Comment on attachment 8606566 [details] [diff] [review]
patch

Review of attachment 8606566 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: toolkit/mozapps/extensions/AddonManager.jsm
@@ +267,5 @@
> + * @param  aProvider
> + *         The provider to call
> + * @param  aMethod
> + *         The method name to call
> + * @return a promise

nit: We use this convention in other parts of the codebase:

 * @return {Promise}
 * @resolves The result the provider returns, or |undefined| if the provider
 *           does not implement the method or the method throws.
 * @rejects Never.

(@rejects is omitted if the promise may reject with a JavaScript exception.)

@@ +2145,5 @@
>      if (!aID || typeof aID != "string")
>        throw Components.Exception("aID must be a non-empty string",
>                                   Cr.NS_ERROR_INVALID_ARG);
>  
> +    let promises = [promiseCallProvider(p, "getAddonByID", aID) for (p of this.providers)];

The new ES7 syntax for array comprehensions is preferred:

let promises = [for (p of this.providers) promiseCallProvider(p, "getAddonByID", aID)]

@@ +2147,5 @@
>                                   Cr.NS_ERROR_INVALID_ARG);
>  
> +    let promises = [promiseCallProvider(p, "getAddonByID", aID) for (p of this.providers)];
> +    return Promise.all(promises).then(aAddons => {
> +      return aAddons.find(a => a != null) || null;

Maybe ".find(a => !!a)"?

@@ +2197,5 @@
>     *
>     * @param  aIDs
>     *         The array of IDs to retrieve
> +   * @return a promise that will be resolved with the array of Addons
> +   * @throws if the aID argument are not specified

nit: if the aIDs argument is not specified

@@ +2878,5 @@
> +    if (typeof aCallback != "function")
> +      throw Components.Exception("aCallback must be a function",
> +                                 Cr.NS_ERROR_INVALID_ARG);
> +
> +    AddonManagerInternal.getAddonByID(aID).then(makeSafe(aCallback));

You may still want a .catch(Cu.reportError) at the end of the chain just in case getAddonByID is changed in the future so that it may return a rejected promise.
Attachment #8606566 - Flags: review?(paolo.mozmail) → review+
(Reporter)

Comment 26

3 years ago
Fun observation - these failures only hit on dep builds. The win64 nightlies are fine.

Comment 27

3 years ago
https://hg.mozilla.org/integration/fx-team/rev/59e4faaa69f2
https://hg.mozilla.org/mozilla-central/rev/59e4faaa69f2
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
(Assignee)

Comment 29

3 years ago
Comment on attachment 8606566 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]: Unknown
[User impact if declined]: We might be seeing metadata updates for add-ons fail for users who have limited memory available
[Describe test coverage new/current, TreeHerder]: The functions changed are probably the most well tested functions in the add-ons manager, many many automated tests call these functions in various ways and verify the results
[Risks and why]: This does change the exact timing of the asynchronous functions but no code should be relying on that. This should be low risk.
[String/UUID change made/needed]: None
Attachment #8606566 - Flags: approval-mozilla-aurora?
Comment hidden (Treeherder Robot)
Comment on attachment 8606566 [details] [diff] [review]
patch

Approved for uplift to aurora; has test coverage.
Attachment #8606566 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Reporter)

Comment 32

3 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/d5fbc61783e4
status-firefox40: affected → fixed
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.