Implement HTML5 Script Settings Object Stack

RESOLVED FIXED in mozilla29

Status

()

Core
DOM
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: bz, Assigned: bholley)

Tracking

(Blocks: 1 bug)

unspecified
mozilla29
x86
Mac OS X
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(13 attachments, 5 obsolete attachments)

3.02 KB, patch
Details | Diff | Splinter Review
12.05 KB, patch
Details | Diff | Splinter Review
6.81 KB, patch
Details | Diff | Splinter Review
4.03 KB, patch
Details | Diff | Splinter Review
7.25 KB, patch
Details | Diff | Splinter Review
23.54 KB, patch
Details | Diff | Splinter Review
32.35 KB, patch
Details | Diff | Splinter Review
9.24 KB, patch
Details | Diff | Splinter Review
92.11 KB, patch
Details | Diff | Splinter Review
1.43 KB, patch
Details | Diff | Splinter Review
7.35 KB, patch
bholley
: review+
Details | Diff | Splinter Review
12.19 KB, patch
luke
: review+
Details | Diff | Splinter Review
13.19 KB, patch
sfink
: review+
Details | Diff | Splinter Review
Just need to do this.
Flags: needinfo?(bzbarsky)
(Assignee)

Comment 1

5 years ago
I was going to write this as a continuation of the private thread "JS_DescribeScriptedCaller and selfHosted" from last summer, but I think it's better to move to a bug instead.

Hixie has now refactored the spec to describe a 'script settings object', which has a 1:1 mapping with globals. So we should probably define a corresponding object, and support it for both worker and non-worker globals.

In the new spec model, there's a stack of incumbent scripts, some of which are also marked 'candidate entry scripts'. The 'candidate entry script' thing only happens during event dispatch as far as I can tell.

The common case is 'When a JavaScript SourceElements production is to be evaluated', which causes us to push a global onto the stack (and generally not label it as a script entry point). WebIDL callbacks capture the incumbent script at the time when they were passed, and restore that onto the stack when they are invoked.

Obviously, we don't want an explicit representation of the entire stack of incumbent scripts, because that duplicates information already inside the JS engine, and would require munging the stack at every function call. So it seems like we want to do something like the following:

* Define some kind of script settings type. This could either be a separate object that both Window and WorkerGlobalScope create, or it could be an interface that both native globals QI to. I think the latter makes more sense unless there's something I haven't thought of. Let's call this nsIScriptSettingsObject for now.
* Add a per-CycleCollectedJSRuntime stack of ScriptSettingsStackEntries. Each entry has, at minimum, an nsIScriptSettingsObject and a bool indicating whether or not it's a candidate entry script.
* Add a hook in the JS engine such that, whenever we push one of these entries onto the stack, we increment a counter on the topmost function activation. If this is nonzero, JS_HasScriptedCallerOverride returns true, and JS_DescribeScriptedCaller asserts. We should assert that the entry is zero when the activation dies.
* Build a Gecko-side wrapper around this junk so that callers can just invoke GetIncumbentScript and GetEntryScript, and it will return the proper nsIScriptSettingsObject from a combination of the data in the JS engine and the data in the ScriptSettingsStack.
* When entering a top-level script for a page, push its global onto the stack, and mark it as a candidate entry script.
* Capture the result of GetIncumbentScript when passing a callback to WebIDL, and store the result on the callback.
* When invoking a WebIDL callback, push the stored nsIScriptSettingsObject. If we're doing event dispatch, tag it as a candidate entry point. Otherwise, don't.

There's also the question of what to do when we spin the event loop. The spec currently says that the stack is set aside. As I described in bug 924905 comment 28, I think we should generally get this changed, and move to a model whereby you can see whatever parts of the stack you subsume. But we could introduce some kind of null pushing for now.

Thoughts? Did I miss anything?
(Assignee)

Comment 2

5 years ago
Oh, and as for the naming of the RAII classes:

We could do AutoEntryScript and AutoIncumbentScript. But it might make more sense to do:

AutoScriptSettings foo(nsIScriptSettings arg);
foo.setCandidateEntryScript();
I don't understand the candidate entry script bit, and I doubt it's web-compatible ascurrently written.  I think every call of a WebIDL callback should be an entry script.
Flags: needinfo?(bzbarsky)
Or more precisely, an entry script settings object, since there may be no "script" there at all.
So for now I've created AutoEntryScript and AutoIncumbentScript.  They both take an nsIGlobalObject.  The former also takes (for now?) a JSContext and a boolean, because it turns out that there are situations in which we do not in fact have an nsIGlobalObject and getting from an nsIGlobalObject to an nsJSContext in the cases when we _do_ have the former is slow (QI, really?).

Once we nix the JSContext dependence we'll be able to clean that up....
Created attachment 832460 [details] [diff] [review]
Bobby, somethinkg like this what you were thinking of?
Attachment #832460 - Flags: review?(bobbyholley+bmo)
I could also try to push more of the exception-reporting in there if we want, but that's the main thing that could go in there.
(Assignee)

Comment 8

5 years ago
Comment on attachment 832460 [details] [diff] [review]
Bobby, somethinkg like this what you were thinking of?

Review of attachment 832460 [details] [diff] [review]:
-----------------------------------------------------------------

I think that Auto{Entry,Incumbent}Script.{cpp,h} should be merged into a ScriptSettings.{cpp,h}. I'll stick the other relevant stuff in that file as well.

r=bholley with comments addressed. Since inbound is going to be closed for a while, maybe you can just upload the new patch and I'll manually apply it for now?

::: dom/bindings/CallbackObject.cpp
@@ +107,2 @@
>    } else {
>      cx = workers::GetCurrentThreadJSContext();

We need to set globalObject here as well, right?
Attachment #832460 - Flags: review?(bobbyholley+bmo) → review+
(Assignee)

Updated

5 years ago
Blocks: 938878
> I think that Auto{Entry,Incumbent}Script.{cpp,h} should be merged into a ScriptSettings.

OK.

> maybe you can just upload the new patch 

Sure thing.

> We need to set globalObject here as well, right?

It's already null.
Or more to the point, I had no idea how I go about getting an nsIGlobalObject on a worker and the script settings stuff clearly has to handle a null nsIGlobalObject, since that's what we get in the safe context case.  I think I can see how to get an nsIGlobalObject on a worker, though, so maybe I should do that...
Attachment #832460 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
No longer blocks: 938878
(Assignee)

Comment 12

5 years ago
I think it makes more sense to land this all in one go in bug 938878.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 938878
(Assignee)

Comment 13

5 years ago
Or actually, no. This bug has more context. I'm just going to rename it. Sorry for the churn.
Assignee: bzbarsky → bobbyholley+bmo
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: Factor out CallSetup into a AutoScriptEntryPoint struct or something → Implement HTML5 Script Settings Object Stack
(Assignee)

Updated

5 years ago
Duplicate of this bug: 938878
(Assignee)

Updated

5 years ago
Depends on: 939166
(Assignee)

Updated

5 years ago
Blocks: 785310
Blocks: 941069
(Assignee)

Comment 17

5 years ago
I'm still working on the tests, but so far everything appears to be working. Doing a try push in case Bob wants to do anything over the next few days (I won't be doing much work).

https://tbpl.mozilla.org/?tree=Try&rev=49cf546423bf

Alternatively, you can pull them from github here:

https://github.com/bholley/gecko-dev/tree/entryscript
(Assignee)

Updated

4 years ago
Attachment #832618 - Attachment is obsolete: true
(Assignee)

Comment 18

4 years ago
Try push was green, and I've got tests that pass. Uploading and flagging for review.
(Assignee)

Comment 19

4 years ago
Created attachment 8342034 [details] [diff] [review]
Part 1 - Make TabChildGlobal implement nsIGlobalObject. v1
Attachment #8342034 - Flags: review?(bzbarsky)
(Assignee)

Comment 20

4 years ago
Created attachment 8342035 [details] [diff] [review]
Part 2 - Factor out the shareable parts of CallSetup into AutoEntryScript and AutoIncubentScript. v3
Attachment #8342035 - Flags: review?(bzbarsky)
(Assignee)

Comment 21

4 years ago
Created attachment 8342036 [details] [diff] [review]
Part 3 - Implement basic script settings stack machinery. v1
Attachment #8342036 - Flags: review?(bzbarsky)
(Assignee)

Comment 22

4 years ago
Created attachment 8342037 [details] [diff] [review]
Part 4 - Manipulate the script settings stack from the RAII classes. v1
Attachment #8342037 - Flags: review?(bzbarsky)
(Assignee)

Comment 23

4 years ago
Created attachment 8342038 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v1
Attachment #8342038 - Flags: review?(bzbarsky)
(Assignee)

Comment 24

4 years ago
Created attachment 8342040 [details] [diff] [review]
Part 6 - Pass the entire CallbackObject to CallSetup. v1

We need this so that we can grab the incumbent global that we stashed on the
CallbackObject.
Attachment #8342040 - Flags: review?(bzbarsky)
(Assignee)

Comment 25

4 years ago
Created attachment 8342041 [details] [diff] [review]
Part 7 - When invoking a callback object, restore the incumbent script settings object from when the callback was created. v1

See the 'incumbent script' stuff in the WebIDL spec.
Attachment #8342041 - Flags: review?(bzbarsky)
(Assignee)

Comment 26

4 years ago
Created attachment 8342043 [details] [diff] [review]
Part 8 - Replace all instance of null cx pushing with AutoSystemCaller. v1

This is an easy bonus chunk of the work to phase out cx pushing in the browser.
Attachment #8342043 - Flags: review?(bzbarsky)
(Assignee)

Comment 27

4 years ago
Created attachment 8342044 [details] [diff] [review]
Part 9 - Tests. v1
Attachment #8342044 - Flags: review?(bzbarsky)
(Assignee)

Comment 28

4 years ago
Created attachment 8342045 [details] [diff] [review]
rollup.patch

rollup patch for reference
Comment on attachment 8342034 [details] [diff] [review]
Part 1 - Make TabChildGlobal implement nsIGlobalObject. v1

r=me
Attachment #8342034 - Flags: review?(bzbarsky) → review+
Comment on attachment 8342035 [details] [diff] [review]
Part 2 - Factor out the shareable parts of CallSetup into AutoEntryScript and AutoIncubentScript. v3

>+    MOZ_ASSERT(aIsMainThread, "cx is mandatory off-main-thread");

This should be documented in the header, I think.

Is there an existing bug about making GetNativeForGlobal() work with WebIDL globals?  If not, please file one.

r=me
Attachment #8342035 - Flags: review?(bzbarsky) → review+
Comment on attachment 8342035 [details] [diff] [review]
Part 2 - Factor out the shareable parts of CallSetup into AutoEntryScript and AutoIncubentScript. v3

Though note that AutoIncumbentScript seems unused so far.  That's OK, since it's a no-op, but is that expected here?
Comment on attachment 8342036 [details] [diff] [review]
Part 3 - Implement basic script settings stack machinery. v1

>+  void Push(ScriptSettingsStackEntry& aSettings) {

If we store the pointer here, I think the caller should pass in a pointer too.  Otherwise it's too easy to accidentally pass a reference to something on the stack or some such and then unwind the stack (whereas taking a pointer-to-stack immediately makes people check lifetimes).

>+void InitScriptSettings();

Document that this only needs to be called once, during startup?  And maybe fix the docs for ThreadLocal to make it clear that it only needs to be inited once, not once per thread or something?

I assume there's a reason to use vector instead of nsTArray?

I still don't understand when a script setting stack entry would _not_ be an entry point candidate.  Can you please point me to an explanation for that?

r=me modulo all that
Attachment #8342036 - Flags: review?(bzbarsky) → review+
Comment on attachment 8342036 [details] [diff] [review]
Part 3 - Implement basic script settings stack machinery. v1

Oh, I see, the non-entry things just track incumbent scripts.  Gotcha.
Comment on attachment 8342037 [details] [diff] [review]
Part 4 - Manipulate the script settings stack from the RAII classes. v1

>+  mStack.Push(ScriptSettingsStackEntry::SystemSingleton);

PushSystem?

r=me
Attachment #8342037 - Flags: review?(bzbarsky) → review+
Comment on attachment 8342038 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v1

>+  JSContext *cx = nsContentUtils::GetCurrentJSContext();

That's mainthread-only, no?  Why is that ok?  I don't think it is.

>+  , mOverride(nsContentUtils::GetCurrentJSContext())

And here.

The JS parts will need review by a JS peer, obviously.
Attachment #8342038 - Flags: review?(bzbarsky) → review-
Comment on attachment 8342040 [details] [diff] [review]
Part 6 - Pass the entire CallbackObject to CallSetup. v1

r=me

Though those (preexisting) comments about JSAutoRequest worry me: what happens on workers?
Attachment #8342040 - Flags: review?(bzbarsky) → review+
Comment on attachment 8342041 [details] [diff] [review]
Part 7 - When invoking a callback object, restore the incumbent script settings object from when the callback was created. v1

Ah, this fixes the worker GetIncumbentGlobal issue, ok.  That should be in the earlier relevant patch.

> nsDOMEventTargetHelper::SetEventHandler(nsIAtom* aType,

This should presumably use the incumbent global, not null?

>+++ b/content/events/src/nsEventListenerManager.cpp

This should probably use whatever global we just used to compile?  Though maybe it doesn't matter, since that'll be the incumbent global of the function anyway.

>+++ b/dom/base/nsGlobalWindow.cpp
>+      handler = new EventHandlerNonNull(callable, nullptr);                  \

Again, why not the incumbent global here?

>+      handler = new OnErrorEventHandlerNonNull(callable, nullptr);           \

And here.

>+      handler = new OnBeforeUnloadEventHandlerNonNull(callable, nullptr);    \

And here.

>+++ b/dom/bindings/CallbackObject.cpp
>+  tmp->mIncumbentGlobal = nullptr;

  NS_IMPL_CYCLE_COLLECTION_UNLINK(mIncumbentGlobal)

>+    // XXXbholley - Is this right? What are these used for?

Ideally, nothing.  The only in-tree consumers are NodeIterator/TreeWalker, and they would only use if if you somehow create one with an XPCOM callback (that wraps a JS object!) and then get .filter from JS.  Maybe we should just make this thing always return null if it's not already a webidl callback and not worry about this situation.

r=me modulo those nits
Attachment #8342041 - Flags: review?(bzbarsky) → review+
Comment on attachment 8342043 [details] [diff] [review]
Part 8 - Replace all instance of null cx pushing with AutoSystemCaller. v1

>+++ b/content/base/src/nsImageLoadingContent.cpp

Actually, I think the bits here can just go away.  They should have in bug 880340; I'm sorry I didn't notice that they were left behind.

>+++ b/dom/src/geolocation/nsGeolocation.cpp

I strongly suspect the bits here are also not needed now that we're using WebIDL callbacks (which don't just reuse whatever JSContext is on the stack).  Can you please double-check that?

>+++ b/layout/base/nsLayoutUtils.cpp

This one can go away just like the ones in nsImageLoadingContent.

>+++ b/layout/generic/nsVideoFrame.cpp

And likewise here.

r=me with those fixed.
Attachment #8342043 - Flags: review?(bzbarsky) → review+
Comment on attachment 8342044 [details] [diff] [review]
Part 9 - Tests. v1

Maybe use DOM promises instead of the semi=deprecated Promise.jsm?

You probably want a test for an event handler compiled from a content attribute.

r=me
Attachment #8342044 - Flags: review?(bzbarsky) → review+
And also, thank you for doing this!
(Assignee)

Comment 41

4 years ago
(In reply to Boris Zbarsky [:bz] from comment #30)
> Is there an existing bug about making GetNativeForGlobal() work with WebIDL
> globals?  If not, please file one.

filed bug 946289.

(In reply to Boris Zbarsky [:bz] from comment #32)

> I assume there's a reason to use vector instead of nsTArray?

I think at some point I chose it because of the copying semantics or something, but I can't seem to recall why. Given that the current design just uses opaque unowned pointers, nsTArray should be fine. I'll switch to that.

(In reply to Boris Zbarsky [:bz] from comment #36)
> Though those (preexisting) comments about JSAutoRequest worry me: what
> happens on workers?

I think there's an explicit JSAutoRequest that gets constructed somewhere up high in the worker event loop. If there weren't, we'd assert all over the place.

(In reply to Boris Zbarsky [:bz] from comment #37)
> Ah, this fixes the worker GetIncumbentGlobal issue, ok.  That should be in
> the earlier relevant patch.

Yep. Just a mismerge on my part. I'll fix it.
(Assignee)

Comment 42

4 years ago
Created attachment 8342467 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v2
Attachment #8342038 - Attachment is obsolete: true
Attachment #8342467 - Flags: review?(luke)
Attachment #8342467 - Flags: review?(bzbarsky)
Comment on attachment 8342467 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v2

>+  , mOverride(nsContentUtils::GetCurrentJSContext())

This still needs to be fixed.

Can we just make nsContentUtils::GetCurrentJSContext do the IsMainThread check?  Or add a new function that does the right thing and call it both places we need this in this patch?
Attachment #8342467 - Flags: review?(bzbarsky) → review-
(Assignee)

Comment 44

4 years ago
Created attachment 8342501 [details] [diff] [review]
Part 7.5 - Return null from ToWebIDLCallback if one doesn't already exist. v1
Attachment #8342501 - Flags: review?(bzbarsky)
(Assignee)

Comment 45

4 years ago
Created attachment 8342502 [details] [diff] [review]
Part 8.5 - Remove unnecessary AutoSystemCaller usage. v1 r=bz

See bug 937317 comment 38.
Attachment #8342502 - Flags: review+
Comment on attachment 8342501 [details] [diff] [review]
Part 7.5 - Return null from ToWebIDLCallback if one doesn't already exist. v1

r=me
Attachment #8342501 - Flags: review?(bzbarsky) → review+
(Assignee)

Comment 47

4 years ago
Created attachment 8342509 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v3

Sorry for missing that. Putting this in nsContentUtils is an obvious
improvement.
Attachment #8342467 - Attachment is obsolete: true
Attachment #8342467 - Flags: review?(luke)
Attachment #8342509 - Flags: review?(bzbarsky)
Comment on attachment 8342509 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v3

Yes, much more like it.

You still want the review from Luke.
Attachment #8342509 - Flags: review?(bzbarsky) → review+
(Assignee)

Updated

4 years ago
Attachment #8342509 - Flags: review?(luke)

Comment 49

4 years ago
Comment on attachment 8342509 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v3

Review of attachment 8342509 [details] [diff] [review]:
-----------------------------------------------------------------

Nice to see all this coming together!  Is there a good high-level summary of what the incumbent global rules are?  I know we had an email thread where we agreed on something a couple months ago, but I forget exactly what we agreed and I don't know if the rules changed since then :)

Nit: "ScriptedCallerOverride" sounds a bit more intimidating than it really is.  IIUC, all we're doing is hiding the current activation, so could the phrase used in all the identifiers be "HideScriptedCaller"?  As in:
  JS_HideScriptedCaller
  JS_UnhideScriptedCaller
  JSAutoHideScriptedCaller
  Activation::hideScriptedCallerCount_;
  Activation::hideScriptedCaller()
  Activation::unhideScriptedCaller()
  Activation::scriptedCallerIsHidden()
?  Comments could be updated to use this terminology as well.  I'd also specifically mention what happens when you've hid the top activation and then, before unhiding it, you reenter the VM (which pushes a new unhidden activation, iiuc).

Also, can you put the new jsapi bits in namespace JS?

::: js/src/jsapi.cpp
@@ +6110,5 @@
>  
> +    // If there's an override, the embedding wants us to return null here so
> +    // that it can check its own stack.
> +    if (i.activation()->hasScriptedCallerOverride())
> +        return false;

For symmetry with the comment below, can you use:

  if (cx->runtime()->mainThread().activation()->hasScriptedCallerOverride())
      return false;

and put it before the NonBuiltinScriptFrameIter decl.

@@ +6129,5 @@
> +    // anything.
> +    NonBuiltinScriptFrameIter i(cx);
> +    if (i.done())
> +        return;
> +    i.activation()->addScriptedCallerOverride();

I think what you want is cx->runtime()->mainThread().activation().
(Assignee)

Comment 50

4 years ago
Created attachment 8342738 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v4 r=bz

Addressed luke's review comments.
Attachment #8342509 - Attachment is obsolete: true
Attachment #8342509 - Flags: review?(luke)
Attachment #8342738 - Flags: review?(luke)

Comment 51

4 years ago
Comment on attachment 8342738 [details] [diff] [review]
Part 5 - Implement and expose GetIncumbentGlobal. v4 r=bz

Great!
Attachment #8342738 - Flags: review?(luke) → review+
(Assignee)

Comment 52

4 years ago
Final all-platform try push:

https://tbpl.mozilla.org/?tree=Try&rev=5742c0f93432
(Assignee)

Comment 53

4 years ago
debug-only compilation bustage. opt tests look ok. Repushing debug:

https://tbpl.mozilla.org/?tree=Try&rev=bba4405e114a
(Assignee)

Comment 54

4 years ago
(In reply to Bobby Holley (:bholley) from comment #53)
> debug-only compilation bustage. opt tests look ok. Repushing debug:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=bba4405e114a

Hm, looks _almost_ green, except for _one_ mochitest-o run where the tests for this bug (test_scriptSettings.xul) timed out.

I added some more diagnostics, and let's see if this reproduces with any amount of retriggering:

https://tbpl.mozilla.org/?tree=Try&rev=4e1344314441
(Assignee)

Comment 55

4 years ago
Ok, with all that retriggering I can't reproduce the one test_scriptSettings.xul timeout from comment 53. So I'm going to just push this, with the extra diagnostics intact. If it ever goes orange on tinderbox, the log will presumably give us the information we need. :-)

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=c84430c040dd
(Assignee)

Comment 58

4 years ago
Looks like the winXP opt compiler (and that compiler only) requires an argument to MOZ_ASSUME_UNREACHABLE.

https://tbpl.mozilla.org/?tree=Try&rev=8ec945afa227

Updated

4 years ago
Depends on: 947525

Updated

4 years ago
Keywords: intermittent-failure
Backed out in https://hg.mozilla.org/mozilla-central/rev/b50d803d0ad5 - apparently that gaia-ui-test failure in the try push in comment 52 was real, and virtually permanent (though not, inconveniently, permanent enough to have hit the one push on inbound that we decided to merge, so you wound up merged to every tree and busting every tree).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla28 → ---
For what it's worth, I'm currently in the process of trying to get some actual directions for running the gaia UI tests out of the people who created them, for bug 697343.  Hopefully once we have that we can see what's going on...
Created attachment 8345480 [details] [diff] [review]
Root around GC call GetIncumbentGlobal,
Assignee: bobbyholley+bmo → sphink
Status: REOPENED → ASSIGNED
Duplicate of this bug: 947555
Comment on attachment 8345480 [details] [diff] [review]
Root around GC call GetIncumbentGlobal,

r=bz from bug 947555
Attachment #8345480 - Flags: review+
Assignee: sphink → bobbyholley+bmo
(Assignee)

Comment 69

4 years ago
comment 65 indicates that part 2 is the culprit. Nice.

So I just sunk some time into trying to reproduce this locally. After a lot of fiddling I finally managed to run the gaia-ui tests. Unfortunately, while there _are_ various failure, I think they're related to the differences in using a debug build on OSX (as opposed to an opt build on linux64). Getting rid of --restart helps a fair amount, but the actual socket-related timeouts from TBPL [1] still don't reproduce.

jgriffin and I talked about this on IRC. As it turns out, this is _already_ an existing intermittent failure (bug 948395), and these patches just seem to be tickling it to make it a bit less intermittent (though not fully permanent, see comment 61). And given the nature of patch 2, and the fact that everything else on CI is green, I'm near-certain that it's the b2g automation that's going to end up needing to change.

I think it makes sense for someone on the a-team to continue the investigation at this point. Zac, can you take the lead? This is rather high-priority stuff from the platform and security side.

[1] https://tbpl.mozilla.org/?tree=Try&rev=48254a3fc029
Flags: needinfo?(zcampbell)

Comment 70

4 years ago
This Gu intermittent in bug 948395 was only brought to my attention Friday last week so I'm not entirely convinced you're in the clear.

I will run the 48254a3fc029 build locally today and let you know.

Comment 71

4 years ago
This build is causing b2g to crash.

The Crash Reporter pops up and the socket times out because the b2g process us not running anymore.

I filed the crash reports:
https://crash-stats.mozilla.com/report/index/38d54d4b-302c-416e-9c7b-deac02131211
https://crash-stats.mozilla.com/report/index/57f37354-36da-42a9-b391-c21f72131211



ROUND 0
-------
starting httpd
running webserver on http://10.246.27.126:51141/
TEST-START test_settings.py
test_get_all_settings (test_settings.TestSettings) ... ok
test_set_named_setting (test_settings.TestSettings) ... ERROR
test_set_volume (test_settings.TestSettings) ... ERROR

======================================================================
ERROR: None
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette_test.py", line 127, in run
    self.setUp()
  File "/home/zac/Mozilla/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 814, in setUp
    self.cleanup_gaia(full_reset=False)
  File "/home/zac/Mozilla/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 882, in cleanup_gaia
    self.apps.kill_all()
  File "/home/zac/Mozilla/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 118, in kill_all
    self.marionette.execute_async_script("GaiaApps.killAll()")
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette.py", line 1073, in execute_async_script
    filename=os.path.basename(frame[0]))
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette.py", line 567, in _send_message
    raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None)
TEST-UNEXPECTED-FAIL | test_settings.py test_settings.TestSettings.test_set_named_setting | TimeoutException: socket.timeout
======================================================================
ERROR: None
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette_test.py", line 127, in run
    self.setUp()
  File "/home/zac/Mozilla/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 773, in setUp
    MarionetteTestCase.setUp(self)
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette_test.py", line 312, in setUp
    CommonTestCase.setUp(self)
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette_test.py", line 247, in setUp
    self.marionette.start_session()
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette.py", line 681, in start_session
    self.session = self._send_message('newSession', 'value')
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.6.2-py2.7.egg/marionette/marionette.py", line 567, in _send_message
    raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None)
TEST-UNEXPECTED-FAIL | test_settings.py test_settings.TestSettings.test_set_volume | TimeoutException: socket.timeout
----------------------------------------------------------------------
Ran 3 tests in 494.882s

FAILED (errors=2)
Flags: needinfo?(zcampbell)
> I filed the crash reports:

Those seem to have no useful symbols.  :(  Was this a local build of b2g that crash-stats would know nothing about?

The only useful thing in there is that we seem to think it's a null-deref...

Comment 73

4 years ago
Boris it was the Try build from here: https://tbpl.mozilla.org/?tree=Try&rev=48254a3fc029 so I guess that crash-stats would not be familiar.

I unpacked that, then I built a Gaia profile using a commit around the time that the Try build was made.

I ran the tests with the Try binary with the Gaia profile together.

I didn't do any manual exploratory testing to see if I could trigger the crash.
Benjamin, is there a way to do a try b2g build that crash-stats would have symbols for?
Flags: needinfo?(benjamin)

Comment 75

4 years ago
Well, not crash-stats, no. But the symbols are uploaded with the try results at http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/bobbyholley@gmail.com-48254a3fc029/try-linux64_gecko/.

So if you have a minidump and those symbols, you can run minidump-stackwalk locally on it to get your report.
Flags: needinfo?(benjamin)
Hmm.  How do I get a minidump?  Is that the "raw dump" option on crashstats?  I don't seem to have the bits to download those....
Flags: needinfo?(benjamin)
Benjamin sent me the minidumps.

The stack looks like so:

 0  libxul.so!JSAutoCompartment::JSAutoCompartment(JSContext*, JSObject*) [Barrier.h:48254a3fc029 : 278 + 0x0]
    rbx = 0x00007f732514a140   r12 = 0x00007f730fe8a7a8
    r13 = 0x00007f72f19d4101   r14 = 0x0000000000000001
    r15 = 0x0000000000000000   rip = 0x00007f73232c4b6b
    rsp = 0x00007fff38741da8   rbp = 0x00007fff38741e40
    Found by: given as instruction pointer in context
 1  libxul.so!mozilla::dom::AutoEntryScript::AutoEntryScript(nsIGlobalObject*, bool, JSContext*) [Maybe.h:48254a3fc029 : 61 + 0xe]
    rbx = 0x00007f732514a140   r12 = 0x00007f730fe8a7a8
    r13 = 0x00007f72f19d4101   r14 = 0x0000000000000001
    r15 = 0x0000000000000000   rip = 0x00007f73226fa7d7
    rsp = 0x00007fff38741db0   rbp = 0x00007fff38741e40
    Found by: call frame info
 2  libxul.so!mozilla::dom::CallbackObject::CallSetup::CallSetup(JS::Handle<JSObject*>, mozilla::ErrorResult&, mozilla::dom::CallbackObject::ExceptionHandling, JSCompartment*) [Maybe.h:48254a3fc029 : 68 + 0x15]
    rbx = 0x00007fff38741e30   r12 = 0x00007f732514a140
    r13 = 0x00007f72f19d4158   r14 = 0x0000000000000001
    r15 = 0x0000000000000000   rip = 0x00007f73225ae1b8
    rsp = 0x00007fff38741df0   rbp = 0x00007f730fe8a7a8
    Found by: call frame info
 3  libxul.so!mozilla::dom::EncodingCompleteEvent::Run() [HTMLCanvasElementBinding.h:48254a3fc029 : 112 + 0x4]
    rbx = 0x00007f72f7545a00   r12 = 0x00007f72f19d4140
    r13 = 0x00007fff38741f5f   r14 = 0x0000000000000001
    r15 = 0x0000000000000000   rip = 0x00007f73228716f4
    rsp = 0x00007fff38741e20   rbp = 0x00007f72f1a9a590
    Found by: call frame info

and above that is just event loop stuff.  Both of the stacks from comment 71 look like that.
Flags: needinfo?(benjamin)
The relevant line in AutoEntryScript() is this:

>+  mAc.construct(aCx, aGlobalObject->GetGlobalJSObject());

I don't see an obvious way that aCx would be null at this point, but could aGlobalObject->GetGlobalJSObject() be null?
Note that if you push to try with a special patch applied you can also get actual debug symbols and attach a debugger:
https://wiki.mozilla.org/ReleaseEngineering/TryServer#Getting_debug_symbols
E.g. I suspect that at times nsInProcessTabChildGlobal::GetGlobalJSObject might return null (for a closed tab?).
(Assignee)

Comment 81

4 years ago
(In reply to Boris Zbarsky [:bz] from comment #80)
> E.g. I suspect that at times nsInProcessTabChildGlobal::GetGlobalJSObject
> might return null (for a closed tab?).

Yeah, that's probably likely. Maybe we should just bite the bullet and null-check the return value of GetGlobalJSObject() and bail, just like we do for non-current inners?
Worth a shot, at least...
(Assignee)

Comment 84

4 years ago
(In reply to Bobby Holley (:bholley) from comment #83)
> https://tbpl.mozilla.org/?tree=Try&rev=759cae801844

Youpie! This is green. I'll fold that patch into part 2 (bz, does comment 82 count as r+ on that change?), apply sfink's rooting patch, and do one final try push.
> does comment 82 count as r+ on that change?

Let's say yes.
Keywords: intermittent-failure

Updated

4 years ago
Flags: in-testsuite?
(Assignee)

Updated

4 years ago
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.