Closed Bug 1176829 Opened 5 years ago Closed 4 years ago

[Custom Elements] [Email]Device will jump to blank page.

Categories

(Core :: DOM: Core & HTML, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla43
blocking-b2g 2.5+
Tracking Status
firefox43 --- fixed
b2g-v2.2 --- unaffected
b2g-master --- verified

People

(Reporter: liuke, Assigned: wchen)

References

Details

(Keywords: regression, Whiteboard: [v3.0-nexus-5-l],[2.5-aries-test-run-1],[2.5-aries-test-run-2])

Attachments

(4 files)

Attached file logcat_1535.txt
[1.Description]:
[Flame v3.0][Nexus 5 v3.0][Email]Device will jump to blank page when you share image/video/music to Email for twice.
Found time:15:35
See attachment:1535.3gp and logcat_1535.txt

[2.Testing Steps]: 
Prerequisite: No email account is logged in.
1.Launch Gallery.
2.Select an image to share by Email.
3.Tap back key to back Gallery view.
4.Repeat step 2.

[3.Expected Result]: 
4.Device should enter Email log in page.

[4.Actual Result]: 
4.Device will jump to blank page.

[5.Reproduction build]: 
Device: Flame 2.2 version(Unaffected):
Build ID               20150619002501
Gaia Revision          1c33072e33c279c8aa5cb5e4a3e4da6af6cd818b
Gaia Date              2015-06-19 01:36:47
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/5ad34a170633
Gecko Version          37.0
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150619.042950
Firmware Date          Fri Jun 19 04:30:02 EDT 2015
Bootloader             L1TC000118D0

Device: Flame 3.0 version(Affected):
Build ID               20150622160206
Gaia Revision          311c4e59936a407e64509f54fecb440d8a78e3c8
Gaia Date              2015-06-20 20:21:42
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/be81b8d6fae9
Gecko Version          41.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150622.193834
Firmware Date          Mon Jun 22 19:38:45 EDT 2015
Bootloader             L1TC000118D0

Device: Nexus 5_2.2 version(Unaffected):
Build ID               20150621002506
Gaia Revision          1f8981d7872e3c0053571c26fb3edaf401844d75
Gaia Date              2015-06-19 13:22:30
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/9a0d8f7b1200
Gecko Version          37.0
Device Name            hammerhead
Firmware(Release)      5.1
Firmware(Incremental)  eng.cltbld.20150621.035418
Firmware Date          Sun Jun 21 03:54:36 EDT 2015
Bootloader             HHZ12f

Device: Nexus 5_3.0 version(Affected):
Build ID               20150622010206
Gaia Revision          d8bac9577a537b5e6bafc3f7ed088af5ea60c99e
Gaia Date              2015-06-19 13:21:50
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/a38c23ccca0a
Gecko Version          41.0a1
Device Name            hammerhead
Firmware(Release)      5.1
Firmware(Incremental)  eng.cltbld.20150622.043152
Firmware Date          Mon Jun 22 04:32:09 EDT 2015
Bootloader             HHZ12f

[6.Reproduction Frequency]: 
Always Recurrence,5/5

[7.TCID]: 
Free Test
Attached video 1535.3GP
Flags: needinfo?(twen)
Keywords: regression
This is a new bug introduced by the change of flow in sharing by email in V3.0. 

Nominate for spark blocking.
blocking-b2g: --- → spark?
Flags: needinfo?(twen)
Duplicate of this bug: 1180929
Took a brief look today. The issue seems to be the cards-setup-account-info custom element's onArgs method is called before the createdCallback method, and that causes an error, since the template work and binding to elements in the template occurs in the createdCallback chain of methods. The `this.backButton` is undefined because the template binding has not occurred.

This is very strange though: cards.js, pushCard() calls `new cardDef()` (I confirmed it is not using `args.cachedNode` in the error case), and I expect that to trigger the createdCallback method. Right after `new cardDef()` is called, `domNode.onArgs(args);` is called.

When I logged creation of cards-setup-account-info instances, I see three instances created in the STR:

1) When the app is initially launched. This one is fine. It could or could not have been a cached node, both cases were tried, but no errors at that time, proper createdCallback and onArgs sequencing.

2) When the user presses the Back button in the email app to cancel the first activity jump, the UI resets and creates a fresh `cards-setup-account-info`, but that one does not have the Back arrow and the message about needing to create an account to complete the activity. This is good and desirable, in case the user switches back to the email app outside this activity flow.

3) On the second activity jump, the app resets (clears existing cards) and this weird behavior happens. I confirmed that a new custom element instance is used, by setting a this.instanceId in a debug experiment, and since the createdCallback, where this.instanceId is set, is not called before the onArgs, this.instanceId is `undefined` in this case.

Unfortunately I am going to be AFK for the rest of the week, but wanted to write down what I found so far so I can remember it later.

Next steps would be to investigate more when createdCallback should be called. Initial unproven hypothesis is that perhaps if the email app is not in the foreground and not visible when the custom element is created, the createdCallback work is put on hold until the window becomes visible. For instance #1, it might work out since the whole email app is starting up and has time to reach visible state, where in #3, the email app is already launched and can create the element much more quickly. Could be something else though, just wild speculation at this point, and probably not correct, as I expect nodes created in a background tab to work normally.
Duplicate of this bug: 1181257
:wchen - do you have any insight? I am stumped. Is there a flow in the custom element code where a new CustomElement() will allow calling a prototype method on the custom element definition before the createdCallback method is called?

What I am seeing in the failure case:

    // CustomElement is the constructor returned
    // from document.registerElement
    var e = new CustomElement();

    // onArgs is a method that was defined on the
    // prototype passed to document.registerElement 
    e.onArgs({});

    // e is then added to the document

In the failure case, when email is already running, but may be in the background initially, the onArgs method is called before the createdCallback method on e's prototype is called. In the success cases, createdCallback is called before e.onArgs() is reached.

After the function that does this work, which includes finally adding the element to the DOM (but after onArgs is called) do I see the createdCallback called.
Flags: needinfo?(wchen)
No, that normally shouldn't happen. I've reproduced in a debugger and I can confirm that this is a gecko bug.
Flags: needinfo?(wchen)
Switching over to Core::DOM for tracking where the fix will occur.
Component: Gaia::E-Mail → DOM
Product: Firefox OS → Core
Summary: [Email]Device will jump to blank page. → [Custom Elements] [Email]Device will jump to blank page.
Version: unspecified → Trunk
Duplicate of this bug: 1182598
Duplicate of this bug: 1182882
[Blocking Requested - why for this release]:
blocking-b2g: spark? → 2.5?
[Traige] v2.5+ due to serious outage. We should be blocked by this issue.
blocking-b2g: 2.5? → 2.5+
Duplicate of this bug: 1193655
We got another report of a foxfooder running into this issue in IRC, so it would be good to get to the root cause of it, particularly since we are planning to add even more custom elements in email. :wchen do you have ideas on what might help move the bug forward, or workarounds we could do in email?
Flags: needinfo?(wchen)
William told me on IRC that he's working with smaug on an approach as it's not immediately clear how to fix this.
Duplicate of this bug: 1199016
Duplicate of this bug: 1198151
Whiteboard: [v3.0-nexus-5-l] → [v3.0-nexus-5-l],[2.5-aries-test-run-1],[2.5-aries-test-run-2]
Assignee: nobody → wchen
That spec has a concept of a base element queue on the custom elements processing stack that gets processed at the next microtask checkpoint. Normally only callbacks enqueued from the parser should end up on the base element queue and in our implementation it's checked using microtask level == 0. However, there is some message manager code where we are executing script at microtask level 0 and we don't have a microtask checkpoint because we don't consider that script to be part of a task. In this particular bug, that script is modifying the DOM and enqueuing callbacks that don't get processed at the expected time.

Upon investigation I decided it would be better to just remove the base element queue entirely and process the custom elements processing stack the same way as the non-parser case (at script blocker time). This is also more consistent with how we want custom elements to behave in the future with the callbacks beings more synchronous during parsing. The spec introduced the base element queue to provide a safe point to run script after parsing, but it looks like in Gecko there are already some script blockers in place to prevent script running at bad times.
Flags: needinfo?(wchen)
Attachment #8656216 - Flags: review?(bugs)
Comment on attachment 8656216 [details] [diff] [review]
Remove custom elements base element queue.

> nsDocument::EnqueueLifecycleCallback(nsIDocument::ElementCallbackType aType,
>                                      Element* aCustomElement,
>                                      LifecycleCallbackArgs* aArgs,
>                                      CustomElementDefinition* aDefinition)
> {
>   if (!mRegistry) {
>     // The element might not belong to a document that
>@@ -6017,20 +5994,17 @@ nsDocument::EnqueueLifecycleCallback(nsI
>   }
> 
>   if (!elementData->mElementIsBeingCreated) {
>     CustomElementData* lastData =
>       sProcessingStack->SafeLastElement(nullptr);
> 
>     // A new element queue needs to be pushed if the queue at the
>     // top of the stack is associated with another microtask level.
>-    // Don't push a queue for the level 0 microtask (base element queue)
>-    // because we don't want to process the queue until the
>-    // microtask checkpoint.
>-    bool shouldPushElementQueue = nsContentUtils::MicroTaskLevel() > 0 &&
>+    bool shouldPushElementQueue =
>       (!lastData || lastData->mAssociatedMicroTask <
>          static_cast<int32_t>(nsContentUtils::MicroTaskLevel()));
I don't understand this code. End of microtask is end of microtask.
MT level is > 1 when one does for example nested event dispatching 
(an event listener dispatches another event which then has some listener.)
But when mt level goes back to 1, that is not end of microtask, and nothing special should happen then.
Please explain


>-  MOZ_ASSERT(nsContentUtils::MicroTaskLevel() == 0);
>-  sProcessingBaseElementQueue = true;
So nothing sets sProcessingBaseElementQueue anymore to true, yet we have the variable defined. 
Please remove the variable.
Attachment #8656216 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #19)
> Comment on attachment 8656216 [details] [diff] [review]
> Remove custom elements base element queue.
> 
> I don't understand this code. End of microtask is end of microtask.
> MT level is > 1 when one does for example nested event dispatching 
> (an event listener dispatches another event which then has some listener.)
> But when mt level goes back to 1, that is not end of microtask, and nothing
> special should happen then.
> Please explain

The spec says:

- When transitioning from script to the user agent code, push a new element queue to the processing stack
- When transitioning back from the user agent code to script, pop the element queue from the processing stack and invoke callbacks in that queue.

The code is using microtask level to detect whether a transition from script to user agent code has occurred. Instead of pushing a new queue every time there is a transition, we only push a new queue when we need to enqueue a callback and when it detects the top-most queue is associated with a different level.

We use script runners to process the queue so that it appears it's processed "when transitioning back from user agent code to script".
See comment 20
Flags: needinfo?(bugs)
Comment on attachment 8656216 [details] [diff] [review]
Remove custom elements base element queue.

(Adding to my review queue so that I remember to process this faster.)
Flags: needinfo?(bugs)
Attachment #8656216 - Flags: review- → review?(bugs)
(In reply to William Chen [:wchen] from comment #20)
> The spec says:
> 
> - When transitioning from script to the user agent code, push a new element
> queue to the processing stack
> - When transitioning back from the user agent code to script, pop the
> element queue from the processing stack and invoke callbacks in that queue.
So this is like script runners, not microtasks.


> 
> The code is using microtask level to detect whether a transition from script
> to user agent code has occurred.
That is not what microtasks do. mt is for transition from ua code to script


reading the spec and code some more.
Comment on attachment 8656216 [details] [diff] [review]
Remove custom elements base element queue.

Ok, the setup looks more complicated than it should be, and I think we should be able to just use script runners everywhere and not rely on microtask levels, but this is smaller change for now.
Attachment #8656216 - Flags: review?(bugs) → review+
The spec was written with the intention that nested callbacks are processed before the remaining non-nested callbacks. For example, if innerHTML created two custom elements a1 and a2, and the created callback for a1 creates another custom element b1, the order in which the created callbacks should be called is a1 b1 a2. If we only used script runners, we would get a1 a2 b1.
setting innerHTML pushed scriptblocker on stack, so if we create any elements, those happen while we don't run script runners. When SetInnerHTML is executed, scriptblocker goes away, and scriptrunners run. The scriptrunner for lifecycle callbacks would take the currently queue and start processing it.
First it processes the one for a1, calls the relevant callback which creates b1. Now, when b1 is created either there isn't scriptblocker, in which case the callback is called immediately, or scriptblocker is first pushed to stack, b1 created, script blocker removed and the callback called. and b1 is now processed. Processing of the original queue continues and a2 is processed.
So, the order a1, b1, a2.
I see you mean, and yes that much simpler and I think it will work, but for now I'll land the patch in the bug because it's lower risk.
https://hg.mozilla.org/mozilla-central/rev/80f0d809263a
https://hg.mozilla.org/mozilla-central/rev/6126225bbb93
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
This bus has been verified as pass on lastest Flame v2.5 and Nexus5 v2.5.
STR:
Same STR as Comment 0.
Repro rate: 0/10
See video:"Flame_v2.5.3GP"

Actural result:
Device will enter to Email log in page again.

Flame v2.5(pass):
Build ID               20150920150205
Gaia Revision          e67d319d0854e32e23210784eb9c4e1b8a025adb
Gaia Date              2015-09-19 07:42:05
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/ccd6b5f5e544c1d708849144943a776941bd3794
Gecko Version          43.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150920.182952
Firmware Date          Sun Sep 20 18:30:04 EDT 2015
Firmware Version       v18D v4
Bootloader             L1TC000118D0

Nexus 5_v2.5(pass):
Build ID               20150920150205
Gaia Revision          e67d319d0854e32e23210784eb9c4e1b8a025adb
Gaia Date              2015-09-19 07:42:05
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/ccd6b5f5e544c1d708849144943a776941bd3794
Gecko Version          43.0a1
Device Name            hammerhead
Firmware(Release)      5.1
Firmware(Incremental)  eng.cltbld.20150920.183155
Firmware Date          Sun Sep 20 18:32:11 EDT 2015
Bootloader             HHZ12f
Status: RESOLVED → VERIFIED
QA Whiteboard: [MGSEI-Triage+]
Duplicate of this bug: 1178371
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.