Closed
Bug 1004266
Opened 9 years ago
Closed 9 years ago
[B2G][FTU] When signing into Gmail repeatedly tapping on the screen can force close the FTU
Categories
(Core :: IPC, defect)
Tracking
()
Tracking | Status | |
---|---|---|
b2g-v1.3 | --- | affected |
b2g-v1.3T | --- | fixed |
b2g-v1.4 | --- | unaffected |
b2g-v2.0 | --- | fixed |
People
(Reporter: jschmitt, Assigned: kats)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][sprd316039])
Crash Data
Attachments
(4 files)
86.57 KB,
text/plain
|
Details | |
1.05 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
907 bytes,
patch
|
Details | Diff | Splinter Review | |
6.84 KB,
image/png
|
Details |
Description: On the authentication page of Gmail tapping repeatedly can force close FTU and bring the user to the Homescreen. Repro Steps: 1) Update a Tarako to BuildID: 20140430014008 2) Reset phone to enable FTU 3) Proceed to the 'Select a Network' and connect to a Wi-Fi network 4) Proceed to 'Import Contacts' and select 'Gmail' 5) Input user name and password then select 'Sign in' 6) Once loaded, repeatedly tap on the screen around the words 'Have offline access' Actual: The FTU closes and user is taken to the Homescreen Expected: The FTU does not close 1.3t Environmental Variables: Device: Tarako 1.3t BuildID: 20140430014008 Gaia: f9b62bd1135f4edf8317fff1c2947b9d99438d79 Gecko: ba50f734b815 Version: 28.1 Firmware Version: sp6821a_gonk4.0_user.pac Notes: Repro frequency: 100% See attached: logcat
Reporter | ||
Comment 1•9 years ago
|
||
Tested against 1.3 Buri build which the bug DOES repro and I was able to get a crash log for Buri, I was unable to receive a crash log for Tarako. https://crash-stats.mozilla.com/report/index/5e97dabe-c1ea-478d-b021-005192140430 1.3 Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20140429024001 Gaia: caab7a78f0c04913f48a3051259663ee85625906 Gecko: f84a8ffbc552 Version: 28.0 Firmware Version: v1.2-device.cfg
status-b2g-v1.3:
--- → affected
Looks like an Dom/IPC crash. Can you check to see if you can repro in 2.0 on the flame or open c please?
Crash Signature: [@ mozilla::dom::TabChild::RecvRealTouchEvent(mozilla::WidgetTouchEvent const&, mozilla::layers::ScrollableLayerGuid const&) ]
Comment 3•9 years ago
|
||
Is there a crash report message showing up on tarako when this bug reproduces?
Comment 4•9 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #3) > Is there a crash report message showing up on tarako when this bug > reproduces? Taking ni for jschmitt, the crash report message is NOT showing up on Tarako, the FTE just force closes. The crash report msg can only be seen on Buri 1.3 when using the same STR.
Flags: needinfo?(jschmitt)
Reporter | ||
Comment 5•9 years ago
|
||
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #2) > Looks like an Dom/IPC crash. Can you check to see if you can repro in 2.0 > on the flame or open c please? I tested on 2.0 Open_C and the crash does not occur. When using the same repro steps the Open_C froze on the FTU 'Import Contacts' page and I had to restart the device, but no crash or force close of the FTU when following the same repro steps.
Comment 6•9 years ago
|
||
We met this crash after land bug 1014272.
blocking-b2g: --- → 1.3T?
Whiteboard: [b2g-crash] → [b2g-crash][sprd316039]
Comment 7•9 years ago
|
||
We met the same crash backtrace after land bug 1014272.
Updated•9 years ago
|
Component: Gaia::First Time Experience → IPC
Product: Firefox OS → Core
Comment 8•9 years ago
|
||
Blocking as we have STR, :smaug, fabrice recommend that you may be able to help with some investigation here as this is in IPC area, any thoughts here ? (crash logs in comment #1)
blocking-b2g: 1.3T? → 1.3T+
Flags: needinfo?(bugs)
Comment 9•9 years ago
|
||
Any chance for a better stack trace? https://crash-stats.mozilla.com/report/index/5e97dabe-c1ea-478d-b021-005192140430 is missing all the interesting bits. This looks like a null+offset crash, but hard to say what is actually null.
Flags: needinfo?(bugs)
Comment 10•9 years ago
|
||
Oh, there is some nested event loop handling. What is causing that. Some js spinning the loop? This might be a regression from bug 950300. That certainly doesn't handle nested event loop spinning. outerWindow for example can be null here http://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/ce9035ea1b83#l2.42 when returning to the outermost loop.
Comment 11•9 years ago
|
||
Hi Olli, wonder if you are the right person to take this bug? Thanks
Flags: needinfo?(bugs)
Comment 12•9 years ago
|
||
Olli's at least somewhat on PTO this week. kats and Vivien were also involved in bug 950300 which smaug thinks may have caused a regression here so flagging them for help.
Flags: needinfo?(bugs)
Flags: needinfo?(bugmail.mozilla)
Flags: needinfo?(21)
Assignee | ||
Comment 13•9 years ago
|
||
I agree with Olli's analysis in comment 10. I'm just not sure what the right fix. It seems wrong for the JS to be spinning the event loop and I feel like we should prevent that from happening. If we can't prevent that then maybe we need some IPC changes to not deliver new events until the previous one is done being processed? I feel like this might be a problem that affects more than just RecvRealTouchEvent.
Flags: needinfo?(bugmail.mozilla)
We have to assume that firing any DOM event could spin a nested event loop, right? And that any other events in the queue will run when we do unless they're blocked explicitly.
Comment 15•9 years ago
|
||
Any JS can spin event loop currently. Just use main thread sync XHR.
Assignee | ||
Comment 16•9 years ago
|
||
Ok. Well it doesn't make sense to dispatch a touch event while the JS code is still in the middle of processing the previous touch event, right? And I don't think we should be processing touch events out-of-order with other types of messages. For example the APZ code assumes that touch events and UpdateFrame messages (i.e. TabChild::RecvUpdateFrame) are processed in the order that they are dispatched from the parent process. What are our options here?
Updated•9 years ago
|
Assignee: nobody → overholt
Assignee | ||
Comment 17•9 years ago
|
||
I can't test this easily since I don't have an active tarako tree. But based on the crash stack (a nsCOMPtr null deref) I'm pretty confident this is the problem.
Attachment #8431677 -
Flags: review?(fabrice)
Updated•9 years ago
|
Assignee: overholt → bugmail.mozilla
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Olli Pettay [:smaug] (vacation-ish May 26-30) from comment #15) > Any JS can spin event loop currently. Just use main thread sync XHR. bent and I discussed this a bit. It looks like the sync XHR code explicitly blocks events from going through at [1]. The modal dialogs (alert/confirm/prompt) do something similar except they only block animations [2] but as far as I can tell all events go to the modal window anyway so it's not a problem. The point is that I can't find any existing way that content JS will be delivered events in a re-entrant manner. I imagine that a bunch of web content would be busted if that happened. What makes the most sense to me is to install this event suppression in this case as well, and just discard events that are coming in while content is in the middle of handling something. In the case of touch events we would probably have to ensure event stream consistency (e.g. don't send touchmoves without a corresponding touchstart) but maybe nsEventStateManager does this already. We'd also have to make RecvRealTouchEvent itself re-entrancy aware, so that it doesn't make a mistake when pulling the prevent-default status from the "first" touchmove event. bent also suggested maybe all of this should live inside the event state manager. smaug, you know all this code best, any thoughts? [1] http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLHttpRequest.cpp?rev=57b0932e2f06#3004 [2] http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#8558
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(21) → needinfo?(bugs)
rebased to v1.3t I think? Kats, can you verify that I made the change to the right location please?
Flags: needinfo?(bugmail.mozilla)
Comment 20•9 years ago
|
||
This rebased patch looks correct to me, Naoki.
Flags: needinfo?(bugmail.mozilla)
Comment 21•9 years ago
|
||
sync XHR blocks event in presshell level. TabChild is way lower level thing, and event suppression shouldn't affect to it. Note, the outerWindow issue was just a guess. We need a real stack trace here.
Flags: needinfo?(bugs)
Comment 22•9 years ago
|
||
Or someone who can reproduce the issue needs to try the patch.
Assignee | ||
Comment 23•9 years ago
|
||
Yeah Naoki is trying it. I don't understand why event suppression wouldn't affect this - the TabChild dispatches the window via the widget, which should be going through the presshell lower down in the stack, no? So if the suppression happens in the presshell it should still have an effect there.
Comment 24•9 years ago
|
||
While suppression is on we still process more runnables in the event loop.
Right, as I understand it that's fine, kats just doesn't want the page to be getting more dom events... I think?
Comment 26•9 years ago
|
||
why does that matter here? If event loop spins, anything can happen (unless something is explicitly disabled).
I applied the patch and I tried the steps. Instead of crashing, I'm stuck at this screen. Hitting X won't cancel out. Since it's FTU you can't long tap on Home button to get out. You have to reboot, which does put you back to the FTU.
Assignee | ||
Comment 28•9 years ago
|
||
I discussed this a bit with smaug on IRC today. According to him there are already cases where web content needs to deal with re-entrancy with respect to events, and I think this is probably one of those cases as well. The patch I provided fixes a gecko bug, but the stuff naoki described in comment 27 might need to be fixed in the web content. I don't know exactly what the web content in this case is doing on touchmove events, but it's something that spins the event loop. Can we find out what DOM APIs it's invoking to make this happen? If it's something we expect to spin the event loop, then the rest of this bug needs to be fixed in the content (by handling re-entrancy). If it's some unexpected codepath that spins the event loop then there might be additional gecko bugs.
Assignee | ||
Comment 29•9 years ago
|
||
Andrew, do you know who can get a hold of the web content that is spinning the event loop on touch events? I think we need that to move ahead with this bug.
Flags: needinfo?(overholt)
Comment 30•9 years ago
|
||
The APIs that would spin the event loop include synchronous XHR, alert() and confirm() AFAIK.
Comment 31•9 years ago
|
||
Maybe Francisco, Fernando, Mike, or Gregor?
Flags: needinfo?(overholt)
Flags: needinfo?(mhenretty)
Flags: needinfo?(francisco)
Flags: needinfo?(fernando.campo)
Flags: needinfo?(anygregor)
Comment 32•9 years ago
|
||
I think Jose knows the importing parts.
Flags: needinfo?(anygregor) → needinfo?(jmcf)
Comment 33•9 years ago
|
||
showModalDialog() is the canonical spec'ed "spin-the-event-loop" case, but I don't know if b2g supports that. sync XHR should work too.
Comment 34•9 years ago
|
||
I didn't see any sync XHR, alert, confirm, or showModalDialog calls in the FTU or 'shared/pages/import/*'. Is there anything else we should look for? Since this import is opening an overlay to google's oauth dialog, my guess would be that it's google's content that's causing the nested spin.
Flags: needinfo?(mhenretty)
Assignee | ||
Comment 35•9 years ago
|
||
(In reply to Michael Henretty [:mhenretty] from comment #34) > I didn't see any sync XHR, alert, confirm, or showModalDialog calls in the > FTU or 'shared/pages/import/*'. Is there anything else we should look for? Are there any touch event listeners of any kind? > Since this import is opening an overlay to google's oauth dialog, my guess > would be that it's google's content that's causing the nested spin. That's quite likely. Is it possible to get a hold of the source to the google dialog?
Comment 36•9 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #35) > (In reply to Michael Henretty [:mhenretty] from comment #34) > > I didn't see any sync XHR, alert, confirm, or showModalDialog calls in the > > FTU or 'shared/pages/import/*'. Is there anything else we should look for? > > Are there any touch event listeners of any kind? I didn't see touchend, touchstart or touchmove in either FTU or the import page. We do use click though, but I'm not sure if this maps to a touch event in the platform or not. The way the whole flow works is, we register a click handler on the "Gmail" import button. In this handler we post a message to an iframe which contains our oauth logic (still in gaia). This iframe (which has no visible DOM), then calls window.open with the url https://accounts.google.com/o/oauth2/auth?... for oauth. Once the flow is done on google's side, our iframe postMessages back the FTU. > > Since this import is opening an overlay to google's oauth dialog, my guess > > would be that it's google's content that's causing the nested spin. > > That's quite likely. Is it possible to get a hold of the source to the > google dialog? Hrm, that's quite hard. I used FF desktop to try and view the source of one of an oauth flow [1], and the code is minified and obfuscated. I also tried re-constructing the url we use for the gmail import, but I can't figure our FxOS appID [2]. I think we will need someone who is familiar with this flow. Jose may know. 1). https://accounts.google.com/o/oauth2/auth?client_id=533832195920.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Fbobyouku.ap01.aws.af.cm%2Ftestyoutube.php&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fyoutube&response_type=code&access_type=offline 2.) https://accounts.google.com/o/oauth2/auth?client_id=123456&response_type=token&approval_prompt=force&redirect_uri=http://www.mozilla.com&scope=wl.basic
Comment 37•9 years ago
|
||
build/config/communications_services.json file contains the application id for Gmail.
Flags: needinfo?(jmcf)
Comment 38•9 years ago
|
||
This link takes us to the oauth page in question: https://accounts.google.com/o/oauth2/auth?client_id=664741361278.apps.googleusercontent.com&response_type=token&approval_prompt=force&redirect_uri=http://www.mozilla.com&scope=https://www.google.com/m8/feeds/
Comment 39•9 years ago
|
||
Hi, I think that Michael and Jose provided all the info, nothing else that I can add, sorry folks :(
Flags: needinfo?(francisco)
Comment 40•9 years ago
|
||
Do you have the info you need here, kats?
Flags: needinfo?(fernando.campo) → needinfo?(bugmail.mozilla)
Assignee | ||
Comment 41•9 years ago
|
||
If the page in comment 38 is the one that I was looking for, then something else must be going on, because that page doesn't have any touch listeners at all, as far as I can tell. Either the stack in the crash report is wrong or the touch listener is somewhere else, and I have no idea where it might be.
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 42•9 years ago
|
||
I'm going to do a local build of 1.3 on Hamachi to see if I can repro and debug this. If anybody has any other ideas please feel free to mention and/or try them.
Assignee | ||
Comment 43•9 years ago
|
||
I'm unable to reproduce on a local 1.3 build on Hamachi using the STR in comment 0. Also just to note the STR indicate that the crash happens on the screen *after* signing in, which is not the same page as the one in comment 38. Also, if I hit "Decline" on that permissions screen, I end up in a state like what Naoki described in comment 27, so I think that is most likely a separate bug in the FTU. I'm inclined to land the patch as-is to fix the crash and spin off another bug to deal with this "Stuck in the FTU" scenario (which may or may not need to be blocking, I don't know).
Assignee | ||
Comment 44•9 years ago
|
||
Comment on attachment 8431677 [details] [diff] [review] Blind patch Moving review since fabrice is away.
Attachment #8431677 -
Flags: review?(fabrice) → review?(21)
Comment 45•9 years ago
|
||
Comment on attachment 8431677 [details] [diff] [review] Blind patch No need for (), but rs+
Attachment #8431677 -
Flags: review?(21) → review+
Assignee | ||
Comment 46•9 years ago
|
||
Thanks. Removed () and landed: https://hg.mozilla.org/integration/b2g-inbound/rev/02dc36c183a1
Comment 47•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/02dc36c183a1
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Comment 48•9 years ago
|
||
somebody help to uplift it to 1.3t? Because we don' have the rights now.
Assignee | ||
Updated•9 years ago
|
status-b2g-v2.0:
--- → fixed
Comment 50•9 years ago
|
||
Unable to reproduce this issue on the latest 1.4 for Flame, Open C, and Buri. Device: Flame 1.4 BuildID: 20140618000203 Gaia: 3bdd037ec1a11abebe16a5d7f6ff0d863e80bc07 Gecko: 523491fa3339 Version: 30.0 (1.4) Firmware Version: v121-2 Device: Open_C 1.4 BuildID: 20140618000203 Gaia: 3bdd037ec1a11abebe16a5d7f6ff0d863e80bc07 Gecko: 523491fa3339 Version: 30.0 (1.4) Firmware Version: P821A10V1.0.0B06_LOG_DL Device: Buri 1.4 Build ID: 20140618000203 Gaia: 3bdd037ec1a11abebe16a5d7f6ff0d863e80bc07 Gecko: 523491fa3339 Version: 30.0 (1.4) Firmware Version: v1.2device.cfg
Comment 51•9 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17) > Created attachment 8431677 [details] [diff] [review] > Blind patch > > I can't test this easily since I don't have an active tarako tree. But based > on the crash stack (a nsCOMPtr null deref) I'm pretty confident this is the > problem. Hi, would you please uplift this to 1.3t?
Flags: needinfo?(bugmail.mozilla)
Comment 52•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/2d7a32e38d7d
Flags: needinfo?(bugmail.mozilla)
Comment 53•9 years ago
|
||
thx
Comment 54•9 years ago
|
||
This fix is not in the mozilla-b2g30_v1_4 branch. Bhavana says QA is testing to determine whether the crash is actually reproducible on 1.4.
status-b2g-v1.4:
--- → ?
Assignee | ||
Comment 55•9 years ago
|
||
Based on comment 49 and comment 50 that already happened.
Updated•9 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
You need to log in
before you can comment on or make changes to this bug.
Description
•