<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 9

•

11 years ago

Any chance for a better stack trace? https://crash-stats.mozilla.com/report/index/5e97dabe-c1ea-478d-b021-005192140430 is missing all the interesting bits. This looks like a null+offset crash, but hard to say what is actually null.

Flags: needinfo?(bugs)

Joe Cheng [:jcheng] (please needinfo)

Comment 10

•

11 years ago

Oh, there is some nested event loop handling. What is causing that. Some js spinning the loop? This might be a regression from bug 950300. That certainly doesn't handle nested event loop spinning. outerWindow for example can be null here http://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/ce9035ea1b83#l2.42 when returning to the outermost loop.

Comment 11

•

11 years ago

Hi Olli, wonder if you are the right person to take this bug? Thanks

Flags: needinfo?(bugs)

Comment 12

•

11 years ago

Olli's at least somewhat on PTO this week. kats and Vivien were also involved in bug 950300 which smaug thinks may have caused a regression here so flagging them for help.

Flags: needinfo?(bugs)

Flags: needinfo?(bugmail.mozilla)

Flags: needinfo?(21)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 13

•

11 years ago

I agree with Olli's analysis in comment 10. I'm just not sure what the right fix. It seems wrong for the JS to be spinning the event loop and I feel like we should prevent that from happening. If we can't prevent that then maybe we need some IPC changes to not deliver new events until the previous one is done being processed? I feel like this might be a problem that affects more than just RecvRealTouchEvent.

Flags: needinfo?(bugmail.mozilla)

Comment 14

•

11 years ago

We have to assume that firing any DOM event could spin a nested event loop, right? And that any other events in the queue will run when we do unless they're blocked explicitly.

Comment 15

•

11 years ago

Any JS can spin event loop currently. Just use main thread sync XHR.

Assignee

Comment 16

•

11 years ago

Ok. Well it doesn't make sense to dispatch a touch event while the JS code is still in the middle of processing the previous touch event, right? And I don't think we should be processing touch events out-of-order with other types of messages. For example the APZ code assumes that touch events and UpdateFrame messages (i.e. TabChild::RecvUpdateFrame) are processed in the order that they are dispatched from the parent process. What are our options here?

Doug Turner (:dougt)

Updated

•

11 years ago

Assignee: nobody → overholt

Assignee

Comment 17

•

11 years ago

Attached patch Blind patch — Details — Splinter Review

I can't test this easily since I don't have an active tarako tree. But based on the crash stack (a nsCOMPtr null deref) I'm pretty confident this is the problem.

Attachment #8431677 - Flags: review?(fabrice)

Updated

•

11 years ago

Assignee: overholt → bugmail.mozilla

Assignee

Comment 18

•

11 years ago

(In reply to Olli Pettay [:smaug] (vacation-ish May 26-30) from comment #15) > Any JS can spin event loop currently. Just use main thread sync XHR. bent and I discussed this a bit. It looks like the sync XHR code explicitly blocks events from going through at [1]. The modal dialogs (alert/confirm/prompt) do something similar except they only block animations [2] but as far as I can tell all events go to the modal window anyway so it's not a problem. The point is that I can't find any existing way that content JS will be delivered events in a re-entrant manner. I imagine that a bunch of web content would be busted if that happened. What makes the most sense to me is to install this event suppression in this case as well, and just discard events that are coming in while content is in the middle of handling something. In the case of touch events we would probably have to ensure event stream consistency (e.g. don't send touchmoves without a corresponding touchstart) but maybe nsEventStateManager does this already. We'd also have to make RecvRealTouchEvent itself re-entrancy aware, so that it doesn't make a mistake when pulling the prevent-default status from the "first" touchmove event. bent also suggested maybe all of this should live inside the event state manager. smaug, you know all this code best, any thoughts? [1] http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsXMLHttpRequest.cpp?rev=57b0932e2f06#3004 [2] http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#8558

Naoki Hirata :nhirata (please use needinfo instead of cc)

Assignee

Updated

•

11 years ago

Flags: needinfo?(21) → needinfo?(bugs)

Comment 19

•

11 years ago

Attached patch bug1004266.patch — Details — Splinter Review

rebased to v1.3t I think? Kats, can you verify that I made the change to the right location please?

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

11 years ago

Flags: needinfo?(bugmail.mozilla)

Comment 20

•

11 years ago

This rebased patch looks correct to me, Naoki.

Flags: needinfo?(bugmail.mozilla)

Comment 21

•

11 years ago

sync XHR blocks event in presshell level. TabChild is way lower level thing, and event suppression shouldn't affect to it. Note, the outerWindow issue was just a guess. We need a real stack trace here.

Flags: needinfo?(bugs)

Comment 22

•

11 years ago

Or someone who can reproduce the issue needs to try the patch.

Assignee

Comment 23

•

11 years ago

Yeah Naoki is trying it. I don't understand why event suppression wouldn't affect this - the TabChild dispatches the window via the widget, which should be going through the presshell lower down in the stack, no? So if the suppression happens in the presshell it should still have an effect there.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 24

•

11 years ago

While suppression is on we still process more runnables in the event loop.

Comment 25

•

11 years ago

Right, as I understand it that's fine, kats just doesn't want the page to be getting more dom events... I think?

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 26

•

11 years ago

why does that matter here? If event loop spins, anything can happen (unless something is explicitly disabled).

Comment 27

•

11 years ago

Attached image 2014-05-30-17-30-29.png — Details

I applied the patch and I tried the steps. Instead of crashing, I'm stuck at this screen. Hitting X won't cancel out. Since it's FTU you can't long tap on Home button to get out. You have to reboot, which does put you back to the FTU.

Assignee

Comment 28

•

11 years ago

I discussed this a bit with smaug on IRC today. According to him there are already cases where web content needs to deal with re-entrancy with respect to events, and I think this is probably one of those cases as well. The patch I provided fixes a gecko bug, but the stuff naoki described in comment 27 might need to be fixed in the web content. I don't know exactly what the web content in this case is doing on touchmove events, but it's something that spins the event loop. Can we find out what DOM APIs it's invoking to make this happen? If it's something we expect to spin the event loop, then the rest of this bug needs to be fixed in the content (by handling re-entrancy). If it's some unexpected codepath that spins the event loop then there might be additional gecko bugs.

Assignee

Comment 29

•

11 years ago

Andrew, do you know who can get a hold of the web content that is spinning the event loop on touch events? I think we need that to move ahead with this bug.

Flags: needinfo?(overholt)

(no longer active)

Comment 30

•

11 years ago

The APIs that would spin the event loop include synchronous XHR, alert() and confirm() AFAIK.

Comment 31

•

11 years ago

Maybe Francisco, Fernando, Mike, or Gregor?

Flags: needinfo?(overholt)

Flags: needinfo?(mhenretty)

Flags: needinfo?(francisco)

Flags: needinfo?(fernando.campo)

Flags: needinfo?(anygregor)

Gregor Wagner [:gwagner]

Comment 32

•

11 years ago

I think Jose knows the importing parts.

Flags: needinfo?(anygregor) → needinfo?(jmcf)

Michael Henretty [:mikehenrty][:mhenretty]

Comment 33

•

11 years ago

showModalDialog() is the canonical spec'ed "spin-the-event-loop" case, but I don't know if b2g supports that. sync XHR should work too.

Comment 34

•

11 years ago

I didn't see any sync XHR, alert, confirm, or showModalDialog calls in the FTU or 'shared/pages/import/*'. Is there anything else we should look for? Since this import is opening an overlay to google's oauth dialog, my guess would be that it's google's content that's causing the nested spin.

Flags: needinfo?(mhenretty)

Michael Henretty [:mikehenrty][:mhenretty]

Assignee

Comment 35

•

11 years ago

(In reply to Michael Henretty [:mhenretty] from comment #34) > I didn't see any sync XHR, alert, confirm, or showModalDialog calls in the > FTU or 'shared/pages/import/*'. Is there anything else we should look for? Are there any touch event listeners of any kind? > Since this import is opening an overlay to google's oauth dialog, my guess > would be that it's google's content that's causing the nested spin. That's quite likely. Is it possible to get a hold of the source to the google dialog?

Comment 36

•

11 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #35) > (In reply to Michael Henretty [:mhenretty] from comment #34) > > I didn't see any sync XHR, alert, confirm, or showModalDialog calls in the > > FTU or 'shared/pages/import/*'. Is there anything else we should look for? > > Are there any touch event listeners of any kind? I didn't see touchend, touchstart or touchmove in either FTU or the import page. We do use click though, but I'm not sure if this maps to a touch event in the platform or not. The way the whole flow works is, we register a click handler on the "Gmail" import button. In this handler we post a message to an iframe which contains our oauth logic (still in gaia). This iframe (which has no visible DOM), then calls window.open with the url https://accounts.google.com/o/oauth2/auth?... for oauth. Once the flow is done on google's side, our iframe postMessages back the FTU. > > Since this import is opening an overlay to google's oauth dialog, my guess > > would be that it's google's content that's causing the nested spin. > > That's quite likely. Is it possible to get a hold of the source to the > google dialog? Hrm, that's quite hard. I used FF desktop to try and view the source of one of an oauth flow [1], and the code is minified and obfuscated. I also tried re-constructing the url we use for the gmail import, but I can't figure our FxOS appID [2]. I think we will need someone who is familiar with this flow. Jose may know. 1). https://accounts.google.com/o/oauth2/auth?client_id=533832195920.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Fbobyouku.ap01.aws.af.cm%2Ftestyoutube.php&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fyoutube&response_type=code&access_type=offline 2.) https://accounts.google.com/o/oauth2/auth?client_id=123456&response_type=token&approval_prompt=force&redirect_uri=http://www.mozilla.com&scope=wl.basic

Jose Manuel Cantera

Comment 37

•

11 years ago

build/config/communications_services.json file contains the application id for Gmail.

Flags: needinfo?(jmcf)

Michael Henretty [:mikehenrty][:mhenretty]

Comment 38

•

11 years ago

This link takes us to the oauth page in question: https://accounts.google.com/o/oauth2/auth?client_id=664741361278.apps.googleusercontent.com&response_type=token&approval_prompt=force&redirect_uri=http://www.mozilla.com&scope=https://www.google.com/m8/feeds/

Francisco Jordano [:arcturus] [:francisco]

Comment 39

•

11 years ago

Hi, I think that Michael and Jose provided all the info, nothing else that I can add, sorry folks :(

Flags: needinfo?(francisco)

Comment 40

•

11 years ago

Do you have the info you need here, kats?

Flags: needinfo?(fernando.campo) → needinfo?(bugmail.mozilla)

Assignee

Comment 41

•

11 years ago

If the page in comment 38 is the one that I was looking for, then something else must be going on, because that page doesn't have any touch listeners at all, as far as I can tell. Either the stack in the crash report is wrong or the touch listener is somewhere else, and I have no idea where it might be.

Flags: needinfo?(bugmail.mozilla)

Assignee

Comment 42

•

11 years ago

I'm going to do a local build of 1.3 on Hamachi to see if I can repro and debug this. If anybody has any other ideas please feel free to mention and/or try them.

Assignee

Comment 43

•

11 years ago

I'm unable to reproduce on a local 1.3 build on Hamachi using the STR in comment 0. Also just to note the STR indicate that the crash happens on the screen *after* signing in, which is not the same page as the one in comment 38. Also, if I hit "Decline" on that permissions screen, I end up in a state like what Naoki described in comment 27, so I think that is most likely a separate bug in the FTU. I'm inclined to land the patch as-is to fix the crash and spin off another bug to deal with this "Stuck in the FTU" scenario (which may or may not need to be blocking, I don't know).

Assignee

Comment 44

•

11 years ago

Comment on attachment 8431677 [details] [diff] [review] Blind patch Moving review since fabrice is away.

Attachment #8431677 - Flags: review?(fabrice) → review?(21)

Comment 45

•

11 years ago

Comment on attachment 8431677 [details] [diff] [review] Blind patch No need for (), but rs+

Attachment #8431677 - Flags: review?(21) → review+

Ryan VanderMeulen [:RyanVM]

Assignee

Comment 46

•

11 years ago

Thanks. Removed () and landed: https://hg.mozilla.org/integration/b2g-inbound/rev/02dc36c183a1

Updated

•

11 years ago

Depends on: 1019957

Ryan VanderMeulen [:RyanVM]

Comment 47

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/02dc36c183a1

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla32

ying.xu

Comment 48

•

11 years ago

somebody help to uplift it to 1.3t? Because we don' have the rights now.

Jayme Mercado [:JMercado] (Inactive 3/4/2016)

Assignee

Updated

•

11 years ago

status-b2g-v2.0: --- → fixed

bhavana bajaj [:bajaj]

Comment 49

•

11 years ago

Adding qawanted to see if this is reproducible on 1.4.

Keywords: qawanted

Comment 50

•

11 years ago

Unable to reproduce this issue on the latest 1.4 for Flame, Open C, and Buri. Device: Flame 1.4 BuildID: 20140618000203 Gaia: 3bdd037ec1a11abebe16a5d7f6ff0d863e80bc07 Gecko: 523491fa3339 Version: 30.0 (1.4) Firmware Version: v121-2 Device: Open_C 1.4 BuildID: 20140618000203 Gaia: 3bdd037ec1a11abebe16a5d7f6ff0d863e80bc07 Gecko: 523491fa3339 Version: 30.0 (1.4) Firmware Version: P821A10V1.0.0B06_LOG_DL Device: Buri 1.4 Build ID: 20140618000203 Gaia: 3bdd037ec1a11abebe16a5d7f6ff0d863e80bc07 Gecko: 523491fa3339 Version: 30.0 (1.4) Firmware Version: v1.2device.cfg

QA Whiteboard: [QAnalyst-Triage?]

Flags: needinfo?(jmitchell)

Keywords: qawanted

ying.xu

Comment 51

•

11 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17) > Created attachment 8431677 [details] [diff] [review] > Blind patch > > I can't test this easily since I don't have an active tarako tree. But based > on the crash stack (a nsCOMPtr null deref) I'm pretty confident this is the > problem. Hi, would you please uplift this to 1.3t?

Flags: needinfo?(bugmail.mozilla)

Gregor Wagner [:gwagner]

Comment 52

•

11 years ago

https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/2d7a32e38d7d

status-b2g-v1.3T: affected → fixed

Flags: needinfo?(bugmail.mozilla)

ying.xu

Comment 53

•

11 years ago

thx

Chris Peterson [:cpeterson]

Comment 54

•

11 years ago

This fix is not in the mozilla-b2g30_v1_4 branch. Bhavana says QA is testing to determine whether the crash is actually reproducible on 1.4.

status-b2g-v1.4: --- → ?