Closed Bug 959059 Opened 10 years ago Closed 10 years ago

[NFC] Shrinking UI is shrunk due to unchecked invalid sessionToken from Gecko.

Categories

(Firefox OS Graveyard :: NFC, defect)

x86
macOS
defect
Not set
normal

Tracking

(blocking-b2g:2.0+, b2g-v2.0 fixed)

VERIFIED FIXED
2.0 S1 (9may)
blocking-b2g 2.0+
Tracking Status
b2g-v2.0 --- fixed

People

(Reporter: johnhu, Assigned: dgarnerlee)

References

Details

Attachments

(4 files, 8 obsolete files)

When we draw back two device while devices vibrating, the shrinking UI is shrinked forever. It may be an edge case but easy to reproduce.

STR:
1. find one android device and one firefox os device
2. open app with NFC supported or apply the WIP patch at bug 903253
3. open app to sharing context
4. touch two devices
5. draw back one device when devices vibrating.

symptom:
shrinking UI is shrinked forever

recovery step:
touch again and wait few seconds.
TechLost is being executed according to the debug logs, however, it appears the shrink-stop event may have been sent while it's still finishing the shrinking animation.

Are the custom shrink events queued?
Flags: needinfo?(gweng)
I can stop listen to the event while the animation is processing, if this is the only issue we have here.
Flags: needinfo?(gweng)
John reminded me we need a queue here, not suspend listener.
I would map start/end to +1/-1 in the queue, and to merge (sum) all numbers while I need to handle the next event. If the result is 0 (like [s,e,s,e]), do nothing; if it's +1/-1, do start/end.
Assignee: nobody → gweng
Since the newest master (which I branched from) won't work even in the normal case, I may postpone this patch until we can find a firm way to test the normal and this edge case.
OK, I've finished the patch and wait devices to test.
I've found the bug is *irrelevant* with queueing. The events from NFC manager seems sometime would be incorrect, especially when the user do the closing behavior twice (close-away, close-away), and leaving devices rapidly.

Here is the log which prints out the events received by ShrinkingUI's handleEvent:


# The first close-leave, which would tilt back normally:

Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event appcreated
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event appopen
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event appcreated
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event appopen
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event appcreated
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event check-p2p-registration-for-active-app
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event shrinking-start
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event shrinking-stop

The problem is the second 'shrinking-stop' fired very slow, about 3 secs after the start, even the device has leaved away.

# After the first close-away it would fire wrong events when I leave pairing device rapidly:

Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event check-p2p-registration-for-active-app
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event shrinking-stop
Content JS LOG at app://system.gaiamobile.org/js/shrinking_ui.js:88 in su_handleEvent: >>-- shrinking-event shrinking-start

You can see there are two 'stop' and one incorrect 'start' followed. So the shrinking UI correctly responds to these events, and finally it got tilted because the last 'start' remains.
Flags: needinfo?(dgarnerlee)
Summary:

1. The first close-away would fire events in the correct order, one start follows one stop.
2. The second close-away would fire events in the wrong order, and leave one start as the last. So ShrinkingUI tilt according to the last event correctly
3. Sometimes it would be broken even in the first close-away test. In such case the events are also fired incorrectly.
Thanks for the investigation. We'll also need to look at the hardware events bubbling up to ensure the sequence is working properly.

Maybe it's related to Bug 961667? Dimi, comments on it or a similar bug before I add it to dependencies?
Flags: needinfo?(dlee)
(In reply to Garner Lee from comment #8)
> Thanks for the investigation. We'll also need to look at the hardware events
> bubbling up to ensure the sequence is working properly.
> 
> Maybe it's related to Bug 961667? Dimi, comments on it or a similar bug
> before I add it to dependencies?

This is not the same bug as Bug 961677. The shrinking UI behavior in that case will be :
shirnking -> normal -> shrinking -> ...
Flags: needinfo?(dlee)
(In reply to Dimi Lee[:dimi][:dlee] from comment #9)
> (In reply to Garner Lee from comment #8)
> > Thanks for the investigation. We'll also need to look at the hardware events
> > bubbling up to ensure the sequence is working properly.
> > 
> > Maybe it's related to Bug 961667? Dimi, comments on it or a similar bug
> > before I add it to dependencies?
> 
> This is not the same bug as Bug 961677. The shrinking UI behavior in that
> case will be :
> shirnking -> normal -> shrinking -> ...

Incorrect bug number sorry, not the same as Bug 961667.
It's weird. Everytime it suddenly prints out something about allocating a gralloc or OpenGL information, the animation fails to complete draw animation regarding techLost (it stops part way, and stays on screen):

I/Gecko   (  178): -*- Nfc: Received message from NFC worker: {"type":"techLost","sessionId":43}
I/Gecko   (  178): -*- Nfc: 111 techLost
I/Gecko   (  178): -*- Nfc: 222
I/Gecko   (  178): -*- Nfc: 333
I/Gecko   (  178): -*- Nfc: 444
I/Gecko   (  178): Nfc Worker: Next parcel size unknown, going to sleep.
I/Gecko   (  178): -*- Nfc: 555
I/Gecko   (  178): -*- Nfc: Application ID : null is not a registered target for the event 2 notification
I/Gecko   (  178): -*- Nfc: 666
I/Gecko   (  178): -*- Nfc: 777
I/Gecko   (  178): -*- Nfc: 888
I/Gecko   (  178): -*- Nfc: 999
I/GeckoDump(  178): [DEBUG] SYSTEM NFC: Technology Lost: No Print Version
I/Gecko   (  944): Initializing context 0xb12e74c0 surface 0xb0668920 on display 0x1
I/Gecko   (  944): OpenGL version detected: 300
I/Gecko   (  944): OpenGL vendor: Qualcomm
I/Gecko   (  944): OpenGL renderer: Adreno (TM) 320
I/Gecko   (  944): [Child 944] WARNING: standard_derivatives marked as unsupported: file ../../../gecko/gfx/gl/GLContextFeatures.cpp, line 473
I/Gecko   (  944): [Child 944] WARNING: robustness marked as unsupported: file ../../../gecko/gfx/gl/GLContextFeatures.cpp, line 473
I/Gecko   (  944): SharedSurface_Gralloc::Create -------
I/Gecko   (  944): SharedSurface_Gralloc::Create: success -- surface 0xb12e7520, GraphicBuffer 0xb1bd3f00.

I'm in a full debug compile. Working cases never print that extra stuff, and the screen pops back out of shrink UI. What is it doing, API-wise underneath? Are the warning messages relevant?

It always vibrates due to the techLost message, so nfc_manager is definately receiving the techLost fine as the last hardware state message. So far, at head revision nfcd code, I have been getting a clean techDiscovered and techLost from NFCD (at least for the particular physical NFC motion I use to pair the phones).
Flags: needinfo?(dgarnerlee)
Hi, any extra investigation? This bug is still occurring (from the above message, the above opengl message when it fails in debug compile).

There's some talk about moving ShrinkUI to user application space as well: Bug 963531.
Blocks: 963531
Flags: needinfo?(gweng)
Don't know if the Gecko issue had been solved or not. If there is no new message, my Gaia work would not help because it response the events correctly as I commented at Comment 6. I may ask Ken later.
Flags: needinfo?(gweng)
(In reply to Garner Lee from comment #12)
> Hi, any extra investigation? This bug is still occurring (from the above
> message, the above opengl message when it fails in debug compile).
Garner, I am a little confused. Are you sure the problem describing in comment 6 is same as the one describing in comment 11? 

> 
> There's some talk about moving ShrinkUI to user application space as well:
> Bug 963531.
I wonder if it would happened in near future...:-(
Flags: needinfo?(dgarnerlee)
> (In reply to Garner Lee from comment #12)
> > Hi, any extra investigation? This bug is still occurring (from the above
> > message, the above opengl message when it fails in debug compile).
> Garner, I am a little confused. Are you sure the problem describing in
> comment 6 is same as the one describing in comment 11? 

I can re-check. What needs to happen is a proper queue from NFCD to send up (If tech discovered followed by a lost isn't already happening), though a re-initializing opengl library/printing library details isn't quite one would be expecting to happen when merely transforming a single screen bitmap either.

Regardless, it would seem a app/UI might crash (memory pressure, bug, user closing the app, etc), relaunch, and receive a techlost right after, so maybe ShrinkUI still has to consider handling that edge case (drop messages, etc.).

> 
> > 
> > There's some talk about moving ShrinkUI to user application space as well:
> > Bug 963531.
> I wonder if it would happened in near future...:-(
Assignee: gweng → dgarnerlee
Flags: needinfo?(dgarnerlee)
Notes: Lots of race conditions.
1) Shrink UI has some kind of race condition (change the transition effects down to 0.001s to mostly miss the animation timing issues).
2) Nfc.js has some kind of race condition as well: CheckP2PRegistration has a different messaging path, and therefore timing, than techLost (tech-lost can erroneously come before ui start of the last nfc session).

Fix #2, I have some test code workarounds to drop out of order start/stops. Then #1 becomes the reason the UI gets stuck (animations and callbacks for transitions that are still in progress).

My suggestion is to add some unit tests (if not already) for shrinking_ui.js to verify/catch and fix the condition isolated from NFC Manager's incoming events, while Nfc.js is is modified to return more information to NFC Manager to handle stray event timing, and drop the stray start message.

The display still flickers occasionally, but I'm not in a debug compile to figure out what OpenGL is doing at that moment.
Flags: needinfo?(gweng)
blocking-b2g: --- → 2.0?
Target Milestone: --- → 1.4 S6 (25apr)
Forward physical error condition of NFC Tag/P2P being missing to onerror callback.
blocking-b2g: 2.0? → 2.0+
I think this is a Gecko bug, we should land this first.
Gaia problems(timing issue) sounds like a different problem, we should file another bug for it.
Also cancelling ni? for Greg for we should focus on the Gecko part first in this bug.
Blocks: b2g-nfc
No longer blocks: NFC-Gaia
Flags: needinfo?(gweng)
Depends on: 998023
Attachment #8407214 - Flags: review?(allstars.chh)
Attachment #8407645 - Flags: review?(alive)
Updated gaia patch on Github for P2P CheckP2PRegistration error and success cases.
Try-ing Gecko patch:
https://tbpl.mozilla.org/?tree=Try&rev=ebc13f9daf3e
Could you explain what's the problem here?
And You file Bug 998111 with almost the same title.
could you update the bug title in either one of them?
Also by checking Bug 998111 it looks to me it's the correct fix for this(video shown in Comment 0).
Comment on attachment 8407645 [details] [review]
Cancel P2P UI on error. Gaia side patch. r=alive

See comments
Attachment #8407645 - Flags: review?(alive)
(In reply to Alive Kuo [:alive][NEEDINFO!][God bless Taiwan.] from comment #25)
> Comment on attachment 8407645 [details] [review]
> Cancel P2P UI on error. Gaia side patch.
> 
> See comments

I noticed there's unit test error on Travics.
(In reply to Yoshi Huang[:allstars.chh] from comment #24)
> Could you explain what's the problem here?
> And You file Bug 998111 with almost the same title.
> could you update the bug title in either one of them?
> Also by checking Bug 998111 it looks to me it's the correct fix for
> this(video shown in Comment 0).

Yes, comment 16 indicates there's 2. We can swap the attachments.

The ShrinkUI could be in a bad state due to the fact a techLost can happen (which invalidates sessionToken), with a "CheckP2PRegistration" DOMRequest coming back along a different message path, but not checking if the sessionToken is still valid. Forwarding the error allows NFC Manager to avoid sending an erroneous success message to the ShrinkUI (or some other future application P2P UI), which can't work anyway, since internally, the sessionToken is required to use the DOM.

From Sid's patch, it fixes the state tracking in ShrinkUI, with a race condition for the state tracking the animation timers.

Both are needed to fix the 2 known conditions (and works).
The gecko fix from Garner, tries to fix a separate edge case. Even after Bug#998111 fix, there is a case that can happen in the following sequence (theoretically, though I am not able to produce this by testing).
1) Lets say, In Nfc.js, checkP2PRegistration(...) fires a success to NfcContentHelper (at the time when tech-lost is received from nfcd)
2) In nfc_manager.js, upon receiving success, it sends a 'shrinking-start' event to shrinking_ui.js
 
 At STEP#1, there could be a possibility of 'tech-lost' system message reaching 'nfc_manager' first before, 'success' is fired on dom-request for checkP2PRegistration(..)
 What this means is that, on tech-lost, 'shrinking-stop' event is sent. And after receiving 'success' from 'checkP2PRegistration(..)' dom call,  'shrinking-start' will be sent resulting again in a 'tilted UI'.
Infact, we should consider this usecase for all integration points between Nfc.js & NfcContentHelper.js (in a separate bug). Whenever Nfc.js is responding (responses) to Nfc messages from Content process,
it is better to check if the session is already lost. If so, flip the status flag to 'Error' so that NfcContentHelper will fire 'onerror' in such scenarios.
After more discussion with Sid, since Bug 998111 is already r+'ed (and ready to go), we'll just edit the respective bug titles to match what is happening in each edge case (with similar results).

I've re-pushed the test cases with the suggested changes applied globally, and fixed nfc_manager.js (this --> self) per Alive's comments.
Summary: [NFC] Shrinking UI is always shrinked even if device is not touched together → [NFC] Shrinking UI is shrunk due to unchecked invalid sessionToken from Gecko.
Blocks: 961687
Comment on attachment 8407214 [details] [diff] [review]
CheckP2PRegistration Gecko-side Patch

Review of attachment 8407214 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the explanation, Garner and Sidd.

Cancelling r? for I think the wrapper (exposeProp) is not neccesary to me.

::: dom/system/gonk/Nfc.js
@@ +322,5 @@
> +        // Remember the current AppId if registered.
> +        this.currentPeerAppId = (isRegistered) ? msg.json.appId : null;
> +        status = (isRegistered) ? NFC.GECKO_NFC_ERROR_SUCCESS :
> +                                  NFC.GECKO_NFC_ERROR_GENERIC_FAILURE;
> +      }

Adding comments would help.

I think in this case, it's techLost comes before checkP2PRegistration is called, right ?

Seems to me change the meaning of isRegistered is enough.

let isValidAndRegistered = !!this.nfc.sessionTokenMap[this.nfc._currentSessionId] &&
this.isRegisteredP2PTarget(msg.json.appId, NFC.NFC_PEER_EVENT_READY);
...

You could make the naming shorter if you want.

::: dom/system/gonk/NfcContentHelper.js
@@ +491,1 @@
>  };

If there is only one boolean will be returned in the DOMRequest,

We can take advantage of the DOMRequest.result.

So you don't have to do the wrapper (exposedProps),

you could call 
this.fireRequestSuccess(requestId, result.status == NFC.GECKO_NFC_ERROR_SUCCESS);

In that case, on Gaia side
request.onsuccess() will always be called, and Gaia side could read the request.result (which is a boolean) to know the result of checkP2PRegistraion.
Attachment #8407214 - Flags: review?(allstars.chh)
I suppose DOMRequest will eventually migrate to the more standard Promises, and require a partial re-write either way. I'll look into updating the code on both ends to avoid specifying exposed props for existing code.
Gecko side: Now using onsuccess for all status messages (in this case, boolean true/false). I believe the nfc status codes will get more detailed in the future. We'll need to address that in the future when the error constants show up on both gaia and gecko ends.

Concerning the comment on how to get to the error path, on the mako (nexus 4) devices, you're doing well if you can get a pair of discovered and losts within a span of 1 second. NfcManager.checkP2PRegistration will eventually hit "false".
Attachment #8407214 - Attachment is obsolete: true
Attachment #8412959 - Flags: review?(allstars.chh)
Attachment #8412959 - Attachment description: 0001-Bug-959059-NFC-CheckP2PRegistration-should-return-er.patch → CheckP2PRegistration Gecko-side Patch
Attachment #8412959 - Attachment description: CheckP2PRegistration Gecko-side Patch → (v2) CheckP2PRegistration Gecko-side Patch
Comment on attachment 8412959 [details] [diff] [review]
(v2) CheckP2PRegistration Gecko-side Patch

Review of attachment 8412959 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/system/gonk/Nfc.js
@@ +310,5 @@
>                (event === (NFC.NFC_PEER_EVENT_READY | NFC.NFC_PEER_EVENT_LOST)));
>      },
>  
> +    checkP2PRegistration: function checkP2PRegistration(msg) {
> +      // Check if the session and application id yeilds a valid registered

yeilds -> yield  ?
Attachment #8412959 - Flags: review?(allstars.chh) → review+
Comment on attachment 8412959 [details] [diff] [review]
(v2) CheckP2PRegistration Gecko-side Patch

Review of attachment 8412959 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry I forgot there should be a marionette-webapi test for this.
Cancelling r+.
Attachment #8412959 - Flags: review+
(In reply to Yoshi Huang[:allstars.chh] from comment #35)
> Comment on attachment 8412959 [details] [diff] [review]
> (v2) CheckP2PRegistration Gecko-side Patch
> 
> Review of attachment 8412959 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Sorry I forgot there should be a marionette-webapi test for this.
> Cancelling r+.

I believe there's another marionette test bug dependency opened. Emulator support is needed, since mozNfc doesn't exist otherwise under automated regression testing: Bug 998023.
Depends on: 999569
For my reference, this one has the beginnings of the marionette checkP2PRegistration test case:
https://bug987596.bugzilla.mozilla.org/attachment.cgi?id=8413635
(In reply to Garner Lee from comment #37)
> I believe there's another marionette test bug dependency opened. Emulator
> support is needed, since mozNfc doesn't exist otherwise under automated
> regression testing: Bug 998023.

You change the result value of the DOMRequest for checkP2pRegistration,
you need to write the test case in this bug.

You can test the timing problem in Bug 998023.
Target Milestone: 1.4 S6 (25apr) → 2.0 S1 (9may)
This bug blocks 2.0.
Blocks: 948721
Blocks: 998023
No longer depends on: 998023
Attached patch (v1) Patch - Marionette TestCase (obsolete) — Splinter Review
Attachment #8414708 - Flags: review?(allstars.chh)
Attachment #8414708 - Attachment description: Bug 959059 - Marionette TestCase → (v1) Patch - Marionette TestCase
Attached patch (v1) Patch - Marionette TestCase (obsolete) — Splinter Review
Hi Yoshi, Do you think we could use this test case for the fix being provided ?
Attachment #8414708 - Attachment is obsolete: true
Attachment #8414708 - Flags: review?(allstars.chh)
Attachment #8414711 - Flags: review?(allstars.chh)
Re-reviewing just in case. Add r=allstars.chh to patch. Marionette test case is a separate test patch, since Sid finished while my emulator build env was still building.
Attachment #8414799 - Flags: review?(allstars.chh)
Comment on attachment 8407645 [details] [review]
Cancel P2P UI on error. Gaia side patch. r=alive

Hi, thanks for the previous review. It has been rebased, and changes have been applied (this.sinon, remove restore(), etc), and now matches another result type change in gecko.
Attachment #8407645 - Flags: review?(alive)
Attachment #8412959 - Attachment is obsolete: true
Took care few comments from Garner
Attachment #8414711 - Attachment is obsolete: true
Attachment #8414711 - Flags: review?(allstars.chh)
Attachment #8414826 - Flags: review?(allstars.chh)
Comment on attachment 8414799 [details] [diff] [review]
(v2.1) CheckP2PRegistration Gecko-side Patch

Review of attachment 8414799 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/system/gonk/Nfc.js
@@ +307,5 @@
>                (event === (NFC.NFC_PEER_EVENT_READY | NFC.NFC_PEER_EVENT_LOST)));
>      },
>  
> +    checkP2PRegistration: function checkP2PRegistration(msg) {
> +      // Check if the session and application id yeilds a valid registered

Did you fix the typo 'yeilds' ?
Attachment #8414799 - Flags: review?(allstars.chh) → review+
Comment on attachment 8414826 [details] [diff] [review]
(v1.a) Patch - Marionette TestCase

Review of attachment 8414826 [details] [diff] [review]:
-----------------------------------------------------------------

Please update the subject of your patch,
you are not testing onpeerready.
Attachment #8414826 - Flags: review?(allstars.chh) → review+
Attachment #8407645 - Flags: review?(alive) → review+
Attachment #8414826 - Attachment is obsolete: true
Fixed the subject
Attachment #8415398 - Attachment is obsolete: true
Attachment #8415399 - Attachment is obsolete: true
r=allstars.chh, and fixed comment.
Attachment #8414799 - Attachment is obsolete: true
Comment on attachment 8407645 [details] [review]
Cancel P2P UI on error. Gaia side patch. r=alive

r=alive. Previous green: https://travis-ci.org/mozilla-b2g/gaia/builds/24040163
Attachment #8415403 - Attachment description: (v2.2) CheckP2PRegistration Gecko-side Patch → (v2.2) CheckP2PRegistration Gecko-side Patch r=allstars.chh
Attachment #8407645 - Attachment description: Cancel P2P UI on error. Gaia side patch. → Cancel P2P UI on error. Gaia side patch. r=alive
Keywords: checkin-needed
The Gaia patch has conflicts. Please rebase.
Keywords: checkin-needed
Gaia tests now rebased.
Keywords: checkin-needed
Blocks: 996515
Verified on
Gaia      347d0517f0a77122c876d5f62c0942006a7a0bfe
Gecko     https://hg.mozilla.org/mozilla-central/rev/8be0e21fd300
BuildID   20140507160203
Version   32.0a1
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.