System messages are lost after nullify the handler via naviator.mozSetMessageHandler(type, null)

NEW
Unassigned

Status

Firefox OS
General
5 years ago
5 years ago

People

(Reporter: salva, Unassigned)

Tracking

unspecified
ARM
Gonk (Firefox OS)

Firefox Tracking Flags

(blocking-b2g:-)

Details

After disabling the system message handler by using:

navigator.mozSetMessageHandler(aType, null)

and going background, if a system message is received, window_manager opens the declared handler registered in Manifest but no message is being delivered.

I'll attach a patch to test this behavior and provide a detailed STR.
Please find the test application `Lost messages` here:
https://github.com/lodr/gaia/tree/lost-messages

Apply on Gaia Master.

Lost message application register an alarm handler in handler.html but allow you to install alarms by including handler.html inside index.html through an iframe.

The application have two buttons:
 - Install an alarm: sets an alarm to be triggered in 40s
 - Disconnect the handler: make navigator.mozSetMessageHandler(aType, null) to unset the handler.

Setting / receiving an alarm (foreground):
0- Do `make reset-gaia` to install the new application
1- Open `Lost messages` application
2- Click on `Install an alarm`
3- Wait for 40s without going background nor allowing the screen get off
4- Check the iframe to confirm the message has been attended


Setting / receiving an alarm (closed application):
1- Open `Lost messages` application
2- Click on `Install an alarm`
3- Close the application
4- Wait for 40s
5- Check the card view to confirm the handler was launched

During start-up, there should be pending messages.

Setting / receiving an alarm (background):
1- Open `Lost messages` application
2- Click on `Install an alarm`
3- Press 'home` to send the application to background
4- Wait for 40s
5- Check the card view to confirm the handler has replaced the application

Behavior here is undefined upon a race condition between the handler inside the iframe in the application and the window manager.

1- If window manager wins, no message is delivered to the application, and you see pending messages during start-up and a handled alarm.
2- If handler wins, the alarm is attended in the application, then window manager changes the iframe location to the handler so you won't see any attended alarm nor pending messages.

This is weird but expected.
Now, the unexpected STR:
1- Open `Lost messages` application
2- Press `Disconnect the handler`
3- Press `Install an alarm`
4- Press home to send the application to background
5- Wait for 40s
6- Check the card view to see how the handler has replaced the application

EXPECTED:
As the message handler has been open to deliver a message a no other handler is registered, pending messages should be true and it should be attended.

ACTUAL:
Pending messages is false so any message is attended. The handler.html was open to do nothing.
(In reply to Salvador de la Puente González [:salva] from comment #0)
> After disabling the system message handler by using:
> 
> navigator.mozSetMessageHandler(aType, null)
> 
> and going background, if a system message is received, window_manager opens
> the declared handler registered in Manifest but no message is being
> delivered.

This behaviour seems expected to me. If you nullify the message handler, it means you're hoping to un-register the handler for a given type. Under this circumstance, the handler shouldn't run in any way and no message should be handled. Also, mozHasPendingMessage(...) must return false if no handler is registered.

I don't understand what the real problem is. Could you please summarize the issue again? :)
Blocks: 755245
I updated the `Lost messages` application, now delaying for 20s the installation of the handler.

(In reply to Gene Lian [:gene] from comment #3)
> (In reply to Salvador de la Puente González [:salva] from comment #0)
> > After disabling the system message handler by using:
> > 
> > navigator.mozSetMessageHandler(aType, null)
> > 
> > and going background, if a system message is received, window_manager opens
> > the declared handler registered in Manifest but no message is being
> > delivered.
> 
> This behaviour seems expected to me. If you nullify the message handler, it
> means you're hoping to un-register the handler for a given type. Under this
> circumstance, the handler shouldn't run in any way and no message should be
> handled. Also, mozHasPendingMessage(...) must return false if no handler is
> registered.

Then, if I nullify the message handler and then close the application? Why the message is normally delivered? What happen with my choose of not accepting more messages?

> 
> I don't understand what the real problem is. Could you please summarize the
> issue again? :)

Anyway, the problem is a system message is being marked as delivered if it has reached the process, not the application. You can see this in:

http://mxr.mozilla.org/mozilla-b2g18/source/dom/messages/SystemMessageManager.js#209

Then it checks if there is a dispatcher. If there is not, the message is discarded:

http://mxr.mozilla.org/mozilla-b2g18/source/dom/messages/SystemMessageManager.js#224

I think the child process should not ack the message unless there is a real dispatch. 

WDYT?

Furthermore, Antonino and I detect another problem. Suppose the following timeline:

T0     T1     T2
|------|------|------
 \      \      \- Register handler
  \      \- System message
   \- Start application

The system message is never attended by the handler as it is acknowledged in line 209 but discarded in 224 before a handler is registered.

You can test this behavior with the new version of the application and follow this STR.

Timing is important!!.

1- Launch `Lost messages` application
2- Install an alarm
3- Close the application
4- Wait for 25-30 seconds
5- Launch the application
6- Wait for 20s more

EXPECTED:
As soon as the message handler is registered, the system message is attended.

ACTUAL:
The system message arrived between complete load of the application and message handler installation so it was discarded. The alarm never reaches the handler.
The behavior that Salva described might be expected but seems strange from a app dev point of view. I don't know if the problem might be reproducible without introducing an artificial delay (although it could be a normal delay, if the handler depends on some data to be fetch asynchronously). That is, I don't know if the normal launch flow will allow something like:

T0: App 1 starts launching. It does register itself as a system message receiver and starts loading the app.
T0+1: (close to T0, the system is multiprocess) A system message is received for App1. The parent process checks if the app is running (it is) and sends the message
T0+2: The child process still hasn't finished loading the app. The system handler hasn't been set. It processes the parent message, and discard the message.
T0+3: The child process finishes loading the app and setting the handler... but the message is lost.

A possible solution here (solution as in making this work the same way always and not depend on timing) would be for the child process to answer to the parent with another message (not OK) so the parent process can know that the message wasn't really processed and can let it queued.
Thanks Salvador and Antonio for the detailed explanation! Now I understand better. It seems we have a race condition issue in the code patch from the point of launching the app to the point of registering the system message handler.

To fix this issue, I think maybe we should fire the acknowledgement ("SystemMessageManager:Message:Return:OK") to clean up the pending message in the parent *after handling* the system message instead of *when receiving* that.
Gene, don't forget to check bug 888926, please. What do you think?
Flags: needinfo?(gene.lian)
I've already left a comment for that bug before your reminder. :)
Flags: needinfo?(gene.lian)
(In reply to Gene Lian [:gene] from comment #6)
> Thanks Salvador and Antonio for the detailed explanation! Now I understand
> better. It seems we have a race condition issue in the code patch from the
> point of launching the app to the point of registering the system message
> handler.
> 
> To fix this issue, I think maybe we should fire the acknowledgement
> ("SystemMessageManager:Message:Return:OK") to clean up the pending message
> in the parent *after handling* the system message instead of *when
> receiving* that.

I think that would introduce reentrant problems with the handlers (what happens if a non reentrant handler starts running, makes some changes, then crashes the process by OOMing it before sending the ACK to the message?). 

If we restrict this problem as "let's make sure the message is going to be handler somehow before deleting it" then it could be solved if we ACK it only when we actually have a handler, that is, if we always queue messages in case a handler is defined later on, and we send the ACK only when the message is going to be handled. That is, move the sentence at [1] to just before [2]

[1] https://mxr.mozilla.org/mozilla-b2g18/source/dom/messages/SystemMessageManager.js?force=1#212

[2] https://mxr.mozilla.org/mozilla-b2g18/source/dom/messages/SystemMessageManager.js?force=1#226

Updated

5 years ago
No longer blocks: 882084, 755245

Updated

5 years ago
Blocks: 755245
Yeap, nice catch. :) I just left a comment at bug 888926, comment #5.
Do we have STR ( or duplicate bugs) such that the race condition is hit for a normal user scenario when the phone is being used day-day ? If no, it would be hard to block here given where we are in the 1.1 cycle
Minusing for now, please renom if you can provide the info requested in comment 11
blocking-b2g: leo? → -
You need to log in before you can comment on or make changes to this bug.