Closed Bug 1032410 Opened 10 years ago Closed 6 years ago

GonkSwitch isn't thread safe

Categories

(Core :: Hardware Abstraction Layer (HAL), defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: dhylands, Assigned: sraok, Mentored)

Details

(Whiteboard: [lang=C++])

GonkSwitch has an mHandler array which is manipulated both on MainThread (Init routine, destructor) and on IOThread (GetEventInfo)

If looks like mEventInfo is also being accessed on multiple threads as well.
Mentor: dhylands
Whiteboard: [lang=C++]
I think that to fix this we need a set of switch states which is maintained on both threads.

The master copy would be on the IOThread, and a cache should exist for the main thread.

Then each time an update is received (on the IOThread) it should update the master and post an event for the main thread to update the cache.

Requests to perform actions which come in on the main thread should post an event to the IOThread to actually perform the action.
hello Dave ,
I would like to work on this,can you please assign it to me ?
Hi vaibhav,

I have somebody else that I think has already started on this bug. Let me doublecheck and if they have already started, I'll mark it as assigned to them, otherwise I'd be happy to have you work on it.
Assignee: nobody → sraok
To test this bug I was trying to listen to something on FM app using headphones but it seems that there is a bug related to FM app on Flame Phone. The steps to reproduce the bug are as follows

1. Flash a debug build of Firefox OS on Flame. (export DEVICE_DEBUG=1, export B2G_DEBUG=1 in .userconfig.)
2. Open the FM app. With or without headphones plugged in. If headphones are not plugged in, do so when the app gives a prompt to plug them in. 
3.  Press the previous or next channel button. 
Result: As soon as the button is pressed, the phone turns off.
4. Remove the headphones. Remove and reinsert the battery and try to boot the phone.
Result: "Thundersoft" logo is shown and then the phone becomes dead again.

Stack trace after step 3 is performed is at: http://mozservices.pastebin.mozilla.org/6065888

dmesg exception trace which I guess is causing the boot to fail is at: http://mozservices.pastebin.mozilla.org/6065122

Should I log a bug for the issue? I guess I can go ahead with the fix of the current bug (1032410) as the bug in FM doesnt seem to be related to it. Please let me know what you think.
It seems that lots of devs don't bother testing with debug builds.

So - yes, please file a bug with STR (Steps to Reproduce), and attach the backtrace to that bug (about the Assert)

I'd also verify that the bug occurs without any of your patches applied.

It seem extremely weird to me that the phone won't boot up after a battery pull. I think that this also warrants further investigation, and its probably worthwhile to file a bug about just that issue as well.

Please cc me on both bugs.
Hi All,

I'm looking at this cos it seems like not much is happening. 

Perhaps there's a simpler way: If all the action is on the UI thread and the main thread only touches mHandler during Init and destruction, is it really meaningful for UI calls to be honoured during those times? We could just make a boolean that becomes true at the end of Init and false at the beginning of destruction, then simply discard UI calls with SWITCH_DEVICE_UNKNOWN if it's false. Or am I missing something?

As for mEventInfo, if EnableSwitch and the other accessors are all on the UI thread, then the same would apply, but I don't actually know what threads those calls are on. 

If we really didn't want to discard UI calls during Init, perhaps we could simply post them round in circles til the boolean was true. But I'm not sure if that works cos I don't know whether the ultimate caller of GetEventInfo is expecting a synchronous answer or not, neither do I know how many are likely to show up during that time and whether or not the order matters. It probably does, so I guess discarding them is better.
Hi,

I am sorry I got busy with some of my deadlines. I had implemented a fix but it was crashing at a call to AddRef for the RefPtr of type SwitchHandler. On investigation I felt that even in a main thread the condition "if(!NS_IsMainThread())" was evaluating to true. I might be doing some mistake. I will post the link to my current code and investigations in a few hours. If there is no resolution, I could try implementing what you are suggesting.

Apologies,
Sanmukh
I have attached the code at [1] and investigation at [2]. As per the code, I created a class named SwitchEventInfo. This class is responsible for updating the eventinfo in IOThread and dispatching a runnable to main thread to update the eventinfo there (which again is another instance of the same class on main thread). 

The SwitchEventObserver::Init() (GonkSwitch.cpp:423) is called once on main thread and once on IOThread. In both the cases two instances of the class SwitchEventInfo are created: mHandlerMainThread and mHandlerIOThread. mHandlerIOThread holds the reference to its main thread counterpart.

The detailed analysis of crash is provided in [2]

[1] http://mozservices.pastebin.mozilla.org/6729832
[2] http://mozservices.pastebin.mozilla.org/6729848

I will spend some time on it this week. If you see some obvious mistake in the code, please let me know.
This code is gone.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.