Closed Bug 977995 Opened 7 years ago Closed 7 years ago

B2G: remove mNetdWorker from SystemWorkerManager

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
1.4 S5 (11apr)

People

(Reporter: vicamo, Assigned: vicamo)

References

Details

(Whiteboard: [ft:ril][p=1])

Attachments

(1 file, 2 obsolete files)

+++ This bug was initially created as a clone of Bug #864931 +++

NetworkService no longer owns a ChromeWorker.  No one is really connected to NetworkService via nsISystemWorkerManager.
Attached patch patch (obsolete) — Splinter Review
try all: https://tbpl.mozilla.org/?tree=Try&rev=7f90e4693a66
Assignee: nobody → vyang
Attachment #8383524 - Flags: review?(khuey)
Based on bug 975813.
Depends on: 975813
Comment on attachment 8383524 [details] [diff] [review]
patch

Busted B2G xpcshell tests.  Got to fix test_networkstats_service.js first.
Attachment #8383524 - Flags: review?(khuey)
Attached patch patch (obsolete) — Splinter Review
There are multiple defects in NetworkWorker and the related parts since the C++ rewrite.  1) NetworkService holds a reference to NetworkWorker and never releases it.  It has to wait until the cycle collector comes up to resolve their ownership loop and free NetworkWorker manually.  However 2) nsINetworkWorker::shutdown is never called, and that leaves everything living till the end, inclusive of that gNetdConsumer in Netd.cpp.  3) when GC comes to free NetworkWorker, it calls its parent destructor ~NetConsumer(), which in turn calls ~RefCounted<NetdConsumer>().  Having a valid gNetdConsumer in Netd.cpp follows its refCnt is not zero and this triggers an assertion in ~RefCounted<NetdConsumer>().

So, some obvious treatments here.  A) NetworkService should call nsINetworkWorker::shutdown upon receiving a shutdown observer event and release the reference to NetworkWorker.  B) NetworkWorker should never be double ref-counted.  Move NetdConsumer implementation into a separated class.

Passed local dbg-xpcshell.  Full try: https://tbpl.mozilla.org/?tree=Try&rev=506f32bcfcc2
Attachment #8383524 - Attachment is obsolete: true
Attachment #8403195 - Flags: review?(khuey)
Attachment #8403195 - Flags: feedback?(vchang)
Comment on attachment 8403195 [details] [diff] [review]
patch

Review of attachment 8403195 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for your help to fix this. Nice work. 

According to your explanation, I am wondering why removing netd code from systemWorkerManager causes xpcshell test fail. 
Maybe kyle can help to give us the answer ? :-)
Attachment #8403195 - Flags: feedback?(vchang) → feedback+
Comment on attachment 8403195 [details] [diff] [review]
patch

Review of attachment 8403195 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/system/gonk/NetworkWorker.cpp
@@ +177,5 @@
>  
>    nsresult rv;
>  
> +  nsCOMPtr<nsIThread> workerThread;
> +  rv = NS_NewThread(getter_AddRefs(workerThread));

NS_NewNamedThread please.  And just assign directly into gWorkerThread.

@@ +203,4 @@
>    StopNetd();
>  
> +  gWorkerThread->Shutdown();
> +  gWorkerThread = nullptr;

You should also be able to replace all the ClearOnShutdown calls in this file with code to null out pointers here.  Feel free to do that in another bug.
Attachment #8403195 - Flags: review?(khuey) → review+
(In reply to Vincent Chang[:vchang] from comment #5)
> Comment on attachment 8403195 [details] [diff] [review]
> patch
> 
> Review of attachment 8403195 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for your help to fix this. Nice work. 
> 
> According to your explanation, I am wondering why removing netd code from
> systemWorkerManager causes xpcshell test fail. 
> Maybe kyle can help to give us the answer ? :-)

What xpcshell test failure?  The try link in comment 4 looks fine.
> What xpcshell test failure?  The try link in comment 4 looks fine.

test_networkstats_service.js mentioned in comment 3.
OS: All → Gonk (Firefox OS)
Hardware: All → ARM
Whiteboard: [ft:ril][p=1]
Target Milestone: --- → 1.4 S6 (25apr)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) (Away 4/19-5/7) from comment #9)
> Is that not what part #3 of comment 4 fixes?

The first revision patch, attachment 8383524 [details] [diff] [review], crashed debug build at the end of xpcshell tests (results in comment 1).  Then I have a second revision, attachment 8403195 [details] [diff] [review], to address that crash (comment 4).

So Vincent is wondering, why does "removing additional reference from NetworkService and SystemWorkerManager in that first revision" result in the crash.  We all know the near cause of that crash as I've described in comment 4, but we don't know the whole picture, eg. when does every refCnt of all the involved instances get decreased?  I don't have a satisfying answer for him.
Attached patch patch : v2Splinter Review
Address review comments.  I'll file a follow-up for ClearOnShutdown stuff because I think that will touch object references and it would be better have its own bug for further discuss & review.
Attachment #8403195 - Attachment is obsolete: true
Attachment #8405412 - Flags: review+
Attachment #8405412 - Flags: feedback+
My guess is that if you never call Shutdown on the NetworkWorker we would not call StopNetd() which means that the destructor would not run until static destructors run and gNetDConsumer is destroyed.  I wouldn't be surprised if destroying objects that late causes crashes in debug builds (where we try to do things like leak tracking, cycle collected objects assert if they are destroyed too late, etc).

This is just speculation though ... looking at the stack of the failure and poking around with a debugger would get you much further.
https://hg.mozilla.org/mozilla-central/rev/9df1bf74a0c6
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: 1.4 S6 (25apr) → 1.4 S5 (11apr)
Blocks: 995525
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #11)
> I'll file a follow-up for ClearOnShutdown stuff
> because I think that will touch object references and it would be better
> have its own bug for further discuss & review.

Filed as bug 995525.
Bug 988132 is fixed after apply the patch. Try result: https://tbpl.mozilla.org/?tree=Try&rev=7e212dae6207
No crash found in B2G emulator reftests.
Duplicate of this bug: 988132
You need to log in before you can comment on or make changes to this bug.