Closed
Bug 806371
Opened 12 years ago
Closed 12 years ago
Android talos test rpr (testBrowserProviderPerf) killed prematurely
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
There are frequent test failures for Talos test rpr (testBrowserProviderPerf). Some are recorded in bug 796914. The test terminates abruptly, as though there is a crash, but no stack is reported.
Tracing suggests the problem occurs in addTonsOfUrls -> bulkInsert -> getWritableDatabase.
10-29 06:38:43.457 D/GeckoBrowserProvider( 2440): Inserted ID in database: 475
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Calling insert in transaction on URI: content://org.mozilla.fennec.db.browser/bookmarks?test=1
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Insert on BOOKMARKS: content://org.mozilla.fennec.db.browser/bookmarks?test=1
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Extracting image values for URI: 061a366c-66db-473e-abf7-685b47140e78
10-29 06:38:43.457 D/GeckoBrowserProvider( 2440): Inserting bookmark in database with URL: 061a366c-66db-473e-abf7-685b47140e78
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Getting writable database for URI: content://org.mozilla.fennec.db.browser/bookmarks?test=1
10-29 06:38:43.477 I/ActivityManager( 1020): Process org.mozilla.fennec (pid 2440) has died.
10-29 06:38:43.477 I/WindowManager( 1020): WIN DEATH: Window{486cf2b0 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
10-29 06:38:43.487 D/AndroidRuntime( 2432): Shutting down VM
10-29 06:38:43.487 I/Process ( 1020): Sending signal. PID: 2509 SIG: 9
10-29 06:38:43.487 I/WindowManager( 1020): WIN DEATH: Window{4867f2d0 SurfaceView paused=false}
10-29 06:38:43.487 W/ActivityManager( 1020): Crash of app org.mozilla.fennec running instrumentation ComponentInfo{org.mozilla.roboexample.test/org.mozilla.fennec.FennecInstrumentationTestRunner}
10-29 06:38:43.487 I/ActivityManager( 1020): Force stopping package org.mozilla.fennec uid=10033
10-29 06:38:43.487 I/ActivityManager( 1020): Force finishing activity HistoryRecord{4851f108 org.mozilla.fennec/.App}
10-29 06:38:43.487 W/ActivityManager( 1020): Duplicate finish request for HistoryRecord{4851f108 org.mozilla.fennec/.App}
Assignee | ||
Comment 1•12 years ago
|
||
I see now! It's not so much a crash, but a kill: "Sending signal. PID: 2509 SIG: 9"
I suspect talos is getting confused and killing Fennec mid-test.
There is a timeout in ttest of 300 s; this test normally takes about 110 s. In the failure cases, it looks like we often only wait ~5 seconds, because it appears that the process has completed.
Summary: Crash during rpr (testBrowserProviderPerf) in getWritableDatabse → Android talos test rpr (testBrowserProviderPerf) killed prematurely
Assignee | ||
Comment 2•12 years ago
|
||
In the failure cases, ffprocess_remote.launchProcess only executes for ~5 s, whereas launchProcess executes for ~110 s in the success cases. In turn, this difference is seen in execution times for devicemanagerSUT.fireProcess. I suspect that sutAgent is confused by the "am" command line, but debugging the sutAgent is awkward in this case, since I can only seem to reproduce this bug on try. Also, it seems to me that sutAgent exec commands are not (and should not be) guaranteed to wait for process completion. There are safe-guards in devicemanager and ffprocess to wait for process completion after the process has been launched, but that code appears to be consistently failing -- I am looking into that now.
Assignee | ||
Comment 3•12 years ago
|
||
I determined that sutAgent was in fact waiting for process completion -- the process was ending abruptly because it was being killed. The kill was not originating from bcontroller/ffprocess/devicemanager -- they were all waiting for dm.fireProcess to complete. I noted that the kill was occurring while addTonsOfUrls was executing; I added sleeps before addTonsOfUrls, and the kill continued to occur during addTonsOfUrls.
I suspected - not for the first time! - an OOM kill, but could not verify: onLowMemory was not called, but that's not conclusive one way or the other. I tried reducing the test's memory use by simply reducing the BATCH_SIZE in testBrowserProviderPerf.java.in. Immediately, try tests stopped showing the sudden kill on test startup. However, those same tests started failing intermittently with a SIGSEGV on test startup. We are currently seeing SIGSEGVs on startup in other tests and tracking that in other bugs, but not with this frequency.
I then wondered if the SIGSEGVs were caused by an unrelated problem, so updated my build and pushed to try again: now I don't see the SIGSEGVs or the kills on try -- rpr is passing consistently. But now rpr is passing on tbpl also, without my change. Frustrating!!
Assignee | ||
Comment 4•12 years ago
|
||
The last 20 rpr runs on tbpl/m-i have been error-free. I guess there's nothing more to do here.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•