806371 - Android talos test rpr (testBrowserProviderPerf) killed prematurely

Assignee

Description

12 years ago

There are frequent test failures for Talos test rpr (testBrowserProviderPerf). Some are recorded in bug 796914. The test terminates abruptly, as though there is a crash, but no stack is reported. Tracing suggests the problem occurs in addTonsOfUrls -> bulkInsert -> getWritableDatabase.

10-29 06:38:43.457 D/GeckoBrowserProvider( 2440): Inserted ID in database: 475
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Calling insert in transaction on URI: content://org.mozilla.fennec.db.browser/bookmarks?test=1
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Insert on BOOKMARKS: content://org.mozilla.fennec.db.browser/bookmarks?test=1
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Extracting image values for URI: 061a366c-66db-473e-abf7-685b47140e78
10-29 06:38:43.457 D/GeckoBrowserProvider( 2440): Inserting bookmark in database with URL: 061a366c-66db-473e-abf7-685b47140e78
10-29 06:38:43.457 V/GeckoBrowserProvider( 2440): Getting writable database for URI: content://org.mozilla.fennec.db.browser/bookmarks?test=1
10-29 06:38:43.477 I/ActivityManager( 1020): Process org.mozilla.fennec (pid 2440) has died.
10-29 06:38:43.477 I/WindowManager( 1020): WIN DEATH: Window{486cf2b0 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
10-29 06:38:43.487 D/AndroidRuntime( 2432): Shutting down VM
10-29 06:38:43.487 I/Process ( 1020): Sending signal. PID: 2509 SIG: 9
10-29 06:38:43.487 I/WindowManager( 1020): WIN DEATH: Window{4867f2d0 SurfaceView paused=false}
10-29 06:38:43.487 W/ActivityManager( 1020): Crash of app org.mozilla.fennec running instrumentation ComponentInfo{org.mozilla.roboexample.test/org.mozilla.fennec.FennecInstrumentationTestRunner}
10-29 06:38:43.487 I/ActivityManager( 1020): Force stopping package org.mozilla.fennec uid=10033
10-29 06:38:43.487 I/ActivityManager( 1020): Force finishing activity HistoryRecord{4851f108 org.mozilla.fennec/.App}
10-29 06:38:43.487 W/ActivityManager( 1020): Duplicate finish request for HistoryRecord{4851f108 org.mozilla.fennec/.App}

Geoff Brown [:gbrown]

Assignee

Updated

12 years ago

Blocks: 796914

Geoff Brown [:gbrown]

Assignee

Comment 1

12 years ago

I see now! It's not so much a crash as a kill: "Sending signal. PID: 2509 SIG: 9". I suspect talos is getting confused and killing Fennec mid-test. There is a timeout in ttest of 300 s; this test normally takes about 110 s. In the failure cases, it looks like we often wait only ~5 seconds, because it appears that the process has completed.
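The crash-versus-kill distinction can be checked mechanically against a logcat capture. The following is a minimal sketch, assuming the log lines have the same shape as the ones quoted in the description; `find_kill_events` and both regular expressions are hypothetical helpers written for illustration, not part of talos.

```python
import re

# Patterns modelled on the logcat lines quoted in this bug (assumed format).
DIED_RE = re.compile(r"Process (\S+) \(pid (\d+)\) has died")
SIGNAL_RE = re.compile(r"Sending signal\.\s+PID:\s*(\d+)\s+SIG:\s*(\d+)")


def find_kill_events(log_text):
    """Return (died, signals) found in a logcat capture.

    died    -- list of (process_name, pid) reported dead by ActivityManager
    signals -- list of (pid, signal_number) sent via "Sending signal."
    """
    died = [(name, int(pid)) for name, pid in DIED_RE.findall(log_text)]
    signals = [(int(pid), int(sig)) for pid, sig in SIGNAL_RE.findall(log_text)]
    return died, signals


sample = (
    "10-29 06:38:43.477 I/ActivityManager( 1020): "
    "Process org.mozilla.fennec (pid 2440) has died.\n"
    "10-29 06:38:43.487 I/Process ( 1020): Sending signal. PID: 2509 SIG: 9\n"
)

died, signals = find_kill_events(sample)
print(died)     # [('org.mozilla.fennec', 2440)]
print(signals)  # [(2509, 9)]
```

A signal number of 9 (SIGKILL) with no stack in the log is consistent with a kill rather than a crash in the process itself.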

Summary: Crash during rpr (testBrowserProviderPerf) in getWritableDatabse → Android talos test rpr (testBrowserProviderPerf) killed prematurely

Geoff Brown [:gbrown]

Assignee

Comment 2

12 years ago

In the failure cases, ffprocess_remote.launchProcess only executes for ~5 s, whereas launchProcess executes for ~110 s in the success cases. In turn, this difference is seen in execution times for devicemanagerSUT.fireProcess. I suspect that sutAgent is confused by the "am" command line, but debugging the sutAgent is awkward in this case, since I seem to be able to reproduce this bug only on try. Also, it seems to me that sutAgent exec commands are not (and should not be) guaranteed to wait for process completion. There are safeguards in devicemanager and ffprocess to wait for process completion after the process has been launched, but that code appears to be consistently failing -- I am looking into that now.
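For illustration, a "wait for completion after launch" safeguard of the kind described above might look like the sketch below. This is not the devicemanager or ffprocess code: `wait_for_exit` and the `is_running` callable are hypothetical names, and the only value taken from this bug is the 300 s timeout.

```python
import time


def wait_for_exit(is_running, timeout=300.0, poll_interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until is_running() returns False or the timeout elapses.

    Returns the elapsed time in seconds if the process exited, or None
    if it was still running when the timeout was reached.
    clock and sleep are injectable so the loop can be tested without
    real waiting.
    """
    start = clock()
    while is_running():
        if clock() - start >= timeout:
            return None
        sleep(poll_interval)
    return clock() - start


# Usage with a simulated clock: the fake process "runs" for 5 seconds.
state = {"t": 0.0}


def fake_clock():
    return state["t"]


def fake_sleep(seconds):
    state["t"] += seconds


elapsed = wait_for_exit(lambda: state["t"] < 5, timeout=300,
                        poll_interval=1, clock=fake_clock, sleep=fake_sleep)
print(elapsed)  # 5.0
```

Comparing the elapsed time with the expected run time (about 110 s for this test) would flag a run that ended after only ~5 s as suspicious even though the process did exit.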

Geoff Brown [:gbrown]

Assignee

Comment 3

12 years ago

I determined that sutAgent was in fact waiting for process completion -- the process was ending abruptly because it was being killed. The kill was not originating from bcontroller/ffprocess/devicemanager -- they were all waiting for dm.fireProcess to complete.

I noted that the kill was occurring while addTonsOfUrls was executing; I added sleeps before addTonsOfUrls, and the kill continued to occur during addTonsOfUrls. I suspected - not for the first time! - an OOM kill, but could not verify it: onLowMemory was not called, but that's not conclusive one way or the other.

I tried reducing the test's memory use by simply reducing the BATCH_SIZE in testBrowserProviderPerf.java.in. Immediately, try tests stopped showing the sudden kill on test startup. However, those same tests started failing intermittently with a SIGSEGV on test startup. We are currently seeing SIGSEGVs on startup in other tests and tracking that in other bugs, but not with this frequency.

I then wondered if the SIGSEGVs were caused by an unrelated problem, so I updated my build and pushed to try again: now I don't see the SIGSEGVs or the kills on try -- rpr is passing consistently. But now rpr is passing on tbpl also, without my change. Frustrating!!
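To illustrate why a smaller BATCH_SIZE lowers peak memory, here is a minimal sketch of batched insertion. The real test is Java and its actual batch sizes are not given in this bug; `batches`, `insert_all`, and the `bulk_insert` callable are hypothetical stand-ins for the addTonsOfUrls -> bulkInsert path.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def insert_all(items, batch_size, bulk_insert):
    """Insert items in batches; return the total count reported inserted.

    Only one batch is handed to bulk_insert at a time, so a smaller
    batch_size means fewer rows are held per call, at the cost of
    more calls.
    """
    total = 0
    for chunk in batches(items, batch_size):
        total += bulk_insert(chunk)
    return total


# Usage: record the size of each batch passed to a fake bulk_insert.
sizes = []


def fake_bulk_insert(chunk):
    sizes.append(len(chunk))
    return len(chunk)


print(insert_all(list(range(10)), 4, fake_bulk_insert))  # 10
print(sizes)  # [4, 4, 2]
```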

Geoff Brown [:gbrown]

Assignee

Comment 4

12 years ago

The last 20 rpr runs on tbpl/m-i have been error-free. I guess there's nothing more to do here.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WORKSFORME

Ed Morley [:emorley]

Updated

12 years ago

Blocks: 816584

Phil Ringnalda (:philor)

Updated

12 years ago

Blocks: 829371

Nobody; OK to take it and work on it

Updated

4 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Categories

(Firefox for Android Graveyard :: General, defect)

Tracking

(Not tracked)

People

(Reporter: gbrown, Assigned: gbrown)

Security

(public)
