Don't expire idle connections in Sync's connection pool

VERIFIED FIXED in Firefox 14

Status

()

Firefox for Android
Android Sync
P1
normal
VERIFIED FIXED
6 years ago
4 months ago

People

(Reporter: rnewman, Assigned: rnewman)

Tracking

unspecified
mozilla15
ARM
Android
Points:
---

Firefox Tracking Flags

(firefox14 fixed, blocking-fennec1.0 +)

Details

Attachments

(1 attachment)

(Assignee)

Description

6 years ago
Right now we shut down our connection pool at the end of a sync, and expire dead connections every three seconds.

I have a minor concern about us shutting down a socket that isn't finished flushing data, or otherwise causing unexpected issues.

We found bugs by having the pool shutdown in place, but I think it's less risky to leave the pool alive for now.
Data point:

D class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(8591) Connection closed
D class ch.boye.httpclientandroidlib.impl.client.DefaultHttpClient(8591) Closing the connection.
D class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(8591) Connection closed
D class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(8591) Connection shut down
W RecordUploadRunnable(8591)  Got request error: ch.boye.httpclientandroidlib.NoHttpResponseException: The target server failed to respond
W RecordUploadRunnable(8591)  ch.boye.httpclientandroidlib.NoHttpResponseException: The target server failed to respond
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:281)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:219)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:645)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:818)
W RecordUploadRunnable(8591)  	at ch.boye.httpclientandroidlib.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:752)
W RecordUploadRunnable(8591)  	at org.mozilla.gecko.sync.net.BaseResource.execute(BaseResource.java:247)
W RecordUploadRunnable(8591)  	at org.mozilla.gecko.sync.net.BaseResource.go(BaseResource.java:275)
W RecordUploadRunnable(8591)  	at org.mozilla.gecko.sync.net.BaseResource.post(BaseResource.java:295)
W RecordUploadRunnable(8591)  	at org.mozilla.gecko.sync.net.SyncStorageRequest.post(SyncStorageRequest.java:170)
W RecordUploadRunnable(8591)  	at org.mozilla.gecko.sync.repositories.Server11RepositorySession$RecordUploadRunnable.run(Server11RepositorySession.java:523)
W RecordUploadRunnable(8591)  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
W RecordUploadRunnable(8591)  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:569)
W RecordUploadRunnable(8591)  	at java.lang.Thread.run(Thread.java:856)
W SynchronizerSession(8591)   Second RecordsChannel onFlowStoreFailed. Ignoring store error.
W SynchronizerSession(8591)   ch.boye.httpclientandroidlib.NoHttpResponseException: The target server failed to respond
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:281)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:219)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:645)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:818)
W SynchronizerSession(8591)   	at ch.boye.httpclientandroidlib.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:752)
W SynchronizerSession(8591)   	at org.mozilla.gecko.sync.net.BaseResource.execute(BaseResource.java:247)
W SynchronizerSession(8591)   	at org.mozilla.gecko.sync.net.BaseResource.go(BaseResource.java:275)
W SynchronizerSession(8591)   	at org.mozilla.gecko.sync.net.BaseResource.post(BaseResource.java:295)
W SynchronizerSession(8591)   	at org.mozilla.gecko.sync.net.SyncStorageRequest.post(SyncStorageRequest.java:170)
W SynchronizerSession(8591)   	at org.mozilla.gecko.sync.repositories.Server11RepositorySession$RecordUploadRunnable.run(Server11RepositorySession.java:523)
W SynchronizerSession(8591)   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
W SynchronizerSession(8591)   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:569)
W SynchronizerSession(8591)   	at java.lang.Thread.run(Thread.java:856)
I RecordsChannel(8591)        onStoreCompleted. Notifying delegate of onFlowCompleted. Fetch end is 1334874054473, store end is 0
(Assignee)

Comment 2

6 years ago
I have an intuition about what might be going on here. Select lines from a related log:

04-29 21:22:49.907: INFO/RecordsChannel(19842): Beginning source.
04-29 21:23:16.142: INFO/RecordsChannel(19842): onFetchCompleted. Stopping consumer once stores are done.
04-29 21:23:16.833: INFO/SynchronizerSession(19842): First RecordsChannel onFlowCompleted. Fetch end is 1335759754040. Store end is 1335759796850. Starting next.
04-29 21:23:16.843: INFO/BrowserRepoSession(19842): Running fetchSince(0).
04-29 21:23:22.358: DEBUG/class ch.boye.httpclientandroidlib.impl.conn.tsccm.ThreadSafeClientConnManager(19842): Closing expired connections
04-29 21:23:22.358: DEBUG/class ch.boye.httpclientandroidlib.impl.conn.tsccm.ConnPoolByRoute(19842): Closing expired connections
04-29 21:23:22.388: DEBUG/dalvikvm(19842): GC_FOR_MALLOC freed 242K, 42% free 4397K/7495K, external 174K/521K, paused 24ms
04-29 21:23:22.388: INFO/RecordsChannel(19842): onFetchCompleted. Stopping consumer once stores are done.
04-29 21:23:22.388: INFO/RecordsChannel(19842): Fetch timestamp is 1335759796853
04-29 21:23:22.388: INFO/CRecordConsumer(19842): Consumer is done. Processed 0 records.
04-29 21:23:22.428: DEBUG/dalvikvm(19812): GC_EXPLICIT freed 109K, 47% free 2991K/5639K, external 0K/0K, paused 170ms
04-29 21:23:22.458: DEBUG/class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(19842): Connection closed
04-29 21:23:22.458: DEBUG/class ch.boye.httpclientandroidlib.impl.client.DefaultHttpClient(19842): Closing the connection.
04-29 21:23:22.458: DEBUG/class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(19842): Connection closed
04-29 21:23:22.458: DEBUG/class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(19842): Connection shut down
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842): Got request error: javax.net.ssl.SSLException: Write error: ssl=0x54e3b0: I/O error during system call, Broken pipe
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842): javax.net.ssl.SSLException: Write error: ssl=0x54e3b0: I/O error during system call, Broken pipe
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842):     at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_write(Native Method)
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842):     at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLOutputStream.write(OpenSSLSocketImpl.java:839)
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842):     at ch.boye.httpclientandroidlib.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:153)
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842):     at ch.boye.httpclientandroidlib.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:114)
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842):     at ch.boye.httpclientandroidlib.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:120)
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842):     at org.mozilla.gecko.sync.repositories.Server11RepositorySession$RecordUploadRunnable$ByteArraysContentProducer.writeTo(Server11RepositorySession.java:477)
04-29 21:23:22.458: WARN/RecordUploadRunnable(19842):     at ch.boye.httpclientandroidlib.entity.EntityTemplate.writeTo(EntityTemplate.java:68)
...

Note the timestamps here. We expire connections every six seconds, and we treat 30 seconds as the expiration threshold for idle.

HttpClient decides a connection is idle when it hasn't been checked out and returned to the pool within that time.

Timeline:

 0    21:22:49.907  Begin download.
27    21:23:16.833  Download finished.
27    21:23:16.843  Begin upload.
33    21:23:22.358  Begin closing connections (timed).
33    21:23:22.388  All fetched uploads have been queued, so now we're just writing.
33    21:23:22.458  Log write error.

so it looks like, 30 seconds after we started the download, the connection is treated as idle and closed, even though we're uploading on that pipe.

This *should not* happen if we return the connection to the pool. I'm pretty sure that we do. In fact, it *should not* happen, because the connection is active, not in the pool!

Given that:

1. Something is happening here that's not desirable, and
2. We close outstanding connections *anyway* when a sync finishes, and
3. We should be constantly using the same connection for the entire duration of a sync, because we only talk to one server

I suggest that we don't expire idle connections at any time during a sync.
Summary: Disable connection pool shutdown → Don't expire idle connections in Sync's connection pool
(Assignee)

Comment 4

6 years ago
Tony ran this through its paces; it no longer fails in the manner that it did before.

This is a compact and -- in my opinion -- safe change that will avoid failed uploads on some or all devices if a sync takes more than 30 seconds. I've been keeping an eye on this and mulling it over, and I'm convinced that this is the right one-line fix.

(That old story about the chalk mark!)

Connections are still closed, just not when they shouldn't be! Nominating as a release blocker, at least. Patch is ready to go.
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
blocking-fennec1.0: --- → ?
(Assignee)

Comment 5

6 years ago
Created attachment 620965 [details] [diff] [review]
Proposed patch. v1
Attachment #620965 - Flags: review?(liuche)
(Assignee)

Updated

6 years ago
Priority: -- → P1
(Assignee)

Comment 6

6 years ago
Comment on attachment 620965 [details] [diff] [review]
Proposed patch. v1

Chenxia reviewed offline.
Attachment #620965 - Flags: review?(liuche) → review+
blocking-fennec1.0: ? → +
(Assignee)

Updated

6 years ago
Target Milestone: --- → mozilla15
(Assignee)

Comment 8

6 years ago
Comment on attachment 620965 [details] [diff] [review]
Proposed patch. v1

Nomming for after this has a day on m-c. With any luck it'll get in before beta.
Attachment #620965 - Flags: approval-mozilla-aurora?

Comment 9

6 years ago
https://hg.mozilla.org/mozilla-central/rev/524f31fc1679
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Attachment #620965 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 10

6 years ago
This has been tested and had a little bake time.

https://hg.mozilla.org/releases/mozilla-aurora/rev/b37b69ed6b86

Only QA steps are "keep an eye out for connection failures immediately after a connection pool log message", and "make sure connection pool is closed at the end of a sync". Those are general watchfulness, so not setting any QA flags.
status-firefox14: --- → fixed
verified with latest Nightly
Status: RESOLVED → VERIFIED
Component: Android Sync → Android Sync
Product: Mozilla Services → Android Background Services

Updated

4 months ago
Product: Android Background Services → Firefox for Android
You need to log in before you can comment on or make changes to this bug.