Open
Bug 1246840
Opened 10 years ago
Updated 3 years ago
measure NS_ERROR_NET_PARTIAL_TRANSFER and other potentially retryable errors while syncing
Categories
(Firefox :: Sync, defect, P4)
Firefox
Sync
Tracking
()
NEW
| Tracking | Status | |
|---|---|---|
| firefox47 | --- | affected |
People
(Reporter: markh, Unassigned)
References
Details
(Whiteboard: [sync-data-integrity])
Attachments
(1 obsolete file)
On my dev profile I just saw an error doing a GET due to NS_ERROR_NET_PARTIAL_TRANSFER. Best I can tell, this apparently means the number of bytes received != Content-Length. I've also seen this in some user-submitted logs.
It seems we should attempt to recover from what appears to be transient network related errors.
Log entry:
1454982237312 Sync.ErrorHandler DEBUG bookmarks failed: Error: NS_ERROR_NET_PARTIAL_TRANSFER (resource://services-sync/resource.js:405:19) JS Stack trace: waitForSyncCallback@async.js:98:7
Res__request@resource.js:398:14
Res_get@resource.js:425:12
SyncEngine.prototype._processIncoming@engines.js:1129:18
BookmarksEngine.prototype._processIncoming@bookmarks.js:463:7
SyncEngine.prototype._sync@engines.js:1551:7
wrappedSync@bookmarks.js:229:11
_sync@bookmarks.js:226:5
WrappedNotify@util.js:146:21
Engine.prototype.sync@engines.js:670:5
_syncEngine@enginesync.js:213:7
sync@enginesync.js:163:15
onNotify@service.js:1279:7
WrappedNotify@util.js:146:21
WrappedLock@util.js:101:16
_lockedSync@service.js:1269:12
sync/<@service.js:1261:14
WrappedCatch@util.js:75:16
sync@service.js:1249:5
Res_get@resource.js:425:12
SyncEngine.prototype._processIncoming@engines.js:1129:18
BookmarksEngine.prototype._processIncoming@bookmarks.js:463:7
SyncEngine.prototype._sync@engines.js:1551:7
wrappedSync@bookmarks.js:229:11
_sync@bookmarks.js:226:5
WrappedNotify@util.js:146:21
Engine.prototype.sync@engines.js:670:5
_syncEngine@enginesync.js:213:7
sync@enginesync.js:163:15
onNotify@service.js:1279:7
WrappedNotify@util.js:146:21
WrappedLock@util.js:101:16
_lockedSync@service.js:1269:12
sync/<@service.js:1261:14
WrappedCatch@util.js:75:16
sync@service.js:1249:5
Updated•10 years ago
|
Flags: firefox-backlog+
Updated•10 years ago
|
Assignee: nobody → tchiovoloni
Status: NEW → ASSIGNED
Comment 2•10 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/49555/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49555/
Attachment #8746729 -
Flags: review?(markh)
Comment 3•10 years ago
|
||
When you get around to reviewing this, a couple things occurred to me as worth mentioning that I'm not sure about.
- Number of times to retry (one now). The Response tests even seem to believe theres a `services.sync.network.numRetries` pref, but no reference to network.numRetries (or even numRetries) exists in the tree AFAICT, but it may make sense to use a pref here (this pref, even).
- Whether or not we should wait a short amount of time between retries (right now we don't, since we only retry once).
- Whether or not we should handle other kinds of network errors. There are a few that may be applicable. NS_ERROR_NET_INTERRUPT, NS_ERROR_NET_RESET, NS_ERROR_UNKNOWN_HOST, etc. (I've even heard of people retrying when HTTP codes come back >= 500, in the hope that they don't trigger the same server error, but I'm not sure this sounds sane, or makes sense -- esp. in this case)
- Whether or not we should retry other idempotent requests (PUT and DELETE are the only two that are applicable here I think, although likely not for this specific error, as the server is sending a 200, and we likely don't need the actual data it sends back for these).
| Reporter | ||
Comment 4•10 years ago
|
||
(In reply to Thom Chiovoloni [:tcsc] from comment #3)
> When you get around to reviewing this, a couple things occurred to me as
> worth mentioning that I'm not sure about.
Thanks! I'm getting a bit behind now, so I probably won't get back to this for a week or so.
| Reporter | ||
Comment 5•10 years ago
|
||
(In reply to Thom Chiovoloni [:tcsc] from comment #3)
> When you get around to reviewing this, a couple things occurred to me as
> worth mentioning that I'm not sure about.
Yeah, these are all excellent points, and make me wonder if we should be doing this at all. The error isn't user-visible, and Sync will already retry quicker on error than it would normally - and as you mention, an immediate retry sounds like it would often fail. The list of possible transient errors is large, and they could still happen on POSTs, for which we can't sanely do an automatic retry.
I guess we could prioritize the measurement of how often this happens (which is what comment 1 says is already was), but I'm not sure even that should be a high priority - the fact that I saw it once and saw it in other a few other logs doesn't mean much. I may have seen it from logs in older versions of Sync where users were notified of the errors, or if it was in later Syncs I suspect they wouldn't have been relevant to the bug being reported (as the user wouldn't have noticed the single error and the retry we made soon after would have worked - or if it didn't, this retry strategy wouldn't have helped anyway).
So I suggest we don't bother with this patch, change the bug to make it clearer the task is to measure how often it happens and bump it down to P4.
Thom, what do you think? (And if you agree, apologies for wasting your time here)
Comment 6•10 years ago
|
||
Yes, although the frequency alone might not be worth doing, it would be more useful to know which kinds of network errors are occurring as well.
But other than that, yes, I agree that in its current state it's fairly unlikely to be particularly useful, and moving it to a P4 measurement bug sounds good to me.
| Reporter | ||
Comment 7•10 years ago
|
||
Thanks Thom!
Assignee: tchiovoloni → nobody
Status: ASSIGNED → NEW
Priority: P2 → P4
Summary: NS_ERROR_NET_PARTIAL_TRANSFER error syncing → measure NS_ERROR_NET_PARTIAL_TRANSFER and other potentially retryable errors while syncing
| Reporter | ||
Updated•10 years ago
|
Attachment #8746729 -
Attachment is obsolete: true
Attachment #8746729 -
Flags: review?(markh)
Updated•9 years ago
|
Whiteboard: [sync-data-integrity]
| Reporter | ||
Comment 8•9 years ago
|
||
I can repro this every time in bug 1294270. Note also that it seems highly likely the sync ping can tell us how often this error is hit.
See Also: → 1294270
Updated•3 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•