(In reply to Mathieu Leplatre [:leplatrem] from comment #8)
If the re-scheduling feature seems valuable to you, perhaps let's file a follow-up for that?
We have the 24h timer, which will keep on retrying...
Sure, but it seems like if the sync failed within the last 24h, we should retry when we get network-changed events, so that we don't depend as much as we do now on the combination of:
- Firefox is running
- Firefox is connected to a network
- that network allows us to reach the server
(I realize that we'll retry on startup if Firefox wasn't running when the timer would have expired, but that still leaves the other two conditions. I'd also be curious whether, e.g. with captive portals, we are liable to end up in a situation where the user doesn't have network on startup but has it shortly afterwards, which would break a lot! This might also affect app updates!)
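To make the idea concrete, here is a rough sketch of "retry on network change, but only if the last attempt failed". It assumes nothing about the actual Remote Settings code: `RetryOnNetworkChange`, `onTimer`, `onNetworkChanged` and `minRetryIntervalMs` are all hypothetical names, and the minimum interval is an assumption added so that a flapping network doesn't cause a burst of retries.

```javascript
// Hypothetical sketch: none of these names are real Firefox or Remote Settings APIs.
class RetryOnNetworkChange {
  constructor(syncFn, { minRetryIntervalMs = 5 * 60 * 1000, now = () => Date.now() } = {}) {
    this.syncFn = syncFn; // async function that throws when the sync fails
    this.minRetryIntervalMs = minRetryIntervalMs; // assumed throttle between attempts
    this.now = now; // injectable clock, to keep the sketch testable
    this.lastAttemptFailed = false;
    this.lastAttemptTime = null;
  }

  // Called by the regular (e.g. 24h) timer: always attempts a sync.
  async onTimer() {
    return this._attempt();
  }

  // Called when the platform reports a network change.
  // Returns true only if a retry actually ran and succeeded.
  async onNetworkChanged() {
    if (!this.lastAttemptFailed) {
      return false; // nothing to recover from
    }
    if (this.now() - this.lastAttemptTime < this.minRetryIntervalMs) {
      return false; // attempted too recently, skip this event
    }
    return this._attempt();
  }

  async _attempt() {
    this.lastAttemptTime = this.now();
    try {
      await this.syncFn();
      this.lastAttemptFailed = false;
      return true;
    } catch (e) {
      this.lastAttemptFailed = true;
      return false;
    }
  }
}
```

With something like this, a failed timer-driven sync on a captive-portal network would be retried the next time the network changes (once the minimum interval has passed), rather than waiting for the next 24h tick.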
Could we try to find out how frequent this kind of issue is by adding some telemetry? (Perhaps we already have telemetry?)
We already have some specific Telemetry statuses reported in a number of situations (network, server, IndexedDB, etc.). (error distributions, sync statuses)
The sync status seems to indicate that about 2% of syncs fail. That seems worth trying to do something about, maybe. But if we just try more often, then the error rate might actually go up! Do we have telemetry for how out-of-date collections are / "time since last successful sync"? It would be interesting to know where the long tail of that graph starts, as it were... Like, are users often 2-3 days out of date, or maybe even a week or more?
I'm not sure what you mean here by "completely broken" :-) Can you elaborate?
I was thinking something like profile folder not writable, quota exceeded, hard disk is full, etc.
Good question. I think we don't deal well with this generally, but we probably should. I don't know how best to do so though...
Also, if synchronization is consistently failing, should we take the initiative to reset critical preferences (see Bug 1769669)?
I think the approach in that bug (not allowing prefs to mess with it that much) is probably better. Resetting it without breaking automated tests might be tricky, and I'm always apprehensive about automatically overwriting users' settings...