Closed Bug 865552 Opened 11 years ago Closed 8 years ago

"Sync encountered an error: unknown error" toolbar isn't useful

Categories

(Firefox :: Sync, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 1180587

People

(Reporter: jaws, Unassigned)

Details

Occasionally, and recently more frequently, I've been seeing the "Sync encountered an error while syncing: Unknown error. Sync will automatically retry this action. [Sync now]" toolbar at the bottom of my Firefox window (Nightly and Aurora).

This toolbar doesn't provide any actionable information to users with the exception of the Sync Now button, but even that button is an implementation detail that the user shouldn't be concerned about.

When Sync encounters an error, Sync should just retry itself. It doesn't make much sense to tell the user that they need to click a button to get Sync to retry.

The main benefit of showing this toolbar would be for users who are wondering why their information is not syncing immediately to their other devices. Do we know how serious this issue actually is? Are we "shouting from the mountain tops" when the user wouldn't even notice this blip?

It would be useful to know, for the cases where Sync stops working, whether it stops for a matter of minutes, days, or weeks. If over 80% of the cases last only minutes, then we should delay showing this notification until many hours (or days!) have passed.

Lastly, I don't think this is the best way that we can notify the user of this issue. FHR or Troubleshooting Information would be more opportune places to put this information.
Sync *usually* doesn't show this bar unless Sync has been failing for seven straight days. It's essentially a clue to start searching/looking on SUMO/hitting IRC.
Wow, 7 days! That's a long time. I would expect that it would be far more emphatic after a disaster of that magnitude. It makes it sound like a temporary network interruption or something. Also, it seems that there's no feedback whether "Sync Now" actually worked, as the bar disappears (for another week?!).

"Unknown error" is particularly helpful when googling for a solution.

Perhaps a tweak to the text and a link to SUMO would be an easyish stopgap?
(In reply to Richard Newman [:rnewman] from comment #1)
> Sync *usually* doesn't show this bar unless Sync has been failing for seven
> straight days. It's essentially a clue to start searching/looking on
> SUMO/hitting IRC.

Huh, so if I've seen this three times in the past three weeks, then Sync hasn't been working for three weeks for me? What can the user even do in this scenario?

I double-checked and this isn't true. If Sync hasn't been able to connect in 7 days then it shows the "error.sync.prolonged_failure" text which is:
> "Sync has not been able to complete during the last %1$S days. Please check your network settings."

So this bug is happening because the Sync server went down for maintenance and Sync could not connect. See the code path that was followed here: http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser-syncui.js#342

We shouldn't jump to an infobar immediately when a problem like this can go away soon (a matter of minutes or hours).

Richard, would you like to mentor this bug?
Flags: needinfo?(rnewman)
(In reply to Jared Wein [:jaws] from comment #3)
> So this bug is happening because the Sync server went down for maintenance
> and Sync could not connect.

I say this because last night, and today, https://services.mozilla.com/status/ is showing:
> The Firefox Sync service is undergoing maintenance.
(In reply to James May [:fowl] from comment #2)
> Wow, 7 days! That's a long time. I would expect that it would be far more
> emphatic after a disaster of that magnitude. It makes it sound like a
> temporary network interruption or something.

That's the usual rule, yes.

For "major" problems, or for a user-initiated sync, the bar will pop up immediately.

The purpose of the seven-day wait is to stop transient network errors from constantly bothering the user. Sync is supposed to be invisible and eventually consistent, and it used to constantly throw useless errors like this when you opened your laptop on a captive portal, etc.


> Also, it seems that there's no
> feedback whether "Sync Now" actually worked, as the bar disappears (for
> another week?!).

If it disappears and doesn't come back, it worked.


> Perhaps a tweak to the text and a link to SUMO would be a easyish stopgap?

Could be, but the reason this UI sucks is the same reason we aren't investing in it -- the product is sunsetting, so if it isn't a critical issue, it's a WONTFIX.


(In reply to Jared Wein [:jaws] from comment #3)

> Huh, so if I've seen this three times in the past three weeks, then Sync
> hasn't been working for three weeks for me?

Most likely you've been hitting unusual server errors -- the latest batch of server failures has been resulting in HTTP 400 responses, which throw up an error bar immediately.


> What can the user even do in
> this scenario?

Google that error phrase, find out how to report a bug/find SUMO/see their Sync error logs. (Our IRC activity suggests that this is working.)

The reason why it's a pretty vague message is because if we knew what the problem was, we'd automatically fix it. Also because the legacy Sync codebase really sucks.


> So this bug is happening because the Sync server went down for maintenance
> and Sync could not connect.

Actually the exact opposite: this is happening because the Sync server started failing and was *not* responding that it was in maintenance mode (503 + Retry-After).


> We shouldn't jump to an infobar immediately when a problem like this can go
> away soon (a matter of minutes or hours).

*All* server problems *should* result in a 503, which goes down the prolonged failure path. That they don't is a server problem.
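
For illustration, here is a minimal sketch of how a client might read the "503 + Retry-After" maintenance signal described above. It is not taken from the Sync codebase: the function name `backoffFromResponse` and its parameters are hypothetical. The only thing assumed from the HTTP standard is that Retry-After carries either a delay in seconds or an HTTP-date.

```javascript
// Hypothetical helper, NOT the actual Sync client code.
// Returns how long to back off in milliseconds, or null when the response
// is not a well-formed "maintenance" response (503 with Retry-After).
function backoffFromResponse(status, retryAfter, nowMs) {
  if (status !== 503 || retryAfter == null) {
    return null;
  }
  const value = String(retryAfter).trim();
  // Form 1: a delay in whole seconds, e.g. "120".
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }
  // Form 2: an HTTP-date, e.g. "Wed, 01 May 2013 12:00:00 GMT".
  const when = Date.parse(value);
  if (Number.isNaN(when)) {
    return null;
  }
  // A date in the past means "retry immediately".
  return Math.max(0, when - nowMs);
}
```

Under this sketch, a spurious 400 returns null, so nothing tells the client that the failure is a temporary server-side condition.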

What you're seeing here is the decrepit Sync infrastructure buckling under load as we wait for a replacement product to reach the market. These are unpredictable failures that are outside the protocol.


> Richard, would you like to mentor this bug?

There isn't a client bug here.
Flags: needinfo?(rnewman)
(In reply to Jared Wein [:jaws] from comment #4)

> I say this because last night, and today,
> https://services.mozilla.com/status/ is showing:
> > The Firefox Sync service is undergoing maintenance.

That message is there because failures are occurring, and the services ops folks are working hard to fix them :D
Well, it is a client bug in that it shows up in Firefox and hurts people's confidence in Firefox. The fact that Sync is sunsetting doesn't play a role in stopping an issue like this one from affecting users today and until Sync is replaced.

It should be pretty trivial to remove this case of the Sync toolbar appearing.
(In reply to Jared Wein [:jaws] from comment #7)
> Well, it is a client bug in that it shows up in Firefox and hurts people's
> confidence in Firefox.

But so do the rest of Sync's collection of flaws, and we're not fixing those, either.

The reason I call this a server bug is "someone can 100% fix this without me doing anything". Stop the server returning spurious 400s, et voila.

> The fact that Sync is sunsetting doesn't play a role
> in stopping an issue like this one from affecting users today and until Sync
> is replaced.

At the risk of being a curmudgeon, there are zero engineers assigned to Sync client work. There is a very high bar to get me to drop FHR on Android and work on desktop Sync UX changes that transiently affect a very small number of users.

I understand that it's janky, but it's *relatively* unimportant.

> It should be pretty trivial to remove this case of the Sync toolbar
> appearing.

If you define "this case" as "every case", sure. There are basically three paths:

* It's a 503, so we'll wait. Not an error.
* It's a network error, so we'll only complain after a week.
* It's some unrecoverable serious failure, so we'll complain now.

That server issues are being reported in the third class, not the first, is something that svc ops is working on.
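
The three paths above can be sketched roughly as follows. This is an illustrative model only, not the logic in policy.js or browser-syncui.js: the `failure` object shape, the function name `classifyFailure`, and the returned labels are all assumptions. The seven-day threshold and the rule that a user-initiated sync reports immediately come from earlier comments in this bug. Checking for a 503 before anything else is also an assumption.

```javascript
// Illustrative model of the three paths; names and shapes are hypothetical.
const PROLONGED_FAILURE_MS = 7 * 24 * 60 * 60 * 1000; // seven days

// failure: { kind: "http" | "network" | "local", status?: number, userInitiated?: boolean }
function classifyFailure(failure, msSinceLastSuccess) {
  // Path 1: the server says it is in maintenance. Wait; not an error.
  if (failure.kind === "http" && failure.status === 503) {
    return "wait";
  }
  // Path 2: a background network error. Only complain after a week.
  if (failure.kind === "network" && !failure.userInitiated) {
    return msSinceLastSuccess >= PROLONGED_FAILURE_MS ? "show-prolonged" : "wait";
  }
  // Path 3: anything else is treated as serious. Complain now.
  return "show-now";
}
```

In this model a spurious HTTP 400 falls through to the third path, which matches the behavior reported in this bug.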

I could be persuaded that the third class should be merged with the second, but I would need to be persuaded that it's worth the effort, given that the problem should go away when ops work is complete.

Put differently: when the current outage is fixed, you'll see this bar as much as you did over the previous year, which should be approximately never.
(In reply to Richard Newman [:rnewman] from comment #8)
> At the risk of being a curmudgeon, there are zero engineers assigned to Sync
> client work.

Hence my offer for you to mentor the bug :) I don't think we can just assume that the third option will never happen again in the future.

To fix this bug, at http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser-syncui.js#340 instead of setting the string for the toolbar, we can just return early.

We can leave the string intact since it is also used at http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser-syncui.js#170 for login-related issues, which appear to be happening less frequently.
Whiteboard: [good first bug][mentor=jaws][lang=js]
(In reply to Jared Wein [:jaws] from comment #9)
> (In reply to Richard Newman [:rnewman] from comment #8)
> > At the risk of being a curmudgeon, there are zero engineers assigned to Sync
> > client work.
> 
> Hence my offer for you to mentor the bug :) I don't think we can just assume
> that the third option will never happen again in the future.

"The future" should be about 5 months.

> To fix this bug, at
> http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser-syncui.js#340
> instead of setting the string for the toolbar, we can just return early.

Alas no.

That will entirely remove all severe error reporting from Sync, including for manual syncs, and for genuine failures. That's in some respects worse than this outage failure mode. You'd also break quota reporting, client version reporting, outage reporting, and whatever else is lurking in the remaining 60 lines of that function.

I'd approve the expansion in policy.js from transient network errors into broader categories, but that needs to be done carefully so that only remote errors are caught.
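
As a rough sketch of what "only remote errors are caught" could mean, the predicate below separates failures that might be deferred like transient network errors from those that should still be reported immediately. Everything here is hypothetical: the function name, the `failure` shape, and in particular the choice of which HTTP statuses count as remote server trouble (400 and 5xx) are assumptions for illustration, not the contents of policy.js.

```javascript
// Hypothetical predicate; the status list is an assumption, not Sync's policy.
// failure: { kind: "http" | "network" | "local", status?: number, userInitiated?: boolean }
function isDeferrableRemoteError(failure) {
  // Manual syncs always report immediately, so never defer them.
  if (failure.userInitiated) {
    return false;
  }
  // Transient network errors are already deferred today.
  if (failure.kind === "network") {
    return true;
  }
  // Proposed expansion: unexpected server-side responses, such as the
  // spurious 400s from this outage, or any 5xx.
  if (failure.kind === "http") {
    return failure.status === 400 || failure.status >= 500;
  }
  // Local failures (and anything unrecognized) stay immediate.
  return false;
}
```

Other responses, such as a 401 from a login problem, and all client-side failures still report immediately in this sketch, which is the property a blanket early return in the UI code would lose.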

This is not simply a UI patch. Sync isn't smart enough for that!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Whiteboard: [good first bug][mentor=jaws][lang=js]
This is still present, and hasn't been fixed yet by "the future" sync. Could we just do a text comparison on the error text and not show this warning if it matches "Unknown error"?
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
This has renewed importance as we are now going to be advertising Sync much more prominently and have made it easier for users to create accounts.
Summary: "Sync encountered an error" toolbar isn't useful → "Sync encountered an error: unknown error" toolbar isn't useful
I saw this error in a support forum question [https://support.mozilla.org/en-US/questions/997250]
Is there a chance that this is still happening in the new sync?
(In reply to rmcguigan from comment #14)

> Is there a chance that this is still happening in the new sync?

Yes. New Sync is Old Sync with New Auth. Otherwise identical.
We removed all of the error UI in bug 1180587
Status: REOPENED → RESOLVED
Closed: 11 years ago → 8 years ago
Resolution: --- → DUPLICATE
Component: Firefox Sync: UI → Sync
Product: Cloud Services → Firefox