Closed Bug 1391488 Opened 2 years ago Closed 2 years ago

Uptick of failures in Firefox 55: client engine fails with NS_ERROR_XPC_BAD_CONVERT_NATIVE

Categories

(Firefox :: Sync, defect, major)

defect
Not set
major

Tracking

()

VERIFIED FIXED
Firefox 57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- wontfix
firefox56 - fixed
firefox57 --- fixed

People

(Reporter: markh, Assigned: markh)

References

Details

(Keywords: regression)

Attachments

(1 file)

Since around August 8, there's a noticeable spike in the clients engine failing with NS_ERROR_XPC_BAD_CONVERT_NATIVE (0x8057000A). This appears to correspond with 55 hitting release. It's gone from ~4% of client engine failures to ~18%.

This error is typically raised by xpconnect when (IIUC) trying to convert a c++ result into a jsobject. A quick look at the sync bugs that landed in 55 didn't offer many clues.

via https://nbviewer.jupyter.org/gist/mhammond/01907397057deb00edec1d6616c0a17c/Success-Failure-Error%20rates%20per%20engine.ipynb
Looking at about:sync-log, I got the same error in Nightly (57.0a1) on 2017-08-10. It seems to be linked to GreaseMonkey, an extension I used until recently.

I don't have GreaseMonkey installed anymore since legacy extensions were disabled by default in Nightly, and I haven't been getting that error since. There might be other extensions that cause the same error, of course.

I haven't been able to reproduce it with GreaseMonkey in 55, only in 57.

Here's a stack trace.

> 1502431396384	Sync.EngineManager	ERROR	Could not initialize engine : [Exception... "Could not convert JavaScript argument arg 0 [nsISupports.QueryInterface]"  nsresult: "0x80570009 (NS_ERROR_XPC_BAD_CONVERT_JS)"  location: "JS frame :: chrome://greasemonkey-modules/content/prefmanager.js :: GM_PrefManager.prototype.watch :: line 138"  data: no] Stack trace: GM_PrefManager.prototype.watch()@prefmanager.js:138 < ScriptEngine()@sync.js:241 < register()@resource://services-sync/engines.js:607

The error is thrown from here in GreaseMonkey:
https://github.com/greasemonkey/greasemonkey/blob/3.11/modules/prefmanager.js#L138

Seems to be related to these two bugs:
https://github.com/greasemonkey/greasemonkey/pull/2541
https://bugzilla.mozilla.org/show_bug.cgi?id=1326520
Just noticed that NS_ERROR_XPC_BAD_CONVERT_JS, not NS_ERROR_XPC_BAD_CONVERT_NATIVE. Huh...
Flags: needinfo?(markh)
Playing more with telemetry, the locales involved are:

(u'sv-SE', 4),
 (u'ja-JP', 9),
 (u'uk-UA', 15),
 (u'ar-KW', 1),
 (u'zh-TW', 1),
 (u'tr-TR', 4),
 (u'zh-CN', 65),
 (u'hu-HU', 4),
 (u'et-EE', 4),
 (u'nl-NL', 1),
 (u'he-IL', 2),
 (u'el-GR', 1),
 (u'ar-EG', 5),
 (u'de-DE', 31),
 (u'es-SV', 5),
 (u'ru-RU', 256),
 (u'ko-KR', 15),
 (u'es-MX', 1),
 (u'fr-FR', 16),
 (u'bg-BG', 12),
 (u'en-US', 2),
 (u'es-ES', 17),
 (u'ar-SA', 2),
 (u'pt-BR', 9),
 (u'pt-PT', 1)]

(note this is a 10% sample.)

So my money is on bug 1369285 and an issue with encodings - I propose we make a simple patch to catch errors from |Cc["@mozilla.org/network/dns-service;1"].getService(Ci.nsIDNSService).myHostName| and use, say, a "?", and see if the error reduces. (Note that due to localization we should avoid an English string such as "unknown")
Flags: needinfo?(markh)
Keywords: regression
Duplicate of this bug: 1394828
[Tracking Requested - why for this release]:

Telemetry is showing an uptick in Sync failures starting in 55. Given bug 1394828 we feel confident it is a regression from 1369285 and a simple patch should fix it.
Assignee: nobody → markh
Blocks: 1369285
Status: NEW → ASSIGNED
Comment on attachment 8903470 [details]
Bug 1391488 - catch and ignore errors fetching the hostname from the DNS service.

https://reviewboard.mozilla.org/r/175306/#review180686

Sorry, forgot to r+! Just a question, which you're welcome to ignore.

::: services/sync/modules/util.js:590
(Diff revision 1)
> +      // hostname of the system, usually assigned by the user or admin
> +      hostname = Cc["@mozilla.org/network/dns-service;1"].getService(Ci.nsIDNSService).myHostName;
> +    } catch (ex) {
> +      Cu.reportError(ex);
> +    }
>      let system =

Do you think it's worth wrapping this entire declaration in a `try...catch`, in case the `system-info` call fails, or is just the hostname sufficient?
Attachment #8903470 - Flags: review?(kit) → review+
Comment on attachment 8903470 [details]
Bug 1391488 - catch and ignore errors fetching the hostname from the DNS service.

https://reviewboard.mozilla.org/r/175306/#review180688

::: services/sync/modules/util.js:590
(Diff revision 1)
> +      // hostname of the system, usually assigned by the user or admin
> +      hostname = Cc["@mozilla.org/network/dns-service;1"].getService(Ci.nsIDNSService).myHostName;
> +    } catch (ex) {
> +      Cu.reportError(ex);
> +    }
>      let system =

I think I'll leave the patch as it is, as we have no evidence the other blocks ever failed and nothing in them changed in 55. I'd also like a high degree of confidence it is the dns server at fault, and finally, it's not clear what string we should use if we wrapped the entire block.
Pushed by mhammond@skippinet.com.au:
https://hg.mozilla.org/integration/autoland/rev/3e029b5ea02f
catch and ignore errors fetching the hostname from the DNS service. r=kitcambridge
Comment on attachment 8903470 [details]
Bug 1391488 - catch and ignore errors fetching the hostname from the DNS service.

Approval Request Comment
[Feature/Bug causing the regression]: bug 
1369285 
[User impact if declined]: Sync failures, inability to send tabs, etc.
[Is this code covered by automated tests?]: Yes, although the specific conditions which cause this failure aren't known and aren't covered by tests - we believe it has something to do with non-english device names.
[Has the fix been verified in Nightly?]: The fix has been verified with a simulated error - we can't reproduce the actual error.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: Very low risk change - we are catching an exception which our telemetry implies is happening in the wild.
[String changes made/needed]: None
Attachment #8903470 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/3e029b5ea02f
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
Duplicate of this bug: 1395988
See Also: → 1396498
Comment on attachment 8903470 [details]
Bug 1391488 - catch and ignore errors fetching the hostname from the DNS service.

Fix a regression. Beta56+.
Attachment #8903470 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Duplicate of this bug: 1399829
Duplicate of this bug: 1391220
https://nbviewer.jupyter.org/gist/mhammond/649a6f6c5606cff0e0d9d8a4f0b9e9a1/Success-Failure-Error%20rates%20per%20engine%2C%20by%20version.ipynb tells me the clients engine is back to normal after this fix.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.