Closed Bug 1535877 Opened 3 years ago Closed 3 years ago

Crash with failed "@mozilla.org/network/effective-tld-service;1" instance

Categories

(Core :: Permission Manager, defect)

67 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 --- unaffected
firefox67 - wontfix
firefox68 --- fixed

People

(Reporter: Oriol, Assigned: ehsan.akhgari)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

Open the browser console and run

Cc["@mozilla.org/network/effective-tld-service;1"].createInstance(Ci.nsIArray);

Load some page like http://example.com/

Result: Firefox crashes. https://crash-stats.mozilla.org/report/index/edb046a6-f68a-430b-9847-c0f590190317

Expected: no crash, just an error.

Regression-window: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=024508ba77de&tochange=1502462b388b

Regression window from the description lists Ehsan's patches for Bug 1527505. Ehsan, can you take a look?

Flags: needinfo?(ehsan)

There is no real bug here. The JS code snippet in comment 0 to be typed in the browser console is just invalid, the effective TLD service is an XPCOM service so it shouldn't be instantiated with createInstance(). When we do that, this assertion doesn't trigger in release mode https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/dns/nsEffectiveTLDService.cpp#49 and we end up overwriting gService with the second this pointer. This newly created object will quickly get destroyed (when we realize nsIArray isn't a valid interface to query for here) and gService becomes nullptr https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/dns/nsEffectiveTLDService.cpp#58 which means any future dereference will be a nullptr dereference crash.

But the reason why this isn't a real bug is the code typed in the browser console is just some garbage JS code which puts the browser in an invalid state. My proposal for how to fix this is to convert this assertion https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/netwerk/dns/nsEffectiveTLDService.cpp#49 into a runtime failure, basically breaking the JS code in comment 0 a bit earlier before we attempt to QI for nsIArray.

Flags: needinfo?(ehsan)

I think the solution for bug 1535874 would also work here, not sure if this is what you had in mind.

(In reply to Oriol Brufau [:Oriol] from comment #3)

I think the solution for bug 1535874 would also work here, not sure if this is what you had in mind.

Yeah that would work, but it would require us to ensure that nothing in the service code depends on gService pointing to this, which is a useless invariant to preserve. I'm planning to do something a bit more aggressive here: make Cc["@mozilla.org/network/effective-tld-service;1"].createInstance(foo) fail when the service has already been instantiated. There is no reason why we would want the createInstance constructor to work for an XPCOM service (getService() is intended to be used for XPCOM services.)

(FWIW it seems like you're continually testing the creation of instances from XPCOM service which is how you're discovering these crash bugs? You might want to consider submitting a patch for adding in-tree tests that do the same thing so that developers see a test failures when they introduce these types of bugs to avoid having to fix these things after the fact. Since we no longer support XPCOM-based extensions, these bugs shouldn't be possible to trigger for real users unless we have code somewhere that creates instances of these services in JS like your code snippets do, so for the most part fixing these bugs after the fact provides very little value to our users...)

(In reply to :Ehsan Akhgari from comment #5)

You might want to consider submitting a patch for adding in-tree tests that do the same thing so that developers see a test failures when they introduce these types of bugs to avoid having to fix these things after the fact.

The problem is that some of these problems only arise after attempting to create some thousands of instances (e.g. bug 1408378). Then repeating this for each class can be too expensive, and as you say it doesn't provide much value to users. So I have doubts that a full test would be accepted, but maybe I could add a simpler one testing a single instance per class.

(In reply to Oriol Brufau [:Oriol] from comment #7)

(In reply to :Ehsan Akhgari from comment #5)

You might want to consider submitting a patch for adding in-tree tests that do the same thing so that developers see a test failures when they introduce these types of bugs to avoid having to fix these things after the fact.

The problem is that some of these problems only arise after attempting to create some thousands of instances (e.g. bug 1408378). Then repeating this for each class can be too expensive, and as you say it doesn't provide much value to users. So I have doubts that a full test would be accepted, but maybe I could add a simpler one testing a single instance per class.

Ah I see, that's fair. Perhaps bug 1539991 would end up with a more effective long term fix for at least a class of these bugs. :-)

(Setting firefox67? because I think this should be minused, and there is no need for the patch to be backported, as the bug should affect no real users.)

Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5c4845fa789b
Fail explicitly when creating a second instance of the effective TLD service; r=nika
Backout by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/47f89fe63044
Backed out changeset 5c4845fa789b for failing at /browser_child_resource.js on a CLOSED TREE.

Backed out changeset 5c4845fa789b (bug 1535877) for failing at /browser_child_resource.js on a CLOSED TREE.

Backout link: https://hg.mozilla.org/integration/autoland/rev/47f89fe63044fa201b020fd1243ae8ba6f1a6919

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=7cfcdf742d19ef55f45ef9150a15908247112f03&selectedJob=236790260

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=236790260&repo=autoland&lineNumber=6519

Log snippet:

01:42:43 INFO - GECKO(2390) | Exiting due to channel error.
01:42:43 INFO - GECKO(2390) | Exiting due to channel error.
01:42:43 INFO - GECKO(2390) | Exiting due to channel error.
01:42:43 INFO - TEST-INFO | Main app process: exit 1
01:42:43 INFO - Buffered messages logged at 01:42:42
01:42:43 INFO - Entering test bound
01:42:43 INFO - Buffered messages finished
01:42:43 ERROR - TEST-UNEXPECTED-FAIL | netwerk/test/browser/browser_child_resource.js | application terminated with exit code 1
01:42:43 INFO - runtests.py | Application ran for: 0:00:07.096066
01:42:43 INFO - zombiecheck | Reading PID log: /var/folders/1n/fgw1wrbj2b5647p5_8cp11k000000w/T/tmpJyxvRPpidlog
01:42:43 INFO - ==> process 2390 launched child process 2391
01:42:43 INFO - ==> process 2390 launched child process 2392
01:42:43 INFO - ==> process 2390 launched child process 2393

Flags: needinfo?(ehsan)
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5da02ff0be56
Fail explicitly when creating a second instance of the effective TLD service; r=nika
Flags: needinfo?(ehsan)
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Assignee: nobody → ehsan
You need to log in before you can comment on or make changes to this bug.