nsContentList cache can hand out unlinked lists

VERIFIED FIXED in Firefox 25

Status

()

VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: nospam, Assigned: bzbarsky)

Tracking

23 Branch
mozilla27
x86_64
Windows 7
Points:
---

Firefox Tracking Flags

(firefox25 verified, firefox26 verified, firefox27 verified, firefox-esr24 wontfix)

Details

(URL)

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0 (Beta/Release)
Build ID: 20130730113002

Steps to reproduce:

Open the website http://ferienhaus-regen.de/fotos and click through the images on the right side. After a while, going forth and back through the list, a click would not select an image anymore. I'm using my own CSS selector function which has passed all tests for the last years. But now, it doesn't find an "img" element under a specified "a" element anymore. When an image link doesn't work anymore, set a breakpoint on common.js in the function selectPhotoLink() and step through it all.


Actual results:

When debugging the function, it works. Even if it failed several times before.


Expected results:

It should always work just like in earlier versions of Firefox. There seems to be a strange bug in the JavaScript engine. Why should a debugger affect the behaviour of a single-threaded, sandboxed and anything application like a JavaScript app? If it starts doing such things, we might as well start using C for our websites.
Confirmed. After couple of minutes, clicking on images will have no effect.
FF 23.0.1, Win 7 x64.
Status: UNCONFIRMED → NEW
Component: Untriaged → ImageLib
Ever confirmed: true
Product: Firefox → Core

Comment 2

5 years ago
The fact that this works under the debugger suggests a JIT problem to me.
Assignee: nobody → general
Component: ImageLib → JavaScript Engine
(Reporter)

Comment 3

5 years ago
Can I disable the JIT or certain optimisations of it during runtime or at least without recompiling Firefox?
Yes. Go to about:config and look for javascript.options.ion.content, javascript.options.baselinejit.content and javascript.options.typeinference.

Try disabling the first. If that fixes the issue, it probably lies in IonMonkey. If not, disable the second and check again. Finally, try with the third one.
(Reporter)

Comment 5

5 years ago
None of these options help. Not a single one and not all together. After changing any value, I restarted Firefox, Ctrl+F5-reloaded the page and started clicking images. Usually, the first failure occurs after a few seconds after a dozen clicks or so, mostly already without scrolling down the image list. But more failures occured when continuing and also scrolling down to other images.

Still happens in current Firefox 24 stable.
Thank you for testing these out!

If you're not bored of this by now, can I ask you to download the current development build from http://nightly.mozilla.org/ and see if it works in that?
(Reporter)

Comment 7

5 years ago
I tested the latest nightly in a separate VM, this time on Windows 8 x64. First I verified that the issue exists there in the stable version, which is true. The nightly behaves just the same, the bug still exists there as well. Both stable and nightly were a clean install, there was no Firefox on that system before.
Created attachment 808351 [details]
Reduced testcase

Ok, after lots of experimenting, I reduced this to a very simple testcase.

The issue is that the HTMLCollection returned by document.getElementsByTagName('div') ends up being empty instead of containing one element, as expected.

On my machine, this happens - and shows an alert - at iteration 50.

Some notes:

- changing the iterations of the for loop changes in which of the outer iterations the error happens. It doesn't do so linearly, though. E.g., increasing to 20k makes it happen at iteration 60, whereas decreasing to 5k results in 138

- these results are absolutely stable. However. when changing the filter invocation to `[].filter.call(..)`, required iterations increase (to 74 for 10k) and become slightly unstable. As I had much more unstable results in earlier version of the testcase that created more garbage per run, my guess is that this is gc-related

- this *only* reproduces as long as any property is set on the nodes HTMLCollection. `nodes.prop = 'foo'` does the trick, but any other property does, too

- it also stops reproducing as soon as the Array.filter call is removed, even though its results aren't used, and it doesn't change the collection


So. This seems to be related to JITs and, very probably, to HTMLCollections.

efaust, that makes you the lucky winner of my "who gets to look into this with me" lottery. Congratulations!
Flags: needinfo?(efaustbmo)
(Assignee)

Updated

5 years ago
Attachment #808351 - Attachment mime type: text/plain → text/html
(Assignee)

Comment 9

5 years ago
Need to remove from cache on unlink, not dtor.
Assignee: general → bzbarsky
Component: JavaScript Engine → DOM
Flags: needinfo?(efaustbmo)
(Assignee)

Updated

5 years ago
Summary: Unspecific JavaScript failure → nsContentList cache can hand out unlinked lists
Whiteboard: [need review]
Created attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.
Attachment #808967 - Flags: review?(bugs)
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.


>-  virtual void RemoveFromCaches() {
>+  virtual void RemoveFromCaches() MOZ_OVERRIDE {
Nit, { should be on the next line
Attachment #808967 - Flags: review?(bugs) → review+
Done, and https://hg.mozilla.org/integration/mozilla-inbound/rev/cccb0edbca65
Whiteboard: [need review]
Target Milestone: --- → mozilla27
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
   I worry about possible security impact here.
User impact if declined: Some random breakage on sites and _possible_ security 
  issues.
Fix Landed on Version: 27.
Risk to taking this patch (and alternatives if risky): Very low risk, imo.
String or UUID changes made by this patch: None.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Probably async destruction from
   unlink.  Not sure how long ago that happened.
User impact if declined: Some random breakage on sites and _possible_ security
   issues.
Testing completed (on m-c, etc.): Compiles and fixes bug.
Risk to taking this patch (and alternatives if risky): Low risk.
String or IDL/UUID changes made by this patch: None.
Attachment #808967 - Flags: approval-mozilla-esr24?
Attachment #808967 - Flags: approval-mozilla-beta?
Attachment #808967 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/cccb0edbca65
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.

bz: without a critical or high security rating, this doesn't meet ESR uplift criteria - general concern isn't enough.  Is there any critical security bug you could tie this to in order to justify uplift?
Attachment #808967 - Flags: approval-mozilla-beta?
Attachment #808967 - Flags: approval-mozilla-beta+
Attachment #808967 - Flags: approval-mozilla-aurora?
Attachment #808967 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/d2cf008c007e
https://hg.mozilla.org/releases/mozilla-beta/rev/bb3972fd69e6
status-firefox25: --- → fixed
status-firefox26: --- → fixed
status-firefox27: --- → fixed
status-firefox-esr24: --- → affected
Keywords: verifyme

Comment 17

5 years ago
Verified as fixed on:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 (20130926170421)

This issue reproduces for me in at most 1-2 minutes of reproducing the steps on Firefox 23.0.1 and it didn't reproduce on Firefox 25 even after more than 5 minutes (comment 0 STR + reduced testcase).
status-firefox25: fixed → verified
QA Contact: ioana.budnar
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0

Verified as fixed on latest Aurora 26.0a2 (buildID: 20131003004003) and latest Nightly 27.0a1 (buildID: 20131003030203). Reproduced the issue on 23.0.1 after 2 minutes, did not reproduce on latest Aurora or latest Nightly after 1 hour.
Status: RESOLVED → VERIFIED
status-firefox26: fixed → verified
status-firefox27: fixed → verified
Keywords: verifyme
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.

a- on the esr patch given comment#15
Attachment #808967 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24-
status-firefox-esr24: affected → wontfix
You need to log in before you can comment on or make changes to this bug.