Closed Bug 904202 Opened 10 years ago Closed 10 years ago

nsContentList cache can hand out unlinked lists


(Core :: DOM: Core & HTML, defect)

23 Branch
Windows 7
Not set



Tracking Status
firefox25 --- verified
firefox26 --- verified
firefox27 --- verified
firefox-esr24 --- wontfix


(Reporter: _+bugzilla, Assigned: bzbarsky)





(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0 (Beta/Release)
Build ID: 20130730113002

Steps to reproduce:

Open the website and click through the images on the right side. After a while, going forth and back through the list, a click would not select an image anymore. I'm using my own CSS selector function which has passed all tests for the last years. But now, it doesn't find an "img" element under a specified "a" element anymore. When an image link doesn't work anymore, set a breakpoint on common.js in the function selectPhotoLink() and step through it all.

Actual results:

When debugging the function, it works. Even if it failed several times before.

Expected results:

It should always work just like in earlier versions of Firefox. There seems to be a strange bug in the JavaScript engine. Why should a debugger affect the behaviour of a single-threaded, sandboxed and anything application like a JavaScript app? If it starts doing such things, we might as well start using C for our websites.
Confirmed. After couple of minutes, clicking on images will have no effect.
FF 23.0.1, Win 7 x64.
Component: Untriaged → ImageLib
Ever confirmed: true
Product: Firefox → Core
The fact that this works under the debugger suggests a JIT problem to me.
Assignee: nobody → general
Component: ImageLib → JavaScript Engine
Can I disable the JIT or certain optimisations of it during runtime or at least without recompiling Firefox?
Yes. Go to about:config and look for javascript.options.ion.content, javascript.options.baselinejit.content and javascript.options.typeinference.

Try disabling the first. If that fixes the issue, it probably lies in IonMonkey. If not, disable the second and check again. Finally, try with the third one.
None of these options help. Not a single one and not all together. After changing any value, I restarted Firefox, Ctrl+F5-reloaded the page and started clicking images. Usually, the first failure occurs after a few seconds after a dozen clicks or so, mostly already without scrolling down the image list. But more failures occured when continuing and also scrolling down to other images.

Still happens in current Firefox 24 stable.
Thank you for testing these out!

If you're not bored of this by now, can I ask you to download the current development build from and see if it works in that?
I tested the latest nightly in a separate VM, this time on Windows 8 x64. First I verified that the issue exists there in the stable version, which is true. The nightly behaves just the same, the bug still exists there as well. Both stable and nightly were a clean install, there was no Firefox on that system before.
Attached file Reduced testcase
Ok, after lots of experimenting, I reduced this to a very simple testcase.

The issue is that the HTMLCollection returned by document.getElementsByTagName('div') ends up being empty instead of containing one element, as expected.

On my machine, this happens - and shows an alert - at iteration 50.

Some notes:

- changing the iterations of the for loop changes in which of the outer iterations the error happens. It doesn't do so linearly, though. E.g., increasing to 20k makes it happen at iteration 60, whereas decreasing to 5k results in 138

- these results are absolutely stable. However. when changing the filter invocation to `[]`, required iterations increase (to 74 for 10k) and become slightly unstable. As I had much more unstable results in earlier version of the testcase that created more garbage per run, my guess is that this is gc-related

- this *only* reproduces as long as any property is set on the nodes HTMLCollection. `nodes.prop = 'foo'` does the trick, but any other property does, too

- it also stops reproducing as soon as the Array.filter call is removed, even though its results aren't used, and it doesn't change the collection

So. This seems to be related to JITs and, very probably, to HTMLCollections.

efaust, that makes you the lucky winner of my "who gets to look into this with me" lottery. Congratulations!
Flags: needinfo?(efaustbmo)
Attachment #808351 - Attachment mime type: text/plain → text/html
Need to remove from cache on unlink, not dtor.
Assignee: general → bzbarsky
Component: JavaScript Engine → DOM
Flags: needinfo?(efaustbmo)
Summary: Unspecific JavaScript failure → nsContentList cache can hand out unlinked lists
Whiteboard: [need review]
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.

>-  virtual void RemoveFromCaches() {
>+  virtual void RemoveFromCaches() MOZ_OVERRIDE {
Nit, { should be on the next line
Attachment #808967 - Flags: review?(bugs) → review+
Done, and
Whiteboard: [need review]
Target Milestone: --- → mozilla27
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
   I worry about possible security impact here.
User impact if declined: Some random breakage on sites and _possible_ security 
Fix Landed on Version: 27.
Risk to taking this patch (and alternatives if risky): Very low risk, imo.
String or UUID changes made by this patch: None.

See for more info.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Probably async destruction from
   unlink.  Not sure how long ago that happened.
User impact if declined: Some random breakage on sites and _possible_ security
Testing completed (on m-c, etc.): Compiles and fixes bug.
Risk to taking this patch (and alternatives if risky): Low risk.
String or IDL/UUID changes made by this patch: None.
Attachment #808967 - Flags: approval-mozilla-esr24?
Attachment #808967 - Flags: approval-mozilla-beta?
Attachment #808967 - Flags: approval-mozilla-aurora?
Closed: 10 years ago
Resolution: --- → FIXED
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.

bz: without a critical or high security rating, this doesn't meet ESR uplift criteria - general concern isn't enough.  Is there any critical security bug you could tie this to in order to justify uplift?
Attachment #808967 - Flags: approval-mozilla-beta?
Attachment #808967 - Flags: approval-mozilla-beta+
Attachment #808967 - Flags: approval-mozilla-aurora?
Attachment #808967 - Flags: approval-mozilla-aurora+
Keywords: verifyme
Verified as fixed on:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 (20130926170421)

This issue reproduces for me in at most 1-2 minutes of reproducing the steps on Firefox 23.0.1 and it didn't reproduce on Firefox 25 even after more than 5 minutes (comment 0 STR + reduced testcase).
QA Contact: ioana.budnar
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0

Verified as fixed on latest Aurora 26.0a2 (buildID: 20131003004003) and latest Nightly 27.0a1 (buildID: 20131003030203). Reproduced the issue on 23.0.1 after 2 minutes, did not reproduce on latest Aurora or latest Nightly after 1 hour.
Keywords: verifyme
Comment on attachment 808967 [details] [diff] [review]
Make sure we remove our nsContentLists from list caches during unlink.

a- on the esr patch given comment#15
Attachment #808967 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24-
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.