Last Comment Bug 672112 - Hashmap lookups may return wrong values
: Hashmap lookups may return wrong values
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: 5 Branch
: All All
: -- normal (vote)
: mozilla8
Assigned To: [PTO to Dec5] Bill McCloskey (:billm)
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-17 05:56 PDT by bugzilla
Modified: 2011-07-25 18:44 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
mozillabug.html (6.29 KB, text/html)
2011-07-17 05:56 PDT, bugzilla
no flags Details
fix (9.20 KB, patch)
2011-07-20 15:07 PDT, [PTO to Dec5] Bill McCloskey (:billm)
no flags Details | Diff | Splinter Review
patch v2 (10.01 KB, patch)
2011-07-20 17:23 PDT, [PTO to Dec5] Bill McCloskey (:billm)
dmandelin: review+
Details | Diff | Splinter Review
backout patch (10.12 KB, patch)
2011-07-21 16:32 PDT, [PTO to Dec5] Bill McCloskey (:billm)
cdleary: review+
Details | Diff | Splinter Review

Description bugzilla 2011-07-17 05:56:34 PDT
Created attachment 546414 [details]
mozillabug.html

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build ID: 20110615151330

Steps to reproduce:

Javascript:
var hashmap = {"400": {"type":400}, "401": {"type":401}}
for(var key in hashmap) if (key!=hashmap[key].type) alert("bug");


Actual results:

Sometimes firefox reports that 400 is not 400


Expected results:

Nothing because 400 should be 400 :)
Comment 1 Alice0775 White 2011-07-17 07:18:23 PDT
Confirmed on
http://hg.mozilla.org/mozilla-central/rev/48dcb60519ac
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0a1) Gecko/20110716 Firefox/8.0a1 ID:20110716030739

And It is very rare to reproduce. however I can reproduce at least since
http://hg.mozilla.org/mozilla-central/rev/a02c6f4ffe4a
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110204 Firefox/4.0b12pre ID:20110206153537
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2011-07-18 08:09:14 PDT
Is this reproducible with JM disabled?
Comment 3 Thomas Ahlblom 2011-07-18 08:58:00 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:8.0a1) Gecko/20110718 Firefox/8.0a1

I get output like "405=405" after may clicks. Does that mean I've reproduced the bug?

So far it has not happened in safe mode.
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2011-07-18 09:08:59 PDT
> I get output like "405=405" after may clicks. Does that mean I've reproduced
> the bug?

Yes.
Comment 5 bugzilla 2011-07-18 10:32:33 PDT
Some more details - hope it may help:
When I push a reference to another array I may get the reference twice - e.g.:

var hashmap = {"400": {"type":400}, "401": {"type":401}}
var arr=[];
for(var key in hashmap) arr.push([key,hashmap[key]]);

Expected Output or "arr":
[[400, {"type":400}], [401, {"type":401}]]

Sometimes output will be like:
[[400, {"type":400}], [401, {"type":400}]]

(note last element ist type 400 again - if that error occur all following elements got the reference from the object before too)
Comment 6 David Mandelin [:dmandelin] 2011-07-18 18:16:18 PDT
I couldn't reproduce this so far. How many tries does it usually take? And what do I have to try again? Just the click, or the whole sequence?

Alternatively, could anyone that can reproduce this get us a narrowed-down regression range?
Comment 7 Alice0775 White 2011-07-19 08:03:26 PDT
My step to reproduce is as follows:

1. Start Firefox with clean profile
2. Open  attachment 546414 [details] in a 1st Tab

3. Open http://www.mozilla.com/en-US/firefox/new/  in New 2nd Tab
4. Close 2nd Tab and wait 1-5 sec
5. Click "Run Test" 2 or 3 times

6. Repeat Step3-5 at least 20 times


I tried to find regression with cached m-c hourly.

Result:
Not reproduced (However, I cannot assert that a problem never happen):
http://hg.mozilla.org/mozilla-central/rev/4c62984f12d1
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110207 Firefox/4.0b12pre ID:20110207030345
http://hg.mozilla.org/mozilla-central/rev/3470891975c7
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110208 Firefox/4.0b12pre ID:20110208030358

Reproduced:
http://hg.mozilla.org/mozilla-central/rev/0a2e06927c31
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110208 Firefox/4.0b12pre ID:20110208015457
http://hg.mozilla.org/mozilla-central/rev/fd0817e454fe
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110209 Firefox/4.0b12pre ID:20110209030359
http://hg.mozilla.org/mozilla-central/rev/199cb6282554
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110210 Firefox/4.0b12pre ID:20110210030400
http://hg.mozilla.org/mozilla-central/rev/1ed3464aaa92
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110211 Firefox/4.0b12pre ID:20110211030400
http://hg.mozilla.org/mozilla-central/rev/1ae724a3a546
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110211 Firefox/4.0b12pre ID:20110211100318
http://hg.mozilla.org/mozilla-central/rev/a0372b031aac
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0a1) Gecko/20110718 Firefox/8.0a1 ID:20110718030807
Comment 8 Alice0775 White 2011-07-19 08:14:05 PDT
Sorry Error in comment #7,
> 4. Close 2nd Tab and wait 1-5 sec
4-1. Close 2nd Tab
4-2. switch to NotePad.exe
4-3. back to the Browser and wait 1-5 sec
Comment 9 Alice0775 White 2011-07-19 11:46:57 PDT
FYI,
I commented out two lines of  dom/base/nsJSEnvironment.cpp as follows.
And I build Firefox in local. And I tried it to reproduce the problem.

Build from aa2de73abc19 : Reproduced the problem
Build from aa2de73abc19+ the following modification : Not reproduced


--------------------------------------
// static
void
GCTimerFired(nsITimer *aTimer, void *aClosure)
{
  NS_RELEASE(sGCTimer);

  //nsJSContext::GarbageCollectNow();
}

// static
void
CCTimerFired(nsITimer *aTimer, void *aClosure)
{
  NS_RELEASE(sCCTimer);

  //nsJSContext::CycleCollectNow();
}
--------------------------------------
Comment 10 David Mandelin [:dmandelin] 2011-07-19 15:33:24 PDT
Thanks much for the STR and other info, Alice. I can now reproduce this.

The regression range points to bug 630947, but it's probably not really a regression from that bug. The failure occurs only if the tracejit is enabled, so it's probably a bug where a GC at some certain time causes a failure in the tracer.
Comment 11 [PTO to Dec5] Bill McCloskey (:billm) 2011-07-20 15:07:09 PDT
Created attachment 547269 [details] [diff] [review]
fix

This is a tracer bug with the PICTable cache. The problem is that we weren't normalizing the jsid before sticking it in the PIC table (for performance I guess). This meant we could be collecting the string during a GC, and then that same memory could be reallocated to a different string. This causes all the tables to be flushed on GC.
Comment 12 David Mandelin [:dmandelin] 2011-07-20 17:03:44 PDT
Comment on attachment 547269 [details] [diff] [review]
fix

Review of attachment 547269 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jstracer.cpp
@@ +3064,5 @@
> +                PICTable** tables = peer->picTables.data();
> +                for (unsigned len = peer->picTables.length(); len; --len) {
> +                    PICTable *table = *tables++;
> +                    table->clear();
> +                }

Why not

    for (int j = 0; j < peer->picTables.length(); ++j)
        tables[j]->clear();

?
Comment 13 [PTO to Dec5] Bill McCloskey (:billm) 2011-07-20 17:23:19 PDT
Created attachment 547303 [details] [diff] [review]
patch v2

Yeah, good point. I fixed the place where I copied the code from as well.
Comment 14 David Mandelin [:dmandelin] 2011-07-20 17:24:43 PDT
Comment on attachment 547303 [details] [diff] [review]
patch v2

Review of attachment 547303 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 15 Dão Gottwald [:dao] 2011-07-20 20:29:56 PDT
backed out for breaking the tree
Comment 16 [PTO to Dec5] Bill McCloskey (:billm) 2011-07-21 15:43:14 PDT
I really should have tested this better before landing it. The problem is as follows. Each PICTable is allocated with the traceAlloc allocator. Normally, this memory won't be thrown away until the tracejit is flushed. However, if we fail to compile the code, then we rewind the allocator and throw away everything from that compilation. This patch stores up a queue of PICTable entries to reset on GC, and it fails to remove entries if they were part of a failed compile. Then when we reset them, we write to bad memory.

Fixing this would be annoying. We would have to somehow find all the PICTables that were part of the bad compilation and remove them from the queue. I'm not really sure how to do that.

However, an easy fix would just be to remove the PICTable optimization entirely. It was added in 596026 to make string-fasta faster in the tracer. However, now that we have the profiler, string-fasta runs in the methodjit. I ran SunSpider, V8, and Kraken, and I don't see any regressions from backing this out. If everyone is okay with this, I'll make a patch.
Comment 17 Chris Leary [:cdleary] (not checking bugmail) 2011-07-21 15:56:31 PDT
(In reply to comment #16)

Do it! If it doesn't regress anything, then it wasn't worth having anyway.
Comment 18 [PTO to Dec5] Bill McCloskey (:billm) 2011-07-21 16:32:45 PDT
Created attachment 547555 [details] [diff] [review]
backout patch

OK, here it is then.
Comment 19 Chris Leary [:cdleary] (not checking bugmail) 2011-07-21 16:37:14 PDT
Comment on attachment 547555 [details] [diff] [review]
backout patch

And the engine returns back to the wonderful state of me-having-made-no-real-contributions-to-the-tracer. Simplicity ftw.
Comment 20 Boris Zbarsky [:bz] (still a bit busy) 2011-07-21 17:30:58 PDT
> If it doesn't regress anything,

Well, it doesn't regress those three benchmarks.  There's more to the web than that!  It's probably still fine to go, but I really wish we had a better test corpus. :(
Comment 21 Jason Orendorff [:jorendorff] 2011-07-22 13:15:28 PDT
Rats, I liked that hack. :-(
Comment 22 Brendan Eich [:brendan] 2011-07-22 18:12:58 PDT
How do we know we aren't leaving too much money on the table? I mean ignoring the stupid benchmarks, of course.

/be
Comment 23 Marco Bonardo [::mak] 2011-07-25 05:49:18 PDT
http://hg.mozilla.org/mozilla-central/rev/cfa6b93e92fc

Note You need to log in before you can comment on or make changes to this bug.