Closed Bug 528645 Opened 10 years ago Closed 10 years ago

Missing unit string check in js_IsAboutToBeFinalized and crash [@ JS_IsAboutToBeFinalized ]

Categories

(Core :: JavaScript Engine, defect, critical)

1.9.2 Branch
x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- final-fixed

People

(Reporter: marcia, Assigned: igor)

References

()

Details

(Keywords: crash, verified1.9.2, Whiteboard: [crashkill] fixed-in-tracemonkey)

Crash Data

Attachments

(1 file)

Seen while running Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2b3pre) Gecko/20091113 Namoroka/3.6b3pre GTB6 (.NET CLR 3.5.30729). This is also #34 in the crash list for Windows 3.6 Beta 2 with 84 crashes.

STR:
1. Load the site in the URL. I did not crash right away, but after typing something in the search box I crashed.
The crash appears to actually be in js_IsAboutToBeFinalized. The stack trace looks messed up, but I'm guessing this is just being called from GC. It would require disassembling the minidump to figure out exactly what's happening. Hopefully I can squeeze that in today.
collided w/ comment 2 but the broadlist of sites this signature is seen on seems to confirm the idea that this is closer to GC than a specific site.  here is a bit more data.

127 total crashes for JS_IsAboutToBeFinalized on 20091112-crashdata.csv
20 start up crashes inside 3 minutes

os breakdown
  54 JS_IsAboutToBeFinalized Windows NT 5.1.2600 Service Pack 3
  39 JS_IsAboutToBeFinalized Windows NT 5.1.2600 Service Pack 2
  19 js_IsAboutToBeFinalized Mac OS X 10.4.11 8S2167
   7 JS_IsAboutToBeFinalized Windows NT 6.0.6002 Service Pack 2
   6 JS_IsAboutToBeFinalized Windows NT 6.1.7600
   5 JS_IsAboutToBeFinalized Windows NT 6.0.6001 Service Pack 1
   4 JS_IsAboutToBeFinalized Windows NT 6.1.7100
   3 JS_IsAboutToBeFinalized Windows NT 5.1.2600 Dodatek Service Pack 2
   2 js_IsAboutToBeFinalized Mac OS X 10.4.11 8S165
   2 JS_IsAboutToBeFinalized Windows NT 6.0.6000
   2 JS_IsAboutToBeFinalized Windows NT 5.1.2600 Dodatek Service Pack 3
   1 js_IsAboutToBeFinalized Mac OS X 10.5.8 9L30
   1 js_IsAboutToBeFinalized Mac OS X 10.5.6 9G66
   1 js_IsAboutToBeFinalized Mac OS X 10.5.6 9G55
   1 js_IsAboutToBeFinalized Mac OS X 10.5.5 9F33
   1 JS_IsAboutToBeFinalized Windows NT 6.1.7231
   1 JS_IsAboutToBeFinalized Windows NT 5.1.2600 Szervizcsomag 3
   1 JS_IsAboutToBeFinalized Windows NT 5.1.2600 Service Pack 3, v.3244
   1 JS_IsAboutToBeFinalized Windows NT 5.1.2600 Dodatek Service Pack. 1
   1 JS_IsAboutToBeFinalized Windows NT 5.1.2600

distribution of all versions where the JS_IsAboutToBeFinalized crash was found
on 20091112-crashdata.csv
  75 Firefox 3.5.5
  22 Firefox 3.6b2
   9 Firefox 3.0.15
   8 Firefox 3.6b1
   4 Firefox 3.5.3
   2 Firefox 3.5.2
   2 Firefox 3.5
   1 Firefox 3.5b4
   1 Firefox 3.5.4
   1 Firefox 3.1b3
   1 Firefox 3.0.8
   1 Firefox 3.0


____________people also saw this signature on 
     108
   8 \N///
   5 about:blank///
   4 chrome://lastpass/content
   3 http://apps.facebook.com/inthemafia
   2 http://www.scion.com/bys2007
   2 http://www.facebook.com/home.php?ref=home#
   2 http://apps.facebook.com/texas_holdem
   1 https://www.mywot.com/en
   1 https://wizzrss.com:8443/?previous_page=login_up
   1 https://adwords.google.com/select
   1 https://addons.mozilla.org/ru
   1 https://addons.mozilla.org/ja
   1 http://www.youtube.com/watch?v=V83R5nkwmxU
   1 http://www.wp.pl/
   1 http://www.wer-kennt-wen.de/people
   1 http://www.videojug.com/webvideo
   1 http://www.trilulilu.ro/sfantuzu
[and variety of other sites]
(In reply to comment #1)
> http://crash-stats.mozilla.com/report/index/bc7069c7-594b-4e91-b276-7956f2091113
> is the link to my crash report.

We don't have a minidump for that crash stored on cm-fs01. Does anyone know why that would be?
ted, do you know how we can investigate comment 4 ?
I don't know anything about the minidump storage, you want to ask Aravind or Lars.
aravind, can you help?
We were behind processing crashes on Friday.  The crash is there, just a day behind.

[root@cm-fs01 name]# ls -l bc/70/bc7069c7-594b-4e91-b276-7956f2091113*
lrwxrwxrwx 1 500 500     42 Nov 14 09:27 bc/70/bc7069c7-594b-4e91-b276-7956f2091113 -> ../../../date/2009/11/14/09/25/webhead01_0
-rw-rw-r-- 1 500 500 147052 Nov 13 15:54 bc/70/bc7069c7-594b-4e91-b276-7956f2091113.dump
-rw-rw-r-- 1 500 500    734 Nov 13 15:54 bc/70/bc7069c7-594b-4e91-b276-7956f2091113.json
[root@cm-fs01 name]# pwd
/mnt/breakpad_dumps/sat/name
[root@cm-fs01 name]#
Well actually, looking at the time stamps, I think it may have been a deferred crash at one point (Friday) and was requested and processed on Saturday.
(In reply to comment #9)
> Well actually, looking at the time stamps, I think it may have been a deferred
> crash at one point (Friday) and was requested and processed on Saturday.

So it sounds like you are saying the crashes get dropped into the day buckets (directories 'mon', 'tue') etc, based on the day that the crash was "processed" (I'm not entirely sure what that means). Is there a document somewhere that explains what minidumps are stored where? It's pretty mystifying to me at the moment. I'm not complaining--it's just that I'm going to need some more help or info to be able to understand and effectively use the minidump storage.
(In reply to comment #10)

> So it sounds like you are saying the crashes get dropped into the day buckets
> (directories 'mon', 'tue') etc, based on the day that the crash was "processed"
> (I'm not entirely sure what that means). Is there a document somewhere that
> explains what minidumps are stored where? It's pretty mystifying to me at the
> moment. I'm not complaining--it's just that I'm going to need some more help or
> info to be able to understand and effectively use the minidump storage.


So when crashes come in, only about 15% of them are processed (all pre-release builds and all builds with comments are processed).  The crashes that aren't processed go into a deferred storage area.  When someone explicitly requests a crash report, we pull the crash from the deferred storage and process it right away (within 60s usually).

The day-of-the-week system you see on cm-fs01 is a late addition to the system.  The crash system monitor moves crashes that have been processed into the corresponding day-of-the-week location (depending on when it runs).

So what most likely happened in the above case was that the crash came in on Friday, but was explicitly requested on Saturday and processed then (and therefore ended in the Sat store).

Does that clear things up?  feel free to find me (in person or on irc) and I can answer any other questions you have.
(In reply to comment #11)
> Does that clear things up?  feel free to find me (in person or on irc) and I
> can answer any other questions you have.

Thanks! That helps a lot. I created a new minidump downloading script based on that and posted it to bug 466022. I basically search the days starting from the day the crash happened, because that or the next few days should be the most likely.
The database contains a "date_processed" field:
http://code.google.com/p/socorro/source/browse/trunk/socorro/database/schema.py#296

I guess we're just not exposing that in the UI anywhere. It should be a trivial change to do so.
I was also able to crash this using  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b4) Gecko/20091124 Firefox/3.6b4 GTB6. That crash is http://crash-stats.mozilla.com/report/index/bp-8da8b655-fb70-465f-bdd5-8e00f2091125. Nominating to get this on the radar.
Flags: blocking1.9.2?
Whiteboard: [crashkill]
Summary: Crash in [@JS_IsAboutToBeFinalized ] when loading site → Crash in [@ JS_IsAboutToBeFinalized ] when loading site
OS: Windows NT → All
me too on windows, but not mac using the test url.

There is a wide range of sites where this seems to get reported.  they are mostly video and picture sites.

 252 www.youtube.com
  66 www.hulu.com
  61 www.facebook.com
  33 www.megavideo.com
  22 www.google.com
  19 apps.facebook.com
  15 www.redtube.com
  14 www.tube8.com
  14 mail.google.com
and 680 more

its good we have seem to found one site out of these that seems to trigger gc reliably. more of these kind of site in beta 4 when the better url reporting kicks in.
sounds like flash involvement to me
Can we get a full dump from someone who can repro it?  That would help a lot, if the stack is screwy.
I can't repro on Win 7 64-bit, Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2b4pre) Gecko/20091124 Namoroka/3.6b4pre, Flash Version: 10.0.22.87.

I will, uh, go upgrade my Flash on this computer now, though!
Having upgraded my Flash to 10.0.32.18, I crashed on session restore with this signature and now on-load of the URL, so it looks like it's a Flash bug introduced between r22 and r32.

Adding msintov, URL above is a reliable repro case for me.
Can we try this with the preview edition that we've been given? Removing blocking nomination based on comment 19, but if it turns out to be something we're doing in JS that we need to repair, please renominate.
Flags: blocking1.9.2?
chofmann asked me to confirm that this is crashing with Shockwave Flash 10.0 r42 - it crashes with that version as well as the previous versions I had from labs.adobe.com. Working on getting WinDbg on this machine so we can get some better information.
The correlation reports show a lot of add-ons possibly related to this, in particular LinkExtend. But I don't know if that add-on actually has any DLLs. I do see some DLLs with reasonably high ratios, but not that many. So it might just be a bug that is exposed by certain plugins' JS.
Depends on: amo-mxr
I looked at a minidump. It is crashing in js_IsAboutToBeFinalized trying to evaluate a->list->thingSize. The crash address is 0x20000009 in almost all of these. The load is actually through [ecx+8], so the bad pointer is 0x20000001 (the value of a->list). The bad pointer 0x20000001 turns out to be loaded from code memory in js3250.dll, and in fact is in a compile-time unit string table.

So, the guess is that this is a regression from bug 513530. That patch introduced compile-time unit-string tables. If one of them makes its way into a cycle collect root set (however that works), then when CC calls Traverse, it will crash like this.

Gregor, does that sound right? And do you have cycles to fix this?
are there multiple bugs?  there is higher volume on 3.6b5 but some report on earlier versions.

checking --- 20091220-crashdata.csv JS_IsAboutToBeFinalized
release total-crashes
              JS_IsAboutToBeFinalized crashes
                         pct.
all     215848  161     0.000745895
3.0.15  6811    1       0.000146821
3.0.16  31577   6       0.000190012
3.5.5   18251   11      0.000602707
3.5.6   105854  74      0.000699076
3.6b5   16669   49      0.00293959
3.6b4   5000    4       0.0008
3.6b3   644     1       0.0015528
3.6b2   661     1       0.00151286
3.6b1   2101    5       0.00237982
(In reply to comment #24)
> That patch
> introduced compile-time unit-string tables. If one of them makes its way into a
> cycle collect root set (however that works), then when CC calls Traverse, it
> will crash like this.

js_IsAboutToBeFinalized indeed must check for strings in the unit table. But the crash also indicates rather suboptimal implementation of nsXPConnect::Traverse. If that method would be smart, it would not call the method on JS strings. They cannot be a part of XPCOM cycle and the code knows well before calling 
js_IsAboutToBeFinalized that the GC thing pointer is JSString.
Assignee: general → igor
Summary: Crash in [@ JS_IsAboutToBeFinalized ] when loading site → Missing unit string check in js_IsAboutToBeFinalized
Blocks: 513530
Attached patch fix + api testSplinter Review
Attachment #418811 - Flags: review?(wagnerg)
I did the same fix a few months ago but look at comment 7 and 9 from bug 514819.
This is a fast path and we should not check in there according to Andreas.
(In reply to comment #28)
> I did the same fix a few months ago but look at comment 7 and 9 from bug
> 514819.
> This is a fast path and we should not check in there according to Andreas.

This is not a fast path. JS_IsAboutToBeFinalized is not called during the GC frequently and is a part of public API. An embedding may call it for an arbitrary GC thing so the code must check for a possibility of unit strings.
It is interesting that on Linux js_IsAboutToBeFinalized returns true since it is just happens that the layout allows to reinterpret a unit string as a member of JSGCArena. Moreover, since xpconnect does not use the result of js_IsAboutToBeFinalized for strings, the bogus true result of the function is harmless on Linux.
Attachment #418811 - Flags: review?(wagnerg) → review+
http://hg.mozilla.org/tracemonkey/rev/9d0fe1237c26
Whiteboard: [crashkill] → [crashkill] fixed-in-tracemonkey
Our bug, not Flash's (blocking1.9.2? was unset per comment 19, if I'm reading this bug's history correctly). Reconsider for 1.9.2? Want in a 1.9.2.x release at any rate.

/be
Flags: wanted1.9.2?
Flags: blocking1.9.2?
Flags: blocking1.9.2? → blocking1.9.2+
http://hg.mozilla.org/mozilla-central/rev/9d0fe1237c26
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Summary: Missing unit string check in js_IsAboutToBeFinalized → Missing unit string check in js_IsAboutToBeFinalized and crash [@ JS_IsAboutToBeFinalized ]
Verified fixed on the 1.9.2 branch using  Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2b6pre) Gecko/20091229 Namoroka/3.6b6pre. Also verified on the equivalent nighlty Windows XP build. I verified using the site in the URL which was crashing consistently for me prior to the fix. Adding keyword.
Keywords: verified1.9.2
Looks to be about 20 crashes in this stack on Windows using the Firefox 3.6 rc - http://tinyurl.com/ye255et - would this have something to do with the flash version people are using?
Crash Signature: [@ JS_IsAboutToBeFinalized ]
You need to log in before you can comment on or make changes to this bug.