Crash in nsXPCWrappedJS::nsXPCWrappedJS

RESOLVED FIXED in Firefox 51

Status

()

P1
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: marcia, Assigned: jonco)

Tracking

({crash})

49 Branch
mozilla53
x86
Windows 7
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48 wontfix, firefox49 wontfix, firefox-esr45 wontfix, firefox50 wontfix, firefox51 verified, firefox52 fixed, firefox53 fixed)

Details

(crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

This bug was filed from the Socorro interface and is 
report bp-74fc5f2a-51f9-4552-b25d-adc782160814.
=============================================================

Crash seen in early betas that has risen to #10 in Beta 3. Since it has no bug associated with it, filing to get it on the radar for tracking: http://bit.ly/2aLJ9PR.

Perhaps terrence has some ideas as to what might be the root cause, adding ni.
Flags: needinfo?(terrence)
These are (almost) all hitting the MOZ_RELEASE_ASSERT added in bug 1214961. Either the hashtable is failing to grow, in which case this looks like it is really some kind of OOM. Maybe some addon ends up creating a ton of these multicompartment wrappers.
Blocks: 1214961
Although, before that patch we should have been adding every wrapper to the global XPCJSRuntime table, so it can't be making it worse. It looks like we'd just silently fail! I can't imagine would have led to anything good down the line.
> It looks like we'd just silently fail!
I meant, this was the behavior before bug 1214961.
I guess this is a longstanding OOM. I'd like to make this table smaller regardless in order to improve GC latencies, although I'm not sure how much success I'm likely to have. I suppose it's possible that a small number of things are taking lots of the space and we could fix the issue by deCOMtaminating a handful of things. If it's a long tail issue however, I don't think there's much we can do beyond a full-company effort to move ourself off of XPIDL and other legacy technologies that no longer get the platform support they need to stay effective.
Depends on: 1295684
Now the #7 topcrash in beta 5. I guess if that is something showing separately, when for 47 or 48 it was still happening, but showing as an OOM crash, it probably isn't making the overall crash rate worse. It would be nice if we could use this new information and try some of the strategies Terrence suggests. It sounds like that is unlikely to happen for 49.
Crash volume for signature 'nsXPCWrappedJS::nsXPCWrappedJS':
 - nightly (version 51): 9 crashes from 2016-08-01.
 - aurora  (version 50): 27 crashes from 2016-08-01.
 - beta    (version 49): 2227 crashes from 2016-08-02.
 - release (version 48): 1369 crashes from 2016-07-25.
 - esr     (version 45): 2 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       1       5       0
 - aurora       15       3       2
 - beta       1124     257      53
 - release     324     169      91
 - esr           0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly #189
 - aurora  #107      #127
 - beta    #7        #3353
 - release #21
 - esr
status-firefox48: --- → affected
status-firefox50: --- → affected
status-firefox51: --- → affected
status-firefox-esr45: --- → affected
I still see this crash in a build that should have bug 1295684 (bp-7d718248-c70a-4cdc-84b6-a3fde2160828) so I guess my theory about this being an OOM was wrong.

Comment 8

2 years ago
the issue looks partly correlated to kaspersky being present as well:

  nsXPCWrappedJS::nsXPCWrappedJS|EXCEPTION_BREAKPOINT (529 crashes)
     51% (271/529) vs.   3% (1250/46019) light_plugin_ACF0E80077C511E59DED005056C00008@kaspersky.com
This could absolutely be a binary addon failing to handle GC properly and passing us a dead or moved pointer. Given the spike, I think that's probably the most likely explanation.
Flags: needinfo?(terrence)
Some interesting data about the signature in Firefox 48:
(100.0% in signature vs 00.73% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(nsXPConnect::GetRuntimeInstance()-> GetMultiCompartmentWrappedJSMap()->Add(cx, mRoot))
(99.95% in signature vs 39.31% overall) reason = EXCEPTION_BREAKPOINT
(67.52% in signature vs 08.02% overall) contains_memory_report = 1
(56.04% in signature vs 02.18% overall) Addon "Kaspersky Protection" = true
(51.40% in signature vs 17.77% overall) Addon "Adblock Plus" = true
(00.39% in signature vs 25.43% overall) dom_ipc_enabled = 1
(34.24% in signature vs 00.69% overall) Addon "Adblock Plus" = true ∧ Addon "Kaspersky Protection" = true
(26.15% in signature vs 07.28% overall) Addon "Adblock Plus" = true ∧ platform_pretty_version = Windows 7
(In reply to Marco Castelluccio [:marco] from comment #10)
> Some interesting data about the signature in Firefox 48:
> ...
> (56.04% in signature vs 02.18% overall) Addon "Kaspersky Protection" = true

Does this mean that Kaspersky is only potentially responsible for ~56% of these crashes (and thus comment 9 is only 44% correct)?
Flags: needinfo?(mcastelluccio)
(In reply to Andrew Overholt [:overholt] from comment #11)
> (In reply to Marco Castelluccio [:marco] from comment #10)
> > Some interesting data about the signature in Firefox 48:
> > ...
> > (56.04% in signature vs 02.18% overall) Addon "Kaspersky Protection" = true
> 
> Does this mean that Kaspersky is only potentially responsible for ~56% of
> these crashes (and thus comment 9 is only 44% correct)?

Yes, 56% of crash reports with the signature 'nsXPCWrappedJS::nsXPCWrappedJS'
have that addon.
This is true before/after the spike (~60% before, ~56% after), so I believe
the addon is not responsible for it.
Flags: needinfo?(mcastelluccio)
Thanks. Could there be another binary add-on like Terrence speculates in comment 9?
Can anyone reproduce this? Without an STR and the lack of a definitive correlation with a specific binary add-on I'm not sure what we can do next ...
Priority: -- → P1
Crash volume for signature 'nsXPCWrappedJS::nsXPCWrappedJS':
 - nightly (version 52): 2 crashes from 2016-09-19.
 - aurora  (version 51): 7 crashes from 2016-09-19.
 - beta    (version 50): 1169 crashes from 2016-09-20.
 - release (version 49): 4368 crashes from 2016-09-05.
 - esr     (version 45): 8 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly       2       0
 - aurora        5       2
 - beta        973     196
 - release    3470     898
 - esr           1       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly #480
 - aurora  #443      #526
 - beta    #7        #4151
 - release #7        #349
 - esr     #6221
status-firefox52: --- → affected

Comment 16

2 years ago
(In reply to Andrew Overholt [:overholt] from comment #14)
> Can anyone reproduce this? Without an STR and the lack of a definitive
> correlation with a specific binary add-on I'm not sure what we can do next
> ...

Andrew, there is a user of the French community board with this issue:
https://forums.mozfr.org/viewtopic.php?f=5&t=130668
CR: https://crash-stats.mozilla.com/report/index/ff01c0e2-faf5-4657-85d0-306f32160904
He has Kaspersky too.

Do you want some more info?
Flags: needinfo?(overholt)
(In reply to Loic from comment #16)
> (In reply to Andrew Overholt [:overholt] from comment #14)
> > Can anyone reproduce this? Without an STR and the lack of a definitive
> > correlation with a specific binary add-on I'm not sure what we can do next
> > ...
> 
> Andrew, there is a user of the French community board with this issue:
> https://forums.mozfr.org/viewtopic.php?f=5&t=130668
> CR:
> https://crash-stats.mozilla.com/report/index/ff01c0e2-faf5-4657-85d0-
> 306f32160904
> He has Kaspersky too.
> 
> Do you want some more info?

That'd be great! My French is too rusty to be useful on that forum :) Thank you!
Flags: needinfo?(overholt)
Sorry, forgot to needinfo you when I answered in comment 17.
Flags: needinfo?(epinal99-bugzilla2)

Comment 19

2 years ago
Crash report from comment #0 and the French user indicate that Kaspersky 2016 is implied (see product_info.dll in tab "Modules"), 16.0.0.694 and 16.0.1.445.
Upgrading to Kaspersky 2017 (17.0.0.611) should fix the issue.

A possibility is to blacklist the version 16.0.1.445 wich is almost implied in all crashes:
https://crash-stats.mozilla.com/signature/?signature=nsXPCWrappedJS%3A%3AnsXPCWrappedJS&date=%3E%3D2016-10-11T14%3A36%3A00.000Z&date=%3C2016-10-18T14%3A36%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1
Flags: needinfo?(epinal99-bugzilla2)
How do we blacklist specific versions of binary add-ons?
Flags: needinfo?(benjamin)
(In reply to Loic from comment #19)
> Crash report from comment #0 and the French user indicate that Kaspersky
> 2016 is implied (see product_info.dll in tab "Modules"), 16.0.0.694 and
> 16.0.1.445.
> Upgrading to Kaspersky 2017 (17.0.0.611) should fix the issue.

Did upgrading to Kaspersky 17.0.0.611 fix the issue for the reporter?

> A possibility is to blacklist the version 16.0.1.445 wich is almost implied
> in all crashes:
> https://crash-stats.mozilla.com/signature/
> ?signature=nsXPCWrappedJS%3A%3AnsXPCWrappedJS&date=%3E%3D2016-10-
> 11T14%3A36%3A00.000Z&date=%3C2016-10-18T14%3A36%3A00.
> 000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_colum
> ns=platform&_columns=reason&_columns=address&_sort=-date&page=1

Kaspersky is not in almost all crashes, but in ~50% of them.

(In reply to Andrew Overholt [:overholt] from comment #20)
> How do we blacklist specific versions of binary add-ons?

We have a DLL blocklist, we would need to test if it's effective with Kaspersky.

It's possible that the presence of Kaspersky makes this crash more likely or that Kaspersky causes a subset of the crashes with this signature. Should we try to block it?
Here are the numbers for 50.0b over the last week:

> 1332 crash reports have been analyzed and the following versions have been found:
>  - product_info.dll
>    - 16.0.1.445: 708
>    - 16.0.0.694: 27
>    - 17.0.0.643: 7

Note that blocklisting the addon as opposed to the DLL might block the injection of the DLL too, but wouldn't require a code change. So, if we want to proceed with blocking, perhaps we can try with the addon blocklist first.
(In reply to Marco Castelluccio [:marco] from comment #21)
> (In reply to Loic from comment #19)
> > Crash report from comment #0 and the French user indicate that Kaspersky
> > 2016 is implied (see product_info.dll in tab "Modules"), 16.0.0.694 and
> > 16.0.1.445.
> > Upgrading to Kaspersky 2017 (17.0.0.611) should fix the issue.
> 
> Did upgrading to Kaspersky 17.0.0.611 fix the issue for the reporter?

If the answer to ^ turns out to be "yes", then ...

> It's possible that the presence of Kaspersky makes this crash more likely or
> that Kaspersky causes a subset of the crashes with this signature. Should we
> try to block it?

... I think we should at least try.
Flags: needinfo?(benjamin)

Comment 24

2 years ago
Hello guys,
Not sure how related my issue is to this, but I've been getting a similar crash.
https://crash-stats.mozilla.com/report/index/902dbed4-73d9-4b6d-8c3d-7b6182161021

Can someone more knowledgeable look at the crash reports and check if they have the same root cause?
Anyway, I don't have Kaspersky installed. 

What I did notice is that the crash occurring on my end is pretty much 100% preceded in a crash of the media player on the Bloomberg live stream (at least I don't ever remember Firefox crashing without watching Bloomberg).

Let me know if I can provide you with more info.
(In reply to Denis Rosca from comment #24)
> Hello guys,
> Not sure how related my issue is to this, but I've been getting a similar
> crash.
> https://crash-stats.mozilla.com/report/index/902dbed4-73d9-4b6d-8c3d-
> 7b6182161021
> 
> Can someone more knowledgeable look at the crash reports and check if they
> have the same root cause?
> Anyway, I don't have Kaspersky installed. 
> 
> What I did notice is that the crash occurring on my end is pretty much 100%
> preceded in a crash of the media player on the Bloomberg live stream (at
> least I don't ever remember Firefox crashing without watching Bloomberg).
> 
> Let me know if I can provide you with more info.

Hello Denis, thanks for the report.
Can you reproduce this crash consistently? Can you reproduce it with addons disabled?
When did the crash start to happen? After updating Firefox?

Comment 26

2 years ago
(In reply to Marco Castelluccio [:marco] from comment #25)
> 
> Hello Denis, thanks for the report.
> Can you reproduce this crash consistently? Can you reproduce it with addons
> disabled?
> When did the crash start to happen? After updating Firefox?

My problems started 1 or 2 months ago (can't tell you more specific dates because I didn't pay too much attention to it at first).
I can pretty reliably reproduce the *Firefox crashing* issue when running with plugins enabled. The problem is that the failure modes differ a bit:
1. Firefox just freezes so I have to kill the process
2. Firefox crashes and asks to submit a crash report.

So I'm not exactly sure if this is the same problem or I just happen to hit two different issues somehow.

I'll try to see if it reproduces without addons.

I have a question about how I could debug case 1 above? How do I get more info about the reason Firefox froze? If I just kill the process no crash report is generated.
Have you ever used mozregression?
If this is a regression in Firefox, this tool will help finding what caused it.

You can download it here: https://github.com/mozilla/mozregression/releases. Simply select an estimated good date (e.g. 2016-01-01) and select today as "bad". It will download several builds of Firefox and let you test if the build is good (unaffected) or bad (affected) and at the end it will tell you what caused the regression.

(In reply to Denis Rosca from comment #26)
> So I'm not exactly sure if this is the same problem or I just happen to hit
> two different issues somehow.

I bet they are the same, but I'm not sure.

> I'll try to see if it reproduces without addons.

Great, let us know.

> I have a question about how I could debug case 1 above? How do I get more
> info about the reason Firefox froze? If I just kill the process no crash
> report is generated.

What happens if you let Firefox be frozen for a while? In theory, it should be automatically killed after a while and a crash report should be generated.

Comment 28

2 years ago
(In reply to Denis Rosca from comment #24)
> Hello guys,
> Not sure how related my issue is to this, but I've been getting a similar
> crash.
> https://crash-stats.mozilla.com/report/index/902dbed4-73d9-4b6d-8c3d-
> 7b6182161021

You have LastPass installed in your profile. Do you have the latest version? If not, can you update LP, please.
(In reply to Denis Rosca from comment #26)
> I have a question about how I could debug case 1 above? How do I get more
> info about the reason Firefox froze? If I just kill the process no crash
> report is generated.

The simplest way is for you to download the crashfirefox.exe binary we provide: https://ftp.mozilla.org/pub/utilities/crashfirefox-intentionally/crashfirefox.exe

If you run that it will locate a running Firefox and crash it in a way such that a crash report will be generated.

Comment 30

2 years ago
So... I've hit this again:
https://crash-stats.mozilla.com/report/index/c9cf876b-1a48-4e6f-84b2-d050b2161102

I have to say that my initial statement of being able to reproduce it consistently doesn't seem to be true. Firefox is crashing quite a lot, but with different causes it seems.


(In reply to Loic from comment #28)
> (In reply to Denis Rosca from comment #24)
> > Hello guys,
> > Not sure how related my issue is to this, but I've been getting a similar
> > crash.
> > https://crash-stats.mozilla.com/report/index/902dbed4-73d9-4b6d-8c3d-
> > 7b6182161021
> 
> You have LastPass installed in your profile. Do you have the latest version?
> If not, can you update LP, please.

I did update LastPass, the add-on reports to be on version 4.1.29a which seems to be the most recent one?
(In reply to Loic from comment #19)
> Crash report from comment #0 and the French user indicate that Kaspersky
> 2016 is implied (see product_info.dll in tab "Modules"), 16.0.0.694 and
> 16.0.1.445.
> Upgrading to Kaspersky 2017 (17.0.0.611) should fix the issue.
> 
> A possibility is to blacklist the version 16.0.1.445 wich is almost implied
> in all crashes:
> https://crash-stats.mozilla.com/signature/
> ?signature=nsXPCWrappedJS%3A%3AnsXPCWrappedJS&date=%3E%3D2016-10-
> 11T14%3A36%3A00.000Z&date=%3C2016-10-18T14%3A36%3A00.
> 000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_colum
> ns=platform&_columns=reason&_columns=address&_sort=-date&page=1

Loic: Can we get the user to agree to share his crash dump with Kaspersky?
Flags: needinfo?(epinal99-bugzilla2)

Comment 32

2 years ago
I asked him but I'm not sure if the user will respond me, so feel free to send the report (or not).
Flags: needinfo?(epinal99-bugzilla2)
(In reply to Loic from comment #32)
> I asked him but I'm not sure if the user will respond me, so feel free to
> send the report (or not).

We've already shared the report, which is public and can be accessed by anyone, but we can't share the dump (which might contain personal information) without explicit permission from the user.
Unfortunately the dump is often what the third-party developers need in order to debug the problem.

Comment 34

2 years ago
I've managed to reproduce the issue and have got mini dumps and memory reports from the crash reporter and full dumps from WER. Email me to get a full dump.

It is not an easy one to reproduce. I made a test that simulates a user: navigates, opens/closes tabs (but opens no more than 10 tabs), clicks at random places. Eventually the browser crashes for some reason, OOM is the most common one. But sometimes I get a crash in the same place as in this bug. So it may take a day or two to reproduce it again.

The hash that failed to add an entry looks a little bit weird to me. I found it using xul!nsXPConnect::gSelf->mRuntime->mWrappedJSMap->mTable->impl. It has similar state in all dumps. For example:

[+0x000] gen              : 0x8
[+0x000] hashShift        : 0x15
[+0x00c] entryCount       : 0x3dc
[+0x010] removedCount     : 0x17b

Correct me if I am wrong but given these values
shouldCompressTable() => false
overloaded() => false
so there was no need to rehash and adding an entry should not cause big memory allocations. The number of entries is pretty small too. Why did it fail to add an entry then?


It is possible that we are dealing with a memory leak here, so it is reasonable to collect more memory related data. Is there a way to configure the crash reporter to write verbose and not anonymized memory reports? What else would you recommend to collect?
ni on Andrew to see if he can find someone to answer the questions from Sergei in Comment 34.
Flags: needinfo?(overholt)
Andrew said he could help.
Flags: needinfo?(overholt) → needinfo?(continuation)
Component: XPConnect → XPCOM
Flags: needinfo?(continuation)
(In reply to Sergei from comment #34)
> I've managed to reproduce the issue and have got mini dumps and memory
> reports from the crash reporter and full dumps from WER. Email me to get a
> full dump.

Thanks for looking at this. Could you at least upload the memory report here? The actual URLs probably aren't a big deal, so an anonymized one should be okay.
 
> It is not an easy one to reproduce. I made a test that simulates a user:
> navigates, opens/closes tabs (but opens no more than 10 tabs), clicks at
> random places. Eventually the browser crashes for some reason, OOM is the
> most common one. But sometimes I get a crash in the same place as in this
> bug. So it may take a day or two to reproduce it again.
> 
> The hash that failed to add an entry looks a little bit weird to me. I found
> it using xul!nsXPConnect::gSelf->mRuntime->mWrappedJSMap->mTable->impl. It
> has similar state in all dumps. For example:
> 
> [+0x000] gen              : 0x8
> [+0x000] hashShift        : 0x15
> [+0x00c] entryCount       : 0x3dc
> [+0x010] removedCount     : 0x17b
> 
> Correct me if I am wrong but given these values
> shouldCompressTable() => false
> overloaded() => false
> so there was no need to rehash and adding an entry should not cause big
> memory allocations. The number of entries is pretty small too. Why did it
> fail to add an entry then?

Hmm. Unfortunately I'm not very familiar with the internals of the PLDHashtable. Maybe njn could look into this a bit? It does feel like there might be some kind of bug in PLDHashtable when an allocation fails (like an allocation that doesn't check for failure), but I've read over the code and I wasn't able to find anything.

> It is possible that we are dealing with a memory leak here, so it is
> reasonable to collect more memory related data. Is there a way to configure
> the crash reporter to write verbose and not anonymized memory reports? What
> else would you recommend to collect?

The reports should be verbose, aside from the anonymization. They would at least give us a bit of a sense of what might be going wrong.
Flags: needinfo?(n.nethercote)
> The hash that failed to add an entry looks a little bit weird to me. I found
> it using xul!nsXPConnect::gSelf->mRuntime->mWrappedJSMap->mTable->impl. It
> has similar state in all dumps. For example:
> 
> [+0x000] gen              : 0x8
> [+0x000] hashShift        : 0x15
> [+0x00c] entryCount       : 0x3dc
> [+0x010] removedCount     : 0x17b
> 
> Correct me if I am wrong but given these values
> shouldCompressTable() => false
> overloaded() => false
> so there was no need to rehash and adding an entry should not cause big
> memory allocations. The number of entries is pretty small too. Why did it
> fail to add an entry then?

If hashShift is 0x15 then the capacity should be 1 << (32 - 0x15) which is 2048. The number of used entries is 988 and removed entries is 379. I agree that the table shouldn't resize in this situation.

I don't have full context here... what hash table operation failed? Did a hash table operation lead to OOM? Did you get that from a minidump? If you can email me the minidump I can take a look.


> It is possible that we are dealing with a memory leak here, so it is
> reasonable to collect more memory related data. Is there a way to configure
> the crash reporter to write verbose and not anonymized memory reports? What
> else would you recommend to collect?

You can get non-anonymized memory reports by visiting about:memory and clicking on the "Measure and save..." button.
Flags: needinfo?(n.nethercote) → needinfo?(sergey.kogan)

Comment 39

2 years ago
Sent the full dump to Nicholas.

(In reply to Andrew McCreight [:mccr8] from comment #37)
> Could you at least upload the memory report here?

I uploaded the report (except the full dump), here it is: https://crash-stats.mozilla.com/report/index/22753c66-5511-40a1-9aa1-6c4ec2161123
Flags: needinfo?(sergey.kogan) → needinfo?(n.nethercote)

Comment 40

2 years ago
Reproduced it again on a VM with freshly installed (from MSDN .iso) Windows 7 SP1. Kaspersky was also installed but was not loaded. Kaspersky addon was disabled. There was no other software or addons, just clean Windows and Firefox 50.

The hash is even smaller:

    [+0x000] gen              : 0x4
    [+0x000] hashShift        : 0x16
    [+0x00c] entryCount       : 0x1fb
    [+0x010] removedCount     : 0xef

Full dump (written by the crash reporter this time) and memory report can be downloaded here: https://1drv.ms/u/s!Avfdd6lsYwdUgpZqJktPsq1EINvQaQ

Some observations:
- Running the test on a VM with 2GB of memory gives more stable results. Tried 1GB, 1.5GB but 2GB is better.
- Eventually (after 15-30 minutes) Firefox consumes all available memory and crashes.
- Almost all of the crashes are caused by OOM asserts. Others - by access violations and other asserts, but almost all of them occur in the OOM situation, too.

Comment 41

2 years ago
Set some breakpoints and reproduced the issue again. The actual failure stack looks like this:

00 xul!JS::Zone::getUniqueId
01 xul!js::MovableCellHasher<JSObject *>::ensureHash
02 xul!js::detail::HashTable<...>::lookupForAdd
03 xul!js::HashMap<...>::lookupForAdd
04 xul!JSObject2WrappedJSMap::Add
05 xul!nsXPCWrappedJS::nsXPCWrappedJS

xul!JS::Zone::getUniqueId returned false because xul!JS::Zone::uniqueIds_ failed to add an entry. This hash had the following state in the full dump I posted yesterday:

((xul!nsXPCWrappedJS*)0x2af0e380)->mJSObj->ptr->group_->value->compartment_->zone_->uniqueIds_->impl
    [+0x000] gen              : 0x35
    [+0x000] hashShift        : 0x11
    [+0x00c] entryCount       : 0x5ce8
    [+0x010] removedCount     : 0x318

So it was overloaded and a big allocation was needed to recreate the table and add an entry.
Oh, so that makes more sense: we have an OOM failure inside a JS hashtable, not a PLDHashtable. The JS engine tries to handle OOMs, sometimes, so you can end up with odd crashes in OOM situations. It would be nice to improve that somehow.

The other question is, why are we hitting OOM? Looking at the memory report in the crash dump in comment 39, nothing looks too out of the ordinary to me. It is using about 1.5GB of memory, and there's little system memory left, but the usage is spread out between a number of categories. About 320MB of chrome JS, including 114MB for JS runtime stuff, 320 MB of window stuff, 200MB of heap-unclassified, etc.

I do see that for chrome JS the unique-id-map is 3,145,728, which is fairly large given that it is a hash table. It is certainly believable that you can't grow that when memory usage is at 95%.
(Assignee)

Comment 43

2 years ago
(In reply to Andrew McCreight [:mccr8] from comment #42)
> Oh, so that makes more sense: we have an OOM failure inside a JS hashtable,
> not a PLDHashtable. The JS engine tries to handle OOMs, sometimes, so you
> can end up with odd crashes in OOM situations. It would be nice to improve
> that somehow.

This is crashing because of the release assert here: http://searchfox.org/mozilla-central/source/js/xpconnect/src/XPCWrappedJS.cpp#419

I think we could refactor this to so that this part of initialisation is fallible which should improve things.

From the size of that hash table is sounds like something strange is going on though.

Comment 44

2 years ago
I can share another crash report with full dump and memory report, maybe it can be helpful. Only the Kaspersky addon was installed this time. Download link: https://1drv.ms/u/s!Avfdd6lsYwdUgpZr_d5za1yeOlA9EA

The same unique-id-map that was mentioned in comment 42 has the following size in the reports that I have access to:

- A lot of 3d-party addons (report from comment 39): hash size = 3,145,728
- No 3d-party addons (report from comment 40): hash size = 786,432
- Kaspersky addon only, no other 3d-party addons (report from this comment): hash size = 786,432
Flags: needinfo?(n.nethercote)
(Assignee)

Comment 45

2 years ago
Created attachment 8816489 [details] [diff] [review]
bug1295033-nsXPCWrappedJS-failure

Oh, that constructor is already fallible since it has an out parameter for the return code.  How about setting this if we can't add the wrapper to the map?  Hopefully this will allow the error to get handled somewhere further up the stack rather than crashing.
Attachment #8816489 - Flags: review?(continuation)
Comment on attachment 8816489 [details] [diff] [review]
bug1295033-nsXPCWrappedJS-failure

Review of attachment 8816489 [details] [diff] [review]:
-----------------------------------------------------------------

I guess this is a bit of an improvement. I do see one caller of GetNewOrUsed that doesn't handle failure correctly and will crash with a null deref, nsXPCWrappedJSClass::DelegatedQueryInterface, but the rest look okay.
Attachment #8816489 - Flags: review?(continuation) → review+
(Assignee)

Comment 47

2 years ago
Created attachment 8817262 [details] [diff] [review]
bug1295033-nsXPCWrappedJS-failure v2

Updated patch to check for failure in nsXPCWrappedJSClass::DelegatedQueryInterface.
Attachment #8816489 - Attachment is obsolete: true
Attachment #8817262 - Flags: review?(continuation)
Attachment #8817262 - Flags: review?(continuation) → review+

Comment 48

2 years ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a4417f62daa8
Handle failure in nsXPCWrappedJS constructor r=mccr8

Comment 49

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a4417f62daa8
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox53: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
We might need to wait until this moves to Aurora or even Beta to see the effect. Should we uplift?

Comment 51

2 years ago
i think the crash volume is high enough on aurora to quickly gauge if the fix has an effect on this channel. what do you think about uplifting?
Flags: needinfo?(jcoppeard)
(Assignee)

Comment 52

2 years ago
Comment on attachment 8817262 [details] [diff] [review]
bug1295033-nsXPCWrappedJS-failure v2

Approval Request Comment
[Feature/Bug causing the regression]: The release assert was added in bug 1214961.
[User impact if declined]: Possible crashes.
[Is this code covered by automated tests?]: It's exercised by our tests.  I'm not sure if it's tested specifically.
[Has the fix been verified in Nightly?]: Requesting uplift to check whether it affects crash volumes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Simple change, adds handling of OOM condition.
[String changes made/needed]: None.
Flags: needinfo?(jcoppeard)
Attachment #8817262 - Flags: approval-mozilla-aurora?
(Assignee)

Updated

2 years ago
Assignee: nobody → jcoppeard
Comment on attachment 8817262 [details] [diff] [review]
bug1295033-nsXPCWrappedJS-failure v2

crash fix for aurora52
Attachment #8817262 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Jon Coppeard (:jonco) from comment #54)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/
> ea07499d13936886f4b10455628a2f951798bf9c

setting flags
status-firefox52: affected → fixed
Hi :jonco,
Do you think this is worth uplifting to Beta51?
Flags: needinfo?(jcoppeard)
The last affected Aurora build ID is 20161213004019, but the volume has always been very low so it might still be too early to tell if the crash is fully fixed.

It is the #5 top crasher on release at the moment, so it would be pretty nice to fix it in 51.

https://crash-stats.mozilla.com/search/?signature=%3DnsXPCWrappedJS%3A%3AnsXPCWrappedJS&product=Firefox&version=52.0a2&date=%3E%3D2016-12-06T08%3A31%3A56.000Z&date=%3C2016-12-20T08%3A31%3A56.000Z&_sort=-date&_facets=signature&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id
(Assignee)

Comment 58

2 years ago
Comment on attachment 8817262 [details] [diff] [review]
bug1295033-nsXPCWrappedJS-failure v2

Approval Request Comment
[Feature/Bug causing the regression]: The release assert was added in bug 1214961.
[User impact if declined]: Possible crashes.
[Is this code covered by automated tests?]: It's exercised by our tests.  I'm not sure if it's tested specifically.
[Has the fix been verified in Nightly?]: No crashes on nightly since this landed, but it was low volume anyway.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No, this has been on m-c since 8th December.
[Why is the change risky/not risky?]: Simple change, adds handling of OOM condition.
[String changes made/needed]: None.
Flags: needinfo?(jcoppeard)
Attachment #8817262 - Flags: approval-mozilla-beta?
Comment on attachment 8817262 [details] [diff] [review]
bug1295033-nsXPCWrappedJS-failure v2

Fix a crash. Beta51+. Should be in 51 beta 10.
Attachment #8817262 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Jon, when landing a patch, please update the status flags ;) Thanks
status-firefox48: affected → wontfix
status-firefox49: affected → wontfix
status-firefox50: affected → wontfix
status-firefox51: affected → fixed
status-firefox-esr45: affected → wontfix
No crashes with 51.0b10 so far.
Sergei, could you verify that you aren't able to crash anymore with your setup?
Flags: needinfo?(sergey.kogan)

Comment 64

2 years ago
Sorry for the delay.
Yes I can confirm that Firefox does not crash in my environment anymore. Tested version
https://ftp.mozilla.org/pub/firefox/releases/51.0b12/win32/en-US/Firefox%20Setup%2051.0b12.exe
Ran the test for 4.5 hours on exactly the same machines - no crashes at all.
Flags: needinfo?(sergey.kogan)
status-firefox51: fixed → verified
Thanks Sergei, really appreciated!
Also, thanks for your investigation into this crash that let us figure out how to fix it!
You need to log in before you can comment on or make changes to this bug.