Closed Bug 1308800 Opened 8 years ago Closed 7 years ago

Crash [@ js::IsDerivedProxyObject ]

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

51 Branch
Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
Tracking Status
platform-rel --- ?
firefox49 --- wontfix
firefox-esr45 --- affected
firefox50 --- wontfix
firefox51 + fixed
firefox52 + fixed
firefox53 + fixed

People

(Reporter: esthermonchari, Unassigned)

References

Details

(4 keywords, Whiteboard: [platform-rel-Facebook])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-14f4c0a6-629a-418b-8d79-62f072160925.
=============================================================

Seen while looking at nightly crash stats. crash started with build id 20160925030226 on 2016-09-25 22:04:16 and still continuing.

adding ni on:Jon maybe help figure out if i placed that on the right component and if you have some ideas on what might be causing the crash.
Flags: needinfo?(jcoppeard)
I don't really understand this.  I can't see how we can go from nsDOMCSSDeclaration::RemoveProperty to js::IsDerivedProxyObject, but I guess it's possible there is a lot of inlining going on.

Maybe someone in layout would be better placed to look at this?
Component: JavaScript: Standard Library → Layout
Flags: needinfo?(jcoppeard)
There *is* a static call chain that would get you between those, but it seems unlikely that it would be inlined all the way through:

uint32 nsDOMCSSDeclaration::RemoveProperty(int32)
uint32 nsDOMCSSAttributeDeclaration::SetCSSDeclaration(mozilla::css::Declaration*)
uint32 nsStyledElementNotElementCSSInlineStyle::SetInlineStyleDeclaration(mozilla::css::Declaration*, nsAString_internal*, uint8)
uint32 mozilla::dom::Element::SetAttrAndNotify(int32, nsIAtom*, nsIAtom*, nsAttrValue*, nsAttrValue*, uint8, uint8, uint8, uint8)
void nsDocument::EnqueueLifecycleCallback(uint32, mozilla::dom::Element*, mozilla::dom::LifecycleCallbackArgs*, mozilla::dom::CustomElementDefinition*)
void nsContentUtils::AddScriptRunner(nsIRunnable*)
void nsContentUtils::AddScriptRunner(struct already_AddRefed<nsIRunnable>)
uint32 WindowDestroyedEvent::Run()
uint8 js::NukeCrossCompartmentWrappers(JSContext*, js::CompartmentFilter*, js::CompartmentFilter*, uint32)
void js::NukeCrossCompartmentWrapper(JSContext*, JSObject*)
uint8 js::IsDeadProxyObject(JSObject*)
uint8 js::IsDerivedProxyObject(JSObject*, js::BaseProxyHandler*)
Severity: normal → critical
Component: Layout → DOM: CSS Object Model
Boris, I don't know who else to ask to investigate here.
Flags: needinfo?(bzbarsky)
There is no way the call chain in comment 2 is actually being inlined all the way through.

Not only that, but EnqueueLifecycleCallback calling AddScriptRunner can't trigger WindowDestroyedEvent::Run.  It can only trigger Run() on the runnable EnqueueLifecycleCallback passes in, which is not a WindowDestroyedEvent.

I looked at crash-stats crashes in js::IsDerivedProxyObject, and most of them seem to be calling it from js::jit::GetPropertyIC::tryAttachStub, which is a much more sane claim.  Specifically, I looked at the first 50 of the 168 reports we have in the past week claiming a crash in js::IsDerivedProxyObject (across all branches), and I found:

* 1 report (<https://crash-stats.mozilla.com/report/index/f15fb217-9b62-4c53-a301-a361f2161011>) which only has js::IsDerivedProxyObject on the stack.
* 1 report (<https://crash-stats.mozilla.com/report/index/0ccfb967-9518-4b59-bed8-904fd2161011>) which claims CSS2PropertiesBinding::set_maxWidth called js::IsDerivedProxyObject.
* 1 report (<https://crash-stats.mozilla.com/report/index/9617af00-5d39-46da-8504-691af2161010>) which claims js::jit::GetPropertyIC::tryAttachModuleNamespace called js::IsDerivedProxyObject.  Note that the tryAttachStub callsites all actually point to the line where tryAttachStub calls tryAttachModuleNamespace, so this is really the tryAttachStub thing.
* 47 reports (e.g. <https://crash-stats.mozilla.com/report/index/493b1955-5064-44cd-8be4-f00b42161011>) which claim calls from tryAttachStub.

I tried looking at branches more specifically.  I see 11 hits in the past week on 52.0a1.  All have the tryAttachStub/tryAttachModuleNamespace callstack.

I will now claim that the CSS2Properties things are at the very least much lower frequency, and more likely simply bogus stacks, and what needs fixing here is the tryAttachStub path.
Component: DOM: CSS Object Model → JavaScript Engine: JIT
Flags: needinfo?(bzbarsky) → needinfo?(jdemooij)
Oh, the two stacks I saw so far that show us coming through tryAttachModuleNamespace claim we come through here:

    if (!obj->is<ModuleNamespaceObject>())
        return true;

and are crashing under that first line with the is() call, and that this line calls js::IsDerivedProxyObject.  That seems plausible, given:

  JSObject::is<js::ModuleNamespaceObject>() const
  {
      return js::IsDerivedProxyObject(this, &js::ModuleNamespaceObject::proxyHandler);
  }

If the crash address can be trusted, by the way, these are all null-pointer crashes.  I don't know how much I would trust it, since I don't see how obj can be null in tryAttachStub there.....
I looked at one of these crashes in the debugger, it seems we pass a nullptr |obj| to GetPropertyIC::update, and tryAttachModuleNamespace is simply the first place where we dereference this object.

It's interesting that most URLs are for Facebook's "photo theater view" (viewing a single image almost full-screen). These URLs look like this:

  https://www.facebook.com/photo.php?fbid=...&set=...&type=3&theater 

I tried some of these URLs on Windows but no crashes. Maybe QA can do more testing here.
Crash volume for signature 'js::IsDerivedProxyObject':
 - nightly (version 52): 56 crashes from 2016-09-19.
 - aurora  (version 51): 317 crashes from 2016-09-19.
 - beta    (version 50): 89 crashes from 2016-09-20.
 - release (version 49): 203 crashes from 2016-09-05.
 - esr     (version 45): 4 crashes from 2016-07-25.

Crash volume on the last weeks (Week N is from 10-17 to 10-23):
            W. N-1  W. N-2  W. N-3  W. N-4
 - nightly      12       8      19      13
 - aurora      105      76      95      18
 - beta         22      25      26      10
 - release      53      52      56      20
 - esr           0       0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly #127      #69
 - aurora  #33       #12
 - beta    #857      #390
 - release #1292     #492
 - esr
Hi Kanru,
This becomes worse in 51.0b1. Can you help take a look at this one?
Flags: needinfo?(kchen)
The correlation tab for 52 shows:

(35.29% in signature vs 08.13% overall) "DXVA2D3D9+" in app_notes = true
(35.29% in signature vs 08.87% overall) "DXVA2D3D9?" in app_notes = true
(32.35% in signature vs 06.89% overall) "DXVA2D3D11?" in app_notes = true
(29.41% in signature vs 06.05% overall) "DXVA2D3D11+" in app_notes = true

And for 51

(98.70% in signature vs 06.41% overall) address = 0x0
(100.0% in signature vs 20.19% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ
(07.98% in signature vs 62.64% overall) submitted_from_infobar = true
(57.00% in signature vs 15.68% overall) os_arch = amd64
(100.0% in signature vs 66.61% overall) plugin = false
(100.0% in signature vs 66.61% overall) plugin_version = null
(100.0% in signature vs 78.46% overall) shutdown_progress = null
(43.00% in signature vs 21.22% overall) os_arch = x86
(37.30% in signature vs 16.60% overall) "DXVA2D3D9?" in app_notes = true
(100.0% in signature vs 83.23% overall) ipc_channel_error = null
(34.69% in signature vs 14.63% overall) "DXVA2D3D9+" in app_notes = true
(04.56% in signature vs 37.15% overall) startup_crash = null ∧ submitted_from_infobar = true
(54.56% in signature vs 21.66% overall) "D3D11 Layers+" in app_notes = true ∧ startup_crash = 0
(54.56% in signature vs 21.66% overall) startup_crash = 0 ∧ "D3D11 Layers?" in app_notes = true
(02.28% in signature vs 31.16% overall) submitted_from_infobar = true ∧ "D3D11 Layers?" in app_notes = true
(02.28% in signature vs 31.16% overall) "D3D11 Layers+" in app_notes = true ∧ submitted_from_infobar = true
(03.42% in signature vs 25.46% overall) submitted_from_infobar = true ∧ startup_crash = 0
(04.56% in signature vs 26.69% overall) dom_ipc_enabled = 1 ∧ submitted_from_infobar = true
(20.03% in signature vs 03.43% overall) platform_version = 10.0.14393 ∧ os_arch = amd64

Looks like it's somehow more reproducible on DXVA2D3D9 and DXVA2D3D11 enabled platform.
some of the module correlations on beta also indicate that this could be related to media-playback:

  js::IsDerivedProxyObject|EXCEPTION_ACCESS_VIOLATION_READ (267 crashes)
     97% (259/267) vs.  35% (18271/51599) mozavcodec.dll
     97% (259/267) vs.  35% (18272/51599) mozavutil.dll
     89% (238/267) vs.  32% (16616/51599) msmpeg2vdec.dll
     89% (238/267) vs.  33% (16849/51599) mf.dll
     89% (238/267) vs.  33% (16861/51599) evr.dll
     89% (238/267) vs.  33% (17114/51599) dxva2.dll
     89% (238/267) vs.  33% (17224/51599) mfplat.dll
     87% (231/267) vs.  33% (17222/51599) avrt.dll
     82% (220/267) vs.  29% (15113/51599) dxgi.dll
     93% (248/267) vs.  41% (21070/51599) NapiNSP.dll
     93% (248/267) vs.  41% (21075/51599) nlaapi.dll
also adding the regression keyword, since the signature is spiking since 51 builds...
Keywords: regression
Version: 52 Branch → 51 Branch
Track 51+ as there is a spike in 51.
Hi Blake, 
Can you help shed some light here??
Flags: needinfo?(bwu)
Blake, the recent spike of this signature maybe related to a media-playback regression. Does comment 9 and comment 10 ring any bells?
My team is not very familiar with window platform yet. 
Redirect ni to the expert, Chris. :-)
Flags: needinfo?(bwu) → needinfo?(cpearce)
Hi Anthony,
Can you help find someone to look at this if cpearce is not around since this crash is getting worse on Beta51?
Flags: needinfo?(ajones)
Hi Jean-Yves, Gerald, Jandem, we need a bit of help on this one. It's been a top crasher for a while and hoping we can get some sort of resolution soon.
Flags: needinfo?(jyavenard)
Flags: needinfo?(gsquelart)
As noted above, the majority of crash reports have URLs which are the facebook photo viewer. So I don't think that this is necessarily media related. The fact that video is or has been active when people visit facebook is not at all surprising; every third entry in my facebook feed is a video.
Flags: needinfo?(cpearce)
It's impossible to say much here. If we get a nullptr JSObject* it could be some (DOM) native returning a bogus value, it could be a regalloc bug, it could be a GC bug, it could be a TI bug, etc.

I'll try to repro this again, but it would be great if we could get some QA testing as well for the Facebook photo viewer. Do we have Facebook contacts?
I'm going to assume this is not a video issue for now.
Flags: needinfo?(jyavenard)
Flags: needinfo?(gsquelart)
Flags: needinfo?(ajones)
Tracking 52/53+ so we can address this crash. ni on Andrei to see if this is something (Facebook photo viewer) that SV can try to reproduce.
Flags: needinfo?(andrei.vaida)
I tried to reproduce the issue on Windows 10 x64 & x86, Windows 8.1 x64 & x86, Windows 7 x64 & x86, using  latest Nightly 53.0a1 (2017-01-03), latest Aurora 52.0a2 (2017-01-03) and 51.0b10 (20161222080852), focusing on facebook photo viewer and video player and also on other popular media websites but with no luck.
Flags: needinfo?(andrei.vaida)
This spike in crashes in 51 (and later) is nearing release.  Not all the crashes are nullptrs.  I agree with Anthony there's no indication this is media related.
Given the severity of this crash, we'd like to track this a s release blocking for Fx51. I hope we have a fix/mitigation in place soon that does not jeopardize the 51 go live date.
(In reply to [:philipp] from comment #28)
> so maybe there are some clues about the cause of these crashes in this
> regression range...

The main thing in that regression range is Shu's frontend rewrite, bug 1263355. Unfortunately that patch is so huge that (1) backout is impossible for sure, (2) we're not going to find this by looking at the patch. Bug 1324773 is a regression from bug 1263355 that still has to land on beta, but the odds of it being related are small.

As I said before, most of these crashes are on Facebook's photo viewer, with URLs that look like this:

  https://www.facebook.com/photo.php?fbid=...&set=...&type=3&theater 

I know we have contacts at Facebook. As I said 2 weeks ago, we need to get them involved - maybe they have better ways to test this or they can tell us what's special about that page. Who can work on that?

I'll try to reproduce this today, but it has been impossible so far.
Dees, see comment 30.
Flags: needinfo?(dchinniah)
(In reply to Marco Castelluccio [:marco] from comment #31)
> Dees, see comment 30.

I've just added both :marco and :jandem to the Facebook DL. Please use that channel of communication and start a new thread. They are super responsive.
Flags: needinfo?(dchinniah)
platform-rel: --- → ?
Whiteboard: [platform-rel-Facebook]
(In reply to Desigan Chinniah [:cyberdees] [:dees] [London - GMT] from comment #32)
> I've just added both :marco and :jandem to the Facebook DL. Please use that
> channel of communication and start a new thread. They are super responsive.

I didn't get any emails from that. Could you please email me the details?
Flags: needinfo?(dchinniah)
Oh nevermind, I found it on the Google Groups page.
Flags: needinfo?(dchinniah)
In case this helps, I hit this crash on my win10 laptop a few days back while looking at FB photos and zooming in/out. This was my comment "Crashed when zooming in using Ctrl+ shortcut key on FB feed." and the attached crash report is from my laptop from Jan 2nd. 

Please lemme know if I can help in anyway. I will try to spend more time on FB (now officially) to repro this :)
Crashes with the signature "js::SetLengthProperty" are also often happening with the Facebook photo viewer.
Crash Signature: [@ js::IsDerivedProxyObject ] → [@ js::IsDerivedProxyObject ] [@ js::SetLengthProperty ]
A lot of folks at Mozilla have been trying to repro this crash including me and we've had no luck so far. Given that, I am removing the release blocking tag from this bug. The crash volume is still a bit worrisome but not worth blocking the release on.
Ran into the crash again! lots of ctrl+, ctrl-, scrolling up and down on the FB feed. I noticed that this is harder to repro with ctrl+mouse scroll but rather repros with ctrl+, ctrl-

https://crash-stats.mozilla.com/report/index/d3349291-32f0-4af3-a44b-316182170111
I was at zoom level 50% when it crashes and was changing zoom levels from 30% - 120% while going in & out of photo viewer mode. Doing all of these in rapid succession while scrolling up and down on my FB feed :)
(In reply to Ritu Kothari (:ritu) from comment #40)
> Ran into the crash again! lots of ctrl+, ctrl-, scrolling up and down on the
> FB feed. I noticed that this is harder to repro with ctrl+mouse scroll but
> rather repros with ctrl+, ctrl-
> 
> https://crash-stats.mozilla.com/report/index/d3349291-32f0-4af3-a44b-
> 316182170111

Are you doing Ctrl+/- from within the photo theater view or on the feed page itself?
This time (3rd time) I hit it while rapidly clicking on -> key in FB photo viewer, zoom level at 30% https://crash-stats.mozilla.com/report/index/2582ca3a-b7de-4316-a08e-3f7df2170111
(In reply to Shu-yu Guo [:shu] from comment #42)
> (In reply to Ritu Kothari (:ritu) from comment #40)
> > Ran into the crash again! lots of ctrl+, ctrl-, scrolling up and down on the
> > FB feed. I noticed that this is harder to repro with ctrl+mouse scroll but
> > rather repros with ctrl+, ctrl-
> > 
> > https://crash-stats.mozilla.com/report/index/d3349291-32f0-4af3-a44b-
> > 316182170111
> 
> Are you doing Ctrl+/- from within the photo theater view or on the feed page
> itself?

In both, so zoom in/out while scrolling my fb feed and zoom in/out while going back/forward using <-/-> key on my fb photos
Can we annotate the crash dump with the filename/lineno of the JS script? That way maybe the Facebook people could work around this somehow.

Maybe in GetPropertyIC::update we could call CrashReporter::AnnotateCrashReport with the filename/lineno? That function isn't very fast, but we would only have to leave it in for a few days. We would just have to add a JS engine hook for AnnotateCrashReport.
We've talked in the past about getting the profiler pseudostack into the crash reports since it's likely to not be corrupted. With the changes to how the JS profiling information is stored it's more complicated now since you have to call to the JS engine to resolve the JS IP. Shu would be a good person to ask but it may be really risky to do from a crash handler with the newer architecture.
In bug 1295918 we plan to retrieve content JS stacks from the parent process. But it might be too late for this bug.

(In reply to Bill McCloskey (:billm) from comment #45)
> Can we annotate the crash dump with the filename/lineno of the JS script?
> That way maybe the Facebook people could work around this somehow.
> 
> Maybe in GetPropertyIC::update we could call
> CrashReporter::AnnotateCrashReport with the filename/lineno? That function
> isn't very fast, but we would only have to leave it in for a few days. We
> would just have to add a JS engine hook for AnnotateCrashReport.

I don't know whit is GetPropertyIC::update. Will annotating in GetPropertyIC::update cause significant slow down? Maybe we can record the filename/lineno when we enter JS engine.
Flags: needinfo?(kchen)
Since ritu can repro... would it help to have her run under GDB and dump the JS stack on a crash?  Or in general give another engineer access to drive GDB (or MSVC) once she has a crash?
(In reply to Randell Jesup [:jesup] from comment #48)
> Since ritu can repro... would it help to have her run under GDB and dump the
> JS stack on a crash?  Or in general give another engineer access to drive
> GDB (or MSVC) once she has a crash?

I'm heading to MV tomorrow to take a look with her.
I spent a few hours with Ritu trying to reproduce the crash again. She was unable to reproduce again with the crashing build from comment 40 after an hour or two. :(
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #47)
> In bug 1295918 we plan to retrieve content JS stacks from the parent
> process. But it might be too late for this bug.
> 
> (In reply to Bill McCloskey (:billm) from comment #45)
> > Can we annotate the crash dump with the filename/lineno of the JS script?
> > That way maybe the Facebook people could work around this somehow.
> > 
> > Maybe in GetPropertyIC::update we could call
> > CrashReporter::AnnotateCrashReport with the filename/lineno? That function
> > isn't very fast, but we would only have to leave it in for a few days. We
> > would just have to add a JS engine hook for AnnotateCrashReport.
> 
> I don't know whit is GetPropertyIC::update.

It's a function that's higher up in the crash stack.

> Will annotating in
> GetPropertyIC::update cause significant slow down?

Quite possibly. We could use a faster annotation, but I'm not sure it's worth it. The crash annotation function iterates over a hashtable with maybe 100 things and builds a string out of them. I don't have a sense of how often we update GetProp ICs, but this seems like a tolerable cost for a few days on Nightly.

> Maybe we can record the
> filename/lineno when we enter JS engine.

That could be very far away from where the problem occurs.
I have pretty reliable STR now and I have an rr recording.
Depends on: 1331058
We have a patch for this.

Let's wait for confirmation from crash-stats before we close this.
Flags: needinfo?(jdemooij)
Priority: -- → P1
Fixed in bug 1331058.
Verified the crash signature dropped to zero in last 7 days.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.