Closed Bug 574904 Opened 14 years ago Closed 7 years ago

Main-process crash [@ LookupNPP] [@ nsJSObjWrapper::GetNewOrUsed(_NPP*, JSContext*, JSObject*) ]

Categories

(Core Graveyard :: Plug-ins, defect)

x86
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jruderman, Unassigned)

References

Details

(Keywords: crash, testcase)

Crash Data

Attachments

(4 files)

Using:
* Firefox trunk (32-bit debug build from Tinderbox)
* Mac OS X 10.6.4
* "Java Plug-In 2 for NPAPI Browsers" Version 13.2.0 (came with Mac OS X)
Attached file crash log
In order to crash you might have to load the testcase from the command line.
Does this reproduce on some other OS, so that we can catch it in recording? I personally don't have snow leopard, so I can't take this if it's mac-only.
This bug has nothing to do with OOPP -- it also happens on OS X 10.5.8
(testing with today's Minefield nightly):

bp-657091e0-0bb1-4893-992c-786f12100704

And it's hard to see any specific connection with Java -- it may just
be that the delays caused by starting up the JVM uncover some other
bug(s).

I can't reproduce this bug (on OS X 10.5.8) in either of today's
Namoroka or Shiretoko nightlies (with either their bundled copies of
the JEP or with Sun/Apple's Java Plugin2).

It'd be nice to have a regression range.
Summary: Main-process crash [@ LookupNPP] with OOPP (Java) → Main-process crash [@ LookupNPP]
> And it's hard to see any specific connection with Java -- it may
> just be that the delays caused by starting up the JVM uncover some
> other bug(s).

The crashes don't happen (on the trunk) using my working version of
what I've been calling the "new JEP".  But I still think this is
likely to be mostly or entirely a Mozilla bug.
The proximate cause appears to be an attempt to deference a null NPObject pointer:

http://hg.mozilla.org/mozilla-central/diff/9b2a99adc05e/modules/plugin/base/src/nsJSNPRuntime.cpp
Another thing:

Jesse's crash seems to be in the main (browser) process -- not in a
plugin-specific process.  So it's odd that JavaVM, JavaPlugin2_NPAPI
and JavaNativeFoundation are all loaded there (in the main process).
Out of process plugins are only invoked for Flash, QuickTime and Silverlight, so that's not unexpected.
I'm pretty sure that Sun's Java Plugin2 is run out-of-process on the Mac (on the trunk).  Though *no* plugins are run out-of-process on OS X 10.5.X (only on 10.6.X).
(Following up comment #7)

I just confirmed (using today's nightly on OS X 10.6.4) that Minefield
does launch applets in a separate process.  But even when you use the
"normal" way to load an applet (e.g. clicking on
http://java.sun.com/applets/jdk/1.4/demo/applets/Clock/example1.html),
JavaVM, JavaPlugin2_NPAPI, and JavaNativeFoundation all get loaded
into the main process (as well as into the applet-specific process).

Still odd ... though not quite as odd as I first thought.
(Following up comment #11)

The same thing happens with Flash (testing with today's Minefield
nightly and Flash 10.1 on OS X 10.6.4) -- "Flash Player" gets loaded
into both the main process and the plugin-specific process.  (Though
"FlashPlayer-10.6" only gets loaded into the plugin-specific process.)

Please tell me whether this a bug or a feature, and whether or not I
(or someone else) should open a bug on it.
Summary: Main-process crash [@ LookupNPP] → Main-process crash [@ LookupNPP] [@ nsJSObjWrapper::GetNewOrUsed(_NPP*, JSContext*, JSObject*) ]
(In reply to comment #13)
I think those were the mine reproducing the bug using the latest nightly. I was not able to reproduce it with a trunk build from early June so it would be useful to find the regression range.

The source of the error appears to be this line returning NULL: 
NPObject *npobj = (NPObject *)::JS_GetPrivate(cx, obj);
http://hg.mozilla.org/releases/mozilla-1.9.2/annotate/c07c6c1b5071/modules/plugin/base/src/nsJSNPRuntime.cpp#l1100

Here is the documentation:
https://developer.mozilla.org/en/SpiderMonkey/JSAPI_Reference/JS_GetPrivate

It appears to be used correctly but warns that new object's private date fields are initially NULL. Presumably these are expected to be set previously.
> I was not able to reproduce it with a trunk build from early June

My results have been different -- I'm now routinely getting crashes
with builds from January, though not from the end of last year (and
should soon have a precise regression range).  But I suspect these
crashes are sensitive to timing issues, and so people may find
different regression ranges on different machines, or on different OS
versions (I've been testing on OS X 10.5.8).
If you are reproducing it as early as January I wonder if this changeset had an impact on it. I will try it as soon as my build finishes.

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/bdc85ebc0587
For what it's worth, here's the regression range I found:

firefox-2010-01-23-03-mozilla-central
firefox-2010-01-24-03-mozilla-central

http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2010-01-23+03%3A00%3A00&enddate=2010-01-24+03%3A00%3A00

Until and unless somebody confirms this, I won't try to guess which patch in this range might have triggered the problem.
Reversing this changeset against trunk fixes the crash.
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/bdc85ebc0587
> Reversing this changeset against trunk fixes the crash.
> http://hg.mozilla.org/releases/mozilla-1.9.2/rev/bdc85ebc0587

Interesting ... but there's nothing inherently wrong with that patch,
that I can see.

Here are some more clues:

Just before every crash I see the following in the Console (though I
also see it with older builds that don't crash):

Process manager already initialized -- can't fully enable headless mode.

Remember that my "new JEP" doesn't crash (even with today's Minefield
nightly).  But it displays error messages which indicate the
following:

1) The browser creates a dummy plugin while loading this bug's URL (as
   if the JavaScript code contained the Packages keyword, or its
   equivalent).

2) The browser calls NPP_SetWindow() and NPP_HandleEvent() on this
   dummy plugin -- which it shouldn't do, and doesn't normally do.
(In reply to comment #19)
> Interesting ... but there's nothing inherently wrong with that patch,
> that I can see.

This patch assumes that npobj is not null and crashes when it is null. Before this patch null would be returned instead. Either this is a bad assumptions or the bug is causing this to be null.
> 1) The browser creates a dummy plugin while loading this bug's URL (as
>    if the JavaScript code contained the Packages keyword, or its
>    equivalent).

I've just confirmed that the browser doesn't normally do this (nsPluginHost::InstantiateDummyJavaPlugin() doesn't normally get called) when loading a "normal" Java applet (e.g. http://java.sun.com/applets/jdk/1.4/demo/applets/Clock/example1.html).
> This patch assumes that npobj is not null and crashes when it is null.

Correct.  But the larger question is should npobj ever be null at this point.

> the bug is causing this to be null

I suspect this is true -- that the underlying bug (whatever it turns out to be) is what causes npobj to be null.
> Process manager already initialized -- can't fully enable headless mode.

Oops, this message is a red herring:  You *always* see it (using Apple/Sun's Java Plugin2) whenever any Java applet is displayed.
Benoit, I know this is off topic, but do you know why what I reported in comment #11 and comment #12 happens, and whether it should happen?
Safari (at least on OS X 10.5.8) doesn't load Java (or crash) when loading this bug's testcase.
(In reply to comment #24)
> Benoit, I know this is off topic, but do you know why what I reported in
> comment #11 and comment #12 happens, and whether it should happen?

I can confirm that JavaPlugin2 is supposed to be loaded out-of-process. 

I wasn't sure on what you meant by gets loaded in the main process. Are you basing this from logging? Is this distinct from PluginModuleParent that doesn't load the plug-in itself but manages the plug-in process? I wanted to clarify on IRC but was not sure what your nick was.
Here's how I found out which modules are loaded in the main
process and in plugin-specific processes:

1) Run Minefield (a recent nightly, like today's) on OS X 10.6.4.

2) Load a Java applet
   (e.g. http://java.sun.com/applets/jdk/1.4/demo/applets/Clock/example1.html)
   or a Flash "movie"
   (e.g. http://www.playercore.com/bugfiles/146162/AddReturnHTML.html).

3) Find the pids of the main ("firefox-bin") process and the
   plugin-specific process(es).

4) Attach with gdb to each process:

   gdb [firefox-pid] firefox
   gdb [plugin-pid] plugin

5) In each gdb session run "info sharedlibrary".
> I wanted to clarify on IRC but was not sure what your nick was.

Since I've mostly been working on my new JEP for the last few months,
I've mostly stayed off IRC :-)
Thanks for the steps, I can confirm this as well. I will look into this tomorrow to see what needs the plug-in loaded in the main process. I think we might use it at one place to check version number.
(Following up comment #17)

I've confirmed this regression range on OS X 10.6.4 (I'd previously tested on OS X 10.5.8).  Both tests were on the same machine (a 2.8Ghz 8-core MacPro3,1, about 2.5 years old).
Jesse, please try to confirm my regression range from comment #17.
(Following up comment #21)

>> 1) The browser creates a dummy plugin while loading this bug's URL
>>    (as if the JavaScript code contained the Packages keyword, or
>>    its equivalent).
>
> I've just confirmed that the browser doesn't normally do this
> (nsPluginHost::InstantiateDummyJavaPlugin() doesn't normally get
> called) when loading a "normal" Java applet (e.g.
> http://java.sun.com/applets/jdk/1.4/demo/applets/Clock/example1.html).

I've just found out that nsPluginHost::InstantiateDummyJavaPlugin()
*doesn't* get called when this bug's testcase is loaded.  So the
browser *doesn't* create a dummy Java plugin (which it shouldn't be
doing anyway).

For complicated reasons, my new JEP just assumed this was the case.

I'll keep digging.
(In further reply to comment #20)

I've now come around to your point of view, Benoit.  Maybe it's best
just to fix the following line in nsJSObjWrapper::GetNewOrUsed():

http://hg.mozilla.org/mozilla-central/diff/91fbd4442e60/modules/plugin/base/src/nsJSNPRuntime.cpp

The fix would be something like:

  if (npobj && (LookupNPP(npobj) == npp))

This change makes the crash go away.  And it seems to be the correct
thing to do if npobj is null.

It still leaves the problem of why npobj is null in the first place.
But maybe that isn't so abnormal as I first thought.  Others (who know
more than I do about JavaScript) will have to judge, I'm afraid.
Attached patch Tentative FixSplinter Review
Here is the patch I had prepared yesterday.

This patch restores the semantics before changeset bdc85ebc0587 (returning null when npobj is null). I'm not sure what the intended semantics are as I'm not familiar with this code. Perhaps your changes better reflect what the semantics should of been.
Comment on attachment 456066 [details] [diff] [review]
Tentative Fix

Can you take a look at the following patch to see if we want a similar fix? Comment 18 and Comment 33 are related to this patch. It was committed part of a bug I don't have read access to.
Attachment #456066 - Flags: review?(benjamin)
npobj being null is definitely not good. Have you guys ever seen this on Windows, or does this problem only show up on mac? If we could catch this in recording, we could probably figure out why npobj is null.

If you're very dedicated, you might be able to use valgrind and chronicle/chronomancer to do the same basic record steps. I'm not sure whether that actually works on mac, but it probably could.
Comment on attachment 456066 [details] [diff] [review]
Tentative Fix

I won't take this without an assertion. I'll take it with an assertion, but I really want to figure out what's actually going on, especially if we have a testcase.
Attachment #456066 - Flags: review?(benjamin) → review-
FWIW, I was writing a browser-chrome mochitest for a different bug and
bumped in to this crash (I think).  Using the x-test plugin only.
It crashes a current mozilla-central Linux64 debug build.  100% reproducible.
Mats, could you only reproduce on Mac, or on other platforms?
Oh, linux64. I tried to use the steps the test seems to be doing on Windows in recording, and I couldn't reproduce. Hrm.
Yeah, it was Linux64 and it was probably a --disable-ipc build when I think
about it.  I'll test other platforms/configurations and report back...
bsmedberg, the browser-chrome test I attached crashes all my debug builds:
Linux x86-64, with --disable-ipc
Linux x86-64, with  --enable-ipc
Windows XP 32-bit, with --enable-ipc
MacOSX 10.5.8 32-bit, with --disable-ipc
Crash Signature: [@ LookupNPP] [@ nsJSObjWrapper::GetNewOrUsed(_NPP*, JSContext*, JSObject*) ]
Crash Signature: [@ LookupNPP] [@ nsJSObjWrapper::GetNewOrUsed(_NPP*, JSContext*, JSObject*) ] → [@ LookupNPP] [@ nsJSObjWrapper::GetNewOrUsed(_NPP*, JSContext*, JSObject*) ] [@ nsJSObjWrapper::GetNewOrUsed ]
I'm marking this bug as WORKSFORME as bug crashlog signature didn't appear from a long time (over half year) [except some obsolete <43 versions, no crashes starting since 43 version].
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: