Closed Bug 768651 Opened 12 years ago Closed 11 years ago

Firefox started with "cfx run" hangs for some URLs: Windows only

Categories

(Add-on SDK Graveyard :: General, defect, P1)

x86
Windows 7
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla25

People

(Reporter: wbamberg, Assigned: gkrizsanits)

References

Details

Attachments

(1 file)

Reported in the forum by Konrad Gorski: https://forums.mozilla.org/addons/viewtopic.php?f=27&t=10904&p=22901#p22901

1) On Windows 7, create a minimal add-on, for example:

const widgets = require("widget");
const tabs = require("tabs");

var widget = widgets.Widget({
  id: "mozilla-link",
  label: "Mozilla website",
  contentURL: "http://www.mozilla.org/favicon.ico",
  onClick: function() {
    tabs.open("http://www.mozilla.org/");
  }
});

console.log("The add-on is running.");

2) Run it using "cfx run", then navigate to certain URLs, for example:

http://www.ebay.com/ctg/Nikon-D5100-162-MP-Digital-SLR-Camera-Black-Kit-w-AFS-1855mm-VR-Lens-/101827356?_pcatid=782&_pcategid=31388&_dmpt=Digital_Cameras&_dashexp=1

http://www.ebay.com/itm/COBRA-ESD-9275-Digital-6-Band-Laser-Radar-Detector-w-Safety-Alert-LaserEye-/350560167708?pt=LH_DefaultDomain_0&hash=item519f03a71c

3) Firefox will hang.

If instead you:

1) build the add-on with "cfx xpi"
2) run Firefox
3) install the add-on
4) visit those pages

...then everything's fine.
As a guess, one of the SDK's default preferences that change Firefox's defaults for cfx run/test?
(In reply to Wes Kocher (:KWierso) from comment #1)
> As a guess, one of the SDK's default preferences that change Firefox's
> defaults for cfx run/test?

I've tested with all of the preferences that the SDK sets commented out, and this still happens. I've tested with all of the environment variables that the SDK sets commented out, and this still happens. The last post in the forum thread, which suggests that a Windows update might be causing this, seems plausible.
I think if you set hangmonitor.timeout in about:config to a number of seconds then after that a crash report will be submitted. That could at least give us a stack trace so we know what is hanging here.
Looks like it's hanging on plugin IPC stuff. I imagine disabling plugins would make the problem go away. Might need some of the plugin guys to look at this.
Maybe Eddy might have some quick thoughts here
(In reply to Dave Townsend (:Mossop) from comment #6)
> Maybe Eddy might have some quick thoughts here

nsNPAPIPlugin::CreatePlugin is returning with error code NS_ERROR_OUT_OF_MEMORY (see thread 0, stackframe 14). That looks very suspicious.
Wouldn't it be related to bug 771847? Is firefox crashing in bug 771847? If yes, we may have to tweak our test runner in order to retrieve the crash report somehow!!
Assignee: nobody → ejpbruel
Whiteboard: [triage:followup]
Priority: -- → P2
Whiteboard: [triage:followup]
I have the same problem on FF 14.0.1 on Win7.
I turned off all plugins, and now I can develop without a problem.
I can reproduce this, the problem is specifically with the flash plugin (so far I've seen crashes on last.fm and amazon.com/cloudplayer). Disabling flash makes the problem go away.
I'm seeing this now with Firefox 19.0.2 under "cfx" on Windows 7 Pro. Firefox hangs on pages which have Flash. Pages without Flash work fine. One or two copies of "plugin-container.exe" are running. Firefox does not respond to input, the cursor does not change on mouse-over, and the Firefox window cannot be closed. Killing the "plugin-container.exe" processes will sometimes get Firefox going again.
Tried opening the error console, then loading a page with Flash with the browser running under "cfx":

Errors:

Timestamp: 3/22/2013 4:46:55 PM
Warning: ReferenceError: reference to undefined property aEvent.button
Source File: chrome://browser/content/browser.js
Line: 8893

Timestamp: 3/22/2013 4:46:55 PM
Warning: ReferenceError: reference to undefined property e.button
Source File: chrome://browser/content/utilityOverlay.js
Line: 148

Running the same version of Firefox, not under "cfx", works fine for the same pages.
The above is with Windows 7 Pro, Firefox 19.0.2, Flash 11.6.602.180, SDK 1.13.2. So it's broken with the latest and greatest versions of everything.
cfx run seems flaky, I wonder if cfx.js will help us here eventually. My workaround is just to use Wladimir's add-on: https://addons.mozilla.org/en-US/firefox/addon/autoinstaller/ In my mind the class of problems you might run into because you aren't using a clean Firefox profile are exceedingly rare.
Assignee: ejpbruel → gkrizsanits
Here is what happens:

- nsNPAPIPlugin::CreatePlugin sends out an RPC init request for the Flash plugin.
- Flash returns an error code from NP_Initialize (error code value: 1).
- (Regardless of the error code, on Windows we send out an async SendSetAudioSessionData right after. I turned this off and it did not change a thing.)
- PPluginModuleParent::CallNP_Shutdown gets called because of the error code, and an RPC NP_Shutdown request is sent out.
- PluginModuleChild::AnswerNP_Shutdown (this is the plugin-container process) calls mShutdownFunc(), which is a function in the Flash plugin itself. It never returns, so our main thread in the Firefox process stays blocked waiting for the response to the RPC command...

Problems:

1. I don't think it's a good idea that the shutdown part is RPC... can we do this part async? Or can we use some kind of timeout and shut it down forcefully after a while (I know this sounds terrible too...)? Letting the plugin-container process deadlock our main thread this way is quite bad. Is there a chance we can fix this somehow? I know how difficult it is to give a good answer to this... I'm just desperately trying to figure out a workaround, since this bug is hurting add-on developers a lot. Any smart hack would be great... (The real solution would be to never block the main thread of the Firefox process, I guess, but that has been on everyone's wish list for a long time and is very hard in practice, right?)

2. If we could figure out why the Flash plugin fails to init in this setup, that would be great. I've been playing with procmon to find the root of the failure, and changing things like the cwd of the Firefox process we start up with cfx, but no luck. Should we try to assemble a minimal example and send it to the Flash plugin developers?
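The "timeout, then shut it down forcefully" idea from problem 1 can be illustrated outside of Gecko. The following is only a minimal Python sketch, not the IPC code Firefox uses: a child process stands in for the plugin-container, and waiting for it to exit stands in for the blocking RPC call. The function name `call_with_timeout` and the two demo commands are made up for illustration.

```python
import subprocess
import sys


def call_with_timeout(cmd, timeout_s):
    """Run cmd and wait for it, but never longer than timeout_s seconds.

    Returns the exit code, or None if the child had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child did not answer in time: stop waiting and kill it,
        # instead of blocking the caller forever.
        proc.kill()
        proc.wait()  # reap the killed child
        return None


# Hypothetical stand-ins: one child that never answers, one that returns.
HANGING_CHILD = [sys.executable, "-c", "import time; time.sleep(60)"]
HEALTHY_CHILD = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print(call_with_timeout(HEALTHY_CHILD, 10))
    print(call_with_timeout(HANGING_CHILD, 0.5))
```

The caller still blocks for up to `timeout_s` seconds, so a timeout bounds the hang rather than removing it.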
Flags: needinfo?(benjamin)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #16)
> Here is what happens:
>
> - nsNPAPIPlugin::CreatePlugin sends out an rpc init request for the flash
> plugin
> - flash returns an error code from the NP_Initialize (error code value: 1)
> (- regardless of the error code on windows we send out an async
> SendSetAudioSessionData right after, but I turned this off and did not
> change a thing)
> - PPluginModuleParent::CallNP_Shutdown gets called because of the error
> code, an rpc NP_Shutdown request is sent out

This is a bug. If NP_Initialize fails, NP_Shutdown should not be called. Please file a separate bug for that.

> 1., I don't think it's a good idea that the shutdown part is rpc... can we
> do this part async?

Not really, no.

> Or can we use some kind of timeout and shut it down
> forcefully after a while (I know this sounds terrible too...)?

We already have a timeout for RPC calls; 45 seconds in release builds and infinite in debug builds. It should also show the plugin hang UI on Windows.

> 2., If we could figure out why the flash plugin fails to init in this setup
> that would be great. I've been trying to playing with procmon to figure out
> the root of the failure, and changing stuff like the cwd of the firefox
> process we start up with cfx, but no luck. Should we try and assemble a
> minimal example, and it to the flash plugin developers?

It is unlikely that this will get attention from Adobe. It sounds like your test runner is setting some environment variable or other setup which causes Flash to fail. You can of course just debug the Flash player at the point we call NP_Initialize to see if you can figure out what's going on.
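The rule stated above (no NP_Shutdown after a failed NP_Initialize) amounts to tracking whether initialization succeeded. Here is a rough Python sketch of that guard. It is not the Gecko code, which is C++; the class `PluginModule` and its methods are hypothetical. It assumes the NPAPI convention that an NP_Initialize result of 0 means success, with 1 being the error value seen in this bug.

```python
NPERR_NO_ERROR = 0  # NPAPI success code (assumed convention, see lead-in)


class PluginModule:
    """Hypothetical wrapper that only shuts down what it initialized."""

    def __init__(self, init_func, shutdown_func):
        self._init_func = init_func
        self._shutdown_func = shutdown_func
        self._initialized = False

    def initialize(self):
        error = self._init_func()
        # Remember success only; a failed init leaves nothing to shut down.
        self._initialized = (error == NPERR_NO_ERROR)
        return error

    def shutdown(self):
        """Call the plugin's shutdown function only after a successful init.

        Returns True if the shutdown function was actually called.
        """
        if not self._initialized:
            return False
        self._initialized = False
        self._shutdown_func()
        return True
```

With this guard, a plugin whose init returns 1 never has its shutdown function entered, so a shutdown function that never returns cannot block the caller.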
Flags: needinfo?(benjamin)
Thanks for all the useful info.

(In reply to Benjamin Smedberg [:bsmedberg] from comment #17)
> This is a bug. If NP_Initialize fails, NP_Shutdown should not be called.
> Please file a separate bug for that.

Bug 889480.

> We already have a timeout for RPC calls; 45 seconds in release builds and
> infinite in debug builds. It should also show the plugin hang UI on Windows.

Alright, I don't think I have anything better than that...

> It is unlikely that this will get attention from Adobe.

I was afraid you were going to say this :)

> You can of course just debug the Flash player at the point we
> call NP_Initialize to see if you can figure out what's going on.

I wish I had a debug version of the Flash player, or the source code... I feel like I'm poking a black box with a stick and hoping that it starts to work by accident...
How much of a win would it be if, instead of hanging on sites for a very long time, the Flash plugin simply failed to load in this setup? Hopefully we can figure out why Flash fails to init (and I think we should); I'm just trying to estimate how much we would gain if Bug 889480 were fixed.
Flags: needinfo?(dtownsend+bugmail)
cfx is a python tool, right? The slightly tedious way to do this is to launch Firefox from Python using subprocess (or whatever cfx is doing) and progressively add special environment setup until Flash fails. Some possibilities:

* security descriptors on process launch
* Custom profile, files or permissions
* Unusual prefs
* Environment variables
* Process groups

In fact, if cfx is using process groups at all, it's possible that is the cause. Flash probably uses groups in its own way to enable its sandbox, and if cfx is setting up its own process groups that could interfere.
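The "progressively add setup until Flash fails" approach can be organized as a small loop. This is a hedged Python sketch: `find_breaking_step`, the layout of the `config` dictionary, and the `flash_works` probe are all hypothetical. In practice the probe would launch Firefox with the accumulated setup (for example via `subprocess.Popen`) and report whether a Flash page loads.

```python
import copy


def find_breaking_step(steps, flash_works):
    """Apply setup steps one at a time and return the first one that breaks.

    steps: list of (name, apply_step) pairs; apply_step mutates the config.
    flash_works: callable taking a config dict and returning True or False.
    Returns the name of the first breaking step, or None if none breaks.
    """
    config = {"env": {}, "args": [], "creationflags": 0}
    for name, apply_step in steps:
        apply_step(config)
        # Hand the probe a copy so it cannot disturb the accumulated setup.
        if not flash_works(copy.deepcopy(config)):
            return name
    return None
```

Because the steps accumulate, the returned name is the first step at which the failure appears. It may still depend on an earlier step, so a suspect is worth re-testing on its own.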
When 889480 is fixed Flash will no longer hang, it just won't work.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #20)
> cfx is a python tool, right? The slightly tedious way to do this is to
> launch Firefox from Python using subprocess (or whatever cfx is doing) and
> progressively add special environment setup until Flash fails. Some
> possibilities:
>
> * security descriptors on process launch
> * Custom profile, files or permissions
> * Unusual prefs
> * Environment variables
> * Process groups
>
> In fact, if cfx is using process groups at all, it's possible that is the
> cause. Flash probably uses groups in its own way to enable its sandbox, and
> if cfx is setting up its own process groups that could interfere.

I don't see much there yet:

env vars: https://github.com/mozilla/addon-sdk/blob/master/python-lib/cuddlefish/runner.py#L502
process start: https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/__init__.py#L56-L59

The no-remote flag could be interesting... I'm sure you know a lot more about killableprocess.py than I do :) Does it do anything special in the way it starts up the process?
Holy crap, are you guys still using killableprocess? https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/killableprocess.py#L119 could well be the cause of this. Try removing that flag and see what happens.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #19)
> How much of a win would be if instead of hanging on sites for a long long
> time, flash plugin would simply just fail to load instead in this setup?
> Hopefully we can figure out why flash fails to init (and I think we should),
> just trying to estimate how much we would gain if Bug 889480 were fixed.

I'll certainly take that over hanging any day of the week. I do think having Flash working is useful for certain use cases (add-ons that affect YouTube, for example).
Flags: needinfo?(dtownsend+bugmail)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #23)
> Holy crap, are you guys still using killableprocess?
>
> https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/killableprocess.py#L119
> could well be the cause of this. Try removing that
> flag and see what happens.

We made the mistake of forking mozrunner some time ago and we haven't taken the time to get ourselves onto a more recent version :(
(In reply to Benjamin Smedberg [:bsmedberg] from comment #23)
> Holy crap, are you guys still using killableprocess?
>
> https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/killableprocess.py#L119
> could well be the cause of this. Try removing that
> flag and see what happens.

It didn't help, but at least I know where to focus now. I know a bit about Windows processes; I'm just not familiar with all the Python code in the SDK, and I have been trying to find my way in it so far... I'll play with it some more tomorrow.
CREATE_BREAKAWAY_FROM_JOB is the main problem: the Flash plugin starts a child process as well at some point. If I comment it out, I get all sorts of errors, because we cannot access the process or some of its handles...

  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\cuddlefish\runner.py", line 708, in run_app
    runner.start()
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\__init__.py", line 532, in start
    self.process_handler = run_command(self.command+self.cmdargs, self.env, **self.kp_kwargs)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\__init__.py", line 60, in run_command
    return killableprocess.Popen(cmd, cwd="c:/Development/mozilla/mozilla-central3/obj/dist/bin", env=env, **killable_kwargs)
  File "c:\mozilla-build\python\lib\subprocess.py", line 679, in __init__
    errread, errwrite)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\killableprocess.py", line 165, in _execute_child
    winprocess.AssignProcessToJobObject(self._job, int(hp))
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\winprocess.py", line 51, in ErrCheckBool
    raise WinError()
WindowsError: [Error 5] Access is denied.

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\cuddlefish\runner.py", line 536, in maybe_remove_outfile
    os.remove(outfile)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\gar bo\\appdata\\local\\temp\\harness-stdout-m7uwfe'
Error in sys.exitfunc:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\cuddlefish\runner.py", line 536, in maybe_remove_outfile
    os.remove(outfile)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\gar bo\\appdata\\local\\temp\\harness-stdout-m7uwfe'

But if I comment out CREATE_SUSPENDED as well, then the process starts regardless, and Flash works in it. I also have to comment out the in/out/err channels to have the console working as before, and the closing of the process is less than ideal...

There should be a way to set a flag on the job object (JOB_OBJECT_LIMIT_BREAKAWAY_OK) so that no subsequent child process will be part of this job. That might be enough to make Flash work and would be a minimal change... but I don't know how to do that from Python... Frankly, I don't know much about Windows jobs; I just found this flag on MSDN.

Also, I'm not really sure why we need this whole killableprocess thing. It would probably be great to do something simpler...
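For reference, setting a limit flag on a job object from Python can be done with ctypes and the Win32 SetInformationJobObject call. The sketch below is untested against cfx and is not what killableprocess does; the function name `set_job_breakaway_ok` is hypothetical, and the structure layouts are transcribed from the Windows SDK headers. It assumes that `job_handle` is a valid job handle with the right to set attributes, and that overwriting any limit flags already set on the job is acceptable. As documented by Microsoft, JOB_OBJECT_LIMIT_BREAKAWAY_OK only affects children that are themselves created with CREATE_BREAKAWAY_FROM_JOB.

```python
import ctypes
import sys

JOB_OBJECT_LIMIT_BREAKAWAY_OK = 0x00000800
JobObjectExtendedLimitInformation = 9  # JOBOBJECTINFOCLASS value


class JOBOBJECT_BASIC_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("PerProcessUserTimeLimit", ctypes.c_int64),
        ("PerJobUserTimeLimit", ctypes.c_int64),
        ("LimitFlags", ctypes.c_uint32),
        ("MinimumWorkingSetSize", ctypes.c_size_t),
        ("MaximumWorkingSetSize", ctypes.c_size_t),
        ("ActiveProcessLimit", ctypes.c_uint32),
        ("Affinity", ctypes.c_size_t),
        ("PriorityClass", ctypes.c_uint32),
        ("SchedulingClass", ctypes.c_uint32),
    ]


class IO_COUNTERS(ctypes.Structure):
    _fields_ = [(name, ctypes.c_uint64) for name in (
        "ReadOperationCount", "WriteOperationCount", "OtherOperationCount",
        "ReadTransferCount", "WriteTransferCount", "OtherTransferCount")]


class JOBOBJECT_EXTENDED_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("BasicLimitInformation", JOBOBJECT_BASIC_LIMIT_INFORMATION),
        ("IoInfo", IO_COUNTERS),
        ("ProcessMemoryLimit", ctypes.c_size_t),
        ("JobMemoryLimit", ctypes.c_size_t),
        ("PeakProcessMemoryUsed", ctypes.c_size_t),
        ("PeakJobMemoryUsed", ctypes.c_size_t),
    ]


def set_job_breakaway_ok(job_handle):
    """Set JOB_OBJECT_LIMIT_BREAKAWAY_OK on an existing job (Windows only)."""
    if sys.platform != "win32":
        raise OSError("job objects exist only on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.SetInformationJobObject.argtypes = [
        ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_uint32]
    kernel32.SetInformationJobObject.restype = ctypes.c_int
    info = JOBOBJECT_EXTENDED_LIMIT_INFORMATION()
    info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_BREAKAWAY_OK
    ok = kernel32.SetInformationJobObject(
        job_handle, JobObjectExtendedLimitInformation,
        ctypes.byref(info), ctypes.sizeof(info))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
```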
I suspect most of it stems from the days when Firefox would restart itself out from under you, to ensure that you could kill the actual process.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #28)
> I suspect most of it stems from the days when Firefox would restart itself
> out from under you, to ensure that you could kill the actual process.

So it seems that if I don't set the CREATE_BREAKAWAY_FROM_JOB flag and don't create a Windows job object (and assign the process to it), Flash just works. What might be the downside of this approach (if there is any) in the current world?
The point of killableprocess is that it would not only kill the process you launch, but also any *other* child process that gets launched. Removing the job/CREATE_BREAKAWAY_FROM_JOB code will basically mean that killableprocess at best only kills the one process it launches, and not any subprocesses such as plugin processes.
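The trade-off described above can be seen with the standard library alone. The following Python sketch (not the actual patch; `launch` and `stop` are hypothetical names) starts a process without any job object or special creation flags. Stopping it then only reaches the one process that was launched; any processes that process has spawned, such as plugin processes, are left for it to clean up.

```python
import subprocess
import sys


def launch(cmd, env=None):
    """Start cmd as a plain child process: no job object, no creation flags."""
    return subprocess.Popen(cmd, env=env)


def stop(proc, grace_s=5.0):
    """Terminate only the launched process; its own children are not touched.

    Returns the exit code of the launched process.
    """
    proc.terminate()
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate if the process ignored the terminate request
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for the browser: a child that would otherwise run for a minute.
    child = launch([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop(child))
```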
I think that it's still better than the current version we have. I wish I could come up with something better, but I have tried out all the various flags Windows has to offer here for the job object, and Flash does not seem to like any of them, or we get an access denied error when trying to access the process handle. I can keep playing with it for a while, but I'm getting a bit pessimistic about finding any better solution here. The good thing is that hitting the close button of the browser kills everything nicely at least, but if add-ons get their own separate processes in the future, this is going to be a problem. What do you think, Dave?
It's possible you can experiment with giving the Firefox process the JOB_OBJECT_LIMIT_BREAKAWAY_OK permission, but that assumes that Flash creates its subprocesses with CREATE_BREAKAWAY_FROM_JOB, which is... unlikely.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #32)
> Its possible you can experiment with giving the Firefox process the
> JOB_OBJECT_LIMIT_BREAKAWAY_OK permission, but that assumes that Flash
> creates its subprocesses with CREATE_BREAKAWAY_FROM_JOB, which is...
> unlikely.

Exactly. I was desperate enough to try that, but it is not the case, and it's unlikely that anyone will make that change... Furthermore, any other plugin can create custom processes.
I was talking to Dave about this one, and he agreed that this approach is still a lot better than what we have right now. But we were wondering how the current mozrunner solves the shutdown problem without killableprocess, and whether we could integrate it or migrate to it easily.
So the current version of mozrunner does the same thing (creating a job and all), so in this respect it's very similar. We might want to migrate to it at some point, but it won't fix our problem.

On try my fix seems to be green... https://tbpl.mozilla.org/?tree=Try&rev=a61bff0e0abb
Attachment #775577 - Flags: review?(dtownsend+bugmail) → review+
Commits pushed to master at https://github.com/mozilla/addon-sdk

https://github.com/mozilla/addon-sdk/commit/465b49e7c0d74899e1fef7ad2f8425b4c2e85fa0
Bug 768651 - Cfx run hangs on windows on sites using flash

https://github.com/mozilla/addon-sdk/commit/980641c74debec6ed76a021fc5be13124d59164a
Merge pull request #1104 from krizsa/master

Fixing Bug 768651 - cfx run hangs on windows on sites with flash. r=Mossop
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
It's still hanging for me on youtube.com with Firefox Setup Stub 25.0b7
(In reply to cprcrack from comment #39)
> It's still hanging for me on youtube.com with Firefox Setup Stub 25.0b7

Which version of the SDK?
It's not supposed to be "fixed" until Firefox 25. The release channel is still at 24.
(In reply to John Nagle from comment #41)
> It's not supposed to be "fixed" until Firefox 25. The release channel is
> still at 24.

I downloaded from the beta channel at <http://www.mozilla.org/firefox/beta/>, version 25.0 beta 7. It's supposed to be fixed there, right?

(In reply to Dave Townsend (:Mossop) from comment #40)
> Which version of the SDK?

The latest: addon-sdk-1.14
(In reply to cprcrack from comment #42)
> (In reply to John Nagle from comment #41)
> > It's not supposed to be "fixed" until Firefox 25. The release channel is
> > still at 24.
>
> I downloaded from the beta channel at
> <http://www.mozilla.org/firefox/beta/>, version 25.0 beta 7. It's supposed
> to be fixed there right?
>
> (In reply to Dave Townsend (:Mossop) from comment #40)
> > Which version of the SDK?
>
> The latest: addon-sdk-1.14

Unfortunately the current release version is too old to contain this fix. We're working on releasing 1.15 soon to address this.
I'm running 1.15 and FF 26 on Win 7 pro and still appear to be getting the error. Any recommendations to investigate?