Closed
Bug 978439
Opened 11 years ago
Closed 11 years ago
Plugin container eating 100% CPU with Adobe Flash and NouVeau Xorg Server
Categories
(Core :: General, defect, P3)
RESOLVED
WORKSFORME
People
(Reporter: fredbezies, Unassigned)
Details
Attachments
(4 files, 1 obsolete file)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 (Beta/Release)
Build ID: 20140301102433
Steps to reproduce:
For the past few days (possibly since the patch for bug #973192 landed), plugin-container has been going "crazy" and eating all CPU on Flash-enabled sites such as YouTube.
To reproduce:
1) Use a Linux distribution with the Nouveau Xorg driver.
2) Go to YouTube.
3) Click the connect button.
Actual results:
Plugin container consumes 100% CPU.
Expected results:
Plugin container should not consume all CPU.
Comment 1 • 11 years ago (Reporter)
Some technical information:
Flash plugin: 11.2.202.341
Distribution: Arch Linux, 64-bit
Nouveau: 1.0.10
Comment 2 • 11 years ago (Reporter)
Comment 3 • 11 years ago
(In reply to Frederic Bezies from comment #0)
> For the past few days (possibly since the patch for bug #973192 landed), [...]
Could you confirm this by narrowing it down with mozregression?
http://mozilla.github.io/mozregression/
Flags: needinfo?(fredbezies)
Comment 4 • 11 years ago (Reporter)
I cannot use this tool because of Python 3 on my Arch Linux system. I will try to track the regression manually.
Flags: needinfo?(fredbezies)
Comment 5 • 11 years ago (Reporter)
After a lot of fighting, I have a regression window, and I hope to narrow it soon.
Last good revision: d275eebfae04 (February 14th)
First bad revision: b80f7eece913 (February 15th)
mozregression is currently stuck at this point:
... attempting to bisect inbound builds (starting from previous day, to make sure no inbound revision is missed)
Downloading build from: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/02/2014-02-13-03-02-01-mozilla-central/firefox-30.0a1.en-US.linux-x86_64.tar.bz2
===== Downloaded 100% =====
Installing nightly
Getting inbound builds between a62bde1d6efe and b80f7eece913
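The narrowing described here is, in essence, a binary search over an ordered list of builds. The following is a minimal sketch of that idea, assuming the builds are sorted oldest to newest, the first is known good, and the last is known bad. The revision names and the `is_bad` callback are hypothetical placeholders and do not reflect mozregression's actual API.

```python
from typing import Callable, Sequence, Tuple


def bisect_regression(
    revisions: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, str]:
    """Return (last_good, first_bad) from an oldest-to-newest revision list.

    Assumes revisions[0] is good, revisions[-1] is bad, and that once a
    revision is bad every later revision is bad as well.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid  # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return revisions[good], revisions[bad]


# Hypothetical placeholder names, not real changesets.
builds = ["rev0", "rev1", "rev2", "rev3", "rev4", "rev5"]
broken = {"rev4", "rev5"}
window = bisect_regression(builds, lambda rev: rev in broken)
```

Each test of a build halves the remaining window, which is why the daily window shrinks to a handful of inbound pushes after only a few runs.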
Comment 6 • 11 years ago (Reporter)
mozregression results:
Last good revision: a62bde1d6efe
First bad revision: c8f504ccd3c3
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a62bde1d6efe&tochange=c8f504ccd3c3
I will try to find the guilty commit with my own builds. It will take a long time. If you have an idea, I would be pleased to hear it!
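The pushlog link above follows a simple pattern: the repository path plus `fromchange` and `tochange` query parameters. As a small sketch, assuming that pattern holds for other repositories on hg.mozilla.org, a regression window can be turned into a link like this (the helper name `pushlog_url` is made up for illustration):

```python
from urllib.parse import urlencode


def pushlog_url(repo_path: str, last_good: str, first_bad: str) -> str:
    """Build a pushlog URL covering the window between two changesets."""
    query = urlencode({"fromchange": last_good, "tochange": first_bad})
    return f"https://hg.mozilla.org/{repo_path}/pushloghtml?{query}"


# Revisions taken from the mozregression results in this comment.
inbound = pushlog_url(
    "integration/mozilla-inbound", "a62bde1d6efe", "c8f504ccd3c3"
)
```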
Comment 7 • 11 years ago (Reporter)
Got this in terminal when plugin-container is going crazy.
[26226] WARNING: [PluginModuleParent::ActorDestroy] abnormal shutdown without minidump!: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/dom/plugins/ipc/PluginModuleParent.cpp, line 715
For application/x-shockwave-flash found plugin libflashplayer.so
[26341] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/base/nsTraceRefcntImpl.cpp, line 142
[26341] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/base/nsTraceRefcntImpl.cpp, line 142
LoadPlugin() /usr/lib/mozilla/plugins/libflashplayer.so returned 7f8e1df4c5e0
Comment 8 • 11 years ago (Reporter)
Another part of the log:
nsPluginNativeWindowGtk: NPPVpluginNeedsXEmbed=1
nsPluginNativeWindowGtk: call SetWindow with xid=0x1e007a7
nsPluginNativeWindowGtk: call SetWindow with xid=0x1e007a7
[3632] WARNING: blocked access to response header: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/content/base/src/nsXMLHttpRequest.cpp, line 1203
For application/x-shockwave-flash found plugin libflashplayer.so
[3708] WARNING: '!compMgr', file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/glue/nsComponentManagerUtils.cpp, line 59
For application/x-shockwave-flash found plugin libflashplayer.so
[3632] WARNING: NS_ENSURE_TRUE(mTransactionPump || mCachePump) failed: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/netwerk/protocol/http/nsHttpChannel.cpp, line 5374
For application/x-shockwave-flash found plugin libflashplayer.so
nsPluginNativeWindowGtk: NPPVpluginNeedsXEmbed=1
nsPluginNativeWindowGtk: call SetWindow with xid=0x1e008f7
nsPluginNativeWindowGtk: call SetWindow with xid=0x1e008f7
I cannot find any file in it that was modified between February 14 and February 15 :(
Comment 9 • 11 years ago
(In reply to Frederic Bezies from comment #7)
> Got this in terminal when plugin-container is going crazy.
>
> [26226] WARNING: [PluginModuleParent::ActorDestroy] abnormal shutdown
> without minidump!: file
> /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/dom/plugins/ipc/
> PluginModuleParent.cpp, line 715
> For application/x-shockwave-flash found plugin libflashplayer.so
> [26341] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file
> /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/base/
> nsTraceRefcntImpl.cpp, line 142
This looks like the plugin actually gets killed for timeouts, but then loaded again? Is the site triggering this or are you manually reloading?
Otherwise, nothing in particular stands out to me so far. If you can narrow this down to a specific commit, that would be great!
Comment 10 • 11 years ago (Reporter)
I killed it manually in order to "unfreeze" Firefox.
My own builds turned up nothing interesting. I will use tinderbox builds instead: faster and more useful :D
Comment 11 • 11 years ago (Reporter)
Using ftp://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/, I have this regression window:
Last working tinderbox build : https://hg.mozilla.org/mozilla-central/rev/23f7a629a217
First broken tinderbox build : https://hg.mozilla.org/mozilla-central/rev/5d7caa093f4f
I do not have time right now to go through the inbound tinderbox builds; I will do it later this afternoon.
Comment 12 • 11 years ago
(In reply to Frederic Bezies from comment #11)
> Last working tinderbox build :
> https://hg.mozilla.org/mozilla-central/rev/23f7a629a217
> First broken tinderbox build :
> https://hg.mozilla.org/mozilla-central/rev/5d7caa093f4f
That's: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=23f7a629a217&tochange=5d7caa093f4f
Comment 13 • 11 years ago (Reporter)
And the $64,000 question: which one is "guilty" here?
Would using inbound tinderbox builds help?
Comment 14 • 11 years ago
At least to me there is nothing jumping out - narrowing it down to an inbound commit should help here.
Comment 15 • 11 years ago (Reporter)
It looks like I found something, but I have not finished testing all the commits, as there are a lot of them.
It seems the earliest broken commit is related to bug 860254.
See https://hg.mozilla.org/integration/mozilla-inbound/rev/00ea960e8164.
I will try to verify this as soon as possible.
Comment 16 • 11 years ago (Reporter)
I am doing my best with inbound builds, but I cannot find builds for these changesets:
5b69776cb061 David Major — Bug 860254 - Part 2: Split junk settings into opt_junk (allocate) versus opt_poison (deallocate). r=glandium
c1b9533aa1a0 David Major — Bug 860254 - Part 1: #define MALLOC_FILL but keep the options const-false. r=glandium
2c1080ed8c03 Glenn Randers-Pehrson — Bug 945912 - Add crashtest for libpng crash. r=ryanvm
f57515a8bb3b Luke Wagner — Bug 936236 - Inline getTransferableContents into caller and handlify (r=jonco)
673bbf3b807f Luke Wagner — Bug 936236 - Inline copyData into caller and handlify (r=jonco)
0e0c6b897c47 Luke Wagner — Bug 936236 - Inline allocateSlots into caller and handlify (r=jonco)
91edf542ecf4 Luke Wagner — Bug 971845 - Don't uninline in JS_GetArrayBufferData, add JS_GetStableArrayBufferData and use that when necessary (r=sfink)
47870c0b90ba Luke Wagner — Bug 971845 - Remove dead JSContext methods (r=sfink)
c78178cf6702 Steve Fink — No bug. Comment clarification on how to free JS_AllocateArrayBufferContents results. r=woof!, DONTBUILD
Comment 17 • 11 years ago (Reporter)
Got it!
Reverting all three patches from bug 860254 fixes the bug.
Comment 18 • 11 years ago
We are unlikely to revert bug 860254 to fix this bug. This is likely evidence of a use-after-free error in the X libraries or something else that comes with NouVeau. A valgrind memcheck run might help diagnose the error further.
Component: Plug-ins → General
Priority: -- → P3
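To illustrate the reasoning in this comment: a change to how an allocator fills freed memory can expose a latent use-after-free in unrelated code, because the stale read stops returning the old, plausible value. The toy model below is only a sketch of that mechanism. `ToyHeap`, `buggy_reader`, and the `POISON_BYTE` value are all made up for illustration and are not taken from the allocator or from the X libraries.

```python
POISON_BYTE = 0xE5  # illustrative fill value, an assumption for this sketch


class ToyHeap:
    """A tiny bump allocator over a bytearray, optionally poisoning on free."""

    def __init__(self, size: int, poison_on_free: bool) -> None:
        self.mem = bytearray(size)
        self.poison_on_free = poison_on_free
        self.next_free = 0
        self.sizes = {}

    def malloc(self, size: int) -> int:
        addr = self.next_free
        self.next_free += size
        self.sizes[addr] = size
        return addr

    def free(self, addr: int) -> None:
        size = self.sizes.pop(addr)
        if self.poison_on_free:
            # Overwrite the freed region so stale reads see garbage.
            self.mem[addr:addr + size] = bytes([POISON_BYTE]) * size


def buggy_reader(heap: ToyHeap) -> int:
    """Store 42, free the block, then (incorrectly) read it back."""
    addr = heap.malloc(4)
    heap.mem[addr:addr + 4] = (42).to_bytes(4, "little")
    heap.free(addr)
    # BUG on purpose: this read happens after free().
    return int.from_bytes(heap.mem[addr:addr + 4], "little")


without_poison = buggy_reader(ToyHeap(16, poison_on_free=False))
with_poison = buggy_reader(ToyHeap(16, poison_on_free=True))
```

Without poisoning, the buggy read still sees 42 and the program appears to work; with poisoning it sees a large garbage value. Code that uses such a value as a counter or a wait condition could plausibly spin, which would be consistent with the CPU usage reported here, although the thread has not established that at this point.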
Comment 19 • 11 years ago (Reporter)
I understand you won't revert it. But at least I found why plugin-container is going crazy with Nouveau.
I can do a Valgrind memcheck run if you tell me how.
Comment 20 • 11 years ago
A recent ASAN build from here might also help figure out what is happening:
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/
Comment 21 • 11 years ago (Reporter)
I am downloading the latest ASan build and will report any information I can gather.
Comment 22 • 11 years ago (Reporter)
Here is what I get in the terminal with the latest ASan build:
==22420==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x000003537030 in thread T0
#0 0x444e65 (/home/fred/Téléchargements/firefox/plugin-container+0x444e65)
#1 0x7faf6b432668 (/usr/lib/libX11.so.6+0x43668)
==22420==ABORTING
I also got a Firefox prompt telling me that the Adobe Flash plugin crashed. If I click on "reload", the same error message appears in the terminal.
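The frames in this report are unsymbolized, but each one carries a module path and an offset inside that module. As a rough sketch, assuming the `#N 0xADDR (module+0xOFFSET)` layout shown above, those pairs can be pulled out for later symbolization (for example with a tool such as addr2line, provided debug symbols are installed). The `Frame` type and `parse_frames` helper are made up for this example.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int
    address: int
    module: str
    offset: int


# Matches lines such as: "#1 0x7faf6b432668 (/usr/lib/libX11.so.6+0x43668)"
FRAME_RE = re.compile(
    r"^\s*#(\d+)\s+(0x[0-9a-fA-F]+)\s+\((.+)\+(0x[0-9a-fA-F]+)\)\s*$"
)


def parse_frames(report: str) -> List[Frame]:
    """Extract (index, address, module, offset) from unsymbolized frames."""
    frames = []
    for line in report.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index, address, module, offset = match.groups()
            frames.append(
                Frame(int(index), int(address, 16), module, int(offset, 16))
            )
    return frames


# The two frames quoted in this comment.
report = """\
#0 0x444e65 (/home/fred/Téléchargements/firefox/plugin-container+0x444e65)
#1 0x7faf6b432668 (/usr/lib/libX11.so.6+0x43668)
"""
frames = parse_frames(report)
```

Here the second frame already points at libX11, which fits the suspicion that the invalid free originates below Firefox in the X stack.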
Comment 23 • 11 years ago
It looks like Flash has other bugs that prevent it from running long enough with ASan. How about we make the bet that the root bug here is probably the same as in bug 973192, and so, by using a Nightly ASan build from before the workarounds landed there, you try to reproduce this with ASan just by starting up Firefox?
Comment 24 • 11 years ago (Reporter)
(In reply to Benoit Jacob [:bjacob] from comment #23)
> It looks like Flash has other bugs that prevent it from running long enough
> with ASan. How about we make the bet that the root bug here is probably the
> same as in bug 973192, and so, by using a Nightly ASan build from before the
> workarounds landed there, you try to reproduce this with ASan just by
> starting up Firefox?
I'm grabbing a February 15 ASan build so that bug 973192 is still "alive", and I will report what I get in the terminal.
Comment 25 • 11 years ago (Reporter)
Using an ASan nightly based on revision https://hg.mozilla.org/mozilla-central/rev/305117764929 (a February 15 build, from when bug 973192 was still "alive"), I only got this:
==4299==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x000001d1e030 in thread T0
#0 0x444e65 (/home/fred/Téléchargements/firefox/plugin-container+0x444e65)
#1 0x7fbe4ff17668 (/usr/lib/libX11.so.6+0x43668)
==4299==ABORTING
Flash is crashing too soon :(
Comment 26 • 11 years ago
Valgrind would happily continue in the presence of such errors. Let me know if you have the motivation (it will require making a special build).
Comment 27 • 11 years ago (Reporter)
Just give me instructions. I will follow them asap. I want to see this nasty bug dead for good.
Comment 28 • 11 years ago
So in theory this page would be all you need to read,
https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_Valgrind
but in practice it's confusing, lacks turn-key instructions, and is out of date regarding jemalloc.
So here are my valgrinding instructions:
1) make a build with --enable-valgrind, --disable-icf, and with the -g -O2 compiler flags (if we were looking for gecko bugs, we'd use less optimization and inlining, but here we're not really looking for a gecko bug). Here is my mozconfig that does all that:
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-g -O2"
ac_add_options --enable-valgrind
ac_add_options --disable-icf
mk_add_options MOZ_OBJDIR=./obj-firefox-valgrind
mk_add_options MOZ_MAKE_FLAGS="-j8 -s"
mk_add_options AUTOCLOBBER=1
Then export MOZCONFIG=/path/to/this/mozconfig and run './mach build' as usual.
2) run your build in valgrind with this command line (note the yourprofile here needs to be replaced by your actual testing profile name...). This will record the valgrind output in a valgrind-log file.
$ valgrind --soname-synonyms=somalloc=NONE --smc-check=all-non-file --track-origins=yes obj-firefox-valgrind/dist/bin/firefox -P yourprofile -no-remote 2>&1 | tee valgrind-log
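For repeated runs, the command line above could be wrapped in a small script. This is only a sketch under assumptions: the function names are invented here, the flags are copied from the command above, and unlike `tee` the log is written to a file without being echoed to the terminal.

```python
import subprocess
from typing import List


def valgrind_argv(firefox_bin: str, profile: str) -> List[str]:
    """Build the Valgrind command line from the instructions above."""
    return [
        "valgrind",
        "--soname-synonyms=somalloc=NONE",
        "--smc-check=all-non-file",
        "--track-origins=yes",
        firefox_bin,
        "-P",
        profile,
        "-no-remote",
    ]


def run_and_log(argv: List[str], log_path: str = "valgrind-log") -> int:
    """Run the command, sending stdout and stderr to log_path."""
    with open(log_path, "wb") as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# "yourprofile" is a placeholder, as in the instructions above.
argv = valgrind_argv("obj-firefox-valgrind/dist/bin/firefox", "yourprofile")
```

Calling `run_and_log(argv)` would start the browser under Valgrind; it is left out of the sketch so that nothing is launched when the snippet is merely loaded.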
Comment 29 • 11 years ago (Reporter)
Here is the Valgrind log. I don't know whether it will be really useful, but I hope it helps.
Comment 30 • 11 years ago
That seems very interesting and relevant, since it happens under XCloseDisplay(). Too bad that you don't have symbols there. Have you installed debug symbols for the X libraries, in which case is it Valgrind that's bad at picking them up?
Julian, do you have generic advice to get symbols from system libraries in Valgrind stacks on Linux, that would have a chance to help with the stack in comment 29?
Flags: needinfo?(jseward)
Comment 31 • 11 years ago (Reporter)
I'm building debug packages to have something more complete. I will post another valgrind log asap.
Comment 32 • 11 years ago
OK, I'll clear the needinfo on Julian, then, for now.
Flags: needinfo?(jseward)
Comment 33 • 11 years ago
Another thing that would be super useful would be to valgrind the standalone version of glxtest that I attached on https://bugs.freedesktop.org/show_bug.cgi?id=75279#c3 .
If the same Valgrind error reproduces there, then you've proven that there is a specific bug in Nouveau. You can then hopefully get Nouveau developers to pay attention, and from there on you don't need to worry about Mozilla Valgrind builds anymore...
Comment 34 • 11 years ago
(Valgrinding it is super easy: you just build it, it doesn't need special build settings to be valgrindable, and just run valgrind --track-origins=yes ./glxtest).
Comment 35 • 11 years ago (Reporter)
Here is the "full" version of the Valgrind log, that is, with debug symbols.
Attachment #8386255 - Attachment is obsolete: true
Comment 36 • 11 years ago (Reporter)
Looks like something is rotten in Nouveau :(
I will report it to the Nouveau developers. Thanks for the tip.
Comment 37 • 11 years ago
Woohoo, comment 35 and 36 are like it's Christmas today. Yep, do talk to Nouveau devs, especially with comment 36 they will certainly be interested!
Comment 38 • 11 years ago (Reporter)
(In reply to Benoit Jacob [:bjacob] from comment #37)
> Woohoo, comment 35 and 36 are like it's Christmas today. Yep, do talk to
> Nouveau devs, especially with comment 36 they will certainly be interested!
I added all the logs to the freedesktop bug report. And Christmas? If you say so ;)
At least, until Nouveau is fixed, I can apply a "dirty workaround" :/
Comment 39 • 11 years ago (Reporter)
Mesa 10.0.5 and 10.1.1 have been released and include a fix for this issue. Adding a release note could be a good idea until all distributions ship Mesa 10.0.5+ or 10.1.1+.
http://lists.freedesktop.org/archives/mesa-announce/2014-April/000080.html
http://lists.freedesktop.org/archives/mesa-announce/2014-April/000081.html
The fix:
Ilia Mirkin (8):
nouveau: fix fence waiting logic in screen destroy
Comment 40 • 11 years ago (Reporter)
This has been fixed since Arch Linux upgraded to a fixed MesaGL version. Please close it as FIXED.
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME