Closed Bug 978439 Opened 11 years ago Closed 11 years ago

Plugin container eating 100% CPU with Adobe Flash and NouVeau Xorg Server

Categories

(Core :: General, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: fredbezies, Unassigned)

Details

Attachments

(4 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 (Beta/Release) Build ID: 20140301102433 Steps to reproduce: Since a few days (just after patch for bug #973192 was included ?), plugin-container is going "crazy" and eats all CPU on flash enabled site like youtube. To reproduce it ? 1) Have a linux distribution with nouveau Xorg server. 2) Go to youtube 3) Click on connect button Actual results: Plugin container eating 100% CPU. Expected results: Plugin container not eating all CPU ?
Some technical information : Flash plugin : 11.2.202.341 Using Archlinux 64 bits. NouVeau : 1.0.10
(In reply to Frederic Bezies from comment #0) > Since a few days (just after patch for bug #973192 was included ?), [...] Could you confirm this by narrowing it down with mozregression? http://mozilla.github.io/mozregression/
Flags: needinfo?(fredbezies)
Cannot use this tool because of Python3 on my archlinux. Will try to track regression manually.
Flags: needinfo?(fredbezies)
After fighting a lot, I got a regression "window", and I hope it will get smaller soon. Last good revision : d275eebfae04 -> February 14th First bad revision : b80f7eece913 -> February 15th Mozregression is stuck to this now : ... attempting to bisect inbound builds (starting from previous day, to make sure no inbound revision is missed) Downloading build from: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/02/2014-02-13-03-02-01-mozilla-central/firefox-30.0a1.en-US.linux-x86_64.tar.bz2 ===== Downloaded 100% ===== Installing nightly Getting inbound builds between a62bde1d6efe and b80f7eece913
Mozregression results : Last good revision: a62bde1d6efe First bad revision: c8f504ccd3c3 Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a62bde1d6efe&tochange=c8f504ccd3c3 Will try to see with homemade builds and find guilty commit. Will be very long. If you have an idea, I'll be pleased to hear it !
Got this in terminal when plugin-container is going crazy. [26226] WARNING: [PluginModuleParent::ActorDestroy] abnormal shutdown without minidump!: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/dom/plugins/ipc/PluginModuleParent.cpp, line 715 For application/x-shockwave-flash found plugin libflashplayer.so [26341] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/base/nsTraceRefcntImpl.cpp, line 142 [26341] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/base/nsTraceRefcntImpl.cpp, line 142 LoadPlugin() /usr/lib/mozilla/plugins/libflashplayer.so returned 7f8e1df4c5e0
And another log part : nsPluginNativeWindowGtk: NPPVpluginNeedsXEmbed=1 nsPluginNativeWindowGtk: call SetWindow with xid=0x1e007a7 nsPluginNativeWindowGtk: call SetWindow with xid=0x1e007a7 [3632] WARNING: blocked access to response header: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/content/base/src/nsXMLHttpRequest.cpp, line 1203 For application/x-shockwave-flash found plugin libflashplayer.so [3708] WARNING: '!compMgr', file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/glue/nsComponentManagerUtils.cpp, line 59 For application/x-shockwave-flash found plugin libflashplayer.so [3632] WARNING: NS_ENSURE_TRUE(mTransactionPump || mCachePump) failed: file /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/netwerk/protocol/http/nsHttpChannel.cpp, line 5374 For application/x-shockwave-flash found plugin libflashplayer.so nsPluginNativeWindowGtk: NPPVpluginNeedsXEmbed=1 nsPluginNativeWindowGtk: call SetWindow with xid=0x1e008f7 nsPluginNativeWindowGtk: call SetWindow with xid=0x1e008f7 Cannot find any files modified between February 14 and February 15 in it :(
(In reply to Frederic Bezies from comment #7) > Got this in terminal when plugin-container is going crazy. > > [26226] WARNING: [PluginModuleParent::ActorDestroy] abnormal shutdown > without minidump!: file > /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/dom/plugins/ipc/ > PluginModuleParent.cpp, line 715 > For application/x-shockwave-flash found plugin libflashplayer.so > [26341] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file > /home/fred/.moz-commitbuilder-cache/mozbuild-trunk/xpcom/base/ > nsTraceRefcntImpl.cpp, line 142 This looks like the plugin actually gets killed for timeouts, but then loaded again? Is the site triggering this or are you manually reloading? Otherwise nothing in particular stands out to me so far, if you can narrow this down to a specific commit that would be great!
Killed it in order to "unfreeze" firefox. Building gave nothing interesting. Will use tinderbox builds. Faster and more useful :D
Using ftp://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/ I have this regression window : Last working tinderbox build : https://hg.mozilla.org/mozilla-central/rev/23f7a629a217 First broken tinderbox build : https://hg.mozilla.org/mozilla-central/rev/5d7caa093f4f Don't have time for now using inbound tinderbox builds, will do it later this afternoon.
And the $64,000 question : which one is "guilty" here ? Using inbound tinderbox could help or not ?
At least to me there is nothing jumping out - narrowing it down to an inbound commit should help here.
Looks like I found something, but I've not finished testing all commits as there is a lot. It seems the older "broken commit" is related to bug Bug 860254. cf https://hg.mozilla.org/integration/mozilla-inbound/rev/00ea960e8164. I will try to verify this information asap.
Doing my best with inbound builds, but I cannot find these ones : 5b69776cb061 David Major — Bug 860254 - Part 2: Split junk settings into opt_junk (allocate) versus opt_poison (deallocate). r=glandium c1b9533aa1a0 David Major — Bug 860254 - Part 1: #define MALLOC_FILL but keep the options const-false. r=glandium 2c1080ed8c03 Glenn Randers-Pehrson — Bug 945912 - Add crashtest for libpng crash. r=ryanvm f57515a8bb3b Luke Wagner — Bug 936236 - Inline getTransferableContents into caller and handlify (r=jonco) 673bbf3b807f Luke Wagner — Bug 936236 - Inline copyData into caller and handlify (r=jonco) 0e0c6b897c47 Luke Wagner — Bug 936236 - Inline allocateSlots into caller and handlify (r=jonco) 91edf542ecf4 Luke Wagner — Bug 971845 - Don't uninline in JS_GetArrayBufferData, add JS_GetStableArrayBufferData and use that when necessary (r=sfink) 47870c0b90ba Luke Wagner — Bug 971845 - Remove dead JSContext methods (r=sfink) c78178cf6702 Steve Fink — No bug. Comment clarification on how to free JS_AllocateArrayBufferContents results. r=woof!, DONTBUILD
Got it ! Reverting all 3 patches from bug 860254 is fixing the bug.
We are unlikely to revert bug 860254 to fix this bug. This is likely evidence of a use-after-free error in the X libraries or something else that comes with NouVeau. A valgrind memcheck run might help diagnose the error further.
Component: Plug-ins → General
Priority: -- → P3
I understand you won't revert it. But at least, I found why with nouveau plugin-container is going crazy. I can do a valgring memcheck if you tell me how to do so.
A recent ASAN build from here might also help figure out what is happening: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/
Downloading last ASAN build and will report any info I can grab.
Here is what I get in terminal with last ASAN build : ==22420==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x000003537030 in thread T0 #0 0x444e65 (/home/fred/Téléchargements/firefox/plugin-container+0x444e65) #1 0x7faf6b432668 (/usr/lib/libX11.so.6+0x43668) ==22420==ABORTING And I got a "firefox prompt" telling me that Adobe Flash plugin crashed. If I click on "reload", same error message in terminal.
It looks like Flash has other bugs that prevent it from running long enough with ASan. How about we make the bet that the root bug here is probably the same as in bug 973192, and so, by using a Nightly ASan build from before the workarounds landed there, you try to reproduce this with ASan just by starting up Firefox?
(In reply to Benoit Jacob [:bjacob] from comment #23) > It looks like Flash has other bugs that prevent it from running long enough > with ASan. How about we make the bet that the root bug here is probably the > same as in bug 973192, and so, by using a Nightly ASan build from before the > workarounds landed there, you try to reproduce this with ASan just by > starting up Firefox? I'm grabbing a February 15 ASAN build in order to get bug 973192 "alive". And I will report what I get in terminal.
Using ASAN nightly based on revision https://hg.mozilla.org/mozilla-central/rev/305117764929, in order to have February 15 build, when bug 973192 was "alive", I only got this : ==4299==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x000001d1e030 in thread T0 #0 0x444e65 (/home/fred/Téléchargements/firefox/plugin-container+0x444e65) #1 0x7fbe4ff17668 (/usr/lib/libX11.so.6+0x43668) ==4299==ABORTING Flash is crashing too soon :(
Valgrind would happily continue in presence of such errors. Let me know if you have the motivation (will require making a special build).
Just give me instructions. I will follow them asap. I want to see this nasty bug dead for good.
So in theory this page would be all you need to read, https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_Valgrind but in practice it's confusing, lacks turn-key instructions, and out of date regarding jemalloc. So here are my valgrinding instructions: 1) make a build with --enable-valgrind, --disable-icf, and with the -g -O2 compiler flags (if we were looking for gecko bugs, we'd use less optimization and inlining, but here we're not really looking for a gecko bug). Here is my mozconfig that does all that: ac_add_options --enable-application=browser ac_add_options --enable-optimize="-g -O2" ac_add_options --enable-valgrind ac_add_options --disable-icf mk_add_options MOZ_OBJDIR=./obj-firefox-valgrind mk_add_options MOZ_MAKE_FLAGS="-j8 -s" mk_add_options AUTOCLOBBER=1 export MOZCONFIG=/path/to/this/mozconfig, and './mach build' as usual. 2) run your build in valgrind with this command line (note the yourprofile here needs to be replaced by your actual testing profile name...). This will record the valgrind output in a valgrind-log file. $ valgrind --soname-synonyms=somalloc=NONE --smc-check=all-non-file --track-origins=yes obj-firefox-valgrind/dist/bin/firefox -P yourprofile -no-remote 2>&1 | tee valgrind-log
Attached file Valgrind log. (obsolete) —
Valgrind log. Don't know if it will be really useful. I hope it will help.
That seems very interesting and relevant, since it happens under XCloseDisplay(). Too bad that you dont have symbols there. Have you installed debug symbols for X libraries, and so is it Valgrind that's bad at picking them up? Julian, do you have generic advice to get symbols from system libraries in Valgrind stacks on Linux, that would have a chance to help with the stack in comment 29?
Flags: needinfo?(jseward)
I'm building debug packages to have something more complete. I will post another valgrind log asap.
OK, I'll clear the needinfo on Julian, then, for now.
Flags: needinfo?(jseward)
Another thing that would be super useful would be to valgrind the standalone version of glxtest that I attached on https://bugs.freedesktop.org/show_bug.cgi?id=75279#c3 . If the same valgrind error reproduces there, then you've proven that there is a specific bug in Nouveau, you can then hopefully get Nouveau developers to pay attention, and from there on you dont need to worry about mozilla valgrind builds anymore...
(Valgrinding it is super easy: you just build it, it doesn't need special build settings to be valgrindable, and just run valgrind --track-origins=yes ./glxtest).
Here is the "full" version of valgrind log, I mean with debug symbols.
Attachment #8386255 - Attachment is obsolete: true
Looks like something is rotten in NouVeau :( Will report to NouVeau developpers. Thanks for the tip.
Woohoo, comment 35 and 36 are like it's Christmas today. Yep, do talk to Nouveau devs, especially with comment 36 they will certainly be interested!
(In reply to Benoit Jacob [:bjacob] from comment #37) > Woohoo, comment 35 and 36 are like it's Christmas today. Yep, do talk to > Nouveau devs, especially with comment 36 they will certainly be interested! I added all logs on freedesktop bug report. And christmas ? If you think so ;) At least, until nouveau is fixed, I can apply a "dirty workaround" :/
Mesa 10.0.5 / 10.1.1 is released and got a fix for this issue. So, adding a release note could be a great idea, until all distributions are using Mesa 10.0.5+ or 10.1.1+ http://lists.freedesktop.org/archives/mesa-announce/2014-April/000080.html http://lists.freedesktop.org/archives/mesa-announce/2014-April/000081.html Fix ? Ilia Mirkin (8): nouveau: fix fence waiting logic in screen destroy
It is fixed since Archlinux upgraded to a fixed MesaGL version. Please closed it as FIXED.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: