Last Comment Bug 769048 - Attach to and report on crashes in FlashPlayerPlugin_*.exe processes which are children of our plugin container
: Attach to and report on crashes in FlashPlayerPlugin_*.exe processes which ar...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Plug-ins (show other bugs)
: unspecified
: x86_64 Windows 7
: P1 normal with 1 vote (vote)
: mozilla16
Assigned To: Benjamin Smedberg [:bsmedberg]
:
Mentors:
https://wiki.mozilla.org/User:Benjami...
Depends on: 770805 771082 771134 771251 772432 773632
Blocks: 770078 773332
  Show dependency treegraph
 
Reported: 2012-06-27 14:18 PDT by Benjamin Smedberg [:bsmedberg]
Modified: 2012-09-17 06:29 PDT (History)
19 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix
+
fixed
fixed


Attachments
The injector library, rev. 1 (15.18 KB, patch)
2012-06-27 18:46 PDT, Benjamin Smedberg [:bsmedberg]
no flags Details | Diff | Review
Part A - Breakpad update cherrypicked from upstream (14.27 KB, patch)
2012-07-01 18:46 PDT, Benjamin Smedberg [:bsmedberg]
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review
Part B - Ambiguous call resolution from breakpad update (1.08 KB, patch)
2012-07-01 18:47 PDT, Benjamin Smedberg [:bsmedberg]
no flags Details | Diff | Review
Part C - the injected library (17.86 KB, patch)
2012-07-01 18:48 PDT, Benjamin Smedberg [:bsmedberg]
khuey: review+
Details | Diff | Review
Part D - MemoryModule import (15.42 KB, patch)
2012-07-01 18:50 PDT, Benjamin Smedberg [:bsmedberg]
ehsan: review+
Details | Diff | Review
Part E - Crashreport API for injection (6.10 KB, patch)
2012-07-01 18:51 PDT, Benjamin Smedberg [:bsmedberg]
no flags Details | Diff | Review
Part F - actually do injection from PluginModuleParent specifically for Flash and FlashPlayerPlugin processes only (9.38 KB, patch)
2012-07-01 18:52 PDT, Benjamin Smedberg [:bsmedberg]
jmathies: review+
Details | Diff | Review
Part E, files added (10.04 KB, patch)
2012-07-02 05:15 PDT, Benjamin Smedberg [:bsmedberg]
no flags Details | Diff | Review
Part E - fix crash when crash reporter is disabled mid-run as in mochitests (10.17 KB, patch)
2012-07-02 08:53 PDT, Benjamin Smedberg [:bsmedberg]
ehsan: review+
Details | Diff | Review
Part C - the injected library, with build fix (19.34 KB, patch)
2012-07-02 08:54 PDT, Benjamin Smedberg [:bsmedberg]
ehsan: review+
Details | Diff | Review
Part F - actually do injection from PluginModuleParent specifically for Flash and FlashPlayerPlugin processes only (11.75 KB, patch)
2012-07-02 08:56 PDT, Benjamin Smedberg [:bsmedberg]
jmathies: review+
Details | Diff | Review
Part B - Ambiguous call resolution from breakpad update (985 bytes, patch)
2012-07-02 09:57 PDT, Benjamin Smedberg [:bsmedberg]
ehsan: review+
Details | Diff | Review
Part D - MemoryModule import, with fixes (16.37 KB, patch)
2012-07-02 11:27 PDT, Benjamin Smedberg [:bsmedberg]
no flags Details | Diff | Review

Description Benjamin Smedberg [:bsmedberg] 2012-06-27 14:18:29 PDT
We currently only see crashes in the plugin-container process. With the Flash Player 11.3 sandbox, the interesting crashes all occur in the FlashPlayerPlugin_*.exe processes. We can inject our crash reporter into those processes and report crashes directly to crash-stats from there, which will significantly aid bucketing and diagnosing the increases in crash rates with Flash 11.3.
Comment 1 Benjamin Smedberg [:bsmedberg] 2012-06-27 14:23:30 PDT
http://hg.mozilla.org/users/bsmedberg_mozilla.com/breakpadinject/ contains PoC code which hooks breakpad up via DLL injection in a way that doesn't require any changes to the Flash player.
Comment 2 Benjamin Smedberg [:bsmedberg] 2012-06-27 14:27:07 PDT
The Mozilla patches here depend on the upstream breakpad change http://breakpad.appspot.com/406002/ , although in a pinch we could take a one-off change if a full breakpad merge was too complicated or risky.
Comment 3 Benjamin Smedberg [:bsmedberg] 2012-06-27 18:46:38 PDT
Created attachment 637340 [details] [diff] [review]
The injector library, rev. 1

Not requesting review yet until I finish and test the other half.
Comment 4 Benjamin Smedberg [:bsmedberg] 2012-07-01 18:46:26 PDT
Created attachment 638240 [details] [diff] [review]
Part A - Breakpad update cherrypicked from upstream
Comment 5 Benjamin Smedberg [:bsmedberg] 2012-07-01 18:47:10 PDT
Created attachment 638241 [details] [diff] [review]
Part B - Ambiguous call resolution from breakpad update
Comment 6 Benjamin Smedberg [:bsmedberg] 2012-07-01 18:48:17 PDT
Created attachment 638242 [details] [diff] [review]
Part C - the injected library
Comment 7 Benjamin Smedberg [:bsmedberg] 2012-07-01 18:50:53 PDT
Created attachment 638244 [details] [diff] [review]
Part D - MemoryModule import

In the checkin comment I included the exact revision of https://github.com/fancycode/MemoryModule from which I copied and modified this code, if diffing against that would be useful.
Comment 8 Benjamin Smedberg [:bsmedberg] 2012-07-01 18:51:30 PDT
Created attachment 638245 [details] [diff] [review]
Part E - Crashreport API for injection
Comment 9 Benjamin Smedberg [:bsmedberg] 2012-07-01 18:52:24 PDT
Created attachment 638246 [details] [diff] [review]
Part F - actually do injection from PluginModuleParent specifically for Flash and FlashPlayerPlugin processes only
Comment 10 Benjamin Smedberg [:bsmedberg] 2012-07-01 19:21:40 PDT
Just started https://tbpl.mozilla.org/?tree=Try&rev=4a1b0f213788 for a full tryserver run.

https://wiki.mozilla.org/User:Benjamin_Smedberg/Flash_Crash_Reporting contains a design document.

I've locally verified at least one crash in the sandboxed process which was caught successfully and produced a minidump.
Comment 11 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-07-01 19:40:40 PDT
Comment on attachment 638242 [details] [diff] [review]
Part C - the injected library

Review of attachment 638242 [details] [diff] [review]:
-----------------------------------------------------------------

Don't you need to add something to the package-manifest?
Comment 12 Bob Clary [:bc:] 2012-07-01 22:10:58 PDT
e:/builds/moz2_slave/try-w32/build/toolkit/crashreporter/nsExceptionHandler.cpp(66) : fatal error C1083: Cannot open include file: 'InjectCrashReporter.h': No such file or directory
Comment 13 Benjamin Smedberg [:bsmedberg] 2012-07-02 05:15:55 PDT
Created attachment 638316 [details] [diff] [review]
Part E, files added

Indeed bc, I forgot to `hg add` files. Part E now fixed.
Comment 15 Jim Mathies [:jimm] 2012-07-02 05:27:23 PDT
after crashing flash, should I see entries in about:crashes from this?
Comment 16 Jim Mathies [:jimm] 2012-07-02 05:57:19 PDT
(In reply to Jim Mathies [:jimm] from comment #15)
> after crashing flash, should I see entries in about:crashes from this?

Managed to trigger this I think, here's the stack - 

https://crash-stats.mozilla.com/report/index/bp-aef2e9de-e6d8-4283-967a-98f832120702
Comment 17 Jim Mathies [:jimm] 2012-07-02 06:07:36 PDT
Comment on attachment 638246 [details] [diff] [review]
Part F - actually do injection from PluginModuleParent specifically for Flash and FlashPlayerPlugin processes only

Review of attachment 638246 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginModuleParent.cpp
@@ +1193,5 @@
> +         ok = Process32Next(snapshot, &entry)) {
> +        if (entry.th32ParentProcessID == pid) {
> +            nsString name(entry.szExeFile);
> +            ToUpperCase(name);
> +            if (StringBeginsWith(name, NS_LITERAL_STRING("FLASHPLAYERPLUGIN"))) {

nit - I would suggest putting this in a const up top with a comment explaining how it is used.

::: dom/plugins/ipc/PluginModuleParent.h
@@ +321,5 @@
> +    
> +    NS_OVERRIDE void OnCrash(DWORD processID, const nsAString& aDumpID);
> +
> +    DWORD mFlashProcess1;
> +    DWORD mFlashProcess2;

since InitializeInjector() can fail before these are assigned, we need to initialize them to 0.
Comment 18 Benjamin Smedberg [:bsmedberg] 2012-07-02 06:17:26 PDT
Yeah, you should see the plugin frowny-face and clicking the button in the plugin frowny face should submit the report, and they should show up in about:crashes.
Comment 19 Benjamin Smedberg [:bsmedberg] 2012-07-02 08:53:29 PDT
Created attachment 638381 [details] [diff] [review]
Part E - fix crash when crash reporter is disabled mid-run as in mochitests

This is a trivial change to part E: there is a race condition when crash handling is disabled while running, which basically only ever happens in mochitests. Added a null-check.
Comment 20 Benjamin Smedberg [:bsmedberg] 2012-07-02 08:54:30 PDT
Created attachment 638383 [details] [diff] [review]
Part C - the injected library, with build fix

Contains the fix from khuey (thanks!) as well as an additional build fix for the native breakpad client which only showed up in clobber builds.
Comment 21 Benjamin Smedberg [:bsmedberg] 2012-07-02 08:56:24 PDT
Created attachment 638384 [details] [diff] [review]
Part F - actually do injection from PluginModuleParent specifically for Flash and FlashPlayerPlugin processes only

Jim, this is an updated version of part F, which adds the pref that we talked about on IRC. It also changes an incorrect negative in PMP::ShouldContinueFromReplyTimeout and conditional ordering in PMP::ActorDestroy which meant that we weren't submitting hang reports correctly.
Comment 22 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-02 09:50:14 PDT
Comment on attachment 638241 [details] [diff] [review]
Part B - Ambiguous call resolution from breakpad update

Review of attachment 638241 [details] [diff] [review]:
-----------------------------------------------------------------

This makes us not use MiniDumpWithFullMemoryInfo, since we're no longer passing |minidump_type| to the constructor, right?  Is this expected?
Comment 23 Benjamin Smedberg [:bsmedberg] 2012-07-02 09:57:06 PDT
Created attachment 638399 [details] [diff] [review]
Part B - Ambiguous call resolution from breakpad update

You're right.
Comment 24 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-02 10:49:51 PDT
Comment on attachment 638244 [details] [diff] [review]
Part D - MemoryModule import

Review of attachment 638244 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the comments below addressed.

::: toolkit/crashreporter/LoadLibraryRemote.cpp
@@ +32,5 @@
> +  unsigned char *localCodeBase;
> +  unsigned char *remoteCodeBase;
> +  HMODULE *modules;
> +  int numModules;
> +  int initialized;

You never set the initialized value to anything non-zero, and never seem to use it, so it can just be removed.

@@ +73,5 @@
> +  }
> +}
> +
> +// Protection flags for memory pages (Executable, Readable, Writeable)
> +static int ProtectionFlags[2][2][2] = {

Nit: static const DWORD.

@@ +211,5 @@
> +
> +    for (; importDesc < importEnd && importDesc->Name; importDesc++) {
> +      POINTER_TYPE *thunkRef;
> +      FARPROC *funcRef;
> +      HMODULE handle = LoadLibraryA((LPCSTR) (codeBase + importDesc->Name));

Nit: please call FreeLibrary when you're done with the handle.

@@ +264,5 @@
> +                                     const WCHAR* library,
> +                                     const char* symbol)
> +{
> +  // Map the DLL into memory
> +  HANDLE hLibrary = CreateFile(library, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING,

Nit: use nsAutoHandle here, and below for hMapping too.

@@ +296,5 @@
> +  if (dos_header->e_magic != IMAGE_DOS_SIGNATURE) {
> +#if DEBUG_OUTPUT
> +    OutputDebugStringA("Not a valid executable file.\n");
> +#endif
> +    return NULL;

Nit: UnmapViewOfFile in this and the following error detection branches.  Or use RVAMap in nsWindowsDllBlocklist.cpp.

@@ +358,5 @@
> +  }
> +
> +  // mark memory pages depending on section headers and release
> +  // sections that are marked as "discardable"
> +  FinalizeSections(&result, hRemoteProcess);

FinalizeSections can fail in various ways, and it returns void, which effectively causes us to ignore those error conditions.  That seems wrong to me.

::: toolkit/crashreporter/LoadLibraryRemote.h
@@ +8,5 @@
> +#include <windows.h>
> +
> +void* LoadRemoteLibraryAndGetAddress(HANDLE hRemoteProcess,
> +                                     const WCHAR* library,
> +                                     const char* symbol);

The implementation of this function cannot get the proc address for ordinal exports.  This is fine for now, but it may confuse future callers because of the similarity to GetProcAddress, so please add a comment here saying that |symbol| must be a string name and not an ordinal number.
Comment 25 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-02 11:06:22 PDT
Comment on attachment 638383 [details] [diff] [review]
Part C - the injected library, with build fix

Review of attachment 638383 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should add an entry point function to the injector DLL which just returns FALSE on all reason codes, so that if someone uses LoadLibrary on this DLL, it would fail normally.

::: toolkit/crashreporter/injector/injector.cpp
@@ +22,5 @@
> +  HANDLE hCrashPipe = reinterpret_cast<HANDLE>(context);
> +
> +  ExceptionHandler* e = new ExceptionHandler(wstring(),
> +                                             NULL, NULL, NULL, ExceptionHandler::HANDLER_ALL,
> +                                             MiniDumpNormal, hCrashPipe, NULL);

This is an edge case, but it's best to handle OOM here.  If you don't link to libcp*.lib, you can just check for null.  Otherwise you should use the nothrow new variant.
Comment 26 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-02 11:17:21 PDT
Comment on attachment 638381 [details] [diff] [review]
Part E - fix crash when crash reporter is disabled mid-run as in mochitests

Review of attachment 638381 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/InjectCrashReporter.cpp
@@ +31,5 @@
> +{
> +  if (mInjectorPath.IsEmpty())
> +    return NS_OK;
> +
> +  HANDLE hProcess = OpenProcess(PROCESS_CREATE_THREAD |

Nit: Using nsAutoHandle could get rid of multiple CloseHandle calls below.
Comment 27 Benjamin Smedberg [:bsmedberg] 2012-07-02 11:27:17 PDT
Created attachment 638423 [details] [diff] [review]
Part D - MemoryModule import, with fixes
Comment 28 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-02 11:42:12 PDT
Here's my assessment on the risk of this patch.  The CreateRemoteThread based hack is quite well known and heavily used on Windows.  The part about basically doing our own loader stuff to load a DLL directly in memory is a bit riskier, but the fact that we only use it to load a single known DLL (the injector DLL) and also the fact that we don't do things which rely on the OS specific differences (such as handling module search paths etc) means that it is possible to verify those pieces by doing in-house QA on all versions of Windows we support, which greatly raises my confidence if we're going to take this on beta.

The most significant risk with this patch, as far as I can tell, is for us to do something wrong and crash the sandboxed process as we're injecting breakpad into it (before we finish setting up the reporter in that process.)  So we should keep an eye on people complaining about not being able to view flash content after this lands.

Based on the above, I think that we should take this on beta, as the benefits outweigh the risks, IMO.
Comment 29 Benjamin Smedberg [:bsmedberg] 2012-07-02 12:03:45 PDT
https://tbpl.mozilla.org/?tree=Try&rev=01508fb96949 passed, working on a rebuild of the final patches and getting this pushed to -central now.
Comment 30 Benjamin Smedberg [:bsmedberg] 2012-07-03 07:11:30 PDT
Part A - https://hg.mozilla.org/mozilla-central/rev/796f649b8140
Part B - https://hg.mozilla.org/mozilla-central/rev/c154cee1928a
Part C - https://hg.mozilla.org/mozilla-central/rev/141f0a09f4b6
Part D - https://hg.mozilla.org/mozilla-central/rev/6990667ebd08
Part E - https://hg.mozilla.org/mozilla-central/rev/52b93da29023
Part F - https://hg.mozilla.org/mozilla-central/rev/1d370ca5bb8d

When doing final testing of this I noticed a preexisting race condition that might occasionally show up as "no crash report available" for these crashes and even regular reports. I'll file and fix that separately.
Comment 31 Bob Clary [:bc:] 2012-07-03 07:21:27 PDT
Benjamin, Were you able to determine the problem with the inability of minidump_stackwalk to process the dumps?
Comment 32 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-07-03 12:25:36 PDT
Can someone please comment here when we get Nightly builds to test and provide a link? I'd like to get the QA-org and Nightly Testers involved with this bug.
Comment 33 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-03 13:09:28 PDT
(In reply to comment #32)
> Can someone please comment here when we get Nightly builds to test and provide
> a link? I'd like to get the QA-org and Nightly Testers involved with this bug.

Nightly builds are currently in progress (about an hour away).
Comment 34 Benjamin Smedberg [:bsmedberg] 2012-07-03 13:14:35 PDT
bc, I haven't seen any dumps personally that were incomplete, although I've primarily tried to use them by attaching in MSVC and not with minidump-stackwalk.
Comment 35 Bob Clary [:bc:] 2012-07-03 13:30:07 PDT
I opened one using MSVC. It wasn't particularly interesting but it did open and show some information.

Laura, what does Socorro use to process crash reports? I'm worried this will send all the crashes to '(no signature)' if the dump can't be processed.
Comment 36 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-03 14:56:36 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #33)
> (In reply to comment #32)
> > Can someone please comment here when we get Nightly builds to test and provide
> > a link? I'd like to get the QA-org and Nightly Testers involved with this bug.
> 
> Nightly builds are currently in progress (about an hour away).

The nightly should now be available.
Comment 37 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-07-03 14:58:49 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #36)
> The nightly should now be available.

Just to confirm, the builds are here, right?
ftp://ftp.mozilla.org/pub/firefox/nightly/2012-07-03-11-08-46-mozilla-central/
Comment 38 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-03 15:42:11 PDT
(In reply to comment #37)
> (In reply to Ehsan Akhgari [:ehsan] from comment #36)
> > The nightly should now be available.
> 
> Just to confirm, the builds are here, right?
> ftp://ftp.mozilla.org/pub/firefox/nightly/2012-07-03-11-08-46-mozilla-central/

Yes.
Comment 39 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-07-03 15:56:48 PDT
A request has been communicated the following groups of people to dogfood and report crash IDs to this bug:
 * QA Staff
 * QA Contractors
 * Nightly Testers
 * QA Community
 * QMO
Comment 40 Bob Clary [:bc:] 2012-07-03 20:27:20 PDT
browser+plugin hang reports
https://www.youtube.com/watch?v=ZmYRllome8Q&feature=relmfu
bp-8c62eb77-a49a-461c-81c8-1e71d2120704
Comment 42 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-07-04 08:13:17 PDT
https://crash-stats.mozilla.com/report/index/bp-ec35d2e0-3d92-48ca-87c7-d95ed2120704
I get this crash with the testcase from bug 763237, using bsmedberg's special build from this bug.
Comment 43 Simona B [:simonab] 2012-07-04 08:31:35 PDT
Mozilla/5.0 (Windows NT 6.0; rv:16.0) Gecko/16.0 Firefox/16.0
buildId: 20120703110846

I got the crash on Win Vista, using the latest Nightly, while watching a video on YouTube.
https://crash-stats.mozilla.com/report/index/bp-a97a4e39-7682-4895-b43d-f32312120704
Comment 44 Benjamin Smedberg [:bsmedberg] 2012-07-06 14:42:16 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd58bfdf6c71 Followup to part F to actually submit crash annotations
Comment 45 Alex Keybl [:akeybl] 2012-07-06 16:50:07 PDT
Comment on attachment 638240 [details] [diff] [review]
Part A - Breakpad update cherrypicked from upstream

[Triage Comment]
This pre-approval is for patches A through F and any followups made on m-c today. Given the absence of a risk evaluation, my approval is under the assumption that it will be no more risky on 15 than 16.

The motivation for uplift is to get Adobe as much external and actionable crash data as possible when they're back in the office on Monday.
Comment 47 Ryan VanderMeulen [:RyanVM] 2012-07-07 12:03:11 PDT
https://hg.mozilla.org/mozilla-central/rev/cd58bfdf6c71
Comment 48 Scoobidiver (away) 2012-07-16 00:55:49 PDT
Target Milestone is for m-c.

Note You need to log in before you can comment on or make changes to this bug.