Closed Bug 769048 Opened 12 years ago Closed 12 years ago

Attach to and report on crashes in FlashPlayerPlugin_*.exe processes which are children of our plugin container

Categories

(Core Graveyard :: Plug-ins, defect, P1)

x86_64
Windows 7
defect

Tracking

(firefox14 wontfix, firefox15+ fixed, firefox16 fixed)

RESOLVED FIXED
mozilla16
Tracking Status
firefox14 --- wontfix
firefox15 + fixed
firefox16 --- fixed

People

(Reporter: benjamin, Assigned: benjamin)

References

()

Details

Attachments

(6 files, 7 obsolete files)

14.27 KB, patch
Details | Diff | Splinter Review
10.17 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
19.34 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
11.75 KB, patch
jimm
: review+
Details | Diff | Splinter Review
985 bytes, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
16.37 KB, patch
Details | Diff | Splinter Review
We currently only see crashes in the plugin-container process. With the Flash Player 11.3 sandbox, the interesting crashes all occur in the FlashPlayerPlugin_*.exe processes. We can inject our crash reporter into those processes and report crashes directly to crash-stats from there, which will significantly aid bucketing and diagnosing the increases in crash rates with Flash 11.3.
http://hg.mozilla.org/users/bsmedberg_mozilla.com/breakpadinject/ contains PoC code which hooks breakpad up via DLL injection in a way that doesn't require any changes to the Flash player.
The Mozilla patches here depend on the upstream breakpad change http://breakpad.appspot.com/406002/ , although in a pinch we could take a one-off change if a full breakpad merge was too complicated or risky.
Assignee: nobody → benjamin
Priority: -- → P1
Target Milestone: --- → mozilla14
Attached patch The injector library, rev. 1 (obsolete) — Splinter Review
Not requesting review yet until I finish and test the other half.
Attachment #638241 - Flags: review?(ehsan)
Attached patch Part C - the injected library (obsolete) — Splinter Review
Attachment #637340 - Attachment is obsolete: true
Attachment #638242 - Flags: review?(ehsan)
Attached patch Part D - MemoryModule import (obsolete) — Splinter Review
In the checkin comment I included the exact revision of https://github.com/fancycode/MemoryModule from which I copied and modified this code, if diffing against that would be useful.
Attachment #638244 - Flags: review?(ehsan)
Attachment #638245 - Flags: review?(ehsan)
Attachment #638242 - Flags: review?(khuey)
Just started https://tbpl.mozilla.org/?tree=Try&rev=4a1b0f213788 for a full tryserver run.

https://wiki.mozilla.org/User:Benjamin_Smedberg/Flash_Crash_Reporting contains a design document.

I've locally verified at least one crash in the sandboxed process which was caught successfully and produced a minidump.
Blocks: 770078
Comment on attachment 638242 [details] [diff] [review]
Part C - the injected library

Review of attachment 638242 [details] [diff] [review]:
-----------------------------------------------------------------

Don't you need to add something to the package-manifest?
Attachment #638242 - Flags: review?(khuey) → review+
e:/builds/moz2_slave/try-w32/build/toolkit/crashreporter/nsExceptionHandler.cpp(66) : fatal error C1083: Cannot open include file: 'InjectCrashReporter.h': No such file or directory
Attached patch Part E, files added (obsolete) — Splinter Review
Indeed bc, I forgot to `hg add` files. Part E now fixed.
Attachment #638245 - Attachment is obsolete: true
Attachment #638245 - Flags: review?(ehsan)
Attachment #638316 - Flags: review?(ehsan)
after crashing flash, should I see entries in about:crashes from this?
(In reply to Jim Mathies [:jimm] from comment #15)
> after crashing flash, should I see entries in about:crashes from this?

Managed to trigger this I think, here's the stack - 

https://crash-stats.mozilla.com/report/index/bp-aef2e9de-e6d8-4283-967a-98f832120702
Comment on attachment 638246 [details] [diff] [review]
Part F - actually do injection from PluginModuleParent specifically for Flash and FlashPlayerPlugin processes only

Review of attachment 638246 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginModuleParent.cpp
@@ +1193,5 @@
> +         ok = Process32Next(snapshot, &entry)) {
> +        if (entry.th32ParentProcessID == pid) {
> +            nsString name(entry.szExeFile);
> +            ToUpperCase(name);
> +            if (StringBeginsWith(name, NS_LITERAL_STRING("FLASHPLAYERPLUGIN"))) {

nit - I would suggest putting this in a const up top with a comment explaining how it is used.

::: dom/plugins/ipc/PluginModuleParent.h
@@ +321,5 @@
> +    
> +    NS_OVERRIDE void OnCrash(DWORD processID, const nsAString& aDumpID);
> +
> +    DWORD mFlashProcess1;
> +    DWORD mFlashProcess2;

since InitializeInjector() can fail before these are assigned, we need to initialize them to 0.
Attachment #638246 - Flags: review?(jmathies) → review+
Yeah, you should see the plugin frowny-face and clicking the button in the plugin frowny face should submit the report, and they should show up in about:crashes.
This is a trivial change to part E: there is a race condition when crash handling is disabled while running, which basically only ever happens in mochitests. Added a null-check.
Attachment #638316 - Attachment is obsolete: true
Attachment #638316 - Flags: review?(ehsan)
Attachment #638381 - Flags: review?(ehsan)
Contains the fix from khuey (thanks!) as well as an additional build fix for the native breakpad client which only showed up in clobber builds.
Attachment #638242 - Attachment is obsolete: true
Attachment #638242 - Flags: review?(ehsan)
Attachment #638383 - Flags: review?(ehsan)
Jim, this is an updated version of part F, which adds the pref that we talked about on IRC. It also changes an incorrect negative in PMP::ShouldContinueFromReplyTimeout and conditional ordering in PMP::ActorDestroy which meant that we weren't submitting hang reports correctly.
Attachment #638246 - Attachment is obsolete: true
Attachment #638384 - Flags: review?(jmathies)
Attachment #638384 - Flags: review?(jmathies) → review+
Comment on attachment 638241 [details] [diff] [review]
Part B - Ambiguous call resolution from breakpad update

Review of attachment 638241 [details] [diff] [review]:
-----------------------------------------------------------------

This makes us not use MiniDumpWithFullMemoryInfo, since we're no longer passing |minidump_type| to the constructor, right?  Is this expected?
You're right.
Attachment #638241 - Attachment is obsolete: true
Attachment #638241 - Flags: review?(ehsan)
Attachment #638399 - Flags: review?(ehsan)
Comment on attachment 638244 [details] [diff] [review]
Part D - MemoryModule import

Review of attachment 638244 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the comments below addressed.

::: toolkit/crashreporter/LoadLibraryRemote.cpp
@@ +32,5 @@
> +  unsigned char *localCodeBase;
> +  unsigned char *remoteCodeBase;
> +  HMODULE *modules;
> +  int numModules;
> +  int initialized;

You never set the initialized value to anything non-zero, and never seem to use it, so it can just be removed.

@@ +73,5 @@
> +  }
> +}
> +
> +// Protection flags for memory pages (Executable, Readable, Writeable)
> +static int ProtectionFlags[2][2][2] = {

Nit: static const DWORD.

@@ +211,5 @@
> +
> +    for (; importDesc < importEnd && importDesc->Name; importDesc++) {
> +      POINTER_TYPE *thunkRef;
> +      FARPROC *funcRef;
> +      HMODULE handle = LoadLibraryA((LPCSTR) (codeBase + importDesc->Name));

Nit: please call FreeLibrary when you're done with the handle.

@@ +264,5 @@
> +                                     const WCHAR* library,
> +                                     const char* symbol)
> +{
> +  // Map the DLL into memory
> +  HANDLE hLibrary = CreateFile(library, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING,

Nit: use nsAutoHandle here, and below for hMapping too.

@@ +296,5 @@
> +  if (dos_header->e_magic != IMAGE_DOS_SIGNATURE) {
> +#if DEBUG_OUTPUT
> +    OutputDebugStringA("Not a valid executable file.\n");
> +#endif
> +    return NULL;

Nit: UnmapViewOfFile in this and the following error detection branches.  Or use RVAMap in nsWindowsDllBlocklist.cpp.

@@ +358,5 @@
> +  }
> +
> +  // mark memory pages depending on section headers and release
> +  // sections that are marked as "discardable"
> +  FinalizeSections(&result, hRemoteProcess);

FinalizeSections can fail in various ways, and it returns void, which effectively causes us to ignore those error conditions.  That seems wrong to me.

::: toolkit/crashreporter/LoadLibraryRemote.h
@@ +8,5 @@
> +#include <windows.h>
> +
> +void* LoadRemoteLibraryAndGetAddress(HANDLE hRemoteProcess,
> +                                     const WCHAR* library,
> +                                     const char* symbol);

The implementation of this function cannot get the proc address for ordinal exports.  This is fine for now, but it may confuse future callers because of the similarity to GetProcAddress, so please add a comment here saying that |symbol| must be a string name and not an ordinal number.
Attachment #638244 - Flags: review?(ehsan) → review+
Attachment #638399 - Flags: review?(ehsan) → review+
Comment on attachment 638383 [details] [diff] [review]
Part C - the injected library, with build fix

Review of attachment 638383 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should add an entry point function to the injector DLL which just returns FALSE on all reason codes, so that if someone uses LoadLibrary on this DLL, it would fail normally.

::: toolkit/crashreporter/injector/injector.cpp
@@ +22,5 @@
> +  HANDLE hCrashPipe = reinterpret_cast<HANDLE>(context);
> +
> +  ExceptionHandler* e = new ExceptionHandler(wstring(),
> +                                             NULL, NULL, NULL, ExceptionHandler::HANDLER_ALL,
> +                                             MiniDumpNormal, hCrashPipe, NULL);

This is an edge case, but it's best to handle OOM here.  If you don't link to libcp*.lib, you can just check for null.  Otherwise you should use the nothrow new variant.
Attachment #638383 - Flags: review?(ehsan) → review+
Comment on attachment 638381 [details] [diff] [review]
Part E - fix crash when crash reporter is disabled mid-run as in mochitests

Review of attachment 638381 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/InjectCrashReporter.cpp
@@ +31,5 @@
> +{
> +  if (mInjectorPath.IsEmpty())
> +    return NS_OK;
> +
> +  HANDLE hProcess = OpenProcess(PROCESS_CREATE_THREAD |

Nit: Using nsAutoHandle could get rid of multiple CloseHandle calls below.
Attachment #638381 - Flags: review?(ehsan) → review+
Attachment #638244 - Attachment is obsolete: true
Here's my assessment on the risk of this patch.  The CreateRemoteThread based hack is quite well known and heavily used on Windows.  The part about basically doing our own loader stuff to load a DLL directly in memory is a bit riskier, but the fact that we only use it to load a single known DLL (the injector DLL) and also the fact that we don't do things which rely on the OS specific differences (such as handling module search paths etc) means that it is possible to verify those pieces by doing in-house QA on all versions of Windows we support, which greatly raises my confidence if we're going to take this on beta.

The most significant risk with this patch, as far as I can tell, is for us to do something wrong and crash the sandboxed process as we're injecting breakpad into it (before we finish setting up the reporter in that process.)  So we should keep an eye on people complaining about not being able to view flash content after this lands.

Based on the above, I think that we should take this on beta, as the benefits outweigh the risks, IMO.
https://tbpl.mozilla.org/?tree=Try&rev=01508fb96949 passed, working on a rebuild of the final patches and getting this pushed to -central now.
Part A - https://hg.mozilla.org/mozilla-central/rev/796f649b8140
Part B - https://hg.mozilla.org/mozilla-central/rev/c154cee1928a
Part C - https://hg.mozilla.org/mozilla-central/rev/141f0a09f4b6
Part D - https://hg.mozilla.org/mozilla-central/rev/6990667ebd08
Part E - https://hg.mozilla.org/mozilla-central/rev/52b93da29023
Part F - https://hg.mozilla.org/mozilla-central/rev/1d370ca5bb8d

When doing final testing of this I noticed a preexisting race condition that might occasionally show up as "no crash report available" for these crashes and even regular reports. I'll file and fix that separately.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Benjamin, Were you able to determine the problem with the inability of minidump_stackwalk to process the dumps?
Can someone please comment here when we get Nightly builds to test and provide a link? I'd like to get the QA-org and Nightly Testers involved with this bug.
(In reply to comment #32)
> Can someone please comment here when we get Nightly builds to test and provide
> a link? I'd like to get the QA-org and Nightly Testers involved with this bug.

Nightly builds are currently in progress (about an hour away).
bc, I haven't seen any dumps personally that were incomplete, although I've primarily tried to use them by attaching in MSVC and not with minidump-stackwalk.
I opened one using MSVC. It wasn't particularly interesting but it did open and show some information.

Laura, what does Socorro use to process crash reports? I'm worried this will send all the crashes to '(no signature)' if the dump can't be processed.
(In reply to Ehsan Akhgari [:ehsan] from comment #33)
> (In reply to comment #32)
> > Can someone please comment here when we get Nightly builds to test and provide
> > a link? I'd like to get the QA-org and Nightly Testers involved with this bug.
> 
> Nightly builds are currently in progress (about an hour away).

The nightly should now be available.
(In reply to Ehsan Akhgari [:ehsan] from comment #36)
> The nightly should now be available.

Just to confirm, the builds are here, right?
ftp://ftp.mozilla.org/pub/firefox/nightly/2012-07-03-11-08-46-mozilla-central/
(In reply to comment #37)
> (In reply to Ehsan Akhgari [:ehsan] from comment #36)
> > The nightly should now be available.
> 
> Just to confirm, the builds are here, right?
> ftp://ftp.mozilla.org/pub/firefox/nightly/2012-07-03-11-08-46-mozilla-central/

Yes.
A request has been communicated the following groups of people to dogfood and report crash IDs to this bug:
 * QA Staff
 * QA Contractors
 * Nightly Testers
 * QA Community
 * QMO
Depends on: 770805
Target Milestone: mozilla14 → mozilla16
https://crash-stats.mozilla.com/report/index/bp-ec35d2e0-3d92-48ca-87c7-d95ed2120704
I get this crash with the testcase from bug 763237, using bsmedberg's special build from this bug.
Mozilla/5.0 (Windows NT 6.0; rv:16.0) Gecko/16.0 Firefox/16.0
buildId: 20120703110846

I got the crash on Win Vista, using the latest Nightly, while watching a video on YouTube.
https://crash-stats.mozilla.com/report/index/bp-a97a4e39-7682-4895-b43d-f32312120704
Depends on: 771082
Depends on: 771134
Depends on: 771251
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd58bfdf6c71 Followup to part F to actually submit crash annotations
Comment on attachment 638240 [details] [diff] [review]
Part A - Breakpad update cherrypicked from upstream

[Triage Comment]
This pre-approval is for patches A through F and any followups made on m-c today. Given the absence of a risk evaluation, my approval is under the assumption that it will be no more risky on 15 than 16.

The motivation for uplift is to get Adobe as much external and actionable crash data as possible when they're back in the office on Monday.
Attachment #638240 - Flags: approval-mozilla-aurora+
Depends on: 772432
Blocks: 773332
Target Milestone is for m-c.
Target Milestone: mozilla15 → mozilla16
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: