Closed Bug 1239258 Opened 8 years ago Closed 4 years ago

Intermittent TEST-UNEXPECTED-FAIL | leakcheck | plugin process: missing output line for total leaks!

Categories

(Core Graveyard :: Plug-ins, defect, P3)

defect

Tracking

(firefox46 affected)

RESOLVED WORKSFORME
Tracking Status
firefox46 --- affected

People

(Reporter: nigelb, Unassigned)

References

()

Details

(Keywords: intermittent-failure)

https://treeherder.mozilla.org/logviewer.html#?job_id=19725226&repo=mozilla-inbound


 01:01:20     INFO -  --DOMWINDOW == 1 (0x7fcd1477c800) [pid = 5218] [serial = 201] [outer = (nil)] [url = chrome://browser/content/browser.xul]
 01:01:20     INFO -  --DOMWINDOW == 0 (0x7fcd31c49400) [pid = 5218] [serial = 5] [outer = (nil)] [url = resource://gre-resources/hiddenWindow.html]
 01:01:21     INFO -  nsStringStats
 01:01:21     INFO -   => mAllocCount:         845393
 01:01:21     INFO -   => mReallocCount:       102250
 01:01:21     INFO -   => mFreeCount:          845393
 01:01:21     INFO -   => mShareCount:        1123197
 01:01:21     INFO -   => mAdoptCount:          26878
 01:01:21     INFO -   => mAdoptFreeCount:      26878
 01:01:21     INFO -   => Process ID: 5218, Thread ID: 140519937627968
 01:01:21     INFO -  TEST-INFO | Main app process: exit 0
 01:01:21     INFO -  runtests.py | Application ran for: 0:09:57.302575
 01:01:21     INFO -  zombiecheck | Reading PID log: /tmp/tmpcKwv82pidlog
 01:01:21     INFO -  ==> process 5218 launched child process 5275
 01:01:21     INFO -  ==> process 5218 launched child process 5371
 01:01:21     INFO -  ==> process 5218 launched child process 5380
 01:01:21     INFO -  ==> process 5218 launched child process 5389
 01:01:21     INFO -  zombiecheck | Checking for orphan process with PID: 5275
 01:01:21     INFO -  zombiecheck | Checking for orphan process with PID: 5371
 01:01:21     INFO -  zombiecheck | Checking for orphan process with PID: 5380
 01:01:21     INFO -  zombiecheck | Checking for orphan process with PID: 5389
 01:01:21     INFO -  Stopping web server
 01:01:21     INFO -  Stopping web socket server
 01:01:21     INFO -  Stopping ssltunnel
 01:01:21     INFO -  TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes
 01:01:21     INFO -  TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes
 01:01:21     INFO -  TEST-INFO | leakcheck | tab process: leak threshold set at 10000 bytes
 01:01:21     INFO -  TEST-INFO | leakcheck | geckomediaplugin process: leak threshold set at 20000 bytes
 01:01:21     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, plugin process 5380
 01:01:21     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
 01:01:21     INFO -       |                                      | Per-Inst   Leaked|   Total      Rem|
 01:01:21     INFO -     0 |TOTAL                                 |       47        0|     848        0|
 01:01:21     INFO -  nsTraceRefcnt::DumpStatistics: 28 entries
 01:01:21     INFO -  TEST-PASS | leakcheck | plugin process: no leaks detected!
 01:01:21     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 5218
 01:01:21     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
 01:01:21     INFO -       |                                      | Per-Inst   Leaked|   Total      Rem|
 01:01:21     INFO -     0 |TOTAL                                 |       22        0|46519274        0|
 01:01:21     INFO -  nsTraceRefcnt::DumpStatistics: 1465 entries
 01:01:21     INFO -  TEST-PASS | leakcheck | default process: no leaks detected!
 01:01:21  WARNING -  TEST-UNEXPECTED-FAIL | leakcheck | plugin process: missing output line for total leaks!
 01:01:21     INFO -  TEST-INFO | leakcheck | missing output line from log file /tmp/tmpnS2WoB.mozrunner/runtests_leaks_plugin_pid5371.log
 01:01:21     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 5275
 01:01:21     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
 01:01:21     INFO -       |                                      | Per-Inst   Leaked|   Total      Rem|
 01:01:21     INFO -     0 |TOTAL                                 |       22     4648|26517147       34|
 01:01:21     INFO -    16 |AsyncTransactionTrackersHolder        |       72       72|     331        1|
 01:01:21     INFO -    84 |CompositorChild                       |      880      880|       1        1|
 01:01:21     INFO -    86 |CondVar                               |       40      120|     880        3|
 01:01:21     INFO -   262 |IPC::Channel                          |       16       32|      18        2|
 01:01:21     INFO -   329 |MessagePump                           |       16       16|      50        1|
 01:01:21     INFO -   339 |Mutex                                 |       32       96|    8520        3|
 01:01:21     INFO -   363 |PCompositorChild                      |      776      776|       1        1|
 01:01:21     INFO -   371 |PImageBridgeChild                     |      920      920|       1        1|
 01:01:21     INFO -   445 |RefCountedMonitor                     |       80      160|      13        2|
 01:01:21     INFO -   447 |RefCountedTask                        |       16       64|      26        4|
 01:01:21     INFO -   507 |StoreRef                              |       16       32|      21        2|
 01:01:21     INFO -   572 |WaitableEventKernel                   |       72       72|      53        1|
 01:01:21     INFO -   578 |WeakReference<MessageListener>        |       16       32|    6705        2|
 01:01:21     INFO -   630 |base::Thread                          |       48       48|       3        1|
 01:01:21     INFO -   663 |ipc::MessageChannel                   |      512     1024|      13        2|
 01:01:21     INFO -  1136 |nsTArray_base                         |        8       48| 8771838        6|
 01:01:21     INFO -  1146 |nsThread                              |      256      256|      49        1|
 01:01:21     INFO -  nsTraceRefcnt::DumpStatistics: 1255 entries
 01:01:21     INFO -  TEST-INFO | leakcheck | tab process: leaked 1 AsyncTransactionTrackersHolder
 01:01:21     INFO -  TEST-INFO | leakcheck | tab process: leaked 1 CompositorChild
Andrew, Can you take a look at this one? It's kind of weird.
Flags: needinfo?(continuation)
What this assertion means is that the plugin process crashed before it was able to output a leak log. It is odd that the actual crash doesn't show up in the summary. That's probably a separate bug.

I looked at the two logs for this that are in orange factor. If you look at the line after the TEST-UNEXPECTED-FAIL line, it has the file name for the leak log, which contains the PID for the process that crashed: runtests_leaks_plugin_pid5371.log. You can then look back in the log to see where that process was created, and how it crashed.

The crash looks like this:

[NPAPI 4969] ###!!! ASSERTION: plug removed: 'glib assertion', file /builds/slave/m-in-l64-d-0000000000000000000/build/src/toolkit/xre/nsSigHandlers.cpp, line 140
#01: libglib-2.0.so.0 + 0x4ef61
#02: libglib-2.0.so.0 + 0x4f172
#03: libnptest.so + 0x53d4
#04: libgtk-x11-2.0.so.0 + 0x136dd8
#05: libgobject-2.0.so.0 + 0xfca2
[...]
#17: libglib-2.0.so.0 + 0x48124
#18: base::MessagePumpForUI::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpForUI::Dispatcher*) [ipc/chromium/src/base/message_pump_glib.cc:192]
#19: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:235]
#20: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:520]
#21: XRE_InitChildProcess [toolkit/xre/nsEmbedFunctions.cpp:629]
#22: content_process_main(int, char**) [ipc/contentproc/plugin-container.cpp:240]
#23: libc.so.6 + 0x2176d
#24: _start
** (plugin-container:4969): ERROR **: plug removed
[Parent 4811] WARNING: [PluginModuleParent::ActorDestroy] abnormal shutdown without minidump!: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/dom/plugins/ipc/PluginModuleParent.cpp, line 1592

In both logs, the crash is happening in test_fullscreen-api.html, which was just enabled for e10s in bug 1191597. I don't know if this is a problem with the full screen API or with some odd plugin thing.

Xidorn, do you know what is happening here? Thanks.
Blocks: 1191597
Component: General → Plug-ins
Flags: needinfo?(continuation) → needinfo?(quanxunzhen)
Product: Firefox → Core
It looks like bug 1237853 was filed on this issue already, but I guess it wasn't expected to be intermittently happening still?
I filed bug 1239343 for the test harness not noticing the fatal plugin assertion.
(In reply to Andrew McCreight [:mccr8] from comment #2)
> What this assertion means is that the plugin process crashed before it was
> able to output a leak log. It is odd that the actual crash doesn't show up
> in the summary. That's probably a separate bug.
> 
> I looked at the two logs for this that are in orange factor. If you look at
> the line after the TEST-UNEXPECTED-FAIL line, it has the file name for the
> leak log, which contains the PID for the process that crashed:
> runtests_leaks_plugin_pid5371.log. You can then look back in the log to see
> where that process was created, and how it crashed.
> 
> The crash looks like this:
> 
> [NPAPI 4969] ###!!! ASSERTION: plug removed: 'glib assertion', file
> /builds/slave/m-in-l64-d-0000000000000000000/build/src/toolkit/xre/
> nsSigHandlers.cpp, line 140
> #01: libglib-2.0.so.0 + 0x4ef61
> #02: libglib-2.0.so.0 + 0x4f172
> #03: libnptest.so + 0x53d4

FWIW, this is (per running addr2line on libnptest.so):
DeleteWidget
/builds/slave/m-in-l64-d-0000000000000000000/build/src/dom/plugins/test/testplugin/nptest_gtk2.cpp:264

which is explicitly calling g_error.
(In reply to Andrew McCreight [:mccr8] from comment #3)
> It looks like bug 1237853 was filed on this issue already, but I guess it
> wasn't expected to be intermittently happening still?

Before I add the workaround, it was a permafail. I guess the workaround was not reliable enough, and there might be some race condition here.

To totaly resolve this, that bug need to be fixed. But if we do not get much progress there while this intermittent gets more annoying, we probably can make the workaround a bit more reliable via allowing more time before closing the window.

(BTW, why there is no red circle next to "Bugzilla@Mozilla" today given there is a ni? for me... probably a bug need to be filed...)
Flags: needinfo?(quanxunzhen)
Depends on: 1237853
Ed, is there anything we can do to make this failure starrable in Treeherder?
Flags: needinfo?(emorley)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)
> Ed, is there anything we can do to make this failure starrable in Treeherder?

It's already starrable, the bug summary is just incomplete.

The algorithm (https://github.com/mozilla/treeherder/blob/cda4b33cef44192a73546be51c593ba0ad838e05/treeherder/model/error_summary.py#L97-L121) is roughly as follows:

1) Try leak-specific search if line is of form "leakcheck | .*\d+ bytes leaked (Object-1, Object-2, Object-3, ...)"
2) Else if in pipe-delimited format && middle token not blacklisted/too short a string, then search using the middle token
3) Otherwise try the full line

In this case, we reach #3, since the middle token is "leakcheck", which is in the blacklist:
https://github.com/mozilla/treeherder/blob/cda4b33cef44192a73546be51c593ba0ad838e05/treeherder/model/error_summary.py#L171
Flags: needinfo?(emorley)
Summary: Intermittent leakcheck | plugin process: missing output line for total leaks! → Intermittent TEST-UNEXPECTED-FAIL | leakcheck | plugin process: missing output line for total leaks!
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
I looked at one log, and it looked like there was an assertion.
https://public-artifacts.taskcluster.net/JYpgqo41R-OxQaoEn2ibnQ/0/public/logs/live_backing.log

I wonder why the log parser is missing this. Maybe the weird character.

09:02:04     INFO -  [NPAPI 3029] ###!!! ASSERTION: plug removed: 'glib assertion', file /home/worker/workspace/build/src/toolkit/xre/nsSigHandlers.cpp, line 140
Oh I already noticed that in comment 2...
Bug 1239362 is now fixed, but this error is still occurring sometimes, and the "plug removed" assertion is not involved.
Depends on: 1239362
The volume did seem to go down quite a bit after that bug landed.

I looked at the two failures in the last week. If you look at the line after the UNEXPECTED-FAIL line, it gives the PID of the process that failed to create a leak log:
TEST-INFO | leakcheck | missing output line from log file /tmp/tmpDERMDD.mozrunner/runtests_leaks_plugin_pid3384.log

Then you can search for the pid in the log.

For the Linux one, I see something about "received an X Window System error". Maybe that is related? That log is odd because it looks like shutdown gets far enough to print out nsStringStats. It looks like that's happening during dom/html/test/test_fullscreen-api.html.

For the Windows one, I'm not sure if there are any errors in the log around when the plugin process is created, but it looks like it is created during dom/plugins/test/mochitest/test_hangui.xul so maybe something went wrong with that.
(In reply to Andrew McCreight [:mccr8] from comment #46)
> For the Linux one, I see something about "received an X Window System
> error". Maybe that is related? [...] It looks like that's
> happening during dom/html/test/test_fullscreen-api.html.

BadWindow with request_code 3, which is X_GetWindowAttributes, does sound
related, thanks, and likely still due to bug 1237853.

GDK2 (used in the plugin process) is calling exit(1), without using g_error(), and so this is missing our X/GDK error handler and not triggering a crash report.  I suspect it would be possible to InstallX11ErrorHandler() (designed for GDK2) for the plugin process to get slightly better error messages and trigger the crash reporter.
That actually sounds similar to bug 1309420 comment 14: when we assert, we are okay, but when we hit some X windows error, we leak (probably because we exit without shutting things down properly). I just worked around that by disabling the test, which wasn't running in debug builds either.
See Also: → 1309420
Depends on: 1331320
Recent occurrences here are actually Bug 1497029 as all failure logs have 
22:39:17     INFO -    File "Z:\task_1538864482\build\tests\bin\fix_stack_using_bpsyms.py", line 76, in __init__
22:39:17     INFO -      (rva, size, line, filenum) = line.split(None)

before the errors, so please disregard the next Orange factor comment regarding the failures on the 6th of October.
See Also: → 1535922

No failures since Dec 2018.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.