Closed
Bug 1271182
Opened 9 years ago
Closed 8 years ago
Intermittent TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught!, in various browser-chrome mochitests, one negative Runnable and sometimes one negative CancelableRunnable
Categories
(Core :: XPCOM, defect, P3)
Core
XPCOM
Tracking
()
People
(Reporter: cbook, Assigned: mccr8)
References
(Depends on 2 open bugs, )
Details
(Keywords: intermittent-failure)
Attachments
(4 files)
2.48 KB,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
1.47 KB,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
58 bytes,
text/x-review-board-request
|
jmaher
:
review+
|
Details |
1.09 KB,
patch
|
RyanVM
:
review+
|
Details | Diff | Splinter Review |
https://treeherder.mozilla.org/logviewer.html#?job_id=3843368&repo=mozilla-central#L3640
13:41:04 WARNING - TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught!
mccr8, kyle: any idea what this error message mean :)
Assignee | ||
Comment 1•9 years ago
|
||
The classes are CancelableRunnable, Runnable, and nsTimerEvent. Something is wrong with the leak checking for these classes. Maybe this is a regression from bug 1266595?
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(khuey)
The usual cause of this is an unmarked copy constructor. But that doesn't appear to be what's going on here, because nsTimerEvent shouldn't be copy constructed. It's possible TimerEventAllocator does something crazy ...
Flags: needinfo?(khuey)
Updated•9 years ago
|
Summary: Intermittent leakcheck | default process: negative leaks caught! → Intermittent TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught!
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Summary: Intermittent TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught! → Intermittent TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught!, in various browser-chrome mochitests, one negative Runnable and sometimes one negative CancelableRunnable and nsTimerEvent
One other thought is that maybe we have a threadsafety bug in nsTraceRefcnt around the first object created for a class (for the initial object being created around the same time on two different threads).
Assignee | ||
Comment 7•9 years ago
|
||
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #6)
> One other thought is that maybe we have a threadsafety bug in nsTraceRefcnt
> around the first object created for a class (for the initial object being
> created around the same time on two different threads).
That's an interesting idea, but it does look like all of the calls to GetBloatEntry are inside AutoTraceLogLock. (It looks like calls to InitTraceLog could theoretically race because gInitialized is just a bool, but presumably that would have shown up in other ways, and hopefully we poke at it before threads are started.)
So Runnable and CancelableRunnable use NS_DECL_THREADSAFE_ISUPPORTS / NS_DECL_ISUPPORTS_INHERITED. nsTimerEvent then uses MOZ_COUNT_CTOR and MOZ_COUNT_DTOR. Those aren't really supposed to be used together, and I suspect there might be an issue with the interaction.
Assignee: nobody → dbaron
These are generally good practice for reference-counted objects; they
catch cases where these operations are used by accident, breaking
reference-counting.
This doesn't show any existing problems, though.
MozReview-Commit-ID: EvRkNCymOqT
Attachment #8756643 -
Flags: review?(khuey)
I suspect this will fix the odd behavior we're seeing, although I
haven't really thought through how very closely.
MozReview-Commit-ID: qHgBRAc1PE
Attachment #8756644 -
Flags: review?(khuey)
Attachment #8756643 -
Flags: review?(khuey) → review+
Comment on attachment 8756644 [details] [diff] [review]
Remove MOZ_COUNT_[CD]TOR from nsTimerEvent, since it uses XPCOM reference counting macros
Review of attachment 8756644 [details] [diff] [review]:
-----------------------------------------------------------------
Not sure why you think this will fix it, but it can't hurt ...
Attachment #8756644 -
Flags: review?(khuey) → review+
Keywords: leave-open
Comment hidden (Intermittent Failures Robot) |
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd4cebf613e67bc4e04e94a5239879194819f4c0
Bug 1271182 - Add some deleted assignment operators and copy constructors. r=khuey
https://hg.mozilla.org/integration/mozilla-inbound/rev/abafa95db1b11babb82ee2b32cd8c0bac784711c
Bug 1271182 - Remove MOZ_COUNT_[CD]TOR from nsTimerEvent, since it uses XPCOM reference counting macros. r=khuey
Comment 14•9 years ago
|
||
bugherder |
OK, you were right -- that didn't fix it.
Assignee: dbaron → nobody
Keywords: leave-open
Comment hidden (Intermittent Failures Robot) |
Comment 17•9 years ago
|
||
as a note, these failures seem to occur in the browser/components/contextualidentity/test/browser directory. Is it possible there is something in one of those tests?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 21•9 years ago
|
||
:mconley, I see your name as author/reviewer for many of the test related changes in browser/components/contextualidentity/test/browser, can you help us figure out a solution here. Maybe figuring out what changed in early may?
Flags: needinfo?(mconley)
Comment 22•9 years ago
|
||
I'm afraid I was only a reviewer for one of the tests, and wasn't an author of any of them. Redirecting to baku, who I think did a good chunk of work here.
Flags: needinfo?(mconley) → needinfo?(amarchesini)
Assignee | ||
Comment 23•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #21)
> :mconley, I see your name as author/reviewer for many of the test related
> changes in browser/components/contextualidentity/test/browser, can you help
> us figure out a solution here. Maybe figuring out what changed in early may?
This is a regression from Kyle's event loop work. (Though likely it just exposed an existing problem.)
Flags: needinfo?(amarchesini)
Comment 24•9 years ago
|
||
Thanks for the info Andrew- as Kyle is on holiday, then I assume in london the week after, can we find someone else to look at his work or back it out?
Assignee | ||
Comment 25•9 years ago
|
||
Really, it would be better to figure out which test is failing and disable it than back it out.
I'll try to figure out what the issue is here.
Assignee: nobody → continuation
Comment 26•9 years ago
|
||
It seems a bug triggered by how the tests are written. In contextualIdentities code there are no runnable involved.
Plus, there is a CancelableRunnable, mainly used in workers: probably it's the ServiceWorker test.
Assignee | ||
Comment 27•9 years ago
|
||
dbaron's patch should at least fix this for TimerEvent, because it is invisible to the leak checker now.
One tricky thing here when investigating this is that refcount logging is not threadsafe. eg thread 1 could do addref, thread 2 could do addref, then thread 1 could do a release, but we end up reporting to the refcount logger addref(to 1), release(to 1), addref(to 2). I can't think how that would affect the bloat logging, though.
Assignee | ||
Updated•9 years ago
|
Summary: Intermittent TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught!, in various browser-chrome mochitests, one negative Runnable and sometimes one negative CancelableRunnable and nsTimerEvent → Intermittent TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught!, in various browser-chrome mochitests, one negative Runnable and sometimes one negative CancelableRunnable
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 32•9 years ago
|
||
These failures occur in more than just the contextualidentity directory.
They mostly seem to only leak Runnable, but sometimes also include CancelableRunnable.
I assume this is related to something Kyle landed. He landed bug 1269268 and bug 1268313 on 5-5, which more or less lines up with when this was filed.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Do you have any other ideas here?
Flags: needinfo?(dbaron)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 47•9 years ago
|
||
I found this log which has a more interesting negative leak:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=202329#L5421
141 |ConnectionPool::CloseConnectionRunnabl| 40 -40| 2 -1|
146 |ConnectionPool::IdleDatabaseInfo | 16 16| 18 1|
147 |ConnectionPool::IdleResource | 8 8| 23 1|
216 |DatabaseConnection::UpdateRefcountFunc| 168 168| 2 1|
608 |Runnable | 24 -24| 34193 -1|
674 |Statement | 120 2040| 237 17|
1403 |nsStringBuffer | 8 144| 292065 18|
1448 |nsTArray_base | 8 160| 2437944 20|
There are some IDB runnables leaking. I looked them over and didn't see anything obviously weird about them, so I'm still stumped.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 50•9 years ago
|
||
:khuey, we do know your code landing triggered this to be so frequent. Can you please own driving this to completion?
Flags: needinfo?(khuey)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 64•9 years ago
|
||
I've been working on greening-up mochitest browser-chrome e10s on Windows 7 VM debug (AWS) and have come across this intermittent. I reduced it down to the following mochitest, hope this helps:
browser/components/contextualidentity/test/browser/browser_aboutURLs.js
...
14:21:11 WARNING - TEST-UNEXPECTED-FAIL | leakcheck | default process: negative leaks caught!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=473ad42c6ab22cbad74b3538e999c452632ce6c5&selectedJob=24479423
Comment 65•9 years ago
|
||
Yoshi, I see you are the original author of browser_aboutURLs.js which landed right around the time this bug was filed. Can you help us fix this bug or disable the test if you don't have time to do it this month?
Flags: needinfo?(allstars.chh)
Comment 66•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/67208/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67208/
Attachment #8774778 -
Flags: review?(jmaher)
Comment 67•9 years ago
|
||
Comment on attachment 8774778 [details]
Bug 1271182 - Temporarily disable browser_aboutURLS on win due to intermittent leak;
https://reviewboard.mozilla.org/r/67208/#review64050
::: browser/components/contextualidentity/test/browser/browser.ini:12
(Diff revision 1)
> file_set_storages.html
> serviceworker.html
> worker.js
>
> [browser_aboutURLs.js]
> +skip-if = os == "win" # intermittent negative leak bug 1271182
I believe this is problematic on all platforms, it might be good to do that.
Also debug is the problem here, if we are going to disable, lets leave it running on opt. Possibly we can do windows/linux debug to greatly reduce this.
I also ni'd Yoshi who wrote this test.
Attachment #8774778 -
Flags: review?(jmaher) → review-
Assignee | ||
Comment 68•9 years ago
|
||
Per comment 32, this happens in more than just the contextualidentity directory. But maybe it is most frequent there, so this would at least reduce the orange rate...
Comment 69•9 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
I'll check this, but I have other bugs at hand right now, might don't have enough time to check this.
Since Robert already has a patch for disable the test, I'll let him to disable the test first.
Flags: needinfo?(allstars.chh)
Comment 72•9 years ago
|
||
Comment on attachment 8774778 [details]
Bug 1271182 - Temporarily disable browser_aboutURLS on win due to intermittent leak;
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67208/diff/1-2/
Attachment #8774778 -
Flags: review- → review?(jmaher)
Comment 73•9 years ago
|
||
Comment on attachment 8774778 [details]
Bug 1271182 - Temporarily disable browser_aboutURLS on win due to intermittent leak;
https://reviewboard.mozilla.org/r/67208/#review64468
Attachment #8774778 -
Flags: review?(jmaher) → review+
Comment 74•9 years ago
|
||
(In reply to Joel Maher ( :jmaher ) from comment #73)
> Comment on attachment 8774778 [details]
> Bug 1271182 - Temporarily disable browser_aboutURLS on win due to
> intermittent leak;
>
> https://reviewboard.mozilla.org/r/67208/#review64468
Verified on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1d3d9dbfe17
Comment 75•9 years ago
|
||
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/86ed11684f27
Temporarily disable browser_aboutURLS on win due to intermittent leak; r=jmaher
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 77•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment 78•9 years ago
|
||
bugherder uplift |
status-firefox49:
--- → fixed
Comment hidden (Intermittent Failures Robot) |
(In reply to Joel Maher ( :jmaher ) from comment #50)
> :khuey, we do know your code landing triggered this to be so frequent. Can
> you please own driving this to completion?
I agree that I'm probably responsible for this, but I really have no idea what could be going wrong here, and this is my last week at Mozilla so I'm out of time to look into it. Sorry :'(
Flags: needinfo?(khuey)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 84•8 years ago
|
||
Reopening, since this doesn't seem to be fixed, per recent comments.
(Alternately, if anyone's really sure this bug should be closed and the new failures are something else, please file a new bug for the new failures so that people stop starring them as this one.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 95•8 years ago
|
||
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 106•8 years ago
|
||
mccr8, are you really looking into this or should we see if someone else has the cycles to?
status-firefox51:
--- → affected
status-firefox52:
--- → affected
Flags: needinfo?(continuation)
Target Milestone: mozilla50 → ---
Assignee | ||
Comment 107•8 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #106)
> mccr8, are you really looking into this or should we see if someone else has
> the cycles to?
I'm out of ideas for what might cause this, or how to investigate what might cause this.
Assignee: continuation → nobody
Flags: needinfo?(continuation)
Comment 108•8 years ago
|
||
Andrew, can you please help find a new owner for this bug?
Flags: needinfo?(overholt)
Assignee | ||
Comment 109•8 years ago
|
||
I see Nathan hasn't posted in this bug yet. Maybe he has some ideas.
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 110•8 years ago
|
||
I meant to imply that maybe he hadn't seen the bug, because this is in Core:General and he is not CCed.
Comment 111•8 years ago
|
||
I'm working on it.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 113•8 years ago
|
||
The negative leaks now seem to often be CancelableRunnable, Runnable and IPC::Message. I don't remember the Message before. Figuring out something involving a Message might be more tractable than a Runnable.
Component: General → IPC
Assignee | ||
Comment 114•8 years ago
|
||
Well, now that a non-refcounted class is involved, I have some ideas for figuring out what might be going wrong.
Flags: needinfo?(overholt)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → continuation
Assignee | ||
Comment 115•8 years ago
|
||
By my count, in the last week there are 27 failures in bc5, 11 in e10s bc2, 9 in e10s bc4, 6 in bc4, 4 in bc6, 1 in bc1 and e10s bc3.
Comment 116•8 years ago
|
||
Note that specific chunk numbers can be immaterial since tests move around from chunk to chunk routinely due to the chunk-by-runtime logic the harness uses.
Assignee | ||
Comment 117•8 years ago
|
||
Potentially then we could figure out if there's a particular set of tests that are always or mostly involved when this fails.
Assignee | ||
Comment 118•8 years ago
|
||
Ah, right, I'd forgotten the test in question, browser_aboutURLs.js, is disabled on Linux and Windows, which is why the negative leak is not showing up on those platforms. My own testing indicates that test is a problem. So we could disable it on all platforms to green up the tree. I'll still look into what is actually happening here.
Whiteboard: [disabled on Windows and Linux]
Assignee | ||
Updated•8 years ago
|
Whiteboard: [disabled on Windows and Linux] → [disabled on Windows and Linux debug]
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 120•8 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #109)
> I see Nathan hasn't posted in this bug yet. Maybe he has some ideas.
I see that Andrew is doing a fine job with this bug. ;)
Flags: needinfo?(nfroyd)
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(dbaron)
Assignee | ||
Comment 121•8 years ago
|
||
What I basically ended up doing to track down this bug (ignoring the various more complex things I tried that did not pan out) was to modify GetSerialNumber in nsTraceRefcnt to assert if it did not find a serial number when aCreate is false (I'll add this in bug 1309051). This means that we are attempting to destroy an object that is not known to nsTraceRefcnt to be alive. I then ran with XPCOM_MEM_LOG_CLASSES=IPC::Message, which hit that assertion while running browser_aboutURLs.js (confirming earlier work to figure out which test was causing issues).
I then added logging to browser_aboutURLs.js (I'll add this in bug 1309042) that printed out which about: page was being loaded, which indicated that it was asserting during the loading of about:bloat. I'd previously fixed an issue where something called into a helper method of the cycle collector while the CC was still running (bug 1074416), so I was immediately suspicious. I then saw that the code in DumpStatistics was ignoring logging while it was running, which seemed like it could be the issue.
Assignee | ||
Updated•8 years ago
|
Component: IPC → XPCOM
Assignee | ||
Comment 122•8 years ago
|
||
Windows was green when I did a try run.
Assignee | ||
Comment 123•8 years ago
|
||
MozReview-Commit-ID: 3x58vIBsj7D
Attachment #8799804 -
Flags: review?(ryanvm)
Comment 124•8 years ago
|
||
Comment on attachment 8799804 [details] [diff] [review]
Renable browser_aboutURLs.js on Windows.
Review of attachment 8799804 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/components/contextualidentity/test/browser/browser.ini
@@ +8,5 @@
> serviceworker.html
> worker.js
>
> [browser_aboutURLs.js]
> +skip-if = debug && os == "linux" # Assertion
While I understand work is happening to re-enable this on Linux as well, we should really document this with a reference to the bug in question. Or continue to reference this bug.
Attachment #8799804 -
Flags: review?(ryanvm) → review+
Comment 125•8 years ago
|
||
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e836cda145ec
Renable browser_aboutURLs.js on Windows. r=RyanVM
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Reporter | ||
Comment 126•8 years ago
|
||
bugherder |
Assignee | ||
Comment 127•8 years ago
|
||
Fixed by bug 1308652. This has not recurred since that landed.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Keywords: leave-open
Resolution: --- → FIXED
Whiteboard: [disabled on Windows and Linux debug] → [disabled on Linux debug]
Comment on attachment 8799804 [details] [diff] [review]
Renable browser_aboutURLs.js on Windows.
> [browser_aboutURLs.js]
>-skip-if = (debug && (os == "win" || os == "linux")) # intermittent negative leak bug 1271182
>+skip-if = debug && os == "linux" # Assertion
Hi Andrew,
Is the assertion you referring here is something like
"Assertion failure: originAttrsLoadInfo.mUserContextId == originAttrsLoadContext.mUserContextId" ?
If so, I am trying to fix this in bug 1308938, and I will remove the skip-if for linux debug here.
Flags: needinfo?(continuation)
Assignee | ||
Comment 129•8 years ago
|
||
(In reply to Yoshi Huang[:allstars.chh] from comment #128)
> If so, I am trying to fix this in bug 1308938, and I will remove the skip-if for linux debug here.
Yes, I know, I filed bug 1308938. :) Thanks.
Flags: needinfo?(continuation)
Comment hidden (Intermittent Failures Robot) |
Andrew, I plan to uplift bug 1308938 to 51, do you think we can also uplift this bug to 51?
Thanks
Flags: needinfo?(continuation)
Comment 132•8 years ago
|
||
That would be a good idea.
Flags: needinfo?(continuation)
Whiteboard: [disabled on Linux debug] → [disabled on Linux debug][checkin-needed-aurora][checkin-needed-beta]
Assignee | ||
Comment 133•8 years ago
|
||
(In reply to Yoshi Huang[:allstars.chh] (PTO 10.21 ~ 11.4) from comment #131)
> Andrew, I plan to uplift bug 1308938 to 51, do you think we can also uplift
> this bug to 51?
Sure. It is a test only change so it doesn't need approval. The leak fix is in bug 1308652.
Reporter | ||
Comment 134•8 years ago
|
||
bugherder uplift |
Whiteboard: [disabled on Linux debug][checkin-needed-aurora][checkin-needed-beta] → [disabled on Linux debug][checkin-needed-beta]
Reporter | ||
Comment 135•8 years ago
|
||
bugherder uplift |
Updated•8 years ago
|
Whiteboard: [disabled on Linux debug][checkin-needed-beta]
You need to log in
before you can comment on or make changes to this bug.
Description
•