Bug 802239 - mozStorage leaks one thread per connection until shutdown
Status: RESOLVED FIXED
Whiteboard: [MemShrink:P2], [qa-]
Product: Toolkit
Classification: Components
Component: Storage
Version: Trunk
Platform: All All
Importance: -- major
Target Milestone: mozilla19
Assigned To: Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13)
Mentors:
Depends on:
Blocks: 496019 798849
Reported: 2012-10-16 10:29 PDT by Boris Zbarsky [:bz]
Modified: 2012-12-18 05:19 PST
CC: 23 users
ryanvm: in-testsuite-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch (4.00 KB, patch)
2012-10-16 11:19 PDT, Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13)
bugmail: review+
Details | Diff | Review
Patch (4.53 KB, patch)
2012-10-18 11:03 PDT, Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13)
bugmail: review+
bajaj.bhavana: approval-mozilla-aurora+
Details | Diff | Review

Description Boris Zbarsky [:bz] 2012-10-16 10:29:45 PDT
See discussion in bug 798849.  To quote that bug:

  That's insane.
Comment 1 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-10-16 10:51:26 PDT
Looks like this regressed in bug 496019, see comment 71.
Comment 2 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-16 11:19:00 PDT
Created attachment 671942 [details] [diff] [review]
Patch

This is mostly the interdiff between v4.4 and v4.5 in Bug 496019, with one major exception.  This does not handle the case where someone forgets to call AsyncClose before the connection is destroyed.  That already leaks stuff (Bug 704030) and it seems like we should deal with both problems at the same time.  It does add a fatal assertion that we don't do that, which should catch any mistakes in the tree.
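In rough outline, the change looks something like the sketch below. This is a minimal illustration assuming the member names discussed later in this bug (mAsyncExecutionTarget and a shutting-down flag); it is not the attached patch.

  #include "mozilla/Assertions.h"
  #include "nsCOMPtr.h"
  #include "nsIEventTarget.h"

  // Minimal sketch only, not attachment 671942.  Everything except
  // AsyncClose, mAsyncExecutionTarget and mAsyncExecutionThreadShuttingDown
  // is an illustrative assumption.
  class Connection {
  public:
    Connection() : mAsyncExecutionThreadShuttingDown(false) {}

    nsresult AsyncClose() {
      // Hand the async thread over to the close operation so it can be shut
      // down once the pending work (including the close itself) finishes,
      // instead of being kept alive until xpcom-shutdown-threads.
      nsCOMPtr<nsIEventTarget> target = mAsyncExecutionTarget.forget();
      mAsyncExecutionThreadShuttingDown = true;
      // ... dispatch the async close event, which takes ownership of
      // `target` and ultimately shuts the underlying thread down from the
      // main thread ...
      return NS_OK;
    }

    ~Connection() {
      // The fatal assertion described above: destroying a connection that
      // still owns a live async thread means AsyncClose was never called.
      MOZ_ASSERT(!mAsyncExecutionTarget,
                 "Connection destroyed without calling AsyncClose()");
    }

  private:
    nsCOMPtr<nsIEventTarget> mAsyncExecutionTarget;
    bool mAsyncExecutionThreadShuttingDown;
  };

The point of the sketch is the ownership change: the thread's lifetime is tied to the completion of AsyncClose rather than to xpcom shutdown, which is what stops the one-thread-per-connection leak.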
Comment 3 Andrew Sutherland [:asuth] 2012-10-16 11:43:48 PDT
Comment on attachment 671942 [details] [diff] [review]
Patch

It's sad that the wrong patch landed way back when.

Thanks for the updated patch, including the improved nsCOMPtr use!

If you/anyone reading this is going to handle bug 704030 too, I think mak would be the best reviewer for that.
Comment 4 Karl Tomlinson (ni?:karlt) 2012-10-16 16:20:18 PDT
Sounds like we should consider back porting to other releases for our poor 32-bit users, unless there's some reason why there won't be many connections in the life of the app?
Comment 5 Nicholas Nethercote [:njn] 2012-10-16 17:11:05 PDT
Additional clarification:  khuey said this wasn't a major problem because in the browser these connections are mostly for-the-lifetime-of-the-process anyway, and so fixing the leak won't make much difference.

However, it may be an issue for some add-ons, and also in tests that open and close many connections.
Comment 6 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-17 11:57:21 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/df42fa3fb07f
Comment 7 Alex Keybl [:akeybl] 2012-10-17 16:04:08 PDT
This is a longstanding issue with no known critical user impact. No need to track for upcoming releases. If there's still *significant* value in uplifting, please nominate for aurora/beta approval.
Comment 8 Ryan VanderMeulen [:RyanVM] 2012-10-17 19:23:02 PDT
https://hg.mozilla.org/mozilla-central/rev/df42fa3fb07f
Comment 9 Ryan VanderMeulen [:RyanVM] 2012-10-17 19:43:57 PDT
So this got backed out on inbound without any comments as to why...
https://hg.mozilla.org/integration/mozilla-inbound/rev/cabf5819a565

I transplanted the backout to m-c rather than waiting for the next merge.
https://hg.mozilla.org/mozilla-central/rev/d9520848b410
Comment 10 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-17 21:29:46 PDT
It got backed out because it appeared to be causing a bunch of asserts (but not the one I added) in xpcshell.
Comment 11 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-18 11:03:08 PDT
Created attachment 672858 [details] [diff] [review]
Patch

Now with less crashes.

The only change is in Connection::isAsyncClosing.  Service::Observe contains code to observe xpcom-shutdown-threads and spin the event loop while isAsyncClosing is true on any of the set of extant Connections.  Because I used mAsyncExecutionTarget.forget(), mAsyncExecutionTarget is null during the closing process now and isAsyncClosing always returns false.  Thus we don't spin the event loop enough and we trip the assertion that after xpcom-shutdown-threads all of our connections are shut down.  This code is the only consumer of isAsyncClosing, so this should be safe.
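To make the interaction concrete, a simplified sketch: isAsyncClosing, mAsyncExecutionThreadShuttingDown and mDBConn are the names used in this bug, while the spin loop below is an illustrative stand-in for the relevant branch of Service::Observe, not the real code.

  #include "nsTArray.h"
  #include "nsThreadUtils.h"   // NS_ProcessNextEvent

  // The predicate, in effect: an async close is in flight and the underlying
  // database handle (mDBConn) has not been torn down yet.  Keying this off
  // mAsyncExecutionTarget instead is what broke the first landing, because
  // forget() nulls that member out as soon as the close starts.
  bool Connection::isAsyncClosing() {
    return mAsyncExecutionThreadShuttingDown && !!mDBConn;
  }

  // Illustrative stand-in for the xpcom-shutdown-threads handling in
  // Service::Observe: spin the main-thread event loop until no extant
  // connection is still async-closing, so the pending close events (and the
  // thread shutdowns they trigger) get a chance to run.
  static void SpinUntilConnectionsClosed(nsTArray<Connection*>& aConnections) {
    for (;;) {
      bool anyClosing = false;
      for (uint32_t i = 0; i < aConnections.Length(); i++) {
        if (aConnections[i]->isAsyncClosing()) {
          anyClosing = true;
          break;
        }
      }
      if (!anyClosing) {
        break;
      }
      NS_ProcessNextEvent(nullptr);   // let a pending close event run
    }
  }

Per comments 12 and 13, internalClose runs on the main thread, so the main thread observes mDBConn going null without needing a lock and the loop terminates.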
Comment 12 Andrew Sutherland [:asuth] 2012-10-18 11:47:42 PDT
Comment on attachment 672858 [details] [diff] [review]
Patch

Looks good.  Although mAsyncExecutionThreadShuttingDown never transitions to false, mDBConn will become null, and this will eventually (it's not done under a lock, but a lock is not required for correctness) result in isAsyncClosing returning false and the event-loop spinning terminating.  The old 4.5 rev patch would have terminated prematurely depending on the race.
Comment 13 Andrew Sutherland [:asuth] 2012-10-18 11:50:17 PDT
Er, and the internalClose happens on the main thread anyways, so strike the bit about "eventually".  Man, it's been a while.

Thanks for all y'all making the platform a better place!
Comment 14 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-18 11:51:48 PDT
Thanks for dredging your memory here.

https://hg.mozilla.org/integration/mozilla-inbound/rev/30a31da7097f
Comment 15 Ryan VanderMeulen [:RyanVM] 2012-10-18 18:38:48 PDT
https://hg.mozilla.org/mozilla-central/rev/30a31da7097f
Comment 16 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-19 09:51:05 PDT
Comment on attachment 672858 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): A long time ago
User impact if declined: Moth test runs will go orange for silly reasons more often, lowering our test coverage.
Testing completed (on m-c, etc.): On m-c
Risk to taking this patch (and alternatives if risky): This is moderately risky.  We think we've figured this out correctly, but this is complex multithreaded code.  We haven't seen any problems on m-c since the second landing.
String or UUID changes made by this patch: None.
Comment 17 Alex Keybl [:akeybl] 2012-10-19 16:01:08 PDT
Comment on attachment 672858 [details] [diff] [review]
Patch

[Triage Comment]
The risk/reward here doesn't play out from the user perspective. Apologies.
Comment 18 Ed Morley [:emorley] 2012-10-19 16:05:34 PDT
(In reply to Alex Keybl [:akeybl] from comment #17)
> Comment on attachment 672858 [details] [diff] [review]
> Patch
> 
> [Triage Comment]
> The risk/reward here doesn't play out from the user perspective. Apologies.

Fair enough, but Linux32 mochitest-browser-chrome now hidden on TBPL for aurora.
Comment 19 Ryan VanderMeulen [:RyanVM] 2012-10-20 07:36:00 PDT
(In reply to Ed Morley (PTO/travelling until 24th Oct) [:edmorley UTC+1] from comment #18)
> Fair enough, but Linux32 mochitest-browser-chrome now hidden on TBPL for
> aurora.

Alex, please reconsider this nomination. I really don't think we want no mozilla18 browser-chrome test coverage on Linux and this patch is the only way to get it working again.
Comment 20 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-21 16:07:36 PDT
Comment on attachment 672858 [details] [diff] [review]
Patch

Renominating for comments 18/19.
Comment 21 Lukas Blakk [:lsblakk] use ?needinfo 2012-10-22 16:15:30 PDT
Comment on attachment 672858 [details] [diff] [review]
Patch

We don't want to end up with no mochitest coverage.  Is there user-facing risk here if there are regressions from this patch?  That's not clear from the comments so far, and if the risk is just more wonkiness in the tests that we will a) catch quickly and b) fix with a simple backout of this patch, we could reconsider this for uplift.
Comment 22 Karl Tomlinson (ni?:karlt) 2012-10-22 16:20:44 PDT
The line of discussion here seems to imply that tests are doing something that our users are not doing.  If that is the case, then the alternative may be to disable those particular tests.

There is also a risk in that approach: some testing of features that users do use may be disabled.
Comment 23 Masatoshi Kimura [:emk] 2012-10-22 23:14:07 PDT
Disabling particular tests will not work around the problem. It will only move the failure to other tests (see bug 798849 comment #0). The only remaining option would be hiding the entire browser-chrome suite on Linux opt if the patch cannot be accepted.
Comment 24 Karl Tomlinson (ni?:karlt) 2012-10-23 00:07:52 PDT
From comment 5, the difference between the browser and the tests is that some tests are starting new mozStorage connections, while the browser (re-)uses lifetime connections.  Only the tests that use new mozStorage connections would need to be disabled.  I'm assuming not every test in browser-chrome uses new mozStorage connections.
Comment 25 Lukas Blakk [:lsblakk] use ?needinfo 2012-10-24 16:20:56 PDT
Kyle: In your risk assessment you say: "This is moderately risky.  We think we've figured this out correctly, but this is complex multithreaded code.  We haven't seen any problems on m-c since the second landing."  It's been on central for 5 days now; is that enough time to re-evaluate the risk and know whether we're doing this right?  Given comment 17 and comment 18 we want to be able to approve this for aurora, but we are concerned about any potential regressions that could go undetected.  What is a decent amount of time (and usage) to give us more confidence in uplifting here?
Comment 26 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-24 16:39:48 PDT
I think we can be more confident now that the fix is correct.  We have not seen any confirmed or suspected regressions from this.
Comment 27 Alex Keybl [:akeybl] 2012-10-25 15:07:57 PDT
(In reply to Ryan VanderMeulen from comment #19)
> (In reply to Ed Morley (PTO/travelling until 24th Oct) [:edmorley UTC+1]
> from comment #18)
> > Fair enough, but Linux32 mochitest-browser-chrome now hidden on TBPL for
> > aurora.
> 
> Alex, please reconsider this nomination. I really don't think we want no
> mozilla18 browser-chrome test coverage on Linux and this patch is the only
> way to get it working again.

I didn't suggest we hide the tests, just that we leave this unfixed. I don't see this bug suggesting we're permaorange; the approval request only said "Moth test runs will go orange for silly reasons more often".
Comment 28 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-10-25 15:11:44 PDT
This bug causes us to run out of virtual address space and crash, which turns the test run orange.
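For a rough sense of scale (approximate numbers, not measurements from these runs): a 32-bit Linux process has on the order of 3 GB of usable address space, and each leaked helper thread keeps its stack reservation alive for as long as it lives. If each thread reserves the common glibc default of 8 MB of stack address space, then roughly 3 GB / 8 MB ≈ 380 leaked connection threads would exhaust the address space on their own, and in practice the limit is hit much sooner because the rest of the browser competes for the same space.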
Comment 29 Phil Ringnalda (:philor) 2012-10-25 15:35:43 PDT
To put it absolutely as clearly as possible:

The "bc" job, the browser-chrome tests, are already hidden on the "Linux pgo" row of tbpl on https://tbpl.mozilla.org/?tree=Mozilla-Aurora. It's no longer Moth, browser-chrome got separated out from Moth. That means that the Linux 32-bit on-push and nightly builds will continue to run browser-chrome tests, but no sheriff will look at them, for the lifespan of Gecko 18. edmorley spoke for us in comment 18 - we're done with looking at OOM-destroyed runs.

It's still possible to see them in https://tbpl.mozilla.org/?tree=Mozilla-Aurora&noignore=1, though they hardly ever run to completion, but we will not be looking at them and we will not be sorting out the dozens of different expected OOM failures resulting from this bug from any other failures, so if (as is not uncommon) someone breaks a test on 32-bit Linux, we will not know, and we will not back them out. It has run green 4 times since Tuesday morning, and orange 27 times. I didn't look, since they are not now our responsibility, but based on how it went on the trunk, I would expect that more than 20 of those oranges were not just "orange for silly reasons," they were instead "orange from OOM, causing the test run to halt and fail, so there is no way that even someone who knows all of the OOM failures can know whether or not something bad happened to another test."
Comment 30 Boris Zbarsky [:bz] 2012-10-25 17:37:38 PDT
Alex, the bug covering the permaorange is bug 798849, which is blocked on this one.  The patch for this bug fixes that one.
Comment 31 Phil Ringnalda (:philor) 2012-10-26 22:59:09 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/fb18c6ecbb99
Comment 32 Ed Morley [:emorley] 2012-12-18 05:19:36 PST
Note that:
status-firefox-esr17: affected → wontfix

Means I've just hidden Linux32 opt/pgo mochitest-browser-chrome on ESR17.
