Closed Bug 855331 Opened 12 years ago Closed 12 years ago

TransactionThreadPool doing sketchy things with mCompleteCallbacks array that can mutate

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla23

Tracking Flags:

Tracking

Status

firefox20

---

unaffected

firefox21

---

unaffected

firefox22

wontfix

firefox23

fixed

firefox-esr17

---

unaffected

b2g18

---

unaffected

People

(Reporter: bent.mozilla, Assigned: janv)

References

Details

(Keywords: regression, sec-critical, Whiteboard: [regression from bug 767944?][don't uplift yet, see comment 23][adv-main23+])

Attachments

(1 file)

idb mochitests under valgrind (on mac) 12 years ago Jan Varga [:janv] 1.63 MB, text/plain		Details

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Description

•

12 years ago

See https://tbpl.mozilla.org/php/getParsedLog.php?id=21144311&tree=Mozilla-Inbound TransactionThreadPool::FinishTransaction() calls MaybeFireCallback with a DatabasesCompleteCallback& reference that comes out of mCompleteCallbacks. It then calls Run() on the callback, which could do something like call WaitForAllDatabasesToComplete which appends an element to mCompleteCallbacks. If that causes reallocation then the DatabasesCompleteCallback& on our stack is now garbage. Somehow the log says we're jumping to HasTransactionsForDatabase (line 472), but I can't see how that is happening. Maybe we're calling a virtual Run() on one of our callback pointers that is now garbage?

Andrew McCreight [:mccr8]

Updated

•

12 years ago

Keywords: sec-critical

Al Billings [:abillings - ex-MoCo]

Comment 1

•

12 years ago

When we figure this out, we'd like to know how far back it goes.

status-firefox22: --- → affected

tracking-firefox22: --- → +

Ryan VanderMeulen [:RyanVM]

Comment 2

•

12 years ago

bent says he's not working on bug 855276, which presumably means the same for this bug. Can we please get an owner? It's a frequent orange and s-s.

Al Billings [:abillings - ex-MoCo]

Updated

•

12 years ago

status-firefox23: --- → affected

tracking-firefox22: + → ?

tracking-firefox23: --- → +

David Bolter [:davidb] (NeedInfo me for attention)

Comment 3

•

12 years ago

Kyle you lost our coin toss :) (Please find another owner if appropriate)

Assignee: nobody → khuey

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Assignee: khuey → nobody

tracking-firefox22: ? → +

Keywords: regression

Whiteboard: work happening in bug 855276

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Assignee: nobody → khuey

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 4

•

12 years ago

We suspect that this isn't actually a security issue, but we're not confident enough to open it up yet. In particular, comment 0 is wrong, and every instance of this we've seen is a null deref.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

12 years ago

Whiteboard: work happening in bug 855276

Andrew McCreight [:mccr8]

Updated

•

12 years ago

Keywords: sec-critical → sec-other

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 5

•

12 years ago

We're pretty convinced this is a security bug again ;-) Still haven't managed to reproduce.

Keywords: sec-other → sec-critical

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 6

•

12 years ago

I landed a patch to change from pass-by-reference to pass-by-value when calling this function and that didn't change anything, so this looks like actual heap corruption.

Andrew McCreight [:mccr8]

Comment 7

•

12 years ago

I'm going to assume this affects everything. Adjust as needed.

status-b2g18: --- → affected

status-firefox20: --- → wontfix

status-firefox21: --- → affected

status-firefox-esr17: --- → affected

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Keywords: regression

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 8

•

12 years ago

I strongly suspect this is a regression for bug 767944.

Andrew McCreight [:mccr8]

Comment 9

•

12 years ago

Great, that would match up with the time this started happening.

status-b2g18: affected → unaffected

status-firefox20: wontfix → unaffected

status-firefox21: affected → unaffected

status-firefox-esr17: affected → unaffected

Keywords: regression

Whiteboard: [regression from bug 767944?]

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 10

•

12 years ago

Yes that's my primary piece of evidence ;-)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 11

•

12 years ago

I think we should consider backing out Bug 767944. I haven't managed to reproduce this locally :-/

Jan Varga [:janv]

Assignee

Comment 12

•

12 years ago

Please give me a few days, I'll try to run IDB tests under valgrind.

Jan Varga [:janv]

Assignee

Comment 13

•

12 years ago

Attached file idb mochitests under valgrind (on mac) — Details

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

12 years ago

Assignee: khuey → Jan.Varga

Ryan VanderMeulen [:RyanVM]

Comment 14

•

12 years ago

(In reply to Jan Varga [:janv] from comment #12) > Please give me a few days, I'll try to run IDB tests under valgrind. Where do we stand on this? Are we considering a backout at this point? We're nearing the end of the Fx23 cycle and I don't think we want to bring this over to Aurora.

Jan Varga [:janv]

Assignee

Comment 15

•

12 years ago

The centralized quota manager is already on Aurora, but it seems the crash doesn't happen there so often. Could the crash somehow relate to the fact that dom/indexedDB tests now run in mochitest-3 ? They used to run in mochitest-2.

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Comment 16

•

12 years ago

It's important to keep in mind that this patch cannot be simply backed out. It changed the directory structure of idb databases so a complete backout would effectively lose all idb data. If we decide to back this out we'll need to handle that migration as well.

Jan Varga [:janv]

Assignee

Comment 17

•

12 years ago

I'm testing a patch here: https://tbpl.mozilla.org/?tree=Try&rev=01b6bfd06fd7 It looks promising so far ...

Jan Varga [:janv]

Assignee

Comment 18

•

12 years ago

16 * 40 mochitest-3 runs, no crashes in MaybeFireCallback() Should we land it on m-c and see if it really helps ?

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Comment 19

•

12 years ago

Definitely.

Jan Varga [:janv]

Assignee

Comment 20

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/093f7b379757

Ryan VanderMeulen [:RyanVM]

Comment 21

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/093f7b379757 Looking very good to me! Please request Aurora uplift ASAP :)

Status: NEW → RESOLVED

Closed: 12 years ago

status-firefox23: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla23

Jan Varga [:janv]

Assignee

Comment 22

•

12 years ago

Ben, it looks to me that we were firing a callback for some other database instance and that database instance didn't exist anymore. Not sure how can that happen, but that's the only explanation I have for now.

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Comment 23

•

12 years ago

We really need to understand why this happened before we move on or uplift or anything else here. Otherwise we risk simply hiding the problem or (worse) moving it somewhere else.

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Updated

•

12 years ago

Whiteboard: [regression from bug 767944?] → [regression from bug 767944?][don't uplift yet, see comment 23]

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 24

•

12 years ago

(In reply to ben turner [:bent] from comment #23) > We really need to understand why this happened before we move on or uplift > or anything else here. Otherwise we risk simply hiding the problem or > (worse) moving it somewhere else. So are we going to do that ...? We need to do something on beta.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

12 years ago

Flags: needinfo?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Comment 25

•

12 years ago

Jan, are you looking into this?

Flags: needinfo?(bent.mozilla)

Jan Varga [:janv]

Assignee

Comment 26

•

12 years ago

(In reply to ben turner [:bent] from comment #25) > Jan, are you looking into this? well, I can take a look ... do you think it could be something like the thing described in comment 22 ?

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Comment 27

•

12 years ago

I haven't been able to come up with any explanation of why this patch seems to work. Since you came up with it I'm hoping you might have a better idea ;)

Alex Keybl [:akeybl]

Comment 28

•

11 years ago

There hasn't been enough traction here to get into Beta 22, and comment 27 doesn't make me feel good about getting this into our final two betas.

status-firefox22: affected → wontfix

Daniel Veditz [:dveditz]

Comment 29

•

11 years ago

This is a "kungFuDeathGrip" type patch. The worst risk of uplifting it would seem to be memory leaks (but those should show up as unexpected leaks in tbpl and don't seem to be).

Blocks: 767944

Daniel Veditz [:dveditz]

Comment 30

•

11 years ago

I don't know how to appeal the "wontfix" because there's no attachment on which to request approval :-( but that is what the previous comment was attempting to be.

status-firefox22: wontfix → affected

Daniel Veditz [:dveditz]

Comment 31

•

11 years ago

(In reply to ben turner [:bent] from comment #23) > We really need to understand why this happened before we move on or uplift > or anything else here. Otherwise we risk simply hiding the problem or > (worse) moving it somewhere else. Whether we uplift to Fx22 or not the above is true since the patch is on its way to release regardless.

Ben Turner (not reading bugmail, use the needinfo flag!)

Reporter

Comment 32

•

11 years ago

Can we back this out of trunk before we branch? There's no reason to think that this patch is what we actually want.

Ryan VanderMeulen [:RyanVM]

Comment 33

•

11 years ago

Can we just back this out from beta after the merge rather than inflicting the nearly once per push crash back on the rest of the tree? This was one of our most frequent oranges when we were hitting it and the rate at which this and associated bugs has moved along doesn't leave me brimming with confidence that the pain of a trunk backout will be short-lived.

Jan Varga [:janv]

Assignee

Comment 34

•

11 years ago

Ok, Ben and I suspect that those crashes we used to have before the "noone knows why it works" patch could be caused by the issue described in the bug 878703 and that the centralized manager somehow made the crashes to happen more often. I'm going to backout the "noone knows why it works" patch on the try server this weekend. I need to retrigger the tests like hundreds of times to be sure that the crashes don't happen anymore, that's why I'll do it on the weekend, when the load is lower.

Alex Keybl [:akeybl]

Updated

•

11 years ago

status-firefox22: affected → wontfix

Jan Varga [:janv]

Assignee

Comment 35

•

11 years ago

no, bug 878703 seems to be unrelated when I backed out the hacky patch, crashes reappeared: https://tbpl.mozilla.org/php/getParsedLog.php?id=24186335&tree=Try

Al Billings [:abillings - ex-MoCo]

Comment 36

•

11 years ago

Tagging this since we seem to be shipping this in two weeks.

Whiteboard: [regression from bug 767944?][don't uplift yet, see comment 23] → [regression from bug 767944?][don't uplift yet, see comment 23][adv-main23+]

Daniel Veditz [:dveditz]

Updated

•

11 years ago

Group: core-security

You need to log in before you can comment on or make changes to this bug.