Closed Bug 832609 Opened 7 years ago Closed 6 years ago

test_823965.html : Assertion failure: (next == this) == (prev == this), at ../../../dist/include/mozilla/LinkedList.h:191

Categories

(Core :: DOM: Device Interfaces, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla24
Tracking Status
firefox19 --- fixed
firefox20 --- fixed
firefox21 --- wontfix
firefox22 + fixed
firefox23 + fixed
firefox24 --- fixed
b2g18 --- fixed

People

(Reporter: jfkthame, Assigned: justin.lebar+bug)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(2 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=18943119&tree=Mozilla-Inbound

Assertion failure: (next == this) == (prev == this), at ../../../dist/include/mozilla/LinkedList.h:191
TEST-UNEXPECTED-FAIL | /tests/dom/devicestorage/test/test_823965.html | Exited with code 11 during test run
PROCESS-CRASH | /tests/dom/devicestorage/test/test_823965.html | application crashed [@ mozilla::LinkedListElement<nsDOMMemoryFile::DataOwner>::isInList() const]
Thread 31 (crashed)

Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
     family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 31 (crashed)
 0  libxul.so!mozilla::LinkedListElement<nsDOMMemoryFile::DataOwner>::isInList() const [LinkedList.h : 191 + 0x18]
    rbx = 0x000000000b07f720   r12 = 0x0000000000000000
    r13 = 0x0000000000000001   r14 = 0x00007f1d6aafecc0
    r15 = 0x0000000000000000   rip = 0x00007f1dccbd2f9e
    rsp = 0x00007f1d6aafeb10   rbp = 0x00007f1d6aafeb10
    Found by: given as instruction pointer in context
 1  libxul.so!mozilla::LinkedListElement<nsDOMMemoryFile::DataOwner>::~LinkedListElement [LinkedList.h : 120 + 0x4]
    rbx = 0x000000000b07f720   r12 = 0x0000000000000000
    r13 = 0x0000000000000001   r14 = 0x00007f1d6aafecc0
    r15 = 0x0000000000000000   rip = 0x00007f1dccbd3039
    rsp = 0x00007f1d6aafeb20   rbp = 0x00007f1d6aafeb30
    Found by: call frame info
 2  libxul.so!nsDOMMemoryFile::DataOwner::Release() [nsDOMFile.h : 399 + 0x7]
    rbx = 0x000000000b07f720   r12 = 0x0000000000000000
    r13 = 0x0000000000000001   r14 = 0x00007f1d6aafecc0
    r15 = 0x0000000000000000   rip = 0x00007f1dccbee8f9
    rsp = 0x00007f1d6aafeb40   rbp = 0x00007f1d6aafeb50
    Found by: call frame info
 3  libxul.so!DataOwnerAdapter::Release() [nsDOMFile.cpp : 87 + 0x4]
    rbx = 0x0000000002c04be0   r12 = 0x0000000000000000
    r13 = 0x0000000000000001   r14 = 0x00007f1d6aafecc0
    r15 = 0x0000000000000000   rip = 0x00007f1dccbee9d8
    rsp = 0x00007f1d6aafeb60   rbp = 0x00007f1d6aafeb70
    Found by: call frame info
 4  libxul.so!nsTArray_Impl<nsCOMPtr<nsIInputStream>, nsTArrayInfallibleAllocator>::DestructRange(unsigned int, unsigned int) [nsTArray.h : 432 + 0x4]
    rbx = 0x0000000003244ab0   r12 = 0x0000000003244ab0
    r13 = 0x0000000000000001   r14 = 0x00007f1d6aafecc0
    r15 = 0x0000000000000000   rip = 0x00007f1dcd843657
    rsp = 0x00007f1d6aafeb80   rbp = 0x00007f1d6aafeb90
    Found by: call frame info
 5  libxul.so!nsTArray_Impl<nsCOMPtr<nsIInputStream>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int, unsigned int) [nsTArray.h : 1017 + 0xc]
    rbx = 0x0000000000000000   r12 = 0x0000000003e682b8
    r13 = 0x0000000000000001   r14 = 0x00007f1d6aafecc0
    r15 = 0x0000000000000000   rip = 0x00007f1dcd8438e3
    rsp = 0x00007f1d6aafeba0   rbp = 0x00007f1d6aafebc0
    Found by: call frame info
...etc.
Can this LinkedList be used by multiple threads?
...without locking?

What's happening here is that the linked list destructor is running, and it's checking whether the node is in a list.  In this case, it's partially in a list (one of the pointer is to self, but the other pointer is not).  That's only a valid state if the list is currently being mutated.  Which means that either we have memory corruption or the list is being mutated /while the destructor is running/.
Well DataOwner's dtor can run on any thread, so if there's no locking we need to add some.
Attached patch Patch (obsolete) — Splinter Review
Assignee: nobody → khuey
Status: NEW → ASSIGNED
Attachment #705443 - Flags: review?(justin.lebar+bug)
Depends on: 833913
khuey and I talked about this on IRC, and we agree this patch is scary.

So what we're going to do is disable this memory reporter on b2g18 (that's safest possible change we could make there) and then try a simpler approach for trunk (register one memory reporter for each large blob).  This simpler approach requires bug 833913.
Attachment #705443 - Attachment is obsolete: true
Attachment #705443 - Flags: review?(justin.lebar+bug)
Attachment #705531 - Flags: review?(justin.lebar+bug)
Comment on attachment 705531 [details] [diff] [review]
Disable the reporter on b2g18.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 803684
User impact if declined: crashes, memory corruption, bad times
Testing completed: Looks about right to me.
Risk to taking this patch (and alternatives if risky): Hard to imagine things getting worse as a result of this patch.
String or UUID changes made by this patch: none
Attachment #705531 - Flags: review?(justin.lebar+bug)
Attachment #705531 - Flags: review+
Attachment #705531 - Flags: approval-mozilla-b2g18?
Attachment #705531 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18+
Whiteboard: checkin-needed
Keywords: checkin-needed
Whiteboard: checkin-needed
Comment on attachment 705531 [details] [diff] [review]
Disable the reporter on b2g18.

[Approval Request Comment]
See comment 9.
Attachment #705531 - Flags: approval-mozilla-beta?
Attachment #705531 - Flags: approval-mozilla-aurora?
Attachment #705531 - Flags: approval-mozilla-beta?
Attachment #705531 - Flags: approval-mozilla-beta+
Attachment #705531 - Flags: approval-mozilla-aurora?
Attachment #705531 - Flags: approval-mozilla-aurora+
Aand we never followed up with this bug.  :(
Duplicate of this bug: 873623
The paper-over fix needs to be uplifted to Aurora again, and we need to actually fix this.  It's not just the dtor; the constructor also can run on any thread, afaict from bug 873623.
Comment on attachment 705531 [details] [diff] [review]
Disable the reporter on b2g18.

We need to re-land this on Aurora.  :(
Attachment #705531 - Flags: approval-mozilla-aurora+ → approval-mozilla-aurora?
Ah crap.  I hate it when I lose track of bugs.
I'm compiling a patch for this that involves making a StaticMutex, which is something I've wanted in many other cases.  If this works, it would be pretty nice...
Assignee: khuey → justin.lebar+bug
No way this will compile anywhere else.

https://tbpl.mozilla.org/?tree=Try&rev=6469459a5db5
Depends on bug 873801, which has its own dependency.
Attachment #751407 - Flags: review?(khuey)
Depends on: 873801
Comment on attachment 705531 [details] [diff] [review]
Disable the reporter on b2g18.

Approving for Aurora 23 - should we also get this on Beta 22?  We aren't tracking this bug because it's not clear that we have signs of significant, reproducible user impact here.  Please nominate if you believe it to be more serious.
Attachment #705531 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
> We aren't tracking this bug because it's not clear that we have signs of significant, 
> reproducible user impact here.

Probably because few sites use workers.

> should we also get this on Beta 22?

Yes; thanks for asking.

> Please nominate if you believe it to be more serious.

I believe this to be pretty serious.  I'll elaborate in a PM.
No longer depends on: 833913
Comment on attachment 705531 [details] [diff] [review]
Disable the reporter on b2g18.

Re-nom'ing for beta.  It's a shame we missed this for the last release, but that ship has sailed.
Attachment #705531 - Flags: approval-mozilla-beta+ → approval-mozilla-beta?
Checkin-needed to re-land on Aurora (and beta, if and when we get re-approval).
Keywords: checkin-needed
Whiteboard: [for checkin-needed, see comment 28]
https://hg.mozilla.org/releases/mozilla-aurora/rev/729fb1f47da8

Not sure what the beta holdup is here...
Keywords: checkin-needed
Whiteboard: [for checkin-needed, see comment 28]
(In reply to Ryan VanderMeulen [:RyanVM][Out May 23-27] from comment #29)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/729fb1f47da8
> 
> Not sure what the beta holdup is here...

It's not clear that this intermittent failure is hitting FF22, so we were waiting for jlebar to elaborate on severity here (didn't get a PM).
I sent a PM to lsblakk immediately after posting the comment.  I didn't get a response; maybe it didn't go through.

I'll just say it here: I think you could exploit this bug without a lot of difficulty.
Attachment #705531 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Gah, https://hg.mozilla.org/integration/mozilla-inbound/rev/56f89cd46261 landed with this bug in the commit message instead of bug 873801.  At least they're related bugs...
https://hg.mozilla.org/mozilla-central/rev/56f89cd46261
https://hg.mozilla.org/mozilla-central/rev/8d0ac9d38a1b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Ah, that's why this hasn't landed on beta: The m-c merge removed checkin-needed.

I'll land this on beta myself.
That and the whole holiday weekend thing...
I just hit the following assertion: http://pastebin.mozilla.org/2565047
It was my understanding that this bug should have fixed it.
That's not an assertion, that's just a stack.  I bet that's a refcounting threadsafety assertion firing.
Well, now it's an assertion + a stack:)
http://pastebin.mozilla.org/2565061

And yes, it looks like a refcounting threadsafety assertion.
Yeah my guess is that we create it on whatever thread wants to use it first which may not be the main thread.  It should really just have threadsafe refcounting.

Open a new bug please.
You need to log in before you can comment on or make changes to this bug.