Closed Bug 1085575 Opened 11 years ago Closed 10 years ago

Intermittent test_app_install.html | application crashed [@ libxul.so + 0xd0fc88] after "Assertion failure: aThisObjectStore->mComittedAutoIncrementId <= otherObjectStore->mComittedAutoIncrementId, at dom/indexedDB/ActorsParent.cpp:12131"

Categories

(Core :: Storage: IndexedDB, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox41 --- disabled
firefox42 --- disabled
firefox44 --- wontfix
firefox45 --- fixed
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: RyanVM, Assigned: khuey)

References

Details

(Keywords: assertion, crash, intermittent-failure, Whiteboard: [rr-chaos])

Attachments

(1 file)

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=502252&repo=mozilla-central builder Android 4.0 Panda mozilla-central debug test mochitest-4 buildid 20141017193913 builduid bf31259211cc4ce68150f9515487210a results warnings (1) revision 92c87e95915e slave panda-0423 starttime Fri Oct 17 2014 23:42:19 GMT-0400 (Eastern Standard Time) 21:08:28 INFO - 731 INFO TEST-START | /tests/dom/datastore/tests/test_app_install.html 21:08:28 INFO - INFO | automation.py | Application ran for: 0:15:05.521777 21:08:28 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpsln9YCpidlog 21:08:28 INFO - Contents of /data/anr/traces.txt: 21:08:28 INFO - /data/tombstones does not exist; tombstone check skipped 21:08:29 INFO - mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-debug/1413599953/fennec-36.0a1.en-US.android-arm.crashreporter-symbols.zip 21:08:34 INFO - mozcrash Saved minidump as /builds/panda-0423/test/build/blobber_upload_dir/4ab61bc4-abd3-f6d0-5cf0d034-384ddba6.dmp 21:08:34 INFO - mozcrash Saved app info as /builds/panda-0423/test/build/blobber_upload_dir/4ab61bc4-abd3-f6d0-5cf0d034-384ddba6.extra 21:08:34 WARNING - PROCESS-CRASH | /tests/dom/datastore/tests/test_app_install.html | application crashed [@ libxul.so + 0xd0fc88] 21:08:34 INFO - Crash dump filename: /tmp/tmpFunlVa/4ab61bc4-abd3-f6d0-5cf0d034-384ddba6.dmp 21:08:34 INFO - Operating system: Android 21:08:34 INFO - 0.0.0 Linux 3.2.0+ #2 SMP PREEMPT Thu Nov 29 08:06:57 EST 2012 armv7l pandaboard/pandaboard/pandaboard:4.0.4/IMM76I/5:eng/test-keys 21:08:34 INFO - CPU: arm 21:08:34 INFO - 2 CPUs 21:08:34 INFO - Crash reason: SIGSEGV 21:08:34 INFO - Crash address: 0x0 21:08:34 INFO - Thread 32 (crashed) 21:08:34 INFO - 0 libxul.so + 0xd0fc88 21:08:34 INFO - r4 = 0x00000001 r5 = 0x00000000 r6 = 0x6e874820 r7 = 0x6e17ad00 21:08:34 INFO - r8 = 0x6c5ffca0 r9 = 0x6c5ffbd0 r10 = 0x00000002 fp = 0x00000000 21:08:34 INFO - sp = 0x6c5ffbd0 lr = 0x62acbce7 pc = 0x62ad2c88 21:08:34 INFO - Found by: given as instruction pointer in context 21:08:34 INFO - 1 libxul.so!nsBaseHashtable<nsUint64HashKey, nsRefPtr<mozilla::dom::indexedDB::{anonymous}::FullObjectStoreMetadata>, mozilla::dom::indexedDB::{anonymous}::FullObjectStoreMetadata*>::s_EnumReadStub [nsBaseHashtable.h:92c87e95915e : 391 + 0xb] 21:08:34 INFO - r4 = 0x62ad2c4d r5 = 0x6f37c518 r6 = 0x00000001 r7 = 0x62acbecd 21:08:34 INFO - r8 = 0x00000001 r9 = 0x000000c0 r10 = 0x0000ffff fp = 0x6c5ffc68 21:08:34 INFO - sp = 0x6c5ffc18 pc = 0x62acbedb 21:08:34 INFO - Found by: call frame info 21:08:34 INFO - 2 libxul.so!PLDHashTable::Enumerate(PLDHashOperator (*)(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*), void*) [pldhash.cpp:92c87e95915e : 722 + 0x11] 21:08:34 INFO - r4 = 0x6bf63f58 r5 = 0x6f37c518 r6 = 0x00000001 r7 = 0x62acbecd 21:08:34 INFO - r8 = 0x00000001 r9 = 0x000000c0 r10 = 0x0000ffff fp = 0x6c5ffc68 21:08:34 INFO - sp = 0x6c5ffc28 pc = 0x62131ab5 21:08:34 INFO - Found by: call frame info 21:08:34 INFO - 3 libxul.so!nsBaseHashtable<nsUint64HashKey, nsRefPtr<mozilla::dom::indexedDB::{anonymous}::FullObjectStoreMetadata>, mozilla::dom::indexedDB::{anonymous}::FullObjectStoreMetadata*>::EnumerateRead [nsBaseHashtable.h:92c87e95915e : 176 + 0x3] 21:08:34 INFO - r4 = 0x6bf63f58 r5 = 0x6c5ffca0 r6 = 0x62ad2c4d r7 = 0x6b86ecb8 21:08:34 INFO - r8 = 0x6ea4c360 r9 = 0x6ea4c440 r10 = 0x6ea4c410 fp = 0x6c5ffe2f 21:08:34 INFO - sp = 0x6c5ffc60 pc = 0x62acc42f 21:08:34 INFO - Found by: call frame info 21:08:34 INFO - 4 libxul.so!mozilla::dom::indexedDB::::OpenDatabaseOp::EnsureDatabaseActor [ActorsParent.cpp:92c87e95915e : 12088 + 0x3] 21:08:34 INFO - r4 = 0x00000002 r5 = 0x00000000 r6 = 0x6bf63f20 r7 = 0x6b86ecb8 21:08:34 INFO - r8 = 0x6ea4c360 r9 = 0x6ea4c440 r10 = 0x6ea4c410 fp = 0x6c5ffe2f 21:08:34 INFO - sp = 0x6c5ffc80 pc = 0x62adf6cd 21:08:34 INFO - Found by: call frame info 21:08:34 INFO - 5 libxul.so!mozilla::dom::indexedDB::::OpenDatabaseOp::EnsureDatabaseActorIsAlive [ActorsParent.cpp:92c87e95915e : 11968 + 0x5] 21:08:34 INFO - r4 = 0x6ea4c360 r5 = 0x00000000 r6 = 0x00000000 r7 = 0x6c5ffdf0 21:08:34 INFO - r8 = 0x6ea4c460 r9 = 0x00000001 r10 = 0x00000001 fp = 0x6c5ffe2f 21:08:34 INFO - sp = 0x6c5ffcd0 pc = 0x62adfabf 21:08:34 INFO - Found by: call frame info 21:08:37 INFO - 10-17 21:08:22.757 F/MOZ_Assert( 2248): Assertion failure: aThisObjectStore->mComittedAutoIncrementId <= otherObjectStore->mComittedAutoIncrementId, at /builds/slave/m-cen-and-d-000000000000000000/build/dom/indexedDB/ActorsParent.cpp:12131
See Also: → 1080577
Ben, can you please help look into this long-running crash/assert?
Flags: needinfo?(bent.mozilla)
I haven't been able to catch this yet, even with chaos mode turned on...
Flags: needinfo?(bent.mozilla)
What platform are you testing on? Seems that the majority of the recent failures are on 10.10.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #93) > What platform are you testing on? Seems that the majority of the recent > failures are on 10.10.
Flags: needinfo?(bent.mozilla)
I was never able to catch this in a recording. Maybe khuey will have more luck?
Flags: needinfo?(bent.mozilla)
You should just disable this test on mac.
Flags: needinfo?(ryanvm)
Flags: needinfo?(ryanvm)
Whiteboard: [test disabled on OSX debug][leave open]
I caught this in rr, will take a look at it on Tuesday.
Whiteboard: [test disabled on OSX debug][leave open] → [test disabled on OSX debug][leave open][rr-chaos]
Flags: needinfo?(khuey)
Ok, I got this in a debugger and it is indeed a race. One connection thread is currently committing an update to an autoincrement object store. At http://hg.mozilla.org/mozilla-central/annotate/ea39d4a6232c/dom/indexedDB/ActorsParent.cpp#l22244 we write the new counts to disk, and at http://hg.mozilla.org/mozilla-central/annotate/ea39d4a6232c/dom/indexedDB/ActorsParent.cpp#l22274 we update our in-memory count. Simultaneously, we're opening the database again and in AssertMetadataConsistency on the PBackground thread at http://hg.mozilla.org/mozilla-central/annotate/ea39d4a6232c/dom/indexedDB/ActorsParent.cpp#l21071. The comment here clearly contemplates this race, but the assertions are wrong. thisObjectStore->mComittedAutoincrementId can be greater than otherObjectStore->mComittedAutoincrementId (and equal to otherObjectStore->mNextAutoincrementId) if we race this way. Do you see any reason not to just adjust the assertion?
Flags: needinfo?(khuey) → needinfo?(bent.mozilla)
Well, it's been a while, but we shouldn't be able to simultaneously commit a version change transaction and open the same database again, should we? The open should have been delayed until the version change was finished.
Flags: needinfo?(bent.mozilla)
Who said anything about a versionchange transaction? These numbers are updated whenever somebody adds something that needs an autogenerated key, no?
Flags: needinfo?(bent.mozilla)
Ah, right. See, I told you it's been a while! So the race is basically that the write transaction thread has finished updating the autoincrement counts on disk but then the thread got swapped out before updating the in-memory counts? So when we open the new database it reads higher counts off disk than what we have in memory. Yeah, I guess just making the assertions handle this case properly is fine. You'd want something like: MOZ_ASSERT(thisObjectStore->mComittedAutoincrementId <= otherObjectStore->mComittedAutoincrementId || thisObjectStore->mComittedAutoincrementId == otherObjectStore->mNextAutoincrementId) Right?
Flags: needinfo?(bent.mozilla)
(In reply to Ben Turner (not reading bugmail, use the needinfo flag!) from comment #588) > Ah, right. See, I told you it's been a while! :) > So the race is basically that the write transaction thread has finished > updating the autoincrement counts on disk but then the thread got swapped > out before updating the in-memory counts? So when we open the new database > it reads higher counts off disk than what we have in memory. Right. > Yeah, I guess just making the assertions handle this case properly is fine. > You'd want something like: > > MOZ_ASSERT(thisObjectStore->mComittedAutoincrementId <= > otherObjectStore->mComittedAutoincrementId || > thisObjectStore->mComittedAutoincrementId == > otherObjectStore->mNextAutoincrementId) > > Right? Yes, this is what I proposed in comment 585. I'll write up the patch, and attempt to test it now that I understand the failure mode.
Attached patch PatchSplinter Review
Assignee: nobody → khuey
Status: NEW → ASSIGNED
Attachment #8722228 - Flags: review?(bent.mozilla)
Comment on attachment 8722228 [details] [diff] [review] Patch Spell check fail! r+
Attachment #8722228 - Flags: review?(bent.mozilla) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [test disabled on OSX debug][leave open][rr-chaos] → [test disabled on OSX debug][rr-chaos]
I re-enabled the test on OSX debug now. https://hg.mozilla.org/integration/mozilla-inbound/rev/de49494c8da7 Kyle, is this safe to uplift to 45/46 as well or would you prefer it ride the trains?
Flags: needinfo?(khuey)
Whiteboard: [test disabled on OSX debug][rr-chaos] → [rr-chaos]
Target Milestone: --- → mozilla47
This is safe to uplift.
Flags: needinfo?(khuey)
Comment on attachment 8722228 [details] [diff] [review] Patch Approval Request Comment [Feature/regressing bug #]: not sure offhand, but it's an old one [User impact if declined]: Sad sheriffs, especially if ESR45 has to live with this being unfixed. [Describe test coverage new/current, TreeHerder]: This has caused various hard to diagnose intermittents, as seen in this bug and the dupes to it. [Risks and why]: Per comment 598, this is low risk. The majority of the patch is just s/comitted/committed. The meat of the patch is an assertion tweak which doesn't affect shipping code. [String/UUID change made/needed]: None
Attachment #8722228 - Flags: approval-mozilla-beta?
Attachment #8722228 - Flags: approval-mozilla-aurora?
Comment on attachment 8722228 [details] [diff] [review] Patch I prefer happy sheriffs than sad sheriffs, taking it.
Attachment #8722228 - Flags: approval-mozilla-beta?
Attachment #8722228 - Flags: approval-mozilla-beta+
Attachment #8722228 - Flags: approval-mozilla-aurora?
Attachment #8722228 - Flags: approval-mozilla-aurora+
The whole directory was disabled on non-b2g, btw, so that cset doesn't do anything.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: