Closed Bug 1078438 Opened 10 years ago Closed 10 years ago

Contacts/SMS apps are sometimes empty at launch

Categories

(Core :: Storage: IndexedDB, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: rik, Assigned: bent.mozilla)

References

Details

Attachments

(2 files, 4 obsolete files)

On my dogfooding flame, after today's update, I'm getting an empty Contacts app.

This is the error I'm seeing at launch:
W/GeckoConsole(  297): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:5510"]
W/GeckoConsole(  297): [JavaScript Error: "TypeError: aFailureCb is not a function" {file: "resource://gre/modules/ContactDB.jsm" line: 1060}]


Before that update, since the middle of last week, only ~40% of my contacts would show up (stops at a contact starting with H) with this error:
W/GeckoConsole( 287): [JavaScript Error: “IndexedDB UnknownErr: ActorsParent.cpp:5510”]
E/Communications( 3866): Content JS ERROR: Error getting first contacts
Hm, any chance you can email me your contacts database+files?
That error just means that the other process crashed though, right?
I'm not getting an empty Contacts app, but I also noticed similar errors in logcat today (only line numbers are different):

E/QCALOG  (  306): [MessageQ] ProcessNewMessage: [XT-CS] unknown deliver target [OS-Agent]
E/QCALOG  (  306): [MessageQ] ProcessNewMessage: [XTWWAN-PE] unknown deliver target [OS-Agent]
I/Gecko   (  208):
I/Gecko   (  208): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x120005,name=PBackground::Msg_PBlobConstructor) Value error: message was deserialized, but contained an illegal value
I/Gecko   (  208):
I/Gecko   (  208): IPDL protocol error: could not look up PBlob
I/Gecko   (  208): IPDL protocol error: Error deserializing 'PBlobParent'
I/Gecko   (  208):
I/Gecko   (  208): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x60004,name=PBackgroundIDBDatabase::Msg_PBackgroundIDBDatabaseFileConstructor) Value error: message was deserialized, but contained an illegal value
I/Gecko   (  208):
I/Gecko   (  208):
I/Gecko   (  208): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
I/Gecko   (  208):
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:7692"]
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:7692"]
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:7692"]
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:6065"]
W/AudioFlinger(  204): Thread AudioOut_2 cannot connect to the power manager service
E/AudioFlinger(  204): no wake lock to update!
I/Gecko   ( 1672): ###################################### forms.js loaded
I/Gecko   ( 1672): ############################### browserElementPanning.js loaded
I/Gecko   ( 1672): ######################## BrowserElementChildPreload.js loaded
D/wpa_supplicant(  992): RX ctrl_iface - hexdump(len=11): 53 49 47 4e 41 4c 5f 50 4f 4c 4c
D/wpa_supplicant(  992): wlan0: Control interface command 'SIGNAL_POLL'
D/wpa_supplicant(  992): nl80211: survey data missing!
E/QCALOG  (  306): [MessageQ] ProcessNewMessage: [XTWiFi-PE] unknown deliver target [OS-Agent]

This log output was either a manifestation of bug 1079313 or one of bug 1079322, can't remember.
Just sent the .sqlite and .sqlite-journal files to Ben.
I get the same issue (W/GeckoConsole(  297): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:5510"]) when launching the SMS App from my 2.0 backup, but the consequences is that the SMS app does not load :)

so basically same thing.
I can send my db too if necessary, ben.
I think we have several bugs filled with this problem.

Gabriele did you opened a bug related to this problem too? (Cannot find it)
Flags: needinfo?(gsvelto)
Yes, there's plenty of fallout from recent IndexedDB changes. The most similar is bug 1082505, :khuey suggested trying the fix from bug 1082543 but I haven't had time to do it yet.
Flags: needinfo?(gsvelto)
Hrm. So I just looked at the database, it looks like your 'getallcache' obejctstore is empty. This may be a dupe of 1072773.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #9)
> Hrm. So I just looked at the database, it looks like your 'getallcache'
> obejctstore is empty. This may be a dupe of 1072773.

But I have a similar issue with the SMS app too. From time to time, I have the same error, and the SMS app stays blank. Same IndexedDB error in the log.
But simply restarting the app then works.

I'll try tomorrow with a newer build and I can send you the SMS DB if necessary.
Flags: needinfo?(felash)
Attached file boot.log
Flags: needinfo?(felash)
I think we still have the same error in the latest build.

It prevented from booting after a full flash. (see boot.log attachment)

Then I rebooted using "adb reboot" and it correctly booted. I then flashed my own SMS and Contacts db to the phone, then rebooted (I can provide all of this if you want).

Then I launched the SMS app, the DB migration likely started (my db comes from FxOS v1.3). But it worked fine.

So I don't have the issue with the same STR right now (I tried several times to push my db). This is weird. I'll report if I see this again.
Does SMS use a separate objectStore for a cache like contacts? One that it might not always include in the same transaction?
SMS actually uses 3 databases:
* the one in mozContacts
* the one in mozMobileMessaging
* one managed by asyncStorage to store drafts

Each database are independant, and for sure they all use different transactions when we write to them. But we use no database to cache stuff yet, they all have different responsibilities.

But as I said, I didn't reproduce the issue in a recent build in SMS :/
Sorry, just reproduced on latest nightly.
Summary: Contact app is empty at launch → Contacts/SMS apps are sometimes empty at launch
Julien, have you seen this recently?
Flags: needinfo?(felash)
I need to check with a newer gecko. Let me update my phones and I'll report.
yeah, just got it on latest build for Buri :(

It seems to happen after a reboot only, but I get it quite consistently on this build.
Flags: needinfo?(felash)
"quite" consistently, not 100%. Of 3 reboots, I got the issue twice.
Can you post updated logcat for it?
Flags: needinfo?(felash)
0-29 10:05:32.940  1222  1257 W GeckoConsole: [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:415"]
10-29 10:05:33.060  1481  1481 E Messages: Content JS ERROR: Reading the database. Error: InternalError 

This is a filtered logcat, do you want an unfiltered log?
Flags: needinfo?(felash)
I checked the unfiltered log, I have nothing better.
Julien, also, if you don't could you email me the contacts db and files folder? I promise not to post it on dropbox! :)

Though if you're uncomfortable with that I can try to give you a better logging patch to make this more debuggable.
Flags: needinfo?(felash)
Unfortunately I don't do my own build these days so gdb is not useful.

I'm sending to you the data I use.
Flags: needinfo?(felash)
Ok, I know what is going on here now. This isn't actually caused by bug 994190, we're just losing a race here.
No longer blocks: IndexedDB-on-PBackground
Attached patch Patch, v1 (obsolete) — Splinter Review
Ok, this patch fixes it locally for me. I'll try to chat with the SQLite team to see if there are better ways to do this.
Assignee: nobody → bent.mozilla
Status: NEW → ASSIGNED
Comment on attachment 8517846 [details] [diff] [review]
Patch, v1

Review of attachment 8517846 [details] [diff] [review]:
-----------------------------------------------------------------

Jan, how does this look?
Attachment #8517846 - Flags: review?(Jan.Varga)
Comment on attachment 8517846 [details] [diff] [review]
Patch, v1

Review of attachment 8517846 [details] [diff] [review]:
-----------------------------------------------------------------

looks good - if there's no other way to fix it
Attachment #8517846 - Flags: review?(Jan.Varga) → review+
Attached patch Patch, v2 (obsolete) — Splinter Review
The SQLite folks think that this will be a bit safer. Jan, how does this look?
Attachment #8517846 - Attachment is obsolete: true
Attachment #8518447 - Flags: review?(Jan.Varga)
Comment on attachment 8518447 [details] [diff] [review]
Patch, v2

This one doesn't seem to work as well... Let me tweak it a bit more.
Attachment #8518447 - Flags: review?(Jan.Varga)
Attachment #8518447 - Attachment is obsolete: true
Comment on attachment 8517846 [details] [diff] [review]
Patch, v1

This one works. v2 doesn't.
Attachment #8517846 - Attachment is obsolete: false
Attached patch Patch, v1.1 (obsolete) — Splinter Review
This is basically the same as v1 but simplified a bit, and with beefier comments.
Attachment #8517846 - Attachment is obsolete: true
Attachment #8519026 - Flags: review?(Jan.Varga)
Comment on attachment 8519026 [details] [diff] [review]
Patch, v1.1

Review of attachment 8519026 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, I wonder if we have other bugs filed which are caused by the same issue.
Attachment #8519026 - Flags: review?(Jan.Varga) → review+
Attached patch Patch, v1.2 (obsolete) — Splinter Review
This is basically the same patch as v1 but we have an additional failsafe in mozStorageConnection.cpp.

Mak, I just need you to review the change to mozStorageConnection. We were being too conservative there before and not letting connections rollback to drop their locks.
Attachment #8519026 - Attachment is obsolete: true
Attachment #8519082 - Flags: review?(mak77)
Attachment #8519082 - Flags: review?(Jan.Varga)
Attachment #8519082 - Flags: review?(Jan.Varga) → review+
Comment on attachment 8519082 [details] [diff] [review]
Patch, v1.2

Review of attachment 8519082 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/mozStorageConnection.cpp
@@ +977,5 @@
>  
>      srv = WaitForUnlockNotify(aNativeConnection);
> +    if (MOZ_UNLIKELY(srv != SQLITE_OK)) {
> +      NS_WARNING("WaitForUnlockNotify detected deadlock, yielding to let "
> +                 "another connection drop its locks!");

is it possible a few yields won't be enough to exit the deadlock and we will end up looping here indefinitely or for a very long time?

I was thinking maybe we should still check a time diff here, and after a meaningful amount of time (a few seconds?) MOZ_ASSERT and still break out of the loop.
I'm not sure how I feel about a timeout like that... It just depends on how fast a machine's cpu and disk are. Sometimes you'll just get unlucky and get an error, but it might be rare so writing code that handles that error will be challenging. Also, the sqlite3_unlock_notify() that we call today waits forever with no timeout, right?
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #40)
> be challenging. Also, the sqlite3_unlock_notify() that we call today waits
> forever with no timeout, right?

if sqlite3_unlock_notify detects a deadlock it returns SQLITE_LOCKED but we wait the unlock notification only if it returns SQLITE_OK.
Afaict your code is handling the SQLITE_LOCKED (or other error) case in which we were unable to install a notification callback at all, most likely because Sqlite detected a deadlock across 2 connections. That deadlock wouldn't be solved just by waiting the shared lock an so Sqlite doesn't even try to install the callback.

This is how I interpreted the Deadlock Detection section at https://www.sqlite.org/c3ref/unlock_notify.html and our implementation of WaitForUnlockNotify.

So the problem I see there, is that at one of the next loops, the deadlock could be solved by one of the 2 connections, but that's not mandatory, it could be a real deadlock that will never solve.
The previous break was then just giving up and returning an error, to avoid entering the deadlock, while with your change we could be trapped indefinitely in the loop.

Did I misinterpreted something about your proposal?
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #41)
> This is how I interpreted the Deadlock Detection

Yep, that is correct.

> Did I misinterpreted something about your proposal?

Nope, I think you nailed the mozStorage part. The only thing you're missing is that we explicitly ROLLBACK the *other* connection in indexedDB (see the ActorsParent.cpp changes in the patch) when this happens so that we're guaranteed to never deadlock forever. Of course, that is only true for indexedDB connections, but as far as I know the only time we ever use the unlock_notify mechanism is with indexedDB anyway.

However, in the meantime, I have figured out another way around this problem that won't require any mozStorage changes, so I don't think we have to worry about this at all anymore. Sorry!
Attached patch Patch, v2Splinter Review
Nuke the table locks from orbit.
Attachment #8519082 - Attachment is obsolete: true
Attachment #8519082 - Flags: review?(mak77)
Attachment #8520703 - Flags: review?(Jan.Varga)
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #42)
> Nope, I think you nailed the mozStorage part. The only thing you're missing
> is that we explicitly ROLLBACK the *other* connection in indexedDB (see the
> ActorsParent.cpp changes in the patch) when this happens so that we're
> guaranteed to never deadlock forever. Of course, that is only true for
> indexedDB connections, but as far as I know the only time we ever use the
> unlock_notify mechanism is with indexedDB anyway.

Yes my point was not specifically about indexedDB, rather about other consumers that could cause trouble to Firefox.  It's true the possibilities to hit this kind of deadlock are low, but the add-ons ecosystem is very unpredictable, so I would have preferred to not risk the possibility of an infinite loop here.
Comment on attachment 8520703 [details] [diff] [review]
Patch, v2

Review of attachment 8520703 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +2204,5 @@
> +    // ourselves and we know we will never allow a write transaction to modify
> +    // an object store that a read transaction is in the process of using.
> +    "PRAGMA read_uncommitted = TRUE;"
> +    // No more PRAGMAs.
> +    ;

This looks so scary and awesome at the same time :)
Attachment #8520703 - Flags: review?(Jan.Varga) → review+
For those following along, this fix works fine, but exposes us to another SQLite bug that the SQLite team is testing a fix for right now. Once that fix is available we can land both.
Ok, we have the new SQLite version ready in bug 1101217.
https://hg.mozilla.org/mozilla-central/rev/577bc6a54c1d
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
\o/ thanks
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: