Closed Bug 1078438 Opened 10 years ago Closed 10 years ago

Contacts/SMS apps are sometimes empty at launch

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla36

People

(Reporter: rik, Assigned: bent.mozilla)

References

Details

Attachments

(2 files, 4 obsolete files)

boot.log 10 years ago Julien Wajsberg [:julienw] 61.58 KB, text/x-log		Details
Patch, v1 10 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 11.16 KB, patch	janv : review+	Details \| Diff \| Splinter Review
Patch, v2 10 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 9.54 KB, patch		Details \| Diff \| Splinter Review
Patch, v1.1 10 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 14.31 KB, patch	janv : review+	Details \| Diff \| Splinter Review
Patch, v1.2 10 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 17.08 KB, patch	janv : review+	Details \| Diff \| Splinter Review
Patch, v2 10 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 7.56 KB, patch	janv : review+	Details \| Diff \| Splinter Review

Anthony Ricaud (:rik)

Reporter

Description

•

10 years ago

On my dogfooding flame, after today's update, I'm getting an empty Contacts app.

This is the error I'm seeing at launch:
W/GeckoConsole(  297): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:5510"]
W/GeckoConsole(  297): [JavaScript Error: "TypeError: aFailureCb is not a function" {file: "resource://gre/modules/ContactDB.jsm" line: 1060}]


Before that update, since the middle of last week, only ~40% of my contacts would show up (stops at a contact starting with H) with this error:
W/GeckoConsole( 287): [JavaScript Error: “IndexedDB UnknownErr: ActorsParent.cpp:5510”]
E/Communications( 3866): Content JS ERROR: Error getting first contacts

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 1

•

10 years ago

Hm, any chance you can email me your contacts database+files?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 2

•

10 years ago

That error just means that the other process crashed though, right?

Panos Astithas (he/him) [:past] (please ni?)

Comment 3

•

10 years ago

I'm not getting an empty Contacts app, but I also noticed similar errors in logcat today (only line numbers are different):

E/QCALOG  (  306): [MessageQ] ProcessNewMessage: [XT-CS] unknown deliver target [OS-Agent]
E/QCALOG  (  306): [MessageQ] ProcessNewMessage: [XTWWAN-PE] unknown deliver target [OS-Agent]
I/Gecko   (  208):
I/Gecko   (  208): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x120005,name=PBackground::Msg_PBlobConstructor) Value error: message was deserialized, but contained an illegal value
I/Gecko   (  208):
I/Gecko   (  208): IPDL protocol error: could not look up PBlob
I/Gecko   (  208): IPDL protocol error: Error deserializing 'PBlobParent'
I/Gecko   (  208):
I/Gecko   (  208): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x60004,name=PBackgroundIDBDatabase::Msg_PBackgroundIDBDatabaseFileConstructor) Value error: message was deserialized, but contained an illegal value
I/Gecko   (  208):
I/Gecko   (  208):
I/Gecko   (  208): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
I/Gecko   (  208):
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:7692"]
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:7692"]
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:7692"]
W/GeckoConsole(  208): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:6065"]
W/AudioFlinger(  204): Thread AudioOut_2 cannot connect to the power manager service
E/AudioFlinger(  204): no wake lock to update!
I/Gecko   ( 1672): ###################################### forms.js loaded
I/Gecko   ( 1672): ############################### browserElementPanning.js loaded
I/Gecko   ( 1672): ######################## BrowserElementChildPreload.js loaded
D/wpa_supplicant(  992): RX ctrl_iface - hexdump(len=11): 53 49 47 4e 41 4c 5f 50 4f 4c 4c
D/wpa_supplicant(  992): wlan0: Control interface command 'SIGNAL_POLL'
D/wpa_supplicant(  992): nl80211: survey data missing!
E/QCALOG  (  306): [MessageQ] ProcessNewMessage: [XTWiFi-PE] unknown deliver target [OS-Agent]

This log output was either a manifestation of bug 1079313 or one of bug 1079322, can't remember.

Anthony Ricaud (:rik)

Reporter

Comment 4

•

10 years ago

Just sent the .sqlite and .sqlite-journal files to Ben.

Julien Wajsberg [:julienw]

Comment 5

•

10 years ago

I get the same issue (W/GeckoConsole(  297): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:5510"]) when launching the SMS App from my 2.0 backup, but the consequences is that the SMS app does not load :)

so basically same thing.
I can send my db too if necessary, ben.

Francisco Jordano [:arcturus] [:francisco]

Comment 6

•

10 years ago

I think we have several bugs filled with this problem.

Gabriele did you opened a bug related to this problem too? (Cannot find it)

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 7

•

10 years ago

Yes, there's plenty of fallout from recent IndexedDB changes. The most similar is bug 1082505, :khuey suggested trying the fix from bug 1082543 but I haven't had time to do it yet.

Flags: needinfo?(gsvelto)

Comment hidden (obsolete)

Can you please test again now that bug 1079546 has landed?

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 9

•

10 years ago

Hrm. So I just looked at the database, it looks like your 'getallcache' obejctstore is empty. This may be a dupe of 1072773.

Comment hidden (obsolete)

(And please ignore comment 8)

Julien Wajsberg [:julienw]

Comment 11

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #9)
> Hrm. So I just looked at the database, it looks like your 'getallcache'
> obejctstore is empty. This may be a dupe of 1072773.

But I have a similar issue with the SMS app too. From time to time, I have the same error, and the SMS app stays blank. Same IndexedDB error in the log.
But simply restarting the app then works.

I'll try tomorrow with a newer build and I can send you the SMS DB if necessary.

Julien Wajsberg [:julienw]

Updated

•

10 years ago

Flags: needinfo?(felash)

Julien Wajsberg [:julienw]

Comment 12

•

10 years ago

Attached file boot.log — Details

Flags: needinfo?(felash)

Julien Wajsberg [:julienw]

Comment 13

•

10 years ago

I think we still have the same error in the latest build.

It prevented from booting after a full flash. (see boot.log attachment)

Then I rebooted using "adb reboot" and it correctly booted. I then flashed my own SMS and Contacts db to the phone, then rebooted (I can provide all of this if you want).

Then I launched the SMS app, the DB migration likely started (my db comes from FxOS v1.3). But it worked fine.

So I don't have the issue with the same STR right now (I tried several times to push my db). This is weird. I'll report if I see this again.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 14

•

10 years ago

Does SMS use a separate objectStore for a cache like contacts? One that it might not always include in the same transaction?

Julien Wajsberg [:julienw]

Comment 15

•

10 years ago

SMS actually uses 3 databases:
* the one in mozContacts
* the one in mozMobileMessaging
* one managed by asyncStorage to store drafts

Each database are independant, and for sure they all use different transactions when we write to them. But we use no database to cache stuff yet, they all have different responsibilities.

But as I said, I didn't reproduce the issue in a recent build in SMS :/

Julien Wajsberg [:julienw]

Comment 16

•

10 years ago

Sorry, just reproduced on latest nightly.

Julien Wajsberg [:julienw]

Updated

•

10 years ago

Summary: Contact app is empty at launch → Contacts/SMS apps are sometimes empty at launch

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 17

•

10 years ago

Bug 1087464 may fix this too.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 18

•

10 years ago

Julien, have you seen this recently?

Flags: needinfo?(felash)

Julien Wajsberg [:julienw]

Comment 19

•

10 years ago

I need to check with a newer gecko. Let me update my phones and I'll report.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 20

•

10 years ago

Thanks!

Julien Wajsberg [:julienw]

Comment 21

•

10 years ago

yeah, just got it on latest build for Buri :(

It seems to happen after a reboot only, but I get it quite consistently on this build.

Flags: needinfo?(felash)

Julien Wajsberg [:julienw]

Comment 22

•

10 years ago

"quite" consistently, not 100%. Of 3 reboots, I got the issue twice.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 23

•

10 years ago

Can you post updated logcat for it?

Flags: needinfo?(felash)

Julien Wajsberg [:julienw]

Comment 24

•

10 years ago

0-29 10:05:32.940  1222  1257 W GeckoConsole: [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:415"]
10-29 10:05:33.060  1481  1481 E Messages: Content JS ERROR: Reading the database. Error: InternalError 

This is a filtered logcat, do you want an unfiltered log?

Flags: needinfo?(felash)

Julien Wajsberg [:julienw]

Comment 25

•

10 years ago

I checked the unfiltered log, I have nothing better.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 26

•

10 years ago

Attaching gdb, breaking at http://mxr.mozilla.org/mozilla-central/source/dom/indexedDB/ActorsParent.cpp#415, and printing aResultCode and the stack (bt) would help.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 27

•

10 years ago

Julien, also, if you don't could you email me the contacts db and files folder? I promise not to post it on dropbox! :)

Though if you're uncomfortable with that I can try to give you a better logging patch to make this more debuggable.

Flags: needinfo?(felash)

Julien Wajsberg [:julienw]

Comment 28

•

10 years ago

Unfortunately I don't do my own build these days so gdb is not useful.

I'm sending to you the data I use.

Flags: needinfo?(felash)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 29

•

10 years ago

Ok, I know what is going on here now. This isn't actually caused by bug 994190, we're just losing a race here.

No longer blocks: IndexedDB-on-PBackground

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 30

•

10 years ago

Attached patch Patch, v1 (obsolete) — Details — Splinter Review

Ok, this patch fixes it locally for me. I'll try to chat with the SQLite team to see if there are better ways to do this.

Assignee: nobody → bent.mozilla

Status: NEW → ASSIGNED

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 31

•

10 years ago

Comment on attachment 8517846 [details] [diff] [review]
Patch, v1

Review of attachment 8517846 [details] [diff] [review]:
-----------------------------------------------------------------

Jan, how does this look?

Attachment #8517846 - Flags: review?(Jan.Varga)

Jan Varga [:janv]

Comment 32

•

10 years ago

Comment on attachment 8517846 [details] [diff] [review]
Patch, v1

Review of attachment 8517846 [details] [diff] [review]:
-----------------------------------------------------------------

looks good - if there's no other way to fix it

Attachment #8517846 - Flags: review?(Jan.Varga) → review+

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 33

•

10 years ago

Attached patch Patch, v2 (obsolete) — Details — Splinter Review

The SQLite folks think that this will be a bit safer. Jan, how does this look?

Attachment #8517846 - Attachment is obsolete: true

Attachment #8518447 - Flags: review?(Jan.Varga)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 34

•

10 years ago

Comment on attachment 8518447 [details] [diff] [review]
Patch, v2

This one doesn't seem to work as well... Let me tweak it a bit more.

Attachment #8518447 - Flags: review?(Jan.Varga)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Updated

•

10 years ago

Attachment #8518447 - Attachment is obsolete: true

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 35

•

10 years ago

Comment on attachment 8517846 [details] [diff] [review]
Patch, v1

This one works. v2 doesn't.

Attachment #8517846 - Attachment is obsolete: false

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 36

•

10 years ago

Attached patch Patch, v1.1 (obsolete) — Details — Splinter Review

This is basically the same as v1 but simplified a bit, and with beefier comments.

Attachment #8517846 - Attachment is obsolete: true

Attachment #8519026 - Flags: review?(Jan.Varga)

Jan Varga [:janv]

Comment 37

•

10 years ago

Comment on attachment 8519026 [details] [diff] [review]
Patch, v1.1

Review of attachment 8519026 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, I wonder if we have other bugs filed which are caused by the same issue.

Attachment #8519026 - Flags: review?(Jan.Varga) → review+

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 38

•

10 years ago

Attached patch Patch, v1.2 (obsolete) — Details — Splinter Review

This is basically the same patch as v1 but we have an additional failsafe in mozStorageConnection.cpp.

Mak, I just need you to review the change to mozStorageConnection. We were being too conservative there before and not letting connections rollback to drop their locks.

Attachment #8519026 - Attachment is obsolete: true

Attachment #8519082 - Flags: review?(mak77)

Attachment #8519082 - Flags: review?(Jan.Varga)

Jan Varga [:janv]

Updated

•

10 years ago

Attachment #8519082 - Flags: review?(Jan.Varga) → review+

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 39

•

10 years ago

Comment on attachment 8519082 [details] [diff] [review]
Patch, v1.2

Review of attachment 8519082 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/mozStorageConnection.cpp
@@ +977,5 @@
>  
>      srv = WaitForUnlockNotify(aNativeConnection);
> +    if (MOZ_UNLIKELY(srv != SQLITE_OK)) {
> +      NS_WARNING("WaitForUnlockNotify detected deadlock, yielding to let "
> +                 "another connection drop its locks!");

is it possible a few yields won't be enough to exit the deadlock and we will end up looping here indefinitely or for a very long time?

I was thinking maybe we should still check a time diff here, and after a meaningful amount of time (a few seconds?) MOZ_ASSERT and still break out of the loop.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 40

•

10 years ago

I'm not sure how I feel about a timeout like that... It just depends on how fast a machine's cpu and disk are. Sometimes you'll just get unlucky and get an error, but it might be rare so writing code that handles that error will be challenging. Also, the sqlite3_unlock_notify() that we call today waits forever with no timeout, right?

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 41

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #40)
> be challenging. Also, the sqlite3_unlock_notify() that we call today waits
> forever with no timeout, right?

if sqlite3_unlock_notify detects a deadlock it returns SQLITE_LOCKED but we wait the unlock notification only if it returns SQLITE_OK.
Afaict your code is handling the SQLITE_LOCKED (or other error) case in which we were unable to install a notification callback at all, most likely because Sqlite detected a deadlock across 2 connections. That deadlock wouldn't be solved just by waiting the shared lock an so Sqlite doesn't even try to install the callback.

This is how I interpreted the Deadlock Detection section at https://www.sqlite.org/c3ref/unlock_notify.html and our implementation of WaitForUnlockNotify.

So the problem I see there, is that at one of the next loops, the deadlock could be solved by one of the 2 connections, but that's not mandatory, it could be a real deadlock that will never solve.
The previous break was then just giving up and returning an error, to avoid entering the deadlock, while with your change we could be trapped indefinitely in the loop.

Did I misinterpreted something about your proposal?

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 42

•

10 years ago

(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #41)
> This is how I interpreted the Deadlock Detection

Yep, that is correct.

> Did I misinterpreted something about your proposal?

Nope, I think you nailed the mozStorage part. The only thing you're missing is that we explicitly ROLLBACK the *other* connection in indexedDB (see the ActorsParent.cpp changes in the patch) when this happens so that we're guaranteed to never deadlock forever. Of course, that is only true for indexedDB connections, but as far as I know the only time we ever use the unlock_notify mechanism is with indexedDB anyway.

However, in the meantime, I have figured out another way around this problem that won't require any mozStorage changes, so I don't think we have to worry about this at all anymore. Sorry!

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 43

•

10 years ago

Attached patch Patch, v2 — Details — Splinter Review

Nuke the table locks from orbit.

Attachment #8519082 - Attachment is obsolete: true

Attachment #8519082 - Flags: review?(mak77)

Attachment #8520703 - Flags: review?(Jan.Varga)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 44

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #42)
> Nope, I think you nailed the mozStorage part. The only thing you're missing
> is that we explicitly ROLLBACK the *other* connection in indexedDB (see the
> ActorsParent.cpp changes in the patch) when this happens so that we're
> guaranteed to never deadlock forever. Of course, that is only true for
> indexedDB connections, but as far as I know the only time we ever use the
> unlock_notify mechanism is with indexedDB anyway.

Yes my point was not specifically about indexedDB, rather about other consumers that could cause trouble to Firefox.  It's true the possibilities to hit this kind of deadlock are low, but the add-ons ecosystem is very unpredictable, so I would have preferred to not risk the possibility of an infinite loop here.

Jan Varga [:janv]

Comment 45

•

10 years ago

Comment on attachment 8520703 [details] [diff] [review]
Patch, v2

Review of attachment 8520703 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +2204,5 @@
> +    // ourselves and we know we will never allow a write transaction to modify
> +    // an object store that a read transaction is in the process of using.
> +    "PRAGMA read_uncommitted = TRUE;"
> +    // No more PRAGMAs.
> +    ;

This looks so scary and awesome at the same time :)

Attachment #8520703 - Flags: review?(Jan.Varga) → review+

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 46

•

10 years ago

For those following along, this fix works fine, but exposes us to another SQLite bug that the SQLite team is testing a fix for right now. Once that fix is available we can land both.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Updated

•

10 years ago

Depends on: SQLite3.8.7.2

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 47

•

10 years ago

Ok, we have the new SQLite version ready in bug 1101217.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 48

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/577bc6a54c1d

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 49

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/577bc6a54c1d

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla36

Julien Wajsberg [:julienw]

Comment 50

•

10 years ago

\o/ thanks

You need to log in before you can comment on or make changes to this bug.