Description: Loading 250+ call logs will be stays at 60% until fully loading. Repro Steps: 1) Updated Buri to BuildID: 20140127004002 2) Run makeReferenceworkload.sh with "x-heavy" flag 3) Select the 'Dialer' app 4) Select the 'Call logs' Actual:. The loading bar incorrectly displayed percentage. Expected: The percentage is displayed correcty when loads. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20140121004137 Gaia: 47049555282a9a01fb60d1e1421b57e2810c96f5 Gecko: 6f7dfe36ab6c Version: 28.0a2 Firmware Version: V1.2-device.cfg Keywords: Loading, percentage, call logs, contacts Notes: Repro frequency: 100% See attached:
Does this reproduce on 1.1? Note - I also confirmed reproduction of this with the medium reference workload.
This issue does not reproduce on v1.1. The progress bar properly loads from 0% to 100%, and does not appear to hang or pause at any point. Environmental Variables Device: Buri v1.1 Moz RIL Build ID: 20140102041202 Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/bdac595a4e46 Gaia: 6ff3a607f873320d00cb036fa76117f6fadd010f Platform Version: 18.1 Firmware Version: v1.2-device.cfg
.:First Broken Build:. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20131019040204 Gaia: d0861c2913a57cb681ec4cd4d52d73cc0f479ac9 Gecko: e25e62d174ed Version: 27.0a1 Firmware Version: v1.2-device.cfg .:Last Working Build:. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20131018040217 Gaia: 18c2ab4b7b2d4cacee90fb85e97354142b2fad4a Gecko: 855da6d8a327 Version: 27.0a1 Firmware Version: v1.2-device.cfg
Etienne, do you mind looking at it? thanks
I've seen this happen too, it's likely caused by one of the steps in the database upgrade procedure.
(In reply to Joe Cheng [:jcheng] from comment #5) > Etienne, do you mind looking at it? thanks It's a database migration that takes time, not something we regressed on. We could probably rework the migration code to update the progress more frequently, but I don't think changing indexedDB migration code at this point will get approve for 1.3. Fabrice, what do you think?
(In reply to Etienne Segonzac (:etienne) from comment #7) > (In reply to Joe Cheng [:jcheng] from comment #5) > > Etienne, do you mind looking at it? thanks > > It's a database migration that takes time, not something we regressed on. No - the problem here isn't the DB migration. The problem is that progress bar is communicating a lie here - the progress goes quickly to 60%, hangs there for 10 minutes, and then continues on. When I showed this to Chris in person, we thought the Dialer app was hung. This is a confirmed regression - comment 4 clearly calls out a regression range here. We should find out what regressed this. > > We could probably rework the migration code to update the progress more > frequently, but I don't think changing indexedDB migration code at this > point will get approve for 1.3. > > Fabrice, what do you think? This is going in an incorrect direction. The problem here is with perceived progress, not the progress of the DB migration itself. There's no way we're shipping with a 10 minute hang in the dialer app - people are going to think the app is hung & never manage to get their call log imported, as they won't know to wait that long.
I also thought that my call log was lost when I went to upgrade it. Maybe some fix like we recently did with contacts is in order here?
This lengthy migration was introduced by bug 862385 (leo+) in July. The way progress is displayed hasn't changed since. The x-heavy reference workload is more than twice the biggest "production" database possible (see bug 829399, we don't let the call log grow forever). So yes, we should definitely wait for Fabrice's feedback before rushing an indexeddb migration code change.
(In reply to Etienne Segonzac (:etienne) from comment #10) > This lengthy migration was introduced by bug 862385 (leo+) in July. The way > progress is displayed hasn't changed since. That's not right. The regression range points at October, not July. > > The x-heavy reference workload is more than twice the biggest "production" > database possible (see bug 829399, we don't let the call log grow forever). This problem reproduces with the medium reference workload as well (100 call logs), although the timing isn't 10 minutes (I think it was around 5 minutes). It's a general problem with any migration, but the problem gets worse with more call logs. > > So yes, we should definitely wait for Fabrice's feedback before rushing an > indexeddb migration code change. We aren't unblocking this. We are made a final decision in triage on this, as partners are not going to let us ship with a perceived hang.
The above testing in comment 1 also showcases this working fine on 1.1. Like I said, I don't think you are going in right direction here.
There's only two Gaia commits in the regression range: 1. https://github.com/mozilla-b2g/gaia/commit/e6b176ab1101577f80604aca0c3b3c3b58f74b03 2. https://github.com/mozilla-b2g/gaia/commit/669d29644d3808039ed57ca5d6a1f90f8599ecb0 Both changes don't touch the call log. Maybe it's a gecko regression?
Can we verify this isn't a gaia regression? We can do that by doing the following: 1. Test this on a 10/18 Gaia, 10/19 Gecko 2. Test this on a 10/19 Gaia, 10/18 Gecko
I can reproduce it on Peak 1.3 with Build Id 20140211103226, Platform 28.0 and Gaia from ~Thursday last week.
Reproducible on: - Gaia https://github.com/mozilla-b2g/gaia/commit/669d29644d3808039ed57ca5d6a1f90f8599ecb0 on build 20140211103226 (10/04) - Gaia https://github.com/mozilla-b2g/gaia/commit/e6b176ab1101577f80604aca0c3b3c3b58f74b03 on build 20140211103226 (10/16) So far, it seems more likely that it is a Gecko regression.
If you need help profiling this and figuring out why it's slow just NI me.
Well the following commend in the code might shed some light on the issue:) // We send 'upgradeprogress' events from two different places: // 1- For each 15% of the recentsStore cursor requests completed. // We increment the progress by 10%, leaving the total amount // in 60%. // 2- For each 25% of the Contacts API requests completed. We // also increment the progress by 10%, leaving the total // amount in 90%. // The final progress 100% event is avoided as we send a 'upgradedone' // event instead. Hopefully getting somwehere.
After more debugging, I found out where does it stop, but I still don't know the cause. Moreover, with the latest Gaia v1.3 branch on latest Peak 1.3 build, still gets stuck for a while on 60% but then it continues in increments of 10% until 90%. I need somebody that can confirm whether that's also the case in Buri as well with latest 1.3 Gaia/Gecko. I think in any case there is a performance problem retrieving Contacts, since it gets stuck just at the point where the recentsStore is properly retrieved and we are waiting for the Contacts API requests to finish. The relevant code to all this in Gaia is in the onsuccess call in the following line: https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/dialer/js/call_log_db.js#L336 At this point I need more pointers to go on on where to look.
Sounds like this is a contacts API bug then. Over to DOM: Contacts.
That would still be nice to get the gecko regression range bisected...
(In reply to Fabrice Desré [:fabrice] from comment #23) > That would still be nice to get the gecko regression range bisected... I'm going to put money down that this was caused by bug 850430 - the infamous WebIDL work for the contacts API.
This is just the upgrade for 2000 contacts taking a long time. We upgrade the entire database before responding to any requests, so the Dialer doesn't update the progress bar.
FWIW, we start upgrading the Contacts DB at 0%.
Bug 951806 helps a lot here, I'm inclined to mark this one as dupe. We could also slowly tick the update progress from 60% to 69% (since 90%->100% is not used) so the user doesn't think it's hanging.
(In reply to Reuben Morais [:reuben] from comment #27) > Bug 951806 helps a lot here, I'm inclined to mark this one as dupe. We could > also slowly tick the update progress from 60% to 69% (since 90%->100% is not > used) so the user doesn't think it's hanging. Let's retest this on the latest 1.3 build then to see how we perform with this patch.
IIRC we don't do any update of the Contacts DB. We just query the Contacts DB to build the call log contacts cache in the call log DB. Also, IIRC again, we did this progress bar hack on bug 862385 cause we were at a string frozen stage in 1.1 and we couldn't provide a better UX at that point . But now, we should be able to do it better. My proposal for this is to simply remove the progress bar and show a loading spinner with a "Call log optimization in progress" or whatever message. That way we can remove the percentage logic and provide a better UX.  https://bugzilla.mozilla.org/show_bug.cgi?id=862385#c31
Etienne, would you be ok with the spinner solution?
Having seen the considerable amount of logic that goes into handling the (fake) percentage, I'm all for the spinner solution, makes much more sense.
clear N? this bug is actively worked on
Guys - please stay focused on the problem here. This IS a confirmed regression with a clear regression range. There's clear data that proves this. Please assess what caused this regression & derive the solution from that.
Update on qawanted: 1. Test to see if this reproduces on last working gaia/first broken gecko & first broken gaia/last working gecko 2. Test this on the latest 1.3
I just retested on fresh v1.3 with reference-workload-x-heavy. It sat at 60% for almost 5 minutes and then slow went from 70% to 100% over the next minute. It then displayed the list correctly. So it did not immediately go from 60% to 100%. It did stay a long time at 60%, though. Given how most progress bars these days do not do a linear extrapolation (due to perceived performance issues), this seems like a WFM. Just my two cents.
(In reply to Jason Smith [:jsmith] from comment #34) > Update on qawanted: > > 1. Test to see if this reproduces on last working gaia/first broken gecko & > first broken gaia/last working gecko This issue reproduces when using the last working gaia(10/18)/first broken gecko(10/19). The variables for that build is: Gaia d0861c2913a57cb681ec4cd4d52d73cc0f479ac9 SourceStamp 855da6d8a327 BuildID 20131018040217 Version 27.0a1 This does *not* reproduce when using the first broken gaia(10/19)/last working gecko(10/18). The percentage progresses past 60% at a consistent speed and finishes as expected. > 2. Test this on the latest 1.3 This issue reproduces on the 02/19/14 1.3 build. Gaia a43904d9646109b48836d62f7aa51e499fbf4b2e SourceStamp 97922b6daad1 BuildID 20140219004003 Version 28.0
The above comment gives hard evidence that this is a gecko bug. Back over to DOM: Contacts & re-adding dependency, as there's strong evidence that's keeps proving that the WebIDL work caused this regression.
Can someone clarify what the behavior is on the last working build?
I assume Ben's comment 35 was using a recent (ie. containing bug 951806) build? I guess if you squint hard enough this is a regression from bug 850430. The behavior of the progress bar has always been broken (as far as this bug is concerned), the difference is that bug 850430 landed a few database versions, so the upgrade now takes longer to complete. The action here is to land the spinner, unless there's a compelling argument that we need to give the user a very accurate (more than it is today) indication of progress.
Created attachment 8378591 [details] Video of loading call log on last working build. (In reply to Jason Smith [:jsmith] from comment #38) > Can someone clarify what the behavior is on the last working build? When loading the large call log, the progress bar is displayed and fills up gradually until the whole log is loaded. The progress bar and percentage do not stop at 60%, but instead displays the progress at 10% increments. Once it's loaded, the call log is displayed correctly. Attached is a video showing the behavior of the last working build. <Working_LoadingLargeCallLog.mpeg>
I retested this with a medium reference workload on the latest 1.4 build (500 contacts), since the consensus on the bug indicates that using x-heavy is too much. What I found was that I saw the 60% stopping before like what was seen above, but it only lasted for a minute or two. To a user, this does feel a bit annoying with the lack of progress, but it's certainly improvement with the landing the upgrade perf bug.
(In reply to Jason Smith [:jsmith] from comment #41) > I retested this with a medium reference workload on the latest 1.4 build > (500 contacts), since the consensus on the bug indicates that using x-heavy > is too much. What I found was that I saw the 60% stopping before like what > was seen above, but it only lasted for a minute or two. To a user, this does > feel a bit annoying with the lack of progress, but it's certainly > improvement with the landing the upgrade perf bug. On that regard, I'm going call this a dupe of bug 951806. There's a separate bug tracking the implementation work for fixing the UX here.