966566 - [B2G][Dialer] Loading stays at 60% in the call log loading all past calls for an extended period (around 10 minutes)

Reporter

Description

•

11 years ago

Description: Loading 250+ call logs will be stays at 60% until fully loading. Repro Steps: 1) Updated Buri to BuildID: 20140127004002 2) Run makeReferenceworkload.sh with "x-heavy" flag 3) Select the 'Dialer' app 4) Select the 'Call logs' Actual:. The loading bar incorrectly displayed percentage. Expected: The percentage is displayed correcty when loads. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20140121004137 Gaia: 47049555282a9a01fb60d1e1421b57e2810c96f5 Gecko: 6f7dfe36ab6c Version: 28.0a2 Firmware Version: V1.2-device.cfg Keywords: Loading, percentage, call logs, contacts Notes: Repro frequency: 100% See attached:

Jason Smith [:jsmith]

Comment 1

•

11 years ago

Does this reproduce on 1.1? Note - I also confirmed reproduction of this with the medium reference workload.

Keywords: perf, qawanted

Jason Smith [:jsmith]

Updated

•

11 years ago

Summary: [B2G][Dialer] Loading stays at 60% while loading for an extended period (around 10 minutes) → [B2G][Dialer] Loading stays at 60% in the call log loading all past calls while loading for an extended period (around 10 minutes)

Jason Smith [:jsmith]

Updated

•

11 years ago

Summary: [B2G][Dialer] Loading stays at 60% in the call log loading all past calls while loading for an extended period (around 10 minutes) → [B2G][Dialer] Loading stays at 60% in the call log loading all past calls for an extended period (around 10 minutes)

Mike Lee [:mlee]

Updated

•

11 years ago

Priority: -- → P3

Whiteboard: dogfood1.3 → [c=progress p= s= u=] dogfood1.3

Ross Kuhlman (rkuhlman@qanalydocs.com)

Comment 2

•

11 years ago

This issue does not reproduce on v1.1. The progress bar properly loads from 0% to 100%, and does not appear to hang or pause at any point. Environmental Variables Device: Buri v1.1 Moz RIL Build ID: 20140102041202 Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/bdac595a4e46 Gaia: 6ff3a607f873320d00cb036fa76117f6fadd010f Platform Version: 18.1 Firmware Version: v1.2-device.cfg

Keywords: qawanted

Jason Smith [:jsmith]

Updated

•

11 years ago

blocking-b2g: --- → 1.3?

Keywords: regression, regressionwindow-wanted

Preeti Raghunath(:Preeti)

Comment 3

•

11 years ago

Please review

blocking-b2g: 1.3? → 1.3+

Flags: needinfo?(dscravaglieri)

gbennett

Comment 4

•

11 years ago

.:First Broken Build:. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20131019040204 Gaia: d0861c2913a57cb681ec4cd4d52d73cc0f479ac9 Gecko: e25e62d174ed Version: 27.0a1 Firmware Version: v1.2-device.cfg .:Last Working Build:. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20131018040217 Gaia: 18c2ab4b7b2d4cacee90fb85e97354142b2fad4a Gecko: 855da6d8a327 Version: 27.0a1 Firmware Version: v1.2-device.cfg

Keywords: regressionwindow-wanted

Joe Cheng [:jcheng] (please needinfo)

Comment 5

•

11 years ago

Etienne, do you mind looking at it? thanks

Flags: needinfo?(etienne)

Gabriele Svelto [:gsvelto]

Comment 6

•

11 years ago

I've seen this happen too, it's likely caused by one of the steps in the database upgrade procedure.

Etienne Segonzac (:etienne)

Comment 7

•

11 years ago

(In reply to Joe Cheng [:jcheng] from comment #5) > Etienne, do you mind looking at it? thanks It's a database migration that takes time, not something we regressed on. We could probably rework the migration code to update the progress more frequently, but I don't think changing indexedDB migration code at this point will get approve for 1.3. Fabrice, what do you think?

Flags: needinfo?(etienne) → needinfo?(fabrice)

Mike Lee [:mlee]

Updated

•

11 years ago

Whiteboard: [c=progress p= s= u=] dogfood1.3 → [c=progress p= s= u=1.3] dogfood1.3

Jason Smith [:jsmith]

Comment 8

•

11 years ago

(In reply to Etienne Segonzac (:etienne) from comment #7) > (In reply to Joe Cheng [:jcheng] from comment #5) > > Etienne, do you mind looking at it? thanks > > It's a database migration that takes time, not something we regressed on. No - the problem here isn't the DB migration. The problem is that progress bar is communicating a lie here - the progress goes quickly to 60%, hangs there for 10 minutes, and then continues on. When I showed this to Chris in person, we thought the Dialer app was hung. This is a confirmed regression - comment 4 clearly calls out a regression range here. We should find out what regressed this. > > We could probably rework the migration code to update the progress more > frequently, but I don't think changing indexedDB migration code at this > point will get approve for 1.3. > > Fabrice, what do you think? This is going in an incorrect direction. The problem here is with perceived progress, not the progress of the DB migration itself. There's no way we're shipping with a 10 minute hang in the dialer app - people are going to think the app is hung & never manage to get their call log imported, as they won't know to wait that long.

Kevin Grandon :kgrandon

Comment 9

•

11 years ago

I also thought that my call log was lost when I went to upgrade it. Maybe some fix like we recently did with contacts is in order here?

Etienne Segonzac (:etienne)

Comment 10

•

11 years ago

This lengthy migration was introduced by bug 862385 (leo+) in July. The way progress is displayed hasn't changed since. The x-heavy reference workload is more than twice the biggest "production" database possible (see bug 829399, we don't let the call log grow forever). So yes, we should definitely wait for Fabrice's feedback before rushing an indexeddb migration code change.

Jason Smith [:jsmith]

Comment 11

•

11 years ago

(In reply to Etienne Segonzac (:etienne) from comment #10) > This lengthy migration was introduced by bug 862385 (leo+) in July. The way > progress is displayed hasn't changed since. That's not right. The regression range points at October, not July. > > The x-heavy reference workload is more than twice the biggest "production" > database possible (see bug 829399, we don't let the call log grow forever). This problem reproduces with the medium reference workload as well (100 call logs), although the timing isn't 10 minutes (I think it was around 5 minutes). It's a general problem with any migration, but the problem gets worse with more call logs. > > So yes, we should definitely wait for Fabrice's feedback before rushing an > indexeddb migration code change. We aren't unblocking this. We are made a final decision in triage on this, as partners are not going to let us ship with a perceived hang.

Jason Smith [:jsmith]

Comment 12

•

11 years ago

The above testing in comment 1 also showcases this working fine on 1.1. Like I said, I don't think you are going in right direction here.

Jason Smith [:jsmith]

Comment 13

•

11 years ago

https://github.com/mozilla-b2g/gaia/compare/18c2ab4b7b2d4cacee90fb85e97354142b2fad4a...d0861c2913a57cb681ec4cd4d52d73cc0f479ac9

Jason Smith [:jsmith]

Comment 14

•

11 years ago

There's only two Gaia commits in the regression range: 1. https://github.com/mozilla-b2g/gaia/commit/e6b176ab1101577f80604aca0c3b3c3b58f74b03 2. https://github.com/mozilla-b2g/gaia/commit/669d29644d3808039ed57ca5d6a1f90f8599ecb0 Both changes don't touch the call log. Maybe it's a gecko regression?

Jason Smith [:jsmith]

Comment 15

•

11 years ago

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=855da6d8a327&tochange=e25e62d174ed

Jason Smith [:jsmith]

Comment 16

•

11 years ago

Can we verify this isn't a gaia regression? We can do that by doing the following: 1. Test this on a 10/18 Gaia, 10/19 Gecko 2. Test this on a 10/19 Gaia, 10/18 Gecko

Keywords: qawanted

Sergi Mansilla [:sergi] (Telenor) [PTO]

Updated

•

11 years ago

Assignee: nobody → sergi.mansilla

Sergi Mansilla [:sergi] (Telenor) [PTO]

Comment 17

•

11 years ago

I can reproduce it on Peak 1.3 with Build Id 20140211103226, Platform 28.0 and Gaia from ~Thursday last week.

Sergi Mansilla [:sergi] (Telenor) [PTO]

Comment 18

•

11 years ago

Reproducible on: - Gaia https://github.com/mozilla-b2g/gaia/commit/669d29644d3808039ed57ca5d6a1f90f8599ecb0 on build 20140211103226 (10/04) - Gaia https://github.com/mozilla-b2g/gaia/commit/e6b176ab1101577f80604aca0c3b3c3b58f74b03 on build 20140211103226 (10/16) So far, it seems more likely that it is a Gecko regression.

Gabriele Svelto [:gsvelto]

Comment 19

•

11 years ago

If you need help profiling this and figuring out why it's slow just NI me.

Sergi Mansilla [:sergi] (Telenor) [PTO]

Comment 20

•

11 years ago

Well the following commend in the code might shed some light on the issue:) // We send 'upgradeprogress' events from two different places: // 1- For each 15% of the recentsStore cursor requests completed. // We increment the progress by 10%, leaving the total amount // in 60%. // 2- For each 25% of the Contacts API requests completed. We // also increment the progress by 10%, leaving the total // amount in 90%. // The final progress 100% event is avoided as we send a 'upgradedone' // event instead. Hopefully getting somwehere.

Sergi Mansilla [:sergi] (Telenor) [PTO]

Comment 21

•

11 years ago

After more debugging, I found out where does it stop, but I still don't know the cause. Moreover, with the latest Gaia v1.3 branch on latest Peak 1.3 build, still gets stuck for a while on 60% but then it continues in increments of 10% until 90%. I need somebody that can confirm whether that's also the case in Buri as well with latest 1.3 Gaia/Gecko. I think in any case there is a performance problem retrieving Contacts, since it gets stuck just at the point where the recentsStore is properly retrieved and we are waiting for the Contacts API requests to finish. The relevant code to all this in Gaia is in the onsuccess call in the following line: https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/dialer/js/call_log_db.js#L336 At this point I need more pointers to go on on where to look.

Flags: needinfo?(gsvelto)

Jason Smith [:jsmith]

Comment 22

•

11 years ago

Sounds like this is a contacts API bug then. Over to DOM: Contacts.

Component: Gaia::Dialer → DOM: Contacts

Keywords: qawanted

Product: Firefox OS → Core

Version: unspecified → 28 Branch

[:fabrice] Fabrice Desré

Comment 23

•

11 years ago

That would still be nice to get the gecko regression range bisected...

Flags: needinfo?(fabrice)

Jason Smith [:jsmith]

Comment 24

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #23) > That would still be nice to get the gecko regression range bisected... I'm going to put money down that this was caused by bug 850430 - the infamous WebIDL work for the contacts API.

Blocks: 850430

Reuben Morais [:reuben]

Comment 25

•

11 years ago

This is just the upgrade for 2000 contacts taking a long time. We upgrade the entire database before responding to any requests, so the Dialer doesn't update the progress bar.

No longer blocks: 850430

Reuben Morais [:reuben]

Comment 26

•

11 years ago

FWIW, we start upgrading the Contacts DB at 0%.

Reuben Morais [:reuben]

Comment 27

•

11 years ago

Bug 951806 helps a lot here, I'm inclined to mark this one as dupe. We could also slowly tick the update progress from 60% to 69% (since 90%->100% is not used) so the user doesn't think it's hanging.

Jason Smith [:jsmith]

Comment 28

•

11 years ago

(In reply to Reuben Morais [:reuben] from comment #27) > Bug 951806 helps a lot here, I'm inclined to mark this one as dupe. We could > also slowly tick the update progress from 60% to 69% (since 90%->100% is not > used) so the user doesn't think it's hanging. Let's retest this on the latest 1.3 build then to see how we perform with this patch.

Keywords: qawanted

Sergi Mansilla [:sergi] (Telenor) [PTO]

Updated

•

11 years ago

Assignee: sergi.mansilla → nobody

Fernando Jiménez Moreno [:ferjm]

Comment 29

•

11 years ago

IIRC we don't do any update of the Contacts DB. We just query the Contacts DB to build the call log contacts cache in the call log DB. Also, IIRC again, we did this progress bar hack on bug 862385 cause we were at a string frozen stage in 1.1 and we couldn't provide a better UX at that point [1]. But now, we should be able to do it better. My proposal for this is to simply remove the progress bar and show a loading spinner with a "Call log optimization in progress" or whatever message. That way we can remove the percentage logic and provide a better UX. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=862385#c31

Fernando Jiménez Moreno [:ferjm]

Comment 30

•

11 years ago

Etienne, would you be ok with the spinner solution?

Flags: needinfo?(etienne)

Sergi Mansilla [:sergi] (Telenor) [PTO]

Comment 31

•

11 years ago

Having seen the considerable amount of logic that goes into handling the (fake) percentage, I'm all for the spinner solution, makes much more sense.

Fernando Jiménez Moreno [:ferjm]

Updated

•

11 years ago

Component: DOM: Contacts → Gaia::Dialer

Product: Core → Firefox OS

Version: 28 Branch → unspecified

Fernando Jiménez Moreno [:ferjm]

Updated

•

11 years ago

Keywords: regression

David Scravaglieri [:scravag]

Comment 32

•

11 years ago

clear N? this bug is actively worked on

Flags: needinfo?(dscravaglieri)

sarsenyev

Updated

•

11 years ago

QA Contact: sarsenyev

Jason Smith [:jsmith]

Comment 33

•

11 years ago

Guys - please stay focused on the problem here. This IS a confirmed regression with a clear regression range. There's clear data that proves this. Please assess what caused this regression & derive the solution from that.

Keywords: regression

Jason Smith [:jsmith]

Comment 34

•

11 years ago

Update on qawanted: 1. Test to see if this reproduces on last working gaia/first broken gecko & first broken gaia/last working gecko 2. Test this on the latest 1.3

Jon Hylands [:jhylands]

Updated

•

11 years ago

Priority: P3 → P1

Ben Kelly [:bkelly, not reviewing]

Comment 35

•

11 years ago

I just retested on fresh v1.3 with reference-workload-x-heavy. It sat at 60% for almost 5 minutes and then slow went from 70% to 100% over the next minute. It then displayed the list correctly. So it did not immediately go from 60% to 100%. It did stay a long time at 60%, though. Given how most progress bars these days do not do a linear extrapolation (due to perceived performance issues), this seems like a WFM. Just my two cents.

Matthew Vaughan

Comment 36

•

11 years ago

(In reply to Jason Smith [:jsmith] from comment #34) > Update on qawanted: > > 1. Test to see if this reproduces on last working gaia/first broken gecko & > first broken gaia/last working gecko This issue reproduces when using the last working gaia(10/18)/first broken gecko(10/19). The variables for that build is: Gaia d0861c2913a57cb681ec4cd4d52d73cc0f479ac9 SourceStamp 855da6d8a327 BuildID 20131018040217 Version 27.0a1 This does *not* reproduce when using the first broken gaia(10/19)/last working gecko(10/18). The percentage progresses past 60% at a consistent speed and finishes as expected. > 2. Test this on the latest 1.3 This issue reproduces on the 02/19/14 1.3 build. Gaia a43904d9646109b48836d62f7aa51e499fbf4b2e SourceStamp 97922b6daad1 BuildID 20140219004003 Version 28.0

Keywords: qawanted

QA Contact: sarsenyev → mvaughan

Jason Smith [:jsmith]

Comment 37

•

11 years ago

The above comment gives hard evidence that this is a gecko bug. Back over to DOM: Contacts & re-adding dependency, as there's strong evidence that's keeps proving that the WebIDL work caused this regression.

Blocks: 850430

Component: Gaia::Dialer → DOM: Contacts

Product: Firefox OS → Core

Version: unspecified → 28 Branch

Jason Smith [:jsmith]

Comment 38

•

11 years ago

Can someone clarify what the behavior is on the last working build?

Keywords: qawanted

Reuben Morais [:reuben]

Comment 39

•

11 years ago

I assume Ben's comment 35 was using a recent (ie. containing bug 951806) build? I guess if you squint hard enough this is a regression from bug 850430. The behavior of the progress bar has always been broken (as far as this bug is concerned), the difference is that bug 850430 landed a few database versions, so the upgrade now takes longer to complete. The action here is to land the spinner, unless there's a compelling argument that we need to give the user a very accurate (more than it is today) indication of progress.

Component: DOM: Contacts → Gaia::Dialer

Product: Core → Firefox OS

Version: 28 Branch → unspecified

Matthew Vaughan

Comment 40

•

11 years ago

Attached video Video of loading call log on last working build. — Details

(In reply to Jason Smith [:jsmith] from comment #38) > Can someone clarify what the behavior is on the last working build? When loading the large call log, the progress bar is displayed and fills up gradually until the whole log is loaded. The progress bar and percentage do not stop at 60%, but instead displays the progress at 10% increments. Once it's loaded, the call log is displayed correctly. Attached is a video showing the behavior of the last working build. <Working_LoadingLargeCallLog.mpeg>

Matthew Vaughan

Updated

•

11 years ago

Keywords: qawanted

Jason Smith [:jsmith]

Comment 41

•

11 years ago

I retested this with a medium reference workload on the latest 1.4 build (500 contacts), since the consensus on the bug indicates that using x-heavy is too much. What I found was that I saw the 60% stopping before like what was seen above, but it only lasted for a minute or two. To a user, this does feel a bit annoying with the lack of progress, but it's certainly improvement with the landing the upgrade perf bug.

Jason Smith [:jsmith]

Comment 42

•

11 years ago

(In reply to Jason Smith [:jsmith] from comment #41) > I retested this with a medium reference workload on the latest 1.4 build > (500 contacts), since the consensus on the bug indicates that using x-heavy > is too much. What I found was that I saw the 60% stopping before like what > was seen above, but it only lasted for a minute or two. To a user, this does > feel a bit annoying with the lack of progress, but it's certainly > improvement with the landing the upgrade perf bug. On that regard, I'm going call this a dupe of bug 951806. There's a separate bug tracking the implementation work for fixing the UX here.

No longer blocks: 850430

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: needinfo?(gsvelto)

Flags: needinfo?(etienne)

Resolution: --- → DUPLICATE

Mike Lee [:mlee]

Updated

•

11 years ago

Whiteboard: [c=progress p= s= u=1.3] dogfood1.3 → [c=progress p= s=2014.02.28 u=1.3] dogfood1.3

Target Milestone: --- → 1.4 S2 (28feb)