[B2G][Dialer] Loading stays at 60% in the call log loading all past calls for an extended period (around 10 minutes)

RESOLVED DUPLICATE of bug 951806

Status

Firefox OS
Gaia::Dialer
P1
normal
RESOLVED DUPLICATE of bug 951806
4 years ago
4 years ago

People

(Reporter: Josh Schmitt [Joshs], Unassigned)

Tracking

({perf, regression})

unspecified
1.4 S2 (28feb)
ARM
Gonk (Firefox OS)
perf, regression

Firefox Tracking Flags

(blocking-b2g:1.3+, b2g-v1.3 affected)

Details

(Whiteboard: [c=progress p= s=2014.02.28 u=1.3] dogfood1.3)

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
Description:
Loading 250+ call logs will be stays at 60% until fully loading.

Repro Steps:
1) Updated Buri to BuildID: 20140127004002
2) Run makeReferenceworkload.sh with "x-heavy" flag
3) Select the 'Dialer' app
4) Select the 'Call logs'

Actual:.
The loading bar incorrectly displayed percentage.

Expected:
The percentage is displayed correcty when loads.

Environmental Variables:
Device: Buri 1.3 MOZ 
BuildID: 20140121004137
Gaia: 47049555282a9a01fb60d1e1421b57e2810c96f5
Gecko: 6f7dfe36ab6c
Version: 28.0a2
Firmware Version: V1.2-device.cfg
Keywords: Loading, percentage, call logs, contacts

Notes:
Repro frequency: 100%
See attached:
Does this reproduce on 1.1?

Note - I also confirmed reproduction of this with the medium reference workload.
Keywords: perf, qawanted

Updated

4 years ago
Summary: [B2G][Dialer] Loading stays at 60% while loading for an extended period (around 10 minutes) → [B2G][Dialer] Loading stays at 60% in the call log loading all past calls while loading for an extended period (around 10 minutes)

Updated

4 years ago
Summary: [B2G][Dialer] Loading stays at 60% in the call log loading all past calls while loading for an extended period (around 10 minutes) → [B2G][Dialer] Loading stays at 60% in the call log loading all past calls for an extended period (around 10 minutes)

Updated

4 years ago
Priority: -- → P3
Whiteboard: dogfood1.3 → [c=progress p= s= u=] dogfood1.3
This issue does not reproduce on v1.1. The progress bar properly loads from 0% to 100%, and does not appear to hang or pause at any point.

Environmental Variables
Device: Buri v1.1 Moz RIL
Build ID: 20140102041202
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/bdac595a4e46
Gaia: 6ff3a607f873320d00cb036fa76117f6fadd010f
Platform Version: 18.1 
Firmware Version: v1.2-device.cfg
Keywords: qawanted

Updated

4 years ago
blocking-b2g: --- → 1.3?
Keywords: regression, regressionwindow-wanted
Please review
blocking-b2g: 1.3? → 1.3+
Flags: needinfo?(dscravaglieri)

Comment 4

4 years ago
.:First Broken Build:.
Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20131019040204
Gaia: d0861c2913a57cb681ec4cd4d52d73cc0f479ac9
Gecko: e25e62d174ed
Version: 27.0a1
Firmware Version: v1.2-device.cfg

.:Last Working Build:.
Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20131018040217
Gaia: 18c2ab4b7b2d4cacee90fb85e97354142b2fad4a
Gecko: 855da6d8a327
Version: 27.0a1
Firmware Version: v1.2-device.cfg
Keywords: regressionwindow-wanted
Etienne, do you mind looking at it? thanks
Flags: needinfo?(etienne)
I've seen this happen too, it's likely caused by one of the steps in the database upgrade procedure.
(In reply to Joe Cheng [:jcheng] from comment #5)
> Etienne, do you mind looking at it? thanks

It's a database migration that takes time, not something we regressed on.

We could probably rework the migration code to update the progress  more frequently, but I don't think changing indexedDB migration code at this point will get approve for 1.3.

Fabrice, what do you think?
Flags: needinfo?(etienne) → needinfo?(fabrice)

Updated

4 years ago
Whiteboard: [c=progress p= s= u=] dogfood1.3 → [c=progress p= s= u=1.3] dogfood1.3
(In reply to Etienne Segonzac (:etienne) from comment #7)
> (In reply to Joe Cheng [:jcheng] from comment #5)
> > Etienne, do you mind looking at it? thanks
> 
> It's a database migration that takes time, not something we regressed on.

No - the problem here isn't the DB migration. The problem is that progress bar is communicating a lie here - the progress goes quickly to 60%, hangs there for 10 minutes, and then continues on. When I showed this to Chris in person, we thought the Dialer app was hung.

This is a confirmed regression - comment 4 clearly calls out a regression range here. We should find out what regressed this.

> 
> We could probably rework the migration code to update the progress  more
> frequently, but I don't think changing indexedDB migration code at this
> point will get approve for 1.3.
> 
> Fabrice, what do you think?

This is going in an incorrect direction. The problem here is with perceived progress, not the progress of the DB migration itself. There's no way we're shipping with a 10 minute hang in the dialer app - people are going to think the app is hung & never manage to get their call log imported, as they won't know to wait that long.
I also thought that my call log was lost when I went to upgrade it. Maybe some fix like we recently did with contacts is in order here?
This lengthy migration was introduced by bug 862385 (leo+) in July. The way progress is displayed hasn't changed since.

The x-heavy reference workload is more than twice the biggest "production" database possible (see bug 829399, we don't let the call log grow forever).

So yes, we should definitely wait for Fabrice's feedback before rushing an indexeddb migration code change.
(In reply to Etienne Segonzac (:etienne) from comment #10)
> This lengthy migration was introduced by bug 862385 (leo+) in July. The way
> progress is displayed hasn't changed since.

That's not right. The regression range points at October, not July.

> 
> The x-heavy reference workload is more than twice the biggest "production"
> database possible (see bug 829399, we don't let the call log grow forever).

This problem reproduces with the medium reference workload as well (100 call logs), although the timing isn't 10 minutes (I think it was around 5 minutes). It's a general problem with any migration, but the problem gets worse with more call logs.

> 
> So yes, we should definitely wait for Fabrice's feedback before rushing an
> indexeddb migration code change.

We aren't unblocking this. We are made a final decision in triage on this, as partners are not going to let us ship with a perceived hang.
The above testing in comment 1 also showcases this working fine on 1.1. Like I said, I don't think you are going in right direction here.
https://github.com/mozilla-b2g/gaia/compare/18c2ab4b7b2d4cacee90fb85e97354142b2fad4a...d0861c2913a57cb681ec4cd4d52d73cc0f479ac9
There's only two Gaia commits in the regression range:

1. https://github.com/mozilla-b2g/gaia/commit/e6b176ab1101577f80604aca0c3b3c3b58f74b03
2. https://github.com/mozilla-b2g/gaia/commit/669d29644d3808039ed57ca5d6a1f90f8599ecb0

Both changes don't touch the call log. Maybe it's a gecko regression?
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=855da6d8a327&tochange=e25e62d174ed
Can we verify this isn't a gaia regression? We can do that by doing the following:

1. Test this on a 10/18 Gaia, 10/19 Gecko
2. Test this on a 10/19 Gaia, 10/18 Gecko
Keywords: qawanted
Assignee: nobody → sergi.mansilla
I can reproduce it on Peak 1.3 with Build Id 20140211103226, Platform 28.0 and Gaia from ~Thursday last week.
Reproducible on:

- Gaia https://github.com/mozilla-b2g/gaia/commit/669d29644d3808039ed57ca5d6a1f90f8599ecb0 on build 20140211103226 (10/04)
- Gaia https://github.com/mozilla-b2g/gaia/commit/e6b176ab1101577f80604aca0c3b3c3b58f74b03 on build 20140211103226 (10/16)

So far, it seems more likely that it is a Gecko regression.
If you need help profiling this and figuring out why it's slow just NI me.
Well the following commend in the code might shed some light on the issue:)

  // We send 'upgradeprogress' events from two different places:
  // 1- For each 15% of the recentsStore cursor requests completed.
  //    We increment the progress by 10%, leaving the total amount
  //    in 60%.
  // 2- For each 25% of the Contacts API requests completed. We
  //    also increment the progress by 10%, leaving the total
  //    amount in 90%.
  // The final progress 100% event is avoided as we send a 'upgradedone'
  // event instead.

Hopefully getting somwehere.
After more debugging, I found out where does it stop, but I still don't know the cause. Moreover, with the latest Gaia v1.3 branch on latest Peak 1.3 build, still gets stuck for a while on 60% but then it continues in increments of 10% until 90%. I need somebody that can confirm whether that's also the case in Buri as well with latest 1.3 Gaia/Gecko.

I think in any case there is a performance problem retrieving Contacts, since it gets stuck just at the point where the recentsStore is properly retrieved and we are waiting for the Contacts API requests to finish.

The relevant code to all this in Gaia is in the onsuccess call in the following line:
https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/dialer/js/call_log_db.js#L336

At this point I need more pointers to go on on where to look.
Flags: needinfo?(gsvelto)
Sounds like this is a contacts API bug then. Over to DOM: Contacts.
Component: Gaia::Dialer → DOM: Contacts
Keywords: qawanted
Product: Firefox OS → Core
Version: unspecified → 28 Branch
That would still be nice to get the gecko regression range bisected...
Flags: needinfo?(fabrice)
(In reply to Fabrice Desré [:fabrice] from comment #23)
> That would still be nice to get the gecko regression range bisected...

I'm going to put money down that this was caused by bug 850430 - the infamous WebIDL work for the contacts API.
Blocks: 850430
This is just the upgrade for 2000 contacts taking a long time. We upgrade the entire database before responding to any requests, so the Dialer doesn't update the progress bar.
No longer blocks: 850430
FWIW, we start upgrading the Contacts DB at 0%.
Bug 951806 helps a lot here, I'm inclined to mark this one as dupe. We could also slowly tick the update progress from 60% to 69% (since 90%->100% is not used) so the user doesn't think it's hanging.
(In reply to Reuben Morais [:reuben] from comment #27)
> Bug 951806 helps a lot here, I'm inclined to mark this one as dupe. We could
> also slowly tick the update progress from 60% to 69% (since 90%->100% is not
> used) so the user doesn't think it's hanging.

Let's retest this on the latest 1.3 build then to see how we perform with this patch.
Keywords: qawanted
Assignee: sergi.mansilla → nobody
IIRC we don't do any update of the Contacts DB. We just query the Contacts DB to build the call log contacts cache in the call log DB.

Also, IIRC again, we did this progress bar hack on bug 862385 cause we were at a string frozen stage in 1.1 and we couldn't provide a better UX at that point [1]. But now, we should be able to do it better. My proposal for this is to simply remove the progress bar and show a loading spinner with a "Call log optimization in progress" or whatever message. That way we can remove the percentage logic and provide a better UX.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=862385#c31
Etienne, would you be ok with the spinner solution?
Flags: needinfo?(etienne)
Having seen the considerable amount of logic that goes into handling the (fake) percentage, I'm all for the spinner solution, makes much more sense.
Component: DOM: Contacts → Gaia::Dialer
Product: Core → Firefox OS
Version: 28 Branch → unspecified
Keywords: regression
clear N? this bug is actively worked on
Flags: needinfo?(dscravaglieri)

Updated

4 years ago
QA Contact: sarsenyev
Guys - please stay focused on the problem here. This IS a confirmed regression with a clear regression range. There's clear data that proves this. Please assess what caused this regression & derive the solution from that.
Keywords: regression
Update on qawanted:

1. Test to see if this reproduces on last working gaia/first broken gecko & first broken gaia/last working gecko

2. Test this on the latest 1.3
Priority: P3 → P1

Comment 35

4 years ago
I just retested on fresh v1.3 with reference-workload-x-heavy.  It sat at 60% for almost 5 minutes and then slow went from 70% to 100% over the next minute.

It then displayed the list correctly.  So it did not immediately go from 60% to 100%.  It did stay a long time at 60%, though.

Given how most progress bars these days do not do a linear extrapolation (due to perceived performance issues), this seems like a WFM.  Just my two cents.

Comment 36

4 years ago
(In reply to Jason Smith [:jsmith] from comment #34)
> Update on qawanted:
> 
> 1. Test to see if this reproduces on last working gaia/first broken gecko &
> first broken gaia/last working gecko

This issue reproduces when using the last working gaia(10/18)/first broken gecko(10/19). The variables for that build is:

Gaia   d0861c2913a57cb681ec4cd4d52d73cc0f479ac9
SourceStamp 855da6d8a327
BuildID 20131018040217
Version 27.0a1
 
This does *not* reproduce when using the first broken gaia(10/19)/last working gecko(10/18). The percentage progresses past 60% at a consistent speed and finishes as expected.


> 2. Test this on the latest 1.3

This issue reproduces on the 02/19/14 1.3 build.

Gaia   a43904d9646109b48836d62f7aa51e499fbf4b2e
SourceStamp 97922b6daad1
BuildID 20140219004003
Version 28.0
Keywords: qawanted
QA Contact: sarsenyev → mvaughan
The above comment gives hard evidence that this is a gecko bug. Back over to DOM: Contacts & re-adding dependency, as there's strong evidence that's keeps proving that the WebIDL work caused this regression.
Blocks: 850430
Component: Gaia::Dialer → DOM: Contacts
Product: Firefox OS → Core
Version: unspecified → 28 Branch
Can someone clarify what the behavior is on the last working build?
Keywords: qawanted
I assume Ben's comment 35 was using a recent (ie. containing bug 951806) build?

I guess if you squint hard enough this is a regression from bug 850430. The behavior of the progress bar has always been broken (as far as this bug is concerned), the difference is that bug 850430 landed a few database versions, so the upgrade now takes longer to complete. The action here is to land the spinner, unless there's a compelling argument that we need to give the user a very accurate (more than it is today) indication of progress.
Component: DOM: Contacts → Gaia::Dialer
Product: Core → Firefox OS
Version: 28 Branch → unspecified

Comment 40

4 years ago
Created attachment 8378591 [details]
Video of loading call log on last working build.

(In reply to Jason Smith [:jsmith] from comment #38)
> Can someone clarify what the behavior is on the last working build?

When loading the large call log, the progress bar is displayed and fills up gradually until the whole log is loaded. The progress bar and percentage do not stop at 60%, but instead displays the progress at 10% increments. Once it's loaded, the call log is displayed correctly.

Attached is a video showing the behavior of the last working build. <Working_LoadingLargeCallLog.mpeg>

Updated

4 years ago
Keywords: qawanted
I retested this with a medium reference workload on the latest 1.4 build (500 contacts), since the consensus on the bug indicates that using x-heavy is too much. What I found was that I saw the 60% stopping before like what was seen above, but it only lasted for a minute or two. To a user, this does feel a bit annoying with the lack of progress, but it's certainly improvement with the landing the upgrade perf bug.
(In reply to Jason Smith [:jsmith] from comment #41)
> I retested this with a medium reference workload on the latest 1.4 build
> (500 contacts), since the consensus on the bug indicates that using x-heavy
> is too much. What I found was that I saw the 60% stopping before like what
> was seen above, but it only lasted for a minute or two. To a user, this does
> feel a bit annoying with the lack of progress, but it's certainly
> improvement with the landing the upgrade perf bug.

On that regard, I'm going call this a dupe of bug 951806. There's a separate bug tracking the implementation work for fixing the UX here.
No longer blocks: 850430
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(gsvelto)
Flags: needinfo?(etienne)
Resolution: --- → DUPLICATE
Duplicate of bug: 951806

Updated

4 years ago
Whiteboard: [c=progress p= s= u=1.3] dogfood1.3 → [c=progress p= s=2014.02.28 u=1.3] dogfood1.3
Target Milestone: --- → 1.4 S2 (28feb)
You need to log in before you can comment on or make changes to this bug.