Race in homescreen first time setup causes only one collection to appear

RESOLVED FIXED in 2.0 S5 (4july)

Status

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: kgrandon, Assigned: amac)

Tracking

unspecified
2.0 S5 (4july)
x86
macOS
Dependency tree / graph
Bug Flags:
in-moztrap -

Firefox Tracking Flags

(blocking-b2g:2.0+, b2g-v2.0 fixed, b2g-v2.1 fixed)

Details

(Whiteboard: [systemsfe])

Attachments

(1 attachment, 3 obsolete attachments)

In bug 1020158 we ran into issues with the test_cleanup_gaia.py test. We have not yet once seen this on a device, and are unsure of what's causing it.

See here for some good information: https://bugzilla.mozilla.org/show_bug.cgi?id=1020158#c19
homescreen/test_homescreen_edit_mode.py is failing for the same reason.

I've disabled the test here:
https://github.com/mozilla-b2g/gaia/commit/5587e3b14bd3202e3a37d16aa53dfd141323c4ae
Duplicate of this bug: 1028058
Posted file WIP - Not even tested! (obsolete) —
This is a preliminary fix for the race condition on saveTable. I'll finish it later tonight or early tomorrow, unless you prefer some other approach.
Attachment #8444563 - Flags: feedback?(kgrandon)
Comment on attachment 8444563 [details] [review]
WIP - Not even tested!

Hi Antonio - nice patch.

My only concern is that if there really is some race when we call save(), that we might end up with the same situation if two different places call into it with two different sets of data. I think it will take some testing to see if this really fixes it though.
Attachment #8444563 - Flags: feedback?(kgrandon) → feedback+
Assignee: nobody → amac
Ok, I updated the PR. The code as it's now works, but I haven't checked the unit tests still (which will have probably be broken). I'll finish this tomorrow.

As I said on the PR, I added a log to see if the function was indeed being called reentrantly, and it was (from 1 to 4 times on the first boot).

I'll ask for review once the unit tests are working.
Posted file V1. Unit tests working (obsolete) —
I fixed the unit test that was failing, and reenabled the Gaia UI test on MacOS. The good news is that this test apparently works now (at least on the last two tries I launched) the bad one is that there another one that still fails intermittently (a functional one).
Attachment #8445170 - Flags: review?(kgrandon)
BTW, this should go into 2.0 if the Vertical Homescreen goes into 2.0. Are we asking for approval or blocking on this bugs?
Attachment #8444563 - Attachment is obsolete: true
Comment on attachment 8445170 [details] [review]
V1. Unit tests working

James - could you also take a look at this? Thanks!
Attachment #8445170 - Flags: review?(jlal)
Unit tests and Gaia unit UI tests are passing. There's a Gaia functional UI test failing, but it's from the browser: https://travis-ci.org/mozilla-b2g/gaia/builds/28317414
Was hoping to review this seriously today but I will take a look tomorrow
Comment on attachment 8445170 [details] [review]
V1. Unit tests working

I am sorry but I am not really sure what the goal is here... readwrite transactions are order safe http://www.w3.org/TR/IndexedDB/#transaction-concept (they execute in the order you declare them like your queue is doing).

What problem are we trying to solve and how does building the queue locally fix this rather then relying on the transaction ordering?
Attachment #8445170 - Flags: review?(jlal) → review-
Hmm. You're right. The problem I was trying to solve wasn't the order inside a transaction but that several transactions might be created concurrently. I didn't knew that transactions worked as a typical readers/writers problem. So this should not be neccessary. And yet, throttling the transactions this way seems to fix the race problem (or I was really unlucky on my previous tests and really lucky on this one, which is also possible). 

I'll try running some more tests tomorrow without the throttling and see if I can reproduce the problem. Thanks for the information!
Just a thought, if we are able to reproduce, have we tried putting in some logging to get a stack inside of the save method?

Perhaps dumping the stack as well as all of the items we are saving could be useful. Just something simple like, console.log(new Error().stack);
Comment on attachment 8445170 [details] [review]
V1. Unit tests working

Now that we have some more information from James, going to clear the review for the time being.
Attachment #8445170 - Flags: review?(kgrandon)
The underlying bug here is now affecting CI on Linux too. I guess some change in timing has exposed on linux what was only on macos.
Blocks: 1027144
Kevin, I've re-jigged this bug ot represent what's being investigated here, hope that's alright!
Component: Gaia::UI Tests → Gaia::Homescreen
Summary: Investigate test_cleanup_gaia.py failure → Race in homescreen first time setup causes only one collection to appear
This bug is preventing a Smoketest to pass.
blocking-b2g: --- → 2.0?
QA Whiteboard: [VH-FL-blocking+][VH-FC-blocking+]
Whiteboard: [systemsfe]
blocking-b2g: 2.0? → 2.0+
Target Milestone: --- → 2.0 S5 (4july)
I finally got it to fail once locally.When it failed, I didn't get *any* readonly transactions. On a normal run, there's a readwrite transaction followed by a readonly transaction (from fetch I guess), and then some more readwrite transactions. On the run that failed, though, I got only a couple of readwrite transactions, and the homescreen shown was in an incorrect state. The first readwrite transaction was one invoked from the collection listener (it added only 6 items, being the 4 collections and the two dividers). The second one added the rest of the icons, but they weren't shown.

Rebooting the emulator with the same profile, though, shown an almost correct grid (it had only one collection, but all of the rest of the icons were present). I've tried artificially delaying the call to fetch until after the collections have been processed, but that didn't seem to trigger the error. I have to run out now, but will keep investigating later.
Antonio - how did you reproduce locally? I'm still trying.. You mentioned emulator, did it fail in b2g desktop?
I would recommend instrumenting the item.js store with lots of logging perhaps. If we dump everything that the store currently has, and what we're about to save, it might be useful.
(In reply to Kevin Grandon :kgrandon from comment #19)
> Antonio - how did you reproduce locally? I'm still trying.. You mentioned
> emulator, did it fail in b2g desktop?

I'm running the tests on a loop, using the emulator and running the tests locally on a Mac. It failed about 1/100 times or so (I think me adding logs on item.js changed the timing and made it fail less...).
I finally managed to reproduce this consistently. The problem happens when the collection is updated between the time when the listeners are set and the time when the initial homescreen database and grid population is done. If the collection is updated *before* the listeners are set then there are no problem, and if it's updated after the grid has been populated (and the database created) then there's no problem either.

With this I cannot reproduce the problem artificially any more (I forced the problem to manifest by adding some delays to the collection app and the database init code) and running the test 200 times locally didn't fail either.


Fortunately there's an easy solution, which is to delay setting the listeners until the initial read of the datastore. At that point the database has been created and read and the grid is being populated.
Attachment #8445170 - Attachment is obsolete: true
Attachment #8447188 - Flags: review?(kgrandon)
Antonio - great find, you are a life saver! I will look at your code today. My only question is, would it be possible to "miss" this event and not have any collections?
Nevermind, I understand now that these are for datastore things. This makes total sense in that case, and we should verify that this works for bookmarks and apps.

I'm going to re-enable the test and check on gaia-try.
You have an R+, but I would like to land with the tests enabled at the same time. Opening a new PR with a revert of the commit that disabled the tests. Let's see what this does.
Attachment #8447188 - Attachment is obsolete: true
Attachment #8447188 - Flags: review?(kgrandon)
Attachment #8447277 - Flags: review+
We have a solid looking try-run with the tests re-enabled, so I think we're looking good here. 

https://tbpl.mozilla.org/?tree=Gaia-Try&rev=066fecdd128c8cbc3967c31dc4f230eefd4f77e6 (Unit tests fixed now also)
Thanks for all of the hard work on this.

Landed: https://github.com/mozilla-b2g/gaia/commit/f89802f0554f4250774de912f0b5b297151817cc
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Cool, thanks for finishing this! :)
You need to log in before you can comment on or make changes to this bug.