Closed
Bug 1374863
Opened 7 years ago
Closed 7 years ago
'browser.engagement.tab_open_event_count' fails to properly count in automated tests.
Categories
(Toolkit :: Telemetry, defect)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox56 | --- | fixed |
People
(Reporter: Silne30, Assigned: Silne30)
Details
Attachments
(4 files)
When running the attached automated test, an initial tab is open when the browser opens. Two more tabs open totaling three tabs. When the ping is sent, 'browser.engagement.tab_open_event_count' == 1 rather than 2. This has recently surfaced as this test did not fail last week. When running the test manually, it counts properly. I have added conditions to wait for the tab to open but it still fails in automation.
Comment 1•7 years ago
|
||
I wonder if it's something related to what :Dexter said in bug 1271304 comment 1 about how open events during session restore are ignored. Or maybe a window's load event fires later than normal these days? Do you have a regression window?
Flags: needinfo?(jdorlus)
Assignee | ||
Comment 2•7 years ago
|
||
Since the tests were not up and running yet, I don't have an exact regression window. I am just going to go back to when I remember it working and try to work from there.
Flags: needinfo?(jdorlus)
Assignee | ||
Comment 3•7 years ago
|
||
It was definitely working when b446a0c0642f8b8f705d9ea7808d09d857e8dab3 was made and I created the test. This was early april. I am running a git bisect to find out when the bug was introduced.
Assignee | ||
Comment 4•7 years ago
|
||
I don't think it's related to 1271304 since I am not restoring an old session in the automated test. It's just a regular session and the tab scalar seems to increment incorrectly. Only happens when I run the test in an automated way, not if I do it manually.
Assignee | ||
Comment 5•7 years ago
|
||
Assignee | ||
Comment 6•7 years ago
|
||
I was told that there was a mochitest covering most of these scenarios. Did it not catch anything like this?
Flags: needinfo?(chutten)
Assignee | ||
Comment 7•7 years ago
|
||
I ran a git bisect and have a regression range somewhere between [ab0396e461ae03210b1ec8110ba6dc66dc61ca96] Bug 1297072 - added 2d matrix utility functions; r=gregtatum and [c33780a246c738ae1077ef9d77ebecf412452a4f] Bug 1358921: Lazily load SelfSupportBackend.jsm when first needed. r=florian This is an approximation since it's my first time using the bisect tool. I would run the bisect, rerun the test , etc. I hope this helps.
Comment 8•7 years ago
|
||
No idea, I'm afraid. Alessio?
Flags: needinfo?(chutten) → needinfo?(alessio.placitelli)
Comment 9•7 years ago
|
||
(In reply to Chris H-C :chutten from comment #8) > No idea, I'm afraid. Alessio? John, could you provide some more context on the test? Is it running on Windows and, if so, which Windows version? Does it happen on all platforms? It would also be very useful to have a Telemetry log trace about the failure. Let me know if you have problems getting it!
Flags: needinfo?(alessio.placitelli) → needinfo?(jdorlus)
Comment 10•7 years ago
|
||
Context: on Windows 10 we open a new, additional tab by default on new profiles.
Assignee | ||
Comment 11•7 years ago
|
||
Context: I haven't run this test on windows. This fails on Linux only. And we are short a tab opening not over by 1. In the regression window, the test works on one end of the window and fails on the other. It could have to do with the speed of tabs opening. But I am not sure. But it doesn't fail when I try to reproduce manually. Env: Linux 64 Bit Test Run Method: Automated
Flags: needinfo?(jdorlus) → needinfo?(alessio.placitelli)
Assignee | ||
Comment 12•7 years ago
|
||
I will get you some gecko logs.
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Comment 13•7 years ago
|
||
I got you some gecko logs for the test that runs.
Attachment #8882456 -
Flags: feedback?(alessio.placitelli)
Comment 14•7 years ago
|
||
(In reply to John Dorlus [:Silne30] from comment #13) > Created attachment 8882456 [details] > geckologs.txt > > I got you some gecko logs for the test that runs. Thanks! I'm seeing this error in the log file: > TEST-UNEXPECTED-ERROR | test_debug_tabs_scalars.py TestMainTabScalars.test_main_tab_scalars | KeyError: 'browser.engagement.tab_open_event_count' It looks like the ping you're examining doesn't contain that scalar. On a closer look, it seems that your ping data doesn't contain any scalar data. Are you sure you're looking at the correct ping and not at the 'saved-session' ping? The latter does not contain any scalar data.
Updated•7 years ago
|
Attachment #8882456 -
Flags: feedback?(alessio.placitelli)
Comment 15•7 years ago
|
||
Is this failure actually from a landed test? If not, then you should investigate this as part of the bug you are working on.
Flags: needinfo?(jdorlus)
Assignee | ||
Comment 16•7 years ago
|
||
This was caught by a landed test.
Flags: needinfo?(jdorlus) → needinfo?(gfritzsche)
Assignee | ||
Comment 17•7 years ago
|
||
Yes. One of the landed tests caught this. 1352269
Comment 18•7 years ago
|
||
Clearing needinfo as i don't see any question. Note that there are still open questions for you here from 6 days ago.
Flags: needinfo?(gfritzsche)
Assignee | ||
Comment 19•7 years ago
|
||
Ah. I missed that. I am rerunning the test to make sure. Thanks.
Assignee | ||
Comment 20•7 years ago
|
||
Attachment #8884108 -
Flags: review?(alessio.placitelli)
Comment 21•7 years ago
|
||
Comment on attachment 8884108 [details] mainpinglog.txt Hey John, I'm not sure what I should check/review in this log. Is anything from comment 14 unclear?
Flags: needinfo?(jdorlus)
Attachment #8884108 -
Flags: review?(alessio.placitelli)
Assignee | ||
Comment 22•7 years ago
|
||
No. I addressed comment 14. You were right. That was a saved-session ping. But this time, I have uploaded the logs from the mainping and the count is still wrong.
Flags: needinfo?(jdorlus) → needinfo?(alessio.placitelli)
Comment 23•7 years ago
|
||
I'm not familiar with the test, could you briefly describe what do you expect? I see that the dumped ping does not contain the 'tab_open_event_count' scalar at all. Moreover, the 'subsessionCounter' is at 2, that means that a subsession split was performed before this ping and that scalars were cleared. You're looking at a 'main-ping' with reason 'shutdown'. Is this what you're looking for? You're probably more interested in the ping from the other subsession split. But, again, as I'm not familiar with the test I don't know :)
Flags: needinfo?(jdorlus)
Updated•7 years ago
|
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Comment 24•7 years ago
|
||
Hey Alessio, Yes. The received ping is correct. The reason the subsession counter is at 2 is because in the beginning of the tests, when we change the telemetry prefs, marionette restarts the browser. So it rightfully is on the second subsession when the test starts. The problem is also not the ping because all of the other scalars that are also part of that subsession are there and accounted for. There is something up with the tab open event counter specifically. If I comment out the assertion for that scalar, the test passes 50/50 times. If I don't comment out that scalar, the test fails 50/50 times. What's interesting is that sometimes when the test fails, the scalar is not in the ping at all. Sometimes it's an assertion error. This is how the test runs. Open a browser. Make changes to telemetry prefs. Restart Browser (Starting second subsession) Open a second tab Wait for second tab to open Open Third tab Wait for third tab to open Close third tab Wait for third tab to close Switch to second tab Close second tab Wait for second tab to close Switch to first tab Restart Browser Wait for ping The test asserts that max_concurrent_tab_count == 3 which passes It asserts that max_concurrent_window_count == 1 which passes But asserting tab_open_event_count == 2 fails. (1 is recorded in the ping or it's not there at all.) This is really strange behavior but it seems to be something that only an automated test brings out. May be the speed of the tabs opening causes a lack of the count incrementing. Not sure. But this is not a mistake. Something is legitimately failing.
Flags: needinfo?(jdorlus) → needinfo?(alessio.placitelli)
Assignee | ||
Comment 25•7 years ago
|
||
I ran the test 100 times and it passed 8 times and failed 92. Perhaps there is some kind of race condition happening.
Comment 26•7 years ago
|
||
(In reply to John Dorlus [:Silne30] from comment #24) > Hey Alessio, > > Yes. The received ping is correct. The reason the subsession counter is at 2 > is because in the beginning of the tests, when we change the telemetry > prefs, marionette restarts the browser. So it rightfully is on the second > subsession when the test starts. The problem is also not the ping because > all of the other scalars that are also part of that subsession are there and > accounted for. The |subsessionCounter| is the number of subsessions since the browser started, not since the profile was created [1]. Restarting the browser does not increase the |subsessionCounter|. This value is reset at every restart. Are you absolutely sure there's no other ping coming in? I would suggest you dump *all* the pings you receive since the beginning of the test. > There is something up with the tab open event counter > specifically. If I comment out the assertion for that scalar, the test > passes 50/50 times. If I don't comment out that scalar, the test fails 50/50 > times. What's interesting is that sometimes when the test fails, the scalar > is not in the ping at all. Sometimes it's an assertion error. This is how > the test runs. At each session split, where the subsessionCounter is increased, all the scalars are reset. Some of them are set again (e.g. maximum number of tabs and windows), and there's why you see them in all the main pings despite of the subsession split. The particular one that's causing a failure here only gets set when a new tab is created. Let's look at the test flow below. > Open a browser. > Make changes to telemetry prefs. > Restart Browser (Starting second subsession) Nope, you end the previous session here. Then start a new session when Firefox restarts. So, when running again, you are at subsessionCounter = 1 unless there was a subsession split. Moreover, when restarting the browser here, you would will have at least two pings on disk: saved-session and 'main-ping' with reason 'shutdown'. These are related to the first browsing session in the above 3 lines, and you are not really interested in them. > Open a second tab > Wait for second tab to open > Open Third tab > Wait for third tab to open > Close third tab > Wait for third tab to close > Switch to second tab > Close second tab > Wait for second tab to close > Switch to first tab > Restart Browser > Wait for ping > The test asserts that max_concurrent_tab_count == 3 which passes > It asserts that max_concurrent_window_count == 1 which passes > But asserting tab_open_event_count == 2 fails. (1 is recorded in the ping or > it's not there at all.) Which makes me thing, again, that something is triggering a subsession split in between your test or that you're opening the tabs before the session-restore message is received, meaning that the open-tab events are ignored (the max count isn't because it's simply counting independently). > This is really strange behavior but it seems to be something that only an > automated test brings out. May be the speed of the tabs opening causes a > lack of the count incrementing. Not sure. But this is not a mistake. > Something is legitimately failing. I think we're still unable to show this is a legitimate issue with the code: we do have mochitest covering this [2] which are not failing and there are a few grey areas in the analysis that need to be cleared up (are we receiving multiple pings? are we waiting for the session restore?). [1] - http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/main-ping.html [2] - http://searchfox.org/mozilla-central/search?q=&case=false®exp=false&path=browser_UsageTelemetry
Flags: needinfo?(alessio.placitelli) → needinfo?(jdorlus)
Assignee | ||
Comment 27•7 years ago
|
||
Per IRC comment, going to add a 5 second wait after the first subsession split when we restart the browser after changing the prefs to see if that is causing the issue.
Flags: needinfo?(jdorlus)
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8887598 -
Flags: review?(chutten)
Comment 29•7 years ago
|
||
mozreview-review |
Comment on attachment 8887598 [details] Bug 1374863 - Added wait for telemetry to reinitialize after reboot when setting gecko prefs. https://reviewboard.mozilla.org/r/157084/#review163788 Does what it says on the tin... But is there nothing better than time.sleep? testReset and friends have promises that you can await in JS.
Attachment #8887598 -
Flags: review?(chutten) → review+
Assignee | ||
Comment 30•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8887598 [details] Bug 1374863 - Added wait for telemetry to reinitialize after reboot when setting gecko prefs. https://reviewboard.mozilla.org/r/157084/#review163788 Unfortunately not.
Assignee | ||
Updated•7 years ago
|
Comment 31•7 years ago
|
||
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. hg error in cmd: hg rebase -s 7626131f953b -d 664f305ee1a5: rebasing 408345:7626131f953b "Bug 1374863 - Added wait for telemetry to reinitialize after reboot when setting gecko prefs. r=chutten" (tip) merging toolkit/components/telemetry/tests/marionette/harness/telemetry_harness/testcase.py warning: conflicts while merging toolkit/components/telemetry/tests/marionette/harness/telemetry_harness/testcase.py! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)
Updated•7 years ago
|
Keywords: checkin-needed
Comment hidden (mozreview-request) |
Comment 33•7 years ago
|
||
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. hg error in cmd: hg rebase -s 242b5dc9be46 -d 6fc6a92ad62e: rebasing 408932:242b5dc9be46 "Bug 1374863 - Added wait for telemetry to reinitialize after reboot when setting gecko prefs. r=chutten" (tip) merging toolkit/components/telemetry/tests/marionette/harness/telemetry_harness/testcase.py warning: conflicts while merging toolkit/components/telemetry/tests/marionette/harness/telemetry_harness/testcase.py! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)
Comment hidden (mozreview-request) |
Comment 35•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/614cdfbf0e2c Added wait for telemetry to reinitialize after reboot when setting gecko prefs. r=chutten
Comment 36•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/614cdfbf0e2c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in
before you can comment on or make changes to this bug.
Description
•