Intermittent raptor-main TEST-UNEXPECTED-FAIL: no raptor test results were found for raptor-tp6-paypal-firefox
Categories
(Testing :: Raptor, defect, P5)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, regression, Whiteboard: [retriggered][stockwell unknown])
#[markdown(off)]
Filed by: ncsoregi [at] mozilla.com
https://treeherder.mozilla.org/logviewer.html#?job_id=226819019&repo=autoland
05:03:50 INFO - Adding profile profile_raptor-tp6-tumblr-firefox/raptor-tp6-tumblr-firefox_pagecycle_2.profile to archive /Users/cltbld/tasks/task_1549544222/build/blobber_upload_dir/profile_raptor-tp6-tumblr-firefox.zip
05:03:55 INFO - Adding profile profile_raptor-tp6-tumblr-firefox/raptor-tp6-tumblr-firefox_pagecycle_3.profile to archive /Users/cltbld/tasks/task_1549544222/build/blobber_upload_dir/profile_raptor-tp6-tumblr-firefox.zip
05:03:56 INFO - raptor-main cleaning up after gecko profiling
05:03:56 INFO - results-handler summarizing raptor test results
05:03:56 INFO - raptor-output ignoring the first dcf value due to initial pageload noise
05:03:56 INFO - raptor-output ignoring the first fcp value due to initial pageload noise
05:03:56 INFO - raptor-output turning on subtest alerting for measurement type: fcp
05:03:56 INFO - raptor-output ignoring the first fnbpaint value due to initial pageload noise
05:03:56 INFO - raptor-output ignoring the first ttfi value due to initial pageload noise
05:03:56 INFO - raptor-output ignoring the first loadtime value due to initial pageload noise
05:03:56 INFO - raptor-output turning on subtest alerting for measurement type: loadtime
05:03:56 INFO - raptor-output gecko profiling enabled - not posting results for perfherder
05:03:56 INFO - raptor-output results can also be found locally at: /Users/cltbld/tasks/task_1549544222/build/raptor.json
05:03:56 INFO - raptor-control-server shutting down control server
05:03:56 INFO - raptor-main finished
05:03:56 INFO - raptor-main TEST-UNEXPECTED-FAIL: test 'raptor-tp6-paypal-firefox' timed out loading test page: https://www.paypal.com/myaccount/summary/
05:03:57 ERROR - Return code: 1
05:03:57 WARNING - setting return code to 1
Updated•6 years ago
|
Comment 1•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment 3•6 years ago
|
||
Based on retriggers this showed up with bug Bug 1445739. Looking in mercurial I found that this was last changed/added with 1518519.
Gijs, Florin can you please take a look?
Comment 4•6 years ago
|
||
I suspect this is similar to bug 1526086, and Firefox hangs at shutdown trying to create the startup cache, and that's why we don't get raptor results. (raptor doesn't restart Firefox between test iterations, so the end of the run is the first run with that profile) Either that or Firefox does shut down successfully but the writing of the raptor data doesn't happen as quickly as we expect, or something.
A few questions for Bebe or other folks working on raptor:
- is there some way to see what happens to the Firefox process? Looking at all the logs, I don't see a minidump or really any other output about the process lifetime, errors it might produce, etc. Is there a bug on file about this maybe? What happens in other cases if Firefox crashes mid-run?
- Is it easier to reproduce if we do only 1 run instead of 3 (with the gecko profiler enabled as well, to increase the amount of IO we end up doing at shutdown)?
- where does the relevant raptor code live for writing out the JSON data etc. ?
Natalia, does this not affect other raptor tests, ie are there other bugs for other chunks / raptor tests that are running into this issue, or something vaguely similar?
Also, Kris, how tricky would it be to make writing the startup cache interruptible, such that we just finish writing it the next time Firefox starts? That seems preferable over a shutdown hang/crash...
Comment hidden (Intermittent Failures Robot) |
Comment 6•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #4)
Natalia, does this not affect other raptor tests, ie are there other bugs for other chunks / raptor tests that are running into this issue, or something vaguely similar?
There is a number of bugs created for similar raptor tests, but this one here was further investigated as it had a high frequency at the time, along with Bug 1526034.
Examples of other bugs: Bug 1522286, Bug 1492820, Bug 1514145, Bug 1520351, Bug 1507529, Bug 1519558, Bug 1518227, Bug 1511185, Bug 1520235, Bug 1506936
Gijs - if there is any other info I could provide please let me know
Comment 7•6 years ago
|
||
(In reply to Natalia Csoregi [:nataliaCs] from comment #6)
(In reply to :Gijs (he/him) from comment #4)
Natalia, does this not affect other raptor tests, ie are there other bugs for other chunks / raptor tests that are running into this issue, or something vaguely similar?
There is a number of bugs created for similar raptor tests, but this one here was further investigated as it had a high frequency at the time, along with Bug 1526034.
Thanks. So I'm fairly confused now, because I've just realized that hte summary of this bug says 'TEST-UNEXPECTED-FAIL: test 'raptor-tp6-paypal-firefox/chrome' timed out loading test page: https://www.paypal.com/myaccount/summary/'
but everything I see in orangefactor says "raptor-main TEST-UNEXPECTED-FAIL: no raptor test results were found for raptor-tp6-paypal-firefox".
Those are not the same failures.
Bug 1526034 has the summary "High Frequency OSError: [Errno 24] Too many open files on Raptor failures", but actually the logs all error out on either the "no raptor test results were found for raptor-tp6-[some-test]" or the "timed out loading test page" failures, and don't seem to include the OSError stuff.
Examples of other bugs: Bug 1522286, Bug 1492820, Bug 1514145, Bug 1520351, Bug 1507529, Bug 1519558, Bug 1518227, Bug 1511185, Bug 1520235, Bug 1506936
OK, these fortunately don't seem related, nor as frequent as this bug and bug 1526034.
I have ideas about why my patch might cause the "no raptor test results were found" issue - that has to do with Firefox shutdown. I don't really understand why I might have anything to do with the "timed out loading test page" issues, which happen well before shutdown - I suspect that's a separate issue, and needs a new bisection, given the items in the pushlog from comment #3 are with the former error and not the latter. Andreea, can you help split these two things up?
In terms of tracking down exactly what is going on here, I'd need more debug output so I'm hoping Bebe or :rwood can answer the questions in comment #4 .
Comment 8•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #4)
Also, Kris, how tricky would it be to make writing the startup cache interruptible, such that we just finish writing it the next time Firefox starts? That seems preferable over a shutdown hang/crash...
I don't even know what that would look like. Writing the startup cache is simple, and relatively cheap. If it's a real problem, we just need to figure out why we get deadlocks.
I doubt it's related to this problem, though. This only happens during really short sessions, where we haven't had a chance to write the cache before shutdown. That shouldn't happen in raptor tests.
Comment 9•6 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #8)
This only happens during really short sessions, where we haven't had a chance to write the cache before shutdown. That shouldn't happen in raptor tests.
These are profiling raptor tests, so the browser is up for less than 2 minutes, and is probably pretty busy most of that time. The link in comment #3 seems to pretty convincingly show this was tripped by the addons-shouldn't-trip-startupcache patch...
Comment 10•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #9)
(In reply to Kris Maglione [:kmag] from comment #8)
This only happens during really short sessions, where we haven't had a chance to write the cache before shutdown. That shouldn't happen in raptor tests.
These are profiling raptor tests, so the browser is up for less than 2 minutes
The startup cache should be written about 10 seconds after startup. I suppose it's possible that we'd run into it here, but it still doesn't seem super likely.
Comment 11•6 years ago
•
|
||
Gijs, sure.
I looked in the orange factor logs, all of them, and 8 classifications were "timed out loading test page: https://www.paypal.com/myaccount/summary" the rest of them are this bug.
I have filed bug 1527493 for the timed out issue. Removing ni for Florin from here.
Comment 12•6 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #11)
Gijs, sure.
I looked in the orange factor logs, all of them, and 8 classifications were "timed out loading test page: https://www.paypal.com/myaccount/summary" the rest of them are this bug.
I have filed bug 1527493 for the timed out issue. Removing ni for Florin from here.
I can't really make progress here without some info from Florin or :rwood, I'm afraid - see comment #4.
Comment 13•6 years ago
•
|
||
Oh, sorry about that. Is there anything else I can help with?
Comment 14•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #4)
A few questions for Bebe or other folks working on raptor:
- is there some way to see what happens to the Firefox process? Looking at all the logs, I don't see a minidump or really any other output about the process lifetime, errors it might produce, etc. Is there a bug on file about this maybe? What happens in other cases if Firefox crashes mid-run?
- Is it easier to reproduce if we do only 1 run instead of 3 (with the gecko profiler enabled as well, to increase the amount of IO we end up doing at shutdown)?
- where does the relevant raptor code live for writing out the JSON data etc. ?
:rwood can you help with here?
Comment 15•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #4)
- where does the relevant raptor code live for writing out the JSON data etc. ?
The code that writes out the PERFHERDER_DATA dump lies in mozilla-central/testing/raptor/raptor/output.py.
It's the one that's eventually called by Raptor.process_results().
Comment hidden (Intermittent Failures Robot) |
Comment 17•6 years ago
|
||
"1 failures in 2832 pushes (0.0 failures/push) were associated with this bug in the last 7 days." Seems to have resolved.
Comment hidden (Intermittent Failures Robot) |
Comment 19•6 years ago
|
||
Description
•