Open Bug 1457329 Opened 2 years ago Updated 2 months ago

Intermittent widget/tests/unit/test_taskbar_jumplistitems.js | test_jumplist - [test_jumplist : 208] false == true

Categories

(Core :: Widget: Win32, defect, P3)

Tracking Status
firefox61 --- fixed
firefox62 --- disabled

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [retriggered][stockwell disabled])

Attachments

(1 file, 1 obsolete file)

Filed by: shindli [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=175841451&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/VOX-dOFjSGycp-NfDCmkCQ/runs/0/artifacts/public/logs/live_backing.log

22:54:03     INFO -  TEST-START | xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentscript.js
22:54:04     INFO -  TEST-PASS | xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentscript.js | took 918ms
22:54:04     INFO -  TEST-START | toolkit/components/url-classifier/tests/unit/test_listmanager.js
22:54:08     INFO -  TEST-PASS | toolkit/components/url-classifier/tests/unit/test_listmanager.js | took 3339ms
22:54:08     INFO -  TEST-START | widget/tests/unit/test_taskbar_jumplistitems.js
22:54:08  WARNING -  TEST-UNEXPECTED-FAIL | widget/tests/unit/test_taskbar_jumplistitems.js | xpcshell return code: 0
22:54:08     INFO -  TEST-INFO took 134ms
22:54:08     INFO -  >>>>>>>
22:54:08     INFO -  PID 8984 | JavaScript strict warning: resource://gre/modules/XPCOMUtils.jsm, line 277: ReferenceError: reference to undefined property 1
22:54:08     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
Summary: Intermittent widget/tests/unit/test_taskbar_jumplistitems.js | xpcshell return code: 0 → Intermittent widget/tests/unit/test_taskbar_jumplistitems.js | test_jumplist - [test_jumplist : 208] false == true
I did retriggers for Windows 10 x64 opt and Windows 10 x64 pgo.
Retriggered this failure for 15 pushes, starting from the first one reported by Orange Factor: revision 83d635a47201 - aklotz@mozilla.com.
I added 20 more pushes and found that it failed again on revision 19f68300761e - sledru@mozilla.com, so I continued retriggering from there. The jobs on this platform were queued, and I cancelled some of them hoping it would not fail on older pushes, but with no luck.
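The backward retriggering described above can be sketched as a small search over per-push results. This is a hypothetical helper, not Treeherder or Orange Factor tooling: the `PushResult` fields, the revision strings and the `min_clean_runs` threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PushResult:
    revision: str  # hypothetical short hash, for illustration only
    runs: int      # completed xpcshell jobs on this push (original + retriggers)
    failures: int  # how many of those jobs hit the test_jumplist failure


def find_suspect_range(
    pushes: List[PushResult], min_clean_runs: int
) -> Tuple[Optional[PushResult], Optional[PushResult]]:
    """Walk backwards through `pushes` (ordered oldest -> newest).

    Returns (last_clean, first_bad):
      last_clean: the newest push with >= min_clean_runs runs and no failures,
                  or None if the range holds no such push (search further back).
      first_bad:  the oldest failing push seen newer than last_clean,
                  or None if no failure was seen.
    The regressing push, if any, lies after last_clean and at or before first_bad.
    """
    first_bad: Optional[PushResult] = None
    for push in reversed(pushes):
        if push.failures > 0:
            first_bad = push  # move the "bad" marker further back
        elif push.runs >= min_clean_runs:
            return push, first_bad  # enough clean runs: stop the search here
        # otherwise the push is clean so far but under-sampled; keep walking back
    return None, first_bad


if __name__ == "__main__":
    history = [
        PushResult("aaaa", runs=20, failures=0),
        PushResult("bbbb", runs=5, failures=0),
        PushResult("cccc", runs=20, failures=2),
        PushResult("dddd", runs=10, failures=1),
    ]
    clean, bad = find_suspect_range(history, min_clean_runs=15)
    print("last clean:", clean.revision if clean else None,
          "| first bad:", bad.revision if bad else None)
```

The result is only an interval: pushes between `last_clean` and `first_bad` with too few runs stay ambiguous, and with an intermittent failure a push can look clean purely by chance, so `min_clean_runs` has to be chosen from the observed failure rate.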

The retriggers:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=Windows%2010%20x64%20xpcshell&tochange=83d635a4720194bec17d98035878d33cca1aa0e8&fromchange=35fe513112642f7d2f9c99da61538db193701a94&filter-resultStatus=success&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable

I retriggered up to 8df37ea84bf3 - ryanvm@gmail.com 

:jmaher, shall I continue with the retriggers? Am I on the right track?
Flags: needinfo?(jmaher)
Whiteboard: [stockwell needswork:owner] → [stockwell needswork:owner][retriggered]
Continued the investigation that Cristina started and got as far back as this push, where it still fails: http://tinyurl.com/y8s527wh

Seems like we need to go further back in time for this one.
Continued the investigation

   I searched Mercurial for the most recent change to widget/tests/unit/test_taskbar_jumplistitems.js. Because the investigation started on inbound, this pointed me to two merges from 20 April, starting with this one: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=windows%20xpcshell&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&filter-resultStatus=success&tochange=97e83068cae9e8761fc8d5d734a1941276816c3d&fromchange=55f736cf8ab717c206e120b7f00e94a71f201684&group_state=expanded
   I retriggered the xpcshell test and it also failed on the last merge that day, so I went to Autoland to find out whether something there might have caused it.
   The push that appeared to cause the failures, https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=513cd669aca140b8792069c05c4b5215d88bb41a&filter-searchStr=windows%2010%20opt-xpcshell&selectedJob=169001500 , was backed out (https://hg.mozilla.org/integration/autoland/rev/49d1cb161b40922d872836674ec92d0b7a8aa7b0) and later relanded (fixed), but the failures were still present.
   This is the range from which I started retriggering backwards: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&filter-searchStr=windows%2010%20opt-xpcshell&tochange=377b62a130e78fa57d0311b2887cbabeffea6823&fromchange=4be9ce9a82a0c7d68014a062abb9240cdbfd913f
   This is as far back as I could get: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=6d4b4514388db355da61b35d371b095dc0dd4c78&filter-searchStr=windows%20xpcshell&group_state=expanded
   The culprit is still unclear, so we'll continue the investigation.
:gbrown, do you think we should continue the investigation, given that the failure rate is only 2-3 failures in 20 runs?
Flags: needinfo?(gbrown)
Note that bug 1446133 is very similar and older: I think the same failure has been reported in two different places. So this has probably been around since at least 2018-03-15 (when bug 1446133 was filed). I don't see any sign of it before that, so maybe it is worth tracking it back just a little bit more?
Flags: needinfo?(gbrown)
thanks :gbrown for looking at this. :apavel, is this something you can make another pass on today?
Flags: needinfo?(jmaher) → needinfo?(apavel)
:jmaher: got this far: https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=windows%20xpcshell%20Windows%2010%20x64%20pgo&group_state=expanded&tochange=6d4b4514388db355da61b35d371b095dc0dd4c78&fromchange=27d65d2c59095a8a37b8a5ada20b90429a7d5553&selectedJob=176422790 

I'm not sure whether the next shift can take over, given the failures caused by bug 1444168, so I'm adding ni? for both Noemi and Cosmin.
 
Also from IRC:

Aryx> https://bugzilla.mozilla.org/show_bug.cgi?id=1456807 is similar and it seems retriggers fail more often than the original runs, so wasn't sure of this has something to do with the machines
<Aryx> i think it's caused by the same problem like the jump list failure. if it's caused by a checkin and you find that one, retriggers for the other failure might show that it's also responsible for that
Flags: needinfo?(nerli)
Flags: needinfo?(csabou)
Flags: needinfo?(apavel)
This bug and bug 1446133 are the same failure ([test_jumplist : 208] vs. [test_jumplist : 213]). With help from Aryx, it seems that https://hg.mozilla.org/mozilla-central/rev/61aa0247279a#l24.1 removed 5 lines, which is why the line number changed. The failure at line 208 did not occur for a month or so and then reappeared around 25 April: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-01-02&endday=2018-05-02&tree=trunk&bug=1446133

After a fairly long investigation, I went far back trying to find where the original failure began. This is the range I was last working on: http://tinyurl.com/y9q6wnle. Even going back as far as this push, I still haven't got to the bottom of it: http://tinyurl.com/y8p4om3d

Maybe :jimm could take a look at this, since Aryx mentioned it might be machine-related rather than an actual problem with the test. Thank you.
Will close the initial bug as a duplicate of this one.
Flags: needinfo?(csabou) → needinfo?(jmathies)
Duplicate of this bug: 1446133
Did a few more retriggers here; I think we are getting close:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=windows%20pgo%20x64%20xpcshell&filter-resultStatus=success&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&tochange=38b1cc41779caa473eed6fb78ba500faded0e615&fromchange=7989f5d56e52e1c71296221140c60a5ae18832ec

Narcis, it might be another hour or so before the results come in. Could you follow up on this and determine the root cause? Feel free to needinfo the next shift or me if you cannot reach a conclusion.
Flags: needinfo?(nbeleuzu)
   Followed up with more retriggers on this range: https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=windows%20%20xpcshell&filter-resultStatus=success&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&fromchange=a8130e46c530b9736245ce7414ed9926e5e6b7bb&tochange=4811c426205d64975325c471f897ede03fba2757&group_state=expanded 
  Got as far as this push, which had the failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=e0a165295ef81483c10ae295769d6a5980b52225&filter-searchStr=windows%20%20xpcshell&filter-resultStatus=success&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&group_state=expanded
  
  I stopped retriggering because on the following pushes the job failed on every retrigger, but not in widget/tests/unit/test_taskbar_jumplistitems.js: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=36c84344de23aa92f96c0d40076d5262aab32ad2&filter-searchStr=windows%20%20xpcshell&filter-resultStatus=success&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable

 Example failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=177120523&repo=autoland&lineNumber=332
       12:55:08    FATAL - Can't download from https://queue.taskcluster.net/v1/task/cPkrHYRUTeCegWnldBWOiA/artifacts/public/build/target.zip to Z:\task_1525523206\build\installer.zip!
       12:55:08    FATAL - Caught exception: HTTP Error 404: Not Found

:jmaher do you have any advice on how we should proceed at this point?
Flags: needinfo?(ncsoregi) → needinfo?(jmaher)
This bug was filed 10 days ago, but we are retriggering back 2 months and still seeing it show up 10% of the time. Unfortunately, I don't think we will get much value from more retriggers.
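To put numbers on the retrigger question (2-3 failures in 20 runs, or roughly 10%), a back-of-the-envelope model helps. It assumes each run fails independently with a fixed probability p; real intermittents often violate this (Aryx observed that retriggers seemed to fail more often than the original runs), so treat the figures as rough guidance only.

```python
import math


def miss_probability(p: float, n: int) -> float:
    """Chance that a push which *does* carry the bug shows zero failures
    in n runs, assuming each run fails independently with probability p."""
    return (1.0 - p) ** n


def runs_needed(p: float, confidence: float) -> int:
    """Smallest n such that a bad push shows at least one failure in n runs
    with probability >= confidence, i.e. (1 - p)**n <= 1 - confidence."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must both be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    # 0.10 is the rate quoted above; 0.125 is 2.5 failures out of 20 runs.
    for p in (0.10, 0.125):
        print(f"p={p}: miss chance with 20 runs = {miss_probability(p, 20):.3f}, "
              f"runs for 95% confidence = {runs_needed(p, 0.95)}")
```

At p = 0.1 this gives 29 runs per push for 95% confidence that a push is really clean, and a 20-run batch would still miss a bad push about 12% of the time. Multiplied across every push in a two-month range, that is why further retriggers buy little.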

The HTTP Error 404 looks like an infrastructure failure, but it could also be that we have finally crossed the 2-month window and the artifacts are no longer available.

Going forward, I think we need :jimm to look at this bug, or else wait until there are enough failures to disable it, which should happen within the next week.
Flags: needinfo?(jmaher)
Will take a look.
Flags: needinfo?(jmathies)
Priority: -- → P3
Comment on attachment 8975417 [details] [diff] [review]
disabled test_taskbar_jumplistitems.js on Windows 10 for frequent failures.

Review of attachment 8975417 [details] [diff] [review]:
-----------------------------------------------------------------

Remove the !ccov clause; the rest looks great.

::: widget/tests/unit/xpcshell.ini
@@ +2,4 @@
>  head = 
>  
>  [test_taskbar_jumplistitems.js]
> +skip-if = os == "win" && os_version == "10.0" && !ccov # Bug 1457329

We can skip this on ccov as well; it fails more than 1/3 of the time there.
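The effect of dropping `!ccov` from the `skip-if` condition can be illustrated with a simplified model in plain Python. This is not the manifestparser expression evaluator; the two functions and the platform values in `CONFIGS` (including `os_version` "6.1" for Windows 7) are hypothetical and only mirror the two versions of the condition discussed in this review.

```python
from typing import Dict, Union

Info = Dict[str, Union[str, bool]]


def skip_with_ccov_exception(info: Info) -> bool:
    # Mirrors the first patch: os == "win" && os_version == "10.0" && !ccov
    return info["os"] == "win" and info["os_version"] == "10.0" and not info["ccov"]


def skip_on_all_win10(info: Info) -> bool:
    # Mirrors the requested change: os == "win" && os_version == "10.0"
    return info["os"] == "win" and info["os_version"] == "10.0"


# Hypothetical test-environment values, for illustration only.
CONFIGS: Dict[str, Info] = {
    "win10-opt": {"os": "win", "os_version": "10.0", "ccov": False},
    "win10-ccov": {"os": "win", "os_version": "10.0", "ccov": True},
    "win7-opt": {"os": "win", "os_version": "6.1", "ccov": False},
}

if __name__ == "__main__":
    for name, info in CONFIGS.items():
        print(f"{name}: first patch skips={skip_with_ccov_exception(info)}, "
              f"revised skips={skip_on_all_win10(info)}")
```

Only the Windows 10 ccov row changes: the first patch would keep running the test there, where it fails more than 1/3 of the time, while the revised condition skips it.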
Attachment #8975417 - Flags: review?(jmaher) → review-
Comment on attachment 8975423 [details] [diff] [review]
disabled test_taskbar_jumplistitems.js on Windows 10 for frequent failures. / removed ccov

Review of attachment 8975423 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!
Attachment #8975423 - Flags: review?(jmaher) → review+
Attachment #8975417 - Attachment is obsolete: true
Flags: needinfo?(jmaher)
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell disabled]
Pushed by ebalazs@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9bd22ee21331
disabled test_taskbar_jumplistitems.js on Windows 10 for frequent falures. r=jmaher
Keywords: checkin-needed
Whiteboard: [retriggered][stockwell disabled] → [retriggered][stockwell disabled][checkin-needed-beta]
https://hg.mozilla.org/releases/mozilla-beta/rev/e8ebd14b18fd
Whiteboard: [retriggered][stockwell disabled][checkin-needed-beta] → [retriggered][stockwell disabled]