Intermittent TEST-UNEXPECTED-FAIL : xperf: File 'c:\windows\system32\wintypes.dll' (normalized from 'C:\Windows\System32\WinTypes.dll') was accessed and we were not expecting it. DiskReadCount: 2, DiskWriteCount: 0, DiskReadBytes: 32768
Categories
(Toolkit :: Startup and Profile System, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox67 | - | unaffected |
firefox67.0.1 | - | unaffected |
firefox68 | - | unaffected |
firefox69 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: bugzilla)
References
(Regression)
Details
(Keywords: intermittent-failure, regression, Whiteboard: [comment 50][stockwell disable-recommended])
Attachments
(1 file, 1 obsolete file)
Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=244243523&repo=mozilla-inbound
Full log: https://queue.taskcluster.net/v1/task/OzT3vzZzRhWNioESM36rmA/runs/0/artifacts/public/logs/live_backing.log
17:44:34 INFO - The trace you have just captured "Z:\task_1556814716\build\test.etl.kernel" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
17:44:34 INFO - The trace you have just captured "Z:\task_1556814716\build\test.etl.user" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
17:45:05 INFO - Merged Etl: test.etl
17:52:12 INFO - reading etl filename: test.etl
17:52:12 INFO - etlparser: in readfile: test.etl.csv
17:52:12 INFO - TEST-UNEXPECTED-FAIL : xperf: File 'c:\windows\system32\wintypes.dll' (normalized from 'C:\Windows\System32\WinTypes.dll') was accessed and we were not expecting it. DiskReadCount: 2, DiskWriteCount: 0, DiskReadBytes: 32768, DiskWriteBytes: 0
17:52:14 INFO - extending with xperf!
17:52:14 INFO - Detected a regression for tp5n
17:52:14 INFO - TEST-UNEXPECTED-FAIL | tp5n | Talos has found a regression, if you have questions ask for help in irc on #perf
17:52:14 ERROR - Traceback (most recent call last):
17:52:14 INFO - File "Z:\task_1556814716\build\tests\talos\talos\run_tests.py", line 300, in run_tests
17:52:14 INFO - talos_results.add(mytest.runTest(browser_config, test))
17:52:14 INFO - File "Z:\task_1556814716\build\tests\talos\talos\ttest.py", line 64, in runTest
17:52:14 INFO - return self._runTest(browser_config, test_config, setup)
17:52:14 INFO - File "Z:\task_1556814716\build\tests\talos\talos\ttest.py", line 268, in _runTest
17:52:14 INFO - 'Talos has found a regression, if you have questions'
17:52:14 INFO - TalosRegression: Talos has found a regression, if you have questions ask for help in irc on #perf
17:52:14 INFO - TEST-INFO took 594333ms
17:52:14 INFO - SUITE-END | took 594s
17:52:14 ERROR - Return code: 1
17:52:14 WARNING - setting return code to 1
17:52:14 ERROR - # TBPL WARNING #
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 8•6 years ago
|
||
Currently trending down. Removing disable-recommended.
Comment hidden (Intermittent Failures Robot) |
Comment 10•6 years ago
|
||
Hi Robert, can you please take a look at this failure?
It seems to also fails on retriggers on older pushes.
Comment 11•6 years ago
|
||
Hi :igoldan, I believe you're familiar with xperf correct? Mind having a look (or fw on to the talos xperf test owner if need be)? Thanks!
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 15•6 years ago
|
||
Could this be related to the machines running the test? The failures on the oldest commits are all retriggers, the initial xperf runs didn't fail.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 16•6 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #15)
Could this be related to the machines running the test? The failures on the oldest commits are all retriggers, the initial xperf runs didn't fail.
Comment 17•6 years ago
|
||
Comment 18•6 years ago
|
||
Aaron, I believe xperf is only monitoring files opened/imported by Firefox. Am I right?
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 21•6 years ago
|
||
There is a certain amount of non-determinism here, at least dependent in part on the host machine and OS.
I wouldn't mind knowing how wintypes.dll
is being loaded before whitelisting, though.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 25•6 years ago
|
||
Aron, Ionut, any updates here? Should we consider disabling the test? Thanks.
Comment 26•6 years ago
•
|
||
(In reply to Cosmin Sabou [:CosminS] from comment #25)
Aron, Ionut, any updates here? Should we consider disabling the test? Thanks.
So I have the patch waiting for review & it fixes the intermittent.
There is a certain amount of non-determinism here, at least dependent in part on the host machine and OS.
I wouldn't mind knowing how wintypes.dll is being loaded before whitelisting, though.
I don't have enough cycles nor the xperf experience to investigate this. Aaron, could you look into this instead?
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 29•6 years ago
|
||
I don't think we should be whitelisting this. In fact, I think this is a legitimate perf regression from bug 1260483.
I ran Nightly in a debugger set up to break when WinTypes.dll
is loaded. The most useful frames are here:
00 ntdll!NtMapViewOfSection+0x14
01 firefox!patched_NtMapViewOfSection+0x6d
02 ntdll!LdrpMinimalMapModule+0x101
03 ntdll!LdrpMapDllWithSectionHandle+0x1b
04 ntdll!LdrpMapDllNtFileName+0x189
05 ntdll!LdrpMapDllFullPath+0xe0
06 ntdll!LdrpProcessWork+0x74
07 ntdll!LdrpDrainWorkQueue+0x16e
08 ntdll!LdrpLoadDllInternal+0x148
09 ntdll!LdrpLoadDll+0xa8
0a ntdll!LdrLoadDll+0xe4
0b mozglue!patched_LdrLoadDll+0x206
0c KERNELBASE!LoadLibraryExW+0x161
0d combase!LoadLibraryWithLogging+0x2c
0e combase!CClassCache::CDllPathEntry::LoadDll+0x56
0f combase!CClassCache::CDllPathEntry::Create+0x4f
10 combase!CClassCache::GetOrLoadWinRTInprocClass+0xd80
11 combase!CCGetOrLoadWinRTInprocClass+0x21
12 combase!WinRTGetActivationFactoryOfInprocClass+0x49
13 combase!_RoGetActivationFactory+0x361
14 mrmcorer!Microsoft::Resources::IsHighContrast+0x68
15 mrmcorer!Microsoft::Resources::GetContrastQualifierValue+0x28
16 mrmcorer!Microsoft::Resources::QualifierValueProviderBase::GetQualifierValue+0x55
17 mrmcorer!Microsoft::Resources::Runtime::CContextualHash::_GetHash+0xe2
18 mrmcorer!Microsoft::Resources::Runtime::CContext::s_GetHash+0x60
19 mrmcorer!Microsoft::Resources::Runtime::CResourceManagerQueueInternal::_FillCacheStrings+0x71
1a mrmcorer!Microsoft::Resources::Runtime::CResourceManagerQueueInternal::GetStringInternal+0x2b9
1b mrmcorer!ResourceManagerQueueGetString+0x3a
1c KERNELBASE!SHLoadIndirectStringInternal+0x161
1d windows_storage!CAssocElement::_QuerySourceAny<<lambda_dfb37409bb15dfff50d1c0491f1e76a6> >+0x1bf
1e windows_storage!CAssocElement::QueryString+0x2f
1f windows_storage!CAssocShellElement::QueryStringNoDelegate+0x25
20 windows_storage!CAssocProgidElement::QueryStringNoDelegate+0x24
21 windows_storage!CAssocShellVerbElement::QueryString+0x12d
22 windows_storage!CAssocArray::_QueryAllElements<<lambda_0ad978ee0b657b29c9332a702085a2f5> >+0x12b
23 windows_storage!CAssocArray::GetString+0xa5
24 shlwapi!AssocQueryStringW+0xb1
25 xul!nsOSHelperAppService::GetApplicationDescription+0x1c7
26 xul!nsOSHelperAppService::GetProtocolHandlerInfoFromOS+0x1c7
27 xul!nsExternalHelperAppService::GetProtocolHandlerInfo+0x2d
28 xul!XPTC__InvokebyIndex+0x72
In essence, AssocQueryStringW
is triggering main thread I/O. It is intermittent because the Windows loader has transitioned to using a thread pool, so whether or not the I/O occurs on the main thread is nondeterministic.
Comment 30•6 years ago
•
|
||
(In reply to Aaron Klotz [:aklotz] (PTO May 29 - June 5) from comment #29)
I don't think we should be whitelisting this. In fact, I think this is a legitimate perf regression from bug 1260483.
[...]
In essence,AssocQueryStringW
is triggering main thread I/O. It is intermittent because the Windows loader has transitioned to using a thread pool, so whether or not the I/O occurs on the main thread is nondeterministic.
Ok, I've pointed this out in bug 1260483. Thanks, Aaron!
Updated•6 years ago
|
Comment 31•6 years ago
|
||
igoldan: could you schedule a time for you, me, jmaher, and aklotz to discuss xperf ownership and future?
Comment hidden (Intermittent Failures Robot) |
Comment 33•6 years ago
|
||
(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #31)
igoldan: could you schedule a time for you, me, jmaher, and aklotz to discuss xperf ownership and future?
Yep, I've sent an email for that to better pick the date.
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 35•6 years ago
|
||
There are 108 total failures in the last 7 days on windows10-64, windows10-64-qr, windows10-64-shippable all opt
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249653062&repo=mozilla-central&lineNumber=1613
03:16:30 INFO - TEST-START | tp5n
03:16:30 INFO - Initialising browser for tp5n test...
03:18:30 INFO - The trace you have just captured "Z:\task_1559526833\build\test.etl.user" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
03:18:47 INFO - Merged Etl: test.etl
03:23:50 INFO - reading etl filename: test.etl
03:23:50 INFO - etlparser: in readfile: test.etl.csv
03:23:50 INFO - TEST-UNEXPECTED-FAIL : xperf: File 'c:\windows\system32\wintypes.dll' (normalized from 'C:\Windows\System32\WinTypes.dll') was accessed and we were not expecting it. DiskReadCount: 2, DiskWriteCount: 0, DiskReadBytes: 32768, DiskWriteBytes: 0
03:23:51 INFO - extending with xperf!
03:23:51 INFO - Detected a regression for tp5n
03:23:51 INFO - TEST-UNEXPECTED-FAIL | tp5n | Talos has found a regression, if you have questions ask for help in irc on #perf
03:23:51 ERROR - Traceback (most recent call last):
03:23:51 INFO - File "Z:\task_1559526833\build\tests\talos\talos\run_tests.py", line 276, in run_tests
03:23:51 INFO - talos_results.add(mytest.runTest(browser_config, test))
03:23:51 INFO - File "Z:\task_1559526833\build\tests\talos\talos\ttest.py", line 64, in runTest
03:23:51 INFO - return self._runTest(browser_config, test_config, setup)
03:23:51 INFO - File "Z:\task_1559526833\build\tests\talos\talos\ttest.py", line 263, in _runTest
03:23:51 INFO - 'Talos has found a regression, if you have questions'
03:23:51 INFO - TalosRegression: Talos has found a regression, if you have questions ask for help in irc on #perf
03:23:51 INFO - TEST-INFO took 440734ms
03:23:51 INFO - SUITE-END | took 440s
03:23:51 ERROR - Return code: 1
Aaron can you take a look?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 40•6 years ago
|
||
we need to add 'C:\Windows\System32\WinTypes.dll' to the whitelist file:
https://searchfox.org/mozilla-central/source/testing/talos/talos/xtalos/xperf_whitelist.json
redirecting to :davehunt to get this fixed- code sheriffs can fix this as well- but be warned, adding a file to xperf_whitelist.json should only be done if it is expected- in this case why is wintypes.dll being called- did firefox change and now is calling it? I think this is unrelated to firefox, but in many other cases it is related (say a new font file is used, etc.)
Comment 41•6 years ago
|
||
Thank you Joel, we'll wait for Dave to reply.
Comment 42•6 years ago
|
||
According to :aklotz in comment 29 this should not be whitelisted as it is a genuine regression from bug bug 1260483. Is there new information that demonstrates this is not the case?
Comment 43•6 years ago
|
||
I missed that, thanks :davehunt for pointing it out. Given the high failure rate, this is nearing perma-fail territory, can we backout bug 1260483? If that landed 3 years ago I don't see how this could be the problem.
Can we disable this test or make it tier-2?
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment 45•6 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #43)
Can we disable this test or make it tier-2?
Yes. Let's demote it to tier-2, to ease the burden on the code sheriffs.
Will re promote the test to tier-1, once this issue gets fixed.
![]() |
||
Comment 46•6 years ago
|
||
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #45)
Yes. Let's demote it to tier-2, to ease the burden on the code sheriffs.
Tier 1 or 2 only matters for the question if something has to be backed out (1 = yes, 2 = no, file a bug and needinfo developer instead). Regarding the load of failure classifications, it doesn't make any difference.
Comment 47•6 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #46)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #45)
Yes. Let's demote it to tier-2, to ease the burden on the code sheriffs.
Tier 1 or 2 only matters for the question if something has to be backed out (1 = yes, 2 = no, file a bug and needinfo developer instead). Regarding the load of failure classifications, it doesn't make any difference.
Ok, so we need to demote to Tier-3 then? Will the tests still run on trunk?
![]() |
||
Comment 48•6 years ago
|
||
Yes, tier 3 only hides them from treeherder by default and they will still run on the same trees. If the latter shall be modified, run-on-projects has to be adjusted.
Assignee | ||
Comment 49•6 years ago
•
|
||
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #43)
I missed that, thanks :davehunt for pointing it out. Given the high failure rate, this is nearing perma-fail territory, can we backout bug 1260483? If that landed 3 years ago I don't see how this could be the problem.
The reason why it is a problem now is probably due to Windows 10 upgrades on our test boxes. Also, the code in question is called by JS, which could have changed such that it moved the native call to an earlier point in browser startup.
Assignee | ||
Comment 50•6 years ago
|
||
I have a patch that will probably mitigate this. I'd rather we just fix this in Firefox than demote the test.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 51•6 years ago
|
||
Updated•6 years ago
|
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 55•6 years ago
|
||
Aaron could you choose another reviewer? agashlin has blocking review.
![]() |
||
Comment 56•6 years ago
|
||
"Blocking review" means that person must review the patch - it doesn't matter if anybody else grants review.
Comment 57•6 years ago
|
||
oh, okay. thanks for clarifying that. I thought it's just like in bugzilla when people block ni, you cannot set ni - block review, he won't see the review request.
Comment 58•6 years ago
|
||
Comment 59•6 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Description
•