Bump WINVER to 0x0601 (Windows 7)

RESOLVED FIXED in Firefox 54

Status

defect
RESOLVED FIXED
3 years ago
Last year

People

(Reporter: emk, Unassigned)

Tracking

(Blocks 1 bug)

Trunk
mozilla54
Dependency tree / graph

Firefox Tracking Flags

(firefox53 affected, firefox54 fixed)

Details

Attachments

(4 attachments)

No description provided.
> WINVER=0x0600 implies PSAPI_VERSION=2.
WINVER=0x0601
Attachment #8822455 - Flags: review?(mh+mozilla)
Attachment #8822456 - Flags: review?(mh+mozilla)
Attachment #8822457 - Flags: review?(mh+mozilla)
e10s jobs failed with the patch. Cancelled the review requests.
(In reply to Masatoshi Kimura [:emk] from comment #6)
> e10s jobs failed with the patch.

I get:

17:14:28     INFO -  Traceback (most recent call last):
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 2726, in <module>
17:14:28     INFO -      sys.exit(cli())
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 2723, in cli
17:14:28     INFO -      return run_test_harness(parser, options)
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 2692, in run_test_harness
17:14:28     INFO -      result = runner.runTests(options)
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 2239, in runTests
17:14:28     INFO -      result = self.runMochitests(options, tests_in_dir)
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 2152, in runMochitests
17:14:28     INFO -      result = self.doTests(options, testsToRun)
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 2321, in doTests
17:14:28     INFO -      self.manifest = self.buildProfile(options)
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 1717, in buildProfile
17:14:28     INFO -      certificateStatus = self.fillCertificateDB(options)
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 1597, in fillCertificateDB
17:14:28     INFO -      [certutil, "-N", "-d", certdbPath, "-f", pwfilePath], env=toolsEnv)
17:14:28     INFO -    File "C:\slave\test\build\tests\mochitest\runtests.py", line 313, in call
17:14:28     INFO -      process.run()
17:14:28     INFO -    File "C:\slave\test\build\venv\lib\site-packages\mozprocess\processhandler.py", line 739, in run
17:14:28     INFO -      self.proc = self.Process([self.cmd] + self.args, **args)
17:14:28     INFO -    File "C:\slave\test\build\venv\lib\site-packages\mozprocess\processhandler.py", line 114, in __init__
17:14:28     INFO -      universal_newlines, startupinfo, creationflags)
17:14:28     INFO -    File "c:\mozilla-build\python27\Lib\subprocess.py", line 679, in __init__
17:14:28     INFO -      errread, errwrite)
17:14:28     INFO -    File "C:\slave\test\build\venv\lib\site-packages\mozprocess\processhandler.py", line 280, in _execute_child
17:14:28     INFO -      cwd, startupinfo)
17:14:28     INFO -    File "C:\slave\test\build\venv\lib\site-packages\mozprocess\winprocess.py", line 200, in ErrCheckCreateProcess
17:14:28     INFO -      ErrCheckBool(result, func, args)
17:14:28     INFO -    File "C:\slave\test\build\venv\lib\site-packages\mozprocess\winprocess.py", line 57, in ErrCheckBool
17:14:28     INFO -      raise WinError()
17:14:28     INFO -  WindowsError: [Error 193] <no description>
17:14:28     INFO -  Exception AttributeError: "'Process' object has no attribute '_job'" in <bound method Process.__del__ of <mozprocess.processhandler.Process object at 0x02CE0CF0>> ignored

When I download the .zip file and run firefox.exe, Windows throws up some scary red box about how this program might not be safe to run...but lets me run it and everything seems to work fine after that.

Googling says Windows error 193 has something to do with things not being an executable, or something commonly associated with iTunes (bad libraries?).  Aaron or dmajor, do you know what might be going wrong here?
Flags: needinfo?(dmajor)
Flags: needinfo?(aklotz)
I get:

Assertion failure: cyclesDelta <= totalCyclesDelta, at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/toolkit/components/perfmonitoring/nsPerformanceStats.cpp:1222
TEST-UNEXPECTED-FAIL | dom/base/test/test_copypaste.xul | application terminated with exit code 1
PROCESS-CRASH | dom/base/test/test_copypaste.xul | application crashed [@ nsPerformanceStatsService::CommitGroup(unsigned __int64,unsigned __int64,unsigned __int64,unsigned __int64,bool,nsPerformanceGroup *)]
TEST-UNEXPECTED-FAIL | leakcheck | default process: missing output line for total leaks!

So the executable should have run for me, but something went wrong.
/js/src/vm/Stopwatch.cpp has "WINVER > 0x0601"-specific code paths. Is this related?
Flags: needinfo?(jdemooij)
Yeah, debug-only assertions...I bet this codepath was not tested before.
Yoric, could you look over the usage of the WINVER >= 0x601 codepaths in Stopwatch.cpp to see if they're still correct? That code comes from August 2015 in bug 1181175 and had never been active in CI, so it seems highly prone to bitrot.
Flags: needinfo?(dmajor) → needinfo?(dteller)
Gosh, I haven't touched that code in a while. Unfortunately, I'm currently on a deadline so I won't be able to investigate until early February.

Here are a few thoughts.
 
I don't *think* that the WINVER in Stopwatch.cpp is related, because it doesn't affect at all the data that's checked by the assertion. That should be relatively easy to test by replacing a `#if WINVER >= 0x600` with `#if 0` and checking if the crash still happens.

I believe that https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.cpp?q=path%3AStopwatch.cpp&redirect_type=single#411-434 is a more likely culprit, as it affects this data. Would the patch affect the #if in this snippet?

If so, I still don't understand how we could end up with a total that's smaller than one of its subtotals, but at least it would make more sense.
Flags: needinfo?(jdemooij)
Flags: needinfo?(dteller)
Flags: needinfo?(aklotz)
Flags: needinfo?(VYV03354)
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #12)
> I believe that
> https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.
> cpp?q=path%3AStopwatch.cpp&redirect_type=single#411-434 is a more likely
> culprit, as it affects this data. Would the patch affect the #if in this
> snippet?

That code would certainly be affected by this, yes.

> If so, I still don't understand how we could end up with a total that's
> smaller than one of its subtotals, but at least it would make more sense.

Possible causes:

* We are hitting the problem described at:

https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.cpp#311-321

...but we're doing it repeatedly on every single mochitest.  That seems kind of unlikely.

* These tests are run on virtual machines, right?  What if the processor configuration presented to Windows is stable, but the underlying processors for each of the virtual processors is not stable?  i.e. when we call getCycles the first time, we're running on CPU 1 from Windows' point of view, but CPU 5 from the hypervisor's point of view.  Then, for the second call to getCycles, we're still running on CPU 1 from Windows' point of view, but now we're running on CPU 6 from the hypervisor's point of view.  That would certainly produce incorrect results.

This also seems a bit odd, given that we'd have to wind up on different virtual processors for *every* mochitest run.  But it's also possible that the underlying Windows kernel is too trusting of the underlying "hardware"; here's a similar problem on Linux:

https://forums.aws.amazon.com/thread.jspa?messageID=221076&tstart=0

Maybe performance numbers are just silly to collect *during tests* regardless?  Can we just turn off the performance monitoring service during tests?
Flags: needinfo?(dteller)
If a fix isn't immediately obvious I think it would be reasonable to just disable the Win7 codepath until Yoric has time to look at it. That code was never running before so it would be no change over the state of things today. It doesn't seem right for dormant code to hold up other changes.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #12)
> Gosh, I haven't touched that code in a while. Unfortunately, I'm currently
> on a deadline so I won't be able to investigate until early February.
> 
> Here are a few thoughts.
>  
> I don't *think* that the WINVER in Stopwatch.cpp is related, because it
> doesn't affect at all the data that's checked by the assertion. That should
> be relatively easy to test by replacing a `#if WINVER >= 0x600` with `#if 0`
> and checking if the crash still happens.
> 
> I believe that
> https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.
> cpp?q=path%3AStopwatch.cpp&redirect_type=single#411-434 is a more likely
> culprit, as it affects this data. Would the patch affect the #if in this
> snippet?
> 
> If so, I still don't understand how we could end up with a total that's
> smaller than one of its subtotals, but at least it would make more sense.

I replaced |#if WINVER >= 0x600| and |#if WINVER >= _WIN32_WINNT_VISTA| with |#if 0|. (_WIN32_WINNT_VISTA == 0x600.) The crash is gone.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=58f67abecf09c48d63019d5c8ee8ee84beb07851
Flags: needinfo?(VYV03354)
(In reply to David Major [:dmajor] from comment #14)
> If a fix isn't immediately obvious I think it would be reasonable to just
> disable the Win7 codepath until Yoric has time to look at it. That code was
> never running before so it would be no change over the state of things
> today. It doesn't seem right for dormant code to hold up other changes.

OK, I'll disable it for now.
Comment on attachment 8822455 [details]
Bug 1325299 - Bump WINVER to 0x0601.

https://reviewboard.mozilla.org/r/101364/#review105042
Attachment #8822455 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8822456 [details]
Bug 1325299 - Bump _WIN32_IE to _WIN32_IE_IE80.

https://reviewboard.mozilla.org/r/101366/#review105046

The commit message should say _WIN32_IE_IE80. _WIN32_IE_WIN7 is not mentioned on https://msdn.microsoft.com/en-us/library/windows/desktop/aa383745(v=vs.85).aspx
Attachment #8822456 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8822457 [details]
Bug 1325299 - Don't explicitly set PSAPI_VERSION.

https://reviewboard.mozilla.org/r/101368/#review105048

The commit message is kind of confusing.
Attachment #8822457 - Flags: review?(mh+mozilla) → review+
> * These tests are run on virtual machines, right?  What if the processor configuration presented to Windows is stable, but the underlying processors for each of the virtual processors is not stable?  i.e. when we call getCycles the first time, we're running on CPU 1 from Windows' point of view, but CPU 5 from the hypervisor's point of view.  Then, for the second call to getCycles, we're still running on CPU 1 from Windows' point of view, but now we're running on CPU 6 from the hypervisor's point of view.  That would certainly produce incorrect results.

I think I got it.

- `totalCyclesDelta` is incremented whenever there is CPU usage in the topmost compartment *and* the execution of the topmost compartment stops on the same core as it started;
- each individual `cyclesDelta` is incremented whenever there is CPU usage in a compartment *and* the execution of the compartment stops on the same core as it started;
- however, with previous versions of Windows, the function to identify a core was not available, so the check was #ifdef-ed away.

It is therefore entirely possible that, at some point during the execution of a mochitest, the thread is rescheduled to another core in a way such that at least one compartment executes entirely on a core but the topmost compartment starts and stops on a different core.

Given that we're running on VMs that presumably run on timeshared servers, reschedulings are bound to be frequent, so it's hardly surprising that this always happens during the execution of mochitests. 

The simplest would probably be to throw away results if `cyclesDelta > totalCyclesDelta` for any of `cyclesDelta`. We should add a loop after https://dxr.mozilla.org/mozilla-central/source/toolkit/components/perfmonitoring/nsPerformanceStats.cpp#1179 to simply check if this happens and, if so, reset stuff without actually committing data.

> Maybe performance numbers are just silly to collect *during tests* regardless?  Can we just turn off the performance monitoring service during tests?

Well, the whole point of running them during tests is to make sure that we execute the same codepath as in released versions. So I'd vote for keeping them.
Flags: needinfo?(dteller)
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #24)
> The simplest would probably be to throw away results if `cyclesDelta >
> totalCyclesDelta` for any of `cyclesDelta`. We should add a loop after
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/
> perfmonitoring/nsPerformanceStats.cpp#1179 to simply check if this happens
> and, if so, reset stuff without actually committing data.

I believe this patch implements your suggestion.
Comment on attachment 8826318 [details]
Bug 1325299 - Disable untested code path in Stopwatch.(h|cpp).

https://reviewboard.mozilla.org/r/104262/#review105222

::: toolkit/components/perfmonitoring/nsPerformanceStats.cpp
(Diff revision 2)
>    // `recentTicks` from 0 to 1.  If we have `ticksDelta == 0` at
>    // this stage, we have already called `resetRecentData` but we
>    // haven't removed it from the list.
>    MOZ_ASSERT(ticksDelta != 0);
> -  MOZ_ASSERT(cyclesDelta <= totalCyclesDelta);
> +  if (cyclesDelta > totalCyclesDelta || cyclesDelta == 0 || totalCyclesDelta == 0) {
> -  if (cyclesDelta == 0 || totalCyclesDelta == 0) {

Mmmh... What happens if you simply move the assertion to after the `if() {}`?
Comment on attachment 8826318 [details]
Bug 1325299 - Disable untested code path in Stopwatch.(h|cpp).

https://reviewboard.mozilla.org/r/104262/#review105404

::: toolkit/components/perfmonitoring/nsPerformanceStats.cpp
(Diff revision 2)
>    // `recentTicks` from 0 to 1.  If we have `ticksDelta == 0` at
>    // this stage, we have already called `resetRecentData` but we
>    // haven't removed it from the list.
>    MOZ_ASSERT(ticksDelta != 0);
> -  MOZ_ASSERT(cyclesDelta <= totalCyclesDelta);
> +  if (cyclesDelta > totalCyclesDelta || cyclesDelta == 0 || totalCyclesDelta == 0) {
> -  if (cyclesDelta == 0 || totalCyclesDelta == 0) {

Let's see what's going on:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c2efad09750a338a9352fdeb8b2a2fe6944518f
Comment on attachment 8826318 [details]
Bug 1325299 - Disable untested code path in Stopwatch.(h|cpp).

https://reviewboard.mozilla.org/r/104262/#review105222

> Mmmh... What happens if you simply move the assertion to after the `if() {}`?

It had no effect:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c2efad09750a338a9352fdeb8b2a2fe6944518f
Blocks: 1331205
Bumping WINVER depends on us first dropping support for XP and Vista (bug 1130266), not the other way around.
Blocks: 1331171
No longer blocks: xp-eol
Depends on: xp-eol
Summary: Bump WINVER to 0x0601 → Bump WINVER to 0x0601 (Windows 7)
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #24)
> The simplest would probably be to throw away results if `cyclesDelta >
> totalCyclesDelta` for any of `cyclesDelta`. We should add a loop after
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/
> perfmonitoring/nsPerformanceStats.cpp#1179 to simply check if this happens
> and, if so, reset stuff without actually committing data.

Maybe.  But Kimura-san has come up with two patches trying to do "something simple" and they have not worked.  I maintain the simplest thing here would be to disable the Vista/Win7-or-above codepaths and continue using the ones we have now.  Holding up this bug for buggy, not-previously tested code means that using things defined only at WINVER >= 0x0601 is difficult, less performant, and/or clunky, and that's not a desirable situation.

Do you agree?
Flags: needinfo?(dteller)
(In reply to Nathan Froyd [:froydnj] from comment #34)
> Maybe.  But Kimura-san has come up with two patches trying to do "something
> simple" and they have not worked.

To be clear, attachment 8826318 [details] worked.

Yoric, this bug now blocks two bugs. Could you continue the review?
I'll try and do this today.
Comment on attachment 8826318 [details]
Bug 1325299 - Disable untested code path in Stopwatch.(h|cpp).

https://reviewboard.mozilla.org/r/104262/#review106978

r=me, but could you provide a multiline commit message explaining why you're making that change?
Attachment #8826318 - Flags: review?(dteller) → review+
Flags: needinfo?(dteller)
Comment on attachment 8826318 [details]
Bug 1325299 - Disable untested code path in Stopwatch.(h|cpp).

https://reviewboard.mozilla.org/r/104262/#review106978

I copy & pasted your comment #24.
Pushed by VYV03354@nifty.ne.jp:
https://hg.mozilla.org/integration/autoland/rev/dcf71e5e5fc3
Bump WINVER to 0x0601. r=glandium
https://hg.mozilla.org/integration/autoland/rev/c168221313d6
Bump _WIN32_IE to _WIN32_IE_IE80. r=glandium
https://hg.mozilla.org/integration/autoland/rev/933d06e4b567
Don't explicitly set PSAPI_VERSION. r=glandium
https://hg.mozilla.org/integration/autoland/rev/cac5baad14a1
If cyclesDelta > totalCyclesDelta, reset data without comitting instead of failing assertions. r=Yoric
Thank you both!
What about NTDDI_VERSION? Should we set that as well?
Flags: needinfo?(VYV03354)
(In reply to Aaron Klotz [:aklotz] from comment #45)
> What about NTDDI_VERSION? Should we set that as well?

No, we don't have to. NTDDI_VERSION will be deduced from WINVER.
Flags: needinfo?(VYV03354)
(In reply to Wes Kocher (:KWierso) from comment #46)
> I had to back these out for failures like
> https://treeherder.mozilla.org/logviewer.html#?job_id=70809785&repo=autoland
> 
> https://hg.mozilla.org/integration/autoland/rev/41d8ef56d03b

Again, Win7 VM only :(
I'll disable the new code path for now. The long untested code path should not block unrelated bugs and should be fixed separately.
(In reply to Masatoshi Kimura [:emk] from comment #48)
> I'll disable the new code path for now. The long untested code path should
> not block unrelated bugs and should be fixed separately.

I've filed bug 1332819.
Attachment #8826318 - Flags: review?(jdemooij) → review?(dteller)
Comment on attachment 8826318 [details]
Bug 1325299 - Disable untested code path in Stopwatch.(h|cpp).

https://reviewboard.mozilla.org/r/104262/#review108570

r=me with a commit message and a comment in the code explaining what's going on.
Attachment #8826318 - Flags: review?(dteller) → review+
Pushed by VYV03354@nifty.ne.jp:
https://hg.mozilla.org/integration/autoland/rev/6d204baac7a0
Bump WINVER to 0x0601. r=glandium
https://hg.mozilla.org/integration/autoland/rev/1745d44fbe8b
Bump _WIN32_IE to _WIN32_IE_IE80. r=glandium
https://hg.mozilla.org/integration/autoland/rev/5dca98446f3f
Don't explicitly set PSAPI_VERSION. r=glandium
https://hg.mozilla.org/integration/autoland/rev/c99f1f911660
Disable untested code path in Stopwatch.(h|cpp). r=Yoric
Blocks: 1334665
Duplicate of this bug: 1119116
Blocks: xp-eol
No longer depends on: xp-eol
Blocks: 1336879
Blocks: 1337266
Blocks: 1337299
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.