Closed Bug 1076446 Opened 5 years ago Closed 5 years ago

Test DMD on TBPL (Windows)

Categories

(Core :: DMD, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: njn, Assigned: njn)

References

Details

Attachments

(1 file, 3 obsolete files)

Bug 1073312 will add a DMD test, but not run it on Windows. The immediate problem is that the test uses /usr/bin/diff which doesn't work on Windows. And there might be other problems after that.
Depends on: 1077230
On try it fails like this:

> Assertion failure: gSmallBlockActualSizeCounter == 120, at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/memory/replace/dmd/DMD.cpp:2080

This is exactly the same failure that Catalin saw in bug 819839 comment 18.
Back then he said that some of the malloc calls were ending up in the system
malloc instead of the replace_malloc one; I wonder if that's still the problem.
It's a plausible explanation, and I can't think of anything else that would
cause this assertion failure.
I've confirmed that replacing the malloc() calls in RunTestMode() in DMD.cpp with replace_malloc() calls fixes the assertion and lets DMD's test mode run to completion. So it's definitely a problem with the malloc replacement.

glandium, any idea what might be wrong? Is there some reason why malloc() calls within DMD.cpp itself wouldn't get replaced on Windows?
Flags: needinfo?(mh+mozilla)
> Is there some reason why malloc() calls within DMD.cpp itself wouldn't get replaced on Windows?

Because you're not supposed to call malloc from a replace malloc
library. You're just lucky that it happens to work on linux and mac
because of how things are, but that wouldn't work on android or
windows.

If you want to call plain malloc, you need to extract that test code
from the dmd lib and put it elsewhere. If you only care about testing
the dmd code itself and don't care about testing the replace-malloc
infrastructure itself, you can just call replace_malloc functions.
Flags: needinfo?(mh+mozilla)
> Because you're not supposed to call malloc from a replace malloc
> library. You're just lucky that it happens to work on linux and mac
> because of how things are, but that wouldn't work on android or
> windows.

Fair enough.

> If you want to call plain malloc, you need to extract that test code
> from the dmd lib and put it elsewhere. If you only care about testing
> the dmd code itself and don't care about testing the replace-malloc
> infrastructure itself, you can just call replace_malloc functions.

Does replace-malloc have any other testing? If so, then the latter option would be ok.
(In reply to Nicholas Nethercote [:njn] from comment #4)
> Does replace-malloc have any other testing? If so, then the latter option
> would be ok.

Currently no, but it's probably better to add testing for that than to jump into hoops to have that done somehow by the DMD test.
I've gotten further, but now I'm hitting the problem that DMD is getting empty stacks on Windows on try:

> Live {
>   30 blocks in heap block record 1 of 1
>   12,088 bytes (7,870 requested / 4,218 slop)
>   100.00% of the heap (100.00% cumulative)
>   Allocated at {
>   }
> }

There are supposed to be stack frame descriptions in the "Allocated at" block. Hmm.
> I've gotten further, but now I'm hitting the problem that DMD is getting
> empty stacks on Windows on try:

I've worked out why. We're hitting this timeout in EnsureWalkThreadReady():
http://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsStackWalk.cpp#308

The problem is that DMD's test mode runs extremely early -- as soon as the first call to malloc occurs. This is probably while static initializers are running, before we reach main(). This is too early for the Windows stack-walking thread to have started, so we get empty stack traces.

Catalin hit exactly this problem in bug 819839 and changed the code so that it would return empty stack traces instead of deadlocking in this case. Which is ok for general use, but no good for the test as it's currently implemented.

Probably the easiest thing is to find a way to make DMD's test mode run later. I can do that by creating a JS chrome function that triggers it, and calling that from the xpcshell test. That's arguably nicer than the magic start-up thing in place anyway.
Depends on: 1081011
Here's a draft patch.

- It moves the DMD test code out of DMD.cpp, into test/SmokeDMD.cpp.

- It executes |SmokeDMD| from the xpcshell test.

- It removes DMD's --mode option, because it's no longer needed.

- It enables the test on Windows.

On Linux and Mac it works nicely. But there are problems on Windows on try
(https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=38ef0222592f).

- On WinXP and Win7, SmokeDMD appears to hang silently, which causes
  runxpcshelltests.py to time out.

- On Win8, SmokeDMD aborts or crashes silently, causing the process.run() call
  to fail with NS_ERROR_FILE_EXECUTION_FAILED.

I'm not sure what the problem is.
> - On WinXP and Win7, SmokeDMD appears to hang silently, which causes
>   runxpcshelltests.py to time out.
> 
> - On Win8, SmokeDMD aborts or crashes silently, causing the process.run()
> call
>   to fail with NS_ERROR_FILE_EXECUTION_FAILED.

On Win8 the failures vary. In https://mail.google.com/mail/u/0/#search/tryserver/1490c783401bd749 the failure was that the IsRunning() assertion at the start of SmokeDMD.cpp failed. This indicates that dmd::Init() hasn't been called, which indicates that something is going wrong with replace-malloc, because there's a malloc call just before that assertion which should trigger dmd::Init() being called.

On another Win8 try run I saw aborts coming from jemalloc.

And on the try run mentioned in comment 8 there was an unspecified abort or crash from SmokeDMD.
> On another Win8 try run I saw aborts coming from jemalloc.

Nb: in these ones dmd::Init() was definitely called, and the crash occurred some time after that. It's all very unpredictable.
This is green on Windows, both locally and on try.

Most of the patch is taken up by moving the test code from DMD.cpp to
SmokeDMD.cpp. Only very minor changes were made to that code as part of the
move.
Attachment #8506008 - Flags: review?(mh+mozilla)
Attachment #8503913 - Attachment is obsolete: true
Comment on attachment 8506008 [details] [diff] [review]
Make the DMD test work on Windows

Review of attachment 8506008 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/replace/dmd/DMD.cpp
@@ -2058,5 @@
> -  for (int i = 0; i < seven + 8; i++) {
> -    s = (char*) replace_malloc(8);
> -    UseItOrLoseIt(s, seven);
> -  }
> -  MOZ_ASSERT(gSmallBlockActualSizeCounter == 120);

Do you lose something by not having those asserts in some form in the moved code?

::: memory/replace/dmd/dmd.py
@@ +502,5 @@
>  
>  
>  if __name__ == '__main__':
>      main()
> +

No need to add an empty line :)

::: memory/replace/dmd/test/SmokeDMD.cpp
@@ +96,5 @@
> +  // certainly doesn't know this.
> +  //
> +  int n = (rand() % 10 + 1) * 9;
> +  int nine = (n / 10) + (n % 10);
> +  int seven = nine - 2;

You don't need this if you build this with -O0.

@@ +248,5 @@
> +  free(c);
> +  free(e);
> +  Report(e2);
> +  free(e3);
> +

Maybe copy the commented free() calls corresponding to the commented cases above?

::: memory/replace/dmd/test/moz.build
@@ +17,5 @@
> +DEFINES['MOZ_NO_MOZALLOC'] = True
> +
> +DISABLE_STL_WRAPPING = True
> +
> +USE_LIBS += ['dmd'] 

trailing whitespace.

::: memory/replace/dmd/test/test_dmd.js
@@ +53,5 @@
> +      data += str.value;
> +    } while (read != 0);
> +  }
> +  cstream.close();                // this closes fstream
> +  return data.replace(/\r/g, ""); // normalize line endings

It probably would be better to use OS.File, although I don't know how you can do pseudo-synchronous things with it. https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread#OS.File.read

::: testing/mochitest/Makefile.in
@@ +64,5 @@
>  
>  ifdef MOZ_DMD
> +TEST_HARNESS_BINS += \
> +  dmd.py \
> +  SmokeDMD$(BIN_SUFFIX)

SmokeDMD$(BIN_SUFFIX) \
$(NULL)

::: toolkit/mozapps/installer/packager.mk
@@ +601,5 @@
>  endif
>  
> +ifdef MOZ_DMD
> +NO_PKG_FILES += \
> +        SmokeDMD

Put it all on one line, or do
        SmokeDMD \
        $(NULL)
Attachment #8506008 - Flags: review?(mh+mozilla) → feedback+
(In reply to Mike Hommey [:glandium] from comment #12)
> ::: memory/replace/dmd/test/test_dmd.js
> @@ +53,5 @@
> > +      data += str.value;
> > +    } while (read != 0);
> > +  }
> > +  cstream.close();                // this closes fstream
> > +  return data.replace(/\r/g, ""); // normalize line endings
> 
> It probably would be better to use OS.File, although I don't know how you
> can do pseudo-synchronous things with it.
> https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.
> File_for_the_main_thread#OS.File.read

Yoric, would you have some advice here?
Flags: needinfo?(dteller)
> Do you lose something by not having those asserts in some form in the moved
> code?

Not really. If anything goes wrong the output will be different to what is expected.
> It probably would be better to use OS.File, although I don't know how you can do pseudo-synchronous things with it.

I deliberately avoided it precisely because synchronous file-handling is appropriate here, and OS.File is entirely designed around asynchronous handling. No point trying to fit a square peg in a round hole when there is a square hole available that works just as well.
I agree with Nicholas. If what you need is synchronous, don't use OS.File.
Flags: needinfo?(dteller)
This updated patch addresses all comments, except I haven't used OS.File, and I
didn't do anything about the following suggestion because I didn't understand
it:

> Maybe copy the commented free() calls corresponding to the commented cases above?
Attachment #8509739 - Flags: review?(mh+mozilla)
Attachment #8506008 - Attachment is obsolete: true
Updated version that adds the commented-out free() calls back in, and tweaks
the commented-out aligned_alloc() call to be more like the others.
Attachment #8509926 - Flags: review?(mh+mozilla)
Attachment #8509739 - Attachment is obsolete: true
Attachment #8509739 - Flags: review?(mh+mozilla)
Comment on attachment 8509926 [details] [diff] [review]
Make the DMD test work on Windows

Review of attachment 8509926 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/replace/dmd/test/SmokeDMD.cpp
@@ +88,5 @@
> +  // eliding unused malloc() calls or unrolling loops with fixed iteration
> +  // counts. So we compile it with -O0 (or equivalent), which probably prevents
> +  // that. We also use the following variable for various loop iteration
> +  // counts, just in case compilers might unroll very small loops even with
> +  // -O0.

Compilers don't unroll anything at -O0.

I'm still not convinced by the "seven" thing, but meh.
Attachment #8509926 - Flags: review?(mh+mozilla) → review+
And we recently change the Windows 8 builds to be 64-bit, and that breaks the test yet again:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=68627a877b6e

There are lots of "ERROR: WalkStack64: The handle is invalid." messages.

Sigh.
I ended up disabling the test on Windows 8 just so I could land this and the four other patches I had in my queue following it.

https://hg.mozilla.org/integration/mozilla-inbound/rev/835fbe63da4a

I filed bug 1088343 as a follow-up.
(In reply to Carsten Book [:Tomcat] from comment #22)
> sorry had to back this out in
> https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-
> inbound&revision=de805196bbc4 since one of this changes caused perma failure
> on 10.8 Debug Tests like
> https://treeherder.mozilla.org/ui/logviewer.
> html#?job_id=3275566&repo=mozilla-inbound

Hey glandium. Looks like some compilers do unroll some loops with -O0. Argh.

Even more annoying: I would have caught this before landing if my Mac xpcshell try run hadn't taken over 26 hours to run, which was long enough that I gave up on it.
(In reply to Nicholas Nethercote [:njn] from comment #23)
> (In reply to Carsten Book [:Tomcat] from comment #22)
> > sorry had to back this out in
> > https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-
> > inbound&revision=de805196bbc4 since one of this changes caused perma failure
> > on 10.8 Debug Tests like
> > https://treeherder.mozilla.org/ui/logviewer.
> > html#?job_id=3275566&repo=mozilla-inbound
> 
> Hey glandium. Looks like some compilers do unroll some loops with -O0. Argh.

That's hard to believe. I'd like to see the disassembly.
My mistake; it appears that when I have two calls to Report() on consecutive lines, clang is causing them to have identical stack traces. If I artfully insert another function call between the Report() calls, then they end up with different stack traces as expected.
https://hg.mozilla.org/mozilla-central/rev/b339acb1f7fe
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.