Closed Bug 751842 Opened 12 years ago Closed 12 years ago

Import editing spec tests

Categories

(Core :: DOM: Editor, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15

People

(Reporter: ayg, Assigned: ayg)

References

Details

Attachments

(6 files)

I have lots and lots and lots of editing spec tests.  The per-command split-up version can be found here:

http://dvcs.w3.org/hg/editing/raw-file/tip/conformancetest/splitruntest.html

The full version would be very helpful to have checked in, like the browserscope tests.  The problem is, it takes approximately forever to run in a debug build -- it's a few minutes to run the whole thing in an opt build.  Also, the test data in current form is about 5M (browserscope is 2M).  Still, I'll experiment and see if I can get it working.  It will mean we don't have to write separate tests for lots of bugs like bug 748303.
Flags: in-testsuite-
The tests are not so fast to run on my non-optimized debug build:

real    17m31.234s
user    12m7.921s
sys     1m13.909s

The tests themselves are about 5M, almost all data.js (the list of tests and expected results).  But the list of expected failures is 22M.  I don't think that's very practical.  :(  I could try cutting down on the number of tests, and/or just fixing the failures!  But for now I don't think this is so workable.
Assignee: ayg → nobody
Status: ASSIGNED → NEW
Firstly note that we build our debug builds with --enable-optimize.  Can you please test with such a build locally?  It should be much faster.

But that being said, even if the tests would get 17 minutes to run, I'd definitely want them in the tree.  We're already spending a ton of time running tests, and we can surely afford this extra time (just think of the extra coverage it gives us!).  :-)

Also, don't worry at all about the on-disk size of the tests.  We've never been optimizing for a small code repository.  :-)
hg gives a warning when I try to hg add the 22M file, saying it might use up a lot of RAM.  I think I want to break it up into multiple smaller files, at least.
That sounds good to me.
Okay, patch is almost ready.  I just have to debug a bit.
Assignee: nobody → ayg
Status: NEW → ASSIGNED
Depends on: 647323
Flags: in-testsuite- → in-testsuite+
This is a silly little fix -- it isn't really significant, but the lack of it wasted like two minutes of my time, so why not.
Attachment #621514 - Flags: review?(jhammel)
Again, this is just something that I wasted a bit of time on while doing this, so I thought I'd fix it.  The harness was saying "TEST-UNEXPECTED-FAIL | Test finished" with no info as to what went wrong.  This makes it slightly more informative by outputting "status 1" or "status 2" or whatever.
Attachment #621516 - Flags: review?(jhammel)
Some of my tests are very long-running, so testharness.js reported a timeout.  It actually seemed to work fine, but I got a fail on "Test finished" with code 2 (TIMEOUT).  I think this is a testharness.js bug of some kind -- but in any event, our test runner has its own timeouts built in, and we don't need testharness.js setting additional timeouts.
Attachment #621517 - Flags: review?(jhammel)
The editing tests have files in the repo root.  This causes importTestsuite.py to get confused and output "//" in various places.  The extra slash should be dropped if the path is empty.

This code is perhaps not the most elegant, but it seems to work . . .
Attachment #621518 - Flags: review?(jhammel)
Attachment #621518 - Flags: feedback?(Ms2ger)
Attachment #621514 - Flags: feedback?(Ms2ger)
Attachment #621516 - Flags: feedback?(Ms2ger)
Attachment #621517 - Flags: feedback?(Ms2ger)
This patch has lots of data stuff removed -- both data.js (the stuff that generates the tests) and the expected fails.  This is so it's under Bugzilla's 4M patch limit.  If you want the full patch, you can get it from try: https://tbpl.mozilla.org/?tree=Try&rev=00e9cc20b8db

Asking for feedback from Ms2ger on the imptests/ integration.  Although if you have comments on the tests, that's fine too.

Note: my tests currently refer to http://w3c-test.org/resources/testharness.js instead of using a relative URL.  find editing -type f -exec sed 's!http://w3c-test.org!!' {} + fixes it.  Currently this is an extra manual step after running importTestsuite.py.  If someone fixes dvcs.w3.org so that links to /resources work there, I'll update my test suite and this will no longer be necessary.
Attachment #621520 - Flags: review?
Attachment #621520 - Flags: feedback?
Hmm, I'm getting some mochitest failures on debug builds, manifesting as timeouts.  Like this on one box:

JavaScript error: http://mochi.test:8888/tests/dom/imptests/editing/selecttest/test_collapse.html, line 57: testRanges is undefined

Others have test_event.html failing.  Not sure why, from the description.  The longest-running file (test_runtest.html) seems to be only 160 seconds or so on debug tinderbox builds.  I'll investigate if no one else can spot the problem first.
Comment on attachment 621518 [details] [diff] [review]
Patch part 4, v1 -- Update importTestsuite.py to support tests in repo root

Review of attachment 621518 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good enough; you probably noticed this isn't the most elegant code :)

::: dom/imptests/importTestsuite.py
@@ +72,3 @@
>        importDirs(thissrcdir, dest, ["%s/%s" % (d, subdir) for subdir in subdirs])
> +    elif len(subdirs):
> +      importDirs(thissrcdir, dest, subdirs)

I think that

if len(subdirs):
  if d:
  else:

would be clearer, or maybe

if len(subdirs):
  importDirs(thissrcdir, dest, ["%s/%s" % (d, subdir) for subdir in subdirs] if d else subdirs)

I think I prefer the former.

(Unrelatedly, since [] is falsy in python, it could be 'if subdirs:'; not sure if that's better or worse.)

@@ +124,5 @@
>    for d in directories:
> +    if d:
> +      path = "%s/%s" % (dest, d)
> +    else:
> +      # Empty directory, i.e., the repository root

Probably worth adding this comment in the other function too.
Attachment #621518 - Flags: feedback?(Ms2ger) → feedback+
Comment on attachment 621514 [details] [diff] [review]
Patch part 1, v1 -- Make testharnessreport.js depend on failures.txt

Review of attachment 621514 [details] [diff] [review]:
-----------------------------------------------------------------

Yes. Thanks.
Attachment #621514 - Flags: feedback?(Ms2ger) → feedback+
Comment on attachment 621516 [details] [diff] [review]
Patch part 2, v1 -- Report status code on unsuccessful test finish

Review of attachment 621516 [details] [diff] [review]:
-----------------------------------------------------------------

+1
Attachment #621516 - Flags: feedback?(Ms2ger) → feedback+
Comment on attachment 621517 [details] [diff] [review]
Patch part 3, v1 -- Ignore timeouts in testharness.js tests

Review of attachment 621517 [details] [diff] [review]:
-----------------------------------------------------------------

Works for now, but can you poke jgraham to add a parameter to disable timeouts entirely?
Attachment #621517 - Flags: feedback?(Ms2ger) → feedback+
Comment on attachment 621520 [details] [diff] [review]
Patch part 6, v1 -- Import editing spec tests

Review of attachment 621520 [details] [diff] [review]:
-----------------------------------------------------------------

I really don't want to review the tests themselves; the glue seems to have worked.

::: dom/imptests/editing/css/reset.css
@@ +23,5 @@
> + * http://www.w3.org/Bugs/Public/show_bug.cgi?id=13330 */
> +:link, :visited { color: blue }
> +/* http://www.w3.org/Bugs/Public/show_bug.cgi?id=14066
> + * https://bugs.webkit.org/show_bug.cgi?id=68392 */
> +quasit { text-align: inherit }

(How silly.)

::: dom/imptests/editing/selecttest/test_getSelection.html
@@ +6,5 @@
> +<script>
> +"use strict";
> +
> +// TODO: Figure out more places where defaultView is or is not guaranteed to be
> +// null, and test whether getSelection() is null.

(DOMParser, maybe)

::: dom/imptests/editing/tests.js
@@ +17,5 @@
> +			document.body.insertBefore(warningDiv, document.body.firstChild);
> +			warningDiv.style.fontWeight = "bold";
> +			warningDiv.style.fontSize = "2em";
> +			warningDiv.style.color = "red";
> +			warningDiv.innerHTML = 'Your browser suffers from an <a href="http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1028">egregious bug</a> in range mutation that will give incorrect results for the spec columns in many cases.  To ensure that the spec column contains the output actually required by the spec, use a different browser.';

Heh.
Attachment #621520 - Flags: feedback? → feedback+
(In reply to Aryeh Gregor from comment #11)
> Hmm, I'm getting some mochitest failures on debug builds, manifesting as
> timeouts.  Like this on one box:
> 
> JavaScript error:
> http://mochi.test:8888/tests/dom/imptests/editing/selecttest/test_collapse.
> html, line 57: testRanges is undefined
> 
> Others have test_event.html failing.  Not sure why, from the description. 
> The longest-running file (test_runtest.html) seems to be only 160 seconds or
> so on debug tinderbox builds.  I'll investigate if no one else can spot the
> problem first.

I seem to remember hitting something similar on your range tests; I think it was because you sometimes call setup(), but that did nothing because you called it too late.
Attachment #621514 - Flags: review?(jhammel) → review+
Attachment #621516 - Flags: review?(jhammel) → review+
Attachment #621517 - Flags: review?(jhammel) → review+
Comment on attachment 621518 [details] [diff] [review]
Patch part 4, v1 -- Update importTestsuite.py to support tests in repo root



+      sourcedir = "hg-%s" % (dest, )

Should just be "hg-%s" % dest
Attachment #621518 - Flags: review?(jhammel) → review+
Comment on attachment 621520 [details] [diff] [review]
Patch part 6, v1 -- Import editing spec tests

<jhammel> Ms2ger: if you want to steal my remaining review on bug 751842 that is a-okay by me
Attachment #621520 - Flags: review? → review+
(In reply to Ms2ger from comment #12)
> I think that
> 
> if len(subdirs):
>   if d:
>   else:
> 
> would be clearer . . .
> . . .
> Probably worth adding this comment in the other function too.

Okay, fixed.

(In reply to Ms2ger from comment #15)
> Works for now, but can you poke jgraham to add a parameter to disable
> timeouts entirely?

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16992

(In reply to Ms2ger from comment #17)
> I seem to remember hitting something similar on your range tests; I think it
> was because you sometimes call setup(), but that did nothing because you
> called it too late.

Hmm, thanks for the tip.  I'll look into it.

(In reply to Jeff Hammel [:jhammel] from comment #18)
> +      sourcedir = "hg-%s" % (dest, )
> 
> Should just be "hg-%s" % dest

Okay, fixed.


I'll push as soon as I can figure out what's causing the tinderbox failures.
I'm not sure what the story is with the timeouts.  It looks like Ms2ger's original patch didn't correctly handle the timeouts, calling a nonexistent function, so once there's one timeout, it's followed by three more in the same test and the test run ends.  But I don't get what's causing the original timeout in test_event.html.  Anyone have any suggestions for what to look at?
Okay, so these patches drastically slow down all imported tests, because currently the generated testharnessreport.js contains all known fails for all tests, and it's reloaded for every test.  This causes everything to time out because of the giant known-fail list for test_runtest.html.  Ms2ger and I discussed it on IRC, and I think as a temporary workaround, we should ignore test_runtest.html for debug tinderbox builds -- both in building testharnessreport.js and actually running it.  That should solve the immediate problem.
I'm not familiar with this test framework, but I'm missing why we can't avoid loading expected failures which are not related to the editing tests elsewhere...
We can -- it might even be less work.  I'll see what I can do.
I don't like this very much, but it works.  I adjusted the final patch to fit on top of this, but it didn't see much in the way of substantive changes, so I'll carry forward the r+.  A try run suggests that this series now more or less works (orange builds are infra issues): https://tbpl.mozilla.org/?tree=Try&rev=6a617ff04355

I want to do at least one more try push before actually trying to land this, because all the infra orange on my last push makes me irrationally nervous.  :)  So here: https://tbpl.mozilla.org/?tree=Try&rev=6c571455eef4
Attachment #623966 - Flags: review?(jhammel)
Attachment #623966 - Flags: feedback?(Ms2ger)
Attachment #621520 - Attachment description: Patch part 5, v1 -- Import editing spec tests → Patch part 6, v1 -- Import editing spec tests
Comment on attachment 623966 [details] [diff] [review]
Patch part 5, v1 -- Put testharness.js expected fails in one file per test

Review of attachment 623966 [details] [diff] [review]:
-----------------------------------------------------------------

Please update dom/imptests/README and test this patch on the comm-central tryserver (<https://wiki.mozilla.org/Thunderbird/Infrastructure/TryServer>) as well.
I'm getting a consistent failure on Win debug, but the log is too large and it gets truncated, hiding the error.  It exits with code 0, and everything up through most of test_collapse.html is fine.  How should I debug that?  E.g.: <https://tbpl.mozilla.org/php/getParsedLog.php?id=11759840&tree=Try>.  Philor noted that if bug 439258 were fixed, it would probably be short enough, but that's a lot easier said than done.
So the test failure is just due to length, and Philor was rather emphatic about not increasing the 50 MB log limit.  Ehsan, Ms2ger, Jeff, any suggestions?  roc suggested maybe I should just change the assertion for bug 439258 so that it turns into an NS_WARNING after firing 100 times or something.  Of course, this will mean it will crop up again pretty soon when we import more tests -- I think increasing the tbpl log limit to 100 MB would make more sense.  (But I'm biased, since it means less work for me!)
Given that nothing is happening in bug 439258, I think we should just turn it in a warning until what it asserts is actually true.

I'm not convinced that increasing the maximal log size is a good idea; maybe it would help if we split mochitests in 6 instead of 5 chunks... jhammel, would that be hard?
Comment on attachment 623966 [details] [diff] [review]
Patch part 5, v1 -- Put testharness.js expected fails in one file per test

I haven't ran this but it looks good to me
Attachment #623966 - Flags: review?(jhammel) → review+
More chunks is simple, it's a matter of changing two numbers. But it won't help you. An existing M2 Win debug log is 4.4MB, with these tests added it's over 50MB. Maybe your output is 46.7MB, maybe it's 96.5MB.

Best case, it's 46.7MB. The flag to run chunked mochitests is --chunk-by-dir, it just counts up the number of directories in _tests/testing/mochitest/ and divides by the number you pass. So whether it's 5 or 6, you have to add enough fake test directories to have a chunk pretty much to yourself. Doing so would be a scumbag move, because then somewhere down the line someone will add a directory far away from dom, one which runs in M5, that will tip the total number of directories over to where something else runs in your chunk, and someone adding tests to M5 will cause M2 to fail.

I don't understand why the problem would crop up again after changing the primary assertion to a warning after 100 times: do you expect adding more tests to hit other assertions hundreds of thousands of times, or to hit the assertion-turned-warning so many times that just the single line of warning will overflow? We're not currently right up against the limit - in general, mochitest logs are well under 50MB, except for M4 which has several CSS tests with logorrhea, and editor/, which you would also be helping by turning 1742 "anonymous nodes should not be in child lists" assertions with deep and useless stacks into 100 useless assertions and 1642 useless warnings without the deep and useless stacks.

Increasing the maximum log size is also simple, one number in http://mxr.mozilla.org/build/source/buildbot-configs/mozilla/master_common.py#7. It's also a terrible thing to do: the debug mochitest-4 logs are a handy proxy for what your logs would be like, and they are miserable - slow and prone to hanging while loading, horrible to scroll, impossible to find anything in, the first thing to break tbpl when ftp.m.o is slow and under load.
Aryeh, I've meant to take out that assertion for a while now, but never got around to do it.  We have a pretty good idea about that broken-ness, and that assertion doesn't tell us anything new.  It makes the test runs very slow as well, as we walk the stack a ton of times as we're hitting that assertion over and over again.

Can you please file a new bug to convert that into a warning?
Depends on: 756045
(In reply to Ms2ger from comment #29)
> Given that nothing is happening in bug 439258, I think we should just turn
> it in a warning until what it asserts is actually true.

(In reply to Ehsan Akhgari [:ehsan] from comment #32)
> Aryeh, I've meant to take out that assertion for a while now, but never got
> around to do it.  We have a pretty good idea about that broken-ness, and
> that assertion doesn't tell us anything new.  It makes the test runs very
> slow as well, as we walk the stack a ton of times as we're hitting that
> assertion over and over again.
> 
> Can you please file a new bug to convert that into a warning?

Bug 756045.  New try push: <https://tbpl.mozilla.org/?tree=Try&rev=e532313de03b>.  comm-central try push: <https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=a82731c0df33> (I might have done this wrong and wouldn't be surprised if it totally fails for some reason).
Attachment #623966 - Flags: feedback?(Ms2ger)
Depends on: 886301
Depends on: 908879
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: