Closed
Bug 818576
Opened 12 years ago
Closed 12 years ago
Tests for "Android 2.2 debug" (on cedar)
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gbrown, Assigned: Callek)
References
Details
Attachments
(1 file)
2.74 KB, patch (kmoir: review+)
We have "Android 2.2 debug" builds on tbpl, but no tests are run against them. Tests run against debug builds might provide better diagnostics for some of our on-going intermittent failures.
It looks like there is some history to this issue; see bug 693015.
Updated•12 years ago
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Reporter
Comment 1•12 years ago
Someone smart recently suggested that we might move this forward by allowing debug tests to be requested on try, warts and all, leaving the bigger task of getting all-green debug test runs on m-i, etc. for another day.
Comment 2•12 years ago
(In reply to Geoff Brown [:gbrown] from comment #1)
> Someone smart recently suggested that we might move this forward by allowing
> debug tests to be requested on try, warts and all, leaving the bigger task
> of getting all-green debug test runs on m-i, etc. for another day.
This would be fantastic. A 'matching' Try run (from trychooser) could skip these, but it'd be great to be able to run them if needed.
Comment 3•12 years ago
That is fine, but 100% of the tests will time out, which means we will not get full coverage. How will getting debug tests help us? The mochitest.log file looks the same, and logcat only captures a small portion of the data since it is a LIFO buffer.
Comment 4•12 years ago
Debug tests are really important because throughout the code, there are assertions and #ifdef DEBUG snippets which work to guarantee that the code is functioning and being used properly. Especially since fennec and b2g sometimes use unique codepaths (which aren't used by any desktop platform), this mobile-only code is getting no DEBUG testing, and therefore no assertions and no extra debug-only checks.
This means that when things go wrong, they go off the rails instead of hitting an assertion at a useful stage. Tests coming back green means things appear to be working, but we could actually be skipping important assertions. Tests can also come back green intermittently when they would have gone orange from hitting asserts; running a DEBUG build locally, I've had this happen a couple of times.
We need this coverage.
Comment 5•12 years ago
That is good to know. To make tests report green, we will have to modify the harnesses and the buildbot runners. The tests run far too long, and we have timers that kill a test after one hour; as we run things now, the tests would run for 2+ hours. I believe on Android the buildbot timer kills a job at 2 hours, while for mochitest the harness kills it after an hour.
Raising these timeouts across the board could eat up a lot of machine time if somebody checked in a bad change (imagine 400 jobs sitting around for an extra hour because we made the timeout longer).
There are a couple solutions I can think of for this:
1) adjust the timeouts based on a --debugBuild flag to the harness
2) adjust buildbot to run smaller chunks (15 chunks for mochitest instead of 8, etc...)
3) selectively run tests we really really care about in debug mode and ignore the rest.
4) all of the above
we will need releng to work on this as I don't have access or the ability to turn on debug tests and make necessary buildbot adjustments to tweak how we run the tests.
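A minimal sketch of what option (1) could look like on the harness side. The flag name, base values, and multiplier are illustrative, not an existing mochitest option, and buildbot's own job-killer timeout would still need a matching bump:

from optparse import OptionParser

parser = OptionParser()
parser.add_option('--debugBuild', action='store_true', dest='debugBuild',
                  default=False,
                  help='build under test is a debug build; scale timeouts up')
(options, args) = parser.parse_args()

# Hypothetical base values; debug builds get a longer leash so slow
# (but still progressing) runs are not killed at the opt-build limit.
per_test_timeout = 300   # seconds
max_run_time = 3600      # the one-hour harness kill timer mentioned above

if options.debugBuild:
    per_test_timeout *= 3
    max_run_time *= 3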
Comment 6•12 years ago
Do we actually do something vaguely approximating useful with assertions now, something that we did not do the last time we ran debug tests on Android? You know, like, log them within a thousand lines of where they actually happened?
Comment 7•12 years ago
Fatal assertions are logged fine: https://tbpl.mozilla.org/php/getParsedLog.php?id=18580536&tree=Try
I don't know about for non-fatal assertions.
Comment 8•12 years ago
Yeah, that's a nice looking MOZ_ASSERT in a desktop build.
How would you feel about that only having the crash and the crash stack, and having the "Assertion failure: dest->GetSize() == src->Size(), at ../../../gfx/gl/GLScreenBuffer.cpp:422" line only appear about 400 lines below the bottom of the stack, so rather than an assertion failure it looks like an actual crash, and only two or three people involved in the project will know where to look to see that it was really an assertion failure and what assertion it was?
And you can only have even that unhappy state for mochitests, not for crashtests or reftests, since they count assertions per-test, and the only reason your MOZ_ASSERT would be sure to appear 400 lines below is because you're crashing and thus cutting off the partial logcat and ensuring that you'll be there in the relatively few lines that get printed. For crashtests and reftests, you'd have a test failure, "assertion count 3 is more than expected 1 to 2" with absolutely no way of knowing what assertions failed unless you happened to be looking at one of the last tests in the run.
Comment 9•12 years ago
While what philor proposes isn't ideal, Jeff's right that this stuff is important. It's high-leverage for automatically finding bugs closer to the time they're landed, when they're much cheaper to fix, because the introducer won't have swapped out the relevant mental context.
So any step that moves us in the right direction seems worth doing to me.
Comment 10•12 years ago
releng folks (armenzg, callek, kmoir): I have outlined some issues with running debug tests in automation (in comment #5, https://bugzilla.mozilla.org/show_bug.cgi?id=818576#c5); we need resolutions to these before moving forward.
I would personally prefer the greater chunking method as compared to longer timeouts. The longer a test runs, the greater the chance that we have a hardware or connectivity issue.
Can the developers here give us a top priority list for which tests (not test suites) to run? Do we care about dom-level* (which never fail)? Do we care about crashtests or jsreftests?
We are doing OK with our tegra pool now; turning on debug tests will only increase wait times and cause more delay. Should these run only on inbound and try? Can we limit the number of jobs per push to 10? With the volume on inbound, that seems like enough to give us results without taking out the entire pool.
Assignee
Comment 11•12 years ago
(In reply to Joel Maher (:jmaher) from comment #5)
> There are a couple solutions I can think of for this:
> 1) adjust the timeouts based on a --debugBuild flag to the harness
I can't think of a good way to do this [and propagate it to buildbot] off the top of my head.
> 2) adjust buildbot to run smaller chunks (15 chunks for mochitest instead of
> 8, etc...)
Fwiw, :philor already noticed yesterday that mochitest-8 for devices is running end to end in < 1 minute, and thought at first that the device was rebooting in the middle given how short the "Application ran for" time was. So increasing the chunk count further, which greatly increases our overall pool load (setup/teardown time), is a no-go for me.
> 3) selectively run tests we really really care about in debug mode and
> ignore the rest.
This sounds like the best choice to me.
> we will need releng to work on this as I don't have access or the ability to
> turn on debug tests and make necessary buildbot adjustments to tweak how we
> run the tests.
Once we know exactly what suites, where, it can then be actionable by releng.
Comment 12•12 years ago
mochitest-8 runs for 12 minutes on inbound for android 2.2 opt
mochitest-7 runs for 9 minutes on inbound for android 2.2 opt
mochitest-8 runs for 8 minutes on inbound for android 4.0
mochitest-7 runs for 5 minutes on inbound for android 4.0
While we do need fewer chunks on our tests right now, I cannot find a 1-minute runtime on Android.
Assignee
Comment 13•12 years ago
My example was:
https://tbpl.mozilla.org/php/getParsedLog.php?id=19027505&tree=Mozilla-Beta
Application ran for: 0:00:53.128938
with whole step elapsed: 1 mins, 16 secs
with whole job:
Started at 2013-01-22 14:17:17.721565
Finished at 2013-01-22 14:35:08.004424
Elapsed: 17:50.282859
% of time spent doing testing vs time spent in rest of job: 4.96%
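(That is 53.13 s of testing out of a 1070.28 s job: 53.13 / 1070.28 ≈ 4.96%.)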
Comment 14•12 years ago
One nice property of prioritizing mochitests is that since they tend to be system/functional tests, they seem more likely to exercise more code throughout the stack.
Comment 15•12 years ago
WebGL appears to touch M1, R, Ru, android R2, and b2g R6. As such, those are my recommendations for DEBUG test coverage. JS tests should also very strongly be considered for DEBUG coverage.
Comment 16•12 years ago
We don't run Ru on Android, but we do run R1-3. So: mochitests, R2, and jsreftests. This bug is just for Android, not B2G or desktop.
Callek, it looks like we are getting a reasonable list of tests built up.
Assignee
Updated•12 years ago
Assignee: nobody → bugspam.Callek
Assignee
Comment 17•12 years ago
So this ended up being simpler on the code side than I first imagined (and first started).
This looked good enough for cedar in staging. It will need dev eyes to green the tests up properly before moving on to m-c.
Attachment #728101 - Flags: review?(kmoir)
Comment 18•12 years ago
Comment on attachment 728101 [details] [diff] [review]
[configs] v1
lgtm
Attachment #728101 - Flags: review?(kmoir) → review+
Comment 19•12 years ago
(In reply to Dan Mosedale (:dmose) from comment #9)
> While what philor proposes isn't ideal
Sorry that I wasn't sufficiently clear. That wasn't what I was proposing; it was what I was telling you that you would get. And the problem is not that it isn't ideal; it's that it isn't acceptable, and will not run visible on any tree other than Cedar.
Take a look at https://tbpl.mozilla.org/php/getParsedLog.php?id=20995435&tree=Cedar&full=1 and picture yourself filing the bugs about the unexpected assertion test failures, particularly the ones in the first five tests, where you do not even know *what* assertions were hit. Now picture yourself having to fix the assertions in the last two tests, where you know what assertion was hit, but because you don't have a stack, you have no idea what code path was followed to get to that assertion. Now picture yourself trying to persuade bz to help you fix them. Now picture yourself telling someone that you backed out their patch because it caused Android tests to fail by hitting unexpected assertions. How? Sorry, they cannot know that. Which? Sorry, they cannot know that.
Comment 20•12 years ago
That does sound like a nightmare, but it really sounds like the design of the tests, where they're counting assertions instead of running cleanly. Not being familiar with the reasons (historical or otherwise) why it works this way, it sounds really broken.
Other tests we have don't have this problem, but do have a ton of asserts and #ifdef DEBUG-only code we need coverage on. Skipping DEBUG-build tests on android shouldn't have ever been an option, but here we are. We need as many of these back online as we can get. If some of them are currently inviable, and need to stay disabled until fixed, so be it. (Also someone will need to fix them)
We *already* deal with mystery crashes in android opt-build tests during development, and part of what makes this painful is not having our DEBUG tests running to keep us honest.
Comment 21•12 years ago
philor, yes that does sound, uh, exciting. :-)
That said, +1 to jgilbert's suggestion to blacklist busted tests in these runs until they are fixed and to turn on the rest.
Comment 22•12 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #20)
> That does sound like a nightmare, but it really sounds like the design of
> the tests, where they're counting assertions instead of running cleanly. Not
> being familiar with the reasons (historical or otherwise) why it works this
> way, it sounds really broken.
>
> Other tests we have don't have this problem, but do have a ton of asserts
> and #ifdef DEBUG-only code we need coverage on. Skipping DEBUG-build tests
> on android shouldn't have ever been an option, but here we are. We need as
> many of these back online as we can get. If some of them are currently
> inviable, and need to stay disabled until fixed, so be it. (Also someone
> will need to fix them)
>
> We *already* deal with mystery crashes in android opt-build tests during
> development, and part of what makes this painful is not having our DEBUG
> tests running to keep us honest.
That still doesn't solve the "logs only show part of the logcat, so future landings will be backed out for causing assertions, but people will have no idea what they broke and how to fix it" problem that philor was talking about.
Comment 23•12 years ago
To be more explicit: Debug tests on android will need to remain hidden until assertions + stacks show up in the log like every other platform.
Comment 24•12 years ago
I'm assuming that could be fixed by logcat scraping, a la bug 682129, upon which this bug depends. (Correct?)
Is it possible that the shortest path to success here is to tweak the Android assertion code to complain to logcat in addition to stdout/stderr?
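For concreteness, here is a rough sketch of host-side logcat scraping. It assumes adb is on PATH with a single attached device, and is only an illustration, not the actual bug 682129 implementation:

import subprocess

def drain_logcat(output_path):
    # 'adb logcat -d' dumps the current buffer contents and exits.
    data = subprocess.check_output(['adb', 'logcat', '-d'])
    with open(output_path, 'ab') as f:
        f.write(data)
    # Clear the buffer so the next dump holds only new lines; polling
    # frequently is the only way around the fixed-size buffer that
    # comment 3 describes.
    subprocess.check_call(['adb', 'logcat', '-c'])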
Comment 25•12 years ago
As you saw when you looked at the log in comment 19, they are already there and visible in logcat; but as you also saw, logcat is just displaying a circular buffer which doesn't have the capacity to hold all of our output for an entire test run, and so bug 682129 is exactly about "capture all of our output from logcat."
Comment 26•12 years ago
Or we could just ensure we print assertions to stdout, and not have to scrape the logcat at all...
Comment 27•12 years ago
for the record, all mochitest data is captured in a special logfile on the device (typically /mnt/sdcard/tests/mochitest.log). The harness itself writes there instead of to stdout. Capturing stdout or even logcat is a major undertaking on android and mobile in general.
Comment 28•12 years ago
(In reply to Joel Maher (:jmaher) from comment #27)
> for the record, all mochitest data is captured in a special logfile on the
> device (typically /mnt/sdcard/tests/mochitest.log). The harness itself
> writes there instead of to stdout. Capturing stdout or even logcat is a
> major undertaking on android and mobile in general.
Couldn't we just output printf_stderr (and such) to a file, instead of to stderr itself?
Comment 29•12 years ago
Great idea! In the harness we could pull that stderr file to the host and append it to the logfile. If it had timestamps, we could sort it all properly with the test log.
Comment 30•12 years ago
So, I can do a mini-patch to dump output to a file, but I can't do the rest of the build/harness stuff. Is there anyone else who could help out with that part?
Comment 31•12 years ago
:jgilbert, I would be happy to help. This just requires editing testing/mochitest/runtestsremote.py. That would only be for the mochitest harness, but it would be a place to start.
Basically we would do something like this:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtestsremote.py#217
# Pull the device-side log to a temporary local copy.
self._dm.getFile('/mnt/sdcard/tests/<jgilbert.log>', "%s.jgilbert" % self.localLog)
# Append the pulled file's contents to localLog, then clean up.
ll = open(self.localLog, 'a')
jg = open("%s.jgilbert" % self.localLog, 'r')
data = jg.read()
ll.write(data)
jg.close()
ll.close()
os.remove("%s.jgilbert" % self.localLog)
It is a hack; if this seems to solve our problems, I would like to figure out a better way to support more than one type of file coming back, and likewise to interleave the data to make it useful.
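One rough sketch of that more general version. The helper name, the file list, and the assumption that every log line starts with a sortable timestamp are all made up for illustration:

import os

def pull_and_merge(dm, remote_files, local_log):
    # dm is assumed to be the harness's existing devicemanager
    # instance; pull each remote log to a temporary local copy.
    lines = []
    for remote in remote_files:
        tmp = "%s.%s" % (local_log, os.path.basename(remote))
        dm.getFile(remote, tmp)
        f = open(tmp, 'r')
        lines.extend(f.readlines())
        f.close()
        os.remove(tmp)
    # If every line begins with a sortable timestamp (the harness and
    # any stderr redirection would have to emit one), a plain sort
    # interleaves the streams chronologically.
    lines.sort()
    out = open(local_log, 'a')
    out.writelines(lines)
    out.close()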
Assignee
Comment 32•12 years ago
The major work [in buildbot] has been completed; I'm going to spin off "turn on for other branches" to a new bug, to get this off my radar until we have actionable steps for said other branches.
This is already live on cedar.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee
Updated•12 years ago
Summary: Tests for "Android 2.2 debug" → Tests for "Android 2.2 debug" (on cedar)
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•7 years ago
Component: General Automation → General