Bug 773482 - (b2g-reftest) Tracking Bug to enable reftests on B2G
Status: NEW
Product: Testing
Classification: Components
Component: Reftest
Version: unspecified
Hardware: All
OS: All
Importance: -- normal
Assigned To: Nobody; OK to take it and work on it
Depends on: 784810 785074 807970 b2g-crashtest 843634 862787 869011 B2GRT 1084564 737961 774396 774405 774682 778072 778725 780920 782655 783621 783632 783658 808771 810401 811779 818103 829626 839735 853024 861186 861928 870757 876801 922680 958518 986409 1091229
Blocks: mobile-automation b2g-testing 770490 981110
Reported: 2012-07-12 15:43 PDT by Andrew Halberstadt [:ahal]
Modified: 2016-05-11 14:06 PDT
CC: 7 users


Attachments:
list of reftests that fail on B2G (68.53 KB, text/plain), 2012-07-17 14:19 PDT, Andrew Halberstadt [:ahal]
reftest_sanity log (77.36 KB, text/plain), 2012-07-26 12:05 PDT, Andrew Halberstadt [:ahal]

Description Andrew Halberstadt [:ahal] 2012-07-12 15:43:56 PDT
Many reftests fail on B2G. This is a tracking bug to triage, fix and enable/skip them.
Comment 1 Andrew Halberstadt [:ahal] 2012-07-17 11:22:56 PDT
Creating a bug per manifest doesn't make a whole lot of sense. Instead I'm uploading the raw logs and will skip-if(B2G) the failing tests, referencing this bug. Tests can be re-enabled or annotated as random, etc. as I do more testing on the machines that will run these in CI.

Raw logs:
http://people.mozilla.com/~ahalberstadt/reftest/170712_reftest_logs.zip

HTML formatted output:
http://people.mozilla.com/~ahalberstadt/reftest/170712_reftest_html.zip
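For reference, the skip annotations would look roughly like this in a reftest manifest (the test file names here are hypothetical; the skip-if(B2G)/random-if(B2G) conditions follow the reftest manifest conditional syntax):

```
# Hypothetical reftest.list entries; skipped tests reference this bug.
skip-if(B2G) == some-test.html some-test-ref.html # bug 773482
skip-if(B2G) != other-test.html other-test-notref.html # bug 773482
random-if(B2G) == flaky-test.html flaky-test-ref.html # bug 773482
```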
Comment 2 Andrew Halberstadt [:ahal] 2012-07-17 14:19:33 PDT
Created attachment 643144 [details]
list of reftests that fail on B2G

Note: these do not include the subset of reftests that are disabled for Fennec.
Comment 3 Jonathan Griffin (:jgriffin) 2012-07-17 14:34:19 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #2)
> Created attachment 643144 [details]
> list of reftests that fail on B2g
> 
> Note, these are not including the subset of reftests that are disabled for
> fennec.

For the record, the number of failing tests is currently 1313.
Comment 4 Joel Maher (:jmaher) 2012-07-17 15:19:16 PDT
Is that consistent if you run it over and over again?
Comment 5 Andrew Halberstadt [:ahal] 2012-07-18 08:52:12 PDT
I would say almost certainly no. For now I'm going to disable these and try to get a green run. Then I'll set up the reftest desktop that arrived in Toronto and run everything a bunch of times to try to nail down which failures are random and which are consistent.

It takes about 4 hours to run on my local machine (due to having to save all the images as data URLs on failure).
Comment 6 Andrew Halberstadt [:ahal] 2012-07-20 08:35:57 PDT
I'm seeing the same behaviour that Joel did for native android. If I skip failing tests a new set of failing tests just crops up. The scary thing is that I'm restarting the emulator between each subdirectory so they shouldn't be affecting each other.

I'll try getting them running on a panda board next week to see if that is any different.
Comment 7 Andrew Halberstadt [:ahal] 2012-07-25 06:44:36 PDT
I would say almost certainly no. For now I'm going to disable these and try and get a green run. Then I'll set up the reftest desktop and run everything a bunch of times to try to nail down which failures are random and which are consistent.

It takes about 4 hours to run on my local machine (due to having to save all the images as data URLs on failure).
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-25 10:15:32 PDT
What desktop are you running the emulator on?  If it's desktop-linux-NVIDIA, I would expect many failures because we don't test rendering at all on that gpu, and additional errors can creep in from the emulator translation layer.

(In reply to Andrew Halberstadt [:ahal] from comment #6)
> that I'm restarting the emulator between each subdirectory so they shouldn't
> be affecting each other.

That should not be necessary at all, and is probably why the tests take 4 hours to run ;).

(In reply to Andrew Halberstadt [:ahal] from comment #7)
> I would say almost certainly no. For now I'm going to disable these and try
> and get a green run.

Let's not start disabling until we triage the failures and decide how much we're going to support the desktop-linux-emulator setup.  If the pandaboard is a long way off, then we'll probably need to support these to some degree.

I'll help with this.
Comment 9 Andrew Halberstadt [:ahal] 2012-07-25 12:18:57 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> What desktop are you running the emulator on?  If it's desktop-linux-NVIDIA,
> I would expect many failures because we don't test rendering at all on that
> gpu, and additional errors can creep in from the emulator translation layer.

Yes, my laptop and the desktop are both Linux with Nvidia GT218 and Nvidia GT216 cards respectively. I can order a new card if it becomes a big problem.

> That should not be necessary at all, and is probably why the tests take 4
> hours to run ;).

It shouldn't be necessary, but on my laptop at least, the order in which the tests were run had a dramatic effect on which ones passed and failed. The failures had nothing to do with the tests either; if they got disabled, new ones would fail in their place. It was impossible to triage, so I thought separating the tests from their respective manifests would make it easier to find common sources of trouble. I think the main reason it took so long on my laptop was the large number of failures causing tons of data URLs to be saved in the logs.

This seems to be better on the desktop (or maybe some fixes landed in b2g), I'll know more by tomorrow.

> Let's not start disabling until we triage the failures and decide how much
> we're going to support the desktop-linux-emulator setup.  If the pandaboard
> is a long way off, then we'll probably need to support these to some degree.
> 
> I'll help with this.

Sounds good! Thanks :)
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-25 13:04:02 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #9)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> > What desktop are you running the emulator on?  If it's desktop-linux-NVIDIA,
> > I would expect many failures because we don't test rendering at all on that
> > gpu, and additional errors can creep in from the emulator translation layer.
> 
> Yes, my laptop and the desktop are both Linux with Nvidia GT218 and Nvidia
> GT216 cards respectively. I can order a new card if it becomes a big problem.
> 

Yeah, we don't really support GLX well.  It's a big barrel of monkeys.

> > That should not be necessary at all, and is probably why the tests take 4
> > hours to run ;).
> 
> It shouldn't be necessary, but on my laptop at least the order of tests that
> got run had a dramatic affect on which ones passed and failed.

That's an unnecessary workaround.  How about we start with one directory of tests, layout/reftest/reftest-sanity, see if we get reliable results, triage the failures / intermittent-ness, get them reliably passing, and then move outwards? :)
Comment 11 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-26 10:47:43 PDT
Hi Andrew, do you have any initial results for reftest-sanity?  Are the results consistent or inconsistent?

/me waiting on pins and needles ... ;)
Comment 12 Andrew Halberstadt [:ahal] 2012-07-26 12:05:45 PDT
Created attachment 646251 [details]
reftest_sanity log

Hey Chris, here are the results for reftest-sanity, and they are indeed consistent :)

I actually have the results for all the other manifests as well (see them here: http://people.mozilla.org/~ahalberstadt/reftest/260712_reftest_logs.zip)

There are around 400 failing tests, over 200 of which are in ../../image/test/reftest.list. These 200 seem to fail because the image is being rendered in the centre of the screen instead of the top-left corner.

Of the remaining 200 failures, a fair number are exceptions (presumably from calls to APIs that B2G doesn't implement, though I haven't looked into it).
Comment 13 Andrew Halberstadt [:ahal] 2012-07-26 12:07:47 PDT
I disabled the 400 failing tests and am currently running all tests at once from the root manifest (in 3 chunks). I'll let you know how that goes tomorrow morning (I think reftests are just crazy slow on the emulator in general).
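The chunking mentioned above splits the run into contiguous slices of the test list; the reftest harness exposes this via its --total-chunks/--this-chunk options. A minimal sketch of that splitting scheme (the function name and even-split details are illustrative, not the harness's actual code):

```python
def chunk_tests(tests, this_chunk, total_chunks):
    """Return the slice of `tests` for chunk `this_chunk` (1-indexed),
    dividing the list into `total_chunks` contiguous, near-equal slices.
    Any remainder is spread one test at a time over the earliest chunks."""
    per = len(tests) // total_chunks
    extra = len(tests) % total_chunks
    # Earlier chunks each absorb one extra test until the remainder is used up.
    start = per * (this_chunk - 1) + min(this_chunk - 1, extra)
    end = start + per + (1 if this_chunk <= extra else 0)
    return tests[start:end]
```

Running three chunks in parallel then covers every test exactly once, e.g. chunk_tests(list(range(10)), 2, 3) yields the middle slice [4, 5, 6].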
Comment 14 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-26 13:50:16 PDT
Andrew, can you add the instructions for running these reftests to https://wiki.mozilla.org/B2G/Hacking#Reftests? I'd like to give them a spin locally and review that list.
Comment 15 Andrew Halberstadt [:ahal] 2012-07-26 14:34:16 PDT
Updated the docs: https://wiki.mozilla.org/B2G/Hacking#Reftests
Let me know if you run into trouble.
Comment 16 Andrew Halberstadt [:ahal] 2012-08-14 07:17:00 PDT
Reftests are being run against b2g nightlies and are reported here: http://brasstacks.mozilla.com/autolog/?tree=b2g&source=autolog

Let me know if you want me to enable/disable any of the tests. For now, it is the same subset that Fennec runs.
Comment 17 Andrew Halberstadt [:ahal] 2012-11-02 06:50:44 PDT
Reftests are turned on in Cedar: https://tbpl.mozilla.org/?tree=Cedar

I'll have a patch to get the full set of passing ones enabled soon.
