Bug 773482 (b2g-reftest)

Tracking Bug to enable reftests on B2G

RESOLVED WONTFIX

Status

Testing
Reftest
RESOLVED WONTFIX
5 years ago
6 months ago

People

(Reporter: ahal, Unassigned)

Tracking

(Depends on: 6 bugs, Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

Many reftests fail on B2G. This is a tracking bug to triage, fix and enable/skip them.
Depends on: 774396
Blocks: 770490
Depends on: 774405
Depends on: 774682
Creating a bug per manifest doesn't make a whole lot of sense. Instead I'm uploading the raw logs, and will mark failing tests skip-if(B2G), referencing this bug. Tests can be re-enabled, changed to random, etc. as I do more testing on the machines that will run these in CI.

Raw logs:
http://people.mozilla.com/~ahalberstadt/reftest/170712_reftest_logs.zip

HTML formatted output:
http://people.mozilla.com/~ahalberstadt/reftest/170712_reftest_html.zip
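
(Editorial sketch, not part of the original comment.) The skip-if(B2G) annotation described above could be applied to a manifest mechanically, roughly as below. The function name, the input format, and the trailing "# bug N" comment style are assumptions made for illustration; only `==` and `!=` entries are handled, and this is not the script that was actually used.

```python
def skip_failing(manifest_lines, failing_tests, bug="773482"):
    """Prefix failing reftest entries with skip-if(B2G) and reference the bug.

    manifest_lines: lines of a reftest.list file (hypothetical input).
    failing_tests:  set of test file names taken from a failure log.
    """
    annotated = []
    for line in manifest_lines:
        # Ignore anything after '#', then tokenize the entry itself.
        entry = line.split("#", 1)[0].split()
        # The test file is the token that follows the comparison operator.
        op = next((i for i, tok in enumerate(entry) if tok in ("==", "!=")), None)
        is_target = (
            op is not None
            and op + 1 < len(entry)
            and entry[op + 1] in failing_tests
            and "skip-if(B2G)" not in entry  # do not annotate twice
        )
        if is_target:
            annotated.append(f"skip-if(B2G) {line.rstrip()} # bug {bug}")
        else:
            annotated.append(line.rstrip("\n"))
    return annotated


if __name__ == "__main__":
    sample = ["== a.html a-ref.html", "!= b.html b-ref.html"]
    for out in skip_failing(sample, {"a.html"}):
        print(out)
```

Comments, blank lines, and entries that are already skipped pass through unchanged, so the function can be rerun as the failure list changes.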
Created attachment 643144 [details]
list of reftests that fail on B2G

Note: this list does not include the subset of reftests that are disabled for Fennec.
(In reply to Andrew Halberstadt [:ahal] from comment #2)
> Created attachment 643144 [details]
> list of reftests that fail on B2g
> 
> Note, these are not including the subset of reftests that are disabled for
> fennec.

For the record, the number of failing tests is currently 1313.
is that consistent if you run it over and over again?
I would say almost certainly no. For now I'm going to disable these and try to get a green run. Then I'll set up the reftest desktop that arrived in Toronto and run everything a bunch of times to try to separate random (intermittent) results from consistent failures.

It takes about 4 hours to run on my local machine (due to having to save all the images to data URLs on failure).
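
(Editorial sketch, not part of the original comment.) Telling random results from consistent failures across repeated runs amounts to comparing each test's outcomes over all runs. A minimal version, assuming each run has already been parsed into a dict of test name to "PASS" or "FAIL" (the log parsing itself is not shown, and the names here are hypothetical):

```python
from collections import defaultdict


def classify(runs):
    """Label each test 'pass', 'fail', or 'random' across several runs.

    runs: list of dicts mapping test name -> "PASS" or "FAIL"
          (a hypothetical, already-parsed form of the reftest logs).
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test, status in run.items():
            outcomes[test].add(status)

    labels = {}
    for test, seen in outcomes.items():
        if seen == {"PASS"}:
            labels[test] = "pass"    # passed in every run it appeared in
        elif seen == {"FAIL"}:
            labels[test] = "fail"    # failed in every run it appeared in
        else:
            labels[test] = "random"  # mixed outcomes: intermittent
    return labels


if __name__ == "__main__":
    runs = [
        {"a.html": "PASS", "b.html": "FAIL", "c.html": "PASS"},
        {"a.html": "PASS", "b.html": "FAIL", "c.html": "FAIL"},
    ]
    print(classify(runs))
```

Tests labelled "fail" are the candidates for skipping or fixing, while "random" ones would be marked random rather than disabled outright.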
I'm seeing the same behaviour that Joel did for native android. If I skip failing tests a new set of failing tests just crops up. The scary thing is that I'm restarting the emulator between each subdirectory so they shouldn't be affecting each other.

I'll try getting them running on a panda board next week to see if that is any different.
I would say almost certainly no. For now I'm going to disable these and try and get a green run. Then I'll set up the reftest desktop and run everything a bunch of times to try and nail down random vs failures.

It takes about 4 hours to run on my local machine (due to having to save all the images to data urls on failure)
What desktop are you running the emulator on?  If it's desktop-linux-NVIDIA, I would expect many failures because we don't test rendering at all on that gpu, and additional errors can creep in from the emulator translation layer.

(In reply to Andrew Halberstadt [:ahal] from comment #6)
> that I'm restarting the emulator between each subdirectory so they shouldn't
> be affecting each other.

That should not be necessary at all, and is probably why the tests take 4 hours to run ;).

(In reply to Andrew Halberstadt [:ahal] from comment #7)
> I would say almost certainly no. For now I'm going to disable these and try
> and get a green run.

Let's not start disabling until we triage the failures and decide how much we're going to support the desktop-linux-emulator setup.  If the pandaboard is a long way off, then we'll probably need to support these to some degree.

I'll help with this.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> What desktop are you running the emulator on?  If it's desktop-linux-NVIDIA,
> I would expect many failures because we don't test rendering at all on that
> gpu, and additional errors can creep in from the emulator translation layer.

Yes, my laptop and the desktop are both Linux with Nvidia GT218 and Nvidia GT216 cards respectively. I can order a new card if it becomes a big problem.

> That should not be necessary at all, and is probably why the tests take 4
> hours to run ;).

It shouldn't be necessary, but on my laptop at least, the order in which tests ran had a dramatic effect on which ones passed and failed. The failures had nothing to do with the tests themselves either: if they got disabled, new ones would fail in their place. It was impossible to triage, so I thought running each manifest's tests separately would make it easier to find common sources of trouble. I think the main reason it took so long on my laptop was the large number of failures causing tons of data URLs to be saved in the logs.

This seems to be better on the desktop (or maybe some fixes landed in b2g). I'll know more by tomorrow.

> Let's not start disabling until we triage the failures and decide how much
> we're going to support the desktop-linux-emulator setup.  If the pandaboard
> is a long way off, then we'll probably need to support these to some degree.
> 
> I'll help with this.

Sounds good! Thanks :)
(In reply to Andrew Halberstadt [:ahal] from comment #9)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> > What desktop are you running the emulator on?  If it's desktop-linux-NVIDIA,
> > I would expect many failures because we don't test rendering at all on that
> > gpu, and additional errors can creep in from the emulator translation layer.
> 
> Yes, my laptop and the desktop are both Linux with Nvidia GT218 and Nvidia
> GT216 cards respectively. I can order a new card if it becomes a big problem.
> 

Yeah, we don't really support GLX well.  It's a big barrel of monkeys.

> > That should not be necessary at all, and is probably why the tests take 4
> > hours to run ;).
> 
> It shouldn't be necessary, but on my laptop at least the order of tests that
> got run had a dramatic effect on which ones passed and failed.

That's an unnecessary workaround.  How about we start with one directory of tests, layout/reftest/reftest-sanity, see if we get reliable results, triage the failures / intermittent-ness, get them reliably passing, and then move outwards? :)
Hi Andrew, do you have any initial results for reftest-sanity?  Are the results consistent or inconsistent?

/me waiting on pins and needles ... ;)
Created attachment 646251 [details]
reftest_sanity log

Hey Chris, here are the results for reftest-sanity, and they are indeed consistent :)

I actually have the results for all the other manifests as well (see them here: http://people.mozilla.org/~ahalberstadt/reftest/260712_reftest_logs.zip)

There are around 400 failing tests, over 200 of which are in ../../image/test/reftest.list. These 200 seem to fail because the image is being rendered in the centre of the screen instead of the top-left corner.

Of the remaining 200 failures, a fair number are exceptions (presumably APIs that B2G doesn't have are being called, though I haven't looked into it).
I disabled the 400 failing tests and am currently running all tests at once from the root manifest (in 3 chunks). I'll let you know how that goes tomorrow morning (I think reftests are just crazy slow on the emulator in general).
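
(Editorial sketch, not part of the original comment.) Running the full set "in 3 chunks" means splitting the test list into three parts and running each part separately. One simple way to do this is with contiguous, near-equal slices, as below. This is an illustrative assumption and not necessarily how the reftest harness divides tests.

```python
def chunk(tests, total_chunks, this_chunk):
    """Return the this_chunk-th (1-based) of total_chunks contiguous slices.

    Slice sizes differ by at most one; earlier chunks take the remainder.
    """
    if not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be between 1 and total_chunks")
    size, extra = divmod(len(tests), total_chunks)
    # Each chunk before this one holds `size` tests, plus one more if it is
    # among the first `extra` chunks.
    start = (this_chunk - 1) * size + min(this_chunk - 1, extra)
    end = start + size + (1 if this_chunk <= extra else 0)
    return tests[start:end]


if __name__ == "__main__":
    tests = [f"test{i}.html" for i in range(10)]
    for n in (1, 2, 3):
        print(n, chunk(tests, 3, n))
```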
Andrew, can you add the instructions for running these reftests to https://wiki.mozilla.org/B2G/Hacking#Reftests ?  I'd like to give them a spin locally and review that list.
Updated the docs: https://wiki.mozilla.org/B2G/Hacking#Reftests
Let me know if you run into trouble.
Depends on: 778072
Depends on: 778725
Blocks: 781041
Reftests are being run against b2g nightlies and are reported here: http://brasstacks.mozilla.com/autolog/?tree=b2g&source=autolog

Let me know if you want me to enable/disable any of the tests. For now, it is the same subset that Fennec runs.
Depends on: 782655
Depends on: 783621
Depends on: 783632
Depends on: 783658
Depends on: 737961
Depends on: 784810
Depends on: 785074
Depends on: 780902
Depends on: 780920
No longer depends on: 780902
Depends on: 807970
Reftests are turned on in Cedar: https://tbpl.mozilla.org/?tree=Cedar

I'll have a patch to get the full set of passing ones enabled soon.
Depends on: 808771
Depends on: 810401
Depends on: 811779
Depends on: 818103
Depends on: 829626
Alias: b2g-reftest
Depends on: 833371
Depends on: 839735
Depends on: 843634
Blocks: 846091
Depends on: 853024
Depends on: 861186
Depends on: 861928
Depends on: 862787
Depends on: 869011
Depends on: 870757
Depends on: 876801
Depends on: 922680
Depends on: 958518
Blocks: 981110
Depends on: 986409
Depends on: 973835
Depends on: 1084564
Depends on: 1091229
This is WONTFIX (i.e. we're not going to invest in enabling/fixing reftests on B2G), now that we're removing B2G code from the tree via bug 1306391.
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → WONTFIX
(side note: we have lots of references to this tracking bug in reftest.list; many [maybe all?] of those are being removed in bug 1307332.)