Closed Bug 1058724 Opened 10 years ago Closed 8 years ago

[Meta] Explore monkey testing framework

Categories

Product :: Component: Firefox OS Graveyard :: Gaia::UI Tests
Type: defect
Platform: x86, macOS
Priority: Not set
Severity: normal

Tracking

Tracking flags: (Not tracked)
Status: RESOLVED WONTFIX

People

Reporter: gwagner
Assignee: Unassigned

Details

If I run the ./run-monkey.sh script with a debug Gecko build, I usually hit assertions within an hour. We should automate this and make sure we don't crash with opt builds and don't hit assertions with debug Gecko builds.
The priority should be: 1) crashes, 2) situations that require a reboot.

This includes:
1) Creating a testing profile that disables emergency calls and maybe other terrible things like Bluetooth.
2) Being able to debug crashes in gdb: either run the b2g main process in gdb, or make it stop and wait for the debugger to attach.
3) In a future version, we should also detect when the phone is in an unusable state, e.g. when pressing the home button doesn't bring us back to the homescreen, or the homescreen doesn't load at all.
Depends on: 1056958
Emergency calls can be disabled by running adb shell setprop "ril.ecclist" "dummy" (I already asked the RIL team about this once in the past).
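The property override above can be scripted from the Python tooling discussed in this thread. A minimal sketch, assuming only the adb command given in the comment: the helper names and the example device serial are hypothetical, and a plain setprop value may not survive a device reboot, so it may need reapplying before each run.

```python
import subprocess
from typing import List, Optional


def disable_emergency_calls_cmd(serial: Optional[str] = None) -> List[str]:
    """Build the adb command that overrides ril.ecclist with a dummy value."""
    cmd = ["adb"]
    if serial:
        # -s selects one device when several are attached to the host.
        cmd += ["-s", serial]
    cmd += ["shell", "setprop", "ril.ecclist", "dummy"]
    return cmd


def disable_emergency_calls(serial: Optional[str] = None) -> None:
    """Run the command; raises CalledProcessError if adb itself exits non-zero."""
    subprocess.run(disable_emergency_calls_cmd(serial), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe without a device.
    print(" ".join(disable_emergency_calls_cmd("FLAME01")))  # hypothetical serial
```

Separating command construction from execution keeps the device-free part easy to check and lets a runner log the exact command it issued.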

If you wrap this around the existing Python runner stack, such as mozdevice, you would get 2) and 3) with much less effort, since they are already done or partially done there.
James has already asked me to work on a way to run these scripts for every b2g-inbound build so that they report to Treeherder. I have something almost complete, which will generate the Orangutan script (with a few tweaks), and then run it. I'm planning on having crash detection and reporting to Treeherder (with logcat, script, and minidumps as artifacts) in the initial version.

Things like disabling the emergency call feature can be added to the tool, which already leverages other tools such as mozdevice, mozrunner, gaiatest, etc. What else should be disabled? If we allow calls to be made then there's a chance we'll still dial inappropriate numbers from these devices.
Here's my initial working version: https://github.com/davehunt/b2gmonkey

You'll see there are a number of TODOs in the code, but this should at least allow us to get monkey scripts running for each b2g-inbound build.
Note that on Flame I rarely see tap events happening. I suspect this is related to bug 1026527.
Gregor: Could you try b2gmonkey and let me know if you see tap events happening? I've even tried flashing a KK base build and still I only see swiping.
Flags: needinfo?(anygregor)
(In reply to Dave Hunt (:davehunt) from comment #5)
> Gregor: Could you try b2gmonkey and let me know if you see tap events
> happening? I've even tried flashing a KK base build and still I only see
> swiping.

I see it launching apps on the homescreen so I guess the events are happening!
Flags: needinfo?(anygregor)
It works great :)
The good thing about my ./run-monkey.sh was that I could run b2g in gdb on the phone. Is this also possible with b2gmonkey?
(In reply to Gregor Wagner [:gwagner] from comment #7)
> It works great :)

Great! I managed to get it working on my device too - looks like my Orangutan binary was out of date.

> The good thing about my ./run-monkey.sh was that I could run b2g in gdb on
> the phone. Is this also possible with b2gmonkey?

I can't imagine why not, how do you do this currently?
Flags: needinfo?(anygregor)
(In reply to Dave Hunt (:davehunt) from comment #8)
> > The good thing about my ./run-monkey.sh was that I could run b2g in gdb on
> > the phone. Is this also possible with b2gmonkey?
> 
> I can't imagine why not, how do you do this currently?

I noticed that you restart the b2g process when the monkey starts. ./run-gdb.sh does the same, so you can either run the monkey or run b2g in gdb, but not both. It probably works if you attach gdb after you start the monkey, but that's not ideal.

It works with my script since it doesn't restart the b2g process and just starts simulating the touch events.
Flags: needinfo?(anygregor)
(In reply to Gregor Wagner [:gwagner] from comment #9)
> I noticed that you restart the b2g process when the monkey starts.
> ./run-gdb.sh does the same. So you can either run the monkey or run in gdb.
> It probably works when you attach gdb after you start the monkey but thats
> not ideal.
> 
> It works with my script since it doesn't restart the b2g process and just
> starts simulating the touch events.

I see. We could change the remote binary in mozrunner [1], possibly based on an optional argument. Where can I find run-gdb.sh? Another option would be to avoid the restart; however, the restart is currently there so that we can do things like set up the crash reporter [2] and create a clean profile each time.

[1] http://hg.mozilla.org/mozilla-central/file/9ee9e193fc48/testing/mozbase/mozrunner/mozrunner/application.py#l52
[2] http://hg.mozilla.org/mozilla-central/file/9ee9e193fc48/testing/mozbase/mozrunner/mozrunner/base/device.py#l24
[3] http://hg.mozilla.org/mozilla-central/file/9ee9e193fc48/testing/mozbase/mozrunner/mozrunner/base/device.py#l68
Flags: needinfo?(anygregor)
(In reply to Dave Hunt (:davehunt) from comment #10)
> I see. We could change the remote binary in mozrunner [1], possibly based on
> an optional argument. Where can I find run-gdb.sh? Another option would be
> to avoid the restart, however this is currently present so we can do things
> like set the crash reporter up [2] and create a clean profile each time.
>

run-gdb.sh can be found here: https://github.com/mozilla-b2g/B2G/blob/master/run-gdb.sh
I think we can live without a fresh profile but crash-reporting is a nice thing to have.
Flags: needinfo?(anygregor)
I've made restarting the device an optional argument, which we will always set when running in the CI. Can you try updating and running this again to see if it works while running b2g in gdb?
Flags: needinfo?(anygregor)
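An opt-in restart resolves the conflict with run-gdb.sh described above: by default the monkey leaves the running b2g process alone so gdb can stay attached, and CI passes the flag to get crash reporting and a fresh profile. A minimal argparse sketch of that design; the --restart name and the plan() helper are hypothetical and are not b2gmonkey's actual interface.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run a monkey script on a B2G device (sketch).")
    # Hypothetical flag: off by default so an already-running b2g (e.g. under
    # run-gdb.sh) is not restarted out from under the debugger.
    parser.add_argument(
        "--restart",
        action="store_true",
        help="restart b2g first (crash reporter, clean profile); always set in CI",
    )
    return parser


def plan(args: argparse.Namespace) -> List[str]:
    """Describe the steps a run would take; a stand-in for the real runner logic."""
    if args.restart:
        return ["restart b2g with crash reporter and fresh profile", "run monkey script"]
    return ["run monkey script against the already-running b2g process"]


if __name__ == "__main__":
    for step in plan(build_parser().parse_args()):
        print(step)
```

Defaulting to no restart favours the interactive debugging case; CI jobs are scripted, so always passing the flag there costs nothing.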
In CI we're currently running 100000 steps, which is taking 10 minutes to run. Do we have an idea of how many steps or how long we'd like this to run for?
Flags: needinfo?(jlal)
Hrm, IMO my non-scientific response would be to run one 10-minute check per commit and a longer run of N hours (we can start at 2 hours?) once we have capacity. For this and other fuzzing tests, I think our best bet is probably shorter 10-30 minute runs (but run 1-10 of those) plus one or two longer-running tests (maybe run the longer ones on nightly, then bisect).
Flags: needinfo?(jlal)
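The figures in this exchange allow a rough sizing of the proposed runs. The CI data point is 100000 steps in about 10 minutes, i.e. roughly 10000 steps per minute. A minimal sketch, assuming the step rate stays constant over longer runs (an assumption, since the thread reports only one measurement); the function name is hypothetical.

```python
OBSERVED_STEPS = 100_000  # steps per CI run, from the thread
OBSERVED_MINUTES = 10     # wall-clock minutes for those steps, from the thread


def steps_for_minutes(minutes: float) -> int:
    """Estimate the step count for a run of the given length at the observed rate."""
    steps_per_minute = OBSERVED_STEPS / OBSERVED_MINUTES
    return round(steps_per_minute * minutes)


if __name__ == "__main__":
    # Per-commit check, a longer short run, and the suggested 2-hour run.
    for label, minutes in [("per-commit", 10), ("short run", 30), ("long run", 120)]:
        print(f"{label}: {minutes} min -> {steps_for_minutes(minutes)} steps")
```

Under that assumption the suggested 2-hour run is on the order of 1.2 million steps; the rate would need re-measuring on the actual device pool before relying on it.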
(In reply to James Lal [:lightsofapollo] from comment #14)
> Hrm IMO my non-scientific response would be to run one 10 min check per
> commit and a longer N (we can start at 2 hours?) run once we have capacity.
> For this an other fuzzing tests I think our best bet is probably shorter
> 10-30 min runs (but run 1-10 of those) + one or two longer running tests
> (maybe the longer ones we run on nightly then bisect).

If I've understood correctly, we should just leave this as a 10-minute run for now? When you say "once we have capacity", are you referring to adding more devices to the current pool (of two), or something else?
QA Whiteboard: [fxosqa-auto-backlog-]
The monkey tests for B2G are no longer maintained or running.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(anygregor)
Resolution: --- → WONTFIX