Closed
Bug 1010173
Opened 11 years ago
Closed 9 years ago
test root internal variable on devices (SUTAgentAndroid.sTestRoot) should not be set as an error message
Categories
(Testing Graveyard :: SUTAgent, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
mozilla32
People
(Reporter: pmoore, Assigned: gbrown)
References
Details
Attachments
(1 file, 1 obsolete file)
In the SUT agent code, when we retrieve the test root, if it is not set, it stores the testroot variable as the error message we get (such as an error if the SD card cannot *currently* be mounted). Therefore, the first time we are able to query the test root, if it is not available, the test root variable internally will be set to the error message, and any subsequent attempts to retrieve the test root will use this "cached" value of the error message (in other words, it will never try to retrieve it again).
This could be a problem if we query the test root before the SD card has been mounted.
Here, we see there is a lazy initialiser for SUTAgentAndroid.sTestRoot: if SUTAgentAndroid.sTestRoot is not set, we get it the first time we are asked for it:
http://hg.mozilla.org/mozilla-central/file/87730d939be7/build/mobile/sutagent/android/DoCommand.java#l1474
->
http://hg.mozilla.org/mozilla-central/file/87730d939be7/build/mobile/sutagent/android/DoCommand.java#l1462
Then here, we see if it is not available, we *set it* to the error message, meaning that if we later call the getter again (above), the cached value of the error message will be retrieved, rather than attempting again to get it (despite the fact that the external storage might now be available):
http://hg.mozilla.org/mozilla-central/file/87730d939be7/build/mobile/sutagent/android/DoCommand.java#l1467
There is a possible race condition in the release engineering automation, that when we reboot devices, we keep querying the device to ask it for its test root *while it is rebooting*, to see if it is up and ready again. If we were able to query the test root before the SD card is mounted, this would in fact set SUTAgentAndroid.sTestRoot to the error message, and that device would never get its test root set, even if it shortly afterwards had finished mounting its external storage, and would always fail our verification tests, until the device is rebooted.
This could be a reason for intermittent problems in the release engineering infrastructure.
Instead it would be better to report the error message, and not store it as the internal variable SUTAgentAndroid.sTestRoot - instead leave it as an empty string, so that the next time it is called, it will retry to get the test root.
Comment 1•11 years ago
|
||
Good spot, thank you :-)
| Reporter | ||
Comment 2•11 years ago
|
||
The only places where SUTAgentAndroid.sTestRoot is referenced are the getters and setters of the test root [GetTestRoot() and SetTestRoot(String testroot) of DoCommand.java], and in SUTAgentAndroid.java, where SUTAgentAndroid.sTestRoot is set explicitly to the value of the "TestRoot" parameter as defined in the "Device" section of the "SUTAgent.ini" file, if it exists.
Therefore the fix to stop "caching" this error message is localised to just removing these 3 lines from the SetTestRoot(String testroot) method of DoCommand.java - so that SUTAgentAndroid.sTestRoot is not changed from its default value ("") if the SUT Agent is not able to access the test root directory (i.e. because SD card has not been yet mounted) at that moment in time. Future calls to attempt to get the test root may then result in success.
An example where we keep retrying to get the testroot (where this fix should now help) is:
https://hg.mozilla.org/build/tools/file/65122ae9187c/lib/python/sut_lib/__init__.py#l501
Attachment #8423139 -
Flags: review?(jmaher)
Attachment #8423139 -
Flags: feedback?(bugspam.Callek)
Comment 3•11 years ago
|
||
Comment on attachment 8423139 [details] [diff] [review]
Mozilla Central patch for SUT Agent
Review of attachment 8423139 [details] [diff] [review]:
-----------------------------------------------------------------
good find. I am asking bc for additional review since he has done a lot of sutagent work lately- he might have a dependency on this.
Attachment #8423139 -
Flags: review?(jmaher)
Attachment #8423139 -
Flags: review?(bclary)
Attachment #8423139 -
Flags: review+
Comment 4•11 years ago
|
||
Comment on attachment 8423139 [details] [diff] [review]
Mozilla Central patch for SUT Agent
Review of attachment 8423139 [details] [diff] [review]:
-----------------------------------------------------------------
I actually added this in bug 933842. Setting sTestRoot to the error message was a shorthand way of displaying the error on the SUTAgent's display so that test root errors would be visible by inspecting the device. I'm ok with not setting the test root to the error message. It might be nice but not a requirement to display the test root error condition.
Attachment #8423139 -
Flags: review?(bclary) → review+
| Reporter | ||
Comment 5•11 years ago
|
||
Thanks Bob and Joel!
In that case, an alternative implementation would be we continue to allow it to be set to the error message, but if GetTestRoot() is called, and the current value is the error string, we would still try to retrieve the test root again. At the moment, we only attempt to retrieve the test root if the current value is an empty string; in the new implementation we would attempt to retrieve the test root if it was an empty string, or an error message.
Since overloading the variable to cater for two types of string is maybe not ideal (both the value of the test root, and a possible error message) - it might be cleaner to have a separate class variable for the error message, which can be set independently of the value of the test root. However, I'm not sure what would be required to get that to show in the SUT Agent display - what are your thoughts on that Bob?
I'm happy to go with any of the three solutions:
1) As per my existing patch (not storing error message in SUTAgentAndroid.sTestRoot)
2) To continue to allow SUTAgentAndroid.sTestRoot to be an error message, but if GetTestRoot() is called and SUTAgentAndroid.sTestRoot is currently an error message, to retry to get it
3) To create a new member variable, such as SUTAgentAndroid.sTestRootError, to store the error message, and get it to display this in SUT Agent
My preference would be number 3 since it is a clean solution, and doesn't remove existing functionality. 1 is the simplest, since I've already done it, and 2 is do-able by me without knowing how to affect what is displayed in the SUT Agent, but maybe not beautiful. :)
Flags: needinfo?(bclary)
Comment 6•11 years ago
|
||
If I understand correctly, the SUTAgent display is only updated onCreate so once an error is displayed it would continue to be displayed even if the test root was successfully set later.
I vote for #1.
Flags: needinfo?(bclary)
| Reporter | ||
Comment 7•11 years ago
|
||
Hi Bob,
It looks like I don't have permission to land the patch myself, so here is the same patch, but now including the commit metadata - would you be able to land this for me?
Many thanks!
Pete
Assignee: nobody → pmoore
Attachment #8423139 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8423139 -
Flags: feedback?(bugspam.Callek)
Flags: needinfo?(bclary)
Comment 8•11 years ago
|
||
Also just to call out our SUT code about testroot also retries for sake of socket errors in contacting/communicating with the SUTAgent.
Keywords: checkin-needed
Comment 9•11 years ago
|
||
Flags: needinfo?(bclary)
Keywords: checkin-needed
| Reporter | ||
Comment 10•11 years ago
|
||
Joel, Callek, Kim,
Now that this is checked in, what are the steps for getting a new SUT Agent rolled out? Are there any rollouts currently planned? Is there value in rolling out a new SUT Agent just for this change? Maybe we could initially roll out just for pandas (since this is just a new mozpool image and one-liner in mozharness to use the new image).
What do you think? I'm sure you'll all have opinions. :)
Pete
Flags: needinfo?(kmoir)
Flags: needinfo?(jmaher)
Flags: needinfo?(bugspam.Callek)
Comment 11•11 years ago
|
||
to get a new sutagent:
* build it locally (we don't really upload it to ftp from our android builds)
* deploy it on staging and ensure all tests work
* coordinate with ateam/mobile/sheriffs when you are deploying it
* deploy it to a subset of machines (you mentioned pandas)
* wait 1 week, ensure all is well
* roll out to the rest of the machines
Is it worth it? Probably- we should verify with bc if we have the latest version in production. If we just have 1 small change made, the risk is low and we might be able to prevent a few failures/week.
Flags: needinfo?(jmaher)
Comment 12•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
| Reporter | ||
Comment 13•11 years ago
|
||
Reopening until we have new SUT Agent deployed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14•11 years ago
|
||
bug 1009113 would probably be a good candidate to ride along. wlach?
Flags: needinfo?(wlachance)
Comment 15•11 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #14)
> bug 1009113 would probably be a good candidate to ride along. wlach?
I don't think bug 1009113 will affect anything we have in production at all, so I wouldn't worry about it one way or the other.
Flags: needinfo?(wlachance)
Updated•11 years ago
|
Flags: needinfo?(bugspam.Callek)
Comment 16•11 years ago
|
||
I think jmaher covered the needinfo in comment #11. (I was on PTO last week). In any case, it's easy to run some tests in staging.
Flags: needinfo?(kmoir)
| Reporter | ||
Comment 17•11 years ago
|
||
I'll have a look at this in July after I've rolled out my vcs sync work.
| Reporter | ||
Comment 18•10 years ago
|
||
Joel,
Do you know if a new SUT Agent was ever rolled out?
Pete
Assignee: pmoore → nobody
Comment 19•10 years ago
|
||
I have no idea.
| Assignee | ||
Comment 20•9 years ago
|
||
I still have no idea, but...
The Android/Java version of sutagent is being removed in bug 1255527. (Negatus remains in use at this time.)
Assignee: nobody → gbrown
Status: REOPENED → RESOLVED
Closed: 11 years ago → 9 years ago
Resolution: --- → FIXED
| Reporter | ||
Updated•9 years ago
|
Resolution: FIXED → WONTFIX
Updated•7 years ago
|
Product: Testing → Testing Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•