Closed Bug 785032 Opened 12 years ago Closed 12 years ago

SD card issues with new Tegras ("INFO: attempting to create file /mnt/sdcard/writetest ... Push File Failed to Validate!")

Categories

(Release Engineering :: General, defect)

ARM
Android
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: cmtalbert)

References

Details

(Keywords: intermittent-failure, Whiteboard: [red])

Attachments

(2 files, 1 obsolete file)

{
========= Started Running verify.py failed (results: 2, elapsed: 34 secs) (at 2012-08-23 03:14:28.642449) =========
python /builds/sut_tools/verify.py
 in dir /builds/tegra-362/test/build (timeout 1200 secs)
 watching logfiles {}
 argv: ['python', '/builds/sut_tools/verify.py']
 environment:
  HOME=/home/cltbld
  PATH=/tools/buildbot-0.8.4-pre-moz2/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cltbld/bin
  PWD=/builds/tegra-362/test/build
  SUT_IP=10.250.51.202
  SUT_NAME=tegra-362
 using PTY: False
DEBUG: updateSUT: Using tegra 'tegra-362' found in env variable
INFO: Using tegra 'tegra-362' found in env variable
INFO: attempting to ping tegra
reconnecting socket
INFO: updateSUT.py: We're running SUTAgentAndroid Version 1.11
INFO: Got expected SUTAgent version '1.11'
INFO: attempting to create file /mnt/sdcard/writetest
Push File Failed to Validate!
program finished with exit code 1
}

s: tegra-362
https://tbpl.mozilla.org/php/getParsedLog.php?id=14627585&tree=Firefox

s: tegra-343
https://tbpl.mozilla.org/php/getParsedLog.php?id=14627866&tree=Firefox
Whiteboard: [orange]
Blocks: 438871
Whiteboard: [orange] → [orange][red]
We've seen this problem on non-new tegras, so don't believe it's related to the way the new tegras are installed. Removing block on new tegra install.
No longer blocks: 767456
Don't you mean "on one and only one non-new tegra"? And "a problem other than this one"?

This bug is 

INFO: attempting to create file /mnt/sdcard/writetest
Push File Failed to Validate!

and every one of the ones on 119, the only non-new tegra mentioned here, has instead been

INFO: attempting to create file /mnt/sdcard/writetest
removing file: /mnt/sdcard/writetest
True
reconnecting socket

a thing which we've known about for some time, but have been calling bug 681861 rather than bothering to file the bug I really wanted to file, "Intermittent Android 'True'".
Blocks: 767456
Apparently I'm wrong, and old tegras do something else which isn't this, but is this, but they don't do it visibly, they do it invisibly, but their invisible not-it is this.
No longer blocks: 767456
Passing it this to ctalbert since he volunteered to look into this.
Assignee: nobody → ctalbert
Depends on: 793678
Depends on: 794017
So, given we don't know exactly what is happening here, the current hypothesis is that the agent is starting before the sdcard is mounted.  That shouldn't happen. The agent waits on the android.intent.action.MEDIA_MOUNTED action to fire its startup receiver and start the agent.[1] However, I found that the agent would startup just fine with ONLY this code active when no sdcard was in the system (I disabled the code in the watcher).

So, this patch adds a second check of the external storage mount state in all the places we start the agent -- both in the watcher and in the agent itself. It also outputs debugging information so we can see what is happening when we test this, and I ask very much that we test this before deploying it.

If we get an r+ here I'd like to stage this an then we can do a final patch for deployment with less/better debug output.
Attachment #667280 - Flags: review?(jmaher)
Comment on attachment 667280 [details] [diff] [review]
Patch to help debug and possibly fix this

Review of attachment 667280 [details] [diff] [review]:
-----------------------------------------------------------------

just some nits

::: build/mobile/sutagent/android/watcher/WatcherService.java
@@ +937,5 @@
> +            if (Environment.MEDIA_MOUNTED.compareTo(state) != 0) {
> +                Log.i("SUTAgentWatcher", "SDcard not mounted, waiting another turn");
> +                return;
> +            } else if (Environment.MEDIA_MOUNTED_READ_ONLY.equals(state)) {
> +                Log.e("SUTAgentWatcher", "SDCard mounted read only not starting agent now, try again in 60s");

nit: caps here should be SDcard

@@ +945,3 @@
>              if (bStartSUTAgent && !GetProcessInfo(sProgramName))
>                  {
> +                Log.i("SUTAgentWatcher", "Starting SUTAGent from watcher code");

NIT: caps here should be SUTAgent
Attachment #667280 - Flags: review?(jmaher) → review+
Depends on: 797652
Attached patch Addresses nitsSplinter Review
Ok, this patch and the dependent bug patch in 797652 will allow you to test this. Callek can you stage this on some tegras and we can see if this works for us?
Attachment #667280 - Attachment is obsolete: true
Attachment #667759 - Flags: review+
(In reply to Clint Talbert ( :ctalbert ) from comment #51)
> Created attachment 667759 [details] [diff] [review]
> Addresses nits
> 
> Ok, this patch and the dependent bug patch in 797652 will allow you to test
> this. Callek can you stage this on some tegras and we can see if this works
> for us?

Sure, will just have you/joel give me the apk's and I'll get it up.
Note that this isn't the real fix we want. We have to deal with the fact that the agent needs to run on devices without an external sdcard and so we can't have it never come up if there is no sdcard. In that instance, we simply have to change the fallback behavior so that the agent never uses a fallback that is not world readable/writable.  But this code in comment 53 should enable us to understand if this is the problem we're seeing on this bug and then we can fix it properly.
Attached patch Final patchSplinter Review
This is the final patch we should go with. I tried to fix the fallback method; however, I can't for the life of me get the agent to write anything into /data/local/tmp although the location appears to be (and is supposed to be) world writeable/readable. The fallback does work, but given that the getTestRoot code has never actually returned null (although that is what it was coded to do) it can cause the agent to crash, so I had to null guard pruneCommandLog against that. I think this should get us where we want to go, especially for the watcher, which is the more difficult one of the pair to update. We can continue working on the agent's fallback code in getTestRoot and see if we can actually get that to work properly.
Attachment #668302 - Flags: review?(jmaher)
Comment on attachment 668302 [details] [diff] [review]
Final patch

Review of attachment 668302 [details] [diff] [review]:
-----------------------------------------------------------------

this looks pretty good in general.  I think we will be good in doing this, but I could see us changing the behaviour in the future as we hit other scenarios or find out more data.

::: build/mobile/sutagent/android/DoCommand.java
@@ +1324,5 @@
> +            tmpFile.createNewFile();
> +        } catch (IOException e){
> +            Log.i("SUTAgentAndroid", "Caught exception creating file in /data/local/tmp: " + e.getMessage());
> +        }
> +   

nit: extra whitespace here.

@@ +1334,5 @@
> +            return(Environment.getExternalStorageDirectory().getAbsolutePath());
> +            }
> +        if (tmpFile.exists()) 
> +            {
> +            Log.i("CLINT", "tmpfile exists");

extra debugging in here?
Attachment #668302 - Flags: review?(jmaher) → review+
https://hg.mozilla.org/mozilla-central/rev/99889cc4affd
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [orange][red] → [red]
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: