Closed Bug 968200 Opened 6 years ago Closed 6 years ago

Intermittent uriloader/exthandler/tests/mochitest/test_unsafeBidiChars.xhtml | application timed out after 330 seconds with no output because we run out of disk space

Categories

(Core :: Networking, defect)

ARM
Android
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox28 --- fixed
firefox29 --- fixed
firefox30 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: cbook, Assigned: ehsan)

References

(Depends on 2 open bugs, )

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

Android 2.2 Tegra mozilla-inbound opt test mochitest-8 on 2014-02-05 04:32:21 PST for push 537c83f1c323

slave: tegra-249

https://tbpl.mozilla.org/php/getParsedLog.php?id=34130392&tree=Mozilla-Inbound



TEST-UNEXPECTED-FAIL | /tests/uriloader/exthandler/tests/mochitest/test_unsafeBidiChars.xhtml | application timed out after 330 seconds with no output
Ehsan, can you please take a look at this? I think this has been an ongoing failure that was just obscured by the 2400s timeouts in bug 663657 until the recent harness change that better exposed the underlying test issues.
Component: General → File Handling
Flags: needinfo?(ehsan)
Product: Firefox for Android → Core
Summary: Intermittent TEST-UNEXPECTED-FAIL | /tests/uriloader/exthandler/tests/mochitest/test_unsafeBidiChars.xhtml | application timed out after 330 seconds with no output → Intermittent uriloader/exthandler/tests/mochitest/test_unsafeBidiChars.xhtml | application timed out after 330 seconds with no output
Do you know if the output of window.dump() goes anywhere in these tests?
Flags: needinfo?(ehsan) → needinfo?(ryanvm)
I somehow doubt it. ISTR that Android doesn't get a lot of useful stuff in the logs due to how the harness works. I would ask in #ateam.
Flags: needinfo?(ryanvm)
OK, pushed a debugging patch to use info() instead for a whole bunch of things.  This should give us an idea on where the failure is roughly the next time this test fails:

https://hg.mozilla.org/integration/mozilla-inbound/rev/3a10c4605795
https://hg.mozilla.org/mozilla-central/rev/3a10c4605795
Assignee: nobody → ehsan
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Brad, is there a good way of getting NSPR log output on Android on the try server?  I suspect that there may be an actual bug here.  We just don't seem to call into the external app handler here at all, so I no longer think this is a test bug.
Flags: needinfo?(blassey.bugs)
Ehsan, the best answer I have is to patch GeckoLoader.setupGeckoEnvironment() to add an env var to turn on NSPR logging. 

NI'ing gbrown to see if he has a better idea.


https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/mozglue/GeckoLoader.java.in#122
Flags: needinfo?(blassey.bugs) → needinfo?(gbrown)
Thanks!  Will that cause the log to go to the test output?  If yes, that should work well for me.
There is a fairly new option here, for enabling NSPR logging in mochitests: http://hg.mozilla.org/mozilla-central/annotate/d8d8fa98ee7d/testing/mochitest/runtests.py#l56. I have not tried using that on Android, but it looks promising.
Flags: needinfo?(gbrown)
Thanks a lot!  Here's a try push which should hopefully give us the NSPR logs, and help us understand what's going on.
Hmm, I don't see any links to download the NSPR logs from <https://tbpl.mozilla.org/php/getParsedLog.php?id=34478849&full=1&branch=try>.  What am I doing wrong?
Flags: needinfo?(gbrown)
Darn. I suspect you didn't do anything wrong, but this is not working the way it should on Android.

I think there are 2 problems:

1. On Android 4.0 Opt, there is a failure:
19:00:25     INFO -  Mochi-Remote ERROR | Automation Error: Exception caught while running tests
19:00:25     INFO -  Traceback (most recent call last):
19:00:25     INFO -    File "/builds/panda-0097/test/build/tests/mochitest/runtestsremote.py", line 731, in main
19:00:25     INFO -      retVal = mochitest.runTests(options)
19:00:25     INFO -    File "/builds/panda-0097/test/build/tests/mochitest/runtests.py", line 1069, in runTests
19:00:25     INFO -      with zipfile.ZipFile("%s/nsprlog.zip" % browserEnv["MOZ_UPLOAD_DIR"], "w", zipfile.ZIP_DEFLATED) as logzip:
19:00:25     INFO -  KeyError: 'MOZ_UPLOAD_DIR'

2. On Android 2.2 Opt, blobber uploads are not working -- probably because those tests do not use mozharness.

I'll open a bug for (1) and ask around about (2) -- will try to find a solution for you today.
Flags: needinfo?(gbrown)
Depends on: 970994
(In reply to comment #41)
> Darn. I suspect you didn't do anything wrong, but this is not working the way
> it should on Android.
> 
> I think there are 2 problems:
> 
> 1. On Android 4.0 Opt, there is a failure:
> 19:00:25     INFO -  Mochi-Remote ERROR | Automation Error: Exception caught
> while running tests
> 19:00:25     INFO -  Traceback (most recent call last):
> 19:00:25     INFO -    File
> "/builds/panda-0097/test/build/tests/mochitest/runtestsremote.py", line 731, in
> main
> 19:00:25     INFO -      retVal = mochitest.runTests(options)
> 19:00:25     INFO -    File
> "/builds/panda-0097/test/build/tests/mochitest/runtests.py", line 1069, in
> runTests
> 19:00:25     INFO -      with zipfile.ZipFile("%s/nsprlog.zip" %
> browserEnv["MOZ_UPLOAD_DIR"], "w", zipfile.ZIP_DEFLATED) as logzip:
> 19:00:25     INFO -  KeyError: 'MOZ_UPLOAD_DIR'

Hmm, I'm not sure what this means...

> 2. On Android 2.2 Opt, blobber uploads are not working -- probably because
> those tests do not use mozharness.
> 
> I'll open a bug for (1) and ask around about (2) -- will try to find a solution
> for you today.

Thanks!  FWIW if there's any other way to get the NSPR logs, it will be fine with me!
The "normal" way to obtain NSPR logs in tbpl is to copy them to the "blobber" aws server. On Android 4.0, blobber is supported, but the NSPR -> blobber code is broken -- I'll fix that.

But you probably really want an Android 2.2 log, and blobber is not supported there. I asked and there are conflicting opinions about how difficult it would be to support blobber on Android 2.2. Rather than investigate that, let's just cat the nspr logs and see if that works:

https://tbpl.mozilla.org/?tree=Try&rev=5cf4e3f38015
(In reply to comment #43)
> The "normal" way to obtain NSPR logs in tbpl is to copy them to the "blobber"
> aws server. On Android 4.0, blobber is supported, but the NSPR -> blobber code
> is broken -- I'll fix that.

Thanks!

> But you probably really want an Android 2.2 log,

Correct.

> and blobber is not supported
> there. I asked and there are conflicting opinions about how difficult it would
> be to support blobber on Android 2.2. Rather than investigate that, let's just
> cat the nspr logs and see if that works:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=5cf4e3f38015

Thanks for your help!  :-)
(In reply to Geoff Brown [:gbrown] from comment #49)
> Another try: https://tbpl.mozilla.org/?tree=Try&rev=5078af5501df

It looks like nothing went wrong in that run, but I don't see any NSPR output. I will debug locally...
This one gives NSPR output: https://tbpl.mozilla.org/?tree=Try&rev=22a22e7f14b1

Ehsan -- hope this gives you what you need.
Flags: needinfo?(ehsan)
Yes it does, thanks a lot!  Now, all we need is to get this test to fail.  :-)
Flags: needinfo?(ehsan)
I'm trying! :P
(In reply to comment #57)
> I'm trying! :P

It worked.  I'll try to look at the failure log as soon as I get some free time...
So there is this line in the log:

OnStopRequest[60976830](http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/unsafeBidiFileName.sjs?name=%C3%A2%C2%80%C2%AA.test) status=804b0004 mIsLoadingDocument=true, 0 active URLs

which is basically where the load stops.  0x804b0004 is NS_NET_STATUS_CONNECTED_TO, which is only set from here: http://dxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsSocketTransport2.cpp#1572

Time to turn on the socket transport logging: https://tbpl.mozilla.org/?tree=Try&rev=b8f2f9cc414b

And moving the bug to Core::Networking because it seems like we're hitting a bug there.
Component: File Handling → Networking
https://tbpl.mozilla.org/?tree=Try&rev=f357ba820f5f

Hmm, this log <https://tbpl.mozilla.org/php/getParsedLog.php?id=34746774&tree=Try&full=1#error0> doesn't include any socket transport logs.  Not sure what I'm doing wrong... :/
ehsan, re: comment   did you include "nsSocketTransport:5" in the list of NSPR modules to log?:  

   https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Flags: needinfo?(ehsan)
(In reply to Jason Duell (:jduell) from comment #115)
> ehsan, re: comment   did you include "nsSocketTransport:5" in the list of
> NSPR modules to log?:  
> 
>    https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Yes, any idea why that wouldn't work on Android?
Flags: needinfo?(ehsan) → needinfo?(jduell.mcbugs)
ehsan: are you getting any NSPR log at all, or just missing the nsSocketTransport msgs?
Flags: needinfo?(jduell.mcbugs) → needinfo?(ehsan)
In the logs of Comment 78, I see DocLoader NSPR messages, but nothing else. I wonder if we are losing part of the environment variable somewhere. It might be worth trying with just nsSocketTransport:5, or changing the order of modules so that nsSocketTransport is first: "nsSocketTransport:5,DocLoader:5,..."
(In reply to Jason Duell (:jduell) from comment #118)
> ehsan: are you getting any NSPR log at all, or just missing the
> nsSocketTransport msgs?

As discussed on irc, I was getting NSPR logs, just not any nsSocketTransport messages.  I'll try Geoff's suggestion in comment 119 now.
Flags: needinfo?(ehsan)
OK, we have a log now finally! <https://tbpl.mozilla.org/php/getParsedLog.php?id=35298635&tree=Try&full=1#error0>

But this only gave us socket transport logs, not the rest.  Seems like nsSocketTransport logs are mutually exclusive with anything else?!

I can't make much sense out of this log.  Jason, would you mind lending a hand please?  Please see comment 74 for my previous attempt to explain what's happening until we got to the networking code.

Thanks!
Flags: needinfo?(jduell.mcbugs)
> Seems like nsSocketTransport logs are mutually exclusive with anything else?

No, I think we probably have some sort of Android bug with environment variables getting truncated.  

I'll look at this soon--too swamped with other things tonight.
Depends on: 977589
Pushed to try with: NSPR_LOG_MODULES = "all:5"

https://tbpl.mozilla.org/?tree=Try&rev=f0f27a53af2c
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #144)
> Pushed to try with: NSPR_LOG_MODULES = "all:5"
> 
> https://tbpl.mozilla.org/?tree=Try&rev=f0f27a53af2c

"Output exceeded 52428800 bytes, remaining output has been truncated (output was 52471609 bytes)"

sigh... :(
So I guess we're kind of stuck here until the logging issues are sorted out? This is still a top-10 orange :(
(In reply to comment #180)
> So I guess we're kind of stuck here until the logging issues are sorted out?

Yes, unfortunately.  I tried anything I could think of to work around that... Suggestion welcome.

> This is still a top-10 orange :(

I really doubt that this is a test problem, and would therefore really hate for this test to be disabled and then people forgetting about this bug, if that makes sense.
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #181)
> (In reply to comment #180)
> > So I guess we're kind of stuck here until the logging issues are sorted out?
> 
> Yes, unfortunately.  I tried anything I could think of to work around
> that... Suggestion welcome.
Flags: needinfo?(gbrown)
There's still a chance that :jduell will be able to interpret the nsSocketTransport logs, isn't there?

I have had trouble finding time for bug 977589, but will try to push that forward soon.
Flags: needinfo?(gbrown)
As I mentioned in email, I think the next thing to try here is :mayhemer's recipe for NSPR logging with mochitest:

  http://www.janbambas.cz/get-nspr-log-from-tryserver-run/
Flags: needinfo?(jduell.mcbugs)
(In reply to comment #187)
> As I mentioned in email, I think the next thing to try here is :mayhemer's
> recipe for NSPR logging with mochitest:
> 
>   http://www.janbambas.cz/get-nspr-log-from-tryserver-run/

Isn't that the thing that doesn't yet work on Android?

(Also, which email? ;-)
(In reply to Jason Duell (:jduell) from comment #187)
> As I mentioned in email, I think the next thing to try here is :mayhemer's
> recipe for NSPR logging with mochitest:
> 
>   http://www.janbambas.cz/get-nspr-log-from-tryserver-run/

That relies on blobber (aws) support, which is not available on Android 2.2 -- comment 44.
Patrick,

Can you look at this? It's a top 10 orange.  Per comment 74 it looks like we're somehow getting OnStopRequest called with status==NS_NET_STATUS_CONNECTED_TO, which seems like a weird channel result status.  It's getting propagated up somehow via OnTransportStatus.

You can find the channel in question by searching for 5d2d2830 in the log.
Flags: needinfo?(mcmanus)
I thought of a hack to work around bug 977589:

https://tbpl.mozilla.org/?tree=Try&rev=56545bdc82ed

In this try push, I replace the NSPR_LOG_MODULES delimiter with a slash, and use that in the test runner script.  Let's see if this is going to work!
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #198)
> I thought of a hack to work around bug 977589:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=56545bdc82ed
> 
> In this try push, I replace the NSPR_LOG_MODULES delimiter with a slash, and
> use that in the test runner script.  Let's see if this is going to work!

It didn't work...
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #199)
> (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> emailapocalypse) from comment #198)
> > I thought of a hack to work around bug 977589:
> > 
> > https://tbpl.mozilla.org/?tree=Try&rev=56545bdc82ed
> > 
> > In this try push, I replace the NSPR_LOG_MODULES delimiter with a slash, and
> > use that in the test runner script.  Let's see if this is going to work!
> 
> It didn't work...

Geoff, did I do anything obviously wrong there?
Flags: needinfo?(gbrown)
When I looked at your push yesterday I was sure it would work!

But now I think you forgot to include my changes from https://hg.mozilla.org/try/rev/0f9f983844d0.
Flags: needinfo?(gbrown)
(In reply to comment #201)
> When I looked at your push yesterday I was sure it would work!
> 
> But now I think you forgot to include my changes from
> https://hg.mozilla.org/try/rev/0f9f983844d0.

Gah, you're right!  Here's another attempt: https://tbpl.mozilla.org/?tree=Try&rev=f5f3fe614624
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #204)
> (In reply to comment #201)
> > When I looked at your push yesterday I was sure it would work!
> > 
> > But now I think you forgot to include my changes from
> > https://hg.mozilla.org/try/rev/0f9f983844d0.
> 
> Gah, you're right!  Here's another attempt:
> https://tbpl.mozilla.org/?tree=Try&rev=f5f3fe614624

So here is a test failure log:

https://tbpl.mozilla.org/php/getParsedLog.php?id=35674796&tree=Try&full=1

But I only see the socket transport log there and *nothing* else. :(  What am I doing wrong?
Flags: needinfo?(gbrown)
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/847c492b44d08aacadb66bf471d329cea8d0cf09f0b743f5daf0beb7bf50a4c470139549fcec38c354f5a6ea2cf5a4c378423b1f636499d388ffd3e6129f58f7 is the logcat for the Android 4.0 run and it shows:

03-05 10:52:12.234 D/GeckoLoader( 2229): env20: NSPR_LOG_MODULES=nsSocketTransport:5/DocLoader:5/URILoader:5/HelperAppService:5

I cannot verify that for Android 2.2, but I'm 99% sure it would be the same.

I don't know prlog.c well, but notice there are 2 functions that access NSPR_LOG_MODULES and both sscanf for "," -- don't you need to change both _PR_InitLog and _PR_SetLogModuleLevel?
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #208)
> http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/
> 847c492b44d08aacadb66bf471d329cea8d0cf09f0b743f5daf0beb7bf50a4c470139549fcec3
> 8c354f5a6ea2cf5a4c378423b1f636499d388ffd3e6129f58f7 is the logcat for the
> Android 4.0 run and it shows:
> 
> 03-05 10:52:12.234 D/GeckoLoader( 2229): env20:
> NSPR_LOG_MODULES=nsSocketTransport:5/DocLoader:5/URILoader:5/
> HelperAppService:5
> 
> I cannot verify that for Android 2.2, but I'm 99% sure it would be the same.
> 
> I don't know prlog.c well, but notice there are 2 functions that access
> NSPR_LOG_MODULES and both sscanf for "," -- don't you need to change both
> _PR_InitLog and _PR_SetLogModuleLevel?

Yes indeed.  I used to be able to write code...  ;-)

https://tbpl.mozilla.org/?tree=Try&rev=54177c1d3d22
So here’s a useful log containing all of the NSPR log modules I selected:

https://tbpl.mozilla.org/php/getParsedLog.php?id=35773619&tree=Try&full=1#error0

Based on this log, we get to this point: <http://mxr.mozilla.org/mozilla-central/source/uriloader/base/nsURILoader.cpp#260> where we have the following line in the log:

2469704[4f82c080]:   After dispatch, m_targetStreamListener: 0x5d2f4500, rv: 0x00000000

After which there is:

2469704[4f82c080]:   After dispatch, m_targetStreamListener: 0x5d2f4500, rv: 0x00000000
2469704[4f82c080]: STS dispatch [5fe6d800]
3008312[4f82c880]:     ...returned after 1 milliseconds
2469704[4f82c080]: DocLoader:5f8f4400: OnStopRequest[60c06c30](http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/unsafeBidiFileName.sjs?name=%C3%A2%C2%80%C2%AA.test) status=804b0004 mIsLoadingDocument=true, 0 active URLs
3008312[4f82c880]: STS poll iter [1]
3008312[4f82c880]:   active [0] { handler=5d9ca9c0 condition=0 pollflags=5 }
3008312[4f82c880]:   calling PR_Poll [active=1 idle=0]
3008312[4f82c880]: poll timeout: none
3008312[4f82c880]:     timeout = -1 milliseconds
2469704[4f82c080]: DocLoader:5f8f4400: Is now idle...
2469704[4f82c080]: STS dispatch [60d29a80]
3008312[4f82c880]:     ...returned after 0 milliseconds
2469704[4f82c080]: STS dispatch [60d29a90]
3008312[4f82c880]: STS poll iter [0]
3008312[4f82c880]:   active [0] { handler=5d9ca9c0 condition=0 pollflags=5 }
3008312[4f82c880]:   calling PR_Poll [active=1 idle=0]
3008312[4f82c880]: poll timeout: none
3008312[4f82c880]:     timeout = 0 milliseconds
3008312[4f82c880]:     ...returned after 0 milliseconds
3008312[4f82c880]: STS poll iter [1]
3008312[4f82c880]:   active [0] { handler=5d9ca9c0 condition=0 pollflags=5 }
3008312[4f82c880]:   calling PR_Poll [active=1 idle=0]
3008312[4f82c880]: poll timeout: none
3008312[4f82c880]:     timeout = -1 milliseconds
2469704[4f82c080]: STS dispatch [5f8b3080]
3008312[4f82c880]:     ...returned after 361 milliseconds
2469704[4f82c080]: Error: diskFull, type=1, listener=0x0, rv=0x8052000A
3008312[4f82c880]: STS poll iter [1]
2469704[4f82c080]:        path='/mnt/sdcard/Download/PXY+h9oc.exe.part'

Note the strange thing here that “OnStartRequest returning” on <http://mxr.mozilla.org/mozilla-central/source/uriloader/base/nsURILoader.cpp#270> never appears in the log, but note the diskFull error above!  That comes from here: <http://mxr.mozilla.org/mozilla-central/source/uriloader/exthandler/nsExternalHelperAppService.cpp#1818> and it seems like we just call prompter->Alert, show an alert to the user who is not watching the device, and never return back to the event loop until we’re killed, which explains why the “OnStartRequest returning” message is never shown.

(When reading the log note that bug 949991 recently landed which touches this code and changes the log messages, but that patch was not in the tree from which this log was generated.)

We can of course mock the prompt service to cause it to never show an alert (and perhaps we should do it in the test harness to get it print something useful to the log), but what’s our story about protecting against the disk getting full on Android?
Flags: needinfo?(mcmanus)
Flags: needinfo?(gbrown)
Flags: needinfo?(blassey.bugs)
Summary: Intermittent uriloader/exthandler/tests/mochitest/test_unsafeBidiChars.xhtml | application timed out after 330 seconds with no output → Intermittent uriloader/exthandler/tests/mochitest/test_unsafeBidiChars.xhtml | application timed out after 330 seconds with no output because we run out of disk space
For the test devices, we try to have lots of free space, and clean up known tmp files between test runs.

The tests show the available space before and after each run. For your latest log:

before -- 'disk': ['/data: 221642752 total, 156811264 available', '/system: 250740736 total, 150310912 available', '/mnt/sdcard: 15911780352 total, 15841263616 available']

 after -- 'disk': ['/data: 221642752 total, 156770304 available', '/system: 250740736 total, 150310912 available', '/mnt/sdcard: 15911780352 total, 15841214464 available']

So 15+ GB available on /mnt/sdcard. It seems unlikely that we are filling that up during the test -- maybe the diskFull error is wrong? Related to the file names in this test? Maybe the format of the sdcard on certain devices doesn't allow the characters used in the file names??
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #234)
> For the test devices, we try to have lots of free space, and clean up known
> tmp files between test runs.
> 
> The tests show the available space before and after each run. For your
> latest log:
> 
> before -- 'disk': ['/data: 221642752 total, 156811264 available', '/system:
> 250740736 total, 150310912 available', '/mnt/sdcard: 15911780352 total,
> 15841263616 available']
> 
>  after -- 'disk': ['/data: 221642752 total, 156770304 available', '/system:
> 250740736 total, 150310912 available', '/mnt/sdcard: 15911780352 total,
> 15841214464 available']

What are the mount points on this device?  Is it possible that /mnt/sdcard/Download is mounted on a different volume?

> So 15+ GB available on /mnt/sdcard. It seems unlikely that we are filling
> that up during the test -- maybe the diskFull error is wrong? Related to the
> file names in this test? Maybe the format of the sdcard on certain devices
> doesn't allow the characters used in the file names??

This is the temp file path: /mnt/sdcard/Download/PXY+h9oc.exe.part

The only character out of the ordinary there is +, but I don't know any filesystem which blocks that.  What filesystem do we use on these machines?

I'm preparing another try push with more logging to see where exactly this error is coming from.
Flags: needinfo?(gbrown)
Depends on: 980967
Depends on: 980970
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #233)
> We can of course mock the prompt service to cause it to never show an alert
> (and perhaps we should do it in the test harness to get it print something
> useful to the log), but what’s our story about protecting against the disk
> getting full on Android?

Are you talking in general or on our test machines? In general, we don't really have a story. We'd like to move our caches to the designated cache directory such that the OS can clear them away on disk pressure. Necko can't handle that yet though.

Are there any disk full scenarios you're particularly worried about?

Geoff would know better about the test machines.
Flags: needinfo?(blassey.bugs)
(In reply to comment #242)
> (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> emailapocalypse) from comment #233)
> > We can of course mock the prompt service to cause it to never show an alert
> > (and perhaps we should do it in the test harness to get it print something
> > useful to the log), but what’s our story about protecting against the disk
> > getting full on Android?
> 
> Are you talking in general or on our test machines? In general, we don't really
> have a story. We'd like to move our caches to the designated cache directory
> such that the OS can clear them away on disk pressure. Necko can't handle that
> yet though.
> 
> Are there any disk full scenarios you're particularly worried about?

No, just the test machines.  We show a prompt when this happens, so there is no problem with the user facing behavior, it's just that there is nobody to dismiss that prompt when we're running tests!
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #235)
> What are the mount points on this device?  Is it possible that
> /mnt/sdcard/Download is mounted on a different volume?
> What filesystem do we use on these machines?

Here is the output from "mount" on a tegra, at the start of mochitest-8:

rootfs / rootfs ro,relatime 0 0
tmpfs /dev tmpfs rw,relatime,mode=755 0 0
devpts /dev/pts devpts rw,relatime,mode=600 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
tmpfs /mnt/asec tmpfs rw,relatime,mode=755,gid=1000 0 0
/dev/block/mtdblock3 /system yaffs2 ro,relatime 0 0
/dev/block/mtdblock6 /data yaffs2 rw,nosuid,nodev,relatime 0 0
/dev/block/mtdblock4 /cache yaffs2 rw,nosuid,nodev,relatime 0 0
/dev/block/mtdblock0 /misc yaffs2 rw,nosuid,nodev,relatime 0 0
/dev/block/vold/179:9 /mnt/sdcard vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0702,dmask=0702,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
/dev/block/vold/179:9 /mnt/secure/asec vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0702,dmask=0702,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
tmpfs /mnt/sdcard/.android_secure tmpfs ro,relatime,size=0k,mode=000 0 0

And here is the output from "df":
/dev: 451240K total, 12K used, 451228K available (block size 4096)
/mnt/asec: 451240K total, 0K used, 451240K available (block size 4096)
/system: 244864K total, 97948K used, 146916K available (block size 4096)
/data: 216832K total, 60040K used, 156792K available (block size 4096)
/cache: 32768K total, 1156K used, 31612K available (block size 4096)
/misc: 2048K total, 1156K used, 892K available (block size 4096)
/mnt/sdcard: 15550208K total, 98848K used, 15451360K available (block size 32768)
/mnt/secure/asec: 15550208K total, 98848K used, 15451360K available (block size 32768)

So it looks like /mnt/sdcard/Download/... is on /mnt/sdcard, which is vfat, and has 15+GB available.

I am trying to get the mount info for a tegra that fails the test, just in case that is any different.
Thanks, Geoff.  Much appreciated.

So, on try, so far I've confirmed that it's the nsILocalFile::CreateUnique that is failing, like I suspected.  I'm running another patch with some NSPR logging on try <https://tbpl.mozilla.org/?tree=Try&rev=98571633d685> to see what open() returns here, but the chances of something non-filesystem happening here are quickly diminishing...

Can you also do an ls -la /mnt/sdcard/Download on one of these tegras, preferably one that fails this test?  And can you please see if you can create a file in that directory through adb shell?
Here is mount/df output from a tegra that failed this test. I do not see any significant difference (this device has an 8 GB sdcard, but I think that's just a coincidence).

rootfs / rootfs ro,relatime 0 0
tmpfs /dev tmpfs rw,relatime,mode=755 0 0
devpts /dev/pts devpts rw,relatime,mode=600 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
tmpfs /mnt/asec tmpfs rw,relatime,mode=755,gid=1000 0 0
/dev/block/mtdblock3 /system yaffs2 ro,relatime 0 0
/dev/block/mtdblock6 /data yaffs2 rw,nosuid,nodev,relatime 0 0
/dev/block/mtdblock4 /cache yaffs2 rw,nosuid,nodev,relatime 0 0
/dev/block/mtdblock0 /misc yaffs2 rw,nosuid,nodev,relatime 0 0
/dev/block/vold/179:9 /mnt/sdcard vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0702,dmask=0702,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
/dev/block/vold/179:9 /mnt/secure/asec vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0702,dmask=0702,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
tmpfs /mnt/sdcard/.android_secure tmpfs ro,relatime,size=0k,mode=000 0 0

/dev: 451240K total, 12K used, 451228K available (block size 4096)
/mnt/asec: 451240K total, 0K used, 451240K available (block size 4096)
/system: 244864K total, 97948K used, 146916K available (block size 4096)
/data: 216832K total, 63208K used, 153624K available (block size 4096)
/cache: 32768K total, 1156K used, 31612K available (block size 4096)
/misc: 2048K total, 1156K used, 892K available (block size 4096)
/mnt/sdcard: 7763120K total, 38052K used, 7725068K available (block size 4096)
/mnt/secure/asec: 7763120K total, 38052K used, 7725068K available (block size 4096)

I will try ls -la and creating a file (adb is probably not an option, but I should be able to exec something like date > /mnt/sdcard/Download/testfile).
CreateUnique is the thing that works 9999 times (per slave, if you fail to actually delete the file), and then fails obscurely, isn't it?
https://secure.pub.build.mozilla.org/buildapi/recent/tegra-199?numbuilds=1000 and put mochitest-8 in the search box, that tegra was fine until the 3rd, remains fine for other tests, but has failed every M8 since the first failure. https://secure.pub.build.mozilla.org/buildapi/recent/tegra-249?numbuilds=5000 is more of a go-getter, so 1000 jobs doesn't even get you back to where (assuming I'm right) it got its filename9999.whatever and became unable to CreateUnique and unable to run this test, back in January.
(In reply to Phil Ringnalda (:philor) from comment #254)
> CreateUnique is the thing that works 9999 times (per slave, if you fail to
> actually delete the file), and then fails obscurely, isn't it?

Oh, you're talking about this limitation right? <http://dxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileCommon.cpp#49>

So I looked at my try pushes so far and as far as I can see, not two of them have run on the same slave yet :(  But I don't think this is what's happening here since we start off with a randomly generated file name <http://mxr.mozilla.org/mozilla-central/source/uriloader/exthandler/nsExternalHelperAppService.cpp#1399>, and the only way for us to fall into the 9999 maximum file limitation is if we start with the same base name...
We are definitely accumulating files in /mnt/sdcard/Download. This is from a successful test:

----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 5Yd4knB1.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 uaUftZeJ.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 GDZl3TIY.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 xeMgoFj4.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 HqvUl4WS.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 QWZDAnX5.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 bqC+BfNB.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 wexVUpAw.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 wXpGAieP.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 46VhywsV.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 PVKgP9Jy.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 oOsGKvBI.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 ADLDnVLR.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 5Tad9nbR.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 cA2Rm20a.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 jK+30ngu.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 66VHXDQs.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 aZHcWUjt.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 VEi6Hr_X.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 SRVKVy6e.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 S+QKLF2V.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 2_ahq+Yv.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 qbtJ93SF.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 YjuBOzox.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 4IJLvj9C.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 mrvNLdfa.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 xZ9v6ukn.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 CdxS+Vcy.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 iK56utt8.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 lar8+_cs.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 Nwqe6Ria.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 4ZNBM8JU.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 MDwQL5Lw.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 WPmShr4G.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 bToMOLpA.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 C+P4QANR.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 FWxTNRSo.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 BHA4HLiH.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 dkFO_2KS.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-11 19:40 0vFpcceK.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-14 14:16 Ue3LOeEL.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-14 14:16 tDhsWBMo.exe.part
----rwxr-x system   sdcard_rw                0 2013-02-14 14:16 iVKe3U6e.exe.part
...
----rwxr-x system   sdcard_rw                0 2014-03-07 22:42 h4uXU8Em.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-07 22:42 TY29rXn5.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-07 22:42 HyPWiseU.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-07 22:42 QdoQu2CX.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-07 22:42 OThUc2Ds.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-07 22:42 NEwdcJvN.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-07 22:42 wGpF++FZ.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 74jFyXPw.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 vwT5_40h.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 iITZdTbs.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 Rj+VdvnD.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 xGAdDZ6S.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 RyPtA2sg.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 J3uxGTdJ.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 05zyaJwi.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 3WrZgm6a.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 zPeg4+dX.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 EVLP5mVw.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 PaalZzdj.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 qi_zwRlR.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 AsWlXuFi.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 LJYj7Auw.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 kmI8Ij7o.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 yls8_z2Z.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 0orZMO5y.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 E72n_cUD.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 KmY3Vzk_.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 rg+S_zJB.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 Wi9rXls0.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 dXd5tqmB.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 lFY3K3LK.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 pJUiO75W.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 NY1sq2Hw.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 xsacOR3h.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 3rlhGmDs.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 t2oEUZMr.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 mbu1hfZK.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 AYDweYRA.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 WFKQnrld.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 UczXxlgS.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 eLIuIgJR.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 6odYEH56.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 aguE+C39.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 _Vs9PYfI.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 mHstyaK7.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 SqtMFsKy.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 00:29 MhgsAmEX.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 LJWvruQy.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 ippnJzbm.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 YYdNmGGD.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 Sv8GmsMQ.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 YJhJelD4.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 aux_9Q4R.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 _9E2hHqh.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 l2aSGxF0.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 WsSSlToF.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 JNTa1LW1.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 MlO7tljw.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 zK8yzu5+.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 12uHgD4X.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 csTJ084P.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 2lRDARDk.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 2eqbfM6j.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 OXmLzTvc.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 +UzCqCng.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 pogbKJju.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 JSDboIJt.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 XEBbCZdN.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 gaRwbCNS.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 +g6wrkqj.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 h1prNs_o.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 2XA0HglN.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 a1Zaq1k0.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 +9kW5GMn.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 2vZUwfVq.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 EbCy7qSm.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 3LWUTgw4.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 9b9z+V6A.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 hlmizQwi.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 DWr9ttHo.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 zq3Tw9G4.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 s+kZeRRy.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 yLCJYex6.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 FPSwMMiH.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 OwudR2ec.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 tM34uCCA.exe.part
----rwxr-x system   sdcard_rw                0 2014-03-08 05:52 mRc_rmzI.exe.part

https://tbpl.mozilla.org/php/getParsedLog.php?id=35842180&tree=Try&full=1

We should probably cleanup /mnt/sdcard/Download in http://mxr.mozilla.org/build/source/tools/sut_tools/cleanup.py.
Flags: needinfo?(gbrown)
Here's a failed test run: https://tbpl.mozilla.org/php/getParsedLog.php?id=35843259&tree=Try&full=1. On this tegra, "ls -l /mnt/sdcard/Download" failed.
(In reply to Geoff Brown [:gbrown] from comment #258)
> We are definitely accumulating files in /mnt/sdcard/Download. This is from a
> successful test:

Aha!  That's 16,000 files right there.  IIRC FAT32 only allows for 64k files per directory, so it's not surprising if at some point a slave will end up being unable to create files in the download folder.

> We should probably cleanup /mnt/sdcard/Download in
> http://mxr.mozilla.org/build/source/tools/sut_tools/cleanup.py.

Yes, agreed, except...

(In reply to Geoff Brown [:gbrown] from comment #259)
> Here's a failed test run:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=35843259&tree=Try&full=1.
> On this tegra, "ls -l /mnt/sdcard/Download" failed.

hmm, not sure what this means.  If we can't ls the folder, how can we clean it up?  I don't think we can safely delete and recreate it with the correct owner/permission unless if we have root access on these devices.
Also, FWIW, I verified that attempting to create the file results in ENOSPC.  This is the code in the vfat driver which raises this error: <https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/fat/dir.c#n1292> (err is set to -ENOSPC on line 1288 before the loop.)

This confirms the hypothesis that we're running out of maximum entries per directory on the FAT32 filesystem.
Attached patch CleanupSplinter Review
(Not sure how to test this patch -- also, not sure if this is going to work for all slaves if we can't list the directory contents.)
Attachment #8388101 - Flags: review?(gbrown)
Comment on attachment 8388101 [details] [diff] [review]
Cleanup

Review of attachment 8388101 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine to me but let's run it by release engineering.
Attachment #8388101 - Flags: review?(gbrown)
Attachment #8388101 - Flags: review?(bugspam.Callek)
Attachment #8388101 - Flags: feedback+
(In reply to Geoff Brown [:gbrown] from comment #259)
> Here's a failed test run:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=35843259&tree=Try&full=1.
> On this tegra, "ls -l /mnt/sdcard/Download" failed.

tegra-250 -- fwiw, this does fail for me too (as does ls `/mnt/sdcard/Download`) however `exec ls /mnt/sdcard/Download` works just fine!

(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #260) 
> (In reply to Geoff Brown [:gbrown] from comment #259)
> > Here's a failed test run:
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=35843259&tree=Try&full=1.
> > On this tegra, "ls -l /mnt/sdcard/Download" failed.
> 
> hmm, not sure what this means.  If we can't ls the folder, how can we clean
> it up?  I don't think we can safely delete and recreate it with the correct
> owner/permission unless if we have root access on these devices.

we do have root access, `execsu` with SUTAgent (you can also call many "shell" commands with root=True)  e.g. http://mxr.mozilla.org/build/source/tools/sut_tools/mozdevice/devicemanagerSUT.py#304
Comment on attachment 8388101 [details] [diff] [review]
Cleanup

Review of attachment 8388101 [details] [diff] [review]:
-----------------------------------------------------------------

I'm a little nervous about some of our tegras dropping out of the pool due to this ls bug with the download folder per my previous comment, that said this is indeed a GOOD cleanup to have, and an improvement over today, so I'm happy to have this landed, and I'll deploy to foopies after it does land.

:ehsan feel free to land to hg.m.o/build/tools whenever you're ready, I'll take care of deploying this week (and can even do so sooner if pinged in #releng)
Attachment #8388101 - Flags: review?(bugspam.Callek) → review+
Landed: https://hg.mozilla.org/build/tools/rev/59fa26f23794

Let's see if this patch manages to clean up some of these tegras.  If not, I can find a RelEng bug to pull them off all of the affected ones and clean them up manually.
Comment on attachment 8388101 [details] [diff] [review]
Cleanup

I'm hesitant to call this bug fixed until we see the benefits of it, however, it has been deployed
Attachment #8388101 - Flags: checkin+
Duplicate of this bug: 960101
I landed a followup for this, since I never tested it as advised and it broke the whole pool.

http://hg.mozilla.org/build/tools/rev/938c20dcd0dc

Also since many tegras wouldn't work with the ls issue mentioned above:

Fixing those ls issues was me doing a rmdr on /mnt/sdcard/Download manually. We might want to change the patch to do that in the future, but for this I expect to be good.
According to orangefacter this looks fixed, :gbrown concur?
Flags: needinfo?(gbrown)
There is absolutely no way that orangefactor could tell you that it's fixed: the bug was that a set of tegras with 16K files in their Download directory could not successfully run mochitest-8, and the initial patch took every single one of those affected slaves out of action all day. Orangefactor telling you that it's fixed would consist of using OF to get the list of affected slaves, using https://secure.pub.build.mozilla.org/buildapi/recent/tegra-237 to see what they've done recently, and filtering on mochitest-8 to see what they've done in the way of this suite. None of them have yet completed a job since this morning, so none of them have had a chance to say whether they are fixed.
(In reply to TBPL Robot from comment #277)
> philor
> https://tbpl.mozilla.org/php/getParsedLog.php?id=35919213&tree=Mozilla-
> Inbound
> Android 2.2 Armv6 Tegra mozilla-inbound opt test mochitest-8 on 2014-03-10
> 19:30:12
> revision: 38ecb36a4f52
> slave: tegra-136
> 
> TEST-UNEXPECTED-FAIL |
> /tests/uriloader/exthandler/tests/mochitest/test_unsafeBidiChars.xhtml |
> application timed out after 330 seconds with no output

Uh-oh, is this after the fix?
Hmm, yeah.  Can someone please see why tegra-136 is still affected?
this one had no /mnt/sdcard/Download -- Might have been leftover from me attempting a fix elsewhere :/

I just created it with exec mkdir /mnt/sdcard/Download

I'll look into ways to enforce/ensure that
No failures yet today is a bit more encouraging, though :)
Reverted the debugging stuff: https://hg.mozilla.org/integration/mozilla-inbound/rev/5ef71c50623d

I think we're done here.  Thanks everyone!
Whiteboard: [leave open]
Whew!

There may still be a problem with some of the affected tegras not taking jobs / not mounting their sdcards...I understand :Callek is working on that.
Flags: needinfo?(gbrown)
This was merged.
https://hg.mozilla.org/mozilla-central/rev/5ef71c50623d
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.