987152 - Remove %APPDATA% and %LOCALAPPDATA% from Windows testers

Reporter

Description

•

11 years ago

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #501)
> Q, wfm from what you said about the cltbld context.
>
> FTR: %APPDATA% resolved to C:\Users\cltbld\AppData\Roaming
>
> (In reply to Q from comment #482)
> > I have a scheduled task ready to test that nukes the profile folders in
> > %APPDATA%\Mozilla\Firefox\Profiles\.
> >
> > The bat is simple and looks thus:
> >
> > for /F "delims=\" %%I in ('dir /ad /b %APPDATA%\Mozilla\Firefox\Profiles')
> > DO (
> > rd /S /Q "%APPDATA%\Mozilla\Firefox\Profiles\%%I"
> >
> >
> > )

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #502)
> As mentioned on IRC, we need to clean %LOCALAPPDATA% as well so we don't
> perpetually accumulate stale cache directories.
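For anyone who wants to try the same cleanup outside a scheduled task, here is a minimal Python sketch of what the quoted batch loop does: it removes every subdirectory of a profiles folder and leaves plain files alone. The function name `remove_profile_dirs` is made up for illustration and is not part of any existing script.

```python
import shutil
from pathlib import Path


def remove_profile_dirs(profiles_root):
    """Delete each subdirectory of profiles_root; return the names actually removed."""
    root = Path(profiles_root)
    removed = []
    if not root.is_dir():
        return removed  # nothing to clean, e.g. on a freshly imaged machine
    for entry in sorted(root.iterdir()):
        # Roughly what `dir /ad /b` selects: directories only, files are skipped.
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
            if not entry.exists():
                removed.append(entry.name)
    return removed
```

On a tester it could be called with the Profiles path built from `os.environ["APPDATA"]`; the return value makes it easy to log how much was cleared.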

Ryan VanderMeulen [:RyanVM]

Comment 1

•

11 years ago

Can we get an ETA on this? Bug 918507 remains an extremely high-frequency test failure on branches where it wasn't disabled, and it is doubly bad because it aborts the test run when it hits.

Severity: normal → critical

Flags: needinfo?(q)

Q

Assignee

Comment 2

•

11 years ago

I am still testing but it should be done this week.

Flags: needinfo?(q)

Q

Assignee

Comment 3

•

11 years ago

I can implement this today with a full cleanup. To be clear, the previous cleanup in starttalos plus this change will clean the following:
%APPDATA%\Mozilla\Firefox\console.log
%LOCALAPPDATA%\Temp
%APPDATA%\Mozilla\Firefox\Profiles
%LOCALAPPDATA%\Mozilla\Firefox\Profiles

Q

Assignee

Comment 4

•

11 years ago

I also think we should clean %userprofile%\downloads. Gentlemen, what do you think?

Flags: needinfo?(armenzg)

Ryan VanderMeulen [:RyanVM]

Comment 5

•

11 years ago

SGTM!

Q

Assignee

Comment 6

•

11 years ago

Also be aware that this may cause a delay in testing times for a period as some of these directories are packed and will take time to clear.

Armen [:armenzg]

Reporter

Comment 7

•

11 years ago

That is fine. Thanks for letting us know about the delays. Understandable. Thanks Q!

Flags: needinfo?(armenzg)

Q

Assignee

Comment 8

•

11 years ago

Great, the new clearing will happen on the next reboot, before runslave is launched.

Nick Thomas [:nthomas] (UTC+12)

Comment 9

•

11 years ago

We're 4 hours since this deployed (taking comment #8 as the timestamp), and we're down all but 7 of our t-w864-ix slaves. The main trees are closed. We have to back out, reboot some boxes to get them back online, and figure out a gradual deployment strategy.

Nick Thomas [:nthomas] (UTC+12)

Comment 10

•

11 years ago

nthomas closed two hours now ? if we have a way to remediate this and get things open, why wouldn't we use it ?
I asked t-w864-ix-029 to delete 10.5k tmp<blah> dirs, 10 mins later it has found 625k files using 26G to delete, but is still counting
this is going to take ages to remove at 250 files/s
RyanVM|afk, nthomas: lets back out the change and keep at least 50% of the slaves running while it gets deployed to the other portion?
nthomas Q: ^^ please please pretty please
I can handle reboots if that helps
Q On it

Nick Thomas [:nthomas] (UTC+12)

Comment 11

•

11 years ago

Q commented out the cleanup commands in the startup script. I've walked up https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-w864-ix sorted by slave name ascending as far as t-w864-ix-090, rebooting if the slave was still cleaning up temp files. About 57 hosts in all, the rest had already finished themselves. There are about 20 slaves with higher NNN which are still cleaning up, which I'm leaving to finish.

Ryan VanderMeulen [:RyanVM]

Comment 12

•

11 years ago

Nick may be right that the issue with being orange on the first run was due to deleting the profile directory but not %APPDATA%\Mozilla\Firefox\profiles.ini.

Armen [:armenzg]

Reporter

Comment 13

•

11 years ago

I'm very sorry about this. I would not have thought it could take this long :( I think we should look into doing the cleanup through mozharness scripts, with or without the GPO cleanup.

Ryan VanderMeulen [:RyanVM]

Comment 14

•

11 years ago

Having just starred bug 918507 on Windows XP, I'm also reminded this cleanup would be nice to run on all Windows test slaves eventually once the bugs are worked out. Bonus points for OSX too (where I know I've seen screenshots of the desktop filled with garbage files generated during test runs).

Q

Assignee

Comment 15

•

11 years ago

If we do this clean up every time I don't think it will be an issue after we get over this first hump.

Q

Assignee

Comment 16

•

11 years ago

I am going to do a background cleanup of %LOCALAPPDATA%\Temp (which should be done at the beginning of every job) for any file older than 2 days, then put just that part of the cleaner script back. Where are we on the profiles bug? Q
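As a rough illustration of the age-based pass described here, the following Python sketch deletes regular files whose modification time is older than a cutoff. It is an assumption-laden stand-in, not the deployed cleaner script: the name `remove_stale_files` and its parameters are invented, it leaves emptied directories in place, and it does not lower its own process priority.

```python
import os
import time


def remove_stale_files(temp_root, max_age_days=2.0, now=None):
    """Delete regular files under temp_root not modified within max_age_days.

    Returns the number of files deleted. Files that cannot be removed
    (for example because a running job holds them open) are skipped.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    deleted = 0
    for dirpath, _dirnames, filenames in os.walk(temp_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    os.remove(path)
                    deleted += 1
            except OSError:
                continue  # locked or already gone; leave it for the next pass
    return deleted
```

Skipping files on `OSError` is deliberate: a pass that overlaps with a job should fail soft rather than abort halfway through the tree.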

Flags: needinfo?(armenzg)

Armen [:armenzg]

Reporter

Comment 17

•

11 years ago

If it is a background removal we could have talos jobs being affected. I don't know though. We should really fix this on the mozharness side.

<armenzg> Q, what does a "background" clean up mean? is it different than a normal cleanup?
<armenzg> Q, so what you're saying is that you're picking only 1 directory instead of the 4 ones mentioned on comment 3?
<armenzg> Q, is there a way to deploy a change to 20 machines at a time?
<armenzg> Q, that way we can see if the machines fall behind
<Q> Aremn: sure
<Q> Batchs are easy
<jhopkins|buildduty> Q: armen mentioned a "background cleanup" - if it is what it sounds like, could that background process overlap with a build?
<jhopkins|buildduty> just trying to make sure there's no race condition (eg. large number of temp files causes strange failures)
<Q> jhopkins|mtg: It would be a background script that would only kill files older than 2 days. The load is niced down so it "should" be invisible
<jhopkins|mtg> ok.. if we keep seeing failures around temp dirs we should confirm that more closely
<Q> RyanVM|sheriffduty: jhopkins|mtg: Should I kick off the background clean up ?
<RyanVM|sheriffduty> Q: by kick off you mean turn off?
<RyanVM|sheriffduty> or start?
<Q> to be clear this will only be for %LOCALAPPDATA%\Temp
<Q> Start
<RyanVM|sheriffduty> what's happening now that we think might be causing problems?
<Callek> Q: as a data point, when did we do the cp python.exe python2.7.exe as well here?
<Callek> could we possibly be missing some process elevation whitelist entry with regard to that?
<Q> Callek: good question let me check
<Callek> (potentially as it relates to easy_install* since anything with install in name can trigger UAC on windows)
<RyanVM|sheriffduty> oh dammit, the failures on aurora are real bustage
<Q> Callek: I hate that "feature"
<Callek> Q: agreed :/
<Q> Can someone aim me at a machine currently having issues ?
<RyanVM|sheriffduty> "currently"?
<RyanVM|sheriffduty> didn't we go over this last week? By the time we see a problem, it's already on to another job?
<RyanVM|sheriffduty> so https://tbpl.mozilla.org/php/getParsedLog.php?id=37018979&tree=Mozilla-Aurora this is a failure
<RyanVM|sheriffduty> no clue what state that slave is in now
<RyanVM|sheriffduty> oh, it already ran green on another job
<Q> Wasn't that for the Profiles directory ?
<Q> Last week that is
<Q> RyanVM|sheriffduty ^
<RyanVM|sheriffduty> My point is that the inherent lag of this failures being reported makes it basically impossible for me to point you at a slave that's having problems "right now"
<RyanVM|sheriffduty> just one that was failing at one point in the recent past
<Q> Right Sorry I was getting hopeful and forgetful
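On the "20 machines at a time" question from the log, the batching itself is simple. Below is a small Python sketch that splits a host list into groups of 20 so a change can go to one group, be watched, and then move on to the next. The helper name `batches` is hypothetical, and how the change is actually pushed (GPO or otherwise) is outside what this shows.

```python
def batches(hosts, size=20):
    """Yield consecutive slices of hosts, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


# Illustrative host names only; the real pool size is not stated in this bug.
example_hosts = [f"t-w864-ix-{n:03d}" for n in range(1, 46)]
rollout_plan = list(batches(example_hosts))
```

With 45 example hosts this gives three groups, the last one shorter than the others, so no machine is left out of the rollout.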

Flags: needinfo?(armenzg)

Q

Assignee

Comment 18

•

11 years ago

I never got a chance to kick this off. You are right that we should be doing something on the mozharness side to fix this, with an OS-level catchall just in case. So there are two issues here:
1) Cleaning out the %TEMP% folder
2) Cleaning the Profiles folders (which causes a test failure)
* This is a serious issue, as we may have been using polluted profile folders for tests.

Q

Assignee

Comment 19

•

11 years ago

Okay, after looking at this for a while: for issue 2 I think we should delete
%APPDATA%\Mozilla
%LOCALAPPDATA%\Mozilla
entirely before each run, and mozharness should check for a clean environment as well. Those directories are created at browser first run (unless of course the import prompt breaks anything, but I think our tests account for a run on a blank system). Thoughts? Q
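The "check for a clean environment" half of this proposal could look something like the Python sketch below. It only reports whether a Mozilla directory is still present under the two app-data roots; it deletes nothing. The function name `leftover_mozilla_dirs` and the optional `environ` argument are assumptions made for illustration, not existing mozharness code.

```python
import os


def leftover_mozilla_dirs(environ=None):
    """Return the Mozilla directories that still exist under the app-data roots."""
    environ = os.environ if environ is None else environ
    leftovers = []
    for var in ("APPDATA", "LOCALAPPDATA"):
        base = environ.get(var)
        if not base:
            continue  # variable unset, e.g. on a non-Windows host
        candidate = os.path.join(base, "Mozilla")
        if os.path.isdir(candidate):
            leftovers.append(candidate)
    return leftovers
```

A harness could call this before launching the browser and fail early, or at least log a warning, when the returned list is not empty.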

Q

Assignee

Updated

•

11 years ago

Flags: needinfo?

Q

Assignee

Updated

•

11 years ago

Blocks: 991236

Armen [:armenzg]

Reporter

Comment 20

•

11 years ago

For me the plan makes sense. However, I currently cannot help with the mozharness component. Perhaps in a week or two.

Flags: needinfo?

Armen [:armenzg]

Reporter

Comment 21

•

11 years ago

As per IRC discussion, the cleanup that was backed out cleaned enough that we're not as much under pressure, however, we have some machines that could be time-bombs if they did not go through the cleanup process. I'm asking around if anyone can pick it up this week.

Severity: critical → major

Q

Assignee

Comment 22

•

11 years ago

As a status update: we are back to cleaning %TEMP%, so we are no longer in danger of drives filling up from that. What we need now is someone to verify that killing %APPDATA%\Mozilla and %LOCALAPPDATA%\Mozilla won't kill the next test that runs. The last conjecture here was that killing the profiles but not the profiles.ini is what was causing a failure condition on the next test run. If that is true, then killing the parent directories should work.

Flags: needinfo?(armenzg)

Armen [:armenzg]

Reporter

Comment 23

•

11 years ago

jmaher: do you know if deleting those 2 dirs will cause any issues? I don't think so but wanted to double check. Do we have to regenerate them before starting tests? Q: I have a new idea, let me know if this is possible without trouble. Can GPO remove files for a few minutes and then stop? This way it would delete a bit every day until we have cleaned up enough. If it doesn't make sense, don't worry about it.

Flags: needinfo?(armenzg) → needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 24

•

11 years ago

I am not sure if we will have problems, I would bet 3/1 that we would be fine with those directories removed. The idea of removing a few files is a good one. Can mozharness do some of this as well? I know that isn't the right place to do it, but it would ensure success!

Flags: needinfo?(jmaher)

Aki Sasaki (not active)

Comment 25

•

11 years ago

This could live in a preflight_clobber() (or postflight_clobber()) in mozharness.mozilla.testing.testbase.TestingMixin. Or we could add an optional self.config['additional_clobber_files'] that clobber() looks for, and set that for all appropriate tests.
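To make the second option concrete, here is a self-contained Python sketch of a `clobber()` that honours an optional `additional_clobber_files` config key. The class `ClobberSketch` is a hypothetical stand-in written for this comment; it is not mozharness's real class and does not reproduce its actual `clobber()` behaviour.

```python
import os
import shutil


class ClobberSketch:
    """Hypothetical stand-in for a script object; NOT mozharness's real class."""

    def __init__(self, config):
        self.config = config

    def clobber(self):
        """Remove every path listed under the optional config key."""
        removed = []
        for raw in self.config.get("additional_clobber_files", []):
            # Expands environment variable references; the %VAR% form is
            # only recognized when running on Windows.
            path = os.path.expandvars(raw)
            if os.path.isdir(path):
                shutil.rmtree(path, ignore_errors=True)
            elif os.path.lexists(path):
                os.remove(path)
            else:
                continue  # already clean, nothing to do
            removed.append(path)
        return removed
```

Keeping the list in config means each test type can opt in per platform, and a script without the key behaves exactly as before.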

Q

Assignee

Comment 26

•

11 years ago

This is done in starttalos as of yesterday

Amy Rich [:arr] [:arich]

Updated

•

10 years ago

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED