Closed Bug 1010321 Opened 10 years ago Closed 10 years ago

12% Non-Main Startup File IO Bytes regression on win7 on May 9th on beta (Fx30) from push: bbac2a994298

Categories

(Core Graveyard :: DOM: Apps, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

holy batman, we have a new regression on beta, lets tackle this before we ship with it!

Here is a graph:
http://graphs.mozilla.org/graph.html#tests=[[248,53,25]]&sel=none&displayrange=30&datatype=running

I did some retriggers on beta, it looks to be this push:
https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=bbac2a994298

The problem is there is a bunch of dontbuild stuff that landed as well- if we allow that on mozilla-beta, then we have enough confidence that it is not going to affect performance- so right now the 4 changesets that landed for this given push are in the hot seat!

Here is some background information on xperf (the job that produces non-main startup file io bytes):
https://wiki.mozilla.org/Buildbot/Talos/Tests#xperf

Thanks for getting fixes on beta and addressing regressions before we ship them!
It's probably bug 981085, because it is using nsIFile instead of OS.File. There were issues with using OS.File, so I think we need to live with the regression.
:marco- can you explain the issues with os.file?  In bug 981085 I don't see a problem description, so this seems like a nice to have feature that is affecting performance instead of a feature that we added or a bug fixed that is more important than the performance benchmark.

:aklotz, Do you have any thoughts?
Flags: needinfo?(mar.castelluccio)
Flags: needinfo?(aklotz)
What is the plan moving forward on this? Is OS.File being improved to accommodate this? Is there a plan for reverting to OS.File once any improvements have been made?
Flags: needinfo?(aklotz) → needinfo?(dteller)
I didn't investigate specific issues that prompted to bug 981085, but it is my understanding that the code was pushed to avoid the ~1-2Mb cost of the OS.File thread (well, 2.5Mb on my Nightly), for the sake of Tarako. Marco, Ehsan, can you confirm that this was the issue?

There are plans to completely get rid of the OS.File thread and move to a mostly native implementation, but that's not for 2014. There are parallel plans to improve memory usage of JS workers, but I'm not aware of recent improvements.

I suppose that we could keep the OS.File version for Desktop and the nsIFile version for B2G, but this will make things harder to test – and that's not really something we want to change during Beta, I guess.
Flags: needinfo?(dteller) → needinfo?(ehsan)
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on May 16th-23th) from comment #4)
> I didn't investigate specific issues that prompted to bug 981085, but it is
> my understanding that the code was pushed to avoid the ~1-2Mb cost of the
> OS.File thread (well, 2.5Mb on my Nightly), for the sake of Tarako. Marco,
> Ehsan, can you confirm that this was the issue?

That was part of it, yes.  Fabrice and others also pointed out that there were some perf issues with OS.File on b2g as well but I don't remember the details.

> There are plans to completely get rid of the OS.File thread and move to a
> mostly native implementation, but that's not for 2014. There are parallel
> plans to improve memory usage of JS workers, but I'm not aware of recent
> improvements.

Last I heard from khuey on the worker memory usage issue this week, we're not there yet. :(

> I suppose that we could keep the OS.File version for Desktop and the nsIFile
> version for B2G, but this will make things harder to test – and that's not
> really something we want to change during Beta, I guess.

Yeah, we don't really want to maintain two code paths here, I agree.
Flags: needinfo?(ehsan)
2 code paths is a bad thing, I will vote for a single code path whenever possible.  While this is a significant percentage hit, I think this is all that is hit by changing this.  No other benchmarks/tests have regressed.

Shall we let this go and focus on the OS.File fix in the future?
(In reply to Joel Maher (:jmaher) from comment #6)
> 2 code paths is a bad thing, I will vote for a single code path whenever
> possible.  While this is a significant percentage hit, I think this is all
> that is hit by changing this.  No other benchmarks/tests have regressed.
> 
> Shall we let this go and focus on the OS.File fix in the future?

It seems like the best course of action (where "best" is merely a relative qualification).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
We experienced some bugs, probably the most important was fixed by Yoric (I don't remember the bug #).
We also experienced slow downs because the OS.File worker took too long to startup (lately the read function was rewritten in C++, but we need to use some other functions at startup).

The problems were mostly seen on b2g emulators, but I do agree we can't keep two separate code paths. It's better to wait a bit before trying again the transition to OS.File.

This and other patches that we landed recently almost fixed all the frequent intermittent failures we were seeing, so I think it's worth it.
Flags: needinfo?(mar.castelluccio)
By the way, is there a way to get xperf to tell us which files have seen an increase in bytes transfer? Just to be on the safe side.
Flags: needinfo?(aklotz)
When xperf turns orange you see the problem immediately. The xperf test does log its data, but with respect to analysis of that data, I'm going to have to defer to Joel on that question.
Flags: needinfo?(aklotz) → needinfo?(jmaher)
Right now all the raw data it stored in datazilla, we just don't have a good view on it.  One tool I have for this in inside the talos repository: compare.py:
python compare.py --xperf --revision 1a00dde01343 --branch Inbound

This is pointing to a change I landed on inbound today and it pulls the results.  If you run that locally, you can see the raw data and bytes for each file.  Of course you would need to run this twice and then summarize the data to calculate non-main startup bytes.
Flags: needinfo?(jmaher)
Joel, did you find out which files were affected, as we discussed over IRC?
Flags: needinfo?(jmaher)
I looked over the data we upload and there is no difference in the raw file bytes|counts.  I see the difference in the summary.  It looks like we need to either adjust what we upload or ignore this value
Flags: needinfo?(jmaher)
What does this mean? How is the total computed?
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.