Closed
Bug 75083
Opened 24 years ago
Closed 23 years ago
Tokenizer (and more) slowed down by non-inlined code
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: bratell, Assigned: bratell)
References
()
Details
(Keywords: perf)
Attachments
(3 files)
616 bytes,
patch
|
Details | Diff | Splinter Review | |
53.30 KB,
image/jpeg
|
Details | |
1.04 KB,
text/plain
|
Details |
nsString::AppendWithConversion, nsStr::Trim and nsLocalCString(nsLocalCString)
all do nsCRT::strlen which don't get inlined by the MSVC compiler and they show
up high on the Tokenizer cost on the jdk index page. By inlining them manually I
got a 15% speedup in the tokenizer. I will try calling strlen direct and see if
the compiler can do something even better with that.
Assignee | ||
Updated•24 years ago
|
Assignee | ||
Comment 1•24 years ago
|
||
I have seen this in profile builds but if we use the compiler flags /Oi or /O2
we should get inlined strlen in windows. What compiler flags are really used
when building on Windows?
The only relevant /O2 I found is:
/config/makefile.win, line 87 -- CFLAGS = /O2 /GB /MD
but from the comment it looks like that is only for the tools. So aren't we
inlining? Are we using the wrong build flag? Can we make a _big_ win here or am
I missing something?
CC:ing some people who might know how our build system works on Windows.
Comment 2•24 years ago
|
||
I'm not sure that we've really done a good job exploring this space. We use -O1
(see mozilla/config/WIN32). Adding mjudge & rpotts, who know a lot about Win32.
Assignee | ||
Comment 3•24 years ago
|
||
I did:
set MOZ_FLAGS=/Oi (inline strlen and other similar functions)
and rebuilt strings and xpcom (nothing else). The quick and dirty result in
quantify was that the total time to load the page in the url above went from
12.3 seconds to 10.3 on my computer. A big part of the difference can be
explained by fewer reflows for some reason. Probably because they are timer
started and it the layout is quicker, fewer are needed.
So even if the "real" speedup is unknown, there seems to be some. I will try to
do a full rebuild with /Oi to see what that does, but someone with more
resources than I should really try some of the optimizer flags and do wall
clock timings. I can't access John Morrison's tests I guess?
And by the way, the object files I looked at was smaller than without /Oi so
inlining some functions seems to generate smaller code than doing the function
call.
Keywords: topperf
Assignee | ||
Comment 4•24 years ago
|
||
Another flags to test would be /Ob2 (let the compiler inline anything it think
is suitable to inline).
/O1 and /O2 only implies /Ob1 which is to inline functions with the |inline|
keyword but nothing else.
Assignee | ||
Comment 5•24 years ago
|
||
A full rebuild with /0i and I'm down to 10.1 seconds in my very unscientific
measurements. That's just 2% better than with just xpcom built with /Oi, but it
is still a positive trend.
Comment 6•24 years ago
|
||
Showing chofmann what Daniel found.
Assignee | ||
Comment 7•24 years ago
|
||
So I changed WIN32 to build everything with /O2 /Ob2 and did a new run. It's
much noise but what I see in parts of the program is in the table below. The
times are F+D times in seconds as reported by Quantifty. The number in
parentheses after the time is number of calls.
Before: After:
PresShell::ProcessReflowCommands 6.482s (191) 4.998s (635)
nsHTMLTokenizer::ConsumeToken 0.623s (91k) 0.409s (91k)
nsPresContext::ResolveStyleContextFor 1.200s (32k) 1.039s (32k)
nsCSSFr.Const...::ConstructFrameByDisplayType 1.334s (31k) 1.183s (31k)
SinkContext::FlushText 0.630s (62k) 0.534s (62k)
and so on...
I want this change in the real tree as soon as possible, to at least get some
real world measurements! We also need to see how much this affects build size.
(This put a new light on all those questions about when we are going to release
an optimized build. :-) )
Severity: normal → major
Status: NEW → ASSIGNED
Keywords: mozilla0.9
Summary: Tokenizer slowed down by non-inlined strlens in string code → Tokenizer (and more) slowed down by non-inlined code
Target Milestone: --- → mozilla0.9
Assignee | ||
Comment 8•24 years ago
|
||
Comment 9•24 years ago
|
||
This caught my eye, so I ran jrgm's performance test once with /Oi and then
again with /O2 /Ob2, and both runs produced the same timings as the current
(/O1) flags.
Assignee | ||
Comment 10•24 years ago
|
||
That's depressing. I wonder why I got so nice figures during the weekend. Didn't
you see any difference at all?
Maybe it will only show an effect on big computational intensive pages? Anyway,
I'm rebuilding right now to see if my /O1 figures were wrong in any way.
Comment 11•24 years ago
|
||
Perhaps Quantify's instrumentation makes non-inline functions seem slower when
in fact there's little difference.
One thing that has I've always wondered is if we'd save anything by
making all the methods in nsCOMPtr & nsString statically linked or inline rather
than the cross-DLL calls we make now.
Comment 12•24 years ago
|
||
Comment 13•24 years ago
|
||
FWIW, I tested release-mode builds on my PII/400mhz/256mb laptop, over a DSL
line at home. I did clobber builds and cleared the disk cache before each test
run. The graph represents the median page load time for each site (and for all
sites - first element in the series), across the five iterations that jrgm's
test does. I can post the raw data (or XLS spreadsheet) if anyone wants.
Assignee | ||
Comment 14•24 years ago
|
||
I believe you. The graph is very clear. I think Adam may have a point. The
non-inlined functions are "timed", not instrumented as our code, and maybe that
timing is wrong in some way. The only problem with that theory is that even if I
remove all strlen and memcpy completely from my "Before" build, it will still
not be as fast in Quantify.
I timed seven loads of the JDK index page with the time Mozilla displays by
loading it into an empty mozilla window and then pressing shift+reload six
times. These figures are interesting in their own but they shows no
improvements. :-(
/O1 /O2 /Ob2
1. 8.858s 9.153s 3.3% slower
2. 12.628s 12.047s 4.6% faster
3. 19.608s 18.897s 3.6% faster
4. 19.058s 20.900s 9.7% slower
5. 15.463s 14.912s 3.6% faster
6. 14.551s 14.881s 2.3% slower
7. 13.710s 13.390s 2.3% faster
This is so depressing. I thought I really had found something. Well, well. Back
to Quantify. :-/
Maybe someone could explain why the load times differs so much in the same
pattern:
Fast - slower - really slow - really slow- slightly slow - slightly slow -
slightly slow....
I can understand the difference between number 1 and 2, because the previuos
page has to be torn down, something we are really slow at, but the rest?
At least that could explain why different people get different numbers on the
same page with the same computer.
Severity: major → minor
Target Milestone: mozilla0.9 → ---
Comment 15•24 years ago
|
||
Daniel, the rumor in today's performance meeting is that you may have a patch to
manually modify those functions into in-line code?? (besides turning the
compilier options) If that's true, can you post your patch to this bug, so we
can try it out too?
Thanks!
Cathleen
Comment 16•23 years ago
|
||
What is the status of this bug?
Is there similar phenomenon in other OS,such as:solaris?
Assignee | ||
Comment 17•23 years ago
|
||
Phenomena, as in "code that should be inlined is not", or phenomena as in
"quantify is reporting inaccurate numbers for timed functions"?
I believe heavier inlining can do good things at some places but apparently it
didn't affect page load in a way that was measurable.
Comment 18•23 years ago
|
||
hi,daniel
Because the bug haven't been discussed for at least a year,I want to know
whether your changing compling options is effective to enhance the performance
problem now. Is there a conclusion of previous discussion?
In theory,although inline function can lead to the inflation of binary code, it
is still helpful to improve performance of mozilla if changed the small size
functions to inline functions. In pratice, we should blance the two sides of
inline functions.
BTW, have someone tried similar compiling options in other OS,such as,
linux,solaris?
thanks
leon
Comment 19•23 years ago
|
||
I modified the functions in nsString and nsCRT,and changed some short
functions,whose line-counting number is smaller than 10, into "inline" function.
The groups of testing data below is the time consumed(milliseconds) when running
local testcase "index-7.html" using build from modified code and original code.
index
Org1
org2
Org3
Org4
0rg5
Mod1
Mod2
Mod3
Mod4
1
10547
10453
10843
11828
10484
11344
10641
10453
10563
2
10218
10844
10703
12266
10797
11360
10719
11125
11234
3
11454
11703
11437
10547
10672
11750
11078
11031
11187
4
11015
11125
11172
10734
11016
11735
11172
11156
11531
5
10657
11407
10922
12015
11141
11265
11313
11204
11063
6
10969
11453
11297
11110
11187
11109
11250
10813
11094
7
10562
10579
10469
10938
10531
10563
10547
10437
10484
8
10250
10422
10625
10985
10390
10578
10438
10422
10250
9
10781
10468
10204
10453
10266
10141
10047
10156
10172
10
10422
10344
11391
10422
10109
10203
10047
10485
10000
11
10125
10000
10203
10171
11094
10219
10063
10563
10031
12
10297
10062
10953
10203
9985
10547
12266
10438
10688
13
10937
10063
10562
10125
9953
10171
10078
10500
10078
14
10032
10032
10469
10062
10062
10188
10453
10516
10141
15
10047
10110
10641
10110
10391
10172
10062
10531
10688
-------------------------------------------------------------------------------------
Ava
10554
10604
10792
10797
10538
10756
10678
10655
10613
According to the data above,it is difficult to say that these modification is
helpful to enhance the performance of mozilla.
Although I strongly suggest that we should create inline functions if they are
short,I don't think the bug is very helpful and I suggest it be closed if no
oppostions.
At the same time, we can also get similar conclusion from previous comments.
Comment 20•23 years ago
|
||
I modified the functions in nsString and nsCRT,and changed some short
functions,whose line-counting number is smaller than 10, into "inline"
function.
The groups of testing data in attachment is the time consumed(milliseconds)
when running local testcase "index-7.html" using build from modified code and
original code.
According to the testing data ,it is difficult to say that these modification
is
helpful to enhance the performance of mozilla.
Although I strongly suggest that we should create inline functions if they are
short,I don't think the bug is very helpful and I suggest it be closed if no
oppostions.
At the same time, we can also get similar conclusion from previous comments.
Updated•23 years ago
|
Attachment #90200 -
Attachment description: testing data from modificated code and original code → testing data from modified code and original code
Comment 21•23 years ago
|
||
see previous comments
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Updated•5 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•