Open
Bug 596229
Opened 14 years ago
Updated 1 year ago
static global atoms
Categories
(Core :: JavaScript Engine, enhancement, P3)
Tracking
NEW
People
(Reporter: igor, Unassigned)
References
(Depends on 3 open bugs, Blocks 1 open bug)
Details
(Whiteboard: [sp3])
Attachments
(1 file, 1 obsolete file)
834 bytes, patch
Currently a few global atoms are created on startup using js_Atomize calls. These calls create a JSString from the corresponding C string chars and hash it before adding it to the global atom table. This startup cost can be avoided if the strings are constructed statically (similar to unit and other static strings) and their hashes are precalculated at compile time.
Reporter
Comment 1•14 years ago
callgrind on my Core i7 laptop shows that js_InitCommonAtoms is responsible for 2.5% of the execution time when running js -e 'quit()'.
Assignee: general → igor
Reporter
Comment 2•14 years ago
The patch is a work in progress. It turns all common atoms into static strings.
To make initialization of the static atoms as fast as possible, the patch
precomputes the string hashes and adds to the hashtable an addUniqueNoRemovals
function that assumes the keys are unique, no entries are ever removed from
the table, and the table has enough capacity to accommodate the new entry.
With these changes callgrind shows that the execution time of running
js -e 'quit()' drops by 2.4%, while the contribution of atom initialization
decreases from 2.5% to 0.56%. In principle that could be decreased further by
pre-recording the hashtable itself, so that it could be constructed with a
single malloc and a memcpy from a static template, but the gains would be
relatively small compared with the effort.
Another note is that the patch duplicates in jsstaticatomgen.cpp the code for
hash calculation from jsstr.cpp and jshashtable.h. Unfortunately it is not
possible to include the corresponding headers in the generator, as it runs on
the host while the headers depend on the configuration for the target. Still,
the duplication could perhaps be avoided via some shared header.
An alternative to running jsstaticatomgen.cpp is to use in jsstaticatom.tbl,
instead of ATOMDEF(Math), a set of macros ATOMDEF<N> with a usage like
ATOMDEF4(Math, 'M','a','t','h')
Such macros would in principle allow calculating the hash, but that may push
the compilers to their limits. For example, ATOMDEF4 during hash calculation can be defined as
#define ATOMDEF4(name, char1, char2, char3, char4) \
R(R(R(R(uint32(0), char1), char2), char3), char4)
where R implements the body of the loop from js_HashString:
#define R(hash, new_char) ((((hash) << 4) | ((hash) >> 28)) ^ uint32(new_char))
But since the body of R includes hash twice, the expansion of ATOMDEF(propertyIsEnumerable) would be on the order of 2**length("propertyIsEnumerable"), i.e. 2**20. Clearly a smarter solution is necessary.
So for now I have opted for source code generation.
Asking Luke for feedback here.
Attachment #475876 - Flags: feedback?(lw)
Comment 3•14 years ago
Before I try to understand the patch, I had a few high-level questions:
In the browser, is global atom initialization done per global object or per runtime? I would guess it is done per runtime. In that case, would this be a speedup for browser startup, and about how much time do you think it would save?
Reporter
Comment 4•14 years ago
(In reply to comment #3)
> In the browser, is global atom initialization done per global object or per
> runtime? I would guess it is done per runtime.
That is right, once per runtime.
> In that case, would this be a
> speedup for browser startup, and about how much time do you think it would
> save?
There are 3 small wins. First, the runtime initialization on my laptop would be 0.2 ms faster. Second, a few cycles would be spared on each GC, as it would no longer need to mark the common atoms. Third, common atoms would be loaded through a constant pointer instead of two or more indirect loads, as in cx->runtime->atomState.nameAtom.
Comment 5•14 years ago
The reason I ask is because the patch seems to make things more complicated, so it seems like it should be justified by some measurable improvement somewhere, as was the case with unit, int, and length-2 strings.
Win 1, measured in comment 2, seems very small since browser startup is on the order of 1000s of ms. Win 2 seems like it could show up when marking an empty or close-to-empty heap... perhaps you can demonstrate its best case speedup with a micro-benchmark? As for Win 3, with the tracejit, method-jit and PICs, I don't see us pulling common atoms out of the runtime on any hot paths... but perhaps you can find some with a measurable improvement?
Reporter
Comment 6•14 years ago
Updated patch with fixes and more optimizations.
Attachment #475876 - Attachment is obsolete: true
Attachment #476528 - Flags: feedback?(lw)
Attachment #475876 - Flags: feedback?(lw)
Reporter
Comment 7•14 years ago
(In reply to comment #5)
> Win 1, measured in comment 2, seems very small since browser startup is on the
> order of 1000s of ms.
Callgrind shows that js_AtomizeString takes at least 3% of the browser startup time with 40% of that time spent on atomization of new atoms. I would like to decrease that as much as possible. This patch is a preliminary work that shows that pre-computing just 143 common atoms used by JS engine can have measurable benefits on the startup time.
> Win 2 seems like it could show up when marking an empty
> or close-to-empty heap... perhaps you can demonstrate its best case speedup
> with a micro-benchmark?
A js benchmark that shows clear benefits of the patch is the following:
var list = [];
var obj = {};
var x = 0;
for (var i = 0; i != 1e6; ++i) {
  var c1 = 10 + Math.floor(i / 1e4);
  var c2 = 10 + Math.floor(i / 100) % 100;
  var c3 = 10 + i % 100;
  var str = String.fromCharCode(c1, c2, c3);
  list.push(str);
  if (obj[str])
    ++x;
}
list = null;
var t = Date.now();
gc();
t = Date.now() - t;
print("gc time:" + t);
It creates 1e6 atoms that turn into garbage during the timed GC. With the patch the time drops from 67 to 51 ms. The win comes from the marking phase, which no longer needs to scan the garbage-filled atom hash just to mark the pinned atoms.
> As for Win 3, with the tracejit, method-jit and PICs,
> I don't see us pulling common atoms out of the runtime on any hot paths... but
> perhaps you can find some with a measurable improvement?
The parser often refers to the predefined atoms. The benefit of eliminating the cx->runtime->atomState loads is visible in the following test (it compiles a function containing return arguments[0]+...+arguments[0]):
var s = "return "+Array(1 << 17).join("arguments[0]+")+"0;";
Function(s);
callgrind shows a drop in the instruction count from 627 million to 625 million, with half of the win coming from the parser and the other half from a more efficient implementation of js_AtomizeString.
Comment 8•14 years ago
(In reply to comment #7)
> (In reply to comment #5)
> > Win 1, measured in comment 2, seems very small since browser startup is on the
> > order of 1000s of ms.
>
> Callgrind shows that js_AtomizeString takes at least 3% of the browser startup
> time with 40% of that time spent on atomization of new atoms. I would like to
> decrease that as much as possible. This patch is a preliminary work that shows
> that pre-computing just 143 common atoms used by JS engine can have measurable
> benefits on the startup time.
Atomization may be a significant part of browser startup, but is the atomization of global atoms a significant part of atomization? I don't see any indication of that from the measurements you've quoted.
> > Win 2 seems like it could show up when marking an empty
> > or close-to-empty heap... perhaps you can demonstrate its best case speedup
> > with a micro-benchmark?
[snip]
> It creates 1e6 atoms that turn into garbage during the timed GC. With the
> patch the time drops from 67 to 51 ms. The win comes from the marking phase,
> which no longer needs to scan the garbage-filled atom hash just to mark the
> pinned atoms.
Now this looks enticing, but on first glance it seems independent of whether we do any static initialization of global atoms. Is there any simple alternative way of achieving the same benefits?
> > As for Win 3, with the tracejit, method-jit and PICs,
> > I don't see us pulling common atoms out of the runtime on any hot paths... but
> > perhaps you can find some with a measurable improvement?
[snip]
> callgrind shows a drop in the instruction count from 627 million to 625
> million, with half of the win coming from the parser and the other half
> from a more efficient implementation of js_AtomizeString.
That is a .3% improvement on parsing; still not the killer argument for all the hassle.
Reporter
Comment 9•14 years ago
(In reply to comment #8)
> Atomization may be a significant part of browser startup, but is the
> atomization of global atoms a significant part of atomization? I don't see any
> indication of that from the measurements you've quoted.
A possible way to speed up the browser startup is to add an API that would allow quickly setting up the atom table with about 3000 long-term atoms. For that one can add to the hashtable a function that adds unique elements with a pre-calculated hash. This way a browser can provide a memory-mapped file with the char data and hashes for the atoms, and the implementation would just populate the hashtable with new entries, avoiding touching the char data itself until the corresponding atom is referenced.
The patch implements that adding of unique elements and uses it for the global atoms from the engine itself.
> Now this looks enticing, but on first glance it seems independent of whether we
> do any static initialization of global atoms. Is there any simple alternative
> way of achieving the same benefits?
An alternative exists, but with heap-allocated global atoms it would not allow simplifying the code by removing the hack of using a hashset to store both the strings and the interned/pinned flags.
> That is a .3% improvement on parsing; still not the killer argument for all the
> hassle.
I am not sure what you mean by the hassle. Do you refer to the extra source generator? If so, then this is the price of simplifying the current code to use just STATIC_ATOM(name) instead of referring to atomState in the runtime. The latter is especially ugly in the parser, with code like:
tc->parser->context->runtime->atomState.argumentsAtom
Also, for the final patch I plan to merge the atom generator with the keyword generator, so the hassle will be less intrusive.
Comment 10•14 years ago
(In reply to comment #9)
> (In reply to comment #8)
> > Atomization may be a significant part of browser startup, but is the
> > atomization of global atoms a significant part of atomization? I don't see any
> > indication of that from the measurements you've quoted.
>
> A possible way to speedup the browser startup is to add API that would allow to
> quickly setup the atom table with about 3000 long-term atoms. For that one can
> add to the hashtable a function that allows to add unique elements with a
> pre-calculated hash.
Cool, so would that be most/all of that startup atomization overhead you mentioned earlier?
Btw, I think there was a mix-up with the v2 patch posted; it's very short ;-)
Updated•14 years ago
Attachment #476528 - Flags: feedback?(lw)
Reporter
Updated•13 years ago
Assignee: igor → general
Assignee
Updated•10 years ago
Assignee: general → nobody
Updated•2 years ago
Severity: normal → S3
Updated•1 year ago
Whiteboard: [sp3]
Updated•1 year ago
See Also: → https://mozilla-hub.atlassian.net/browse/SP3-480
Comment 11•1 year ago
Some updates on the atomization during startup:
- the well-known atoms' hashes are pre-calculated at compile time [1]
- well-known atoms use a dedicated path that skips some checks [2]
- JSRuntime::initializeAtoms doesn't appear in the startup profile. JS::InitSelfHostedCode takes 6 ms, but mostly in JSRuntime::initSelfHostingFromStencil
- other atomization takes ~20 ms during startup; an example follows. Some atoms are statically known from WebIDL, and that part could be optimized as well:
js::WellKnownAtomInfo js::wellKnownAtomInfos[] = {
  ...
  mozilla::HashStringKnownLength(js_##IDPART##_str,              \
                                 sizeof(js_##IDPART##_str) - 1), \
  ...
};

bool JSRuntime::initializeAtoms(JSContext* cx) {
  ...
  for (size_t i = 0; i < uint32_t(WellKnownAtomId::Limit); i++) {
    const auto& info = wellKnownAtomInfos[i];
    JSAtom* atom = PermanentlyAtomizeCharsValidLength(