Compiled dlmalloc benchmark 22X slower in IonMonkey

Status: NEW
Assignee: Unassigned
Product: Core
Component: JavaScript Engine
Opened 6 years ago, last updated 3 years ago
Reporter: azakai
Blocks: 1 bug
Firefox Tracking Flags: not tracked
Attachments: 1 (89.39 KB, application/javascript)

Description

6 years ago
Created attachment 617665
dlmalloc

Comparing

  js -m -n src.js 20 20

vs.

  ionjs src.js 20 20

IonMonkey is 22X slower. Since almost all Emscripten-compiled projects use malloc and free, this affects a lot of them.

Comment 1

Well, this test is impressive! We manage to spend almost no time running code. Looking at where perf says we spend time, up to the first kernel function listed, gives:
    13.08%  js.ion.opt  js                  [.] js::ion::MNode::replaceOperand(unsigned long, js::ion::MDefinition*)
     6.48%  js.ion.opt  js                  [.] js::ion::EliminatePhis(js::ion::MIRGraph&)
     5.53%  js.ion.opt  js                  [.] js::ion::MDefinition::replaceAllUsesWith(js::ion::MDefinition*)
     4.98%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::allocateRegisters()
     4.97%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::buildLivenessInfo()
     3.96%  js.ion.opt  js                  [.] js::ion::CodeGeneratorShared::encodeSlots(js::ion::LSnapshot*, js::ion::MResumePoint*, unsigned int*)
     3.67%  js.ion.opt  js                  [.] js::ion::MResumePoint::inherit(js::ion::MBasicBlock*)
     3.37%  js.ion.opt  js                  [.] js::ion::LIRGeneratorShared::buildSnapshot(js::ion::LInstruction*, js::ion::MResumePoint*, js::ion::BailoutKind)
     3.34%  js.ion.opt  js                  [.] js::ion::MPhi::op() const
     2.94%  js.ion.opt  js                  [.] js::ion::MBasicBlock::inherit(js::ion::MBasicBlock*)
     2.38%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::setIntervalRequirement(js::ion::LiveInterval*)
     1.91%  js.ion.opt  js                  [.] js::ion::ValueNumberer::lookupValue(js::ion::MDefinition*)
     1.85%  js.ion.opt  js                  [.] js::ion::MResumePoint::getOperand(unsigned long) const
     1.79%  js.ion.opt  js                  [.] js::ion::LiveInterval::covers(js::ion::CodePosition)
     1.59%  js.ion.opt  js                  [.] TypeAnalyzer::propagateSpecialization(js::ion::MPhi*)
     1.52%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::reifyAllocations()
     1.45%  js.ion.opt  js                  [.] js::ion::MResumePoint::setOperand(unsigned long, js::ion::MDefinition*)
     1.35%  js.ion.opt  js                  [.] js::ion::SnapshotWriter::addUndefinedSlot()
     1.12%  js.ion.opt  js                  [.] js::ion::Loop::insertInWorklist(js::ion::MInstruction*)
     1.11%  js.ion.opt  js                  [.] js::ion::LiveInterval::firstIncompatibleUse(js::ion::LAllocation)
     1.08%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::resolveControlFlow()
     1.06%  js.ion.opt  js                  [.] js::LifoAlloc::getOrCreateChunk(unsigned long)
     0.93%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::populateSafepoints()
     0.79%  js.ion.opt  js                  [.] js::ion::ValueNumberer::computeValueNumbers()
     0.77%  js.ion.opt  js                  [.] js::ion::MPhi::getOperand(unsigned long) const
     0.72%  js.ion.opt  [kernel.kallsyms]   [k] __percpu_counter_add

which sums to 73.74%! None of these routines have anything to do with the interpreter; they are all part of compiling in IM (and type analysis).
I suspect that chunked compilation will help with this.  Nevertheless, I'll continue looking into this to see if there is anything horribly silly that we do that would cause compilation to go so painfully slowly.
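
As a quick check of the 73.74% figure, here is a minimal Python sketch that parses the rows listed above and sums the overhead column. The regular expression and the helper name `parse_rows` are illustrative assumptions about this particular listing's layout, not part of perf itself; `Decimal` is used so the two-decimal percentages add up exactly.

```python
import re
from decimal import Decimal

# Profile rows copied verbatim from the perf listing above.
PERF_ROWS = """\
    13.08%  js.ion.opt  js                  [.] js::ion::MNode::replaceOperand(unsigned long, js::ion::MDefinition*)
     6.48%  js.ion.opt  js                  [.] js::ion::EliminatePhis(js::ion::MIRGraph&)
     5.53%  js.ion.opt  js                  [.] js::ion::MDefinition::replaceAllUsesWith(js::ion::MDefinition*)
     4.98%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::allocateRegisters()
     4.97%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::buildLivenessInfo()
     3.96%  js.ion.opt  js                  [.] js::ion::CodeGeneratorShared::encodeSlots(js::ion::LSnapshot*, js::ion::MResumePoint*, unsigned int*)
     3.67%  js.ion.opt  js                  [.] js::ion::MResumePoint::inherit(js::ion::MBasicBlock*)
     3.37%  js.ion.opt  js                  [.] js::ion::LIRGeneratorShared::buildSnapshot(js::ion::LInstruction*, js::ion::MResumePoint*, js::ion::BailoutKind)
     3.34%  js.ion.opt  js                  [.] js::ion::MPhi::op() const
     2.94%  js.ion.opt  js                  [.] js::ion::MBasicBlock::inherit(js::ion::MBasicBlock*)
     2.38%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::setIntervalRequirement(js::ion::LiveInterval*)
     1.91%  js.ion.opt  js                  [.] js::ion::ValueNumberer::lookupValue(js::ion::MDefinition*)
     1.85%  js.ion.opt  js                  [.] js::ion::MResumePoint::getOperand(unsigned long) const
     1.79%  js.ion.opt  js                  [.] js::ion::LiveInterval::covers(js::ion::CodePosition)
     1.59%  js.ion.opt  js                  [.] TypeAnalyzer::propagateSpecialization(js::ion::MPhi*)
     1.52%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::reifyAllocations()
     1.45%  js.ion.opt  js                  [.] js::ion::MResumePoint::setOperand(unsigned long, js::ion::MDefinition*)
     1.35%  js.ion.opt  js                  [.] js::ion::SnapshotWriter::addUndefinedSlot()
     1.12%  js.ion.opt  js                  [.] js::ion::Loop::insertInWorklist(js::ion::MInstruction*)
     1.11%  js.ion.opt  js                  [.] js::ion::LiveInterval::firstIncompatibleUse(js::ion::LAllocation)
     1.08%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::resolveControlFlow()
     1.06%  js.ion.opt  js                  [.] js::LifoAlloc::getOrCreateChunk(unsigned long)
     0.93%  js.ion.opt  js                  [.] js::ion::LinearScanAllocator::populateSafepoints()
     0.79%  js.ion.opt  js                  [.] js::ion::ValueNumberer::computeValueNumbers()
     0.77%  js.ion.opt  js                  [.] js::ion::MPhi::getOperand(unsigned long) const
     0.72%  js.ion.opt  [kernel.kallsyms]   [k] __percpu_counter_add
"""

# Assumed column layout: overhead %, command, shared object,
# "[.]" (user space) or "[k]" (kernel), then the symbol name.
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(.+)$")


def parse_rows(text):
    """Return (overhead, kind, symbol) tuples for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((Decimal(m.group(1)), m.group(4), m.group(5)))
    return rows


rows = parse_rows(PERF_ROWS)
total = sum(pct for pct, _, _ in rows)
user = sum(pct for pct, kind, _ in rows if kind == ".")
print(f"{len(rows)} rows, total {total}%, user-space {user}%")
```

The total includes the final kernel row; every other row is a user-space symbol in the js binary.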

Comment 2

(In reply to Marty Rosenberg [:mjrosenb] from comment #1)
> I suspect that chunked compilation will help with this.

With my chunked compilation (WIP) patch, Ion is about 2x slower than JM+TI (60 ms vs 30 ms). About 30 ms is compilation time (5 ms with JM+TI). I hope we can bring this down to about 15-20 ms by optimizing snapshots a bit. There's a large number of locals and with chunked compilation these have a fixed location so we don't have to encode them.

Note that the interpreter (20 ms) is still faster than both Ion and JM+TI, so I guess there's not much to optimize/win here for the JITs. Alon, is it okay to use "200 200" instead of "20 20"? It won't read out-of-bounds array values or something?
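
The arithmetic behind these numbers can be laid out in a short Python sketch. It only uses the figures quoted in this comment and makes two assumptions: that "5 ms with JM+TI" is JM+TI's compilation time, and that reducing Ion's compilation time would leave the rest of its run time unchanged. The helper name `non_compile_ms` is made up for illustration.

```python
# Figures quoted in the comment above, in milliseconds.
ION_TOTAL_MS = 60
ION_COMPILE_MS = 30
JM_TI_TOTAL_MS = 30
JM_TI_COMPILE_MS = 5  # assumption: this is JM+TI's compile time
INTERP_TOTAL_MS = 20
ION_COMPILE_TARGET_MS = (15, 20)  # hoped-for range after optimizing snapshots


def non_compile_ms(total_ms, compile_ms):
    """Time left once compilation is subtracted (execution plus other overhead)."""
    return total_ms - compile_ms


ion_rest = non_compile_ms(ION_TOTAL_MS, ION_COMPILE_MS)
jm_ti_rest = non_compile_ms(JM_TI_TOTAL_MS, JM_TI_COMPILE_MS)

# Assumption: the non-compile part of the Ion run stays the same.
projected_ion_totals = [ion_rest + target for target in ION_COMPILE_TARGET_MS]

print(f"Ion without compilation:   {ion_rest} ms")
print(f"JM+TI without compilation: {jm_ti_rest} ms")
print(f"Projected Ion total:       {projected_ion_totals[0]}-{projected_ion_totals[1]} ms")
print(f"Interpreter total:         {INTERP_TOTAL_MS} ms")
```

Under these assumptions, even the projected Ion total stays above the interpreter's 20 ms for the "20 20" input, which is why a larger input such as "200 200" is a more meaningful test for the JITs.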
Depends on: 746225
(Reporter)

Comment 3

6 years ago
Yes, any value > 0 for those two arguments is fine. The first parameter is how many malloc()/free() calls to do in each repetition; the second is how many repetitions. Here is the original source:

https://github.com/kripken/emscripten/blob/c7bed7ab29a5e351166bf570825edc2a94c43aef/tests/dlmalloc_test.c

With high enough parameters, I would hope that JITs would help here...
Blocks: 705294

Comment 4

5 years ago
On AWFY, misc-dlmalloc has regressed a lot since last September.
(Assignee)

Updated

3 years ago
Assignee: general → nobody