Closed
Bug 965130
Opened 11 years ago
Closed 11 years ago
pdf.js: create less garbage while parsing
Categories
(Firefox :: PDF Viewer, defect, P4)
Tracking
()
RESOLVED
FIXED
Firefox 29
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
(Whiteboard: [pdfjs-c-performance][MemShrink])
If I open http://stacks.math.columbia.edu/download/book.pdf, which is a 4000+ PDF, and wait for the file to finish being read, about:memory tells me this: > │ ├──820.95 MB (72.97%) -- > workers(localhost)/worker(../src/worker_loader.js, 0x7f232bd41000) > │ │ ├──799.56 MB (71.07%) -- zone(0x7f232414a800) > │ │ │ ├──410.71 MB (36.51%) -- strings > │ │ │ │ ├──285.73 MB (25.40%) -- notable > │ │ │ │ │ ├──245.05 MB (21.78%) ++ (2593 tiny) > │ │ │ │ │ └───40.68 MB (03.62%) ── string(length=15, copies=666537, > "[object Object]")/gc-heap > │ │ │ │ ├──118.15 MB (10.50%) ── short-gc-heap > │ │ │ │ └────6.83 MB (00.61%) ++ normal What a lot of strings. And looking in detail at them, there are *lots* of entries like this: > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=10, copies=6174, "http://sta")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=11, copies=6174, "http://stac")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=12, copies=6174, "http://stack")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=13, copies=6174, "http://stacks")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=14, copies=6174, "http://stacks.")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=15, copies=6174, "http://stacks.m")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=16, copies=6174, "http://stacks.ma")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=17, copies=6174, "http://stacks.mat")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=18, copies=6174, "http://stacks.math")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=19, copies=6174, "http://stacks.math.")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=20, copies=6174, "http://stacks.math.c")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=21, copies=6174, "http://stacks.math.co")/gc-heap > │ │ │ │ │ │ ├────0.38 MB (00.03%) ── string(length=22, copies=6174, "http://stacks.math.col")/gc-heap Goodness! It looks like pdf.js is building up strings one char at a time, and indeed, src/core/parser.js does exactly that in the Lexer class. But isn't the JS engine supposed to do well in such cases, thanks to the use of ropes? Well, it turns out that thanks to http://dxr.mozilla.org/mozilla-central/source/js/src/vm/String.cpp#398, ropes aren't used if the result of a concatenation can fit into a short string. And on 64-bit platforms, a short string can have up to 23 chars. And this test case causes *many* strings shorter than 23-chars to be built up. (And that's why all the entries above have "gc-heap" values, but don't have "malloc-heap" values, because the chars are inline in the GC thing.) I have a patch that converts pdf.js to use Array.join instead of += to build up these strings. It reduces the size of the "strings/notable" sub-tree to just over 10 MiB, and reduces the peak RSS encountered while reading the file from ~1130 MiB to ~800. I will submit a GitHub pull request to get that landed in the pdf.js GitHub repo. But I also I wonder if we should reconsider the JS engine's behaviour here. I tried commenting out the short-string code in js::ConcatStrings(), and sure enough the peak RSS dropped to ~800 MiB, very similar to what I got when using Array.join.
Assignee | ||
Comment 1•11 years ago
|
||
The patches are at https://github.com/mozilla/pdf.js/pull/4213
Comment 2•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #0) > But I also I wonder if we should reconsider the JS engine's behaviour here. > I tried commenting out the short-string code in js::ConcatStrings(), and > sure enough the peak RSS dropped to ~800 MiB, very similar to what I got > when using Array.join. I tried this in bug 861251. It improved sunspider a bit but regressed Dromaeo and that's why I reverted that change, see bug 863018. Also note that Ion inlines the short-string path, so you should probably comment out the short-string code there as well, see JitCompartment::generateStringConcatStub.
Assignee | ||
Comment 3•11 years ago
|
||
> I tried this in bug 861251. It improved sunspider a bit but regressed
> Dromaeo and that's why I reverted that change
Fair enough; I can see that the best choice would depend greatly on workload.
One noteworthy thing is that our behaviour is quite different on 32-bit vs 64-bit here. On 32-bit, a JSInlineString can hold 3 jschars (excluding null-termination) and a JSShortString can hold 11 jschars. On 64-bit, the corresponding numbers are 7 and 23. So this particular problem would be much smaller on 32-bit.
Assignee | ||
Comment 4•11 years ago
|
||
The patches landed in pdf.js: https://github.com/mozilla/pdf.js/commit/acb33b3e7d2a69bd9743a537421bca16317e4de8
Comment 5•11 years ago
|
||
Fixed on trunk by bug 965861.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 29
You need to log in
before you can comment on or make changes to this bug.
Description
•