Closed
Bug 627220
Opened 15 years ago
Closed 8 years ago
Finalize external strings in the background
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox56 | --- | fixed |
People
(Reporter: gal, Assigned: jandem)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
|
6.37 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
|
3.04 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
External strings are currently finalized via an API callback. This is lame, at least for our current embedding, because all we end up doing in the callback is nsMemory::Free which does NS_Free which does NS_Free_P which does free. Our default engine-internal treatment is to call free as well, but we do so from a background thread, which is quite a bit faster.
| Reporter | ||
Comment 1•15 years ago
|
||
External string allocation profile for a mix of cnn.com, facebook.com and slashdorg.org:
647 0
1091 1
467 2
1034 3
2093 4
660 5
576 6
1379 7
853 8
785 9
676 10
1676 11
818 12
592 13
453 14
852 15
858 16
327 17
186 18
399 19
174 20
168 21
157 22
179 23
125 24
109 25
42 26
28 27
193 28
96 29
74 30
98 31
114 32
62 33
243 34
42 35
53 36
221 37
54 38
52 39
59 40
51 41
45 42
120 43
147 44
51 45
56 46
293 47
34 48
35 49
55 50
161 51
103 52
82 53
62 54
44 55
215 56
82 57
56 58
83 59
39 60
57 61
55 62
50 63
30 64
35 65
106 66
33 67
104 68
54 69
96 70
68 71
95 72
183 73
56 74
105 75
37 76
75 77
136 78
52 79
136 80
63 81
100 82
49 83
82 84
63 85
71 86
69 87
67 88
85 89
31 90
97 91
34 92
66 93
30 94
38 95
20 96
22 97
35 98
16 99
10 100
35 101
39 102
54 103
7 104
16 105
20 106
6 107
16 108
3 109
10 110
6 111
11 112
14 113
21 114
3 115
6 116
14 117
6 118
5 119
13 120
4 121
3 122
4 123
4 124
7 125
13 126
7 127
3 128
6 129
1 130
2 131
4 132
6 133
3 134
1 135
11 136
1 140
3 141
6 142
12 144
5 145
1 147
2 148
3 152
1 153
3 154
1 155
3 156
3 157
2 158
1 159
2 160
2 161
1 162
1 163
6 166
19 168
1 169
4 170
14 171
2 172
1 174
18 175
8 182
1 184
6 185
9 187
1 188
2 191
6 192
5 193
1 194
1 196
5 197
3 198
2 199
1 200
2 201
22 202
15 204
2 208
1 210
10 214
2 218
2 221
1 222
1 224
6 225
22 226
1 230
1 232
27 234
1 236
1 238
10 240
24 242
1 245
1 246
1 247
1 248
12 249
1 250
1 251
10 252
2 254
1 259
1 264
1 266
4 274
10 276
7 282
8 285
1 298
1 307
6 332
6 360
28 427
10 439
1 445
1 452
2 457
11 465
17 491
35 504
24 573
2 581
1 663
4 666
2 699
1 707
2 777
12 827
1 832
1 840
2 843
2 857
2 872
2 876
2 985
2 1138
2 1140
1 1462
2 1519
1 1523
2 3068
4 3413
2 3732
2 4595
1 6690
1 7238
2 16384
1 138318
6 851954
2 990121
There is a few really huge strings here, and quite a few really short ones that we could use short strings or internal strings for, but no clear simple wins.
| Reporter | ||
Comment 2•15 years ago
|
||
(note that this isn't as simple for DOM strings, those to nsStringBuffer::FromData(chars)->Release())
Comment 3•15 years ago
|
||
That Release call should, in fact, be the common case of external strings, no? Unless the callee had an autostring....
| Reporter | ||
Comment 4•15 years ago
|
||
Yeah (the data above is for both cases, it was taken in JS_NewExternalString). Both finalizers can run from a background thread though, the ref counting of the string is atomic. anygregor is working on a patch for that.
Comment 5•15 years ago
|
||
> the ref counting of the string is atomic
Yes, though note that we'd like it to stop being so...
| Reporter | ||
Comment 6•15 years ago
|
||
First of all: yeah, do that. The atomic op is crazy expensive. I don't think it would hurt us here. These strings are relatively infrequent. I can ship them back in a runnable to the main thread and ref-count them down there.
Comment 7•15 years ago
|
||
Doing that is bug 585661, fwiw.
Updated•11 years ago
|
Assignee: general → nobody
Updated•8 years ago
|
Component: JavaScript Engine → JavaScript: GC
Summary: improve external string finalization → Finalize external strings in the background
Updated•8 years ago
|
Whiteboard: [qf]
| Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(jdemooij)
| Assignee | ||
Comment 8•8 years ago
|
||
Let's try Linux64 first:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=000eb24bbf30b460a8c361696adb4b92404c5199
| Assignee | ||
Comment 9•8 years ago
|
||
Note that this would have been harder before bug 1353758. We used to clear Gecko's string cache in the finalizer but we no longer do that. This also means we can remove the Zone* argument from the finalizer as nobody uses it.
| Assignee | ||
Comment 10•8 years ago
|
||
Try looks green. I'll be traveling tomorrow, will post patches for review soon.
Blocks: 1351769
Updated•8 years ago
|
Whiteboard: [qf] → [qf:p1]
| Assignee | ||
Comment 11•8 years ago
|
||
None of the finalizers use the Zone* argument and it's best not to expose it once we may be called off thread.
| Assignee | ||
Comment 12•8 years ago
|
||
Flags: needinfo?(jdemooij)
Attachment #8881092 -
Flags: review?(sphink)
Updated•8 years ago
|
Attachment #8881091 -
Flags: review?(sphink) → review+
Comment 13•8 years ago
|
||
Comment on attachment 8881092 [details] [diff] [review]
Part 2 - Finalize external strings in the background
Review of attachment 8881092 [details] [diff] [review]:
-----------------------------------------------------------------
If that passes tests, this is certainly about as straightforward as you can get.
Attachment #8881092 -
Flags: review?(sphink) → review+
Comment 14•8 years ago
|
||
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/efd7a75d3b34
part 1 - Remove unused Zone* argument from external string finalizers. r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4bc665ad39b
part 2 - Finalize external strings in the background. r=sfink
Comment 15•8 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/efd7a75d3b34
https://hg.mozilla.org/mozilla-central/rev/d4bc665ad39b
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 16•8 years ago
|
||
Henri, did this actually help your encoding/decoding benchmark? I was seeing some weirdness where profiler-reported mainthread time dropped, but wall-clock measurements did not..
Flags: needinfo?(hsivonen)
(In reply to Boris Zbarsky [:bz] from comment #16)
> Henri, did this actually help your encoding/decoding benchmark? I was
> seeing some weirdness where profiler-reported mainthread time dropped, but
> wall-clock measurements did not..
Some wall clock times become better (Decode js) but others became worse by quite a bit (Decode en, Decode de, Decode fr).
## Before
What Language Encoding MiB/s Mi u16s / s ns / iteration
Decode ar utf-8 256 217 1173902
Decode ar windows-1256 258 258 1008361
Decode cs utf-8 323 315 984831
Decode cs windows-1250 339 339 934001
Decode de utf-8 400 395 759988
Decode de windows-1252 419 419 732955
Decode el utf-8 167 146 1388130
Decode el windows-1253 165 165 1267433
Decode en utf-8 409 408 1646406
Decode en windows-1252 427 427 1587302
Decode fr utf-8 469 463 1911695
Decode fr windows-1252 468 468 1913543
Decode he utf-8 242 207 1036547
Decode he windows-1255 159 159 1392141
Decode ja utf-8 177 147 1335255
Decode ja shift_jis 142 130 1538963
Decode ko utf-8 138 119 1138206
Decode ko euc-kr 129 120 1167464
Decode pt utf-8 371 367 1524078
Decode pt windows-1252 370 370 1529838
Decode ru utf-8 265 227 2257596
Decode ru windows-1251 275 275 1882883
Decode th utf-8 377 308 2467706
Decode th windows-874 391 391 1961111
Decode tr utf-8 295 286 947936
Decode tr windows-1254 317 317 872234
Decode vi utf-8 272 257 2040392
Decode vi windows-1258 337 337 1755556
Decode zh_cn utf-8 312 274 953178
Decode zh_cn gb18030 242 225 1158397
Decode zh_tw utf-8 310 271 963019
Decode zh_tw big5 271 253 1045893
Decode ja euc-jp 150 137 1456071
Decode ja iso-2022-jp 79.43 67.29 2963345
Decode bin x-user-defined 174 174 73810
Decode js utf-8 188 188 440061
Decode js big5 190 190 434660
Decode js euc-kr 182 182 455065
Decode js gb18030 191 191 432953
Decode js gbk 188 188 439677
Decode js shift_jis 192 192 431096
Decode js euc-jp 188 188 440894
Decode js iso-2022-jp 77.66 77.66 1064802
Decode js windows-1250 192 192 430904
Decode js windows-1251 178 178 465639
Decode js windows-1252 189 189 436603
Decode js windows-1253 188 188 439229
Decode js windows-1254 185 185 446721
Decode js windows-1255 185 185 446721
Decode js windows-1256 186 186 445423
Decode js windows-1257 185 185 445833
Decode js windows-1258 181 181 456131
Decode js windows-874 192 192 430392
Decode js☃ utf-8 187 187 442646
Encode ar utf-8 134 74.82 164431
Encode cs utf-8 133 121 347444
Encode de utf-8 246 242 298258
Encode el utf-8 133 73.96 232621
Encode en utf-8 274 274 224858
Encode fr utf-8 169 163 805228
Encode he utf-8 129 74.16 365266
Encode ja utf-8 190 68.02 163995
Encode ko utf-8 138 59.75 138347
Encode pt utf-8 205 199 316752
Encode ru utf-8 141 78.94 789866
Encode th utf-8 191 68.41 925751
Encode tr utf-8 135 122 395010
Encode vi utf-8 114 87.60 725922
Encode zh_cn utf-8 188 69.64 156411
Encode zh_tw utf-8 175 64.83 168321
Encode js utf-8 314 314 263765
## After
What Language Encoding MiB/s Mi u16s / s ns / iteration
Decode ar utf-8 171 145 1750520
Decode ar windows-1256 204 204 1277857
Decode cs utf-8 237 231 1340804
Decode cs windows-1250 245 245 1293655
Decode de utf-8 260 257 1171449
Decode de windows-1252 275 275 1116803
Decode el utf-8 203 178 1138713
Decode el windows-1253 205 205 1020184
Decode en utf-8 302 301 2233333
Decode en windows-1252 299 299 2266686
Decode fr utf-8 303 299 2960871
Decode fr windows-1252 305 305 2931200
Decode he utf-8 189 162 1329025
Decode he windows-1255 198 198 1115323
Decode ja utf-8 214 177 1103170
Decode ja shift_jis 172 157 1272135
Decode ko utf-8 154 133 1019358
Decode ko euc-kr 155 144 973668
Decode pt utf-8 270 267 2093879
Decode pt windows-1252 278 278 2040381
Decode ru utf-8 213 182 2818242
Decode ru windows-1251 217 217 2384516
Decode th utf-8 275 224 3387314
Decode th windows-874 267 267 2877903
Decode tr utf-8 214 207 1307787
Decode tr windows-1254 220 220 1255750
Decode vi utf-8 214 202 2593434
Decode vi windows-1258 251 251 2353548
Decode zh_cn utf-8 235 206 1265369
Decode zh_cn gb18030 193 179 1456071
Decode zh_tw utf-8 236 207 1263775
Decode zh_tw big5 203 190 1393010
Decode ja euc-jp 181 165 1209320
Decode ja iso-2022-jp 92.76 78.58 2537500
Decode bin x-user-defined 220 220 58543
Decode js utf-8 229 229 360997
Decode js big5 222 222 372837
Decode js euc-kr 223 223 370419
Decode js gb18030 223 223 370731
Decode js gbk 218 218 378857
Decode js shift_jis 216 216 383438
Decode js euc-jp 222 222 371745
Decode js iso-2022-jp 91.11 91.11 907597
Decode js windows-1250 231 231 357297
Decode js windows-1251 218 218 378663
Decode js windows-1252 230 230 360143
Decode js windows-1253 222 222 372268
Decode js windows-1254 225 225 368226
Decode js windows-1255 221 221 374879
Decode js windows-1256 219 219 378375
Decode js windows-1257 222 222 371812
Decode js windows-1258 219 219 377049
Decode js windows-874 219 219 378315
Decode js☃ utf-8 238 238 347533
Encode ar utf-8 133 74.49 165170
Encode cs utf-8 135 122 342971
Encode de utf-8 238 234 308094
Encode el utf-8 139 77.25 222716
Encode en utf-8 282 281 219035
Encode fr utf-8 165 159 825270
Encode he utf-8 134 76.96 351947
Encode ja utf-8 194 69.46 160589
Encode ko utf-8 138 60.03 137715
Encode pt utf-8 207 201 313627
Encode ru utf-8 142 79.36 785702
Encode th utf-8 199 71.29 888390
Encode tr utf-8 136 123 392599
Encode vi utf-8 116 89.27 712368
Encode zh_cn utf-8 185 68.48 159075
Encode zh_tw utf-8 185 68.39 159573
Encode js utf-8 319 319 259043
Flags: needinfo?(hsivonen)
Comment 18•8 years ago
|
||
Hrm. So I'm left wondering whether this fix is a win in general at all, given the behavior described in bug 1375566. :(
Flags: needinfo?(sphink)
Flags: needinfo?(jdemooij)
| Assignee | ||
Comment 19•8 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #18)
> Hrm. So I'm left wondering whether this fix is a win in general at all,
> given the behavior described in bug 1375566. :(
So the problem is that instead of sweeping, we now spend a lot of time waiting for the background thread, right?
We use background sweeping for a lot of other GC things (like non-external strings) and if there's an issue with that, it likely affects those other things as well.
sfink or jonco can probably say more about this..
Flags: needinfo?(jdemooij)
| Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(jcoppeard)
Comment 20•8 years ago
|
||
> we now spend a lot of time waiting for the background thread, right?
Well, it's _a_ problem. I don't know whether it's the only problem in this case.
In particular, I can see how that problem would make this change not improve performance. It's not clear to me why performance would actually become "much worse" as comment 17 describes. That seems pretty fishy to me. Maybe we're ending up with significant contention on the malloc locks or something?
Comment 21•8 years ago
|
||
Could it be because of reasons along the lines of Julian's findings in bug 1291355 comment 26? jemalloc's multi-core performance is terrible: having a background thread call into jemalloc more often can certainly make malloc a lot slower, not due to lock contention, but due to cache line movement since all data is shared across all cores basically.
Perhaps Julian can help?
Comment 22•8 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #21)
> Could it be because of reasons along the lines of Julian's findings in bug
> 1291355 comment 26? [..] Perhaps Julian can help?
I can run some low level memory-system (MESI, etc) profiles to see what's
going on, if someone can give me STR for the alleged slowdown. In other
words, what test case + configuration should I measure?
Flags: needinfo?(ehsan)
Comment 23•8 years ago
|
||
The testcase is https://hsivonen.com/test/moz/encoding_bench_web/english-only.html and then clicking the "Run" button.
Comment 24•8 years ago
|
||
I hope comment 23 answers your question Julian. :-) Thanks a lot for your help!
Flags: needinfo?(ehsan)
Comment 25•8 years ago
|
||
I can't replicate the slowdown, at least using a local copy of the test,
which has been set up to run 4 times back-to-back, per hsivonen's guidance.
I am comparing the following versions:
without (patch) = 366117:638029b68a85
with = 366119:d4bc665ad39b
Measured rates (Encoding MiB/sec)
117: 699 700 696
119: 771 772 767
This is on x86_64 Linux, Fedora 25, non-debug build, -g -O2.
Updated•4 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1]
Comment 26•1 year ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #19)
Background sweeping waiting on the jemalloc lock is a known issue.
Flags: needinfo?(jcoppeard)
Updated•1 year ago
|
Flags: needinfo?(sphink)
You need to log in
before you can comment on or make changes to this bug.
Description
•