Odin: Split parsing/validation from codegen

RESOLVED FIXED in Firefox 42

Status

defect
RESOLVED FIXED
Opened 4 years ago
Closed 3 years ago
People

(Reporter: bbouvier, Assigned: bbouvier)

Tracking

Version: Trunk
Target Milestone: mozilla42
Points: ---
Bug Flags: in-testsuite +

Firefox Tracking Flags: (firefox40 affected, firefox42 fixed)


Attachments

(11 attachments, 26 obsolete attachments)

- 4.13 KB, patch (luke: review+)
- 8.41 KB, patch (luke: review+)
- 6.09 KB, patch (luke: review+)
- 5.29 KB, patch (luke: review+)
- 13.76 KB, patch (luke: review+)
- 3.97 KB, patch (luke: review+)
- 7.74 KB, patch (luke: review+)
- 39.70 KB, patch (luke: review+)
- 198.97 KB, patch (luke: review+)
- 356 bytes, application/javascript
- 281.52 KB, patch (bbouvier: review+)
Odin currently parses, validates, emits MIR, and performs the final codegen step linearly. We could run parsing/validation on one thread and have another thread convert the AST into MIR and perform codegen (and use the remaining helper threads, when the number of cores is > 2, to execute Ion background compilation jobs). The net effect would be to pipeline parsing+validation with MIR generation and code generation, which are all currently performed on the same thread. This is a first step toward an asm.js baseline compiler, and it would reduce asm.js compile times on systems with 4 or more cores.
Duplicate of this bug: 959263
Random small cleanups.
Attachment #8598106 - Flags: review?(luke)
Posted patch wip.patch (obsolete) — Splinter Review
This is *very* WIP-y, but it should help validate the architecture. In its current state, it can compile functions of this form:

  function f() { return 2 * 4 | 0; }

(that is, using int32 literals, int32 mul, and ret)

Here are the newly introduced data structures:
- wasm::FunctionBuilder, which takes an asm.js module as an in-param and outputs a wasm::Function, described below.
- wasm::Function, which should contain the binary form of a Function, from the number of locals (NYI) to the compact form itself.
- FunctionCompiler can now read the binary form and then emit the associated MIR graph. For now, the FunctionCompiler contains the FunctionBuilder (for quick implementation purposes), but of course they'll be independent in the future, and ultimately the FunctionCompiler should just receive the wasm::Function from the FunctionBuilder.

Please don't mind the XXX / TODO items in the file, which are mostly reminders for myself.

The plan is the following:
- continue converting statements into the binary form.
- when that's done, replace the FunctionCompiler argument of all Check* functions with a FunctionBuilder.
- remove the MDefinition argument from all Check* functions.
- move the FunctionCompiler in the wasm:: namespace.

How does that look?
Attachment #8598113 - Flags: feedback?(luke)
Attachment #8598106 - Flags: review?(luke) → review+
Posted patch wip.part2.patch (obsolete) — Splinter Review
That's the diff between the wip.patch and the big folded one that I am going to post in a sec.
Posted patch folded.patch (obsolete) — Splinter Review
That's wip + wip part 2, but it's really big, so not asking for feedback on this one.
Comment on attachment 8598113 [details] [diff] [review]
wip.patch

The patch has changed a lot since this one, so no need to give feedback on it right now.
Attachment #8598113 - Attachment is obsolete: true
Attachment #8598113 - Flags: feedback?(luke)
Posted patch wip.patch (obsolete) — Splinter Review
Big WIP patch that I'll try to split into several patches later. All tests excluding SIMD / atomics / conditionals in if conditions seem to pass; testBullet doesn't pass yet. Need to investigate the testBullet issue before measuring overhead.
Attachment #8599970 - Attachment is obsolete: true
Attachment #8599972 - Attachment is obsolete: true
Posted patch wip.patch (obsolete) — Splinter Review
An updated patch that passes testBullet, which had two issues (one at compile time, easy to debug; the other a difference in behavior, harder to debug). I've added tests corresponding to these two issues, as they would have helped a lot with debugging.

Now, keeping on with ternary conditionals, as they're used in our benchmarks.
Attachment #8605530 - Attachment is obsolete: true
I've solved the latest issue (an issue with ternary conditionals, totally untested in our code base, so I've added a test), so I could benchmark a bit. I've run each benchmark only once, so there could be some noise. This isn't an apples-to-apples comparison, as the patched version doesn't have the ternary optimization yet.

Results seem to indicate that

1) either we stay in the noise neighborhood of the baseline (with some slowdowns, which I assume are caused by the lack of ternary optimizations -- although this hypothesis needs to be confirmed)
2) or there is some very *light* speedup (perhaps due to code locality? or noise).

| Benchmark                      | Baseline   | Patched   |
| ------------------------------ | ---------- | --------- |
| Massive (higher is better)     |            |           |
| Massive overall score          | 7,285      | 7,357     |
| main-thread-poppler-cold       | 6,129      | 6,322     |
| main-thread-poppler-warm       | 6,048      | 6,257     |
| box2d-throughput               | 30,448     | 30,821    |
| box2d-throughput-f32           | 34,106     | 33,322    |
| lua-binarytrees                | 17,102     | 16,434    |
| lua-scimark                    | 18,945     | 19,155    |
| poppler-throughput             | 22,626     | 22,835    |
| sqlite-throughput              | 12,767     | 13,063    |
| poppler-cold-preparation       | 704        | 725       |
| poppler-warm-preparation       | 3,350      | 3,436     |
| sqlite-cold-preparation        | 185        | 188       |
| sqlite-warm-preparation        | 4,008      | 4,132     |
| asmjs-apps (lower is better)   |            |           |
| box2d-loadtime                 | 202        | 213       |
| box2d-throughput               | 1309       | 1318      |
| bullet-loadtime                | 840        | 906       |
| bullet-throughput              | 1567       | 1611      |
| lua_binarytrees-loadtime       | 173        | 160       |
| lua_binarytrees-throughput     | 1047       | 1066      |
| zlib-loadtime                  | 200        | 204       |
| zlib-throughput                | 1272       | 1334      |
| asmjs-ubench (lower is better) |            |           |
| copy                           | 3322       | 3323      |
| corrections                    | 5122       | 5118      |
| fannkuch                       | 2430       | 2519      |
| fasta                          | 8279       | 8323      |
| life                           | 3145       | 2996      |
| memops                         | 4432       | 4476      |
| primes                         | 4981       | 5127      |
| skinning                       | 8740       | 8933      |
Posted patch wip.patch (obsolete) — Splinter Review
Attachment #8607491 - Attachment is obsolete: true
Just so I understand what's being measured: parsing/validation is generating an IR that is consumed on the same thread to do MIR-gen?  If so, that's fantastic since it means we can land w/o waiting for pipelining speedups.  I'm not super concerned with single-core devices, but do you get the same results for 2 and 4 hardware threads (fixing 'numCpus' and using 'taskset')?
(In reply to Luke Wagner [:luke] from comment #13)
> Just so I understand what's being measured: parsing/validation is generating
> an IR that is consumed on the same thread to do MIR-gen?
Yes, exactly.

> If so, that's
> fantastic since it means we can land w/o waiting for pipelining speedups.
> I'm not super concerned with single-core devices, but do you get the same
> results for 2 and 4 hardware threads (fixing 'numCpus' and using 'taskset')?
I'll do these measurements tomorrow.
Attachment #8598106 - Attachment description: cleanups.patch → 1 - Random cleanups.patch
When building the MIR graph for asm.js, we did some short-circuiting optimizations for ternaries that include constant values (e.g. x ? 1 : y is semantically equivalent to x || y, which can be short-circuited if x is true). This was before bhackett's work in bug 1028580, which optimized these short-circuiting ternaries as an optimization pass in Ion. So far, this optimization pass was disabled for asm.js, as Odin already had done the work when building the MIR graph.

However, with the work being done in this bug, doing the short-circuiting optimizations when decoding the function compact encoding is tedious.

In the meantime, benchmarking (asmjs-apps / asmjs-ubench) shows that disabling the short-circuiting optimizations in Odin while enabling the FoldTest pass all the time yields almost the same results (within noise levels, ~2%). So let's just do this here.

Benchmarks results (20-100 runs).
Legend: "benchmark_name: baseline / with_patch (relative speedup)"

asmjs-apps
zlib-loadtime: 200.32 / 200.4 (-0.04%)
box2d-throughput: 1308.51 / 1293.45 (1.15%)
bullet-loadtime: 846.15 / 840.6 (0.66%)
zlib-throughput: 1283.5 / 1276.2 (0.57%)
lua_binarytrees-loadtime: 151.75 / 152.6 (-0.56%)
lua_binarytrees-throughput: 1039.45 / 1018.25 (2.04%)
bullet-throughput: 1534.77 / 1520.5 (0.93%)
box2d-loadtime: 196.12 / 197.55 (-0.73%)

asmjs-ubench
life: 3166.66 / 3071.0 (3.02%)
memops: 4457.49 / 4408.0 (1.11%)
fannkuch: 2434.38 / 2395.5 (1.6%)
mandelbrot-polyfill: 413.42 / 421.0 (-1.83%)
skinning: 8806.53 / 9020.0 (-2.42%)
fbirds-native: 98.01 / 102.5 (-4.58%)
fbirds-polyfill: 288.19 / 287.5 (0.24%)
mandelbrot-native: 310.43 / 311.0 (-0.18%)
corrections: 5115.75 / 5065.0 (0.99%)
fasta: 8233.16 / 8209.0 (0.29%)
copy: 3343.36 / 3285.0 (1.75%)
primes: 5024.45 / 4956.0 (1.36%)
Attachment #8612908 - Flags: review?(luke)
Attachment #8612908 - Flags: review?(luke) → review+
Posted patch all.patch (obsolete) — Splinter Review
So that's the big patch, not split yet (I plan to do so tomorrow), if you want to have a first look.

Here are a few hints for understanding the changes:
- FunctionCompiler has been split into FunctionBuilder (which is used during encoding) and FunctionCompiler (which is used during decoding).
- Each CheckX function does the type checking and adds the compact encoded form to the code stream. For most functions, next to each CheckX function there's the equivalent EmitX function, which consumes the compact form from the code stream. That helps reasoning about the layout and operands order, in general. There's a nice symmetry between Check/Emit even with respect to their arguments: first arg of CheckX is the FunctionBuilder, first arg of EmitX is the FunctionCompiler.
- FunctionCompiler used ParseNode* at several places as map keys or set values. At all these places, we needed to ensure that the key was unique and monotonically increasing with the position in the code stream, so I've simply used the position itself in all these places.

The patch isn't final yet (there are a few TODOs and FIXMEs still in the code, but most relate to the metadata debug information, mainly used for retrieving the source coordinates from the position in the code stream).
Attachment #8611249 - Attachment is obsolete: true
Attachment #8613688 - Flags: feedback?(luke)
Posted patch all.patch (obsolete) — Splinter Review
There was a slight issue with the patch above: a For loop was basically a condition and a maybe<increment> (so 2 opcodes to distinguish between them), as the statement in the init part could stand as a statement of its own. However, this doesn't mix well with Blocks, which expect a fixed number of statements (and this init statement wouldn't be counted in the statements count, hence the issue). Fortunately, asm.js/simd-fbirds had such a control flow example (a block containing a for loop). I've isolated and reduced this test case in testControlFlow as well.

We now have 4 opcodes for For loops: (with or without init statement) x (with or without increment statement), but this is temporary, as I plan to rewrite the encoding of all loops with a "forever" opcode.

Also: I've added a debug-only opcode, DebugCheckPoint, that's inserted after a for loop and at the end of a block, so that we can ensure basic consistency of the function encoding. If this debugging approach proves useful, it can be extended to other control flow statements.
Attachment #8613688 - Attachment is obsolete: true
Attachment #8613688 - Flags: feedback?(luke)
Attachment #8614250 - Flags: feedback?(luke)
Comment on attachment 8614250 [details] [diff] [review]
all.patch

I am currently splitting the patch.
Attachment #8614250 - Flags: feedback?(luke)
Posted patch 3. Folded patch (obsolete) — Splinter Review
Folded patch, for getting a global view of how components interact.
Attachment #8614250 - Attachment is obsolete: true
Let's take it easy.
Attachment #8615632 - Flags: review?(luke)
When building the compact format from functions, we now identify functions / globals / function tables by a vector index in the ModuleCompiler, so all three of these structures need to be stored in vectors inside the MC. This patch contains all changes to the MC.
Attachment #8615637 - Flags: review?(luke)
Posted patch 3.3. OpcodesSplinter Review
Opcodes. Note that I will implement (later) the For / While loops as Forever loops (I wanted to start with something as close as possible to the current implementation, to make sure that there won't be any behavior differences). So among the weirdos:

- Id: the encoding often applies the following scheme:
  - put a temporary opcode that can be patched
  - encode the sub-expression
  - deduce from the sub-expression the type of the operation and then patch the temporary opcode.
  However, for coercions e.g., we sometimes put a temporary opcode although we didn't need one (typical example: +(+f()); the second + is useless), so we just pass the current definition, hence the need for Id.
  Note that Stmt::Id is needed when we call CoerceResult, expecting void and the actual retType is void (this only happens for Atomics Fence).

- As loops aren't implemented via a Forever opcode (yet), we need to make a distinction between the For loops: whether they do have an initializer or not; whether they do have an increment or not. This leads to 2 x 2 = 4 opcodes.

- This version has expression statements, for simplicity of implementation (I'll remove them in a follow-up, if that's fine).

- Stmt::Noop happens when we have an expression statement with nothing inside (only example: ;;;;;;;;;;;;). The caller pre-allocates the temporary opcode, but the expression statement is empty, so we need a no-op for that case.

- Stmt::DebugCheckPoint is an idea I've got for helping debugging (put checkpoints at some defined places in the code stream during encoding, and then check that they are at these positions when decoding). This helped me find one edge case in the implementation.
Attachment #8615649 - Flags: review?(luke)
Posted patch 3.4. Function object (obsolete) — Splinter Review
The Function object is the data passed from the FunctionBuilder (during encoding) to the FunctionCompiler (which will use it for decoding). It has a very simple interface for reading and writing.
Attachment #8615650 - Flags: review?(luke)
Posted patch 3.5. Function Builder (obsolete) — Splinter Review
The FunctionBuilder is the new first argument to all Check* functions, and it is used for encoding the asm.js code. As a result, it inherited a few functions from the FunctionCompiler that were used during parsing / validation.
Attachment #8615652 - Flags: review?(luke)
Posted patch 3.6. Function Compiler (obsolete) — Splinter Review
Notable changes to the FunctionCompiler:

- we don't use ParseNode* anymore when building new blocks. ParseNodes were used for retrieving the source coordinates and annotating the block; I do think that, using the position in the bytecode stream, we could retrieve the source coordinates via debug metadata. However, as this is a debug-only thing, just used for the (Linux) JIT perf support, maybe this can wait a bit.

- The LabeledBlockMap can't use a PropertyName* anymore as its key. Instead, a unique u32 identifier of the label is used as the key.
- Ditto for the LabeledBlockMap / NodeStack, which were using ParseNode* and now use the position in the bytecode (which has the property of uniquely identifying statements in a monotonically increasing fashion, which seems to be sufficient for the tasks these ADTs need to accomplish).

The FunctionCompiler is still used as an advanced data structure by the Emit* functions, which mirror the Check* functions by consuming the bytecode and emitting the MIR.
Attachment #8615662 - Flags: review?(luke)
Posted patch 3.7. Encoding and Decoding (obsolete) — Splinter Review
Unfortunately, this one is quite large and can't really be split. This is pretty mechanical: it uses the FunctionBuilder (Check* functions) and the FunctionCompiler (Emit* functions) to do the same task that was done before this patch series.
Attachment #8615663 - Flags: review?(luke)
Comment on attachment 8613700 [details]
MozReview Request: A few free tests.

A few free tests.
Attachment #8613700 - Attachment description: MozReview Request: Bug 1157624: WIP → MozReview Request: A few free tests.
Changes to ModuleCompiler
Function interface.
Posted file MozReview Request: FunctionBuilder. (obsolete) —
FunctionBuilder.
FunctionCompiler.
I've pushed to reviewboard to make the folded patch easier to understand, as mozreview does a good job of showing code that has moved and changes within a single line of code:

https://reviewboard.mozilla.org/r/9771/diff/2/#index_header
So lua_binarytrees doesn't like that exitsSignatures_ stores the address of the Signature that's been added in the ExitMap. Terrence told me on IRC that keys could move, so we actually need to make a copy of the signature before storing it in the exitsSignatures_ array.
Attachment #8615637 - Attachment is obsolete: true
Attachment #8615667 - Attachment is obsolete: true
Attachment #8615668 - Attachment is obsolete: true
Attachment #8615669 - Attachment is obsolete: true
Attachment #8615670 - Attachment is obsolete: true
Attachment #8615671 - Attachment is obsolete: true
Attachment #8615672 - Attachment is obsolete: true
Attachment #8615637 - Flags: review?(luke)
Attachment #8615699 - Flags: review?(luke)
I've compiled the whole browser with debug and optimized symbols, to see whether I would hit any assertion added in the decoding code. No assertions were hit when running Massive, Tappy Chicken and BananaBread. Tappy Chicken and BananaBread seem to have the same behavior with or without the patch. Moreover, all tests pass in our test suite. As a consequence, I consider that most behavior differences have been ruled out.

Benchmarking, now. I've tried the simplest setup (not modifying the core configuration in SM yet - though I've set the CPU scaling governors to "performance" at the OS level). One issue I had with benchmarking is that the OS detects that the CPUs get really hot, and then throttles down the CPU frequencies, leading to weird results. Fortunately, this happens both with and without the patch series applied, so we can assume both configurations are affected equally.

In a nutshell, for the asmjs-apps and asmjs-ubench benchmarks, there seems to be no difference above the level of noise. Massive shows weird differences for sqlite-warm-preparation (and others which get faster with the patch!), but my machine has so much variance that I'm not even sure it's due to my patch. I'll retest a few times.

For each line, we have:
BENCHMARK_NAME - avg(baseline scores) / avg(scores with patch applied) (relative speedup or slowdown)

AsmJS-Apps (10 runs each)
zlib-loadtime: 209.4 / 211.6 (-1.05%)
box2d-throughput: 1366.5 / 1355.4 (0.81%)
bullet-loadtime: 904.2 / 873.3 (3.42%)
zlib-throughput: 1370.2 / 1332.5 (2.75%)
lua_binarytrees-loadtime: 154.5 / 154.3 (0.13%)
lua_binarytrees-throughput: 1067.8 / 1129.9 (-5.82%)
bullet-throughput: 1644.9 / 1587.2 (3.51%)
box2d-loadtime: 208.7 / 197.1 (5.56%)

AsmJS-Apps (20 runs each)
zlib-loadtime: 211.15 / 208.6 (1.21%)
box2d-throughput: 1361.0 / 1312.7 (3.55%)
bullet-loadtime: 891.95 / 857.75 (3.83%)
zlib-throughput: 1338.15 / 1288.5 (3.71%)
lua_binarytrees-loadtime: 160.9 / 158.7 (1.37%)
lua_binarytrees-throughput: 1074.95 / 1043.15 (2.96%)
bullet-throughput: 1617.75 / 1550.05 (4.18%)
box2d-loadtime: 207.45 / 203.6 (1.86%)

AsmJS-ubench (10 runs each)
life: 3150.0 / 3185.5 (-1.13%)
memops: 4526.41666667 / 4621.58333333 (-2.1%)
fannkuch: 2478.91666667 / 2508.25 (-1.18%)
mandelbrot-polyfill: 425.5 / 423.916666667 (0.37%)
skinning: 8805.75 / 8945.33333333 (-1.59%)
fbirds-native: 98.8333333333 / 103.833333333 (-5.06%)
fbirds-polyfill: 290.333333333 / 308.25 (-6.17%)
mandelbrot-native: 310.083333333 / 312.5 (-0.78%)
corrections: 5134.5 / 5220.16666667 (-1.67%)
fasta: 8316.91666667 / 8293.41666667 (0.28%)
copy: 3353.41666667 / 3522.58333333 (-5.04%)
primes: 5119.25 / 5150.5 (-0.61%)

AsmJS-Ubench (20 runs)
life: 3163.1 / 3191.5 (-0.9%)
memops: 4521.95 / 4500.6 (0.47%)
fannkuch: 2491.95 / 2506.2 (-0.57%)
mandelbrot-polyfill: 425.8 / 419.7 (1.43%)
skinning: 8875.2 / 8885.7 (-0.12%)
fbirds-native: 103.5 / 103.7 (-0.19%)
fbirds-polyfill: 303.9 / 295.3 (2.83%)
mandelbrot-native: 314.8 / 311.05 (1.19%)
corrections: 5148.1 / 5160.7 (-0.24%)
fasta: 8270.05 / 8250.85 (0.23%)
copy: 3351.25 / 3411.55 (-1.8%)
primes: 5085.7 / 5056.1 (0.58%)

Massive (2 runs each)
box2d-throughput: 30290.5 / 28625.5 (5.5%)
box2d-throughput-f32: 33498.5 / 32212.0 (3.84%)
box2d-variance: 10000.0 / 10000.0 (0.0%)
lua-binarytrees: 17199.5 / 15729.5 (8.55%)
lua-scimark: 18435.0 / 19997.5 (-8.48%)
main-thread-poppler-cold: 5933.0 / 6051.5 (-2.0%)
main-thread-poppler-warm: 5942.0 / 6086.0 (-2.42%)
main-thread-sqlite-cold: 10000.0 / 10000.0 (0.0%)
main-thread-sqlite-warm: 10000.0 / 10000.0 (0.0%)
massive-total: 7163.5 / 7199.0 (-0.5%)
poppler-cold-preparation: 690.5 / 692.0 (-0.22%)
poppler-throughput: 22511.5 / 22011.5 (2.22%)
poppler-variance: 10000.0 / 10000.0 (0.0%)
poppler-warm-preparation: 3171.0 / 3367.5 (-6.2%)
sqlite-cold-preparation: 181.5 / 181.5 (0.0%)
sqlite-throughput: 12785.5 / 12827.5 (-0.33%)
sqlite-warm-preparation: 3708.5 / 4095.5 (-10.44%)
Comment on attachment 8615632 [details] [diff] [review]
3.1.tests.patch

Always a good start.
Attachment #8615632 - Flags: review?(luke) → review+
Comment on attachment 8615649 [details] [diff] [review]
3.3. Opcodes

Review of attachment 8615649 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSValidate.cpp
@@ +2418,5 @@
>  /*****************************************************************************/
>  
>  namespace {
>  
> +enum class WasmType : uint8_t {

How about AsmType?  Later we could rename 'Type' to 'AsmJSType'.

@@ +2421,5 @@
>  
> +enum class WasmType : uint8_t {
> +    Int,
> +    Float,
> +    Double,

How about Int32/Float32/Float64?

@@ +2451,5 @@
> +    BreakLabel,
> +
> +    CallInt,
> +    CallInd,
> +    CallImp,

(I know this is my fault, but) how about CallInternal/CallIndirect/CallImport/GetLocal/GetGlobal/Literal?
Attachment #8615649 - Flags: review?(luke) → review+
Comment on attachment 8615650 [details] [diff] [review]
3.4. Function object

Review of attachment 8615650 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSValidate.cpp
@@ +2782,5 @@
> +
> +class Function
> +{
> +    typedef Vector<uint8_t, 4096> Bytecode;
> +    Bytecode bytecode_;

What is the lifetime of a Function?  Is it short-lived (as in, there is only 1 of them alive at a time)?  When this is done in parallel, is the expectation for there to be N alive (where N is roughly # threads)?

Another question: Function is stored in a Maybe<Function> in FunctionCompiler.  Since the Vector is stored by value, this means a lot of copying when the Vector length is < 4096.  Is there any way to create the Function as a plain object on the stack and pass it in by & first to the FunctionBuilder and then by const& to the FunctionCompiler?
(In reply to Luke Wagner [:luke] from comment #42)
> Comment on attachment 8615650 [details] [diff] [review]
> 3.4. Function object
> 
> Review of attachment 8615650 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> What is the lifetime of a Function?  Is it short-lived (as in, there is only
> 1 of them alive at a time)?  When this is done in parallel, is the
> expectation for there to be N alive (where N is roughly # threads)?
The Function is the piece of data given by the FunctionBuilder (parsing) to the FunctionCompiler (creating the MIR graph). While we don't do this in parallel, it's short-lived. However, when it's done in parallel, each FunctionBuilder will create one Function, and each FunctionCompiler will consume one. If building is faster than compiling, Functions may even be added to a queue until one FunctionCompiler thread gets available.

> Another question: Function is stored in a Maybe<Function> in
> FunctionComiler.  Since the Vector is stored by value, this means a lot of
> copying when the Vector length is < 4096.  Is there any way to create the
> Function as a plain object on the stack and pass it in by & first to the
> FunctionBuilder and then by const& to the FunctionCompiler?
Copying will only happen when the Vector length is < 1024, which is the current limit for inline storage in a Vector. No problem with using a plain object on the stack and passing it by reference to the other structures, though using the Maybe pattern seemed to pave the way for parallelization.
(In reply to Luke Wagner [:luke] from comment #41)
> Comment on attachment 8615649 [details] [diff] [review]
> 3.3. Opcodes
> @@ +2451,5 @@
> > +    BreakLabel,
> > +
> > +    CallInt,
> > +    CallInd,
> > +    CallImp,
> 
> (I know this is my fault, but) how about
> CallInternal/CallIndirect/CallImport/GetLocal/GetGlobal/Literal?

Done! I've also renamed Cond -> Conditional, for consistency.
As Function was ambiguous in the global scope (and thus couldn't be used as an argument type), I've chosen to rename it as AsmFunction (in parallel to AsmType). This also const-ifies all the read-only functions.
Attachment #8615650 - Attachment is obsolete: true
Attachment #8615650 - Flags: review?(luke)
Attachment #8616696 - Flags: review?(luke)
FB is now receiving its AsmFunction by reference.
Attachment #8615652 - Attachment is obsolete: true
Attachment #8615652 - Flags: review?(luke)
Attachment #8616697 - Flags: review?(luke)
... and FC by const reference
Attachment #8615662 - Attachment is obsolete: true
Attachment #8615662 - Flags: review?(luke)
Attachment #8616698 - Flags: review?(luke)
Posted patch 3.7. Encoding and Decoding (obsolete) — Splinter Review
Attachment #8615663 - Attachment is obsolete: true
Attachment #8615663 - Flags: review?(luke)
Attachment #8616699 - Flags: review?(luke)
Posted patch 3. Folded patch (obsolete) — Splinter Review
Attachment #8613700 - Attachment is obsolete: true
Attachment #8615631 - Attachment is obsolete: true
New results for Massive (with all threads, again), adding to the ones in comment #39.

There might be something wrong with the two lua benchmarks; I'll make sure to check that the results are the same tomorrow. However, from one benchmarking series to the other, I don't get similar results at all. In this series, lua_binary is slower with the patch applied and lua_scimark is faster with the patch, while in comment #39 it was the exact opposite!

Alon, can you weigh in: are the Massive lua benchmarks expected to have that much variance?

poppler-variance:
    Average: 10000.0 / 10000.0 (0.0%)
    Median: 10000.0 / 10000.0 (0.0%)
        
box2d-throughput-f32:
    Average: 33301.83 / 33883.33 (-1.75%)
    Median: 33179.0 / 34341.0 (-3.5%)
        
box2d-throughput:
    Average: 30217.5 / 30165.5 (0.17%)
    Median: 30555.0 / 30294.0 (0.85%)
        
poppler-throughput:
    Average: 22089.33 / 22110.17 (-0.09%)
    Median: 22354.0 / 22098.0 (1.15%)
        
poppler-warm-preparation:
    Average: 3239.83 / 3284.0 (-1.36%)
    Median: 3260.0 / 3317.0 (-1.75%)
        
sqlite-warm-preparation:
    Average: 4025.0 / 4000.0 (0.62%)
    Median: 4057.0 / 3984.0 (1.8%)
        
lua-scimark:
    Average: 19025.83 / 18843.33 (0.96%)
    Median: 19615.0 / 18685.0 (4.74%)
        
main-thread-sqlite-cold:
    Average: 10000.0 / 10000.0 (0.0%)
    Median: 10000.0 / 10000.0 (0.0%)
        
lua-binarytrees:
    Average: 15550.17 / 16576.17 (-6.6%)
    Median: 15973.0 / 16894.0 (-5.77%)
        
box2d-variance:
    Average: 10000.0 / 10000.0 (0.0%)
    Median: 10000.0 / 10000.0 (0.0%)
        
massive-total:
    Average: 7180.83 / 7227.83 (-0.65%)
    Median: 7184.0 / 7260.0 (-1.06%)
        
sqlite-cold-preparation:
    Average: 184.0 / 185.0 (-0.54%)
    Median: 184.0 / 185.0 (-0.54%)
        
main-thread-poppler-cold:
    Average: 6093.0 / 6073.83 (0.31%)
    Median: 6073.0 / 6148.0 (-1.23%)
        
main-thread-poppler-warm:
    Average: 5870.83 / 5946.17 (-1.28%)
    Median: 5898.0 / 6028.0 (-2.2%)
        
poppler-cold-preparation:
    Average: 694.67 / 698.83 (-0.6%)
    Median: 698.0 / 703.0 (-0.72%)
        
main-thread-sqlite-warm:
    Average: 10000.0 / 10000.0 (0.0%)
    Median: 10000.0 / 10000.0 (0.0%)
        
sqlite-throughput:
    Average: 12811.83 / 12844.33 (-0.25%)
    Median: 12982.0 / 12970.0 (0.09%)
Flags: needinfo?(azakai)
They are deterministic and run in a worker, so they should have fairly low variance. Still, I see around 1-2% of noise on them on my machine, so definitely they do have some variance. But the noise you report sounds significantly higher than I would expect. How many runs did you do to compute the mean and median in those? How much variance do you see with and without the patches?
Flags: needinfo?(azakai)
(In reply to Alon Zakai (:azakai) from comment #51)
> They are deterministic and run in a worker, so they should have fairly low
> variance. Still, I see around 1-2% of noise on them on my machine, so
> definitely they do have some variance. But the noise you report sounds
> significantly higher than I would expect. How many runs did you do to
> compute the mean and median in those? How much variance do you see with and
> without the patches?

This was run 6 times.

For what it's worth, I've run both lua benchmarks 50 times in shell mode, to see whether this reproduced, and the results are the following:

(legend is still before/after(relative difference))

binarytrees-runtime:
    Average: 4368.9 / 4305.33 (1.46%)
    Median: 4187.0 / 4164.0 (0.55%)
    Variance: 106144.46 / 88831.22 (16.31%)
    Standard deviation: 325.8 / 298.05 (8.52%)
        
scimark-runtime:
    Average: 10346.72 / 10242.7 (1.01%)
    Median: 10072.0 / 10140.0 (-0.68%)
    Variance: 393417.47 / 378799.08 (3.72%)
    Standard deviation: 627.23 / 615.47 (1.87%)
        
binarytrees-startup:
    Average: 119.3 / 118.45 (0.71%)
    Median: 119.0 / 118.0 (0.84%)
    Variance: 10.28 / 8.48 (17.51%)
    Standard deviation: 3.21 / 2.91 (9.35%)
        
scimark-startup:
    Average: 119.3 / 118.45 (0.71%)
    Median: 119.0 / 118.0 (0.84%)
    Variance: 10.28 / 8.48 (17.51%)
    Standard deviation: 3.21 / 2.91 (9.35%)

This suggests my machine really does have too much variance, but the median and average seem to confirm there is no big difference. Moreover, the binarytrees results are the same with/without the patch, which I assume confirms that the same code paths are taken. I'll check the compilation time numbers to be sure, but otherwise I think we're good to go.
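For reference, the summary statistics quoted above (average, median, variance, standard deviation) can be computed from raw per-run timings with a small script along these lines. This is just a sketch; the sample timings below are made up for illustration, not the actual benchmark data:

```javascript
// Sketch: compute average, median, population variance and standard
// deviation from a list of per-run timings (in ms).
function summarize(runs) {
  const n = runs.length;
  const average = runs.reduce((a, b) => a + b, 0) / n;
  const sorted = [...runs].sort((a, b) => a - b);
  const median = n % 2 ? sorted[(n - 1) / 2]
                       : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  // Population variance: plain average of squared deviations.
  const variance = runs.reduce((a, b) => a + (b - average) ** 2, 0) / n;
  return { average, median, variance, stddev: Math.sqrt(variance) };
}

// Hypothetical timings, not the actual run data from this bug.
const stats = summarize([4187, 4164, 4300, 4900, 4100]);
```

The "relative difference" columns above are then just `(before - after) / before`, computed per statistic.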
(In reply to Luke Wagner [:luke] from comment #13)
> I'm not super concerned with single-core devices, but do you get the same
> results for 2 and 4 hardware threads (fixing 'numCpus' and using 'taskset')?

Methodology: 5 runs, take the median value, run with taskset + add the --threads-count argument to the js shell, measure compilation times on Poppler/Sqlite (longest benchmarks to compile).

8 cores:
- Poppler:
  - before: 370ms
  - after: 360ms (!)
- Sqlite:
  - before: 1732ms
  - after: 1753ms

4 cores:
- Poppler:
  - before: 362ms
  - after: 368ms
- Sqlite:
  - before: 1806ms
  - after: 1825ms

2 cores:
- Poppler:
  - before: 658ms
  - after: 657ms (!)
- Sqlite:
  - before: 2523ms
  - after: 2570ms

1 core:
- Poppler:
  - before: 959ms
  - after: 970ms
- Sqlite:
  - before: 3818ms
  - after: 3752ms (!)


So there seems to be a slowdown in most cases (if we ignore the cases marked with (!)), but it seems fairly low overall, in the range of noise, so things look good so far.
Diff: the Emit*Expr functions now contain plain switches that just redirect from an opcode to a specific Emit function, removing the need for braces around the switch cases (overall, 100 deletions).
Attachment #8616699 - Attachment is obsolete: true
Attachment #8616699 - Flags: review?(luke)
Attachment #8622869 - Flags: review?(luke)
Posted patch 3. Folded patch (obsolete) — Splinter Review
Attachment #8616700 - Attachment is obsolete: true
Benjamin, just one thing to note: Massive results are a score and not a time, right?
So higher is better, and in all your posts you seem to think the opposite.
(In reply to Guilherme Lima from comment #56)
> Benjamin, just one thing to note: Massive results are a score and not a time,
> right?
> So higher is better, and in all your posts you seem to think the opposite.

That's right; I was confused because asmjs-apps and asmjs-ubench work the opposite way. Also, it doesn't alter the results much: instead of small speedups we have small slowdowns (and conversely), but nothing really significant.
Comment on attachment 8615699 [details] [diff] [review]
3.2. Changes to ModuleCompiler

Review of attachment 8615699 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSValidate.cpp
@@ +1166,5 @@
>                  Type::Which type_;
> +                // The AsmJSModule splits variables indexes into two distinct
> +                // spaces (SIMD and non-SIMD) while the ModuleCompiler doesn't
> +                // distinguish them.
> +                uint32_t moduleIndex_;

I forgot about this comment and was confused later as to why modules and compilers had separate index spaces.  How about naming it 'scalarOrSimdIndex_' here and in associated names?  With that, we probably don't need a comment.

@@ +1661,5 @@
>          if (GlobalMap::Ptr p = globals_.lookup(name))
>              return p->value();
>          return nullptr;
>      }
> +    const Global* lookupGlobal(uint32_t index) const {

For symmetry with ModuleCompiler::function(), can you rename this to 'global()'?

@@ +1747,5 @@
>      }
>      bool addGlobalVarImport(PropertyName* varName, PropertyName* fieldName, AsmJSCoercion coercion,
>                              bool isConst)
>      {
>          uint32_t index;

Similarly, could you rename scalarOrSimdIndex?

@@ +1932,5 @@
>          }
>          if (!module_->addExit(ffiIndex, exitIndex))
>              return false;
> +        Signature sigCopy(moduleLifo_);
> +        if (!sigCopy.copy(exitDescriptor.sig()))

Instead of making a copy, how about: allocating the Signature in the moduleLifo (like we do with Global) and having both exits_ and exitSignatures_ hold a Signature*?
Attachment #8615699 - Flags: review?(luke) → review+
Comment on attachment 8616696 [details] [diff] [review]
3.4. Function object

Review of attachment 8616696 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSValidate.cpp
@@ +2792,5 @@
> +    explicit AsmFunction(ExclusiveContext* cx) : bytecode_(cx) {}
> +
> +  private:
> +    AsmFunction(const AsmFunction&) = delete;
> +    AsmFunction(AsmFunction&& other) = delete;

I didn't think move ctors were generated automatically and thus didn't need to be deleted.

@@ +2800,5 @@
> +    template<class T> size_t writePrimitive(T v) {
> +        size_t writeAt = bytecode_.length();
> +        if (!bytecode_.growBy(sizeof(T)))
> +            return -1;
> +        memcpy(&bytecode_[writeAt], &v, sizeof(T));

How about:
  bytecode_.append(reinterpret_cast<uint8_t*>(&v), sizeof(v))
Also, for future reference, there is a growByUninitialized().
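For illustration, the append-style primitive writer being discussed here (and the patch32 helper below) can be sketched in JavaScript with a growable byte array and a DataView. This is only a sketch of the concept; the actual code is C++ using a Vector of bytes, and all names here are made up:

```javascript
// Sketch of an append-only bytecode writer: primitives are serialized
// as raw little-endian bytes at the end of a growing buffer, and the
// offset they were written at is returned so the slot can be patched
// later (cf. patch32 in the patch under review).
class BytecodeWriter {
  constructor() { this.bytes = []; }
  writeU8(v)  { return this.write(v, 1); }
  writeU32(v) { return this.write(v, 4); }
  write(v, size) {
    const writeAt = this.bytes.length;
    const buf = new DataView(new ArrayBuffer(4));
    buf.setUint32(0, v >>> 0, /* littleEndian = */ true);
    for (let i = 0; i < size; i++)
      this.bytes.push(buf.getUint8(i));
    return writeAt;
  }
  // Overwrite a previously written 32-bit slot in place.
  patch32(pc, v) {
    const buf = new DataView(new ArrayBuffer(4));
    buf.setUint32(0, v >>> 0, true);
    for (let i = 0; i < 4; i++)
      this.bytes[pc + i] = buf.getUint8(i);
  }
}

const w = new BytecodeWriter();
w.writeU8(0x01);                 // some opcode
const slot = w.writeU32(0);      // placeholder, patched below
w.patch32(slot, 0xdeadbeef);
```

The read side then walks the same buffer in order, which is why symmetric write*/read* naming (adopted later in this bug) reads better than pack*/read*.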

@@ +2870,5 @@
> +
> +    template<class T>
> +    void patch32(size_t pc, T i) {
> +        static_assert(sizeof(T) == sizeof(uint32_t),
> +                      "patch32 must be used with 32-bits wide types");

Why not just take uint32_t then?
Attachment #8616696 - Flags: review?(luke) → review+
Comment on attachment 8616697 [details] [diff] [review]
3.5. Function Builder

Review of attachment 8616697 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSValidate.cpp
@@ +2880,5 @@
>  };
> +
> +// Encapsulates the building of a wasm function from an asm.js function
> +// source code, packing the asm.js code into the wasm binary form that can
> +// be decoded and compiled with a FunctionCompiler.

For now (here and other places), let's not say it's the wasm binary format (it's not yet) or explicitly mention 'wasm' and just say 'asm bytecode' to refer to what is being generated and say 'asm' for the shared wasm/asm.js path.  In theory, we could interpret the bytecode (particularly if we switch to a postorder serialization).
Attachment #8616697 - Flags: review?(luke) → review+
Attachment #8616698 - Flags: review?(luke) → review+
Comment on attachment 8622869 [details] [diff] [review]
3.7. Encoding and Decoding

Great work on this!  I found it really easy to follow the code (even though I can't say I audited every single line).

r+ with the below list of comments addressed (written on a plane, hence no splinter).  I'll be on PTO for 3 weeks so if one of my comments is bogus, just skip it.  Given the extensive changes, it might be good to put up a final combined patch and needinfo? gkw for a round of fuzzing before landing.  I'd also run (in a debug build to catch asserts) all the major public Unity/Epic/AutoDesk/etc demos.  rs=me for simple fixes.
 - Implement TODOs or file bugs mentioned in TODO comment.
 - Now or later, it seems like FunctionBuilder should be scoped such that it is destroyed before FunctionCompiler is created (and its bits handed off to AsmFunction).
 - Could you line up the columns in the switch in EmitStatement?
 - No curlies around some single-line then branches (unless that style changed recently?  I hear Gecko style is coming...)
 - pack*/read* seem asymmetric; how about write*/read* or pack*/unpack*?

Btw, I think AsmJSValidate.cpp is getting big enough that, in a future patch, we should split it up into Check and Emit files.  This would require a new header to share things between these two files.  Trampolines would also be nice to move into a new file.
Attachment #8622869 - Flags: review?(luke) → review+
Thanks for the reviews!


(In reply to Luke Wagner [:luke] from comment #59)
> Comment on attachment 8616696 [details] [diff] [review]
> I didn't think move ctors were generated automatically and thus didn't need
> to be deleted.
The C++ reference says that a move ctor is generated automatically if you haven't defined the dtor, the copy ctor/assignment operator, or the move assignment operator, which is the case here. Even if it weren't generated anyway, explicit is better than implicit, IMO. I don't know if there's a policy in our codebase requiring move ctors to be deleted when they're not user defined, but in any case, we can get back to this later (if it matters).

> 
> How about:
>   bytecode_.append(reinterpret_cast<uint8_t*>(&v), sizeof(v))
> Also, for future reference, there is a growByUninitialized().
Cleaner, thanks!

> 
> @@ +2870,5 @@
> > +
> > +    template<class T>
> > +    void patch32(size_t pc, T i) {
> > +        static_assert(sizeof(T) == sizeof(uint32_t),
> > +                      "patch32 must be used with 32-bits wide types");
> 
> Why not just take uint32_t then?
It is actually used with int32 and uint32 in the gigantic patch.
(In reply to Luke Wagner [:luke] from comment #61)
> Comment on attachment 8622869 [details] [diff] [review]
> 3.7. Encoding and Decoding
> 
> Great work on this!  I found it really easy to follow the code (even though
> I can't say I audited every single line).
Thank you for this review!

> 
> r+ with the below list of comments addressed (written on a plane, hence no
> splinter).  I'll be on PTO for 3 weeks so if one of my comments is bogus,
> just skip it.  Given the extensive changes, it might be good to put up a
> final combined patch and needinfo? gkw for a round of fuzzing before
> landing.  I'd also run (in a debug build to catch asserts) all the major
> public Unity/Epic/AutoDesk/etc demos.  rs=me for simple fixes.
Good idea, I'll check all of this and request fuzzing. Already found a "simple fix" on try: the GC hazard analysis wasn't pattern-matching the CheckCallArgs function anymore, as its signature changed, so I had to update the devtools/rootAnalysis/annotations.js file to the new signature.

>  - Implement TODOs or file bugs mentioned in TODO comment.
Opened a dependent bug (these TODOs all relate to debug-only, perf (as in, the linux CLI tool) annotations that I doubt are actually being used).

>  - Now or later, it seems like FunctionBuilder should be scoped such that it
> is destroyed before FunctionCompiler is created (and this bits handed off
> moved to AsmFunction).
Right, that will pave the way for the parallelization work. Done.

>  - Could you line up the columns in the switch in EmitStatement?
Done.

>  - No curlies around some single-line then branches (unless that style
> changed recently?  I hear Gecko style is coming...)
Haha, quite a challenge without knowing which lines are to be changed! Found 3 occurrences in the patch, but still can't wait for clang-format to be used by default...

>  - pack*/read* seem asymmetric; how about write*/read* or pack*/unpack*?
I've chosen write/read ("pack" was used to help distinguish between the AsmFunction and the FunctionBuilder APIs, but it was actually more confusing than helpful).

> 
> Btw, I think AsmJSValidate.cpp is getting big enough that, in a future
> patch, we should split it up into Check and Emit files.  This would require
> a new header to share things between these two files.  Trampolines would
> also be nice to move into a new file.
I'll do this in future patches (when you'll be back from PTO, I guess).
Posted patch 3. Folded patch (obsolete) — Splinter Review
gkw and decoder, can you please fuzz this patch? It changes asm.js features only, so the dedicated asm.js fuzzer and differential fuzzing could prove useful.

This patch applies cleanly on mozilla-inbound f30e1413ebd1.
Attachment #8622871 - Attachment is obsolete: true
Flags: needinfo?(gary)
Flags: needinfo?(choller)
Attachment #8628182 - Flags: review+
And for what it's worth, the folded patch on try is green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ff2f1c630ab
Posted file unpack-shell.js
So all the demos seem to work, except the packed ones in luke's repo (AngryBotsPacked and PlatformerGamePacked); for those two, an assertion is triggered in ~FunctionCompiler. The fun thing is that neither the original unpacked asm.js file nor the asmjsunpacker-worker.js file triggers the assert (might this mean there is a different behavior here?). I had to make a script, executable in the JS shell, that unpacks the packed version; the resulting unpacked script does trigger the assertion. For the record, here's the shell script. I'll investigate tomorrow.
Posted patch 3. Folded patch (obsolete) — Splinter Review
Found the fix for the assertion triggered in the packed demos: the same label could be used for different independent blocks within a function, but labels were not deleted from the FunctionBuilder's map (label -> unique id), so a labelled break or continue inside a second block labelled with the same name would point to the first block:

function f() {
  a: do { break a; /* points to the expected block */ } while (0);
  a: do { break a; /* pointed to the wrong block */ } while (0);
}

I've added the fix (after checking a label statement, remove its entry in the label map), and folded it in the patch as part of the rs=luke trivial fixes :)
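For reference, the expected semantics can be checked directly in plain JavaScript (the trace array here is made-up instrumentation for illustration):

```javascript
// Each labelled break must target its own enclosing block, even when
// the same label name is reused for independent blocks -- the case the
// fix above addresses in the FunctionBuilder's label -> unique-id map.
function f() {
  var trace = [];
  a: do { trace.push("first"); break a; } while (0);
  a: do { trace.push("second"); break a; } while (0);
  return trace;
}
```

Both blocks should execute exactly once, so `f()` must visit "first" and then "second"; with the bug, the second `break a` resolved to the first block's id in the emitted bytecode.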

Still requesting fuzzing against this patch, if that's possible :)
This one cleanly applies on m-i 250855:c0f955560a47.
Attachment #8628182 - Attachment is obsolete: true
Attachment #8628813 - Flags: review+
Annnd another small issue found thanks to the packed games demos (thanks luke!). I am pondering a solution to make sure that these small differences don't show up anymore...

gkw, decoder: this patch applies cleanly on mozilla-inbound 250855:c0f955560a47, any chance you can fuzz against it, please?
Attachment #8628813 - Attachment is obsolete: true
Attachment #8628955 - Flags: review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #68)
> any chance you can fuzz against it, please?

Yes, I've started fuzzing on one of my Mac machines.
Depends on: 1180059
I haven't found anything interesting yet after one weekend's worth of fuzzing.
Flags: needinfo?(gary)
Attachment #8628955 - Flags: feedback+
Thank you gary! Last try run before landing (unless there are other issues, of course)...
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=c295b86bc3a2
Another try run (JS and jit-tests only), as I've hit many oranges, and the GC jit-tests don't want to work even after a few retriggers...
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3b69a2a0df2
Keywords: leave-open
The Jit2 crash in comment 71 was unrelated, as comment 72 shows. Landing.
Flags: needinfo?(choller)
https://hg.mozilla.org/mozilla-central/rev/dbab48c2e593
https://hg.mozilla.org/mozilla-central/rev/cd949a8d064c
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla42