Closed Bug 1381972 Opened 7 years ago Closed 7 years ago

Speed up lcov rewriter

Tracking

(firefox57 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla57

Tracking Flags:

Tracking

Status

firefox57

---

fixed

People

(Reporter: marco, Assigned: marco)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

Part 1: Only fully parse records belonging to preprocessed files, as we only need to rewrite those 7 years ago Marco Castelluccio [:marco] 14.68 KB, patch		Details \| Diff \| Splinter Review
Part 1: Only fully parse records belonging to preprocessed files, as we only need to rewrite those 7 years ago Marco Castelluccio [:marco] 15.79 KB, patch	chmanchester : review+	Details \| Diff \| Splinter Review
Part 2: Make lcov_rewriter.py use LCOV files from a ZIP file directly 7 years ago Marco Castelluccio [:marco] 4.03 KB, patch		Details \| Diff \| Splinter Review

Marco Castelluccio [:marco]

Assignee

Description

•

7 years ago

It is taking really long to run the rewriter (~2 hours and 15 minutes on a c4.large Amazon instance).

Could we make it faster?

- For the paths, we can generate a JSON file and let grcov use it to rewrite them;
- For the merging of different entries with the same SF value, we can rely on grcov (it is already doing it);
- The preprocessor rewriting is the only thing left.

Kyle Lahnakoski [:ekyle]

Comment 1

•

7 years ago

Spot pricing for c4.large is about $0.025/hour, we run about 100 jobs for coverage, about 10x per day, for a cost of $25/day.  The main slowdown is probably the EBS drives. Can we speed it up by using a different instance with a local ephemeral drive?

Marco Castelluccio [:marco]

Assignee

Comment 2

•

7 years ago

The rewriting is only performed once in the uploader task, not for every test task (as it requires the source code to be present), so it should be around $0.5 per day.

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1384038

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1384040

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1384044

Marco Castelluccio [:marco]

Assignee

Comment 3

•

7 years ago

I think we could:
1) Only parse and rewrite records associated with preprocessed files, for other records we will only parse the source names;
2) Reduce the number of preprocessed files, so we reduce the number of records we need to fully parse (bug 1384044 is one of the main offenders, but it's a small file, so cheaper to rewrite);
3) Accept zip files as input, so users of lcov_rewriter.py don't need to unzip files first;

Regarding 1, for records we don't fully parse, we could directly rewrite the SF entry only and output the rest of the record as-is (so without parsing the full record and calling LcovFile.format_record).

Marco Castelluccio [:marco]

Assignee

Comment 4

•

7 years ago

What do you think of comment 3? Other ideas?

Flags: needinfo?(cmanchester)

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1384057

Chris Manchester (limited bugmail, email directly)

Comment 5

•

7 years ago

Comment 3 sounds reasonable and will probably speed things up a lot. I know that reducing the number of preprocessed js files has been a goal in general.

Other than that, I suppose should be able to do these in parallel, but I expect the other optimizations you mention would be enough.

Flags: needinfo?(cmanchester)

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1356594

Marco Castelluccio [:marco]

Assignee

Comment 6

•

7 years ago

Attached patch Part 1: Only fully parse records belonging to preprocessed files, as we only need to rewrite those (obsolete) — Details — Splinter Review

This is a WIP for part 1.

There's still a test failure that I need to figure out in test_remap_lcov:
> self.assertEqual(original_line_count,sum(r.line_count for r in records))
fails because original_line_count is 1083, the sum of the records' line counts is 2133.

Assignee: nobody → mcastelluccio

Status: NEW → ASSIGNED

Attachment #8890440 - Flags: feedback?(cmanchester)

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1384509

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1384508

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1384936

Marco Castelluccio [:marco]

Assignee

Comment 7

•

7 years ago

Attached patch Part 1: Only fully parse records belonging to preprocessed files, as we only need to rewrite those — Details — Splinter Review

Test fixed.

Attachment #8890440 - Attachment is obsolete: true

Attachment #8890440 - Flags: feedback?(cmanchester)

Attachment #8891269 - Flags: review?(cmanchester)

Marco Castelluccio [:marco]

Assignee

Comment 8

•

7 years ago

Attached patch Part 2: Make lcov_rewriter.py use LCOV files from a ZIP file directly — Details — Splinter Review

Interestingly, using lcov_rewriter.py with a ZIP file is 2.5x slower than unzipping first (still in Python using the same ZipFile library) and giving lcov_rewriter.py a list of files.

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1385233

Chris Manchester (limited bugmail, email directly)

Comment 9

•

7 years ago

Comment on attachment 8891269 [details] [diff] [review]
Part 1: Only fully parse records belonging to preprocessed files, as we only need to rewrite those

Review of attachment 8891269 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. A question and a nit, please address those before landing.

::: python/mozbuild/mozbuild/codecoverage/lcov_rewriter.py
@@ +260,2 @@
>              if line == 'end_of_record':
> +                if current_source_file != '':

Are we ever going to see 'end_of_record' without a 'current_source_file'?

@@ +694,3 @@
>  
> +        with open(in_path) as fh:
> +            with open(out_path, 'w+') as out_fh:

These can be collapsed to one line: "with open(in_path) as in_fh, open(out_path) as out_fh:"...

Attachment #8891269 - Flags: review?(cmanchester) → review+

Marco Castelluccio [:marco]

Assignee

Comment 10

•

7 years ago

(In reply to Chris Manchester (:chmanchester) from comment #9)
> Comment on attachment 8891269 [details] [diff] [review]
> Part 1: Only fully parse records belonging to preprocessed files, as we only
> need to rewrite those
> 
> Review of attachment 8891269 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good. A question and a nit, please address those before landing.
> 
> ::: python/mozbuild/mozbuild/codecoverage/lcov_rewriter.py
> @@ +260,2 @@
> >              if line == 'end_of_record':
> > +                if current_source_file != '':
> 
> Are we ever going to see 'end_of_record' without a 'current_source_file'?

If we are unable to rewrite the URL (that is if `rewrite_url` returns None), then current_source_file is going to be ''.
It's the same behavior as before the patch, as we were adding the records whose URL we couldn't rewrite to the `removals` set and then we were ignoring those records.

Pulsebot

Comment 11

•

7 years ago

Pushed by mcastelluccio@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b9d3b4adf01
Only fully parse records belonging to preprocessed files, as we only need to rewrite those. r=chmanchester

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 12

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/2b9d3b4adf01

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Marco Castelluccio [:marco]

Assignee

Updated

•

7 years ago

Depends on: 1424227

You need to log in before you can comment on or make changes to this bug.