Closed Bug 1381972 Opened 7 years ago Closed 7 years ago

Speed up lcov rewriter

Categories

(Testing :: Code Coverage, enhancement)

Version 3
enhancement
Not set
normal

Tracking

(firefox57 fixed)

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: marco, Assigned: marco)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

It is taking really long to run the rewriter (~2 hours and 15 minutes on a c4.large Amazon instance).

Could we make it faster?

- For the paths, we can generate a JSON file and let grcov use it to rewrite them;
- For the merging of different entries with the same SF value, we can rely on grcov (it is already doing it);
- The preprocessor rewriting is the only thing left.
Spot pricing for c4.large is about $0.025/hour, we run about 100 jobs for coverage, about 10x per day, for a cost of $25/day.  The main slowdown is probably the EBS drives. Can we speed it up by using a different instance with a local ephemeral drive?
The rewriting is only performed once in the uploader task, not for every test task (as it requires the source code to be present), so it should be around $0.5 per day.
Depends on: 1384038
Depends on: 1384040
Depends on: 1384044
I think we could:
1) Only parse and rewrite records associated with preprocessed files, for other records we will only parse the source names;
2) Reduce the number of preprocessed files, so we reduce the number of records we need to fully parse (bug 1384044 is one of the main offenders, but it's a small file, so cheaper to rewrite);
3) Accept zip files as input, so users of lcov_rewriter.py don't need to unzip files first;

Regarding 1, for records we don't fully parse, we could directly rewrite the SF entry only and output the rest of the record as-is (so without parsing the full record and calling LcovFile.format_record).
What do you think of comment 3? Other ideas?
Flags: needinfo?(cmanchester)
Depends on: 1384057
Comment 3 sounds reasonable and will probably speed things up a lot. I know that reducing the number of preprocessed js files has been a goal in general.

Other than that, I suppose should be able to do these in parallel, but I expect the other optimizations you mention would be enough.
Flags: needinfo?(cmanchester)
Depends on: 1356594
This is a WIP for part 1.

There's still a test failure that I need to figure out in test_remap_lcov:
> self.assertEqual(original_line_count,sum(r.line_count for r in records))
fails because original_line_count is 1083, the sum of the records' line counts is 2133.
Assignee: nobody → mcastelluccio
Status: NEW → ASSIGNED
Attachment #8890440 - Flags: feedback?(cmanchester)
Depends on: 1384509
Depends on: 1384508
Depends on: 1384936
Test fixed.
Attachment #8890440 - Attachment is obsolete: true
Attachment #8890440 - Flags: feedback?(cmanchester)
Attachment #8891269 - Flags: review?(cmanchester)
Interestingly, using lcov_rewriter.py with a ZIP file is 2.5x slower than unzipping first (still in Python using the same ZipFile library) and giving lcov_rewriter.py a list of files.
Depends on: 1385233
Comment on attachment 8891269 [details] [diff] [review]
Part 1: Only fully parse records belonging to preprocessed files, as we only need to rewrite those

Review of attachment 8891269 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. A question and a nit, please address those before landing.

::: python/mozbuild/mozbuild/codecoverage/lcov_rewriter.py
@@ +260,2 @@
>              if line == 'end_of_record':
> +                if current_source_file != '':

Are we ever going to see 'end_of_record' without a 'current_source_file'?

@@ +694,3 @@
>  
> +        with open(in_path) as fh:
> +            with open(out_path, 'w+') as out_fh:

These can be collapsed to one line: "with open(in_path) as in_fh, open(out_path) as out_fh:"...
Attachment #8891269 - Flags: review?(cmanchester) → review+
(In reply to Chris Manchester (:chmanchester) from comment #9)
> Comment on attachment 8891269 [details] [diff] [review]
> Part 1: Only fully parse records belonging to preprocessed files, as we only
> need to rewrite those
> 
> Review of attachment 8891269 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good. A question and a nit, please address those before landing.
> 
> ::: python/mozbuild/mozbuild/codecoverage/lcov_rewriter.py
> @@ +260,2 @@
> >              if line == 'end_of_record':
> > +                if current_source_file != '':
> 
> Are we ever going to see 'end_of_record' without a 'current_source_file'?

If we are unable to rewrite the URL (that is if `rewrite_url` returns None), then current_source_file is going to be ''.
It's the same behavior as before the patch, as we were adding the records whose URL we couldn't rewrite to the `removals` set and then we were ignoring those records.
Pushed by mcastelluccio@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b9d3b4adf01
Only fully parse records belonging to preprocessed files, as we only need to rewrite those. r=chmanchester
https://hg.mozilla.org/mozilla-central/rev/2b9d3b4adf01
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Depends on: 1424227
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: