Move release1 indexer to use t3.2xlarge and only 1 backup server
Categories: Webtools :: Searchfox, enhancement
Tracking: Not tracked
People: Reporter: asuth; Assigned: asuth
Attachments: 4 files
As proposed in https://bugzilla.mozilla.org/show_bug.cgi?id=1779672#c10, I'm going to try moving release1 (config1.json), which contains mozilla-central, to an 8-core t3.2xlarge instance type to see how that moves our p90-and-higher search latencies. codesearch/livegrep will also be updated to be able to use all 8 cores. We'll drop back down to t3.xlarge if p90 and p95 don't decrease meaningfully.
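For context, here's a minimal boto3 sketch of what the instance-type bump looks like at the EC2 API level. This is illustrative only, not Searchfox's actual provisioning code, and the AMI id is a placeholder:

    import boto3

    ec2 = boto3.client("ec2")

    # Illustrative only: launch the release1 indexer on the larger instance type.
    # The AMI id and any networking/tagging details are placeholders.
    ec2.run_instances(
        ImageId="ami-PLACEHOLDER",
        InstanceType="t3.2xlarge",  # 8 vCPUs / 32 GiB, up from t3.xlarge's 4 / 16 GiB
        MinCount=1,
        MaxCount=1,
    )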
Comment 1 • 3 years ago (Assignee)
This is landed and should take effect for the utc22 run.
This will also give the VM 32 GiB of memory. That should avoid cache competition with the non-m-c repositories for their crossref and codesearch databases, and allow for additional caching of the m-c git repo, which currently clocks in at 7.0 GiB. That's not a huge win for our current feature-set, which doesn't really do any git scans, but it could be nice in the future since the "query" endpoint may gain some git-history-related features. (Although anything really useful should of course eventually be pre-computed and then indexed by livegrep or something similar.)
Comment 2 • 3 years ago (Assignee)
After mitigating bug 1779939 and re-triggering, things seem to have worked. That said, it's not clear codesearch is actually leveraging the extra threads meaningfully; it seems like codesearch only ends up using 2 threads:
- htop for a longer query like a 4-digit hex-string shows only 2 cores hitting full utilization (or equivalent distribution)
- the "git_time" stat frequently ends up being almost exactly 2x the "total_time".
I presume the issue is that chunks are the atomic unit of labor division. I see there is a config value chunk_power with a default of 27, i.e. 128 MiB chunks, which does seem like a pretty big work unit. I'm going to drop the chunk power by a factor of 8 (= 2^3) to 24. My rationale is that for load balancing it's better for each thread to potentially see more than one work unit.
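Here's a rough back-of-the-envelope sketch of why the chunk size matters for parallelism. The index size below is a made-up figure chosen to match the observed 2-thread behavior, not a measured number:

    # Illustrative arithmetic only; the real chunk count depends on how much
    # deduplicated text livegrep actually indexes for mozilla-central.
    index_bytes = 256 * 1024 ** 2  # hypothetical ~256 MiB of indexed text
    cores = 8

    for power in (27, 24):
        chunk_bytes = 1 << power
        n_chunks = max(1, index_bytes // chunk_bytes)
        print(f"chunk_power={power}: {chunk_bytes >> 20} MiB per chunk, "
              f"{n_chunks} chunk(s) for {cores} cores")

    # chunk_power=27: 128 MiB per chunk, 2 chunk(s) for 8 cores
    # chunk_power=24: 16 MiB per chunk, 16 chunk(s) for 8 cores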
Comment 3 • 3 years ago (Assignee)
This has landed, and after re-triggering the indexer I am indeed seeing the expected parallelism; the git_time and total_time numbers now line up with it.
The "nsmappedattribute" test case can get down to ~1.8 secs from ~2.5 secs after bug 1779672.
The hex code test case sees massive improvements but we can also see how brutal the initial page-in is:
2022-07-18 01:04:32.611831/pid=20891 - request(handled by 20892) /mozilla-central/search?q=2ac3&path=
2022-07-18 01:04:32.613941/pid=20892 - QUERY line: "2ac3", file: ".*", fold_case: true,
2022-07-18 01:04:41.740173/pid=20892 - codesearch result with 998 line matches across 181 paths - 9.126436 : re2_time: 8, git_time: 47852, index_time: 15584, exit_reason: MATCH_LIMIT, total_time: 9071,
2022-07-18 01:04:41.758766/pid=20892 - identifier_search "2ac3" - 0.010385
2022-07-18 01:04:41.759931/pid=20892 - search.get() - 0.001111
2022-07-18 01:04:41.770014/pid=20891 - finish pid 20892 - 9.157628
2022-07-18 01:04:46.672930/pid=20898 - request(handled by 20899) /mozilla-central/search?q=2AC3&path=
2022-07-18 01:04:46.675116/pid=20899 - QUERY line: "2AC3", file: ".*", fold_case: true,
2022-07-18 01:04:50.078560/pid=20899 - codesearch result with 996 line matches across 199 paths - 3.403652 : re2_time: 11, git_time: 19280, index_time: 4, exit_reason: MATCH_LIMIT, total_time: 3387,
2022-07-18 01:04:50.088464/pid=20899 - identifier_search "2AC3" - 0.001677
2022-07-18 01:04:50.089593/pid=20899 - search.get() - 0.001091
2022-07-18 01:04:50.098166/pid=20898 - finish pid 20899 - 3.424760
2022-07-18 01:05:04.906701/pid=20908 - request(handled by 20909) /mozilla-central/search?q=2aC3&path=
2022-07-18 01:05:04.909433/pid=20909 - QUERY line: "2aC3", file: ".*", fold_case: true,
2022-07-18 01:05:08.823518/pid=20909 - codesearch result with 998 line matches across 179 paths - 3.914413 : re2_time: 8, git_time: 20776, index_time: 5, exit_reason: MATCH_LIMIT, total_time: 3898,
2022-07-18 01:05:08.833412/pid=20909 - identifier_search "2aC3" - 0.001693
2022-07-18 01:05:08.834525/pid=20909 - search.get() - 0.001067
2022-07-18 01:05:08.845302/pid=20908 - finish pid 20909 - 3.938121
Specifically, the initial page-in search takes ~9.2 s, but effectively equivalent queries afterwards, with the cache hot, clock in at ~3.4 s and ~3.9 s.
My next quick steps are:
- The provisioner will start installing vmtouch (https://hoytech.com/vmtouch/).
- We will touch mozilla-central's crossref-extra, crossref, and livegrep.idx into cache, in that order, synchronously as part of the web-server spin-up (see the sketch after this list). I chose that order because the later files are more important than the earlier ones.
- I am going to do this hackily. All that matters is that it happens for mozilla-central on release1. We can generalize/elegantize later.
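Here's a minimal sketch of what that spin-up step could look like, assuming hypothetical index paths; it's not the actual Searchfox provisioning code:

    import subprocess

    # Hypothetical paths; the real on-disk layout may differ.
    # vmtouch -t touches every page of a file so it lands in the page cache,
    # and -q suppresses the progress output. The most important file is
    # touched last so that, under LRU-ish eviction, the earlier (less
    # important) data is what gets dropped first under memory pressure.
    PRECACHE_FILES = [
        "/index/mozilla-central/crossref-extra",  # least important
        "/index/mozilla-central/crossref",
        "/index/mozilla-central/livegrep.idx",    # most important
    ]

    for path in PRECACHE_FILES:
        subprocess.run(["vmtouch", "-t", "-q", path], check=True)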
Comment 4 • 3 years ago (Assignee)
Comment 5 • 3 years ago (Assignee)
The vmtouch stuff worked well; a fresh "2ac3" search and "nsmappedattribute" search:
2022-07-18 05:57:08.544971/pid=19410 - request(handled by 19411) /mozilla-central/search?q=2ac3&path=
2022-07-18 05:57:08.546795/pid=19411 - QUERY line: "2ac3", file: ".*", fold_case: true,
2022-07-18 05:57:11.980368/pid=19411 - codesearch result with 998 line matches across 196 paths - 3.433835 : re2_time: 8, git_time: 18829, index_time: 16, exit_reason: MATCH_LIMIT, total_time: 3378,
2022-07-18 05:57:11.998036/pid=19411 - identifier_search "2ac3" - 0.009444
2022-07-18 05:57:11.999316/pid=19411 - search.get() - 0.001220
2022-07-18 05:57:12.008485/pid=19410 - finish pid 19411 - 3.463101
2022-07-18 05:57:42.685431/pid=19566 - request(handled by 19567) /mozilla-central/search?q=nsmappedattribute&path=&case=false&regexp=false
2022-07-18 05:57:42.687238/pid=19567 - QUERY line: "nsmappedattribute", file: ".*", fold_case: true,
2022-07-18 05:57:44.440411/pid=19567 - codesearch result with 458 line matches across 82 paths - 1.753351 : re2_time: 25, git_time: 4486, index_time: 27, total_time: 1740,
2022-07-18 05:57:44.457333/pid=19567 - search_files "nsmappedattribute" - 0.016603
2022-07-18 05:57:44.543932/pid=19567 - identifier_search "nsmappedattribute" - 0.086530
2022-07-18 05:57:44.545053/pid=19567 - search.get() - 0.001071
2022-07-18 05:57:44.548650/pid=19566 - finish pid 19567 - 1.862811
I'm going to leave this open until we get Monday's utc10 p90+ numbers, but this seems like a win on the RAM pre-caching front alone, even setting aside the CPU improvement. After pre-caching, we're at 9.5 GiB free (with 19 GiB cached and 1.6 GiB used) per free -h.
Comment 6 • 3 years ago (Assignee)
I took a peek at the utc22 web-server after it rotated out. We're using the utc10 server for our sampling, so I'll save the victory announcement until we have that data, but for now it looks great.
One thing I did notice was that nearly all of our slowest searches were clearly coming from comm-central, not mozilla-central, so I've augmented the script to optionally filter by repo/tree so that c-c doesn't distort our statistics; since I still have the original slow web logs, we can compare m-c to m-c. Interestingly, c-c's stats had actually made m-c's performance look slightly better in the slow logs! Note that because we use a separate codesearch/livegrep instance for each tree, comm-central being blocked on I/O shouldn't impact m-c search performance as long as the nginx server hasn't hit a proxying connection limit.
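Here's a small sketch of the kind of per-tree filtering that augmentation implies, based on the request-line format quoted above; the real script and its log format may well differ:

    import re

    # Matches the tree name in request lines like:
    #   "... request(handled by 20892) /mozilla-central/search?q=2ac3&path="
    REQUEST_RE = re.compile(r"request\(handled by \d+\) /([^/]+)/search\?")

    def filter_tree(log_lines, tree="mozilla-central"):
        """Yield only the request lines belonging to the given tree."""
        for line in log_lines:
            m = REQUEST_RE.search(line)
            if m and m.group(1) == tree:
                yield line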
When utc22 rotated out, we retained 100% caching on our 3 pre-cached files, and free -h indicates we're at 4.4 GiB free (down from 9.5 GiB just after activation), with 24 GiB cached (up from 19 GiB) and 1.8 GiB used (up from 1.6 GiB).
Comment 7 • 3 years ago (Assignee)
Well, shoot. The "only keep 1 backup server" logic, where I explicitly called out that I wasn't dealing with ordering the candidates for shutdown, decided to get rid of today's utc10 server and keep yesterday's utc22 server. This means the unscientific comparison I was going to run is now unscientific in different ways than I was planning, because I can't compare utc10 server runs to utc10 server runs. (The web logs go away with the server, like searches in the rain.)
Comment 8 • 3 years ago (Assignee)
I sent an email about the improved performance to dev-platform; it's on Google Groups with not-great formatting because it ended up as text/plain and I didn't spend a lot of time considering the implications of that.
For posterity, I re-provide the differences here:
OLD Dynamic Search Request Latencies for mozilla-central (seconds)
cache_status   _count    p50    p66    p75    p90    p95    p99
----------------------------------------------------------------
MISS             2025   0.18   0.38   0.73   2.76   4.23   9.12
HIT               114      0      0      0      0      0      0

NEW Dynamic Search Request Latencies for mozilla-central (seconds)
cache_status   _count    p50    p66    p75    p90    p95    p99
----------------------------------------------------------------
MISS             3396   0.06   0.09   0.17   0.79   1.92   2.74
HIT               219      0      0      0      0      0      0
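For reference, here's a tiny sketch of how percentile rows like the ones above can be computed from a list of per-request latencies; it is not the actual reporting script, and it uses a simple nearest-rank percentile:

    def percentile(sorted_vals, p):
        """Nearest-rank-style percentile over an already-sorted list."""
        if not sorted_vals:
            return 0.0
        idx = min(len(sorted_vals) - 1,
                  round(p / 100 * (len(sorted_vals) - 1)))
        return sorted_vals[idx]

    def latency_row(label, latencies):
        """Format one table row: count plus p50/p66/p75/p90/p95/p99."""
        vals = sorted(latencies)
        cols = [percentile(vals, p) for p in (50, 66, 75, 90, 95, 99)]
        return f"{label:<12} {len(vals):>6} " + " ".join(f"{c:6.2f}" for c in cols)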