Closed Bug 1970807 Opened 1 year ago Closed 11 months ago

Add support for mozilla-esr140 on searchfox

Categories

(Webtools :: Searchfox, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: eijebong, Assigned: eijebong)

Attachments

(3 files)

No description provided.
Summary: Add support for mozilla-esr140 / comm-esr140 on searchfox → Add support for mozilla-esr140 on searchfox
See Also: → 1970809
Assignee: nobody → borivel
Status: NEW → ASSIGNED

Unfortunately https://github.com/mozsearch/mozsearch-mozilla/pull/279 seems to have put us at the current disk size limit based on the most recent email:

 mv: cannot create directory '/index/mozilla-esr128/blame/.git/objects/f3': No space left on device

This is the EBS volume filling up as we move the esr128 index from the scratch dir after finishing it. I'm going to quickly evaluate where we are in terms of EBS disk sizes and also our goals relative to web-server RAM size and the size of the livegrep and crossref databases[1]. We ideally want config2 to have tier1 performance, so the right short-term answer might even be to move to indexer-becomes-web-server using the SSDs and lose the EBS storage entirely, but the immediate-term answer is probably just to bump all the EBS sizes. One thing I will do in terms of logistics, however, is drop cypress; :mjf indicated that his project no longer needs it and the next project on the loaner branch list doesn't need it either.

So unfortunately the indexer's 300G SSD ended up being a limit too, because of how we like to run all the setup passes first before the indexing processes proper. The failure mode is interesting: the SSD fills up during mozilla-beta, the first tree, but because we have configured its on_error to be continue we just stop super early and then move the tree from the SSD to the EBS store. Then when we get to mozilla-release (the 2nd tree) we've freed up enough space that mozilla-release is able to index successfully. But when we bring up the web-server, we fail to find a mozilla-beta crossref file, and that dooms the web-server bring-up, so the indexer never successfully pivots to the new web-server. The first indexer run I attempted didn't generate a timeout email because the next indexer terminated it as it was starting; that next run then did time out as expected.
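To make the failure mode concrete, here is a minimal sketch of a pre-flight check that would flag a tree whose crossref output is missing before any web-server bring-up is attempted. It is not taken from mozsearch: the function name `trees_missing_crossref` is hypothetical, and the assumption that each tree's output lives at `<index root>/<tree>/crossref` is mine for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def trees_missing_crossref(index_root: Path, tree_names: Iterable[str]) -> List[str]:
    """Return the trees under index_root with no non-empty crossref file.

    Assumption (not confirmed by the bug): each tree stores its crossref
    output as a regular file named "crossref" directly in its index dir.
    """
    missing = []
    for name in tree_names:
        crossref = index_root / name / "crossref"  # assumed location
        if not crossref.is_file() or crossref.stat().st_size == 0:
            missing.append(name)
    return missing
```

With on_error set to continue, a tree that bailed out early (mozilla-beta here) would show up in the returned list even though the overall indexer run kept going, which is the situation that later broke the web-server.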

For config2 we've still been using m5d.2xlarge, which only has a single 300G NVMe SSD. For config4 and config5 we've adopted m5d.4xlarge so they go faster, but that's 2 x 300G SSDs, and we made it so that we try to use the second SSD for swap if we find it. The swap decision was part of a mitigation I attempted in bug 1912078 when wubkat/config5 was falling over because of some pathological macro expansion, and it was quite difficult to figure out what was going on without swap since the machine would basically just straight up die.

It seems like there are now more recent generations of comparable machines, and those step the SSDs up to 474G from 300G. Specifically, m6id.2xlarge works and the marginal price is minimal, plus it seems like we should potentially see a meaningful speed improvement (which could reduce costs), noting that the 4xlarge instances we use for config4 and config5 are exactly twice the price:

  • m5d.2xlarge: $0.452/hr on-demand
  • m6id.2xlarge: $0.4746/hr on-demand

A change that we will experience is that m6id.4xlarge exposes 1 x 950G SSD whereas the m5d.4xlarge was 2 x 300G SSD, so we will lose the dedicated swap SSD. In that case we fall back to using a hardcoded 8G swap file, which should probably still be sufficient for the use-case of having the machine slow down and be interruptible rather than hard lock in a worst-case situation. We can of course bump that number if we want, though.
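The swap decision described above can be sketched roughly as follows. This is an illustration of the policy only, not the actual provisioning script: `SwapPlan`, `plan_swap`, and the device paths in the usage note are hypothetical names, and only the 8G fallback size comes from this bug.

```python
from dataclasses import dataclass
from typing import List, Optional

FALLBACK_SWAP_FILE_GIB = 8  # hardcoded fallback size mentioned above


@dataclass(frozen=True)
class SwapPlan:
    scratch_device: str         # SSD used for index scratch space
    swap_device: Optional[str]  # dedicated swap SSD, if a second one exists
    swap_file_gib: int          # swap file size when there is no swap SSD


def plan_swap(instance_ssds: List[str]) -> SwapPlan:
    """Use a second SSD for swap when present, else fall back to a swap file."""
    if not instance_ssds:
        raise ValueError("no instance-store SSD found")
    if len(instance_ssds) >= 2:
        return SwapPlan(instance_ssds[0], instance_ssds[1], 0)
    return SwapPlan(instance_ssds[0], None, FALLBACK_SWAP_FILE_GIB)
```

For example, `plan_swap(["/dev/nvme1n1", "/dev/nvme2n1"])` models the old 2 x 300G layout with a dedicated swap SSD, while `plan_swap(["/dev/nvme1n1"])` models the single-SSD case that ends up with the 8G swap file.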

So I'm going to update the trigger_common instance type choice and see how it goes.

log excerpt

+ /home/ubuntu/mozsearch/scripts/scip-analyze.sh /mnt/index-scratch/config.json mozilla-beta               
+ set -eu                                                                                                                                                                                                             
+ set -o pipefail                                                                                                                                                                                                     
+ '[' 2 -lt 2 ']'                                                                                                                                                                                                     
++ realpath /mnt/index-scratch/config.json                                                                                                                                                                            
+ CONFIG_FILE=/mnt/index-scratch/config.json                                                                                                                                                                          
+ TREE_NAME=mozilla-beta                                                                                                                                                                                              
++ jq -Mc '.trees["mozilla-beta"].scip_subtrees | to_entries? | .[]?' /mnt/index-scratch/config.json                                                                                                                  
+ SCIP_SUBTREE_INFOS=                                                                                                                                                                                                 
+ [[ -n '' ]]                                                                                                                                                                                                         
+ date                                                                                                                                                                                                                
Sun Jun 15 13:57:08 UTC 2025                                                                                                                                                                                          
+ /home/ubuntu/mozsearch/scripts/find-objdir-files.sh                                                                                                                                                                 
+ set -eu                                                                                                                                                                                                             
+ set -o pipefail                                                                                                                                                                                                     
+ mkdir -p /mnt/index-scratch/mozilla-beta/objdir                                                                                                                                                                     
+ pushd /mnt/index-scratch/mozilla-beta/objdir                                                                                                                                                                        
/mnt/index-scratch/mozilla-beta/objdir ~                                                                                                                                                                              
+ set +o pipefail                                                                                                                                                                                                     
+ find . -type f -not -regex '\.(o|out|so|a|so\..*|scip)$' -exec file --mime '{}' +                                                                                                                                   
+ grep -v charset=binary                                                                                                                                                                                              
+ cut -d : -f 1                                                                                                                                                                                                       
+ sed -e 's#^./#__GENERATED__/#'                                                                                                                                                                                      
sed: couldn't write 81 items to stdout: No space left on device                                                                                                                                                       
find: ‘file’ terminated by signal 13                                                                                                                                                                                  
find: ‘file’ terminated by signal 13                                                                                                                                                                                  
find: ‘file’ terminated by signal 13                                                                       
+ handle_tree_error mkindex.sh                                                                                                                                                                                        
+ local msg=mkindex.sh                                                                                     
+ echo 'warning: Tree '\''mozilla-beta'\'' error: mkindex.sh'                                              
warning: Tree 'mozilla-beta' error: mkindex.sh                                                                                                                                                                        
+ [[ continue == \c\o\n\t\i\n\u\e ]]                                                                                                                                                                                  
+ return 0                                                                                                 
+ '[' -n /index ']'                                                                                                                                                                                                   
+ mv /mnt/index-scratch/mozilla-beta /index/mozilla-beta                                                   
+ ln -s /index/mozilla-beta /mnt/index-scratch/mozilla-beta                                                                                                                                                           
+ for TREE_NAME in $(jq -r ".trees|keys_unsorted|.[]" ${CONFIG_FILE})                                                                                                                                                 
+ . /home/ubuntu/mozsearch/scripts/load-vars.sh /mnt/index-scratch/config.json mozilla-release             
++ '[' -z /home/ubuntu/mozsearch ']'                                                                                                                                                                                  
++ export CONFIG_FILE=/mnt/index-scratch/config.json                                                                                                                                                                  
++ CONFIG_FILE=/mnt/index-scratch/config.json                                                                                                                                                                         
++ export TREE_NAME=mozilla-release                       

config2 now got past mozilla-beta, so that's good. The 474G SSD ends up as 434GiB usable, and presumably the 300G SSD was something more like 274GiB usable (plus we reserved 8G for swap on a single SSD system). Lazily running df -h by hand a few times showed a peak usage of at least 265GiB, so we were probably really close to the line. We should have some headroom now, although esr115 should probably get made non-semantic and get kicked out to config3 soon, so then we'll have a lot more headroom.
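The "really close to the line" estimate can be reproduced with a few lines of arithmetic. This sketch assumes usable space scales proportionally with nominal SSD size (which is where the ~274GiB guess comes from) and that the 8G swap reservation comes out of the same SSD in both cases; both are assumptions for illustration, not measurements.

```python
NEW_SSD_NOMINAL_G = 474           # m6id.2xlarge SSD, nominal
NEW_SSD_USABLE_GIB = 434          # observed usable size on the new SSD
OLD_SSD_NOMINAL_G = 300           # m5d.2xlarge SSD, nominal
SWAP_RESERVED_GIB = 8             # swap reservation on a single-SSD system
PEAK_USAGE_LOWER_BOUND_GIB = 265  # highest usage seen via manual df -h


def estimate_usable_gib(nominal_g: float, ref_nominal_g: float, ref_usable_gib: float) -> float:
    """Scale a known nominal-to-usable ratio to another SSD size."""
    return nominal_g * ref_usable_gib / ref_nominal_g


old_usable = estimate_usable_gib(OLD_SSD_NOMINAL_G, NEW_SSD_NOMINAL_G, NEW_SSD_USABLE_GIB)
old_margin = old_usable - SWAP_RESERVED_GIB - PEAK_USAGE_LOWER_BOUND_GIB
new_margin = NEW_SSD_USABLE_GIB - SWAP_RESERVED_GIB - PEAK_USAGE_LOWER_BOUND_GIB
print(f"old SSD: ~{old_usable:.1f} GiB usable, ~{old_margin:.1f} GiB margin at observed peak")
print(f"new SSD: {new_margin} GiB margin at observed peak")
```

Under those assumptions the old SSD had under 2GiB to spare at the observed peak, and since 265GiB is only a lower bound on the real peak, running out of space is not surprising.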

Okay, we've had 2 clean runs of config2 now (last night's one-off and today's normal one).

Disk usage looks like the following; I captured this as the fresh web-server was starting:

ubuntu@ip-172-31-11-83:~/index$ du -csh *
1.7G    comm-esr115
3.4G    comm-esr128
8.0K    config.json
305M    index-log
du: cannot read directory 'lost+found': Permission denied
16K     lost+found
67G     mozilla-beta
51G     mozilla-esr115
57G     mozilla-esr128
67G     mozilla-esr140
67G     mozilla-release
56M     nginx-cache
313G    total
ubuntu@ip-172-31-11-83:~/index$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1    393G  313G   61G  84% /home/ubuntu/index
ubuntu@ip-172-31-11-83:~/index$ 

So we're at 313GiB used, and with our 20G nginx cache limit we should max out at ~333GiB, leaving about 60G of the 393G spare, which is not enough for another full index. (Which is fine, this is our max number of ESRs at once because of the special-case that is ESR115.)
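The headroom estimate is simple enough to write down explicitly. The numbers below are copied from the du and df output above; the variable names are just for illustration. Note that df reports 61G available rather than 393 - 313 = 80G (presumably filesystem overhead or reserved blocks), so the second figure is the more conservative one.

```python
DISK_SIZE_G = 393          # df "Size"
USED_G = 313               # df "Used" / du total
AVAIL_G = 61               # df "Avail"
NGINX_CACHE_LIMIT_G = 20   # configured nginx cache limit
LARGEST_TREE_G = 67        # mozilla-beta / mozilla-esr140 / mozilla-release

# The nginx cache was only 56M at capture time, so it can still grow by ~20G.
projected_used_g = USED_G + NGINX_CACHE_LIMIT_G
nominal_spare_g = DISK_SIZE_G - projected_used_g
conservative_spare_g = AVAIL_G - NGINX_CACHE_LIMIT_G

fits_another_full_index = nominal_spare_g >= LARGEST_TREE_G
print(f"projected use: {projected_used_g}G, spare: {nominal_spare_g}G "
      f"(or {conservative_spare_g}G going by df Avail)")
print(f"room for another {LARGEST_TREE_G}G index: {fits_another_full_index}")
```

Either way the spare space is below the 67G that a full current-generation tree takes.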

Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED

For closure on the performance of the new instance types (and so I can close some tabs), the new instances seem to improve indexing job time across the board, so there's a net cost savings with the change. m-c went from 1:35:42 to 1:21:09, c-c went from 0:28:14 to 0:21:55.

new m6id instances

├── mozilla-central                                                                                        
│   └──                                                                   
│         script                   time since start   apparent duration                             
│        ───────────────────────────────────────────────────────────────                                   
│         find-repo-files.py       0:00:00            0:02:06            
│         build.sh                 0:02:06            0:21:59            
│         scip-analyze.sh          0:24:05            0:00:44            
│         js-analyze.sh            0:24:49            0:00:37            
│         html-analyze.sh          0:25:26            0:00:27            
│         css-analyze.sh           0:25:53            0:00:02            
│         idl-analyze.sh           0:25:55            0:00:51            
│         staticprefs-analyze.sh   0:26:46            0:00:08            
│         ipdl-analyze.sh          0:26:54            0:00:04            
│         replace-aliases.sh       0:26:58            0:00:35            
│         crossref.sh              0:27:33            0:34:28            
│         output.sh                1:02:01            0:09:25            
│         build-codesearch.py      1:11:26            0:03:44            
│         compress-outputs.sh      1:15:10            0:05:59            
│         check-index.sh           1:21:09          

old m5d instances

├── mozilla-central                                                                                        
│   └──                                                                  
│         script                   time since start   apparent duration  
│        ─────────────────────────────────────────────────────────────── 
│         find-repo-files.py       0:00:00            0:02:52            
│         build.sh                 0:02:52            0:26:16            
│         scip-analyze.sh          0:29:08            0:01:24             
│         js-analyze.sh            0:30:32            0:00:52             
│         html-analyze.sh          0:31:24            0:00:47            
│         css-analyze.sh           0:32:11            0:00:03             
│         idl-analyze.sh           0:32:14            0:01:02            
│         staticprefs-analyze.sh   0:33:16            0:00:09            
│         ipdl-analyze.sh          0:33:25            0:00:05            
│         replace-aliases.sh       0:33:30            0:00:31            
│         crossref.sh              0:34:01            0:36:41                                              
│         output.sh                1:10:42            0:11:51            
│         build-codesearch.py      1:22:33            0:04:35            
│         compress-outputs.sh      1:27:08            0:08:34            
│         check-index.sh           1:35:42       
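To put a number on the net cost savings, the sketch below combines the before/after job durations quoted above with the hourly price ratio. It assumes cost is proportional to job duration and that the 2xlarge price ratio applies to whatever instance size ran these jobs (reasonable given the 4xlarge types are exactly twice the price); `parse_hms` and `relative_cost` are illustrative names.

```python
PRICE_RATIO = 0.4746 / 0.452  # m6id vs m5d hourly on-demand price


def parse_hms(text: str) -> int:
    """Convert an H:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def relative_cost(old_duration: str, new_duration: str) -> float:
    """New job cost as a fraction of the old job cost."""
    return parse_hms(new_duration) / parse_hms(old_duration) * PRICE_RATIO


m_c = relative_cost("1:35:42", "1:21:09")  # mozilla-central
c_c = relative_cost("0:28:14", "0:21:55")  # comm-central
print(f"m-c: {m_c:.1%} of the old cost")
print(f"c-c: {c_c:.1%} of the old cost")
```

Under these assumptions m-c comes out at roughly 89% of its previous cost and c-c at roughly 82%, so the speedup more than pays for the 5% higher hourly rate.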