Closed Bug 1944790 Opened 1 year ago Closed 1 year ago

run a CI memory test

Status:

RESOLVED FIXED

Milestone:

137 Branch

Tracking Flags:

firefox137: tracking ---, status fixed

People

(Reporter: tarek, Assigned: tarek)

References

Details

(Whiteboard: [genai])

Attachments

(1 file)

Bug 1944790 - run a CI memory test r?nordzilla,atossou (1 year ago, Tarek Ziadé (:tarek), 48 bytes, text/x-phabricator-request)

Tarek Ziadé (:tarek)

Assignee

Description

•

1 year ago

The memory snapshot we're currently collecting will not provide the peak RSS for the inference process.

In order to do this, we could run the performance test with --gecko-profile and collect the generated profile to extract the value.

In practice, we need to find a way to hook something that will get the JSON, extract the data, and publish it as a performance metric.

Tarek Ziadé (:tarek)

Assignee

Comment 1

•

1 year ago

•

Edited

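This is the script to extract peak usage:

    import json
    import sys
    
    # Load the profile JSON given as the last command-line argument.
    with open(sys.argv[-1]) as f:
        data = json.load(f)
    
    
    def get_peak_mem(process):
        """Print the peak of the running sum of the "malloc" counter."""
        peak = 0  # renamed from "max" to avoid shadowing the builtin
        current = 0
        pid = 0
        process_name = ""
    
        # Identify the process from its main thread.
        for thread in process["threads"]:
            if thread["name"] == "GeckoMain":
                pid = thread["pid"]
                process_name = thread.get("processName", "unknown")
                break
    
        # Tolerate processes that carry no counters at all.
        for counter in process.get("counters", []):
            if counter["name"] == "malloc":
                # sample[1] is accumulated as a delta; track the highest total.
                for sample in counter["samples"]["data"]:
                    current += sample[1]
                    peak = max(peak, current)
    
                print(
                    f"[{pid}][{process_name}] Peak memory allocation {peak / (1024 * 1024)}"
                )
    
    
    # The top-level object is handled first, then each entry under "processes".
    get_peak_mem(data)
    
    for process in data["processes"]:
        get_peak_mem(process)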

Example on a perf test:

    ➜  python3 peak.py profile.json
    [51064][Parent Process] Peak memory allocation 298.32491302490234
    [51065][unknown] Peak memory allocation 35.444618225097656
    [51070][Utility Process] Peak memory allocation 35.41194152832031
    [51067][unknown] Peak memory allocation 35.460235595703125
    [51084][Web Content] Peak memory allocation 37.041725158691406
    [51073][Inference] Peak memory allocation 479.8895950317383
    [51083][Web Content] Peak memory allocation 37.04314422607422
    [51074][Web Content] Peak memory allocation 37.16900634765625
    [51072][Web Content] Peak memory allocation 39.958473205566406
    [51066][WebExtensions] Peak memory allocation 52.40960693359375
    [51068][Web Content] Peak memory allocation 40.31774139404297
    [51071][Privileged Content] Peak memory allocation 56.09490203857422
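
For a hook that publishes the value as a metric rather than printing it, the same computation could return data instead. The following is only a sketch: the helper names (`peak_malloc_bytes`, `main_thread_identity`, `peaks_by_process`) and the synthetic `SAMPLE_PROFILE` are made up for illustration, and it assumes the same profile layout the script above relies on (a `malloc` counter whose samples carry a delta at index 1). Unlike the script, it only looks at the first `malloc` counter of each process.

```python
from itertools import accumulate

MIB = 1024 * 1024


def peak_malloc_bytes(process, value_index=1):
    """Return the peak running sum of the first "malloc" counter, or None."""
    for counter in process.get("counters", []):
        if counter.get("name") == "malloc":
            deltas = (sample[value_index] for sample in counter["samples"]["data"])
            # initial=0 mirrors the script above, where the peak starts at zero.
            return max(accumulate(deltas, initial=0))
    return None


def main_thread_identity(process):
    """Return (pid, processName) taken from the GeckoMain thread, if any."""
    for thread in process.get("threads", []):
        if thread.get("name") == "GeckoMain":
            return thread.get("pid", 0), thread.get("processName", "unknown")
    return 0, ""


def peaks_by_process(profile):
    """Map (pid, processName) to peak malloc usage in MiB."""
    peaks = {}
    for process in [profile, *profile.get("processes", [])]:
        peak = peak_malloc_bytes(process)
        if peak is not None:
            peaks[main_thread_identity(process)] = peak / MIB
    return peaks


# Hypothetical, hand-written profile fragment used only to exercise the code.
SAMPLE_PROFILE = {
    "threads": [{"name": "GeckoMain", "pid": 1, "processName": "Parent Process"}],
    "counters": [
        {
            "name": "malloc",
            "samples": {
                "data": [[0, 10 * MIB], [1, 5 * MIB], [2, -12 * MIB], [3, 4 * MIB]]
            },
        }
    ],
    "processes": [
        {
            "threads": [{"name": "GeckoMain", "pid": 2}],
            "counters": [
                {"name": "malloc", "samples": {"data": [[0, 3 * MIB], [1, -1 * MIB]]}}
            ],
        }
    ],
}

if __name__ == "__main__":
    print(peaks_by_process(SAMPLE_PROFILE))
```

Returning a dictionary keeps the extraction separate from whatever reporting format the performance harness expects.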

Greg Mierzwinski [:sparky]

Updated

•

1 year ago

Depends on: 1944913

Greg Tatum [:gregtatum]

Comment 2

•

1 year ago

This is not a great source of truth for anything running in Wasm. I ran into this issue earlier while doing some memory analysis on the translations engine. This data comes from the malloc hooked by the profiler. Unfortunately, most of the Wasm memory does not go through this malloc site and uses another syscall that is not instrumented. On top of that, Wasm will reserve the memory, but it doesn't become real memory until it is committed (at least this is my understanding).

Because of this, it's important to query the actual memory usage using some other OS-level system utility. In my manual tests on macOS, I use Activity Monitor and record the memory usage there. Erik implemented this check using: https://searchfox.org/mozilla-central/rev/c5432a86ece2ce8671e7aefbe43fed9a10151227/browser/components/translations/tests/browser/head.js#704-737
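
As one illustration of an OS-level query, Linux exposes both the current resident set size (`VmRSS`) and its high-water mark (`VmHWM`) in `/proc/<pid>/status`. The sketch below is Linux-only and is not what the linked `head.js` does; the function names are hypothetical, and the parsing assumes the usual `Key:   <number> kB` line layout of that file.

```python
import re


def parse_proc_status_kib(status_text):
    """Extract VmRSS and VmHWM (in kB) from the text of /proc/<pid>/status."""
    values = {}
    for key in ("VmRSS", "VmHWM"):
        match = re.search(rf"^{key}:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
        if match:
            values[key] = int(match.group(1))
    return values


def read_rss_kib(pid):
    """Read current and peak RSS for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status_kib(f.read())
```

On Linux, `read_rss_kib(os.getpid())` returns a dictionary with both keys for the calling process. Other platforms need a different source, which is where a cross-platform library becomes attractive.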

I'm wrapping up my day, and we should probably audit that it's reporting the correct thing, but the numbers in our perf alerts seem similar to what I've been seeing.

Also see Bug 1811927 for better hooks into Wasm memory.

Tarek Ziadé (:tarek)

Assignee

Comment 3

•

1 year ago

I've implemented something very similar here: https://searchfox.org/mozilla-central/source/toolkit/components/ml/tests/browser/head.js#410-445

Another option that would work on all platforms would be to use psutil in the background inside a perftest hook to watch the inference process and grab values every second or so.
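
A background sampler along those lines might look like the sketch below. It is not the code that landed: `PeakSampler` and `psutil_rss_reader` are hypothetical names, and how the hook finds the inference process's pid is left out. The reader is injected so the polling loop can be exercised without psutil; the psutil variant uses `psutil.Process(pid).memory_info().rss`.

```python
import threading


class PeakSampler:
    """Poll a reader callable in a background thread and keep the maximum."""

    def __init__(self, read_value, interval=1.0):
        self._read_value = read_value
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak = 0
        self.samples = 0

    def _run(self):
        while True:
            try:
                value = self._read_value()
            except Exception:
                # The reader failed, e.g. the watched process has exited.
                break
            self.peak = max(self.peak, value)
            self.samples += 1
            # wait() returns True as soon as stop() sets the event.
            if self._stop.wait(self._interval):
                break

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        """Stop sampling and return the highest value seen."""
        self._stop.set()
        self._thread.join()
        return self.peak


def psutil_rss_reader(pid):
    """Build a reader returning the RSS in bytes of `pid` (needs psutil)."""
    import psutil  # third-party; imported lazily so the sampler works without it

    proc = psutil.Process(pid)
    return lambda: proc.memory_info().rss
```

A hook could call `PeakSampler(psutil_rss_reader(pid)).start()` before the test and `stop()` afterwards. Because this is polling, a spike shorter than the interval can be missed, so the reported peak is a lower bound.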

Tarek Ziadé (:tarek)

Assignee

Comment 4

•

1 year ago

I will try a different approach based on hooks, by running a psutil loop in https://searchfox.org/mozilla-central/source/toolkit/components/ml/tests/tools/hooks_local_hub.py

Erik Nordin [:nordzilla]

Comment 5

•

1 year ago

(In reply to Tarek Ziadé (:tarek) from comment #3)

> I've implemented something very similar here: https://searchfox.org/mozilla-central/source/toolkit/components/ml/tests/browser/head.js#410-445

I based mine off of yours, so that makes sense.

Please keep me up to date on your findings here and if we should measure memory in more than one way in Translations as well.

I'll follow along on this bug.

Tarek Ziadé (:tarek)

Assignee

Updated

•

1 year ago

Whiteboard: [genai]

Jira Integration Bot

Updated

•

1 year ago

See Also: → https://mozilla-hub.atlassian.net/browse/GENAI-672

Tarek Ziadé (:tarek)

Assignee

Updated

•

1 year ago

Assignee: nobody → tziade

Tarek Ziadé (:tarek)

Assignee

Comment 6

•

1 year ago

Attached file Bug 1944790 - run a CI memory test r?nordzilla,atossou — Details

Phabricator Automation

Updated

•

1 year ago

Attachment #9464648 - Attachment description: WIP: Bug 1944790 - run a CI memory test → Bug 1944790 - run a CI memory test

Phabricator Automation

Updated

•

1 year ago

Attachment #9464648 - Attachment description: Bug 1944790 - run a CI memory test → Bug 1944790 - run a CI memory test r?nordzilla,atossou

Pulsebot

Comment 7

•

1 year ago

Pushed by tziade@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e9c59acf56eb run a CI memory test r=afinder,atossou,nordzilla,perftest-reviewers

Cristian Tuns

Comment 8

•

1 year ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e9c59acf56eb

Status: NEW → RESOLVED

Closed: 1 year ago

status-firefox137: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 137 Branch

Erik Nordin [:nordzilla]

Updated

•

1 year ago

Blocks: 1947840

Treeherder Bug Filer

Updated

•

1 year ago

Regressions: 1949171

Jens Stutte [:jstutte]

Updated

•

1 year ago


Categories

(Core :: Machine Learning: General, enhancement)

