Closed Bug 1353829 Opened 8 years ago Closed 7 years ago

8% Are we slim yet regression found on autoland March 28th from revision e2a697abd5d3

Categories

(Core :: Networking, defect, P3)

Tracking

()

RESOLVED INVALID

People

(Reporter: jmaher, Assigned: schien)

References

Details

(Whiteboard: [necko-active][PBg-HTTP-M4])

== Change summary for alert #5706 (as of March 28 2017 00:10 UTC) ==

Regressions:

8% Images summary linux32 opt 5715568.43 -> 6173288.11
6% Images summary linux64 opt 6609478.93 -> 7030321.47

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=5706

We are not familiar with triaging AWSY alerts. This seems like a large enough regression to file a bug for. These tests were recently added to our CI and already seem to be finding value. Currently they only run on linux, so we are not sure whether this regression also affects other configurations.

If there are questions about the tool, I would start with :erahm. Possibly there are simple docs explaining what the different values mean, and we can just create a nice bug template to save trouble in the future.
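For readers checking the alert numbers, the percentages in the summary can be reproduced from the before/after values. This is a minimal sketch, not part of Perfherder or AWSY; the function name `percent_change` is made up for illustration, and the only data used are the four values quoted in the alert.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Before/after values copied from the alert #5706 summary in this bug.
alerts = {
    "Images summary linux32 opt": (5715568.43, 6173288.11),
    "Images summary linux64 opt": (6609478.93, 7030321.47),
}

# Hypothetical helper structure: label -> relative change in percent.
changes = {name: percent_change(b, a) for name, (b, a) in alerts.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

Rounded to whole percentages, these values match the 8% and 6% reported in the alert.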
:schien, can you look into why your change would cause a memory regression like this?
Flags: needinfo?(schien)
The regression is limited to the "tabs open" snapshots, we see an 8% regression after forcing a GC but ~24% regression prior to that.

Details from the memory report diff after forcing a GC:

> ├──1.15 MB (21.87%) -- images
> │ ├──1.15 MB (21.83%) -- content/raster/used
> │ │ ├──0.76 MB (14.53%) -- (28 tiny)
> │ │ │ ├──0.05 MB (00.97%) ── image(104x129, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110328/feng4_%E5%89%AF%E6%9C%AC.jpg)/locked/surface(100x125)/decoded-heap
> │ │ │ ├──0.04 MB (00.82%) ++ image(120x80, http://localhost:8086/page_load_test/tp5n/xunlei.com/T1claaXXXgXXXXXXXX-120-80.gif)
> │ │ │ ├──0.04 MB (00.67%) ++ image(90x90, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110418/%7B5C13500A-0CD5-43AD-AB26-F1C6E28802FB%7D.jpg)
> │ │ │ ├──0.03 MB (00.60%) ++ image(186x32, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/svc_video.jpg)
> │ │ │ ├──0.03 MB (00.60%) ++ image(96x70, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110329/windows_banner.jpg)
> │ │ │ ├──0.03 MB (00.60%) ++ image(96x70, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110330/1.gif)
> │ │ │ ├──0.03 MB (00.60%) ++ image(86x86, http://localhost:8093/page_load_test/tp5n/seesaa.net/recommend.up.seesaa.net/image/bigmama_01.jpg)
> │ │ │ ├──0.03 MB (00.52%) ++ image(100x50, http://localhost:8100/page_load_test/tp5n/people.com.cn/people.com.cn/mediafile/201102/28/F201102281542354475261711.jpg)
> │ │ │ ├──0.03 MB (00.52%) ++ image(100x50, http://localhost:8100/page_load_test/tp5n/people.com.cn/people.com.cn/mediafile/201102/28/F201102281544122316328144.jpg)
> │ │ │ ├──0.03 MB (00.52%) ++ image(99x48, http://localhost:8100/page_load_test/tp5n/people.com.cn/people.com.cn/mediafile/201102/28/F201102281540444332171532.jpg)
> │ │ │ ├──0.03 MB (00.52%) ++ image(86x86, http://localhost:8093/page_load_test/tp5n/seesaa.net/recommend.up.seesaa.net/image/baseballbear_2010_86.png)
> │ │ │ ├──0.03 MB (00.52%) ++ image(86x86, http://localhost:8093/page_load_test/tp5n/seesaa.net/recommend.up.seesaa.net/image/bigmama_02.jpg)
> │ │ │ ├──0.03 MB (00.52%) ++ image(86x86, http://localhost:8093/page_load_test/tp5n/seesaa.net/recommend.up.seesaa.net/image/kazuma.jpg)
> │ │ │ ├──0.03 MB (00.52%) ── image(100x74, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110324/IBM%E4%B8%93%E9%A2%98%E5%9B%BE%E7%89%87.jpg)/locked/surface(96x70)/decoded-heap
> │ │ │ ├──0.03 MB (00.52%) ── image(102x76, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110408/ITeye%E5%B0%8F.jpg)/locked/surface(96x70)/decoded-heap
> │ │ │ ├──0.03 MB (00.52%) ── image(96x70, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110324/%E4%BE%AF%E6%8D%B7%E7%9A%84%E4%B8%93%E9%A2%98-96X70.jpg)/locked/surface(96x70)/decoded-heap
> │ │ │ ├──0.03 MB (00.52%) ── image(96x70, http://localhost:8085/page_load_test/tp5n/csdn.net/images.csdn.net/20110401/MongoDB%E5%89%AF%E6%9C%AC02.jpg)/locked/surface(96x70)/decoded-heap
> │ │ │ ├──0.02 MB (00.45%) ++ image(100x50, http://localhost:8100/page_load_test/tp5n/people.com.cn/people.com.cn/mediafile/201102/28/F201102281539387049051699.jpg)
> │ │ │ ├──0.02 MB (00.45%) ++ image(100x50, http://localhost:8100/page_load_test/tp5n/people.com.cn/people.com.cn/mediafile/201102/28/F201102281545161316290532.jpg)
> │ │ │ ├──0.02 MB (00.45%) ++ image(100x50, http://localhost:8100/page_load_test/tp5n/people.com.cn/people.com.cn/mediafile/201102/28/F201102281546127390174172.jpg)
> │ │ │ ├──0.02 MB (00.45%) ++ image(109x32, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/svc_download.jpg)
> │ │ │ ├──0.02 MB (00.45%) ++ image(100x50, http://localhost:8100/page_load_test/tp5n/people.com.cn/people.com.cn/mediafile/201102/28/F201102281531183112148831.gif)
> │ │ │ ├──0.02 MB (00.41%) ++ image(48x48, http://localhost:8091/page_load_test/tp5n/slideshare.net/public.slidesharecdn.com/images/user-48x48.png)
> │ │ │ ├──0.02 MB (00.38%) ++ image(69x32, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/svc_ds.jpg)
> │ │ │ ├──0.02 MB (00.38%) ++ image(92x32, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/svc_mall.jpg)
> │ │ │ ├──0.02 MB (00.37%) ── image(85x85, http://localhost:8093/page_load_test/tp5n/seesaa.net/recommend.up.seesaa.net/recommend/tabito_85.png)/locked/surface(70x70)/decoded-heap
> │ │ │ ├──0.02 MB (00.34%) ++ image(56x72, http://localhost:8085/page_load_test/tp5n/csdn.net/zi.csdn.net/56_72.jpg)
> │ │ │ └──0.02 MB (00.30%) ++ image(95x32, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/svc_mondju.jpg)
> │ │ ├──-0.10 MB (-1.83%) ++ <non-notable images>
> │ │ ├──0.11 MB (02.16%) ++ image(83x325, http://localhost:8074/page_load_test/tp5n/yelp.com/media1.px.yelpcdn.com/static/201012162843250757/i/ico/stars/stars_map.png)
> │ │ ├──0.10 MB (01.98%) ++ image(950x27, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/ft_help_u.jpg)
> │ │ ├──0.08 MB (01.49%) ++ image(599x28, http://localhost:8087/page_load_test/tp5n/hatena.ne.jp/www.hatena.ne.jp/images/smaliforjapan-portal-title.gif)
> │ │ ├──0.06 MB (01.19%) ++ image(300x42, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/bnr_move_blog.jpg)
> │ │ ├──0.06 MB (01.19%) ── image(300x52, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/btn_regist_new.jpg)/locked/surface(300x52)/decoded-heap
> │ │ └──0.06 MB (01.12%) ── image(286x53, http://localhost:8093/page_load_test/tp5n/seesaa.net/blog.seesaa.jp/img/portal/bnr_present.jpg)/locked/surface(286x53)/decoded-heap
> │ └──0.00 MB (00.04%) ── uncached/raster/used/<non-notable images>/source
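As a sanity check on the decoded-heap entries in the diff, the reported sizes are consistent with a simple width x height x bytes-per-pixel estimate. The sketch below assumes 4 bytes per pixel (a 32-bit surface format) and that the report's "MB" means 1024 * 1024 bytes; neither assumption is stated in this bug, and the function name `decoded_surface_mb` is made up for illustration.

```python
BYTES_PER_PIXEL = 4      # assumption: 32-bit decoded surfaces
MIB = 1024 * 1024        # assumption: the report's "MB" is 2**20 bytes


def decoded_surface_mb(width: int, height: int,
                       bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Estimated size of one decoded raster surface, in MB."""
    return width * height * bytes_per_pixel / MIB


# Surface dimensions taken from the surface(WxH) entries in the diff,
# paired with the size the memory report lists for each.
reported = {
    (100, 125): 0.05,
    (96, 70): 0.03,
    (70, 70): 0.02,
    (300, 52): 0.06,
    (286, 53): 0.06,
}

for (w, h), listed in reported.items():
    estimate = decoded_surface_mb(w, h)
    print(f"surface({w}x{h}): estimated {estimate:.2f} MB, listed {listed:.2f} MB")
```

Under these assumptions each listed value is just the surface's pixel buffer, which fits the reading that the diff reflects which decoded surfaces happen to be alive at the snapshot.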
Locks and atomic variables were introduced in HttpChannelChild/ChannelEventQueue in bug 1320744, so I would expect some extra memory usage. However, 24% before GC and 8% after seems too much. My first guess would be that enabling nsIThreadRetargetableRequest changes the reference graph.

@erahm, is there any way to run this test against a local version of gecko, either on perfherder or on a local machine?
Flags: needinfo?(schien) → needinfo?(erahm)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #3)
> Locks and atomic variables are introduced in
> HttpChannelChild/ChannelEventQueue in bug 1320744, I would expect some extra
> memory usage. However 24% before GC and 8% after seems too much. My first
> guess would be that enabling nsIThreadRetargetableRequest changes the
> reference graph.

I would guess that we're leaking something or not cleaning up as quickly as we used to.

> @erahm, is there any way to run this test against a local version of gecko?
> either on perfherder or local machine.

You can run |./mach awsy-test| locally (it takes a while) or on try:

> ./mach try -b o -p linux,linux64,win32,win64 -u awsy-e10s

Try is probably the easiest because you'll be able to compare results in perfherder and you can do a few retriggers.
Flags: needinfo?(erahm)
Sorry if it wasn't clear: this regression is in the 'images' category, so it's probably not overhead from atomics.
With thread retargeting disabled in image decoding, image memory usage is back to normal.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=1986a6f181d7&newProject=try&newRevision=e44908b91057d67cb9b37737aade0b1969310fd4&framework=4&showOnlyImportant=0

A deeper dive is needed to figure out the root cause.
Assignee: nobody → schien
@erahm, is there any way to trigger awsy-test on try with e10s disabled?

In addition, what are the test steps for "Images After tabs open opt"? Does it open a bunch of tabs or consecutively load URLs in a single tab? I would like to figure out a minimal STR in order to run refcount analysis.
Flags: needinfo?(erahm)
The work for AWSY was to get it running in e10s mode. I am not sure what would show up if you changed |e10s: false| here:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#1303

If you did that it would probably work, but I am not sure about the results, accuracy, etc.
Whiteboard: [necko-active]
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #7)
> @erahm, is there any way to trigger awsy-test on try with e10s disabled?
> In addition, what's the test step for "Images After tabs open opt"? Does it
> open a bunch of tabs or consecutively load urls in single tab? I would
> like to figure out a minimal STR in order to run refcount analysis.

Locally you can run:

> |./mach awsy-test --disable-e10s|
Flags: needinfo?(erahm)
@pyang told me that we can reduce the test overhead by reducing the number of entities in testing/awsy/conf/testvars.json. I plan to do that and capture the corresponding MOZ_LOG and refcount logs. I hope I can get a log of reasonable size for analysis.
Not sure this would be tightly related, but good to be aware of at least: https://bugzilla.mozilla.org/show_bug.cgi?id=1341673#c4
From the non-e10s awsy result, it looks like the 8% regression after GC also exists in the non-e10s environment. It is quite possible that this issue has been in our code base for a while.

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=fd17c11f9fa44e224d3a922012e595907b6da0cb&newProject=try&newRevision=390b01e66d297c333e78b402dd9871492b93ab16&originalSignature=a9184d267771f5ff6a30d4cdab867b0b33e6eba3&newSignature=a9184d267771f5ff6a30d4cdab867b0b33e6eba3&framework=4

However, the 24% regression before GC only shows up in the e10s environment, so it could be a separate issue.
(In reply to Honza Bambas (:mayhemer) from comment #11)
> Not sure this would be tightly related, but good to be aware of at least:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1341673#c4

Sadly, the awsy result shows no difference after the corresponding patch landed.
Whiteboard: [necko-active] → [necko-active][PBg-HTTP-M3]
Whiteboard: [necko-active][PBg-HTTP-M3] → [necko-active][PBg-HTTP-M4]
Priority: -- → P1
Per discussion with SC, this might not be a valid problem to solve; changing to P3 for now.
Priority: P1 → P3
This bug is very likely a false alarm. The difference in image memory arises because a different set of decoded image buffers is kept after the tabs are opened. With off-main-thread (OMT) image loading enabled, larger image files are more likely to finish decoding later. We are therefore more likely to sample larger image buffer usage at the "tabs open" stage, since large image files are still being decoded at that point. Based on my analysis, there is no memory leak to fix, so I'll mark this as invalid.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID