Closed Bug 803616 Opened 8 years ago Closed 7 years ago

Eideticker profile symbolication is incorrect

Categories

(Testing Graveyard :: Eideticker, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: BenWa, Assigned: jchen)

References

Details

Attachments

(2 files, 1 obsolete file)

I suspect that somewhere we're picking up old .so files instead of the correct one for the builds.

Example:
http://people.mozilla.com/~bgirard/cleopatra/?zippedProfile=profiles/sps-profile-1350642357.61.zip&videoCapture=videos/video-1350642591.62.webm

The symbols are wrong for gradient painting.
(In reply to Benoit Girard (:BenWa) from comment #0)
> I suspect that somewhere we're picking up old .so files instead of the
> correct one for the builds.
> 
> Example:
> http://people.mozilla.com/~bgirard/cleopatra/?zippedProfile=profiles/sps-
> profile-1350642357.61.zip&videoCapture=videos/video-1350642591.62.webm
> 
> The symbols are wrong for gradient painting.

I think I see what's going on here, there was a change in the mozdevice API which we didn't adapt to. Fortunately easy to fix.
Assignee: nobody → wlachance
Nice. Thanks William, supporting this feature makes eideticker very useful for understanding regression :)!
Ok, think I've fixed it. 

https://github.com/mozilla/eideticker/commit/3fd09f3f8118d0d7006a18381c3e397e3d5e2705

Here's a profile with the new symbols: http://people.mozilla.org/~wlachance/profile-org.mozilla.fennec-1350670283.zip

Please verify and reopen if you're still seeing problems.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
It looks fine. It's a bit hard to tell if the symbols are right or just every so slightly out of date. I didn't see any symbol that was obviously wrong.
The latest symbols are still wrong. The files are likely not extracting and/or getting mixed up somehow.

I wanted to compare why there's such a large different between armv6/v7.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Benoit Girard (:BenWa) from comment #5)
> The latest symbols are still wrong. The files are likely not extracting
> and/or getting mixed up somehow.
> 
> I wanted to compare why there's such a large different between armv6/v7.

I just noticed an issue where the apk would never be properly downloaded from the device into the profile package. If that's the problem, I have a tentative patch applied on the eideticker machine which should be used in tonight's run.

If that's not the issue, we'll probably have to do some deeper debugging.
We still have problems with armv6 difference. Some libraries are under the directory lib/armeabi instead of lib/armeabi-v7a. I don't remember if the script has some path hardcoded or not. If so the following libraries will be wrong in armv6 profiles:
lib/armeabi/libmozglue.so  
lib/armeabi/libplugin-container.so  
lib/armeabi/libomxplugin.so
(In reply to Benoit Girard (:BenWa) from comment #7)
> We still have problems with armv6 difference. Some libraries are under the
> directory lib/armeabi instead of lib/armeabi-v7a. I don't remember if the
> script has some path hardcoded or not. If so the following libraries will be
> wrong in armv6 profiles:
> lib/armeabi/libmozglue.so  
> lib/armeabi/libplugin-container.so  
> lib/armeabi/libomxplugin.so

Hmm, I don't see any reference to those files in eideticker. I guess the fix would probably have to occur in the GeckoProfilerAddon?

e.g. somewhere around:

 https://github.com/bgirard/Gecko-Profiler-Addon/blob/master/lib/main.js#L619
Fix for the earlier symbolication error fixed: https://github.com/mozilla/eideticker/commit/0d6add8bc971ab61f830357a91a409972d452e71

I'll leave this open until we fix the armv6 issues as well.
So do I just need to copy in the files into the profile .zip to fix your issue?

lib/armeabi/libmozglue.so  
lib/armeabi/libplugin-container.so  
lib/armeabi/libomxplugin.so
Flags: needinfo?(bgirard)
That's correct.
Flags: needinfo?(bgirard)
Sorry for the delay in fixing this. I believe the problem is in the GeckoProfilerAddon. Attached is a patch.

I've set it up on the main eideticker runs for the GN, so we'll see if they kick in properly. I did one run of taskjs, and I found it at least didn't break anything. :) I'm not quite sure what the profiler should look like though, so it's difficult for me to verify.

So I need two things:

(1) A review of the patch
(2) Verification that eideticker is now producing the expected output for armv6
Attachment #680290 - Flags: review?(bgirard)
If we accumulate a /tmp/lib/armeabi-v7a and /tmp/lib/armeabi folder then we will always overwrite the armv6 libraries over the armv7 building when profiling an armv7 build. We may also want to rm -rf /tmp/lib before unzip symbol.apk.
Attachment #680290 - Flags: review?(bgirard) → review-
(In reply to Benoit Girard (:BenWa) from comment #13)
> If we accumulate a /tmp/lib/armeabi-v7a and /tmp/lib/armeabi folder then we
> will always overwrite the armv6 libraries over the armv7 building when
> profiling an armv7 build. We may also want to rm -rf /tmp/lib before unzip
> symbol.apk.

Hmm, true. I think an even better strategy might be to blow away the temporary directory where we're copying the symbols each time (put it in a subdirectory called /tmp/gecko_profiler/ or something). That way we're guaranteed not to be accumulating stale data. What do you think?
(In reply to William Lachance (:wlach) from comment #14)
> (In reply to Benoit Girard (:BenWa) from comment #13)
> > If we accumulate a /tmp/lib/armeabi-v7a and /tmp/lib/armeabi folder then we
> > will always overwrite the armv6 libraries over the armv7 building when
> > profiling an armv7 build. We may also want to rm -rf /tmp/lib before unzip
> > symbol.apk.
> 
> Hmm, true. I think an even better strategy might be to blow away the
> temporary directory where we're copying the symbols each time (put it in a
> subdirectory called /tmp/gecko_profiler/ or something). That way we're
> guaranteed not to be accumulating stale data. What do you think?

(BTW, I've reverted my patch on the dashboard for now)
Yes that would be ideal. We really want to use something like python tempdir feature but it's likely a pain to do via js so that will do.
Apparently there have been yet more updates to the procedure. These days symbolication happens through a gecko addon, we should copy what it does:

https://github.com/bgirard/Gecko-Profiler-Addon/blob/master/lib/remoteHost.js
Assigning this to Jim for now with some pointers.

Getting eideticker running:

Installation, etc. is mostly documented here: https://github.com/mozilla/eideticker#project-eideticker

For just testing the profiler, you can probably do an eideticker run like this (after setting up a rooted device with the agent, as described in the docs):

./bin/runtest.py --app-name org.mozilla.fennec --profile-file myprofile.zip taskjs

Code pointers:

Here's where we send the signal to save the profile:

https://github.com/mozilla/eideticker/blob/master/src/eideticker/eideticker/test.py#L328

Here's where we process the profile:

https://github.com/mozilla/eideticker/blob/master/src/eideticker/eideticker/test.py#L328
https://github.com/mozilla/eideticker/blob/master/src/eideticker/eideticker/test.py#L378

And here are our implementations of both:

https://github.com/mozilla/eideticker/blob/master/src/eideticker/eideticker/runner.py#L182

Most of the actual logic is in the GeckoProfilerAddon, which we were discussing patching above:

https://github.com/bgirard/Gecko-Profiler-Addon
Assignee: wlachance → nchen
Requires changes in the Gecko profiler add-on as well (https://github.com/bgirard/Gecko-Profiler-Addon/pull/52)

This patch skips the symbolicate.sh shell script and moves that code to inside eideticker. Instead of letting the script manage xpcshell, the new code lets you specify an xpcshell location through an environment variable. The new code also manages all temporary files inside a temporary directory; instead of zipping everything and letting the symbolicating code unzip everything later, the new code simply passes the temporary directory to the symbolicating code.

External dependencies:
xpcshell - XRE env var specifies the location of XUL Runtime Environment containing xpcshell inside the bin/ directory (default to PATH)
szip - SZIP env var specifies the location of szip binary (http://people.mozilla.com/~bgirard/szip) (default to PATH)
addr2line - I built an improved addr2line tool that includes a patch (http://sourceware.org/ml/binutils/2013-03/msg00193.html) not inside the NDK version (https://people.mozilla.com/~nchen/arm-eabi-addr2line.tar.bz2)
Attachment #775734 - Flags: review?(wlachance)
Comment on attachment 775734 [details] [diff] [review]
Fix profiler symbolification (v1)

Review of attachment 775734 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good, would like to see a few minor things addressed before this lands.

::: src/eideticker/eideticker/runner.py
@@ +10,5 @@
>  import tempfile
>  import time
>  import zipfile
>  import log
> +from cStringIO import StringIO

It doesn't look like you're actually using stringio anywhere here? If I'm not mistaken, please remove this line.

@@ +48,5 @@
>          else:
>              self.activity = activity_mappings[self.appname]
>  
> +    # return a zip file if target_zip is set; otherwise return a temporary directory
> +    def get_profile_and_symbols(self, target_zip=None):

It doesn't look like the target_zip parameter is ever used anymore. Let's just remove that option to reduce confusion.

@@ +242,5 @@
> +            if symbolicated_profile:
> +                with zipfile.ZipFile(profile_file, 'w') as out:
> +                    out.writestr('symbolicated_profile.txt', symbolicated_profile)
> +            else:
> +                self.log('WARNING: Failed to get symbolication output')

I think we should throw an exception here, rather than just emitting a warning message and continuing on.

Just a "raise Exception("Failed to get symbolication output")" is probably sufficient.

@@ +245,5 @@
> +            else:
> +                self.log('WARNING: Failed to get symbolication output')
> +        else:
> +            self.log('WARNING: Failed to symbolicate (returned %d)' %
> +                     symbolicator.returncode)

Likewise, I think we should throw an exception here.
Attachment #775734 - Flags: review?(wlachance) → review-
(In reply to William Lachance (:wlach) from comment #20)
> Comment on attachment 775734 [details] [diff] [review]
> Fix profiler symbolification (v1)
> 
> It doesn't look like you're actually using stringio anywhere here? If I'm
> not mistaken, please remove this line.

Good catch!

> It doesn't look like the target_zip parameter is ever used anymore. Let's
> just remove that option to reduce confusion.

Done

> 
> Just a "raise Exception("Failed to get symbolication output")" is probably
> sufficient.
> 
> Likewise, I think we should throw an exception here.

Done
Attachment #775734 - Attachment is obsolete: true
Attachment #775868 - Flags: review?(wlachance)
Comment on attachment 775868 [details] [diff] [review]
Fix profiler symbolification (v2)

Review of attachment 775868 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, thanks!
Attachment #775868 - Flags: review?(wlachance) → review+
Just pushed Jim's patch: https://github.com/mozilla/eideticker/commit/d59a3e622e759eff315e0c8bc1436cdf545193d6

Is there anything else that needs to be done to enable this symbolication in production?
(In reply to William Lachance (:wlach) from comment #23)
> Just pushed Jim's patch:
> https://github.com/mozilla/eideticker/commit/
> d59a3e622e759eff315e0c8bc1436cdf545193d6
> 
> Is there anything else that needs to be done to enable this symbolication in
> production?

Yeah, there are several external dependencies.

1) XRE

You now need to specify the XRE directory in $XRE. XRE is what the shell script used to download, but if the script has run before, it should still be under eideticker/src/GeckoProfilerAddon/firefox. So just set $XRE to that path.

You can also build an XRE directory by extracting [1] and [2] into the same directory
 [1] ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-24.0a2.en-US.linux-x86_64.tar.bz2
 [2] ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-24.0a2.en-US.linux-x86_64.tests.zip

2) szip

You now need the szip binary under either $PATH, or be pointed to by $SZIP. You can download the binary at http://people.mozilla.com/~bgirard/szip/linux/szip

3) addr2line

I built an improved addr2line tool that includes the patch in [3]. You just need to replace the old copy with the new one at https://people.mozilla.com/~nchen/arm-eabi-addr2line.tar.bz2  It also needs to be under $PATH to be run.

 [3] http://sourceware.org/ml/binutils/2013-03/msg00193.html

Things should work with these in place. Thanks!
Sorry, the directions above for building the XRE directory was wrong:

You can build an XRE directory by extracting [2] first into a directory, and then extracting [1] into /bin/ under that directory.

 [1] ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-24.0a2.en-US.linux-x86_64.tar.bz2
 [2] ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-24.0a2.en-US.linux-x86_64.tests.zip
AFAIK this bug is now fixed and the profiles being generated are valid. Yes?
Flags: needinfo?(nchen)
Yes. Are you still running into Bug 903136 (ANR in broadcasting HealthReportUploadStartReceiver intent)?
Flags: needinfo?(nchen)
Status: REOPENED → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.