Closed Bug 1470127 Opened 6 years ago Closed 6 years ago

Move binary checks to a standalone script

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox-esr60 fixed, firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr60 --- fixed
firefox63 --- fixed

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(3 files)

      No description provided.
Attachment #8986747 - Flags: review?(core-build-config-reviews)
Apparently, dumpbin likes to return non-zero exit status.
(In reply to Mike Hommey [:glandium] from comment #3)
> Apparently, dumpbin likes to return non-zero exit status.

on 32-bits builds only. Incidentally, that means the current test silently doesn't happen too, which justifies even more this change.
Comment on attachment 8986747 [details]
Bug 1470127 - Move binary checks to a standalone script.

https://reviewboard.mozilla.org/r/252048/#review258492

A couple comments before I noticed that you had canceled your own review.  This is so much nicer than the Makefile goo.

::: python/mozbuild/mozbuild/action/check_binary.py:82
(Diff revision 2)
> +                    name = name[:-1]
> +            else:
> +                ver = None
> +            yield {
> +                'addr': int(addr, 16),
> +                'size': int(size, 0),

I assume the `0` here means "autodetect"?  Might be worth a comment about how readelf sometimes prints out in decimal and sometimes in hex, so we'll let Python take care of the parsing.

::: python/mozbuild/mozbuild/action/check_binary.py:143
(Diff revision 2)
> +@memoize
> +def guess_nsmodule_size():
> +    if buildconfig.substs.get('MOZ_ASAN'):
> +        return 64
> +    if buildconfig.substs.get('HAVE_64BIT_BUILD'):
> +        return 8
> +    return 4

We could just make this a constant, rather than `@memoize`'ing the function?

```
if buildconfig.substs.get('MOZ_ASAN'):
    guessed_nsmodule_size = 64
elif ...
```
So the reason dumpbin fails on 32-bits windows is that the MSVC tooltool package doesn't contain a mspdbcore.dll in Hostx64/x86.
(In reply to Mike Hommey [:glandium] from comment #6)
> So the reason dumpbin fails on 32-bits windows is that the MSVC tooltool
> package doesn't contain a mspdbcore.dll in Hostx64/x86.

This is only a problem because the subprocess doesn't inherit any of the environment variables from the parent process, such as PATH, which on Windows, is used for DLL lookups.
Blocks: 1471132
Attachment #8986747 - Flags: review?(core-build-config-reviews)
Comment on attachment 8986747 [details]
Bug 1470127 - Move binary checks to a standalone script.

https://reviewboard.mozilla.org/r/252048/#review259630

All really minor things.

::: commit-message-f885a:4
(Diff revision 6)
> +Bug 1470127 - Move binary checks to a standalone script. r?build
> +
> +We perform, on the binaries we build, a series of check, that are
> +implemented as half-backed make commands, invoked after linking them.

Nit: did you mean "half-baked" here?

::: python/mozbuild/mozbuild/action/check_binary.py:46
(Diff revision 6)
> +}
> +
> +if buildconfig.substs.get('MOZ_ASAN'):
> +    GUESSED_NSMODULE_SIZE = 64
> +elif buildconfig.substs.get('HAVE_64BIT_BUILD'):
> +    GUESSED_NSMODULE_SIZE =  8

Nit: two spaces after the `=`.

::: python/mozbuild/mozbuild/action/check_binary.py:244
(Diff revision 6)
> +                if '[libmozglue.so]' in value:
> +                    mozglue = n
> +                elif '[libc.so]' in value:
> +                    libc = n

Is it better here to drop the `[]`, so we depend less on the formatting of `readelf -d`?

::: python/mozbuild/mozbuild/action/check_binary.py:301
(Diff revision 6)
> +    if options.host and options.target:
> +        print('Only one of --host or --target can be given',
> +              file=sys.stderr)
> +        return 1

Do we want to check for `not options.host and not options.target`?
Attachment #8986747 - Flags: review+
(In reply to Nathan Froyd [:froydnj] from comment #12)
> Comment on attachment 8986747 [details]
> Bug 1470127 - Move binary checks to a standalone script.
> 
> https://reviewboard.mozilla.org/r/252048/#review259630
> 
> All really minor things.
> 
> ::: commit-message-f885a:4
> (Diff revision 6)
> > +Bug 1470127 - Move binary checks to a standalone script. r?build
> > +
> > +We perform, on the binaries we build, a series of check, that are
> > +implemented as half-backed make commands, invoked after linking them.
> 
> Nit: did you mean "half-baked" here?

Indeed.

> ::: python/mozbuild/mozbuild/action/check_binary.py:244
> (Diff revision 6)
> > +                if '[libmozglue.so]' in value:
> > +                    mozglue = n
> > +                elif '[libc.so]' in value:
> > +                    libc = n
> 
> Is it better here to drop the `[]`, so we depend less on the formatting of
> `readelf -d`?

This is Android-only, and this is the same test as what we already had in the .mk. OTOH, the value of value is something like "Shared library: [library.so]", where "Shared library" depends on the locale (although, not with the ndk readelf, which doesn't have the locale files). I'm not too thrilled trying to check this more generically. I'll punt on this one.
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/9f4d1a4296ba
Move binary checks to a standalone script. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/9f4d1a4296ba
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
I get
  TEST-UNEXPECTED-FAIL | check_nsmodules | xul.dll | NSModules are not adjacent
on a local build now, so what do I do?
Flags: needinfo?(nfroyd)
Flags: needinfo?(mh+mozilla)
(In reply to Jorg K (GMT+2) from comment #17)
> I get
>   TEST-UNEXPECTED-FAIL | check_nsmodules | xul.dll | NSModules are not
> adjacent
> on a local build now, so what do I do?

If this is a Firefox build, you should file a follow-up bug.

If this is a comm-central build, you get to investigate what comm-central is doing to trigger this; you may need to port changes from bug 1471132 (?).
Flags: needinfo?(nfroyd)
Flags: needinfo?(mh+mozilla)
Attached file build-error.txt
It's a TB build. How could be port any of bug 1471132 since this is all in xpcom, toolkit and python?

For now, I've commented out the line locally. Does the test matter much? I don't see the issue on server builds.
(In reply to Jorg K (GMT+2) from comment #20)
> For now, I've commented out the line locally. Does the test matter much? I
> don't see the issue on server builds.

If that test is failing, it probably means your binary is going to crash at startup, or other bad things are going to happen.
Yes, the binary didn't start.

I did a clobber build and got the same again. So I just can't work :-(

(In reply to Nathan Froyd [:froydnj] from comment #18)
> If this is a comm-central build, you get to investigate what comm-central is
> doing to trigger this; you may need to port changes from bug 1471132 (?).
I think you're over-estimating my skills or the skill set of the TB team in general. We use Mozilla's software as a toolbox and don't have any insight into some of its parts. You're telling the taxi driver to debug his board computer's firmware :-( - There was a similar comment in bug 1399746 comment #15 (and above) and it took the peers of that module five months to fix it, only until it affected Firefox.
Depends on: 1471665
(In reply to Jorg K (GMT+2) from comment #22)
> I did a clobber build and got the same again. So I just can't work :-(

That's unfortunate.  Maybe something in your mozconfig or the particular compiler you're using?  The output:

 2:51.21 80cc478 8 __start_kPStaticModules
 2:51.21 80cc588 8 nsLDAPProtocolModule_NSModule
 2:51.21 80cc598 8 msgMapiModule_NSModule
 2:51.21 80cc5a8 8 nsMailModule_NSModule

indicates that your *_NSModule symbols are 16 bytes apart, rather than the requisite 8 bytes (assuming you're compiling for 64-bit Windows), which is problematic.  That suggests some weirdness in your mozconfig or your compiler.

(In reply to Nathan Froyd [:froydnj] from comment #18)
> > If this is a comm-central build, you get to investigate what comm-central is
> > doing to trigger this; you may need to port changes from bug 1471132 (?).
> I think you're over-estimating my skills or the skill set of the TB team in
> general. We use Mozilla's software as a toolbox and don't have any insight
> into some of its parts. You're telling the taxi driver to debug his board
> computer's firmware :-( - There was a similar comment in bug 1399746 comment
> #15 (and above) and it took the peers of that module five months to fix it,
> only until it affected Firefox.

Many of us are willing to answer specific questions about things you don't understand, either about what might be happening, what you could do to investigate, or what might be good next steps to take.  Perhaps you could adopt those courses of action rather than the one that you have taken in this bug, the previously cited bug, and other places.
(In reply to Nathan Froyd [:froydnj] from comment #23)
> indicates that your *_NSModule symbols are 16 bytes apart, rather than the
> requisite 8 bytes (assuming you're compiling for 64-bit Windows), which is
> problematic.  That suggests some weirdness in your mozconfig or your
> compiler.

Thanks for the comment. Both are stock-standard:
ac_add_options --enable-application=comm/mail
ac_add_options --enable-calendar
# ac_add_options --enable-debug  <-- just commented that out so I can compile/link again.
ac_add_options --target=x86_64-pc-mingw32
ac_add_options --host=x86_64-pc-mingw32
ac_add_options --enable-warnings-as-errors
ac_add_options --enable-tests

Otherwise I'm using stock-standard VS2017 which last worked yesterday.

So what could I do to investigate? Given that I just rebuilt in non-debug mode since I need to conduct reviews and land patches, I'm not likely to scrap that and compile again in debug mode.

Maybe you can understand the the TB team including me need to interface with all modules of Mozilla: DOM, editor, XUL/XBL, graphics, add-ons, database, autocomplete, l10n, build system, and we don't usually have the time to dig into anything inside the toolbox too deeply. Personally, I've kept TB up and running for about two years now, so even if you don't like my style, it's been quite successful. The said bug 1399746 was a good example for the strategy *not* to debug deep into the bowels of Mozilla core software, as least not in areas where I have zero expertise. I have fixed some editor and spellcheck bugs back in 2015/2016.

Let's see whether we get people compiling FF debug with "me toos" in bug 1471665.
Depends on: 1471685
Blocks: 1054034
Depends on: 1472496

Comment on attachment 9035855 [details] [diff] [review]
Bug 1470127 - Move binary checks to a standalone script. r=froydnj (esr60)

This is one of series of patches I am requesting uplift to esr60. Please don't uplift any if the entire series won't go. The whole series will need to go in one push.

This try run (applied on tip-of-esr60 as of an hour ago; and beginning with 'Bug 1491901 - move MK*SHLIB to moz.configure') represents the patch series. It must be applied in that order: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0659d6e265f3b624ad6fbac0c9cd7ce246094596 (If this try run doesn't complete successfully, I will investigate and figure out why)

The uplift request form is the same for all of the patch series; see https://bugzilla.mozilla.org/show_bug.cgi?id=1491901#c10

Attachment #9035855 - Flags: approval-mozilla-esr60?

Comment on attachment 9035855 [details] [diff] [review]
Bug 1470127 - Move binary checks to a standalone script. r=froydnj (esr60)

approved for 60.5esr

Attachment #9035855 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
You need to log in before you can comment on or make changes to this bug.