Closed Bug 1334641 Opened 3 years ago Closed 3 years ago

Patch libxcb on Ubuntu 12.04 testers to include the fix for "xcb_conn.c:186: write_vec: Assertion `!c->out.queue_len' failed"

Categories

(Testing :: General, defect)

Tracking

(firefox52 fixed, firefox-esr52 fixed, firefox53 fixed, firefox54 fixed)

RESOLVED FIXED
mozilla54

People

(Reporter: botond, Assigned: botond)

Attachments

(3 files)

libxcb <= 1.10 has a race condition that results in frequent intermittent failures in our mochitests (bug 1293474).

It would be good to upgrade the libxcb version used by our Linux (Ubuntu 12.04) builders to a newer version to avoid this.
See bug 975216 where this was done previously.
The Taskcluster Ubuntu 12.04 testers are run in a docker image built from this Dockerfile:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/docker/desktop-test/Dockerfile

You can patch that in-tree to fix them. It may not be worth the effort to fix the buildbot ones given that they're nearing EOL.
Summary: Upgrade libxcb on Ubuntu 12.04 builders to >= 1.11 → Upgrade libxcb on Ubuntu 12.04 testers to >= 1.11
We already have special handling for libxcb:

https://dxr.mozilla.org/mozilla-central/rev/07d7ecbf77e3be59797f16234d357a02bb38ed8b/taskcluster/docker/recipes/ubuntu1204-test-system-setup.sh#172

That refers back to bug 975216, but it looks like that is version 1.8.1-2.
So, as a proof-of-concept, I tried modifying the setup script [1] to install the patched version of libxcb 1.10 that I built back in bug 1293474 comment 3.

However, tests are failing [2] with the error:

[task 2017-01-28T01:22:58.581737Z] 01:22:58     INFO -  /home/worker/workspace/build/tests/bin/xpcshell: error while loading shared libraries: libxcb-shm.so.0: wrong ELF class: ELFCLASS64

I'm not sure why we're getting this error. The libxcb repo that I built does include the i386 version of the libxcb-shm0 package (and all other packages), and I even modified the setup script to install the package libxcb-shm0:i386 explicitly [3].

Any ideas as to why we're still getting this error?


[1] https://hg.mozilla.org/try/rev/b7f7cf6c52f11d583ca4b71f8ed941edf171d78c
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=178d0d230582e38f9ee0e70591d2c615d4f6db41
[3] https://hg.mozilla.org/try/rev/b7f7cf6c52f11d583ca4b71f8ed941edf171d78c#l1.41
It is odd to be seeing this; the Linux test images support both 64-bit and 32-bit images.

:gbrown, anything pop out here at you?
Flags: needinfo?(gbrown)
Sorry, no, I am puzzled too.

I thought I might be able to see something in the image-building log (look for "I(dt)" in your try push): https://public-artifacts.taskcluster.net/P8dGo5_BSii0-mfDBmjQ6w/0/public/logs/live_backing.log

There is a lot of libxcb activity in there, but things seem to be installed as I would expect. I do see:

Setting up libgtk2.0-0:i386 (2.24.10-0ubuntu6.3) ...
/usr/lib/i386-linux-gnu/libgtk2.0-0/gtk-query-immodules-2.0: error while loading shared libraries: libxcb.so.1: wrong ELF class: ELFCLASS64
...
Setting up libgtk-3-0:i386 (3.4.2-0ubuntu0.9) ...
/usr/lib/i386-linux-gnu/libgtk-3-0/gtk-query-immodules-3.0: error while loading shared libraries: libxcb.so.1: wrong ELF class: ELFCLASS64

Maybe it would be better to install the new libxcb later in the setup script, after gtk??
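
When scanning a long image-building or test log for this kind of failure, a small script can list which libraries the loader rejected. This is a minimal sketch: the function name `wrong_elf_class_libs` and the regular expression are assumptions based only on the message format quoted above.

```python
import re

# Matches the dynamic loader's message as quoted in this bug, e.g.
# "error while loading shared libraries: libxcb.so.1: wrong ELF class: ELFCLASS64"
WRONG_CLASS = re.compile(
    r"error while loading shared libraries: (?P<lib>[^\s:]+): "
    r"wrong ELF class: (?P<cls>ELFCLASS(?:32|64))"
)


def wrong_elf_class_libs(log_lines):
    """Return a sorted list of unique (library, reported ELF class) pairs."""
    found = set()
    for line in log_lines:
        match = WRONG_CLASS.search(line)
        if match:
            found.add((match.group("lib"), match.group("cls")))
    return sorted(found)


if __name__ == "__main__":
    # Sample lines copied from the logs quoted in this bug.
    sample = [
        "Setting up libgtk2.0-0:i386 (2.24.10-0ubuntu6.3) ...",
        "/usr/lib/i386-linux-gnu/libgtk2.0-0/gtk-query-immodules-2.0: "
        "error while loading shared libraries: libxcb.so.1: "
        "wrong ELF class: ELFCLASS64",
    ]
    for lib, cls in wrong_elf_class_libs(sample):
        print(lib, cls)
```

Note that the class in the message is the class of the library that was found, not the one the program needed, so ELFCLASS64 here means a 64-bit library was picked up by a 32-bit binary.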
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #6)
> Maybe it would be better to install the new libxcb later in the setup
> script, after gtk??

Unfortunately, that did not help [1].

Interestingly, those same errors are present in the image-building log, *before* we install the newer libxcb...

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=9851c2ec61e4bb77b064c06c65d3b15dbf570cea
(In reply to Botond Ballo [:botond] from comment #7)
> (In reply to Geoff Brown [:gbrown] from comment #6)
> > Maybe it would be better to install the new libxcb later in the setup
> > script, after gtk??
> 
> Unfortunately, that did not help [1].
> 
> Interestingly, those same errors are present in the image-building log,
> *before* we install the newer libxcb...

So I realized that's because we were still setting up the repository with the newer libxcb before installing GTK, and the GTK install brought that in implicitly as a dependency.

Tweaking the order of operations a bit more, I was able to get rid of the ELF class error in the image-building log, but it's still present in the test jobs and causing them to fail:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fead038b2200ce3e7c8da1c27c3a7468b7440f6c
I started a One-Click Loaner on the try revision from comment 8 and had a quick look:

root@taskcluster-worker:/# ls -l /usr/lib/x86_64-linux-gnu/libxcb-shm*
-rw-r--r-- 1 root root 10146 Aug 15 20:25 /usr/lib/x86_64-linux-gnu/libxcb-shm.a
lrwxrwxrwx 1 root root    19 Aug 15 20:25 /usr/lib/x86_64-linux-gnu/libxcb-shm.so -> libxcb-shm.so.0.0.0
lrwxrwxrwx 1 root root    19 Aug 15 20:25 /usr/lib/x86_64-linux-gnu/libxcb-shm.so.0 -> libxcb-shm.so.0.0.0
-rw-r--r-- 1 root root 14432 Aug 15 20:25 /usr/lib/x86_64-linux-gnu/libxcb-shm.so.0.0.0
root@taskcluster-worker:/# ls -l /usr/lib/i386-linux-gnu/libxcb-shm*
lrwxrwxrwx 1 root root    19 Aug 15 20:30 /usr/lib/i386-linux-gnu/libxcb-shm.so.0 -> libxcb-shm.so.0.0.0
-rw-r--r-- 1 root root 14432 Aug 15 20:30 /usr/lib/i386-linux-gnu/libxcb-shm.so.0.0.0
root@taskcluster-worker:/# cmp /usr/lib/x86_64-linux-gnu/libxcb-shm.so.0.0.0 /usr/lib/i386-linux-gnu/libxcb-shm.so.0.0.0
root@taskcluster-worker:/# 

It seems odd that the i386 and x86_64 versions are identical. Do the file sizes match what you built :botond?
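
The `cmp` check above can be extended to every file in the two multiarch directories. A minimal sketch, assuming both directories are readable; `identical_libs` is a hypothetical helper, not part of any in-tree tooling:

```python
import filecmp
import os


def identical_libs(dir_a, dir_b):
    """Return names of regular files present in both directories with identical bytes."""
    common = sorted(set(os.listdir(dir_a)) & set(os.listdir(dir_b)))
    same = []
    for name in common:
        path_a = os.path.join(dir_a, name)
        path_b = os.path.join(dir_b, name)
        # Skip symlinks (e.g. libxcb-shm.so.0 -> libxcb-shm.so.0.0.0) so that
        # each real file is compared and reported only once.
        if os.path.islink(path_a) or os.path.islink(path_b):
            continue
        if not (os.path.isfile(path_a) and os.path.isfile(path_b)):
            continue
        # shallow=False forces a byte-by-byte comparison, like cmp(1).
        if filecmp.cmp(path_a, path_b, shallow=False):
            same.append(name)
    return same


if __name__ == "__main__":
    # Directory names taken from the loaner session above.
    lib64 = "/usr/lib/x86_64-linux-gnu"
    lib32 = "/usr/lib/i386-linux-gnu"
    if os.path.isdir(lib64) and os.path.isdir(lib32):
        for name in identical_libs(lib64, lib32):
            print(name)
```

Any shared object this reports is suspicious, since a correctly built i386 library should not be byte-identical to its x86_64 counterpart.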
(In reply to Geoff Brown [:gbrown] from comment #9)
> It seems odd that the i386 and x86_64 versions are identical. Do the file
> sizes match what you built :botond?

They do, and the "file" command confirms that the supposedly-i386 .so is in fact an "ELF 64-bit LSB shared object".

Moreover, this appears to be the case for all the shared objects in the packages that I built, not just libxcb-shm.so.

==> So, the packages I built are definitely wrong. Not sure why, but I'll look into it.

Anyways, thanks for catching this!
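
For checking many files at once, the part of the `file` output that matters here can be read straight from the ELF header: the byte at offset 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects. A minimal sketch, with `elf_class` as a hypothetical helper name:

```python
import sys

ELF_MAGIC = b"\x7fELF"   # first four bytes of every ELF file
EI_CLASS = 4             # index of the class byte within e_ident
CLASS_NAMES = {0: "ELFCLASSNONE", 1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(path):
    """Return the ELF class name for the file at `path`.

    Raises ValueError if the file does not start with the ELF magic number.
    """
    with open(path, "rb") as f:
        ident = f.read(EI_CLASS + 1)
    if len(ident) <= EI_CLASS or ident[:4] != ELF_MAGIC:
        raise ValueError(f"{path} is not an ELF file")
    # Unknown values are reported rather than guessed at.
    return CLASS_NAMES.get(ident[EI_CLASS], f"unknown ({ident[EI_CLASS]})")


if __name__ == "__main__":
    # Usage: pass one or more shared objects on the command line.
    for path in sys.argv[1:]:
        print(path, elf_class(path))
```

Running this over the contents of a package built for i386 should report ELFCLASS32 for every shared object; anything else indicates that the build did not target the requested architecture.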
(In reply to Botond Ballo [:botond] from comment #10)
> They do, and the "file" command confirms that the supposedly-i386 .so is in
> fact an "ELF 64-bit LSB shared object".
> 
> Moreover, this appears to be the case for all the shared objects in the
> packages that I built, not just libxcb-shm.so.

Ok, I was able to fix this.

I was following these steps [1] for building the packages. Note in particular the command for building the 32-bit packages was:

  dpkg-buildpackage -uc -us -b -a i386 -d

It appears that, even though "-a i386" was specified as an argument, this command still built 64-bit packages.

I Googled around a bit, found a forum post where someone else has run into this [2], and applied the workaround described in that post.

Now that we actually have 32-bit packages, though, we quickly run into the following dependency error:


The following packages have unmet dependencies:
 libxcb-glx0:i386 : Depends: libc6-i386:i386 (>= 2.4) but it is not installable
                    Depends: libxcb1:i386 but it is not going to be installed


It looks like this newer version of libxcb depends on a newer version of libc than the version present on our Ubuntu 12.04 testers.

I guess the options for solving this are:

  - Try to install a newer version of libc on the testers?
    This sounds difficult, since almost everything depends on libc.

  - See if the patch to fix the race condition in libxcb applies
    to an older version (preferably the version we were installing
    before)?

  - Something else?

I'm open to suggestions/ideas.

[1] https://bug1293474.bmoattachments.org/attachment.cgi?id=8781242
[2] https://ubuntuforums.org/showthread.php?t=1469121&p=12599759#post12599759
IIRC, libc cannot be updated on 12.04 :( Possibly we should push faster/harder for 16.04.
I'm trying this approach:

>   - See if the patch to fix the race condition in libxcb applies
>     to an older version (preferably the version we were installing
>     before)?

Thanks to help from several people (thanks rail, dustin, and jcristau!), I discovered that:

  - The libxcb 1.8 package that we're currently installing on the testers
    was built using the steps described in bug 975216 comment 8.

      - These steps also require calling "pbuilder --create" first [1].

  - Using pbuilder, as in those steps, is the proper way to build 32-bit
    packages, not the workaround referenced in the previous comment.

So, as a first step, I tried following these steps to rebuild the libxcb 1.8 package from source.

The source package referenced in bug 975216 comment 8 isn't there any more, but it can be found at [2], so I used that.

However, I ran into errors at the "pbuilder --create" step.

On my local Debian machine, I get the following error:

  E: Release signed by unknown key (key id 40976EAF437D05B5)
  E: debootstrap failed

I tried various things to get past this, such as installing the ubuntu-archive-keyring package, passing a --keyring argument to the pbuilder command, downloading and adding the key in question directly via "apt-key", and even passing --allow-untrusted, but I wasn't able to get past this error.

Thinking that it might be better to try building the package on an Ubuntu 12.04 installation, I got a one-click loaner on such a machine, and there the "pbuilder --create" step got further, but still failed with an error:

  W: Failure trying to run: chroot /var/cache/pbuilder/build/6707/. mount -t proc proc /proc
  E: debootstrap failed

Not sure what to make of that one. Any suggestions are appreciated.


[1] https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Build_DEBs#Create_pbuilder_images
[2] http://old-releases.ubuntu.com/ubuntu/pool/main/libx/libxcb/libxcb_1.8.1-2ubuntu2.1.dsc
I was finally able to rebuild the libxcb 1.8 package from source, by getting a physical loaner machine and installing Ubuntu 12.04 on it.

Using that package, our tester image builds successfully, and tests are passing:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f76237e74864edd81427f979e897d60b620b3011

The next step is to apply the patch to the source of this 1.8 package, and rebuild.
This is the patch rebased to apply to libxcb 1.8.
And here's a Try push with the patched build of the library:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab9893fb252a7cc141543b9713b8a76c1be8b2fc

It's looking good!

I guess the next step is to upload the patched build to tooltool?
Yes, that would be a good thing; then we can land the official patch to update Ubuntu 12.04.
Rail, I would like to upload https://people-mozilla.org/~bballo/xcb-repo-1.8.1-2ubuntu2.1mozilla2.tgz to tooltool. How would I go about doing that?
Flags: needinfo?(rail)
Please file a bug similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1328846 and see https://wiki.mozilla.org/ReleaseEngineering/Tooltool for other details on how to upload.
Flags: needinfo?(rail)
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #20)
> Please file a bug similar to
> https://bugzilla.mozilla.org/show_bug.cgi?id=1328846 and see
> https://wiki.mozilla.org/ReleaseEngineering/Tooltool for other details on
> how to upload.

Thanks - filed bug 1337137.
Depends on: 1337137
Updating bug title to reflect that we're not actually updating libxcb to version 1.11, we're just patching the current 1.8 version to include the relevant fix from 1.11.
Assignee: nobody → botond
Summary: Upgrade libxcb on Ubuntu 12.04 testers to >= 1.11 → Patch libxcb on Ubuntu 12.04 testers to include the fix for "xcb_conn.c:186: write_vec: Assertion `!c->out.queue_len' failed"
I've successfully uploaded the repository with the patched packages to tooltool, and updated the image-building script to consume them:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=81ca4fb865b0c718d40406bdd1aa4d0a25413c2f

The Try push is looking good! I think we can now get this change reviewed and landed.
Comment on attachment 8835187 [details]
Bug 1334641 - Patch the version of libxcb used by the Ubuntu 12.04 testers.

https://reviewboard.mozilla.org/r/110878/#review112322

Thank you for making this happen, this patch looks great.
Attachment #8835187 - Flags: review?(jmaher) → review+
Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e9abd869ce60
Patch the version of libxcb used by the Ubuntu 12.04 testers. r=jmaher
https://hg.mozilla.org/mozilla-central/rev/e9abd869ce60
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Whiteboard: [checkin-needed-aurora][checkin-needed-beta]
https://hg.mozilla.org/releases/mozilla-aurora/rev/787f2ba75729
Whiteboard: [checkin-needed-aurora][checkin-needed-beta] → [checkin-needed-beta]
mozilla-beta (and soon to be esr52) still runs a number of buildbot-based Linux64 tests, and we do in fact see these failures occasionally there as well. How easily would we be able to get the updated libxcb library included in those AMIs as well?
Flags: needinfo?(rail)
I think it shouldn't be too hard: we just need to publish the fixed binaries to the internal apt repos and update the puppet manifests.
Flags: needinfo?(rail)