Closed Bug 1308957 Opened 8 years ago Closed 8 years ago

Commit messages containing : can cause too-long property names, and xpcshell blowups can cause very long lists for blobber_files property

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aselagea, Assigned: nthomas)

References

Details

Attachments

(2 files, 2 obsolete files)

From #buildduty channel: 

Mon 08:15:27 PDT [4037] buildbot-master85.bb.releng.scl3.mozilla.com:Command Queue is CRITICAL: 2 dead items (http://m.mozilla.org/Command+Queue)

Taking a look at the command queue log for such a job, it can be noticed the following error:
"  File "/builds/buildbot/build1/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
sqlalchemy.exc.DataError: (DataError) (1406, "Data too long for column 'name' at row 1") 'INSERT INTO properties (name, source, value) VALUES (%s, %s, %s)' ('unnecessary. The functions involved can instead return mozilla', 'SetProperty Step', '":pkix::Result,"')
"

The issue seemed to be caused by setting up a multi-line buildbot property, as shown below:
https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-macosx64-st-an-debug/1476105077/autoland-macosx64-st-an-debug-bm85-build1-build682.txt.gz
I think we need two fixes:

- mozharness shouldn't output these multi-line properties. ScriptFactory doesn't handle them properly
- postrun.py (from bbcustom) needs to truncate these before adding into the db
I think the issue is the way we parse messages, rather than multiline comments. The commmit with the problem (https://hg.mozilla.org/integration/autoland/rev/12c51a960f265258615dcb04304509d8499bb975) has this message:

Bug 1296317 - Stop calling PR_SetError() in VerifyCert() and VerifySSLServerCert(). r=keeler

The PR_SetError() + PR_GetError() pattern currently used is error prone and
unnecessary. The functions involved can instead return mozilla::pkix::Result,
which is equally expressive and more robust.

MozReview-Commit-ID: Hkd39eqTvds
-----

mozharness successfully writes that out to a text file, then ScriptFactory use cat to echo them back again (https://hg.mozilla.org/build/buildbotcustom/file/e80e5cbe992f/process/factory.py#l4513). The issue is the way extractProperties() parses them again:
(https://hg.mozilla.org/build/buildbotcustom/file/e80e5cbe992f/process/factory.py#l4513)

def extractProperties(rv, stdout, stderr):
    props = {}
    stdout = stdout.strip()
    for l in filter(None, stdout.split('\n')):
        e = filter(None, l.split(':', 1))
        if len(e) == 2:
            props[e[0]] = e[1].strip()
    return props

So the split finds a ':' in 'mozilla::pkix::Result', and a line that would normally be dropped is included. A multiline comment usually picks up only the first line, and possibly a MozReview-Commit-ID (unknown if that is on purpose or accident).
Got the logs uploaded with a little hot-patch in /builds/buildbot/build1/buildbotcustom:

diff --git a/status/db/model.py b/status/db/model.py
--- a/status/db/model.py
+++ b/status/db/model.py
@@ -146,6 +146,9 @@ class Property(Base):

         new_props = set(names) - set([p.name for p in retval])
         for name in new_props:
+       	    if name.startswith('unnecessary. The functions'):
+                log.info('ignoring bogus property')
+       	       	continue
             p = cls(name=unicode(name), value=props[name],
                     source=unicode(props.getPropertySource(name)))
             retval.append(p)
Summary: Multi-line properties causing dead items in the command queues → Commit messages containing : can cause too-long property names
https://hg.mozilla.org/integration/autoland/rev/b7c0df58a2f4e68d719ba9d49fee3c7c807fc10a has the same problem:

Bug 1300895 - Unprefix CSS multi-column properties, but add back prefixed aliases via nsCSSPropAliasList.h r=dbaron

Also make necessary updates to tests like:
 1. unprefixing in property_database.js and added new aliases there
...
Noticed more related alerts in #buildduty today.

Thu 06:25:31 PDT [4053] buildbot-master138.bb.releng.use1.mozilla.com:Command Queue is CRITICAL: 3 dead items (http://m.mozilla.org/Command+Queue)
16:34:02 Thu 06:34:01 PDT [4055] buildbot-master136.bb.releng.scl3.mozilla.com:Command Queue is CRITICAL: 1 dead item (http://m.mozilla.org/Command+Queue)
16:39:02 Thu 06:39:01 PDT [4057] buildbot-master132.bb.releng.scl3.mozilla.com:Command Queue is CRITICAL: 2 dead items (http://m.mozilla.org/Command+Queue)
16:55:32 Thu 06:55:31 PDT [4059] buildbot-master138.bb.releng.use1.mozilla.com:Command Queue is CRITICAL: 3 dead items (http://m.mozilla.org/Command+Queue)

@nthomas: any chance you could take a look?

Thanks!
Flags: needinfo?(nthomas)
They're having a problem parsing:

blobber_files: '{"5A5FD1B6-C630-48A6-8F4F-FBBB1C4B9AD9.extra": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/2f5023cdb1c98de496d792a57b74785933d9ad78f1e8de4cd13071afcd3efdc3a4a21ee35744f7a9eac12a2530ce37f97ad63a26dd8ceb055b2f569a78ba6b7d", "011D3BA3-7DF5-49B8-80E8-758F78B458D7.extra": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/4523feaeff094b072d5dc04d578b39bd508a1e9e9a8c4e8992441eac3594b36599cf72b4df1d0eb05c31760a92eb36a151ce1f4b2ca9d3bfe7bc4537c1793452", "542078BC-DA67-4516-905A-ECF84C0B63B4.dmp": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/176c3834eabdebed01f92cf8fe4d78717654ab389d1027846037b8f40bc29468f9f21439320f10e04a7860b1dfe57fef864c5dae2323afc5fc237ee2ca434a72", "28805F96-6501-4D3F-A02B-9B40E6291581.extra": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/f52ae5e61b88d2ba68555384d517a382abbaa617276d18fdb49c1c345deea5e08339524f645d5de59b49e45fef314ef2b1aef8932a10c74968a4d3758b21e82a",...  

66158 characters in all. In the buildbot UI that turns into:

Name: blobber_files	
Value: {"5A5FD1B6-C630-48A6-8F4F-FBBB1C4B9AD9.extra": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/2f5023cdb1c98de496d792a57b74785933d9ad78f1e8de4cd13071afcd3efdc3a4a21ee35744f7a9eac12a2530ce37f97ad63a26dd8ceb055b2f569a78ba6b7d", "011D3BA3-7DF5-49B8-80E8-758F78B458D7.extra": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/4523feaeff094b072d5dc04d578b39bd508a1e9e9a8c4e8992441eac3594b36599cf72b4df1d0eb05c31760a92eb36a151ce1f4b2ca9d3bfe7bc4537c1793452", "542078BC-DA67-4516 .. [property value too long]

The 'update status db' code in postrun hits an error on the length of the property value

sqlalchemy.exc.DataError: (DataError) (1406, "Data too long for column 'value' at row 1") 'INSERT INTO properties (name, source, value) VALUES (%s, %s, %s)' ('blobber_files', 'SetProperty Step', '"{\\"5A5FD1B6-C630-48A6-8F4F-FBBB1C4B9AD9.extra\\": \\"http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/2f5023cdb1c98de496d792a57b74785933d9ad78f1e8de4cd13071afcd3efdc3a4a21ee35744f7a9eac12a2530ce37f97ad63a26dd8ceb055b2f569a78ba6b7d\\", \\"011D3BA3-7DF5-49B8-80E8-758F78B458D7.extra\\": \\"http://mozilla-releng-blobs.s3.amazonaws.com

We should probably just truncate the value, but I want to check with catlee on consequences.
Flags: needinfo?(nthomas)
Actually that's slightly different from the original point of this bug. There's no problem parsing, it's just the value is too long.
Since we've been silently truncating for a long time with MyISAM there are no downstream consumers to be concerned about, and we can explicitly truncate at submission time. I'll work on a patch for status db submission.
buildbot.properties.value has type TEXT, which http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html#idm140521168503584 says has a nominal limit of 65534. However some testing revealed it's actually somewhere between 60000 and 65000, and I didn't want to spend a lot of time figuring out exactly the right value. Have to cast into unicode or there are complaints about props[name] not being hashable.
Assignee: nobody → nthomas
Attachment #8803291 - Flags: review?(catlee)
Please consider the two [:40] truncations there too, they were originally landed as a quick fix without review.
I wonder if this was the cause of bug 1312002?
See Also: → 1312002
Comment on attachment 8803291 [details] [diff] [review]
[buidlbotcustom] 60000 chars should be enough for anyone

So this won't do the job. How did you go with looking at truncating the JSON ?
Attachment #8803291 - Attachment is obsolete: true
Attachment #8803291 - Flags: review?(catlee)
Something to try next time we hit a problem, in a quite specific way - eg using a separate buildbotcustom and calling it manually; or shutting down command_runner, patching, and making a single call manually - so that we don't impact a wide swath of jobs. Then check the db, and monitor for alerts for builds-4hr.js & similar.

The idea here is that it could return invalid json, but that's nothing different from what we were getting with MyISAM without known fallout. It just truncates in a way that doesn't modify the format/serialisation of what's in the DB, which is what caused bug 1312002.
Turns out we can slice a full 2^16 with this method. This is the comparison of a (randomly-selected) job that run with the current code, and one I submitted with using the fix. There is a duplicated 'branch', not sure what's up with that.
Attachment #8804932 - Attachment is obsolete: true
Haven't seen any fallout from this, and redid the checks from query from bug 1312002 comment #0.
Attachment #8805312 - Flags: review?(catlee)
Summary: Commit messages containing : can cause too-long property names → Commit messages containing : can cause too-long property names, and xpcshell blowups can cause very long lists for blobber_files property
Attachment #8805312 - Flags: review?(catlee) → review+
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: