Intermittent Strings.huge_capacity | Value of: X.SetCapacity(nsString::size_type(-1)/4 - Y, fallible)

RESOLVED FIXED in Firefox 65

Status

()

defect
P5
normal
RESOLVED FIXED
10 months ago
8 months ago

People

(Reporter: intermittent-bug-filer, Assigned: hsivonen)

Tracking

({intermittent-failure})

unspecified
mozilla65
Points:
---

Firefox Tracking Flags

(firefox65 fixed)

Details

(Whiteboard: [stockwell disable-recommended])

Attachments

(2 attachments)

Filed by: ccoroiu [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=200999814&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/OrQBpB__R9eDSegWKlmJvg/runs/0/artifacts/public/logs/live_backing.log

task 2018-09-22T17:27:09.427Z] 17:27:09     INFO -  TEST-START | Strings.huge_capacity
[task 2018-09-22T17:27:09.428Z] 17:27:09  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: i.SetCapacity(nsString::size_type(-1)/4 - 1000, fallible)
[task 2018-09-22T17:27:09.429Z] 17:27:09     INFO -    Actual: false
[task 2018-09-22T17:27:09.429Z] 17:27:09     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1430
[task 2018-09-22T17:27:09.430Z] 17:27:09  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: j.SetCapacity(nsString::size_type(-1)/4 - 1000, fallible)
[task 2018-09-22T17:27:09.430Z] 17:27:09     INFO -    Actual: false
[task 2018-09-22T17:27:09.431Z] 17:27:09     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1434
[task 2018-09-22T17:27:10.178Z] 17:27:10  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: k.SetCapacity(nsString::size_type(-1)/4 - 1001, fallible)
[task 2018-09-22T17:27:10.179Z] 17:27:10     INFO -    Actual: false
[task 2018-09-22T17:27:10.179Z] 17:27:10     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1439
[task 2018-09-22T17:27:10.180Z] 17:27:10  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: k.SetCapacity(nsString::size_type(-1)/4 - 998, fallible)
[task 2018-09-22T17:27:10.180Z] 17:27:10     INFO -    Actual: false
[task 2018-09-22T17:27:10.181Z] 17:27:10     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1440
[task 2018-09-22T17:27:12.582Z] 17:27:12  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: n.SetCapacity((nsString::size_type(-1)/2 - sizeof(nsStringBuffer)) / 2 - 2, fallible)
[task 2018-09-22T17:27:12.582Z] 17:27:12     INFO -    Actual: false
[task 2018-09-22T17:27:12.583Z] 17:27:12     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1458
[task 2018-09-22T17:27:12.583Z] 17:27:12  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: n1.SetCapacity((nsCString::size_type(-1)/2 - sizeof(nsStringBuffer)) / 1 - 2, fallible)
[task 2018-09-22T17:27:12.584Z] 17:27:12     INFO -    Actual: false
[task 2018-09-22T17:27:12.584Z] 17:27:12     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1463
[task 2018-09-22T17:27:12.584Z] 17:27:12  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | test completed (time: 3152ms)
[task 2018-09-22T17:27:12.585Z] 17:27:12     INFO -  TEST-START | Strings.tofloat
[task 2018-09-22T17:27:12.585Z] 17:27:12     INFO -  TEST-PASS | Strings.tofloat | test completed (time: 3ms)
(In reply to Andreea Pavel [:apavel] from comment #10)
> The spike here seems to be from Bug 1507267  
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=autoland&searchStr=linux%2Cx64%2Cquantumrender%2Cdebug%2Ctest-
> linux64-qr%2Fdebug-
> gtest%2C%28gtest%29&tochange=9a65ee9355e8b014a8ff5fdb28e0f500f0aa652f&fromcha
> nge=95393acee0ea77580738dbd8c7f3238798c23510&selectedJob=211913280 

What is gTest and how is it related to awsy tests? I ask, because the change from bug 1507267 only affects the latter, and to note only enables a pref which was accidentally disabled on Nov 12th via bug 1506643.

So I assume that a change which has been landed in-between those to landings has been caused this spike, but was not noticed because the `about:newtab` page was turned off. 

Here the list of changes:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=88fb9e85339c&tochange=f1455bc47683
Flags: needinfo?(igoldan)
Flags: needinfo?(hskupin)
Flags: needinfo?(erahm)
Summary: Intermittent Strings.huge_capacity | Value of: i.SetCapacity(nsString::size_type(-1)/4 - 1000, fallible) → Intermittent Strings.huge_capacity | Value of: X.SetCapacity(nsString::size_type(-1)/4 - Y, fallible)
(In reply to Henrik Skupin (:whimboo) from comment #12)
> (In reply to Andreea Pavel [:apavel] from comment #10)
> > The spike here seems to be from Bug 1507267  
> > 
> > https://treeherder.mozilla.org/#/
> > jobs?repo=autoland&searchStr=linux%2Cx64%2Cquantumrender%2Cdebug%2Ctest-
> > linux64-qr%2Fdebug-
> > gtest%2C%28gtest%29&tochange=9a65ee9355e8b014a8ff5fdb28e0f500f0aa652f&fromcha
> > nge=95393acee0ea77580738dbd8c7f3238798c23510&selectedJob=211913280 
> 
> What is gTest and how is it related to awsy tests?

I think GTest runs Firefox's unit tests. I don't know anything more about this matter and can only confirm :whimboo's comment about the relation with the AWSY tests.
Flags: needinfo?(igoldan)
(In reply to Andreea Pavel [:apavel] from comment #10)
> The spike here seems to be from Bug 1507267  
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=autoland&searchStr=linux%2Cx64%2Cquantumrender%2Cdebug%2Ctest-
> linux64-qr%2Fdebug-
> gtest%2C%28gtest%29&tochange=9a65ee9355e8b014a8ff5fdb28e0f500f0aa652f&fromcha
> nge=95393acee0ea77580738dbd8c7f3238798c23510&selectedJob=211913280 
> 
> Henrik can you please take a look see if that's possible?

This change does not affect gtests. The test is pretty weird, it seems to be trying to fragment the address space? I'm not sure why we ever thought this would be deterministic. It's possible an infrastructure change made this more flaky.
Flags: needinfo?(erahm)
Henri can you take a look at this? It appears the test [1] is using `SetCapacity` to clear the strings, you changed that behavior [2] in bug 1487341 so it's possible we're exhausting contiguous address space.

[1] https://searchfox.org/mozilla-central/rev/55895c49f55073d82d977cb74ec1d3a71ae4b25f/xpcom/tests/gtest/TestStrings.cpp#1395
[2] https://searchfox.org/mozilla-central/rev/55895c49f55073d82d977cb74ec1d3a71ae4b25f/xpcom/string/nsTSubstring.cpp#1000-1002
Flags: needinfo?(hsivonen)
(In reply to Eric Rahm [:erahm] from comment #15)
> Henri can you take a look at this? It appears the test [1] is using
> `SetCapacity` to clear the strings, you changed that behavior [2] in bug
> 1487341 so it's possible we're exhausting contiguous address space.

There's that. Another issue is that since bug 1486711 in debug builds we actually write to the buffer meaning it's not just a matter of address space.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Component: XPCOM → String
Flags: needinfo?(hsivonen)
(In reply to Henri Sivonen (:hsivonen) from comment #17)
> Another issue is that since bug 1486711 in debug builds we
> actually write to the buffer meaning it's not just a matter of address space.

I forgot that bug 1490972 made us write less, so, yeah, mostly a matter of address space.
Hi guys, thanks for clarifying and working on this.
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5b09820ac9bd
Free string buffer using Truncate() after testing SetCapacity(). r=erahm
https://hg.mozilla.org/mozilla-central/rev/5b09820ac9bd
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
This is still happening: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&selectedJob=213461473&revision=8b245cc1086f912f84b54a6af13f015404af8e14

https://treeherder.mozilla.org/logviewer.html#?job_id=213461473&repo=mozilla-central&lineNumber=1185

[task 2018-11-22T22:39:30.830Z] 22:39:30     INFO -  TEST-START | Strings.huge_capacity
[task 2018-11-22T22:39:35.806Z] 22:39:35  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: k.SetCapacity(nsString::size_type(-1)/4 - 1001, fallible)
[task 2018-11-22T22:39:35.807Z] 22:39:35     INFO -    Actual: false
[task 2018-11-22T22:39:35.807Z] 22:39:35     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1444
[task 2018-11-22T22:39:35.807Z] 22:39:35  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: k.SetCapacity(nsString::size_type(-1)/4 - 998, fallible)
[task 2018-11-22T22:39:35.807Z] 22:39:35     INFO -    Actual: false
[task 2018-11-22T22:39:35.807Z] 22:39:35     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1445
[task 2018-11-22T22:39:43.507Z] 22:39:43  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | test completed (time: 12675ms)
Status: RESOLVED → REOPENED
Flags: needinfo?(hsivonen)
Resolution: FIXED → ---
Target Milestone: mozilla65 → ---
(In reply to Intermittent Failures Robot from comment #26)
> 48 failures in 386 pushes (0.124 failures/push) were associated with this
> bug yesterday.
> 
> Repository breakdown:
> * mozilla-central: 11
> * try: 7
> * autoland: 12
> * mozilla-inbound: 18
> 
> Platform breakdown:
> * linux32: 1
> * android-hw-p2-8-0-arm7-api-16: 2
> * linux64-qr: 29
> * android-em-7-0-x86: 1
> * linux64: 14
> * android-hw-p2-8-0-android-aarch64: 1
> 
> For more details, see:
> https://treeherder.mozilla.org/intermittent-failures.html#/
> bugdetails?bug=1493458&startday=2018-11-22&endday=2018-11-22&tree=all

The Android failures seem misattributed.

(In reply to Noemi Erli[:noemi_erli] from comment #25)
> This is still happening:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> central&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassi
> fied&selectedJob=213461473&revision=8b245cc1086f912f84b54a6af13f015404af8e14
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=213461473&repo=mozilla-
> central&lineNumber=1185

This test is still performing very large allocations, so if the allocations that have happened previously haven't been totally deterministic, in the 32-bit case it's quite plausible that there might not be enough contiguous address space available for the very large allocation.

Curiously, though, the recent failures were on 64-bit Linux. I don't have an immediate explanation how this could occasionally fail in the 64-bit case.

erahm, do you have an idea?
Assignee: hsivonen → nobody
Flags: needinfo?(hsivonen) → needinfo?(erahm)
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=213712816&repo=mozilla-inbound&lineNumber=2977

[task 2018-11-25T02:12:34.177Z] 02:12:34     INFO -  TEST-START | Strings.huge_capacity
[task 2018-11-25T02:12:38.972Z] 02:12:38  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: k.SetCapacity(nsString::size_type(-1)/4 - 1001, fallible)
[task 2018-11-25T02:12:38.973Z] 02:12:38     INFO -    Actual: false
[task 2018-11-25T02:12:38.973Z] 02:12:38     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1444
[task 2018-11-25T02:12:38.974Z] 02:12:38  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | Value of: k.SetCapacity(nsString::size_type(-1)/4 - 998, fallible)
[task 2018-11-25T02:12:38.974Z] 02:12:38     INFO -    Actual: false
[task 2018-11-25T02:12:38.975Z] 02:12:38     INFO -  Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestStrings.cpp:1445
[task 2018-11-25T02:12:46.902Z] 02:12:46  WARNING -  TEST-UNEXPECTED-FAIL | Strings.huge_capacity | test completed (time: 12720ms)
[task 2018-11-25T02:12:46.903Z] 02:12:46     INFO -  TEST-START | Strings.tofloat
[task 2018-11-25T02:12:46.903Z] 02:12:46     INFO -  TEST-PASS | Strings.tofloat | test completed (time: 5ms)

[task 2018-11-25T02:23:45.358Z] 02:23:45     INFO -  TEST-START | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/3
[task 2018-11-25T02:23:45.358Z] 02:23:45     INFO -  Checking cipher: 2
[task 2018-11-25T02:23:45.359Z] 02:23:45     INFO -  TEST-PASS | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/3 | test completed (time: 224ms)
[task 2018-11-25T02:23:45.359Z] 02:23:45  WARNING -  TEST-UNEXPECTED-FAIL | GTest unit test: failed
[task 2018-11-25T02:23:45.359Z] 02:23:45     INFO -  Passed: 4286
[task 2018-11-25T02:23:45.359Z] 02:23:45  WARNING -  Failed: 1
[task 2018-11-25T02:23:45.360Z] 02:23:45  WARNING -  One or more unittests failed.
[task 2018-11-25T02:23:45.375Z] 02:23:45    ERROR -  gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1
[task 2018-11-25T02:23:45.381Z] 02:23:45    ERROR - Return code: 1
[task 2018-11-25T02:23:45.381Z] 02:23:45     INFO - TinderboxPrint: gtest-gtest<br/>4286/<em class="testfail">1</em>
[task 2018-11-25T02:23:45.381Z] 02:23:45    ERROR - # TBPL FAILURE #
[task 2018-11-25T02:23:45.382Z] 02:23:45  WARNING - setting return code to 2
[task 2018-11-25T02:23:45.382Z] 02:23:45    ERROR - The gtest suite: gtest ran with return status: FAILURE
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> This test is still performing very large allocations, so if the allocations
> that have happened previously haven't been totally deterministic, in the
> 32-bit case it's quite plausible that there might not be enough contiguous
> address space available for the very large allocation.

The test is gated at runtime on being 64-bit, ie sizeof(void*) == 8.

> Curiously, though, the recent failures were on 64-bit Linux. I don't have an
> immediate explanation how this could occasionally fail in the 64-bit case.
> 
> erahm, do you have an idea?

Not really but it seems to be limited to this bit of code [1]:

>    // this is okay
>    EXPECT_TRUE(k.SetCapacity(nsString::size_type(-1)/8 - 1000, fallible));
>    // these fail
>    EXPECT_TRUE(k.SetCapacity(nsString::size_type(-1)/4 - 1001, fallible));
>    EXPECT_TRUE(k.SetCapacity(nsString::size_type(-1)/4 - 998, fallible));

My guess is it has something to do with us not using `realloc` anymore [2] (which might have avoided an allocation). Unfortunately I'm really not sure what the original intention of this test is. We certainly don't want to disable the Strings test as a whole, but maybe removing the `k` portion of `huge_capacity` would be okay.

[1] https://searchfox.org/mozilla-central/rev/d35199ef15ffa57d190886e20d5df6471faf0870/xpcom/tests/gtest/TestStrings.cpp#1444-1445
[2] https://searchfox.org/mozilla-central/rev/d35199ef15ffa57d190886e20d5df6471faf0870/xpcom/string/nsTSubstring.cpp#212-216
Flags: needinfo?(erahm)
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4990602fac5a
Disable an intermittently failing part of SetCapacity tests. r=erahm
https://hg.mozilla.org/mozilla-central/rev/4990602fac5a
Status: REOPENED → RESOLVED
Closed: 8 months ago8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Assignee: nobody → hsivonen
You need to log in before you can comment on or make changes to this bug.