Closed Bug 1773604 Opened 2 years ago Closed 2 years ago

Update protobuf to version 21.2

Categories

(Toolkit :: General, task)

task

Tracking

()

RESOLVED FIXED
104 Branch
Tracking Status
firefox104 --- fixed

People

(Reporter: RyanVM, Assigned: RyanVM)

References

Details

Attachments

(4 files, 2 obsolete files)

Our current in-tree copy is version 3.11.4, which is over 2 years old at this point. There's been at least one CVE since then. Additionally, there's been a number of improvements shipped in the library for performance and memory usage. Also, we're evaluating another library right now which may require a newer release.

The update has gone mostly-smooth and passes tests, with one exception on Linux32 GTests:
https://treeherder.mozilla.org/logviewer?job_id=380721502&repo=try&lineNumber=38008

TEST-START | UrlClassifierFindFullHash.Request
[libprotobuf FATAL /builds/worker/workspace/obj-build/dist/include/google/protobuf/arenastring.h:193] CHECK failed: (reinterpret_cast<uintptr_t>(p) & kMask) == (0UL):
Redirecting call to abort() to mozalloc_abort
Hit MOZ_CRASH(Redirecting call to abort() to mozalloc_abort) at /builds/worker/checkouts/gecko/memory/mozalloc/mozalloc_abort.cpp:35
#01: mozalloc_abort [/builds/worker/workspace/build/application/firefox/firefox + 0x633e0]
#02: ??? [/builds/worker/workspace/build/application/firefox/firefox + 0x63418]
#03: _GLOBAL__sub_I_common.cc [/builds/worker/workspace/build/application/firefox/gtest/libxul.so + 0x97731a8]
#04: google::protobuf::internal::LogFinisher::operator=(google::protobuf::internal::LogMessage&) [toolkit/components/protobuf/src/google/protobuf/stubs/common.cc:261]
#05: google::protobuf::internal::ArenaStringPtr::Set(std::string const&, google::protobuf::Arena*) [toolkit/components/protobuf/src/google/protobuf/arenastring.cc:124]
#06: mozilla::safebrowsing::CreateClientInfo() [toolkit/components/url-classifier/nsUrlClassifierUtils.cpp:166]
#07: nsUrlClassifierUtils::MakeFindFullHashRequestV4(nsTArray<nsTString<char> > const&, nsTArray<nsTString<char> > const&, nsTArray<nsTString<char> > const&, nsTSubstring<char>&) [toolkit/components/url-classifier/nsUrlClassifierUtils.cpp:467]
#08: UrlClassifierFindFullHash_Request_Test::TestBody() [toolkit/components/url-classifier/tests/gtest/TestFindFullHash.cpp:48]
#09: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) [third_party/googletest/googletest/src/gtest.cc:2655]
#10: testing::Test::Run() [third_party/googletest/googletest/src/gtest.cc:2677]
#11: testing::TestInfo::Run() [third_party/googletest/googletest/src/gtest.cc:2852]
#12: testing::TestSuite::Run() [third_party/googletest/googletest/src/gtest.cc:3010]
#13: testing::internal::UnitTestImpl::RunAllTests() [third_party/googletest/googletest/src/gtest.cc:5867]
#14: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) [third_party/googletest/googletest/src/gtest.cc:2655]
#15: testing::UnitTest::Run() [third_party/googletest/googletest/src/gtest.cc:5440]
#16: mozilla::RunGTestFunc(int*, char**) [testing/gtest/mozilla/GTestRunner.cpp:156]
#17: XREMain::XRE_mainStartup(bool*) [toolkit/xre/nsAppRunner.cpp:4760]
#18: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5916]
#19: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5991]
#20: mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/Bootstrap.cpp:45]
#21: ??? [/builds/worker/workspace/build/application/firefox/firefox + 0x3f4b1]
ExceptionHandler::GenerateDump cloned child 19087
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[Socket 17086, IPC I/O Child] WARNING: [CC75F51BF8F4B549.BFABD053AAB3DE77]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:352
[Socket 17086, IPC I/O Child] WARNING: [CC75F51BF8F4B549.BFABD053AAB3DE77]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:352
[Socket 17086, IPC I/O Child] WARNING: [CC75F51BF8F4B549.BFABD053AAB3DE77]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:352
[Socket 17086, Main Thread] WARNING: Shutting down Socket process early due to a crash!: file /builds/worker/checkouts/gecko/netwerk/ipc/SocketProcessChild.cpp:177
gtest INFO | gtest | process wait complete, returncode=-11
mozcrash checking /builds/worker/workspace/build/tests/gtest for minidumps...
mozcrash INFO | Copy/paste: /builds/worker/fetches/minidump-stackwalk/minidump-stackwalk --symbols-url=https://symbols.mozilla.org/ --human /builds/worker/workspace/build/tests/gtest/3415df59-f234-c54a-6797-b10b52ff9c3c.dmp /builds/worker/workspace/build/symbols
mozcrash INFO | Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/3415df59-f234-c54a-6797-b10b52ff9c3c.dmp
mozcrash INFO | Saved app info as /builds/worker/workspace/build/blobber_upload_dir/3415df59-f234-c54a-6797-b10b52ff9c3c.extra
PROCESS-CRASH | gtest | application crashed [@ abort]

Through bisection, I was able to track this down to changes that landed during the protobuf 3.20 development cycle:
https://github.com/protocolbuffers/protobuf/compare/aa896f6...fc9fb72

From the 3.20 release notes:

Arenas

  • Change Repeated*Field to reuse memory when using arenas.
  • Implements pbarenaz for profiling proto arenas
  • Introduce CreateString() and CreateArenaString() for cleaner semantics
  • Fix unreferenced parameter for MSVC builds
  • Add UnsafeSetAllocated to be used for one-of string fields.
  • Make Arena::AllocateAligned() a public function.
  • Determine if ArenaDtor related code generation is necessary in one place.
  • Implement on demand register ArenaDtor for InlinedStringField

Dimi was able to confirm through some Try pushes of his own that the issue appears to come down to alignment differences between Linux32 and other platforms which are in conflict with protobuf's requirements. Protobuf appears to require 8-byte alignment:
https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/arenastring.cc#L53

Given our override of std::new in cxxalloc.h, is it possible that we can't guarantee that new returns 8-byte aligned memory when alignof(std::max_align_t) is 8? And if that is indeed the root cause, is there anything we can do about that on our end? NI glandium for those questions.

And Dimi, I did my best to summarize our various conversations on this, but please do add any thoughts you might have if anything was missed :-)

Flags: needinfo?(mh+mozilla)

Android builds use clang and gcc7 is the minimum supported version.

Depends on D148876

Depends on D148878

Attachment #9280603 - Attachment description: WIP: Bug 1773604 - Change kNotFound define to constexpr in nsAString.h. r=. → Bug 1773604 - Change kNotFound define to constexpr in nsAString.h.

The only kind of allocations that are not guaranteed to be 8 bytes aligned are 4-bytes allocations. Why would it care that those are 8-bytes aligned?

Flags: needinfo?(mh+mozilla)
Attachment #9280603 - Attachment is obsolete: true

From one of the Google engineers who works on protobuf:

It looks like we recently started relying on the lower 3 bits of the pointer (see the TaggedStringPtr class) as tags, so I believe the 8-byte alignment is required to ensure that those bits are available for our use. If the 32-bit builds are not achieving the 8-byte alignment for some reason then that would explain the problem.

I asked about possibly not using arenas on affected platforms (from the docs, they're not mandatory):

That's true that you could avoid using arenas on the affected platforms, but this might be tricky depending on how your code is written. A heap-allocated proto must be deleted, but an arena-allocated one must not be deleted (since the arena owns the memory). Internally we use an ArenaSafeUniquePtr class to wrap the pointer and destroy it in the right way based on whether it's arena-allocated or not, but I don't think we ever open sourced this.

Seems like a lot of possible complexity for a Tier 2 platform. Is there any way we can force 8-byte alignment for protobuf's arenas even if that isn't guaranteed to be the case the rest of the time?

Good news - upstream changes to the arena code have resolved our alignment issues and I've now got green Try runs across all platforms. Looks like we'll be able to eventually target the 22.0 release whenever that ships.

Version 21.2 is out with the alignment fixes we needed.

Summary: Update protobuf to version 21.1 → Update protobuf to version 21.2
Attachment #9280604 - Attachment description: WIP: Bug 1773604 - Drop unnecessary mozilla patch. r=. → Bug 1773604 - Drop unnecessary mozilla patch. r=markh
Attachment #9280605 - Attachment description: WIP: Bug 1773604 - Update bundled protobuf to version 21.1. r=. → Bug 1773604 - Update bundled protobuf to version 21.2. r=markh

Depends on D148878

Attachment #9280606 - Attachment is obsolete: true

This also changes the definition to be a static rather than a #define, in order
to reduce the potential impact.

Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3a0f23dcd98a
Move definition of `kNotFound` to `nsStringFwd.h`, r=xpcom-reviewers,barret
https://hg.mozilla.org/integration/autoland/rev/025eb4d2d274
Drop unnecessary mozilla patch. r=markh
https://hg.mozilla.org/integration/autoland/rev/71c27941e870
Update bundled protobuf to version 21.2. r=markh
https://hg.mozilla.org/integration/autoland/rev/03f913ac3bae
Regenerate classes. r=markh
Regressions: 1777275
Blocks: 1782859
Regressions: 1841201
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: