Open Bug 1277337 Opened 8 years ago Updated 4 years ago

Use hg.mozilla.org to map crashes to bug components by way of source files when possible

Categories

(Socorro :: Processor, task, P3)

Tracking

(Not tracked)

People

(Reporter: ted, Unassigned)

In a dev.platform thread a few people expressed that they'd like to have a feed of crashes in the area of the codebase they work on. I think it's feasible to implement this nowadays, since we have bug components for many files in the source tree in moz.build files. For example:
https://dxr.mozilla.org/mozilla-central/rev/4d63dde701b47b8661ab7990f197b6b60e543839/dom/media/moz.build#7

We also have a service on hg.mo for reading this metadata for any file at any revision in the repository:
http://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/mozbuildinfo.html

Unfortunately this is currently broken (bug 1263973), so that will need to be fixed first.

Armed with that, I think what we should do is take the source file from the last stack frame that was used to build the signature, and if it starts with hg:hg.mozilla.org, use that to build a query against hg.mo for the bug component. If we get a result, we should store that in the processed crash. If we allow querying based on that data then people should be able to get lists of crashes by bug component easily. Since the data is maintained in the tree it can easily be updated by developers.

As a concrete example, this crash:
https://crash-stats.mozilla.com/report/index/a8ccd6bc-209c-4d60-a07d-7c76c2160526

has its signature generated from a single frame, frame 0, which has:
"file": "hg:hg.mozilla.org/mozilla-central:dom/media/MediaFormatReader.cpp:829d3be6ba64",

so we could build the query:
https://hg.mozilla.org/mozilla-central/json-mozbuildinfo/829d3be6ba64?p=dom/media/MediaFormatReader.cpp
(basically '{repo}/json-mozbuildinfo/{rev}?p={file}' like we do to build the source links in report/index).
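
The mapping from a frame's "file" value to that query URL could be sketched roughly as below. This is only a sketch, not existing Socorro code: the function name `mozbuildinfo_url` is hypothetical, and it assumes the "file" value always has the four colon-separated fields seen in the example frame (`hg`, repository, path, revision).

```python
from urllib.parse import quote


def mozbuildinfo_url(frame_file):
    """Return a json-mozbuildinfo URL for a frame "file" value, or None.

    Hypothetical helper; assumes the "hg:{repo}:{path}:{rev}" layout
    seen in the example frame.
    """
    parts = frame_file.split(":")
    if len(parts) != 4 or parts[0] != "hg":
        return None
    _, repo, path, rev = parts
    # Only files hosted on hg.mozilla.org can be looked up this way.
    if not repo.startswith("hg.mozilla.org/"):
        return None
    return "https://{}/json-mozbuildinfo/{}?p={}".format(
        repo, quote(rev, safe=""), quote(path, safe="/")
    )
```

Frames whose source file does not come from hg.mozilla.org (or that have no source file at all) yield `None`, so the processor could simply skip the lookup for them.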

When that web service is working again that ought to return something like:
```
{
  "files": {
    "dom/media/MediaFormatReader.cpp": {
      "bug_component": [
        "Core",
        "Video/Audio"
      ]
    }
  }
}
```
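
Pulling the component out of a response shaped like the one above and storing it in the processed crash might look like this. It is a sketch under assumptions: the helper names and the `bug_component` key on the processed crash are hypothetical, and the response layout is taken from the sample rather than from a spec.

```python
import json


def bug_component_from_response(body, path):
    """Return (product, component) for path, or None if it is missing."""
    data = json.loads(body)
    info = data.get("files", {}).get(path, {})
    component = info.get("bug_component")
    # Expect exactly two entries: product and component.
    if not component or len(component) != 2:
        return None
    return tuple(component)


def annotate_crash(processed_crash, body, path):
    """Store "Product :: Component" under a hypothetical bug_component key."""
    result = bug_component_from_response(body, path)
    if result is not None:
        product, component = result
        processed_crash["bug_component"] = "{} :: {}".format(product, component)
    return processed_crash
```

When the file has no component metadata the processed crash is left unchanged, so a query on that field would only return crashes that were actually mapped.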
At last week's uptime meeting I mentioned that a per-component crash rate dashboard would be useful. I initially thought we would need to query Bugzilla for the triaged component, but then I found this bug. I might take a stab at this when bug 1263973 is fixed.
As a workaround for bug 1263973, you could use the data from https://wiki.mozilla.org/Modules/All.

There's a module in the libmozdata library that makes this data easy to use (https://github.com/mozilla/libmozdata/blob/master/libmozdata/modules.py).

Example usage:
from libmozdata import modules
modules.module_from_path('dom/indexedDB/IDBDatabase.cpp')

The output is an object:
{u'ownersEmeritus': [], u'name': u'IndexedDB', u'peers': [{u'name': u'Jonas Sicking', u'email': u'jonas@sicking.cc'}, {u'name': u'Kyle Huey', u'email': u'me@kylehuey.com'}, {u'name': u'Jan Varga', u'email': u'jvarga@mozilla.com'}], u'discussionGroup': u'http://www.mozilla.org/community/forums/#dev-platform', u'peersEmeritus': [], u'urls': [{u'directory': u'https://developer.mozilla.org/en/IndexedDB'}], u'owners': [{u'name': u'Ben Turner', u'email': u'bent@mozilla.com'}], u'bugzillaComponents': [u'Core::DOM: IndexedDB'], u'sourceDirs': [u'dom/indexedDB/'], u'description': u''}

Of course it isn't perfect, but better than nothing.
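
To turn that output into product/component pairs, something like the following could work. It is a sketch that assumes the module object is a plain dict shaped like the output above, with `bugzillaComponents` entries of the form `Product::Component`; the helper name `bugzilla_components` is hypothetical and not part of libmozdata.

```python
def bugzilla_components(module):
    """Return a list of (product, component) pairs from a module dict."""
    pairs = []
    for entry in module.get("bugzillaComponents", []):
        # Split on the first "::" only; component names may contain ":".
        product, sep, component = entry.partition("::")
        if sep:
            pairs.append((product.strip(), component.strip()))
    return pairs
```

For the IndexedDB module shown above this would give a single pair, `("Core", "DOM: IndexedDB")`.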
(In reply to Marco Castelluccio [:marco] from comment #2)

Looks like the modules module uses a static copy of the wiki in the file modules.json. Maybe we want to do that anyway, since querying hg.mozilla.org when processing each crash report may be too expensive?

Maybe we could generate a static copy of the json-mozbuildinfo data for each official build so that Socorro only needs to query that.
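
Short of a full static copy, even a small in-process cache would avoid repeating the same lookup for every crash from the same build. A rough sketch, where the class name `ComponentLookup` and the `fetch` callable are hypothetical stand-ins for whatever actually talks to the data source:

```python
class ComponentLookup:
    """Memoize bug-component lookups per (revision, path)."""

    def __init__(self, fetch):
        # fetch(rev, path) -> (product, component) or None.
        # Supplied by the caller; hypothetical, e.g. an HTTP query.
        self._fetch = fetch
        self._cache = {}

    def get(self, rev, path):
        key = (rev, path)
        if key not in self._cache:
            # Negative results (None) are cached too, so files without
            # metadata do not trigger repeated queries.
            self._cache[key] = self._fetch(rev, path)
        return self._cache[key]
```

This keeps the cache unbounded for simplicity; a real processor would presumably want an eviction policy.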
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #3)
> Looks like the modules module uses a static copy of the wiki in the file
> modules.json. Maybe we want to do that anyway, since querying hg.mozilla.org
> when processing each crash report may be too expensive?
> 
> Maybe we could generate a static copy of the json-mozbuildinfo data for each
> official build so that Socorro only needs to query that.

Yes, that's what I did at the time because most moz.build files did not
contain any info about the component. I don't know if the situation is
different now.
On the plus side, using the data from the moz.build files means that you can get developers to annotate things properly in order to help themselves get better crash reporting :)
Assignee: nobody → kchen
Depends on: 1299747
No longer depends on: 1299747
I keep poking at bug 1263973 (the bug that has json-mozbuildinfo broken) every few months and don't have much to show for it.

All that HTTP API is doing is essentially invoking `hg mozbuildinfo` from a sandbox. `hg mozbuildinfo` is implemented at https://hg.mozilla.org/hgcustom/version-control-tools/file/407adc612136/hgext/hgmo/__init__.py#l554. And that command is essentially a JSON wrapper around https://hg.mozilla.org/hgcustom/version-control-tools/file/407adc612136/pylib/mozhg/mozhg/mozbuildinfo.py.

If you wanted to, you could make a clone of the repo anywhere and invoke this functionality. You do want to sandbox execution since the whole thing is essentially arbitrary code execution.
The json-mozbuildinfo endpoint is now working again. e.g. https://hg.mozilla.org/mozilla-central/json-mozbuildinfo/829d3be6ba64?p=dom/media/MediaFormatReader.cpp

If you find any issues with it, please open new bugs against Developer Services :: hg.mozilla.org.
I should mention that in some cases you may want to query for the metadata for the latest version of a file. In that case, you can plug a symbolic revision name into the URL. e.g. https://hg.mozilla.org/mozilla-central/json-mozbuildinfo/default?p=dom/media/MediaFormatReader.cpp

This will ask for metadata from the "default" branch head, which should also be equivalent to the "tip" revision on mozilla-central.
In fixing the dependent bug, I am moving at a good rate through the source tree, adding BUG_COMPONENT to the moz.build files; more help would be welcome :)
Depends on: 1328351
No longer depends on: 1337806
(In reply to Joel Maher ( :jmaher) from comment #9)
> In fixing the dependent bug, I am moving at a good rate through the source
> tree, adding BUG_COMPONENT to the moz.build files; more help would be
> welcome :)

Great! I was about to do the same but you beat me to it ;)
gps did some work recently to make the build spit out a JSON file with this info:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/l8DaPwjOMqA

It'd probably be simpler to ingest that data instead. The nicest thing would be to pull in the data that matches a build when we find new builds (I assume there's some replacement for ftpscraper nowadays?), but as a start we could just have Socorro use the latest version of the data since our source tree layout doesn't change that often.

The latest info from mozilla-central can be fetched via the taskcluster index here:
https://index.taskcluster.net/v1/task/gecko.v2.mozilla-central.latest.source.source-bugzilla-info/artifacts/public/components-normalized.json

I'm moving this to the processor component, but the work here probably involves making changes to the processor as well as the webapp. I'm also unassigning Kan-ru since nothing has happened here in a long time. Kan-ru: If you're still working on this, let me know.

json-mozbuildinfo no longer exists. It was removed in bug #1523745. So that option is no longer available.

I looked at libmozdata. It's got a modules.json file that hasn't been touched in 4 years. That suggests to me it doesn't have any kind of maintenance update cycle, so we shouldn't use it as is. I wrote up https://github.com/mozilla/libmozdata/issues/180 to cover updating the module and figuring out a maintenance cycle. The alternative is to find a new source of the information.

I don't think we should implement this from whole cloth in Socorro--I'd rather rely on a service or library for this information and then adjust the processor to get it from that.

Assignee: kanru → nobody
Component: Backend → Processor
Priority: -- → P3

The modules.json file was generated using a script from https://wiki.mozilla.org/. But the wiki itself is not updated, so even if we wanted to, it wouldn't make sense to update the file. We should just get rid of it instead.

The alternative, and better way, to handle this is to use the components.json artifact from the source-bugzilla-info task, that is https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.latest.source.source-bugzilla-info/artifacts/public/components.json.

We can add support for it in libmozdata and use it through libmozdata in Socorro, or we can implement it directly in Socorro.
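
Either way, the lookup itself should be small. The sketch below assumes, without having checked the artifact's schema, that components.json is a flat JSON object mapping each source path to a `[product, component]` pair; the function names are hypothetical and the code deliberately takes already-downloaded text so it does no network access.

```python
import json


def load_components(text):
    """Parse components.json text into a {path: (product, component)} dict.

    Assumes a flat {"path": ["Product", "Component"]} layout; adjust if
    the real artifact is shaped differently.
    """
    return {path: tuple(pair) for path, pair in json.loads(text).items()}


def component_for_path(components, path):
    """Return (product, component) for a source path, or None if unmapped."""
    return components.get(path)
```

The processor would then call `component_for_path` with the path taken from the signature frame's source file.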
