
cpu_info and cpu_arch are the same in SuperSearch

Status: NEW
Assignee: Unassigned
Product: Socorro
Component: Webapp
Reported: 2 years ago
Last modified: a year ago
Reporter: peterbe
Firefox Tracking Flags: Not tracked

Details

(Reporter)

Description

2 years ago
See a search like https://crash-stats.mozilla.com/search/?product=Firefox&_sort=-date&_facets=cpu_info&_facets=cpu_name&_facets=cpu_arch&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-cpu_arch

It seems the values are exactly the same for cpu_arch and cpu_name. Seems excessive. 

Also, cpu_info looks totally different, but its name is almost the same.
(Reporter)

Comment 1

2 years ago
Also, when you do a search like this:
https://crash-stats.mozilla.com/api/SuperSearch/?product=Firefox&_results_number=1&_columns=cpu_arch&_columns=cpu_name&_facets_size=0
You often get a struct that looks like this:

"hits": [
    {
        "cpu_arch": [
            "x86",
            "x86"
        ],
        "cpu_name": [
            "x86",
            "x86"
        ]
    }
],

So not only do the two keys hold the same data, the values seem to be lists of repeated values.
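
To make the observation above easier to reproduce, here is a minimal Python sketch. It rebuilds the query string from the URL quoted in this comment and collapses the repeated list values on the client side. The helper names `build_query_url` and `collapse_repeats` are hypothetical and not part of Socorro. The sample hit is copied from the struct above, so no request is made.

```python
from urllib.parse import urlencode

# Base URL copied from the search quoted in this comment.
API_URL = "https://crash-stats.mozilla.com/api/SuperSearch/"


def build_query_url(columns):
    """Build a SuperSearch URL that requests the given columns (hypothetical helper)."""
    params = [("product", "Firefox"), ("_results_number", 1)]
    # Repeating the _columns key once per column matches the quoted URL.
    params += [("_columns", column) for column in columns]
    params.append(("_facets_size", 0))
    return API_URL + "?" + urlencode(params)


def collapse_repeats(hit):
    """Drop repeated list values, keeping first-seen order (hypothetical helper)."""
    return {
        key: list(dict.fromkeys(value)) if isinstance(value, list) else value
        for key, value in hit.items()
    }


# Sample hit copied from the struct above.
hit = {"cpu_arch": ["x86", "x86"], "cpu_name": ["x86", "x86"]}

print(build_query_url(["cpu_arch", "cpu_name"]))
print(collapse_repeats(hit))
```

This only works around the duplication for a consumer of the API. It does not explain why the lists contain repeats in the first place.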
(Reporter)

Comment 2

2 years ago
Notes-from-IRC:

11:39 <marco> peterbe: https://github.com/mozilla/socorro/blob/master/socorro/processor/general_transform_rules.py#L46
11:39 <marco> peterbe: looks like it's cpu_arch in the JSON dump and a new cpu_name is created using the value from that field
11:40 <marco> peterbe: does this mean the 'correct' one is cpu_arch?
11:41 <peterbe> No, I would bet on cpu_name. 
11:41 <marco> ok, thanks again
11:42 <peterbe> no wait. 
11:42 <peterbe> hang on
11:42 <peterbe> I’m looking at the admin page for SuperSearch fields. 
11:42 <peterbe> This is confusing to me. 
11:42 <peterbe> Every field has a name which maps to a name in the database. 
11:43 <peterbe> The “name” is what you see in the drop-down boxes on the UI. 
11:43 <peterbe> the “name in the database” is what ElasticSearch uses and knows about. 
11:43 <peterbe> The current mapping appears to be...
11:43 <peterbe> name: cpu_arch   name in database: cpu_name
11:43 <peterbe> name: cpu_info  name in database: cpu_info
11:44 <peterbe> name: cpu_name  name in database: cpu_name
11:44 <marco> ok, this is what we were seeing with the queries (one of cpu_arch or cpu_name was alias of the other)
11:44 <peterbe> So, elasticsearch only knows about “cpu_name”. But it’s got “two names” in the UI. 
11:45 <marco> so cpu_arch is alias of cpu_name
11:45 <peterbe> ha!
11:45 <peterbe> There’s a notes field. Here’s what it says for cpu_arch:
11:45 <peterbe> "The build architecture. Usually one of: 'x86', 'amd64' (a.k.a. x86-64), 'arm', 'arm64'. Duplicate of cpu_name, with a better name."
11:45 <peterbe> In other words, you should use cpu_arch. 
11:45 <peterbe> So cpu_arch is the correct name.
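
The mapping described in the IRC notes can be sketched in Python to show why the two UI names return identical data. The dictionary below copies the three name pairs from the log. The function `find_shared_database_names` is a hypothetical helper for illustration, not Socorro code.

```python
from collections import defaultdict

# Mapping copied from the IRC notes: UI name -> name in the database.
FIELDS = {
    "cpu_arch": "cpu_name",
    "cpu_info": "cpu_info",
    "cpu_name": "cpu_name",
}


def find_shared_database_names(fields):
    """Return database names that more than one UI name points to."""
    by_database_name = defaultdict(list)
    for ui_name, database_name in fields.items():
        by_database_name[database_name].append(ui_name)
    return {
        database_name: sorted(ui_names)
        for database_name, ui_names in by_database_name.items()
        if len(ui_names) > 1
    }


print(find_shared_database_names(FIELDS))
```

With the mapping from the log, the only shared database name is `cpu_name`, reached through both `cpu_arch` and `cpu_name`.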
(Reporter)

Comment 3

2 years ago
In conclusion, it seems we want people to use cpu_arch. Not cpu_name. 

(Note: this doesn't explain why sometimes the value of both of these fields is a list of duplicates.)

Can we ditch cpu_name as a drop-down option? And what impact will that have?

Comment 4

2 years ago
`cpu_arch` used to be a value in the json_dump. When we dropped it, that field stopped working, but some users were relying on it, so I changed it to be an alias of `cpu_name` (which it was already). Removing one of them will break our public API, and that is something that I think we should avoid. 

Note that both fields have a comment saying they are duplicates. I'm not sure what else I can do without breaking the API.
(Reporter)

Comment 5

2 years ago
Does it only break the API? If so, could we list aliases there and drop the old field from SuperSearch Fields?

Comment 6

2 years ago
Well, wouldn't that just move the problem from the SuperSearch Fields list to a hard-coded line of code in our Django webapp? 

What we could do is add real support for aliases in the fields list. Then it stays in a database, we can add or remove aliases as we want, and it's cleaner because we can hide these values in the UI.
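
As a rough illustration of the alias idea above, here is a minimal Python sketch. It assumes a hypothetical `aliases` attribute on each field record. The `SearchField` class and the `resolve` and `ui_choices` helpers are invented for this example and do not reflect Socorro's actual field schema.

```python
from dataclasses import dataclass, field


@dataclass
class SearchField:
    # Hypothetical record shape, not Socorro's real field definition.
    name: str              # canonical name shown in the UI
    in_database_name: str  # name Elasticsearch knows about
    aliases: list = field(default_factory=list)  # old names still accepted by the API


# In the proposal these records would live in the database.
FIELDS = [
    SearchField("cpu_arch", "cpu_name", aliases=["cpu_name"]),
    SearchField("cpu_info", "cpu_info"),
]


def resolve(requested_name, fields=FIELDS):
    """Map a canonical name or an alias to its field record."""
    for search_field in fields:
        if requested_name == search_field.name or requested_name in search_field.aliases:
            return search_field
    raise KeyError(requested_name)


def ui_choices(fields=FIELDS):
    """Only canonical names appear in the drop-downs; aliases stay hidden."""
    return [search_field.name for search_field in fields]


print(resolve("cpu_name").name)
print(ui_choices())
```

Under these assumptions, an API request for `cpu_name` keeps working because it resolves to the `cpu_arch` record, while the UI lists only `cpu_arch` and `cpu_info`.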

Comment 7

2 years ago
It's also an option to break it and make users update the string.

This is just one instance, but we will eventually have to figure out a plan for making breaking changes. Versioning, communication channels, etc. This could be the first one.
Assignee: adrian → nobody