prod Socorro /200912/adu/byday web service not aggregating oses properly

RESOLVED FIXED

Status

--
major
RESOLVED FIXED
9 years ago
7 years ago

People

(Reporter: ozten, Assigned: griswolf)

Tracking

Trunk
x86
Mac OS X
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

9 years ago
Calls to /200912/adu/byday have results that vary over time.

I've traced the call and see that multiple 'Windows' rows are returned, but the python code maps this into a response without aggregating these various Windows rows.
(Reporter)

Comment 1

9 years ago
Given

select
          CAST(ceil(EXTRACT(EPOCH FROM (window_end - timestamp without time zone '2010-01-24 00:00:00' - interval '8 hours')) / 86400) AS INT) * interval '24 hours' + timestamp without time zone '2010-01-24 00:00:00' as day,
          case when os.os_name = 'Windows NT' then
            'Windows'
          when os.os_name = 'Mac OS X' then
            'Mac'
          else
            os.os_name
          end as os_name,
          sum(count)
      from
          top_crashes_by_signature tcbs
              join osdims os on tcbs.osdims_id = os.id
                  and os.os_name in ('Windows','Mac OS X','Linux','Windows NT')
      where
          (timestamp without time zone '2010-01-24 00:00:00' - interval '8 hours') < window_end
          and window_end <= (timestamp without time zone '2010-02-07 00:00:00' - interval '8 hours')
          and productdims_id = 171
      group by
          day,
          os_name
      order by
          1, 2;

Results:
2010-01-24 00:00:00	Linux	20
2010-01-24 00:00:00	Mac	8295
2010-01-24 00:00:00	Windows	61472
2010-01-24 00:00:00	Windows	12
2010-01-25 00:00:00	Linux	46
2010-01-25 00:00:00	Mac	16335
2010-01-25 00:00:00	Windows	8
2010-01-25 00:00:00	Windows	109567
2010-01-26 00:00:00	Linux	67
2010-01-26 00:00:00	Mac	16655
2010-01-26 00:00:00	Windows	114005
2010-01-26 00:00:00	Windows	5
2010-01-27 00:00:00	Linux	74
2010-01-27 00:00:00	Mac	13657
2010-01-27 00:00:00	Windows	98353
2010-01-28 00:00:00	Linux	70
2010-01-28 00:00:00	Mac	16042
2010-01-28 00:00:00	Windows	4
2010-01-28 00:00:00	Windows	109888
2010-01-29 00:00:00	Linux	39
2010-01-29 00:00:00	Mac	11302
2010-01-29 00:00:00	Windows	80232
2010-01-29 00:00:00	Windows	6
2010-01-30 00:00:00	Linux	22
2010-01-30 00:00:00	Mac	6694
2010-01-30 00:00:00	Windows	49526
2010-01-30 00:00:00	Windows	2
2010-01-31 00:00:00	Linux	17
2010-01-31 00:00:00	Mac	6854
2010-01-31 00:00:00	Windows	2
2010-01-31 00:00:00	Windows	46473
2010-02-01 00:00:00	Linux	13
2010-02-01 00:00:00	Mac	4756
2010-02-01 00:00:00	Windows	33247
2010-02-01 00:00:00	Windows	2
2010-02-02 00:00:00	Linux	7
2010-02-02 00:00:00	Mac	1461
2010-02-02 00:00:00	Windows	11158
2010-02-03 00:00:00	Linux	35
2010-02-03 00:00:00	Mac	10395
2010-02-03 00:00:00	Windows	1
2010-02-03 00:00:00	Windows	65585
2010-02-04 00:00:00	Linux	47
2010-02-04 00:00:00	Mac	14806
2010-02-04 00:00:00	Windows	92339
2010-02-04 00:00:00	Windows	3
2010-02-05 00:00:00	Linux	45
2010-02-05 00:00:00	Mac	15010
2010-02-05 00:00:00	Windows	97549
2010-02-05 00:00:00	Windows	5
2010-02-06 00:00:00	Linux	55
2010-02-06 00:00:00	Mac	15235
2010-02-06 00:00:00	Windows	96039
2010-02-06 00:00:00	Windows	5
2010-02-07 00:00:00	Linux	11
2010-02-07 00:00:00	Mac	6737
2010-02-07 00:00:00	Windows	39244
2010-02-07 00:00:00	Windows	1

JSON response is:
{
    'product': 'Firefox',
    start_date': '2010-01-24 00:00:00',
    'end_date': ' 2010-02-07 00:00:00',
    ...
    [{'date': '2010-01-24',
                                    'os: 'Windows',
                                    'users: '57434300',
                                    'crashes: '12'},
...
},{'date: '2010-01-25',
                                    'os: 'Windows',
                                    'users: ' 67752231',
                                    'crashes: ' 109567'}
...
}

Either we should look at how osdims is used or we should aggregate os numbers before formatting response.
(Reporter)

Updated

9 years ago
Blocks: 544583
fascinating.  I think we're seeing something very very subtle in the SQL or it's a bug in PostgreSQL.  I can correct the behavior of this query by simply changing the name of the second column:

select
          CAST(ceil(EXTRACT(EPOCH FROM (window_end - timestamp without time
zone '2010-01-24 00:00:00' - interval '8 hours')) / 86400) AS INT) * interval
'24 hours' + timestamp without time zone '2010-01-24 00:00:00' as day,
          case when os.os_name = 'Windows NT' then
            'Windows'
          when os.os_name = 'Mac OS X' then
            'Mac'
          else
            os.os_name
          end as bloop,
          sum(count)
      from
          top_crashes_by_signature tcbs
              join osdims os on tcbs.osdims_id = os.id
                  and os.os_name in ('Windows','Mac OS X','Linux','Windows NT')
      where
          (timestamp without time zone '2010-01-24 00:00:00' - interval '8
hours') < window_end
          and window_end <= (timestamp without time zone '2010-02-07 00:00:00'
- interval '8 hours')
          and productdims_id = 171
      group by
          day,
          bloop
      order by
          1, 2;

         day         |  bloop  |  sum   
---------------------+---------+--------
 2010-01-24 00:00:00 | Linux   |     20
 2010-01-24 00:00:00 | Mac     |   8295
 2010-01-24 00:00:00 | Windows |  61484
 2010-01-25 00:00:00 | Linux   |     46
 2010-01-25 00:00:00 | Mac     |  16335
 2010-01-25 00:00:00 | Windows | 109575
 2010-01-26 00:00:00 | Linux   |     67
 2010-01-26 00:00:00 | Mac     |  16655
 2010-01-26 00:00:00 | Windows | 114010
 2010-01-27 00:00:00 | Linux   |     74
 2010-01-27 00:00:00 | Mac     |  13657
 2010-01-27 00:00:00 | Windows |  98353
 2010-01-28 00:00:00 | Linux   |     70
 2010-01-28 00:00:00 | Mac     |  16042
 2010-01-28 00:00:00 | Windows | 109892
 2010-01-29 00:00:00 | Linux   |     39
 2010-01-29 00:00:00 | Mac     |  11302
 2010-01-29 00:00:00 | Windows |  80238

...
(Reporter)

Comment 3

9 years ago
(In reply to comment #2)
Excellent! Thanks lars!
Indeed, lose 'bloop' and get 'osname' and it works as expeccted
After digging through the SQL standard, I finally understand what's going on here.  Select list aliases are not understood in the group by clause in standard SQL.  PostgreSQL, however, allows it as an extension.  In the original SQL statement, 'os_name' matched a existing column name, so PostgreSQL took that existing column name as the group by parameter.  PostgreSQL followed the standard even though 'os_name' was not qualified with a tag.  My replacement from Comment #1 worked because it invoked the PostgreSQL extension behavior.  The correct standard SQL should be:

select
          CAST(ceil(EXTRACT(EPOCH FROM (window_end - timestamp without time
zone '2010-01-24 00:00:00' - interval '8 hours')) / 86400) AS INT) * interval
'24 hours' + timestamp without time zone '2010-01-24 00:00:00' as day,
          case when os.os_name = 'Windows NT' then
            'Windows'
          when os.os_name = 'Mac OS X' then
            'Mac'
          else
            os.os_name
          end as os_name,
          sum(count)
      from
          top_crashes_by_signature tcbs
              join osdims os on tcbs.osdims_id = os.id
                  and os.os_name in ('Windows','Mac OS X','Linux','Windows NT')
      where
          (timestamp without time zone '2010-01-24 00:00:00' - interval '8
hours') < window_end
          and window_end <= (timestamp without time zone '2010-02-07 00:00:00'
- interval '8 hours')
          and productdims_id = 171
      group by
          1, 2
      order by
          1, 2;

While I have the solution, I should not be the one that actually fixes this bug: I'm not on the payroll right now.  This bug needs to be taken by someone.  Griswolf?
Sure. Thanks
Assignee: nobody → griswolf
Status: NEW → ASSIGNED
Fixed in revision 1772
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
(Reporter)

Comment 8

9 years ago
(In reply to comment #7)
Awesome, thanks for the quick turnaround!

Can you track staging this.

Once it's staged and verified, I'll track pushing to prod.

Thanks.
Depends on: 546523
re-opening to track resolution of bug #546523
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
for some reason that I don't understand, IT didn't how / where to run this on stage, so it has been delayed.
(Reporter)

Comment 11

9 years ago
This shipped with 1.5.
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago9 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.