1246701 - Prepare download-stats.mozilla.org for log ingestion

Reporter

Description

•

9 years ago

Stub installer pings are simple HTTP GETs to download-stats.mozilla.org. This bug tracks work that Wes needs to do to prepare for ingesting those logs into heka. This may mean moving that host into cloud services operations or something else.

Thomas Huelbert

Comment 1

•

9 years ago

please update with blocker tickets

Priority: -- → P2

Rob Miller [:rmiller]

Updated

•

9 years ago

Points: --- → 3

Priority: P2 → P1

Wesley Dawson [:whd]

Comment 2

•

9 years ago

The code for loading redshift is now living at https://github.com/whd/dsmo_load (based off of https://github.com/rafrombrc/push_derived), non-boilerplate files of which are: https://github.com/whd/dsmo_load/blob/master/hindsight/hs_run/output/dsmo_redshift.lua https://github.com/whd/dsmo_load/blob/master/heka/usr/share/heka/lua_filters/nginx_redshift.lua I'm going with a hybrid of heka+hindsight approach (heka tcpoutput + hindsight tcpinput) because our standard provisioning logic is all heka-based and we don't have all the necessary pieces implemented in hindsight yet. The steps remaining, roughly, are: Provisioning logic (puppet+CFN, already a WIP) Test provisioning in stage Acquire SSL cert for download-stats from IT Provision prod, change DNS record from A to CNAME and point at cloud services environment Set up redshift access for :mhowell, :bsmedberg and others I anticipate cutting over DNS early next week. Fortunately(?) there is really nothing to cut over from right now because per :mpressman and empirically d-s.m.o is currently being /dev/null'ed.

Wesley Dawson [:whd]

Comment 3

•

9 years ago

I cut this over on Monday, and things appear to be more or less working. I've hooked it into redash instead of setting up VPN access, see https://sql.telemetry.mozilla.org/queries/78 for an example. The data from 20160322 had some bugs related to processing the error codes (https://github.com/whd/dsmo_load/commit/0f8cf7b1aa2959ca6e4c07f852f46a1987977f0b) but we can backfill this since we keep the "raw" data in an s3 bucket. I've elected not to fix things because the code in dsmo_load hasn't been reviewed and possibly contains other errors, which should be fixed before I do a proper backfill. :mhowell can you take a look and see if the data showing up in redshift looks correct? As an aside, there seem to be a non-zero number of "v5" stub installer pings that the parsing logic is dropping. We can add support for this format if it is desirable.

Molly Howell (she/her) (mostly inactive)

Comment 4

•

9 years ago

(In reply to Wesley Dawson [:whd] from comment #3) > :mhowell can you take a look and see if the data showing up in redshift > looks correct? Yeah, I took a quick peek on redash and the data looks good. The only confusing thing I immediately see is a lot of rows (~5%) with version and build_id unset; zero is normal, but empty really should not happen. I don't know what to think of that at the moment; there's as much a chance of that being a bug in the installer as anything else. > As an aside, there seem to be a non-zero number of "v5" stub installer pings > that the parsing logic is dropping. We can add support for this format if it > is desirable. I don't think that will be necessary; it looks like v6 was introduced about Firefox 30, so those pings are for versions older than that.

Molly Howell (she/her) (mostly inactive)

Comment 5

•

9 years ago

(In reply to Matt Howell [:mhowell] from comment #4) > (In reply to Wesley Dawson [:whd] from comment #3) > > As an aside, there seem to be a non-zero number of "v5" stub installer pings > > that the parsing logic is dropping. We can add support for this format if it > > is desirable. > > I don't think that will be necessary; it looks like v6 was introduced about > Firefox 30, so those pings are for versions older than that. Slight adjustment to this: we're fine having the parser drop v5 pings, but we do need to have a count somewhere of dropped pings so that we can tell when we break something. Can that be made available somehow?

Flags: needinfo?(whd)

Wesley Dawson [:whd]

Comment 6

•

9 years ago

One option is to throw URLs that fail to parse into a different table with a schema like (timestamp, path, reason) which could then be counted. Does that seem reasonable?

Flags: needinfo?(whd)

Molly Howell (she/her) (mostly inactive)

Comment 7

•

9 years ago

(In reply to Wesley Dawson [:whd] from comment #6) > One option is to throw URLs that fail to parse into a different table with a > schema like (timestamp, path, reason) which could then be counted. Does that > seem reasonable? Yes, that sounds like it would work well.

Thomas Huelbert

Comment 8

•

9 years ago

Does it makes sense to close this and open a new one for the dropped pings?

Flags: needinfo?(mhowell)

Molly Howell (she/her) (mostly inactive)

Comment 9

•

9 years ago

I don't think so, since it's part of the same data set? But I don't feel strongly either way; if it would be easier to deal with as a separate bug, go for it. There's also one more thing I need that might or might not merit a separate bug, which is a download_stats view that unions all the download_stats_{date} tables. I think I don't have the permissions to make that myself.

Flags: needinfo?(mhowell)

Katie Parlante

Comment 10

•

9 years ago

Due to e10s work and ops related fires, work on this bug is going to be stalled. Moving to P2 until whd comes up for air.

Priority: P1 → P2

Blake Imsland [:robotblake]

Updated

•

9 years ago

Assignee: whd → bimsland

Points: 3 → 1

Priority: P2 → P1

Blake Imsland [:robotblake]

Comment 11

•

9 years ago

I will be taking over work on this.

Katie Parlante

Updated

•

9 years ago

Whiteboard: [SvcOps]

Molly Howell (she/her) (mostly inactive)

Comment 12

•

9 years ago

(In reply to Blake Imsland [:robotblake] from comment #11) > I will be taking over work on this. Hi Blake, sorry for taking a few days to get back to this. The one thing I need right now on my end is the view over the download_stats_* tables discussed above; I don't know a good way to write queries against the changing list of tables otherwise. Following that would be the table of failed parses (comment 6). Do you have a sense for when you might be able to work on those?

Blake Imsland [:robotblake]

Comment 13

•

9 years ago

(In reply to Matt Howell [:mhowell] from comment #12) > (In reply to Blake Imsland [:robotblake] from comment #11) > > I will be taking over work on this. > > Hi Blake, sorry for taking a few days to get back to this. The one thing I > need right now on my end is the view over the download_stats_* tables > discussed above; I don't know a good way to write queries against the > changing list of tables otherwise. Following that would be the table of > failed parses (comment 6). Do you have a sense for when you might be able to > work on those? I've done a bit of work on those already and they will be one of my priorities over the next week.

Cory Price [:ckprice] (bugmail disabled, NI me!)

Updated

•

9 years ago

Blocks: fx-stub-attribution

Molly Howell (she/her) (mostly inactive)

Comment 14

•

9 years ago

FYI, a change to the stub ping URL format recently landed on Nightly. Bug 1261140 bumped the version field to v7 and added a new field (path component) to the end which contains attribution data. The format of that data isn't 100% settled, but this process doesn't need to parse anything in there, at least for now; just treat it as an opaque URL-encoded string.

Flags: needinfo?(bimsland)

Blake Imsland [:robotblake]

Comment 15

•

9 years ago

:mhowell due to some changes with how we're doing this sort of logging and the re:dash work this fell off my radar. The URL format change shouldn't be too difficult to add and I've got an idea for the download_stats_* view that is crude but should work for now. Since this has sorta become a metabug, just want to be clear with what remaining work it covers... * Create and keep up to date a download_stats view that covers download_stats_* tables. * Add field for v7 pings to redshift table and parsing code to dsmo_load to handle the new field. * Deploy error parsing changes into production (v5 / bad structure). * Backfill v7 pings and pings that failed to parse into their respective tables. * Open support request with AWS regarding sporadic redshift batch insert errors. Let me know if you see anything missing from that list, I ordered them in roughly the order it sounds like they should be prioritised. With that said, I'm going to drop the priority on this meta bug and spawn new bugs for each of those (as blockers) so I can get a better handle on the remaining work.

Flags: needinfo?(bimsland) → needinfo?(mhowell)

Priority: P1 → P2

Molly Howell (she/her) (mostly inactive)

Comment 16

•

9 years ago

Yes, you've captured all the remaining work that I'm aware of, and the ordering/prioritization makes sense to me. Thanks very much for the update and for getting all that information together.

Flags: needinfo?(mhowell)

Cory Price [:ckprice] (bugmail disabled, NI me!)

Comment 17

•

9 years ago

(In reply to Blake Imsland [:robotblake] from comment #15) > With that said, I'm going to drop the priority on this meta bug and spawn new bugs for > each of those (as blockers) so I can get a better handle on the remaining > work. :robotblake were these created? I'd like to begin tracking this work in our stub attribution checkins.

Flags: needinfo?(bimsland)

Blake Imsland [:robotblake]

Comment 18

•

9 years ago

* https://bugzilla.mozilla.org/show_bug.cgi?id=1290794 * https://bugzilla.mozilla.org/show_bug.cgi?id=1290795 * https://bugzilla.mozilla.org/show_bug.cgi?id=1290798 * https://bugzilla.mozilla.org/show_bug.cgi?id=1290800 * https://bugzilla.mozilla.org/show_bug.cgi?id=1290803

Flags: needinfo?(bimsland)

Cory Price [:ckprice] (bugmail disabled, NI me!)

Updated

•

9 years ago

No longer blocks: fx-stub-attribution

Blake Imsland [:robotblake]

Comment 19

•

9 years ago

After talking to :ckprice this is no longer a priority for this quarter.

Assignee: bimsland → nobody

Priority: P2 → P3

David Durst [:ddurst]

Updated

•

9 years ago

Depends on: 1290794

David Durst [:ddurst]

Updated

•

9 years ago

Depends on: 1290795

David Durst [:ddurst]

Updated

•

9 years ago

Depends on: 1290798

David Durst [:ddurst]

Updated

•

9 years ago

Depends on: 1290800

David Durst [:ddurst]

Updated

•

9 years ago

Depends on: 1290803

David Durst [:ddurst]

Updated

•

9 years ago

Whiteboard: [SvcOps] → [SvcOps][fce-active]

Blake Imsland [:robotblake]

Comment 20

•

9 years ago

Deployed the changes needed to store erroneous pings along with handling of v7 pings. The backfill (https://bugzilla.mozilla.org/show_bug.cgi?id=1290800) is next up.

Thomas Huelbert

Comment 21

•

9 years ago

Hey BDS - once Blake has finished with backfill (https://bugzilla.mozilla.org/show_bug.cgi?id=1290800) are we able to close this issue?

Flags: needinfo?(benjamin)

Benjamin Smedberg

Reporter

Comment 22

•

9 years ago

->mhowell

Flags: needinfo?(benjamin) → needinfo?(mhowell)

Molly Howell (she/her) (mostly inactive)

Comment 23

•

9 years ago

Looks to me like the answer is yes; there's still work to do on the larger attribution project that spawned this bug, but we seem to be in business with this log ingestion work. Thanks to robotblake and everyone else involved!

Flags: needinfo?(mhowell)

Romain Testard [:RT]

Updated

•

9 years ago

Depends on: 1319871

Mark Reid [:mreid]

Comment 24

•

8 years ago

(In reply to Matt Howell [:mhowell] from comment #23) > we seem to be in business with this log ingestion work. Per Matt's reply, I'm closing this out. If there is more work to do here, please reopen. Thanks!

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

David Durst [:ddurst]

Updated

•

7 years ago

Whiteboard: [SvcOps][fce-active] → [SvcOps][fce-active-legacy]

BMO Automation

Updated

•

7 years ago

Product: Cloud Services → Cloud Services Graveyard