Closed
Bug 1344774
Opened 8 years ago
Closed 5 years ago
Reported Overlapping Subsessions in Longitudinal
Categories
(Data Platform and Tools Graveyard :: Datasets: Longitudinal, enhancement, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: cameres, Unassigned, Mentored, NeedInfo)
Details
The following notebook shows that after deduplication (which removed a majority of the issues) there are still long subsessions that appear to overlap. One of the subsession pairs in the output of the last cell of my notebook is shown below.
NOTE: I am using the ordering of the arrays in longitudinal to determine prev_subsession_id. I mention this because longitudinal also has a field named 'prev_subsession_id', which I am not using here.
((prev_subsession_id, cur_subsession_id),
(
(prev_subsession_date, prev_subsession_length),
(cur_subsession_date, cur_subsession_length)
))
((u'7f429e0d-7372-45ad-bdb8-5584499e46a1',
u'675db484-0a61-4c71-9851-4a34e46098d1'),
((u'2016-11-07T00:00:00.000+09:00', 279104),
(u'2016-11-08T00:00:00.000+09:00', 11239)))
These are subsessions for a particular profile where the first subsession has a length of approximately 77 hours. The second subsession starts on the day after the first subsession starts, which in a conversation with @spenrose we decided should not happen. Although subsession lengths are known to be funky, they should not overlap.
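As a quick check of the arithmetic above, here is a minimal Python sketch (3.7+ for `datetime.fromisoformat`). It assumes that subsession lengths are in seconds and that the recorded start date is the real start time truncated to local midnight. The helper name `earliest_end` is hypothetical and does not come from the notebook.

```python
from datetime import datetime, timedelta

# Values copied from the subsession pair above.
prev_date, prev_length = "2016-11-07T00:00:00.000+09:00", 279104
cur_date, cur_length = "2016-11-08T00:00:00.000+09:00", 11239


def earliest_end(start_date, length_seconds):
    """Earliest possible end, if the subsession began exactly at the recorded midnight."""
    return datetime.fromisoformat(start_date) + timedelta(seconds=length_seconds)


prev_hours = prev_length / 3600.0
prev_end = earliest_end(prev_date, prev_length)
# Latest moment the second subsession could have started on its recorded day.
cur_latest_start = datetime.fromisoformat(cur_date) + timedelta(days=1)

print(round(prev_hours, 1))         # length of the first subsession in hours
print(prev_end.isoformat())         # earliest possible end of the first subsession
print(prev_end > cur_latest_start)  # True: day truncation alone cannot explain the overlap
```

Under these assumptions the first subsession runs for about 77.5 hours and cannot end before 2016-11-10T05:31:44+09:00, which is after the latest moment the second subsession could have started.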
https://gist.github.com/cameres/f9383c0c9813e63f9cc4b1b09de6613c
Comment 1•8 years ago
I can think of several fronts on which to advance this issue:
1) Root cause analysis for the client side. Are the pings chained via subsessionId? If not, do all fields that should be machine-invariant validate on these pings, or is it possible that we have found copied profiles?
2) Are one or both of the pings garbage which should not be included in analyses? How about the client record as a whole?
3) Building an infrastructure to prevent data like this from getting into longitudinal and other derived datasets. I am working on that.
4) Cleaning up longitudinal in the short term.
Georg and Roberto, what do you think?
Flags: needinfo?(rvitillo)
Flags: needinfo?(gfritzsche)
Comment 3•8 years ago
Adding needinfo for Connor based on Roberto's comment.
Flags: needinfo?(cameres)
I have the counts of how frequently this occurs in the notebook that I attached. I'm not sure if that answers Roberto's question.
Flags: needinfo?(cameres)
Comment 5•8 years ago
Connor, please submit a report to mozilla-reports with clearly stated conclusions and confidence intervals. It would be useful to answer the following questions:
- what's the percentage of profiles that have at least one overlapping session?
- what's the distribution of the percentage of overlapping sessions per profile?
- what's the distribution of the overlap duration?
- what's the distribution of subsession durations > 24h and how does it relate to the studied phenomenon?
- how do the above distributions vary per channel?
- how do the above distributions vary when considering only subsessions originating from recent Firefox builds?
- how do the above distributions vary when using docid or profileSubsessionCounter to dedupe sessions?
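The first question in the list above could be approached along these lines. This is only a sketch in plain Python rather than the notebook's actual code: the function names, the input shape (profile id mapped to a list of `(start_date, length_seconds)` tuples, oldest first), and the "profile-b" record are assumptions made for illustration. It only compares adjacent subsessions, and it flags a pair only when the overlap cannot be explained by start dates being truncated to the day.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def definitely_overlaps(prev, cur):
    """True if prev must still be running when cur starts, even allowing for day truncation."""
    (prev_date, prev_length), (cur_date, _) = prev, cur
    prev_earliest_end = datetime.fromisoformat(prev_date) + timedelta(seconds=prev_length)
    cur_latest_start = datetime.fromisoformat(cur_date) + DAY
    return prev_earliest_end > cur_latest_start


def percent_profiles_with_overlap(profiles):
    """Percentage of profiles with at least one definitely-overlapping adjacent pair."""
    if not profiles:
        return 0.0
    flagged = sum(
        any(definitely_overlaps(a, b) for a, b in zip(subs, subs[1:]))
        for subs in profiles.values()
    )
    return 100.0 * flagged / len(profiles)


# Toy input: "profile-a" uses the pair from the report; "profile-b" is made up.
profiles = {
    "profile-a": [("2016-11-07T00:00:00.000+09:00", 279104),
                  ("2016-11-08T00:00:00.000+09:00", 11239)],
    "profile-b": [("2016-11-07T00:00:00.000+09:00", 3600),
                  ("2016-11-08T00:00:00.000+09:00", 7200)],
}
print(percent_profiles_with_overlap(profiles))
```

The same per-profile flag could be grouped by channel or build to address the later questions, but that depends on fields not shown in this bug.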
Flags: needinfo?(cameres)
Great questions! Answers are in the works. The issue I currently see with answering the third question, and any other question that requires measuring the duration of the overlap, is that I have only been able to compute the overlap in days, because of the granularity of subsession start dates. For a ping that overlaps a previous ping, the duration of the overlap depends on when exactly the later ping starts.
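One possible way to work around the day granularity is to bracket the overlap instead of computing it exactly. The sketch below is an assumption-laden illustration, not the notebook's method: it assumes lengths are in seconds, that each recorded start date is the true start truncated to midnight, and that both dates carry the same UTC offset. The names `overlap_seconds` and `overlap_bounds` are hypothetical.

```python
from datetime import datetime

DAY = 86400  # seconds; start dates are only known to the day


def overlap_seconds(shift, prev_len, cur_len):
    """Overlap of [0, prev_len) and [shift, shift + cur_len), all in seconds."""
    return max(0, min(prev_len, shift + cur_len) - max(0, shift))


def overlap_bounds(prev_date, prev_len, cur_date, cur_len):
    """(lower, upper) bounds on the overlap, given day-granular start dates."""
    gap = int((datetime.fromisoformat(cur_date)
               - datetime.fromisoformat(prev_date)).total_seconds())
    # The true offset between the two starts lies within one day of the recorded gap.
    lo_shift, hi_shift = gap - DAY, gap + DAY
    # Overlap rises, plateaus, then falls as the shift grows,
    # so its minimum over the feasible range is at one of the endpoints.
    lower = min(overlap_seconds(lo_shift, prev_len, cur_len),
                overlap_seconds(hi_shift, prev_len, cur_len))
    # Its maximum is at the feasible shift closest to the plateau.
    plateau_lo = min(0, prev_len - cur_len)
    best_shift = min(max(plateau_lo, lo_shift), hi_shift)
    upper = overlap_seconds(best_shift, prev_len, cur_len)
    return lower, upper


print(overlap_bounds("2016-11-07T00:00:00.000+09:00", 279104,
                     "2016-11-08T00:00:00.000+09:00", 11239))
```

For the pair in the original report both bounds come out equal to the second subsession's length (11239 seconds), meaning that under these assumptions the second subsession lies entirely inside the first. In other cases the two bounds can differ by up to the full overlap, which is the ambiguity described above.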
Comment 7•8 years ago
Adding a priority to get it out of triage and assigning to Connor as he is on it. Please add points when you get a chance.
Assignee: nobody → cameres
Priority: -- → P1
Updated•8 years ago
Flags: needinfo?(gfritzsche)
Updated•8 years ago
Component: Metrics: Pipeline → Datasets: Longitudinal
Product: Cloud Services → Data Platform and Tools
Comment 8•8 years ago
Hey cameres - If you're still working on this do you mind adding points? If not, please update the priority accordingly.
Comment 9•8 years ago
Last week was Connor's last week. Removing him as assignee and dropping the priority for now.
Assignee: cameres → nobody
Points: --- → 2
Priority: P1 → P3
Comment 10•5 years ago
Longitudinal has been decommissioned per Bug 1572033.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Updated•5 years ago
Product: Data Platform and Tools → Data Platform and Tools Graveyard