Closed Bug 1045976 Opened 10 years ago Closed 10 years ago

[MTBF][B2G][Stability] 319 Flame failed and keeps hanging

Categories

(Firefox OS Graveyard :: Stability, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.0+)

RESOLVED WORKSFORME
blocking-b2g 2.0+

People

(Reporter: wachen, Unassigned)

References

Details

http://mtbf-1:8080/view/v2.0/job/flame.v2.0.mtbf.319/46/

The phone somehow crashed.
 1. The screen is on, but manual control does not work.
 2. "adb devices" indicates that the device is still connected.
 3. Other adb commands, including "shell" and "logcat", don't work.
 4. Marionette can't detect the hang.
 5. After unplugging the phone, the phone still does not work, but the MTBF Jenkins task fails.

Related Actions:
 1. Viral is looking at the issue
 2. We need to figure out a detection mechanism for this kind of hang.
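
A minimal sketch of one possible detection mechanism, assuming the hang always shows up as an adb command that never returns (symptom 3 above): run a cheap command with a timeout and treat a timeout as a hang. The helper names `run_with_timeout` and `device_shell_alive`, the 10-second timeout, and the `echo ok` probe are assumptions for illustration; none of them come from the MTBF harness.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout, or None if it times out or cannot start."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return None
    except OSError:
        # For example, the binary is not installed on this host.
        return None
    return result.stdout


def device_shell_alive(serial, timeout_s=10.0):
    """Hypothetical probe: does "adb shell" answer within timeout_s seconds?"""
    out = run_with_timeout(["adb", "-s", serial, "shell", "echo", "ok"], timeout_s)
    # The output is checked instead of the exit status, on the assumption
    # that older adb versions may not forward the remote exit code.
    return out is not None and "ok" in out
```

A harness could call `device_shell_alive(serial)` between test cases and stop the run as soon as it returns False, instead of waiting for the Jenkins task to fail.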
Blocks: MTBF-B2G
Viral, thanks for helping. If you need any other help, please don't hesitate to contact me.
Flags: needinfo?(vwang)
Gaia      8cb1a949f2e9650bb2c5598e78a6f24a58bbaf97
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/4bd4b0ae7bbe
BuildID   20140721000201
Version   32.0a2
ro.build.version.incremental=109
ro.build.date=Mon Jun 16 16:51:29 CST 2014
For now we only have console output (reading kmsg from the device), and no useful log can be found.

We don't have console input for debugging; let's see if the partner can help us enable it.
At least we would get more logs next time if it were enabled.
Flags: needinfo?(vwang)
Kevin, can you help us communicate with the partner on this issue?

Viral, can you upload the logs you have, in case they help?
Flags: needinfo?(khu)
Francis, can you help to talk with the Flame partner about this issue if it's required? Thanks.
Flags: needinfo?(khu) → needinfo?(frlee)
This was already communicated to T2M this morning. There's no extra serial port that we can use for debugging unless we disassemble that specific Flame device.

Viral and I are checking whether it's possible to enable USB as a serial port via kernel config. If so, I will ask T2M to enable it in the next ROM release.
Flags: needinfo?(frlee)
Walter - This is a MTBF-blocker gaia/gecko bug that's causing us to not hit 100 MTBF on 319 MB right now, right? If so, should we nominate this to block 2.0?
Flags: needinfo?(wachen)
This is a blocker for us getting to 100 MTBF. However, it is hard to reproduce, so I don't know if we should take it for 2.0.
Flags: needinfo?(wachen) → needinfo?(jsmith)
(In reply to Walter Chen[:ypwalter][:wachen] from comment #8)
> This is a blocker for us to getting into 100 MTBF. However, reproduction is
> hard, I don't know if we should take it to 2.0? ...

Talked w/rel man about this - given the fact that this is hard to reproduce, let's first see what the MTBF is after the other issues are fixed & see if this happens again after the other issues get fixed. If it happens again after the other issues are fixed, then let's triage this under 2.0?
Flags: needinfo?(jsmith)
[Blocking Requested - why for this release]: Phone hung hard.
blocking-b2g: --- → 2.0?
NI: Viral, given that you are in proximity to Walter, can you check on this issue with him and help investigate?
blocking-b2g: 2.0? → 2.0+
Flags: needinfo?(vwang)
So far we have only hit this hang once, about 10 days ago.
In that case, adb did not work and we couldn't do any analysis.
That's why we need a serial console for debugging.
However, the serial console is not enabled in v122/v123 (user builds), so we need an engineer build for testing.
We will have more logs to analyze if we hit another hang on an engineer build.
Flags: needinfo?(vwang)
We have actually talked already. We will keep this bug open for further investigation.

The reason we can't handle it properly is that we have no way to debug it right now.

We will try to run it on the Mozilla base image in the next week or so. If it reproduces there, we can file bugs. We will keep this bug open for now.
QA Whiteboard: [2.0-signoff-need]
QA Whiteboard: [2.0-signoff-need] → [2.0-signoff-need+]
(In reply to Walter Chen[:ypwalter][:wachen] from comment #0)
> http://mtbf-1:8080/view/v2.0/job/flame.v2.0.mtbf.319/46/
> 
> The phone was somehow crashed.
>  1. The screen is on. Manual control is not working.
>  2. "adb devices" indicates that the device is somehow connected
>  3. Other adb commands include "shell" and "logcat" doesn't work
>  4. Marionette can't detect the hang.
>  5. After unplug the phone, the phone is not working but mtbf jenkins task
> fails
> 
> Related Actions:
>  1. Viral is looking at the issue
>  2. need to figure out the detection mechanism of this kind of hung.

Walter, can you be more precise about points 2 and 3? Is it possible that you see "offline" in the adb output?

This is something I have been hitting from time to time since the very beginning of B2G, and it seems to happen when adb somehow gets killed because of low memory or a bug on the device.
Flags: needinfo?(wachen)
2.
It shows the device like this (normal):
List of devices attached 
372ececf	device

3.
If you run "adb shell" or "adb logcat", it never responds.
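
To make the difference between the two cases discussed here concrete (a device reported as "offline" versus a device reported as "device" whose shell never answers), here is a rough sketch that parses the text printed by "adb devices" and combines it with the result of a separate shell probe. The function names `parse_adb_devices` and `classify`, and the labels "hung", "ok" and "missing", are hypothetical; how `shell_responds` is obtained is left to the caller.

```python
def parse_adb_devices(output):
    """Map serial -> state from the text printed by "adb devices"."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and "* daemon ... *" notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def classify(states, serial, shell_responds):
    """Label a device using its adb state plus a shell probe result."""
    state = states.get(serial)
    if state is None:
        return "missing"
    if state != "device":
        # For example "offline", the case asked about above.
        return state
    # Listed as "device": only the shell probe can tell the two cases apart.
    return "ok" if shell_responds else "hung"


sample = "List of devices attached \n372ececf\tdevice\n"
print(classify(parse_adb_devices(sample), "372ececf", shell_responds=False))
```

With the output pasted above and a shell that never answers, this labels the device "hung", which is the symptom reported in this bug.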
Flags: needinfo?(wachen)
Does that match what you wanted to know? I have actually seldom hit it in v2.0 for a while, since some bugs were fixed. Did it happen in v2.0 or master for you?
Flags: needinfo?(lissyx+mozillians)
No, your symptoms seem unrelated to what I was thinking of :(
Flags: needinfo?(lissyx+mozillians)
Okay, thanks a lot. Do you have a bug ID so that I can track it further? If I find anything related or useful, I can update you in that bug.
Walter, are you still seeing this in the latest run? Otherwise I would like to mark this as resolved WFM or move it to 2.1, as I don't think this is actionable.
Walter - Can you find out if this is still happening on the latest 2.0?
Flags: needinfo?(wachen)
I hadn't seen this for a while before we switched to v2.1.

However, I still need to verify it again after we finish the initialization of the v2.1 runs.
ni?=myself for the record.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
It has not happened again in the 2 most recent runs.
Flags: needinfo?(wachen)
Resolution: FIXED → WORKSFORME