Closed Bug 934424 Opened 11 years ago Closed 6 years ago

[B2G][Helix][stability][fengximing]Run the monkey test for long time, only backlight of homekey is bright, the screen is black, and press power key, it's no reaction, this issue has occurred in the six phone.

Categories

(Firefox OS Graveyard :: Vendcom, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: lecky.wanglei, Unassigned)

Details

(Whiteboard: [POVB])

Attachments

(4 files)

1.03 KB, application/x-zip-compressed
Details
723 bytes, patch
Details | Diff | Splinter Review
2.28 MB, application/x-zip-compressed
Details
2.48 MB, application/x-zip-compressed
Details
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; aff-kingsoft-ciba; .NET4.0E; .NET4.0C)

Steps to reproduce:

1, run the monkey test for 30 hours
2, the second day, we went to wake up the phone.



Actual results:

3, we found the screen is black, only backlight of homekey is bright, we want to press power key, but it's no reaction.

normal log:
report_hs_key:Press power key ,key=81 ,now time (01:03:12.482869704 UTC)
request_suspend_state: wakeup (3->0) at 898566547995 (1980-01-06 01:03:12.494738037 UTC)
late_resume: call handlers
lcd_debug_malloc_get_para: Can not open /data/hw_lcd_init_param.txt
report_hs_key:Press power key ,key=255 ,now time (01:03:12.740829704 UTC)
[LCD_DEBUG] leave mipi_otm8018b_lcd_on
msm_fb_blank_sub:Waiting for LCD resume ,then set backlight level=31
synaptics_rmi4_touch is resume!
late_resume: done
request_suspend_state: sleep (0->3) at 908702439658 (1980-01-06 01:03:22.630629700 UTC)
early_suspend: call handlers
[LCD_DEBUG] leave mipi_otm8018b_lcd_off
batt_id_lim_volt = 4200


the log of this issue:
report_hs_key:Press power key ,key=255 ,now time (12:44:30.242365598 UTC)     
report_hs_key:Press power key ,key=81 ,now time (12:44:30.840162257 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:31.759318929 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:32.425585594 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:33.174577260 UTC)      
report_hs_key:Press power key ,key=255 ,now time (12:44:33.326890595 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:44:34.313707264 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:44:35.003083928 UTC)     
report_hs_key:Press power key ,key=81 ,now time (12:44:35.482527263 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:38.020702260 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:38.729698926 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:39.456843927 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:42.107015594 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:42.967352258 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:44:43.561562259 UTC)      
report_hs_key:Press power key ,key=255 ,now time (12:44:44.297667262 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:44:58.175608920 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:44:59.344245582 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:45:00.195417252 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:45:01.008120584 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:45:01.890202250 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:45:02.677283917 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:45:03.403643916 UTC)     
report_hs_key:Press power key ,key=255 ,now time (12:45:04.173422257 UTC)     
report_hs_key:Press power key ,key=81 ,now time (12:45:04.619610583 UTC)      
report_hs_key:Press power key ,key=81 ,now time (12:45:05.283707250 UTC)
Severity: normal → blocker
blocking-b2g: --- → hd?
OS: All → Gonk (Firefox OS)
Priority: -- → P2
Hardware: All → ARM
Attached file 1104.zip
Please see the log in the attachment 826706 [details].
This issue has occurred in the six phone.

Thanks.
Summary: [B2G][Helix][stability][fengximing]Run the monkey test for long time, only backlight of homekey is bright, the screen is black, and press power key of the phone no reaction. → [B2G][Helix][stability][fengximing]Run the monkey test for long time, only backlight of homekey is bright, the screen is black, and press power key, it's no reaction, this issue has occurred in the six phone.
Is this found on the Mozilla build?
Hi Wayne,

It was also found on the Mozilla build, but the time less than our build version, probably 1 second or 1.5 seconds.

Thanks.
(In reply to lecky from comment #5)
> It was also found on the Mozilla build, but the time less than our build
> version, probably 1 second or 1.5 seconds.

Mozilla build wakes up after 1~1.5 seconds.


(In reply to lecky from comment #0)
> Actual results:
> 
> 3, we found the screen is black, only backlight of homekey is bright, we
> want to press power key, but it's no reaction.

Your build has no reaction at all...

I don't think this is the same bug.

Can you reproduce the same bug on Mozilla build?
(In reply to Wayne Chang [:wchang] from comment #6)
> Your build has no reaction at all...
> I don't think this is the same bug.

Hi Wayne,

I'm sorry, I made a mistake, you are right, this is not the same bug, please ignore the content in reply to lecky from comment #5.

I don't know whether this issue could reproduce on Mozilla build, but this issue has occurred in the six phone, 
the probability is very high, so I think this issue is very serious, I would like to consult you how to analyze this.

When I restarted b2g, the phone resumed.
> adb shell
> stop b2g
> start b2g

Thanks.
I see power state in /sys/power/state, it's always 'mem'.

root@android:/ # echo "on" > /sys/power/state
echo "on" > /sys/power/state
root@android:/ # cat /sys/power/state
cat /sys/power/state
mem
root@android:/ # echo "standby" > /sys/power/state
echo "standby" > /sys/power/state
1|root@android:/ # cat /sys/power/state
cat /sys/power/state
mem
Hi Wayne,

I found these log:
11-03 10:01:18 <3>[21527, thermald] [146641.543177] Out of memory: Kill process 179 (b2g) score 834 or sacrifice child
11-03 10:01:18 <3>[21527, thermald] [146641.543220] Killed process 179 (b2g) total-vm:542012kB, anon-rss:329572kB, file-rss:260kB

the b2g was killed, and it was restarted, but it failed to restart.
                |     megabytes    |

NAME   PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER

 b2g 21538    0 10.4 10.5 11.0  71.4       0 root

Another example, the phone stay at the movistar logo, do you remember it?
One of reasons is also because the b2g crashed and failed to restart.

If the b2g restart, but it should not cause the system can not be entered,
I think we need to slove that the system can't be entered after the b2g crash. 

Thanks.
Flags: needinfo?(wchang)
Let's continue on this when you can reproduce it on a mozilla build and provide the appropriate information. We're unable to analyze your build as we do not have access to the details.
Flags: needinfo?(wchang)
Priority: P2 → P1
Hi Wayne,

Please see these log, b2g evidently has exception.

11-07 04:48:48 <7>[159, b2g] [28934.354517] b2g: unhandled page fault (11) at 0x8000001c, code 0x005
11-07 04:48:48 <1>[159, b2g] [28934.354548] pgd = d771c000
11-07 04:48:48 <1>[159, b2g] [28934.354560] [8000001c] *pgd=00000000
11-07 04:48:48 <4>[159, b2g] [28934.354572] 
11-07 04:48:48 <4>[159, b2g] [28934.354580] Pid: 159, comm:                  b2g
11-07 04:48:48 <4>[159, b2g] [28934.354598] CPU: 0    Not tainted  (3.0.21-perf-gb59a337-05320-g4aa3538-dirty #1)
11-07 04:48:48 <4>[159, b2g] [28934.354630] PC is at 0x416ffeac
11-07 04:48:48 <4>[159, b2g] [28934.354642] LR is at 0xffffffa5
11-07 04:48:48 <4>[159, b2g] [28934.354655] pc : [<416ffeac>]    lr : [<ffffffa5>]    psr: 40000030
11-07 04:48:48 <4>[159, b2g] [28934.354663] sp : be9135a0  ip : 20000000  fp : 4edb086c
11-07 04:48:48 <4>[159, b2g] [28934.354677] r10: 00000020  r9 : 0000000d  r8 : 00000000
11-07 04:48:48 <4>[159, b2g] [28934.354690] r7 : 00000241  r6 : 4edb0840  r5 : 4edb0014  r4 : 4edb0848
11-07 04:48:48 <4>[159, b2g] [28934.354705] r3 : 4edb0090  r2 : 80000000  r1 : 000000c3  r0 : 0000000d
11-07 04:48:48 <4>[159, b2g] [28934.354720] Flags: nZcv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
11-07 04:48:48 <4>[159, b2g] [28934.354735] Control: 10c5387d  Table: 2791c04a  DAC: 00000015
11-07 04:48:48 <4>[159, b2g] [28934.354780] [<c0052460>] (unwind_backtrace+0x0/0x130) from [<c0621a64>] (dump_stack+0x18/0x1c)
11-07 04:48:48 <4>[159, b2g] [28934.354807] [<c0621a64>] (dump_stack+0x18/0x1c) from [<c004ce74>] (show_regs+0x44/0x50)
11-07 04:48:48 <4>[159, b2g] [28934.354832] [<c004ce74>] (show_regs+0x44/0x50) from [<c0056394>] (__do_user_fault+0x5c/0xa4)
11-07 04:48:48 <4>[159, b2g] [28934.354857] [<c0056394>] (__do_user_fault+0x5c/0xa4) from [<c0627560>] (do_page_fault+0x2d4/0x314)
11-07 04:48:48 <4>[159, b2g] [28934.354880] [<c0627560>] (do_page_fault+0x2d4/0x314) from [<c06275c4>] (do_translation_fault+0x24/0xe4)
11-07 04:48:48 <4>[159, b2g] [28934.354908] [<c06275c4>] (do_translation_fault+0x24/0xe4) from [<c004527c>] (do_DataAbort+0x3c/0xa0)
11-07 04:48:48 <4>[159, b2g] [28934.354942] [<c004527c>] (do_DataAbort+0x3c/0xa0) from [<c06255e4>] (ret_from_exception+0x0/0x10)
11-07 04:48:48 <4>[159, b2g] [28934.354960] Exception stack(0xd768ffb0 to 0xd768fff8)
11-07 04:48:48 <4>[159, b2g] [28934.354977] ffa0:                                     0000000d 000000c3 80000000 4edb0090
11-07 04:48:48 <4>[159, b2g] [28934.354998] ffc0: 4edb0848 4edb0014 4edb0840 00000241 00000000 0000000d 00000020 4edb086c
11-07 04:48:48 <4>[159, b2g] [28934.355018] ffe0: 20000000 be9135a0 ffffffa5 4111-07 04:48:49 <4>[244, Binder Thread #] [28935.113760] do_exit: exit code=2816

Thanks.
(In reply to Wayne Chang [:wchang] from comment #10)
> Let's continue on this when you can reproduce it on a mozilla build and
> provide the appropriate information. We're unable to analyze your build as
> we do not have access to the details.
blocking-b2g: hd? → ---
Flags: needinfo?
Hi Wayne,

For this issue, I want to modify the code as follows:

 service b2g /system/bin/b2g.sh
     class main
+    # fengximing add ++
+    onrestart write /sys/power/state on
+    # fengximing add --
     onrestart restart media
+    # fengximing add ++
+    onrestart restart netd
+    # fengximing add --

For b2g, maybe this impact is large, so I want to consult you, can you check it for us?
Thanks.
(In reply to lecky from comment #13)
>  service b2g /system/bin/b2g.sh
>      class main
> +    # fengximing add ++
> +    onrestart write /sys/power/state on
> +    # fengximing add --
>      onrestart restart media
> +    # fengximing add ++
> +    onrestart restart netd
> +    # fengximing add --

Lecky, I have questions.

1. What this bug and you patch are for now? a crash, an oom, a stuck at boot animation, or a "only backlight of homekey is bright, the screen is black, and press power key, it's no reaction"?

2. Why you need to "write /sys/power/state on" while device is restarting? What do you want and what do you see?

Neither crash or oom will call set screen state disable and write "mem" to /sys/power/state. 
If this is for "stuck at boot animation" bug you have, you don't need to write "on", too. You can see boot animation so your /sys/power/state is on already. Same as the "only backlight of homekey is bright, the screen is black, and press power key, it's no reaction".

Also cc Vincent Chang for the netd part.
Flags: needinfo?
Hi Alan,

This is for "only backlight of homekey is bright, the screen is black, and press power key, it's no reaction", because later_resume is not called in our device, if no later_resume, lcd will not back to normal status, and wait_for_fb_wake will blocked until resume function of lcd driver is called. 

please look at the code in the BootAnimation.cpp:

    set_screen_state(1);

    // For some devices, it takes a while for the framebuffer to become
    // usable. So we wait until the framebuffer has woken up before we
    // try to open it.
    {
        char buf;
        int len = 0;
        ScopedClose fd(open("/sys/power/wait_for_fb_wake", O_RDONLY, 0));
        do {
            len = read(fd.get(), &buf, 1);
        } while (len < 0 && errno == EINTR);
        if (len < 0) {
            LOGE("BootAnimation: wait_for_fb_sleep failed errno: %d", errno);
        }
    }

the log shows "request_suspend_state: wakeup (0->0) at 87548211483272"

the normal log:
request_suspend_state: sleep (0->3) at 3259786973115 
or
request_suspend_state: wakeup (3->0) at 3260265089782

now please look at the request_suspend_state function in the kernel/power/Main.c:
void request_suspend_state(suspend_state_t new_state)
{
	unsigned long irqflags;
	int old_sleep;

	spin_lock_irqsave(&state_lock, irqflags);
	old_sleep = state & SUSPEND_REQUESTED;
	if (debug_mask & DEBUG_USER_STATE) {
		struct timespec ts;
		struct rtc_time tm;
		getnstimeofday(&ts);
		rtc_time_to_tm(ts.tv_sec, &tm);
		pr_info("request_suspend_state: %s (%d->%d) at %lld "
			"(%d-%02d-%02d %02d:%02d:%02d.%09lu UTC)\n",
			new_state != PM_SUSPEND_ON ? "sleep" : "wakeup",
			requested_suspend_state, new_state,
			ktime_to_ns(ktime_get()),
			tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
			tm.tm_hour, tm.tm_min, tm.tm_sec, ts.tv_nsec);
	}
	if (!old_sleep && new_state != PM_SUSPEND_ON) {
		state |= SUSPEND_REQUESTED;
		queue_work(suspend_work_queue, &early_suspend_work);
	} else if (old_sleep && new_state == PM_SUSPEND_ON) {
		state &= ~SUSPEND_REQUESTED;
		wake_lock(&main_wake_lock);
		queue_work(suspend_work_queue, &late_resume_work);
	}
	requested_suspend_state = new_state;
	spin_unlock_irqrestore(&state_lock, irqflags);
}


it is called by state_store fuction while write /sys/power/state.
I don't know why the power state still is 0 at first, lead to  set_screen_state(1) is valid.
As you said "set screen state disable and write "mem" to /sys/power/state", request_suspend_state shoud be set to 3, then it should be set to 0 when the b2g restart,
but it doesn't seem that way.

So I think, if the b2g crash and the fb device is disable or in the sleep mode at this time, then B2G will restart failure. 

Thanks.
Hi Alan,

If you don't understand what I mean, I'm sorry, please call me.
Tel:027-59599479.

Thanks.
Hi Vincent,

For netd change, please check it for us, I think we need to reset network state when the b2g restart.

Today, I found these log(notice: b2g main start and b2g do_main start, it represents the B2G has been restarted.).
The b2g restarted and blocked, so I think we need to restart network service when the b2g restart. 

arguments to dbus_connection_send_with_reply() were incorrect, assertion "connection != NULL" failed in file external/dbus/dbus/dbus-connection.c line 3384.
This is normally a bug in some application using the D-Bus library.
HCI command reset
[HCIFilterThread] Failed to issue Bt off
Last Error: Success
HCI command reset
BT turned off
HCI command reset
[HCIFilterThread] Failed to issue Bt off
Last Error: Success
HCI command reset
BT turned off
[BtStateActionProper], Indication =1, state=0,ControlAction.Length=12
Processing action for indication=1 (ON) (Sync State)
BT Action
[ExecuteBtAction] BT Action issue failed, Ioctl: 0x91, Len: 12
Last Error: Try again
Processing action for indication=1 (OFF) (Sync State)
BT Action
[ExecuteBtAction] BT Action issue failed, Ioctl: 0x91, Len: 12
Last Error: Try again
Shutting Down
Cleaning up dbus connection ... 
BT Stack Notification de-init complete
[BtStateActionProper], Indication =1, state=0,ControlAction.Length=12
Processing action for indication=1 (ON) (Sync State)
BT Action
[ExecuteBtAction] BT Action issue failed, Ioctl: 0x91, Len: 12
Last Error: Try again
Processing action for indication=1 (OFF) (Sync State)
BT Action
[ExecuteBtAction] BT Action issue failed, Ioctl: 0x91, Len: 12
Last Error: Try again
Shutting Down
Cleaning up dbus connection ... 
BT Stack Notification de-init complete
fengximing: b2g main start.
Terminating the WLAN Event Handler task
BT Filter Core : WLAN removed 
WLAN Stack Notification de-init complete
Terminating the BT Filter task
BT Filter Core de-init complete
Shutting Down Complete
Service terminated 
Terminating the WLAN Event Handler task
BT Filter Core : WLAN removed 
WLAN Stack Notification de-init complete
Terminating the BT Filter task
BT Filter Core de-init complete
Shutting Down Complete
Service terminated 
(29067): /system/etc/init.qcom.coex.sh: Coex stopped 
(29074): /system/etc/init.qcom.coex.sh: start_coex: pid = 29073 
Ath BT Filter Daemon
Enabling Debug Information
Calling Fcore Init
BT Filter Core init complete
WLAN Stack Notification init complete
Headset Profile notifications will use HCI filter instead of DBUS 
[DoMethodCall] Failed to invoke method call (org.bluez.Manager : method : DefaultAdapter) The name org.bluez was not provided by any .service files 
Last Error: Try again
[AcquireBtAdapter] Get Default Adapter failed 
Last Error: Try again
Starting the WLAN Event Handler task
 rule to add: type='signal',interface='org.bluez.Manager' 
 rule to add: type='signal',interface='org.bluez.Manager' 
BT Stack Notification init complete
Starting the BT Event Handler task
Entering DBus Message loop 
Get Sleep state
[Abf_WlanGetSleepState] Failed to issue get WLAN sleep state
Last Error: Success
Service running, waiting for termination .... 
Starting the BT Filter task
fengximing: b2g do_main start.
DEBUG_CALC_FPS: 0
period: 10
ignorethresh_us: 500000
DEBUG_CALC_FPS: 0
period: 10
ignorethresh_us: 500000
Registering start signal
gethostbyname normal

Thanks.
Flags: needinfo?(vchang)
(In reply to lecky from comment #8)
> I see power state in /sys/power/state, it's always 'mem'.
Yes, it is normal. cat /sys/power/state doesn't show what you wrote. It shows what the suspend states are. You can refer to kernel source code state_store() and state_show(), or try to echo on to it and see the result.

I believe there's kernel experts at your side. You can also consult them.

(In reply to lecky from comment #15)
Please note that power HAL and kernel are your confidential code. We don't have it.
I don't see why you are trying to solve this bug by this way. It seems there's no potential risk, but I won't recommend you do this. You should left this to those who should call power HAL. That's one reason we want HAL.

For the "b2g exist but hang there" bugs, unless you have full log and gdb backtraces at that time, we cannot help.

We are not going to put more resource into bugs like
1. crash,
2. oom,
3. stuck at boot animation, or 
4. "only backlight of homekey is bright, the screen is black, and press power key, it's no reaction",
unless someone inform us that 
1. This can be found on Mozilla software, AND
2. Necessary informations like crash reports, dmd files, full log and gdb backtraces at that time are provided. This one is included.
Flags: needinfo?(vchang)
Hi Alan,
1, state_store will trigger the earlysuspend process and late_resume process, if you write/read /sys/power/state, state_store will be invoked.

2,Now I know what's causing this issue. 
This is because the mediaserver restart failed. 
Because mediaserver will be restart when the b2g restarts. 

service b2g /system/bin/b2g.sh 
class main 
onrestart restart media 

Now the b2g has been restarted, but mediaserver restart failed, so it looks like this. 

Why the mediaserver restart failed? 
Now I'm working on it.
I/Gecko   (19340): fengximing: b2g main start.
I/Gecko   (19340): fengximing: b2g do_main start.
D/CALCFPS (19340): DEBUG_CALC_FPS: 0
D/CALCFPS (19340): period: 10
D/CALCFPS (19340): ignorethresh_us: 500000
D/CALCFPS (19340): DEBUG_CALC_FPS: 0
D/CALCFPS (19340): period: 10
D/CALCFPS (19340): ignorethresh_us: 500000
E/profiler(19340): Registering start signal
I/lzf     (19340): gethostbyname normal
D/DATA_CALL_MGR_QC_B2G(19340): Value of ro.ril.data_auto_attach is 0
I/GonkSwitch(19340): SwitchHandlerUsbIcs find missed port switch request, switch to port 0. 
I/GonkSwitch(19340): usbPortSwitch found port USBPORT_PROP_PRODUCTION
E/        (19338): waitBeforeAdding (media.audio_flinger) timed out
E/AudioHardwareMSM76XXA(19338): init_fir_filter: board MSM8X25_U8833
E/AudioHardwareMSM76XXA(19338): get_audpp_filter path = /system/etc/AudioFilter_MSM8X25_U8833.csv
E/AudioHardwareMSM76XXA(19338): AudioHardware::AudioHardware() init mttymode 0
E/AudioHardwareMSM76XXA(19338): get_audio_property_value: audio property value is 0x00011211
E/AudioHardwareMSM76XXA(19338): AudioHardware:FIR_ENABLED
E/AudioHardwareMSM76XXA(19338): Could not open MSM SND driver.
E/ServiceManager(  166): add_service('media.audio_flinger',0x8) uid=1013 - ALREADY REGISTERED, OVERRIDE
E/        (19338): waitBeforeAdding (media.player) timed out
E/ServiceManager(  166): add_service('media.player',0x2) uid=1013 - ALREADY REGISTERED, OVERRIDE
E/        (19338): waitBeforeAdding (media.camera) timed out
E/QualcommCamera(19338): Qint android::get_number_of_cameras(): E
I/QCameraHAL(19338): Handle base =0x0x41077294
I/QCameraHAL(19338): getCameraInfo: numOfCameras = 2
I/QCameraHAL(19338): Handle [0]=0x0x41077294
I/QCameraHAL(19338): Camera sensor 0 info:
I/QCameraHAL(19338): camera_id: 0
I/QCameraHAL(19338): modes_supported: 1
I/QCameraHAL(19338): position: 0
I/QCameraHAL(19338): sensor_mount_angle: 90
I/QCameraHAL(19338): Handle [1]=0x0x410772d8
I/QCameraHAL(19338): Camera sensor 1 info:
I/QCameraHAL(19338): camera_id: 1
I/QCameraHAL(19338): modes_supported: 1
I/QCameraHAL(19338): position: 1
I/QCameraHAL(19338): sensor_mount_angle: 270
E/ServiceManager(  166): add_service('media.camera',0x3) uid=1013 - ALREADY REGISTERED, OVERRIDE
E/        (19338): waitBeforeAdding (media.audio_policy) timed out
...
W/AudioFlinger(19338): Thread AudioOut_1 cannot connect to the power manager service
E/AudioHardwareMSM76XXA(19338): doRouting Enter: outputDevices=0x2 input=0x0x0
E/huawei_fir_adapter(19338): fir_set_mode: mode = 12
E/huawei_fir_adapter(19338): fir_set_mode: ret = 0
E/AudioHardwareMSM76XXA(19338): FIR in speaker mode
E/AudioHardwareMSM76XXA(19338): fir_set_mode: FIR_COEFFICIENT_SPEAKER
E/AudioHardwareMSM76XXA(19338): setVoiceVolume(1.200000)
E/AudioHardwareMSM76XXA(19338): Setting in-call volume to 6 (available range is 1 to 6)
E/ServiceManager(  166): add_service('media.audio_policy',0x4) uid=1013 - ALREADY REGISTERED, OVERRIDE
E/        (19338): Error Loading libmpqstobinder 
E/        (19338): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found
E/        (19338): waitBeforeAdding (media.gestures) timed out
E/GestureDeviceService(19338): Could not load gesture HAL module
E/ServiceManager(  166): add_service('media.gestures',0x5) uid=1013 - ALREADY REGISTERED, OVERRIDE

Thanks.
(In reply to lecky from comment #19)

I remember I told you this nearly a month ago, and your guys said this WAS FIXED in our con-call with Wills and Kelly a week ago. I must be wrong. What's happening here?
Set POVB since there's nothing we can do here.
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Component: General → Vendcom
Resolution: --- → INVALID
Whiteboard: [POVB]
(In reply to Alan Huang [:ahuang] from comment #20)
> (In reply to lecky from comment #19)
> I remember I told you this nearly a
> month ago, and your guys said this WAS FIXED in our con-call with Wills and
> Kelly a week ago. I must be wrong. What's happening here?


Hi Alan,
1, 
Sorry, just I have someting to do, some infomation is not written.
after the b2g restarts, normal process as follows:
b2g main
|
load mediaserver done.
|
b2g do main

Exception follows:
b2g main
|
b2g do main
|
waitBeforeAdding mediaserver.

This is abnormal.

2, This issue different with that issue of a week ago.
Thanks.
(In reply to lecky from comment #22)
> 2, This issue different with that issue of a week ago.
> Thanks.

What's the difference?

https://bugzilla.mozilla.org/show_bug.cgi?id=934424#c19
E/        (19338): Error Loading libmpqstobinder 
E/        (19338): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found
E/        (19338): waitBeforeAdding (media.gestures) timed out

https://bugzilla.mozilla.org/show_bug.cgi?id=908610#c37

04-09 21:07:25.989: E/ServiceManager(148): add_service('media.audio_policy',0x4) uid=1013 - ALREADY REGISTERED, OVERRIDE
04-09 21:07:25.989: E/(24438): Error Loading libmpqstobinder 
04-09 21:07:25.989: E/(24438): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found
04-09 21:07:30.999: E/(24438): waitBeforeAdding (media.gestures) timed out
Hi Alan,

Now I know why the mediaserver restart failed.
Because the power service is not up when the b2g restart, so the mediaserver restart failed.

Thanks.
Hi Alan,

1, The issue was restored if you restart b2g in adb shell, so I think this is a b2g question.

2, I found the b2g was blocked in nsCategoryManager.cpp:NS_CreateServicesFromCategory, the b2g cannot continue to run.

Log:
11-20 06:50:09.249 I/Gecko   ( 4790): fengximing: 3, b2g XRE_mainRun.
11-20 06:50:09.249 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun ++.
11-20 06:50:09.249 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun mStartOffline:0.
11-20 06:50:09.299 E/fengximing( 4790): Frame delay is 33333 us but decoding took 77052 us
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun gDoMigration:0.
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun gDoProfileReset:0.
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun mDirProvider.DoStartup() start.
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup ++. mProfileNotified:0
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup mozilla::Preferences::ResetAndReadUserPrefs().
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup get appStartup service.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup appStartup != null.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsAppStartup::TrackStartupCrashBegin ++.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsAppStartup::TrackStartupCrashBegin Preferences::ClearUser.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup get Addons Manager service.
11-20 06:50:09.339 I/Gecko   ( 4790): Failed to create Addons Manager.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup LoadExtensionBundleDirectories end.

Code:
    // Init the Extension Manager
    nsCOMPtr<nsIObserver> em = do_GetService("@mozilla.org/addons/integration;1");
	printf_stderr("fengximing: nsXREDirProvider::DoStartup get Addons Manager service.\n");
    if (em) {
      em->Observe(nullptr, "addons-startup", nullptr);
    } else {
      printf_stderr("Failed to create Addons Manager.");
    }

    LoadExtensionBundleDirectories();
    printf_stderr("fengximing: nsXREDirProvider::DoStartup LoadExtensionBundleDirectories end.\n");
	
    obsSvc->NotifyObservers(nullptr, "load-extension-defaults", nullptr);
    obsSvc->NotifyObservers(nullptr, "profile-after-change", kStartup);

    // Any component that has registered for the profile-after-change category
    // should also be created at this time.
    (void)NS_CreateServicesFromCategory("profile-after-change", nullptr,
                                        "profile-after-change");
	
    printf_stderr("fengximing: nsXREDirProvider::DoStartup gSafeMode:%d.\n", gSafeMode);

It doesn't print "fengximing: nsXREDirProvider::DoStartup gSafeMode:%d.\n" from the log.
Thanks.
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Hi Wayne,

Please help us to analysis, because I don't understander what's CreateServicesFromCategory.

Thanks.
Flags: needinfo?(wchang)
Flags: needinfo?(ahuang)
I am at work week so I don't have time for this now. Please make sure you can reproduce this on Mozilla software first.
Hi lecky,
summarize the things we would like to know.
1. How do you reproduce this symptom? (restart b2g in adb shell??)
   Kill b2g or ?
2. Could you attach GDB at that time and provide the backtrace for analysis?
3. If it can be reproduced, could you try it on Mozilla build?
I would try to reproduce the symptom if I get the answer of 1.
Thanks for your help.
Hi Alphan,

(In reply to Alphan Chen[:Alphan] from comment #28)
> Hi lecky, summarize the things we would like to know.
> 1. How do you reproduce this symptom? (restart b2g in adb shell??)
>   Kill b2g or ?
run monkey test, the b2g crashes and restart in this process, but b2g restart failure, so it looks like this.
when this issue reproduced, 

> 2.Could you attach GDB at that time and provide the backtrace for analysis?
if use gdb, the device automatically restart, and it was restored, so I can't use gdb to debug on this issue.

> 3.If it can be reproduced, could you try it on Mozilla build?
> I would try to reproduce the symptom if I get the answer of 1.
> Thanks for your help.
Now we also don't have more resources to replicate this issue on Mozilla build, 
So, could you provider guide of analysis base-on these log and process? 
(Comment 25 and before, or if you need more.) 
How to continue confirm what's happened?

Thank you very much.
(In reply to lecky from comment #29)
> Hi Alphan,
> 
> (In reply to Alphan Chen[:Alphan] from comment #28)
> > Hi lecky, summarize the things we would like to know.
> > 1. How do you reproduce this symptom? (restart b2g in adb shell??)
> >   Kill b2g or ?
> run monkey test, the b2g crashes and restart in this process, but b2g
> restart failure, so it looks like this.
> when this issue reproduced, 
So the failure is still reproduced by monkey test not by adb shell command.
Am I right?

> 
> > 2.Could you attach GDB at that time and provide the backtrace for analysis?
> if use gdb, the device automatically restart, and it was restored, so I
> can't use gdb to debug on this issue.
> 
How do you do the attach?
"./run-gdb.sh attach"? or "./run-gdb.sh"?
You should use "./run-gdb.sh attach", which will attach b2g process without restart b2g.
Then we can know the backtrace of b2g at that time.
1. ./run-gdb.sh attach
2. gdb start
3. (gdb) bt                      ---> backtrace of b2g (share the call stack on this bug)
4. (gdb) thread apply all bt     ---> backtrace of all threads (share the information on this bug)

> > 3.If it can be reproduced, could you try it on Mozilla build?
> > I would try to reproduce the symptom if I get the answer of 1.
> > Thanks for your help.
> Now we also don't have more resources to replicate this issue on Mozilla
> build, 
> So, could you provider guide of analysis base-on these log and process? 
> (Comment 25 and before, or if you need more.) 
> How to continue confirm what's happened?
For the existing log, I only saw the log you add.
Could you provide more log for this case?
Maybe you can attach the bugreport at that time.
We need more clues for analysis.
> 
> Thank you very much.
For now
Attached file black_screen.zip
Hi Alphan,

please look at the black_screen.zip log in attachment 8335139 [details].

Thanks.
I can't analyse this, clearing my NI.

(In reply to lecky from comment #26)
> Hi Wayne,
> 
> Please help us to analysis, because I don't understander what's
> CreateServicesFromCategory.
> 
> Thanks.
Flags: needinfo?(wchang)
(In reply to lecky from comment #24)
> Hi Alan,
> 
> Now I know why the mediaserver restart failed.
> Because the power service is not up when the b2g restart, so the mediaserver
> restart failed.
> 
> Thanks.

What now?

11-19 16:33:28.899 E/ServiceManager(  166): add_service('media.audio_policy',0x4) uid=1013 - ALREADY REGISTERED, OVERRIDE
11-19 16:33:28.899 E/        ( 6707): Error Loading libmpqstobinder 
11-19 16:33:28.899 E/        ( 6707): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found
11-19 16:33:33.899 E/        ( 6707): waitBeforeAdding (media.gestures) timed out

11-20 06:50:28.040 E/ServiceManager(  168): add_service('media.audio_policy',0x4) uid=1013 - ALREADY REGISTERED, OVERRIDE
11-20 06:50:28.040 E/        ( 4789): Error Loading libmpqstobinder 
11-20 06:50:28.040 E/        ( 4789): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found
11-20 06:50:33.039 E/        ( 4789): waitBeforeAdding (media.gestures) timed out
11-20 06:50:33.039 E/GestureDeviceService( 4789): Could not load gesture HAL module
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago11 years ago
Flags: needinfo?(ahuang)
Resolution: --- → INVALID
Hi Lecky,
we still saw the same log in black_screen.zip.
Three points need your attention and help.

1. MSM sound driver cannot open?! Why?
   Something wrong in audio driver?!
11-19 16:33:13.809 E/AudioHardwareMSM76XXA( 6707): Could not open MSM SND driver.

2. What's this library for?
11-19 16:33:28.899 E/        ( 6707): Error Loading libmpqstobinder 
11-19 16:33:28.899 E/        ( 6707): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found

3. Please use "./run-gdb.sh attach" to attach GDB.
-> ./run-gdb.sh attach
-> gdb start, attach b2g process.
-> (gdb) bt                      ---> backtrace of b2g (share the call stack on this bug)
-> (gdb) thread apply all bt     ---> backtrace of all threads (share the information on this bug)



(2013.11.18_21_40~11.19_11_00)-B006_F_screen_black_nsApprunner
11-19 16:33:13.769 E/        ( 6707): waitBeforeAdding (media.audio_flinger) timed out
11-19 16:33:13.799 E/AudioHardwareMSM76XXA( 6707): init_fir_filter: board MSM8X25_U8833
11-19 16:33:13.799 E/AudioHardwareMSM76XXA( 6707): get_audpp_filter path = /system/etc/AudioFilter_MSM8X25_U8833.csv
11-19 16:33:13.809 E/AudioHardwareMSM76XXA( 6707): AudioHardware::AudioHardware() init mttymode 0
11-19 16:33:13.809 E/AudioHardwareMSM76XXA( 6707): get_audio_property_value: audio property value is 0x00011211
11-19 16:33:13.809 E/AudioHardwareMSM76XXA( 6707): AudioHardware:FIR_ENABLED
11-19 16:33:13.809 E/AudioHardwareMSM76XXA( 6707): Could not open MSM SND driver.
11-19 16:33:13.819 E/ServiceManager(  166): add_service('media.audio_flinger',0x8) uid=1013 - ALREADY REGISTERED, OVERRIDE
...
11-19 16:33:28.899 E/AudioHardwareMSM76XXA( 6707): FIR in speaker mode
11-19 16:33:28.899 E/AudioHardwareMSM76XXA( 6707): fir_set_mode: FIR_COEFFICIENT_SPEAKER
11-19 16:33:28.899 E/AudioHardwareMSM76XXA( 6707): setVoiceVolume(1.200000)
11-19 16:33:28.899 E/AudioHardwareMSM76XXA( 6707): Setting in-call volume to 6 (available range is 1 to 6)
11-19 16:33:28.899 E/ServiceManager(  166): add_service('media.audio_policy',0x4) uid=1013 - ALREADY REGISTERED, OVERRIDE
11-19 16:33:28.899 E/        ( 6707): Error Loading libmpqstobinder 
11-19 16:33:28.899 E/        ( 6707): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found
11-19 16:33:33.899 E/        ( 6707): waitBeforeAdding (media.gestures) timed out
11-19 16:33:33.899 E/GestureDeviceService( 6707): Could not load gesture HAL module



(2013.11.19_19_30~11.20_9_30)-B006_G_screen_black_nsApprunner\
11-20 06:50:12.889 E/        ( 4789): waitBeforeAdding (media.audio_flinger) timed out
11-20 06:50:12.919 E/AudioHardwareMSM76XXA( 4789): init_fir_filter: board MSM8X25_U8833
11-20 06:50:12.919 E/AudioHardwareMSM76XXA( 4789): get_audpp_filter path = /system/etc/AudioFilter_MSM8X25_U8833.csv
11-20 06:50:12.929 E/AudioHardwareMSM76XXA( 4789): AudioHardware::AudioHardware() init mttymode 0
11-20 06:50:12.929 E/AudioHardwareMSM76XXA( 4789): get_audio_property_value: audio property value is 0x00011211
11-20 06:50:12.929 E/AudioHardwareMSM76XXA( 4789): AudioHardware:FIR_ENABLED
11-20 06:50:12.939 E/AudioHardwareMSM76XXA( 4789): Could not open MSM SND driver.
11-20 06:50:12.949 E/ServiceManager(  168): add_service('media.audio_flinger',0x8) uid=1013 - ALREADY REGISTERED, OVERRIDE
11-20 06:50:17.959 E/        ( 4789): waitBeforeAdding (media.player) timed out
11-20 06:50:17.959 E/ServiceManager(  168): add_service('media.player',0x2) uid=1013 - ALREADY REGISTERED, OVERRIDE
11-20 06:50:22.959 E/        ( 4789): waitBeforeAdding (media.camera) timed out
11-20 06:50:22.999 E/QualcommCamera( 4789): Qint android::get_number_of_cameras(): E
...
11-20 06:50:28.030 W/AudioFlinger( 4789): Thread AudioOut_1 cannot connect to the power manager service
11-20 06:50:28.030 E/AudioHardwareMSM76XXA( 4789): doRouting Enter: outputDevices=0x2 input=0x0x0
11-20 06:50:28.030 E/huawei_fir_adapter( 4789): fir_set_mode: mode = 12
11-20 06:50:28.030 E/huawei_fir_adapter( 4789): fir_set_mode: ret = 0
11-20 06:50:28.030 E/AudioHardwareMSM76XXA( 4789): FIR in speaker mode
11-20 06:50:28.030 E/AudioHardwareMSM76XXA( 4789): fir_set_mode: FIR_COEFFICIENT_SPEAKER
11-20 06:50:28.030 E/AudioHardwareMSM76XXA( 4789): setVoiceVolume(1.200000)
11-20 06:50:28.030 E/AudioHardwareMSM76XXA( 4789): Setting in-call volume to 6 (available range is 1 to 6)
11-20 06:50:28.040 E/ServiceManager(  168): add_service('media.audio_policy',0x4) uid=1013 - ALREADY REGISTERED, OVERRIDE
11-20 06:50:28.040 E/        ( 4789): Error Loading libmpqstobinder 
11-20 06:50:28.040 E/        ( 4789): Error: Cannot load library: load_library[1093]: Library 'libmpqstobinder.so' not found
11-20 06:50:33.039 E/        ( 4789): waitBeforeAdding (media.gestures) timed out
11-20 06:50:33.039 E/GestureDeviceService( 4789): Could not load gesture HAL module

(In reply to Alan Huang [:ahuang] from comment #23)
> (In reply to lecky from comment #22)
> > 2, This issue different with that issue of a week ago.
> > Thanks.
> 
> What's the difference?
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=934424#c19
> E/        (19338): Error Loading libmpqstobinder 
> E/        (19338): Error: Cannot load library: load_library[1093]: Library
> 'libmpqstobinder.so' not found
> E/        (19338): waitBeforeAdding (media.gestures) timed out
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=908610#c37
> 
> 04-09 21:07:25.989: E/ServiceManager(148):
> add_service('media.audio_policy',0x4) uid=1013 - ALREADY REGISTERED, OVERRIDE
> 04-09 21:07:25.989: E/(24438): Error Loading libmpqstobinder 
> 04-09 21:07:25.989: E/(24438): Error: Cannot load library:
> load_library[1093]: Library 'libmpqstobinder.so' not found
> 04-09 21:07:30.999: E/(24438): waitBeforeAdding (media.gestures) timed out
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Hi Alphan,

> 1. MSM sound driver cannot open?! Why?
>   Something wrong in audio driver?!
> 11-19 16:33:13.809 E/AudioHardwareMSM76XXA( 6707): Could not open MSM SND driver.

> 2. What's this library for?
> 11-19 16:33:28.899 E/        ( 6707): Error Loading libmpqstobinder 
> 11-19 16:33:28.899 E/        ( 6707): Error: Cannot load library: load_library[1093]: 
> Library 'libmpqstobinder.so' not found

These are normal.

For the mediaser restart failure, that is just a concomitant phenomenon, because the system was blocked in NS_CreateServicesFromCategory, so this cause the problem.

please look at the log from b2g main start to nsXREDirProvider::DoStartup LoadExtensionBundleDirectories end.
then it should call the NS_CreateServicesFromCategory function, please see in code at Comment 25.

11-20 06:50:07.749 I/Gecko   ( 4790): fengximing: b2g main start.
11-20 06:50:08.309 I/Gecko   ( 4790): fengximing: b2g do_main start.
11-20 06:50:08.309 E/fengximing( 4790): fengximing: get wait_for_fb_wake.
11-20 06:50:08.649 D/CALCFPS ( 4790): DEBUG_CALC_FPS: 0
11-20 06:50:08.649 D/CALCFPS ( 4790): period: 10
11-20 06:50:08.649 D/CALCFPS ( 4790): ignorethresh_us: 500000
11-20 06:50:08.649 D/CALCFPS ( 4790): DEBUG_CALC_FPS: 0
11-20 06:50:08.649 D/CALCFPS ( 4790): period: 10
11-20 06:50:08.649 D/CALCFPS ( 4790): ignorethresh_us: 500000
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: b2g do_main XRE_main start(2).
11-20 06:50:08.669 E/profiler( 4790): Registering start signal
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: b2g XRE_main start.
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: 1, b2g XRE_mainInit.
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: b2g XRE_mainInit ++.
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: b2g get HOME directory
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: b2g Check for application.ini overrides.
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: b2g Check for application.ini overrides end.
11-20 06:50:08.669 I/Gecko   ( 4790): fengximing: b2g Check sanity and correctness of app data.
11-20 06:50:08.689 I/Gecko   ( 4790): fengximing: b2g XRE_mainInit --.
11-20 06:50:08.689 I/Gecko   ( 4790): fengximing: 2, b2g XRE_mainStartup.
11-20 06:50:08.689 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup start.
11-20 06:50:08.689 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup NS_CreateNativeAppSupport.
11-20 06:50:08.689 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup defined(USE_MOZ_UPDATER).
11-20 06:50:08.689 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup NS_NewToolkitProfileService.
11-20 06:50:08.689 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup SelectProfile.
11-20 06:50:08.689 I/lzf     ( 4790): gethostbyname normal
11-20 06:50:08.769 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup get gProfileLock.
11-20 06:50:08.769 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup mProfileLock->GetDirectory.
11-20 06:50:08.769 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup mProfileLock->GetLocalDirectory.
11-20 06:50:08.769 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup mDirProvider.SetProfile.
11-20 06:50:08.779 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup define MOZ_CRASHREPORTER.
11-20 06:50:08.779 I/Gecko   ( 4790): fengximing: b2g XRE_mainStartup end.
11-20 06:50:08.779 I/Gecko   ( 4790): fengximing: b2g Start the real application(ScopedXPCOMStartup).
11-20 06:50:08.779 I/Gecko   ( 4790): fengximing: b2g XRE_main mScopedXPCom Initialize.
11-20 06:50:08.979 E/fengximing( 4790): Frame delay is 33333 us but decoding took 121931 us
11-20 06:50:09.069 E/fengximing( 4790): Frame delay is 33333 us but decoding took 81129 us
11-20 06:50:09.139 E/fengximing( 4790): Frame delay is 33333 us but decoding took 72896 us
11-20 06:50:09.209 I/Gonk    ( 4790): Setting nice for pid 4790 to 0
11-20 06:50:09.209 I/Gonk    ( 4790): Changed nice for pid 4790 from 0 to 0.
11-20 06:50:09.219 E/fengximing( 4790): Frame delay is 33333 us but decoding took 75894 us
11-20 06:50:09.249 I/Gecko   ( 4790): fengximing: 3, b2g XRE_mainRun.
11-20 06:50:09.249 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun ++.
11-20 06:50:09.249 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun mStartOffline:0.
11-20 06:50:09.299 E/fengximing( 4790): Frame delay is 33333 us but decoding took 77052 us
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun gDoMigration:0.
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun gDoProfileReset:0.
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: b2g XRE_mainRun mDirProvider.DoStartup() start.
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup ++. mProfileNotified:0
11-20 06:50:09.329 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup mozilla::Preferences::ResetAndReadUserPrefs().
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup get appStartup service.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup appStartup != null.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsAppStartup::TrackStartupCrashBegin ++.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsAppStartup::TrackStartupCrashBegin Preferences::ClearUser.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup get Addons Manager service.
11-20 06:50:09.339 I/Gecko   ( 4790): Failed to create Addons Manager.
11-20 06:50:09.339 I/Gecko   ( 4790): fengximing: nsXREDirProvider::DoStartup LoadExtensionBundleDirectories end.
11-20 06:50:09.369 E/fengximing( 4790): Frame delay is 33333 us but decoding took 69191 us
11-20 06:50:09.449 E/fengximing( 4790): Frame delay is 33333 us but decoding took 77236 us
11-20 06:50:09.519 E/fengximing( 4790): Frame delay is 33333 us but decoding took 70414 us
11-20 06:50:09.589 E/fengximing( 4790): Frame delay is 33333 us but decoding took 68668 us
11-20 06:50:09.619 D/DATA_CALL_MGR_QC_B2G( 4790): Value of ro.ril.data_auto_attach is 0
11-20 06:50:09.619 I/GonkSwitch( 4790): SwitchHandlerUsbIcs find missed port switch request, switch to port 0. 
11-20 06:50:09.619 I/GonkSwitch( 4790): usbPortSwitch found port USBPORT_PROP_PRODUCTION

Then mediaserver error, but it doesn't cause the problem, because I have tested this, it also occurred in the normal phone.
Thanks.
(In reply to lecky from comment #36)
> Hi Alphan,
> 
> 
> Then mediaserver error, but it doesn't cause the problem, because I have
> tested this, it also occurred in the normal phone.
> Thanks.

Hi Lecky,
For the analysis of "NS_CreateServicesFromCategory", please use GDB to get backtrace.
Thanks.

1. ./run-gdb.sh attach
2. (gdb) bt                      ---> backtrace of b2g (share the call stack on this bug)
3. (gdb) thread apply all bt     ---> backtrace of all threads (share the information on this bug)
Flags: needinfo?(lecky.wanglei)
Hi Alphan,

When run ./run-gdb.sh attach, I don't know why the b2g automatically restart again, the symptom doesn't exist.

Thanks.
Flags: needinfo?(lecky.wanglei)
Hi Alphan,

I have a question:

Code:
void
NS_CreateServicesFromCategory(const char *category,
                              nsISupports *origin,
                              const char *observerTopic)
{
  printf_stderr("fengximing: NS_CreateServicesFromCategory ++. category:%s\n", category);
  ......
  printf_stderr("fengximing: NS_CreateServicesFromCategory while(1).\n");
  bool hasMore;
  
  while (NS_SUCCEEDED(senumerator->HasMore(&hasMore)) && hasMore) {
    // From here on just skip any error we get.
    nsAutoCString entryString;
    printf_stderr("fengximing: NS_CreateServicesFromCategory senumerator->HasMore.\n");
    ......
    nsXPIDLCString contractID;
    printf_stderr("fengximing: NS_CreateServicesFromCategory senumerator entryString:%s.\n", entryString.get());
    rv = categoryManager->GetCategoryEntry(category,entryString.get(),
                                           getter_Copies(contractID));

    nsCOMPtr<nsISupports> instance = do_GetService(contractID);
    ......	
    if (observerTopic) {
    ......
}

Log:
11-05 07:55:06.569: I/Gecko(1423): fengximing: nsXREDirProvider::DoStartup LoadExtensionBundleDirectories end.
11-05 07:55:06.579: I/Gecko(1423): fengximing: NS_CreateServicesFromCategory ++. category:profile-after-change
11-05 07:55:06.579: I/Gecko(1423): fengximing: NS_CreateServicesFromCategory while(1).
11-05 07:55:06.579: I/Gecko(1423): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-05 07:55:06.579: I/Gecko(1423): fengximing: NS_CreateServicesFromCategory senumerator entryString:Bluetooth Service.

NS_CreateServicesFromCategory only hanldes excption return value in the code, but if Bluetooth Service has been terminated, what will happen in do_GetService or NS_CreateServicesFromCategory? 

Thanks.
Hi Lecky,
We suppose that "do_GetService" should not be blocked.
If B2G real stuck here, we need to attach GDB to dig more.

In your latest reply, B2G is trying to get bluetooth service.
Maybe bluetooth service is not terminated but get some problems.
Could you check your bluetooth service and see if there is any problem in it?

By the way, as you mentioned in comment 38(https://bugzilla.mozilla.org/show_bug.cgi?id=934424#c38)
b2g restart automatically restart again when running "./run-gdb.sh attach".
1. could you share the log include the period of "./run-gdb.sh attach"?
2. You can use "adb shell b2g-info" to check pid of b2g to know whether b2g restart or not.
   the pid of b2g should be different if b2g restart again.
   If b2g didn't restart, we can check the log to see what happened after GDB attached.
Please help on the things above.
Thanks.
Flags: needinfo?(lecky.wanglei)
Hi Alphan,

I got the latest log, this seems to be blocked in "NS_CreateServicesFromCategory senumerator entryString:Telephony System Worker Manager".

abnormal log:
12-21 15:39:36.019 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory while(1).
12-21 15:39:36.019 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
12-21 15:39:36.019 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator entryString:Bluetooth Service.
12-21 15:39:36.059 E/fengximing(  758): Frame delay is 33333 us but decoding took 73058 us
12-21 15:39:36.129 E/fengximing(  758): Frame delay is 33333 us but decoding took 67765 us
12-21 15:39:36.149 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
12-21 15:39:36.149 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator entryString:Disk Space Watcher Service.
12-21 15:39:36.149 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
12-21 15:39:36.149 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator entryString:MobileMessageDatabaseService.
12-21 15:39:36.209 E/fengximing(  758): Frame delay is 33333 us but decoding took 77423 us
12-21 15:39:36.219 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
12-21 15:39:36.219 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator entryString:RILContentHelper.
12-21 15:39:36.279 E/fengximing(  758): Frame delay is 33333 us but decoding took 68305 us
12-21 15:39:36.289 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
12-21 15:39:36.289 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator entryString:TelemetryPing.

12-21 15:39:36.309 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
12-21 15:39:36.309 I/Gecko   (  758): fengximing: NS_CreateServicesFromCategory senumerator entryString:Telephony System Worker Manager.
12-21 15:39:36.319 D/DATA_CALL_MGR_QC_B2G(  758): Value of ro.ril.data_auto_attach is 0

12-21 15:39:36.319 I/GonkSwitch(  758): SwitchHandlerUsbIcs find missed port switch request, switch to port 0. 
12-21 15:39:36.319 I/GonkSwitch(  758): usbPortSwitch found port USBPORT_PROP_PRODUCTION
12-21 15:39:36.319 I/GonkSwitch(  758): SwitchHandlerUsbIcs GetStateString USB_STATE(DISCONNECTED).
12-21 15:39:36.319 E/NetlinkEvent(  758): NetlinkEvent::FindParam(): Parameter 'USB_PORT_SWITCH' not found
12-21 15:39:36.349 E/fengximing(  758): Frame delay is 33333 us but decoding took 71470 us
......
......
12-21 15:39:36.999 E/fengximing(  758): Frame delay is 33333 us but decoding took 64705 us
12-21 15:39:37.059 I/GonkSwitch(  758): SwitchHandlerUsbIcs GetStateString USB_STATE(CONNECTED).
12-21 15:39:37.059 E/NetlinkEvent(  758): NetlinkEvent::FindParam(): Parameter 'USB_PORT_SWITCH' not found
12-21 15:39:37.059 E/fengximing(  758): Frame delay is 33333 us but decoding took 61945 us
12-21 15:39:37.129 E/fengximing(  758): Frame delay is 33333 us but decoding took 61988 us
12-21 15:39:37.189 E/fengximing(  758): Frame delay is 33333 us but decoding took 63533 us
12-21 15:39:37.219 I/GonkSwitch(  758): SwitchHandlerUsbIcs GetStateString USB_STATE(CONFIGURED).
12-21 15:39:37.219 E/NetlinkEvent(  758): NetlinkEvent::FindParam(): Parameter 'USB_PORT_SWITCH' not found
12-21 15:39:37.249 E/fengximing(  758): Frame delay is 33333 us but decoding took 62105 us
......
......
12-21 15:39:37.699 E/fengximing(  758): Frame delay is 33333 us but decoding took 61547 us
12-21 15:39:54.599 W/AudioFlinger(  757): Thread AudioOut_1 cannot connect to the power manager service
......


normal log:
11-06 01:16:30.289 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory while(1).
11-06 01:16:30.289 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.289 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:Bluetooth Service.
11-06 01:16:30.319 E/fengximing(  683): Frame delay is 33333 us but decoding took 67129 us
11-06 01:16:30.339 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.339 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:Disk Space Watcher Service.
11-06 01:16:30.339 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.339 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:MobileMessageDatabaseService.
11-06 01:16:30.379 E/fengximing(  683): Frame delay is 33333 us but decoding took 65270 us
11-06 01:16:30.399 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.399 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:RILContentHelper.
11-06 01:16:30.459 E/fengximing(  683): Frame delay is 33333 us but decoding took 79253 us
11-06 01:16:30.469 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.469 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:TelemetryPing.
11-06 01:16:30.489 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.489 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:Telephony System Worker Manager.
11-06 01:16:30.499 D/DATA_CALL_MGR_QC_B2G(  683): Value of ro.ril.data_auto_attach is 0

11-06 01:16:30.499 D/EONS_QC_B2G(  683): Is EONS enabled 1
11-06 01:16:30.499 D/ContentHelper_QC_B2G(  683): Creating instance of QCContentHelper
11-06 01:16:30.529 I/Gecko   (  683): -*- QCContentHelper_QC_B2G: Starting QC Telephony Content Helper Interface

11-06 01:16:30.529 E/fengximing(  683): Frame delay is 33333 us but decoding took 68902 us
11-06 01:16:30.549 D/ContentHelper_QC_B2G(  683): Calling RegisterCallback
11-06 01:16:30.549 I/Gecko   (  683): -*- QCContentHelper_QC_B2G: Registering callback [xpconnect wrapped nsIQCContentHelperCallback]
11-06 01:16:30.549 D/ContentHelper_QC_B2G(  683): Return from RegisterCallback
11-06 01:16:30.549 W/QCLOC   (  683): Examining Gecko interfaces...
11-06 01:16:30.549 W/QCLOC   (  683):   nsIObserver: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIRILDataCallback: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIGeolocationProvider: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsISettingsService: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIPrefBranch: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIThread: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIRadioInterfaceLayer: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIRilContext: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIDOMMozMobileConnectionInfo: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIQCContentHelper: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsINetworkManager: ok
11-06 01:16:30.629 I/AutoMounter(  683): UpdateState: Volume sdcard is Mounted and inserted @ /mnt/sdcard2 gen 1 locked 1 sharing n
11-06 01:16:30.629 I/AutoMounter(  683): UpdateState: Volume extsdcard is Mounted and inserted @ /mnt/sdcard gen 2 locked 1 sharing n
11-06 01:16:30.629 D/VoldCmdListener(  170): volume list
......
......
11-06 01:16:30.669 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.669 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:messageWakeupService.
11-06 01:16:30.669 I/Gonk    (  683): Connected to netd
11-06 01:16:30.679 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.679 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:nsUpdateServiceStub.
11-06 01:16:30.679 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator->HasMore.
11-06 01:16:30.679 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory senumerator entryString:nsUpdateTimerManager.
11-06 01:16:30.699 E/fengximing(  683): Frame delay is 33333 us but decoding took 76965 us
11-06 01:16:30.699 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory while(0).
11-06 01:16:30.699 I/Gecko   (  683): fengximing: NS_CreateServicesFromCategory --.

11-06 01:16:30.699 I/Gecko   (  683): fengximing: nsXREDirProvider::DoStartup gSafeMode:0.
11-06 01:16:30.699 I/Gecko   (  683): fengximing: nsXREDirProvider::DoStartup mode:1.
11-06 01:16:30.699 I/Gecko   (  683): fengximing: nsXREDirProvider::DoStartup mozilla::Telemetry::Accumulate end.
11-06 01:16:30.699 I/Gecko   (  683): fengximing: nsXREDirProvider::DoStartup NotifyObservers profile-initial-state.
11-06 01:16:30.699 I/Gecko   (  683): fengximing: nsXREDirProvider::DoStartup --.
11-06 01:16:30.699 I/Gecko   (  683): fengximing: b2g XRE_mainRun mDirProvider.DoStartup() end.
11-06 01:16:30.699 I/Gecko   (  683): fengximing: b2g XRE_mainRun GetShuttingDown end(1).
11-06 01:16:30.699 I/Gecko   (  683): fengximing: b2g XRE_mainRun NS_GetSpecialDirectory start.
11-06 01:16:30.699 I/Gecko   (  683): fengximing: b2g XRE_mainRun NS_GetSpecialDirectory end(rv:0).
11-06 01:16:30.699 I/Gecko   (  683): fengximing: b2g XRE_mainRun mShuttingDown:0.
......
......
11-06 01:16:32.849 W/AudioFlinger(  185): Thread AudioOut_1 cannot connect to the power manager service


about details, please see in the attachment.
Thank you very much.
Flags: needinfo?(lecky.wanglei)
Hi Lecky,
share some finding for your reference.


(B006_H_screen_black_nsCategoryManager-7(2013.11.20_12_00~11.21_9_30)
1. For usb related log of GonkSwitch. 
   You can see corresponding kernel log in the same time.
   Please check if it is noraml or not.
   I can see the same log during device boot up as well.

(applogcat-log)
11-21 01:32:16.999 I/GonkSwitch(12422): SwitchHandlerUsbIcs find missed port switch request, switch to port 0. 
11-21 01:32:16.999 I/GonkSwitch(12422): usbPortSwitch found port USBPORT_PROP_PRODUCTION
11-21 01:32:17.009 I/GonkSwitch(12422): SwitchHandlerUsbIcs GetStateString USB_STATE(DISCONNECTED).
11-21 01:32:17.009 E/NetlinkEvent(12422): NetlinkEvent::FindParam(): Parameter 'USB_PORT_SWITCH' not found

(kmsgcat-log.4)
11-21 01:32:17 <6>[1, init] [49187.984440] usb_autorun: fsg_close_all_file
11-21 01:32:17 <7>[1, init] [49187.984666] diag: USB disconnected
11-21 01:32:17 <6>[964, adbd] [49187.984903] adb_release
11-21 01:32:17 <6>[857, adbd] [49187.985256] adb_op11-21 01:32:17 <6>[1, init] [49187.986485] nluns_store: nluns_store: nluns = 1
11-21 01:32:17 <6>[1, init] [49187.986598] cdrom_index_store: cdrom_index_store: cdrom_index = 0
11-21 01:32:17 <6>[1, init] [49187.986808] usb_autorun: fsg_store_file: buf=/system/cdrom/autorun.iso
11-21 01:32:17 <6>[1, init] [49187.986945] functions_store: functions_store: serial,mass_storage,adb,diag
11-21 01:32:17 <4>[1, init] [49187.987063] cleanup usb serial number for noraml and factory mode!
11-21 01:32:17 <6>[1, init] [49187.987131] adb_bind_config
11-21 01:32:17 <6>[12446, adbd] [49187.988160] adb_release
11-21 01:32:17 <6>[857, adbd] [49187.988263] adb_open

2. For "Telephony System Worker Manager", "DATA_CALL_MGR_QC_B2G" and "ContentHelper_QC_B2G" are QCTRIL.
   We don't have the source code of QCTRIL.
   Please check with QCT or the source code of QCTRIL.

11-21 01:32:16.999 D/DATA_CALL_MGR_QC_B2G(12422): Value of ro.ril.data_auto_attach is 0

(normal log)
11-06 01:16:30.499 D/EONS_QC_B2G(  683): Is EONS enabled 1
11-06 01:16:30.499 D/ContentHelper_QC_B2G(  683): Creating instance of QCContentHelper
11-06 01:16:30.529 I/Gecko   (  683): -*- QCContentHelper_QC_B2G: Starting QC Telephony Content Helper Interface

11-06 01:16:30.549 D/ContentHelper_QC_B2G(  683): Calling RegisterCallback
11-06 01:16:30.549 I/Gecko   (  683): -*- QCContentHelper_QC_B2G: Registering callback [xpconnect wrapped nsIQCContentHelperCallback]
11-06 01:16:30.549 D/ContentHelper_QC_B2G(  683): Return from RegisterCallback
11-06 01:16:30.549 W/QCLOC   (  683): Examining Gecko interfaces...
11-06 01:16:30.549 W/QCLOC   (  683):   nsIObserver: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIRILDataCallback: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIGeolocationProvider: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsISettingsService: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIPrefBranch: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIThread: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIRadioInterfaceLayer: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIRilContext: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIDOMMozMobileConnectionInfo: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsIQCContentHelper: ok
11-06 01:16:30.549 W/QCLOC   (  683):   nsINetworkManager: ok
Hi Alphan,

Now I can confirm it was blocked in "nsCOMPtr<nsISupports> instance = do_GetService(contractID);".

Please see the code and the log, after it printed "NS_CreateServicesFromCategory senumerator entryString:Telephony System Worker Manager.", 
then it printed "NS_CreateServicesFromCategory categoryManager->GetCategoryEntry end.",
but it does't print "NS_CreateServicesFromCategory instance do_GetService end.".

From this, do_GetService function will be blocked if it has exception.
I have a I have a question why is it no timeout or exception handling mechanism?


Code:
    printf_stderr("fengximing: NS_CreateServicesFromCategory senumerator entryString:%s.\n", entryString.get());
	
    rv = categoryManager->GetCategoryEntry(category,entryString.get(),
                                           getter_Copies(contractID));
    if (NS_FAILED(rv)) {
	  printf_stderr("Category enumerator doesn't support nsIUTF8StringEnumerator while(0).\n");
      continue;
    }
    printf_stderr("fengximing: NS_CreateServicesFromCategory categoryManager->GetCategoryEntry end.\n");
	
    nsCOMPtr<nsISupports> instance = do_GetService(contractID);
    if (!instance) {
      printf_stderr("While creating services from category '%s', could not create service for entry '%s', contract ID '%s'",
                 category, entryString.get(), contractID.get());
      continue;
    }
    printf_stderr("fengximing: NS_CreateServicesFromCategory instance do_GetService end.\n");

abnormal log:
11-01 10:46:50.189 I/Gecko   (24408): fengximing: NS_CreateServicesFromCategory senumerator entryString:Telephony System Worker Manager.
11-01 10:46:50.189 I/Gecko   (24408): fengximing: NS_CreateServicesFromCategory categoryManager->GetCategoryEntry end.
11-01 10:46:50.189 E/fengximing(24408): Frame delay is 33333 us but decoding took 67323 us
11-01 10:46:50.199 E/DATA_CALL_TRACKER_QC_B2G(24408): DataCallTracker: RegisterForUnsol, mPoweringOff(false)
11-01 10:46:50.199 D/DATA_CALL_MGR_QC_B2G(24408): Value of ro.ril.data_auto_attach is 0
11-01 10:46:50.199 I/GonkSwitch(24408): SwitchHandlerUsbIcs find missed port switch request, switch to port 0. 
11-01 10:46:50.199 I/GonkSwitch(24408): usbPortSwitch found port USBPORT_PROP_PRODUCTION
11-01 10:46:50.259 E/fengximing(24408): Frame delay is 33333 us but decoding took 67362 us

Thanks.
As talked, here are the action items for this bug.
1. Use minidump (crash report) to get more information. (see the following for detail)
2. skip do_GetService of "Telephony System Worker Manager" to identify the problem
3. Check "DATA_CALL_MGR_QC_B2G"  and QCT RIL with QCT
Hi Alphan,

I got the following information on this issue.

Thread 0 (crashed)
 0  libxul.so!mozalloc_abort [mozalloc_abort.cpp : 30 + 0x4]
     r4 = 0xbef71de4    r5 = 0x00000000    r6 = 0xffffffff    r7 = 0xbef719f8
     r8 = 0x40c24d65    r9 = 0x00000001   r10 = 0xbef719f8    fp = 0x411085a7
     sp = 0xbef719e0    lr = 0x4109c867    pc = 0x4109c86a
    Found by: given as instruction pointer in context
 1  libxul.so!NS_DebugBreak_P [nsDebugImpl.cpp : 423 + 0x5]
     r4 = 0xbef71de4    r5 = 0x00000000    r6 = 0xffffffff    r7 = 0xbef719f8
     r8 = 0x40c24d65    r9 = 0x00000001   r10 = 0xbef719f8    fp = 0x411085a7
     sp = 0xbef719e8    pc = 0x40c24b4d
    Found by: call frame info
 2  libxul.so!mozilla::dom::ContentChild::ProcessingError [ContentChild.cpp : 1065 + 0x15]
     r4 = 0x00000006    r5 = 0x41a1b624    r6 = 0xbef71e5c    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71e18    pc = 0x40af7c8b
    Found by: call frame info
 3  libxul.so!mozilla::dom::PContentChild::OnProcessingError [PContentChild.cpp : 3260 + 0x5]
     r4 = 0x00000006    r5 = 0x41a1b624    r6 = 0xbef71e5c    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71e28    pc = 0x40b8370f
    Found by: call frame info
 4  libxul.so!mozilla::ipc::AsyncChannel::MaybeHandleError [AsyncChannel.cpp : 631 + 0x5]
     r4 = 0x00000006    r5 = 0x41a1b624    r6 = 0xbef71e5c    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71e30    pc = 0x40b0e305
    Found by: call frame info
 5  libxul.so!mozilla::ipc::AsyncChannel::OnDispatchMessage [AsyncChannel.cpp : 485 + 0x11]
     r4 = 0x41a1b624    r5 = 0xbef71e5c    r6 = 0xbef71e5c    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71e48    pc = 0x40b0e387
    Found by: call frame info
 6  libxul.so!mozilla::ipc::RPCChannel::OnMaybeDequeueOne [RPCChannel.cpp : 402 + 0x7]
     r4 = 0x41a1b624    r5 = 0xbef71e5c    r6 = 0xbef71e5c    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71e58    pc = 0x40b131ff
    Found by: call frame info
 7  libxul.so!RunnableMethod<IPC::ChannelProxy::Context, void (IPC::ChannelProxy::Context::*)(), Tuple0>::Run [tuple.h : 383 + 0x5]
     r4 = 0xbef72854    r5 = 0x4ad31d28    r6 = 0xbef71ec8    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71e90    pc = 0x408982eb
    Found by: call frame info
 8  libxul.so!mozilla::ipc::RPCChannel::DequeueTask::Run [RPCChannel.h : 425 + 0x9]
     r4 = 0xbef72854    r5 = 0x4ad31d28    r6 = 0xbef71ec8    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71e98    pc = 0x40b11ba9
    Found by: call frame info
 9  libxul.so!MessageLoop::RunTask [message_loop.cc : 337 + 0x5]
     r4 = 0xbef72854    r5 = 0x4ad31d28    r6 = 0xbef71ec8    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71ea0    pc = 0x40c428d5
    Found by: call frame info
10  libxul.so!MessageLoop::DeferOrRunPendingTask [message_loop.cc : 345 + 0x5]
     r4 = 0x00000001    r5 = 0xbef71eb8    r6 = 0xbef71ec8    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71eb0    pc = 0x40c43707
    Found by: call frame info
11  libxul.so!MessageLoop::DoWork [message_loop.cc : 445 + 0x7]
     r4 = 0xbef72854    r5 = 0xbef71eb8    r6 = 0xbef71ec8    r7 = 0xbef72860
     r8 = 0xbef71ec0    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71eb8    pc = 0x40c442e5
    Found by: call frame info
12  libxul.so!mozilla::ipc::DoWorkRunnable::Run [MessagePump.cpp : 42 + 0x7]
     r4 = 0xbef72854    r5 = 0x00000001    r6 = 0x00000000    r7 = 0x41a06c0c
     r8 = 0x00000001    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71ee8    pc = 0x40b11565
    Found by: call frame info
13  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 627 + 0x5]
     r4 = 0x41a06be0    r5 = 0x00000000    r6 = 0x00000000    r7 = 0x41a06c0c
     r8 = 0x00000001    r9 = 0xbef71f3f   r10 = 0x00000001    fp = 0x00000000
     sp = 0xbef71ef8    pc = 0x40c20851
    Found by: call frame info
14  libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp : 237 + 0xb]
     r4 = 0x00000000    r5 = 0xbef72854    r6 = 0x41a02350    r7 = 0x00000001
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71f38    pc = 0x40c00adf
    Found by: call frame info
15  libxul.so!mozilla::ipc::MessagePump::Run [MessagePump.cpp : 82 + 0x7]
     r4 = 0x41a02340    r5 = 0xbef72854    r6 = 0x41a02350    r7 = 0x00000001
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71f48    pc = 0x40b11675
    Found by: call frame info
16  libxul.so!mozilla::ipc::MessagePumpForChildProcess::Run [MessagePump.cpp : 231 + 0x7]
     r4 = 0xbef72854    r5 = 0x41a02340    r6 = 0xbef72854    r7 = 0x00000001
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71f70    pc = 0x40b11727
    Found by: call frame info
17  libxul.so!MessageLoop::RunInternal [message_loop.cc : 219 + 0x5]
     r4 = 0xbef72854    r5 = 0x43d82400    r6 = 0x41a06be0    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71f88    pc = 0x40c42891
    Found by: call frame info
18  libxul.so!MessageLoop::Run [message_loop.cc : 212 + 0x5]
     r4 = 0xbef72854    r5 = 0x43d82400    r6 = 0x41a06be0    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71f90    pc = 0x40c4293b
    Found by: call frame info
19  libxul.so!nsBaseAppShell::Run [nsBaseAppShell.cpp : 163 + 0x7]
     r4 = 0x00000000    r5 = 0x43d82400    r6 = 0x41a06be0    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71fa8    pc = 0x40a95ef5
    Found by: call frame info
20  libxul.so!XRE_RunAppShell [nsEmbedFunctions.cpp : 646 + 0x5]
     r4 = 0xbef71fbc    r5 = 0x41a02340    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71fb8    pc = 0x40421125
    Found by: call frame info
21  libxul.so!mozilla::ipc::MessagePumpForChildProcess::Run [MessagePump.cpp : 198 + 0x3]
     r4 = 0xbef72854    r5 = 0x41a02340    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71fd0    pc = 0x40b116f5
    Found by: call frame info
22  libxul.so!MessageLoop::RunInternal [message_loop.cc : 219 + 0x5]
     r4 = 0xbef72854    r5 = 0x41a1b600    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71fe8    pc = 0x40c42891
    Found by: call frame info
23  libxul.so!MessageLoop::Run [message_loop.cc : 212 + 0x5]
     r4 = 0xbef72854    r5 = 0x41a1b600    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef71ff0    pc = 0x40c4293b
    Found by: call frame info
24  libxul.so!XRE_InitChildProcess [nsEmbedFunctions.cpp : 485 + 0xb]
     r4 = 0xbef72854    r5 = 0x41a1b600    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef72008    pc = 0x404214c9
    Found by: call frame info
25  plugin-container!main [MozillaRuntimeMain.cpp : 85 + 0x5]
     r4 = 0xbef729c4    r5 = 0x00000005    r6 = 0x00000012    r7 = 0xbef7298c
     r8 = 0x00000000    r9 = 0x00000000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef72988    pc = 0x00008601
    Found by: call frame info
26  libc.so!__libc_init [libc_init_dynamic.c : 114 + 0x7]
     r4 = 0x00008574    r5 = 0xbef729c4    r6 = 0x00000006    r7 = 0xbef729e0
     r8 = 0x00000000    r9 = 0x00000000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef729a8    pc = 0x4010fc77
    Found by: call frame info
27  0xb0004575
     r4 = 0x00000000    r5 = 0x00000000    r6 = 0x00000000    r7 = 0x00000000
     r8 = 0x00000000    r9 = 0x00000000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef729c0    pc = 0xb0004577
    Found by: call frame info

Thanks.
Flags: needinfo?(alchen)
comment 46 is a totally different bug. It is a b2g crash. Please file another bug and give it a correct title for this case.
Flags: needinfo?(alchen) → needinfo?(lecky.wanglei)
Hi Alan,

I'm sure this is the log for this issue, before I went to reproduce this issue, I have cleared all the log files in the phone, I got this log from the symptom.
For the log, there is only one.

Thanks.
Flags: needinfo?(lecky.wanglei)
Hi Alan, Alphan,

So far the analysis, this is the timing problem, sometime, if b2g crashes, the init process will restart b2g process, at this time, it should have more one sigchld signal, contains B2G and MediaServer, but the sigchld signal of the mediaserver is lost(I think), so the resouces of the mediaserver process is not recovered, it has become a zombie process, this can explain why the device can be normal if you restart or kill b2g agian.

I compared the same paltform on different device(the sanme kernel and the same QC's Code), it doesn't happen, they are different only b2g.

Please don't say whether this issue can be reproduced on the mozilla build, maybe this is caused by our modified, but we need to find out it.

So I think we need to analyze in the b2g. 

Thanks.
Hi Alan, Alphan,

I solved this issue in the mozalloc_abor.cpp, I delete the following code in the mozalloc_abort(const char* const msg), the issue does not recur.

void
mozalloc_abort(const char* const msg)
{
/* fengximing 20131225 begin */
/* delete this code, it will cause the mediaser to become a zombie process. */
/* when the b2g crashes and restarts, it is blocked in the movistar logo or the screen is black. */
/* MOZ_CRASH() is not safe, it can't be called in the MinThread. */
/*
#ifndef ANDROID
    fputs(msg, stderr);
    fputs("\n", stderr);
#else
    __android_log_print(ANDROID_LOG_ERROR, "Gecko", "mozalloc_abort: %s", msg);
#endif
    MOZ_CRASH();
*/
/* fengximing 20131225 end */
}

But I think this is not the best solution, maybe you can give me some advice.
Thanks a lot.
Flags: needinfo?(wchang)
Flags: needinfo?(wchang) → needinfo?(ahuang)
Flags: needinfo?(ahuang) → needinfo?(alchen)
Hi Lecky,
I checked the call stack.
Update the analysis for your reference.

It should be normal until 5.

 5  libxul.so!mozilla::ipc::AsyncChannel::OnDispatchMessage [AsyncChannel.cpp : 485 + 0x11]
@ipc/glue/AsyncChannel.cpp
AsyncChannel::OnDispatchMessage(const Message& msg) 
{
...
// it's OK to dispatch messages if the channel is closed/error'd,
// since we don't have a reply to send back
(void)MaybeHandleError(mListener->OnMessageReceived(msg), "AsyncChannel");
...
}


However, the "Result code" in 4 is not "MsgProcessed".
We need to know the code in this case.
You can check the following code.
There is "PrintErrorMessage" to print out the errorMsg.
Could you check the system log for more information?

4  libxul.so!mozilla::ipc::AsyncChannel::MaybeHandleError [AsyncChannel.cpp : 631 + 0x5]

AsyncChannel::MaybeHandleError(Result code, const char* channelName)
{
    if (MsgProcessed == code)
        return true;

    const char* errorMsg;
    switch (code) {
    case MsgNotKnown:
        errorMsg = "Unknown message: not processed";
        break;
    case MsgNotAllowed:
        errorMsg = "Message not allowed: cannot be sent/recvd in this state";
        break;
    case MsgPayloadError:
        errorMsg = "Payload error: message could not be deserialized";
        break;
    case MsgProcessingError:
        errorMsg = "Processing error: message was deserialized, but the handler returned false (indicating failure)";
        break;
    case MsgRouteError:
        errorMsg = "Route error: message sent to unknown actor ID";
        break;
    case MsgValueError:
        errorMsg = "Value error: message was deserialized, but contained an illegal value";
        break;

    default:
        NS_RUNTIMEABORT("unknown Result code");
        return false;
    }

PrintErrorMessage(mChild, channelName, errorMsg);

631 ----->        mListener->OnProcessingError(code);

    return false;
}


static void
PrintErrorMessage(bool isChild, const char* channelName, const char* msg)
{
    printf_stderr("\n###!!! [%s][%s] Error: %s\n\n",
                  isChild ? "Child" : "Parent", channelName, msg);
}


@xpcom/glue/nsCRTGlue.cpp
#elif defined(ANDROID)
void
printf_stderr(const char *fmt, ...)
{
  va_list args;
  va_start(args, fmt);
  __android_log_vprint(ANDROID_LOG_INFO, "Gecko", fmt, args);
  va_end(args);
}
#else
...
#endif
Flags: needinfo?(alchen)
Firefox OS is not being worked on
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: