Closed Bug 1031977 Opened 10 years ago Closed 6 years ago

[tarako][v1.3t][performance] performance was worse after 24hours monkey-test

Categories

(Firefox OS Graveyard :: Performance, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:-, tracking-b2g:-, b2g-v1.3T affected)

RESOLVED WONTFIX
blocking-b2g -
tracking-b2g -
Tracking Status
b2g-v1.3T --- affected

People

(Reporter: ying.xu, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [sprd325842][partner-blocker][c=memory p= s= u=])

Attachments

(5 files, 3 obsolete files)

      No description provided.
performance was worse after 24hours monkey-test than the moment the phone just powering up 

Any ideas to analyze this bug?
Flags: needinfo?(ttsai)
blocking-b2g: --- → 1.3T?
1. Can you use the launch time setting to get the precise difference between good and bad launch time?
2. Any memory usage status, oom status, zram status, data storage available space,
system loading, cpu usage, interrupt?
3. Will it be recovered by suspend/resume or by reboot?
Flags: needinfo?(ttsai) → needinfo?(ying.xu)
Whiteboard: [sprd325842][partner-blocker]
Attached file about-memory.zip (obsolete) —
one memory report ,possible with memory leak

apusr@yingxuubt:/home/ffos/yingxu/github/B2G-mozilla/tools/about-memory-0$ cat b2g-procrank 

APPLICATION        PID      Vss      Rss      Pss      Uss  cmdline
b2g                667   59628K   57648K   54888K   52416K  /system/b2g/b2g
(Preallocated a  19077   10352K   10348K    7597K    5128K  /system/b2g/b2g
(Nuwa)             680     696K     696K     365K     212K  /system/b2g/b2g
                                           ------   ------  ------
                                           65235K   59612K  TOTAL
(In reply to thomas tsai from comment #2)

> 3. Will it be recovered by suspend/resume or by reboot?

it can be recovered by rebooting.
Flags: needinfo?(ying.xu)
Wayne showed me there were two threads (StreamTrans #N) in b2g process consumes CPU around ~90% after monkey test, is the same sympton can be found from your device?
no, we didn't found that problem yet.
(In reply to ying.xu from comment #4)
> (In reply to thomas tsai from comment #2)
> 
> > 3. Will it be recovered by suspend/resume or by reboot?
> 
> it can be recovered by rebooting.

also recovered by killing b2g process
See Also: → 1032638
├──12.35 MB (22.03%) ── heap-unclassified
├──2.73 MB (36.00%) ── heap-unclassified

This is huge.
do we need to enable DMD?
Yes, that'd be helpful.
Attached file about-memory-tarako1.zip (obsolete) —
tarako v1.3t memory-reports after 24hours monkey-test

apusr@yingxuubt:/home/ffos/yingxu/github/B2G-mozilla/tools$ adb shell b2g-ps
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      623   1     163404 51960 ffffffff 400a9518 S /system/b2g/b2g
(Nuwa)           root      636   623   46964  552   ffffffff 400a9518 S /system/b2g/b2g
(Preallocated a  root      26068 636   54128  6828  ffffffff 400a9518 S /system/b2g/b2g
apusr@yingxuubt:/home/ffos/yingxu/github/B2G-mozilla/tools$ adb shell b2g-procrank
APPLICATION        PID      Vss      Rss      Pss      Uss  cmdline
b2g                623   54696K   51960K   50159K   48584K  /system/b2g/b2g
(Preallocated a  26068    6832K    6828K    5042K    3468K  /system/b2g/b2g
(Nuwa)             636     552K     552K     316K     208K  /system/b2g/b2g
(In reply to ying.xu from comment #10)
> Created attachment 8449165 [details]
> about-memory-tarako1.zip
> 
> tarako v1.3t memory-reports after 24hours monkey-test

both heap-unclassified and heap-overhead were huge.

Main Process (pid 623)
Explicit Allocations

48.72 MB (100.0%) -- explicit
├──15.17 MB (31.13%) ++ js-non-window
├──11.10 MB (22.79%) ── heap-unclassified
├───8.68 MB (17.82%) -- heap-overhead
Ying, attachment 8449165 [details] does not include dmd, I guess it will be included in next report?
(In reply to Ting-Yu Chou [:ting] from comment #12)
> Ying, attachment 8449165 [details] does not include dmd, I guess it will be
> included in next report?

yes. I will post next attachment with dmd report.
Keywords: perf
Priority: -- → P2
Whiteboard: [sprd325842][partner-blocker] → [sprd325842][partner-blocker][c=memory p= s= u=]
Component: General → Performance
if overall stability is passing, lets put this to backlog. thanks
blocking-b2g: 1.3T? → backlog
(In reply to ying.xu from comment #13)

> yes. I will post next attachment with dmd report.

I meet some problems when running fix_b2g_stack.py.
can you help to look at it?

ffos@ffos2:~/yingxu/6821/tools$ ./fix_b2g_stack.py --outfile output.txt --toolchain-dir ../prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.4.x/bin/ --toolchain-prefix arm-linux-androideabi- --gecko-objdir ../objdir-gecko --gonk-objdir ../out --remove-cache --product sp6821a_gonk ./about-memory-0/dmd-1388534682-85.txt
Traceback (most recent call last):
  File "./fix_b2g_stack.py", line 600, in <module>
    fix_b2g_stacks_in_file(infile, outfile, args)
  File "./fix_b2g_stack.py", line 543, in fix_b2g_stacks_in_file
    cppfilt.stdin.write(matcher.sub(subfn, line))
  File "./fix_b2g_stack.py", line 529, in subfn
    match.group('fn'))
  File "./fix_b2g_stack.py", line 385, in translate
    lambda: self._addr2line(lib, offset, pc, fn_guess))
  File "./fix_b2g_stack.py", line 349, in get_maybe_set
    self.put(lib_path, offset, result())
  File "./fix_b2g_stack.py", line 331, in put
    self._write_cache_to_disk()
  File "./fix_b2g_stack.py", line 289, in _write_cache_to_disk
    pickle.HIGHEST_PROTOCOL)
cPickle.PicklingError: Can't pickle <class 'fix_b2g_stack.DefaulterDict'>: it's not the same object as fix_b2g_stack.DefaulterDict
Ying, I'll try to figure this out soon.
You can temporarily workaround that as following, I will handle it in bug 1034090 separately:

diff --git a/tools/fix_b2g_stack.py b/tools/fix_b2g_stack.py
index 865e8b3..a4dc38d 100755
--- a/tools/fix_b2g_stack.py
+++ b/tools/fix_b2g_stack.py
@@ -200,7 +200,7 @@ class DefaulterDict(dict):
 
     [1] http://stefaanlippens.net/pickleproblem
     '''
-    __module__ = os.path.splitext(basename(__file__))[0]
+#    __module__ = os.path.splitext(basename(__file__))[0]
 
     def __init__(self, default_item):
         super(DefaulterDict, self).__init__()
Flags: needinfo?(kkuo)
I think we should fix it, it has 100% reproduce rate.
The version is v1.3t origin branch, without any WIP patches.
Ying, enable DMD and find memory leak.
blocking-b2g: backlog → 1.3T?
Flags: needinfo?(fabrice)
1.3t+ please.
Attached file about-memory-9.zip (obsolete) —
memory report with dmd , no symbols in dmd report yet
> memory report with dmd , no symbols in dmd report yet

Doesn't the workaround at comment 17 help?
(In reply to Ting-Yu Chou [:ting] from comment #21)
> > memory report with dmd , no symbols in dmd report yet
> 
> Doesn't the workaround at comment 17 help?

yes.
If I build the source code locally and run fix_b2g_stack.py , it works
But most time, we get the build image from our build server, we don't have the right version of source code and so/bin matched to the build from server. And it seems some paths are pre-set in fix_b2g_stack.py. No matter how I tried, it can not work.
(In reply to ying.xu from comment #22)
> server. And it seems some paths are pre-set in fix_b2g_stack.py. No matter
> how I tried, it can not work.

Could you please explain me how did you try, and what paths you were trying to alter? The addresses only DMD doesn't help much...
I download libs/bins with symbols into a separate folder, not in path of B2G source tree.

and put tools(containing fix_b2g_stack.py) folder into monkey-test path.

then run this command.
./fix_b2g_stack.py --outfile about-memory-2/dmd --toolchain-dir /home/apusr/b2gsource/ffos/prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.4.x/bin/ --gecko-objdir /home/apusr/b2gsource/ffos/monkey.test/images/objdir-gecko --gonk-objdir /home/apusr/b2gsource/ffos/monkey.test/images --remove-cache about-memory-2/dmd-1404870806-610.txt

where I put all the libs/bins into this path /home/apusr/b2gsource/ffos/monkey.test/images.

for example, libxul.so seems like
apusr@yingxuubt:~/b2gsource/6821/monkey.test/images/objdir-gecko/dist/crashreporter-symbols/libxul.so/9D2FBD424C598576E
BB1BAD4387307140$ ll -h 
total 77M
drwxr-xr-x 2 apusr apusr 4.0K Jul 11 05:30 ./
drwxr-xr-x 3 apusr apusr 4.0K Jul 11 05:30 ../
-rw-r--r-- 1 apusr apusr  77M Jul 11 05:30 libxul.so.sym
What error messages you see then? Except objdir-gecko, what else files you put into images folder?
Ying, FYR, I tried following setup, and it translates symbols correctly:

  + tools
    + about-memory-0
    + images
      + objdir-gecko
      + target
        + product
          + sp6821a_gonk

ting@portege:~/w/fx/os/tools$ ./fix_b2g_stack.py --outfile aobut-memory-0/dmd --toolchain-dir /home/ting/w/fx/os/tarako/prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.4.x/bin --gecko-objdir /home/ting/w/fx/os/tools/images/objdir-gecko --gonk-objdir /home/ting/w/fx/os/tools/images --remove-cache about-memory-0/dmd-1400859435-84.txt
where do you get these two folders? objdir-gecko/targe

maybe sprd build server are missing some files when packing libs/bins with debug infomation.
As I don't have your build server's output, I simply copied objdir-gecko and out/target/product/sp6821a_gonk from my local build.
I check a little

our build server packed objdir-gecko folder with only 187M files.

but with local build, there are about 3.7G files in path objdir-gecko.

I will try packing more files on sprd build server.
Based on https://github.com/mozilla-b2g/B2G/blob/master/tools/fix_b2g_stack.py#L374, packing *.so, b2g, and plugin-container should be sufficient.
Flags: needinfo?(fabrice)
Hi! James,

Is this same case as 1032638?
With the patch from 1032638, does the performance get improved?

--
Keven
Flags: needinfo?(kkuo) → needinfo?(james.zhang)
We're doing monkey test to verify every day, can't reproduce it now.
Flags: needinfo?(james.zhang)
We can still meet this issue on latest build.
In this case, b2g's zram is very big, and if we kill b2g process, it becomes normal.
Attached file memory-report-0.zip
memory-report, dmd output with function names
Attachment #8448464 - Attachment is obsolete: true
Attachment #8449165 - Attachment is obsolete: true
Attachment #8454343 - Attachment is obsolete: true
Attached file about-memory-1.zip
memory-report, dmd output with function names
(In reply to ying.xu from comment #35)
> Created attachment 8459386 [details]
> about-memory-1.zip
> 
> memory-report, dmd output with function names

Does this memory report get when swap of b2g is large? or get after b2g was restart?
yes. swap memory of b2g was large and performance dropped
Attached file oom-kill-details
b2g killed by LMK and b2g's zram is very big.
Flags: needinfo?(wchang)
Flags: needinfo?(dliang)
James,

Seems like there was a period of one week you did not meet this problem, until your latest build. So either something had been broken in the latest build or the occurrence rate is quite low. I am going to lower the priority of this issue on our end, feel free to continue your analysis.

For 1.3t, right now we're only focusing on the final few blockers that are holding it back from launch.
blocking-b2g: 1.3T? → -
Flags: needinfo?(wchang)
Wayne, we can meet this issue every day after monkey test.

see this log, b2g eats too much memory, RSS is about 52MB and SWAP is about 52MB. It recovers if reboot or OOM.

07-18 18:04:02.896 <6>0[14301.090849] [ pid ]   uid  tgid total_vm      rss     swap cpu oom_adj oom_score_adj name
07-18 18:04:02.897 <6>0[14301.091421] [ 1118]     0  1118    54133    13672    13509   0       0             0 b2g
DMD from both attachment does not actually include function names, Ying, do you have any error messages while running the script fix_b2g_stack.py?

Some highlights:

attachment 8459385 [details]:
0 (100.0%) -- queued-ipc-messages
├──0 (100.0%) ── content-parent(Calendar, pid=-1, closed channel, 0x4618c800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0x4a180400, refcnt=1)
└──0 (100.0%) ── content-parent(Communications, pid=-1, closed channel, 0x43f21c00, refcnt=1)

attachment 8459386 [details]:
0 (100.0%) -- queued-ipc-messages
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0x4385b800, refcnt=1)
(In reply to Ting-Yu Chou [:ting] from comment #41)
> DMD from both attachment does not actually include function names, Ying, do
> you have any error messages while running the script fix_b2g_stack.py?

I have put it into the zip , naming 'dmd-b2g-out'

for example:
Unreported: 66 blocks in stack trace record 1 of 898
 1,081,344 bytes (930,336 requested / 151,008 slop)
 3.58% of the heap (3.58% cumulative);  10.19% of unreported (10.19% cumulative)
 Allocated at
   replace_malloc /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/memory/replace/dmd/DMD.cpp:1247 (0x4000a74e libdmd.so+0x374e)
   malloc /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/memory/build/replace_malloc.c:152 (0x401431c2 libmozglue.so+0x271c2)
   moz_xmalloc /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/memory/mozalloc/mozalloc.cpp:53 (0x41d85eb2 libxul.so+0x1185eb2)
   NS_Alloc /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/xpcom/base/nsMemoryImpl.cpp:198 (0x40e906e2 libxul.so+0x2906e2)
   gfxUserFontSet::SanitizeOpenTypeData(gfxMixedFontFamily*, gfxProxyFontEntry*, unsigned char const*, unsigned int, unsigned int&, bool) /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/gfx/thebes/gfxUserFontSet.cpp:373 (0x410b7a30 libxul.so+0x4b7a30)
   gfxUserFontSet::LoadFont(gfxMixedFontFamily*, gfxProxyFontEntry*, unsigned char const*, unsigned int&) /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/gfx/thebes/gfxUserFontSet.cpp:682 (0x410b823e libxul.so+0x4b823e)
   gfxUserFontSet::LoadNext(gfxMixedFontFamily*, gfxProxyFontEntry*) /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/gfx/thebes/gfxUserFontSet.cpp:593 (0x410b8660 libxul.so+0x4b8660)
   gfxUserFontSet::FindFontEntry(gfxFontFamily*, gfxFontStyle const&, bool&, bool&) /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/gfx/thebes/gfxUserFontSet.cpp:244 (0x410b8874 libxul.so+0x4b8874)
   gfxFontGroup::ForEachFontInternal(nsAString_internal const&, nsIAtom*, bool, bool, bool, bool (*)(nsAString_internal const&, nsACString_internal const&, bool, void*), void*) /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/gfx/thebes/gfxFont.cpp:4386 (0x410ab978 libxul.so+0x4ab978)
   gfxFontGroup::ForEachFont(bool (*)(nsAString_internal const&, nsACString_internal const&, bool, void*), void*) /space/builder/repo/sp6821a_gonk4.0_dmd/B2G/gecko/gfx/thebes/gfxFont.cpp:4239 (0x410abae8 libxul.so+0x4abae8)
Saw it, thanks.
There're 25%~30% allocations from font on top of the DMD reports, I will turn to check the things of comment 41.
See Also: → 1042524
Can we limit the font cache?
Can we limit the font cache size?
In gfx/cairo/cairo/src/cairo-scaled-font.c:

/* XXX: This number is arbitrary---we've never done any measurement of this. */
#define MAX_GLYPH_PAGES_CACHED 256
a(In reply to Ting-Yu Chou [:ting] from comment #41)
> attachment 8459385 [details]:

It does not have cc/gc log of b2g process.

> attachment 8459386 [details]:
> 0 (100.0%) -- queued-ipc-messages
> ├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0x4385b800,
> refcnt=1)

This from cc log looks odd, I will check it further:

0x43c6ae20 [rc=1] FragmentOrElement (xhtml) iframe app://system.gaiamobile.org/index.html
> 0x48499250 Preserved wrapper
> 0x40342c40 mNodeInfo
> 0x43c85a60 [via hash] mListenerManager
Clear ni by comment#39, will keep monitor it and try to find a STR.
Flags: needinfo?(dliang)
Ying, just want to make sure, did you create about-memory-1 (attachment 8459386 [details]) with argument "-m"?
Flags: needinfo?(ying.xu)
yes, always with -m option.
Flags: needinfo?(ying.xu)
Attached file about-memory.zip
a new log of today.
From attachment 8459386 [details], I saw a leaked nsFrameLoader 0x43c850a0 from BrowserElementParent.activeInputFrame. As bug 1020726 has been resolved, I am asking the author whether the patch can also apply to v1.3t.
Depends on: 1020726
With 1.3t patches of bug 1020726, there is only one iframe of system app after 24hrs monkey test.
DUT4 met b2g was killed twice and memory size is big, uss is 56.4MB and swap is 50.8

DUT2:
0x44de8280 [rc=6] FragmentOrElement (xhtml) iframe app://system.gaiamobile.org/index.html
> 0x436fa040 Preserved wrapper
> 0x45e26970 mNodeInfo
> 0x40475ec0 GetParent()
> 0x44de8460 [via hash] mListenerManager
> 0x44de83a0 mFrameLoader

0 (100.0%) -- queued-ipc-messages
└──0 (100.0%) ── content-parent(E-Mail, pid=-1, closed channel, 0x40b94c00, refcnt=1)

DUT4:
0x437878e0 [rc=6] FragmentOrElement (xhtml) iframe app://system.gaiamobile.org/index.html
> 0x436ec610 Preserved wrapper
> 0x439654c0 mNodeInfo
> 0x4779dba0 GetParent()
> 0x439093a0 [via hash] mListenerManager
> 0x43787a00 mFrameLoader

b2g-info of DUT4
  NAME   PID  PPID CPU(s) NICE  USS  PSS  RSS SWAP VSIZE OOM_ADJ USER
   b2g 24261     1 7252.5    0 56.4 56.5 56.7 50.8 194.3       0 root
(Nuwa) 24306 24261    7.9    0  0.4  0.5  0.5  8.7  46.6       0 root

Hi Ting, could you help to check attached memory report?
Let me know if you want to check the device.
Flags: needinfo?(tchou)
Highlight from DUT4:

94.24 MB (100.0%) -- explicit
├──62.01 MB (65.80%) ── heap-unclassified

DMD lists some huge strings from DeviceStorageFile::collectFilesInternal(), the stack seems incomplete as the function is called recursively.
Flags: needinfo?(tchou)
See Also: → 985197
(In reply to Ting-Yu Chou [:ting] from comment #55)
> Highlight from DUT4:
> 
> 94.24 MB (100.0%) -- explicit
> ├──62.01 MB (65.80%) ── heap-unclassified
> 
> DMD lists some huge strings from DeviceStorageFile::collectFilesInternal(),
> the stack seems incomplete as the function is called recursively.

Yeah - I noticed that too. Currently when you enumerate DeviceStorge it allocates the entire result, and sends that to the child, and then the child enumerates through it entry by entry. This makes the enumeration potentially use lots of memory.

The other extreme, was that each time the child did a next on the cursor, it would call into the parent, which makes the enumeration extremely slow.

I think that there is a happy middle ground. The parent should send back a "block" of entries. The child can then enumerate through the block quickly and when it runs out, request another block from the parent.

Method 1 above is using a block size of infinity. Method 2 uses a block size of 1. 

Then we can tune the block size to give decent performance and not consume too much memory.
Dave, thanks for your information.

But from cc-edges.24261.log of attachment 8466031 [details] dut4, I found there're 52 mDeviceStorageStores[i] of Navigator 0x44e85430, for example 0x477fc9a0, do not link to any DeviceStorage JS object. In this case, don't we expect them to be collected?
Flags: needinfo?(dhylands)
Danny noticed those from top when heap-unclassified is high:

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
12210 13115  0  17% R 178864K  43524K  fg root     StreamTrans #15 /system/b2g/b2g
12210 12708  0  17% R 178864K  43524K  fg root     StreamTrans #6  /system/b2g/b2g
12210 12972  0  17% R 178864K  43524K  fg root     StreamTrans #12 /system/b2g/b2g
12210 13876  0  17% R 178864K  43524K  fg root     StreamTrans #36 /system/b2g/b2g
12210 12893  0  15% R 178864K  43524K  fg root     StreamTrans #10 /system/b2g/b2g

which is the symptom of bug 1032638.
(In reply to Ting-Yu Chou [:ting] from comment #58)
> Danny noticed those from top when heap-unclassified is high:
> 
>   PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
> 12210 13115  0  17% R 178864K  43524K  fg root     StreamTrans #15
> /system/b2g/b2g

The stack of the thread matches exactly with DMD's stack in comment 55, the number of recursive invocations of collectFilesInternal() is more than 200, |fullpath| is a file under /mnt/sdcard/DCIM/.
Attached more information, top and call stack by gdb, fullpath is a file under /mnt/sdcard/DCIM/108MZLLA/rviceman.age/

User 65%, System 34%, IOW 0%, IRQ 0%
User 96 + Nice 0 + Sys 51 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 0 = 147

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
12210 13876  0  12% R 178864K  44000K  fg root     StreamTrans #36 /system/b2g/b2g
12210 12260  0  12% R 178864K  44000K  fg root     DBus Thread     /system/b2g/b2g
12210 12708  0  12% R 178864K  44000K  fg root     StreamTrans #6  /system/b2g/b2g
12210 12972  0  12% R 178864K  44000K  fg root     StreamTrans #12 /system/b2g/b2g
12210 12893  0  11% R 178864K  44000K  fg root     StreamTrans #10 /system/b2g/b2g
12210 13115  0  11% R 178864K  44000K  fg root     StreamTrans #15 /system/b2g/b2g

(gdb) thread 40
[Switching to thread 40 (Thread 12210.12708)]#0  UTF8CharEnumerator::NextChar (buffer=0x459645e0, end=0x4b46200b "", err=0x459645e7) at ../../../dist/include/nsUTF8Utils.h:40
40	  static uint32_t NextChar(const char **buffer, const char *end,
(gdb) bt
#0  UTF8CharEnumerator::NextChar (buffer=0x459645e0, end=0x4b46200b "", err=0x459645e7) at ../../../dist/include/nsUTF8Utils.h:40
#1  0x40e8618a in ConvertUTF8toUTF16::write (first=<value optimized out>, last=<value optimized out>, result=...) at ../../../dist/include/nsUTF8Utils.h:302
#2  nsCharSinkTraits<ConvertUTF8toUTF16>::write (first=<value optimized out>, last=<value optimized out>, result=...) at ../../../dist/include/nsCharTraits.h:595
#3  copy_string<nsReadingIterator<char>, ConvertUTF8toUTF16> (first=<value optimized out>, last=<value optimized out>, result=...) at ../../../dist/include/nsAlgorithm.h:75
#4  0x40e86f82 in AppendUTF8toUTF16 (aSource=..., aDest=...) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/string/src/nsReadableUtils.cpp:189
#5  0x40e86fa4 in AppendUTF8toUTF16 (aSource=..., aDest=...) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/string/src/nsReadableUtils.cpp:159
#6  0x40e8700e in CopyUTF8toUTF16 (aSource=..., aDest=...) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/string/src/nsReadableUtils.cpp:56
#7  0x40ea53aa in NS_CopyNativeToUnicode (input=..., output=...) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/io/nsNativeCharsetUtils.cpp:19
#8  0x40e9e3f0 in nsLocalFile::GetPath (this=<value optimized out>, _retval=...) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/io/nsLocalFileUnix.cpp:1987
#9  0x41419d58 in DeviceStorageFile::collectFilesInternal (this=<value optimized out>, aFiles=..., aSince=<value optimized out>, aRootPath=...)
    at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/dom/devicestorage/nsDeviceStorage.cpp:1164
#10 0x41419dbc in DeviceStorageFile::collectFilesInternal (this=<value optimized out>, aFiles=..., aSince=<value optimized out>, aRootPath=...)
    at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/dom/devicestorage/nsDeviceStorage.cpp:1179

...........

#296 0x41419dbc in DeviceStorageFile::collectFilesInternal (this=<value optimized out>, aFiles=..., aSince=<value optimized out>, aRootPath=...)
    at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/dom/devicestorage/nsDeviceStorage.cpp:1179
#297 0x41419ef8 in DeviceStorageFile::CollectFiles (this=0x4642bee0, aFiles=..., aSince=0) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/dom/devicestorage/nsDeviceStorage.cpp:1125
#298 0x41414658 in mozilla::dom::devicestorage::DeviceStorageRequestParent::EnumerateFileEvent::CancelableRun (this=0x46ec8a00)
    at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/dom/devicestorage/DeviceStorageRequestParent.cpp:737
#299 0x41414756 in mozilla::dom::devicestorage::DeviceStorageRequestParent::CancelableRunnable::Run (this=0x46ec8a00)
    at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/dom/devicestorage/DeviceStorageRequestParent.h:58
#300 0x40eb1300 in nsThreadPool::Run (this=0x44ec54c0) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/threads/nsThreadPool.cpp:208
#301 0x40eb012c in nsThread::ProcessNextEvent (this=0x4636c400, mayWait=<value optimized out>, result=0x45975eaf)
    at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/threads/nsThread.cpp:612
#302 0x40e8258c in NS_ProcessNextEvent (thread=0x43461658, mayWait=true) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/glue/nsThreadUtils.cpp:263
#303 0x40eb0652 in nsThread::ThreadFunc (arg=<value optimized out>) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/xpcom/threads/nsThread.cpp:246
#304 0x408398d8 in _pt_root (arg=<value optimized out>) at /home/dannyliang/work/codes/b2g_tarako_v1.3t_github/gecko/nsprpub/pr/src/pthreads/ptthread.c:205
#305 0x400ba158 in __thread_entry (func=0x40839841 <_pt_root>, arg=0x47202280, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#306 0x400b9cac in pthread_create (thread_out=<value optimized out>, attr=0xbecd6b24, start_routine=0x40839841 <_pt_root>, arg=0x47202280) at bionic/libc/bionic/pthread.c:357
#307 0x00000000 in ?? ()

(gdb) p fullpath
$11 = {<nsAString_internal> = {
    mData = 0x4a55d008 u"/mnt/sdcard/DCIM/108MZLLA/rviceman.age/04:45.00.4 </  :  xxx.xx/í\022╟Æ«ñä8.$}q/910 <4>0.[47/\n06-13 1.4:4/2 <4>0[4.759/3.038578.] V/ngMode:.0.\n/ndex: 0\n.05-/976] ###.# V/18:32:10..29/ü╫+╘≥\b└ñ.\004┼/5700] ##"..., mLength = 3585, mFlags = 5}, <No data fields>}

In this case, we cannot access the folder /mnt/sdcard/DCIM/108MZLLA/rviceman.age/, the console is stuck and no any response.
(In reply to Danny Liang [:dliang] from comment #60)
> (gdb) p fullpath
> $11 = {<nsAString_internal> = {
>     mData = 0x4a55d008 u"/mnt/sdcard/DCIM/108MZLLA/rviceman.age/04:45.00.4
> </  :  xxx.xx/í\022╟Æ«ñä8.$}q/910 <4>0.[47/\n06-13 1.4:4/2
> <4>0[4.759/3.038578.] V/ngMode:.0.\n/ndex: 0\n.05-/976] ###.#
> V/18:32:10..29/ü╫+╘≥\b└ñ.\004┼/5700] ##"..., mLength = 3585, mFlags = 5},
> <No data fields>}

mLength 3585 seems odd, is it a valid file?
Clear NI since new evidence shows there are some strange files under /mnt/sdcard/DCIM, and seems the reason why DeviceStorageFile::collectFilesInternal() is deeply recursive.
Flags: needinfo?(dhylands)
I try to access the content of /sdcard/DCIM/108MZLLA/ by ls command on PC, the CPU usage of PC is over 80% and there are all input/output errors.

ls: cannot access 
05-26 1.3:1: Input/output error
ls: cannot access 84167] c.h1=: Input/output error
ls: cannot access <4>0[381.77.: No such file or directory
ls: cannot access  lux_val. =: Input/output error
ls: cannot access 5-26 13:.15:: Input/output error
ls: cannot access -26 13:1.5:4: Input/output error
ls: cannot access 77] ch1=.0
0: Input/output error
ls: cannot access x_val =.1 
: Input/output error
ls: cannot access 1),data=.0
0: Input/output error
ls: cannot access 0[38183..715: Input/output error
ls: cannot access 2] ch(3).,da: Input/output error
ls: cannot access _val = 2. 



  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                             
 2482 dannylia  20   0 68352  41m  948 D   86  0.3  36:06.35 ls                                                                                                  
 7923 dannylia  20   0  752m  47m  22m R   21  0.3   7:08.77 shotwell                                                                                            
 2905 dannylia  20   0  205m  86m 1776 S    7  0.5  16:22.84 synergys                                                                                            
 3028 dannylia  20   0  614m  37m 9.8m S    3  0.2  10:11.42 gnome-termin

With discussion with Ting and check error message, we found the folder/file name looks like part of logcat or kernel log.

James, we suspected there might be something wrong on slog. Could you help to run monkey test with clean content (regular file format) of sdcard and disable slog to clarify this case? Please also have slog owner to check if any possibility to have wrong output.
Flags: needinfo?(james.zhang)
can you stop b2g and check what the file is?

I don't think slog can write any files with path /sdcard/DCIM/108MZLLA
and you can check all the files slog read/write 

apusr@yingxuubt:~/b2gsource/6821/device/sprd/common/tools/slog$ grep -ir fopen .   
./slog.c:	fp = fopen(filepath, "a+");
./slog.c:	fp = fopen(filepath, "a+");
./slog.c:	fcmd = fopen(info->content, "r");
./handler.c:	fp = fopen(buffer, "a+");
./handler.c:	fcmd = fopen(info->content, "r");
./handler.c:	fp_src = fopen(path, "r");
./handler.c:	fp_dest = fopen(new_path, "w");
./handler.c:    fp=fopen(filename,"r");
Binary file ./tar matches
./screenshot.c:	if ((outfile = fopen(filename, "wb+")) == NULL) {
./slogctl.c:	fp = fopen(TMP_SLOG_CONFIG, "r");
./slogctl.c:	fp = fopen(TMP_SLOG_CONFIG, "w");
./parse_conf.c:	fp = fopen(TMP_SLOG_CONFIG, "r");
./parse_conf.c:			fp = fopen(DEFAULT_USER_SLOG_CONFIG, "r");
./parse_conf.c:			fp = fopen(DEFAULT_DEBUG_SLOG_CONFIG, "r");
Flags: needinfo?(james.zhang)
(In reply to ying.xu from comment #64)
> can you stop b2g and check what the file is?

As my comment 63, I removed the sdcard from device and accessed the folder by ls on PC directly, the message as below and the folders looks like part of logcat or kernel log.

ls: cannot access 
05-26 1.3:1: Input/output error
ls: cannot access 84167] c.h1=: Input/output error
ls: cannot access <4>0[381.77.: No such file or directory
ls: cannot access  lux_val. =: Input/output error
ls: cannot access 5-26 13:.15:: Input/output error
ls: cannot access -26 13:1.5:4: Input/output error
ls: cannot access 77] ch1=.0
0: Input/output error
ls: cannot access x_val =.1 
: Input/output error
ls: cannot access 1),data=.0
0: Input/output error
ls: cannot access 0[38183..715: Input/output error
ls: cannot access 2] ch(3).,da: Input/output error
ls: cannot access _val = 2.
I get it.
how about other paths?
such as music?

I searched it , here is a link.It says the file system may corrupt.
http://linux.overshoot.tv/ticket/64

And on sprd side, we never meet the problem during monkey-test.
I recommend you should check the T-card.
(In reply to ying.xu from comment #67)
> I get it.
> how about other paths?
> such as music?
> 
This path is we observed by gdb. Other paths look good.
This is the only case we get keep the gdb environment when free swap is almost zero.

> I searched it , here is a link.It says the file system may corrupt.
> http://linux.overshoot.tv/ticket/64
> 
> And on sprd side, we never meet the problem during monkey-test.
Did you have the gdb environment when swap free is zero?
I am not sure if any other problem to cause free swap is zero.

> I recommend you should check the T-card.

Sure, I will do it. But I still suspect if any possibility that slog to cause file system corrupt.
This looks like a corrupt filesystem on the sdcard.
(In reply to Danny Liang [:dliang] from comment #68)
> (In reply to ying.xu from comment #67)
> > I get it.
> > how about other paths?
> > such as music?
> > 
> This path is we observed by gdb. Other paths look good.
> This is the only case we get keep the gdb environment when free swap is
> almost zero.
> 
> > I searched it , here is a link.It says the file system may corrupt.
> > http://linux.overshoot.tv/ticket/64
> > 
> > And on sprd side, we never meet the problem during monkey-test.
> Did you have the gdb environment when swap free is zero?
> I am not sure if any other problem to cause free swap is zero.
> 
> > I recommend you should check the T-card.
> 
> Sure, I will do it. But I still suspect if any possibility that slog to
> cause file system corrupt.

SD card r/w looks normal.

dannyliang@dannyliang-ubuntu-pc:/media$ sudo dd if=/dev/sdf1 of=/dev/null 
[sudo] password for dannyliang: 
3987209+0 records in 
3987209+0 records out 
2041451008 bytes (2.0 GB) copied, 98.5865 s, 20.7 MB/s

dannyliang@dannyliang-ubuntu-pc:/media$ sudo dd if=/dev/zero of=/dev/sdf1 
dd: writing to `/dev/sdf1': No space left on device 
3987210+0 records in 
3987209+0 records out 
2041451008 bytes (2.0 GB) copied, 111.151 s, 18.4 MB/s

I also setup the monkey test with the same SD card but different contents, the result is good, no b2g was killed and swap space of b2g is normal.
Depends on: 985197
tracking-b2g: --- → -
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: