Closed Bug 797629 Opened 8 years ago Closed 8 years ago

Image 12 pandas for B2G by hand by 10/9/12 (Chassis 3)

Categories

(Infrastructure & Operations :: DCOps, task)

x86_64
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cmtalbert, Unassigned)

References

Details

(Whiteboard: [reit-panda])

Attachments

(1 file)

In order to test the rest of our automation setup for B2G on pandas in the real chassis configuration (so we have PDUs available to us), we would like to image one box of pandas with a B2G build.

The by-hand imaging process involves a data center trip. Here's how it works:
1. I'll image 12 sdcards
2. I'll deliver 12 sdcards to either dividehex or someone in DCOps
3. That someone puts the 12 sdcards into the pandas in the box that we're going to use for this and power cycles them.

Then we just need to know their IP addresses and we can start working with them. This way we can ensure the rest of the automation infrastructure (mozharness scripts, devicemanagement library, test runners etc) all work well with the B2G pandas so that when we are ready to go with the automatic flashing we can have confidence in the rest of the B2G panda automation stack.

I should have these SD cards tomorrow (10/4) in Mountain View; let me know where to drop them off.
OK, the AOSP kernel that we are using for current panda builds does not allow us to persist MAC addresses. That means that until we solve that problem, I'll need to use fixed IP addresses for these pandas. Can I get 12 IP addresses to fix these pandas to?

If I could get the names of the pandas as well, I'll be sure to label the 12 SD cards with the corresponding panda they are destined for.

I have the image ready to go otherwise and will purchase the 12 cards this afternoon.
Whiteboard: [reit-panda]
If you need further instruction on how to image the panda SD cards, please sync up with Jake.
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations: DCOps
QA Contact: arich → dmoore
Clint, I just talked with Jake, and we're going to allocate chassis 3 to this.  Those include:

panda-relay-03                IN A         10.12.52.135
panda-034                     IN A         10.12.52.136
panda-035                     IN A         10.12.52.137
panda-036                     IN A         10.12.52.138
panda-037                     IN A         10.12.52.139
panda-038                     IN A         10.12.52.140
panda-039                     IN A         10.12.52.141
panda-040                     IN A         10.12.52.142
panda-041                     IN A         10.12.52.143
panda-042                     IN A         10.12.52.144
panda-043                     IN A         10.12.52.145
panda-044                     IN A         10.12.52.146
panda-045                     IN A         10.12.52.147

Please be sure to label the SD cards so that DCOps can insert them into the proper pandas.
Clint: also, DCOps should already have SD cards that you can use.  We bought them with the pandas.
Sorry, those are attached to 

panda-relay-04                IN A         10.12.52.148
colo-trip: --- → mtv1
I've renamed all of the pandas and relays to zero-pad them further and to make sure that each relay board number matches its chassis number.

So, chassis 3 has:

panda-relay-003                IN A         10.12.52.135
panda-0034                     IN A         10.12.52.136
panda-0035                     IN A         10.12.52.137
panda-0036                     IN A         10.12.52.138
panda-0037                     IN A         10.12.52.139
panda-0038                     IN A         10.12.52.140
panda-0039                     IN A         10.12.52.141
panda-0040                     IN A         10.12.52.142
panda-0041                     IN A         10.12.52.143
panda-0042                     IN A         10.12.52.144
panda-0043                     IN A         10.12.52.145
panda-0044                     IN A         10.12.52.146
panda-0045                     IN A         10.12.52.147
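The chassis 3 listing follows a regular pattern: the relay board first, then twelve pandas on the consecutive addresses after it. A minimal sketch that regenerates the listing, assuming that layout holds (it is inferred from this one listing, not from any documented allocation rule; `chassis_records` is a hypothetical helper):

```python
import ipaddress


def chassis_records(chassis, first_panda, relay_ip, count=12):
    """Return (hostname, ip) pairs for one chassis: the relay board first,
    then `count` pandas on the consecutive addresses that follow it.

    Hypothetical helper: the naming (relay zero-padded to 3 digits, pandas
    to 4) and the consecutive-address layout are inferred from the chassis 3
    listing, not from a documented rule.
    """
    relay = ipaddress.IPv4Address(relay_ip)
    records = [(f"panda-relay-{chassis:03d}", str(relay))]
    for offset in range(count):
        records.append((f"panda-{first_panda + offset:04d}", str(relay + 1 + offset)))
    return records


for name, ip in chassis_records(3, 34, "10.12.52.135"):
    print(f"{name:<31}IN A         {ip}")
```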
Hi Jake,

Can we schedule time with you on Monday when you're in MTV so you can run us through the imaging process?

Thanks,
Van
colo-trip: mtv1 → scl1
Van: clint should be handing you pre-imaged cards for these, from what I understand.  I think you just have to put them in the appropriate pandas.
Clint, 

Are these SD cards imaged and ready to be picked up by DCOps?

Thanks,
Van
Handed Vinh an imaged SD card and instructions on how to image the rest of the cards, since he graciously offered to handle it.

Big thanks to everyone here.
Summary: Image 12 pandas for B2G by hand by 10/9/12 → Image 12 pandas for B2G by hand by 10/9/12 (Chassis 3)
Update: We were able to image the SD cards by specifying a 100MB block size (bs=100M) when running dd. Without that block size, there was a one-block parity error and partitions were missing. Each reimage still takes 20+ minutes.
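The thread does not explain why the block size mattered. To show what a fixed-block-size copy does, here is a minimal sketch that copies an image in 100MB blocks the way `dd bs=100M` reads and writes; `copy_image` is a hypothetical helper and `/dev/sdX` a placeholder device path, not something the bug used:

```python
import os


def copy_image(src_path, dst_path, block_size=100 * 1024 * 1024):
    """Copy src_path to dst_path in fixed-size blocks, similar in spirit to
    `dd if=src of=dst bs=100M`. Returns the number of bytes written.

    dst_path may be a regular file or a card device node (for example a
    hypothetical /dev/sdX); writing to the wrong device destroys its data.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of the source image
                break
            dst.write(block)  # the final block may be shorter than block_size
            written += len(block)
        dst.flush()
        os.fsync(dst.fileno())  # push data to the card before it is removed
    return written
```

Comparing the returned byte count with the image's size is a cheap sanity check that nothing was truncated.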

However we are running into a new issue with the board not completely booting up.

Logs with a serial dongle attached to a panda board:

[   17.026824] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   17.132995] request_suspend_state: wakeup (0->0) at 17118347171 (1970-01-01 00:00:17.114929203 UTC)
shell@android:/ $

shell@android:/ $ netcfg
/system/bin/sh: netcfg: cannot execute - Permission denied
126|shell@android:/ $ sudo netcfg
/system/bin/sh: sudo: not found
[   87.315307] request_suspend_state: sleep (0->3) at 87300659182 (1970-01-01 00:01:27.297241214 UTC)
[   87.326263] DSSCOMP: dsscomp_early_suspend
[   87.345855] DSSCOMP: blanked screen 

:ctalbert, I'm curious whether the panda boards in chassis 3 are any different from the one on your desk. If they aren't, could they be hitting the same issue that came up with the SD card imaged on OS X? You were able to tweak that board so it saw eth0 when it came up as wlan0.
Per discussion with :ctalbert in email, he's willing to share this chassis with releng to unblock panda-for-android foopy work.

Please reserve & set up 6 of the boards per Clint's original request, and 6 with the image from bug 769428 comment 15, even though there are problems as documented in bug 798519 comment 2.
We've configured the SD cards, but the nodes still aren't coming up.

This is the only line we edited:
 ( ifconfig eth0 10.12.52.139 netmask 255.255.252.0 up ) & sleep 5

Please confirm the network.sh file. (This one belongs to panda-0037).
Attached file panda-0037 network.sh
I tried changing the word "dhcp" to "static" in network.sh, hoping that would work, but it didn't. When I am on serial, I still can't get the netcfg command to work.

shell@android:/ $ netcfg
/system/bin/sh: netcfg: cannot execute - Permission denied
That looks correct.

Ensure that you made the network.sh file executable: chmod 755 /system/bin/network.sh
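If a copy of network.sh is prepared on a Linux workstation before it goes onto a card, its mode can be checked and set there too. A minimal sketch with hypothetical helper names; on the board itself the `chmod 755` command above is the way to do it:

```python
import os
import stat


def permission_bits(path):
    """Return the permission bits of `path` as an octal string, e.g. '755'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


def make_executable(path):
    """Same effect as `chmod 755 path`: rwx for the owner, r-x for group/others."""
    os.chmod(path, 0o755)
```

Copying a file through a USB drive (often FAT-formatted) can drop the execute bits, so it is worth re-checking after every copy.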
Sorry for the double comment, but I wanted to be clear. The patch in comment 14 is correct. Changing "dhcp" to "static" doesn't seem to work on B2G's Linux system. I think the issue you hit in comment 15 is due to network.sh not being executable.
Clint, network.sh is executable and it's still not working.

The file I attached may have different permissions because it was a copy/paste I did onto a USB drive to attach it to this ticket.

[   16.719757] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   17.110778] request_suspend_state: wakeup (0->0) at 17095916751 (1970-01-01 00:00:17.090271000 UTC)
[   17.365447] eth0: no IPv6 routers present
[   21.806427] init: untracked pid 121 exited
[   87.172760] request_suspend_state: sleep (0->3) at 87157989503 (1970-01-01 00:01:27.151763918 UTC)
[   87.182861] DSSCOMP: dsscomp_early_suspend
[   87.212829] DSSCOMP: blanked screen
Clint,

This is what I see when I cat data/local/netcf.log

output of netcfg:
lo       UP     127.0.0.1/8      0x00000049 00:00:00:00:00:00
ifb0     DOWN   0.0.0.0/0        0x00000082 2e:7b:bd:29:38:b1
ifb1     DOWN   0.0.0.0/0        0x00000082 6a:94:8e:93:82:92
sit0     DOWN   0.0.0.0/0        0x00000080 00:00:00:00:00:00
ip6tnl0  DOWN   0.0.0.0/0        0x00000080 00:00:00:00:00:00
eth0     UP     0.0.0.0/0        0x00001043 92:92:cf:eb:f9:b3
wlan0    UP     0.0.0.0/0        0x00001043 de:ad:be:ef:00:00
running ifconfig
lo       UP     127.0.0.1/8      0x00000049 00:00:00:00:00:00
ifb0     DOWN   0.0.0.0/0        0x00000082 2e:7b:bd:29:38:b1
ifb1     DOWN   0.0.0.0/0        0x00000082 6a:94:8e:93:82:92
sit0     DOWN   0.0.0.0/0        0x00000080 00:00:00:00:00:00
ip6tnl0  DOWN   0.0.0.0/0        0x00000080 00:00:00:00:00:00
eth0     UP     10.12.52.138/22  0x00001043 92:92:cf:eb:f9:b3
wlan0    UP     0.0.0.0/0        0x00001003 de:ad:be:ef:00:00
ran
USER     PID   PPID  VSIZE  RSS   WCHAN     PC        NAME
root     103   1     768    372   c004f96c  4005ae74  S /system/bin/sh
root     134   103   3284   784   c0118f70  40084438  S sutagent
insut
USER     PID   PPID  VSIZE  RSS   WCHAN     PC        NAME
root     103   1     768    372   c004f96c  4005ae74  S /system/bin/sh
root     134   103   3284   784   c0118f70  40084438  S sutagent
insut
USER     PID   PPID  VSIZE  RSS   WCHAN     PC        NAME
root     103   1     768    372   c004f96c  4005ae74  S /system/bin/sh
root     134   103   3284   784   c0118f70  40084438  S sutagent
insut
USER     PID   PPID  VSIZE  RSS   WCHAN     PC        NAME
root     103   1     768    372   c004f96c  4005ae74  S /system/bin/sh
root     134   103   3284   784   c0118f70  40084438  S sutagent
insut

the last line repeats over and over.
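Each netcfg row has five fields: interface name, state, address/prefix, a hex flags word and the MAC address, so even the flattened log form can be split back into interfaces. A minimal sketch, assuming the hex column is the standard Linux interface-flags word (IFF_UP = 0x1, IFF_RUNNING = 0x40); `parse_netcfg` and `Iface` are hypothetical names:

```python
from typing import List, NamedTuple

IFF_UP = 0x1        # Linux interface flag: administratively up
IFF_RUNNING = 0x40  # Linux interface flag: link/resources running


class Iface(NamedTuple):
    name: str
    state: str
    address: str
    prefix: int
    flags: int
    mac: str

    @property
    def running(self) -> bool:
        return bool(self.flags & IFF_RUNNING)


def parse_netcfg(text: str) -> List[Iface]:
    """Parse netcfg output as groups of 5 whitespace-separated fields:
    name, state, address/prefix, hex flags, MAC. Handles both the normal
    one-row-per-interface layout and the flattened log form."""
    fields = text.split()
    if len(fields) % 5:
        raise ValueError("expected 5 fields per interface")
    result = []
    for i in range(0, len(fields), 5):
        name, state, cidr, flags, mac = fields[i:i + 5]
        address, prefix = cidr.split("/")
        result.append(Iface(name, state, address, int(prefix), int(flags, 16), mac))
    return result
```

Under that reading, the second dump shows eth0 at 10.12.52.138/22 with 0x1043 (RUNNING set), while wlan0 shows 0x1003 (up but not running).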
Interesting. That's what the output said when I tested it here locally, and I could ping the device. It looks like the eth0 interface is up and running with the network address you provided. At this point I'm not at all sure why it isn't working. The Ethernet cable is connected, right? (Sorry for asking the obvious question, but I'm running out of ideas.)
I figured out what might be causing the issue. The netmask is hard-coded as /22 regardless of the netmask configured in network.sh. We need it to be /21.

lo       UP     127.0.0.1/8      0x00000049 00:00:00:00:00:00
ifb0     DOWN   0.0.0.0/0        0x00000082 32:76:03:6d:94:6c
ifb1     DOWN   0.0.0.0/0        0x00000082 2a:41:d1:bd:cc:67
sit0     DOWN   0.0.0.0/0        0x00000080 00:00:00:00:00:00
ip6tnl0  DOWN   0.0.0.0/0        0x00000080 00:00:00:00:00:00
eth0     UP     10.12.52.137/22  0x00001043 8e:8e:fe:92:c6:85
wlan0    UP     0.0.0.0/0        0x00001003 de:ad:be:ef:00:00

Van
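The /22 vs /21 difference can be worked out directly. The netmask in the edited network.sh line, 255.255.252.0, is also a /22. For 10.12.52.137, a /22 covers only 10.12.52.0 through 10.12.55.255, while a /21 covers 10.12.48.0 through 10.12.55.255, so anything in 10.12.48.x to 10.12.51.x looks off-link to a board stuck at /22. A minimal sketch; the gateway address 10.12.48.1 is purely hypothetical (the bug never says where the router is), and `on_link` is a hypothetical helper:

```python
import ipaddress


def on_link(host_cidr, other_ip):
    """True if other_ip falls inside the subnet implied by host_cidr,
    i.e. the host would try to reach it directly rather than via a gateway."""
    return ipaddress.ip_address(other_ip) in ipaddress.ip_interface(host_cidr).network


for cidr in ("10.12.52.137/22", "10.12.52.137/21"):
    net = ipaddress.ip_interface(cidr).network
    print(cidr, net.netmask, net[0], "-", net[-1])
# 10.12.52.137/22 255.255.252.0 10.12.52.0 - 10.12.55.255
# 10.12.52.137/21 255.255.248.0 10.12.48.0 - 10.12.55.255

print(on_link("10.12.52.137/22", "10.12.48.1"))  # False: treated as off-link
print(on_link("10.12.52.137/21", "10.12.48.1"))  # True
```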
What is the status here?
Thanks in advance.
Armen: the image doesn't work as is (it overrides the netmask).  I think Tom Zimmerman may have a working image soon that fixes these issues, but I defer to him.
Thanks Amy!

Hi Thomas, is bug 798427 (now fixed) what is causing the issues on this bug?
Hi

(In reply to Van Le [:van] from comment #15)
> I tried changing the word "dhcp" to "static" in network.sh hoping that would
> work, but it didnt. When I am on serial, I still cant get the netcfg command
> to work.
> 
> shell@android:/ $ netcfg
> /system/bin/sh: netcfg: cannot execute - Permission denied

Oh, I made this mistake as well. When you log in over the serial console, you are user 'shell', which has almost no permissions. You are root if you

 1) log in via 'adb shell'

or

 2) (iirc) change user and group to 'root' in lines 350/351 of the file system/core/rootdir/init.rc, and re-build.

You should be able to run any commands now.
(In reply to Armen Zambrano G. [:armenzg] from comment #24)
> Thanks Amy!
> 
> Hi Thomas, is bug 798427 (which got fixed) what is giving issues on this bug?

Hmm, I don't see how. The fixes in that bug have nothing to do with IP addressing. They only make sure that each Ethernet NIC gets a constant MAC address.
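Whether a MAC address was burned in by the manufacturer or assigned in software can be read from the first octet: bit 0 marks a multicast address, and bit 1 marks a locally administered one. The eth0 addresses in the netcfg dumps earlier in this bug (92:92:cf:eb:f9:b3 and 8e:8e:fe:92:c6:85) both have the locally administered bit set, which is consistent with the earlier note that the kernel was not persisting MAC addresses. This sketch only decodes those bits (`mac_flags` is a hypothetical helper); it does not show how bug 798427 fixed the problem:

```python
def mac_flags(mac):
    """Decode the two low bits of a MAC address's first octet (IEEE 802):
    bit 0 set = multicast/group address, bit 1 set = locally administered
    (assigned by software rather than burned in by the manufacturer)."""
    first_octet = int(mac.split(":")[0], 16)
    return {
        "multicast": bool(first_octet & 0x01),
        "locally_administered": bool(first_octet & 0x02),
    }


for mac in ("92:92:cf:eb:f9:b3", "8e:8e:fe:92:c6:85", "de:ad:be:ef:00:00"):
    print(mac, mac_flags(mac))
```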

I left a comment about getting root access on the PandaBoard. Could someone try this? If that doesn't fix the problem, I'll take a look at whether this is kernel-related.
Thanks Thomas for the info.

Van, does this info help? When could this be tried?
All,

DC Ops does not yet have the proper equipment for running the adb tools or rebuilding images. We're picking up some linux laptops for this purpose, but it will be several days (at best) before we're equipped to do this for you.
Hi,

I am setting up my Linux laptop with adb tools. Which Android SDK Platform do I install to work with these Panda boards? Is there any harm installing all the SDK Platforms?

Van
Hi
 
> I am setting up my Linux laptop with adb tools. Which Android SDK Platform
> do I install to work with these Panda boards? Is there any harm installing
> all the SDK Platforms?

Should be straightforward. I downloaded the latest SDK [1] and unpacked it somewhere in my home directory. Adb and other useful tools are located in the sub-directory platform-tools/, which I added to my PATH variable.

Thomas

[1] http://dl.google.com/android/android-sdk_r20.0.3-linux.tgz
Van, what is the status here? Thanks in advance.

I think this should block bug 805016.
This bug does not block bug 805016: that imaging process is specific to Android, while this bug is specifically for B2G.
Van is onsite heading up some high-priority work in our Phoenix datacenter. He'll be back in scl1 on 10/26.
:armenzg, I am not sure what update you are looking for. This is for B2G, and the issue we are currently facing is that even though I change the netmask in the network.sh script, the image either hard-codes it or overrides my change and keeps it at /22.


Van
Hi van,
Thanks for the clarification.

Would this mean that the A-team or someone else needs to get you an image that does not override it?

BTW did you get to try what Thomas suggested? (see comment below). I could have misunderstood the comment or missed something. Please correct me if I got it wrong.

(In reply to Thomas Zimmermann from comment #25)
> Hi
> 
> (In reply to Van Le [:van] from comment #15)
> > I tried changing the word "dhcp" to "static" in network.sh hoping that would
> > work, but it didnt. When I am on serial, I still cant get the netcfg command
> > to work.
> > 
> > shell@android:/ $ netcfg
> > /system/bin/sh: netcfg: cannot execute - Permission denied
> 
> Oh, I made this mistake as well. When you login over serial console, you are
> user 'shell', which has almost no permissions. You are root if you
> 
>  1) login via 'adb shell'
> 
> or
> 
>  2) (iirc) change user and group to 'root' in lines 350/351 of the file
> system/core/rootdir/init.rc, and re-build.
> 
> You should be able to run any commands now.
Hi Armen,

I believe Thomas's comments were to show me how to get access to run commands and to capture any logs you guys would want. This doesn't resolve the issue we're stuck on.

Thanks,
Van
Status update: mdas and I are working with tzimmerman to get his new build working on a pandaboard.

The build is currently broken, but we have verified that the kernel MAC address fixes worked: the Panda has a consistent MAC address now with the new kernel. Once we sort out why the Gecko code isn't landing on the pandaboard after flashing, we can make a new image that you can use on the boards.

The instructions once you have the new image are pretty simple:
1. Download image file and unzip it.
2. dd the image file to one sdcard
3. Duplicate the sdcard image to the other pandas
4. Boot the pandas.
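Steps 1 and 2 can also be done in one pass by decompressing the .img.gz straight onto the card instead of unzipping it to disk first. This is a variation on the instructions above, not what they prescribe; `write_gz_image` is a hypothetical helper and `/dev/sdX` a placeholder device path:

```python
import gzip
import os
import shutil


def write_gz_image(gz_path, dst_path, block_size=4 * 1024 * 1024):
    """Stream-decompress a .img.gz straight onto dst_path (a file, or a card
    device such as a hypothetical /dev/sdX), skipping the separate unzip step."""
    with gzip.open(gz_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, block_size)  # copy in block_size chunks
        dst.flush()
        os.fsync(dst.fileno())  # make sure everything is on the card
```

Streaming avoids needing free disk space for the full uncompressed image, at the cost of decompressing again for every card.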

We shouldn't need to muck with the network.sh file anymore.

Thanks for your patience.
The new image is here: http://people.mozilla.org/~ctalbert/b2g-panda-16gb.img.gz

However, it only seems to boot on half the pandas we have here.  Mdas and I spent two days trying to debug this and we can't figure it out. I've opened bug 806096 to track this issue.  In the meantime, please try to image the sdcards of the pandas in chassis 3 with this image and let's see how many of them boot.  I expect 50% of them to work because that's what we're seeing here.

At least if we can get some to work, then we can continue working on other elements of the automation.
Clint, do I have to specify a block size when I dd the image? And to confirm, these will get their IPs through DHCP?
(In reply to Van Le [:van] from comment #39)
> Clint, do I have to specify a block size when I dd the image? And to
> confirm, these will get their IPs through DHCP?
I imaged it with both bs=100M and with no block size specified at all, both worked (as well as this image works at all).

And yes, these pandas will have static MAC addresses and so will be able to get their static IP addresses via DHCP.
I was able to get all the pandas in this chassis up with the new B2G image except panda-0040. I tried several SD cards and reimages to no avail.

[root@admin1]~# fping panda-00{34..45}.build.scl1.mozilla.com
panda-0034.build.scl1.mozilla.com is alive
panda-0035.build.scl1.mozilla.com is alive
panda-0036.build.scl1.mozilla.com is alive
panda-0037.build.scl1.mozilla.com is alive
panda-0038.build.scl1.mozilla.com is alive
panda-0039.build.scl1.mozilla.com is alive
panda-0041.build.scl1.mozilla.com is alive
panda-0042.build.scl1.mozilla.com is alive
panda-0043.build.scl1.mozilla.com is alive
panda-0044.build.scl1.mozilla.com is alive
panda-0045.build.scl1.mozilla.com is alive
panda-0040.build.scl1.mozilla.com is unreachable

Van
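The fping check relies on bash brace expansion: `panda-00{34..45}` expands to panda-0034 through panda-0045. A minimal sketch that reproduces that expansion and pulls the unreachable hosts out of fping's "is alive" / "is unreachable" lines, so a whole chassis can be summarized at once; both helper names are hypothetical:

```python
def expand(prefix, start, end, suffix):
    """Mimic bash brace expansion `prefix{start..end}suffix` for a plain
    (unpadded) numeric range, as in panda-00{34..45}.build.scl1.mozilla.com."""
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


def summarize_fping(output):
    """Split fping's per-host lines ("<host> is alive" / "<host> is unreachable")
    into (alive, unreachable) host lists; other lines are ignored."""
    alive, unreachable = [], []
    for line in output.splitlines():
        host, _, status = line.strip().partition(" is ")
        if status == "alive":
            alive.append(host)
        elif status == "unreachable":
            unreachable.append(host)
    return alive, unreachable


hosts = expand("panda-00", 34, 45, ".build.scl1.mozilla.com")
print(hosts[0], hosts[-1])  # panda-0034.build.scl1.mozilla.com panda-0045.build.scl1.mozilla.com
```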
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee: server-ops → server-ops-dcops
Product: mozilla.org → Infrastructure & Operations