Closed
Bug 836808
Opened 12 years ago
Closed 11 years ago
mac address collisions found in panda pool
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: dividehex, Assigned: dividehex)
While investigating troubled pandas at scl1, I found some boards with identical MAC addresses. Some were simply entered incorrectly in the inventory k/v store, but others were actual collisions. We may need to revisit the kernel patches, but in the meantime I think we should check the switch port/MAC address tables to determine whether each case is an actual collision or a false inventory entry. If it is a real collision, we will replace the highest panda in the collision set.
A quick peek into the mozpool DB reveals 5 sets of collisions:
mysql> SELECT mac_address, COUNT(*) c FROM devices GROUP BY mac_address HAVING c > 1;
+--------------+---+
| mac_address  | c |
+--------------+---+
| 0e6092094e01 | 2 |
| 2e6093784e01 | 2 |
| 2e60b8754e01 | 2 |
| 2e60c63d4e01 | 2 |
| 2e60f64e4e01 | 2 |
+--------------+---+
5 rows in set (0.01 sec)
mysql> select name, mac_address from devices where mac_address='0e6092094e01';
+------------+--------------+
| name       | mac_address  |
+------------+--------------+
| panda-0670 | 0e6092094e01 |
| panda-0822 | 0e6092094e01 |
+------------+--------------+
2 rows in set (0.00 sec)
mysql> select name, mac_address from devices where mac_address='2e6093784e01';
+------------+--------------+
| name       | mac_address  |
+------------+--------------+
| panda-0531 | 2e6093784e01 |
| panda-0574 | 2e6093784e01 |
+------------+--------------+
2 rows in set (0.00 sec)
mysql> select name, mac_address from devices where mac_address='2e60b8754e01';
+------------+--------------+
| name       | mac_address  |
+------------+--------------+
| panda-0158 | 2e60b8754e01 |
| panda-0465 | 2e60b8754e01 |
+------------+--------------+
2 rows in set (0.00 sec)
mysql> select name, mac_address from devices where mac_address='2e60c63d4e01';
+------------+--------------+
| name       | mac_address  |
+------------+--------------+
| panda-0479 | 2e60c63d4e01 |
| panda-0482 | 2e60c63d4e01 |
+------------+--------------+
2 rows in set (0.00 sec)
mysql> select name, mac_address from devices where mac_address='2e60f64e4e01';
+------------+--------------+
| name       | mac_address  |
+------------+--------------+
| panda-0486 | 2e60f64e4e01 |
| panda-0771 | 2e60f64e4e01 |
+------------+--------------+
2 rows in set (0.00 sec)
mysql>
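The per-address lookups above can be collapsed into a single pass. The following is only a sketch: it loads the ten rows shown above into an in-memory SQLite table that stands in for the mozpool `devices` table (the real schema has more columns), and the `highest_panda` helper assumes "highest panda" means the highest-numbered device name, which the description does not spell out.

```python
import sqlite3

# (name, mac_address) pairs copied from the query results above.
DEVICES = [
    ("panda-0670", "0e6092094e01"),
    ("panda-0822", "0e6092094e01"),
    ("panda-0531", "2e6093784e01"),
    ("panda-0574", "2e6093784e01"),
    ("panda-0158", "2e60b8754e01"),
    ("panda-0465", "2e60b8754e01"),
    ("panda-0479", "2e60c63d4e01"),
    ("panda-0482", "2e60c63d4e01"),
    ("panda-0486", "2e60f64e4e01"),
    ("panda-0771", "2e60f64e4e01"),
]


def find_collisions(devices):
    """Return {mac_address: sorted device names} for MACs shared by 2+ devices."""
    conn = sqlite3.connect(":memory:")
    # Stand-in table holding only the two columns used in this bug.
    conn.execute("CREATE TABLE devices (name TEXT, mac_address TEXT)")
    conn.executemany("INSERT INTO devices VALUES (?, ?)", devices)
    rows = conn.execute(
        "SELECT mac_address, GROUP_CONCAT(name) FROM devices "
        "GROUP BY mac_address HAVING COUNT(*) > 1"
    ).fetchall()
    conn.close()
    # GROUP_CONCAT order is not guaranteed, so sort the names in Python.
    return {mac: sorted(names.split(",")) for mac, names in rows}


def highest_panda(names):
    """Assumption: 'highest' means the largest numeric suffix in the name."""
    return max(names, key=lambda n: int(n.rsplit("-", 1)[1]))


COLLISIONS = find_collisions(DEVICES)
for mac, names in sorted(COLLISIONS.items()):
    print(mac, names, "replacement candidate:", highest_panda(names))
```

This only reports what the inventory says; the switch port/MAC tables are still needed to tell a real collision from a bad inventory entry.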
Comment 1•12 years ago
I pulled these two from the chassis so I can check later whether the CPU die IDs are also identical.
+------------+--------------+
| name       | mac_address  |
+------------+--------------+
| panda-0479 | 2e60c63d4e01 |
| panda-0482 | 2e60c63d4e01 |
+------------+--------------+
Comment 2•12 years ago
I suspect this will become more important as we add more pandas. Assuming the die IDs are not the same, how hard is it to hack the kernel patch to use more of the available address space? As is, it only changes two(?) of the hex digits.
Comment 3•12 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #2)
> I suspect this will become more important as we add more pandas. Assuming
> the die ids are not the same, how hard is it to hack the kernel patch to use
> more of the available address space? As is, it only changes two(?) of the
> hex digits.
As long as the die IDs don't happen to be identical, it shouldn't be terribly difficult to fix. And I'm fairly sure the IDs are wider than 6 bytes.
Comment 4•12 years ago
The patch XORs the TAP_IDCODE and the die ID. I don't know what those are, but if they both differ from device to device, then it's certainly possible to generate collisions.
In practice, numbered from left to right, only bits 4, 17-32, and 37-40 differ from panda to panda, and those last four bits are always either 1110 or 0101, so they contribute only one bit. Effectively, there are 18 bits of entropy here (1 + 16 + 1).
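The bit count can be checked with a few lines of arithmetic. This is a rough sketch, not a measurement: the collision estimate assumes the varying bits are uniformly distributed, which may not hold for real die IDs, and the pool size passed in is a hypothetical parameter rather than a figure from this bug.

```python
import math


def varying_bits():
    """Count effective bits of entropy from the bit ranges described above."""
    single = 1                # bit 4
    middle = 32 - 17 + 1      # bits 17-32 inclusive
    tail = math.log2(2)       # bits 37-40 only ever take 2 patterns -> 1 bit
    return single + middle + tail


def expected_colliding_pairs(n_devices, bits):
    """Expected number of colliding pairs among n_devices, assuming
    addresses are drawn uniformly from a space of 2**bits values."""
    pairs = n_devices * (n_devices - 1) / 2
    return pairs / 2 ** bits


if __name__ == "__main__":
    bits = varying_bits()
    print("effective bits:", bits)
    # 900 is a hypothetical pool size chosen only for illustration.
    print("expected colliding pairs:", expected_colliding_pairs(900, bits))
```

Even under the uniform assumption, the number of device pairs grows quadratically with pool size, so a space of this size stops being comfortable well before it fills up.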
A better algorithm would be to pick a range of reserved or unused vendor codes, then generate the rightmost 24 bits using some fast hash with a reasonable diffusion factor (taking 24 bits from md5 would do).
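A minimal sketch of that idea, in Python rather than kernel C, purely to show the shape of the derivation. The prefix `02:00:00` is an assumption for illustration (a locally administered, unicast prefix), not a vendor code anyone chose in this bug, and the input bytes stand in for whatever per-device identifier (such as the die ID) the patch would hash.

```python
import hashlib

# Hypothetical prefix: 0x02 sets the locally administered bit and leaves
# the multicast bit clear. A real deployment would pick its own range.
PREFIX = bytes([0x02, 0x00, 0x00])


def mac_from_id(device_id: bytes, prefix: bytes = PREFIX) -> str:
    """Build a 48-bit MAC: a fixed 24-bit prefix plus 24 bits of an md5 digest."""
    if len(prefix) != 3:
        raise ValueError("prefix must be exactly 3 bytes")
    digest = hashlib.md5(device_id).digest()
    # Take the first 3 bytes (24 bits) of the digest as the device-specific part.
    return (prefix + digest[:3]).hex()


if __name__ == "__main__":
    # Placeholder identifiers, not real die IDs.
    print(mac_from_id(b"example-die-id-1"))
    print(mac_from_id(b"example-die-id-2"))
```

With 24 hashed bits the address space is 64 times larger than the 18 effective bits observed above, though collisions remain possible and identical input IDs would still map to identical addresses.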
But we have what we have. If there are only a few MAC collisions, then we can probably just ensure they're not in the same VLAN. We could add a quick script to run from cron on one of the imaging servers to alert us to any same-VLAN conflicts.
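The cron check could look roughly like the sketch below. It assumes the script can obtain a (name, MAC address, VLAN) tuple for each device; where the VLAN assignment comes from is not covered in this bug, and the VLAN numbers in the sample data are invented for illustration.

```python
from collections import defaultdict


def same_vlan_conflicts(devices):
    """devices: iterable of (name, mac_address, vlan) tuples.

    Return {(vlan, mac_address): sorted names} for every MAC address that
    appears more than once within the same VLAN.
    """
    seen = defaultdict(list)
    for name, mac, vlan in devices:
        seen[(vlan, mac.lower())].append(name)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}


# MAC addresses are from this bug; the VLAN numbers are made up.
SAMPLE = [
    ("panda-0479", "2e60c63d4e01", 101),
    ("panda-0482", "2e60c63d4e01", 101),
    ("panda-0486", "2e60f64e4e01", 101),
    ("panda-0771", "2e60f64e4e01", 102),
]

if __name__ == "__main__":
    for (vlan, mac), names in sorted(same_vlan_conflicts(SAMPLE).items()):
        print(f"ALERT: {mac} shared on VLAN {vlan} by {', '.join(names)}")
```

In the sample, only the first pair is reported: the second pair shares a MAC address but sits on different VLANs, which is the situation the comment above treats as acceptable.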
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations
Updated•11 years ago
Blocks: panda-0479
Updated•11 years ago
Blocks: panda-0482
Comment 5•11 years ago
Jake: assuming we are going with Dustin's suggestions in comment 4, what is left to do here?
Flags: needinfo?(jwatkins)
Comment 6•11 years ago
(In reply to John Hopkins (:jhopkins) from comment #5)
> Jake: assuming we are going with Dustin's suggestions in comment 4, what is
> left to do here?
Since there are no other conflicting MACs in our current pool and we have no plans to purchase more pandas, I figure we can safely close this bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(jwatkins)
Resolution: --- → WORKSFORME