We have been experiencing an intermittent issue over the last couple of months with our Juniper MX104 routers. After a few weeks of continuous operation, the Modular Interface Card (MIC) will suddenly stop passing traffic. As our 4G console servers are still on order, this has resulted in a few day trips to our data centres.
When the issue occurs, the routers are still up and running but the MIC is unresponsive. We send logs to a syslog server, but with connectivity down the messages never arrived. This meant we had to get to the device before the local logs were overwritten, and also copy any core dump files for our open Juniper Technical Assistance Centre (JTAC) case.
We tracked down the following entries in the logs, which eventually pointed us at this Juniper Knowledge-base article.
afeb0 MIC0 IX PCI Fatal Error detected.
afeb0 Ixchip(0): pio_handle(0x4c915b70); pio_read_u32() failed: 20(input/output error)! ge-addr=01105e04
afeb0 ixchip_env_check: Too many IXCHIP 0 IO errors, stop subsequent IXCHIP READ/WRITE operations
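As we had to pull the logs off the box by hand, a small script to pick these entries out of a saved log file can save some scrolling. The following is only a rough sketch in Python: the patterns are taken from the three messages above, while the function name and the filler line in the sample are our own inventions for illustration.

```python
import re

# Patterns based on the three log messages quoted above.
SIGNATURES = [
    re.compile(r"MIC\d+ IX PCI Fatal Error detected"),
    re.compile(r"pio_read_u32\(\) failed"),
    re.compile(r"Too many IXCHIP \d+ IO errors"),
]


def find_ixchip_errors(lines):
    """Return the lines that match any of the known error signatures."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in SIGNATURES)
    ]


# Sample input: the three real messages plus one made-up filler line.
SAMPLE_LOG = [
    "afeb0 some unrelated message (hypothetical filler line)",
    "afeb0 MIC0 IX PCI Fatal Error detected.",
    "afeb0 Ixchip(0): pio_handle(0x4c915b70); pio_read_u32() failed: "
    "20(input/output error)! ge-addr=01105e04",
    "afeb0 ixchip_env_check: Too many IXCHIP 0 IO errors, "
    "stop subsequent IXCHIP READ/WRITE operations",
]

hits = find_ixchip_errors(SAMPLE_LOG)
for hit in hits:
    print(hit)
```

To run it against a copied log, pass an open file object instead of the sample list, since iterating over a file yields its lines.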
We came to the conclusion that our third-party optics were to blame. We had suffered a few failures of genuine Juniper optics and were forced to use third-party replacements. We’ve generally had no issues using third-party optics on other devices, but that run of good luck has come to an end.
One of the possible symptoms of this issue is that the i2c failure count will increase for the third-party SFPs. This can be seen with the following commands.
start shell pfe network afeb0
show sfp list
This will show a list of the SFPs that are installed and their serial numbers. You can see from the output below that there are three with one type of serial number and three with another.
Index  Name          Presence  ID Eprom  PNO         SNO          calibr  Toxic
-----  ------------  --------  --------  ----------  -----------  ------  -------
    1  MIC(0/0)(0)   Present   Complete  740-031850  AC1345SA3XF  int     Unknown
    2  MIC(0/0)(1)   Present   Complete  740-031850  AC1567SA3BD  int     Unknown
    3  MIC(0/0)(2)   Present   Complete  740-031850  AC1987SA3BX  int     Unknown
    4  MIC(0/0)(3)   Present   Complete  740-011783  F143JU01323  int     Unknown
    5  MIC(0/0)(4)   Present   Complete  740-011783  F143JU01345  int     Unknown
    6  MIC(0/0)(5)   Present   Complete  740-011783  F143JU01312  int     Unknown
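With only six SFPs the two batches are easy to spot by eye, but on a fuller chassis it may help to group the listing by part number. Below is a minimal Python sketch that does this for the rows shown above. The regular expression assumes the eight-column layout of this particular output, and the function name is our own; other platforms or Junos releases may format the listing differently.

```python
import re
from collections import defaultdict

# Rows copied from the listing above.
SAMPLE = """\
1 MIC(0/0)(0) Present Complete 740-031850 AC1345SA3XF int Unknown
2 MIC(0/0)(1) Present Complete 740-031850 AC1567SA3BD int Unknown
3 MIC(0/0)(2) Present Complete 740-031850 AC1987SA3BX int Unknown
4 MIC(0/0)(3) Present Complete 740-011783 F143JU01323 int Unknown
5 MIC(0/0)(4) Present Complete 740-011783 F143JU01345 int Unknown
6 MIC(0/0)(5) Present Complete 740-011783 F143JU01312 int Unknown
"""

# Index, Name, Presence, ID Eprom, PNO, SNO, calibr, Toxic
ROW = re.compile(
    r"(\d+)\s+(MIC\(\d+/\d+\)\(\d+\))\s+(\S+)\s+(\S+)\s+"
    r"(\d{3}-\d{6})\s+(\S+)\s+(\S+)\s+(\S+)"
)


def group_by_part_number(text):
    """Map each part number (PNO) to a list of (index, name, serial)."""
    groups = defaultdict(list)
    for index, name, _presence, _eprom, pno, sno, _calibr, _toxic in ROW.findall(text):
        groups[pno].append((int(index), name, sno))
    return dict(groups)


groups = group_by_part_number(SAMPLE)
for pno, sfps in groups.items():
    # Print the part number followed by the SFP index numbers that carry it.
    print(pno, [index for index, _name, _sno in sfps])
```

The index numbers in each group are the ones to feed into the per-SFP info command.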
Looking through the info for each of the SFP index numbers shows that one has a non-zero error count.
MX104-ABB-0(HSTNM vty)# show sfp 5 info
index: 0x05
sfp name: MIC(0/0)
pic context: 0x4DC15170
id mem scanned: true
linkstate: Up
sfp_present: true
sfp_changed: false
i2c failure count: 0x2B (NON ZERO)
diag polling count: 0x504
no diag polling from RE:0x1
run_periodic: false
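The counter is printed in hexadecimal, so it is easy to misread when checking several SFPs in a row. As a rough sketch, the Python below pulls the i2c failure count out of captured output and converts it to a decimal number. The field name comes from the output above; the function name is hypothetical, and how the text is captured from the router is left out.

```python
import re

# Output captured from the command above.
SAMPLE = """\
index: 0x05
sfp name: MIC(0/0)
pic context: 0x4DC15170
id mem scanned: true
linkstate: Up
sfp_present: true
sfp_changed: false
i2c failure count: 0x2B (NON ZERO)
diag polling count: 0x504
no diag polling from RE:0x1
run_periodic: false
"""

I2C_COUNT = re.compile(r"i2c failure count:\s*(0x[0-9A-Fa-f]+)")


def i2c_failure_count(text):
    """Return the i2c failure count as an int, or None if the field is absent."""
    match = I2C_COUNT.search(text)
    return int(match.group(1), 16) if match else None


count = i2c_failure_count(SAMPLE)
if count:
    print(f"i2c failure count is non-zero: {count}")
```

For the output above, 0x2B works out to 43 failures.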
After removing the three third-party SFPs, the MIC has been stable for a number of weeks. Let’s hope it stays that way. Thankfully, we’ve not gone live on the routers yet.