Juniper Class of Service (CoS)

If you come from a Cisco Quality of Service (QoS) background then you may find converting to the way that Juniper does things a challenge at first. As an initial step, I can recommend the free Juniper e-book entitled ‘Junos QoS For IOS Engineers’. The first thing to know is that Juniper call it Class of Service (CoS). In Cisco land, the term ‘CoS’ is used exclusively for layer 2 (802.1p/802.1Q) QoS, so this can be confusing initially.

As with my other Juniper posts, the configuration will be for an MX104 router. The MX series supports 16 forwarding classes and 8 output queues, which allows you to classify packets with more granularity. As there can be more forwarding classes than queues, if you configure more than 8 forwarding classes you must map multiple forwarding classes to single output queues.

We’ll be using the traffic classes that are described in RFC4594, ‘Configuration Guidelines for DiffServ Service Classes’. If you’ve seen any of the Cisco Live presentations about QoS in the last few years, then the mapping below may look familiar.

[Image: QoSMapping]

First of all we need to define our 12 RFC4594 forwarding classes and then map them to the 8 output queues as per our diagram above. You will notice that some of the classes are mapped to the same queue. You may wonder why we are using more forwarding classes than there are queues available. This particular device supports 8 queues, but other devices on the network support more. We can therefore keep the application classes and DSCP markings consistent across the network but then simply map them to the supported number of queues at each hop.

set class-of-service forwarding-classes class voip-telephony queue-num 0
set class-of-service forwarding-classes class broadcast-video queue-num 0
set class-of-service forwarding-classes class real-time-interactive queue-num 0
set class-of-service forwarding-classes class network-control queue-num 1
set class-of-service forwarding-classes class signalling queue-num 1
set class-of-service forwarding-classes class ops-admin-mgmt queue-num 1
set class-of-service forwarding-classes class multimedia-conferencing queue-num 2
set class-of-service forwarding-classes class multimedia-streaming queue-num 3
set class-of-service forwarding-classes class transactional-data queue-num 4
set class-of-service forwarding-classes class bulk-data queue-num 5
set class-of-service forwarding-classes class scavenger queue-num 6
set class-of-service forwarding-classes class best-effort queue-num 7
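
As a quick sanity check of the mapping above, here is a minimal Python sketch (not Junos configuration). The dictionary simply restates the set commands, and the helper name `classes_per_queue` is my own. It groups the forwarding classes by queue and confirms that no more than 8 queues are in use.

```python
from collections import defaultdict

# Forwarding class -> queue number, copied from the set commands above.
FC_TO_QUEUE = {
    "voip-telephony": 0,
    "broadcast-video": 0,
    "real-time-interactive": 0,
    "network-control": 1,
    "signalling": 1,
    "ops-admin-mgmt": 1,
    "multimedia-conferencing": 2,
    "multimedia-streaming": 3,
    "transactional-data": 4,
    "bulk-data": 5,
    "scavenger": 6,
    "best-effort": 7,
}

MAX_QUEUES = 8  # output queues available on this platform, per the text


def classes_per_queue(fc_to_queue):
    """Group forwarding class names by the queue they are mapped to."""
    grouped = defaultdict(list)
    for fc, queue in fc_to_queue.items():
        grouped[queue].append(fc)
    return dict(grouped)


queues = classes_per_queue(FC_TO_QUEUE)
assert len(queues) <= MAX_QUEUES
for queue in sorted(queues):
    print(queue, ", ".join(queues[queue]))
```

Queues 0 and 1 each carry three forwarding classes, which is what lets 12 classes fit into 8 queues.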

Next we must classify traffic that has already been marked. This is a core device, so ideally we want to have marked our traffic at an earlier point in the network. As we can simply look at existing DSCP markings, we can use what’s called a Behaviour Aggregate (BA) classifier. If we wanted to classify based upon IP addresses and ports, we could use a multifield classifier along with a firewall filter.

set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class voip-telephony loss-priority high code-points ef
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class broadcast-video loss-priority high code-points cs3
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class real-time-interactive loss-priority high code-points cs4
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class multimedia-conferencing loss-priority high code-points [ af41 af42 af43 ]
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class multimedia-streaming loss-priority high code-points [ af31 af32 af33 ]
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class network-control loss-priority high code-points [ cs6 cs7 ]
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class signalling loss-priority high code-points cs5
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class ops-admin-mgmt loss-priority high code-points cs2
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class transactional-data loss-priority high code-points [ af21 af22 af23 ]
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class bulk-data loss-priority high code-points [ af11 af12 af13 ]
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class scavenger loss-priority high code-points cs1
set class-of-service classifiers dscp ba-classifier-rfc4594 forwarding-class best-effort loss-priority high code-points be

In the above configuration we are creating a BA classifier named ‘ba-classifier-rfc4594’ and then assigning our DSCP markings (EF, CS1, AF11 etc) to each of the forwarding classes that we defined earlier.
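
If the code point names are unfamiliar, the short Python sketch below shows how they translate into the 6-bit DSCP value carried in the IP header. The function name `dscp_value` is my own; the arithmetic follows the standard DiffServ definitions (class selectors are 8 times the class, AFxy is 8x + 2y, EF is 46).

```python
def dscp_value(name):
    """Return the 6-bit DSCP value for a standard code point name."""
    name = name.lower()
    if name == "ef":
        return 46                       # Expedited Forwarding, 101110
    if name in ("be", "cs0"):
        return 0                        # default / best effort, 000000
    if name.startswith("cs"):
        return 8 * int(name[2])         # class selector: class in the top 3 bits
    if name.startswith("af"):
        af_class, drop = int(name[2]), int(name[3])
        return 8 * af_class + 2 * drop  # AFxy = 8x + 2y
    raise ValueError(f"unknown code point: {name}")


for cp in ("ef", "cs6", "af41", "af11", "cs1", "be"):
    value = dscp_value(cp)
    print(f"{cp:5} {value:2d} {value:06b}")
```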

Loss priorities allow us to specify different drop profiles for certain types of traffic. Drop profiles tell the router how to drop packets when an interface is congested. For simplicity we keep the loss priority set to high for all forwarding classes. We will reference this ‘high’ value later and assign a drop profile.

Once the classifier is complete, we must apply it to each interface that will receive traffic that has already been marked. If you are using sub-interfaces, the classifier should be applied to each sub-interface and not to the physical interface.

set class-of-service interfaces ge-0/1/0 unit 210 classifiers dscp ba-classifier-rfc4594
set class-of-service interfaces ge-0/1/0 unit 211 classifiers dscp ba-classifier-rfc4594
set class-of-service interfaces ge-0/1/1 unit 210 classifiers dscp ba-classifier-rfc4594
set class-of-service interfaces ge-0/1/1 unit 211 classifiers dscp ba-classifier-rfc4594

So far we have our forwarding classes which have been mapped to queues. We have a BA classifier that will inspect the DSCP markings on incoming packets and put them in the correct forwarding class. We now need to allocate available bandwidth across those queues using a scheduler.

You can think of a scheduler as a cashier that is servicing multiple queues of customers at once. The cashier will pick two people from queue one, five people from queue two etc. As you can imagine, queue two would reduce in length at a much faster rate than queue one. The number of customers picked from each queue at a time by the cashier would be the bandwidth allocation and the queue that they choose to serve first would be determined by the priority.

set class-of-service schedulers real-time transmit-rate percent 20
set class-of-service schedulers real-time priority high

set class-of-service schedulers control transmit-rate percent 10
set class-of-service schedulers control priority medium-high

set class-of-service schedulers multimedia-conferencing transmit-rate percent 10
set class-of-service schedulers multimedia-conferencing priority medium-high
set class-of-service schedulers multimedia-conferencing drop-profile-map loss-priority high protocol any drop-profile high-drop

set class-of-service schedulers multimedia-streaming transmit-rate percent 10
set class-of-service schedulers multimedia-streaming priority medium-high
set class-of-service schedulers multimedia-streaming drop-profile-map loss-priority high protocol any drop-profile high-drop

set class-of-service schedulers transactional-data transmit-rate percent 10
set class-of-service schedulers transactional-data priority medium-high
set class-of-service schedulers transactional-data drop-profile-map loss-priority high protocol any drop-profile high-drop

set class-of-service schedulers bulk-data transmit-rate percent 8
set class-of-service schedulers bulk-data priority low
set class-of-service schedulers bulk-data drop-profile-map loss-priority high protocol any drop-profile high-drop

set class-of-service schedulers scavenger transmit-rate percent 2
set class-of-service schedulers scavenger priority low

set class-of-service schedulers best-effort transmit-rate percent 30
set class-of-service schedulers best-effort priority medium-low
set class-of-service schedulers best-effort drop-profile-map loss-priority high protocol any drop-profile high-drop

There are a few things to point out in the above configuration. We start by creating 8 schedulers, one for each of the 8 queues supported on the MX series router. These are named according to the queues in the mapping diagram above. Each scheduler is given a certain percentage of the interface bandwidth and a priority.
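
A small Python sketch to double-check the arithmetic: the percentages are copied from the scheduler configuration above, and the check simply confirms that the 8 schedulers add up to exactly 100% of the available bandwidth.

```python
# transmit-rate percent per scheduler, copied from the configuration above.
TRANSMIT_RATE_PERCENT = {
    "real-time": 20,
    "control": 10,
    "multimedia-conferencing": 10,
    "multimedia-streaming": 10,
    "transactional-data": 10,
    "bulk-data": 8,
    "scavenger": 2,
    "best-effort": 30,
}

total = sum(TRANSMIT_RATE_PERCENT.values())
print(f"{len(TRANSMIT_RATE_PERCENT)} schedulers, {total}% allocated")
```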

Priorities determine the order in which queues have access to the outgoing interface. The available priorities are low, medium-low, medium-high, high and strict-high. Strict-high is similar to the priority queue in Cisco land, although as you can only map a single forwarding class to it, we’ve used high here instead.

Some of the schedulers reference a drop profile named ‘high-drop’ that is utilised when a loss-priority of ‘high’ has been set by our BA classifier. We will create this drop profile later on. The drop profiles are associated with queues that contain primarily TCP traffic and that would benefit from Weighted Random Early Detection (WRED).

The individual schedulers now need to be mapped to forwarding-classes with a scheduler map. As we have multiple forwarding-classes tied to certain queues, the forwarding-classes that share a queue must all reference the same scheduler.

set class-of-service scheduler-maps scheduler-8q forwarding-class voip-telephony scheduler real-time
set class-of-service scheduler-maps scheduler-8q forwarding-class broadcast-video scheduler real-time
set class-of-service scheduler-maps scheduler-8q forwarding-class real-time-interactive scheduler real-time
set class-of-service scheduler-maps scheduler-8q forwarding-class network-control scheduler control
set class-of-service scheduler-maps scheduler-8q forwarding-class signalling scheduler control
set class-of-service scheduler-maps scheduler-8q forwarding-class ops-admin-mgmt scheduler control
set class-of-service scheduler-maps scheduler-8q forwarding-class multimedia-streaming scheduler multimedia-streaming
set class-of-service scheduler-maps scheduler-8q forwarding-class multimedia-conferencing scheduler multimedia-conferencing
set class-of-service scheduler-maps scheduler-8q forwarding-class transactional-data scheduler transactional-data
set class-of-service scheduler-maps scheduler-8q forwarding-class bulk-data scheduler bulk-data
set class-of-service scheduler-maps scheduler-8q forwarding-class scavenger scheduler scavenger
set class-of-service scheduler-maps scheduler-8q forwarding-class best-effort scheduler best-effort
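
The rule that forwarding classes sharing a queue must share a scheduler is easy to get wrong by hand. The Python sketch below is an offline check, not something the router runs: both dictionaries restate the configuration in this post, and `inconsistent_queues` is a helper name of my own.

```python
# Forwarding class -> queue number (from the forwarding-classes configuration).
FC_TO_QUEUE = {
    "voip-telephony": 0, "broadcast-video": 0, "real-time-interactive": 0,
    "network-control": 1, "signalling": 1, "ops-admin-mgmt": 1,
    "multimedia-conferencing": 2, "multimedia-streaming": 3,
    "transactional-data": 4, "bulk-data": 5, "scavenger": 6, "best-effort": 7,
}

# Forwarding class -> scheduler (from the scheduler map above).
FC_TO_SCHEDULER = {
    "voip-telephony": "real-time", "broadcast-video": "real-time",
    "real-time-interactive": "real-time",
    "network-control": "control", "signalling": "control",
    "ops-admin-mgmt": "control",
    "multimedia-conferencing": "multimedia-conferencing",
    "multimedia-streaming": "multimedia-streaming",
    "transactional-data": "transactional-data",
    "bulk-data": "bulk-data", "scavenger": "scavenger",
    "best-effort": "best-effort",
}


def inconsistent_queues(fc_to_queue, fc_to_scheduler):
    """Return {queue: schedulers} for queues referenced by more than one scheduler."""
    schedulers_by_queue = {}
    for fc, queue in fc_to_queue.items():
        schedulers_by_queue.setdefault(queue, set()).add(fc_to_scheduler[fc])
    return {q: s for q, s in schedulers_by_queue.items() if len(s) > 1}


problems = inconsistent_queues(FC_TO_QUEUE, FC_TO_SCHEDULER)
print(problems or "every queue is served by exactly one scheduler")
```

If, say, signalling were accidentally pointed at the real-time scheduler, queue 1 would show up in the result with both scheduler names.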

Next, the above scheduler map named ‘scheduler-8q’ needs to be assigned to each interface. This should be the physical interface even if you have sub-interfaces configured.

set class-of-service interfaces ge-0/1/0 scheduler-map scheduler-8q
set class-of-service interfaces ge-0/1/1 scheduler-map scheduler-8q

On transit interfaces where we might not trust the DSCP markings that are set on the incoming packets, we need to enforce the best-effort forwarding class. This prevents untrusted traffic from marking everything as voice (our highest priority queue) and starving our other traffic of bandwidth. To do this we can use a fixed classifier that specifies a single forwarding class that all traffic will be placed into.

set class-of-service interfaces ge-0/0/0 unit 0 forwarding-class best-effort

All sorted, right? Well, not really. Although our traffic will leave this router in the correct queue, we haven’t changed the DSCP marking. The next router might use a BA classifier and will classify the traffic based upon the untrusted DSCP marking. To resolve this, we need to define rewrite rules. The important point to note here is that a packet’s DSCP marking is only changed as it leaves the router on the egress interface.

High Loss Priority:
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class voip-telephony loss-priority high code-point ef
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class broadcast-video loss-priority high code-point cs3
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class real-time-interactive loss-priority high code-point cs4
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class multimedia-conferencing loss-priority high code-point af41
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class multimedia-streaming loss-priority high code-point af31
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class network-control loss-priority high code-point cs6
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class signalling loss-priority high code-point cs5
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class ops-admin-mgmt loss-priority high code-point cs2
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class transactional-data loss-priority high code-point af21
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class bulk-data loss-priority high code-point af11
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class scavenger loss-priority high code-point cs1
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class best-effort loss-priority high code-point be

Low Loss Priority:
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class voip-telephony loss-priority low code-point ef
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class broadcast-video loss-priority low code-point cs3
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class real-time-interactive loss-priority low code-point cs4
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class multimedia-conferencing loss-priority low code-point af41
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class multimedia-streaming loss-priority low code-point af31
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class network-control loss-priority low code-point cs6
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class signalling loss-priority low code-point cs5
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class ops-admin-mgmt loss-priority low code-point cs2
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class transactional-data loss-priority low code-point af21
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class bulk-data loss-priority low code-point af11
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class scavenger loss-priority low code-point cs1
set class-of-service rewrite-rules dscp rewrite-rfc4594 forwarding-class best-effort loss-priority low code-point be

Above we have configured a rewrite rule named ‘rewrite-rfc4594’ and have specified DSCP markings for each forwarding class. There are two sets of rules to cover different loss priorities, but they are still the same markings. If we take our fixed classifier example from above, the packet with an untrusted marking will come in, be forced into the best effort forwarding class and will then be remarked to best effort (CS0) as it leaves the router on a different interface.
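
To make the fixed classifier example concrete, here is a toy Python model of the classify-then-rewrite path. It only covers three code points, the function name `egress_dscp` is my own, and it ignores loss priority since both sets of rewrite rules use the same markings.

```python
# Subset of the BA classifier: incoming DSCP -> forwarding class.
BA_CLASSIFIER = {"ef": "voip-telephony", "be": "best-effort", "cs1": "scavenger"}

# Subset of the rewrite rule: forwarding class -> DSCP written on egress.
REWRITE = {"voip-telephony": "ef", "best-effort": "be", "scavenger": "cs1"}


def egress_dscp(ingress_dscp, fixed_class=None):
    """Return the DSCP marking a packet leaves the router with.

    fixed_class models an ingress interface with a fixed classifier, which
    ignores the incoming marking entirely.
    """
    forwarding_class = fixed_class or BA_CLASSIFIER[ingress_dscp]
    return REWRITE[forwarding_class]


print(egress_dscp("ef"))                             # trusted interface
print(egress_dscp("ef", fixed_class="best-effort"))  # untrusted interface
```

The same EF-marked packet keeps its marking when it arrives on a trusted interface, but leaves as best effort when it arrives on the interface with the fixed classifier.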

We need to apply the above rewrite rules to our interfaces and sub-interfaces.

set class-of-service interfaces ge-0/0/0 unit 0 rewrite-rules dscp rewrite-rfc4594
set class-of-service interfaces ge-0/1/0 unit 210 rewrite-rules dscp rewrite-rfc4594
set class-of-service interfaces ge-0/1/0 unit 211 rewrite-rules dscp rewrite-rfc4594
set class-of-service interfaces ge-0/1/1 unit 210 rewrite-rules dscp rewrite-rfc4594
set class-of-service interfaces ge-0/1/1 unit 211 rewrite-rules dscp rewrite-rfc4594

Bear with me, we are nearly done. If our circuit provider limits our bandwidth, we should shape our outbound traffic to the same rate, so that any excess is queued and dropped on our router according to our own schedulers rather than indiscriminately by the provider. Here we shape our outbound traffic to 500Mbps.

set class-of-service interfaces ge-0/1/0 shaping-rate 500m
set class-of-service interfaces ge-0/1/1 shaping-rate 500m
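
As a rough illustration of what the shaper means for each queue, the Python sketch below converts the scheduler percentages into bandwidth figures. It assumes that the transmit-rate percentages are taken from the 500 Mbps shaping rate rather than from the physical port speed; treat that as an assumption and check the per-queue figures in ‘show interfaces extensive’ on your own platform.

```python
SHAPING_RATE_MBPS = 500  # from the shaping-rate configuration above

# transmit-rate percent per scheduler, copied from the scheduler configuration.
TRANSMIT_RATE_PERCENT = {
    "real-time": 20,
    "control": 10,
    "multimedia-conferencing": 10,
    "multimedia-streaming": 10,
    "transactional-data": 10,
    "bulk-data": 8,
    "scavenger": 2,
    "best-effort": 30,
}


def guaranteed_mbps(percent, base_mbps=SHAPING_RATE_MBPS):
    """Bandwidth in Mbps for a given percentage of the base rate."""
    return base_mbps * percent / 100


for name, pct in TRANSMIT_RATE_PERCENT.items():
    print(f"{name:24} {pct:3d}% {guaranteed_mbps(pct):6.1f} Mbps")
```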

If you remember, we still need to define the drop profile named ‘high-drop’ that we referenced in the schedulers earlier. The following configuration tells the router to start dropping packets when the buffer is 50% full with a probability of 1% and then increase the drop probability gradually (interpolating the values) until the buffer is 100% full. This means that when the buffer is 60% full there is roughly a 25% chance of the packet being dropped, and so on. Once defined, you can use the command ‘show class-of-service drop-profile high-drop’ to show the full table of fill levels versus drop probabilities.

set class-of-service drop-profiles high-drop interpolate fill-level [ 50 70 90 ]
set class-of-service drop-profiles high-drop interpolate drop-probability [ 1 50 90 ]
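
The interpolation can be sketched in a few lines of Python. This is plain straight-line interpolation between the three configured points and is only an approximation of the table that the router builds; the function name `drop_probability` is my own, and the sketch deliberately returns nothing outside the configured 50% to 90% range rather than guess how the router fills in the ends of the curve.

```python
FILL_LEVELS = [50, 70, 90]        # percent of buffer in use
DROP_PROBABILITIES = [1, 50, 90]  # percent chance of a drop at each fill level


def drop_probability(fill, fills=FILL_LEVELS, probs=DROP_PROBABILITIES):
    """Linearly interpolate the drop probability between configured points.

    Returns None outside the configured fill-level range.
    """
    if fill < fills[0] or fill > fills[-1]:
        return None
    points = list(zip(fills, probs))
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        if f0 <= fill <= f1:
            return p0 + (p1 - p0) * (fill - f0) / (f1 - f0)


for fill in (50, 60, 70, 80, 90):
    print(fill, drop_probability(fill))
```

At a 60% fill level this gives 25.5%, which is where the figure of about 25% comes from.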

By default, traffic sourced from the router such as OSPF, SNMP etc uses queues 0 and 3. In our design we have network control traffic in queue 1, so this doesn’t work for us. To change all router traffic to a different forwarding class, you can use the following command.

set class-of-service host-outbound-traffic forwarding-class network-control

We can verify our configuration with the command ‘show interfaces ge-0/0/0 extensive’. This will show us the number of packets seen in each queue, the forwarding class to queue mapping and the bandwidth allocation for each queue.

Phew, we’re done. The full configuration can be seen here. The diagram below should help you to visualise the mapping between the different configuration parts.

One aspect that isn’t really intuitive if you come from a Cisco background is the mapping between forwarding classes and the scheduler maps. It feels like they should be mapped to the queues, but they are mapped to the forwarding classes that are in turn mapped to the queues. As you can have more forwarding classes than queues, this means that you have to ensure that your forwarding class to scheduler mappings match your forwarding class to queue mappings.

[Image: JuniperCoS]

Well, that’s all I have to say about that. Have fun!

Juniper MIC With Third-Party Optics Issue

We have been experiencing an intermittent issue over the last couple of months with our Juniper MX104 routers. After a few weeks of continuous operation, the Modular Interface Card (MIC) will suddenly stop passing traffic. As our 4G console servers are still on order, this has resulted in a few day trips to our data centres.

[Image: JuniperMX104]

When the issue occurs, the router is still up and running but the MIC is unresponsive. We send logs to a syslog server, but with connectivity down they never arrived. This meant that we had to get to the device before the local logs were overwritten and also copy any core dump files for the open Juniper Technical Assistance Centre (JTAC) case.

We tracked down the following entries in the logs, which eventually pointed us at this Juniper Knowledge-base article.


afeb0 MIC0 IX PCI Fatal Error detected.
afeb0 Ixchip(0): pio_handle(0x4c915b70); pio_read_u32() failed: 20(input/output error)! ge-addr=01105e04
afeb0 ixchip_env_check: Too many IXCHIP 0 IO errors, stop subsequent IXCHIP READ/WRITE operations 

We came to the conclusion that our third-party optics were to blame. We had suffered a few failures of genuine Juniper optics and were forced to use third-party replacements. We’ve generally had no issues with third-party optics on other devices, but unfortunately that run of good luck has come to an end.

One of the possible symptoms of this issue is that the i2c failure count will increase for the third-party SFPs. This can be seen with the following commands.


start shell pfe network afeb0
show sfp list 

This will show a list of the SFPs that are installed and their serial numbers. You can see from the output below that there are three with one type of serial number and three with another.


Index  Name          Presence  ID Eprom  PNO         SNO          calibr  Toxic
-----  ------------  --------  --------  ----------  -----------  ------  -------
    1  MIC(0/0)(0)   Present   Complete  740-031850  AC1345SA3XF  int     Unknown
    2  MIC(0/0)(1)   Present   Complete  740-031850  AC1567SA3BD  int     Unknown
    3  MIC(0/0)(2)   Present   Complete  740-031850  AC1987SA3BX  int     Unknown
    4  MIC(0/0)(3)   Present   Complete  740-011783  F143JU01323  int     Unknown
    5  MIC(0/0)(4)   Present   Complete  740-011783  F143JU01345  int     Unknown
    6  MIC(0/0)(5)   Present   Complete  740-011783  F143JU01312  int     Unknown

Looking through the info for each of the SFP index numbers shows that one has a non-zero error count.

MX104-ABB-0(HSTNM vty)# show sfp 5 info
index: 0x05
sfp name: MIC(0/0)
pic context: 0x4DC15170
id mem scanned: true
linkstate: Up
sfp_present: true
sfp_changed: false
i2c failure count: 0x2B (NON ZERO)
diag polling count: 0x504
no diag polling from RE:0x1
run_periodic: false
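
Checking each SFP by hand gets tedious, so here is a small Python sketch that pulls the counter out of captured output. The sample text is a trimmed copy of the output above, the function name `i2c_failure_count` is my own, and it assumes the counter is always printed in hex as it is here.

```python
import re

# Trimmed copy of the 'show sfp 5 info' output shown above.
SAMPLE = """\
index: 0x05
sfp name: MIC(0/0)
linkstate: Up
i2c failure count: 0x2B (NON ZERO)
diag polling count: 0x504
"""


def i2c_failure_count(output):
    """Return the i2c failure count from 'show sfp N info' output as an int."""
    match = re.search(r"i2c failure count:\s*(0x[0-9A-Fa-f]+)", output)
    if match is None:
        raise ValueError("no i2c failure count line found")
    return int(match.group(1), 16)


count = i2c_failure_count(SAMPLE)
print(count, "suspect" if count else "ok")
```

The 0x2B in the output is 43 failures in decimal.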

After removing the three third-party SFPs, the MIC has been stable for a number of weeks. Let’s hope that it stays that way. Thankfully we’ve not gone live on the routers yet.

Juniper BGP Configuration

This week marked another milestone in our Internet upgrade project, with the completion of a second transit peering to a tier 1 provider. I’m working on a separate post about peering in general, but for now let’s look at how it’s done from a Juniper BGP perspective.

Creating a BGP session on a Juniper MX series is a relatively straightforward process, but you need to be careful with routing policies if you don’t want to become transit for your other peers.

[Image: transitforpeers]

Let’s start by defining a couple of policies. We are receiving full Internet routing tables from our transit provider, so this import policy simply rejects small prefixes (/27 to /32). Most transit providers already filter out anything more specific than a /24, but this policy keeps the table size down if that hasn’t been done upstream. At the time of writing, the full Internet routing table is at 678,760 routes (source). As the number of routes has an impact on router performance, it’s important to keep the table as small as possible.

If your provider is not already doing it upstream, you should also filter for bogons, like the IP ranges used in this post that are reserved for documentation (RFC5737).

[edit policy-options]
policy-statement no-small-prefixes {
    from {
        route-filter 0.0.0.0/0 prefix-length-range /27-/32 reject;
    }
}

Set Commands:
set policy-options policy-statement no-small-prefixes from route-filter 0.0.0.0/0 prefix-length-range /27-/32 reject
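
To see which routes this filter catches, here is a minimal Python sketch using the standard `ipaddress` module. It only mimics the prefix-length test for IPv4 (every IPv4 prefix falls within 0.0.0.0/0, so only the length matters); the function name is my own, and anything that does not match falls through to the default BGP import policy, which accepts it.

```python
import ipaddress


def rejected_by_no_small_prefixes(prefix):
    """Mimic 'route-filter 0.0.0.0/0 prefix-length-range /27-/32 reject' for IPv4."""
    network = ipaddress.ip_network(prefix)
    return network.version == 4 and 27 <= network.prefixlen <= 32


for prefix in ("198.51.100.0/24", "192.0.2.0/27", "192.0.2.128/25", "203.0.113.1/32"):
    verdict = "reject" if rejected_by_no_small_prefixes(prefix) else "accept (default policy)"
    print(f"{prefix:18} {verdict}")
```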

Next we need a route to announce. It’s a good practice to define either an aggregate route or a static summary discard route (a.k.a null route) when announcing our prefix to the Internet, so that things stay relatively stable.

If we are using only part of the range, those routes will be more specific so will take precedence over the summary route. Anything that comes to us for parts of the range that haven’t been used will simply be discarded.

[edit routing-options static]
route 203.0.113.0/24 discard;

Set Commands:
set routing-options static route 203.0.113.0/24 discard
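
The way a more specific route wins over the discard summary can be sketched in Python as a longest-prefix-match lookup. The /26 route and its next hop name are made up for illustration; only the /24 discard route comes from the configuration above.

```python
import ipaddress

# Hypothetical routing table: the discard summary plus one more specific
# route for a part of the range that is in use (the /26 is invented).
ROUTES = {
    ipaddress.ip_network("203.0.113.0/24"): "discard",
    ipaddress.ip_network("203.0.113.0/26"): "internal-next-hop",
}


def lookup(address, routes=ROUTES):
    """Return the action of the longest matching prefix, or None if no match."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(lookup("203.0.113.10"))   # inside the /26
print(lookup("203.0.113.200"))  # only covered by the summary
```

Traffic for the part of the range that is in use follows the /26, while traffic for the unused remainder matches only the /24 and is discarded.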

Now we need a policy to announce the above static route but nothing else. Be very careful here, as the default routing policy action for BGP is as follows.

Readvertise all active BGP routes to all BGP speakers, while following protocol-specific rules that prohibit one IBGP speaker from readvertising routes learned from another IBGP speaker, unless it is functioning as a route reflector.

This means that if we don’t put an explicit reject term below our accept term, the default action will be to advertise all active BGP routes in our table, including any learned from a second transit peer. This could make our network the better path to some of our other peers, which is almost certainly not what we want. Thankfully most transit providers will filter on their side as well, but it’s best to make sure with our own policies.

[edit policy-options]
policy-statement announce {
    term 1 {
        from {
            protocol static;
            route-filter 203.0.113.0/24 exact;
        }
        then accept;
    }
    term 2 {
        then reject;
    }
}

Set Commands:
set policy-options policy-statement announce term 1 from protocol static
set policy-options policy-statement announce term 1 from route-filter 203.0.113.0/24 exact
set policy-options policy-statement announce term 1 then accept
set policy-options policy-statement announce term 2 then reject
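
The effect of the second term is easier to see in a toy model. The Python sketch below walks a few made-up routes through the ‘announce’ policy with and without term 2. It is a simplification: the exact route-filter match is modelled as a string comparison, and the fall-through behaviour follows the default BGP export policy quoted above.

```python
def announce(route, with_reject_term=True):
    """Evaluate a route against the 'announce' policy; the first match wins."""
    # term 1: protocol static AND route-filter 203.0.113.0/24 exact
    if route["protocol"] == "static" and route["prefix"] == "203.0.113.0/24":
        return "accept"
    # term 2: unconditional reject
    if with_reject_term:
        return "reject"
    # No term matched: fall through to the default BGP export policy,
    # which readvertises active BGP routes and nothing else.
    return "accept" if route["protocol"] == "bgp" else "reject"


# Example routes; the BGP and direct entries are invented for illustration.
routes = [
    {"prefix": "203.0.113.0/24", "protocol": "static"},
    {"prefix": "192.0.2.0/24", "protocol": "bgp"},
    {"prefix": "198.51.100.0/30", "protocol": "direct"},
]
for r in routes:
    print(r["prefix"], announce(r), announce(r, with_reject_term=False))
```

Without term 2, the BGP-learned route is accepted for export, which is exactly the leak that turns you into transit for your peers.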

Now that we have all of our policies in place, it’s time to configure BGP. The configuration below is for a single session with imaginary transit AS65100 from our own imaginary AS65000 (AS numbers 64512 to 65534 are reserved for private use).

In reality you will probably want multiple external peers for resilience and an internal BGP (iBGP) configuration to distribute those routes around your own network.

[edit routing-options]
autonomous-system 65000;

[edit protocols bgp]
group ebgp-65100 {
 type external;
 description "*** eBGP with Transit (AS65100) ***";
 import no-small-prefixes;
 authentication-key "passwordhere";
 export announce;
 peer-as 65100;
 neighbor 198.51.100.1;
}

Set Commands:
set routing-options autonomous-system 65000
set protocols bgp group ebgp-65100 type external
set protocols bgp group ebgp-65100 description "*** eBGP with Transit (AS65100) ***"
set protocols bgp group ebgp-65100 import no-small-prefixes
set protocols bgp group ebgp-65100 authentication-key "passwordhere"
set protocols bgp group ebgp-65100 export announce
set protocols bgp group ebgp-65100 peer-as 65100
set protocols bgp group ebgp-65100 neighbor 198.51.100.1

After committing the above configuration, we can confirm that everything is working with a ‘show bgp summary’.

Peer            AS       Last Up/Dwn    State|#Active/Received/Accepted
198.51.100.1    65100    2d 23:40:54    492395/646126/646006

The number of received routes should be increasing as the full Internet routing table is downloaded.
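
The last column packs three counters into one field. Here is a short Python sketch that splits it up, using the figures from the output above; the function name `parse_route_counters` is my own.

```python
def parse_route_counters(field):
    """Split an 'Active/Received/Accepted' field into integers."""
    active, received, accepted = (int(part) for part in field.split("/"))
    return {"active": active, "received": received, "accepted": accepted}


counters = parse_route_counters("492395/646126/646006")
not_accepted = counters["received"] - counters["accepted"]
print(counters, not_accepted)
```

The difference between received and accepted is 120 routes, which lines up with the 120 hidden routes reported by ‘show route’ further down. That would be consistent with our import policy rejecting them, although that is an inference from the numbers rather than something the output states.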

So, we have confirmed that we are receiving our transit provider’s routes, but what about confirming our outbound announcements? We can use the following command to see that information.

show route advertising-protocol bgp 198.51.100.1

inet.0: 646306 destinations, 1162001 routes (646186 active, 0 holddown, 120 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 203.0.113.0/24          Self                                    I

You should only see the routes that were previously defined in our ‘announce’ routing policy. If there are more, then there is probably a mistake in the policy. Make sure that you have the second reject term as previously discussed.

We can also check that our announced routes are making their way around the rest of the Internet by using popular looking glass tools such as those listed below.

Happy peering!