Thursday, February 2, 2012

Wi-Fi Roaming Analysis Part 2 - Roaming Variations

In Part 1 of this series, I provided a high-level overview of Wi-Fi connection control, the importance of roaming, and what conditions are involved in triggering a client roam.

Now that we have the basics out of the way, let's discuss the large number of roaming variations that exist and their implications for performance analysis. Once the client decides to move its network connection to a new AP, the actual roam occurs. This is where things get complicated, because various combinations of authentication and encryption suites require different frame exchanges to complete a roam.

Wi-Fi Roaming Analysis Series:
  1. Part 1 - Connection Control and Importance of Roaming Analysis
  2. Part 2 - The Many Variations of Wi-Fi Roaming
  3. Part 3 - Methods of Measuring Roam Times
  4. Part 4 - Analysis with Wireshark and AirPcap
  5. Part 5 - Analysis with Wildpackets Omnipeek (coming)
  6. Part 6 - Tips for Roaming Performance Improvement (coming)
Security Brings Complexity
When Wi-Fi was young, client traffic flowed fast and easy. Clients roamed from one AP to another with nary a care in the world, albeit some inefficient client roaming algorithms did exist. But over time WEP was found increasingly vulnerable to attack and eventually full defeat. The IEEE responded, defining a very "robust" security network, indeed! But with this increased security came new restrictions. Clients had to present their identity to access the network, and the APs had to call their boss for approval (authentication server). And this took time! At first, clients didn't mind. But over time clients grew increasingly impatient, wanting to get where they were going without having to stop. "Why doesn't the AP know who I am? I come through here every day!" they would say. And they were right.

I explain Wi-Fi roaming like interstate traffic. Originally there were simple on-ramps (Open/WEP networks). As the roads required maintenance and repair, toll-booths were erected to collect a fee before use (802.1X). These first-generation tolls were "cash-only" and required every car to stop and pay, which backed up traffic. Eventually, due to increasing demands and volume, these toll-booths were replaced with electronic toll collection, which allows cars to slow down and pay without stopping (fast roaming).

As the 802.11 protocol has grown more mature, it has also grown much more complex. Introduction of more secure networks solved one problem but created another. The need for and lack of standardized fast roaming has led to proprietary vendor enhancements to fill the gap. And lack of coordination among vendors has led to multiple competing methods with fragmented support throughout the industry.

The Many Variations of Wi-Fi Roaming
* Update 2012-02-03: Original table listed 802.1X/EAP as part of CCKM, which is incorrect. The table was updated to reflect this change. *

Note - The initial GTK installation is defined as part of the 4-Way Handshake with 802.11i / WPA2. It has also been observed in the 4-Way Handshake with WPA pre-standard networks, despite not being specified as such by the Wi-Fi Alliance. The GTK exchange is mainly used to update existing group keys and is listed mainly for reference purposes.

Simple Authentication & Roaming Methods
I call the following "simple" methods because they involve simple security protocols relative to the more robust methods involving 802.1X. These methods are very fast and typically allow clients to complete a roam in <50ms. However, the trade-off is lower security, which becomes readily apparent when the network must scale beyond a small number of users, at which point WEP keys or pre-shared keys become unmanageable to provision and rotate, and proper access control becomes hard to maintain.
  • Open Network
    The client performs 802.11 open authentication (2-packet exchange) and 802.11 association (2-packet exchange), at which point data traffic is permitted. Simple, quick, and efficient! Open networks are typically found in hotspot and guest deployment scenarios and may have web authentication via a captive web portal layered on top, in which case the wireless network or other in-line network appliance will only allow DHCP and DNS prior to web login. However, from a layer 2 Wi-Fi perspective, data traffic is unencrypted and presents significant security risks.
  • Static WEP
    When static WEP keys are used for network access control and encryption, clients perform the same steps as an open network roam, going through 802.11 open auth and 802.11 association, then encrypt data frames using the WEP algorithm. No additional authentication exchange occurs with static WEP unless shared key authentication is configured (discussed next). The use of WEP encryption is inferred by the presence of the "Protected Frame" bit set in the 802.11 header as well as the absence of a WPA or RSN information element. The use of a correct WEP key is inferred from the ability to decrypt frames at the receiver and verify the ICV (integrity check value). WEP is a legacy security protocol which can be cracked very easily and offers virtually no protection. Do NOT use WEP!
  • Static WEP with Shared Key Authentication
    When an optional shared key authentication method is configured with static WEP, the access point and client exchange an additional challenge and response to confirm that the client holds the correct WEP key prior to allowing it to associate to the AP. The desire to use shared key authentication is signalled within the 802.11 authentication request and response packets in the authentication algorithm fixed parameter field. The use of shared key authentication actually reduces the security of static WEP because the same challenge text is transmitted over the air in both plaintext and encrypted form, allowing an attacker to recover the WEP key more easily. Do NOT use WEP!
  • WPA/WPA2 Pre-Shared Key
    When WPA or WPA2 is configured with pre-shared keys, the client and AP must be configured out-of-band with the proper passphrase, which is used as the master key. The client and AP exchange the 802.11 open auth and association frames before performing a 4-Way Handshake. The handshake facilitates the exchange of random information (nonces). The passphrase, station addresses, nonces, and SSID are all used to transform the master key into a series of sub-keys, one of which is the PTK used for actual data encryption. A WPA2 PSK network is simpler in that users only need to know the passphrase, but it scales poorly, and revoking access is difficult when all users share the same passphrase. Traffic from each user is uniquely encrypted, but knowledge of the passphrase along with observation of the 4-Way Handshake can allow any user to decrypt another user's traffic. WPA2 PSK is best used in homes and SMBs where there is a small user base, which is why it is commonly referred to as WPA2-Personal. It is also commonly used with VoWiFi deployments to prevent voice call disruption due to the excessive roaming latency involved with the full authentication methods listed below.
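The key derivation described for WPA/WPA2 PSK can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard passphrase-to-PMK mapping (PBKDF2 with HMAC-SHA1, 4096 iterations, SSID as salt) and the HMAC-SHA1 based PRF used for the pairwise key expansion. The passphrase, SSID, MAC addresses, and nonces are made-up example values, and the function names are my own.

```python
import hashlib
import hmac


def derive_pmk(passphrase: str, ssid: str) -> bytes:
    """Map a passphrase to a 256-bit master key, salted with the SSID."""
    return hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("utf-8"), ssid.encode("utf-8"), 4096, 32
    )


def prf(key: bytes, label: bytes, data: bytes, n_bytes: int) -> bytes:
    """Concatenate HMAC-SHA1(key, label || 0x00 || data || counter) blocks."""
    out = b""
    counter = 0
    while len(out) < n_bytes:
        msg = label + b"\x00" + data + bytes([counter])
        out += hmac.new(key, msg, hashlib.sha1).digest()
        counter += 1
    return out[:n_bytes]


def derive_ptk(pmk: bytes, ap_mac: bytes, sta_mac: bytes,
               anonce: bytes, snonce: bytes, n_bytes: int = 48) -> bytes:
    """Expand the master key into a PTK using both addresses and both nonces.

    48 bytes is the PTK length assumed here for CCMP (KCK, KEK, TK of
    16 bytes each); TKIP uses 64 bytes.
    """
    data = (min(ap_mac, sta_mac) + max(ap_mac, sta_mac)
            + min(anonce, snonce) + max(anonce, snonce))
    return prf(pmk, b"Pairwise key expansion", data, n_bytes)


if __name__ == "__main__":
    # Example values only; none of these come from a real network.
    pmk = derive_pmk("example passphrase", "ExampleSSID")
    ap = bytes.fromhex("001122334455")
    sta = bytes.fromhex("66778899aabb")
    anonce = bytes(range(32))
    snonce = bytes(range(32, 64))
    ptk = derive_ptk(pmk, ap, sta, anonce, snonce)
    print("PMK:", pmk.hex())
    print("TK :", ptk[32:48].hex())
```

Note that every input other than the passphrase (SSID, both MAC addresses, both nonces) is visible on the air during the 4-Way Handshake, which is why anyone who knows the passphrase and captures the handshake can reproduce another user's keys.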
Full Authentication & Roaming Methods
The following methods are what I classify as "full auth", meaning they perform a full 802.1X authentication process using a back-end AAA RADIUS server. When implemented without any optimization for fast roaming, these methods are used for both initial connection establishment and subsequent roaming between APs. These methods provide robust network security that is enterprise-ready, but the trade-off is much longer authentication time and roaming latency. It is typical for full authentication roams to take >600ms to complete, and they can take longer depending on network architecture (e.g. authentication server is across a WAN circuit).
  • Dynamic WEP
    The use of dynamic WEP is provided as a vendor proprietary feature by many manufacturers, and allows the use of 802.1X / EAP authentication with the WEP protocol. After successful 802.11 association, EAP authentication proceeds using any supported EAP type (with Cisco LEAP being the most common). Unicast and broadcast WEP keys are assigned to the client by the AP using two EAPoL-Key frames after successful EAP authentication, which removes reliance on statically configured WEP keys and allows unique unicast keys to be dynamically assigned to each client. However, dynamic WEP still relies on the same flawed WEP protocol and does not remediate its inherent issues. There is no method to signal the use of dynamic WEP within the 802.11 frame, so it relies on both the client and AP being properly configured to support this process. (The use of LEAP authentication does use a proprietary Cisco information element, but it is not required for dynamic WEP.) Dynamic WEP was introduced into the market in Dec. 2000 with the release of LEAP authentication by Cisco. Dynamic WEP should NOT be used!
  • WPA/WPA2 Full Authentication
    When WPA or WPA2 is configured with AAA authentication, user or device credentials are verified using a back-end authentication server. The client and AP exchange the 802.11 auth and association frames, then proceed with EAP authentication. Many different EAP authentication protocols exist and any one of them can be used depending on the customer requirements. EAP protocols require a lengthy communication exchange between the client and authentication server, typically 8 or more round-trip frame exchanges, which creates significant delay in the roaming process. Since a client can only be associated to a single AP at a time, it must break its previously working data path prior to establishing a new data path. And EAP authentication sits as a large barrier in that path that must be overcome before application data can begin to flow again. Upon successful EAP authentication the AAA server and client derive a master key, similar to what was configured out-of-band in a PSK network, except that the master key is unique to this client session. The AAA server also sends a copy of the master key to the AP (or controller) acting as the authenticator. The AP and client then perform the familiar 4-Way Handshake to transform the master key into a temporal key used for actual data encryption. WPA2 full authentication is the basis for most enterprise Wi-Fi deployments because of the strong security offered. However, it creates significant latency that can disrupt real-time applications such as voice and video. The WPA certification program was introduced in 2003 by the Wi-Fi Alliance prior to final IEEE 802.11i amendment ratification in 2004. The WPA2 certification program was subsequently released in 2004 and expanded in 2005.
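To see why the location of the authentication server matters so much for full authentication, here is a rough back-of-the-envelope model in Python. It is only a sketch under stated assumptions: it uses the "8 or more round trips" figure from above, treats every exchange as costing one network round trip plus a fixed processing delay, and ignores retries, channel contention, and the 4-Way Handshake. The function name and the 20ms processing figure are placeholders of mine, not measured values.

```python
def estimate_full_auth_ms(rtt_to_aaa_ms: float,
                          eap_round_trips: int = 8,
                          per_exchange_processing_ms: float = 20.0) -> float:
    """Very rough EAP delay: each exchange costs one RTT plus processing."""
    return eap_round_trips * (rtt_to_aaa_ms + per_exchange_processing_ms)


if __name__ == "__main__":
    # Hypothetical round-trip times to the authentication server.
    for label, rtt in [("local AAA server", 5.0), ("AAA across a WAN", 80.0)]:
        print(f"{label}: ~{estimate_full_auth_ms(rtt):.0f} ms of EAP delay")
```

The point is the multiplier: every millisecond of added latency between the authenticator and the AAA server is paid once per EAP exchange, so a WAN hop is felt many times over in a single roam.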
Fast Roaming Techniques
The following fast roaming techniques improve upon the full authentication methods by optimizing various steps in the authentication process. A full authentication method is required to establish the initial client connection, after which a fast roaming technique can subsequently be used when roaming between APs to minimize delay. Fast roaming techniques vary in their ability to minimize delay, with the goal to complete a roam in <100ms. Voice traffic typically sends frames every 20ms and requires roaming latency under 100ms to prevent call disruption.
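The 20ms voice frame interval makes it easy to translate roam time into call impact. The small Python sketch below simply divides the roam duration by the frame interval; it is an approximation that assumes one voice frame per interval and that every frame scheduled during the roam is lost or delayed. The function name is my own, and the roam times fed into it are the rough figures quoted in this article.

```python
def voice_frames_missed(roam_ms: float, frame_interval_ms: float = 20.0) -> int:
    """Approximate number of voice frames scheduled while the roam is in progress."""
    return int(roam_ms // frame_interval_ms)


if __name__ == "__main__":
    # Rough roam durations in milliseconds, taken from the ranges in this article.
    for roam_ms in (50, 100, 300, 600):
        print(f"{roam_ms} ms roam -> about {voice_frames_missed(roam_ms)} voice frames affected")
```

A handful of missing frames can be concealed by the codec and jitter buffer, while dozens of consecutive frames cannot, which is the practical reason behind the 100ms target.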
  • Cisco Centralized Key Management (CCKM) (also called Fast Secure Roaming)
    CCKM is a vendor-proprietary fast roaming algorithm developed by Cisco Systems, and is only supported on their access points, both autonomous and lightweight models. CCKM works by caching the encryption key derived after an initial authentication (DWEP EAPoL key exchange or WPA/WPA2 4-Way Handshake) on both the WDS Master and the wireless client. A WDS master role is assigned to a central point of coordination for all the APs in a group, and can be an Autonomous AP, WLSE, or newer wireless LAN controller. When roaming to a new AP, the client increments a re-key number and derives a new PTK key using the BSSID of the new AP it wishes to roam to. The client indicates CCKM support by including a Cisco proprietary information element that includes the next re-key number within the association request frame. The new AP requests the new PTK key from the WDS master and then replies to the client with the association response frame. CCKM reduces the time to complete a roam by removing the EAP authentication and 4-Way Handshake. Roam times can be <50ms in most cases.

    CCKM was originally designed for use with LEAP authentication and WEP encryption, but can be used with other EAP authentication methods and encryption ciphers (TKIP or AES) as well. The use of CCKM is advertised by the presence of a vendor-specific AKMP (authentication and key management protocol) within the WPA and RSN information elements used in beacon and probe response frames. It is also indicated by a Cisco vendor-specific information element in 802.11 association request & response frames. Clients must support CCX version 2 at minimum to leverage CCKM with LEAP, version 3 for EAP-FAST, and version 4 for PEAP and EAP-TLS. CCKM was introduced into the market in 2004 with Cisco Autonomous software release 12.2(11)JA.
  • WPA/WPA2 EAP Session Resumption (also called Fast Reconnect)
    Many EAP types used with 802.1X authentication rely on TLS security. TLS relies on a lengthy handshake negotiation to set up a secure communication path between the client and authentication server. This handshake requires a server-side certificate, which usually results in authentication of the authentication server to the client. After the TLS handshake completes, the client must then authenticate itself to the server. EAP-TLS accomplishes this with a client-side certificate. Tunneled EAP types such as PEAP and EAP-TTLS use other less secure protocols such as MSCHAPv2 or EAP-GTC inside the tunnel to complete authentication without being directly exposed to an attack. Upon successful EAP authentication, the AP and client perform the familiar 4-Way Handshake to derive the PTK for data encryption.

    Once a client has initially authenticated, the TLS session and resulting security context can be cached on both client and server. Upon subsequent re-authentication, the cached TLS session allows the use of a simpler and shorter handshake process. Additionally, the existence of a valid cached TLS session implies a previously successful authentication, and many EAP types allow the inner client authentication to be skipped. Overall, this typically results in a 50% reduction in frame exchange to the backend authentication server during EAP authentication. The use of session resumption is transparent to the WLAN infrastructure and appears as a normal full 802.1X authentication. Roam times typically require <300ms to complete, but may be longer depending on network architecture (e.g. authentication server is across a WAN circuit). Although a significant improvement over full 802.1X authentication, EAP session resumption is still not fast enough to support real-time applications such as voice over IP. However, it is well supported in the industry and is common on wireless networks.
  • WPA2 PMK Caching (also called Static PMK Caching or Fast/Secure Roam-Back)
    The client re-uses a previously cached PMK Security Association (PMKSA) from a prior full 802.1X authentication with an individual access point. The PMKSA cache can also be built by pre-authenticating through the existing AP association to the new AP. Once the client roams it will send the PMK Identifier (PMKID) of the cached PMKSA to the access point in the RSN Information Element within the Re-Association Request frame. If the AP has the same PMKID cached it will skip the 802.1X authentication and proceed directly to the 4-Way Handshake. The end result of a PMK cache roam is functionally equivalent to an OKC and Fast BSS Transition roam, except that clients cannot re-use a single cache entry across multiple APs. PMK cache roaming typically requires <100ms to complete.

    PMK caching is quite well supported by infrastructure and client devices alike. Unfortunately, its usefulness is limited by the fact that a client must have a cached PMKSA with each access point and this caching is not shared between APs within the same controller or AP group. Also, many clients and APs limit the number of cached PMK entries due to memory utilization concerns. This reduces how often PMK caching can be used, since a client must perform a full authentication with each AP the first time it associates. There is also a maximum lifetime for cached PMKSAs, after which time a full authentication is required again. PMK caching is therefore highly dependent on the traffic patterns of your clients. If they roam between the same set of APs most of the time, PMK caching could be a great benefit. If clients often roam to new APs throughout the network then PMK caching is less useful. Reference section 8.4.1.2.1 of the IEEE 802.11-2007 standard. PMKSA caching was introduced in 2004 with the ratification of the IEEE 802.11i amendment.
  • WPA2 Proactive Key Caching (PKC) (also called Opportunistic Key Caching)
    PKC builds on top of the standardized PMK caching, but extends the re-use of a single cached PMKSA across all wireless access points connected to the same WLC or AP group. PKC is not a defined standard by the IEEE, and vendor implementations may vary.

    PKC works by caching the PMKSA from an initial client full authentication at a central point of coordination for multiple access points, typically a WLC. When the client roams to a new AP within the same Extended Service Set (ESS), it "proactively" calculates a new PMKID for use with the new AP based on the BSSID of the new AP. The client then sends the newly calculated PMKID to the new AP in the RSN information element of the re-association request. Depending on vendor implementation, the AP will already have a cached PMKSA or PMKID pushed to it by the WLC, or it will query the WLC for the PMKID upon receiving the re-association request. If the AP derives the same PMKID as the client, it will skip EAP authentication and proceed directly to the 4-Way Handshake to derive a new PTK for data encryption.

    Essentially, the client can re-use the cached PMKSA, but calculates a new PMKID for use with every AP without needing to perform a full 802.1X authentication. PKC roaming performs similarly to both static PMK caching and PSK roaming, requiring <100ms to complete. However, support for PKC is highly variable within the industry, and despite favorable initial adoption by client manufacturers, support has been declining. PKC was introduced by Airespace, Funk Software, and Atheros in 2004, shortly after the ratification of the IEEE 802.11i amendment for robust security networks.
  • WPA2 Fast BSS Transition (FT)
    Given the limitations of static PMK caching, and limited support for proprietary OKC and CCKM fast roaming techniques, the IEEE standardized fast roaming across an ESS with the 802.11r amendment, which was ratified in 2008. An AP advertises support for FT in a new Mobility Domain Information Element (MDIE) in beacons, probe responses, and (re)association responses. The client must also indicate support in an MDIE included in authentication and (re)association requests.

    Fast Transition works by having the authenticator (typically a WLC) complete an initial successful 802.1X client authentication and derive a PMK-R0 for the client. The PMK-R0 is used to derive a unique PMK-R1 for each AP within the mobility domain. The authenticator then distributes the keys to other APs using a secure channel (which is not defined by the IEEE 11r amendment). During the initial authentication, the client performs full 802.1X authentication, completes the 4-Way Handshake to derive a PTKSA with the AP (using PMK-R1 key material), and then is allowed access to the network. However, upon roaming the 802.1X authentication and 4-Way Handshake steps may be skipped if a valid PMK-R1 for the new AP is presented by the client in the (re)authentication and (re)association request frames. Therefore, Fast Transition allows roaming faster than static PMK caching and OKC, and on-par with CCKM roaming, typically <50ms.

    This is an extremely over-simplified explanation, but will suffice to understand how Fast Transition works. It is also important to note that 11r also allows FT over the distribution system, through the current AP to the new AP, similar to 802.11i pre-authentication. However, I will not cover that topic in this article.

* Notice that many of these fast roaming techniques are restricted to WPA2 only.
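The PMKID matching described above for PMK caching and PKC/OKC can be sketched in Python. The PMKID calculation shown is the one I understand 802.11i to define (the first 128 bits of an HMAC-SHA1 over the string "PMK Name", the AP's BSSID, and the client's MAC address, keyed with the PMK). The `OkcController` class is purely hypothetical and only illustrates the idea of one PMK per client shared across APs; real vendor implementations vary, as noted above.

```python
import hashlib
import hmac


def pmkid(pmk: bytes, ap_bssid: bytes, sta_mac: bytes) -> bytes:
    """First 128 bits of HMAC-SHA1(PMK, "PMK Name" || BSSID || client MAC)."""
    return hmac.new(pmk, b"PMK Name" + ap_bssid + sta_mac, hashlib.sha1).digest()[:16]


class OkcController:
    """Hypothetical central cache: one PMK per client, usable at any AP."""

    def __init__(self) -> None:
        self._pmk_by_sta = {}

    def store(self, sta_mac: bytes, pmk: bytes) -> None:
        # Called after the client's initial full 802.1X authentication.
        self._pmk_by_sta[sta_mac] = pmk

    def can_skip_eap(self, ap_bssid: bytes, sta_mac: bytes, offered_pmkid: bytes) -> bool:
        # Recompute the PMKID for the AP being roamed to and compare it
        # with the one the client placed in its re-association request.
        pmk = self._pmk_by_sta.get(sta_mac)
        if pmk is None:
            return False
        return hmac.compare_digest(pmkid(pmk, ap_bssid, sta_mac), offered_pmkid)


if __name__ == "__main__":
    # Example values only.
    pmk = bytes(range(32))
    sta = bytes.fromhex("66778899aabb")
    ap_old = bytes.fromhex("001122334455")
    ap_new = bytes.fromhex("001122334466")

    controller = OkcController()
    controller.store(sta, pmk)

    # The client computes a fresh PMKID for the new BSSID from the same PMK.
    offered = pmkid(pmk, ap_new, sta)
    print("skip 802.1X at new AP:", controller.can_skip_eap(ap_new, sta, offered))
    # A PMKID computed for the old AP does not match at the new one.
    print("old PMKID accepted:", controller.can_skip_eap(ap_new, sta, pmkid(pmk, ap_old, sta)))
```

With static PMK caching the equivalent lookup would be kept per AP, so a match is only possible at APs the client has already fully authenticated to. Sharing the PMK centrally is what lets the same calculation succeed at an AP the client has never visited.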

Layer 2 versus Layer 3 Roaming
Layer 2 roaming occurs when a client roams from one AP to another AP which both attach to the same client subnet or VLAN.

Layer 3 roaming occurs when a client roams from one AP to another AP which does not attach to the same client subnet or VLAN. If a client is required to acquire a new IP address, existing application connections break, which has an adverse effect on network usability. Existing client sessions will either hang or eventually time out and disconnect.
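The distinction can be expressed as a simple subnet check. The Python sketch below uses the standard `ipaddress` module and assumes you know which client subnet the new AP attaches to; the function name and the addresses are illustrative only.

```python
import ipaddress


def classify_roam(client_ip: str, new_ap_client_subnet: str) -> str:
    """Return "layer 2" if the client's current IP is valid on the new AP's subnet."""
    address = ipaddress.ip_address(client_ip)
    subnet = ipaddress.ip_network(new_ap_client_subnet)
    return "layer 2" if address in subnet else "layer 3"


if __name__ == "__main__":
    # Hypothetical addressing plan.
    print(classify_roam("10.1.20.57", "10.1.20.0/24"))  # same subnet, IP address is kept
    print(classify_roam("10.1.20.57", "10.1.30.0/24"))  # different subnet, IP address would change
```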

Wireless controller architectures help eliminate the need for layer 3 roaming by tunneling client data traffic from APs back to the controller as the logical client network attachment point. This way, APs can be spread across a larger physical or logical network environment without impacting clients. However, this also can only scale so large until APs attach to different controllers with different client attachment points, or unique requirements may dictate traffic forwarding to an altogether different network segment (e.g. guest termination in a secured DMZ).

All enterprise Wi-Fi vendors implement layer 3 roaming transparency to clients in order to eliminate the need for a client to acquire a new IP address. This is typically accomplished through coordination between APs or controllers within a logical group to tunnel existing client traffic back to a point within the network that can serve the original client subnet or VLAN. Examples include Cisco's concept of a wireless controller "Mobility Group", Aerohive's concept of a "Hive", etc.

Revolution or Evolution? - Andrew's Take
Roaming is easily one of the most convoluted processes within the Wi-Fi industry. The complexity involved between security requirements, standard and proprietary roaming methods, combined with fragmented infrastructure and client manufacturer support is staggering. Network administrators cannot predict how roaming will perform without observing live clients and analyzing the results. As networks grow and administrators increasingly lose control of devices attaching to the network, it becomes an almost impossible task to ensure adequate performance for every client.

Support for standardized fast roaming is long overdue! As I have written before, it's time for both infrastructure and client vendors to adopt 802.11r Fast BSS Transition. Real-time traffic flows require better performance than can currently be achieved. Proprietary fast roaming techniques such as CCKM are available, but support is tough to come by for customers and adoption is even harder to push with vendors. It's time to stop the marketing spin around WLAN controllers and fast roaming, as Marcus can attest. We have a solution, now we need adoption!

I'm from Nebraska, and as Larry the Cable Guy would say: "Git-R-Done!"

Cheers,
Andrew



Further Reading
For further reading on fast roaming techniques related to CCKM, PMK caching, OKC/PKC, and 802.11r, see the Cisco Aironet Configuration Guide, Cisco Voice over Wireless 4.1 Design Guide, CWNP RSN Fast BSS Transition (free registration required), and the IEEE 802.11r amendment.

For an example packet flow analysis, review my post on PEAP authentication which details the frame by frame exchange for two types of roams that are most common, a full EAP authentication and EAP session resumption.


2 comments:

  1. Despite being an arrogant prick, you do write very good, accurate and informative articles.

    1. Thanks! You're the fuel that keeps me going!

      Andrew
