Monday, December 19, 2011

Wi-Fi Roaming Analysis Part 1 - Connection Control

Welcome to the series of articles on Wi-Fi roaming analysis. In this article, part 1, we will define Wi-Fi roaming, provide background on how a client and access point establish a connection, explain what role each plays in controlling that connection, and establish why Wi-Fi roaming analysis matters for engineers operating a modern wireless LAN (WLAN) environment.

Wi-Fi Roaming Analysis Series:
  1. Part 1 - Connection Control and Importance of Roaming Analysis
  2. Part 2 - The Many Variations of Wi-Fi Roaming
  3. Part 3 - Methods of Measuring Roam Times
  4. Part 4 - Analysis with Wireshark and AirPcap
  5. Part 5 - Analysis with Wildpackets Omnipeek (coming)
  6. Part 6 - Tips for Roaming Performance Improvement (coming)
Introduction
Advanced protocol analysis is becoming an increasingly important skill for Wi-Fi engineers as networks grow more sophisticated and complex. The wireless LAN market is a tremendously innovative and fast-changing landscape, and the skills necessary to understand and dissect the inner workings of these networks are highly valuable.

One of the most important aspects of building a successful enterprise wireless LAN is ensuring adequate Wi-Fi roaming performance. However, Wi-Fi roaming is a complex subject due to the many variations of Wi-Fi security found in the marketplace and the historical difficulty in being able to easily gather and analyze roaming data.

In this series I will provide an overview of Wi-Fi roaming, how it works, and provide readers with guidance on how to capture, measure, and analyze wireless roaming performance of clients within their own environments. In addition, I'll highlight a few professional tools and tricks of the trade to make this process simpler than manual analysis.

Wi-Fi Roaming Definition
Roaming, in the context of an 802.11 wireless network, is the process of a client moving an established Wi-Fi network association from one access point to another within the same Extended Service Set (ESS) without losing its connection (i.e. within a defined time interval, usually in the range of a few seconds).


It is also helpful to distinguish between different wireless connection scenarios that may occur. Delineation will provide a better understanding of how and when each scenario will occur, why variations in performance between scenarios exist, and aid in establishing performance baselines.
  • Initial Connection - The client has no previous 802.11 association to the ESS (any AP advertising the same SSID). This situation requires the client to perform all required connection and authentication steps defined in the network policy before network access is achieved. The time required for a client to perform an initial connection will be the same as wireless roaming unless fast roaming or session caching techniques are implemented. The length of time required to complete full 802.1X authentication in secure wireless environments is considerably longer than in open or pre-shared key (PSK) networks, making implementation of fast roaming techniques highly desirable. It may even be required depending on the network architecture and applications implemented (e.g. branch / remote office networks with central RADIUS across the WAN increase the time to complete EAP authentication and can render real-time voice applications unusable).

  • Wireless Roaming - The client has an established 802.11 association to an infrastructure AP and migrates its connection within the same ESS to another AP. Association to the new AP terminates the previous AP association either implicitly or explicitly (only one association is allowed at a time, per the 802.11 standard). The goal of a wireless roam is to identify an alternate AP that can provide better service to the client than the current AP.

    Wireless client roaming algorithms are typically optimized to minimize the time required to transition between APs in order to avoid network access disruptions to client applications. This can be accomplished through fast roaming or session caching techniques that eliminate some of the authentication steps. Fast roaming can only occur after an initial connection has been performed to ensure the client has successfully completed all required authentication and authorization required by the network policy.

  • Connection Termination & Re-Establishment - The client has an established 802.11 association, but the performance severely degrades to the point that the connection is rendered unacceptable. The client and/or AP is required to recognize the degraded connection, which may not be explicitly apparent, then terminate and re-establish a connection from scratch. A connection could degrade for a number of reasons, including interference, multipath (with older 802.11a/b/g clients), excessive packet error rate, out of range, roam not completed within the client's time threshold, etc.

    When analyzing client roaming events it will be necessary to determine if the client performed a wireless roam or if it terminated and re-established its network connection. A terminated connection calls for remediating the underlying issues affecting network stability, whereas wireless roaming analysis focuses on improving performance.

Additionally, identifying which situation is occurring can be incredibly valuable when performing protocol analysis and troubleshooting in order to determine what may be occurring with a client network connection when the client cannot be directly observed (e.g. remote troubleshooting).
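
The three scenarios above can be told apart mechanically once a client's connection events have been pulled out of a capture. The following is a minimal sketch, not a finished tool: the `Event` record, its `kind` values ("assoc" for a successful association or reassociation, "teardown" for a deauthentication or disassociation), and the `classify` function are all hypothetical names introduced for illustration, and it assumes the capture begins before the client first joins the ESS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    time: float   # seconds since the start of the capture
    kind: str     # hypothetical labels: "assoc" or "teardown"
    bssid: str    # AP radio the frame was exchanged with


def classify(events: List[Event]) -> List[str]:
    """Label every successful association seen for a single client."""
    labels: List[str] = []
    current: Optional[str] = None   # BSSID the client is associated to, if any
    seen_before = False             # has the client joined this ESS yet?
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "teardown":
            current = None          # the association was explicitly ended
            continue
        if not seen_before:
            labels.append("initial connection")
        elif current is not None and ev.bssid != current:
            labels.append("roam")   # moved while still associated
        else:
            labels.append("re-establishment")
        current = ev.bssid
        seen_before = True
    return labels


events = [
    Event(0.0, "assoc", "ap-1"),
    Event(10.0, "assoc", "ap-2"),
    Event(20.0, "teardown", "ap-2"),
    Event(25.0, "assoc", "ap-3"),
]
print(classify(events))
```

A real analysis would also need to treat a silent loss of connectivity (no teardown frame captured) as a termination, which this sketch does not attempt.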

Connection Control
Wi-Fi network connection establishment and roaming is decentralized, being controlled almost entirely by the client. The 802.11 standard explicitly places control of wireless connection establishment in the hands of clients by defining various logical services and breaking implementation out between clients and access points.

Think of the AP as a hotel concierge:
"Welcome to the Distribution System! Your requested Association is ready."
Some of these services require integration with external networks (e.g. the distribution system [DS] outside the basic service set [BSS]), which is not defined by the 802.11 standard but is typically an 802.3 wired Ethernet network. These services are only implemented in wireless access points, and include the association and disassociation services, among others. It is important to understand that although APs provide association services for client stations, it is the client station that invokes the association process. It may be difficult to conceptualize how client stations control connection establishment when the association service is only implemented within APs. However, remember that the 802.11 standard defines "services": the AP provides the association service, and the client invokes it.

Furthermore, the access point is responsible for association services in order to inform the broader network of the STA to AP mapping, and for data delivery between stations across the network. This mapping is also the reason why an 802.11 client station can only be associated to a single AP at a time to ensure that the network can deliver data to the correct AP.
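
To make the mapping concrete, here is a toy model of the STA-to-AP table, assuming nothing about how any real distribution system stores it. The `DistributionSystem` class and its method names are hypothetical. The point it illustrates is that a new association simply replaces the old entry, so there is only ever one AP to forward a station's traffic to.

```python
from typing import Dict, Optional


class DistributionSystem:
    """Toy model of the STA-to-AP mapping kept by the network."""

    def __init__(self) -> None:
        self._sta_to_ap: Dict[str, str] = {}

    def associate(self, sta: str, ap: str) -> Optional[str]:
        """Record that `sta` is now served by `ap`; return the AP it replaced."""
        previous = self._sta_to_ap.get(sta)
        self._sta_to_ap[sta] = ap   # overwrites: one association per station
        return previous if previous != ap else None

    def disassociate(self, sta: str) -> None:
        """Forget the station entirely."""
        self._sta_to_ap.pop(sta, None)

    def deliver_to(self, sta: str) -> Optional[str]:
        """Return the AP that frames for `sta` should be forwarded to."""
        return self._sta_to_ap.get(sta)


ds = DistributionSystem()
ds.associate("sta-1", "ap-1")
replaced = ds.associate("sta-1", "ap-2")   # the client roams
print(replaced, ds.deliver_to("sta-1"))
```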

Infrastructure Influence
Wi-Fi infrastructure vendors have developed proprietary features to influence client behavior. One example of this is the Cisco Compatible Extensions (CCX) program which includes AP assisted roaming through neighbor reports, fast roaming enhancements, RF scanning, client reporting, and roaming diagnostics. Another example is the band-steering feature provided by many vendors, which typically works by delaying probe responses to dual-band clients in order to influence them to join a 5GHz BSS instead of 2.4GHz BSS (otherwise many clients "stick" to 2.4GHz with high prejudice, although manufacturers are starting to change this preference due to the increasing prevalence of 5GHz Wi-Fi networks). Finally, the IEEE has standardized a set of radio resource enhancements with the 802.11k amendment that allows the infrastructure to send "Neighbor Reports" to the client to aid the client scanning and roaming decision. See the CWNP whitepaper on RSN Fast BSS Transition (free registration required) for more information on 802.11k and neighbor reports.
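
As a rough illustration of the band-steering idea, the sketch below withholds a limited number of 2.4GHz probe responses from clients that have already been heard on 5GHz. It is an assumption-laden simplification, not any vendor's implementation: real products typically delay rather than drop responses, and the `BandSteering` class, the band strings, and the `MAX_SUPPRESSED_PROBES` limit are all hypothetical.

```python
from typing import Dict, Set

MAX_SUPPRESSED_PROBES = 3   # hypothetical limit before giving up on steering


class BandSteering:
    def __init__(self) -> None:
        self.dual_band_clients: Set[str] = set()   # clients heard probing on 5GHz
        self.suppressed: Dict[str, int] = {}       # 2.4GHz probes ignored so far

    def should_respond(self, client: str, band: str) -> bool:
        """Decide whether to answer a probe request from `client` on `band`."""
        if band == "5GHz":
            self.dual_band_clients.add(client)
            return True
        if client not in self.dual_band_clients:
            return True    # looks like a 2.4GHz-only client; do not strand it
        count = self.suppressed.get(client, 0)
        if count >= MAX_SUPPRESSED_PROBES:
            return True    # client insists on 2.4GHz; let it join
        self.suppressed[client] = count + 1
        return False       # stay quiet and hope it picks the 5GHz BSS


steer = BandSteering()
steer.should_respond("client-a", "5GHz")
print(steer.should_respond("client-a", "2.4GHz"))
```

Note that even here the infrastructure only influences the outcome. The client still decides which BSS to join.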

Proprietary Client Implementations
Since the connection is controlled by the client station, the station typically relies on an internal algorithm developed by its manufacturer to determine when a wireless roam should occur. Client roaming algorithms are not standardized and are the proprietary intellectual property of each manufacturer. This results in highly variable client roaming performance, depending on each manufacturer's implementation approach.

However, from a high-level perspective, all client stations typically perform the same general steps when roaming, which include:
  1. Passive / Active scanning in the background to identify other APs that are within range
  2. Client roam triggers (exact algorithms are vendor proprietary, but are commonly based on signal strength thresholds, RSSI heuristics between APs, data rate shifting, retry and error rates)
  3. Active scanning to confirm the new AP is still available
  4. Roam to the new AP
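
The trigger and target-selection steps above can be sketched as a simple signal-strength rule. Since real algorithms are vendor proprietary, this is only an assumed example of the general shape: the function name, the `ROAM_TRIGGER_DBM` threshold, and the `HYSTERESIS_DB` margin are hypothetical values chosen for illustration, and real clients weigh additional inputs such as data rate and retry rates.

```python
from typing import Dict, Optional

ROAM_TRIGGER_DBM = -70   # hypothetical: consider roaming below this RSSI
HYSTERESIS_DB = 8        # hypothetical: candidate must be this much stronger


def pick_roam_target(current_bssid: str, scan: Dict[str, int]) -> Optional[str]:
    """Return the BSSID to roam to, or None to stay on the current AP.

    `scan` maps BSSID to RSSI in dBm from the most recent scan and must
    include an entry for the current AP.
    """
    current_rssi = scan[current_bssid]
    if current_rssi >= ROAM_TRIGGER_DBM:
        return None   # roam trigger not met; the current AP is good enough
    candidates = {b: r for b, r in scan.items() if b != current_bssid}
    if not candidates:
        return None   # no other AP in range
    best = max(candidates, key=candidates.get)
    if candidates[best] - current_rssi < HYSTERESIS_DB:
        return None   # not enough improvement to justify the disruption
    return best


print(pick_roam_target("ap-1", {"ap-1": -75, "ap-2": -60, "ap-3": -68}))
```

The hysteresis margin is there to stop a client from bouncing between two APs of similar strength. A client with a threshold set too low will show the "sticky" behavior often seen in captures.
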
Comparison to Cellular Networks
For comparison, consider the connection control similarities and differences between Wi-Fi roaming and cellular handover mechanisms. Cellular networks may implement a variety of handover protocols to transfer a mobile station between source and target cells, ranging from network-controlled to mobile-station-controlled depending on the standard being implemented (AMPS, CDMA, GSM, etc). Modern cellular networks typically rely on decentralized handover, similar to Wi-Fi, but define key enhancements to ensure connection reliability. Soft-handover in CDMA networks allows a mobile station to establish a connection to the target cell before breaking the connection to the source cell, thereby reducing the chance of service disruption. Standards from bodies such as 3GPP, which defines GSM and LTE networks, specify that handover triggering (section III) is defined by the network core but implemented by mobile stations (user equipment) to improve consistency and performance. Finally, rigorous and thorough testing of every mobile phone is performed by mobile network operators (MNOs) before certification is granted for activation on their networks (the GCF is one example).

Note - Wi-Fi roaming is most comparable to cellular handover. In contrast, cellular roaming refers to service acquisition outside of the subscriber's home location or network provider, and should not be confused with Wi-Fi roaming.

Wi-Fi engineers should take away a few concepts from this comparison. First, soft-handover is likely not realistic for Wi-Fi networks due to typical enterprise multi-channel architectures based on frequency division of adjacent APs (similar to GSM). Second, standardized handover triggering is within the realm of possibility, and the central definition of trigger mechanisms is feasible with modern coordinated Wi-Fi architectures (typically involving a controller, but not required). However, the need for such standardization will need to become much more apparent before action by the IEEE or Wi-Fi Alliance is considered. Perhaps the industry will begin talking about such measures as MNOs take more prominent roles within the Wi-Fi standard and certification processes due to carrier Wi-Fi adoption.

Perhaps the most important takeaway is the approach to endpoint certification implemented by mobile network operators. By taking control of endpoint certification prior to activation and use on the network, MNOs more tightly control their network ecosystem to achieve desired performance levels. Wi-Fi networks will never be able to achieve such levels of control due to the use of unlicensed spectrum. However, Wi-Fi network administrators can (and should) implement similarly rigorous client testing and verification procedures to optimize network performance.

Importance of Wi-Fi Roaming Analysis
Consider - modern wireless networks require high performance to concurrently support voice, data, and real-time video, high capacity Wi-Fi to support an influx of mobile Internet devices, and ultra-low latency performance to support vertical industry solutions such as automated warehouses, robotics, and medical instrumentation.

Wi-Fi network design and optimization is a complex undertaking, with numerous features, configuration options, and environmental variables that can make achieving a high performance network difficult. Roaming analysis provides insight into how decisions made on wireless architecture, network design, client selection, and configuration impact overall network performance.

Performing Wi-Fi roaming analysis will enable network architects and engineers to:
  1. Baseline current client roaming performance
  2. Analyze gaps between current network performance and application requirements
  3. Identify opportunities to improve and optimize performance
  4. Implement changes to infrastructure and client devices to optimize performance
  5. Take more active control to ensure network performance matches desired service levels
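
As a small illustration of the first two items, the sketch below summarizes a set of measured roam times and compares them with an application budget. The function name and the 150 ms budget used in the example are assumptions made for illustration, not figures from this article; substitute the requirement of your own application.

```python
import statistics
from typing import Dict, List


def roam_baseline(roam_times_ms: List[float], budget_ms: float) -> Dict[str, float]:
    """Summarize measured roam times against an application budget.

    Expects at least one measurement.
    """
    ordered = sorted(roam_times_ms)
    over = [t for t in ordered if t > budget_ms]
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "worst_ms": ordered[-1],
        "pct_over_budget": 100.0 * len(over) / len(ordered),
    }


# Hypothetical measurements (ms) and a hypothetical 150 ms budget.
print(roam_baseline([40, 60, 55, 320], budget_ms=150))
```

Looking at the worst case and the share of roams over budget, rather than the average alone, helps expose the occasional full re-authentication hiding among otherwise fast roams.
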
Be sure to check back in for the next article in this series which will cover the complexity brought about by security protocols and the many resulting variations of wireless roaming.

Cheers,
Andrew


Many thanks to Marcus Burton at CWNP for technical review and contribution to this post!


11 comments:

  1. Does roaming work between different vendors? Say...Cisco WLC and Meraki?

  2. Wi-Fi roaming is typically vendor agnostic, but the network security policies must be defined identically between the networks. Also, if using 802.1X user authentication, the same accounts must be available from both networks. Finally, fast roaming techniques will not work since vendors do not share PMK caching information. Session caching may work if the same RADIUS server is being used, but I have not tried that.

    Running two vendors in parallel is not recommended in general because issues can arise. However, it can be a good option during a migration or something. You should test the environment in a lab before using it in production.

    Cheers,
    Andrew

  3. As always these posts rock! Thanks!

  4. This is very nice post and Wifi is very good facility for the users and Wifi roaming is very good service.

  5. I tested Xirrus and Cisco WLC environments using the same SSID with WPA2-PSK and AES. The mobility is pretty bad. The client sticks to its initial connection even at -85. Most of the time I had to reset the connection. We redid the mobility testing using Cisco WLC and mobility was seamless. We walked across 10 APs with no issue. We understand that mobility is client specific, but the behavior is not acceptable in a multi-vendor environment.

    Prasanna!

  6. Made changes to Xirrus management platform and roaming is much better between Cisco and Xirrus.

    Prasanna~

  7. And what about 802.11f ? ....

  8. 802.11F was an optional recommendation (not an amendment) by the IEEE. It specified Inter-Access Point Protocol (IAPP) for coordinating client transfer between multi-vendor access points. It was never widely supported and the IEEE rescinded the recommendation. IAPP was also obsolete with the emergence of a common control plane between access points with a wireless controller. That control plane is ironically now moving back into the access points with distributed architectures like Aerohive Cooperative Control, and to a much less effective extent with remote AP solutions like Cisco H-REAP, Aruba RAP and Instant, and Motorola Adaptive APs.

    Cheers,
    Andrew

  9. For 802.1X enterprise roaming, when PMK caching is used, at what point does the supplicant put its ports into the authenticated state? Is it when the supplicant receives message 1 of the 4-way handshake, or only after it has received the message and generated the full key hierarchy (PTK and all)?

  10. Hi guys, I have the following problem at a customer: The client sticks to initial connection even for -85. Most of the times I had to reset the connection.

    Using Cisco WLC and H-REAP, this issue occurs in all branches.

  11. Are there any open source WLCs to download?
