Revolution Wi-Fi: February 2012

Wednesday, February 22, 2012

Graphic: Comparing Wireless Technology Range and Data Rates

This graphic helps visualize the different use-cases for various wireless technologies by comparing range and data rates.

A few notes:

LTE could be included in the same area as WiMax, and could probably be re-named "4G".
Wi-Fi will soon approach Gigabit speeds with 802.11ac and 802.11ad and would shift upward in the graphic.

Cheers,
Andrew

Tuesday, February 21, 2012

Mesh Network Performance Impact

I've been "noodling" (yes, a scientific term, I assure you) on Wi-Fi mesh networking of late. There exists this "rule-of-thumb" that is often quoted in Wi-Fi circles that goes something like this:

For every mesh hop (after the first hop), throughput degrades by 50% [or more] per hop.

Not being the type to take statements like that at face value, I dug in and did a bit of research. The statement, on its face, is very plausible. Given a single-radio mesh backhaul where all ingress and egress traffic must share the same frequency, throughput will degrade because each frame transmission must be repeated to the next hop on the same channel.

Note - I will not discuss multi-radio mesh backhaul solutions that use separate radios and frequencies for simultaneous ingress and egress transmission within the mesh network. These solutions are more mature and do not result in such dramatic throughput degradation.

This is very logical, but in my estimation, also a bit misleading.

My line of thought went something like this:

Yes, the 2nd mesh hop results in a 50% throughput degradation because each frame must be transmitted a second time across the backhaul from the 1st hop to the 2nd hop. Hence 50% loss.

But then we get to larger mesh networks, which have 3 or more hops. If a frame must be transmitted across 3 hops with all backhaul radios on the same frequency, then the total throughput should be 1/n, where 'n' represents the number of mesh hops the frame must traverse. The corresponding throughput degradation is 1 - 1/n. For 3 hops this results in 1/3 or about 33% of original throughput, a degradation of about 67%.

However, the rule-of-thumb dictates that throughput degradation in a 3 hop mesh network would be 50% to the 2nd hop and another 50% to the 3rd hop, for a net 1/4 or 25% of original throughput, resulting in 75% degradation.

This is the gap in analysis that prompted my research. What should engineers be using as planning values for single-radio backhaul mesh networks? 50% loss per hop, or 1 - 1/n loss based on the total number of hops a frame must traverse?

Both Calculations are Valid [based on your assumptions]
After researching it, I found that both throughput models are technically accurate, depending on the assumptions made during analysis and design.

Single-Radio Mesh Backhaul Bandwidth Degradation
(courtesy of Strix Systems)

The best-case scenario is 1 - 1/n degradation. This assumes the mesh hops are deployed linearly and can only hear their two neighboring mesh nodes, one upstream and one downstream.

The worst-case scenario is 1 - 1/( 2^(n-1) ) degradation, which is 50% loss per hop (our rule-of-thumb). This assumes that the mesh network is designed as more of a web of mesh nodes that can hear three, four, or more neighboring nodes for failover and self-healing capabilities.
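The two models are easy to compare side by side. Below is a minimal Python sketch that applies the best-case (1/n) and worst-case (1/2^(n-1)) formulas to a link rate; the function names and the 20 Mbps starting figure are illustrative assumptions, not values from the Strix Systems material.

```python
def best_case_fraction(hops: int) -> float:
    """Fraction of original throughput remaining under the linear 1/n model."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    return 1.0 / hops


def worst_case_fraction(hops: int) -> float:
    """Fraction remaining under the 50%-loss-per-hop model, 1 / 2^(n-1)."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    return 1.0 / (2 ** (hops - 1))


def planning_table(link_mbps: float, max_hops: int):
    """Return (hops, best-case Mbps, worst-case Mbps) rows for planning."""
    rows = []
    for n in range(1, max_hops + 1):
        rows.append(
            (n, link_mbps * best_case_fraction(n), link_mbps * worst_case_fraction(n))
        )
    return rows


if __name__ == "__main__":
    # 20 Mbps is a hypothetical single-hop throughput, used only for illustration.
    for hops, best, worst in planning_table(link_mbps=20.0, max_hops=5):
        print(f"{hops} hop(s): best {best:5.2f} Mbps, worst {worst:5.2f} Mbps")
```

Note that both models agree for 1 and 2 hops and only diverge from the 3rd hop onward, which is exactly where the gap described above appears.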

Using either method, it's clear that single-radio backhaul mesh networks result in substantial throughput degradation and should only be used when alternatives such as wired networking or traditional wireless access point deployments cannot be used.

Best-Case Mesh Throughput Degradation
(courtesy of Strix Systems)

Conclusion
Although the single-radio mesh backhaul rule-of-thumb gives the desired impression of performance degradation when mesh solutions are discussed, it should be used cautiously for explaining mesh performance issues from a conceptual point of view.

However, for real-world network planning, use either method as a starting point, and then verify through testing in your environment.

Now you know :)

Cheers,
Andrew

Friday, February 10, 2012

HP Wi-Fi Direct Printing in the Enterprise

Have you been thinking that Wi-Fi Direct will mainly be limited to consumer applications? Think again. HP just announced support for Wireless Direct Printing, which allows any Wi-Fi capable device to print directly to the printer when in proximity without connecting through the corporate network.

This solution works by leveraging the Wi-Fi Direct standard that was developed last year by the Wi-Fi Alliance and the Apple AirPrint technology that eliminates software or driver installation on Apple mobile devices. The user simply needs to connect to the Wi-Fi network that the printer advertises, then print.

Pros: Easy printing from mobile devices in the enterprise

This should help simplify support for BYOD (bring your own device) initiatives. Since BYOD typically is also designed with security restrictions around corporate network access, and printers are usually distributed throughout the network, providing access to those printers would be a management headache to say the least.

Also, mobile device printing via Apple AirPrint on a corporate network is not usable at this point due to protocol limitations that prevent printer discovery and access across layer 3 network boundaries. The ability to connect directly to the printer and print documents will allow immediate adoption of AirPrint in the enterprise.

HP Wireless Direct Printing is Easy using AirPrint
(but appears to lack any security)

Cons: Unproven security

The security issues involved with a Wi-Fi network being advertised by a printer that is directly cabled into your network are significant. Printers have historically been easy targets for attackers to gain access to corporate networks due to their lack of focus on security. Just look here! By allowing direct wireless access to the printer, enterprises risk exploitation of numerous printer vulnerabilities which could result in broad internal network access for an attacker.

HP's implementation also appears to use an open Wi-Fi network, which makes the risk even greater! The Wi-Fi Direct FAQ states the use of a separate "security domain" from the corporate wireless network. What this means is that security of the Wi-Fi Direct connection can be different (and simpler) than the security required to access the corporate network. But that doesn't require an open connection. Wi-Fi Direct supports strong WPA2 pre-shared key security and ease-of-setup using WPS. However, HP's documentation implies a wide-open wireless network.

HP Wireless Direct Printing Appears to Lack Any Security

Recommendation: Wait and see

I can't provide a solid recommendation on this technology or use in the enterprise until I learn more about HP's implementation. I have more questions than answers at this point. The prudent path for enterprises will be to wait and see what is discovered about this solution by the community over the coming weeks / months and engage your HP account team to learn more about the solution and security features.

Additionally, verify whether the printers that your organization is purchasing support this technology, what the default settings are, and what controls can be put in place to prevent use of this feature until its use is appropriately secured and approved.

Cheers,
Andrew

Tuesday, February 7, 2012

Wireless Field Day 2 Video Archives

In addition to all the videos from the Wi-Fi Mobility Symposium, check out the great videos from the subsequent Wireless Field Day 2.

The videos contain technical details on vendor solutions and are filled with answers to the questions that every engineer wants to ask the vendors but few get meaningful replies to. The delegate crew interacts with founders and technical experts at each of the vendors, making these discussions much more valuable than sales product pitches and marketing!

Thanks to Gestalt IT, Tech Field Day, and Prime Image Media for hosting the event!

Video Recordings

COMPLETE WIRELESS FIELD DAY 2 PRESENTATIONS

INDIVIDUAL WIRELESS FIELD DAY 2 VIDEOS

INDIVIDUAL WI-FI MOBILITY SYMPOSIUM VIDEOS

Cheers,
Andrew

Mac OS X Lion Creating Wi-Fi 802.1X Profiles

Mac OS X 10.7 (Lion) does not allow manual creation or configuration of 802.1X profiles for secure authentication on Wi-Fi and Ethernet networks for typical users. In order to access an 802.1X network in Lion, users are prompted to enter credentials when joining an active network that is in range, at which time it automatically detects the authentication settings that should be used.

The 802.1X tab in the System Preferences > Network > Advanced section no longer allows manual 802.1X profile creation.

Mac OS X 10.7 (Lion) 802.1X Profile Restriction

Lion forces the use of a configuration profile which must be created from Lion Server or using the iPhone Configuration Utility (iPCU). The config file is nothing more than an XML file containing the settings and usually has a .mobileconfig extension. Since Apple decided to stop selling the Xserve line a year ago, most administrators will rely on the iPCU.

This restriction can be problematic for engineers wishing to test various client configuration scenarios without a live network. Many enterprise environments support multiple EAP types on their authentication servers in order to support various client deployment scenarios. Therefore, an engineer may wish to switch between profiles on the fly to test multiple authentication types. Additionally, user-created 802.1X profiles only work under their own user context, and do not work for pre-login or system level network connections which are of great benefit in enterprise environments for remote management and control when users are away from their desks (e.g. overnight). Finally, it should be noted that the "auto-detection" capability during network join may not work accurately for EAP-TTLS since it assumes use of MSCHAPv2 inner authentication.

To create an 802.1X profile for Lion, download and install the iPCU:

Install the iPhone Configuration Utility

Once installed, launch it from the Applications/Utilities folder in Finder. Start by selecting Configuration Profiles on the left side, then click New.

Create A New Configuration Profile in the iPhone Configuration Utility

Give the profile a name, unique identifier, organization name, and description. Then move on to the Wi-Fi section. Configure the basics like SSID and Security Type, then select one or multiple EAP types supported on the WLAN in the Protocols tab.

Switch to the Authentication tab to configure the credentials that will be used. Most enterprise admins will want to leave the username blank and select "Use Per-Connection Password" when deploying configuration profiles to their users to prompt each user to enter their own unique password instead of hardcoding a username and password. If using EAP-TLS an identity certificate may be selected. Finally, if you are concerned about username exposure with tunneled authentication protocols, provide an anonymous outer identity value so hackers cannot compile a list of valid usernames on your network.

iPhone Configuration Utility Wi-Fi Authentication Parameters

Last, configure the trusted certificates and server certificate names in the trust tab. This allows administrators to define which authentication servers or naming conventions are allowed to authenticate users. This also prevents users from being prompted to trust servers at the time of authentication.

When the Wi-Fi payload and configuration profile are complete, select either Share or Export. Share allows you to send the profile via email, whereas Export allows you to export the file to your local filesystem for distribution at a later time.
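Since the exported file is just an XML property list, it can help to see roughly what the iPCU produces. The following Python sketch builds a comparable Wi-Fi profile with the standard plistlib module. The payload key names are given as I recall them from Apple's configuration profile documentation and should be checked against a real iPCU export; the identifiers, SSID, organization, and server names are hypothetical placeholders.

```python
import plistlib
import uuid


def build_wifi_profile(ssid, outer_identity, trusted_server_names, eap_types=(25,)):
    """Build a .mobileconfig-style XML plist for an 802.1X Wi-Fi network.

    eap_types uses EAP method numbers (13 = EAP-TLS, 21 = EAP-TTLS, 25 = PEAP).
    No username or password is embedded, so the user is prompted at install time.
    """
    wifi_payload = {
        "PayloadType": "com.apple.wifi.managed",
        "PayloadVersion": 1,
        "PayloadIdentifier": "com.example.profile.wifi",  # hypothetical identifier
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": f"Wi-Fi ({ssid})",
        "SSID_STR": ssid,
        "HIDDEN_NETWORK": False,
        "EncryptionType": "WPA",
        "EAPClientConfiguration": {
            "AcceptEAPTypes": list(eap_types),
            # Anonymous outer identity hides the real username in tunneled EAP.
            "OuterIdentity": outer_identity,
            # Server certificate names the client is allowed to trust.
            "TLSTrustedServerNames": list(trusted_server_names),
        },
    }
    profile = {
        "PayloadType": "Configuration",
        "PayloadVersion": 1,
        "PayloadIdentifier": "com.example.profile",  # hypothetical identifier
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": "Corporate Wi-Fi",
        "PayloadOrganization": "Example Corp",  # hypothetical organization
        "PayloadDescription": "802.1X settings for the corporate WLAN",
        "PayloadContent": [wifi_payload],
    }
    return plistlib.dumps(profile)  # XML plist bytes by default


if __name__ == "__main__":
    data = build_wifi_profile("CorpNet", "anonymous", ["radius.example.com"])
    with open("corpnet.mobileconfig", "wb") as handle:
        handle.write(data)
```

A hand-built file like this is mainly useful for inspecting or diffing settings between test profiles; for distribution to users, the iPCU export remains the safer route.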

Note - See this Apple help document for further instructions on using the iPCU.

To install the configuration profile, locate the file (.mobileconfig extension) and double-click it.

Install the iPCU Configuration Profile

You will be prompted to fill in any per-user authentication fields left blank by the administrator. The profiles can be viewed later in the System Preferences > Profiles section. This is also where you can delete previously installed profiles. The associated 802.1X profile is also visible in System Preferences > Network > Advanced > 802.1X.

802.1X Profile Successfully Installed

This method is not as easy for on-the-fly testing, but should allow administrators to accomplish all necessary tasks.

Cheers,
Andrew

Monday, February 6, 2012

Cisco WLC Now Supports PMK Caching, Finally!

I was sifting through the newly released Cisco 7.2.103.0 release notes in order to update the feature enhancements that I posted about over at the NSAShow website from the brief availability of version 7.2.101.0. Given my recent article on Wi-Fi Roaming Complexity that included a breakdown of the various types of roaming that exist, I thought it would be pertinent to point out the addition of Static PMK Caching support in the latest version of Cisco WLC code.

From the Cisco WLC 7.2.103.0 Release Notes:

Most client devices only support Static PMK Caching and not Proactive / Opportunistic Key Caching (PKC/OKC). This includes common enterprise devices including Windows 7 and ruggedized mobile devices from Motorola (to name a few).

But Cisco WLCs never supported static PMK caching, only OKC/PKC. This is something that our wireless team went back and forth with Cisco on a few years ago when we were running version 4.2 code. We were testing our Motorola mobile devices as part of our change management process to verify correct operation and performance with a configuration change from WPA-TKIP to WPA2-AES. Previously, we had been using CCKM for fast roaming, but Motorola did not have CCKM support for WPA2. In our traces we would see static PMK caching roams a large percentage of the time. Talking with our Advanced Services support rep. and reading Cisco documentation, we should NOT have been seeing this occur. The only official support within a WLC was for OKC/PKC.

After about a dozen calls with Cisco TAC, trace files being shared, and additional verification, TAC's response was that the WLC actually had enough information to re-assemble the PMKID the client was sending for each individual AP. It wasn't storing it, but was able to regenerate it from other information that was being kept on the client session. So static PMK caching was actually working, but they could not support it. The reason cited was due to memory concerns if they had to cache individual encryption keys for every client on every AP they visited, which could grow quite large. Given a large enough AP deployment and enough clients, I understand this concern.
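TAC's explanation makes sense once you look at how a PMKID is formed. In 802.11i the PMKID is the first 128 bits of HMAC-SHA1 over the string "PMK Name", the AP's MAC address, and the client's MAC address, keyed with the PMK. A short Python sketch shows how any device holding the PMK could regenerate the PMKID for a given AP; the MAC addresses and the all-zero PMK are made-up test values, and this is an illustration of the standard formula, not Cisco's actual implementation.

```python
import hashlib
import hmac


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or dash-separated) into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return raw


def pmkid(pmk: bytes, bssid: str, client_mac: str) -> bytes:
    """PMKID = first 128 bits of HMAC-SHA1(PMK, "PMK Name" || AA || SPA)."""
    data = b"PMK Name" + mac_to_bytes(bssid) + mac_to_bytes(client_mac)
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


if __name__ == "__main__":
    example_pmk = bytes(32)  # placeholder 256-bit PMK, for illustration only
    client = "00:11:22:33:44:55"
    for ap in ("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"):
        print(ap, pmkid(example_pmk, ap, client).hex())
```

Because the AP's MAC address is an input, the same PMK yields a different PMKID on every AP, so nothing beyond the PMK and the two addresses needs to be stored to recompute it.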

It was just an interesting case in something working that shouldn't have been :)

With version 7.2.103.0, it's finally nice to see official support for static PMK caching, even though it was working before. I wonder if I execute a "show pmk-cache all" command on a WLC if I'll see multiple entries per wireless client now? I'll have to test in the lab to find out!

Cheers,
Andrew

Friday, February 3, 2012

Bluetooth That Will Make You Cry

Question: What happens when you put 40 Bluetooth devices in simultaneous operation within 800 sq. feet of each other?

Answer: This...

Spectrum Analysis Capture of 40 Simultaneous Bluetooth Devices

Now think about this on your own: Can you spot the Wi-Fi going on at the same time? No, you say that you can't? Why not? (Please provide your answers in the comments section for this post)

Here's an image of the "baseline" Wi-Fi activity prior to the Bluetooth activity in the same environment. I bet you can spot the Wi-Fi in this one. There's a typical enterprise deployment with APs on channels 1, 6, and 11, plus an iperf performance measurement currently going on across channel 6.

What's Going On Here?
This is a capture of 40 Honeywell Xenon 1902 cordless Bluetooth area-imaging barcode scanners operating at the same time. These units are used in retail environments at checkout registers to provide faster scan rates, ease of mobility, and overall a faster checkout process for customers. The "area-imaging" implies reading of 2D barcodes such as QR codes and such.

Honeywell Xenon 1902 Area-Imaging Cordless Barcode Scanners

Bluetooth is becoming the default communication method for cordless barcode scanners by almost every manufacturer. I did a bit of research on it, and only 2 out of 8 manufacturers of cordless barcode scanners support an alternative to Bluetooth (one used Wi-Fi and another used narrowband at 433 or 910 MHz). But every single one of them provides a Bluetooth option, and it is typically the more prominently displayed option on their websites. I also received information from a good friend, Joel Barrett, that indicates manufacturers are all moving to Bluetooth scanners and support for other options will be phased out. So, if I wanted to choose a different option I could probably get one now, but support would be short-lived and I'd end up having to switch to Bluetooth anyways. Might as well bite the bullet now and figure out how to deploy these in my environment to co-exist [relatively] peacefully with my Wi-Fi network.

Performance Impact
These units are rated as a Class 2 Bluetooth transmitter, meaning they should have a maximum power output of 2.5mW and an estimated range of 10 meters. Sounds nice and low, and one would expect minimal impact to Wi-Fi. But the reality can be far different!

It's important to understand the different impact that Bluetooth can have in an enterprise environment than in a consumer environment. The deployment scenarios can be dramatically different, and a high concentration of Bluetooth devices in a small area directly correlates to decreased performance. Sure, the typical duty cycle of a single Bluetooth device is small. But as Bluetooth device density increases so does Wi-Fi performance impact due to increased CCA busy detection by Wi-Fi devices and increased frame corruption when Bluetooth can't avoid APs on multiple channels. Even if Bluetooth version 1.2 and later capable devices are used that implement adaptive frequency hopping, they cannot avoid interfering with Wi-Fi access points spread out across the entire 2.4GHz frequency band.

In executing Wi-Fi performance testing with these Bluetooth devices our team ran multiple scenarios, changing Bluetooth power levels, pairing status, and scan rates. What we came away with also varied dramatically based on these settings. Our baseline was an 802.11g network with 20 Mbps throughput. The environment is an open-air retail setting at the front register checkout lanes.

Clearly, despite being rated as a Class 2 Bluetooth device, the RF signal was carrying quite far. Luckily, Honeywell has done a good job providing management tools to customize the radio performance of their barcode scanners. By adjusting the power level down we were able to minimize the impacted area as well as the impact to the Wi-Fi network.

What made our situation even more challenging was the desire to deploy VoWiFi around the same time as the cordless barcode scanners in the same environment. Our preference is to use voice handsets that support 5GHz frequency bands, but that may not be possible due to other business considerations on device capabilities and application support (we're still evaluating solutions). So, we ran 2.4GHz voice tests that showed an average 20% frame loss rate when the Bluetooth scanners operated at 10% (0.25mW) and an unacceptable user experience. When the power level was reduced to 1% (0.025mW) the frame loss was much lower and no perceptible voice quality issues could be observed by end-users.
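To put those power settings in more familiar RF terms, here is a small Python sketch that converts the milliwatt values above to dBm. The range helper assumes ideal free-space propagation (received power falling with the square of distance), which is a simplification and not something we measured; the function names are my own.

```python
import math


def mw_to_dbm(milliwatts: float) -> float:
    """Convert transmit power in milliwatts to dBm."""
    if milliwatts <= 0:
        raise ValueError("power must be positive")
    return 10.0 * math.log10(milliwatts)


def free_space_range_factor(power_ratio: float) -> float:
    """Relative range for a given power ratio, assuming free-space loss.

    Received power scales with 1/d^2 in free space, so the distance at which
    a fixed signal level is reached scales with the square root of power.
    """
    if power_ratio <= 0:
        raise ValueError("power ratio must be positive")
    return math.sqrt(power_ratio)


if __name__ == "__main__":
    for label, mw in (("100%", 2.5), ("10%", 0.25), ("1%", 0.025)):
        print(f"{label:>4} = {mw} mW = {mw_to_dbm(mw):6.2f} dBm")
    print("range factor at 1% power:", free_space_range_factor(0.01))
```

Each factor-of-ten power reduction is a 10 dB drop, so going from 100% to 1% lowers output by 20 dB (roughly 4 dBm down to about -16 dBm). Under the free-space assumption that shrinks the interference radius to about one tenth, though real retail floors will behave differently.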

Ultimately, we were able to find a compromise that allowed the use of these cordless barcode scanners while minimizing impact to the Wi-Fi network.

Deployment Considerations
Here are some considerations when deploying Bluetooth in an enterprise environment:

Device Selection
Select Bluetooth devices that are configurable and easy to provision. The device should support modification of all of the settings listed below, and keep those configuration settings across reboots. If a device is factory-reset or the battery dies, it should be easy for staff in the field to re-apply the custom configuration settings with minimal training and effort.

Recommendation - Purchase "enterprise-class" Bluetooth devices that allow custom configuration.
Device Density
In general, the more Bluetooth devices operating in a confined area, the more impact to the Wi-Fi network. Pretty simple. Each individual Bluetooth device has minimal impact due to very low duty cycle (airtime used), but as more and more devices are added it linearly increases interference and decreases Wi-Fi performance.

Recommendation - Minimize Bluetooth device density as much as possible.
Power Level
The Bluetooth transmission power level, especially in dense deployments, can have a dramatic effect on the impact to a Wi-Fi network. In our testing, reducing power levels from 100% (2.5mW) down to 1% (0.025mW) significantly reduced the impact to the Wi-Fi network, and the range provided was still adequate to meet our business needs.

Recommendation - Reduce Bluetooth transmission power to the lowest setting that still allows reliable functionality for a given deployment scenario.
Bluetooth Pairing
The pairing status of a Bluetooth device can determine how actively the device transmits. A paired device usually transmits much less frequently than an unpaired one. Unpaired devices may constantly search for a base station or partner, often times transmitting very frequently in what many manufacturers call "distress mode". Honeywell also provides a configurable scan timer that adjusts how long an unpaired device will search for its partner. We adjusted this setting down to 3 cycles instead of infinite. It will also scan whenever the trigger is pulled. This minimizes interference in the worst-case scenario that the device gets unpaired.

Recommendation - Establish sound operational practices to ensure Bluetooth devices remain paired at all times. Additionally, adjust scanning timers down to a reasonable level from defaults.
Know Your Environment
Bluetooth impact will also vary based on the environmental characteristics in which it is deployed. In my situation the impact was significant because of an "open-air" environment. But that may not be the case in an office with many more walls and obstacles that prevent RF signal propagation. Also, know your Wi-Fi client device capabilities and applications. If you only use data applications like web surfing and file transfer, Bluetooth may not be a big risk. But if you use real-time applications like voice or streaming video, then it could cause usability issues.

Recommendation - Understand how Bluetooth impact will vary based on the facility characteristics and applications deployed on the Wi-Fi network.
Migrate Wi-Fi to 5GHz
If you can't mitigate the performance issues with Bluetooth or any other source of interference in the 2.4GHz spectrum, move your clients over to 5GHz. This one is easy to understand, but can be difficult to achieve in practice. Consider the influx of mobile devices that only operate using a single-radio 2.4GHz chipset. What applications will be used on those devices, and what is the implied or defined service level agreement between the network team and business teams?

Recommendation - Use band steering techniques or different WLAN configurations on the Wi-Fi network to move 5GHz capable clients over to this band.

Andrew's Take
The delivery of business capabilities will always trump non-functional technology requirements (unless your business is technology). As IT professionals we must understand and accept this. Instead of saying "no" to solutions that go against best practices, work to understand the business request and develop a solution that delivers what the business needs. This requires compromise on your part; not every solution can be 100% the best technology. Often times, the best technical solution is NOT the best business solution.

Network administrators should be aware of initiatives to use Bluetooth client devices within their environments. They do not need to block use of these devices outright, but do need to perform proper performance analysis and modify Bluetooth configuration settings to minimize impact to the Wi-Fi network.

This type of scenario also highlights why I prefer to deploy wireless access points with integrated spectrum analysis in my environment. Day-to-day operation of this network requires "always-on" visibility into non-Wi-Fi sources of interference so that I can baseline, trend, and report on network performance. It's also why I prefer a dedicated SpecAn chipset in APs, and am skeptical of 1st-generation Atheros-based solutions that cannot perform concurrent spectrum analysis while serving Wi-Fi clients. They require a dedicated RF / Spectrum mode of operation and have to be taken out of service. I hear that's changing and Atheros solutions can now provide that "always-on" capability, but have yet to see one hands-on or test vendor claims.

Cheers,
Andrew

Thursday, February 2, 2012

Wi-Fi Roaming Analysis Part 2 - Roaming Variations

In Part 1 of this series, I provided a high-level overview of Wi-Fi connection control, the importance of roaming, and what conditions are involved in triggering a client roam.

Now that we have the basics out of the way, let's discuss the large number of roaming variations that exist and the implications of that on performance analysis. Once the client determines to move its network connection to a new AP, the actual roam occurs. This is where things get complicated, because various combinations of authentication and encryption suites require different frame exchanges to complete a roam.

Wi-Fi Roaming Analysis Series:

Part 1 - Connection Control and Importance of Roaming Analysis
Part 2 - The Many Variations of Wi-Fi Roaming
Part 3 - Methods of Measuring Roam Times
Part 4 - Analysis with Wireshark and AirPcap
Part 5 - Analysis with Wildpackets Omnipeek (coming)
Part 6 - Tips for Roaming Performance Improvement (coming)

Security Brings Complexity

When Wi-Fi was young, client traffic flowed fast and easy. Clients roamed from one AP to another with nary a care in the world, albeit some inefficient client roaming algorithms did exist. But over time WEP was found increasingly vulnerable to attack and eventually full defeat. The IEEE responded, defining a very "robust" security network, indeed! But with this increased security came new restrictions. Clients had to present their identity to access the network, and the APs had to call their boss for approval (authentication server). And this took time! At first, clients didn't mind. But over time clients grew increasingly impatient, wanting to get where they were going without having to stop. "Why doesn't the AP know who I am? I come through here every day!" they would say. And they were right.

I explain Wi-Fi roaming like interstate traffic. Originally there were simple on-ramps (Open/WEP networks). As the roads required maintenance and repair, toll-booths were erected to collect a fee before use (802.1X). These first-generation tolls were "cash-only" and required every car to stop and pay, which backed up traffic. Eventually, due to increasing demands and volume, these toll-booths were replaced with electronic toll collection, which allows cars to slow down and pay without stopping (fast roaming).

As the 802.11 protocol has grown more mature, it has also grown much more complex. Introduction of more secure networks solved one problem but created another. The need for and lack of standardized fast roaming has led to proprietary vendor enhancements to fill the gap. And lack of coordination among vendors has led to multiple competing methods with fragmented support throughout the industry.

The Many Variations of Wi-Fi Roaming

* Update 2012-02-03: Original table listed 802.1X/EAP as part of CCKM, which is incorrect. The table was updated to reflect this change. *

Note - The initial GTK installation is defined as part of the 4-Way Handshake with 802.11i / WPA2. It has also been observed in the 4-Way Handshake with WPA pre-standard networks, despite not being specified as such by the Wi-Fi Alliance. The GTK exchange is mainly used to update existing group keys and is listed mainly for reference purposes.

Simple Authentication & Roaming Methods
I call the following "simple" methods because they involve simple security protocols relative to the more robust methods involving 802.1X. These methods typically allow clients to complete a roam in <50ms and are very fast. However, the trade-off is lower security, which becomes readily apparent when the network must scale beyond a small number of users, at which point encryption keys or pre-shared keys become unmanageable to provision, rotate, and maintain proper access control over.

Open Network
The client performs 802.11 open authentication (2-packet exchange) and 802.11 association (2-packet exchange), at which point data traffic is permitted. Simple, quick, and efficient! Open networks are typically found in hotspot and guest deployment scenarios and may have web authentication via a captive web portal layered on top, in which case the wireless network or other in-line network appliance will only allow DHCP and DNS prior to web login. However, from a layer 2 Wi-Fi perspective, data traffic is unencrypted and presents significant security risks.
Static WEP
When static WEP keys are used for network access control and encryption, clients perform the same steps as an open network roam, going through 802.11 open auth and 802.11 association, then encrypt data frames using the WEP algorithm. No additional authentication exchange occurs with static WEP unless shared key authentication is configured (discussed next). The use of WEP encryption is inferred by the presence of the "Data Protection" bit set in the 802.11 header as well as the absence of a WPA or RSN information element. The use of a correct WEP key is inferred from the ability to decrypt frames at the receiver and verify the ICV (integrity check value). WEP is a legacy security protocol which can be cracked very easily and offers virtually no protection. Do NOT use WEP!
Static WEP with Shared Key Authentication
When an optional shared key authentication method is configured with static WEP, the access point and client exchange an additional challenge handshake and response to confirm that the client holds the correct WEP key prior to allowing it to associate to the AP. The desire to use shared key authentication is signalled within the 802.11 authentication request and response packets in the authentication algorithm fixed parameter field. The use of shared key authentication actually reduces the security of static WEP because the same challenge text is transmitted over the air in both plaintext and encrypted form, allowing an attacker to recover the WEP key more easily. Do NOT use WEP!
WPA/WPA2 Pre-Shared Key
When WPA or WPA2 is configured with pre-shared keys, the client and AP must be configured out-of-band with the proper passphrase, which is used as the master key. The client and AP exchange the 802.11 open auth and association frames before performing a 4-Way Handshake. The handshake facilitates the exchange of random information (nonces). The passphrase, station addresses, nonces, and SSID are all used to transform the master key into a series of sub-keys, one of which is the PTK used for actual data encryption. A WPA2 PSK network is simpler in that the users only require knowledge of the passphrase, but suffers from issues of scalability and is difficult to revoke access when all users use the same passphrase. Traffic from each user is uniquely encrypted, but knowledge of the passphrase along with observation of the 4-Way Handshake can allow any user to decrypt another user's traffic. WPA2 PSK is best used in homes and SMBs where there is a small user base, which is why it is commonly referred to as WPA2-Personal. It is also commonly used with VoWiFi deployment to prevent voice call disruption due to the excessive roaming latency involved with full authentication methods listed below.
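The PSK key hierarchy described above can be sketched in a few lines of Python using only the standard library. The passphrase-to-master-key step is PBKDF2-HMAC-SHA1 with the SSID as salt, 4096 iterations, and a 256-bit output; the pairwise key expansion is the HMAC-SHA1 based PRF from 802.11i. The passphrase, SSID, MAC addresses, and nonces below are made-up illustration values, and the 48-byte output length assumes CCMP (TKIP uses 64 bytes).

```python
import hashlib
import hmac
import os


def passphrase_to_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit master key from the passphrase and SSID."""
    return hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("utf-8"), ssid.encode("utf-8"), 4096, 32
    )


def prf(key: bytes, label: bytes, data: bytes, n_bytes: int) -> bytes:
    """802.11i PRF: concatenate HMAC-SHA1 blocks over label || 0x00 || data || counter."""
    output = b""
    counter = 0
    while len(output) < n_bytes:
        message = label + b"\x00" + data + bytes([counter])
        output += hmac.new(key, message, hashlib.sha1).digest()
        counter += 1
    return output[:n_bytes]


def derive_ptk(pmk: bytes, ap_mac: bytes, client_mac: bytes,
               anonce: bytes, snonce: bytes, n_bytes: int = 48) -> bytes:
    """Expand the master key into the PTK using both MAC addresses and both nonces."""
    data = (min(ap_mac, client_mac) + max(ap_mac, client_mac)
            + min(anonce, snonce) + max(anonce, snonce))
    return prf(pmk, b"Pairwise key expansion", data, n_bytes)


if __name__ == "__main__":
    pmk = passphrase_to_pmk("correct horse battery", "ExampleSSID")  # made-up values
    ap = bytes.fromhex("aabbcc000001")
    sta = bytes.fromhex("001122334455")
    anonce, snonce = os.urandom(32), os.urandom(32)  # exchanged in the 4-Way Handshake
    ptk = derive_ptk(pmk, ap, sta, anonce, snonce)
    print("temporal key:", ptk[32:48].hex())  # last 16 bytes encrypt data with CCMP
```

Every input other than the passphrase is visible over the air during the 4-Way Handshake, which is why anyone who knows the passphrase and captures the handshake can derive another user's keys.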

Full Authentication & Roaming Methods
The following methods are what I classify as "full auth", meaning they perform a full 802.1X authentication process using a back-end AAA RADIUS server. When implemented without any optimization for fast roaming, these methods are used for both initial connection establishment as well as subsequent roaming between APs. These methods provide robust network security that is enterprise-ready, but the trade-off is much longer authentication time and roaming latency. It is typical for full authentication roams to take >600ms to complete, and can be longer depending on network architecture (e.g. authentication server is across a WAN circuit).

Dynamic WEP
The use of dynamic WEP is provided as a vendor proprietary feature by many manufacturers, and allows the use of 802.1X / EAP authentication with the WEP protocol. After successful 802.11 association, EAP authentication proceeds using any supported EAP type (with Cisco LEAP being the most common). Unicast and broadcast WEP keys are assigned to the client by the AP using two EAPoL-Key frames after successful EAP authentication, which removes the reliance on statically configured WEP keys and makes it possible to dynamically assign unique unicast keys to each client. However, dynamic WEP still relies on the same flawed WEP protocol and does not remediate its inherent issues. There is no method to signal the use of dynamic WEP within the 802.11 frame, so it relies on both the client and AP being properly configured to support this process. (The use of LEAP authentication does use a proprietary Cisco information element, but is not required for dynamic WEP). Dynamic WEP was introduced into the market in Dec. 2000 with the release of LEAP authentication by Cisco. Dynamic WEP should NOT be used!
WPA/WPA2 Full Authentication
When WPA or WPA2 is configured with AAA authentication, user or device credentials are verified using a back-end authentication server. The client and AP exchange the 802.11 auth and association frames, then proceed with EAP authentication. Many different EAP authentication protocols exist and any one of them can be used depending on the customer requirements. EAP protocols require a lengthy communication exchange between the client and authentication server, typically 8 or more round-trip frame exchanges, which creates significant delay in the roaming process. Since a client can only be associated to a single AP at a time, it must break its previously working data path prior to establishing a new data path. EAP authentication sits as a large barrier in that path that must be overcome before application data can begin to flow again. Upon successful EAP authentication the AAA server and client derive a master key, similar to what was configured out-of-band in a PSK network, except that the master key is unique to this client session. The AAA server also sends a copy of the master key to the AP (or controller) acting as the authenticator. The AP and client then perform the familiar 4-Way Handshake to transform the master key into a temporal key used for actual data encryption. WPA2 full authentication is the basis for most enterprise Wi-Fi deployments because of the strong security offered. However, it creates significant latency that can disrupt real-time applications such as voice and video. The WPA certification program was introduced in 2003 by the Wi-Fi Alliance prior to final IEEE 802.11i amendment ratification in 2004. The WPA2 certification program was subsequently released in 2004 and expanded in 2005.

Fast Roaming Techniques
The following fast roaming techniques improve upon the full authentication methods by optimizing various steps in the authentication process. A full authentication method is required to establish the initial client connection, after which a fast roaming technique can subsequently be used when roaming between APs to minimize delay. Fast roaming techniques vary in their ability to minimize delay, with the goal to complete a roam in <100ms. Voice traffic typically sends frames every 20ms and requires roaming latency under 100ms to prevent call disruption.
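To put these numbers in perspective, here is a back-of-the-envelope sketch that counts how many voice frame intervals elapse while a roam is in progress. The function name is my own, and the model is deliberately crude: it ignores jitter buffers, retransmissions, and codec loss concealment.

```python
import math


def frame_intervals_during_roam(roam_ms: float, frame_interval_ms: float = 20.0) -> int:
    """Count the whole voice frame intervals that fit inside a roam.

    A crude proxy for how many voice frames are delayed or lost while
    the client has no working data path.
    """
    return math.floor(roam_ms / frame_interval_ms)


if __name__ == "__main__":
    # Roam durations taken from the typical figures quoted in this article.
    for roam_ms in (600, 100, 50):
        frames = frame_intervals_during_roam(roam_ms)
        print(f"{roam_ms} ms roam spans about {frames} voice frame intervals")
```

At 20ms per frame, a 600ms full authentication roam spans 30 frame intervals, while a 100ms roam spans 5.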

Cisco Centralized Key Management (CCKM) (also called Fast Secure Roaming)
CCKM is a vendor-proprietary fast roaming algorithm developed by Cisco Systems, and is only supported on their access points, both autonomous and lightweight models. CCKM works by caching the encryption key derived after an initial authentication (DWEP EAPoL key exchange or WPA/WPA2 4-Way Handshake) on both the WDS Master and the wireless client. A WDS master role is assigned to a central point of coordination for all the APs in a group, and can be an Autonomous AP, WLSE, or newer wireless LAN controller. When roaming to a new AP, the client increments a re-key number and derives a new PTK using the BSSID of the new AP it wishes to roam to. The client indicates CCKM support by including a Cisco proprietary information element that includes the next re-key number within the association request frame. The new AP requests the new PTK from the WDS master and then replies to the client with the association response frame. CCKM reduces the time to complete a roam by removing the EAP authentication and 4-Way Handshake. Roam times can be <50ms in most cases.

CCKM was originally designed for use with LEAP authentication and WEP encryption, but can be used with other EAP authentication methods and encryption ciphers (TKIP or AES) as well. The use of CCKM is advertised by the presence of a vendor-specific AKMP (authentication and key management protocol) within the WPA and RSN information elements used in beacon and probe response frames. It is also indicated by a Cisco vendor-specific information element in 802.11 association request & response frames. Clients must support CCX version 2 at minimum to leverage CCKM with LEAP, version 3 for EAP-FAST, and version 4 for PEAP and EAP-TLS. CCKM was introduced into the market in 2004 with Cisco Autonomous software release 12.2(11)JA.
WPA/WPA2 EAP Session Resumption (also called Fast Reconnect)
Many EAP types used with 802.1X authentication rely on TLS security. TLS relies on a lengthy handshake negotiation to set up a secure communication path between the client and authentication server. This handshake requires a server-side certificate which usually results in authentication of the authentication server to the client. After the TLS handshake completes, the client must then authenticate itself to the server. EAP-TLS accomplishes this with a client-side certificate. Tunneled EAP types such as PEAP and EAP-TTLS use other less secure protocols such as MSCHAPv2 or EAP-GTC inside the tunnel to complete authentication without being directly exposed to an attack. Upon successful EAP authentication, the AP and client perform the familiar 4-Way Handshake to derive the PTK for data encryption.

Once a client has initially authenticated, the TLS session and resulting security context can be cached on both client and server. Upon subsequent re-authentication, the cached TLS session allows the use of a simpler and shorter handshake process. Additionally, the existence of a valid cached TLS session implies a previously successful authentication, and many EAP types allow the inner client authentication to be skipped. Overall, this typically results in a 50% reduction in frame exchange to the backend authentication server during EAP authentication. The use of session resumption is transparent to the WLAN infrastructure and appears as a normal full 802.1X authentication. Roam times typically require <300ms to complete, but may be longer depending on network architecture (e.g. authentication server is across a WAN circuit). Although a significant improvement over full 802.1X authentication, EAP session resumption is still not fast enough to support real-time applications such as voice over IP. However, it is well supported in the industry and is common on wireless networks.
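A simple way to see why halving the frame exchange matters is to multiply the number of round trips by the round-trip time to the authentication server. The sketch below assumes 8 round trips for a full EAP exchange and 4 for a resumed session (the 50% reduction mentioned above). The round-trip times are made-up values for a local server and a server across a WAN, and the model ignores server processing time and the 4-Way Handshake.

```python
def eap_exchange_delay_ms(round_trips: int, rtt_ms: float) -> float:
    """Estimate the delay contributed by EAP round trips alone."""
    return round_trips * rtt_ms


FULL_EAP_ROUND_TRIPS = 8      # "typically 8 or more" for a full exchange
RESUMED_EAP_ROUND_TRIPS = 4   # assumes the ~50% reduction from session resumption

if __name__ == "__main__":
    # Hypothetical round-trip times to the authentication server.
    for label, rtt_ms in (("local server", 5.0), ("server across WAN", 40.0)):
        full = eap_exchange_delay_ms(FULL_EAP_ROUND_TRIPS, rtt_ms)
        resumed = eap_exchange_delay_ms(RESUMED_EAP_ROUND_TRIPS, rtt_ms)
        print(f"{label}: full {full:.0f} ms, resumed {resumed:.0f} ms")
```

Even with session resumption, a distant authentication server can push the EAP exchange alone past the 100ms budget that voice requires.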
WPA2 PMK Caching (also called Static PMK Caching or Fast/Secure Roam-Back)
The client re-uses a previously cached PMK Security Association (PMKSA) from a prior full 802.1X authentication with an individual access point. The PMKSA cache can also be built by pre-authenticating through the existing AP association to the new AP. Once the client roams it will send the PMK Identifier (PMKID) of the cached PMKSA to the access point in the RSN Information Element within the Re-Association Request frame. If the AP has the same PMKID cached it will skip the 802.1X authentication and proceed directly to the 4-Way Handshake. The end-result of a PMK cache roam is functionally equivalent to an OKC and Fast BSS Transition roam; clients just cannot re-use a single cache entry across multiple APs. PMK cache roaming typically requires <100ms to complete.

PMK caching is quite well supported by infrastructure and client devices alike. Unfortunately, its usefulness is limited by the fact that a client must have a cached PMKSA with each access point and this caching is not shared between APs within the same controller or AP group. Also, many clients and APs limit the number of cached PMK entries due to memory utilization concerns. This limits how often PMK caching can be used, since a full authentication is required the first time a client associates to each AP. There is also a maximum lifetime for cached PMKSAs, after which time a full authentication is required again. PMK caching is therefore highly dependent on the traffic patterns of your clients. If they roam between the same set of APs most of the time, PMK caching could be a great benefit. If clients often roam to new APs throughout the network then PMK caching is less useful. Reference section 8.4.1.2.1 of the IEEE 802.11-2007 standard. PMKSA caching was introduced in 2004 with the ratification of the IEEE 802.11i amendment.
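The limits described above (one entry per AP, a cap on entries, and a maximum lifetime) can be pictured with a toy cache. This is only a sketch of the idea, not how any particular client or AP implements it. The class name, the default of 4 entries, the one-hour lifetime, and the oldest-first eviction policy are all my own assumptions.

```python
import time
from collections import OrderedDict


class PmksaCache:
    """Toy per-AP PMKSA cache with a size limit and a lifetime (hypothetical)."""

    def __init__(self, max_entries=4, lifetime_s=3600.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.lifetime_s = lifetime_s
        self.clock = clock
        self._entries = OrderedDict()  # bssid -> (pmkid, expires_at)

    def add(self, bssid, pmkid):
        """Store the PMKSA created by a full authentication with this AP."""
        self._entries.pop(bssid, None)
        self._entries[bssid] = (pmkid, self.clock() + self.lifetime_s)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry

    def lookup(self, bssid):
        """Return the cached PMKID, or None if a full authentication is needed."""
        entry = self._entries.get(bssid)
        if entry is None:
            return None
        pmkid, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[bssid]  # lifetime exceeded
            return None
        return pmkid
```

A `None` result from `lookup` corresponds to the cases in the text: a first visit to an AP, an evicted entry, or an expired one, each of which forces a full 802.1X authentication.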
WPA2 Proactive Key Caching (PKC) (also called Opportunistic Key Caching)
PKC builds on top of the standardized PMK caching, but extends the re-use of a single cached PMKSA across all wireless access points connected to the same WLC or AP group. PKC is not a defined standard by the IEEE, and vendor implementations may vary.

PKC works by caching the PMKSA from an initial client full authentication at a central point of coordination for multiple access points, typically a WLC. When the client roams to a new AP within the same Extended Service Set (ESS), it "proactively" calculates a new PMKID for use with the new AP based on the BSSID of the new AP. The client then sends the newly calculated PMKID to the new AP in the RSN information element of the re-association request. Depending on vendor implementation, the AP will already have a cached PMKSA or PMKID pushed to it by the WLC, or it will query the WLC for the PMKID upon receiving the re-association request. If the AP derives the same PMKID as the client, it will skip EAP authentication and proceed directly to the 4-Way Handshake to derive a new PTK for data encryption.
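The PMKID calculation itself is small enough to sketch. To my understanding, 802.11i defines the PMKID as the first 128 bits of HMAC-SHA1 keyed with the PMK over the string "PMK Name", the AP's MAC address (BSSID), and the client's MAC address. The helper names, the all-zero-style PMK, and the MAC addresses below are made up for illustration.

```python
import hashlib
import hmac


def mac_to_bytes(mac: str) -> bytes:
    """Convert a MAC address like '00:11:22:33:44:55' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def pmkid(pmk: bytes, ap_mac: str, client_mac: str) -> bytes:
    """PMKID = first 128 bits of HMAC-SHA1(PMK, "PMK Name" || AP MAC || client MAC)."""
    data = b"PMK Name" + mac_to_bytes(ap_mac) + mac_to_bytes(client_mac)
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


if __name__ == "__main__":
    example_pmk = bytes(range(32))          # placeholder 256-bit PMK
    client = "02:00:00:00:00:01"            # hypothetical client MAC
    for bssid in ("02:00:00:aa:00:01", "02:00:00:aa:00:02"):
        print(bssid, pmkid(example_pmk, bssid, client).hex())
```

The same cached PMK yields a different PMKID for each BSSID, which is why the client can compute a fresh identifier for an AP it has never authenticated to.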

Essentially, the client can re-use the cached PMKSA, but calculates a new PMKID for use with every AP without needing to perform a full 802.1X authentication. PKC roaming performs similarly to both static PMK caching and PSK roaming, requiring <100ms to complete. However, support for PKC is highly variable within the industry, and despite favorable initial adoption by client manufacturers, support has been declining. PKC was introduced by Airespace, Funk Software, and Atheros in 2004, shortly after the ratification of the IEEE 802.11i amendment for robust security networks.
WPA2 Fast BSS Transition (FT)
Given the limitations of static PMK caching, and limited support for proprietary OKC and CCKM fast roaming techniques, the IEEE standardized fast roaming across an ESS with the 802.11r amendment, which was ratified in 2008. An AP advertises support for FT in a new Mobility Domain Information Element (MDIE) in beacons, probe responses, and (re)association responses. The client must also indicate support in an MDIE included in authentication and (re)association requests.

Fast Transition works by having the authenticator (typically a WLC) complete an initial successful 802.1X client authentication and derive a PMK-R0 for the client. The PMK-R0 is used to derive a unique PMK-R1 for each AP within the mobility domain. The authenticator then distributes the keys to other APs using a secure channel (which is not defined by the IEEE 11r amendment). During the initial authentication, the client performs full 802.1X authentication, completes the 4-Way Handshake to derive a PTKSA with the AP (using PMK-R1 key material), and then is allowed access to the network. However, upon roaming the 802.1X authentication and 4-Way Handshake steps may be skipped if a valid PMK-R1 for the new AP is presented by the client in the (re)authentication and (re)association request frames. Therefore, Fast Transition allows roaming faster than static PMK caching and OKC, and on-par with CCKM roaming, typically <50ms.

This is an extremely over-simplified explanation, but it will suffice to understand how Fast Transition works. It is important to note that 11r also allows FT over the distribution system, through the current AP to the new AP, similar to 802.11i pre-authentication. However, I will not cover that topic in this article.

* Notice that many of these fast roaming techniques are restricted to WPA2 only.

Layer 2 versus Layer 3 Roaming
Layer 2 roaming occurs when a client roams from one AP to another AP which both attach to the same client subnet or VLAN.

Layer 3 roaming occurs when a client roams from one AP to another AP which does not attach to the same client subnet or VLAN. If a client is required to acquire a new IP address, existing application connections break, which has an adverse effect on network usability. Existing client sessions will either hang or eventually time out and disconnect.
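The distinction boils down to a subnet comparison, which can be sketched with Python's standard `ipaddress` module. The function names and the example subnets are hypothetical, and the sketch ignores any tunneling done by the infrastructure to hide the subnet change.

```python
import ipaddress


def roam_type(old_ap_subnet: str, new_ap_subnet: str) -> str:
    """Classify a roam by comparing the client subnets served by each AP."""
    old_net = ipaddress.ip_network(old_ap_subnet)
    new_net = ipaddress.ip_network(new_ap_subnet)
    return "layer 2" if old_net == new_net else "layer 3"


def must_renew_ip(client_ip: str, new_ap_subnet: str) -> bool:
    """True if the client's current address is not valid on the new AP's subnet."""
    return ipaddress.ip_address(client_ip) not in ipaddress.ip_network(new_ap_subnet)


if __name__ == "__main__":
    # Example addressing is made up for illustration.
    print(roam_type("10.1.10.0/24", "10.1.10.0/24"))
    print(roam_type("10.1.10.0/24", "10.1.20.0/24"))
    print(must_renew_ip("10.1.10.57", "10.1.20.0/24"))
```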

Wireless controller architectures help eliminate the need for layer 3 roaming by tunneling client data traffic from APs back to the controller as the logical client network attachment point. This way, APs can be spread across a larger physical or logical network environment without impacting clients. However, this approach only scales so far before APs attach to different controllers with different client attachment points, or unique requirements dictate traffic forwarding to an altogether different network segment (e.g. guest termination in a secured DMZ).

All enterprise Wi-Fi vendors implement layer 3 roaming transparency to clients in order to eliminate the need for a client to acquire a new IP address. This is typically accomplished through coordination between APs or controllers within a logical group to tunnel existing client traffic back to a point within the network that can serve the original client subnet or VLAN. Examples include Cisco's concept of a wireless controller "Mobility Group", Aerohive's concept of a "Hive", etc.

Revolution or Evolution? - Andrew's Take
Roaming is easily one of the most convoluted processes within the Wi-Fi industry. The complexity involved between security requirements, standard and proprietary roaming methods, combined with fragmented infrastructure and client manufacturer support is staggering. Network administrators cannot predict how roaming will perform without observing live clients and analyzing the results. As networks grow and administrators increasingly lose control of devices attaching to the network, it becomes an almost impossible task to ensure adequate performance for every client.

Support for standardized fast roaming is long overdue! As I have written before, it's time for both infrastructure and client vendors to adopt 802.11r Fast BSS Transition. Real-time traffic flows require better performance than can currently be achieved. Proprietary fast roaming techniques such as CCKM are available, but support is tough to come by for customers, and it is even harder to push vendors toward adoption. It's time to stop the marketing spin around WLAN controllers and fast roaming, as Marcus can attest. We have a solution, now we need adoption!

I'm from Nebraska, and as Larry the Cable Guy would say: "Git-R-Done!"

Cheers,
Andrew

Further Reading
For further reading on fast roaming techniques related to CCKM, PMK caching, OKC/PKC, and 802.11r, see the Cisco Aironet Configuration Guide, Cisco Voice over Wireless 4.1 Design Guide, CWNP RSN Fast BSS Transition (free registration required), and the IEEE 802.11r amendment.

For an example packet flow analysis, review my post on PEAP authentication which details the frame by frame exchange for two types of roams that are most common, a full EAP authentication and EAP session resumption.
