Friday, May 31, 2013

Apple iOS Fast Roaming with Aerohive Wi-Fi APs

Well folks, after what seems like an eternity, true standards-based Wi-Fi fast roaming is really here! I blogged back in December that Apple iOS version 6.01 added support for fast roaming with 802.11r and 802.11k. And WLAN infrastructure vendors have added support as well, with Aerohive 6.0 and Cisco 7.2 code releases.

Recently, I had the opportunity to test this functionality out on an iPhone 5 and an iPad mini with an Aerohive WLAN. I'd like to share my results with you... and I can tell you that you won't be disappointed! How do 8.5ms roams sound?!


Apple iPad Fast Roaming (1) and Aerohive AP Neighbor Report (2)
As you can see, the iPad completes the roam in 8.5ms, the time it takes to complete the 802.11 authentication and reassociation; no full 802.1X authentication, RADIUS TLS session resumption, or even 4-way handshake is required! This is the result of support for the Wi-Fi Alliance Voice-Enterprise certification on both the WLAN and client. In the tests that were captured, the WLAN was configured for WPA2-Enterprise with 802.1X authentication and dynamic keying. The initial client association resulted in a full 802.1X authentication with the RADIUS server, followed by fast roams as shown above.
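For readers who want to reproduce this kind of measurement from their own capture, here is a minimal Python sketch of the arithmetic: the roam time is taken as the delta between the client's 802.11 Authentication frame and the Reassociation Response from the new AP. The `Frame` record, the `auth_to_reassoc_ms` helper, and the timestamps are all hypothetical, chosen only to illustrate the calculation; they are not values exported from the capture shown above.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t: float    # capture timestamp in seconds
    kind: str   # simplified frame type: "auth", "reassoc_req", "reassoc_resp"
    src: str    # simplified sender: "client" or "ap"


def auth_to_reassoc_ms(frames):
    """Milliseconds from the client's first Authentication frame to the
    first Reassociation Response that follows it."""
    start = next(f.t for f in frames if f.kind == "auth" and f.src == "client")
    end = next(f.t for f in frames if f.kind == "reassoc_resp" and f.t >= start)
    return (end - start) * 1000.0


# Hypothetical timestamps (assumption, not real capture data)
frames = [
    Frame(10.0000, "auth", "client"),
    Frame(10.0021, "auth", "ap"),
    Frame(10.0050, "reassoc_req", "client"),
    Frame(10.0085, "reassoc_resp", "ap"),
]

print(f"roam time: {auth_to_reassoc_ms(frames):.1f} ms")
```

In practice the timestamps would come from a packet capture filtered to the roaming client's MAC address.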

Roaming with 802.11r (Fast BSS Transition) is noticeably faster than other proprietary fast-roaming methods (OKC/PKC) and it's also faster than roaming on a Pre-Shared Key (PSK) WLAN. This is because the 4-Way Handshake exchange can be eliminated by embedding the key derivation material (ANonce, SNonce, MIC, and GTK) within the Fast Transition Information Element inside the 802.11 Authentication and Reassociation frames. There is also a Mobility Domain IE that comes into play to distinguish boundaries between different WLANs (since key material must be exchanged between APs on the backend, two separate WLANs cannot facilitate fast roaming).

Here's a look at the Fast Transition IE inside the Reassociation Response (frame 18) from the AP to the iPad:

802.11r Fast Transition Information Element

You may want to review my previous post on The Many Variations of Wi-Fi Roaming to compare the frame exchanges required with each roaming method, CWNP's whitepaper on Fast BSS Transition [PDF] and blogs (here and here) to understand the key hierarchy and exchange between the initial AP authenticator (PMK-R0) and subsequent APs (PMK-R1).
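To make the PMK-R0 and PMK-R1 relationship a little more concrete, here is a rough Python sketch of the two-level derivation as I understand it from the 802.11r key hierarchy: a SHA-256 based KDF produces PMK-R0 from the key established at initial authentication, and each AP's PMK-R1 is then derived from PMK-R0 plus that AP's identifier. Treat it as an illustration rather than a verified implementation; the function names are my own, every input value is a made-up placeholder, and the output has not been checked against official test vectors.

```python
import hashlib
import hmac
import struct


def kdf_sha256(key: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """Counter-mode KDF over HMAC-SHA256: each block hashes
    counter || label || context || length (16-bit little-endian integers)."""
    out = b""
    iterations = -(-length_bits // 256)  # ceiling division
    for i in range(1, iterations + 1):
        msg = struct.pack("<H", i) + label + context + struct.pack("<H", length_bits)
        out += hmac.new(key, msg, hashlib.sha256).digest()
    return out[: length_bits // 8]


def derive_pmk_r0(xxkey, ssid, mdid, r0kh_id, s0kh_id):
    """First-level key, held by the authenticator of the initial association."""
    context = bytes([len(ssid)]) + ssid + mdid + bytes([len(r0kh_id)]) + r0kh_id + s0kh_id
    key_data = kdf_sha256(xxkey, b"FT-R0", context, 384)
    return key_data[:32]  # the remaining 16 bytes serve as a salt for the key name


def derive_pmk_r1(pmk_r0, r1kh_id, s1kh_id):
    """Second-level key, one per target AP (R1 key holder)."""
    return kdf_sha256(pmk_r0, b"FT-R1", r1kh_id + s1kh_id, 256)


# All values below are hypothetical placeholders (assumptions)
xxkey = bytes(range(32))                      # stands in for the 802.1X or PSK derived key
client_mac = bytes.fromhex("020000000001")    # made-up client address
pmk_r0 = derive_pmk_r0(xxkey, b"voice", bytes.fromhex("a1b2"), b"r0kh.example", client_mac)
pmk_r1_ap1 = derive_pmk_r1(pmk_r0, bytes.fromhex("020000000a01"), client_mac)
pmk_r1_ap2 = derive_pmk_r1(pmk_r0, bytes.fromhex("020000000a02"), client_mac)

print(pmk_r1_ap1.hex() != pmk_r1_ap2.hex())   # each AP ends up with its own PMK-R1
```

The point of the sketch is the structure: a roam to a new AP only needs a new PMK-R1, which can be derived without going back to the RADIUS server.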

Immediately after the fast roam completes, the Apple iPad submits a Neighbor Report Request within a Management Action frame. In essence, the client is requesting a list of all the neighbors from the AP in order to build a list for future roaming events. This report can be requested on-demand by the client and can help improve roam times by reducing or eliminating the need for the client station to actively scan off-channel. This way, the client has a list of nearby APs that is always up-to-date and can quickly move to another channel where it knows another AP is waiting.

Here is a look inside the Neighbor Report sent back to the iPad from the Aerohive AP (frame 21):

802.11k Neighbor Report
Unfortunately, Wireshark does not yet have a protocol dissector for 802.11k neighbor reports, so manual decoding must be performed. You can see that Category Code 5 (Radio Measurement) is used. Inside the tagged parameters lie the neighbor report details, which contain an element for each neighboring AP in the same WLAN along with details about the AP such as its BSSID and channel number, which I have highlighted above. In this case, there is one neighboring AP with BSSID "08:ea:44:78:14:28" operating on channel 161 (0xA1 in hexadecimal). Other information in the report includes the AP's reachability, security policy (similar or different), and capabilities for spectrum management, quality of service, power save, block acknowledgements, and PHY type (802.11a/b/g/n).
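Since the decoding has to be done by hand, a few lines of Python can do the same job. The sketch below assumes the frame body layout of a Neighbor Report Response (category 5, action 5, dialog token, then one Neighbor Report element with ID 52 per neighbor) and the fixed fields of that element: BSSID, a 4-byte BSSID Information bitfield, operating class, channel number, and PHY type. The sample bytes reuse the BSSID and channel from the capture above, but the dialog token, BSSID Information value, operating class, and PHY type are placeholders I made up for illustration.

```python
import struct


def parse_neighbor_report_response(body: bytes):
    """Parse an action frame body; return (dialog_token, list of neighbor dicts)."""
    category, action, token = body[0], body[1], body[2]
    if category != 5 or action != 5:
        raise ValueError("not a Neighbor Report Response")
    neighbors, pos = [], 3
    while pos + 2 <= len(body):
        eid, elen = body[pos], body[pos + 1]
        data = body[pos + 2 : pos + 2 + elen]
        pos += 2 + elen
        if eid != 52 or len(data) < 13:
            continue  # skip anything that is not a complete Neighbor Report element
        (info,) = struct.unpack_from("<I", data, 6)  # BSSID Information, little-endian
        neighbors.append({
            "bssid": ":".join(f"{b:02x}" for b in data[0:6]),
            "reachability": info & 0x3,
            "same_security": bool(info & (1 << 2)),
            "spectrum_mgmt": bool(info & (1 << 4)),
            "qos": bool(info & (1 << 5)),
            "apsd": bool(info & (1 << 6)),
            "mobility_domain": bool(info & (1 << 10)),
            "high_throughput": bool(info & (1 << 11)),
            "operating_class": data[10],
            "channel": data[11],
            "phy_type": data[12],
        })
    return token, neighbors


# BSSID and channel are from the post; every other field is a hypothetical value
bssid = bytes.fromhex("08ea44781428")
info = 0x3 | (1 << 2) | (1 << 5) | (1 << 10) | (1 << 11)
element = bytes([52, 13]) + bssid + struct.pack("<I", info) + bytes([5, 0xA1, 7])
body = bytes([5, 5, 1]) + element

token, neighbors = parse_neighbor_report_response(body)
for n in neighbors:
    print(n["bssid"], "channel", n["channel"])
```

Optional subelements that may follow the fixed fields are ignored here, which is enough to recover the BSSID and channel that the client needs for its roaming candidate list.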

There are three IEEE 802.11 amendments that come into play which are all bundled up in the Wi-Fi Alliance Voice-Enterprise certification.

Standards and Certification Recap
The core of fast roaming was drafted in the IEEE 802.11r amendment, defining "Fast BSS Transition" or just Fast Transition (FT) for short. The name comes from the fact that every individual AP radio cell is defined as a "Basic Service Set (BSS)" in the standard, and the amendment defines a method for client stations to transition (also called roaming) very fast between AP radios. It accomplishes this by defining a Mobility Domain composed of a set of BSSs (APs) within the same Extended Service Set (ESS, otherwise known as an SSID) which have been validated. Validated APs must coordinate with each other to exchange client station details, including pairwise master key (PMK) encryption material, and perform pre-authentication of the client prior to the roam. This speeds the client roam by eliminating the need to re-authenticate the client through 802.1X/RADIUS or to perform the 4-Way Handshake to derive pairwise transient key (PTK) encryption key material, even in the case of a simple PSK network. The 802.11r amendment was ratified in 2008.

The IEEE 802.11k amendment on "Radio Resource Measurement" defines methods for information exchange about the RF environment between APs and client stations. The goal is to enable the client stations to understand the radio environment in which they exist so that they have more information to make correct decisions about roaming and performance. Stations can take radio measurements locally, request measurement by other stations, or have measurement requested of them and return the results. One interesting aspect for fast roaming is the Neighbor Report, where a client can request an AP to measure and report the neighboring APs which are available within the same Mobility Domain, including several pieces of operational information about each neighbor such as: BSSID, channel, security policy, and capabilities for QoS, APSD (power-save), BlockAck, spectrum management, and PHY type (802.11a/b/g/n). Some other reports available with 802.11k include: channel load, noise histogram, location configuration information, link measurement, and traffic stream measurements. The 802.11k amendment was ratified in 2008 as well.

The IEEE 802.11v amendment on "Wireless Network Management (WNM)" defines methods for stations to exchange information for the purpose of improving overall performance of the wireless network. Where 802.11k is concerned only with the radio environment, 802.11v expands it to include broader operational data surrounding existing network conditions, allowing stations to be more cognizant of the topology and state of the network. There is a multitude of WNM services; the most interesting (for me, at least) is the BSS Transition Management capability, whereby an AP can request a client to roam to another AP for better performance or capacity. Some other services include: co-located interference, diagnostic reporting, directed multicast services, location services, multiple BSSID capability, proxy ARP, QoS traffic capability, and traffic filtering service, to name only a few. The 802.11v amendment was ratified in 2011.

Each of these amendments defines numerous capabilities, and I will only scratch the surface in this post to highlight a few. If you are interested in learning more about the services defined in each of the amendments, visit the IEEE Get Program website to download the 802.11-2012 standard, or search the web for PDF versions of each amendment.

Aerohive WLAN Configuration
Prior to being able to test and execute a fast transition (FT) roam, you need to configure the WLAN infrastructure to support the 11r/k/v features. In Aerohive HiveManager, navigate into the Configuration section and edit the SSID on which FT roaming should be supported. In the Advanced section of the SSID configuration you will see two sections, one for WMM and one for Voice Enterprise.

Aerohive Voice-Enterprise Configuration (IEEE 802.11r, k, v)

Upon checking the first check box for Voice Enterprise, you will be presented with the following notice, informing you that Voice-Enterprise requires 802.11rkv and WMM AC-Voice which will all be enabled automatically.


You may have also noticed the note which states 802.11r requires WPA2 key management. This is because 802.11r advertises FT support in part through the Authentication and Key Management (AKM) suites in the Robust Security Network (RSN) Information Element, which was introduced in the 802.11i amendment and the WPA2 certification program. Pre-standard WPA did not include the RSN IE and therefore cannot support fast transition. So make sure you're using WPA2 (with either 802.1X or PSK) on the SSID as well.
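To see what that advertisement looks like on the wire, here is a small Python sketch that walks an RSN IE body and lists its AKM suites. It assumes the standard RSN layout (version, group cipher suite, pairwise cipher list, AKM list) and the 00-0F-AC suite types 1 (802.1X), 2 (PSK), 3 (FT over 802.1X), and 4 (FT over PSK). The sample bytes are a hypothetical IE built by hand, not one copied from the Aerohive capture, and the helper names are my own.

```python
import struct

RSN_OUI = bytes.fromhex("000fac")
AKM_NAMES = {1: "802.1X", 2: "PSK", 3: "FT over 802.1X", 4: "FT over PSK"}
FT_AKM_TYPES = {3, 4}


def rsn_akm_suites(rsn_body: bytes):
    """Return a list of (oui, suite_type) tuples from an RSN IE body
    (the bytes after the element ID and length)."""
    pos = 2 + 4                                   # skip version and group cipher suite
    (n_pairwise,) = struct.unpack_from("<H", rsn_body, pos)
    pos += 2 + 4 * n_pairwise                     # skip the pairwise cipher list
    (n_akm,) = struct.unpack_from("<H", rsn_body, pos)
    pos += 2
    suites = []
    for i in range(n_akm):
        suite = rsn_body[pos + 4 * i : pos + 4 * i + 4]
        suites.append((suite[:3], suite[3]))
    return suites


def advertises_ft(rsn_body: bytes) -> bool:
    """True if any AKM suite is one of the Fast Transition types."""
    return any(oui == RSN_OUI and t in FT_AKM_TYPES for oui, t in rsn_akm_suites(rsn_body))


# Hypothetical RSN IE body: CCMP group and pairwise ciphers, two AKM suites
rsn_body = (
    struct.pack("<H", 1)                          # version 1
    + RSN_OUI + bytes([4])                        # group cipher: CCMP
    + struct.pack("<H", 1) + RSN_OUI + bytes([4]) # one pairwise cipher: CCMP
    + struct.pack("<H", 2)                        # two AKM suites follow
    + RSN_OUI + bytes([1])                        # 802.1X
    + RSN_OUI + bytes([3])                        # FT over 802.1X
    + struct.pack("<H", 0)                        # RSN capabilities
)

for oui, t in rsn_akm_suites(rsn_body):
    print(AKM_NAMES.get(t, f"unknown ({t})"))
print("FT advertised:", advertises_ft(rsn_body))
```

An SSID with 802.11r enabled carries one of the FT suite types in this list, which is the extra AKM entry that some older client drivers fail to parse.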

Save and upload this configuration to at least two APs, which will then begin including the Mobility Domain IE, Fast Transition IE, and Radio Measurement capabilities in beacons and probe responses to advertise these capabilities to clients.
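A quick way to confirm the APs are advertising these capabilities is to walk the tagged parameters of a captured beacon or probe response. The Python sketch below assumes the usual ID/length/value element encoding, element ID 54 for the Mobility Domain IE (a 2-byte Mobility Domain ID followed by a capability byte whose low bit signals FT over the DS), and element ID 70 for the RM Enabled Capabilities IE (where bit 1 of the first byte indicates Neighbor Report support). The sample bytes, including the SSID and the Mobility Domain ID, are hypothetical.

```python
def iter_elements(tagged: bytes):
    """Yield (element_id, value) pairs from a block of tagged parameters."""
    pos = 0
    while pos + 2 <= len(tagged):
        eid, elen = tagged[pos], tagged[pos + 1]
        yield eid, tagged[pos + 2 : pos + 2 + elen]
        pos += 2 + elen


def voice_enterprise_hints(tagged: bytes) -> dict:
    """Summarize the 802.11r/k related elements found in the tagged parameters."""
    found = {
        "mobility_domain_id": None,
        "ft_over_ds": None,
        "rm_enabled": False,
        "neighbor_report": False,
    }
    for eid, data in iter_elements(tagged):
        if eid == 54 and len(data) >= 3:          # Mobility Domain IE
            found["mobility_domain_id"] = data[0:2].hex()
            found["ft_over_ds"] = bool(data[2] & 0x01)
        elif eid == 70 and len(data) >= 1:        # RM Enabled Capabilities IE
            found["rm_enabled"] = True
            found["neighbor_report"] = bool(data[0] & 0x02)
    return found


# Hypothetical tagged parameters: SSID, Mobility Domain IE, RM Enabled Capabilities IE
tagged = (
    bytes([0, 5]) + b"voice"
    + bytes([54, 3, 0xA1, 0xB2, 0x00])
    + bytes([70, 5, 0x02, 0x00, 0x00, 0x00, 0x00])
)

print(voice_enterprise_hints(tagged))
```

Two APs belong to the same Mobility Domain only if they report the same Mobility Domain ID, so comparing this value across APs is a simple sanity check after uploading the configuration.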

Note - no explicit configuration is required to enable Voice-Enterprise on Apple iOS devices. Simply run iOS 6.01 or later and join a Voice-Enterprise enabled WLAN.

Client Limitations
The RSN IE, which advertises the encryption ciphers and authentication and key management (AKM) methods in use on the WLAN to clients, now includes a new AKM type to advertise Fast Transition key management. Some existing client drivers have issues parsing an RSN IE with the additional AKM and will fail to associate to the WLAN - in fact, they won't even try. Until manufacturers update client drivers to support this additional AKM type, those clients will be unable to join any SSID that has Voice-Enterprise (specifically 802.11r) enabled.

Therefore, it is recommended to create a separate SSID specifically for Fast Transition capable clients and migrate them over to the new SSID.

Final Thoughts
Wi-Fi roaming performance has been a painful sore spot for the industry for many years. Problems were initially obscured through the use of open or WEP encrypted networks where roaming was relatively fast due to the simple security models implemented. However, as security improved with 802.11i and WPA2-Enterprise, roaming performance became a glaring issue, often taking 500ms or more! This impacted the usability of real-time applications on an enterprise WLAN, forcing many network administrators to rely on less-secure PSK security methods.

Some vendors responded with proprietary fast roaming methods such as CCKM, OKC, and PKC. However, this served to fragment the industry and support for these methods was spotty at best. The IEEE thankfully stepped in and ratified the 802.11r amendment in 2008, yet it has taken nearly 5 years since then for enough momentum to build to finally implement standards-based testing and certification of fast roaming through the Wi-Fi Alliance Voice-Enterprise certification program.

However, now that standards-based fast roaming is here, IT IS GLORIOUS! I applaud Apple for being an advocate for fast roaming and implementing it into their iOS platform, likely because their devices get blamed for poor performance all the time. I encourage other mobile device manufacturers to follow suit, especially if their devices are used with real-time voice or video applications.

Cheers,
Andrew

15 comments:

  1. Could you do some comparison without 802.11r activated (e.g. with a prior firmware version)?
    I mean, the roaming time of 8.5ms you're pointing out is actually not a roaming time at all. It says absolutely nothing about a whole roaming process.

    A roaming process is the time-delta of the last acknowledged packet on the old channel to the first acknowledged packet on the new channel/ap - somewhere defined in the never approved ieee 802.11t standard.
    So this means, it is the whole time where no (ack'ed) data packets with a payload are traveling over the air.
    The sequence you are showing is just a small part of this process, how fast an AP can handle a reassociation + the (re-) keying of a station.

    Replies
    1. I dug up a previous packet capture to do a comparison. Previously I've measured roam times around 489ms using the same method, #3 as described below.

      There is no standard method of measuring roam times, as I have previously described three different methods in this post:
      Wi-Fi Roaming Analysis Part 3 - Measuring Roam Times. Some people prefer to use the Data-to-Data frames as you obviously do. Personally I don't because that can be highly variable based on the application behaviour on the device itself, and across different applications, which can inflate roam times by adding in portions of time that have nothing to do with actually moving the Wi-Fi association between APs. In order to be confident in measuring Data-to-Data frames you need to intimately understand the application behaviour and ensure consistent measurement when that application is being used. I see this method mainly used when a specific VoIP client or custom developed application is used. But the method you decide to use is really personal preference and as long as you use one method and stick to it for consistency you should be okay.

      For this specific roam event detailed in the post, the roam time using each of the 3 methods was the following:

      1) Data to Data frames = 6.000 seconds flat (but no real-time application was being used at the time and regular application traffic could not be assumed.) A subsequent fast roam on the same iPad yielded this time as only 15.3ms, highlighting the variability based on application behaviour!

      2) Probe Request to Reassociation Response = 69.5ms. This is interesting because the iPad had not yet built the neighbor AP list, showing that active scanning takes time to complete. Comparing this with subsequent roams, which were well under 16ms once the neighbor list was built, shows the benefit of 11k neighbor reports!

      3) 802.11 Auth to Reassociation Response = 8.5ms. This is the most consistent method of measuring roams in my opinion, and I use it for most client devices unless it's a single-function device where the application behaviour is consistent - such as VoIP phones.

      Cheers,
      Andrew

    2. Thanks for the detailed answer - I appreciate this very much.

      First, I have to agree with you, of course there is no standard method for the measurement of roaming times. But IEEE-802.11T describes a proper way to do this IMHO. But nevertheless you can do it the way you do, but keep in mind you are judging a whole roaming process by just a small piece of it - which is a bit misleading. Important things you leave aside for example: the HF-channel changeover, assuming the APs are operating on different radio frequencies/channels.

      In research I did myself (different devices from another manufacturer without 11r/11k, because the devices don't support this ATM), the period you're showing is taking up just a very minor part of the total roaming time. Compared to some of my measurements I got an optimized total time of under 2ms from auth-request to re-ass.-response. A following 4-way handshake for a WPA2 authentication was typically around 30 ms.
      So this concludes the 8.5ms aren't so impressive for me, as for you. Impressive is just the fact that the key exchange is now included in this short time period.

      You're absolutely right about the used application: each application, generating the data packets traveling across the air to measure, has its own behaviour. Some people, like me, prefer iperf for that.

      But the primary point which is annoying me: users don't really care about a particular part of a roaming event. They want their application to work + fast roaming. People or even other bloggers could be misled by this post and expect a superior absolute roaming time of 8.5ms with their iPad 3, which in fact isn't true and not pointed out adequately.

      As a previous commenter asked: could you please provide those traces? Especially a trace of the subsequent 15.3ms iPad test, including a statement on the network load/utilization, would be interesting for me.
      And if you put the application into the play this would also mean considering the environment and so on. So there is a huge lack of detailed technical background information on the driven test in this post, resulting in some - no offense! - catchy and sensational roam times IMHO ;)

    3. Hi Jan,
      My comment is too long… so I'm going to break it up over two comments :)

      I think you are focusing on the wrong aspect of the method used to measure the roam time. The 11r amendment highlights the improved roaming time no matter how you measure it.

      The biggest benefit that I see is that you no longer need to make a compromise between high security with 802.1X authentication and dynamic keying (which then takes much longer to roam) and much lower security with PSK (which is then faster to roam). The ability to use 802.1X authentication while still enabling fast roaming is the key benefit. Now there is no more compromise between the two. You can have the highest security with the best roaming performance as well.

      Also, there is no way to accurately measure the channel changeover and using Data frames adds in way too much other time that is not attributable to that. And when you say you received a roam time of 30ms for auth/assoc/4-way handshake, that implies to me that you were likely using a lower-security PSK network. This is not where the big benefit comes into play. You would see much higher roam times if you were using an 802.1X authentication network, even with TLS session resumption enabled on both client and server.

      I fundamentally disagree with you that measuring roams this way is incorrect. I've been using this method successfully for over a decade to compare performance across device types, software versions, etc. with great results. The key is that you establish a baseline and measure the roams consistently every time so you're comparing historical results with new results accurately.

      I will use the data frame measurement method when I know the specific application and its behavior. And because I understand its behavior, I can accurately calculate the roam time by subtracting the packetization delay of the application (time between consecutive packets). For example, if I'm measuring a VoIP handset using G.711 it sends frames every 20ms by default. If a client initiates an 11r roam immediately after a voice packet was transmitted, then I know if the roam was greater than 20ms then the data to data frame time represents the entire roam time. But if the roam wasn't initiated until 19ms after the last packet transmission, then my roam time will be inflated by 19ms and that time needs to be subtracted from the roam time calculated between data frames. This is an easy example because I know the packetization delay of the G.711 VoIP codec. Now imagine doing data to data roam calculation with a random data application that is much more bursty in nature (which most data applications are). Your roam times will not be accurate (not anywhere close actually) because there is no standard packetization delay; it's random and bursty. It is for those clients and bursty data applications that I use the 802.11 auth/assoc/EAPoL/4-Way Handshake measurement method because it provides a consistent measurement of the client's radio behavior and is not skewed or inflated by the bursty data application which has nothing to do with the roam itself. I find it interesting that you have used iPerf for this scenario. Have you measured the packetization delay of iPerf on various client platforms? Also, what do you do on clients that don't support iPerf? I may look into trying this method out - so thanks for the idea.

      (1/2)...

    4. (2/2)...

      To the point that annoys you: this blog is meant for very technical readers. I would not expect a common end-user to be reading my blog and interpreting that 8.5ms roams are what they will get every time. I agree that end-users just want their application to work. It's the job of the network engineer to understand how things work underneath the covers, which is why people read my blog. I went to great lengths in my opinion to detail the roaming process and the steps that have been eliminated between different types of roams and an 802.11r Voice-Enterprise roam. That's why I explicitly linked to my previous writing on this subject so that I didn't have to repeat what I've already stated on this subject - hopefully you took the time to go read those posts :)

      Again, to stress this, the main benefit is you don't have to compromise on security in order to get roaming performance. I know many networks that support real-time applications like voice over IP that have resorted to using PSK because the roaming performance with 802.1X is not acceptable. The addition of 11r support eliminates that trade-off.

      Also, I cannot provide the packet traces since this was captured on a private network. But I can tell you that it was in a common office environment with a neighboring WLAN. The roam time with the iPad was easily reproducible and I measured it over a dozen times, with over half of them resulting in almost the exact same time of 8.5ms. So, in my estimation, this is not misleading but very accurate.

      Cheers,
      Andrew

  2. Thanks for a nice summary of your findings. Will you be sharing the pcap files just like you did for the roaming analysis series?


  3. Andrew,

    All of your blogs are awesome, but this one is...well, awesomeR. :)

    Thanks for the details, the diligence, and the devotion.

    Devinator

  4. That's a great read Andrew.

    It's a shame that the iPad2 doesn't support 802.11k/r given lots of schools/students buy them.

  5. Yes, one more request for packet captures :)
    Very informative post btw. Thank you very much.

  6. How about support within Android?

    Replies
    1. Hi Daan,
      I have not seen support for 11k/r/v yet within Android, but I will keep my eye out for it.

      Andrew

  7. I would love fast roaming. However, I'm not sure it's worth spinning up an entirely separate SSID for. How would you even analyze that decision, unless you were doing some sort of dedicated VoIP SSID already?

    Replies
    1. Hi Andrew,
      Thanks for the detailed explanation. When I do the test with iPhone 5 (iOS7) I don't see the neighbor report enabled in the RM enabled capabilities tag in the association request frame. But I can see Link measurement, beacon measurement and AP channel Report capability in the association request frame. Does this configuration help in fast roaming? Our AP supports only neighbor report in 802.11k.

    2. Hi anneypala,
      I see the same thing in my capture, the iOS7 device does not support "sending" a neighbor report. But it does support requesting a neighbor report from an AP. Per IEEE 802.11-2012 standard, sections 8.4.2.47 and 10.11.10.3, the "dot11RMNeighborReportActivated" MIB pertains to access points and their capability to send a Neighbor Report Response in response to a station sending a neighbor report request.

      The 802.11k radio measurement services, and specifically the Neighbor Report, can improve fast roaming by providing the client with more information on the surrounding RF environment. By pre-populating a list of nearby APs that are in the same ESS, the client can reduce the amount of time needed for off-channel scanning to discover an alternate AP to roam to.

      Cheers,
      Andrew

  8. I tried to connect my iPad Air to an AP whose beacon/probe response contains the MDIE.
    I didn't see my iPad send out an association request with the MDIE or with the RSN fast transition auth key mgmt.
    Do I need to set up anything on the device?
