Revolution Wi-Fi: Wi-Fi Roaming Analysis Part 3

This article picks up where we last left off in our discussion on Wi-Fi roaming. In part 1, I covered how connection control occurs in Wi-Fi, the importance of roaming, and what conditions are involved in triggering a client roam. Then in part 2, I dived into the many variations of Wi-Fi roaming and how they each work. Now that we know the background on how Wi-Fi roaming occurs in multiple different scenarios, it's almost time to dig in and get our hands dirty by actually capturing packets and measuring client roaming performance.

But before we can do so, we have one more topic to cover: how to actually measure the roam. This may seem trivial, and really it isn't that difficult a subject. However, it is important to establish the methodology we will use to provide consistent, repeatable, and comparable results. This will enable us to accurately compare roaming performance between different types of clients as well as across firmware, driver, and configuration updates on the same client or WLAN infrastructure.

Wi-Fi Roaming Analysis Series:

Part 1 - Connection Control and Importance of Roaming Analysis
Part 2 - The Many Variations of Wi-Fi Roaming
Part 3 - Methods of Measuring Roam Times
Part 4 - Analysis with Wireshark and AirPcap
Part 5 - Analysis with Wildpackets Omnipeek (coming)
Part 6 - Tips for Roaming Performance Improvement (coming)

Measuring Roam Times
There are several different methods by which we can actually measure the roaming event. Variations exist because organizations, wireless professionals, and software tools have different views on what constitutes a completed roam (from beginning to end). Different methods of measurement may be applicable in different scenarios, but what is important is to maintain consistency in approach in order to establish a baseline in performance and be able to accurately compare results to one another.

The following are common methods used to calculate the duration required for a client to roam from one AP to another AP:

Between 802.11 Encrypted Data Packets to/from the client on the old and new AP
This method focuses on analyzing the impact of roaming delay on the application(s) running on the client device. By analyzing the amount of time between the last data frame transmitted on the "old" AP and the first data frame transmitted on the "new" AP, we can get an idea of the latency that is experienced by applications. This can be useful to understand how roaming latency might impact software development and internal application timers that may result in timeout errors or otherwise disrupt network applications.

However, one drawback to this method is that it reflects not only the actual wireless roaming latency, but also any idle time between frame transmissions if the application does not have data buffered for transmission. This can inflate perceived roam times based on application behavior. For example, many applications are bursty in nature, and measuring roam times between data frames could include a large amount of time that is simply client idle time, because the application has not queued any frames for transmission. Even in the "best-case" scenario where application behavior is consistent, such as a VoIP G.711 call that sends a frame every 20ms, the measurement can still be off by up to one packet interval. Historically, this was not a problem because a large percentage of clients did not support fast roaming methods and roaming times were comparatively large. An inaccuracy of 20ms or less is not a substantial factor in a 500ms - 1sec roam time. But as newer clients support 802.11r Fast BSS Transition, roam times are likely to be 50-100ms, and a measurement error of 20ms could represent 20-40% of the calculated roam time.
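
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Frame` record, its `kind` labels, and the BSSID strings are hypothetical stand-ins for whatever your capture tool exports, and the sample timestamps are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    time: float  # capture timestamp, in seconds
    bssid: str   # BSSID of the AP the frame was exchanged with
    kind: str    # hypothetical label such as "data" or "auth-req"


def data_gap_ms(frames: List[Frame], old_bssid: str, new_bssid: str) -> float:
    """Time from the last data frame on the old AP to the first on the new AP."""
    # Raises ValueError if either AP has no data frames in the capture.
    last_old = max(f.time for f in frames
                   if f.kind == "data" and f.bssid == old_bssid)
    first_new = min(f.time for f in frames
                    if f.kind == "data" and f.bssid == new_bssid
                    and f.time > last_old)
    return (first_new - last_old) * 1000.0


def idle_error_pct(packet_interval_ms: float, roam_ms: float) -> float:
    """Share of a roam time that one packet interval of idle time represents."""
    return 100.0 * packet_interval_ms / roam_ms


# Made-up capture: a 20 ms voice stream that moves from "old-ap" to "new-ap".
capture = [
    Frame(10.000, "old-ap", "data"),
    Frame(10.020, "old-ap", "data"),
    Frame(10.095, "new-ap", "data"),
    Frame(10.115, "new-ap", "data"),
]
gap = data_gap_ms(capture, "old-ap", "new-ap")  # roughly 75 ms

# How much of the roam time a 20 ms uncertainty can account for.
for roam_ms in (1000, 500, 100, 50):
    print(f"{roam_ms} ms roam: up to {idle_error_pct(20, roam_ms):.0f}% of the result")
```

The loop shows why the same 20ms uncertainty that is negligible for a 500ms - 1sec roam becomes significant once roam times drop to 50-100ms.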

Between 802.11 Probe Request through EAPoL-Key (or Association Response)
This method focuses solely on the wireless roaming latency, removing application behavior from the calculation. The calculation begins with client probing to discover candidate APs to which it can roam, and typically completes with the final EAPoL-Key frame. The end point may vary based on which type of roam was completed; for instance, with 802.11r Fast BSS Transition the final frame required for a successful roam is the Association Response (see Part 2 of this series).

However, the drawback with this method is that the calculation may be inflated by including the time required for client probing. Client probing does not always reflect an actual roam event (many clients probe periodically to maintain a list of APs, which minimizes discovery time when they actually need to roam), and probing behavior varies between manufacturers and even driver versions. As a result, it can be difficult to accurately compare roaming performance between clients or software upgrades.

Between 802.11 Authentication Request through EAPoL-Key (or Association Response)
This method is nearly identical to the previous method, but it omits client probing from the actual roam time calculation. The roam time is therefore limited to the time it took the client to move to the new AP once it decided that a roam was required. Measurement is typically performed from the client 802.11 Authentication Request through the final EAPoL-Key frame (or Association Response, depending on the type of roam; see Part 2 of this series). When this method is used, the client probing is often still listed for informational purposes in order to better understand the client behavior.

It should also be noted that 802.11 Authentication does not always indicate a roaming event; Wi-Fi clients are allowed to perform 802.11 authentication with multiple APs at once, but can only be "associated" to a single AP at a time. Therefore, look for 802.11 Association Request frames (or Reassociation Request frames, which roaming clients commonly send) to positively identify a roaming event, then look backwards from that point to identify when the client both probed to discover the AP and performed 802.11 authentication to the AP (which is different from 802.1X/EAP authentication).
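
As a rough sketch of that procedure, the Python below anchors on the Association Request, looks backwards for the authentication and probe frames, and looks forwards for the final frame, so both the probe-based and authentication-based roam times can be reported side by side. Everything named here is an assumption made for illustration: the `Frame` record, the `kind` labels, the idea that a probe request is tagged with the BSSID of the AP that answered it, and the premise that the capture window contains exactly one roam. Taking the latest probe before authentication is also a simplification, since clients usually probe in bursts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    time: float  # capture timestamp, in seconds
    bssid: str   # BSSID of the AP involved in the exchange
    kind: str    # hypothetical labels: "probe-req", "auth-req", "auth-resp",
                 # "assoc-req", "assoc-resp", "eapol-key"


def measure_roam(frames: List[Frame],
                 final_kind: str = "eapol-key") -> Optional[Dict[str, float]]:
    """Roam times in ms, measured from probe and from authentication."""
    frames = sorted(frames, key=lambda f: f.time)

    # The association request is what positively identifies the roam.
    assoc = next((f for f in frames if f.kind == "assoc-req"), None)
    if assoc is None:
        return None
    target = assoc.bssid

    def last_before(kind: str, limit: float) -> Optional[Frame]:
        hits = [f for f in frames
                if f.kind == kind and f.bssid == target and f.time <= limit]
        return hits[-1] if hits else None

    # Look backwards for the authentication that preceded the association.
    auth = last_before("auth-req", assoc.time)
    # Look forwards for the frame that completes this type of roam.
    finals = [f for f in frames
              if f.kind == final_kind and f.bssid == target
              and f.time > assoc.time]
    if auth is None or not finals:
        return None
    end = finals[-1]

    result = {"from_auth_ms": (end.time - auth.time) * 1000.0}
    probe = last_before("probe-req", auth.time)
    if probe is not None:
        result["from_probe_ms"] = (end.time - probe.time) * 1000.0
    return result


# Made-up timestamps for a roam that ends with a 4-way handshake.
capture = [
    Frame(5.000, "new-ap", "probe-req"),
    Frame(5.030, "new-ap", "auth-req"),
    Frame(5.032, "new-ap", "auth-resp"),
    Frame(5.034, "new-ap", "assoc-req"),
    Frame(5.036, "new-ap", "assoc-resp"),
    Frame(5.040, "new-ap", "eapol-key"),
    Frame(5.045, "new-ap", "eapol-key"),
    Frame(5.050, "new-ap", "eapol-key"),
    Frame(5.055, "new-ap", "eapol-key"),
]
full = measure_roam(capture)                          # ends at last EAPoL-Key
fast = measure_roam(capture, final_kind="assoc-resp")  # ends at Association Response
```

Passing `final_kind="assoc-resp"` models the case where the Association Response is the last frame needed, as with 802.11r Fast BSS Transition. The difference between `from_probe_ms` and `from_auth_ms` is the probing time that the authentication-based method leaves out.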

None of these methods is better than the others; they are simply different ways of measuring the time required for a roam to complete.

It may be helpful for you to reference a simple Wi-Fi connection ladder diagram. At what point do we start measuring the roam, and at what point has the roam completed? Which method of measurement is most useful for your analysis?

[Figure: Measuring Wi-Fi Roam Times]

Also, consider which EAP type you have implemented in your network. Different EAP methods require different message exchanges to complete authentication and can impact the roaming performance for clients that do not support fast roaming. For reference, here is a PEAP authentication packet flow for a client performing a full 802.1X/EAP authentication on a Wi-Fi network.

Personally, I typically measure Wi-Fi roaming times between the 802.11 Authentication Request and the final EAPoL-Key frame (or Association Response). However, this is simply my preference because I often like to compare roaming times between different clients that may be running different applications. By eliminating the Data frames from my calculation, it is easier to directly compare the Wi-Fi driver stacks and radio performance of the clients.

In the next article on this topic, we'll dig into actual packet captures. Stay with me, you packet analysis junkies!

Cheers,
Andrew

Monday, December 31, 2012