Thursday, January 10, 2013

Wi-Fi Roaming Analysis with Wireshark and AirPcap

This article is part 4 in the Wi-Fi roaming analysis series. In this post, we'll take the concepts we've learned in the first three articles and apply them in a live environment by performing a wireless packet capture and analyzing the roaming performance of an actual client device.

Remember from part 1 that roaming analysis provides insight into how decisions made on wireless architecture, network design, client selection, and configuration impact overall network performance. Performing Wi-Fi roaming analysis will enable network architects and engineers to:
  1. Baseline current client roaming performance
  2. Analyze gaps between current network performance and application requirements
  3. Identify opportunities to improve and optimize performance
  4. Implement changes to infrastructure and client devices to optimize performance
  5. Take more active control to ensure network performance matches desired service levels
Throughout this blog post and the next, I will be using actual roaming events that I captured with my iPhone as an example. You can download and open this packet capture if you want to follow along.

Wi-Fi Roaming Analysis Series:
  1. Part 1 - Connection Control and Importance of Roaming Analysis
  2. Part 2 - The Many Variations of Wi-Fi Roaming
  3. Part 3 - Methods of Measuring Roam Times
  4. Part 4 - Analysis with Wireshark and AirPcap
  5. Part 5 - Analysis with Wildpackets Omnipeek (coming)
  6. Part 6 - Tips for Roaming Performance Improvement (coming)
Hardware Requirements
In order to perform wireless roaming analysis, you will need multiple wireless adapters to capture frames simultaneously on different channels. The hardware required typically includes 3 wireless adapters and a USB hub. I have had good success with the Rosewill RHB-330 USB hub. Be sure to check the supported adapters list for the protocol analyzer software that you intend to use to capture and analyze the traffic.

Scanning between channels with a single adapter is not sufficient because the adapter will miss frames transmitted on the other channels. If the scanning duration (also called dwell time) is set to a small value, the adapter will likely miss frames related to the roaming and authentication exchange because it hops away to a different channel before the roam completes. If the scanning duration is set to too large a value, there is a good chance the adapter will be on the wrong channel when the roam occurs, and you will not be able to calculate roam times between data packets on the "old" and "new" AP as discussed in part 3 of this series. Remember, scanning ALWAYS results in missing frames. If you are scanning 3 channels, then you can capture at most 1/3 of the frames (actually less due to hop time between channels). So remember, never use channel scanning for protocol analysis!
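The arithmetic behind that 1/3 figure can be sketched in a few lines of Python. This is only an illustrative model: it assumes the adapter cycles evenly through the channels, and the function name `capture_fraction` and the dwell and hop values are hypothetical inputs that I picked for the example, not settings from any particular tool.

```python
def capture_fraction(num_channels, dwell_ms, hop_ms=0.0):
    """Best-case fraction of one channel's airtime seen by a hopping adapter.

    Assumes an even round-robin scan: the adapter listens for dwell_ms on
    each channel and is deaf for hop_ms while it retunes.
    """
    if num_channels < 1 or dwell_ms <= 0 or hop_ms < 0:
        raise ValueError("need at least 1 channel, dwell > 0 and hop >= 0")
    cycle_ms = num_channels * (dwell_ms + hop_ms)
    return dwell_ms / cycle_ms


# Three channels (1, 6, 11) with a hypothetical 250 ms dwell time.
print(f"no hop time:    {capture_fraction(3, 250):.1%}")
print(f"10 ms hop time: {capture_fraction(3, 250, 10):.1%}")
```

Even in this best case the adapter sees no more than a third of each channel, and any nonzero hop time pushes the figure lower.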
Devin Akin originally covered multi-adapter Wi-Fi captures in his CWNP whitepaper titled The Triple Blendy. It is worth your time to read (even almost 4 years later... it has stood the test of time)!
The selection of a supported wireless adapter model for use with Wireshark can be tricky. This is because differences exist between operating system platforms which may prevent you from capturing all wireless frames over the air. You will want to make sure that the adapter you use supports capturing in "Monitor Mode", not "Promiscuous" mode. This usually requires the Wi-Fi adapter to be disconnected from the network. Unfortunately, Microsoft Windows is very limited with regard to monitor mode support. For this reason, engineers typically take one of two approaches to capture Wi-Fi traffic with Wireshark:
  1. Use a Linux Distribution with custom Wi-Fi drivers. Many Wi-Fi and Security engineers use the Backtrack distribution coupled with a compatible wireless card. However, when multiple simultaneous captures are required, separate instances of Wireshark (or Tshark, the command-line version) must be run. The capture files must then be merged together, typically using the Mergecap tool included with Wireshark. This can be tedious and more time-consuming for everyday use.

  2. Use Windows with AirPcap adapter(s). The benefit of this approach is easier capturing because many engineers are unfamiliar with Linux. Engineers also do not have to run separate Tshark instances to capture each Wi-Fi channel and subsequently merge the files together since AirPcap software includes a virtual channel aggregator that can be selected for capture within a single Wireshark instance. The drawback is that the AirPcap adapters do cost money, significantly more than standard Wi-Fi client adapters that could be used with Linux.
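For readers taking the first (Linux) approach, the capture-then-merge workflow can be sketched as follows. This is a rough outline under several assumptions: the interface names (`wlan0`, `wlan1`, `wlan2`) and file names are hypothetical, each interface is assumed to already be in monitor mode and tuned to its channel, and the script only prints the commands rather than running them. It relies on the `-i` and `-w` options of Tshark and the `-w` option of Mergecap.

```python
import shlex


def tshark_command(interface, out_file):
    # -i selects the capture interface, -w writes the raw frames to a file.
    return ["tshark", "-i", interface, "-w", out_file]


def mergecap_command(in_files, merged_file):
    # mergecap -w <outfile> <infile> ... combines the input files into one,
    # ordering frames by timestamp.
    return ["mergecap", "-w", merged_file, *in_files]


# Hypothetical mapping of monitor-mode interfaces to 2.4 GHz channels.
channel_by_interface = {"wlan0": 1, "wlan1": 6, "wlan2": 11}

capture_files = []
for interface, channel in channel_by_interface.items():
    out_file = f"roam-ch{channel}.pcap"
    capture_files.append(out_file)
    # One Tshark instance per adapter, all started at about the same time.
    print(shlex.join(tshark_command(interface, out_file)))

# After stopping the captures, merge them into a single file for analysis.
print(shlex.join(mergecap_command(capture_files, "roam-merged.pcap")))
```

Because the merge orders frames by timestamp, this works best when all adapters are attached to the same workstation and share its clock.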
For this article, I will be using Windows with three AirPcap Nx adapters. I won't cover the installation of Wireshark or AirPcap software since they are both straightforward. It is also helpful to label each wireless adapter with the USB hub slot that it is plugged into. This will help prevent you from later plugging it into a different slot, which causes Windows to repeat device discovery and driver installation.

Setting Up the Packet Capture
To begin capturing Wi-Fi frames, we first need to configure the channels that our AirPcap adapters will be listening on. Open the AirPcap Control Panel and select the AirPcap Multi-Channel Aggregator as the interface. In the 'Basic Configuration' section below you should see a greyed-out list of channels that the adapters are currently set to use. If these channels need to be changed, select each individual interface from the list and configure the channel. If you need to configure 40MHz wide channels, select an extension channel either above (+1) or below (-1) the primary channel. Leave all other settings at defaults (as pictured below).

AirPcap Multi-Channel Aggregator Setup

Next, launch Wireshark and navigate to the Interfaces dialog from the Capture menu. You should see multiple network interfaces listed, including the AirPcap Multi-Channel Aggregator.

Wireshark Capture Interfaces
You can review the capture options by clicking the 'Options' button next to the adapter you plan on using. On wireless networks, you will typically want to disable promiscuous mode (since we want to capture in monitor mode instead). I also never use a capture filter because I like to make sure that I'm capturing all of the frames over the air. Instead, I rely exclusively on display filters applied after the capture to narrow down the packet list to only those I care to analyze. But I always like to capture everything!

Wireshark Capture Options

Start the capture from either the Interfaces or Capture Options dialogue windows and proceed to physically follow the wireless client station as it roams between access points.

A quick note should be made covering proper placement of the protocol analyzer workstation. Since the objective when performing roaming analysis is to capture all frames to and from the wireless client(s) under test, the protocol analyzer should be positioned near the client(s) rather than near an AP. Since wireless frames are encoded at a variable data rate, it is common for wireless protocol analyzers to receive frames that they cannot decode since the signal strength or SNR may be too low. Therefore, by positioning the analyzer nearest the client(s) you increase the likelihood of successfully receiving all frames both from and to those clients.
On a related note, to analyze the efficiency of wireless communications with a protocol analyzer, focus on the Wi-Fi retransmission rate rather than the FCS error rate. The FCS error rate can be inflated simply because the analyzer workstation is not able to successfully decode all the wireless frames that it can hear in the environment. Relying on FCS errors is a common mistake made by network engineers unfamiliar with Wi-Fi.
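As a rough illustration of the retransmission rate, the sketch below counts frames whose 802.11 Retry bit is set (the bit that Wireshark exposes as the `wlan.fc.retry` display filter field). The frame records here are made up for the example, and the dictionary layout is my own assumption rather than any export format.

```python
def retry_rate(frames):
    """Fraction of frames that have the 802.11 Retry bit set."""
    frames = list(frames)
    if not frames:
        return 0.0
    retries = sum(1 for frame in frames if frame["retry"])
    return retries / len(frames)


# Hypothetical sample: 8 frames from one client, 2 of them retransmissions.
sample = [{"retry": False}] * 6 + [{"retry": True}] * 2
print(f"retransmission rate: {retry_rate(sample):.1%}")
```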
Filtering for Frames of Interest
Display filters in Wireshark can be used to identify frames of interest for Wi-Fi roaming events. A display filter can be applied either during the wireless capture or after stopping the capture. Applying a display filter during the capture can help you ensure that roaming events are occurring and being captured by the protocol analyzer workstation. In this case you would want to filter only on frames that signal a roaming event to minimize scrolling in the live view. This is useful to avoid capturing a large amount of data only to find out that the client did not roam between APs or the workstation did not correctly capture the frames.

There are a couple of different methods to approach filtering to identify and analyze wireless roaming events that I recommend. I actually use both methods in succession, but feel free to find a workflow that works for you.

The first method that I use is to filter the packet capture on wireless association and reassociation frames, since those frames signal a new connection between a client and AP. To quickly find the roaming events within a capture file, filter the packet list for 802.11 Association Request frames or 802.11 Reassociation Request frames using the following display filter (the OR logic is denoted by the use of two pipe '||' symbols):

(wlan.fc.type_subtype == 0x00) || (wlan.fc.type_subtype == 0x02)

Take note of the packet numbers to reference where the roaming events occur in the capture list, then clear the display filter before continuing. In the example packet capture, these include frame numbers 48, 49, and 808.

Wireshark Display Filter for Wi-Fi Associations

The second method that I use is to filter the packet capture on a single wireless client station in order to analyze the roaming performance of one device at a time. This is helpful after the roaming events have been identified using the first filtering method and the corresponding packet numbers have been recorded. This second filter then allows me to focus on a single wireless client and declutter the list by removing frames related to all other APs and clients. To narrow the frame list to a single wireless client, filter for the MAC address of the device in question using the following display filter (replace with the actual MAC address of the client):

wlan.addr == 0c:77:1a:c1:24:a2

This can be automated very easily by opening the Statistics > Endpoints table in Wireshark, navigating to the WLAN tab, right-clicking on the desired device and selecting "Apply a Filter > Selected".

Apply a Device Filter in Wireshark from the Statistics > Endpoints Menu

You can also combine the two filters in order to only view roaming events related to a specific client, which is very useful when capturing in a live environment with multiple clients in the area. The display filter would combine the two component filters with AND logic, denoted with two ampersands '&&' as follows:

(wlan.addr == 0c:77:1a:c1:24:a2) && ((wlan.fc.type_subtype == 0x00) || (wlan.fc.type_subtype == 0x02))
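If you build these filters often, a small helper can generate the strings for you. The sketch below is just a convenience I am assuming you might want: the function name `roam_filter` is hypothetical, and it only assembles the same display filter text shown above (it does not talk to Wireshark).

```python
import re

ASSOC_REQ = 0x00    # 802.11 Association Request
REASSOC_REQ = 0x02  # 802.11 Reassociation Request

_MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def roam_filter(client_mac=None):
    """Build the association/reassociation display filter, optionally per client."""
    subtypes = " || ".join(
        f"(wlan.fc.type_subtype == 0x{value:02x})"
        for value in (ASSOC_REQ, REASSOC_REQ)
    )
    if client_mac is None:
        return subtypes
    # Accept either ':' or '-' separators and normalize to lower case.
    mac = client_mac.strip().lower().replace("-", ":")
    if not _MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {client_mac!r}")
    return f"(wlan.addr == {mac}) && ({subtypes})"


print(roam_filter())
print(roam_filter("0C-77-1A-C1-24-A2"))
```

The printed strings can be pasted directly into the Wireshark display filter bar.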

Wireshark coloring rules (found in the View menu) can be quite helpful when filtering on all of the frames from a single station. This allows the engineer to quickly scroll through the list to find frames that are related to roaming or other interesting events. Here is a list of sample coloring rules that I have written for Wireshark to highlight various wireless events.

Wireshark Coloring Rules for Wi-Fi
Note that the order of rules is important because the first matched rule is applied and processing stops. Therefore, given the ordered rules below, a retransmitted frame would always be colored yellow regardless of what type of frame it is. Likewise, unencrypted data is colored orange, but since EAP frames are highlighted with higher precedence they are colored green instead. Unencrypted wireless data frames are useful to pick out in a trace file to identify Null Data frames that are commonly used by clients to enter and exit a Wi-Fi adapter power-save state for battery life improvement.

Using these coloring rules, I can simply scan the list visually for frames that are highlighted in green to find information related to roaming events. Here is an example where I can clearly pick out encrypted QoS data frames (not colored), client probing (blue), and client roaming (green):

Wireshark Colored Frame List
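The first-match behavior of the coloring rules described above can be modeled with a short sketch. To be clear about the assumptions: this is not Wireshark's coloring rule file format, the rule list is a simplified subset that I reconstructed from the colors mentioned in this article, and the frame dictionaries are hypothetical.

```python
# Ordered rules: (name, color, predicate). The first match wins.
RULES = [
    ("retransmission", "yellow", lambda f: f.get("retry", False)),
    ("EAP / EAPoL", "green", lambda f: f.get("eapol", False)),
    ("probe request", "blue", lambda f: f.get("kind") == "probe-req"),
    ("unencrypted data", "orange",
     lambda f: f.get("kind") == "data" and not f.get("protected", False)),
]


def color_for(frame):
    """Return the color of the first matching rule, or None if nothing matches."""
    for _name, color, matches in RULES:
        if matches(frame):
            return color
    return None


eap_frame = {"kind": "data", "protected": False, "eapol": True}
null_data = {"kind": "data", "protected": False}
encrypted_qos = {"kind": "data", "protected": True}

print(color_for(eap_frame))                    # EAP outranks unencrypted data
print(color_for({**eap_frame, "retry": True})) # retry outranks everything
print(color_for(null_data))
print(color_for(encrypted_qos))
```

Reordering the `RULES` list changes the result, which is the same reason the order matters in the Wireshark coloring rules dialog.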

Manually Analyzing Roam Times
Once you have a display filter applied that limits the displayed list of frames to a single wireless client, the next step is to calculate the amount of time elapsed during each roaming event for the client. After reading part 2 in this series you should be able to quickly identify which roaming variation is occurring for each roaming event just by looking at the list of frames, and from part 3 in this series you should have picked a preferred method for analyzing roam times because you'll want to stick with it.

In order to manually analyze a roaming event, we need to set a time reference. This allows Wireshark to automatically calculate the time differential of subsequent frames relative to a reference frame. What you want to do is set the time reference as the first frame of the roaming event, then identify the last frame of the roaming event and look at the 'Time' column for that frame, whose value will be equal to the time elapsed since the reference frame. That's your total roam time! If you want to go further, you can even break down the time elapsed for each portion of the roam, such as probing, 802.11 auth, 802.11 association, EAP authentication, and EAPoL-Key (4-way handshake).
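What the time reference does is simple subtraction, which the following sketch mimics. The frame numbers and timestamps are invented purely for illustration (they are not taken from the sample capture), and `relative_times_ms` is a hypothetical helper name.

```python
def relative_times_ms(frames, ref_frame):
    """Return {frame_number: milliseconds since ref_frame}.

    Only the reference frame and later frames are included, mirroring how
    the 'Time' column restarts at the time reference.
    """
    ref_time = frames[ref_frame]
    return {
        number: round((timestamp - ref_time) * 1000, 3)
        for number, timestamp in sorted(frames.items())
        if number >= ref_frame
    }


# Invented capture timestamps in seconds, keyed by frame number.
frames = {100: 4.250, 101: 4.262, 105: 4.310, 112: 4.400}

times = relative_times_ms(frames, ref_frame=100)
for number, elapsed in times.items():
    print(f"frame #{number}: {elapsed:.0f} ms after the reference")
```

The value for the last frame of the roam is the total roam time, and the differences between intermediate frames give the per-phase breakdown.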

To set a time reference in Wireshark, highlight a frame, right-click to bring up the menu and select 'Set Time Reference (toggle)'.

Setting a Time Reference in Wireshark

In our example frame capture, three association frames were identified using the first filter method. However, there are actually only two association / roaming events since frame #49 is a retransmission of frame #48. So... event #1 occurs around the association in frame #49 and event #2 occurs around frame #808.

Event #1 - A Full EAP Authentication
The first event is the client's initial connection to the WLAN (no data frames from the client were captured prior to it), during which it performs a full EAP authentication. Using each of the three previously described methods for roam time measurement, this event could be measured as follows:
  1. Data frame to Data frame: not applicable since no previous data frames exist.
  2. 802.11 Probe Request through EAPoL-Key: 1.951 sec
    Set the time reference as frame #1; final time measured on frame #79.
  3. 802.11 Auth Request through EAPoL-Key: 131 ms
    Set the time reference as frame #46; final time measured on frame #79.
Wireshark Roam Time Analysis - Full EAP Authentication

Event #2 - A Client Roam using EAP Session Resumption
The second event is the client roaming between two APs on the same WLAN, during which it performs EAP session resumption with the RADIUS server. You can identify that it is using EAP Session Resumption because fewer packets are exchanged during the EAP authentication, and if you dig into frame #817 (as shown below) you will see a TLS session ID that is included in the 'Client Hello' message to re-use a previously established TLS session (contrast this with frame #57 which did not include a session ID). Also observe that the session ID established by the server in frame #58 matches the session ID re-used by the client in frame #817.

TLS Session ID Re-Used for EAP Session Resumption

Using each of the three previously described methods for roam time measurement, this event could be measured as follows:
  1. Data frame to Data frame: 1.549 sec
    Set the time reference as frame #784; final time measured on frame #830.
  2. 802.11 Probe Request through EAPoL-Key: 1.153 sec
    Set the time reference as frame #788; final time measured on frame #827.
  3. 802.11 Auth Request through EAPoL-Key: 776 ms
    Set the time reference as frame #804; final time measured on frame #827. Notice that the client actually sends three authentication requests before getting a response. We also see a few additional frame retransmissions during the roam (colored yellow). There could be many reasons why these occurred, such as interference, frame corruption, or null fading which can be an issue with mobile devices like the iPhone because they only have a single antenna chain for receive functions and do not have a digital signal processor (DSP) to perform MRC. Performing a packet capture simultaneously near the AP in question and comparing the two could provide additional insight.

    If this anomaly hadn't occurred, the client roam time would have been substantially less at 146ms (frame #806 to #827).
Wireshark Roam Time Analysis - EAP Session Resumption

Notice how much longer the same roaming event can appear to be, simply based on our method of measurement. This is why establishing a consistent methodology is important when comparing roaming times between different client devices and over time. Roaming time data is often boiled down to a subjective assessment of a device's performance as either 'good' or 'bad', so make sure that you understand how the measurement was calculated if you were not the one to perform the protocol analysis.

Breaking down the roam time of event #2 into subsequent parts, we can get an idea of where performance issues may be occurring:
  • Last Data Frame to Probing: this can indicate how long it takes the client to realize that it needs to roam and begin scanning for new APs. Poor performance here can indicate that the client's wireless stack does not identify a deteriorating connection quickly enough to support the intended application. In this instance it took the client 157ms (frame #784 to #788).

  • Probing Time: this can indicate how efficient the client is at scanning and discovering a new AP to join. If the client performs active scanning periodically, it may already have a list of potential APs, thus minimizing the time it takes to scan when it needs to actually roam. Poor performance here can indicate that the client is not able to quickly identify a new AP to join. This could be caused by not performing active scanning periodically while the connection to the current AP is strong. If the client does perform active scanning, then there may be a logic issue if it does not adequately use the previously gathered AP list and instead scans all channels again before roaming. Also, verify that APs are responding to probe requests adequately, since many band steering implementations rely on delayed or suppressed probe responses to influence client roaming behavior.

    In this instance it took the client 376ms (frame #788 to #804) and the client scanned channels 1, 6, and 11 again before roaming, despite previously performing active scans at 10-second intervals. It may have also scanned 5 GHz channels, but I did not have additional sniffers operating on 5 GHz. What you will also notice is that during previous active scanning, no probe responses were received from any AP other than the one it was currently connected to. This was a small network environment composed of only 3 APs and the APs were configured to suppress probe responses if the client SNR was too weak (<15dB). This was a likely contributing factor to the probing time of the client. If I wanted to improve roaming performance, I might try disabling that feature on the APs. (That feature is actually disabled by default; I enabled it to highlight how AP configuration can impact client roaming.)

  • 802.11 Auth/Assoc: this can indicate if the client is able to successfully join a new AP. This process should be very short and efficient. If an AP is overloaded it may reject client associations, indicating a capacity issue on the WLAN. In this instance it took the client 633ms to join the new AP (frame #804 to #811). This is a LONG time, and provides cause for alarm and further investigation. As previously described, this appears to be an interference issue, but I cannot be certain. In a real-world environment, I would analyze this portion of multiple captured roaming events to determine if it was an anomaly or if it happens repeatedly.

  • EAP Authentication: this can indicate how well the WLAN and backend infrastructure are designed to perform client authentication transactions quickly and efficiently. In this instance it took 137ms for the client to complete EAP authentication (frame #811 to #823), which is relatively fast considering that no fast roaming technique was used; it helped that the RADIUS server was on the local LAN.

    Many variables can come into play and affect latency in this portion of the roam, such as: 
    • EAP method implemented, which determines the credentials used and number of round-trip packets to complete authentication.
    • Number of EAP methods supported, since the RADIUS server proposes EAP methods in succession until the client accepts one.
    • RADIUS server placement either on the local LAN or across a high-latency WAN link.
    • Transaction processing load on the RADIUS server.
    • Congestion on transit network devices, including switches or routers.
    • Performance issues on the client device impacting the Wi-Fi supplicant.
    • Ability to cache credentials on the client device for re-use. If credentials cannot be cached, then the user may be prompted to input or select a credential, which can take significantly longer than cached credentials.

  • EAPoL-Key: this step should complete very fast in order to derive the final encryption key used between the AP and client to encrypt data traffic. If this step does not complete, it can indicate a bad pre-shared key (PSK) if the WLAN does not implement 802.1X. In this instance it took 5ms to complete key derivation between the client and AP (frame #824 to #827), which is normal.
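The phase durations above can be cross-checked against the three total roam times measured earlier for event #2. The sketch below uses only the numbers reported in this article. The small remainders it prints are my own inference: they are most likely rounding plus the gap between frame #823 and #824, which belongs to no phase.

```python
# Phase durations in milliseconds, as reported for event #2.
phases_ms = {
    "last data to probing (#784 to #788)": 157,
    "probing (#788 to #804)": 376,
    "802.11 auth/assoc (#804 to #811)": 633,
    "EAP authentication (#811 to #823)": 137,
    "EAPoL-Key (#824 to #827)": 5,
}
# Totals from the three measurement methods, in milliseconds.
reported_ms = {
    "data_to_data": 1549,  # frame #784 to #830
    "probe_to_key": 1153,  # frame #788 to #827
    "auth_to_key": 776,    # frame #804 to #827
}

durations = list(phases_ms.values())
probe_to_key_sum = sum(durations[1:])  # probing through EAPoL-Key
auth_to_key_sum = sum(durations[2:])   # auth/assoc through EAPoL-Key

# Time not attributed to any phase (rounding, gaps between phases).
probe_gap = reported_ms["probe_to_key"] - probe_to_key_sum
auth_gap = reported_ms["auth_to_key"] - auth_to_key_sum

# Time from the last EAPoL-Key frame (#827) to the first data frame (#830).
post_key_ms = (reported_ms["data_to_data"] - durations[0]
               - reported_ms["probe_to_key"])

for name, ms in phases_ms.items():
    share = 100 * ms / reported_ms["data_to_data"]
    print(f"{name:38s} {ms:4d} ms  {share:4.1f}% of data-to-data time")
print(f"unattributed: {probe_gap} ms (probe method), {auth_gap} ms (auth method)")
print(f"last EAPoL-Key to first data frame: {post_key_ms} ms")
```

Laid out this way, it is easy to see that the 802.11 auth/assoc phase accounts for the largest share of the roam.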
Additional factors to look at:
  • Power-Save: look for Null Data frames to indicate that the client is going in and out of power-save state. A client's power-save behavior can affect its roaming performance. Clients will typically go into power-save state prior to performing active scanning because they need the AP to buffer data frames for them while they go off-channel. This is normal. However, if the client is going in and out of power-save state at other times during the roam (for example, during the EAP authentication) this may be a cause for alarm. In this instance, the client does go in and out of power-save state frequently since it is a mobile device, but it does not appear to impact roaming performance.
For this particular roam event, I would be most concerned with the probing time and 802.11 auth/assoc time. I would attempt to improve probing by verifying my AP configuration and disabling probe suppression of clients with weak SNR, and I would re-run multiple captures to ascertain whether or not the 802.11 auth/assoc process was an anomaly or occurs repeatedly.

Automated Roam Time Analysis
The Wireshark and AirPcap solution can be augmented with the purchase of Riverbed Cascade Pilot Personal Edition (formerly CACE Wi-Fi Pilot, which is the product that I have). This product complements Wireshark by providing data visualization, analysis, reporting, and drill-down capability on large packet captures. A good overview of the solution is provided here.

One useful function within the tool is automated roam time analysis. It will look through a packet capture file, identify client roaming events, and automatically measure the time elapsed for each roam. The measurement method used is between data frames on the old and new AP. Also, most automated tools like this one do not report a client's initial connection to the WLAN as a roam event and it will not show up in the analysis.
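To make the data-frame-to-data-frame method concrete, here is a minimal sketch of how such a measurement could be automated. This is not how Cascade Pilot is implemented (I have no visibility into that); it is only an illustration of the idea, with hypothetical BSSIDs and invented timestamps chosen so that the gap matches the 1.549 sec measured manually for event #2.

```python
def data_frame_roams(data_frames):
    """Find roams in one client's data frames.

    data_frames is a time-ordered list of (timestamp_seconds, bssid) tuples.
    A roam is reported whenever consecutive data frames use different BSSIDs;
    the roam time is the gap between those two frames.
    """
    roams = []
    for (prev_time, prev_bssid), (time, bssid) in zip(data_frames, data_frames[1:]):
        if bssid != prev_bssid:
            roams.append((prev_bssid, bssid, round(time - prev_time, 6)))
    return roams


OLD_AP = "00:11:22:33:44:01"  # hypothetical BSSIDs
NEW_AP = "00:11:22:33:44:02"

client_data = [
    (10.000, OLD_AP),
    (10.020, OLD_AP),  # last data frame on the old AP
    (11.569, NEW_AP),  # first data frame on the new AP
    (11.600, NEW_AP),
]

for old, new, seconds in data_frame_roams(client_data):
    print(f"roam {old} -> {new}: {seconds:.3f} sec")
```

Because a roam needs a data frame on a previous AP, this approach never reports the client's initial connection, which matches the behavior described above.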

Adding the sample packet trace into Wi-Fi Pilot and opening the 'Roaming Time Analysis' view, we can confirm that the second event is identified as a roam and the time elapsed reported as 1.549 sec, the same as we manually calculated between data frames.

Wi-Fi Pilot - Roaming Time Analysis

The 'Avg/Min/Max Roaming Time Over Time' view is also useful for plotting client roaming latency over time:

Wi-Fi Pilot - Avg/Min/Max Roam Time

Wrap-Up
Wi-Fi roaming analysis used to be quite a tedious undertaking, requiring very specific hardware and software. However, in the last few years this has become much easier due to the availability of more common and affordable tools. Engineers can perform roaming analysis quite inexpensively using open-source software such as Backtrack Linux coupled with their choice of several common wireless network adapters. The trade-off of such a setup is the extra time and effort to merge capture files and perform the analysis manually. If you rarely need to perform roaming analysis, then this setup may suit your needs. If you are unfamiliar with Linux, or simply prefer a Windows setup, then you can purchase network adapters dedicated to wireless protocol analysis, such as AirPcap Nx, for a minor investment (currently $698 each, or $2,094 for a set of three). This can remove some of the manual work of merging capture files, but still requires manual roaming analysis.

If you have to perform roaming analysis quite often or could benefit from advanced data visualization and reporting features, it may be worth your time to invest in professional software that can automate the roaming analysis for you. Several Wi-Fi software packages provide automated roaming analysis. This can simplify and expedite the analysis, but depending on how the software calculates roam times, it may or may not match your preferred measurement method. In this article, I discussed Riverbed Cascade Pilot Personal Edition (currently $695) and will cover another software package, Wildpackets Omnipeek, that provides similar capabilities in my next post.

The examples shown in this article included a full EAP authentication and an EAP session resumption, neither of which is a fast roaming technique. Now that Apple has announced support for 802.11r (Fast BSS Transition) and enterprise WLAN gear will soon include support for it as well, I will be able to capture a fast secure roaming example and publish my findings.

Cheers,
Andrew



10 comments:

  1. Thanks Andrew,

    A superb methodology. Much appreciated.

    --Bruce Johnson

  2. Thanks for the excellent article.
    Just one little annotation regarding the simultaneous capturing on different channels using Backtrack: In newer versions of Wireshark you can select multiple capture interfaces instead of just one. So there's no need to use Mergecap.
    - Johannes -

  3. Andrew, thanks for the write up and I'm looking forward to your write up on Omnipeek (I'm evaluating whether to go with Wireshark or Omnipeek at the moment).

    FYI - I've tried two different browsers (IE & Chrome) and some of your images aren't showing up. Wireshark Capture Options, Wireshark Display Filter, Setting a Time Reference, & Wireshark Roam Time Analysis all show up as broken links/images.

    Replies
    1. Hi Scott,
      Thanks for the feedback. I don't have any issues seeing the images on either Chrome or Safari.

      Andrew

  4. Hi Andrew,

    Sorry for the false alarm. I checked again at home on Chrome, Safari, and mobile Safari via TweetDeck and everything showed up fine. Maybe some images got blocked on the corporate network today at the office, not really sure why it wasn't rendering right there.

    ~ Scott

  5. Thanks for this, Andrew! :)

    (I see the 802.1X (EAPOL) captures are from an older version of Wireshark. Looks slightly different in 1.8.x, you may want to update.)

  6. Great articles, please continue the series..

  7. Yeah brilliant if you want to pay for stupid hardware and software. Linux all the way, real hackers share code for free.

  8. Hi,

    A good read as always, i'd like to make a couple of little comments.
    (sorry for the looong post)

    I've been learning a lot recently, and started doing air captures,
    (i'm a network guy at a uni, also studying the CWNA)

    i've been using kali-linux (the replacement to backtrack), and as another poster pointed out, wireshark can
    capture from multiple interfaces simultaneously.
    As a bonus, when using linux/wireshark, i'm not "limited" to 3 interfaces, i've actually managed 8 atheros usb wifi cards
    simultaneously!
    Very helpful when the same ssid is on 2.4G & 5G, and you don't know to which channel a client may roam.
    Although it is a bit tedious getting that many interfaces setup with linux & iwconfig!
    (had to use 2 usb hubs)

    Also, i've been playing around with capture filters (BPF) instead of display filters,
    the reason is that sometimes users complain about disconnects at random intervals, which is difficult to filter
    for when you are continuously capturing a very large volume of traffic in a busy wifi network.
    By using capture filters, the traffic actually captured is much smaller, and you can capture for a much longer period
    of time.
    (i got tired of wireshark crashing during long captures with huge amounts of traffic)

    I'll give a couple of quick examples of capture filters, as the syntax is different from display filters.
    (these work on wireshark with kali linux, not sure about other variants, ignore the quotation marks if you want to try these)

    "type mgt" only captures management frames, which includes beacons, probes, auths, associations, etc. excludes data

    "type mgt && ether host xx:xx:xx:xx:xx:xx" captures the same, but only for the host with the specified MAC address.

    "type mgt subtype assoc-req && ether host xx:xx:xx:xx:xx:xx or type mgt subtype disassoc && ether host xx:xx:xx:xx:xx:xx"
    captures association and disassociation for the specified host.
    (there might be a better way to order that)

    I wanted to give the examples, because i was unable to find any examples myself when searching, everyone seems to focus
    on display filters only.
    Google should help with finding the other subtypes possible, just look for BPF syntax, and look at the wireless options.

    Hope this is helpful to someone.
    (and i'm also using Andrews coloring rules in wireshark, which makes it a lot easier to visually "filter" the relevant packets)

    regards
    Andrew
    (yes another one)
