Monday, May 21, 2012

Are Apple iPhones Misbehaving on Wi-Fi?

The latest generation of mobile devices, including the Apple iPhone 4S, may be causing performance degradation on your Wi-Fi network, which could be reported as a Denial of Service (DoS) attack by WIPS (wireless intrusion prevention systems).

One of my blog readers contacted me about a disturbing finding and asked for my opinion (thanks, Thanh). He had found that the Apple iPhone, iPad, and other mobile devices based on the latest Broadcom chipsets are setting very long Duration values, in the range of 10-14ms, within Wi-Fi control frames (e.g. RTS/CTS-to-self). This essentially reserves the medium for the device to transmit without a collision. The problem is that this is an excessively long period of time for an 802.11n capable device, and through my packet analysis I have found that no large frame transmission subsequently occurs. This indicates that a performance problem may exist with the devices, and a WIPS may report it as a NAV DoS attack on the network.

I've also heard anecdotal reports of this being observed on HTC Google Nexus One, Apple iPad 2, iPhone 4, some RIM BlackBerrys, and even on a Broadcom evaluation board. So, there is a distinct possibility that this issue lies in the Broadcom chipset used in these devices and is not isolated to Apple devices. However, I have only tested this on my personal iPhone 4S and cannot directly verify those observations. So this post only details what I have actually been able to test and observe directly.

I had a hard time believing this could be true, so I investigated myself. Sure enough, I found the behavior when I tested.

[Figure: Apple iPhone 4S Large Duration Value]

What you see in the figure above is that the iPhone transmits CTS-to-Self frames at regular intervals, around once every second, with an excessively large Duration value. This causes all other Wi-Fi clients to set the NAV (virtual carrier sense) to the specified Duration value and defer transmission. Essentially, my iPhone was causing a blockage of all traffic on my wireless network for 11ms at regular intervals. Definitely not good for performance.
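
As a minimal sketch of how a capture could be scanned for this kind of frame, the snippet below reads the Duration field out of a raw CTS frame. It assumes each frame is available as raw 802.11 MAC bytes with any radiotap header already stripped. The function names and the 5,000 microsecond threshold are illustrative assumptions, not values taken from any WIPS product.

```python
import struct

# First Frame Control byte of a CTS frame: protocol version 0,
# type 1 (control), subtype 12 (CTS) -> 0b1100_01_00 = 0xC4.
CTS_FRAME_CONTROL = 0xC4

# Hypothetical alert threshold in microseconds (an assumption for illustration).
SUSPICIOUS_DURATION_US = 5000


def cts_duration_us(frame: bytes):
    """Return the Duration value (microseconds) of a CTS frame, else None."""
    # A CTS frame carries Frame Control (2), Duration (2) and RA (6) before the FCS.
    if len(frame) < 10 or frame[0] != CTS_FRAME_CONTROL:
        return None
    (duration,) = struct.unpack_from("<H", frame, 2)  # little-endian 16-bit field
    if duration & 0x8000:
        return None  # high bit set: the field does not hold a plain duration
    return duration


def is_suspicious_cts(frame: bytes, threshold_us: int = SUSPICIOUS_DURATION_US) -> bool:
    """True when the frame is a CTS whose Duration exceeds the threshold."""
    duration = cts_duration_us(frame)
    return duration is not None and duration > threshold_us


# Hand-built example: CTS with Duration = 11000 us and a made-up receiver address.
example = bytes([0xC4, 0x00]) + struct.pack("<H", 11000) + bytes.fromhex("001122334455")
print(cts_duration_us(example), is_suspicious_cts(example))
```

A real detector would also need to check whether a data frame from the same station follows within the reserved period, since a long Duration is only a problem when nothing is sent.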

Possible valid explanations for this behavior could be:

1) Frame aggregation with 802.11n. I did see the iPhone using frame aggregation in other instances and using an appropriate Duration value around 4ms in those cases. But I did not see frame aggregation being used after the 11ms Duration control frames.

2) The use of a Transmit Opportunity (TXOP) on a QoS enabled WLAN in order to transmit a burst of packets within a specific QoS traffic class. I double-checked the TXOP values advertised by my access point and they are using the 802.11 default values of 0ms (single frame only) for best effort and background queues, 3.008ms for the video queue, and 1.504ms for the voice queue, which are much lower than the 11ms observed value.

3) A really large frame transmitted at a really low data rate. This is possible if the iPhone needed to transmit a 1,384 byte or larger frame at 1 Mbps to get such a Duration value. But I have all 802.11b rates disabled on my AP and verified in the packet trace that the iPhone was using 24 - 65 Mbps almost all the time, with only a few frames at my lowest available data rate of 6 Mbps.

After analyzing my packet trace I can find no evidence of my iPhone transmitting any data frames immediately after the control frame containing the high Duration value. This essentially rules out all three possible explanations.
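
The arithmetic behind the second and third explanations can be checked with a short sketch. It counts payload bits only and ignores PHY preamble, headers, and inter-frame spacing, so the results are rough figures; the helper name is an illustrative assumption.

```python
OBSERVED_DURATION_US = 11_000  # roughly the Duration value seen in the capture

# TXOP limits advertised by the AP, in microseconds (0 means a single frame only).
TXOP_LIMIT_US = {"best_effort": 0, "background": 0, "video": 3008, "voice": 1504}


def payload_airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Microseconds to send frame_bytes of payload at rate_mbps (1 Mbps = 1 bit per us)."""
    return frame_bytes * 8 / rate_mbps


# Explanation 2: compare each advertised TXOP limit with the observed Duration.
for queue, limit in TXOP_LIMIT_US.items():
    print(f"{queue}: TXOP limit {limit} us, below observed: {limit < OBSERVED_DURATION_US}")

# Explanation 3: payload time of a 1,384 byte frame at several data rates.
for rate in (1, 6, 24, 65):
    print(f"1384 bytes at {rate} Mbps: {payload_airtime_us(1384, rate):.0f} us")
```

Only the 1 Mbps case lands near 11ms. At 6 Mbps, the lowest rate enabled on the AP, the same frame needs under 2ms of payload time.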

Other possible explanations would be a poor client roaming algorithm that wants to halt traffic while it goes off-channel to scan for other APs to which it could roam, or a poorly designed battery saving technique for mobile devices. However, I captured on all three non-overlapping 2.4GHz channels and found no evidence of my iPhone probing on other channels during that 11ms time period. And using this as a battery saving technique doesn't make much sense, because a client can already notify the AP of its power save mode and the AP will buffer traffic until it wakes up, or at worst send broadcast traffic at the configured DTIM interval, which is usually 102ms and allows for a much longer sleep period anyway. Regardless, the iPhone indicated in the control frames that it was staying awake (in active mode), so it does not appear to have been putting the radio to sleep.

Frankly, I have no answer to explain this behavior. I don't exactly know what is going on.

The effect on a home network is likely to be small, since 11ms out of every 1 second is only about 1% of available airtime. However, in a corporate environment where mobile devices are being introduced at a staggering rate, the compound effect could be a serious reduction in Wi-Fi network performance and capacity! Think of having 30 students in a single classroom all connected with their mobile devices, 200 in a university lecture hall, or 500 at a trade show or conference. This is increasingly likely as smartphone and mobile device penetration is already over 50% in the U.S. and other countries, and consumers are typically carrying 2-3 Wi-Fi enabled devices with them at all times. The worst-case result in these environments would be so much reserved airtime that almost no network capacity would be left for actual user traffic. Just what we need when trying to support tons of mobile devices on enterprise networks, huh?
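
A rough upper bound on the compound effect can be sketched as follows. It assumes every device behaves like the iPhone 4S in this test (one 11ms reservation per second), that all devices share a single channel, and that reservations never overlap. Real numbers would depend on how many devices share a channel and actually exhibit the behavior.

```python
def reserved_airtime_fraction(devices: int, duration_ms: float = 11.0,
                              interval_ms: float = 1000.0) -> float:
    """Worst-case share of airtime reserved by the devices, capped at 100%."""
    return min(1.0, devices * duration_ms / interval_ms)


for devices in (1, 30, 200, 500):
    share = reserved_airtime_fraction(devices)
    print(f"{devices} devices: up to {share:.0%} of airtime reserved")
```

Under these assumptions a 30-device classroom could lose up to about a third of its airtime, and the bound saturates at around 90 devices on one channel.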

If I uncover more details, I will update this post accordingly.

Thanks,
Andrew vonNagy

16 comments:

  1. Is it a case of turning off the WMM on the iPhone? I might be talking through a hole in my head, but I thought I read something about that somewhere in relation to iPads.

    Replies
    1. I'm not aware of any end-user control over WMM settings in iOS.

      Andrew

  2. This certainly explains why IT conferences have terrible WiFi performance, even with top-end infrastructure in great quantity.

    Replies
    1. Hi Jim,
      There are many factors that make high density deployments difficult. I wouldn't attribute the problems to this necessarily.

      Andrew

  3. The fact that the client is sending a CTS-to-self without sending any data afterwards is a problem in and of itself regardless of the excessive duration. Are you sure your sniffer just didn't capture the data frame properly?

    Look for a CFE (Contention Free End) frame sent by the phone after it sends its data (or where it should have sent the data). This releases the airwave and lets clients reset their NAV. This is a legal technique.

    Replies
    1. I performed multiple captures to validate what I was seeing, and placed the sniffer close to the iPhone and access point for various tests. There is definitely no data being sent after the CTS-to-Self frame. I agree with you that it is a problem regardless of the Duration value set in the frame.

      I haven't heard of a CFE frame being used in this way. Have you validated this technique on live networks? If so, what devices did you test with?

      Thanks,
      Andrew

    2. I've seen Intel (laptop) devices work this way, with the CF-End at the end. Ugly behavior. The only reason I can think of is to reserve the air in case there would come another frame that it wanted to transmit before the first frame was done. But not sending that CF-End frame is really ugly.

  4. Intel clients use the technique of sending a long duration (usually 4ms) in an RTS frame, sending their data and then releasing with a CFE frame. They only do this under certain circumstances however. You can see it by associating an 11n Intel client (I used a 6205) to an 11n AP (using the 2.4GHz radio) and then associating a legacy (non-11n) client. From a wired pc, then ping the 11n client. You'll see the RTS frames that the 11n client sends (for wireless protection) have a large duration and after it transmits the ping response, it sends a CFE.

    So someone at Intel believes this is a better method of sending data in a crowded environment. They don't do this behavior in 5GHz as far as I've seen. Broadcom may be trying to replicate something like this.

    Mike (anonymous above)

    Replies
    1. Mike,
      Thanks for the information. I find it troubling that Intel would use such a method when a simple RTS/CTS or CTS-to-self protection mechanism with a valid Duration value works just fine. There should never be any reason to "lie" about the Duration value in my opinion, especially with such an inflated value of 4ms or longer!

      The IEEE 802.11 standard is very clear in section 9.13, stating that stations must follow the rules specified for calculating RTS/CTS NAV fields when used as a protection mechanism (which is defined in section 7.2.1.2 for CTS frames). Fudging the duration value is never in the IEEE rules to my knowledge. The standard always specifies a very carefully calculated value in microseconds based on pending frame transmissions.

      Andrew

  5. Hi, Andrew.

    This is an interesting post. I tested my own iPhones, a 3G and a 4, and I too see CTS frames with a duration of 10ms, with no obvious reason. Mine are less frequent though, sometimes about one per minute. However, for high density BYOD deployments I am going to be concerned...

  6. Hi Andrew, have you had a chance to test with iOS 6 to see if the behavior persists?

  7. Hi Andrew,

    Please check the DTIM Period of your AP. If it is 1, this issue (CTS-to-self with long duration) does not happen.

    -Nizam.

    Replies
    1. Hi Nizam,
      I believe you are incorrect. The DTIM interval has no bearing on the virtual NAV transmitted inside a CTS control frame.

      Andrew

  8. Hi Andrew,
    It's Thanh 8-) , it has been a while on this thread and thank you for mentioning me in your blog.
    I think I found why we see the CTS with long duration from the above mentioned vendor.
    Lately I have been working intensively on 802.11ac chipset and calibration code, and I stumbled on a piece of code that explains it all.

    I'm not breaching NDA as I found the code in the open source driver code.
    http://lxr.free-electrons.com/source/drivers/net/wireless/brcm80211/brcmsmac/phy/phy_lcn.c
    Go to the function wlc_lcnphy_periodic_cal( ) and you can see what is going on there.
    My theory is that when the chip is doing calibration, it is probably going deaf, so the calibration code sends the CTS to ensure no clients are sending any frames to the AP.

    Cheers! And a big thank you for your Excel sheet about SSID vs overhead.

    Thanh

  9. iPhone 4s WIFI issue is a hardware issue and the chip needs to be replaced/repaired. so far, the only one that does this worldwide and permanent is podmod.de

  10. I know it's been awhile for this post, but we've been going crazy with this particular problem. In addition to the iPhone sending a bogus 10000 microsecond CTS-to-Self, the iPhone won't actually ACK any packets for almost 40ms after this is sent. The huge kicker is that the phone won't pass the packet to an app for nearly 400ms! That's a killer.

    We are the developers of a low-latency audio app that streams over UDP, and this is causing a huge mess for us.

    I'd love to know if anybody's found out anything more about it. We've filed a bug report with Apple, but who knows how long that will take to resolve.
