Thursday, October 14, 2010

Cisco CleanAir Review

Cisco's CleanAir system integrates spectrum analysis into wireless access points to provide real-time, always-on visibility into external non-Wi-Fi sources of interference present in the environment.

My organization was involved in the beta evaluation of the CleanAir product, and the product has been released for several months now to all customers. However, I have refrained from posting on this product until now in order to be able to provide conclusions on the product from live "real-world" deployments.

Having recently deployed the CleanAir product in two production facilities, I would like to now share some of our results and findings.

Brief Overview of CleanAir
Released last spring, this system is available to customers as of the 7.0 version of code and requires the newer 3500 series wireless access points. The 3500 AP series hardware has been augmented with a dedicated spectrum analysis chipset to detect and report sources of interference. The AP reports findings up to the wireless controller, where the information can be integrated into the Radio Resource Management (RRM) feature set to automatically optimize the network channel and power settings to avoid severe and/or persistent sources of interference.

A funny anecdote about the "CleanAir" product name - many non-technical individuals being briefed on the technology originally thought it referred to either a.) a "green IT" initiative or b.) removing foul smells from the air.

The SAgE (spectrum analysis engine) chipset is architected in parallel with the Wi-Fi chipset and does not impede wireless performance. If the incoming energy is recognized as a Wi-Fi signal (specifically the Wi-Fi preamble), it is sent to the Wi-Fi chipset in the AP. If not, it is passed to the SAgE chipset for spectrum analysis.

Many administrators (and even some engineers) confuse the meaning of "interference" to include medium contention from other nearby Wi-Fi networks. This is not correct. Strictly speaking, "interference" is non-Wi-Fi energy. The CleanAir system only attempts to measure, identify, and classify sources of non-Wi-Fi interference. This is evident in the basic CleanAir chipset architecture, essentially splitting incoming signals to either the Wi-Fi or SAgE chipset, but never both.

Air quality index (AQI) is an inverse measure of how much interference is in the environment. Air quality is at 100% when no interference is present, and is reduced based on the energy strength and duty cycle (airtime occupied) of interference sources. The AP takes SAgE samples every second, calculates AQI every 15 seconds, and summarizes it into 30-second intervals, which are then reported up to the controller every 15 minutes (by default). The exception is when an administrator is actively monitoring an AP radio interface from the WCS or WLC: the AP is then automatically instructed to switch into a rapid update mode, which shortens the reporting period to 30 seconds to provide more real-time information.
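
To make the reporting cadence concrete, here is a minimal Python sketch of the arithmetic. The timing values are the defaults quoted above; the constant and function names are my own and do not correspond to anything in the WLC or WCS.

```python
# Timing defaults as quoted in this post; all names here are illustrative only.
SAMPLE_PERIOD_S = 1                  # SAgE sample interval
AQI_CALC_PERIOD_S = 15               # AQI is calculated every 15 seconds
SUMMARY_PERIOD_S = 30                # AQI is summarized into 30-second intervals
DEFAULT_REPORT_PERIOD_S = 15 * 60    # normal reporting period to the controller
RAPID_REPORT_PERIOD_S = 30           # rapid update mode while an admin is monitoring


def cadence(report_period_s):
    """Return how many items of each kind roll up into the next level."""
    return {
        "samples_per_aqi": AQI_CALC_PERIOD_S // SAMPLE_PERIOD_S,
        "aqi_values_per_summary": SUMMARY_PERIOD_S // AQI_CALC_PERIOD_S,
        "summaries_per_report": report_period_s // SUMMARY_PERIOD_S,
    }


default_cadence = cadence(DEFAULT_REPORT_PERIOD_S)
rapid_cadence = cadence(RAPID_REPORT_PERIOD_S)
print("default:", default_cadence)
print("rapid:  ", rapid_cadence)
```

In other words, a default report covers thirty 30-second summaries, while rapid update mode sends each summary as soon as it is complete.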

A new RRM component, called Event-Driven RRM (EDRRM), allows the controller to take immediate action to mitigate severe interference issues rather than waiting for the RRM configured interval to take action. The sensitivity threshold determines the AQI value for an individual AP radio that is required in order for EDRRM to kick into effect and make an adjustment in order to avoid the source of interference. Three threshold settings are available to control what AQI value triggers RRM events: High Sensitivity requires AQI to fall below 60, Medium Sensitivity requires AQI below 50, and Low Sensitivity requires AQI below 35. Additionally, air quality SNMP trap alarms are sent when the AQI drops below a value of 35 (by default).
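
The threshold logic above can be sketched in a few lines of Python. The numbers are the ones listed in this post; the function names and the dictionary are purely illustrative, not controller code.

```python
# AQI thresholds per EDRRM sensitivity setting, as listed in the post.
EDRRM_THRESHOLDS = {"high": 60, "medium": 50, "low": 35}

# Default AQI value below which an air quality SNMP trap is sent.
AQ_TRAP_THRESHOLD = 35


def edrrm_triggered(aqi, sensitivity):
    """True if the radio's AQI has fallen below the threshold for this setting."""
    return aqi < EDRRM_THRESHOLDS[sensitivity]


def aq_trap_raised(aqi, trap_threshold=AQ_TRAP_THRESHOLD):
    """True if the AQI is low enough to raise the air quality trap."""
    return aqi < trap_threshold


# An AP radio reporting an AQI of 55:
for setting in ("high", "medium", "low"):
    print(setting, edrrm_triggered(55, setting))
```

With an AQI of 55, only the High Sensitivity setting would trigger an EDRRM event; Medium and Low would leave the channel alone.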

Pseudo-MACs (PMAC) are used to correlate interference sources being detected by multiple APs and merge report information on the device, which is likely to be the case in most enterprise deployments. CleanAir has to detect the interferer for a long-enough period of time (classification requires 5-60 sec. of activity) in order to correlate an interferer as the same device being detected by multiple APs. Since interference sources do not have MAC addresses, a pseudo-MAC is created to uniquely identify each interference source. "Clustering" is used to represent a merged record for an interference source from multiple APs. Currently, cluster information is discarded once the detected energy source stops, and is not persistent for any length of time after the interference stops or is removed from the environment.
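
As a rough illustration of the merge step only, the Python sketch below groups per-AP reports that share a pseudo-MAC into one cluster. How CleanAir actually derives a PMAC is not covered here; the record layout, PMAC strings, and AP names are all hypothetical.

```python
from collections import defaultdict


def cluster_by_pmac(reports):
    """Merge per-AP interferer reports that carry the same pseudo-MAC."""
    clusters = defaultdict(lambda: {"device_type": None, "aps": []})
    for report in reports:
        cluster = clusters[report["pmac"]]
        cluster["device_type"] = report["device_type"]
        cluster["aps"].append(report["ap"])
    return dict(clusters)


# Hypothetical reports: one DECT phone heard by three APs, one microwave by one AP.
reports = [
    {"ap": "AP1", "pmac": "pmac-1", "device_type": "DECT phone"},
    {"ap": "AP2", "pmac": "pmac-1", "device_type": "DECT phone"},
    {"ap": "AP3", "pmac": "pmac-1", "device_type": "DECT phone"},
    {"ap": "AP3", "pmac": "pmac-2", "device_type": "Microwave oven"},
]

clusters = cluster_by_pmac(reports)
for pmac, cluster in clusters.items():
    print(pmac, cluster["device_type"], cluster["aps"])
```

When the merge works, the administrator sees two devices here rather than four separate entries.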

Persistent device avoidance allows the CleanAir system to recognize devices that are fixed in position and unlikely to move, and to avoid recurring interference issues in the areas affected by such devices. The interference sources may be continuous or periodic in nature, but either way are likely to repeatedly impact the same physical area over and over again. Examples include microwave ovens and mounted video cameras. CleanAir recognizes these persistent devices and instructs nearby APs to operate on alternate channels even if the persistent device is no longer observed. Only after a persistent device has been absent for more than 7 days does the CleanAir system allow APs in the affected areas to re-use those channels.
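
The 7-day rule can be expressed as a simple timer check. This Python sketch assumes the system tracks a "last seen" timestamp per persistent device; that bookkeeping and the function name are my assumptions, and only the 7-day window comes from the description above.

```python
from datetime import datetime, timedelta

# A persistent device must be absent for more than 7 days before the channel is re-used.
PERSISTENCE_WINDOW = timedelta(days=7)


def channel_still_avoided(last_seen, now):
    """True while the persistent device was last seen within the 7-day window."""
    return now - last_seen <= PERSISTENCE_WINDOW


# Hypothetical example: a microwave oven last detected at noon on October 1.
last_seen = datetime(2010, 10, 1, 12, 0)
print(channel_still_avoided(last_seen, datetime(2010, 10, 5, 12, 0)))
print(channel_still_avoided(last_seen, datetime(2010, 10, 8, 12, 1)))
```

The first check falls inside the window, so the channel stays off limits; the second is just past 7 days, so nearby APs may use the channel again.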

Reporting is tied into Cisco's WCS management platform and location tracking is performed through the Mobility Services Engine (MSE) context-aware service. WCS provides a central dashboard for administrative staff to monitor network performance, view historical interference trending data, and identify the location of the offending interferer when coupled with the MSE appliance for easy removal of the offending device.

For more information on the CleanAir feature set, see these excellent sources of information:
Cisco CleanAir Design Guide (The Definitive Resource from Cisco on CleanAir)

Deployment and Setup
Deployment and configuration of CleanAir are intuitive and straight-forward. The following steps are involved when implementing the product:

1. Upgrade WLC code to version 7.0 or later

2. Install CleanAir capable access points (3500 series). Note - Cisco does not recommend a "salt-and-pepper" approach to CleanAir AP deployment with other APs. This is because EDRRM can only take action with CleanAir capable APs and does not currently affect the broader RRM ecosystem. Therefore, other APs would not benefit from spectrum data reported by nearby CleanAir APs.

3. Configure CleanAir Settings for each Network Band


4. Configure EDRRM in the RRM > DCA Section for each Network Band

To view the configuration of CleanAir on the system, issue the show 802.11b cleanair config command (substitute '802.11a' to see the config for the 5GHz band).

5. Optionally, Configure the MSE to Track Location for Interference Sources


6. Monitor Interference Activity 

From the WLC (Monitor > Cisco CleanAir):




From the WCS Dashboard:


From WCS Maps (if using the MSE to locate interference sources) (Monitor > Maps):


7. Monitor EDRRM Activity from WCS (Monitor > RRM)


8. Create Interference Reports from WCS (Reports > Report Launch Pad)


Real-World Findings

- Interference Detection - CleanAir has been adept at accurately finding and reporting on multiple sources of interference in our deployments. In one environment, which consists of carpeted office space, it discovered numerous DECT devices (likely desk phone wireless headsets) as well as microwave ovens. It was amazing to see just how many floors and areas of the building have leaky microwaves! A real eye-opener. In another environment, which is warehouse space, it found microwave ovens in breakrooms and Bluetooth devices, most likely cell phones carried by employees and active in their pockets while they work.

Overall, validation of CleanAir findings with laptop-based spectrum analysis has confirmed the devices and severity levels being reported in our production environments.

- Numerous Similar Interference Entries - Some interference sources are reported multiple times because the PMAC cluster and merge process does not seem to work. We have only experienced this for a few types of interferers, most notably DECT phones. It is annoying to see a list of 5 DECT phones when in reality only one exists but is being detected by 5 APs. More information on this in the "Opportunities for Improvement" section below.

- Good Coverage - Reception of Wi-Fi signals and interference energy exceeds the range for APs and clients to communicate reliably. So, network designs for data and voice should have no problem providing adequate coverage and visibility for CleanAir spectrum analysis.

As Cisco states in their CleanAir Design Guide, "The technology has been designed to compliment the current best practices in Wi-Fi deployment. This includes the deployment models of other widely used technologies such as Adaptive wIPS, Voice, and location deployments." 

- Well-Tuned EDRRM - We have found that EDRRM events are rare, and really only occur when interference is severe enough to warrant a channel change to improve client performance. Fears of change with reckless abandon are unfounded, and network operation has been stable.

Benefits of CleanAir
The benefits of integrated spectrum analysis, and the CleanAir system in particular, include the following.

Event-Driven RRM - allows the controller to take immediate action to avoid severe sources of interference, translating into reduced network downtime, improved performance for clients, and faster time to resolution for client-impacting incidents. The Air Quality Index (AQI) drives Event-Driven RRM to make on-demand changes. An example would be a wireless video camera with strong narrowband interference that effectively kills network operation on the channel.

Persistent Interferer Avoidance - allows the controller to recognize sources of interference that may be lower-severity, yet occurring in a repeated fashion and degrading network performance. By tracking these repeating interference events, the network can pro-actively avoid such problems. An example would be a microwave oven that only gets turned on during lunch breaks but still needs to be avoided all the time so channel changes don't occur every day over the lunch hour.

Enhanced Network Visibility - Monitoring air quality through the WLC and WCS is easy and intuitive. The WCS dashboard provides quick snapshot information for administrators checking in on network operation. Air quality reports allow scheduled review of all activity in the environment. Should severe interference sources be found, administrators now have the visibility within their toolbag to positively confirm or deny the presence of interference, rather than trying to diagnose issues from client-reported symptoms. This is tremendously beneficial for removing uncertainty and speculation around Wi-Fi performance issues, as well as for removing offending devices from the environment to prevent future issues.

Accurate Device Classification - With a dedicated spectrum analysis chipset, the network gains precise and accurate information on non-Wi-Fi sources of interference. Other vendor solutions aiming to provide spectrum analysis capability rely on the Wi-Fi chipset itself to report on non-Wi-Fi energy. The problem with that approach is that Wi-Fi chipsets are designed primarily to modulate and de-modulate Wi-Fi signals, not to identify other sources of energy. Spectral resolution is also vastly superior with a chipset dedicated to spectrum analysis, which allows CleanAir to accurately identify spectral signatures to classify devices and report accurately on energy strength and duty cycle. Solutions based on Wi-Fi chipsets can take a guess, at best. This is especially true of narrowband interference sources or frequency-hopping wireless systems, where more granular spectrum resolution bandwidth can identify individual hopping patterns, as can be experienced with Bluetooth type devices for example.

Cisco's CleanAir spectral resolution is documented at 78KHz (on a 20MHz channel dwell) and 156KHz (on a 40MHz channel dwell), versus a standard Wi-Fi chipset at 312KHz. In addition, even other spectrum cards such as the MetaGeek Wi-Spy 2.4i at 373KHz resolution and AirMagnet Spectrum XT at 156.3KHz aren't as accurate as CleanAir. Also of note: the rated spectral resolution bandwidth of the Cisco Spectrum Expert laptop card is a minimum of 10KHz, so CleanAir is not quite as accurate as the laptop card, and engineers may notice slight display differences between the two products.

Update: The newer Wi-Spy DBx product has much better resolution bandwidth, rated at 24KHz. The Wi-Spy 2.4i resolution bandwidth is 373KHz, not 328KHz as originally posted. I also added links to the AirMagnet Spectrum XT, Cisco CleanAir, and Cisco Spectrum Expert datasheets as requested by some readers.


Update 2: It appears that the resolution bandwidth listed in the CleanAir Design Guide - Glossary is inaccurate, swapping the values for 20MHz versus 40MHz dwell times. The information above has been updated to reflect the correct values.
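
One way to get a feel for the resolution figures quoted above is to divide the dwell width by the resolution bandwidth, which gives roughly how many frequency bins span the channel. The Python sketch below does only that arithmetic. Pairing the 312KHz Wi-Fi chipset figure with a 20MHz channel is my assumption, and the bin counts are a back-of-the-envelope comparison, not a Cisco specification.

```python
def bins_across_dwell(dwell_hz, rbw_hz):
    """Approximate number of resolution-bandwidth-wide bins across the dwell."""
    return dwell_hz / rbw_hz


# (dwell width in Hz, resolution bandwidth in Hz), using the figures from this post.
figures = {
    "CleanAir, 20MHz dwell": (20e6, 78e3),
    "CleanAir, 40MHz dwell": (40e6, 156e3),
    "Wi-Fi chipset, 20MHz channel (assumed)": (20e6, 312e3),
}

for label, (dwell_hz, rbw_hz) in figures.items():
    print(f"{label}: ~{bins_across_dwell(dwell_hz, rbw_hz):.0f} bins")
```

By this rough measure CleanAir resolves on the order of 256 bins across the dwell in either mode, versus about 64 for a Wi-Fi chipset on a 20MHz channel.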

Remote Troubleshooting - Placing the AP into SE-Connect mode allows a Wi-Fi engineer to remotely connect to the AP and view real-time spectrum analysis information through their workstation with Cisco Spectrum Expert software installed. This reduces the need for expensive on-site travel by an engineer, decreases time to resolution of incidents, and improves central IT staff's ability to troubleshoot remote branch offices.

Opportunities for Improvement
For a first-generation product, Cisco seems to have nailed CleanAir. However, there are a few features that could improve the solution as it stands today.

Unclassified Interferer Reporting - Currently, CleanAir only reports on interference sources that it can classify. This is also reflected in the AQI value for each AP radio. Any sources of interference which cannot be classified are not reported in the device list and do not affect AQI. They are, however, visible in the WLC via the Air Quality Graphs for individual radios. This behavior is on purpose because CleanAir is specifically architected to classify specific non-Wi-Fi interference sources (currently 20+) and not to speculate on unknown energy.

I agree with this approach when it comes to EDRRM change activity, but disagree when it comes to reporting and alarms. CleanAir should be enhanced to give network administrators more visibility into unknown energy sources through automated AQI reporting and alerting that raises a red flag for a human to investigate the source. Perhaps differentiating AQI between classified and unclassified device severity would still allow EDRRM to be based only on the sources which have been classified, yet allow administrator visibility into air quality that takes into account all energy being detected.

In addition, 802.11b DSSS/CCK modulation poses problems with detection because adjacent channel activity is hard to classify as either Wi-Fi or interference due to the spread spectrum modulation where most of the signal is around the center channel frequency, causing problems detecting side-lobe activity. This problem may be feeding energy to the SAgE chipset rather than the Wi-Fi chipset and resulting in some amount of detected energy remaining unclassified by CleanAir.

Off-Channel Interference Scanning - The current RRM channel scanning process uses short dwell times and is already being used for neighbor discovery, rogue scanning and aWIPS. Spectrum information is collected during these times, but the dwell does not provide enough time to reliably classify devices; therefore data collected during off-channel scanning is suppressed by the system. One option is to deploy monitor mode APs, which spend significantly longer dwell times on channels, allowing CleanAir adequate time to detect and classify interference sources. Another option is to use on-demand directed off-channel scanning. This feature would allow an AP to detect interference sources on off-channels during RRM scanning, or receive reports of interference sources on other channels from nearby APs through the controller, and queue up other channels to scan when client traffic activity is low.

Duplicate Interference Source Entries - PMAC does not always work as expected, and "bouncing" may occur when the interferer comes and goes faster than CleanAir can classify the source, which results in multiple PMACs being created for the same device or detecting APs being listed as "unknown" (location may still be fairly accurate, but reported information may be incomplete). An enhancement should be made to retain detected energy "clusters" for a period of time to better correlate a single interferer that is intermittent into a single PMAC.

Local Mode CleanAir - An enhancement to the CleanAir product should be made to allow remote troubleshooting to be performed without changing the mode of the AP, allowing it to continuously serve clients while remote spectrum analysis is performed. The SE-Connect mode should be discarded in favor of real-time spectrum analysis in Local mode.

A Note on Beamforming
One competitor has claimed that integrated spectrum analysis is great and all, but most performance issues come from the Wi-Fi network stomping on itself through medium contention among co-channel access points, and that dynamic beamforming can avoid non-Wi-Fi sources of interference.

Using beamforming to reduce Wi-Fi medium contention, as well as to reduce received signal strength from interference sources, can lessen the negative impact on network performance to some extent, but it cannot eliminate that impact completely. Beamforming therefore does not obviate the need for integrated spectrum analysis. In an ideal world, a product would have both.


Sure, changing channels disrupts the client session and should be avoided if possible, and beamforming may provide a bit more SNR and wiggle room to avoid a channel change. But there will inevitably be cases where the strength of the interfering device is so overwhelming (narrowband video cameras, for example) that it completely wipes out the channel, even with beamforming on the APs. In those instances would you rather have 1.) beamforming without integrated spectrum intelligence, which just sits there confused and inoperable, or 2.) visibility into the issue and the ability to take corrective action to change channels and get the clients working again? Yeah, option number 2 is my choice too - get clients working again without manual intervention!


End result: integrated spectrum analysis is still a beneficial feature and cannot be discounted by vendors with dynamic beamforming capability.

Conclusions
The current CleanAir feature is still in its infancy, yet it already provides a great foundation for wireless network administrators to gain visibility into external sources of network problems. This feature is of great value to administrators today, and will only become more important as Wi-Fi networks become more pervasive and mission-critical to business operations.

Here are a few reasons why integrated spectrum analysis will be required to support mission-critical wireless networks now and into the future:

- Unlicensed spectrum use is growing. More devices, more uses, more potential for interference!

- Voice over Wi-Fi performance requires a well-tuned wireless network. Part of that is having some semblance of control over factors outside the realm of network control. Having visibility into these factors allows administrators better control over their environment to eliminate outside influences where required.

I have found Cisco's CleanAir to be a great initial product offering for integrated spectrum analysis. With the emergence of Wi-Fi networks as mission-critical to business operations, the maturation of voice over Wi-Fi requiring a highly stable and well-performing network, and growing use of unlicensed spectrum by devices of all types, integrated spectrum analysis will give organizations much-needed visibility into external sources of problems. This will allow organizations to solve performance problems rather than "speculate" as to the root-cause.

-Andrew

PS - Thank you Joel, Darrin, and Pete for support during the beta evaluation and production deployments of this product! This post is specifically dedicated to your teams for this (and other) reasons previously discussed ;)

15 comments:

  1. Andrew,
    You wrote an excellent post on CleanAir, this is something I will refer to later whenever I have a chance to put my hands on the technology. I've been using both Cisco's Spectrum Expert Cardbus and AirMagnet's Spectrum XT and find these great for troubleshooting. But deploying a full scale AP 3500 scenario provides a proactive environment where one can pinpoint problems as they occur.

  2. Andrew,

    Thanks for the great write up. We are about to start our first deployment in a couple of weeks. It is encouraging to hear about your real-world success and limited concerns. Now if I can only make it through the design guide without falling asleep.

  3. Thanks for the great review of CleanAir from Cisco. I learned a lot that I didn't know. The best kind of read!

    Well done.

  4. Andrew, Thanks for the post.

    The MetaGeek Wi-Spy's resolution that you posted is the resolution of the entire 2.4 GHz band and not 40MHz. I'll admit the Wi-Spy may not be as fast as the Cognio or Spectrum XT, but if all you care about is resolution, see here:

    http://i51.tinypic.com/213irnm.png

    You talked a lot about hypothetical situations, but if someone brought in a narrow band av transmitter wouldn't you be better off fixing it manually than waiting 7 days for your set up to fix itself?

  5. Hi cutlerite,
    Resolution bandwidth is only one important component in a spectrum analyzer solution. In your referenced image, it would be the "Res BW" field, not the step size. Also, looking at the Metageek products, my post referenced the older Wi-Spy 2.4i model. I will update the post to include the newer Wi-Spy DBx product, which has better resolution down to 24KHz.

    Also, CleanAir does not wait 7 days to mitigate an interference source; it does so immediately IF the source causes the AQI to drop below the configured sensitivity threshold. The 7 day period is a wait timer before an AP will be allowed to use a channel on which a persistent interference source has been detected. The persistent interferer must not be detected for 7 days, then the AP will consider using that channel again. In effect, CleanAir waits to make sure that fixed/stationary interferers are really gone before trusting that channel in the surrounding area again.

    Thanks for your input!
    Andrew

  6. Hi Andrew,

    Great post. I would like to clarify the remarks on resolution. 78 KHz is for a 20 MHz dwell and applies to the full 2.4 GHz and 5 GHz spectrum. 156 KHz is for a 40 MHz dwell - which is wider - and so is double. 10 KHz resolution is available on the 3500 AP - it is not exposed to the customer for various reasons - but is in the hardware.

  7. Good catch! yeah sorry about that. That is the field I had to change before adjusting the filter bandwidth. It was a hasty mistake.

    I should have clarified my question. Let's say there is an AV transmitter disrupting transmission on Wi-Fi channel 1, all of the affected APs must move to channels 6 and 11 increasing co-channel interference. Must you deal with that for 7 days?

    It seems to me that if you have serious interference issues, you must go to the location of it and eradicate the source - no matter what.
    Once you have gotten rid of the device, why wait 7 days?

    I really like what spectrum analysis can add in the AP, but 7 days seems like an excessive wait time - especially in a mission critical Wi-Fi set up.

  8. Hi cutlerite,
    It is important to remember that only some device classifications are considered "persistent" devices and will be remembered for 7 days. Since these types of devices cause severe issues and are very likely to recur, CleanAir can be configured to avoid them using the persistent device avoidance feature. This feature can be turned off on the 802.11a/b/g/n RRM > DCA page, but you risk having the AP go back to that channel and being affected by the same interferer again.

    Essentially, it's choosing the lesser of two evils: complete channel disruption by the interferer, or more co-channel contention, which is not as harsh to clients.

    Also, to clear persistent devices prior to the 7 day mark, Cisco states that the AP can be rebooted to clear its list on both radios. Unfortunately, there is no method to simply clear this list without rebooting the AP. The list can be viewed from Wireless > 802.11a/b/g/n Radios > AP Radio Blue Drop Down > CleanAir RRM.

    I agree that once severe sources of interference are found, they should be removed by the administrator. That's part of the real benefit of CleanAir, giving administrators visibility into interference sources for removal. EDRRM is really a mitigation technique that should be used temporarily by the system only until the administrator removes the source of the problem.

    Andrew

  9. Hi James,
    Thanks for the clarification. It looks like the CleanAir Design Guide on Cisco's website has incorrect information on the resolution bandwidth in the Glossary section. It has the values backwards, the same as in my original post. Perhaps this is something that you can get updated?

    http://www.cisco.com/en/US/products/ps10315/products_tech_note09186a0080b4bdc1.shtml#t9

    I will update my post accordingly.

    Andrew

  10. Andrew,

    This is a killer write-up. Nice work. I loved your comments on how they could improve the product and how thorough your analysis was. I'm left only with one comment/question:

    Since there have been no 3rd party verified comparisons between Wi-Fi chipset-based spectrum analysis and Cisco's dedicated chipset, how do we know that one is better than the other in real-world application?

    Stated differently, for clarity:

    Is Wi-Fi chip-based analysis accurate and granular enough to make appropriate and accurate RRM changes and to appropriately visually display the RF environment?

    Thanks!

    Devin Akin
    Chief Wi-Fi Architect
    Aerohive Networks

  11. Great review Andrew,

    While I understand Cisco's stance on not intermingling CleanAir with non-CleanAir APs, it still seems valid to deploy an overlay model with a single central CleanAir AP (perhaps in dedicated Monitor or SE-Connect mode) per floor as a detection strategy. This has been the approach of other IDS vendors including AirMagnet if locating a source and mitigating with EDRRM are not priorities. Would external antennas increase its interference detection capabilities?

  12. Hi Devin,
    Like you stated, there are no 3rd party comparisons available between Wi-Fi chipset and dedicated chipset spectrum analysis accuracy that I am aware of.

    However, from my experience with traditional RRM from multiple vendors, Wi-Fi chipsets do a good job of detecting other Wi-Fi neighbors and adjusting channel and power accordingly.

    We tested with a few different interference sources (video camera, baby monitor) and the Wi-Fi chipset could only detect a raised noise floor. Compared to a spectrum analysis laptop card, it appeared to be vague in determining the severity. Also, reacting to interference seemed slow since the noise floor value bounced around quite a bit. I can only suspect that the Wi-Fi chipset had a hard time determining whether or not to make an adjustment because of this. Traditional RRM was forced to wait for the next slotted RRM update interval, which also made reaction slower. Typically, I have seen companies set RRM intervals anywhere from 10 minutes to 24 hours.

    Can you imagine detecting interference and having to wait upwards of 24 hours to adjust settings? Ouch!

    In my opinion, CleanAir does seem to do a much better job of being able to classify interference sources. This gives it much better accuracy and gives the system the inherent "trust" in itself to make a change immediately. That's Event-Driven RRM's job, to make the change right now and avoid waiting for the next RRM interval.

    I think it all boils down to how trustworthy the data is, and I'm not sure I would trust network operational changes from interference based off Wi-Fi chipset data. Maybe that's just me.

    I think other vendors will eventually follow-suit and place dedicated spectrum chipsets into access points. If MetaGeek is any indication, dedicated chipset prices should be low enough to make this a possibility without a huge uptick in cost.

    -Andrew

  13. Hi WiFi guy,
    If interference detection is all you want, not automated network changes, then I think deploying an overlay would be feasible. If you're going to deploy an overlay anyways, I would bundle it with WIPS functionality too. That just makes sense.

    As far as external antennas, that should provide increased range for detection. The downside would be that interference from potentially very far away could be detected, which may or may not be impacting your network.

    I would deploy the spectrum sensor with the same gain as my infrastructure APs, or attempt to tune it so that it only detects interference that will impact your APs. If you throw a high-gain antenna on it and it detects distant interference sources, those may end up being false positives.

    -Andrew

  14. Hi Andrew,

    Good catch on the deployment guide - I think I can get that fixed. Now - as for third party comparisons, Check out the Miercom test report. It is located here.

    http://www.cisco.com/en/US/solutions/collateral/ns340/ns394/ns348/ns1070/Miercom_Report_DR100409D_Cisco_CleanAir_Competitive_for_22Apr10.pdf

    There just are not a lot of wi-fi based solutions actually released AND shipping at this time - I am sure that will change though.

  15. Thanks for the update James. It looks like CleanAir is a solid competitive advantage and differentiator for Cisco.

    -Andrew
