How Prevalent Are XSS Vulnerabilities?
How prevalent are Cross Site Scripting (XSS) vulnerabilities? Based on a recent experiment, I wasn't surprised to see that they're everywhere, and finding dozens at a time doesn't present much of a challenge. Back in September 2006, I sought to find empirical evidence of the prevalence of SQL Injection flaws. I blogged about my effort to leverage the Google API to find such evidence and it quickly became one of my more popular postings. Since then, I've wanted to conduct a similar experiment to investigate the prevalence of XSS vulnerabilities and have finally found the time to do so.
Mitre CVE statistics tell us that XSS is now the most common vulnerability, accounting for 21.5% of all newly discovered vulnerabilities in 2006. This is an important statistic but it only tells us what is being discovered in commercial and open source software, not what actually exists out there in the abyss we know as the Internet. When it comes to web application vulnerabilities, what's actually deployed is far more meaningful. Web apps commonly contain custom code, and vulnerabilities in custom code don't get CVE numbers. What I'm looking for is statistics on live, publicly accessible web apps.
Search Terms
Google is a powerful tool. It can help you make dinner reservations, but it can also help you find vulnerable web sites, so in my quest to look for vulnerable sites I once again sought assistance from my old friend. In order to leverage Google, you need one thing - a search query. But what search terms would assist in identifying sites potentially vulnerable to XSS? XSS flaws exist because user-supplied input is not properly sanitized before being included in a dynamically generated web page. That weakness in turn allows attackers to inject client-side script into the page. Therefore, what I needed were search terms that would allow me to identify requests containing user input and web pages that echoed back that same input. I chose to target search pages using GET requests. Search pages are common victims of XSS, and because a GET request carries the user-supplied values in the URL, those values would also appear in the Google index. I ultimately chose the following terms:
- inurl:"search=xxx" intext:"search results for xxx"
- inurl:"query=xxx" intext:"search results for xxx"
- inurl:"q=xxx" intext:"search results for xxx"
The 'xxx' within each query was replaced with various words and letters such as 'the', 'microsoft', 'clock', 'd', etc. Some had meaning, some were chosen simply because it was 3am and it was all that I could come up with. They were, however, purposely chosen to be very different from one another, in the hope of drawing results from unique websites. By including 'inurl' queries, I was able to target variables within the URL sent via a GET method. By combining that with an 'intext' query, I could look for the same value being included within the page itself, a necessary ingredient for an XSS vulnerability. Overall, the search terms were designed to identify search result pages that were echoing back the user-supplied query.
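The substitution described above can be sketched in a few lines of Python. This is only an illustration, not the code from the original tool: the function name `build_queries` and the `PARAM_NAMES` list are assumptions made for the example, and the sample words are the ones mentioned in the text.

```python
from itertools import product

# Parameter names taken from the three search terms listed above.
PARAM_NAMES = ["search", "query", "q"]


def build_queries(words):
    """Return one Google query string per (parameter, word) pair.

    Hypothetical helper: it only fills the 'xxx' placeholder; it does
    not talk to Google.
    """
    return [
        f'inurl:"{param}={word}" intext:"search results for {word}"'
        for param, word in product(PARAM_NAMES, words)
    ]


if __name__ == "__main__":
    for query in build_queries(["the", "microsoft", "clock", "d"]):
        print(query)
```

Four words and three parameter names yield twelve queries, each pairing an 'inurl' term with the matching 'intext' term.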
Automation
Rather than manually run Google queries ten results at a time, I automated the process by once again turning to the Google API, a programmatic interface which would allow me to build a tool to automate interaction with Google. To build the Google XSS tool I simply made a few modifications to the Google SQL Injection tool built back in September 2006. The only real change involved altering code to clean up the results given the different search terms that were used. Eliminating duplicates turned out to be a real challenge in this experiment. I wanted to ensure that I was testing only one page from each unique website, and although I started out with 7,436 search results, my ultimate population was quickly whittled down to 288 when I imposed the restriction of requiring unique sites. This happened because Google does not limit the number of results that can come from a given site, and my search terms were specific enough that they kept drawing from the same web sites.
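One way to impose the one-page-per-site restriction is to key each result on its host name. The sketch below is an assumption about how that could be done, not the tool's actual logic: `one_url_per_site` is a hypothetical helper, and treating `www.example.com` and `example.com` as the same site is a simplification chosen for the example.

```python
from urllib.parse import urlsplit


def one_url_per_site(urls):
    """Keep the first URL seen for each host, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Assumption: a leading "www." does not make a different site.
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            unique.append(url)
    return unique


if __name__ == "__main__":
    results = [
        "http://www.example.com/find?q=the",
        "http://example.com/find?q=clock",
        "http://news.example.org/?search=the",
    ]
    print(one_url_per_site(results))
```

Different subdomains are still counted as different sites here; a stricter definition of "unique site" would shrink the population further.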
Detection
Once the target URLs had been identified, it was necessary to devise a test that would determine if the page was vulnerable. XSS is commonly tested by submitting a request that includes code to produce a JavaScript pop-up window such as <script>alert('xss');</script>. This code presents two problems for our experiment. First, we would defeat the purpose of producing an automated test if it required sitting in front of a computer screen trying to visually identify pop-up windows. Second, many sites implement inadequate blacklist filtering. Although these pages remain vulnerable, such a filter is most likely to catch this particular request because it is so widely used. What I needed was a way for the resulting web page to 'phone home' when a vulnerability was identified. This could certainly be done using JavaScript, but a far simpler solution exists in standard HTML. IMG tags request content from alternate locations as the page is rendered. Moreover, because exploitation occurs on the client side, not the server, I can use an IMG tag to request resources on my local network. In the end I settled on the following IMG tag:
- "%3cimg+src%3dhttp%3a%2f%2flocalhost%2fxss-" + host + "%3e"
The above is a URL encoded version of an image tag pointed at a non-existent page on my local web server. The Google XSS tool is also dynamically inserting the name of the targeted host into the URL. By doing this, identifying sites vulnerable to XSS is as simple as looking at the log files on my local web server. If a site is vulnerable, the host will show up in the web server log. For example, an unencoded URL for testing would look like the following:
- http://vulnerable.com?search=<img src=http://localhost/xss-vulnerable.com>
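Building the test request from a search result can be sketched as follows. This is a minimal illustration under stated assumptions, not the original tool: `build_payload` and `build_test_url` are hypothetical names, the parameter to overwrite is passed in by the caller, and Python's `quote_plus` emits uppercase hex digits (`%3C` rather than `%3c`), which is equivalent in a URL. The sketch encodes the complete tag, including the opening angle bracket.

```python
from urllib.parse import parse_qsl, quote_plus, urlsplit, urlunsplit


def build_payload(host):
    """URL-encode an IMG tag that points at the local web server."""
    return quote_plus(f"<img src=http://localhost/xss-{host}>")


def build_test_url(url, param):
    """Replace the value of `param` in `url` with the encoded IMG tag."""
    parts = urlsplit(url)
    payload = build_payload(parts.hostname)
    pairs = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == param:
            # The payload is already encoded, so it is inserted as-is.
            pairs.append(f"{quote_plus(key)}={payload}")
        else:
            pairs.append(f"{quote_plus(key)}={quote_plus(value)}")
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(pairs), parts.fragment)
    )


if __name__ == "__main__":
    print(build_test_url("http://vulnerable.com/?search=the", "search"))
```

Because the host name is embedded in the requested path, a single local log is enough to tell which site rendered the tag.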
If vulnerable.com were indeed vulnerable, our web server log files would include the following entry:
#Software: Microsoft Internet Information Services 5.1
#Version: 1.0
#Date: 2007-01-31 00:57:34
#Fields: time c-ip cs-method cs-uri-stem sc-status
00:57:34 127.0.0.1 GET /xss-http:/vulnerable.com 404
The URL encoding used in the actual request is merely a simple obfuscation technique designed to bypass basic validation routines that may check for certain characters but not their encoded equivalents. The results did reveal numerous sites that had some level of validation that would prohibit unencoded JavaScript requests but failed to filter our encoded IMG tag.
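Reading the results back out of the web server log can also be automated. The sketch below assumes a log in the format shown above, with a `#Fields:` directive naming the columns; `hosts_from_log` is a hypothetical helper, and stripping a leading `http:/` from the marker is an assumption made because the sample entry carries that prefix.

```python
import re

MARKER = "/xss-"


def hosts_from_log(lines):
    """Return the set of hosts whose pages requested the marker image."""
    fields = []
    hosts = set()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Column names for the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or "cs-uri-stem" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # Skip malformed entries.
        stem = values[fields.index("cs-uri-stem")]
        if stem.startswith(MARKER):
            # Drop a scheme remnant such as "http:/" if one is present.
            hosts.add(re.sub(r"^https?:/+", "", stem[len(MARKER):]))
    return hosts


if __name__ == "__main__":
    sample = [
        "#Fields: time c-ip cs-method cs-uri-stem sc-status",
        "00:57:34 127.0.0.1 GET /xss-http:/vulnerable.com 404",
    ]
    print(hosts_from_log(sample))
```

The 404 status is expected: the image does not need to exist, since the request itself is the signal.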
Results
Mitre tells us that 21.5% of new vulnerabilities are due to XSS. Jeremiah Grossman recently released a report stating that WhiteHat Security found XSS flaws in 71% of the websites they audited during the first half of 2006. RSnake suggests that the number should be closer to 80%. I believe all of them. In this simple experiment, which looked at a single input vector on each website and supplied only one XSS injection string, 17.3% of the sites accessible at the time of testing (47 of 272) were found to be vulnerable. That's scary. The raw results are below:
| Unique sites identified by Google | 288 |
| Unique sites accessible at time of testing | 272 |
| Sites with confirmed XSS vulnerabilities | 47 |
| Percentage vulnerable | 17.3% |
Who was vulnerable? In order to protect the innocent, I'm not going to name names, but I will paint a picture of what I saw. Given the search terms used, it's not surprising that the results included blogging, search, video sharing and news sites. There were a few retail web sites as well, including a couple of online music stores and a consumer electronics retailer. The sites ranged from small to large, with the two most notable participants being a major sports network and one of the largest newspapers in the US. Unfortunately, it's not surprising that even large corporations have vulnerable websites when you look at the names which litter the sla.ckers.org XSS Wall of Shame.
How long did it take me to identify 47 vulnerable websites? Once the methodology was in place and the tool was built - less than five minutes. Once again, we're setting the bar for web application security far too low. It shouldn't be this easy.
- michael
Erik is Sr. Director of Products for the Application Security Center.