How Prevalent Are SQL Injection Vulnerabilities?
[Update 01.31.07 - A follow up blog on the prevalence of XSS vulnerabilities has now been posted.]
[Update 01.17.07 - This blog is now also available as a webcast.]
Earlier this month, Mitre revealed that web application vulnerabilities have now claimed the top three spots on the CVE request list. Specifically, the ranking for 2006 is as follows:
- Cross Site Scripting (21.5%)
- SQL Injection (14%)
- PHP includes (9.5%)
- Buffer overflows (7.9%)
The statistics are significant because they provide evidence of the prevalence of web application vulnerabilities. Coverage of this issue has, however, been somewhat misleading, as reports have suggested that the ranking is a measure of what attackers are doing. This assumption is false. CVE requests represent the volume of discovered vulnerabilities in commercial and open source applications. They do not reflect the degree to which those vulnerabilities exist in the real world, nor do they reflect which vulnerabilities attackers are actually using to access vulnerable systems.
Web applications pose a unique threat because building them does not require skilled programmers. Anyone with access to various point-and-click tools is now a web developer. For that reason, I suspect that web application vulnerabilities are even more of a threat in the real world than the Mitre statistics would suggest. CVE numbers tell us that web application vulnerabilities are plaguing software developers, but they do not provide insight into vulnerabilities within custom-built sites.
Of the vulnerabilities discussed in the Mitre report, I feel that SQL injection represents one of the greatest threats for the following reasons:
- Any website worth going to has a database on the back end.
- Many development texts actually teach programmers insecure SQL syntax.
- Databases often house sensitive data such as credit card numbers, social security numbers and other personal data that phishers would love to get their hands on.
- Many sites are exposed to SQL injection attacks but don't know it.
It's that last bullet point that I wanted to prove to myself by compiling some sort of empirical evidence. How can I do that without breaking into a website? Hmmm....perhaps my friend Google will have the answer. After some contemplating, I came up with the following theory:
- Identify a population of web sites likely to have databases and determine the request syntax.
- For this requirement, I turned to Google and took a bit of a shortcut. I selected a portion of a GET request that would likely be used to query a database. For my purposes, I chose inurl:"id=10". This query was selected for two reasons. First, using a GET request would allow me to leverage a search engine to identify a population of URLs for testing. Second, a URL containing a query such as "id=10" is likely to be querying a database for something such as a product catalog. That query in Google returns nearly 6 million hits, so obtaining a sample population clearly wasn't going to be a problem.
- Submit an altered query in order to elicit a SQL error message.
- Once again, I took a relatively simplistic approach here by altering the query to inject a single quote ahead of the actual value. Therefore, the request would now be in the form of "id='10". This is a common technique when attempting to identify SQL injection vulnerabilities, as SQL queries enclose strings within single quotes. Injecting an extra quote will create a query with an open quotation mark, and this will often cause the application to return an error message. I took this one step further and URL-encoded the single quote to bypass sites that may be filtering for unusual characters. Therefore, the injected query was now "id=%2710".
- Parse all responses to look for signs of verbose SQL error messages.
- After much trial and error, I settled on three simple words that allowed for the identification of most SQL error messages. I would grep responses for "sql", "query" and "error".
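Steps two and three can be sketched in a few lines. This is a Python illustration rather than the C# tool described in this post; the function names `inject_quote` and `matched_keywords` are made up for the example, and case-insensitive matching is an assumption. The sketch only reports which of the three words appear, since the exact matching rule used by the tool is not spelled out here.

```python
from urllib.parse import urlsplit, urlunsplit

# The three words from the post; case-insensitive matching is an assumption.
KEYWORDS = ("sql", "query", "error")


def inject_quote(url, pair="id=10"):
    """Return the URL with a URL-encoded single quote ahead of the value."""
    parts = urlsplit(url)
    name, value = pair.split("=", 1)
    # Only touch the query string, and only the exact name=value pair,
    # so that something like id=100 is left alone.
    fields = parts.query.split("&")
    fields = [f"{name}=%27{value}" if field == pair else field for field in fields]
    return urlunsplit(parts._replace(query="&".join(fields)))


def matched_keywords(body):
    """List which of the keywords appear anywhere in the response body."""
    lowered = body.lower()
    return [word for word in KEYWORDS if word in lowered]
```

A response that matches some or all of the words is only a candidate. Ordinary pages can contain "query" or "error" for unrelated reasons, so each hit still needs a human look.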
With a theory in place, it was now time to build an automated solution in order to make the test practical. For that, I turned to the Google API. Using the API, I was able to build a C# application (shown below) to automate the steps listed previously. While building the application, I realized that a few extra steps would be required:
- Remove duplicate sites
- I started with a sample population of 1000 sites from my initial query. I realized, however, that Google would return some duplicate URLs that were from the same web site. These were filtered out, as I was looking to determine the prevalence of vulnerable web sites, as opposed to web pages.
- Remove bad queries
- Google will actually ignore many special characters such as "=", and I therefore ended up with some URLs which contained "id" and "10" but not the "id=10" name-value pair that I was looking for.
- Remove failed queries
- Just because a URL is indexed in Google doesn't mean that it will be accessible when you want to view it. I therefore also had to ensure that the tool was capable of recording any failed requests.
- Manual verification
- I realized that no level of grepping the results was going to eliminate all false positives. I therefore implemented a system that would allow me to quickly visually validate the results. I settled on a system that would highlight the responses that appeared to contain verbose SQL errors based on the grepping of results. As the full response was captured and stored in memory, I could then quickly view the response to determine if it was a false positive. What I soon realized is that it was important to look at both the interpreted HTML and source code as SQL errors were occasionally hidden in the source code and not visually displayed.
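The first three clean-up steps are mechanical and can be sketched as follows. Again, this is Python rather than the original C#, and the names `filter_population` and `probe_all` are hypothetical. Treating "same web site" as "same hostname" is an assumption, and it means `www.example.com` and `example.com` would count as two sites. The `fetch` argument stands in for whatever HTTP client is used.

```python
from urllib.parse import urlsplit, parse_qsl


def filter_population(urls, name="id", value="10"):
    """Keep one URL per host, and only URLs carrying the exact name=value pair."""
    seen_hosts = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if not host:
            continue  # not an absolute URL, so there is nothing to request
        # The search engine may match "id" and "10" separately, so check the pair.
        if (name, value) not in parse_qsl(parts.query, keep_blank_values=True):
            continue
        if host in seen_hosts:
            continue  # count web sites, not web pages
        seen_hosts.add(host)
        kept.append(url)
    return kept


def probe_all(urls, fetch):
    """Call fetch(url) for each URL, recording failures instead of stopping."""
    responses, failures = {}, {}
    for url in urls:
        try:
            responses[url] = fetch(url)
        except Exception as exc:  # broad on purpose: any failure is just recorded
            failures[url] = repr(exc)
    return responses, failures
```

Keeping the full response body for every URL, as the tool did, is what makes the later manual check quick: the flagged responses can be reviewed both rendered and as source.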
Using all that I'd learned from various manual experiments, I ultimately hacked together the tool shown below using C# and the Google API.
While far from perfect, this provided an easy to use tool that allowed me to quickly validate findings with a visual inspection. In the end, the results were as follows:
| Measure | Value |
| --- | --- |
| Initial population of URLs | 1000 |
| Population after removal of duplicate servers | 732 |
| Population after removal of failed requests | 708 |
| Total number of verbose SQL errors | 80 |
| Percentage of sample web sites potentially vulnerable to SQL injection attacks | 11.3% |
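For anyone checking the arithmetic, the percentage is taken against the 708 sites that actually responded, not the original 1000 URLs. A quick Python check using only the numbers from the table (the variable names are mine):

```python
initial = 1000          # URLs pulled from the search results
after_dedupe = 732      # one URL per server
after_failures = 708    # servers that actually responded
flagged = 80            # responses with a verbose SQL error

duplicates_removed = initial - after_dedupe        # 268
failed_requests = after_dedupe - after_failures    # 24

rate = flagged / after_failures
print(f"{rate:.1%}")    # 11.3%
```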
While I don't find this result surprising, it certainly is sobering. This was a simple test. It was certainly not a comprehensive audit of the web servers, which would no doubt have uncovered many more vulnerabilities. The experiment looked at only a single input vector on each of the web sites and did not take into account the possibility of blind SQL injection attacks. Granted, this test did not attempt to extract data from the databases, as that would be illegal, but it does, in my opinion, shine some light on just how poorly database connectivity is implemented on public websites and the dangers that it poses.
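To see why a single stray quote is enough to trigger the errors counted above, here is a minimal local sketch using Python's built-in `sqlite3` module. The `products` table, its contents and the two lookup functions are invented for the illustration, and the sites in the sample were of course running a variety of databases, not necessarily SQLite. The first function builds the query by pasting the value into the SQL text, which is the pattern that breaks; the second binds the value as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT, name TEXT)")
conn.execute("INSERT INTO products VALUES ('10', 'widget')")

user_input = "'10"  # what id=%2710 decodes to


def lookup_unsafe(value):
    # The value is pasted into the SQL text, so the stray quote becomes SQL.
    return conn.execute(
        "SELECT name FROM products WHERE id = '" + value + "'"
    ).fetchall()


def lookup_safe(value):
    # The value is bound as a parameter and is never parsed as SQL.
    return conn.execute(
        "SELECT name FROM products WHERE id = ?", (value,)
    ).fetchall()


try:
    lookup_unsafe(user_input)
except sqlite3.OperationalError as exc:
    # A site that echoes this message to the browser is what the test detects.
    print("database error:", exc)

print(lookup_safe(user_input))  # no matching row, and no error
```

With the parameterized version the odd input simply matches nothing, so there is no error message to leak.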
- michael
Erik is Sr. Director of Products for the Application Security Center.