Static Code Analysis Failures

Published 06 May 08 04:32 PM | Rafal Los 

Static code analysis failures are costing enterprises money and reputation.

White-box security testing is inherently a flawed proposition for many reasons, but it all comes down to a very simple concept:

  Machines do not execute source code, they execute machine code (compiled code). --Paul Anderson (GrammaTech)

  Think this through for a minute and you will see a few specific reasons why that statement fundamentally changes the way people should look at white-box testing, and why it is a losing proposition.  Let's analyze this in the context of a web application project for a mythical online bank.  The use case: the bank has an online presence (the application being analyzed) that will be integrated with a series of existing legacy applications, partners, and external 3rd party components.  Given this information, let's analyze why white-box analysis (or static source-code analysis) is doomed to fail this project with respect to security.

  • Compiler Optimizers Break Things - Compilers are designed to turn your source code into machine code.  In most cases the compiler's goal is machine code that is optimized and extremely fast-executing, but not necessarily secure.  Oftentimes the optimizer removes security measures that developers deliberately built into the source, and usually without anyone noticing.  Consider the following example:
    • Developer is paranoid about data persistence in memory, and wants to be doubly sure that variables are expired and destroyed
    • Developer writes a routine that writes a null value to the variable before the memory is freed
    • Compiler optimizer sees the write as redundant work because the value is never read again (dead-store elimination), removes it, and simply frees the memory
    • A potential security vulnerability is created: the variable's contents persist in freed memory

This example demonstrates how a security vulnerability can be introduced in spite of the developer's best efforts to write secure code.  Standard static-code analysis tools, which "scan code" at the source-file level, will fail to catch this vulnerability because it does not exist in the source.  Quite simply, static code analysis fails if it is not supplemented with dynamic analysis.
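
To make the mechanism concrete, here is a minimal sketch of a toy dead-store-elimination pass written in Python.  It is not how any real compiler is implemented; the instruction format ("store", "load", "free"), the function name `eliminate_dead_stores`, and the sample program are all assumptions invented for illustration.  It only shows why an optimizer that tracks "is this value ever read again?" would treat the developer's scrub as removable.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction format, invented for this sketch:
# (operation, variable, value) where operation is "store", "load" or "free".
Instr = Tuple[str, str, Optional[str]]


def eliminate_dead_stores(program: List[Instr]) -> List[Instr]:
    """Remove stores whose value is never loaded afterwards.

    Walks the program backwards, tracking which variables are still
    read later on ("live"). A store to a variable that is not live is
    dead from the optimizer's point of view and is dropped.
    """
    live = set()
    kept: List[Instr] = []
    for op, var, value in reversed(program):
        if op == "load":
            live.add(var)
            kept.append((op, var, value))
        elif op == "store":
            if var in live:
                kept.append((op, var, value))
                live.discard(var)  # this store satisfies the later loads
            # otherwise nothing reads the value, so the store is dropped
        else:
            # "free" releases the variable without reading its contents
            live.discard(var)
            kept.append((op, var, value))
    kept.reverse()
    return kept


program: List[Instr] = [
    ("store", "password", "hunter2"),  # secret is written
    ("load", "password", None),        # secret is used
    ("store", "password", None),       # developer's scrub before freeing
    ("free", "password", None),
]

optimized = eliminate_dead_stores(program)
for instr in optimized:
    print(instr)
```

The scrub is present in `program`, which stands in for the source a static tool would read, but it is absent from `optimized`, which stands in for what the machine would actually execute.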

  • 3rd Party Library Integrations - There is another threat to scanning static code in a white-box fashion.  Inevitably, 3rd party libraries are used to supply features or functionality that the local development effort does not provide.  After all, no one re-invents the whole wheel every time: we build only what we cannot reuse, and rely on publicly available 3rd party libraries for functionality that has already been written and (hopefully) tested.  White-box testing (or static code analysis) will absolutely fail to find flaws in those libraries, because 3rd party libraries rarely come with source that can be scanned and checked for weaknesses that will affect your application.  What you're left with is someone else's code (in machine-compiled format!) interacting with your application.  Would you trust that model?
  • Static Code Analysis Rarely Understands Data-Flow Modeling (Data Tracing) - If you scan your application with a source-code-only analysis tool, you will not only miss things that will almost certainly come back to haunt you, you may also create a great deal of work for yourself with no real purpose.  Before I get to an example, allow me to explain "data-flow modeling" for those not familiar with the idea.  Data-flow modeling seeks to understand how data moves through your application, not just how the application code is written.  After all, that's the whole point of the application: to work with data.  Vulnerabilities lie in manipulating data on its way to or from the end users or the server(s).  Data-flow modeling maps out the data in your application from its instantiation (maybe when the user types it in) to its resting state (maybe when it's finally written to a database, or handed off to another application or service for additional work).  With that in mind, consider a web application that has 1,000 forms across 100 pages, written in the language of your choice and built on AJAX.  While no page individually validates user input (the data source), all variables (data) are filtered through a central validation module deep within the application logic.  A standard source-code analysis tool (I have evaluated one and can honestly say this is a real use case, but will not name the tool) will flag each and every input that is not validated within its page as vulnerable to hundreds of issues ranging from XSS (Cross-Site Scripting) to SQL Injection and other attack types.  What you are left with is a very lengthy report with hundreds of critical and high vulnerabilities that you now obviously must address... unless you do some dynamic analysis on the code and realize that *none* of those theoretical vulnerabilities are exploitable, because the application filters all data through the central validator/scrubber.
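
The difference between a per-page check and one that follows the data can be sketched in a few lines of Python.  This is a toy model under stated assumptions, not the behavior of any particular tool: the flow representation, the module and action names, the `SINKS` set, and both function names are hypothetical.  Each input is modeled as the list of steps it passes through, from the page that reads it to the operation that finally uses it.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each input's path is a list of (module, action)
# steps from where the data enters to where it is finally used.
Flow = List[Tuple[str, str]]

SINKS = {"sql_query", "render_html"}  # assumed dangerous operations

FLOWS: Dict[str, Flow] = {
    "login.username": [("login_page", "read_input"), ("core", "validate"), ("db", "sql_query")],
    "transfer.memo": [("transfer_page", "read_input"), ("core", "validate"), ("view", "render_html")],
    "search.term": [("search_page", "read_input"), ("core", "validate"), ("view", "render_html")],
}


def page_local_findings(flows: Dict[str, Flow]) -> List[str]:
    """Flag every input that is not validated inside its own page module."""
    findings = []
    for name, flow in flows.items():
        page = flow[0][0]  # module where the input is read
        if not any(m == page and a == "validate" for m, a in flow):
            findings.append(name)
    return findings


def flow_aware_findings(flows: Dict[str, Flow]) -> List[str]:
    """Flag only inputs that reach a sink before any validate step."""
    findings = []
    for name, flow in flows.items():
        validated = False
        for _module, action in flow:
            if action == "validate":
                validated = True
            elif action in SINKS and not validated:
                findings.append(name)
                break
    return findings


print(page_local_findings(FLOWS))
print(flow_aware_findings(FLOWS))
```

With these sample flows the page-local check reports all three inputs, while the flow-aware check reports none, because every path passes through the central `validate` step before it reaches a sink.  Scale the three inputs up to 1,000 forms and you have the report described above.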

So, there you have it.  Static code analysis is inherently doomed to fail.  Source-only white-box testing is flawed.  The sky is falling, global warming will kill us all.  In the next installment of this column, I'll give you what you need to know to avoid failing in your security initiatives at the development step of the SDLC.  Remember, knowing is half the battle.

 Stay tuned!

If this information disturbs you and you would like to talk about it, please don't hesitate to email me directly.  I am not a sensationalist, and I pride myself on presenting practical, realistically attainable solutions to real-world problems.  Thanks for reading.

Comments

# Romain said on May 7, 2008 9:54 AM:

Well, even though the white-box approach has a lot of technical limitations, I really think it is just a matter of time before researchers create great tools. But yes, we're not done yet.

Just for a lousy comparison, black-box (web app scanner) is even more doomed:

- Technologies change so fast and the current scanners are good with old technologies

- They are limited in the type of attacks they actually perform: no smart attacks, rare multi-injection-point attacks...

- They suck at understanding all the user/session interaction with the website

# Rafal Los said on May 9, 2008 5:02 AM:

Response to Romain:

 Actually, the whole reason for the 2nd installment of this article (coming very soon) is to address exactly what can be done to "fix" static code analysis.

 I actually disagree about black-box scanners being more doomed.  Since you're testing a user-interface the scanner is inherently ahead of the game *as long as* it understands what it is viewing and parsing.  Things like Flash and AJAX frameworks make it more difficult but I think that's our job as an application security company to stay ahead of that game.

 At the end of the day you're fundamentally doing different types of security testing, and I still stand by my assessment that pure static code analysis solutions are nearly worthless, unless... (stay tuned!)

# Andre Gironda said on May 11, 2008 3:24 PM:

Both dynamic analysis and static analysis are forms of white-box testing.  See my comment here - http://www.cigital.com/justiceleague/2008/04/09/is-penetration-testing-security-testing/#comments

Web application scanners are typically not dynamic analysis tools.  While they work on the runtime, it's not like they have a symbol table, knowledge of the data-flow, sources/sinks, etc.

Also - most security review tools that use static analysis techniques

1) Are not static checkers, but work on bytecode like FxCop or FindBugs

2) Do build an object model around control-flow, data flow, and control data with forward and reverse slicing

I am only aware of one commercial security review tool that doesn't require building the source.  It's not very common, and to be honest -- it still requires buildable source.  In other words, if the source code doesn't build, then how do you know that all of the braces match and that the tool can even run properly?

The added advantage of at least having a tool like this (a static checker) is that you don't need the code in a working environment and you don't need all of it.  For some clients, this can be very useful when outsourcing security review to a third-party.

Speaking of third-parties, of course there is going to be third-party code.  Some tools such as IvyDE in Eclipse can easily manage dependencies.

From my perspective, everything that you claim as a disadvantage to static analysis can also be viewed as an advantage.

You said, "What you are left with is a very lengthy report with hundreds of critical and high vulnerabilities that you now obviously must address... unless you do some dynamic analysis on the code and realize that *none* of those theoretical vulnerabilities are exploitable".

Dynamic analysis would be great, but

1) You're not recommending dynamic analysis, so it's all a lie.  It sounds to me like you're recommending black-box web application scanners.  Again, these are not typically dynamic analysis tools

2) There are plenty of ways to trace between the sources and sinks in a data flow to discover if a certain flagged vulnerability is exploitable

3) Exploitability is not the end goal.  The end goal is security in the source code and design.  Anyone can argue about whether or not something is exploitable or not.  Instead, we should be arguing about whether or not the code/design is obviously secure or not

4) Hundreds of inputs means hundreds of results -- doesn't matter if it's a black-box or white-box test.  I think you're being a bit pessimistic about the false positives in security review tools that take the static analysis approach.  You are correct, but you're not addressing the right problem

Here's the problem, and it does appear to be an unscalable, but not necessarily intractable, problem.  Static analysis based security review tools typically (and again, I've seen ones that do not suffer from this problem but they are not in the majority or popular yet) do not take into account the framework's reassignment of properties and use of different kinds of validators, etc.  The Spring framework is one example, and things like Dependency Injection make this even more difficult.  This is because the object model query language cannot typically be modified in the tools.  It's an extra step of customization.  Just like web application security scanners, static analysis security review tools require many levels of customization and proper expert use.

Comparing results of tools is stupid.  We should be comparing the results of testers and their test plans/strategies.  No tool is an island.

Even after defending static analysis, and clarifying what real dynamic analysis is (I'm talking about tools such as Coverity, Valgrind, Insure++ which typically have nothing to do with web applications) -- I remain skeptical about the benefits of both static-based security review tools as well as web application security scanners.

I discuss more about what should be the minimum requirements for building secure application here

http://jeremiahgrossman.blogspot.com/2008/04/was-pci-66-clarification-just-leaked.html#comments

# Rafal Los said on May 12, 2008 12:04 PM:

Andre - wow, that's a mouthful!  Well, without giving away the full 2nd article I'm in the process of writing, and not wishing to start a war of words... here's my quick reply:

1) Actually, yes - there are tools that transcend the "static code analysis" boundaries without being black-box scanners... stay tuned

2) I would debate that exploitability is not the key.  If you work in a real-world environment (say a Fortune 5 company) you'll quickly realize that only the absolutely squeakiest wheel gets the grease... so if you produce a lengthy report with all the shortcomings of your code, management will simply discount it as "too much, not actionable, move on"... that's the real world.

3) Sadly, no - the disadvantages I list are just that, and only that, disadvantages.  No matter how much lipstick you put on that pig, without the full traceability and debug(ability?) of byte-code... you're just reading line-by-line code... which is about as worthwhile as scanless PCI.

** Stay tuned, I'll address the issues I've raised here, in the next article as I pointed out.

# Vincent Liu said on May 15, 2008 1:51 AM:

Just a quick response on the following comments:

"Technologies change so fast and the current scanners are good with old technologies"

In "The Real World" you're dealing with old technologies most of the time, so if an organization can use the scanner against say 80% of the applications then it would be well worth it.

"I am only aware of one commercial security review tool that doesn't require building the source [...snip...] and to be honest -- it still requires buildable source"

The tool you're referring to is from Checkmarx and it technically does not require compilation. What it does require (and reasonably so) is that the code be grammatically correct, so that it can properly parse the source and build a model of the data. As any code reviewer can attest, the fact that you can compile without all the external dependencies is a *huge* benefit.

Another item often overlooked by many "product reviews" and side-by-side comparisons is that, with all static analysis tools, before you can perform data flow analysis you must be able to identify "sources" and "sinks". The static analysis tool doesn't magically find these; they have to be explicitly defined. As a result, the number of "known" 3rd party libraries and frameworks that a tool can identify is of great importance, unless you want to manually go through 3rd party libraries to identify sources and sinks and add them as custom rules.

"Comparing results of tools is stupid. We should be comparing the results of testers and their test plans/strategies. No tool is an island."

No and Yes. No, tools *should* be compared because testers inevitably rely on them to perform their jobs. That's not to say that buying a tool is going to magically secure your code, just as laying a wrench on the hood of my car isn't going to change the oil. Yes, the skill of a tester and their methodology is important as well. That's why I go to my ASE-certified mechanic over other ASE-certified mechanics. Trust and reputation. I know he always does a good job at an honest price.

More on the balance of tools and people here: http://blogs.technet.com/bluehat/archive/2008/04/08/effective-software-security-making-the-most-of-tools.aspx

Okay, that was more than I originally intended to write.
