Measuring the Occurrence of Security-Related Bugs through Software Evolution

You can measure the number of security-related bugs in a project by using a static analysis tool that matches its programming language, but a single analysis cannot show how these bugs evolve over time.

To involve the aspect of time we can take advantage of the characteristics of version control systems (VCS). To manage large software projects, developers employ systems such as Subversion and Git. Such systems provide project contributors with major advantages: automatic backups, sharing across multiple computers, maintenance of different versions, and others. For every new contribution, known as a commit, a VCS moves to a new state, called a revision. Every revision stored in a repository represents the state of every single file at a specific point in time.

To measure the occurrence of security-related bugs through time, together with two colleagues, we combined FindBugs, an effective static analysis tool that runs on bytecode and has already been used in research, and Alitheia Core, an extensible platform designed for performing large-scale software quality evaluation studies. I will not take up your time by describing the integration of the two (you can check it in the proceedings of this conference). Instead, I will describe how the framework works. First, Alitheia Core checks out a project from its SVN repository. Then, for every revision of the project, the framework creates a build and invokes FindBugs to examine it and produce an analysis report. A user can select whether to examine the project alone or together with its dependencies. Finally, the framework retrieves the security-related bugs from this report.
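In outline, the per-revision analysis step can be sketched as follows. This is a simplified illustration, not the actual integration code: the jar paths are placeholders, and the checkout and build steps are assumed to have already happened for each revision.

```java
import java.util.List;

public class RevisionAnalysis {
    // Assemble the FindBugs command line for one revision's jar.
    // -textui, -xml and -output are standard FindBugs CLI options;
    // the report file name is a placeholder of my own choosing.
    static List<String> findBugsCommand(String jarPath, long revision) {
        return List.of("findbugs", "-textui", "-xml",
                "-output", "findbugs-r" + revision + ".xml", jarPath);
    }

    public static void main(String[] args) {
        for (long rev = 1; rev <= 3; rev++) {               // for every revision...
            String jar = "build/r" + rev + "/project.jar";  // ...a build is assumed to exist
            List<String> cmd = findBugsCommand(jar, rev);
            System.out.println(String.join(" ", cmd));
            // To actually run the analysis:
            // new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }
}
```

The real framework drives this loop from inside Alitheia Core and then parses each XML report to count the security-related bug patterns.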

We have examined four open source projects that are based on the Maven build system, namely: xlite, sxc, javancss, and grepo. We particularly chose Maven-based projects to exploit the advantages of the system, such as resolving and downloading dependencies on the fly and automatically building every revision. In addition, all projects deal with untrusted input, so they could become targets for exploits. Our experiment included two measurements. First, for every revision, we applied FindBugs only to the bytecode of the specified project. For our second measurement, we also included the dependencies of the project. The figure below depicts the results for the javancss project; the results are representative of all four projects. The red line represents the first measurement and the green line the second one.

Bug frequency for javancss project

The most interesting observation is that security bugs increase as projects evolve. This is quite important and shows that bugs should be fixed early to decrease the effort and cost of security audits after the end of the development process. Another observation is the existence of a domino effect: the use of external libraries introduces new bugs. Thus, we can observe how third-party software can affect the evolution of a project while its developers are unaware of it. Another interesting issue regards the bugs themselves. As we observed, even after hundreds of revisions, the release versions contain bugs that could be severe for the application. For instance, the last revision of javancss includes a code fragment that creates an SQL statement from a non-constant string. If this string is not checked properly, an SQL injection attack becomes possible.

At first glance you could say that as a project evolves, its lines of code increase and, as a result, the security-related bugs should increase too. But you cannot support this with confidence, since techniques like refactoring can shrink a project's code as time passes. Hence, it would be interesting to check whether there is a correlation between the lines of code and the security bugs of a project.

The main limitation of our approach is that it has to automatically build numerous revisions and run FindBugs on the resulting jars. Another inconvenience is that our framework retrieves projects from SVN repositories, while most Maven-based projects are hosted on GitHub. To validate the statistical significance of our results we should run our framework on more projects. This can be done by processing data from GitHub, and we are currently working in this direction.

The code of our framework can be found here.

Fatal Injection: The Server’s Side

In my first blog post I discussed software security and mentioned some common traps into which programmers regularly fall. One of them involved input validation. For instance, a developer may assume that the user will enter only numeric characters as input, or that the input will never exceed a certain length. Such assumptions can lead to the processing of invalid data that an attacker can introduce into a program to cause it to execute malicious code. This class of exploits is known as code injection attacks, and it currently tops the lists of the various security bulletin providers. Code injection attacks are among the most damaging classes of attacks, since they can occur in different layers, such as databases, native code and applications, and they span a wide range of security and privacy issues: viewing sensitive information, modifying sensitive data, or even stopping the execution of the entire application.
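A minimal sketch of the kind of validation that closes these traps is shown below. The method name and the concrete limits (digits only, at most six characters) are my own illustrative choices, not rules from any specific framework:

```java
import java.util.regex.Pattern;

public class InputValidation {
    private static final Pattern DIGITS_ONLY = Pattern.compile("\\d+");

    // Accept the input only if it is present, purely numeric,
    // and within a fixed length bound.
    static boolean isValidQuantity(String input) {
        return input != null
                && input.length() <= 6
                && DIGITS_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidQuantity("42"));              // accepted
        System.out.println(isValidQuantity("42; DROP TABLE"));  // rejected
        System.out.println(isValidQuantity("1234567890"));      // rejected: too long
    }
}
```

The point is to check every assumption explicitly instead of trusting that users will behave.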

The set of code injection attacks that involve inserting binary code into a target application to alter its execution flow and execute the inserted compiled code can be described as binary code injection attacks. This category includes the infamous buffer overflow attacks. Such attacks are possible when the bounds of memory areas are not checked, and the program can access memory beyond these bounds. Consider the following code segment:

#include <string.h>

/* Copies its argument into a fixed-size buffer without checking
 * its length -- the classic overflow defect. */
void vulnerable_method (char *foo)
{
  char tmp[12];
  strcpy(tmp, foo);   /* no bounds check: foo may be longer than tmp */
}

int main (int argc, char **argv)
{
  if (argc > 1)       /* avoid dereferencing a missing argument */
    vulnerable_method(argv[1]);
  return 0;
}

This code takes an argument and copies it to the tmp variable. But what happens when the argument is longer than 11 characters? By taking advantage of this, malicious users can inject additional data, overwriting existing data in adjacent memory. From there they can take control of the program or even of the entire host machine. C and C++ are vulnerable to this kind of attack, since typical implementations lack a protection scheme against overwriting data in any part of the memory. In comparison, Java guards against such attacks by preventing access beyond array bounds and throwing a runtime exception. To combat a buffer overflow in the above case you could use strncpy instead (strlcpy is even better if you are running on BSD or Solaris), since it takes the buffer size as a parameter. Note, though, that strncpy does not null-terminate the destination when the source is too long, so you must add the terminator yourself.
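To see Java's runtime bounds checking in action, here is a small sketch that attempts the same copy as the C example above. The helper method is my own; the key point is that the out-of-bounds write raises ArrayIndexOutOfBoundsException before any adjacent memory can be touched:

```java
public class BoundsDemo {
    // Attempt a character-by-character copy into a fixed-size buffer.
    // Unlike C's strcpy, the JVM checks every index before writing,
    // so an oversized source cannot corrupt adjacent memory.
    static boolean copyInto(char[] dst, String src) {
        try {
            for (int i = 0; i < src.length(); i++) {
                dst[i] = src.charAt(i);
            }
            return true;                // the whole string fit
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;               // copy rejected at the boundary
        }
    }

    public static void main(String[] args) {
        char[] tmp = new char[12];
        System.out.println(copyInto(tmp, "short"));                       // fits
        System.out.println(copyInto(tmp, "far too long for the buffer")); // rejected
    }
}
```

The failure mode is a thrown exception rather than silently overwritten memory, which is exactly why this class of attack does not apply to ordinary Java arrays.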

Code injection also includes the use of source code, either of Domain-Specific Languages (DSLs) or of dynamic languages. DSL-driven injection attacks constitute an important subset of code injection, as DSLs like SQL and XML play a significant role in the development of applications. For instance, many applications have interfaces where a user enters input to interact with the application's underlying relational database management system. This input can become part of an SQL statement and be executed on the target RDBMS. A code injection attack that exploits the vulnerabilities of these interfaces is called an SQL injection attack. One of the most common forms of such an exploit involves taking advantage of incorrectly filtered quotation characters. For instance, in a login page, users wanting to access a vulnerable application have to fill in a simple username and password form. The following Java code illustrates the defect by accepting user input without performing any input validation:

Connection conn = DriverManager.getConnection(a_url, "a_username", "a_password");
// Vulnerable: user-supplied uname and pass are concatenated straight into the query.
String sql = "select * from user where username='" + uname + "' and password='" + pass + "'";
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(sql);
if (rs.next()) {
  loggedIn = true;
  out.println("Logged in");
} else {
  out.println("Credentials not recognized");
}

By using the string anything' OR 'x'='x as a username, a malicious user could log in to the site without supplying a password, since the OR expression is always true. To avoid a situation like the above, you could use Java's prepared statements. In a prepared statement, the query is precompiled by the driver that connects the application to the database. From that point on, the parameters are sent to the driver as literal values and not as executable portions of SQL; hence no SQL can be injected through a parameter. XPath and LDAP injection attacks are DSL-driven attacks with a style very similar to SQL injection. Also, notice that contrary to binary code injection, a DSL-driven injection attack is independent of the language used to create the application, so the problem above could occur similarly in PHP, C++, etc.
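To see the mechanics concretely, here is a minimal sketch (the method name is mine) that builds the query the vulnerable way and shows what the injected username turns it into, with the prepared-statement fix in the comments since running it would require a live database connection:

```java
public class SqlInjectionDemo {
    // The vulnerable pattern: user input concatenated into SQL.
    static String unsafeQuery(String uname, String pass) {
        return "select * from user where username='" + uname
                + "' and password='" + pass + "'";
    }

    public static void main(String[] args) {
        // The injected quote closes the username literal early, and the
        // OR 'x'='x' clause makes the WHERE condition always true.
        System.out.println(unsafeQuery("anything' OR 'x'='x", ""));

        // With a prepared statement the query shape is fixed up front,
        // and user input can only ever bind to the ? placeholders:
        //   PreparedStatement ps = conn.prepareStatement(
        //       "select * from user where username=? and password=?");
        //   ps.setString(1, uname);
        //   ps.setString(2, pass);
        //   ResultSet rs = ps.executeQuery();
    }
}
```

Printed out, the concatenated statement reads as a valid query whose condition every row of the user table satisfies, which is exactly how the attacker logs in.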

As you have probably realized, the aforementioned attacks target vulnerable, server-side applications. In one of my upcoming posts, we will have the opportunity to discuss injection attacks that involve dynamic languages and target entities on the client side (i.e. the user’s browser).

How secure is your software?

When you are implementing an application, your first goal is to achieve a specific functionality. Whether you want to implement an algorithm given to you as an exercise by your computer science professor, or you just want to create your personal website, the first thing that comes to mind is how to “make it work”. Then, of course, you will follow some coding conventions during implementation while simultaneously checking your code quality. But what about security? How secure is your code? Is there a way for a malicious user to harm you or your application by taking advantage of potential bugs in your code?

Unfortunately, most programmers have been trained to write code that implements the required functionality without considering its many security aspects. Most software vulnerabilities derive from a relatively small number of common programming errors that lead to security holes. For example, according to SANS (Security Leadership Essentials For Managers), two programming flaws alone were responsible for more than 1.5 million security breaches during 2008.

In 2001, when software security was first introduced as a field, information security was mainly associated with network security, operating systems security and viral software. Until then, hundreds of millions of applications had been implemented, but not with security in mind. As a result, the vulnerabilities “hidden” in these (now legacy) applications can still be used as backdoors that lead to security breaches.

Although computer security is nowadays standard fare in academic curricula around the globe, few courses emphasize secure coding practices. For instance, during a standard introductory C course, students may not learn that using the gets function could make their code vulnerable to an exploit. If someone does include it in a program, the compiler will emit only the obscure warning: “the ‘gets’ function is dangerous and should not be used.” gets is dangerous because it reads input without any bound: a user can crash the program simply by typing too much at the prompt, and since the function cannot detect the end of the available memory, input larger than the buffer you allocated overwrites adjacent memory, causing a segmentation fault at best and enabling an exploit at worst.

The situation is similar in web programming. Programmers are often unaware of security loopholes inherent in the code they write; in fact, knowing that they program in higher-level languages than those prone to classic security exploits, they may assume that this renders their applications immune to exploits stemming from coding errors. Common traps into which programmers fall concern user input validation, the sanitization of data sent to other systems, the lack of defined security requirements, the encoding of data that comes from an untrusted source, and others which we will have the opportunity to discuss later on this blog.

Today there are numerous books, papers and security bulletin providers that you can refer to about software security. Building Secure Software by Viega and McGraw, Writing Secure Code by Howard and LeBlanc, and Secure Coding: Principles & Practices by Graff and van Wyk are three standard textbooks. Furthermore, there are some interesting lists of secure coding practices, like OWASP’s (the Open Web Application Security Project), CERT’s (Carnegie Mellon’s Computer Emergency Response Team) and Microsoft’s. It is also interesting to check from time to time the various lists of top software defects, like CWE’s (Common Weakness Enumeration) Top 25 and OWASP’s Top 10. But do not panic: you are not obliged to become an expert in secure coding. There are numerous tools that can help you either build secure applications or protect existing ones.