Buffer Overflow Detection for C Programs is Hard to Learn

Abstract

Machine learning has been used to detect bugs such as buffer overflow. Models get trained to report bugs at the function or file level, and reviewers of the results have to eyeball the code to determine whether there is a bug in that function or file, or not. Contrast this to static code analysers which report bugs at the statement level along with traces, easing the effort required to review the reports.

Based on our experience with implementing scalable and precise bug finders in the Parfait tool, we experiment with machine learning to understand how close the techniques can get to a precise static code analyser. In this paper we summarise our finding in using ML techniques to find buffer overflow in programs written in C language. We treat bug detection as a classification problem. We use feature extraction and train a model to determine whether a buffer overflow has occurred or not at the function level. Training is done over labelled data used for regression testing of the Parfait tool. We evaluate the performance of different classifiers using the 10-fold cross-validation and the leave-one-out strategy.

To understand the generalisability of the trained model, we use it on a collection of unlabelled real-world programs and manually check the reported warnings.

Our experiments show that, even though the models give good results over training data, they do not perform that well when faced with larger, unlabelled data. We conclude with open questions that need addressing before machine learning techniques can be used for buffer overflow detection.

Biography

Cristina is the Director of Oracle Labs Australia and an Architect at Oracle. Headquartered in Brisbane, the Lab focuses on Program Analysis as it applies to finding vulnerabilities in software and enhancing the productivity of developers worldwide.

Prior to founding Oracle Labs Australia, Cristina was the Principal Investigator of the Parfait bug tracking project at Sun Microsystems, then Oracle. Today, Oracle Parfait has become the defacto tool used by thousands of Oracle developers for bug and vulnerability detection in real-world, commercially sized C/C++/Java applications. Parfait's success is founded on the pioneering work in advancing static program analysis techniques by Cristina’s team of Researchers and Engineers at Oracle Labs Australia.

Cristina’s passion for tackling the big issues in the field of Program Analysis began with her doctoral work in binary decompilation at Queensland’s University of Technology. In an interview with Richard Morris for Geek of the Week, Cristina talks about Parfait, Walkabout and her career journey in this field.

Before she joined Oracle and Sun Microsystems, Cristina held teaching posts at major Australian Universities, co-edited Going Digital, a landmark book on cybersecurity, and served on the executive committees of ACM SIGPLAN and IEEE Reverse Engineering.

Cristina continues to play an active role in the international programming language, compiler construction and software security communities. On the weekends, she channels her interests into mentoring young programmers through the CoderDojo network.

Personal Career Highlights

Mentor at CoderDojo Brisbane
Adjunct Professor, School of Information Technology and Electrical Engineering, The University of Queensland
Adjunct Professor, School of Electrical Engineering and Computer Science, Queensland University of Technology
Chancellor's Outstanding Alumnus (2001), Queensland University of Technology
PhD in Computer Science, "Decompilation of Binary Programs" (1994), Queensland University of Technology

Buffer Overflow Detection for C Programs is Hard to Learn

Tue 10 Sep 2019 3:00pm–4:00pm

Venue

Richards Building (#05)

Room:

213