A tool that automatically creates fuzzing harnesses based on a library

Overview

AutoHarness

Created by Akshat Parikh

What is this tool?

AutoHarness is a tool that automatically generates fuzzing harnesses for you. The idea stems from a current problem in fuzzing: large codebases have thousands of functions, and some code is embedded so deep in a library that it is very hard, sometimes impossible, for even smart fuzzers to reach. Even large fuzzing projects such as OSS-Fuzz leave parts of a codebase uncovered. This program tries to alleviate that problem to some extent and to give security researchers a tool for initial testing of a codebase. It only supports codebases written in C and C++.

Setup/Demonstration

This program uses LLVM and Clang for libFuzzer, CodeQL for finding functions, and Python for the program itself. It was tested on Ubuntu 20.04 with LLVM 12 and Python 3. Here is the initial setup.

sudo apt-get update;
sudo apt-get install python3 python3-pip llvm-12* clang-12 git;
pip3 install pandas lief;

Follow the installation procedure for CodeQL at https://github.com/github/codeql. Make sure to install both the CLI tools and the libraries. For my testing, I stored the tools and the libraries under one folder. Finally, clone this repository or download a release. Below is the program's output after running on nginx with the multiple-argument mode set. This is the command I used.

python3 harness.py -L /home/akshat/nginx-1.21.0/objs/ -C /home/akshat/codeql-h/ -M 1 -O /home/akshat/autoharness/ -D nginx -G 1 -Y 1 -F "-I /home/akshat/nginx-1.21.0/objs -I /home/akshat/nginx-1.21.0/src/core -I /home/akshat/nginx-1.21.0/src/event -I /home/akshat/nginx-1.21.0/src/http -I /home/akshat/nginx-1.21.0/src/mail -I /home/akshat/nginx-1.21.0/src/misc -I /home/akshat/nginx-1.21.0/src/os -I /home/akshat/nginx-1.21.0/src/stream -I /home/akshat/nginx-1.21.0/src/os/unix" -X ngx_config.h,ngx_core.h

Results: [image]

It is likely possible to raise the success rate by debugging the compilation further, for example by adding more header files. Note that the nginx project does not produce any shared objects after compiling. However, this program has a feature that can convert PIE executables into shared libraries.

Planned Features (in order of progress)

  1. Struct Fuzzing

The program currently fuzzes functions with multiple arguments by using libFuzzer's FuzzedDataProvider. There are some improvements to make in this integration; however, I believe I can extend it to data structures. A problem I ran into while coding this involves CodeQL and nested structs: it is especially hard to handle them without writing multiple queries that vary for every function. In short, this feature needs more work. I was also considering a simpler solution using protobufs.
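
As a rough illustration of the multi-argument approach described above, the following Python sketch turns a function name and a list of parameter types into libFuzzer harness source. The helper names (`consume_expr`, `make_harness`), the type tables, and the example header `mylib.h` are hypothetical and are not taken from AutoHarness itself; only the `FuzzedDataProvider` calls in the generated C++ are real libFuzzer APIs.

```python
from typing import Sequence

# Hypothetical type tables for this sketch; the real tool may classify types differently.
INTEGRAL_TYPES = {"char", "short", "int", "long", "size_t", "unsigned int"}
FLOATING_TYPES = {"float", "double"}


def consume_expr(c_type: str) -> str:
    """Return a FuzzedDataProvider call that yields a value of c_type."""
    base = c_type.replace("const ", "").strip()
    if base in INTEGRAL_TYPES:
        return f"provider.ConsumeIntegral<{base}>()"
    if base in FLOATING_TYPES:
        return f"provider.ConsumeFloatingPoint<{base}>()"
    if base == "bool":
        return "provider.ConsumeBool()"
    # Pointers, arrays, and structs are out of scope for this sketch.
    raise ValueError(f"unsupported parameter type: {c_type}")


def make_harness(func_name: str, param_types: Sequence[str], header: str) -> str:
    """Build C++ harness source that calls func_name with fuzzed arguments."""
    names = [f"data{i}" for i in range(1, len(param_types) + 1)]
    lines = [
        "#include <cstddef>",
        "#include <cstdint>",
        "#include <fuzzer/FuzzedDataProvider.h>",
        f'#include "{header}"',
        "",
        'extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {',
        "    FuzzedDataProvider provider(data, size);",
    ]
    for name, c_type in zip(names, param_types):
        lines.append(f"    auto {name} = {consume_expr(c_type)};")
    lines.append(f"    {func_name}({', '.join(names)});")
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "add" and "mylib.h" are made-up names used only for demonstration.
    print(make_harness("add", ["int", "const double"], "mylib.h"))
```

Raising an error for unsupported types, rather than guessing, mirrors the idea of skipping functions whose parameters cannot yet be filled in safely.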

  2. Implementation Based Harness Creation

Using CodeQL, it is possible to generate a control flow graph that maps how the parameters of a function are initialized. With that information, we can create a better harness. Another way is to look for implementations of the function that exist in the library and use that information to make an educated guess at how the harness should use the function. The problem I currently have is generating the control flow graphs with CodeQL.

  3. Parallelized fuzzing/False Positive Detection

I can create a simple program that runs all the harnesses and picks up on any of the common false positives using ASAN. Also, I can create a new interface that runs all the harnesses at once and displays their statistics.
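
A minimal sketch of what such a runner could look like, assuming each harness is already compiled into its own executable. The function names and the result format are hypothetical, not part of AutoHarness. The sketch only runs the commands in parallel and flags output that contains an AddressSanitizer report; deciding whether a report is a false positive is still left to the person triaging it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Substring that appears in AddressSanitizer error reports on stderr.
ASAN_MARKER = "ERROR: AddressSanitizer"


def run_harness(command):
    """Run one harness command and summarize its outcome."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    return {
        "command": command,
        "returncode": proc.returncode,
        "asan_report": ASAN_MARKER in proc.stderr,
    }


def run_all(commands, workers=4):
    """Run all harness commands concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_harness, commands))


if __name__ == "__main__":
    # Placeholder harness paths; -max_total_time is a libFuzzer option
    # that bounds the run time in seconds.
    harnesses = [
        ["./harness_a", "-max_total_time=60"],
        ["./harness_b", "-max_total_time=60"],
    ]
    for result in run_all(harnesses):
        print(result["command"][0], result["returncode"], result["asan_report"])
```

Threads are enough here because each worker just waits on a child process; the fuzzing itself happens in the harness executables.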

Contribution/Bugs

If you find any bugs with this program, please create an issue. I will try to come up with a fix. Also, if you have any ideas on any new features or how to implement performance upgrades or the current planned features, please create a pull request or an issue with the tag (contribution).

PSA

This tool generates some false positives. Please analyze the crashes first and check whether each one is a valid bug or just an implementation bug in the harness. You can also enable the debug mode if some functions are not compiling; this will help you see whether you are missing header files or have linkage issues. If the project you are working on produces an executable rather than shared libraries, make sure to compile the executable as PIE so that this program can convert it into a shared library.
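
To check whether an executable was built in PIE form before handing it to the tool, one option is to read the `e_type` field of its ELF header. The sketch below is an independent illustration, not part of AutoHarness, and its function names are made up. Note that `ET_DYN` is used both for PIE executables and for ordinary shared objects, so the check only rules out non-PIE (`ET_EXEC`) binaries.

```python
import struct

ET_EXEC = 2  # fixed-address executable (not PIE)
ET_DYN = 3   # shared object or PIE executable


def elf_type(header: bytes) -> int:
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 of e_ident (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def is_position_independent(path: str) -> bool:
    """True if the file at path is an ET_DYN ELF (PIE or shared object)."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN
```

For example, `is_position_independent("objs/nginx")` (path chosen for illustration) would return `False` for a binary linked without PIE, which signals that it needs to be rebuilt before conversion.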

References

  1. https://lief.quarkslab.com/doc/latest/tutorials/08_elf_bin2lib.html
Comments
  • Fuzzers never compile

    Hello, I made it to the final step in your README:

    python3 harness.py -L /root/nginx/ -C /root/codeql/ -M 1 -O /root/autoharness/ -D nginx -G 1 -Y 1 -F "-I /root/nginx/objs -I /root/nginx/src/core -I /root/nginx/src/event -I /root/nginx/src/http -I /root/nginx/src/mail -I /root/nginx/src/misc -I /root/nginx/src/os -I /root/nginx/src/stream -I /root/nginx/src/os/unix" -X ngx_config.h,ngx_core.h

    and it appears to be doing something:

    Compiling query plan for /root/codeql/multiargfunc.ql.
    [1/1 comp 19.4s] Compiled /root/codeql/multiargfunc.ql.
    Starting evaluation of codeql-cpp/multiargfunc.ql.
    [1/1 eval 4.9s] Evaluation done; writing results to /root/autoharness/multiarg.bqrs.
    Shutting down query evaluator.
    

    however, it never builds any fuzzers, this is all that appears on screen:

                      f             g                                                  t
    0             _Exit          void                                              [int]
    1        __asprintf           int    [const char *__restrict__, char **__restrict__]
    2    __asprintf_chk           int  [int, const char *__restrict__, char **__restr...
    3        __bswap_16    __uint16_t                                       [__uint16_t]
    4        __bswap_32    __uint32_t                                       [__uint32_t]
    ..              ...           ...                                                ...
    812         waitpid       __pid_t                              [int, __pid_t, int *]
    813        wcstombs        size_t  [size_t, char *__restrict__, const wchar_t *__...
    814          wctomb           int                                  [char *, wchar_t]
    815           write       ssize_t                        [int, size_t, const void *]
    816          zError  const char *                                              [int]
    
    [817 rows x 3 columns]
    

    I am using Ubuntu clang version 12.0.0-3ubuntu1~21.04.2 and have installed all of the requirements mentioned in the README. Any tips would be much appreciated.

    opened by geeknik 11
  • Could not resolve type ...

    Hi, the following command failed to generate a harness.

    python3 harness.py -L /local/codeql-home/nginx-1.21.1/objs/ -C /local/codeql-home/codeql/ -M 1 -O /local/nginx/autoharness/ -D nginx -G 1 -Y 1 -F "-I /local/codeql-home/nginx-1.21.1/objs -I /local/codeql-home/nginx-1.21.1/src/core -I /local/codeql-home/nginx-1.21.1/src/event -I /local/codeql-home/nginx-1.21.1/src/http -I /local/codeql-home/nginx-1.21.1/src/mail -I /local/codeql-home/nginx-1.21.1/src/misc -I /local/codeql-home/nginx-1.21.1/src/os -I /local/codeql-home/nginx-1.21.1/src/stream -I /local/codeql-home/nginx-1.21.1/src/os/unix" -X ngx_config.h,ngx_core.h
    

    Error messages:

    Compiling query plan for /local/codeql-home/codeql/multiargfunc.ql.
    ERROR: Could not resolve module cpp. There should probably be a qlpack.yml file declaring dependencies in /local/codeql-home/codeql or an enclosing directory. (/local/codeql-home/codeql/multiargfunc.ql:1,8-11)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:3,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:3,30-39)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:6,40-51)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:9,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:9,27-36)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:10,65-76)
    ERROR: Could not resolve type Function (/local/codeql-home/codeql/multiargfunc.ql:13,6-14)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:13,18-22)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:14,18-27)
    ERROR: Could not resolve type Struct (/local/codeql-home/codeql/multiargfunc.ql:14,91-97)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:4,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:6,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,47-53)
    
    opened by JerryWang304 10
  • Master

    Lots of changes, some that I can remember:

    • Factoring out code to create bash commands into command_builder.py.
    • Change from readelf to nm to only harness dynamic, defined, exported function symbols.
    • Ignore functions with void * or array arguments. Arrays may be supported later.
    • Get parameters in the right order.
    • Handle const.

    Note: only tested using mode=1 and detect=1 on libsodium. With other settings you'll likely still get lots of errors, but let's fix those by creating and resolving issues.

    opened by Jelle-Nauta 0
  • Support array parameters

    Functions with array parameters are currently ignored because they would lead to code like:

    auto data1= provider.ConsumeIntegral<char[16]>();
    

    Proposal: handle array-parameters as a special case, e.g. generating code like:

    char data1[16];
    for (size_t idx = 0; idx < 16; ++idx) {
        data1[idx] = provider.ConsumeIntegral<char>();
    }
    

    There may be a more elegant way to do this, but I don't see it at the moment.

    opened by Jelle-Nauta 0
  • Create test suite

    Currently there are many different combinations of settings (mode, detect) and library-properties (exported or not, defined or not, function or other symbols, etc.) and many of these probably lead to errors, e.g. through uncompilable harness code.

    Proposal: create a minimal set of libraries with a comprehensive set of symbols and properties, and a test workflow to verify that autoharness works - or find bugs that can then be addressed.

    opened by Jelle-Nauta 0
Releases(1.0)
  • 1.0(Jul 10, 2021)

    Initial release of AutoHarness:

    • Added executable to shared object functionality.
    • Added automatic header detection or function reconstruction.
    • Added automatic fuzzing harness creation for one argument and multiple arguments.
