Searches through git repositories for high entropy strings and secrets, digging deep into commit history

Overview

truffleHog

codecov

Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed.

Join The Slack

Have questions? Feedback? Jump in slack and hang out with me

https://join.slack.com/t/trufflehog-community/shared_invite/zt-pw2qbi43-Aa86hkiimstfdKH9UCpPzQ

NEW

truffleHog previously functioned by running entropy checks on git diffs. This functionality still exists, but high signal regex checks have been added, and the ability to suppress entropy checking has also been added.

truffleHog --regex --entropy=False https://github.com/dxa4481/truffleHog.git

or

truffleHog file:///user/dxa4481/codeprojects/truffleHog/

With the --include_paths and --exclude_paths options, it is also possible to limit scanning to a subset of objects in the Git history by defining regular expressions (one per line) in a file to match the targeted object paths. To illustrate, see the example include and exclude files below:

include-patterns.txt:

src/
# lines beginning with "#" are treated as comments and are ignored
gradle/
# regexes must match the entire path, but can use python's regex syntax for
# case-insensitive matching and other advanced options
(?i).*\.(properties|conf|ini|txt|y(a)?ml)$
(.*/)?id_[rd]sa$

exclude-patterns.txt:

(.*/)?\.classpath$
.*\.jmx$
(.*/)?test/(.*/)?resources/

These filter files could then be applied by:

trufflehog --include_paths include-patterns.txt --exclude_paths exclude-patterns.txt file://path/to/my/repo.git

With these filters, issues found in files in the root-level src directory would be reported, unless they had the .classpath or .jmx extension, or if they were found in the src/test/dev/resources/ directory, for example. Additional usage information is provided when calling trufflehog with the -h or --help options.

These features help cut down on noise, and makes the tool easier to shove into a devops pipeline.

Example

Install

pip install truffleHog

Customizing

Custom regexes can be added with the following flag --rules /path/to/rules. This should be a json file of the following format:

{
    "RSA private key": "-----BEGIN EC PRIVATE KEY-----"
}

Things like subdomain enumeration, s3 bucket detection, and other useful regexes highly custom to the situation can be added.

Feel free to also contribute high signal regexes upstream that you think will benefit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed.

trufflehog's base rule set sources from https://github.com/dxa4481/truffleHogRegexes/blob/master/truffleHogRegexes/regexes.json

To explicitly allow particular secrets (e.g. self-signed keys used only for local testing) you can provide an allow list --allow /path/to/allow in the following format:

{
    "local self signed test key": "-----BEGIN EC PRIVATE KEY-----\nfoobar123\n-----END EC PRIVATE KEY-----",
    "git cherry pick SHAs": "regex:Cherry picked from .*",
}

Note that values beginning with regex: will be used as regular expressions. Values without this will be literal, with some automatic conversions (e.g. flexible newlines).

How it works

This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, truffleHog will evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen.

Help

usage: trufflehog [-h] [--json] [--regex] [--rules RULES] [--allow ALLOW]
                  [--entropy DO_ENTROPY] [--since_commit SINCE_COMMIT]
                  [--max_depth MAX_DEPTH]
                  git_url

Find secrets hidden in the depths of git.

positional arguments:
  git_url               URL for secret searching

optional arguments:
  -h, --help            show this help message and exit
  --json                Output in JSON
  --regex               Enable high signal regex checks
  --rules RULES         Ignore default regexes and source from json list file
  --allow ALLOW         Explicitly allow regexes from json list file
  --entropy DO_ENTROPY  Enable entropy checks
  --since_commit SINCE_COMMIT
                        Only scan from a given commit hash
  --branch BRANCH       Scans only the selected branch
  --max_depth MAX_DEPTH
                        The max commit depth to go back when searching for
                        secrets
  -i INCLUDE_PATHS_FILE, --include_paths INCLUDE_PATHS_FILE
                        File with regular expressions (one per line), at least
                        one of which must match a Git object path in order for
                        it to be scanned; lines starting with "#" are treated
                        as comments and are ignored. If empty or not provided
                        (default), all Git object paths are included unless
                        otherwise excluded via the --exclude_paths option.
  -x EXCLUDE_PATHS_FILE, --exclude_paths EXCLUDE_PATHS_FILE
                        File with regular expressions (one per line), none of
                        which may match a Git object path in order for it to
                        be scanned; lines starting with "#" are treated as
                        comments and are ignored. If empty or not provided
                        (default), no Git object paths are excluded unless
                        effectively excluded via the --include_paths option.

Running with Docker

First, enter the directory containing the git repository

cd /path/to/git

To launch the trufflehog with the docker image, run the following"

docker run --rm -v "$(pwd):/proj" dxa4481/trufflehog file:///proj

-v mounts the current working dir (pwd) to the /proj dir in the Docker container

file:///proj references that very same /proj dir in the container (which is also set as the default working dir in the Dockerfile)

Wishlist

  • A way to detect and not scan binary diffs
  • Don't rescan diffs if already looked at in another branch
  • A since commit X feature
  • Print the file affected
Comments
  • fix #8 - add `--include` and `--exclude` options

    fix #8 - add `--include` and `--exclude` options

    Fixes issue #8 by adding --include_paths and --exclude_paths options that allow the user to limit scanning to a subset of objects in the Git history by defining regular expressions (one per line) in a file to match the targeted object paths.

    If provided, the --include_paths option should point to a file with regular expressions (one per line), at least one of which must match a Git object path in order for it to be scanned. If empty or not provided (default), all Git object paths are included (unless otherwise excluded via the --exclude_paths option).

    Likewise, the --exclude_paths option, when provided, should point to a file with regular expressions, none of which may match a Git object path in order for it to be scanned. If empty or not provided (default), no Git object paths are excluded (unless effectively excluded via the --include_paths option).

    In either file, lines starting with "#" are treated as comments and are ignored.

    opened by milo-minderbinder 22
  • fix --since_commit parameter

    fix --since_commit parameter

    Hi, how can I contribute to this project? I was running truffleHog and using the --since_commit parameter, however it was buggy and did not work as expected. I made a very small change, and it worked as expected. Do you accept PRs or should I just tell you the change so you can verify it?

    opened by fahrishb 18
  • The regex functionality is not working as expected

    The regex functionality is not working as expected

    I git cloned the truffleHog repository. Changed my regexChecks.py file to look like below:

    import re
    
    regexes = {
        "Slack Token XOXP": re.compile('xoxp.*'),
        "Slack Token XOXB": re.compile('xoxb.*'),
        "Slack Token XOXO": re.compile('xoxo.*'),
        "Slack Token XOXA": re.compile('xoxa.*'),
        "AWS API Key": re.compile('AKIA.*'),
        "Private key": re.compile('-----BEGIN PRIVATE KEY-----.*')
    }
    

    I then installed the libraries required to run the tool by typing pip install -r requirements.txt. My requirements.txt file looked like below:

    GitPython==2.1.5
    gitdb2==2.0.2
    smmap2==2.0.2
    

    Finally, I ran the tool by typing - python truffleHog.py --regex --entropy=False https://github.com/secretuser1/secretrepo.git

    It printed out the Private Key, Slack Token XOXP and Slack Token XOXB. It should have also printed out the AWS key here - https://github.com/secretuser1/secretrepo/blob/master/secretfile.txt#L2 but it did not, even though the regex is present.

    Any idea why?

    opened by anshumanbh 14
  • Adding the capability for scanning a directory

    Adding the capability for scanning a directory

    This PR adds the capability for truffleHog to recursively scan a directory instead of a Git repository with all its history. This can be useful in CI pipelines or other situations where it is desirable to scan the codebase at a single point in time. Additionally, it can also be used to scan code that is not stored in Git.

    I've done some minor refactoring to the existing scanning code to reduce code duplication.

    opened by runako 14
  • ValueError: unknown reasons (During run application on EC2 RedHat)

    ValueError: unknown reasons (During run application on EC2 RedHat)

    If anyone can help I will be appreciate! Describe the bug Having an error during running the app on EC2 RedHat : ValueError: unknown reasons

    I installed on redhat ec2 instance trufflehog. By default there python 2.7 and 3.6 Trufflehog was installed from pip3. pip3 freeze shows that everything installed : gitdb==4.0.5 gitdb2==4.0.2 GitPython==3.0.6 smmap==3.0.5 truffleHog==2.2.1 truffleHogRegexes==0.0.7

    After installation and running next command ( just to check does it work or not) trufflehog --regex --entropy=False https://github.com/dxa4481/truffleHog.git ( I got next error) : Traceback (most recent call last): File "/usr/local/bin/trufflehog", line 11, in sys.exit(main()) File "/usr/local/lib64/python3.6/site-packages/truffleHog/truffleHog.py", line 93, in main surpress_output=False, branch=args.branch, repo_path=args.repo_path, path_inclusions=path_inclusions, path_exclusions=path_exclusions, allow=allow) File "/usr/local/lib64/python3.6/site-packages/truffleHog/truffleHog.py", line 351, in find_strings diff_hash = hashlib.md5((str(prev_commit) + str(curr_commit)).encode('utf-8')).digest() ValueError: unknown reasons

    opened by RepositoryOfCode 12
  •  Git issue when trying to scan cloned project in Apple-mac: trufflehog file:///

    Git issue when trying to scan cloned project in Apple-mac: trufflehog file:///

    Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.6/bin/trufflehog", line 10, in sys.exit(main()) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/truffleHog/truffleHog.py", line 82, in main surpress_output=False, branch=args.branch, repo_path=args.repo_path, path_inclusions=path_inclusions, path_exclusions=path_exclusions) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/truffleHog/truffleHog.py", line 309, in find_strings project_path = clone_git_repo(git_url) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/truffleHog/truffleHog.py", line 152, in clone_git_repo Repo.clone_from(git_url, project_path) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/repo/base.py", line 925, in clone_from return cls._clone(git, url, to_path, GitCmdObjectDB, progress, **kwargs) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/repo/base.py", line 880, in _clone finalize_process(proc, stderr=stderr) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/util.py", line 341, in finalize_process proc.wait(**kwargs) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/cmd.py", line 291, in wait raise GitCommandError(self.args, status, errstr) git.exc.GitCommandError: Cmd('git') failed due to: exit code(128) cmdline: git clone -v file:///GitHub/Github-guardian/ /var/folders/37/md_m401d073bw1vt7q863pbw0000gn/T/tmpchgjs_ln stderr: 'Cloning into '/var/folders/37/md_m401d073bw1vt7q863pbw0000gn/T/tmpchgjs_ln'... fatal: '/GitHub/Github-guardian/' does not appear to be a git repository fatal: Could not read from remote repository.

    Please make sure you have the correct access rights and the repository exists. '

    opened by dgurazada 12
  • Depth limits are needed to prevent long jobs

    Depth limits are needed to prevent long jobs

    When leveraging trufflehog for repo scans, it would be helpful to introduce the concept of depth limits, to ensure that when a scan is performed, it only goes to a certain number of commits back. On a test repository that I have, there is a huge number of commits dating back to 2014, and the job is running well more than 24 hours to go deep across all of them.

    opened by dend 11
  • WindowsError: [Error 5] Access is denied

    WindowsError: [Error 5] Access is denied

    Traceback (most recent call last): File "trufflehog.py", line 106, in <module> find_strings(args.git_url) File "trufflehog.py", line 98, in find_strings shutil.rmtree(project_path) File "C:\Python27\lib\shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "C:\Python27\lib\shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "C:\Python27\lib\shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "C:\Python27\lib\shutil.py", line 252, in rmtree onerror(os.remove, fullname, sys.exc_info()) File "C:\Python27\lib\shutil.py", line 250, in rmtree os.remove(fullname) WindowsError: [Error 5] Access is denied: 'temp\\[uuid]\\.git\\objects\\pack\\pack-[uuid].idx'

    When scanning some repos. (This one crashes half way through, This one crashes at startup)

    opened by Peter-Maguire 11
  • i cant see the result

    i cant see the result

    1. See error

    {"level":"debug","msg":"Cloning remote Git repo without authentication","time":"2022-04-05T16:19:28Z"} {"level":"debug","msg":"Git repo local path: /tmp/trufflehog944564607","time":"2022-04-05T16:23:19Z"}

    2022/04/05 16:44:16 [updater parent] prog exited with 1

    I can see the result if found or note even if I use --json I cant see the saved file its always clone the repo in tmp folder after finish scanning it should delete the cloned folder in the tmp

    bug 
    opened by abramas 10
  • Hardcoded thresholds of 20 in get_strings_of_set()

    Hardcoded thresholds of 20 in get_strings_of_set()

    threshold keyword variable is declared and used on the last if statement Line 39, but not in the first else statement Line 35

    def get_strings_of_set(word, char_set, threshold=20):
        count = 0
        letters = ""
        strings = []
        for char in word:
            if char in char_set:
                letters += char
                count += 1
            else:
                if count > 20:
                    strings.append(letters)
                letters = ""
                count = 0
        if count > threshold:
            strings.append(letters)
    
    opened by bandrel 10
  • gitdb update breaks trufflehog

    gitdb update breaks trufflehog

    Probably related to #198

    We install inside a docker container using:

    $ pip install truffleHog==2.0.99
    

    We run:

    $ trufflehog --regex --entropy=False .
    

    Starting today this errored with:

    Traceback (most recent call last):
       File "/usr/local/bin/trufflehog", line 5, in <module>
         from truffleHog.truffleHog import main
       File "/usr/local/lib/python3.8/site-packages/truffleHog/truffleHog.py", line 17, in <module>
         from git import Repo
       File "/usr/local/lib/python3.8/site-packages/git/__init__.py", line 38, in <module>
         from git.config import GitConfigParser  # @NoMove @IgnorePep8
       File "/usr/local/lib/python3.8/site-packages/git/config.py", line 16, in <module>
         from git.compat import (
       File "/usr/local/lib/python3.8/site-packages/git/compat.py", line 16, in <module>
         from gitdb.utils.compat import (
     ModuleNotFoundError: No module named 'gitdb.utils.compat'
    

    A quick dive down the dependency tree showed that the trufflehog dependency on gitpython-2.1.1 (here) is pulling in gitdb2-3.0.2 (here) which has removed the gitdb.utils.compat (PR)

    Our fix for now is to use (may be useful to others):

    pip install gitdb2==3.0.0 truffleHog==2.0.99
    
    opened by danieldooley 9
  • Use access-token endpoint for validity check

    Use access-token endpoint for validity check

    This PR fixes the issue https://github.com/trufflesecurity/trufflehog/issues/990, it should correctly report keys as valid even if they are missing the user_read scope.

    opened by clonsdale-canva 1
  • Buildkite token validation missing tokens without user_read scope

    Buildkite token validation missing tokens without user_read scope

    Community Note

    • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
    • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
    • If you are interested in working on this issue or have submitted a pull request, please leave a comment

    TruffleHog Version

    3.21.0

    Expected Behavior

    Buildkite token is reported as valid

    Actual Behavior

    Buildkite token is not validated as the API call fails due to missing user_read scope

    Additional Context

    The logic to check if a buildkite token is valid will send out an API call to the /user endpoint https://github.com/trufflesecurity/trufflehog/blob/009756dce61948a66cf90a8b14018460c91ab4f0/pkg/detectors/buildkite/buildkite.go#L51. This will miss all tokens which do not have the read_user scope.

    Instead, we can use the access-token endpoint, which will return 200 for any valid token, and report on the scopes present / ID of the token - https://buildkite.com/docs/apis/rest-api/access-token.

    bug 
    opened by clonsdale-canva 0
  • Add max-depth limit to GitHub subcommand

    Add max-depth limit to GitHub subcommand

    Community Note

    • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
    • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
    • If you are interested in working on this issue or have submitted a pull request, please leave a comment

    Description

    Ability to limit the depth of the commit history being scanned for GitHub users We need the ability to set a --max-depth= limit to GitHub sub command.

    Problem to be Addressed

    It is very noisy for large GitHub enterprises to detect new issues due to the inability to ignore historical commit history that one has already remediated. Results have to be saved into a spreadsheet or database and then diff'd to see what has changed.

    Description of the Preferred Solution

    The ability to set a --max-depth= limit to GitHub sub command. This would be very beneficial when attempting to scan a GitHub enterprise repositories as a group.

    Additional Context

    References

    • #0000
    enhancement 
    opened by dwilliamsstc 0
  • go install - missing dot in first path element

    go install - missing dot in first path element

    build github.com/trufflesecurity/trufflehog/v3: cannot load embed: malformed module path "embed": missing dot in first path element

    Community Note

    • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
    • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
    • If you are interested in working on this issue or have submitted a pull request, please leave a comment

    TruffleHog Version

    Trace Output

    Expected Behavior

    Actual Behavior

    Steps to Reproduce

    Distributor ID: Elementary Description: elementary OS 6.1 Jólnir Release: 6.1 Codename: jolnir

    Additional Context

    References

    • #0000
    bug 
    opened by rip752 0
  • Run certain Detector Type

    Run certain Detector Type

    trufflehog version: trufflehog dev

    Currently I am running trufflehog as a pre-commit hook with all possible Detector type. Is it possible to only run few Detector types , say AWS keys, Private keys as such?

    bug 
    opened by Priyadhana 0
Releases(v3.21.0)
Windows Server 2016, 2019, 2022 Extracter & Recovery

Parsing files from Deduplicated volumes. It can also recover deleted files from NTFS Filesystem that were deduplicated. Installation git clone https:/

0 Aug 28, 2022
Writeups for wtf-CTF hosted by Manipal Information Security Team as part of Techweek2021- INCOGNITO

wtf-CTF_Writeups Table of Contents Table of Contents Crypto Misc Reverse Pwn Web Crypto wtf_Bot Author: Madjelly Join the discord server!You know how

6 Jun 07, 2021
The best Python Backdoor👌

Backdoor The best Python Backdoor Files Server file is used in all of cases If client is Windows, the client need execute EXE file If client is Linux,

13 Oct 28, 2022
JumpServer远程代码执行漏洞检测利用脚本

Jumpserver-EXP JumpServer远程代码执行漏洞检测利用脚本

Veraxy 181 Dec 20, 2022
log4j2 dos exploit,CVE-2021-45105 exploit,Denial of Service poc

说明 about author: 我超怕的 blog: https://www.cnblogs.com/iAmSoScArEd/ github: https://github.com/iAmSOScArEd/ date: 2021-12-20 log4j2 dos exploit log4j2 do

3 Aug 13, 2022
Dome - Subdomain Enumeration Tool. Fast and reliable python script that makes active and/or passive scan to obtain subdomains and search for open ports.

DOME - A subdomain enumeration tool Check the Spanish Version Dome is a fast and reliable python script that makes active and/or passive scan to obtai

Vadi 329 Jan 01, 2023
Open Source Tool - Cybersecurity Graph Database in Neo4j

GraphKer Open Source Tool - Cybersecurity Graph Database in Neo4j |G|r|a|p|h|K|e|r| { open source tool for a cybersecurity graph database in neo4j } W

Adamantios - Marios Berzovitis 27 Dec 06, 2022
Multi-Process Vulnerability Tool

Multi-Process Vulnerability Tool

Baris Dincer 1 Dec 22, 2021
A wordlist generator tool, that allows you to supply a set of words, giving you the possibility to craft multiple variations from the given words, creating a unique and ideal wordlist to use regarding a specific target.

A wordlist generator tool, that allows you to supply a set of words, giving you the possibility to craft multiple variations from the given words, creating a unique and ideal wordlist to use regardin

Cycurity 39 Dec 10, 2022
Hack computer in the form of RAR files from all types of clients, even Linux

Program Features 📌 Hide malware 📌 Vulnerability software vulnerabilities RAR 📌 Creating malware 📌 Access client files 📌 Client Hacking 📌 Link Do

hack4lx 5 Nov 25, 2022
Something I built to test for Log4J vulnerabilities on customer networks.

Log4J-Scanner Something I built to test for Log4J vulnerabilities on customer networks. I'm not responsible if your computer blows up, catches fire or

1 Dec 20, 2021
DepFine Is a tool to find the unregistered dependency based on dependency confusion valunerablility and lead to RCE

DepFine DepFine Is a tool to find the unregistered dependency based on dependency confusion valunerablility and lead to RCE Installation: You Can inst

Hossam mesbah 14 Nov 11, 2022
ONT Analysis Toolkit (OAT)

A toolkit for monitoring ONT MinION sequencing, followed by data analysis, for viral genomes amplified with tiled amplicon sequencing.

6 Jun 14, 2022
Safe Policy Optimization with Local Features

Safe Policy Optimization with Local Feature (SPO-LF) This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization wi

Akifumi Wachi 6 Jun 05, 2022
Python script to tamper with pages to test for Log4J Shell vulnerability.

log4jShell Scanner This shell script scans a vulnerable web application that is using a version of apache-log4j 2.15.0. This application is a static

GoVanguard 8 Oct 20, 2022
A proxy for asyncio.AbstractEventLoop for testing purposes

aioloop-proxy A proxy for asyncio.AbstractEventLoop for testing purposes. When tests writing for asyncio based code, there are controversial requireme

aio-libs 12 Dec 12, 2022
CVE-2021-41773 Path Traversal for Apache 2.4.49

CVE-2021-41773 Path Traversal for Apache 2.4.49

ac1d 3 Oct 20, 2021
A simple linux keylogger project.

The project This project is a simple linux keylogger. When activated, it registers all the actions made with the keyboard. The log files are registere

1 Oct 24, 2021
CVE-2022-21907 - Windows HTTP协议栈远程代码执行漏洞 CVE-2022-21907

CVE-2022-21907 Description POC for CVE-2022-21907: Windows HTTP协议栈远程代码执行漏洞 creat

antx 365 Nov 30, 2022
Obfuscate your Python scripts better, faster.

⚜️ Berserker ⚜️ An unique Python3 obfuscator using Kyrie Eleison's encryption protocol, written in Python3. 📋 Examples 📋 Unobfuscated: input("Hello

Billy 81 Dec 07, 2022