Similar looking domain detection using python fuzzywuzzy

Overview

A major cause of phishing and BEC (business email compromise) incidents is the similar-looking domain. If you detect such domains early, you can prevent incidents early. The Python fuzzywuzzy module lets you do that, and here is the process.

Thousands of domains are registered every day; some are used for legitimate purposes and some are not. BEC incidents are increasing every day and cost businesses millions. The core of BEC is a spoofed email that looks similar to your business email, and it is because of these similar-looking domains that we fall prey to BEC incidents. Sometimes we end up submitting our credentials after receiving an email with a link that looks like a genuine website, e.g. microsoft.com vs micr0soft.com.

How can we detect/prevent such incidents?

Python has a module named "fuzzywuzzy" that compares strings and gives a score of how similar they are, such as a 90% match or a 66% match. Use it to look for similar domains:

  1. Gather data (from your SIEM) that contains domain-related information, e.g. proxy logs, DNS logs, mail logs.
  2. Keep a list of domains related to your business (the domains you own and the domains of the vendors you do business with).
  3. Run the fuzz module against this data and check for ratios above 50% (see the examples below; the working code uses a stricter cutoff of 80).
  4. Analyze the domains that look similar to yours (check the whois data); if a domain is genuine, add it to the list from Step 2.
  5. If a domain is not genuine, dig deeper into it: was any email received from it (mail logs), did any user visit it (proxy logs)?
  6. You can run these checks on an hourly or daily basis. That's it.
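
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` so that it runs without extra packages, and the function names `similarity` and `find_lookalikes`, as well as the sample domains, are made up for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in for fuzz.ratio(a, b): a 0-100 score from the standard library
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

def find_lookalikes(observed, trusted, threshold=50):
    """Return (trusted, observed, score) tuples for observed domains that
    resemble a trusted domain but are not themselves trusted."""
    trusted_set = {d.strip().lower() for d in trusted if d.strip()}
    # Normalize the observed domains and drop the known-good ones (steps 2 and 4)
    candidates = {d.strip().lower() for d in observed if d.strip()} - trusted_set
    hits = []
    for domain in candidates:
        for good in trusted_set:
            score = similarity(domain, good)
            if score > threshold:  # step 3
                hits.append((good, domain, score))
    # Highest scores first, so the closest lookalikes are reviewed first (step 5)
    return sorted(hits, key=lambda h: (-h[2], h[0], h[1]))

# Made-up sample data for illustration only
trusted = ["microsoft.com", "example.com"]
observed = ["micr0soft.com", "microsoft.com", "wikipedia.org"]
for good, domain, score in find_lookalikes(observed, trusted):
    print(good, domain, score, sep="\t")
```

Domains confirmed as genuine during review would be appended to `trusted`, so they stop appearing in later runs.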

Here are a couple of examples.

  1. Basic example

    from fuzzywuzzy import fuzz
    a = "microsoft.com"
    b = "micros0ft.com"
    print("Match ratio is", str(fuzz.ratio(a, b)) + "%")  # fuzz.ratio(a, b) returns the match score
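
For intuition about where such a score comes from, here is a small sketch that uses only the standard library. It assumes the common definition of the ratio as twice the number of matched characters divided by the combined length of both strings, which is what `difflib.SequenceMatcher` computes; the exact backend fuzzywuzzy uses depends on whether the optional python-Levenshtein package is installed, so treat this as an approximation of `fuzz.ratio` rather than its implementation.

```python
from difflib import SequenceMatcher

a = "microsoft.com"
b = "micros0ft.com"
matcher = SequenceMatcher(None, a, b)

# Each block is a run of characters that the two strings share
shared = [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]
print(shared)  # ['micros', 'ft.com']

# ratio() = 2 * matched / (len(a) + len(b)) = 2 * 12 / 26, scaled to 0-100
score = int(round(100 * matcher.ratio()))
print(score)  # 92
```

A single swapped character in a 13-character domain therefore still scores above 90, which is why a high cutoff catches this kind of lookalike.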

  2. Working code

    from fuzzywuzzy import fuzz

    # List of genuine domains owned by you
    with open(r'/home/user/Desktop/BEC/your_domain.txt', 'r') as dns_data:
        genuine_domains = [line.strip() for line in dns_data if line.strip()]

    # Domain data gathered from proxy/dns/mail logs
    with open(r'/home/user/Desktop/BEC/domain-names-data.txt', 'r') as domain:
        observed_domains = [line.strip() for line in domain if line.strip()]

    # Output file: one tab-separated line per pair scoring above 80
    with open(r'/home/user/Desktop/BEC/output.txt', 'w') as output:
        for dns in genuine_domains:
            for site in observed_domains:
                score = fuzz.ratio(site, dns)
                if score > 80:
                    output.write("Match ratio is: \t" + dns + "\t" + site + "\t" + str(score) + "\n")
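
One thing to keep in mind when choosing the cutoff: the comparison runs over the full domain string, so a shared suffix such as `.com` counts toward the score. The sketch below shows the effect with the standard library's `difflib` as a stand-in for `fuzz.ratio`. The helper `drop_tld` is hypothetical and deliberately naive: it only removes the last label and does not handle multi-part suffixes such as `co.uk`.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in for fuzz.ratio(a, b): a 0-100 score from the standard library
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

def drop_tld(domain):
    # Naive assumption: everything after the last dot is the TLD
    return domain.rsplit(".", 1)[0]

a, b = "abc.com", "xyz.com"
print(similarity(a, b))                      # full strings: the shared ".com" is matched
print(similarity(drop_tld(a), drop_tld(b)))  # name labels only
```

For short names the shared suffix alone can push unrelated domains past a 50% cutoff, which is one reason to prefer a higher threshold such as the 80 used in the working code.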

If you have access to a whois database, you can run this code against newly registered domains every day and probably get results early!

I ran this code against the domains newly registered on 3rd Nov, using the top 1000 domains as the legitimate list. The results are amazing: so many similar-looking domains are registered every day that it is no wonder we receive a lot of offers from "amzon" and "apples" :) Check out the Output.txt file.

Feel free to share your thoughts!!!
