UDdup - URLs Deduplication Tool

Overview

UDdup - URLs Deduplication Tool

The tool gets a list of URLs, and removes "duplicate" pages in the sense of URL patterns that are probably repetitive and points to the same web template.

For example:

https://www.example.com/product/123
https://www.example.com/product/456
https://www.example.com/product/123?is_prod=false
https://www.example.com/product/222?is_debug=true

All the above are probably points to the same product "template". Therefore it should be enough to scan only some of these URLs by our various scanners.

The result of the above after UDdup should be:

https://www.example.com/product/123?is_prod=false
https://www.example.com/product/222?is_debug=true

Why do I need it?

Mostly for better (automated) reconnaissance process, with less noise (for both the tester and the target).

Examples

Take a look at demo.txt which is the raw URLs file which results in demo-results.txt.


Installation

With pip (Recommended)

pip install uddup

Manual (from code)

# Clone the repository.
git clone https://github.com/rotemreiss/uddup.git

# Install the Python requirements.
cd uddup
pip install -r requirements.txt

Usage

uddup -u demo.txt -o ./demo-result.txt

More Usage Options

uddup -h

Short Form Long Form Description
-h --help Show this help message and exit
-u --urls File with a list of urls
-o --output Save results to a file
-s --silent Print only the result URLs
-fp --filter-path Filter paths by a given Regex

Filter Paths by Regex

Allows filtering custom paths pattern. For example, if we would like to filter all paths that starts with /product we will need to run:

# Single Regex
uddup -u demo.txt -fp "^product"

Input:

https://www.example.com/
https://www.example.com/privacy-policy
https://www.example.com/product/1
https://www.example2.com/product/2
https://www.example3.com/product/4

Output:

https://www.example.com/
https://www.example.com/privacy-policy

Advanced Regex with multiple path filters

uddup -u demo.txt -fp "(^product)|(^category)"

Contributing

Feel free to fork the repository and submit pull-requests.


Support

Create new GitHub issue

Want to say thanks? :) Message me on Linkedin


License

License

Comments
  • cant run uddup

    cant run uddup

    This tool is so great and really useful but i have noticed you will move the uddup execute script to /usr/local/bin directory which is actually doesnt work some times because its needs to be in /usr/bin directory to be executed i dont know why... I copied the script to /usr/bin and its worked perfectly. Im using kali linux subsystem on windows 11. Sorry if theres a problem with my issue report, its my first issue report on github :V Thanks.

    question 
    opened by siratsami 2
  • Multiple hostnames (domains) which shares the same patterns conflicts

    Multiple hostnames (domains) which shares the same patterns conflicts

    I found out that I missed a very basic case like:

    https://www.example.com/product/123
    https://www.example2.com/product/123
    

    This currently results in one URL instead of two:

    https://www.example.com/product/123
    ```.
    bug 
    opened by rotemreiss 1
  • fix bug with unicode char in urls

    fix bug with unicode char in urls

    This fixes a problem with URLs with UTF8 chars, e.g:

    echo "http://www.shakedos.com:80/index.php/2010/05/עבודה-עם-שפות-ללא-טבלאות-מוכנות/feed/" > /tmp/urls.txt
    uddup -u /tmp/urls.txt 
    ...
    Traceback (most recent call last):
      File "/usr/local/bin/uddup", line 11, in <module>
        sys.exit(interactive())
      File "/usr/local/lib/python3.5/dist-packages/uddup/main.py", line 269, in interactive
        main(args.urls_file, args.output, args.silent, args.filter_path)
      File "/usr/local/lib/python3.5/dist-packages/uddup/main.py", line 184, in main
        for url in f:
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 100: ordinal not in range(128
    
    bug good first issue 
    opened by Shaked 0
  • Support paths filtering by Regex

    Support paths filtering by Regex

    Support paths filtering by Regex, as unew does.

    Requirements

    • Support custom regex to be provided by the user

    Known limitations

    • Only the path will be filtered while ignoring the hostname and parameters (may be extended in the future)
    enhancement 
    opened by rotemreiss 0
  • [request] - de-duplicate similar paths

    [request] - de-duplicate similar paths

    Hi Rotem,

    Currently, uddup is not able to de-duplicate similar paths like below.

    /users/122/edit
    /users/123/edit
    

    image

    This project https://github.com/ameenmaali/urldedupe trying to solve similar problems is able to de-duplicate them. The only issue is since it's written in C++ it requires rebuilding binary for a new machine.

    -- Regards, @bugbaba

    enhancement 
    opened by bugbaba 1
Releases(v0.9.3)
Owner
Rotem Reiss
Rotem Reiss
🌐 URL parsing and manipulation made easy.

furl is a small Python library that makes parsing and manipulating URLs easy. Python's standard urllib and urlparse modules provide a number of URL re

Ansgar Grunseid 2.4k Jan 04, 2023
This is a no-bullshit file hosting and URL shortening service that also runs 0x0.st. Use with uWSGI.

This is a no-bullshit file hosting and URL shortening service that also runs 0x0.st. Use with uWSGI.

mia 1.6k Dec 31, 2022
Astra is a tool to find URLs and secrets.

Astra finds urls, endpoints, aws buckets, api keys, tokens, etc from a given url/s. It combines the paths and endpoints with the given domain and give

Stinger 198 Dec 27, 2022
Have you ever wondered: Where does this link go? The REDLI Tool follows the path of the URL.

Have you ever wondered: Where does this link go? The REDLI Tool follows the path of the URL. It allows you to see the complete path a redirected URL goes through. It will show you the full redirectio

JAYAKUMAR 28 Sep 11, 2022
Shorten-Link - Make shorten URL with Cuttly API

Shorten-Link This Script make shorten URL with custom slashtag The script take f

Ahmed Hossam 3 Feb 13, 2022
Ukiyo - A simple, minimalist and efficient discord vanity URL sniper

Ukiyo - a simple, minimalist and efficient discord vanity URL sniper. Ukiyo is easy to use, has a very visually pleasing interface, and has great spee

13 Apr 14, 2022
:electric_plug: Generating short urls with python has never been easier

pyshorteners A simple URL shortening API wrapper Python library. Installing pip install pyshorteners Documentation https://pyshorteners.readthedocs.i

Ellison 350 Dec 24, 2022
Python implementation for generating Tiny URL- and bit.ly-like URLs.

Short URL Generator Python implementation for generating Tiny URL- and bit.ly-like URLs. A bit-shuffling approach is used to avoid generating consecut

Alireza Savand 170 Dec 28, 2022
Temporary-shortner - A webapp that shortner URLs but for limited time

temporary-shortner A webapp that shortens URLs but for a limited time Demo site

Vitor 2 Jan 07, 2022
hugeURLer 是一个基于 Python 和 GitHub action 的短链接服务

hugeURLer 是一个基于 Python 和 GitHub action 的短链接服务 如何使用 您需要把库 clone 到本地,然后在终端执行 python3 .\src\addNewRedirection.py url ,就能创建一个指向你设置的 url 的跳转页面。

安东尼洪 2 Dec 22, 2021
C++ library for urlencode.

liburlencode C library for urlencode.

Khaidi Chu 6 Oct 31, 2022
ShortenURL-model - The model layer class for shorten url service

ShortenURL Model The model layer class for shorten URL service Usage Complete th

TwinIsland 1 Jan 07, 2022
A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.

webargs Homepage: https://webargs.readthedocs.io/ webargs is a Python library for parsing and validating HTTP request objects, with built-in support f

marshmallow-code 1.3k Jan 01, 2023
Simple python library to deal with URI Templates.

uritemplate Documentation -- GitHub -- Travis-CI Simple python library to deal with URI Templates. The API looks like from uritemplate import URITempl

Hyper 210 Dec 19, 2022
A simple URL shortener built with Flask

A simple URL shortener built with Flask and MongoDB.

Mike Lowe 2 Feb 05, 2022
Use this module to detect if a URL is on discord's phishing list.

PhishDetector This module was made so you can check a URL and see if it's in discord's official list of phishing and suspicious URLs. Installation pip

Elijah 4 Mar 25, 2022
find all the URL of a site with a specific Regex

href this program will find all the link with a spesfic Regex pattern from a site. what it will do in any site there are a lots of url that may you ne

Arya Shabane 12 Dec 05, 2022
python3 flask based python-url-shortener microservice.

python-url-shortener This repository is for managing all public/private entity specific api endpoints for an organisation. In this case we have entity

Asutosh Parida 1 Oct 18, 2021
declutters url lists for crawling/pentesting

uro Using a URL list for security testing can be painful as there are a lot of URLs that have uninteresting/duplicate content; uro aims to solve that.

Somdev Sangwan 677 Jan 07, 2023
Simple Version of ouo.io. shorten any link on the web easily

OUO.IO LINK SHORTENER This is a simple python script that made to short links. currently ouo.io doesn't have Application Programming Interface so i de

Danushka-Madushan 1 Dec 11, 2021