declutters url lists for crawling/pentesting

Overview

uro

Using a URL list for security testing can be painful as there are a lot of URLs that have uninteresting/duplicate content; uro aims to solve that.

It doesn't make any HTTP requests to the URLs, and it removes:

  • human-written content, e.g. blog posts
  • URLs with the same path that differ only in parameter values
  • incremental URLs, e.g. /cat/1/ and /cat/2/
  • image, js, css and other static files
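
The filtering rules above can be illustrated with a minimal sketch. This is not uro's actual implementation: the names `STATIC_EXTENSIONS`, `url_key` and `declutter` are hypothetical, the extension list is a small illustrative sample, and the blog-post heuristic is left out.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately short blacklist used only for this sketch.
STATIC_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "css", "js", "svg", "ico", "woff"}


def url_key(url):
    """Return a grouping key for a URL, or None if it looks like a static file."""
    parts = urlsplit(url)
    last_segment = parts.path.rsplit("/", 1)[-1]
    if "." in last_segment:
        extension = last_segment.rsplit(".", 1)[-1].lower()
        if extension in STATIC_EXTENSIONS:
            return None
    # Collapse purely numeric path segments so /cat/1/ and /cat/2/ share a key.
    generic_path = re.sub(r"/\d+(?=/|$)", "/{n}", parts.path)
    # Keep only parameter names, so differing values do not create new keys.
    names = sorted({pair.split("=", 1)[0] for pair in parts.query.split("&") if pair})
    return (parts.netloc, generic_path, tuple(names))


def declutter(urls):
    """Keep the first URL seen for each key and drop static files."""
    seen = set()
    kept = []
    for url in urls:
        key = url_key(url)
        if key is None or key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept
```

In this sketch, two URLs count as duplicates when they share a host, a path (with numeric segments collapsed) and the same set of parameter names; the first one encountered is kept.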

Usage

First, install uro with pip:

pip3 install uro

Now, there's just one way to use it, no args, no bullshit.

cat urls.txt | uro

Comments
  • ImportError: cannot import name 'SIGPIPE' from 'signal'

    D:\uro>uro
    Traceback (most recent call last):
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.2', 'console_scripts', 'uro')())
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 25, in importlib_load_entry_point
        return next(matches).load()
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\metadata.py", line 77, in load
        module = import_module(match.group('module'))
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
      File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
      File "<frozen zipimport>", line 259, in load_module
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\site-packages\uro-0.0.2-py3.8.egg\uro\uro.py", line 4, in <module>
    ImportError: cannot import name 'SIGPIPE' from 'signal' (C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\signal.py)

    opened by umar98 3
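
Python's `signal.SIGPIPE` is only available on Unix, which is why an unconditional import fails on Windows. A minimal sketch of a portable guard follows; the helper name `restore_default_sigpipe` is hypothetical and this is not necessarily how uro handles it.

```python
import signal


def restore_default_sigpipe():
    """Restore the default SIGPIPE behaviour where the platform supports it.

    Returns True if the handler was set, False otherwise.
    """
    sigpipe = getattr(signal, "SIGPIPE", None)
    if sigpipe is None:
        # Windows: the constant does not exist, so there is nothing to set.
        return False
    try:
        signal.signal(sigpipe, signal.SIG_DFL)
    except ValueError:
        # signal.signal() may only be called from the main thread.
        return False
    return True
```

Looking the constant up with `getattr` at run time avoids the `ImportError` that `from signal import SIGPIPE` raises on platforms without it.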
  • Error install uro

    I get this error: WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behavior with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

    I've followed the steps above but haven't found a solution :(

    Can anyone help me?

    invalid 
    opened by mjulda 2
  • When using uro on subdomains it leaves :// in front?

    When uro is run on a list of subdomains, it leaves :// at the start of each line.

    Example:

    cat subs.txt | uro

    subs.txt example:

    site.com
    sub.site.com
    sub123.site.com

    For any entry without http:// or https:// in front, it leaves :// in front.

    opened by gprime31 2
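
One possible workaround, not confirmed by the author, is to give bare hostnames a scheme before piping them to uro. The sketch below assumes `http` is an acceptable default; `ensure_scheme` is a hypothetical helper, not part of uro.

```python
def ensure_scheme(line, default="http"):
    """Prefix a bare host such as 'sub.site.com' with a scheme.

    Lines that already contain '://' are returned unchanged (apart from
    surrounding whitespace being stripped).
    """
    line = line.strip()
    if not line or "://" in line:
        return line
    return f"{default}://{line}"
```

Applied to each line of subs.txt, this turns `site.com` into `http://site.com` and leaves `https://sub.site.com` untouched.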
  • ERROR

    I just can't get this to work. I have cloned the repo and run the install command, but when I try "cat file.txt | uro" it doesn't work. Do I have to run any additional commands? Is there an installation video? :)

    invalid 
    opened by spector012 2
  • Please solve this

    └─# cat params.csv | uro | wc -l
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.9/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.9/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.9/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.9/sre_parse.py", line 962, in parse
        raise source.error("unbalanced parenthesis")
    re.error: unbalanced parenthesis at position 68
    6547

    opened by r3dpars3c 2
  • It doesn't delete paths

    When we check the paths, we see that 43935989 and 43935976 are used differently.

    [email protected]:~# cat urls.txt
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    

    it should delete one of them but it doesn't.

    [email protected]:~# cat urls.txt | uro
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    
    bug 
    opened by Phoenix1112 1
  • error handling

    So I added uro to my workflow and after a while I got this error:

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 139, in main
        if matches_patterns(path):
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 369
    

    This happens to me with different inputs, so it seems to be something that occurs often.

    invalid 
    opened by marcelo321 1
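
The traceback shows `re.search(pattern, path)` failing while compiling the pattern, which is consistent with text from the input being used as a regular expression without escaping. That is an inference from the error message, not a confirmed diagnosis. A minimal sketch of building a pattern from a path safely, using the hypothetical helper `path_to_pattern`:

```python
import re


def path_to_pattern(path):
    """Compile a pattern that matches `path` with its digit runs generalised.

    re.escape() neutralises characters such as '(' or '[' that would
    otherwise be interpreted as regex syntax.
    """
    escaped = re.escape(path)
    # A function is used as the replacement so the backslash in \d+ is taken
    # literally instead of being parsed as a replacement-template escape.
    generalised = re.sub(r"\d+", lambda match: r"\d+", escaped)
    return re.compile("^" + generalised + "$")
```

With escaping in place, a path such as `/a(b/[P-C]/12/` compiles without raising `re.error`, and the resulting pattern still matches sibling paths that differ only in their numbers.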
  • Uro error

    λ cat newfile222.txt | uro
    Traceback (most recent call last):
      File "C:\Users\Yaseen\AppData\Local\Programs\Python\Python39\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.1', 'console_scripts', 'uro')())
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 139, in main
        if matches_patterns(path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 379
    cat: write error: No space left on device

    Can you help? It reports a space issue, but I have a lot of space.

    bug invalid 
    opened by hellofresh01 1
  • Improvement Request

    Hi Somdev,

    1. I'd like to suggest adding the following extensions to the blacklist. I gathered all of them manually and think it would be nice to omit them:
    'svg','img','gif','mp4','flv','ogv','webm','webp','mov','mp3','m4a','m4p','ppt','pptx','pdf','scss','tif','tiff','ttf','otf','woff','woff2','eot','htc','swf','rtf','image'

    2. Also, I would like to ask for the js extension to be whitelisted, as there are lots of interesting features/endpoints to be found in JS files and I don't think they should be considered "useless".

    Thanks!

    Kind Regards, HolyBugx

    enhancement 
    opened by HolyBugx 1
  • More extension to declutter

    It may be useful to add these extensions to the ones that get decluttered; at least, that's what I usually do:

    .doc
    .docx
    .mp3
    .mp4
    .exe
    .tif
    .ttf
    .woff
    .woff2
    .ico
    .zip
    
    duplicate 
    opened by leorac 0
  • Bad character range P-C at position 31

    cat urls.txt | uro

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 598, in _parse
        raise source.error(msg, len(this) + 1 + len(that))
    re.error: bad character range P-C at position 31
    
    bug 
    opened by remonsec 0
  • uro error

    cat urls.txt | uro > test

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/uro/uro.py", line 123, in main
        for line in sys.stdin:
      File "/usr/lib/python3.10/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

    @s0md3v

    bug 
    opened by Iamsajidkhan 0
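
Byte 0xff at position 0 means the input is not valid UTF-8 (a UTF-16 little-endian byte order mark starts with 0xff 0xfe, for instance). Re-saving the file as UTF-8 is one fix. Another is to decode leniently, as in this minimal sketch; `read_lines_lenient` is a hypothetical helper and not uro's own code.

```python
import io


def read_lines_lenient(binary_stream):
    """Yield non-empty, stripped lines from a binary stream.

    Bytes that are not valid UTF-8 are replaced with U+FFFD instead of
    raising UnicodeDecodeError.
    """
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", errors="replace")
    for line in text:
        line = line.strip()
        if line:
            yield line
```

In a command-line filter, the stream would typically be `sys.stdin.buffer`. Undecodable bytes show up as replacement characters in the affected lines rather than stopping the run.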
  • Error

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.4', 'console_scripts', 'uro')())
      File "/usr/local/bin/uro", line 25, in importlib_load_entry_point
        return next(matches).load()
    StopIteration

    opened by umarahmad125 0
  • broken pipe

    I have been encountering this issue:

      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 151, in main
        print(host + path + dict_to_params(param))
    BrokenPipeError: [Errno 32] Broken pipe
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 161, in main
        print(host + path)
    BrokenPipeError: [Errno 32] Broken pipe
    

    Any idea why this happens?

    opened by marcelo321 0
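
A `BrokenPipeError` is raised when the process reading the output (for example `head`) exits before the writer has finished. A minimal sketch of stopping quietly in that case follows; `write_until_closed` is a hypothetical helper, and the release notes for 0.0.3 do not say how uro itself fixed the error.

```python
import sys


def write_until_closed(lines, out=None):
    """Write lines to `out`, stopping quietly once the reader has gone away.

    Returns the number of lines written before the pipe was closed.
    """
    out = sys.stdout if out is None else out
    written = 0
    try:
        for line in lines:
            out.write(line + "\n")
            written += 1
        out.flush()
    except BrokenPipeError:
        # The consumer closed its end of the pipe; there is nothing left to do.
        pass
    return written
```

When writing to the real `sys.stdout`, the Python documentation additionally suggests redirecting stdout to `os.devnull` after catching the error, so that the interpreter's final flush at exit does not raise again.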
  • enhanced filtration

    For example, I want to filter "/A/embed?url=" and "/B/embed?url=", which return similar data. Likewise, I want to filter "/A.php" and "/A.php/", which return similar data.

    enhancement 
    opened by LztCode 1
Releases(0.0.4)
  • 0.0.4(Mar 19, 2022)

  • 0.0.3(Feb 27, 2022)

    • removed redundant imports and code
    • added more extensions to blacklist
    • less memory and time consumption
    • fixed 'broken pipe' error when piping the output to utilities like head
    • fixed an error where similar urls were not getting filtered when they had any parameters
  • 0.0.2(Sep 1, 2021)

Owner
Somdev Sangwan
I make things, I break things and I make things that break things.
A python code for url redirect check

Fayas Noushad 1 Oct 24, 2021
A url redirect status check module for python

Fayas Noushad 2 Oct 24, 2021
A tool to manage the base URL of the Python package index.

chpip A tool to manage the base URL of the Python package index. Installation $ pip install chpip Usage Set pip index URL Set the base URL of the Pyth

Prodesire 4 Dec 20, 2022
A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.

webargs Homepage: https://webargs.readthedocs.io/ webargs is a Python library for parsing and validating HTTP request objects, with built-in support f

marshmallow-code 1.3k Jan 01, 2023
Have you ever wondered: Where does this link go? The REDLI Tool follows the path of the URL.

Have you ever wondered: Where does this link go? The REDLI Tool follows the path of the URL. It allows you to see the complete path a redirected URL goes through. It will show you the full redirectio

JAYAKUMAR 28 Sep 11, 2022
Shorten-Link - Make shorten URL with Cuttly API

Shorten-Link This Script make shorten URL with custom slashtag The script take f

Ahmed Hossam 3 Feb 13, 2022
A teeny Tiny module to check URLs against discord's list of phishing domains

kaj 1 Aug 29, 2022
Fast pattern fetcher, Takes a URLs list and outputs the URLs which contains the parameters according to the specified pattern.

Fast Pattern Fetcher (fpf) Coded with 3 by HS Devansh Raghav Fast Pattern Fetcher, Takes a URLs list and outputs the URLs which contains the paramete

whoami security 5 Feb 20, 2022
a little project to make custom discord invites over a url

custom-dc-invite a little project to make custom discord invites over a url how it works you create a account for

baum1810 2 Oct 03, 2022
Astra is a tool to find URLs and secrets.

Astra finds urls, endpoints, aws buckets, api keys, tokens, etc from a given url/s. It combines the paths and endpoints with the given domain and give

Stinger 198 Dec 27, 2022
C++ library for urlencode.

liburlencode C library for urlencode.

Khaidi Chu 6 Oct 31, 2022
Simple python library to deal with URI Templates.

uritemplate Documentation -- GitHub -- Travis-CI Simple python library to deal with URI Templates. The API looks like from uritemplate import URITempl

Hyper 210 Dec 19, 2022
ShortenURL-model - The model layer class for shorten url service

ShortenURL Model The model layer class for shorten URL service Usage Complete th

TwinIsland 1 Jan 07, 2022
Use this module to detect if a URL is on discord's phishing list.

PhishDetector This module was made so you can check a URL and see if it's in discord's official list of phishing and suspicious URLs. Installation pip

Elijah 4 Mar 25, 2022
Ukiyo - A simple, minimalist and efficient discord vanity URL sniper

Ukiyo - a simple, minimalist and efficient discord vanity URL sniper. Ukiyo is easy to use, has a very visually pleasing interface, and has great spee

13 Apr 14, 2022
A url shortner written in Flask.

url-shortener-elitmus This is a simple flask app which takes a URL and shortens it. This shortened version of the URL redirects the user to the lon

2 Nov 23, 2021
UDdup - URLs Deduplication Tool

UDdup - URLs Deduplication Tool The tool gets a list of URLs, and removes "duplicate" pages in the sense of URL patterns that are probably repetitive

Rotem Reiss 128 Dec 21, 2022
a url shortener project from semicolonworld

Url Shortener With Django Written by Semicolon World

3 Aug 24, 2021
coURLan: Clean, filter, normalize, and sample URLs

coURLan: Clean, filter, normalize, and sample URLs Why coURLan? “Given that the bandwidth for conducting crawls is neither infinite nor free, it is be

Adrien Barbaresi 20 Dec 14, 2022
python3 flask based python-url-shortener microservice.

python-url-shortener This repository is for managing all public/private entity specific api endpoints for an organisation. In this case we have entity

Asutosh Parida 1 Oct 18, 2021