Declutters URL lists for crawling/pentesting


uro

Using a URL list for security testing can be painful, as many of the URLs have uninteresting or duplicate content; uro aims to solve that.

It doesn't make any HTTP requests to the URLs and removes:

  • human-written content, e.g. blog posts
  • URLs with the same path that differ only in parameter values
  • incremental URLs, e.g. /cat/1/ and /cat/2/
  • image, JS, CSS and other static files
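The removal rules above can be approximated in a few lines of Python. This is a hypothetical sketch, not uro's actual implementation: it assumes two URLs are duplicates when they share a host, the same path shape once purely numeric segments are collapsed, and the same set of parameter names, and that a URL is static when its last path segment ends in one of a hard-coded set of extensions. `STATIC_EXTENSIONS`, `fingerprint` and `declutter` are made-up names, and the "human-written content" rule is not attempted.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical blacklist for this sketch; uro maintains its own list.
STATIC_EXTENSIONS = {
    "png", "jpg", "jpeg", "gif", "svg", "ico", "css", "js",
    "woff", "woff2", "ttf", "eot", "mp3", "mp4", "pdf",
}

# A path segment made only of digits, e.g. the "1" in /cat/1/.
NUMERIC_SEGMENT = re.compile(r"\d+")


def is_static(path: str) -> bool:
    """Return True if the last path segment has a blacklisted extension."""
    last = path.rsplit("/", 1)[-1]
    if "." not in last:
        return False
    return last.rsplit(".", 1)[-1].lower() in STATIC_EXTENSIONS


def fingerprint(url: str) -> tuple:
    """Build a key that is identical for URLs treated as duplicates."""
    parsed = urlparse(url)
    # Collapse numeric segments so /cat/1/ and /cat/2/ share a key.
    segments = [
        "<num>" if NUMERIC_SEGMENT.fullmatch(s) else s
        for s in parsed.path.split("/")
    ]
    # Keep parameter names only; their values are ignored.
    names = frozenset(parse_qs(parsed.query, keep_blank_values=True))
    return (parsed.netloc.lower(), "/".join(segments), names)


def declutter(urls):
    """Yield the first URL seen for each key, skipping static files."""
    seen = set()
    for raw in urls:
        url = raw.strip()
        if not url or is_static(urlparse(url).path):
            continue
        key = fingerprint(url)
        if key not in seen:
            seen.add(key)
            yield url
```

Fed from standard input (`declutter(sys.stdin)`), this behaves like a much simplified `cat urls.txt | uro`. Under these assumed rules, the two news.mail.ru URLs from the "It doesn't delete paths" report in the comments would collapse into one, since they differ only in a numeric segment.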

Usage

First, install uro with pip:

pip3 install uro

Now, there's just one way to use it, no args, no bullshit.

cat urls.txt | uro
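uro reads one URL per line from standard input, so input that is not UTF-8 text made of full URLs can cause trouble. Two of the reports in the comments are of this kind: a subdomain list without http:// or https:// comes out with a leading ://, and a file whose first byte is 0xff (consistent with a UTF-16 little-endian byte-order mark) raises UnicodeDecodeError. The sketch below is a hypothetical pre-processing step, not part of uro: it decodes the input according to its BOM and prefixes a scheme to bare hosts. The function names and the choice of http as the default scheme are assumptions for illustration.

```python
import codecs

# (BOM, codec) pairs. Python's "utf-16" codec reads the BOM to pick the
# byte order and strips it; "utf-8-sig" does the same for UTF-8.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_input(data: bytes) -> str:
    """Decode raw bytes, honouring a BOM and falling back to UTF-8."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data.decode(encoding)
    # Replace undecodable bytes instead of raising UnicodeDecodeError.
    return data.decode("utf-8", errors="replace")


def normalize(line: str, default_scheme: str = "http") -> str:
    """Prefix bare hosts such as 'sub.site.com' with a scheme (assumed http)."""
    line = line.strip()
    if line and "://" not in line:
        line = f"{default_scheme}://{line}"
    return line


def prepare(data: bytes) -> list:
    """Return non-empty, scheme-qualified lines from raw input bytes."""
    return [url for url in map(normalize, decode_input(data).splitlines()) if url]
```

A small script that writes `"\n".join(prepare(sys.stdin.buffer.read()))` to standard output could then sit in front of uro, e.g. `cat subs.txt | python3 prep.py | uro`, where prep.py is a hypothetical file name.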


Comments
  • ImportError: cannot import name 'SIGPIPE' from 'signal'

    D:\uro>uro
    Traceback (most recent call last):
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.2', 'console_scripts', 'uro')())
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 25, in importlib_load_entry_point
        return next(matches).load()
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\metadata.py", line 77, in load
        module = import_module(match.group('module'))
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "", line 1014, in _gcd_import
      File "", line 991, in _find_and_load
      File "", line 975, in _find_and_load_unlocked
      File "", line 655, in _load_unlocked
      File "", line 618, in _load_backward_compatible
      File "", line 259, in load_module
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\site-packages\uro-0.0.2-py3.8.egg\uro\uro.py", line 4, in <module>
    ImportError: cannot import name 'SIGPIPE' from 'signal' (C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\signal.py)

    opened by umar98 3
  • Error install uro

    I have this error... WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behavior with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

    I've done the steps above but haven't found a solution :(

    Can anyone help me?

    invalid 
    opened by mjulda 2
  • When using uro on subdomains it leaves :// in front?

    When using uro on subdomains, it leaves :// in front.

    Example:

    cat subs.txt | uro

    subs.txt example:

    site.com
    sub.site.com
    sub123.site.com

    Any line without http:// or https:// in front comes out with :// in front of it.

    opened by gprime31 2
  • ERROR

    I just can't get this to work. I have cloned the repo and run the install command, but when I try "cat file.txt | uro" it doesn't work. Do I have to run any additional commands? Is there an installation video? :)

    invalid 
    opened by spector012 2
  • Please solve this

    └─# cat params.csv | uro | wc -l
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.9/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.9/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.9/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.9/sre_parse.py", line 962, in parse
        raise source.error("unbalanced parenthesis")
    re.error: unbalanced parenthesis at position 68
    6547

    opened by r3dpars3c 2
  • It doesn't delete paths

    Looking at the paths, the only difference is the numeric ID: 43935989 vs. 43935976.

    ~# cat urls.txt
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    

    It should remove one of them, but it doesn't.

    ~# cat urls.txt | uro
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    
    bug 
    opened by Phoenix1112 1
  • error handling

    So I added uro to my workflow and after a while I got this error:

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 139, in main
        if matches_patterns(path):
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 369
    

    It happens with different inputs, so it seems to be a common problem.

    invalid 
    opened by marcelo321 1
  • Uro error

    λ cat newfile222.txt | uro
    Traceback (most recent call last):
      File "C:\Users\Yaseen\AppData\Local\Programs\Python\Python39\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.1', 'console_scripts', 'uro')())
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 139, in main
        if matches_patterns(path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 379
    cat: write error: No space left on device

    Can you help? It says there is a space issue, but I have a lot of space.

    bug invalid 
    opened by hellofresh01 1
  • Improvement Request

    Hi Somdev,

    1. I'd like to suggest adding the following extensions to the blacklist. I gathered all of these extensions manually and I think it would be nice to omit them:
    'svg','img','gif','mp4','flv','ogv','webm','webp','mov','mp3','m4a','m4p','ppt','pptx','pdf','scss','tif','tiff','ttf','otf','woff','woff2','eot','htc','swf','rtf','image'
    
    2. Also, I would like to ask you to whitelist the js extension, as there are lots of interesting features/endpoints to be found in JS files, and I don't think they should be considered "useless".

    Thanks!

    Kind Regards, HolyBugx

    enhancement 
    opened by HolyBugx 1
  • More extensions to declutter

    It might be useful to add these extensions to the declutter list; at least, that's what I usually do:

    .doc
    .docx
    .mp3
    .mp4
    .exe
    .tif
    .ttf
    .woff
    .woff2
    .ico
    .zip
    
    duplicate 
    opened by leorac 0
  • Bad character range P-C at position 31

    cat urls.txt | uro

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 598, in _parse
        raise source.error(msg, len(this) + 1 + len(that))
    re.error: bad character range P-C at position 31
    
    bug 
    opened by remonsec 0
  • uro error

    cat urls.txt | uro > test

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/uro/uro.py", line 123, in main
        for line in sys.stdin:
      File "/usr/lib/python3.10/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

    @s0md3v

    bug 
    opened by Iamsajidkhan 0
  • Error

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.4', 'console_scripts', 'uro')())
      File "/usr/local/bin/uro", line 25, in importlib_load_entry_point
        return next(matches).load()
    StopIteration

    opened by umarahmad125 0
  • broken pipe

    I have been encountering this issue:

      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 151, in main
        print(host + path + dict_to_params(param))
    BrokenPipeError: [Errno 32] Broken pipe
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 161, in main
        print(host + path)
    BrokenPipeError: [Errno 32] Broken pipe
    

    Any idea why this happens?

    opened by marcelo321 0
  • enhanced filtration

    For example, I'd like to filter "/A/embed?url=" and "/B/embed?url=", which return similar data, and likewise "/A.php" and "/A.php/", which also return similar data.

    enhancement 
    opened by LztCode 1
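Several of the reports above (unbalanced parenthesis, missing ), bad character range P-C) fail inside `re.search(pattern, path)`. That suggests, although the tracebacks alone don't confirm it, that a regex pattern is being built from URL text without escaping, so characters such as `(`, `[` or `-` in a path are read as regex syntax. The sketch below shows the standard-library fix, `re.escape`, in a hypothetical helper that matches a previously seen path literally while letting its digits vary. `path_pattern` and `matches` are illustrative names, not uro's code.

```python
import re


def path_pattern(seen_path: str) -> str:
    """Hypothetical helper: a regex matching seen_path literally,
    except that any run of digits may differ."""
    # re.escape neutralises ( ) [ ] - and other metacharacters. Digits are
    # left untouched, so they can still be swapped for \d+ afterwards.
    escaped = re.escape(seen_path)
    return "^" + re.sub(r"\d+", r"\\d+", escaped) + "$"


def matches(seen_path: str, path: str) -> bool:
    """True if path has the same shape as seen_path."""
    return re.search(path_pattern(seen_path), path) is not None
```

Passing the raw path `/docs/(draft` straight to `re.search` raises `re.error: missing ), unterminated subpattern`, the same error as in the "error handling" report, while the escaped version compiles and matches it literally.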
Releases (0.0.4)
  • 0.0.4 (Mar 19, 2022)

  • 0.0.3 (Feb 27, 2022)

    • removed redundant imports and code
    • added more extensions to blacklist
    • less memory and time consumption
    • fixed 'broken pipe' error when piping the output to utilities like head
    • fixed an error where similar urls were not getting filtered when they had any parameters
  • 0.0.2 (Sep 1, 2021)
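The 0.0.3 note about the "broken pipe" error and the Windows report `ImportError: cannot import name 'SIGPIPE' from 'signal'` in the comments come down to one standard-library detail: `signal.SIGPIPE` is only defined on Unix, so importing it by name fails on Windows. The Python `signal` documentation recommends catching `BrokenPipeError` rather than resetting SIGPIPE to its default action. A minimal sketch of that approach follows; it is not uro's actual code, and `write_lines` and `HAS_SIGPIPE` are hypothetical names.

```python
import os
import signal
import sys

# signal.SIGPIPE only exists on Unix; importing it by name fails on Windows.
HAS_SIGPIPE = hasattr(signal, "SIGPIPE")


def write_lines(lines, stream=None) -> None:
    """Write one item per line and exit quietly if the reader goes away.

    Never touches SIGPIPE, so it runs unchanged on Windows.
    """
    stream = stream if stream is not None else sys.stdout
    try:
        for line in lines:
            stream.write(line + "\n")
        stream.flush()
    except BrokenPipeError:
        # As the Python signal docs suggest: point stdout at devnull so the
        # interpreter's final flush doesn't raise a second BrokenPipeError.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)
```

With output written this way, `cat urls.txt | uro | head` style pipelines end without a traceback when `head` stops reading, on any platform.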

Owner
Somdev Sangwan
I make things, I break things and I make things that break things.
Related projects
A tool programmed to shorten links/mask links

Anontemitayo 6 Dec 02, 2022
A URL builder for genius :D

genius-url A URL builder for genius :D Usage from gurl import genius_url

ꌗᖘ꒒ꀤ꓄꒒ꀤꈤꍟ 12 Aug 14, 2021
a url shortener project from semicolonworld

Url Shortener With Django Written by Semicolon World

3 Aug 24, 2021
Simple python library to deal with URI Templates.

uritemplate Documentation -- GitHub -- Travis-CI Simple python library to deal with URI Templates. The API looks like from uritemplate import URITempl

Hyper 210 Dec 19, 2022
Ukiyo - A simple, minimalist and efficient discord vanity URL sniper

Ukiyo - a simple, minimalist and efficient discord vanity URL sniper. Ukiyo is easy to use, has a very visually pleasing interface, and has great spee

13 Apr 14, 2022
Shorten-Link - Make shorten URL with Cuttly API

Shorten-Link This Script make shorten URL with custom slashtag The script take f

Ahmed Hossam 3 Feb 13, 2022
a little project to make custom discord invites over a url

custom-dc-invite a little project to make custom discord invites over a url how it works you create a account for

baum1810 2 Oct 03, 2022
A URL shortener written in Flask.

url-shortener-elitmus This is a simple Flask app which takes a URL and shortens it. This shortened version of the URL redirects the user to the lon

2 Nov 23, 2021
A simple URL shortener app using Python AWS Chalice, AWS Lambda and AWS Dynamodb.

url-shortener-chalice A simple URL shortener app using AWS Chalice. Please make sure you configure your AWS credentials using AWS CLI before starting

Ranadeep Ghosh 2 Dec 09, 2022
UDdup - URLs Deduplication Tool

UDdup - URLs Deduplication Tool The tool gets a list of URLs, and removes "duplicate" pages in the sense of URL patterns that are probably repetitive

Rotem Reiss 128 Dec 21, 2022
Extract countries, regions and cities from a URL or text

This project is no longer being maintained and has been archived. Please check the Forks list for newer versions. Forks We are aware of two 3rd party

Ushahidi 216 Nov 18, 2022
🔗 FusiShort is a URL shortener built with Python, Redis, Docker and Kubernetes

This is a playground application created with goal of applying full cycle software development using popular technologies like Python, Redis, Docker and Kubernetes.

Lucas Fusinato Zanis 7 Nov 10, 2022
🌐 URL parsing and manipulation made easy.

furl is a small Python library that makes parsing and manipulating URLs easy. Python's standard urllib and urlparse modules provide a number of URL re

Ansgar Grunseid 2.4k Jan 04, 2023
C++ library for urlencode.

liburlencode C library for urlencode.

Khaidi Chu 6 Oct 31, 2022
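hugeURLer is a short-link service built on Python and GitHub Actions

hugeURLer is a short-link service built on Python and GitHub Actions. How to use: clone the repository locally, then run python3 .\src\addNewRedirection.py url in a terminal to create a redirect page that points to the URL you set.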

安东尼洪 2 Dec 22, 2021
A simple, immutable URL class with a clean API for interrogation and manipulation.

purl - A simple Python URL class A simple, immutable URL class with a clean API for interrogation and manipulation. Supports Pythons 2.7, 3.3, 3.4, 3.

David Winterbottom 286 Jan 02, 2023
A teeny Tiny module to check URLs against discord's list of phishing domains

kaj 1 Aug 29, 2022
🔗 Generate Phishing URLs 🔗

URLer 🔗 Generate Phishing URLs 🔗 URLer Table Of Contents General Information Preview Installation Disclaimer Credits Social Media Bug Report General

mrblackx 5 Feb 08, 2022
Have you ever wondered: Where does this link go? The REDLI Tool follows the path of the URL.

Have you ever wondered: Where does this link go? The REDLI Tool follows the path of the URL. It allows you to see the complete path a redirected URL goes through. It will show you the full redirectio

JAYAKUMAR 28 Sep 11, 2022
This is a no-bullshit file hosting and URL shortening service that also runs 0x0.st. Use with uWSGI.

mia 1.6k Dec 31, 2022