Scans PDFs for links written in plaintext and checks whether they are active or return an error code.

Overview

Introduction

linkrot scans PDFs for links written in plaintext and checks whether they are active or return an error code. It then generates a report of its findings. It also extracts references (PDF, URL, DOI, arXiv) and metadata from a PDF.

Features

  • Extracts references and metadata from a given PDF.
  • Detects PDF, URL, arXiv and DOI references.
  • Checks for a valid SSL certificate.
  • Finds broken hyperlinks (using the -c flag).
  • Outputs text or JSON (using the -j flag).
  • Extracts the PDF text (using the --text flag).
  • Works as a command-line tool or a Python package.
  • Works with local and online PDFs.

Installation

Grab a copy of the code with pip:

pip install linkrot

Usage

linkrot can be used to extract info from a PDF in two ways:

  • Command line/Terminal tool linkrot
  • Python library import linkrot

1. Command Line/Terminal tool

linkrot [pdf-file-or-url]

Run linkrot -h to see the help output:

linkrot -h

usage:

linkrot [-h] [-d OUTPUT_DIRECTORY] [-c] [-j] [-v] [-t] [-o OUTPUT_FILE] [--version] pdf

Extract metadata and references from a PDF, and optionally download all referenced PDFs.

Arguments

positional arguments:

pdf (Filename or URL of a PDF file)

optional arguments:

-h, --help                                              (Show this help message and exit)  
-d OUTPUT_DIRECTORY, --download-pdfs OUTPUT_DIRECTORY   (Download all referenced PDFs into specified directory)  
-c, --check-links                                       (Check for broken links)  
-j, --json                                              (Output infos as JSON (instead of plain text))  
-v, --verbose                                           (Print all references (instead of only PDFs))  
-t, --text                                              (Only extract text (no metadata or references))  
-o OUTPUT_FILE, --output-file OUTPUT_FILE               (Output to specified file instead of console)  
--version                                               (Show program's version number and exit)  

Examples

Extract text to console

linkrot https://example.com/example.pdf -t

Extract text to file

linkrot https://example.com/example.pdf -t -o pdf-text.txt

Check Links

linkrot https://example.com/example.pdf -c
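
When the command-line tool is driven from a script, the flags listed above can be assembled into an argument list first. The following is a minimal sketch under that assumption: build_linkrot_command and run_linkrot are hypothetical helpers (not part of linkrot), and they only use flags that appear in the help output above.

```python
import subprocess


def build_linkrot_command(pdf, check_links=False, as_json=False,
                          text_only=False, output_file=None):
    """Assemble an argv list for the linkrot CLI (hypothetical helper)."""
    cmd = ["linkrot", pdf]
    if check_links:
        cmd.append("-c")  # check for broken links
    if as_json:
        cmd.append("-j")  # JSON output instead of plain text
    if text_only:
        cmd.append("-t")  # only extract text
    if output_file is not None:
        cmd += ["-o", output_file]  # write to a file instead of the console
    return cmd


def run_linkrot(cmd):
    """Run the command; this needs linkrot installed and on PATH."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Mirrors the "Extract text to file" example; nothing is executed here.
cmd = build_linkrot_command("https://example.com/example.pdf",
                            text_only=True, output_file="pdf-text.txt")
print(" ".join(cmd))
```

Passing the list to run_linkrot(cmd) would actually invoke the tool; building the list separately keeps the flag handling easy to inspect.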

2. Main Python Library

Import the library:

import linkrot

Create an instance of the linkrot class like so:

pdf = linkrot.linkrot("filename-or-url.pdf") #pdf is the instance of the linkrot class

Now the following functions can be used to extract specific data from the PDF:

get_metadata()

Arguments: None

Usage:

metadata = pdf.get_metadata() #pdf is the instance of the linkrot class

Return type: Dictionary

Information Provided: All metadata associated with the PDF, including hidden ("secret") metadata, such as creation date, creator and title.

get_text()

Arguments: None

Usage:

text = pdf.get_text() #pdf is the instance of the linkrot class

Return type: String

Information Provided: The entire content of the PDF in string form.

get_references(reftype=None, sort=False)

Arguments:

reftype: The type of reference that is needed
         values: 'pdf', 'url', 'doi', 'arxiv'.
         default: Provides all reference types.

sort: Whether references should be sorted or not
      values: True or False.
      default: Is not sorted.

Usage:

references_list = pdf.get_references() #pdf is the instance of the linkrot class

Return type: Set of linkrot.backends.Reference objects

Each linkrot.backends.Reference object has 3 member variables:
- ref: actual URL/PDF/DOI/ARXIV
- reftype: type of reference
- page: page on which it was referenced

Information Provided: All references with their corresponding type and page number.

get_references_as_dict(reftype=None, sort=False)

Arguments:

reftype: The type of reference that is needed
         values: 'pdf', 'url', 'doi', 'arxiv'.
         default: Provides all reference types.

sort: Whether references should be sorted or not
      values: True or False.
      default: Is not sorted.

Usage:

references_dict = pdf.get_references_as_dict() #pdf is the instance of the linkrot class

Return type: Dictionary with keys 'pdf', 'url', 'doi', 'arxiv' that each have a list of refs of that type.

Information Provided: All references in their corresponding type list.
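
To make the relationship between the set returned by get_references() and the dictionary returned by get_references_as_dict() concrete, here is a small sketch. It does not use linkrot itself: Ref is a stand-in named tuple with the three documented member variables, and group_by_type is a hypothetical helper. Unlike the documented return value, the sketch only creates keys for reference types that are actually present.

```python
from collections import namedtuple

# Stand-in for linkrot.backends.Reference (an assumption, for illustration only).
Ref = namedtuple("Ref", ["ref", "reftype", "page"])


def group_by_type(refs):
    """Group reference objects into a dict of lists keyed by reftype."""
    grouped = {}
    # Sort first so that each list has a stable, predictable order.
    for r in sorted(refs, key=lambda r: (r.reftype, r.ref)):
        grouped.setdefault(r.reftype, []).append(r.ref)
    return grouped


refs = {
    Ref("https://example.com/a.pdf", "pdf", 1),
    Ref("https://example.com", "url", 2),
    Ref("10.1000/xyz123", "doi", 3),
}
print(group_by_type(refs))
```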

download_pdfs(target_dir)

Arguments:

target_dir: The path of the directory to which the reference pdfs should be downloaded 

Usage:

pdf.download_pdfs("target-directory") #pdf is the instance of the linkrot class

Return type: None

Information Provided: Downloads all the reference pdfs to specified directory.

3. Linkrot downloader functions

Import:

from linkrot.downloader import sanitize_url, get_status_code, check_refs

sanitize_url(url)

Arguments:

url: The url to be sanitized.

Usage:

new_url = sanitize_url(old_url) 

Return type: String

Information Provided: Prefixes the URL with 'http://' if it does not already have one, and makes sure it is UTF-8 encoded.
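
The behaviour described for sanitize_url can be pictured with the following sketch. It is an approximation written for this document, not linkrot's actual code: sanitize_url_sketch is a hypothetical name, and treating "has a scheme" as "contains ://" is a simplifying assumption.

```python
def sanitize_url_sketch(url):
    """Prefix 'http://' when no scheme is present; decode bytes as UTF-8."""
    if isinstance(url, bytes):
        url = url.decode("utf-8")  # make sure we work with a text string
    if "://" not in url:
        url = "http://" + url      # assume plain HTTP when no scheme is given
    return url


print(sanitize_url_sketch("example.com/paper.pdf"))
print(sanitize_url_sketch("https://example.com"))
```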

get_status_code(url)

Arguments:

url: The url to be checked for its status. 

Usage:

status_code = get_status_code(url) 

Return type: String

Information Provided: Checks if the url is active or broken.
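
For readers curious how a status check of this kind can work, the sketch below uses only the standard library. It is not linkrot's implementation: status_code_sketch, the User-Agent value and the error-label format are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def status_code_sketch(url, timeout=10):
    """Return the HTTP status code as a string, or a label describing the error."""
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "linkcheck-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return str(err.code)
    except urllib.error.URLError as err:
        # No usable answer: DNS failure, refused connection, bad certificate, ...
        return f"URLError: {err.reason}"
    except (OSError, ValueError) as err:
        # Timeouts and malformed URLs (for example a missing scheme).
        return f"{type(err).__name__}: {err}"
```

Calling status_code_sketch("https://example.com") performs a real network request, so the result depends on the network and on the remote server.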

check_refs(refs, verbose=True, max_threads=MAX_THREADS_DEFAULT)

Arguments:

refs: set of linkrot.backends.Reference objects
verbose: whether it should print every reference with its code or just the summary of the link checker
max_threads: number of threads for multithreading

Usage:

check_refs(pdf.get_references()) #pdf is the instance of the linkrot class

Return type: None

Information Provided: Prints references with their status code and a summary of all the broken/active links on terminal.
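
The max_threads argument suggests that links are checked concurrently. A minimal sketch of that pattern with concurrent.futures follows; check_all and summarize are hypothetical helpers, the checker is passed in as a plain function, and counting only "200" as active is an assumption of this example rather than linkrot's rule.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, checker, max_threads=5):
    """Run checker(url) for every URL on a thread pool; return {url: code}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        codes = list(pool.map(checker, urls))  # results keep the input order
    return dict(zip(urls, codes))


def summarize(results):
    """Count URLs whose code is '200' versus everything else."""
    active = sum(1 for code in results.values() if code == "200")
    return {"active": active, "broken": len(results) - active}


# A fake checker backed by a dict, so the example runs without network access.
fake_codes = {"http://a.example": "200", "http://b.example": "404"}
results = check_all(fake_codes, fake_codes.get, max_threads=2)
print(results)
print(summarize(results))
```

Because the checker is injected, the same pool logic can be exercised in tests with a fake function and used with a real status check elsewhere.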

4. Linkrot extractor functions

Import:

from linkrot.extractor import extract_urls, extract_doi, extract_arxiv

Get pdf text:

text = pdf.get_text() #pdf is the instance of the linkrot class

extract_urls(text)

Arguments:

text: String of text to extract urls from

Usage:

urls = extract_urls(text)

Return type: Set of URLs

Information Provided: All URLs in the text
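
As an illustration of what URL extraction from plain text involves, the sketch below uses a deliberately simplified regular expression. The pattern and the find_urls name are assumptions for this example; linkrot's own extractor may match differently.

```python
import re

# Simplified pattern (an assumption): runs of non-space characters that start
# with http://, https:// or www. Real-world URL matching needs more care.
URL_PATTERN = re.compile(r"""(?:https?://|www\.)[^\s<>"']+""", re.IGNORECASE)


def find_urls(text):
    """Return the set of URL-like strings found in text."""
    # Strip trailing punctuation that usually belongs to the sentence.
    return {match.rstrip(".,;:)") for match in URL_PATTERN.findall(text)}


sample = "See https://example.com/paper.pdf, or visit www.example.org."
print(sorted(find_urls(sample)))
```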

extract_arxiv(text)

Arguments:

text: String of text to extract arxivs from

Usage:

arxiv = extract_arxiv(text)

Return type: Set of arxivs

Information Provided: All arxivs in the text

extract_doi(text)

Arguments:

text: String of text to extract dois from

Usage:

doi = extract_doi(text)

Return type: Set of dois

Information Provided: All dois in the text
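
DOI and arXiv identifiers follow recognisable shapes, which is what makes them extractable from text. The sketch below shows one way to match them; both patterns are simplified assumptions (only new-style arXiv identifiers written with an "arXiv:" prefix are covered) and are not taken from linkrot.

```python
import re

# Simplified patterns (assumptions, not linkrot's own regexes).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ARXIV_PATTERN = re.compile(r"arxiv:\s?(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)


def find_dois(text):
    """Return DOI-like strings, without trailing sentence punctuation."""
    return {match.rstrip(".,;") for match in DOI_PATTERN.findall(text)}


def find_arxiv_ids(text):
    """Return new-style arXiv identifiers that follow an 'arXiv:' prefix."""
    return set(ARXIV_PATTERN.findall(text))


sample = "Cited as doi:10.1000/xyz123. Preprint: arXiv:1706.03762v5."
print(find_dois(sample))
print(find_arxiv_ids(sample))
```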

Code of Conduct

To view our code of conduct please visit our Code of Conduct page.

License

This program is licensed with an MIT License.

Comments
  • xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 55, column 10

    I receive this error when I run the file. Traceback below. File attached.

    > Traceback (most recent call last):
    >   File "c:\python38\lib\runpy.py", line 193, in _run_module_as_main
    >     return _run_code(code, main_globals, None,
    >   File "c:\python38\lib\runpy.py", line 86, in _run_code
    >     exec(code, run_globals)
    >   File "C:\Python38\Scripts\linkrot.exe\__main__.py", line 7, in <module>
    >   File "c:\python38\lib\site-packages\linkrot\cli.py", line 182, in main
    >     pdf = linkrot.linkrot(args.pdf)
    >   File "c:\python38\lib\site-packages\linkrot\__init__.py", line 131, in __init__
    >     self.reader = PDFMinerBackend(self.stream)
    >   File "c:\python38\lib\site-packages\linkrot\backends.py", line 213, in __init__
    >     self.metadata.update(xmp_to_dict(metadata))
    >   File "c:\python38\lib\site-packages\linkrot\libs\xmp.py", line 92, in xmp_to_dict
    >     return XmpParser(xmp).meta
    >   File "c:\python38\lib\site-packages\linkrot\libs\xmp.py", line 41, in __init__
    >     self.tree = ET.XML(xmp)
    >   File "c:\python38\lib\xml\etree\ElementTree.py", line 1320, in XML
    >     parser.feed(text)
    > xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 55, column 10

    ah-5.pdf

    bug help wanted good first issue hacktoberfest python 
    opened by marshalmiller 11
  • Remove Python 2 checks and functionality.

    Keeping support for Python 2 might be slowing down some of the process. Of more concern is that, in order to patch vulnerabilities in some libraries that Python 2 depends on, we have had to cut support for some versions of Python 3, specifically 3.6 and 3.7. Python 3.7 is still fairly widely used, so I think I'd prefer to remove Python 2 support and bring back 3.7, even though it's clearly a bigger task.

    enhancement help wanted good first issue dependencies python 
    opened by marshalmiller 10
  • Move from `requirements.txt`, `requirements_dev.txt`, `setup.cfg`, and `setup.py` to `pyproject.toml`.

    Is your feature request related to a problem? Please describe: Hey @marshalmiller. As you may already know, the use of setup.cfg, setup.py, and requirements.txt files is quite outdated. Because of PEP 517, PEP 660, and PEP 631, packaging is now being standardized on the pyproject.toml file.

    Describe the solution you'd like: Given the above info, the project packaging should add support for pyproject.toml.

    Describe alternatives you've considered: Not available.

    Additional context: That's pretty much it. What do you think? Also, I would like to work on this issue.

    enhancement hacktoberfest python 
    opened by wiseaidev 7
  • (Bug) AttributeError: 'NoneType' object has no attribute 'findall'

    Describe the bug: Certain PDFs raise an AttributeError.

    To Reproduce: Steps to reproduce the behavior:

    1. Download Research_Ethics.pdf
    2. Open a terminal and run:
       linkrot <path_to_above_file>

    Expected behavior: It should generate the expected linkrot report.

    Screenshots: Screenshot from 2021-10-12 23-37-47

    bug help wanted hacktoberfest 
    opened by aditirao7 7
  • Add Link Archiving

    I'd like to add a feature that takes all links that are verified to be active and adds them to the Internet Archive Wayback Machine to preserve them in time. There is a draft Python script in lib called archive.py. The idea is that when you navigate to https://web.archive.org/save/{url}, the service automatically archives that page. So after verifying that a link returns a valid code, we would just connect to each of those addresses and it would create a snapshot. I'd love for this to be an optional argument like -a or something, so that we don't take more resources than we need. Anyone able to complete this task, please take a stab at it.

    enhancement help wanted good first issue hacktoberfest python 
    opened by marshalmiller 6
  • UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 0: character maps to <undefined>.

    Receiving this error when running the file. Traceback Below. File Attached.

    > Traceback (most recent call last):
    >   File "c:\python38\lib\runpy.py", line 193, in _run_module_as_main
    >     return _run_code(code, main_globals, None,
    >   File "c:\python38\lib\runpy.py", line 86, in _run_code
    >     exec(code, run_globals)
    >   File "C:\Python38\Scripts\linkrot.exe\__main__.py", line 7, in <module>
    >   File "c:\python38\lib\site-packages\linkrot\cli.py", line 182, in main
    >     pdf = linkrot.linkrot(args.pdf)
    >   File "c:\python38\lib\site-packages\linkrot\__init__.py", line 131, in __init__
    >     self.reader = PDFMinerBackend(self.stream)
    >   File "c:\python38\lib\site-packages\linkrot\backends.py", line 204, in __init__
    >     self.metadata[k] = make_compat_str(v)
    >   File "c:\python38\lib\site-packages\linkrot\backends.py", line 67, in make_compat_str
    >     out_str = in_str.decode(enc["encoding"])
    >   File "c:\python38\lib\encodings\cp1254.py", line 15, in decode
    >     return codecs.charmap_decode(input,errors,decoding_table)
    > UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 0: character maps to <undefined>
    

    ah-1.pdf

    bug help wanted hacktoberfest python 
    opened by marshalmiller 5
  • (Update) documentation for python library usage

    The main documentation needs to be updated to include the usage of linkrot as a python library as well. Some of it can be found in the docstrings of this file.

    enhancement 
    opened by aditirao7 5
  • Separate code from data

    Is your feature request related to a problem? Please describe.

    The current size of the repo is too big because of pdf data samples:

    ➜  du -sh * | sort -h
    4.0K	CONTRIBUTING.md
    4.0K	LICENSE
    4.0K	Makefile
    4.0K	pyproject.toml
    4.0K	SECURITY.md
    8.0K	code_of_conduct.md
    8.0K	README.md
    44K	branding
    68K	linkrot
    1.7M	tests
    919M	Random PDF Samples
    

    Describe the solution you'd like: I suggest either storing the PDF files in a separate repo or in a cloud provider's bucket.

    Describe alternatives you've considered: Not available.

    Additional context: That's pretty much it. I am currently working on this issue.

    documentation enhancement hacktoberfest 
    opened by wiseaidev 4
  • Add Link Check Results to CLI Output

    Right now, if you use the -o argument to export the results to a text file, the document metadata and the list of links are the only components listed. I would like to add the results of the link check to this output as well.

    enhancement help wanted good first issue hacktoberfest python hacktoberfest-accepted 
    opened by marshalmiller 4
  • Displays Page Number Wrong in Results

    When it returns the results of links that it tests, it gives a list of the links, along with a page number. The page number would appear to be the page the link was found on but it is actually just the total number of pages in the PDF. It would be extremely helpful if we could get it to display the correct page number.

    bug enhancement help wanted hacktoberfest python hacktoberfest-accepted 
    opened by marshalmiller 4
  • Update Tests

    The tests written for this repo were developed during the very early stages of this project. I don't think they are a great representation of where the project is now. I'd love to have them updated to be more rigorous and keep the quality of the project high.

    enhancement help wanted good first issue hacktoberfest python 
    opened by marshalmiller 2
  • Update ReadMe to Include Changes from Hacktoberfest.

    We have had a lot of great improvements already during Hacktoberfest. I will update the ReadMe with all the changes once the event is over, if not before.

    documentation enhancement hacktoberfest 
    opened by marshalmiller 3
  • Consider Replacing Threadpool with Redis

    Given the performance and timeout issues with the flask app, I am wondering if I should be replacing the current thread pool with a Redis model, as suggested by other forums and Heroku.

    https://python-rq.org/

    enhancement help wanted dependencies hacktoberfest python 
    opened by marshalmiller 2
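
The link-archiving request in the comments above relies on the address pattern https://web.archive.org/save/{url}. A minimal sketch of building that address is shown below; wayback_save_url is a hypothetical helper, the set of characters left unescaped is an assumption, and the sketch does not contact the Wayback Machine.

```python
from urllib.parse import quote


def wayback_save_url(url):
    """Build the Wayback Machine 'save' address for the given URL."""
    # Keep URL-structural characters intact; escape unsafe ones such as spaces.
    return "https://web.archive.org/save/" + quote(url, safe=":/?&=#%+@,;~")


print(wayback_save_url("https://example.com/page?x=1"))
print(wayback_save_url("https://example.com/a b"))
```

Requesting the resulting address is what would trigger a snapshot, so a real implementation would only do that for links already verified as active.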
Releases(3.9.5)
  • 3.9.5(Oct 3, 2022)

    What's Changed

    • Add test cases for detecting embedded URLs by @marwansalem in https://github.com/marshalmiller/linkrot/pull/161
    • rm Random PDF Samples by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/163
    • updated .gitignore, added mega.py, rm pdfs, cleanups by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/164
    • cleanup python 2 syntax by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/165

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.9.4...3.9.5

  • 3.9.4(Oct 2, 2022)

    What's Changed

    • Migrating from setup.py to pyproject.toml by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/149
    • Upgrade to PyProject by @marshalmiller in https://github.com/marshalmiller/linkrot/pull/156
    • add missing dependencies by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/158
    • add missing cli entry point by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/157
    • handle UnicodeDecode exception by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/159

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.9.3...3.9.4

  • 3.9.3(Oct 2, 2022)

    What's Changed

    • Resolved Add Link Archiving #102 by @mailtodanish in https://github.com/marshalmiller/linkrot/pull/150
    • add etree xml_parser to ignore invalid tags by @wiseaidev in https://github.com/marshalmiller/linkrot/pull/155

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.9.2...3.9.3

  • 3.9.2(Oct 1, 2022)

    What's Changed

    • Fix the page number error, in the link checker by @ajratnam in https://github.com/marshalmiller/linkrot/pull/147
    • Add Link Check Results to CLI Output #120 by @mailtodanish in https://github.com/marshalmiller/linkrot/pull/145

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.9.1...3.9.2

  • 3.9.1(Oct 1, 2022)

    What's Changed

    • Bump mypy from 0.971 to 0.981 by @dependabot in https://github.com/marshalmiller/linkrot/pull/142
    • Bump coverage from 6.4.4 to 6.5.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/143
    • Resolved Add DOIs to References Summary #128 by @mailtodanish in https://github.com/marshalmiller/linkrot/pull/144
    • Remove numpy import by @ajratnam in https://github.com/marshalmiller/linkrot/pull/146

    New Contributors

    • @mailtodanish made their first contribution in https://github.com/marshalmiller/linkrot/pull/144
    • @ajratnam made their first contribution in https://github.com/marshalmiller/linkrot/pull/146

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.9...3.9.1

  • 3.9(Sep 25, 2022)

    What's Changed

    • Bump flake8 from 5.0.3 to 5.0.4 by @dependabot in https://github.com/marshalmiller/linkrot/pull/131
    • Bump coverage from 6.4.2 to 6.4.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/132
    • Bump numpy from 1.23.1 to 1.23.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/133
    • Bump coverage from 6.4.3 to 6.4.4 by @dependabot in https://github.com/marshalmiller/linkrot/pull/134
    • Bump pylint from 2.14.5 to 2.15.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/135
    • Bump black from 22.6.0 to 22.8.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/136
    • Bump pytest from 7.1.2 to 7.1.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/137
    • Bump pylint from 2.15.0 to 2.15.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/138
    • Bump numpy from 1.23.2 to 1.23.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/139
    • Bump pylint from 2.15.2 to 2.15.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/141
    • Resolve issue130 by @westofwest in https://github.com/marshalmiller/linkrot/pull/140

    New Contributors

    • @westofwest made their first contribution in https://github.com/marshalmiller/linkrot/pull/140

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.8.8...3.9

  • 3.8.8(Aug 2, 2022)

  • 3.8.5(Aug 2, 2022)

    What's Changed

    • Bump flake8 from 5.0.1 to 5.0.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/129

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.8.4...3.8.5

  • 3.5(Jun 1, 2022)

    What's Changed

    • Bump mypy from 0.910 to 0.920 by @dependabot in https://github.com/marshalmiller/linkrot/pull/71
    • Bump mypy from 0.920 to 0.930 by @dependabot in https://github.com/marshalmiller/linkrot/pull/73
    • Bump mypy from 0.930 to 0.931 by @dependabot in https://github.com/marshalmiller/linkrot/pull/75
    • Bump mccabe from 0.6.1 to 0.7.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/76
    • Bump coverage from 6.2 to 6.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/77
    • Bump black from 21.12b0 to 22.1.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/78
    • Bump coverage from 6.3 to 6.3.1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/79
    • Bump pytest from 6.2.5 to 7.0.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/80
    • Bump pytest from 7.0.0 to 7.0.1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/81
    • Bump coverage from 6.3.1 to 6.3.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/82
    • Bump pytest from 7.0.1 to 7.1.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/84
    • Bump mypy from 0.931 to 0.940 by @dependabot in https://github.com/marshalmiller/linkrot/pull/83
    • Bump mypy from 0.940 to 0.941 by @dependabot in https://github.com/marshalmiller/linkrot/pull/85
    • Bump pytest from 7.1.0 to 7.1.1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/86
    • Bump pdfminer-six from 20211012 to 20220319 by @dependabot in https://github.com/marshalmiller/linkrot/pull/87
    • Bump mypy from 0.941 to 0.942 by @dependabot in https://github.com/marshalmiller/linkrot/pull/88
    • Bump pylint from 2.12.2 to 2.13.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/89
    • Bump pylint from 2.13.0 to 2.13.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/90
    • Bump black from 22.1.0 to 22.3.0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/91
    • Bump pylint from 2.13.2 to 2.13.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/92
    • Bump pylint from 2.13.3 to 2.13.4 by @dependabot in https://github.com/marshalmiller/linkrot/pull/93
    • Bump pylint from 2.13.4 to 2.13.5 by @dependabot in https://github.com/marshalmiller/linkrot/pull/94
    • Bump pylint from 2.13.5 to 2.13.7 by @dependabot in https://github.com/marshalmiller/linkrot/pull/95
    • Bump pytest from 7.1.1 to 7.1.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/96
    • Bump mypy from 0.942 to 0.950 by @dependabot in https://github.com/marshalmiller/linkrot/pull/97
    • Bump pylint from 2.13.7 to 2.13.8 by @dependabot in https://github.com/marshalmiller/linkrot/pull/98
    • Bump pdfminer-six from 20220319 to 20220506 by @dependabot in https://github.com/marshalmiller/linkrot/pull/99
    • Bump coverage from 6.3.2 to 6.3.3 by @dependabot in https://github.com/marshalmiller/linkrot/pull/100
    • Bump pylint from 2.13.8 to 2.13.9 by @dependabot in https://github.com/marshalmiller/linkrot/pull/101
    • Bump coverage from 6.3.3 to 6.4 by @dependabot in https://github.com/marshalmiller/linkrot/pull/103
    • Bump pdfminer-six from 20220506 to 20220524 by @dependabot in https://github.com/marshalmiller/linkrot/pull/104
    • Bump mypy from 0.950 to 0.960 by @dependabot in https://github.com/marshalmiller/linkrot/pull/105
    • A fix for: Exclude Email Addresses #106 by @marwansalem in https://github.com/marshalmiller/linkrot/pull/107

    New Contributors

    • @marwansalem made their first contribution in https://github.com/marshalmiller/linkrot/pull/107

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/3.4...3.5

  • 3.4(Dec 11, 2021)

    What's Changed

    • Added documentation for library by @aditirao7 in https://github.com/marshalmiller/linkrot/pull/41
    • fix(downloader.py): change string comparison to use regex by @sousatg in https://github.com/marshalmiller/linkrot/pull/42
    • Bump flake8 from 4.0.0 to 4.0.1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/43
    • Bump coverage from 6.0.1 to 6.0.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/44
    • Bump pdfminer-six from 20201018 to 20211012 by @dependabot in https://github.com/marshalmiller/linkrot/pull/46
    • Bring up to date by @marshalmiller in https://github.com/marshalmiller/linkrot/pull/47
    • Replace pagenos with a safe default value by @alanyee in https://github.com/marshalmiller/linkrot/pull/48
    • Staging to Main 10-17-2021 by @marshalmiller in https://github.com/marshalmiller/linkrot/pull/49
    • Start testing for Python 3.10 by @alanyee in https://github.com/marshalmiller/linkrot/pull/50
    • Checking the rdftree before parsing the metadata #45 by @rosdyana in https://github.com/marshalmiller/linkrot/pull/51
    • Staging by @marshalmiller in https://github.com/marshalmiller/linkrot/pull/52
    • Bump black from 21.9b0 to 21.10b0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/55
    • Bump coverage from 6.0.2 to 6.1.1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/54
    • Add comments to colorprint.py by @vacom13 in https://github.com/marshalmiller/linkrot/pull/56
    • Bump coverage from 6.1.1 to 6.1.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/57
    • Bump black from 21.10b0 to 21.11b0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/58
    • Add Comments to cli.py by @vacom13 in https://github.com/marshalmiller/linkrot/pull/60
    • Bump black from 21.11b0 to 21.11b1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/59
    • Bump pylint from 2.11.1 to 2.12.1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/61
    • Bump coverage from 6.1.2 to 6.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/63
    • Bump black from 21.11b1 to 21.12b0 by @dependabot in https://github.com/marshalmiller/linkrot/pull/67
    • Bump pylint from 2.12.1 to 2.12.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/66

    New Contributors

    • @sousatg made their first contribution in https://github.com/marshalmiller/linkrot/pull/42
    • @alanyee made their first contribution in https://github.com/marshalmiller/linkrot/pull/48
    • @rosdyana made their first contribution in https://github.com/marshalmiller/linkrot/pull/51
    • @vacom13 made their first contribution in https://github.com/marshalmiller/linkrot/pull/56

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/2.1.1...3.4

  • 2.3(Oct 24, 2021)

    What's Changed

    • Added documentation for library by @aditirao7 in https://github.com/marshalmiller/linkrot/pull/41
    • fix(downloader.py): change string comparison to use regex by @sousatg in https://github.com/marshalmiller/linkrot/pull/42
    • Bump flake8 from 4.0.0 to 4.0.1 by @dependabot in https://github.com/marshalmiller/linkrot/pull/43
    • Bump coverage from 6.0.1 to 6.0.2 by @dependabot in https://github.com/marshalmiller/linkrot/pull/44
    • Bump pdfminer-six from 20201018 to 20211012 by @dependabot in https://github.com/marshalmiller/linkrot/pull/46
    • Bring up to date by @marshalmiller in https://github.com/marshalmiller/linkrot/pull/47
    • Replace pagenos with a safe default value by @alanyee in https://github.com/marshalmiller/linkrot/pull/48
    • Staging to Main 10-17-2021 by @marshalmiller in https://github.com/marshalmiller/linkrot/pull/49
    • Start testing for Python 3.10 by @alanyee in https://github.com/marshalmiller/linkrot/pull/50
    • Checking the rdftree before parsing the metadata #45 by @rosdyana in https://github.com/marshalmiller/linkrot/pull/51
    • Staging by @marshalmiller in https://github.com/marshalmiller/linkrot/pull/52

    New Contributors

    • @sousatg made their first contribution in https://github.com/marshalmiller/linkrot/pull/42
    • @alanyee made their first contribution in https://github.com/marshalmiller/linkrot/pull/48
    • @rosdyana made their first contribution in https://github.com/marshalmiller/linkrot/pull/51

    Full Changelog: https://github.com/marshalmiller/linkrot/compare/2.1.1...2.3

Owner
Marshal Miller