Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Overview

Deskew

by Marek Mauder
https://galfar.vevb.net/deskew
https://github.com/galfar/deskew

v1.30 2019-06-07

Overview

Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

There are binaries built for these platforms (located in Bin folder): Win64 (deskew.exe), Win32 (deskew32.exe), Linux 64bit (deskew), macOS (deskew-mac), Linux ARMv7 (deskew-arm).

GUI frontend for this CLI tool is available as well (Windows, Linux, and macOS).

License: MIT

Downloads And Releases

https://github.com/galfar/deskew/releases
https://galfar.vevb.net/deskew#downloads

Usage

Usage:
deskew [-o output] [-a angle] [-b color] [..] input
    input:         Input image file
  Options:
    -o output:     Output image file (default: out.png)
    -a angle:      Maximal expected skew angle (both directions) in degrees (default: 10)
    -b color:      Background color in hex format RRGGBB|LL|AARRGGBB (default: black)
  Ext. options:
    -q filter:     Resampling filter used for rotations (default: linear,
                   values: nearest|linear|cubic|lanczos)
    -t a|treshold: Auto threshold or value in 0..255 (default: a)
    -r rect:       Skew detection only in content rectangle (pixels):
                   left,top,right,bottom (default: whole page)
    -f format:     Force output pixel format (values: b1|g8|rgb24|rgba32)
    -l angle:      Skip deskewing step if skew angle is smaller (default: 0.01)
    -g flags:      Operational flags (any combination of):
                   c - auto crop, d - detect only (no output to file)
    -s info:       Info dump (any combination of):
                   s - skew detection stats, p - program parameters, t - timings
    -c specs:      Output compression specs for some file formats. Several specs
                   can be defined - delimited by commas. Supported specs:
                   jXX - JPEG compression quality, XX is in range [1,100(best)]
                   tSCHEME - TIFF compression scheme: none|lzw|rle|deflate|jpeg|g4

  Supported file formats
    Input:  BMP, JPG, PNG, JNG, GIF, DDS, TGA, PBM, PGM, PPM, PAM, PFM, TIF, PSD
    Output: BMP, JPG, PNG, JNG, GIF, DDS, TGA, PGM, PPM, PAM, PFM, TIF, PSD

Notes

For TIFF support in Linux and macOS you need to have libtiff 4.x installed (package is usually called libtiff5).

For macOS you can download prebuilt libtiff binaries here: https://galfar.github.io/store/TiffLibBins-macOS.zip. Just put the files inside the archive to the same folder as deskew-mac executable.

You can find some test images in TestImages folder and scripts to run tests (RunTests.bat and runtests.sh) in Bin. By default scripts just call deskew command but you can pass a different one as a parameter (e.g. runtests.sh deskew-arm).

Bugs, Issues, Proposals

File them here:
https://github.com/galfar/deskew/issues

Version History

v1.30 2019-06-07:

  • fix #15: Better image quality after rotation - better default and also selectable nearest|linear|cubic|lanczos filtering
  • fix #5: Detect skew angle only (no rotation done) - optionally only skew detection
  • fix #17: Optional auto-crop after rotation
  • fix #3: Command line option to set output compression - now for TIFF and JPEG
  • fix #12: Bad behavior when an output is given and no deskewing is needed
  • libtiff in macOS is now picked up also when binaries are put directly in the directory with deskew
  • text output is flushed after every write (Linux/Unix): it used to be flushed only when writing to device but not file/pipe.

v1.25 2018-05-19:

  • fix #6: Preserve DPI measurement system (TIFF)
  • fix #4: Output image not saved in requested format (when deskewing is skipped)
  • dynamic loading of libtiff library - adds TIFF support in macOS when libtiff is installed

v1.21 2017-11-01:

  • fix #8: Cannot compile in Free Pascal 3.0+ (Windows) - Fails to link precompiled LibTiff library
  • fix #7: Windows FPC build fails with Access violation exception when loading certain TIFFs (especially those saved by Windows Photo Viewer etc.)

v1.20 2016-09-01:

  • much faster rotation, especially when background color is set (>2x faster, 2x less memory)
  • can skip deskewing step if detected skew angle is lower than parameter
  • new option for timing of individual steps
  • fix: crash when last row of page is classified as text
  • misc: default back color is now opaque black, new forced output format "rgb24", background color can define also alpha channel, nicer formatting of text output

v1.10 2014-03-04:

  • TIFF support for Win64 and 32/64bit Linux
  • forced output formats
  • fix: output file names were always lowercase
  • fix: preserves resolution metadata (e.g. 300dpi) of input when writing output

v1.00 2012-06-04:

  • background color
  • "area of interest" content rectangle
  • 64bit and Mac OSX support
  • PSD and TIFF (win32) support
  • show skew detection stats and program parameters

v0.95 2010-12-28:

  • Added auto thresholding

v0.90 2010-02-12:

  • Initial version

Compiling Deskew

Deskew is written in Object Pascal. You need Free Pascal or Delphi to recompile it.

Tested Compilers

There are project files for these IDEs:

  1. Lazarus 2.0.10 (deskew.lpi)
  2. Delphi XE + 10.3 (deskew.dproj)

Additionally, there are compile shell/batch scripts for standalone FPC compiler in Scripts folder.

Supported/Tested Platforms

Deskew is precompiled and was tested on these platforms: Win32, Win64, Linux 64bit, macOS 64bit, Linux ARMv7

Source Code

Latest source code can be found here:
https://github.com/galfar/deskew

Dependencies

Vampyre Imaging Library is needed for compilation and it's included in Deskew's repo in Imaging folder.

Comments
  • Detect skew angle only (no rotation done)

    Detect skew angle only (no rotation done)

    Original report by Marek Mauder (Bitbucket: galfar, GitHub: galfar).


    As requested on blog:

    Can you explain how to simply find the angle but not rotate using this tool? Since I’m dealing with archival TIFFs I need to keep the DPI and embedded metadata in place, so I’m thinking I would use ImageMagick to rotate once I have the angle. Thanks.

    Answer:

    For now you could use -l parameter: -l angle: Skip deskewing step if skew angle is smaller And use some large threshold so rotation will always be skipped.

    $deskew -l 80 Sken003.png
    ...
    Preparing input image (Sken003.png) ...
    Calculating skew angle...
    Skew angle found: 0.23
    Skipping deskewing step, skew angle lower than threshold of 80.00
    Done!
    

    For next version I plan to modify this: angle is optional and if omitted rotation is always skipped.

    major DeskewCmdLine proposal 
    opened by galfar 7
  • Please clarify licensing situation

    Please clarify licensing situation

    README says that the license is MIT, but the source files seem to all claim MPL/LGPL. What is actual license of this code? Can you please distribute a license file?

    This came up at AUR: https://aur.archlinux.org/packages/deskew-git/

    opened by ctrlcctrlv 5
  • Parameters on DeskewGui v0.90

    Parameters on DeskewGui v0.90

    Hello,

    After doing some minor testing with the default and the lanczos filters, I agree that the quality under lanczos is noticeably better, and in my old computer, it doesn't take much longer to process (I would say that a couple of seconds per page).

    In conclusion, I'd like to use lanczos from now on, but DeskewGui doesn't allow me, to the best of my knowledge, to include parameters when I call deskew.exe.

    Is there way I can tell the programme that I want to use lanczos?

    Thanks!

    DeskewGui 
    opened by vivadavid 5
  • EImagingError: Error while loading images from file.... Exception message: Access violation

    EImagingError: Error while loading images from file.... Exception message: Access violation

    Original report by Anonymous.


    When trying to deskew landscape images on Windows, the error message:

    EImagingError: Error while loading images from file "name of file" <format: tif> Exception message: Access violation

    appears.

    bug major 
    opened by galfar 4
  • Better image quality after rotation

    Better image quality after rotation

    Original report by Marek Mauder (Bitbucket: galfar, GitHub: galfar).


    Received several "complaints" about images after rotation step to be a bit blurry (compared to doing rotation in ImageMagick etc.).

    Current "speed over quality" rotation algorithm used in Deskew must be replaced/supplemented with another one.

    enhancement DeskewCmdLine critical 
    opened by galfar 3
  • Simple GUI frontend

    Simple GUI frontend

    Original report by Marek Mauder (Bitbucket: galfar, GitHub: galfar).


    I get many requests for "process all files in folder" etc. from people not comfortable with command line, shell scripts etc.

    Simple GUI frontend (to cmd. line Deskew) with batch processing capability would be nice for many people.

    Binaries for Windows, macOS, and Linux required.

    major proposal DeskewGui 
    opened by galfar 3
  • GUI: Option to select sampling filter

    GUI: Option to select sampling filter

    The GUI does not currently have an option for this, and I don't know any other way of doing batch image processing with the CLI tool, the default linear filter option blurs the images too much, so it would be nice to be able to change it in the GUI.

    opened by NebulaOnion 2
  • Support for multipage pdf files

    Support for multipage pdf files

    Hi, thanks for this program. I think it would be useful to support to deskew a whole pdf file with multiple pages. I normally scan books or documents directly into a multipage pdf file, since it is more manageable to only have one file and not one per page. Do you think that this might be within the scope of this program?

    Cheers

    opened by cristobaltapia 2
  • DeskewGui: default window is too big for some common screens

    DeskewGui: default window is too big for some common screens

    Original report by Anonymous.


    The default window for DeskewGui is a little too big. In some screens, in particular laptop screens with 1366x800 pixels, it is somewhat difficult to redimension the window because the window title bar is left off screen. The maximum height should be no bigger than about 600 pixels. Thank you for your nice work.

    bug minor DeskewGui 
    opened by galfar 2
  • Compiling the GUI on Linux?

    Compiling the GUI on Linux?

    Hello @galfar, I'm your AUR maintainer

    I noticed you have a GUI now, but in Scripts there only seems to be a script to compile it on Mac.

    I get:

    deskewgui.lpr(10,3) Fatal: Can't find unit Interfaces used by deskewgui
    

    Is this usable on Linux?

    opened by ctrlcctrlv 1
  • GUI: allow passing extra parameters to CLI

    GUI: allow passing extra parameters to CLI

    As GUI will always lag behind a bit and won't provide all the options CLI can handle it may be useful to allow passing extra parameters directly from GUI to CLI.

    A new text edit in "Advanced options" should take care of it.

    enhancement DeskewGui 
    opened by galfar 1
  • feature request: release for linux arm64

    feature request: release for linux arm64

    It will be awesome to have more releases a) linux ARM64 and b) macOS-arm64. I would like to use this CLI in linux docker running over Apple M1 computer.

    opened by amitm02 2
  • Auto detect content rectangle

    Auto detect content rectangle

    Content rectangle can be auto detected by scanning the image (after thresholding) from the sides and looking where non-white pixels start. If it's fast enough it could be a default setting. If custom content rectangle is passed as a parameter by the user let's not use the detection.

    enhancement DeskewCmdLine 
    opened by galfar 0
  • Options for Specifying Skew Angle

    Options for Specifying Skew Angle

    I am looking for a cmd option that can be used to specify the angle that is being calculated before but i was not able to find it in the given set of options. I am applying some preprocessing on same image but has different preprocessed version. But when I run deskew on both of these images I get different skew angles. I want to use same angle for both versions of same image.

    opened by hamxahbhatti 2
  • Take the background color from the input image

    Take the background color from the input image

    Request came in to extend "-b" background color parameter to take it's value from the input image (edge/corner). https://galfar.vevb.net/wp/projects/deskew/comment-page-2/#comment-184847

    One request if at all possible is can there be an option for -b that auto samples the rgb value of an edge. I currently run an auto-crop script after they are deskewed and some pages have a different background color. This sometimes trips up the auto crop into thinking the added background is an edge. I’m hoping for something a little more dynamic that allows me to use the tool without sorting the material first.

    Hi, you mean something like “look at pixel [0,0] of input image and use it for output background”?

    Yes. “look at pixel [0,0] of input image and use it for output background” would be a great additional feature.

    enhancement 
    opened by galfar 8
Releases(v1.30)
  • v1.30(Jun 18, 2019)

    Command line tool for deskewing scanned documents. Binaries for several platforms and test images included.

    README

    Recent changes:

    v1.30 2019-06-07:

    • fix #15: Better image quality after rotation - better default and also selectable nearest|linear|cubic|lanczos filtering
    • fix #5: Detect skew angle only (no rotation done) - optionally only skew detection
    • fix #17: Optional auto-crop after rotation
    • fix #3: Command line option to set output compression - now for TIFF and JPEG
    • fix #12: Bad behavior when an output is given and no deskewing is needed
    • libtiff in macOS is now picked up also when binaries are put directly in the directory with deskew
    • text output is flushed after every write (Linux/Unix): it used to be flushed only when writing to device but not file/pipe.

    v1.25 2018-05-19:

    • fix #6: Preserve DPI measurement system (TIFF)
    • fix #4: Output image not saved in requested format (when deskewing is skipped)
    • dynamic loading of libtiff library - adds TIFF support in macOS when libtiff is installed
    Source code(tar.gz)
    Source code(zip)
    Deskew-1.30.zip(4.29 MB)
  • gui-v0.90(Jan 4, 2019)

    GUI Frontend for Deskew Command Line Tool

    Now it’s easier to process many files without writing shell scripts. It needs the command line tool which is called for the each input file. You can set the basic and most of the advanced options for deskewing in the GUI.

    Prebuilt executables for Windows and Linux are available in the download – you just place them to the same folder as the command line tool. Version for macOS is a bit more convenient – it’s a self-contained app bundle with CLI tool already inside and all placed in DMG image. You can also set the explicit path to the command line tool in the program itself.

    Source code(tar.gz)
    Source code(zip)
    DeskewGui-0.90.zip(4.11 MB)
  • v1.25(Jan 4, 2019)

    Command line tool for deskewing scanned documents. Binaries for several platforms and test images included.

    Changes since the last release:

    1.25 2018-05-19:

    • fix #6: Preserve DPI measurement system (TIFF)
    • fix #4: Output image not saved in requested format (when deskewing is skipped)
    • dynamic loading of libtiff library - adds TIFF support in macOS when libtiff is installed

    1.21 2017-11-01:

    • fix #8: Cannot compile in Free Pascal 3.0+ (Windows) - Fails to link precompiled LibTiff library
    • fix #7: Windows FPC build fails with Access violation exception when loading certain TIFFs (especially those saved by Windows Photo Viewer etc.)
    Source code(tar.gz)
    Source code(zip)
    deskew-125.zip(4.31 MB)
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
Creating a virtual tv using opencv in python3.

Virtual-TV Creating a virtual tv using opencv in python3. In order to run the code follow the below given steps: Make sure the desired videos which ar

Vamsi 1 Jan 01, 2022
A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

2.4k Jan 08, 2023
Markup for note taking

Subtext: markup for note-taking Subtext is a text-based, block-oriented hypertext format. It is designed with note-taking in mind. It has a simple, pe

Gordon Brander 224 Jan 01, 2023
Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Matt Zucker 1.2k Dec 29, 2022
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A Pytorch implementation of Mask TextSpotter along with its extension can be find here Introduction This is the official implementati

Pengyuan Lyu 261 Nov 21, 2022
Python library to extract tabular data from images and scanned PDFs

Overview ExtractTable - API to extract tabular data from images and scanned PDFs The motivation is to make it easy for developers to extract tabular d

Org. Account 165 Dec 31, 2022
Simple SDF mesh generation in Python

Generate 3D meshes based on SDFs (signed distance functions) with a dirt simple Python API.

Michael Fogleman 1.1k Jan 08, 2023
An easy to use an (hopefully useful) captcha solution for pyTelegramBotAPI

pyTelegramBotCAPTCHA An easy to use and (hopefully useful) image CAPTCHA soltion for pyTelegramBotAPI. Installation: pip install pyTelegramBotCAPTCHA

29 Dec 26, 2022
pyntcloud is a Python library for working with 3D point clouds.

pyntcloud is a Python library for working with 3D point clouds.

David de la Iglesia Castro 1.2k Jan 07, 2023
Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that slide and lock together.

Fusion-360-Add-In-PuzzleSpline Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that sli

Michiel van Wessem 1 Nov 15, 2021
This project is basically to draw lines with your hand, using python, opencv, mediapipe.

Paint Opencv 📷 This project is basically to draw lines with your hand, using python, opencv, mediapipe. Screenshoots 📱 Tools ⚙️ Python Opencv Mediap

Williams Ismael Bobadilla Torres 3 Nov 17, 2021
Qrcode Attendence System with Opencv and Pyzbar

Setup process Creates a virtual environment (Scripts that ensure executed Python code uses the Python interpreter and site packages installed inside t

Ganesh 5 Aug 01, 2022
Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 224 Jan 07, 2023
Page to PAGE Layout Analysis Tool

P2PaLA Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks. 💥 Try our new DEMO for online baseli

Lorenzo Quirós Díaz 180 Nov 24, 2022
Fine tuning keras-ocr python package with custom synthetic dataset from scratch

OCR-Pipeline-with-Keras The keras-ocr package generally consists of two parts: a Detector and a Recognizer: Detector is responsible for creating bound

Eugene 1 Jan 05, 2022
Localization of thoracic abnormalities model based on VinBigData (top 1%)

Repository contains the code for 2nd place solution of VinBigData Chest X-ray Abnormalities Detection competition. The goal of competition was to auto

33 May 24, 2022
【Auto】原神⭐钓鱼辅助工具 | 自动收竿、校准游标 | ✨您只需要抛出鱼竿,我们会帮你完成一切✨

原神钓鱼辅助工具 ✨ 作者正在努力重构代码中……会尽快带给大家一个更完美的脚本 ✨ 「您只需抛出鱼竿,然后我们会帮您搞定一切」 如果你觉得这个脚本好用,请点一个 Star ⭐ ,你的 Star 就是作者更新最大的动力 点击这里 查看演示视频 ✨ 欢迎大家在 Issues 中分享自己的配置文件 ✨ ✨

261 Jan 02, 2023