Find duplicate files

Overview

dupeGuru

dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It is written mostly in Python 3 and has the peculiarity of supporting multiple GUI toolkits that all share the same core Python code. On OS X, the UI layer is written in Objective-C and uses Cocoa. On Linux, it is written in Python and uses Qt5.

The Cocoa UI of dupeGuru is hosted in a separate repo: https://github.com/arsenetar/dupeguru-cocoa

Current status

The project is still looking for additional help, especially with:

  • OS X maintenance: reproducing bugs, maintaining the Cocoa version, and building the package with the Cocoa UI.
  • Linux maintenance: reproducing bugs and maintaining the PPA repository and Debian package.
  • Translations: updating missing strings in the Transifex project at https://www.transifex.com/voltaicideas/dupeguru-1
  • Documentation: keeping it up-to-date.

Contents of this folder

This folder contains the source for dupeGuru. Its documentation lives in the help folder and is also available online in built form. Here's how this source tree is organized:

  • core: Contains the core logic code for dupeGuru. It's Python code.
  • qt: UI code for the Qt toolkit. It's written in Python and uses PyQt.
  • images: Images used by the different UI codebases.
  • pkg: Skeleton files required to create different packages.
  • help: Help document, written for Sphinx.
  • locale: .po files for localization.
  • hscommon: A collection of helpers used across HS applications.
  • qtlib: A collection of helpers used across Qt UI codebases of HS applications.

How to build dupeGuru from source

Windows & macOS specific additional instructions

For Windows instructions, see the Windows Instructions.

For macOS instructions (Qt version), see the macOS Instructions.

Prerequisites

System Setup

When running in a Linux-based environment, the following system packages (or their equivalents) are needed to build:

  • python3-pyqt5
  • pyqt5-dev-tools (on some systems, see note)
  • python3-wheel (for hsaudiotag3k)
  • python3-venv (only if using a virtual environment)
  • python3-dev
  • build-essential

Note: On some Linux systems, installing python3-pyqt5 does not put pyrcc5 on the path, which causes problems with the resource files (and icons). These systems should have a corresponding pyqt5-dev-tools package, which should also be installed. The presence of pyrcc5 can be checked with `which pyrcc5`. Debian-based systems need the extra package; Arch does not.
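
The same check can be scripted before a build. The following is a minimal sketch that uses only the Python standard library; the helper name `find_tool` is hypothetical and is not part of dupeGuru's build scripts.

```python
import shutil


def find_tool(name="pyrcc5"):
    """Return the full path of `name` if it is on PATH, otherwise None.

    This mirrors what running `which pyrcc5` does in a shell.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        # Package name taken from the note above; it may differ per distribution.
        print("pyrcc5 not found; install pyqt5-dev-tools or your distribution's equivalent")
    else:
        print(f"pyrcc5 found at {path}")
```

Run it with the same interpreter and environment you intend to build with, since the result depends on the current PATH.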

To create packages the following are also needed:

  • python3-setuptools
  • debhelper

Building with Make

dupeGuru comes with a makefile that can be used to build and run:

$ make && make run

Building without Make

$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt
$ python build.py
$ python run.py

Generating Debian/Ubuntu package

To generate packages, the extra requirements in requirements-extra.txt must be installed. The steps are as follows:

$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt -r requirements-extra.txt
$ python build.py --clean
$ python package.py

This can be made a one-liner (once in the directory) as:

$ bash -c "python3 -m venv --system-site-packages env && source env/bin/activate && pip install -r requirements.txt -r requirements-extra.txt && python build.py --clean && python package.py"

Running tests

The complete test suite is run with Tox 1.7+. If you have it installed system-wide, you don't even need to set up a virtualenv. Just cd into the root project folder and run tox.

If you don't have Tox system-wide, install it in your virtualenv with pip install tox and then run tox.

You can also run the automated tests without Tox. Extra requirements for running tests are in requirements-extra.txt, so you can run pip install -r requirements-extra.txt inside your virtualenv and then py.test core hscommon.

Comments
  • Add new contributor

    dupeGuru currently has only one maintainer, me. This is a dangerous situation that needs to be corrected.

    The goal is to eventually have another active maintainer, but before we can get there, the project needs more contributors. It is very much lacking on that side right now.

    Whatever your skills, if you are remotely interested in being a contributor, I'm interested in mentoring you. I've been saying so on the Contribute page for a while, but now I'm thinking it might be a better idea to advertise the need for contributors in a ticket. This way, it's clear whether someone has answered the call or not.

    So, if you would like to start contributing to dupeGuru but would like some guidance/mentorship, simply add a comment here and we'll get started.

    bug beginner 
    opened by hsoft 56
  • Package 4.0.4

    • [x] Update Version to 4.0.4
    • [x] Update Changelog
    • [x] Package Windows 64 bit
    • [x] Package Windows 32 bit
    • [ ] Package OSX Qt (experimental)
    • [ ] Package OSX Cocoa
    • [x] ~~Package .deb~~
    • [x] ~~Package .rpm (maybe)~~
    • [x] PPA Ubuntu LTS
    • [x] PPA Ubuntu Latest (LTS packages should work)
    • [x] Package Arch Linux
    • [x] Commit any changes needed to complete packing
    • [x] Tag Repository
    • [x] Make Github Release
    • [ ] Update Website
    opened by arsenetar 46
  • Package 4.1.0

    • [x] Merge remaining PRs #733 & #705
    • [x] Update Version to 4.1.0 #733
    • [x] Update Changelog #733
    • [x] Package Windows 64 bit
    • [x] Package Windows 32 bit
    • [ ] Package OSX Qt (experimental)
    • [ ] Package OSX Cocoa
    • [x] Package .deb
    • [x] Notify Ubuntu PPA maintainers
    • [x] Notify Arch Linux maintainers
    • [x] Make Github Release
    • [ ] Update website links for new packages.
    opened by arsenetar 37
  • dupeGuru PPA is outdated

    Hello, I'm not sure if this is the right place to raise this, but are there any plans to provide updated versions to the Ubuntu PPA?

    The last build is from 2017-08-25; the current Ubuntu release is 17.10 (Artful Aardvark), released in October 2017.

    I thought the building of the DEB files for the PPA was an automated process. Could this be checked and fixed? Unfortunately I lack the skills to be more helpful.

    Linux 
    opened by seb-1204 33
  • Package 4.1.1

    • [x] Verify any other issues to fix before tag
    • [x] Update Version to 4.1.1
    • [x] Update Changelog
    • [x] Package Windows 64 bit
    • [x] Package Windows 32 bit
    • [x] Package OSX Qt (experimental) Packaging has been fixed
    • [x] Package OSX Cocoa
    • [x] Package .deb (x64 only, also debian source archive)
    • [x] Notify Ubuntu PPA maintainers
    • [x] Notify Arch Linux maintainers
    • [x] Make Github Release
    • [x] Update website links for new packages.
    opened by arsenetar 27
  • Language problems in 4.1.0

    First of all, thank you for this great project and all the awesome work!

    I just installed 4.1.0 on Windows, and the interface language always follows the system language.

    For example, if my system is in English, the interface is displayed in English. Changing the language in the display settings seems to do nothing; even after I reboot my PC, the displayed language is still English. The same applies when I install it on a Chinese system: the language defaults to Chinese, and there is simply no way to change it.

    Another issue is that on Transifex the Chinese translation is shown as 98% finished, with only 4 strings left to translate; in reality, however, more than half of the interface is not translated.

    • OS: Windows 10 1903 x64
    • Version: 4.1.0

    bug Windows 
    opened by terrytw 19
  • Initial Update of Windows Packaging

    I tested the generated installers and executable files on a couple windows 10 machines and they seem to work fine. There are probably a few areas for improvement, namely:

    • Installing both x86 and x64 versions is not completely supported by the installer script as written right now. I think I know how to get this working without it being overly complicated.
    • The path to makensis is currently hard-coded in package.py; adding the ability to pass the path would be better.

    Right now I think as long as the program itself works and the packaging works well it would probably be good to get it out to see if there are additional issues.

    Ref #393

    opened by arsenetar 17
  • Post-scan re-prioritization

    Delta value + Dupes only mode is powerful, but not for all cases. For example, there's no way to re-prioritize cases like this one. A new tool would be needed with more powerful options, such as kind-based prioritization, folder-based prioritization, and so on. I'm not sure yet of the form it should take.

    enhancement 
    opened by hsoft 16
  • Dupeguru 4.1.1 does not start on Ubuntu 21.04 with Wayland

    Describe the bug: Hi, I finally succeeded in installing the new Dupeguru 4.1.1 version on Ubuntu 21.04; I just had to add python3-pyqt5.

    Installation was fine, but when I launch Dupeguru, it doesn't start, and I get the following error message in the terminal:

    $ dupeguru
    Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.
    Traceback (most recent call last):
      File "/usr/bin/dupeguru", line 89, in <module>
        sys.exit(main())
      File "/usr/bin/dupeguru", line 72, in main
        from qt.app import DupeGuru
      File "/usr/share/dupeguru/qt/app.py", line 22, in <module>
        from core.app import AppMode, DupeGuru as DupeGuruModel
      File "/usr/share/dupeguru/core/app.py", line 24, in <module>
        from . import se, me, pe
      File "/usr/share/dupeguru/core/pe/__init__.py", line 1, in <module>
        from . import (  # noqa
      File "/usr/share/dupeguru/core/pe/block.py", line 9, in <module>
        from ._block import NoBlocksError, DifferentBlockCountError, avgdiff, getblocks2  # NOQA
    ModuleNotFoundError: No module named 'core.pe._block'

    Desktop:

    • OS: Ubuntu 21.04 / Gnome 3.38.5 / Wayland
    • Version 4.1.1

    Thanks !

    bug 
    opened by Valeryan24 15
  • Crash on startup on Ubuntu 20.04

    Hi, I just installed DupeGuru with the Ubuntu ppa - https://launchpad.net/~dupeguru/+archive/ubuntu/ppa - on the development version 20.04 LTS.

    But the program doesn't launch. Here is the error message:

    Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Traceback (most recent call last):
      File "/usr/bin/dupeguru", line 81, in <module>
        sys.exit(main())
      File "/usr/bin/dupeguru", line 66, in main
        from qt.app import DupeGuru
      File "/usr/share/dupeguru/qt/app.py", line 22, in <module>
        from core.app import AppMode, DupeGuru as DupeGuruModel
      File "/usr/share/dupeguru/core/app.py", line 24, in <module>
        from . import se, me, pe
      File "/usr/share/dupeguru/core/pe/__init__.py", line 1, in <module>
        from . import block, cache, exif, iphoto_plist, matchblock, matchexif, photo, prioritize, result_table, scanner  # noqa
      File "/usr/share/dupeguru/core/pe/block.py", line 9, in <module>
        from ._block import NoBlocksError, DifferentBlockCountError, avgdiff, getblocks2  # NOQA
    ModuleNotFoundError: No module named 'core.pe._block'

    https://framapic.org/V0Mwt71yWUPq/nKkKQ0meHsr5.png https://framapic.org/lYvGBlydaTrK/FrI6fXBeRpOe.png

    Thanks in advance for your help !

    bug Linux 
    opened by Valeryan24 15
  • Support for ubuntu 18.04

    Running dupeguru xenial distribution on 18.04 bionic beaver results in ModuleNotFoundError: No module named 'core.pe._block'

    The fix is to relink the proper libraries:

    sudo ln /usr/share/dupeguru/core/pe/_cache.cpython-35m-x86_64-linux-gnu.so /usr/share/dupeguru/core/pe/_cache.cpython-36m-x86_64-linux-gnu.so
    sudo ln /usr/share/dupeguru/core/pe/_block.cpython-35m-x86_64-linux-gnu.so /usr/share/dupeguru/core/pe/_block.cpython-36m-x86_64-linux-gnu.so
    sudo ln /usr/share/dupeguru/qt/pe/_block_qt.cpython-35m-x86_64-linux-gnu.so /usr/share/dupeguru/qt/pe/_block_qt.cpython-36m-x86_64-linux-gnu.so
    

    Could you add this fix to the distribution for 18.04 Bionic Beaver?

    opened by alexivkin 14
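
The ModuleNotFoundError in this report is consistent with how CPython locates compiled extension modules: it only accepts file names that end in one of the suffixes supported by the running interpreter, and those suffixes include the interpreter's version tag. The sketch below lists the accepted file names for a module; the helper names are hypothetical, and the `.so` file name is copied from the report above.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def candidate_filenames(module_name):
    """File names the running interpreter accepts for a compiled module."""
    return [module_name + suffix for suffix in EXTENSION_SUFFIXES]


def is_importable_as(filename, module_name):
    """True if `filename` is one of the accepted names for `module_name`."""
    return filename in candidate_filenames(module_name)


if __name__ == "__main__":
    # File name shipped by the xenial package, as quoted in the report.
    old_name = "_block.cpython-35m-x86_64-linux-gnu.so"
    print(candidate_filenames("_block"))
    print(is_importable_as(old_name, "_block"))
```

A file built for Python 3.5 carries the `cpython-35m` tag, so a Python 3.6 interpreter does not consider it when importing `core.pe._block`. Linking it under the 3.6 name, as the report does, makes the file discoverable; rebuilding the package for the target Python version avoids the mismatch altogether.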
  • feat: Remove shelve picture cache

    • Remove shelve picture cache as it has had a fair number of historical issues. Original issue for which it was added should be long resolved. Additionally this allows additional consolidation of the various cache code and potentially dbs in the future.
    • Remove all related preferences and related code for changing cache backend between sqlite and shelve.
    opened by arsenetar 1
  • How to make sure program reads content of the file?

    I have 1.2 TB of data (photos and videos), of which DupeGuru identified 400 GB as duplicates.

    My concern is that it finished the analysis very quickly (within one minute), and I didn't see much disk read activity in Windows Task Manager. I would expect DupeGuru to spend significant time (30 minutes?) reading the contents of 400 GB of duplicate files to calculate hashes. I have the "partial hash for large files" option disabled.

    How can this behavior be explained, and how can I make sure DupeGuru is actually checking file contents and not just sizes? OS: Windows 11, filesystem: NTFS, disk: NVMe SSD.

    bug 
    opened by rikuiki 5
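
One way to spot-check a reported duplicate pair independently of dupeGuru is to hash both files in full and compare the digests. This is a generic sketch using the standard library; the function names are hypothetical, and it is not dupeGuru's own hashing code, which may use a different algorithm and strategy.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Hash the full contents of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return file_digest(path_a) == file_digest(path_b)
```

Calling `same_content` on a few pairs from the results list reads every byte of both files, so matching digests give strong evidence that the pair is a true content duplicate.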
  • Behavior of "keep selection preference"

    Is your feature request related to a problem? Please describe. I am new to using dupeguru. I just installed the dmg on an M1 running OSX 12.6.2.

    When there are multiple audio files with different bitrates, does the "keep selection preference" setting prefer keeping the higher-quality audio file?

    I could only find high level documentation. If there is documentation that I missed that discusses these settings please refer me to it.

    Thank you in advance.

    opened by noahwallach 0
  • Error when scanning entire mac desktop as reference, and a couple of folders within the desktop as normal, and a few folders excluded

    Application Identifier: com.hardcoded-software.dupeguru
    Application Version: 4.0.3
    Mac OS X Version: Version 10.16 (Build 21G5046c)

    Traceback (most recent call last):
      File "build/dupeGuru.app/Contents/Resources/py/cocoa/inter.py", line 259, in pulse
      File "build/dupeGuru.app/Contents/Resources/py/hscommon/gui/progress_window.py", line 101, in pulse
      File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 323, in _job_error
      File "build/dupeGuru.app/Contents/Resources/py/hscommon/jobprogress/performer.py", line 43, in _async_run
      File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 780, in do
      File "build/dupeGuru.app/Contents/Resources/py/core/scanner.py", line 137, in get_dupe_groups
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/scanner.py", line 31, in _getmatches
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 167, in getmatches
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 65, in prepare_pictures
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/cache_shelve.py", line 129, in purge_outdated
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/cache_shelve.py", line 47, in __delitem__
      File "build/dupeGuru.app/Contents/Resources/py/shelve.py", line 128, in __delitem__
    KeyError: b'id:16042'

    bug 
    opened by matttrv 0
  • Running Multiple Instances of DupeGuru in 1 computer

    Application Name: dupeGuru
    Version: 4.3.1
    Python: 3.8.13
    Operating System: Windows-10-10.0.17763-SP0

    Traceback (most recent call last):
      File "hscommon\gui\progress_window.py", line 111, in pulse
      File "core\app.py", line 300, in _job_completed
      File "core\fs.py", line 176, in commit
    sqlite3.OperationalError: database is locked

    opened by LumarMotta 1
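
"database is locked" is the message SQLite raises when one connection tries to write while another connection holds the write lock on the same database file. The sketch below reproduces that message with two connections in a single process. The file name `cache.db`, the table, and the function name are made up for illustration and do not reflect dupeGuru's actual schema or code paths.

```python
import os
import sqlite3
import tempfile


def provoke_lock_error():
    """Return the error message seen by a second writer, or None if none."""
    path = os.path.join(tempfile.mkdtemp(), "cache.db")
    # timeout=0 makes a blocked connection fail at once instead of waiting;
    # isolation_level=None lets us issue BEGIN/ROLLBACK explicitly.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    message = None
    try:
        first.execute("CREATE TABLE entries (value INTEGER)")
        first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
        first.execute("INSERT INTO entries VALUES (1)")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer is refused
        except sqlite3.OperationalError as exc:
            message = str(exc)
    finally:
        first.execute("ROLLBACK")
        first.close()
        second.close()
    return message


if __name__ == "__main__":
    print(provoke_lock_error())
```

Two application instances sharing one database file can end up in the same position as the two connections here. A non-zero `timeout` makes the second writer wait for the lock rather than fail immediately, though it still fails if the lock is held longer than the timeout.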
Releases(4.3.1)