
4 Useful Tools to Find and Delete Duplicate Files in Linux

Organizing your home directory, or even your whole system, can be particularly hard if you have the habit of downloading all kinds of stuff from the internet.

Often you may find you have downloaded the same mp3, pdf, or epub (and all kinds of other file types) and copied it to different directories. This can leave your directories cluttered with all kinds of useless duplicates.

In this tutorial, you are going to learn how to find and delete duplicate files in Linux using rdfind and fdupes command-line tools, as well as using GUI tools called DupeGuru and FSlint.

A note of caution – always be careful what you delete on your system as this may lead to unwanted data loss. If you are using a new tool, first try it in a test directory where deleting files will not be a problem.

1. Rdfind – Finds Duplicate Files in Linux

Rdfind comes from "redundant data find". It is a free tool used to find duplicate files across or within multiple directories. It uses checksums and finds duplicates based on file contents, not just file names.
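The content-based idea can be sketched with coreutils alone: hash every file, then group identical hashes. This is only a toy illustration of checksum-based grouping, not how rdfind itself works internally; the paths are throwaway.

```shell
# Toy sketch: detect duplicates by content, ignoring names.
dir=$(mktemp -d)
echo "hello world" > "$dir/a.txt"
echo "hello world" > "$dir/copy-of-a.txt"   # duplicate content, different name
echo "something else" > "$dir/b.txt"

# Checksum every file, sort so equal hashes are adjacent, then print only
# the groups that repeat: lines sharing a hash are duplicates.
md5sum "$dir"/*.txt | sort | uniq -w32 --all-repeated=separate
rm -r "$dir"
```

Only a.txt and copy-of-a.txt are printed, because only they share a checksum.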

Rdfind uses a ranking algorithm to classify the files: it determines which copy is the original and considers the rest duplicates. The ranking rules are:

  • If A was found while scanning an input argument earlier than B, A is higher ranked.
  • If A was found at a depth lower than B, A is higher ranked.
  • If A was found earlier than B, A is higher ranked.

The last rule is used particularly when two files are found in the same directory.

To install rdfind in Linux, use the following command as per your Linux distribution.
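Package names can vary slightly between releases, but on common distributions the installation looks like this:

```shell
sudo apt install rdfind   # Debian, Ubuntu, Linux Mint
sudo dnf install rdfind   # Fedora (EPEL repository on RHEL/CentOS)
sudo pacman -S rdfind     # Arch Linux (may require the AUR on some releases)
```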

To run rdfind on a directory simply type rdfind and the target directory. Here is an example:
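For example, to scan a hypothetical ~/Downloads directory:

```shell
rdfind ~/Downloads   # scans the directory; findings are written to ./results.txt
cat results.txt      # review the duplicates rdfind reported
```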

Find Duplicate Files in Linux

As you can see, rdfind saves the results in a file called results.txt, located in the directory from which you ran the program. The file contains all the duplicate files that rdfind has found. You can review the file and remove the duplicate files manually if you want to.

Another thing you can do is use the -dryrun option, which lists the duplicates without taking any action:
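A dry run looks like this (no files are touched; the path is illustrative):

```shell
rdfind -dryrun true ~/Downloads
```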

When you find the duplicates, you can choose to replace them with hard links.

And if you wish to delete the duplicates, you can run rdfind with the -deleteduplicates option.
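Both operations are sketched below; double-check a dry run first, since both change the filesystem (the path is illustrative):

```shell
rdfind -makehardlinks true ~/Downloads      # replace duplicates with hard links
rdfind -deleteduplicates true ~/Downloads   # delete the duplicates outright
```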

To check other useful options of rdfind, consult its manual page with man rdfind.

2. Fdupes – Scan for Duplicate Files in Linux

Fdupes is another program that allows you to identify duplicate files on your system. It is free and open-source and written in C. It uses the following methods to determine duplicate files:

  • Comparing partial md5sum signatures
  • Comparing full md5sum signatures
  • Byte-by-byte comparison verification

Just like rdfind, it has similar options:

  • Search recursively
  • Exclude empty files
  • Show the size of duplicate files
  • Delete duplicates immediately
  • Exclude files with a different owner

To install fdupes in Linux, use the following command as per your Linux distribution.
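fdupes is packaged in the default repositories of most distributions:

```shell
sudo apt install fdupes   # Debian, Ubuntu, Linux Mint
sudo dnf install fdupes   # Fedora, RHEL/CentOS (EPEL)
sudo pacman -S fdupes     # Arch Linux
```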

Fdupes' syntax is similar to rdfind's. Simply type the command followed by the directory you wish to scan.

To search files recursively, you will have to specify the -r option, like this:

You can also specify multiple directories, and mark a given directory to be searched recursively.

To have fdupes calculate the size of the duplicate files use the -S option.

To gather summarized information about the found files use the -m option.
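Taken together, the invocations described above can be sketched like this. The paths are illustrative; the selective-recursion form uses fdupes' -R option, which recurses only into the directories listed after it:

```shell
fdupes ~/Downloads                  # scan a single directory
fdupes -r ~/Downloads               # recurse into sub-directories
fdupes ~/Downloads ~/Documents      # scan multiple directories
fdupes ~/Downloads -R ~/Documents   # recurse only into ~/Documents
fdupes -S ~/Downloads               # show the size of each set of duplicates
fdupes -m ~/Downloads               # print summarized information instead
```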

Scan Duplicate Files in Linux

Finally, if you want to delete all duplicates, use the -d option, like this:

Fdupes will ask which of the found files to delete. You will need to enter the file number:

Delete Duplicate Files in Linux

A solution that is definitely not recommended is to use the -N option, which deletes all duplicates without prompting, preserving only the first file of each set.
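A sketch of the two deletion modes just described; be careful with both, and especially with the second:

```shell
fdupes -d ~/Downloads    # prompts you to pick which file of each set to preserve
fdupes -dN ~/Downloads   # preserves the first file of each set, deletes the rest
```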

To get a list of available options to use with fdupes, review the help page by running fdupes --help.

3. dupeGuru – Find Duplicate Files in Linux

dupeGuru is an open-source, cross-platform tool that can be used to find duplicate files on a Linux system. The tool can scan either filenames or content in one or more folders. It also allows you to find filenames that are similar to the ones you are searching for.


dupeGuru comes in versions for the Windows, Mac, and Linux platforms. Its quick fuzzy-matching algorithm helps you find duplicate files within minutes. It is customizable: you can pull out exactly the duplicate files you want and wipe unwanted files from the system.

To install dupeGuru in Linux, use the following command as per your Linux distribution.
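Packaging varies by distribution; on Ubuntu-family systems the developer has historically provided a PPA, and the project also publishes builds on its website. A sketch for Ubuntu (verify that the PPA is still maintained before using it):

```shell
sudo add-apt-repository ppa:hsoft/ppa   # dupeGuru developer PPA (historical)
sudo apt update
sudo apt install dupeguru
```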

DupeGuru – Find Duplicate Files in Linux

4. FSlint – Duplicate File Finder for Linux

FSlint is a free utility that is used to find and clean various forms of lint on a filesystem. It also reports duplicate files, empty directories, temporary files, duplicate/conflicting (binary) names, bad symbolic links and many more. It has both command-line and GUI modes.

To install FSlint in Linux, use the following command as per your Linux distribution.
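Note that FSlint depends on Python 2 and GTK 2 and has been dropped from recent distribution releases (Ubuntu 20.04 and later, for example); on older systems it can typically be installed from the default repositories:

```shell
sudo apt install fslint   # Debian/Ubuntu releases that still package it
sudo dnf install fslint   # older Fedora releases
```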

FSlint – Duplicate File Finder for Linux

Conclusion

These are very useful tools for finding duplicated files on your Linux system, but you should be very careful when deleting such files.

If you are unsure whether you need a file, it is better to create a backup of that file and note its directory before deleting it. If you have any questions or comments, please submit them in the comment section below.


How to Find and Remove Duplicate Files on Linux?

Most of us have a habit of downloading many kinds of files (songs, documents, etc.) from the internet, which is why we often find we have downloaded the same mp3 files, PDF files, and other file types more than once. Duplicate files waste disk space unnecessarily; if you want the same file in a different location, you can always set up a soft link or hard link, which doesn't eat extra space and stores the data in only one location on your disk. Locating duplicate files manually is quite a tough job, so there are some good tools in Linux that scan your system, find the duplicates, and remove them to free up space, no matter whether you're using Linux on a desktop or on a server.
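The hard-link and soft-link alternative mentioned above looks like this (paths are illustrative):

```shell
dir=$(mktemp -d)
echo "some data" > "$dir/orig.mp3"

# Hard link: a second name for the same inode; the data exists once on disk.
ln "$dir/orig.mp3" "$dir/backup.mp3"

# Symbolic (soft) link: a small pointer file referencing the original path.
ln -s "$dir/orig.mp3" "$dir/shortcut.mp3"

ls -li "$dir"   # orig.mp3 and backup.mp3 share the same inode number
rm -r "$dir"
```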

Note: Whenever you're trying a new tool, make sure to first try it on a test directory where deleting files will not be a problem.

Method 1: Using FSlint.

FSlint is a tool that helps us search for and remove unnecessary duplicate files, empty directories, temp files, and files with incorrect names, freeing up disk space on your Linux system. FSlint provides a GUI by default, which is quite convenient for new Linux users, and it also has CLI modes for its various functions.

Install fslint in Linux using the following commands:

Fslint Interface

When the FSlint interface opens, you will find that the Duplicates pane is selected by default and your home directory is set as the default search path. You will also find several other options to choose from, such as installed packages, bad names, name clashes, temp files, empty directories, bad IDs, etc.

Steps to use:

Step 1: First choose the task that you want to perform from the left panel. Here I am choosing the Duplicates pane; you can choose another one as well.

Step 2: Choose the search path where you want to perform the task.

Step 3: Click on the Find option to locate the files.

Note: some directories may not be displayed or deleted due to permission issues.

Once you get the duplicate files (according to the option you chose), you can select and delete them. Under the Advanced search parameters, rules can be defined to exclude certain file types or directories that you don't want included in the search.

Advanced search parameters

Method 2: Using Fdupes.

Fdupes is another tool for identifying and removing duplicate files residing within specified directories. Like FSlint it is free and open source (written in C), but unlike FSlint it is a command-line interface tool. Fdupes uses several modes of searching:

  • By size
  • Comparing partial or full MD5 signatures
  • Byte-by-byte comparison

Install fdupes in Linux using the following commands:

After installation simply run the fdupes command followed by the path to a directory that you want to scan.
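Keeping the earlier note in mind about trying new tools on a test directory first, you can create some throwaway duplicates and point fdupes at them (this assumes fdupes is already installed):

```shell
mkdir -p /tmp/dupe-test
echo "same content" > /tmp/dupe-test/one.txt
cp /tmp/dupe-test/one.txt /tmp/dupe-test/two.txt   # a deliberate duplicate
fdupes /tmp/dupe-test    # should list one.txt and two.txt as a set
rm -r /tmp/dupe-test
```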


Duplicate files being displayed

This tool will not automatically delete anything, it will just show you a list of all the duplicate files. You can then delete the duplicate files according to your choice.

The size of the duplicate files can be displayed with the -S option:

At last, if you want to delete all duplicates, you can use the -d option, as in the screenshot below:

fdupes -d /path/to/directory

In the above screenshot, we can see the -d option showing all the duplicate files within the folder and giving you the option to select which file to keep (the preserve-files prompt). You can delete files one by one, select a range to delete, or delete them all at once. If you want to delete all duplicates without being asked, preserving only the first one, you can use the -N option.

For more options, see the help output of fdupes by typing fdupes -h:


How To Find And Delete Duplicate Files In Linux

I always back up configuration files or any old files somewhere on my hard disk before editing or modifying them, so I can restore them from the backup if I accidentally do something wrong. The problem is that I forget to clean up those files, and after a certain period of time my hard disk is filled with a lot of duplicates. I feel either too lazy to clean up the old files or afraid that I may delete an important file. If you're anything like me and are overwhelmed by multiple copies of the same files in different backup directories, you can find and delete duplicate files in Unix-like operating systems using the tools given below.

A word of caution:

Please be careful while deleting duplicate files. If you're not careful, it can lead to accidental data loss. I advise you to pay extra attention while using these tools.

Find And Delete Duplicate Files In Linux

For the purpose of this guide, I am going to discuss three utilities, namely Rdfind, Fdupes, and FSlint.

These three utilities are free, open source, and work on most Unix-like operating systems.

1. Rdfind

Rdfind, which stands for redundant data find, is a free and open-source utility to find duplicate files across and/or within directories and sub-directories. It compares files based on their content, not their file names. Rdfind uses a ranking algorithm to classify original and duplicate files: if you have two or more equal files, Rdfind is smart enough to find which is the original and considers the rest duplicates. Once it has found the duplicates, it reports them to you, and you can decide to either delete them or replace them with hard links or symbolic (soft) links.

Installing Rdfind

Rdfind is available in AUR. So, you can install it in Arch-based systems using any AUR helper program like Yay as shown below.

On Debian, Ubuntu, Linux Mint:
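On Arch-based systems via an AUR helper, and on Debian-family systems from the default repositories, the installation looks like this:

```shell
yay -S rdfind             # Arch Linux and derivatives (AUR)
sudo apt install rdfind   # Debian, Ubuntu, Linux Mint
```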

Usage

Once installed, simply run the rdfind command along with the directory path to scan for duplicate files.

Scan a directory with Rdfind

As you see in the above screenshot, the rdfind command scans the ~/Downloads directory and saves the results in a file named results.txt in the current working directory. You can view the names of the possible duplicate files in the results.txt file.

By reviewing the results.txt file, you can easily find the duplicates. You can remove the duplicates manually if you want to.

Also, you can use the -dryrun option to find all duplicates in a given directory without changing anything, and output the summary in your Terminal:

Once you found the duplicates, you can replace them with either hardlinks or symlinks.

To replace all duplicates with hardlinks, run:

To replace all duplicates with symlinks/soft links, run:

You may have some empty files in a directory and want to ignore them. If so, use the -ignoreempty option like below.

If you don’t want the old files anymore, just delete duplicate files instead of replacing them with hard or soft links.

To delete all duplicates, simply run:

If you do not want to ignore empty files and delete them along with all duplicates, run:
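The operations described above map onto rdfind's flags as follows (paths are illustrative):

```shell
rdfind -makehardlinks true ~/Downloads      # replace duplicates with hard links
rdfind -makesymlinks true ~/Downloads       # replace duplicates with symlinks
rdfind -ignoreempty true ~/Downloads        # skip empty files (the default)
rdfind -deleteduplicates true ~/Downloads   # delete all duplicates
rdfind -deleteduplicates true -ignoreempty false ~/Downloads   # delete empty files too
```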

For more details, refer to the help section (rdfind --help) and the manual page (man rdfind).


2. Fdupes

Fdupes is yet another command-line utility to identify and remove duplicate files within specified directories and their sub-directories. It is a free, open-source utility written in the C programming language. Fdupes identifies duplicates by comparing file sizes, then partial MD5 signatures, then full MD5 signatures, and finally performing a byte-by-byte comparison for verification.

Similar to the Rdfind utility, Fdupes comes with quite a handful of options, such as:

  • Recursively search for duplicate files in directories and sub-directories
  • Exclude empty files and hidden files from consideration
  • Show the size of the duplicates
  • Delete duplicates immediately as they are encountered
  • Exclude files with a different owner/group or permission bits from being considered duplicates
  • And a lot more.

Installing Fdupes

Fdupes is available in the default repositories of most Linux distributions.

On Arch Linux and its variants like Antergos, Manjaro Linux, install it using Pacman like below.

On Debian, Ubuntu, Linux Mint:

Usage

Fdupes usage is pretty simple. Just run the following command to find the duplicate files in a directory, for example:

Sample output from my system:

As you can see, I have a duplicate file in the /home/sk/Downloads/ directory. Note that this shows duplicates from the parent directory only. How do you view duplicates from sub-directories as well? Just use the -r option like below.

Now you will see the duplicates from /home/sk/Downloads/ directory and its sub-directories as well.

Fdupes can also find duplicates in multiple directories at once.

You can even search multiple directories, one recursively like below:

The above command searches for duplicates in the ~/Downloads directory and in the ~/Documents/ostechnix directory and its sub-directories.

Sometimes, you might want to know the size of the duplicates in a directory. If so, use the -S option like below.

Similarly, to view the size of the duplicates in parent and child directories, use the -Sr option.

We can exclude empty and hidden files from consideration using the -n and -A options respectively.

The first command will exclude zero-length files from consideration and the latter will exclude hidden files from consideration while searching for duplicates in the specified directory.

To summarize duplicate file information, use the -m option.

To delete all duplicates, use the -d option.

This command will prompt you for which files to preserve and delete all other duplicates. Just enter the number of the file you want to keep; the remaining files in the set are deleted. Pay close attention while using this option: you might delete original files if you're not careful.

If you want to preserve the first file in each set of duplicates and delete the others without prompting each time, use the -dN option (not recommended).

To delete duplicates as they are encountered, use the -I flag.
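Collected in one place, the fdupes invocations described in this section look like this (paths are illustrative):

```shell
fdupes -r ~/Downloads    # recurse into sub-directories
fdupes -S ~/Downloads    # show the size of the duplicates
fdupes -Sr ~/Downloads   # sizes, searched recursively
fdupes -n ~/Downloads    # exclude zero-length files
fdupes -A ~/Downloads    # exclude hidden files
fdupes -m ~/Downloads    # summarize duplicate information
fdupes -d ~/Downloads    # prompt for which file to preserve, delete the rest
fdupes -dN ~/Downloads   # preserve the first of each set, delete without prompting
fdupes -I ~/Downloads    # delete duplicates as they are encountered
```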

For more details about Fdupes, view the help section and man pages.


3. FSlint

FSlint is yet another duplicate file finder utility that I use from time to time to get rid of unnecessary duplicate files and free up disk space on my Linux system. Unlike the other two utilities, FSlint has both GUI and CLI modes, so it is a more user-friendly tool for newbies. FSlint not only finds duplicates but also bad symlinks, bad names, temp files, bad IDs, empty directories, non-stripped binaries, etc.

Installing FSlint

FSlint is available in AUR, so you can install it using any AUR helpers.

On Debian, Ubuntu, Linux Mint:

Once it is installed, launch it from menu or application launcher.

This is what the FSlint GUI looks like.

As you can see, the interface of FSlint is user-friendly and self-explanatory. In the Search path tab, add the path of the directory you want to scan and click the Find button in the lower left corner to find the duplicates. Check the recurse option to search for duplicates recursively in directories and sub-directories. FSlint will quickly scan the given directory and list the results.

From the list, choose the duplicates you want to clean and apply any one of the given actions: Save, Delete, Merge, or Symlink.

In the Advanced search parameters tab, you can specify the paths to exclude while searching for duplicates.

fslint advanced search

FSlint command line options

FSlint provides a collection of the following CLI utilities to find duplicates in your filesystem:

  • findup — find DUPlicate files
  • findnl — find Name Lint (problems with filenames)
  • findu8 — find filenames with invalid utf8 encoding
  • findbl — find Bad Links (various problems with symlinks)
  • findsn — find Same Name (problems with clashing names)
  • finded — find Empty Directories
  • findid — find files with dead user IDs
  • findns — find Non Stripped executables
  • findrs — find Redundant Whitespace in files
  • findtf — find Temporary Files
  • findul — find possibly Unused Libraries
  • zipdir — Reclaim wasted space in ext2 directory entries

All of these utilities are available under /usr/share/fslint/fslint/fslint location.

For example, to find duplicates in a given directory, do:

Similarly, to find empty directories, the command would be:

To get more details on each utility, for example findup, run:
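By way of illustration, the commands just described look like this, assuming the default install location (the scanned path is illustrative):

```shell
/usr/share/fslint/fslint/findup ~/Downloads   # find duplicate files
/usr/share/fslint/fslint/finded ~/Downloads   # find empty directories
/usr/share/fslint/fslint/findup --help        # usage details for findup
```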

For more details about FSlint, refer to the help section and man pages.

Conclusion

Now you know about three tools to find and delete unwanted duplicate files in Linux. Among these three, I often use Rdfind. That doesn't mean the other two utilities are not efficient; I am simply happy with Rdfind so far. Well, it's your turn: which is your favorite tool, and why? Let us know in the comment section below.

