TreeSize

Duplicate file search¶

The duplicate file search searches for duplicate files on the selected drives or shares.

In this context, duplicate files are files which seem to exist more than once. Such redundant files increase the allocated space of your disks unnecessarily.

_images/TreeSize-FileSearch_DuplicateFiles.png

Context tab¶

Search Mode¶

Select one of the three duplicate search modes. You can search for duplicate files, duplicate folders, or files that do not have any duplicates.

Duplicate Files¶

Searches for files that are duplicates of each other, using the selected comparison method.

Duplicate Folders¶

Searches for folders that are duplicates of each other. Two folders are considered duplicates if they contain the same number of subfolders and files. These subfolders and files also have to be equal to each other with regard to the selected comparison method.
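The rule above can be pictured with a minimal Python sketch. It is only an illustration and not TreeSize's actual implementation: the helper names `file_key`, `folder_signature` and `are_duplicate_folders` are hypothetical, and an MD5 of the file content stands in for "the selected comparison method".

```python
import hashlib
from pathlib import Path


def file_key(path: Path) -> str:
    """Assumed stand-in for the selected comparison method: MD5 of the content."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def folder_signature(folder: Path) -> tuple:
    """Combine the keys of all files with the signatures of all subfolders."""
    entries = list(folder.iterdir())
    files = sorted(file_key(p) for p in entries if p.is_file())
    subfolders = sorted(folder_signature(p) for p in entries if p.is_dir())
    # Equal signatures imply the same number of files and subfolders,
    # and pairwise equal entries under the chosen comparison method.
    return (tuple(files), tuple(subfolders))


def are_duplicate_folders(a: Path, b: Path) -> bool:
    return folder_signature(a) == folder_signature(b)
```

Swapping `file_key` for a key based on name, size or date would model the other comparison methods in the same way.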

Unique Files¶

This setting searches for files that do not have any duplicates across the selected search paths.

Comparison method¶

Defines which criteria should be used to identify files as duplicates. Here is a list of the available strategies:

File Content¶

This option uses MD5 checksums for comparison by default.

When using this method, a so-called hash value is calculated from the contents of each file. Files with the same content have the same hash value, while files with different content will almost certainly have different values. Empty files are ignored, since there is no content to compare.

This is more accurate than comparing files by their name, size and date, but it is also much slower.

Within the file search options, it is possible to adjust this method to use SHA256 hashes instead. The SHA256 algorithm further reduces the statistical risk of hash collisions compared to MD5, but it is also significantly slower. This option is only visible when using the expert application mode.
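As a rough illustration of content-based comparison, the following Python sketch groups files by a hash of their content. It is written under assumptions and does not show how TreeSize works internally: the function names `content_hash` and `find_content_duplicates` and the chunk size are hypothetical, and only the choice between MD5 and SHA256 and the skipping of empty files are taken from the description above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_hash(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_content_duplicates(paths, algorithm: str = "md5"):
    """Return lists of paths whose content hashes are identical."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        if path.stat().st_size == 0:
            continue  # empty files are ignored: there is no content to compare
        groups[content_hash(path, algorithm)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Passing `algorithm="sha256"` switches the sketch to the slower hash with the lower collision risk.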

Size, Name and Date¶

Select this option to identify duplicate files by looking for equal names, sizes and last change dates.

This is much faster than using checksums to identify duplicates, but it is also less accurate.

Name and Size¶

Select this option to identify duplicate files by looking for equal names and sizes.

This is the same as the “Size, Name and Date” method, but it ignores the “last modified” time stamp of the files.

This is helpful if files have been moved from one location to another, which might modify this time stamp.

Name¶

Select this option to find all files with equal file names.

This comparison method can be helpful when you are searching for undesired copies (e.g. documents which have been copied and modified locally).

Name without Extension¶

Select this option to detect files with equal names, without regarding the file extension.

This can be useful when you are searching for duplicated backup files or, for example, raw and compressed versions of image or video files (“MyPhoto.bmp” and “MyPhoto.png”).

Size and Date¶

Compares files according to their size and date values. This allows a faster, but less accurate, search for duplicate files with different names. Accidental copies with names such as “Copy of …” can be identified quickly using this method.

Size only¶

Select this option to find all files with equal size.
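The metadata-based comparison methods listed above differ only in which attributes must match. The Python sketch below models each one as a grouping key. It is an illustration under assumptions, not TreeSize's implementation: `KEY_FUNCTIONS` and `find_duplicates` are hypothetical names, file names are compared case-insensitively (an assumption based on Windows conventions), and "date" is taken to be the last modification time.

```python
from collections import defaultdict
from pathlib import Path


def _name(p: Path) -> str:
    return p.name.casefold()  # assumption: names are compared case-insensitively


def _stem(p: Path) -> str:
    return p.stem.casefold()  # file name without its extension


def _size(p: Path) -> int:
    return p.stat().st_size


def _date(p: Path) -> int:
    return p.stat().st_mtime_ns  # assumption: "date" means last modification time


# One grouping key per comparison method.
KEY_FUNCTIONS = {
    "Size, Name and Date": lambda p: (_size(p), _name(p), _date(p)),
    "Name and Size": lambda p: (_name(p), _size(p)),
    "Name": _name,
    "Name without Extension": _stem,
    "Size and Date": lambda p: (_size(p), _date(p)),
    "Size only": _size,
}


def find_duplicates(paths, method: str):
    """Group files by the key of the chosen method; groups of two or more are duplicates."""
    key = KEY_FUNCTIONS[method]
    groups = defaultdict(list)
    for path in map(Path, paths):
        groups[key(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

In this model, the "Unique Files" search mode corresponds to the groups that contain exactly one file.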

Search Filters¶

Additional options to customize the duplicate file search:

Exclude filter¶

Allows you to activate, deactivate or customize the global exclude filters for this search.

By restricting the duplicate search to a specific preselection of files, you can prevent listing files of certain directories (e.g. your local system directories) as duplicates. Additionally, this option will reduce the number of files to compare, which improves the speed of the search.
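The effect of an exclude filter can be sketched in Python as a pre-selection step that runs before any comparison. The wildcard syntax used here is that of Python's `fnmatch` module and is only an assumption for illustration. It is not TreeSize's filter syntax, and `is_excluded` and `apply_exclude_filter` are hypothetical names.

```python
from fnmatch import fnmatchcase


def _normalize(text: str) -> str:
    # Compare case-insensitively and treat both slash styles alike.
    return text.replace("\\", "/").lower()


def is_excluded(path: str, patterns) -> bool:
    """True if the path matches at least one exclude pattern."""
    normalized = _normalize(path)
    return any(fnmatchcase(normalized, _normalize(pattern)) for pattern in patterns)


def apply_exclude_filter(paths, patterns):
    """Keep only the paths that should take part in the duplicate search."""
    return [path for path in paths if not is_excluded(path, patterns)]
```

For example, with the patterns `C:\Windows\*` and `*.tmp`, files below the Windows directory and all temporary files would be dropped before any comparison takes place, which leaves fewer files to compare.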

Ignore NTFS hardlinks¶

If this option is activated, hardlinks are not regarded as file duplicates. Note: NTFS hardlinks do not allocate additional disk space. Therefore, deleting them does not free any disk space. In addition, TreeSize itself uses hardlinks for deduplication.
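To illustrate why hardlinks are not real duplicates, the following Python sketch checks whether two paths refer to the same physical file and keeps one path per physical file. It is a sketch under assumptions and not TreeSize's implementation: `is_hardlink_pair` and `ignore_hardlinks` are hypothetical names, and the file identity is taken from the `st_dev` and `st_ino` fields that Python reports.

```python
import os


def is_hardlink_pair(a: str, b: str) -> bool:
    """True if both paths are directory entries for the same file data."""
    if os.path.islink(a) or os.path.islink(b):
        return False  # symbolic links are a different mechanism
    return os.path.samefile(a, b)


def ignore_hardlinks(paths):
    """Keep only the first path seen for each physical file."""
    seen = set()
    result = []
    for path in paths:
        info = os.stat(path)
        identity = (info.st_dev, info.st_ino)
        if identity not in seen:
            seen.add(identity)
            result.append(path)
    return result
```

A link count (`os.stat(path).st_nlink`) greater than one indicates that a file has additional hardlinks.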

Deduplicate¶

Use the “Operations > Deduplicate” button to replace all but one of the checked duplicate files with NTFS hardlinks (see “How does the deduplication work?”).

_images/TreeSize-FileSearch_Duplicates_Deduplicate.png

In the configuration window, you can select a log file in which the performed replacements are recorded. You can also define how TreeSize will handle files located on different hard disks.

You can either replace files located on the same hard disk with hardlinks separately or simply select a reference drive and replace all files located on other hard disks with symbolic links.

Note

If the permission to create symbolic links cannot be granted, a Windows shortcut (.LNK file) will be created instead as a fallback.
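The basic idea of replacing duplicates with hardlinks can be sketched in Python as follows. This is a minimal illustration under assumptions, not TreeSize's actual procedure: `replace_with_hardlink` and `deduplicate` are hypothetical names, the temporary file suffix is made up, and the sketch only covers files on the same volume. It does not model symbolic links or .LNK shortcuts.

```python
import os
from pathlib import Path


def replace_with_hardlink(reference: Path, duplicate: Path) -> None:
    """Replace `duplicate` with a hardlink to `reference` (same volume only)."""
    temp = duplicate.with_name(duplicate.name + ".dedup-tmp")  # hypothetical suffix
    os.link(reference, temp)     # create the new link first ...
    os.replace(temp, duplicate)  # ... then swap it in for the old copy


def deduplicate(group):
    """Keep the first file of a duplicate group and link all others to it.

    Returns the replaced paths, which could be written to a log file.
    """
    reference, *others = [Path(p) for p in group]
    replaced = []
    for duplicate in others:
        replace_with_hardlink(reference, duplicate)
        replaced.append(duplicate)
    return replaced
```

Creating the link under a temporary name first means the old copy is only removed once the new link exists. `os.link` raises `OSError` if the two paths are on different volumes, which is the situation where a link type other than a hardlink would be needed.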

The context menu of the duplicate files list offers a feature named “Replace duplicates by hardlinks”.

This function works just like the “Deduplicate” function, but it processes all selected files instead of the checked files.

  • How does the duplicate search work?
  • How does the deduplication work?
Copyright © 1995 - 2025 Joachim Marder e.K.