How does the deduplication work?




Deduplicate:

The process of removing redundant files with TreeSize and replacing them with NTFS hardlinks is called "deduplication". It reduces the disk space occupied by your duplicate files.

Instead of each duplicate file taking up its own space on your hard disk, TreeSize keeps only one of the duplicates and removes the others. Each removed file is replaced by a hardlink that points to the remaining data (see: NTFS hardlinks). The data is then shared by all hardlinks for this file, as shown in the image below.

Deduplication

These hardlinks can be used like any normal file, and you will not notice any difference. In fact, a hardlink is no different from a normal file, except that it shares its data with the other links instead of occupying its own space.
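The replacement step described above can be illustrated with a minimal Python sketch. It is not TreeSize's implementation: the helper name `replace_with_hardlink` and the temporary file name scheme are hypothetical, and the sketch assumes you have already confirmed that both files have identical content and reside on the same volume.

```python
import os
import uuid


def replace_with_hardlink(master: str, duplicate: str) -> None:
    """Replace `duplicate` with a hardlink to `master` (hypothetical helper).

    Assumes the caller has already confirmed that both files have identical
    content and reside on the same volume; no comparison is done here.
    """
    directory = os.path.dirname(os.path.abspath(duplicate))
    # Create the new link under a unique temporary name in the same folder,
    # so the duplicate stays untouched if link creation fails.
    tmp_path = os.path.join(directory, f".dedup-{uuid.uuid4().hex}.tmp")
    os.link(master, tmp_path)
    try:
        # Move the temporary link over the duplicate. The duplicate's own
        # data is released once no other link refers to it.
        os.replace(tmp_path, duplicate)
    except OSError:
        os.remove(tmp_path)  # clean up the unused link before re-raising
        raise
```

After the call, both paths refer to the same file (`os.path.samefile(master, duplicate)` returns `True`), and the link count reported by `os.stat(master).st_nlink` has increased by one.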

 

Which of the duplicate files will be replaced, and which files will be kept as "master"?

If you check all files of a duplicates group, TreeSize picks the file with the newest "Last modified" date and uses it as the "master" for this group. All other files are removed and replaced by hardlinks that point to the master file. If you want to select the master file manually, leave one of the files in the duplicates group unchecked. This file will not be replaced but used as the master instead.
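The selection rule can be expressed as a short Python sketch. The `DuplicateFile` record and the `plan_group` function are hypothetical names for illustration, not TreeSize APIs. The behavior for more than one unchecked file is an assumption, since the rule above only covers leaving a single file unchecked.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DuplicateFile:
    path: str
    last_modified: float  # e.g. os.stat(path).st_mtime
    checked: bool


def plan_group(group: List[DuplicateFile]) -> Tuple[DuplicateFile, List[DuplicateFile]]:
    """Return (master, files_to_replace) for one duplicates group."""
    unchecked = [f for f in group if not f.checked]
    if len(unchecked) == 1:
        # A single unchecked file is kept and used as the master.
        master = unchecked[0]
    elif not unchecked:
        # All files checked: the newest "Last modified" file becomes the master.
        master = max(group, key=lambda f: f.last_modified)
    else:
        # Assumption: several unchecked files are treated as ambiguous here.
        raise ValueError("leave at most one file unchecked per group")
    # Only checked files other than the master are replaced by hardlinks.
    to_replace = [f for f in group if f.checked and f is not master]
    return master, to_replace
```

For example, in a group where all three files are checked and the second one was modified most recently, the second file becomes the master and the other two are replaced.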

Please note:

Unfortunately, Windows Explorer does not show the reduced size of a deduplicated file or of the folder that contains it. You can find further information about this topic, including how to see the size difference, in our knowledge base.
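The size difference comes from counting each physical file once instead of once per link. The following Python sketch illustrates this; the function name `folder_sizes` is hypothetical, and it sums logical file sizes (`st_size`), not allocated clusters. It calls `os.stat` rather than `DirEntry.stat`, because on Windows the latter leaves `st_ino` and `st_dev` set to zero.

```python
import os
from typing import Tuple


def folder_sizes(root: str) -> Tuple[int, int]:
    """Return (naive_total, unique_total) in bytes for all files below root.

    naive_total counts every hardlink as a separate copy (a plain per-file
    sum); unique_total counts each physical file only once.
    """
    naive_total = 0
    unique_total = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.stat populates st_dev/st_ino on Windows, unlike DirEntry.stat.
            st = os.stat(os.path.join(dirpath, name))
            naive_total += st.st_size
            # (volume, file id) identifies the physical file behind a link.
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique_total += st.st_size
    return naive_total, unique_total
```

For a folder holding a 100-byte file, a hardlink to it, and an unrelated 50-byte file, the sketch returns `(250, 150)`.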

You cannot use hardlinks to replace files located on different drives: all hardlinks to a file must reside on the same NTFS volume (partition).
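A quick pre-check for this restriction is to compare the device identifiers of two files, as in the following Python sketch. The helper name `on_same_volume` is hypothetical. A matching `st_dev` does not prove that the volume is NTFS, and `os.link` itself still raises `OSError` if the paths are on different volumes.

```python
import os


def on_same_volume(path_a: str, path_b: str) -> bool:
    """Return True if both paths are on the same volume (same st_dev).

    Hardlinks can only be created within one volume, so False means the
    pair cannot be deduplicated with a hardlink.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```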

To deduplicate files with hardlinks, you need the following NTFS permissions in all affected folders: read, write, create files, and delete files.

TreeSize does not offer the functionality to "undo" a deduplication.

All hardlinks pointing to the same file share the same security descriptor (access permissions). Deduplication therefore applies one unified set of permissions to the single remaining physical file.
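The fact that permissions belong to the file rather than to an individual link can be demonstrated with a short Python script. This is a POSIX analogy and an assumption for illustration only: POSIX mode bits stand in for the NTFS security descriptor, and on Windows `os.chmod` only controls the read-only flag, so the printed value applies to Linux or macOS.

```python
import os
import stat
import tempfile

# POSIX analogy (assumption): mode bits stand in for the NTFS security
# descriptor. Permissions are stored with the file, not with each link.
with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "first.txt")
    second = os.path.join(d, "second.txt")
    with open(first, "w") as f:
        f.write("shared data")
    os.link(first, second)   # second is a hardlink to first
    os.chmod(second, 0o640)  # change the permissions through one link...
    mode = stat.S_IMODE(os.stat(first).st_mode)
    print(oct(mode))         # ...and they apply to the other link: 0o640
```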