Duplicate Files


Need a way to check if two files are the same? Calculate a hash of the files. Here is one way to do it:

 

## Calculates the hash of a file and returns it as a string.
function Get-MD5([System.IO.FileInfo] $file = $(throw 'Usage: Get-MD5 [System.IO.FileInfo]'))
{
  $stream = $null;
  $cryptoServiceProvider = [System.Security.Cryptography.MD5CryptoServiceProvider];
  $hashAlgorithm = new-object $cryptoServiceProvider
  $stream = $file.OpenRead();
  $hashByteArray = $hashAlgorithm.ComputeHash($stream);
  $stream.Close();

  ## We have to be sure that we close the file stream if any exceptions are thrown.
  trap
  {
    if ($stream -ne $null)
    {
      $stream.Close();
    }
    break;
  }

  return [string]$hashByteArray;
}
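As a minimal sketch of the trap statement used above (the function name Test-Trap is made up for illustration): a trap block runs when an exception is thrown anywhere in its enclosing scope, even if the block appears after the statement that throws.

```powershell
## Hypothetical example: the trap block fires on the throw even though
## it is written later in the scope; 'break' then rethrows the exception.
function Test-Trap
{
  throw 'something went wrong'

  trap
  {
    write-host 'cleaning up'
    break;
  }
}
```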

 

I think about the only new thing here is the trap statement. It'll get called if any exception is thrown; otherwise it's just ignored. Hopefully nothing will go wrong in the function, but if anything does I want to be sure to close any open streams. Anyway, keep this function around; we'll use it along with AddNotes and group-object to write a simple script that can search directories and tell us which files are duplicates. Now… an example of this function in use:

 

MSH>"foo" > foo.txt
MSH>"bar" > bar.txt
MSH>"foo" > AlternateFoo.txt
MSH>dir *.txt | foreach { get-md5 $_ }
33 69 151 28 248 32 88 177 8 34 154 58 46 59 255 53
54 122 136 147 125 209 249 229 12 105 236 19 140 5 107 169
33 69 151 28 248 32 88 177 8 34 154 58 46 59 255 53
MSH>

 

 

Note that two of the files have the same hash, as expected, since they have the same content. Of course, it is possible for two files to have the same hash but not the same content, so if you are really paranoid you might want to check something else in addition to the MD5 hash.
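As a sketch of the duplicate search mentioned above (assuming the Get-MD5 function is already defined in the session, and leaving AddNotes aside), you can group files by their hash and keep only the groups with more than one member:

```powershell
## Sketch: find files with identical content by grouping on MD5.
## Assumes Get-MD5 (defined above) is available in the session.
dir *.txt |
  select FullName, @{ Name = 'MD5'; Expression = { get-md5 $_ } } |
  group MD5 |
  where { $_.Count -gt 1 } |
  foreach { $_.Group | foreach { $_.FullName } }
```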

 

- Marcel

[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]

  • I eventually gave up trying to do larger-scale duplicate file comparisons via this method. We found a third-party application called Duplicate File Detective (http://www.duplicate-file-detective.com) that does this job amazingly well.


  • Today I received an email with a question about computing MD5; quoting a piece of it: I want to ask about computing the MD5 of a string. All the information available covers computing the MD5 of files, but there is no ordinary function that takes arbitrary text as its argument.

  • Can we use the Duplicate Finder tool for this job? It is a very good tool to find and remove duplicate files.

  • Try Directory Report

    It can find duplicates based on CRC-32 and/or comparing byte-by-byte

    http://www.file-utilities.com

  • Sure there are other tools to do this... but this is more fun :)

    I grouped on file size first to make it perform better (I'm sure it can be optimized further).

    $group=0;
    gci "<path to check>" -Filter "*.mp3" |
    group {$_.Length} | ? { $_.Count -gt 1} | % { $_.Group | sort Name} |
    select FullName, @{Name="MD5"; Expression={get-md5 $_}} | group MD5 |
    ? { $_.Count -gt 1 } | % {$group++; $_.Group | select {$group}, {$_.FullName} } |
    Format-Table -AutoSize

  • Well... when you find your drive is full of temporary files in My Documents or any default download folder, it may be due to dozens of duplicate files, which you download time by time and leave there after viewing. This can take up a lot of drive space. To manage this, an automatic method is to use duplicatefilesdeleter.com. (saymahayen)

  • Good Job, duplicate file deleter is very useful

  • Guys... did you try duplicate file deleter?

    It is really amazing; I guess it will help you.

  • I have been using "DuplicateFilesDeleter" for a long time to delete duplicate files from my Windows PC.

    Personally, I recommend you use this tool.
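One of the comments above asks how to compute the MD5 of a string rather than a file. A minimal sketch in the style of the post's function (the name Get-StringMD5 and the choice of UTF-8 encoding are assumptions):

```powershell
## Sketch: MD5 of arbitrary text instead of a file.
## Assumes UTF-8 is the desired encoding of the input string.
function Get-StringMD5([string] $text)
{
  $md5 = new-object System.Security.Cryptography.MD5CryptoServiceProvider
  $bytes = [System.Text.Encoding]::UTF8.GetBytes($text)
  return [string]$md5.ComputeHash($bytes)
}
```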
