Differences
library_example_find_duplicate_files 2016-01-26 | library_example_find_duplicate_files 2022-06-16 (current) | ||
Line 5: | Line 5: | ||
You can use the script to efficiently find duplicate files on a remote SFTP/FTP server. The script first iterates the remote directory tree and looks for files with the same size. When it finds any, it downloads the files by default and compares them locally. | You can use the script to efficiently find duplicate files on a remote SFTP/FTP server. The script first iterates the remote directory tree and looks for files with the same size. When it finds any, it downloads the files by default and compares them locally. | ||
- | If you know that the server supports [[protocols|a protocol extension for calculating checksums]], you can improve the script's efficiency by adding the ''-remoteChecksumAlg'' switch, which makes the script ask the server for the checksum, sparing the file download. | + | You can install this script as a [[extension|WinSCP extension]] by using this page URL in the //[[ui_pref_commands#extensions|Add Extension]]// command. If you know that the server supports a [[protocols|protocol extension for calculating checksums]], you can improve the extension's efficiency by [[#options|configuring it]] to ask the server for the checksum, sparing the file download. |
+ | |||
+ | ~~AD~~ | ||
+ | |||
+ | To run the script manually, use: | ||
<code batch> | <code batch> | ||
- | powershell.exe -File C:\path\find_duplicates.ps1 -remotePath "/path" -remoteChecksumAlg sha-1 | + | powershell.exe -File C:\path\FindDuplicates.ps1 -sessionUrl "sftp://user:password;fingerprint=ssh-rsa-xxxxxxxxxxx...@example.com/" -remotePath "/path" -remoteChecksumAlg sha-1 |
</code> | </code> | ||
- | You can use the script [[guide_custom_commands_automation|from WinSCP GUI as a local custom command]]. | + | <code powershell - FindDuplicates.ps1> |
- | + | # @name         Find &Duplicates... | |
- | <code powershell> | + | # @command powershell.exe -ExecutionPolicy Bypass -File "%EXTENSION_PATH%" ^ |
+ | # -sessionUrl "!E" -remotePath "!/" -pause ^ | ||
+ | # -remoteChecksumAlg "%RemoteChecksumAlg%" -sessionLogPath "%SessionLogPath%" | ||
+ | # @description Searches for duplicate files on the server, starting from the current directory | ||
+ | # @flag RemoteFiles | ||
+ | # @version      8 | ||
+ | # @homepage ~~SELF~~ | ||
+ | # @require WinSCP 5.16 | ||
+ | # @option RemoteChecksumAlg -config -run combobox "&Checksum:" "local" ^ | ||
+ | # "local=Local sha-1" "sha1=Remote sha-1" "sha256=Remote sha-256" ^ | ||
+ | # "md5=Remote md5" | ||
+ | # @option SessionLogPath -config sessionlogfile | ||
+ | # @optionspage ~~SELF~~#options | ||
+ | |||
param ( | param ( | ||
- | # Use Generate URL function to obtain a value for -sessionUrl parameter. | + | # Use Generate Session URL function to obtain a value for -sessionUrl parameter. |
- | $sessionUrl = "sftp://user:mypassword;fingerprint=ssh-rsa-xx-xx-xx@example.com/", | + | $sessionUrl = "sftp://user:mypassword;fingerprint=ssh-rsa-xxxxxxxxxxx...@example.com/", |
- | [Parameter(Mandatory)] | + | [Parameter(Mandatory = $True)] |
$remotePath, | $remotePath, | ||
- | $remoteChecksumAlg = $Null | + | $remoteChecksumAlg = $Null, |
+ | $sessionLogPath = $Null, | ||
+ | [Switch] | ||
+ | $pause | ||
) | ) | ||
+ | |||
function FileChecksum ($remotePath) | function FileChecksum ($remotePath) | ||
{ | { | ||
if (!($checksums.ContainsKey($remotePath))) | if (!($checksums.ContainsKey($remotePath))) | ||
{ | { | ||
- | if ($remoteChecksumAlg -eq $Null) | + | if (!$remoteChecksumAlg -or ($remoteChecksumAlg -eq "local")) |
{ | { | ||
- | Write-Host ("Downloading file {0}..." -f $remotePath) | + | Write-Host "Downloading file $remotePath..." |
# Download file | # Download file | ||
$localPath = [System.IO.Path]::GetTempFileName() | $localPath = [System.IO.Path]::GetTempFileName() | ||
$transferResult = $session.GetFiles($remotePath, $localPath) | $transferResult = $session.GetFiles($remotePath, $localPath) | ||
+ | |||
if ($transferResult.IsSuccess) | if ($transferResult.IsSuccess) | ||
{ | { | ||
$stream = [System.IO.File]::OpenRead($localPath) | $stream = [System.IO.File]::OpenRead($localPath) | ||
- | $checksum = [BitConverter]::ToString($sha1.ComputeHash($stream)) | + | $checksum = [System.BitConverter]::ToString($sha1.ComputeHash($stream)) |
$stream.Dispose() | $stream.Dispose() | ||
- | Write-Host ("Downloaded file {0} checksum is {1}" -f $remotePath, $checksum) | + | Write-Host "Downloaded file $remotePath checksum is $checksum" |
+ | |||
Remove-Item $localPath | Remove-Item $localPath | ||
} | } | ||
else | else | ||
{ | { | ||
- | Write-Host ("Error downloading file {0}: {1}" -f $remotePath, $transferResult.Failures[0]) | + | Write-Host ( |
+ |                     "Error downloading file ${remotePath}: $($transferResult.Failures[0])") | ||
$checksum = $False | $checksum = $False | ||
} | } | ||
Line 51: | Line 72: | ||
else | else | ||
{ | { | ||
- | Write-Host ("Request checksum for file {0}..." -f $remotePath) | + | Write-Host "Request checksum for file $remotePath..." |
- | $checksum = [BitConverter]::ToString($session.CalculateFileChecksum($remoteChecksumAlg, $remotePath)) | + |            $buf = $session.CalculateFileChecksum($remoteChecksumAlg, $remotePath) |
- | Write-Host ("File {0} checksum is {1}" -f $remotePath, $checksum) | + | $checksum = [System.BitConverter]::ToString($buf) |
+ | Write-Host "File $remotePath checksum is $checksum" | ||
} | } | ||
+ | |||
$checksums[$remotePath] = $checksum | $checksums[$remotePath] = $checksum | ||
} | } | ||
+ | |||
return $checksums[$remotePath] | return $checksums[$remotePath] | ||
- | } | ||
- | |||
- | function FindDuplicatesInDirectory ($remotePath) | ||
- | { | ||
- | Write-Host ("Finding duplicates in directory {0} ..." -f $remotePath) | ||
- | |||
- | try | ||
- | { | ||
- | $directoryInfo = $session.ListDirectory($remotePath) | ||
- | |||
- | foreach ($fileInfo in $directoryInfo.Files) | ||
- | { | ||
- | $remoteFilePath = ($remotePath + "/" + $fileInfo.Name) | ||
- | |||
- | if ($fileInfo.IsDirectory) | ||
- | { | ||
- | # Skip references to current and parent directories | ||
- | if (($fileInfo.Name -ne ".") -and | ||
- | ($fileInfo.Name -ne "..")) | ||
- | { | ||
- | # Recurse into subdirectories | ||
- | FindDuplicatesInDirectory $remoteFilePath | ||
- | } | ||
- | } | ||
- | else | ||
- | { | ||
- | Write-Host ("Found file {0} with size {1}" -f $remoteFilePath, $fileInfo.Length) | ||
- | |||
- | if ($sizes.ContainsKey($fileInfo.Length)) | ||
- | { | ||
- | $checksum = FileChecksum($remoteFilePath) | ||
- | |||
- | foreach ($otherFilePath in $sizes[$fileInfo.Length]) | ||
- | { | ||
- | $otherChecksum = FileChecksum($otherFilePath) | ||
- | |||
- | if ($checksum -eq $otherChecksum) | ||
- | { | ||
- | Write-Host ("Checksums of files {0} and {1} are identical" -f $remoteFilePath, $otherFilePath) | ||
- | $duplicates[$remoteFilePath] = $otherFilePath | ||
- | } | ||
- | } | ||
- | } | ||
- | else | ||
- | { | ||
- | $sizes[$fileInfo.Length] = @() | ||
- | } | ||
- | |||
- | $sizes[$fileInfo.Length] += $remoteFilePath | ||
- | } | ||
- | } | ||
- | } | ||
- | catch [Exception] | ||
- | { | ||
- | Write-Host ("Error processing directory {0}: {1}" -f $remotePath, $_.Exception.Message) | ||
- | } | ||
} | } | ||
Line 121: | Line 87: | ||
{ | { | ||
# Load WinSCP .NET assembly | # Load WinSCP .NET assembly | ||
- | Add-Type -Path "WinSCPnet.dll" | + | $assemblyPath = if ($env:WINSCP_PATH) { $env:WINSCP_PATH } else { $PSScriptRoot } |
+ |     Add-Type -Path (Join-Path $assemblyPath "WinSCPnet.dll") | ||
# Setup session options from URL | # Setup session options from URL | ||
Line 128: | Line 95: | ||
$session = New-Object WinSCP.Session | $session = New-Object WinSCP.Session | ||
- | $session.SessionLogPath = "session.log" | ||
try | try | ||
{ | { | ||
- | # Connect | + | $session.SessionLogPath = $sessionLogPath |
+ | |||
+ | Write-Host "Connecting..." | ||
$session.Open($sessionOptions) | $session.Open($sessionOptions) | ||
+ | # Handle errors when enumerating the files | ||
+ | $session.add_Failed( { | ||
+ | Write-Host "Error: $($_.Error.Message)" | ||
+ | } ) | ||
+ | |||
$sizes = @{} | $sizes = @{} | ||
$checksums = @{} | $checksums = @{} | ||
Line 140: | Line 113: | ||
$sha1 = [System.Security.Cryptography.SHA1]::Create() | $sha1 = [System.Security.Cryptography.SHA1]::Create() | ||
+ | |||
+ | $files = | ||
+ | $session.EnumerateRemoteFiles( | ||
+ | $remotePath, "*", [WinSCP.EnumerationOptions]::AllDirectories) | ||
- | # Start recursion | + | foreach ($fileInfo in $files) |
- | FindDuplicatesInDirectory $remotePath | + | { |
+ | Write-Host "Found file $($fileInfo.FullName) with size $($fileInfo.Length)" | ||
+ | |||
+ | if ($sizes.ContainsKey($fileInfo.Length)) | ||
+ | { | ||
+ | $checksum = FileChecksum($fileInfo.FullName) | ||
+ | |||
+ | foreach ($otherFilePath in $sizes[$fileInfo.Length]) | ||
+ | { | ||
+ | $otherChecksum = FileChecksum($otherFilePath) | ||
+ | |||
+ | if ($checksum -eq $otherChecksum) | ||
+ | { | ||
+ | Write-Host ( | ||
+ | "Checksums of files $($fileInfo.FullName) and " + | ||
+ | "$otherFilePath are identical") | ||
+ | $duplicates[$fileInfo.FullName] = $otherFilePath | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | else | ||
+ | { | ||
+ | $sizes[$fileInfo.Length] = @() | ||
+ | } | ||
+ | |||
+ | $sizes[$fileInfo.Length] += $fileInfo.FullName | ||
+ | } | ||
} | } | ||
finally | finally | ||
Line 149: | Line 152: | ||
$session.Dispose() | $session.Dispose() | ||
} | } | ||
+ | |||
# Print results | # Print results | ||
Write-Host | Write-Host | ||
+ | |||
if ($duplicates.Count -gt 0) | if ($duplicates.Count -gt 0) | ||
{ | { | ||
Write-Host "Duplicates found:" | Write-Host "Duplicates found:" | ||
+ | |||
foreach ($path1 in $duplicates.Keys) | foreach ($path1 in $duplicates.Keys) | ||
{ | { | ||
- | Write-Host ("{0} <=> {1}" -f $path1, $duplicates[$path1]) | + | Write-Host "$path1 <=> $($duplicates[$path1])" |
} | } | ||
} | } | ||
Line 166: | Line 169: | ||
Write-Host "No duplicates found." | Write-Host "No duplicates found." | ||
} | } | ||
- | + | |
- | exit 0 | + | $result = 0 |
} | } | ||
- | catch [Exception] | + | catch |
{ | { | ||
- | Write-Host $_.Exception.Message | + | Write-Host "Error: $($_.Exception.Message)" |
- | exit 1 | + | $result = 1 |
} | } | ||
+ | |||
+ | # Pause if -pause switch was used | ||
+ | if ($pause) | ||
+ | { | ||
+ | Write-Host "Press any key to exit..." | ||
+ | [System.Console]::ReadKey() | Out-Null | ||
+ | } | ||
+ | |||
+ | exit $result | ||
</code> | </code> | ||
+ | ===== [[options]] Options ===== | ||
+ | |||
+ | &screenshotpict(extension_find_duplicate_files) | ||
+ | |||
+ | The //Checksum// selection allows you to choose which checksum algorithm to use and whether the checksum is calculated locally or remotely. Select //Local sha-1// to calculate the SHA-1 checksum locally. This is a universal option that works with any server, but WinSCP needs to download all candidate files. If you know that the server supports [[protocols|a protocol extension for calculating checksums]], you can improve the extension's efficiency by selecting a remote calculation. The list contains some common algorithms that some servers support. However, you can type in the name of any other algorithm supported by the server. | ||
+ | |||
+ | In the //Session log file//, you can specify a path to a [[logging|session log file]]. The option is available on the [[ui_pref_commands|Preferences dialog]] only. | ||
+ | |||
+ | In the //Keyboard shortcut//, you can specify a [[custom_key_shortcuts|keyboard shortcut]] for the extension. The option is available on the [[ui_pref_commands|Preferences dialog]] only. |
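
The size-first strategy that the script uses (compare checksums only for files that share a size) can be tried out without a server. The following is a minimal sketch, assuming local files and PowerShell 4.0 or later (for ''Get-FileHash''). The function name ''Find-LocalDuplicates'' is hypothetical; it is not part of the script above or of the WinSCP .NET assembly.

```powershell
# Hypothetical helper for illustration only: finds duplicate files in a LOCAL
# directory tree using the same size-first strategy as the script above.
function Find-LocalDuplicates
{
    param (
        [Parameter(Mandatory = $True)]
        $path
    )

    $sizes = @{}       # file size -> paths already seen with that size
    $checksums = @{}   # path -> cached SHA-1 checksum
    $duplicates = @{}  # path -> earlier path with identical contents

    # Sort by path so that the results are deterministic
    $files = Get-ChildItem -LiteralPath $path -File -Recurse | Sort-Object FullName

    foreach ($file in $files)
    {
        if ($sizes.ContainsKey($file.Length))
        {
            foreach ($otherPath in $sizes[$file.Length])
            {
                # Calculate a checksum only when a size collision requires it
                foreach ($p in ($file.FullName, $otherPath))
                {
                    if (!$checksums.ContainsKey($p))
                    {
                        $checksums[$p] =
                            (Get-FileHash -LiteralPath $p -Algorithm SHA1).Hash
                    }
                }

                if ($checksums[$file.FullName] -eq $checksums[$otherPath])
                {
                    $duplicates[$file.FullName] = $otherPath
                }
            }
        }
        else
        {
            $sizes[$file.Length] = @()
        }

        $sizes[$file.Length] += $file.FullName
    }

    return $duplicates
}
```

Files with a unique size are never hashed, which is the same reason the remote script avoids downloading (or requesting a checksum for) most files. Calling ''Find-LocalDuplicates "C:\path"'' returns a hash table that maps each duplicate to an earlier file with the same contents.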