Topic review

martin

Re: duplicate file finder on remote server (script or new feature)

Maybe I'll write the script one day.
smitty123

Re: duplicate file finder on remote server (script or new feature)

I'm sure it's doable, but I'm not a programmer.
martin

Re: duplicate file finder on remote server (script or new feature)

You can script this using the WinSCP .NET assembly. Some SFTP/FTP servers even support checksum calculation for remote files, so you would not have to download the files.
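
Once there is a checksum per remote file, spotting the duplicates comes down to grouping. Here is a minimal Python sketch of that step, not WinSCP code: how the checksums are fetched from the server is left out, and `file_sha256` and `group_by_checksum` are made-up helper names. `file_sha256` only applies if the script can open the files directly, for example when it runs on the server itself.

```python
import hashlib
from collections import defaultdict


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large file never sits in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_checksum(checksums):
    """Given {path: checksum}, return the lists of paths sharing a checksum."""
    groups = defaultdict(list)
    for path, checksum in checksums.items():
        groups[checksum].append(path)
    # Only checksums seen more than once point at duplicates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Feeding it a dictionary such as `{"/a/1.mp4": "ab12", "/b/1.mp4": "ab12"}` (made-up values) would return one group holding both paths.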
smitty123

Duplicate file finder on remote server (script or new feature)

I've been cleaning out my NAS these past few days, and it occurred to me that a duplicate search feature would be a great asset. I suggested the idea to the NAS manufacturer, but they have been silent on the subject.

I looked at a few Windows programs that would work, but then I realized that at 30 MB/s it would take roughly two days of non-stop transfer to read, transfer and compare 5-6 terabytes (2 x 3 TB HDDs).

I wonder if there's a way to find duplicate files directly on the server, a script maybe, rather than using a Windows- or Linux-based program that would force every file to be read from the server as it scans for dupes.

I've thought about listing all the files and then sorting by size, but that isn't an easy solution, and I don't know enough about the Linux command line to do all that.

A quick way would be to build a list of all files on the NAS shares, sorted by size as they are added. Whenever a size is already in the list, copy both file names to a second list that holds only the files with matching sizes, and show that list to the user. The user can then have the program do a byte-by-byte comparison of the same-size files locally on the server to confirm they are dupes. I read up on the Linux cmp command; it can do that comparison on the server itself, provided the server has the command installed.

Example:
cmp  /share/HDA_DATA/hdd1/123.mp4 /share/HDB_DATA/hdd2/123.mp4
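
The size-then-compare idea above could be sketched in Python roughly like this, assuming Python is available on the NAS and the script runs there, so no file data crosses the network (both are assumptions). The function names are hypothetical, and `filecmp.cmp` with `shallow=False` stands in for the `cmp` command.

```python
import filecmp
import os
from collections import defaultdict


def group_by_size(roots):
    """Walk each root and return {size: [paths]} for sizes seen more than once."""
    by_size = defaultdict(list)
    for root in roots:
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks and special files
                by_size[os.path.getsize(path)].append(path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}


def confirm_duplicates(paths):
    """Split same-size files into sets whose contents are byte-for-byte equal."""
    sets = []
    for path in paths:
        for existing in sets:
            # shallow=False forces a content comparison, like cmp does.
            if filecmp.cmp(existing[0], path, shallow=False):
                existing.append(path)
                break
        else:
            sets.append([path])
    return [group for group in sets if len(group) > 1]


def find_duplicates(roots):
    """Return lists of paths whose contents are identical."""
    duplicates = []
    for paths in group_by_size(roots).values():
        duplicates.extend(confirm_duplicates(paths))
    return duplicates
```

Calling `find_duplicates(["/share/HDA_DATA/hdd1", "/share/HDB_DATA/hdd2"])` would return one list per set of identical files. Only files that share a size are ever read, which is what keeps the scan short.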

Then display the true duplicates in an interface where we can manage them: delete a file, or view/open the file or its folder to handle it manually.

If there is such a way, it would be very helpful for disk space management. It would really take the sting out of scanning so many terabytes.