I recently started working on hashl (no project page yet), inspired by (hashcp)[https://github.com/craig/hashcp]. It creates a list of all files below a directory and also stores a hash of their beginning (by default the first 4 MiB, but that's configurable. For music, I usually use 512KiB).
This way, when you find an FTP server or whatever, you don't have to mirror everything and later check if you already have it, but you can tell hashl to scan all files on the FTP server, download only the first few kilo-/megabytes and use those to check if you possibly already have the file on your disk. This of course doesn't prevent having one file in several versions, but against pure duplicates (which are also common) it works well.
So, you don't have to worry about duplicates, and as a side effect, leeching is faster because you do not even transfer any duplicate files.
Or, my usecase: I have an external hard disk for all my videos, and a directory on my netbook containing just a few of them. New videos also end up on the netbook. The hard disk is mounted on /media/argon, the netbook dir is ~/lib/video. hashl stores its database in .hashl.db, /media/argon/.hashl.db is a symlink to ~/lib/video/.argon. ~/lib/video/.hashl.db also points there.
Now, I can use these neat commands:
# whenever I mount the external hard disk, I update the list
descent /media/argon > hashl update
# same, but more verbose
descent /media/argon > hashl -d ~/lib/video/.argon update
# Now, to find out which files in my netbook directory are new (not yet copied)
descent ~/lib/video > hashl find-new
# same
descent ~/lib/video > hashl -d .argon find-new
# As an advantage, I can also use this on remote servers. Let's say I have an
# FTP mounted on /tmp/ftp/foo and the external disk is not connected. I can
# still only leech the videos I do not yet have.
descent /tmp/ftp/foo > hashl -d ~/lib/video/.argon copy ~/lib/video
# And of course, to sync netbook <-> hard disk
descent ~/lib/video > hashl copy /media/argon/filme/incoming
descent ~/lib/video > cd /media/argon
descent /media/argon > hashl update
For me, this tool is a great help, and I'll maybe add some more fun features too (merging databases? Marking a file from a non-connected external storage for retrieval? Moar automation for things I cannot think of yet?). So, stay tuned for the first release, whenever it will happen :p