Sanitize all files in a directory
By Carsten
Today I finally wanted to clean up several backup copies of my music. Over the years I copied the files from device to device, through multiple generations of computers. I put the music into a Jungledisk backup, uploaded it to Amazon S3, copied it to my Drobo or simply moved it from one hard disk to another, bigger one. I even switched from Linux to Mac OS X to Windows, and now I am on a Debian machine once again. And my next plan is to back it all up again with the new Amazon Glacier service…
All this moving took its toll on the files. The filenames are all over the place: odd prefixes, strange encodings, and even some of the file extensions are no longer what they used to be.
A couple of months ago I started writing a bash script to help with all the confusion. It should replace every run of funny characters in a filename, whitespace included, with a single underscore and finally make everything lowercase. That is how I like my files! 😉
Today, I gave the script another try:
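#!/bin/bash
#
# Sanitizes all files in the given path
# and makes them lowercase too.
#
# param 1: source directory
# param 2: target directory
#
# Needs bash 4 or newer for the ${var,,} lowercase expansion.
shopt -s extglob
# copy as default, you have to change this if you want to move files
# (a command plus options has to live in an array; MV=cp -v would
# try to run a command called "-v")
MV=(cp -v)
#MV=(mv -v)
source="$1"
# normalize the target once, e.g. strip a trailing slash
target="$(dirname "$2")/$(basename "$2")"
# -print0 / read -d '' keep names with spaces, backslashes or newlines intact
find "$source" -depth -type f -print0 | while IFS= read -r -d '' path
do
    filename=$(basename "$path")
    directory=$(dirname "$path")
    # remove original path from target to avoid deep path in target directory
    directory_clean=${directory#"$source"}
    # replace each run of invalid chars with a single '_'
    # ('-' goes last in the brackets so it is a literal, not a range)
    directory_clean="${directory_clean//+([^[:alnum:]_.\/-])/_}"
    filename_clean="${filename//+([^[:alnum:]_.-])/_}"
    mkdir -vp "${target}/${directory_clean,,}"
    # careful: files that sanitize to the same name overwrite each other
    "${MV[@]}" "$path" "${target}/${directory_clean,,}/${filename_clean,,}"
done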
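To check what the cleanup does to a single name, the filename rule can be tried in isolation. `sanitize_name` is a hypothetical helper, not part of the gist: it applies the script's filename pattern, with `-` placed last in the bracket expression so it is matched literally, and needs bash 4 or newer for `${name,,}`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): the per-filename cleanup
# from the script, isolated so it can be tried on sample names.
shopt -s extglob   # must be on before +(...) patterns are used

sanitize_name() {
    local name="$1"
    # Collapse every run of characters that are not letters, digits,
    # '_', '.' or '-' into a single underscore ('-' last = literal).
    name="${name//+([^[:alnum:]_.-])/_}"
    # Lowercase the result (bash 4+).
    printf '%s\n' "${name,,}"
}

sanitize_name "01 - My Song (Live).MP3"   # prints 01_-_my_song_live_.mp3
```

Every run of spaces, brackets or other punctuation turns into one underscore, while dots and hyphens survive. Whether non-ASCII letters such as `ö` count as `[:alnum:]` depends on the locale the script runs in.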
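You can call the script like this (keep the target outside the source directory, otherwise find may pick up files that were already copied):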
cd /path/to/your/music
./sanitize.sh . /new/path/to/your/sanitized/music/
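Because the cleanup can collapse different names into the same one (`A B.mp3` and `a_b.mp3` both become `a_b.mp3`) and `cp` then silently overwrites, a dry run before the hours-long copy may be worth it. The sketch below is hypothetical and not part of the gist: `preview` repeats the script's substitutions, prints where each file would go and flags every collision, without copying or moving anything. It assumes bash 4 or newer.

```shell
#!/bin/bash
# Hypothetical dry run (not part of the gist): prints "source -> target"
# for every file and flags files whose sanitized target path is already
# taken. Nothing is copied or moved. Needs bash 4+ (${x,,}, local -A).
shopt -s extglob

preview() {
    local source="$1" target="${2%/}" path dir file dest
    local -A seen=()   # sanitized target path -> first source path
    while IFS= read -r -d '' path; do
        file=$(basename "$path")
        dir=$(dirname "$path")
        dir=${dir#"$source"}
        # same substitutions as the script
        dir="${dir//+([^[:alnum:]_.\/-])/_}"
        file="${file//+([^[:alnum:]_.-])/_}"
        dest="${target}/${dir,,}/${file,,}"
        if [[ -n ${seen[$dest]} ]]; then
            printf 'COLLISION: %s and %s -> %s\n' "${seen[$dest]}" "$path" "$dest"
        else
            seen[$dest]=$path
            printf '%s -> %s\n' "$path" "$dest"
        fi
    done < <(find "$source" -depth -type f -print0)
}

# Run as: ./preview.sh SOURCE TARGET
if [[ $# -ge 2 ]]; then
    preview "$1" "$2"
fi
```

Saved as `preview.sh` (a hypothetical name), `./preview.sh . /new/path/to/your/sanitized/music/ | grep '^COLLISION'` would list only the clashes.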
After the script has run for several hours, your files should sit safely in the new directory, and the old files stay untouched as long as MV is left at copying. Then you can start going through all the directories and remove duplicate files manually.
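For the manual duplicate hunt, a short helper can at least point out files with byte-identical content. This is a hypothetical sketch, not part of the gist, and it assumes GNU coreutils, since `uniq -w` and `--all-repeated` are GNU options. It will not catch the same song saved with different tags or encodings, and it relies on the sanitized names containing no backslashes or newlines (for such names `md5sum` starts its output line with a backslash, which shifts the hash column).

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): lists files with identical
# content so duplicates can be reviewed before deleting anything.
# Relies on GNU coreutils: md5sum, and uniq's -w / --all-repeated options.

list_duplicates() {
    # md5sum prints "<32 hex chars>  <path>"; after sorting, uniq -w32
    # compares only the hash and prints every member of each repeated
    # group, with a blank line between groups.
    find "$1" -type f -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate
}

# Run as: ./dupes.sh DIRECTORY
if [[ $# -ge 1 ]]; then
    list_duplicates "$1"
fi
```

Each group of identical files is printed together, so you can decide per group which copy to keep.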
You can clone the gist and adapt it to your own needs. Just don’t ask me if something goes wrong. 😀