What to do with Checksums
Checksums are a way of verifying that data has not been corrupted, either accidentally or maliciously. There are dozens of different checksum algorithms. MD5 is one of the most widespread and widely supported at the moment, which is why we use it for the download files on our website.
The checksum is calculated by applying an algorithm to every byte in the target file. The result is a string of hexadecimal digits which is (to a very high probability) unique to the file. By calculating the checksum on a copy of a file you've downloaded and comparing it to the checksum calculated by us prior to the transfer, you can check to see if a file has been corrupted on download before, say, writing it to DVD or trying to unpack it.
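As an illustration of "applying an algorithm to every byte", here is a minimal Python sketch that computes an MD5 checksum using the standard hashlib module. The function name md5_of_file and the 1 MB chunk size are our own choices for this example, not part of any particular tool. It should produce the same hexadecimal string as the command-line utilities, though we have only sketched it here rather than tested it against every platform.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of the file at `path` as a hex string.

    `md5_of_file` and `chunk_size` are illustrative names, not a standard API.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so a large ISO image never has to
        # fit in memory all at once; every byte still feeds the digest.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("file_name") returns a string of 32 hexadecimal digits, which is what you would compare against the published value.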
Which software you use to calculate the MD5 checksum doesn't matter. Here are some commonly available routines we know about:
This site offers source code for Unix/Linux users and a 32-bit executable for Windows users. Our office Macs appear to have this utility available from the command line as well. To generate the MD5 checksum for a file, use whichever one of the following commands is available on your system:
% md5 file_name
% md5sum file_name
% openssl md5 file_name
We will collect the checksums for all the large files and ISO images associated with a data set into a single file, for convenience. If you download one or more of these large/ISO files, you should run your favorite MD5 routine on the file you receive and compare the resulting string to the checksum listed in the dataset checksum list on our website. If the strings are the same, you can unpack or burn the data knowing that at least the source file is clean and uncorrupted.
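If you would rather not compare the strings by eye, the comparison can be scripted. The Python sketch below is one possible approach. It ASSUMES the checksum list has one entry per line in the form "checksum, whitespace, file name" (the layout the md5sum utility writes); the actual layout of the dataset checksum list may differ, in which case the parsing step would need adjusting. All function names here are hypothetical.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of the file at `path` as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_checksum_list(list_path):
    """Parse a checksum list into a {file name: checksum} dictionary.

    ASSUMPTION: each line looks like '<md5 hex>  <file name>'.
    """
    expected = {}
    with open(list_path, "r", encoding="utf-8") as handle:
        for line in handle:
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            checksum, name = parts[0], parts[1].strip()
            # md5sum prefixes the name with '*' in binary mode; drop it.
            if name.startswith("*"):
                name = name[1:]
            expected[name] = checksum.lower()
    return expected

def matches_published_checksum(file_path, list_path):
    """Return True if the downloaded file matches its entry in the list."""
    expected = load_checksum_list(list_path)
    name = os.path.basename(file_path)
    if name not in expected:
        raise KeyError(name + " is not in the checksum list")
    return md5_of_file(file_path) == expected[name]
```

A result of True means the strings are the same; False means the download should be repeated.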
If the MD5 string is different, try downloading the file again. If you still can't match the checksum in our file, please let us know as soon as possible - it may be that our file has somehow been corrupted.
Last Update: 13 April 2007, A.C.Raugh