SBN What to do with Checksums

What Is It?

Checksums are a way of checking that data has not been corrupted. There are dozens of different checksum algorithms out there - MD5 is one of the most widespread and widely supported at the moment, and thus the one we're using on our website for our download files. MD5 reliably catches accidental corruption, such as a damaged transfer, but it is not considered secure against deliberate tampering.

The checksum is calculated by applying an algorithm to every byte in the target file. The result is a string of hexadecimal digits (32 of them, in the case of MD5) which is, to a very high probability, unique to the file. By calculating the checksum on your downloaded copy of a file and comparing it to the checksum we calculated prior to the transfer, you can check whether the file was corrupted during download before, say, writing it to DVD or trying to unpack it.

How Do I Find It?

Which software you use to calculate the MD5 checksum doesn't matter, since any correct MD5 implementation produces the same checksum for the same file. Here are some commonly available utilities we know about:

MD5
This utility is available for Unix/Linux and Windows/MS-DOS from here (among other places):

http://www.fourmilab.ch/md5/

This site offers source code for Unix/Linux users and a 32-bit executable for Windows users. Our office Macs appear to have this utility available from the command line as well. To generate the MD5 checksum for a file:

% md5 file_name

md5sum
Unix/Linux users may find this utility already available on their systems. A quick search of the net will turn up programs of this name that will run on a variety of platforms. To generate an MD5 checksum:

% md5sum file_name

OpenSSL
The OpenSSL suite includes utilities for computing a number of different checksums, including MD5 sums. Unix/Linux and Solaris users may find a version of this already installed on their systems. The command to calculate an MD5 checksum with OpenSSL looks like this:

% openssl md5 file_name

What Do I Do With It?

We will collect the checksums for all the large files and ISO images associated with a data set into a single file, for convenience. If you download one or more of these files, run your favorite MD5 routine on each file you receive and compare the resulting string to the checksum listed in the data set's checksum list on our website. If the strings are the same, you can unpack or burn the data knowing that at least the source file is clean and uncorrupted.

If the MD5 string is different, try downloading the file again. If you still can't match the checksum in our file, please let us know as soon as possible - it may be that our file has somehow been corrupted.


Last Update: 13 April 2007, A.C.Raugh