README for LDhat

LDhat is a package for analysing patterns of linkage disequilibrium and estimating the population recombination rate using the approximate-likelihood coalescent method of Hudson (2001). For each pair of segregating sites, the program pairwise estimates the coalescent likelihood of observing the data under a range of population recombination rates using the importance sampling method of Fearnhead and Donnelly (2001). The likelihoods are combined across pairs to provide a point estimate of the population recombination rate 4Ner. Further options enable the user to test for the presence of recombination using the test of McVean et al. (2001), and to examine the relationship between summary statistics of linkage disequilibrium and physical distance. Output from the program can be used to investigate the goodness-of-fit of the model, plot the approximate likelihood surface of the estimate of 4Ner, and carry out additional analyses.

To install the package on a Unix or Linux machine, first uncompress the file LDhat.tar.gz with the command

    %gunzip LDhat.tar.gz

Then expand the archive with the command

    %tar -xvf LDhat.tar

This will create a directory called LDhat containing the C source code and makefile. To compile the programs, type

    %make

The programs convert and pairwise should be generated. Files ending in .o may be deleted after compiling is complete.

For DOS/Windows, the executable files can be run from a DOS command prompt window. Change directory with the command

    cd Directory_name

until you are in the same directory as the programs and the input files, and run the programs as described below.

Description of programs

Convert: Takes aligned sequences in FASTA format and outputs the files sites, which contains the aligned, segregating sites, and locs, which contains the locations of the segregating sites. The program can be run with the command

    ./convert seqfile

If the name of the sequence file is not entered on the command line, the user will be prompted for it.
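To make the idea of "segregating sites" and their locations concrete, here is a minimal Python sketch. It is not the code used by convert: the function names are hypothetical, positions are assumed to be 1-based, and no special handling of gaps or missing data is attempted. It only shows how variable columns of an alignment can be identified and extracted.

```python
def segregating_sites(seqs):
    """Return 1-based positions of alignment columns that carry more than one allele.

    Assumption: all sequences are already aligned to the same length.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        # Collect the distinct characters observed in this column.
        column = {s[i].upper() for s in seqs}
        if len(column) > 1:
            positions.append(i + 1)
    return positions


def extract_sites(seqs, positions):
    """Keep only the characters at the given 1-based positions in each sequence."""
    return ["".join(s[p - 1] for p in positions) for s in seqs]


# The two example sequences from the input format description.
aligned = ["TGACTGGTCAC", "TGCATTGTGCT"]
locs = segregating_sites(aligned)
sites = extract_sites(aligned, locs)
print(locs)
print(sites)
```

With the two example sequences, the variable columns are those where the sequences differ, and the extracted strings contain only those columns.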

Pairwise: Estimates the population recombination rate under a finite sites model with two alleles and symmetric, reversible mutation. The user is asked to input a file containing aligned, segregating sites, and another containing the locations of the segregating sites. There is also the option of including a previously generated likelihood file for analysis.

    ./pairwise sites locs likelihood_file

Again, if these are not entered on the command line, the user will be prompted for them. The output files are described on the program web page.

While the program is running, the user will be prompted for several options, such as the value of theta for which to estimate the likelihood, and the range of recombination rates to estimate over. It is possible to use redirected input to enter these options from a file, rather than the keyboard. If the file containing the appropriate commands is called infile, the program can be run by typing

    ./pairwise < infile

This may be helpful for running automated replications of the program.

Description of input format

Sequences must be in FASTA format

    Number of sequences   Length of sequences
    >Seq1
    TGACTGGTCAC    (Max of 80 nucleotides per line)
    >Seq2
    TGCATTGTGCT
    ....

The location file is in the format

    Number of sites   Total length of sequence   Linear/Circular
    Loc1 Loc2 ...

The LPL example is included in the archived package, or can be downloaded from the web site.

Programs have been written by G. McVean with the use of code by P. Fearnhead. For more information, contact Gil McVean (mcvean@stats.ox.ac.uk) or see the web site http://www.stats.ox.ac.uk/~mcvean/LDhat/LDhat.html.

LDhat is distributed free of charge and for academic use only.
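As an illustration of the location file layout given under "Description of input format", the following Python sketch reads such a file into its parts. It is an assumption-laden example, not part of LDhat: the function name is hypothetical, fields are assumed to be whitespace-separated, and the linear/circular indicator is kept as the raw token because the README does not specify how it is encoded (the "L" in the example is a guess).

```python
def parse_locs(text):
    """Parse location-file text into (n_sites, total_length, model, locations).

    Assumed layout: number of sites, total sequence length, a linear/circular
    token, then one location per site, all separated by whitespace.
    """
    tokens = text.split()
    if len(tokens) < 3:
        raise ValueError("header needs number of sites, total length and model")
    n_sites = int(tokens[0])
    total_length = float(tokens[1])
    model = tokens[2]  # kept as-is; the exact encoding is not stated in the README
    locations = [float(t) for t in tokens[3:]]
    if len(locations) != n_sites:
        raise ValueError(
            "header declares %d sites but %d locations were found"
            % (n_sites, len(locations))
        )
    if any(loc < 0 or loc > total_length for loc in locations):
        raise ValueError("a location lies outside the total sequence length")
    return n_sites, total_length, model, locations


# Hypothetical example content, matching the six variable columns of the
# two example sequences shown in the input format description.
example = "6 11 L\n3 4 6 9 10 11\n"
print(parse_locs(example))
```

Checking the declared number of sites against the number of locations actually present is a cheap way to catch a truncated or mismatched file before starting a long run.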