Your First PAUP Run
The purpose of this chapter is to familiarize you with some of the features of PAUP. As you gain experience with PAUP, you will discover that there are many alternative ways to execute the operations described below. We have chosen not to describe all the possibilities in this chapter; however, we encourage you to explore other menu and command-line options as your time permits. For this chapter you can use the following dataset:
Setting up your data file: the Nexus format
PAUP reads your data in a NEXUS file format. All NEXUS files must begin with:
NEXUS file declaration
#NEXUS
If you open up the file primate-mtDNA.nex you will notice that it is divided into blocks of text, delimited by the words “begin” and “end.” The word following “begin” defines the block-type. Notice that the block-type is always followed by a semicolon. In the primate-mtDNA.nex example, the following types of blocks are used: data, assumptions, and paup. There are, however, numerous other NEXUS block-types. In fact, one of the advantages of the NEXUS format is that applications will simply skip over blocks that they do not recognize. Many of these block-types will be discussed in greater detail elsewhere in this book. For a more detailed discussion of the NEXUS format see .
Between the “begin” and “end” in a block you will find commands that PAUP will read and execute. Commands are usually followed by several options. Most options are followed by an “=” sign and a value that defines that option. Each option in PAUP has a default value, which can be modified by the user. Like a block-type, commands and their associated options are followed by a semicolon in a NEXUS file. At a bare minimum PAUP requires your NEXUS file to have “taxa” and “characters.” These are easily defined with the dimensions command in the data block, as shown below.
DATA block:
begin data;
dimensions ntax=12 nchar=898;
format missing=? gap=- matchchar=. interleave datatype=dna;
options gapmode=missing;
matrix
Here the number of taxa and characters are defined with the dimensions command and the options “ntax” and “nchar”, respectively. In this example file, the data block also contains several other commands including the “matrix” command that contains the aligned sequences. If you scroll down the file to the end of the sequences you will see where the data block is terminated with an “end;”. Notice that both the “matrix” and “end” are followed by a semicolon.
A user can get information about the options available for any command, along with their current settings, by typing command-name ? in the command-line portion of the display window.
Getting help:
- Type ? in the command-line window as in Figure 2.1
- Click Enter or hit the Return key
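As a small sketch of the command-name ? form described above, the lines below query two commands that are used later in this chapter. The choice of hsearch and dset is only illustrative; any command name can be substituted.

```
hsearch ?
dset ?
```

Each line should list the options of the named command together with their current settings.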

Following the data block you will find a new block named "assumptions." Here you will be able to assign specific assumptions to sets of characters or taxa. These sets allow rapid assignment of character type and weight assumptions. These user defined “Sets” in PAUP provide a way to refer to collections of objects with single names. The use of sets can greatly reduce the amount of redundant typing needed to issue commands, and can help to avoid mistakes when typing commands into the command line. Each assumption set specifies a character type (“Type Sets”), weight (“Weight Sets”) or exclusion status (“Exclusion Sets”) for each character. Any number of type, weight, or exclusion sets may be defined. You can then change the assumptions assigned to all characters in the data set simply by invoking a new assumption set.
Before you begin an analysis there is a good chance that you know something about the characters in your data matrix that suggests they should be differentially weighted. For example, we know that substitutions at first codon positions generally occur less frequently than substitutions at third positions. The simple explanation for this is that substitutions at first codon positions usually result in an amino acid substitution, whereas third-position changes can often occur without changing the amino acid translation. You will incorporate this information into the following analysis by applying a higher weight to substitutions occurring at first codon positions. Codon positions have already been identified in the sample file using the charset command.
The use of character sets allows you to refer to a group of characters by a single name, for example a gene that you have sequenced. The only restriction is that the name of the character set cannot be the same as the label assigned to an individual character. After defining a character set using a charset command, you can use the character-set name in any place that you would ordinarily use a character name or number. In the example file they are defined like this:
Character Sets (“CHARSETs”):
charset coding = 2-457 660-896;
charset noncoding = 1 458-659 897-898;
charset 1stpos = 2-457\3 660-896\3;
charset 2ndpos = 3-457\3 661-896\3;
charset 3rdpos = 4-457\3 662-.\3;
The charset command specifies and names a set of characters; this name can then be used in subsequent definitions or wherever a character-list is required. The name of a charset cannot be equivalent to a character name or character number.
By defining a CharSet, it is easy to include and exclude specific sets.
Excluding Character Sets:
- Select Data > Include/Exclude Characters...
- Under the Included characters: box, click the CharSet pulldown
- Select the 3rdpos CharSet and click Exclude, then click OK
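The same exclusion can likely be issued as commands. This is a minimal sketch that assumes the exclude and include commands accept a CharSet name in place of a character list, as the include command does in the batch block at the end of this chapter. The bracketed comments are for the reader and can be omitted when typing at the command line.

```
exclude 3rdpos;   [drop third codon positions from the analysis]
cstatus;          [report the current status of each character]
include all;      [restore every character when finished]
```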
Alternatively, this can be done by defining exclusion sets (EXSETs). Exclusion sets allow you to specify a set of characters that are to be “excluded” from the analysis (see “Excluding Character Sets” above).
Exclusion Sets (“EXSETs”):
exset coding = noncoding;
exset noncoding = coding;
By default, PAUP considers all transformation costs to be equal. In this section, you will invoke a character type that assigns a higher weight to transversions than to transitions. More specifically, we will assume that transversions (changes between a purine, A or G, and a pyrimidine, C or T) cost twice as much as transitions (changes from a purine to a purine or from a pyrimidine to a pyrimidine). One way to incorporate this assumption into the analysis is to set up a transition/transversion “step matrix.” Such a step matrix has already been defined in the sample file. The batch block at the end of this chapter applies this transformation cost to all characters with the command ctype 2_1:all;.
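A step matrix of this kind, and the command that applies it, might look like the sketch below. The usertype definition is written from the costs described above (transitions cost 1, transversions cost 2), so its exact layout in the sample file may differ. The name 2_1 and the ctype command are taken from the batch block at the end of this chapter, and the bracketed row labels are comments added here for readability.

```
begin assumptions;
	usertype 2_1 = 4
		    a  c  g  t
		[a] .  2  1  2
		[c] 2  .  2  1
		[g] 1  2  .  2
		[t] 2  1  2  .
	;
end;

begin paup;
	ctype 2_1:all;
end;
```

Reading across a row gives the cost of changing from that row's state to each column's state, so a to g (a transition) costs 1 and a to c (a transversion) costs 2.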
In addition to defining specific groups of characters in your data, under the assumptions block you can also group specific taxa. The use of taxon sets allows you to refer to a group of taxa by a single name. After defining a taxon set using a taxset command, you can use the taxon-set name in any place that you would ordinarily use a taxon name or number. For example, you could define two taxon sets as follows:
Taxon Sets (“TAXSETs”):
taxset hominoids = Homo_sapiens Pan Gorilla Pongo Hylobates;
taxset otherSpp = 1 7-12;
and then use the taxon-set names in subsequent commands:
Taxon Sets (“TAXSETs”):
delete otherSpp; [deletes taxa other than those in “hominoids”]
outgroup otherSpp; [assigns taxa 1 and 7-12 to the outgroup]
constraints ingrp = ((hominoids)); [define a constraint]
You cannot declare a name that is the same as any taxon name in the data file. Notice, however, that defining the set with numbers (7-12) is equivalent to typing out the full taxon names (Macaca_fuscata M._mulatta M._fascicularis M._sylvanus Saimiri_sciureus Tarsius_syrichta).
Use the charpartition command to define a partition of the characters. The command is ordinarily issued from within a block of the NEXUS file; however, you may also issue it from the command line. The following example of the command creates a character partition named gfunc, which defines coding and noncoding regions of the sequences.
Character partitions:
charpartition gfunc = 1:2-457 660-896, 2:1 458-659 897-898;
The following example of the charpartition command is equivalent to the previous example, except that two predefined character sets are used to define the characters included in each partition.
Character partitions:
charset coding = 2-457 660-896;
charset noncoding = 1 458-659 897-898;
charpartition gfunc = 1:coding, 2:noncoding;
Once again, depending on the number of characters in your data matrix, issuing this command can produce a lot of information to be output to the screen. There are no additional options for this command. Notice, however, that in our example file charsets are defined within the assumptions block, whereas the charpartition is defined in a paup block.
Executing your data file
Close the sample file and do the following:
Executing Your Data:
- Double-click the PAUP application icon.
- Select File > Open..., select the primate-mtDNA.nex file, and click Open
- Then select File > Execute "primate-mtDNA.nex"
If you have your Nexus file open in the text editor in PAUP, you can also execute your file by simply typing .
After executing the sample file, PAUP will display comments and some general information about the data. For this example, the source of the data set is given, followed by a section reporting the dimensions of the data matrix, the type of data, etc.
Starting a log file
It is a good idea to keep track of things that you are doing in PAUP by creating a log file. By default, PAUP will create a log file with the same name as the data file, but with a “.log” suffix on it. However, you can name it anything you want.
Logging output to a file:
- Select File > Log Output to Disk...
- Click on Set... and Save as: practice.log
- If a file named “practice.log” already exists, you will be asked to Append, Cancel, or Replace
- If that is the case, click Replace
- Now in the Log Output dialog, click OK
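The command form of these steps, as it appears in the batch block at the end of this chapter, is sketched here. The replace keyword is assumed to correspond to clicking Replace in the dialog, overwriting any existing practice.log.

```
log start file=practice.log replace;
```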
Logging can be started and stopped anytime during your PAUP session. To stop logging do the following:
Stop logging:
- Type: log stop in the command-line window
Notice that you do not have to end the command with a semicolon (;) in the command-line. However, it is possible to enter multiple commands in the command-line. In such a case you would need to separate the individual commands with a semicolon, like:
Multiple commands in the command-line:
- Type: log stop; hs; log stop in the command-line window
Performing a simple search
PAUP has the advantage of being able to analyze data using several different optimality criteria: parsimony, likelihood, and distance. Several chapters in this book and a plethora of published literature are devoted to comparing the performance of optimality criteria. Rather than spend time here discussing the relative merits of the available optimality criteria, we will just say that each criterion has its strengths and limitations. To begin with, you will use the default criterion, maximum parsimony, to search for optimal trees. Later in this chapter you will search under the other criteria.
Defining Optimality Criterion:
- Select Analysis > Parsimony (Note: parsimony is the default setting and will probably already be selected).
PAUP provides two basic classes of methods for searching for optimal trees: exact and heuristic. Exact methods are guaranteed to find the optimal tree(s) but may require prohibitive amounts of computer time for medium to large-sized data sets. Heuristic methods do not guarantee optimality but generally require far less computer time. Even though the current data set is relatively small, you will start by conducting a heuristic search.
Once the search is started, PAUP will display general information about the options and assumptions being used during the search. If you were logging results, this information would be saved to the log file. When the search completes, PAUP will display general information about the results of the search.
Performing a Simple Search:
- Select Analysis > Heuristic Search...
- Under the options menu choose Stepwise Addition Options
- Change Addition sequence from Simple to Random and click Search
- Click Close to dismiss the search status dialog box

It is worth noting here that when you execute a command in PAUP, the equivalent command-line version is printed to the display buffer.
Command printed to display buffer:
paup> HSearch
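Because the addition sequence was changed in the dialog, a search typed by hand would presumably need to carry that option as well. The line below is the form used in the batch block at the end of this chapter.

```
hsearch addseq=random;
```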
Viewing trees
According to the output on your screen, there is a single tree currently in memory. To display the tree do the following:
Show trees:
- Select Trees > Show Trees...
- Click OK and the single most parsimonious tree is printed to the display.
The showtrees command draws a simple picture of the branching order of the taxa. Say, for example, you want to know something about the branch lengths of the tree. To get a more detailed picture of the tree do the following:
Describe trees:
- Select Trees > Describe Trees...
- Under Output select branch-length table
- Under Plot type deselect cladogram and select phylogram
- Click Describe
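Both tree displays can also be requested as commands. The sketch below uses the describetrees line from the batch block at the end of this chapter, where 1/ selects the first tree in memory and brlens=yes is assumed to correspond to the branch-length table chosen in the dialog.

```
showtrees;
describetrees 1/ plot=phylogram brlens=yes;
```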
Printing trees
PAUP has the ability to print your trees or save them as PDF files to be manipulated with other graphics software.
Printing trees:
- Select Trees > Print/View Trees...
- Select Plot type > CircleTree ...
- Click the Show branch lengths check box and click Preview
- Select Done and then Print if you wish to print the selected tree.
There are many options in the “Print/View Trees…” dialog. Users can change the font, or font sizes, as well as the width of the branches that are being printed.
Saving results
More often than not, you will want to save trees to be looked at later. PAUP can save trees in several different formats: NEXUS, NEXUS (no translation table), Freqpars, Newick (Phylip, Mega, etc.) and Hennig86. To save the tree in NEXUS format:
Saving trees:
- Select Trees > Save Trees to File...
- In the Save Trees as: dialog box type the file name mp.tre and click Save
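From the command line, the batch block at the end of this chapter saves the same file with the line below. The replace keyword is assumed to overwrite an existing mp.tre without prompting.

```
savetrees file=mp.tre replace;
```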
Like your data file, the newly created trees file is also a NEXUS file (notice the #NEXUS at the beginning of the file). And like other NEXUS files, your tree file stores information in blocks with “begin” and “end.” Trees are now stored in a new type of block called the “trees” block. If you look at the .tre file in a text editor, you will also notice that there is a lot of additional information contained in the NEXUS file. Most of this information is contained within square brackets ([]). By convention, PAUP ignores information contained within these brackets. This allows the user to make notes within the NEXUS file that will not cause problems when the file is executed. There are also cases in the .tre file where there is an “!” following the first bracket ([! text or comment]). When this is placed in a file, PAUP will print what is contained in the brackets to the display buffer when the file is executed or read into the program.
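To make the layout concrete, here is a hand-written sketch of a minimal trees file. It is not the contents of mp.tre: the taxon names A, B, and C and the tree name example are hypothetical, and a file written by PAUP will contain considerably more information.

```
#NEXUS
[! This comment is printed to the display when the file is read ]
begin trees;
	[ this comment is ignored ]
	tree example = (A,(B,C));
end;
```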
Distance
As mentioned before, PAUP allows you to search for trees using several optimality criteria: parsimony, likelihood, and distance. PAUP provides a wide range of pairwise distance measures, from simple absolute differences to more complicated model-based corrected distances. Pairwise distances can be summarized in a table or used to construct UPGMA and neighbor joining trees. In addition, PAUP can use the minimum evolution and least-squares functions to evaluate trees under the distance criterion. The following section will introduce you to some of these methods.
To change the optimality criterion to distance, do the following:
Change optimality criterion to distance:
- Select Analysis > Distance
First you will need to choose among the distance measures that PAUP can calculate. For this example, you can choose the HKY85 distance, which estimates a transition/transversion ratio and base frequencies.
Selecting distance correction:
- Select Analysis > Distance Settings...
- In the distance settings dialog box change DNA/RNA distances from Uncorrected ("p") to HKY85 and click OK
- Select Data > Show Distance Matrix
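The distance setup so far can be sketched as three commands, all of which appear in the batch block at the end of this chapter: one to switch the criterion, one to choose the distance measure, and one to print the pairwise distance matrix.

```
set criterion=distance;
dset distance=hky85;
showdist;
```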
Next, you will construct a neighbor joining tree using the HKY85 distances.
Build a neighbor joining tree:
- Select Analysis > Neighbor Joining/UPGMA...
- Click OK
Notice here that you could also have reached the “Distance Options” dialog by clicking on “Distance options…” in the “Neighbor Joining/UPGMA” dialog box. Also notice that although we selected distance as our optimality criterion, the neighbor joining method does not search for an optimal tree. The same would be true had you built a UPGMA tree. We can, however, search for trees that are optimal under the distance criterion. Here we will search for trees using the least squares objective function.
Build a least squares tree:
- Select Analysis > Distance Settings...
- Under the Objective function menu, select Weighted least squares with standard (“polynomial”) weighting
- Weighting using power=0 should be the default setting under Weighted least squares.... If it is not, then select it now and click OK.
- Start the least squares search by selecting Analysis > Heuristic Search...
- Click Search in the heuristic search dialog box.
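The neighbor joining and least-squares steps correspond roughly to the following lines from the batch block at the end of this chapter. Note that the batch block sets power=2, so the power value here is an assumption to be adjusted to match whatever weighting you selected in the dialog.

```
nj;
dset objective=lsfit power=2;
hsearch;
```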
Maximum likelihood
To finish this chapter, you will search for optimal trees using the maximum likelihood (ML) criterion. Under maximum likelihood in PAUP, an explicit model of nucleotide substitution is used to evaluate trees. Selecting an appropriate model of nucleotide substitution is an important step in a likelihood analysis but is beyond the scope of this chapter. To save time, we have chosen an appropriate model; however, you are encouraged to see for a discussion of model selection under the maximum likelihood criterion. Likewise, we touch on this in other parts of this book. Here you will use the parsimony tree, that you saved earlier, to obtain an optimal set of model parameters given the data. Later you will use the same model and set of parameter estimates to search for a maximum likelihood tree.
Set the optimality criterion:
- Select Analysis > Likelihood
We have chosen a model of sequence evolution with gamma-distributed rates. Given the parsimony topology and the data, we will use PAUP to estimate the optimal transition/transversion rate ratio, base frequencies, and among-site rate heterogeneity.
Evaluate the parsimony tree:
- Select Trees > Get Trees from File...
- Click Yes to dismiss the dialog box warning you that there are unsaved trees.
- Select the file mp.tre and click Get Trees
- Select Trees > Tree Scores > Likelihood...
- In the trees scores dialog box click Likelihoods settings...
- In the Likelihood Settings: box, select Substitution Rates
- Change the Ti/tv ratio: from Set to: to Estimate
- Under the Maximum likelihood options: select Across-Site Rates
- Under Across Site Rates select Gamma distribution
- Under Shape parameter: change the rate from Set to: to Estimate
- Click OK
- Click OK again in the Likelihood Scores dialog box
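These dialog steps collapse into a few commands, sketched here from the batch block at the end of this chapter. In the lscore line, nst=2 requests two substitution types (transitions and transversions), tratio=est estimates their ratio, and rates=gamma with shape=estimate fits the gamma shape parameter.

```
gettrees file=mp.tre;
set criterion=likelihood;
lscore nst=2 tratio=est rates=gamma shape=estimate;
```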
Depending on the computer you are using, it may take a few seconds to several minutes for PAUP to optimize branch lengths and substitution model parameters on the tree currently in memory. When PAUP finishes, it will output the negative log likelihood of the tree topology found by the parsimony search and give the estimated model parameter values.
Before starting the heuristic search, you will fix the model parameters to those estimated in the previous step. If the options are left set to estimate, PAUP will estimate the parameters on each topology rearrangement made during the heuristic search. Because PAUP may make thousands of topology rearrangements during a heuristic search, leaving options set to estimate will dramatically increase the time required to complete the search. In general, a more efficient method of estimating model parameters and tree topologies under maximum likelihood is to successively estimate model parameters on novel trees generated by the tree search. More specifically, if the topology found under the likelihood criterion differs from that on which the parameters were estimated, then you reestimate parameters on the new topology and search again using the new set of parameters. For this chapter, you will complete one iteration of estimating parameters on a topology and applying the parameters to a subsequent search. In principle, you would continue until you converged on the same topology.
Set likelihood model parameters:
- Select Analysis > Likelihood Settings...
- Under the Likelihood Settings: select Substitution Rates
- Set the Ti/tv ratio: to the value estimated in the previous step by clicking the Previous button.
- Under the Likelihood Settings: select Across-site rate variation
- Set the Shape parameter to the value estimated in the previous step by clicking the Previous button
- Click OK
Now you are ready to search under the maximum likelihood criterion. Again, the time required to complete the search will depend on the computer you are using.
Start the tree search:
- Select Analysis > Heuristic Search...
- In the Heuristic Search dialog box, select the Stepwise Addition options
- Under Addition sequence change random to asis and click Search
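Fixing the parameters and starting the search can be sketched as two commands. This assumes that the ratio and shape options belong to lset, the likelihood settings command, and that the keyword previous plays the same role as the Previous button in the dialog.

```
lset tratio=previous shape=previous;
hsearch addseq=asis;
```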
Automation: Running PAUP in batch mode
Analyses can also be conducted using a non-interactive batch method. This is especially useful when you know your analyses will require a great deal of time to complete, and you don’t have time to interact with your computer. You could make elaborate batch files that could analyze multiple datasets under a variety of conditions. In the example below, all the instructions required to complete the sample analyses described above are contained in a "paup" block. A Set command was added at the beginning of the paup block to suppress the dialog box indicating that the heuristic search has completed and several other warnings (Mac and Windows only). To run the block in batch mode, copy the text given below to a file and save the file in the same directory as the primate-mtDNA.nex file. Now execute the file as you did the primate-mtDNA.nex file.
PAUP block:
#NEXUS
Begin paup;
set autoclose=yes warntree=no warnreset=no;
log start file=practice.log replace;
execute primate-mtDNA.nex;
cstatus;
include coding/only;
undelete hominoids lemur_catta macaca_fuscata saimiri_sciureus/only;
weight 2:1stpos;
ctype 2_1:all;
Set criterion=parsimony;
hsearch addseq=random;
Showtree;
describetrees 1/ plot=phylogram brlens=yes;
savetrees file=mp.tre replace;
set criterion=distance;
dset distance=hky85;
showdist;
nj;
dset objective=lsfit power=2;
hsearch;
gettrees file=mp.tre;
set criterion=likelihood;
lscore nst=2 tratio=est rates=gamma shape=estimate;
lset tratio=previous shape=previous;
hsearch addseq=asis;
end;