##### Last updated 22 February 2007

- Why are all the answers given in terms of commands and not menu choices?
- If I don’t find it here, does that mean that it doesn’t exist?
- Can I submit questions that I think should be part of this FAQ?
- I just updated PAUP* using the updater on your web site, yet when I try to run PAUP I still get the message that PAUP* has expired?
- Is PAUP* Year 2000 Compliant?
- What is a batch file?
- I’m using a beta version of PAUP* 4.0. How should I cite the program?
- Is there a version of PAUP* that will run a search in parallel on a multiple processor machine or a cluster of machines?
- Could you recommend some text books that will help me to learn more about the analyses that can be done in paup?
- What are the maximum dimensions (i.e., characters x sequences) of a data matrix that PAUP* will read?
- What is the maximum number of character states that can be assigned to a character in PAUP*?
- Why doesn’t PAUP* allow me to set the criterion to likelihood after I execute my data set?
- How do I tell PAUP* I want to use the likelihood criterion?
- How do I tell PAUP* I want to use the parsimony criterion?
- How do I tell PAUP* I want to use the minimum evolution criterion?
- How do I tell PAUP* I want to use the least-squares criterion?
- How do I tell PAUP* I want to use unweighted least-squares criterion?
- Which non-NEXUS file formats will PAUP* import?
- Where can I find examples of non-NEXUS file formats that PAUP* will import?
- How do I import non-NEXUS formatted files into PAUP*?
- How do I tell PAUP* to ignore certain taxa in further analyses?
- How do I tell PAUP* to use taxa that I previously told it to ignore?
- How do I tell PAUP* to ignore certain characters (sites) in further analyses?
- How do I tell PAUP* to use characters (sites) that I previously told it to ignore?
- How do I exclude all the constant characters?
- How do I exclude all constant as well as autapomorphic characters?
- How do I combine different data sets into a single NEXUS file?
- How do I code indels so that they are not treated as missing data?
- What are data partitions and why are they useful?
- How do I define and name a data partition?
- How do I do a partition homogeneity test?
- What are topological constraints?
- How do I define and name a topological constraint?
- How do I load a topological constraint in the form of a tree file?
- How do I apply a previously-defined topological constraint to a search?
- How do I get a single majority-rule bootstrap consensus tree from the results of multiple bootstrap runs performed at different times or on different machines?
- How do I tell PAUP* to save the trees currently in memory to a file?
- How do I tell PAUP* to read in trees previously saved in a file?
- Why can’t I get PAUP* to save branch length on the bootstrap consensus tree?
- How can I limit the number of rearrangements PAUP* evaluates during a heuristic search?
- Why are fractions listed in the bootstrap bipartition table when 100 bootstrap replicates are performed?
- How do I ask PAUP* to examine every possible tree topology?
- How do I evaluate 500 random-addition replicates but prevent PAUP* from branch swapping on each one?
- How do I set a maxtree limit for each random addition sequence replicate?
- I have performed an heuristic likelihood search and specified 100 replicates within the hsearch command. When I examine the progress reports, it looks like PAUP* is finding many different tree islands, however the summary at the end says that only one island was found and that island was hit 100 times. What is going on here?
- Do you have equations for estimating the relative (or actual) time required for heuristic searches for sequences of different length and for different numbers of sequences?
- Is there a version of PAUP* that will work on my new Intel-based Mac?
- Is there a version of PAUP* that will work natively under Mac OS X?
- I just purchased a new Mac and Classic support is not installed on the system. How do I run PAUP* without classic support?
- I get an error when I try to print from the classic version of PAUP*. How do I print from the classic version of PAUP*?
- How do I increase the amount of memory available to PAUP*?
- Can I download a Mac updater to a PC and transfer the updater to a Mac that is not online?
- How do I tell PAUP to automatically close the heuristic search status window at the end of the search?
- How do I keep information from scrolling off the screen before I have read it?
- How do I recall a PAUP* command?
- How do I print trees using the Windows Interface?
- How does PAUP* deal with missing characters under the parsimony criterion?
- What options are available in PAUP* for dealing with multi-state taxa?
- How do I define multistate characters as ordered in PAUP?
- If a patristic distance is the sum of branch lengths on a path between a pair of taxa, why do the summed branch lengths between a pair of taxa not add up to the patristic distance reported under the “describetrees” command?
- I did a search under the parsimony criterion and got two trees that look just alike. Why does PAUP consider them to be different?
- How do I perform a Kishino-Hasegawa test to see if the support for the first and second trees stored in memory is significantly different?
- How do I perform a partition homogeneity (congruence) test?
- How do I downweight third position transitions only in a parsimony analysis?
- How do I weight specific character positions in my alignment?
- Do stepmatrices for character state transformations have to be symmetric?
- Why does PAUP* tell me that my stepmatrix violates the triangle inequality?
- Why does PAUP* warn me that the stepmatrix supplied in Xu and Miranker (2004, “A metric model of amino acid substitution”, Bioninformatics 20:1214-1221) is “internally inconsistent”?
- What do the indices under the “pscores” command mean?
- How does PAUP* deal with missing characters under the likelihood criterion?
- How do I tell PAUP* I want to use the JC69 model (Jukes & Cantor, 1969)?
- How do I tell PAUP* I want to use the K2P model (Kimura, 1980)?
- How do I tell PAUP* I want to use the F81 model (Felsenstein, 1981)?
- How do I tell PAUP* I want to use the F84 model (i.e., the model used in DNAML)?
- How do I tell PAUP* I want to use the HKY model (Hasegawa, Kishino, & Yano, 1985)?
- How do I tell PAUP* I want to use the GTR model (i.e., the general time reversible model)?
- How do I obtain likelihoods for all trees in memory?
- How do I obtain likelihoods corresponding to each individual nucleotide site in my data using the first tree in memory?
- How do I force PAUP* to use the branch lengths I specify when computing site likelihoods?
- How do I perform a Kishino-Hasegawa test to see if the support for the first and second trees stored in memory is significantly different?
- What is the difference between the transition/transversion *ratio* and the transition/transversion *rate ratio*?
- How do I tell PAUP* to estimate the transition/transversion ratio when using the HKY substitution model?
- How do I take account of rate heterogeneity across sites using a discrete gamma distribution, four rate categories, and a shape value of 0.2?
- How do I estimate the shape parameter when I am using a four-category discrete gamma distribution to account for heterogeneity in rates across sites?
- How do I tell PAUP* to estimate the proportion of invariant sites?
- How do I tell PAUP* to assume there are no invariant sites?
- How do I tell PAUP* to estimate the proportion of invariant sites *and* estimate the shape parameter of a discrete, four-category gamma distribution applied to the sites that are not invariant?
- I think most of the rate heterogeneity in my sequences is the result of codon structure. How can I tell PAUP* to assume a different rate for each codon position (i.e., estimate site-specific rates)?
- How do I tell paup to use site-specific rates that I have already estimated?
- When I estimate the shape parameter of the gamma-distributed rates model and the proportion of invariable sites simultaneously, PAUP tells me that pinvar is zero even though the empirical number of invariable sites is about 30 percent. Why?
- How do I get the relative probabilities for each ancestral base assignment?
- How do I import a pairwise distance matrix from another program into PAUP*?
- How does PAUP* distribute missing or ambiguous changes proportionally to unambiguous changes?
- We need to do a likelihood search on a UNIX machine with a general time reversible model (I+Gamma), i.e. some sites assumed to be invariable with gamma distributed rates at variable sites, with a heuristic search with 10 repetitions random addition taxa and TBR branch swapping ?
- I have a sequence data set for which I would like to infer the phylogeny. What is a sequence of analyses that I can perform that will cover most potential pitfalls I am likely to encounter?

### Why are all the answers given in terms of commands and not menu choices?

Primarily this is to maintain consistency. All versions of PAUP* have a command-line interface, whereas only a few versions have a menu system; if answers were given in terms of menu choices, users of the UNIX and DOS versions would be out of luck. Also, many users prefer to put all of the commands for a particular analysis in a PAUP block directly in the data file itself. This maintains a complete record of how the analysis was carried out, which is useful later when writing the “Methods” section of a paper. The commands presented here can all be used within PAUP blocks as well as on the command line itself, thus facilitating the creation of PAUP blocks.

(Top)

### If I don’t find it here, does that mean that it doesn’t exist?

This FAQ is written as the need arises, so it will continue to grow in completeness each week. It is not intended to be a replacement for the PAUP* manual, but we hope it is a useful surrogate until the program and manual are officially published. The FAQ’s authors frequently receive questions (usually by email) about using PAUP*, and the FAQ provides a convenient mechanism for responding to the questions that we receive over and over again. Please feel free to submit candidate questions for inclusion in the FAQ.

(Top)

### Can I submit questions that I think should be part of this FAQ?

Please do. We welcome submission of candidate questions for the PAUP* FAQ, but be aware that the decision to include any particular question resides with the authors of the FAQ. The questions most likely to make it into the FAQ are those that we feel would benefit a large proportion of PAUP* users. Please submit candidate questions

Answers in the form of a series of PAUP* commands are of course very much appreciated. Please refrain from using abbreviations of commands, as abbreviations change over time as more commands are added to PAUP*. Also, if you find answers that are incorrect or ambiguous, please let us know!

(Top)

### I just updated PAUP* using the updater on your web site, yet when I try to run PAUP I still get the message that PAUP* has expired?

Occasionally this happens because a user’s computer is not set to the correct date or the user is clicking on an icon that is not linked to the beta 8 binary. Because PAUP is sensitive to both the creation date and expiration date, back-dating your computer to a time before the program was created will also generate the expiration notice. After checking your system date, make sure that you are executing the beta 8 binary.

(Top)

### Is PAUP* Year 2000 Compliant?

Yes, PAUP* is “Year 2000 Compliant.” The only time PAUP* uses dates is to output them to the main display and/or log file for the user’s information. If the host operating system returns the correct date when PAUP* requests it, then PAUP* will show the correct date in its output. Even if the host operating system fails to return the correct date in the year 2000, the only consequence is that the date will not be shown correctly by PAUP* in its display output and log files.

(Top)

### What is a batch file?

A batch file contains commands that you would otherwise issue interactively (i.e., from pull-down menus or the command line). For example, using the pull-down menus in the Mac version of PAUP* you could:

1) open the data file combine2.dat from the file menu and execute it

2) exclude the character sets cytb and junk2 from the data menu

3) start a heuristic search from the analysis menu

You could obtain the same result by executing a simple text file containing the following PAUP block.

begin paup;

execute d:\data\combine2.dat;

exclude cytb junk2;

hsearch;

end;

(Top)

### I’m using a beta version of PAUP* 4.0. How should I cite the program?

Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods).

Version 4. Sinauer Associates, Sunderland, Massachusetts.

Note: Because there are a number of beta and test versions of the program

you should mention the specific version of PAUP* somewhere in the methods.

(Top)

### Is there a version of PAUP* that will run a search in parallel on a multiple processor machine or a cluster of machines?

Right now the answer is no. PAUP* is a single-threaded application that will only take advantage of one processor at a time. Dave is in the process of parallelizing the code for the portable (Unix) version of PAUP*, but it will be a while before a general parallel version of PAUP* is available.

(Top)

### Could you recommend some text books that will help me to learn more about the analyses that can be done in paup?

There are a number of good books out there that deal with the subject of phylogenetic analyses. The selection below is just a few of the text books that I find myself referring to.

- Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates. Sunderland, Massachusetts.
- Li, W. 1997. Molecular Evolution. Sinauer Associates. Sunderland, Massachusetts.
- Nei, M. and Kumar, S. 2000. Molecular Evolution and Phylogenetics. Oxford University Press, New York, New York.
- Page, R. D. and Holmes, E. C. 1998. Molecular Evolution: A Phylogenetic Approach. Blackwell Science, Oxford.
- Hillis, D. M., Moritz, C., and Mable, B. 1996. Molecular Systematics (2nd ed.). Sinauer Associates. Sunderland, Massachusetts.

(Top)

### What are the maximum dimensions (i.e., characters x sequences) of a data matrix that PAUP* will read?

The maximum number of sequences (AKA taxa) is 16384. The maximum number of characters (AKA positions or sites) depends on the type of computer you are using. If your machine uses a 32-bit processor the maximum is 2^30 (2 raised to the power of 30), whereas machines with 64-bit processors can read a maximum of 2^62 characters.

(Top)

### What is the maximum number of character states that can be assigned to a character in PAUP*?

16 for a 16-bit machine

32 for a 32-bit machine

64 for a 64-bit machine

This limit stems from the use of bit manipulation to perform the state-set calculations in parsimony, and corresponds to the “word length” of the computer–usually 32 bits (e.g., most x86 PCs) but occasionally 64 bits (e.g., Alpha, G5, etc).
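To make the link between word length and the state limit concrete, here is a minimal Python sketch of state sets stored as bitmasks, with one bit per state. It illustrates the general technique only and is not PAUP*'s actual code; `WORD_BITS`, `state_set`, and `fitch_combine` are hypothetical names, and the combining rule shown is Fitch's rule for unordered characters.

```python
WORD_BITS = 32  # assumed word length; each character state occupies one bit


def state_set(*states):
    """Encode 0-based state indices as a single integer bitmask."""
    mask = 0
    for s in states:
        if not 0 <= s < WORD_BITS:
            raise ValueError(f"state {s} does not fit in a {WORD_BITS}-bit word")
        mask |= 1 << s
    return mask


def fitch_combine(left, right):
    """Fitch's rule at one internal node for an unordered character.

    Returns (state set for the node, number of steps added).
    """
    common = left & right          # set intersection is a single AND
    if common:
        return common, 0
    return left | right, 1         # set union is a single OR, costing one step


a, c, g = state_set(0), state_set(1), state_set(2)
print(fitch_combine(a | g, g))  # (4, 0): the intersection {g} is non-empty
print(fitch_combine(a, c))      # (3, 1): empty intersection, union {a, c}
```

Because each state needs its own bit, a character cannot have more states than the word has bits, which is where the 16/32/64 limits come from.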

(Top)

### Why doesn’t PAUP* allow me to set the criterion to likelihood after I execute my data set?

To use the maximum likelihood criterion in PAUP* your dataset must be composed of DNA, Nucleotide, or RNA characters and the “datatype” option under the “format” command must also be set to one of these values. For example:

Begin characters;

Dimensions nchar=200;

Format datatype=dna interleave;

(Top)

### How do I tell PAUP* I want to use the likelihood criterion?

set criterion=likelihood;

(Top)

### How do I tell PAUP* I want to use the parsimony criterion?

set criterion=parsimony;

(Top)

### How do I tell PAUP* I want to use the minimum evolution criterion?

set criterion=distance;

dset objective=me;

(Top)

### How do I tell PAUP* I want to use the least-squares criterion?

set criterion=distance;

dset objective=lsfit;

The default least-squares objective function is for weighted least squares, with the weights equal to the reciprocal of the square of the distance between each pair of taxa (see below).

(Top)

### How do I tell PAUP* I want to use unweighted least-squares criterion?

set criterion=distance;

dset objective=lsfit power=0;

In general, the “power” specifies the power to which the reciprocal of the distance between each pair of taxa is raised. Raising this value to the zero(th) power is equivalent to weighting all pairwise deviations by the constant “1”.
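The effect of the power setting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PAUP*'s implementation: `ls_weight` and `weighted_ls_fit` are hypothetical helpers, the objective is taken to be the usual weighted sum of squared deviations between observed distances and the path lengths on the tree, and the distances are made-up numbers.

```python
def ls_weight(distance, power=2):
    """Weight for one pair of taxa: the reciprocal of the distance raised to `power`.

    power=2 corresponds to the default weighting described above;
    power=0 gives every pair a weight of 1 (unweighted least squares).
    Assumes distance > 0.
    """
    return 1.0 / distance ** power


def weighted_ls_fit(observed, fitted, power=2):
    """Sum of weighted squared deviations between observed and tree-path distances."""
    return sum(ls_weight(d, power) * (d - p) ** 2 for d, p in zip(observed, fitted))


observed = [0.1, 0.2, 0.4]     # hypothetical pairwise distances
fitted = [0.12, 0.18, 0.4]     # hypothetical path lengths on some tree

print(weighted_ls_fit(observed, fitted, power=0))  # unweighted
print(weighted_ls_fit(observed, fitted, power=2))  # default weighting
```

With power=2 the same absolute deviation counts for more when the observed distance is small, so short distances dominate the fit.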

(Top)

### Which non-NEXUS file formats will PAUP* import?

- FrePars
- GCG MSF
- Hennig86
- MEGA
- NBRF-PIR
- Phylip 3.X
- Simple text
- Tab-delimited text

(Top)

### Where can I find examples of non-NEXUS file formats that PAUP* will import?

Sample non-NEXUS files are given at http://paup.csit.fsu.edu/nfiles.html.

(Top)

### How do I import non-NEXUS formatted files into PAUP*?

To import non-NEXUS formatted files into PAUP* you need to use the tonexus command. For example:

tonexus format=gcg fromfile=mygcgfile.gcg tofile=mynexusfile.nex;

If you are using the Mac interface you can get to the import dialog box by selecting File and then Import data…

(Top)

### How do I tell PAUP* to ignore certain taxa in further analyses?

The following lines show five alternative ways of telling PAUP* to ignore the taxa P._articulata, P._gracilis, P._fimbriata, and P._robusta (we’ll assume these were numbers 2, 3, 4 and 7 in the data matrix, respectively) in further analyses.

delete P._articulata P._gracilis P._fimbriata P._robusta;

delete 'P. articulata' 'P. gracilis' 'P. fimbriata' 'P. robusta';

delete 'P. articulata'-'P. fimbriata' P._robusta;

delete 2 3 4 7;

delete 2-4 7;

Note: If you plan to refer to a set of taxa frequently, you may find it convenient to set up a **taxset**. Sets are defined in a sets block. For the four taxa given above, defining a taxset would look like this:

begin sets;

taxset junk = P._articulata P._gracilis P._fimbriata P._robusta;

end;

After the taxset is defined, simply refer to the taxset to ignore these taxa in further analyses. For example:

delete junk;

(Top)

### How do I tell PAUP* to use taxa that I previously told it to ignore?

The following lines show five alternative ways to tell PAUP* to

reinstate four taxa previously deleted (see above)

restore P._articulata P._gracilis P._fimbriata P._robusta;

restore 'P. articulata' 'P. gracilis' 'P. fimbriata' 'P. robusta';

restore 'P. articulata'-'P. fimbriata' P._robusta;

restore 2 3 4 7;

restore 2-4 7;

Note: If you’ve defined a taxset then you can use the following syntax:

restore junk;

(Top)

### How do I tell PAUP* to ignore certain characters (sites) in further analyses?

The following lines show five alternative ways of telling PAUP* to ignore the

characters leaf_length, leaf_width, stamen_number, and carpel_number (we’ll assume these

were characters number 2, 3, 4 and 7 in the data matrix, respectively) in further analyses.

exclude leaf_length leaf_width stamen_number carpel_number;

exclude 'leaf length' 'leaf width' 'stamen number' 'carpel number';

exclude leaf_length-stamen_number 'carpel number';

exclude 2 3 4 7;

exclude 2-4 7;

If you plan to exclude these characters frequently, it is a good idea to define them in a character set. You can then exclude them by referencing the character set. For example:

charset foo = 2-4 7;

exclude foo;

Here’s how to tell PAUP* to ignore nucleotide sites

359 to 367, 586 to 588 and 693 to the last site in further analyses.

exclude 359-367 586-588 693-.;

Here’s how to tell PAUP* to ignore every third nucleotide site

in further analyses (starting with the third site).

exclude 3-.\3;
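For readers unsure how the range notation reads, the following Python sketch expands a character list into explicit site numbers. It is only a reading aid under simplifying assumptions: `expand_range` and `expand_list` are hypothetical helpers that handle numeric tokens, `-` ranges, `.` for the last character, and `\n` for "every nth"; they do not handle character names or set names, and they are not how PAUP* parses its input.

```python
def expand_range(token, nchar):
    """Expand one token such as 7, 2-4, 693-. or 3-.\\3 into 1-based site numbers."""
    step = 1
    if "\\" in token:                      # trailing \n means "every nth site"
        token, step_text = token.split("\\")
        step = int(step_text)
    if "-" in token:
        first_text, last_text = token.split("-")
    else:
        first_text = last_text = token
    first = int(first_text)
    last = nchar if last_text == "." else int(last_text)   # "." is the last character
    return list(range(first, last + 1, step))


def expand_list(text, nchar):
    """Expand a whitespace-separated character list."""
    sites = []
    for token in text.split():
        sites.extend(expand_range(token, nchar))
    return sites


print(expand_list("2-4 7", nchar=12))    # [2, 3, 4, 7]
print(expand_list(r"3-.\3", nchar=12))   # [3, 6, 9, 12]
```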

(Top)

### How do I tell PAUP* to use characters (sites) that I previously told it to ignore?

The following lines show five alternative ways to tell PAUP* to

reinstate four characters previously excluded (see above)

include leaf_length leaf_width stamen_number carpel_number;

include 'leaf length' 'leaf width' 'stamen number' 'carpel number';

include leaf_length-stamen_number 'carpel number';

include 2 3 4 7;

include 2-4 7;

Here’s how to tell PAUP* to include previously excluded nucleotide sites

359 to 367, 586 to 588 and 693 to the last site in further analyses.

include 359-367 586-588 693-.;

Here’s how to tell PAUP* to include every third nucleotide site

(starting with site number 1) in further analyses.

include 1-.\3;

(Top)

### How do I exclude all the constant characters?

exclude constant;

(Top)

### How do I exclude all constant as well as autapomorphic characters?

exclude uninf;

(Top)

### How do I combine different data sets into a single NEXUS file?

In the example below, protein and nucleotide data are combined in a single interleaved data set. Notice that character sets (charset) are used to distinguish the two data sets.

#NEXUS

Begin data;

Dimensions ntax=5 nchar=20;

Format datatype=protein interleave symbols="ACGT" gap=-;

Matrix

t1 VKYPNTNEEG

t2 VKYPNTNEEG

t3 VKYPNTNEEG

t4 VKYPNTNEDG

t5 VKYPNTNEDG

t1 AGCTAAACCT

t2 AGCTAGACCT

t3 AGCTAGACTT

t4 AGCTAGACTT

t5 AGCTAAACTT

;

end;

Begin Assumptions;

charset protein = 1-10;

charset dna = 11-.;

usertype 5_1 stepmatrix = 4 acgt

- 5 1 5

5 - 5 1

1 5 - 5

5 1 5 -

;

end;

Begin paup;

outgroup t2 t3;

ctype 5_1:dna;

hsearch addseq=random;

end;

(Top)

### How do I code indels so that they are not treated as missing data?

If you are confident about the homology of the indels then you might consider setting up an additional character for each site in the original matrix that contains an indel. The new sites would be represented by a binary character. The syntax for doing this looks like this:

begin data;

dimensions ntax=4 nchar=10;

format datatype=dna gap=- interleave symbols="01";

options gapmode=missing;

matrix

one ATGGT--

two AtggT--

three A-GGTTG

four A-GGTAG

one 011

two 011

three 100

four 100

;

end;
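The binary block in the example can be derived mechanically. The Python sketch below reproduces it from the four sequences, with gaps written as hyphens; `code_indels` is a hypothetical helper, and the convention it uses (one binary character per alignment column that contains a gap, 1 for a gap and 0 for a base) is simply the one implied by the matrix above.

```python
def code_indels(alignment, gap="-"):
    """Return one binary string per taxon: a character for every column that has a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    # Columns in which at least one taxon has a gap
    gap_columns = [i for i in range(length)
                   if any(alignment[name][i] == gap for name in names)]
    return {name: "".join("1" if alignment[name][i] == gap else "0"
                          for i in gap_columns)
            for name in names}


alignment = {
    "one":   "ATGGT--",
    "two":   "AtggT--",
    "three": "A-GGTTG",
    "four":  "A-GGTAG",
}
for name, coded in code_indels(alignment).items():
    print(name, coded)
# one 011
# two 011
# three 100
# four 100
```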

(Top)

### What are data partitions and why are they useful?

Data partitions divide the characters in your data matrix into two or more

groups. This is useful for performing the partition homogeneity test

or for estimating site-specific rates by maximum likelihood.

(Top)

### How do I define and name a data partition?

Here a partition is created and named *codons*. The partition divides sites into first, second and third codon positions. The first partition, named *firstpos*, includes every third site (the \3 means *every third site*) starting from site 1 and ending with the last site (the period means *last character*). The second and third partitions, named *secondpos* and *thirdpos* respectively, are defined similarly, except they have different starting points.

charpartition codons = firstpos:1-.\3, secondpos:2-.\3, thirdpos:3-.\3;

(Top)

### How do I do a partition homogeneity test?

First you’ll need to set up a partition. For this example, I’ll set up a partition called genes for two partial gene sequences.

charpartition genes = gene1:1-210, gene2:230-.;

Next I’ll need to exclude the characters contained in the NEXUS data set but not defined in either of the

two partitions — gene1 or gene2.

exclude 211-229;

Now I can use the partition homogeneity test.

hompart partition=genes;

(Top)

### What are topological constraints?

Topological constraints are unresolved trees used to filter out trees discovered during the search that do not match a particular topological criterion. One possible use of a topological constraint is to force a particular group to be convex (i.e., monophyletic if the tree is rooted outside the group). This type of topological constraint is referred to as a *monophyly* constraint. Monophyly constraint trees contain all the taxa but are unresolved to some degree. A second type of constraint is called a *backbone* constraint. Backbone constraint trees are normally fully resolved, but are missing one or more taxa. A tree encountered during a search is consistent with a backbone constraint tree so long as pruning all taxa not in the constraint tree yields the constraint tree topology. One may wish to compare the support of the data for the best tree obtained under the constraint to the best tree without the constraint. Note that PAUP* offers much more flexibility in terms of topological constraints than is indicated here; the manual for version 3.1 explains constraints thoroughly.
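As a rough illustration of what a monophyly constraint tests, the Python sketch below checks whether a group of taxa is convex on a tree written as nested tuples. It is a simplified stand-in, not PAUP*'s algorithm: `clades` and `satisfies_monophyly` are hypothetical helpers, the taxon numbers and trees are invented, and only a single constrained group is handled (backbone constraints are not covered). A group is treated as convex if either the group or its complement forms a clade in the arbitrarily rooted representation.

```python
def clades(tree):
    """Return (leaf set, list of all clade leaf sets) for a tree of nested tuples."""
    if not isinstance(tree, tuple):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def satisfies_monophyly(tree, group):
    """True if `group` is separated from all other taxa by a single branch."""
    all_leaves, found = clades(tree)
    group = frozenset(group)
    return group in found or (all_leaves - group) in found


group = {2, 3, 5, 7, 9}
tree_ok = (1, 4, (6, (8, 10)), ((2, 3), ((5, 7), 9)))
tree_bad = (1, (4, (2, 3)), (6, (8, 10)), ((5, 7), 9))

print(satisfies_monophyly(tree_ok, group))   # True
print(satisfies_monophyly(tree_bad, group))  # False
```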

(Top)

### How do I define and name a topological constraint?

Suppose you are studying bot flies that parasitize either lagomorphs or

rodents depending on the species. You may be interested in finding the

best tree in which the lagomorph-infecting species of bot flies form

a monophyletic group. Assume that there are 10 taxa, and taxa

2, 3, 5, 7 and 9 are lagomorph-infecting species, while the others (1, 4,

6, 8 and 10) are rodent-infecting species.

constraints lagomorph (monophyly) = (1,4,6,8,10,(2,3,5,7,9));

Here, the word *lagomorph* is the name of the topological constraint, and the word *monophyly* is a keyword indicating the type of constraint (the other possible type is specified using the keyword *backbone*).

Note that taxa connected directly to the root node do not have to be specified explicitly in constraint-tree definitions, and monophyly constraints are the default. The above example could thus also be written:

constraints lagomorph = ((2,3,5,7,9));

(Top)

### How do I load a topological constraint in the form of a tree file?

Suppose one or more constraint trees exist as tree definitions in a tree file named

“foo.tree” (the names of the trees in the tree file will become the names of

the corresponding constraint definitions when the treefile is loaded).

loadconstr file=foo.tree;

If the trees in “foo.tree” are to be considered *backbone* constraints, then the keyword “asbackbone” must be included (otherwise the trees are considered to be *monophyly* constraints):

loadconstr file=foo.tree asbackbone;

(Top)

### How do I apply a previously-defined topological constraint to a search?

The command below will perform an heuristic search using all default options

except that the (predefined) topological constraint named lagomorph will be enforced:

hsearch constraints=lagomorph enforce=yes;

Other search-related commands for which the constraints and enforce options are

available are illustrated in the examples below:

nj constraints=lagomorph enforce=yes; [neighbor-joining]

alltrees constraints=lagomorph enforce=yes; [exhaustive search]

bandb constraints=lagomorph enforce=yes; [branch-and-bound search]

(Top)

### How do I get a single majority-rule bootstrap consensus tree from the results of multiple bootstrap runs performed at different times or on different machines?

First, save the trees found during each bootstrap run. By default, PAUP* uses the system clock to seed the random number generator; thus, provided you do not change the value of bseed, characters will be sampled differently from run to run. After the bootstrap runs have completed, retrieve the tree files and compute the consensus tree using the options given below.

begin paup;

execute my_nexus_file.nex;

bootstrap treefile=futz1.out nreps=10 bseed=0 search=heuristic;

end;

…

begin paup;

execute my_nexus_file.nex;

bootstrap treefile=futz3.out nreps=10 bseed=0 search=heuristic;

end;

begin paup;

execute my_nexus_file.nex;

gettrees file=futz1.out StoreTreeWts=yes mode=3;

gettrees file=futz2.out StoreTreeWts=yes mode=7;

gettrees file=futz3.out StoreTreeWts=yes mode=7;

contree all/strict=no majrule=yes usetreewts=yes;

end;

(Top)

### How do I tell PAUP* to save the trees currently in memory to a file?

Here’s how to save the trees (and the estimated branch lengths) to the

file ‘foo.trees’

savetrees file=foo.trees brlens;

(Top)

### How do I tell PAUP* to read in trees previously saved in a file?

Here’s how to load into memory the trees saved in the file ‘foo.trees’

gettrees file=foo.trees;

(Top)

### Why can’t I get PAUP* to save branch length on the bootstrap consensus tree?

The bootstrap tree is a consensus of the trees found for each replicate sample of the

data. Since each replicate tree will have a different set of branch lengths none are

displayed or saved on the bootstrap consensus tree.

(Top)

### How can I limit the number of rearrangements PAUP* evaluates during a heuristic search?

There are several different ways to go about this. First, there is a “rearrlimit=n” option on the hsearch command, which limits the total number of rearrangements for each search to n. Second, there is a “timelimit=n” option, where n is the number of seconds that PAUP* will use to search for a tree. Note that if you use these options in conjunction with random-addition-sequence searches, the “limitperrep=y|n” option determines whether the limit applies on a per-replicate or overall basis. You can also specify reconlimit=n, where n is the maximum “reconnection distance” for an SPR or TBR reconnection (1 is equivalent to NNI, infinity to TBR, and values in between restrict the size of the neighborhood of trees that are tested).

(Top)

### Why are fractions listed in the bootstrap bipartition table when 100 bootstrap replicates are performed?

In some cases PAUP might find multiple optimal trees for a given replicate. If it does, PAUP will give the tree a weight that is equal to the reciprocal of the number of trees found in the replicate. You can see this for yourself if you use the treefile option under the bootstrap command to save all trees during search. For example:

bootstrap treefile=bstrees.tre;
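The arithmetic behind the fractional entries can be sketched in Python. This is a toy illustration of the weighting rule described above, not PAUP*'s code: `bipartition_support` is a hypothetical helper, each tree is reduced to a set of group labels, and the three replicates are invented.

```python
from collections import defaultdict
from fractions import Fraction


def bipartition_support(replicates):
    """Sum tree weights per group; each tree in a replicate weighs 1 / (trees in that replicate)."""
    support = defaultdict(Fraction)
    for trees in replicates:
        weight = Fraction(1, len(trees))
        for tree in trees:
            for group in tree:
                support[group] += weight
    return dict(support)


replicates = [
    [{"AB", "ABC"}],                   # replicate 1: one optimal tree
    [{"AB", "ABC"}, {"AC", "ABC"}],    # replicate 2: two equally optimal trees
    [{"AC", "ABC"}],                   # replicate 3: one optimal tree
]
support = bipartition_support(replicates)
print(float(support["AB"]))    # 1.5
print(float(support["ABC"]))   # 3.0
```

Group AB is found in only one of the two tied trees of replicate 2, so it is credited with half a replicate there, giving a non-integer total.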

(Top)

### How do I ask PAUP* to examine every possible tree topology?

Here’s how to do this, but keep in mind that the number of possible unrooted bifurcating tree topologies increases factorially with the number of taxa. This means that for even a 14 taxon problem, it will take PAUP* *several centuries* to complete this analysis! It is probably not a good idea to try this command if you have more than ten taxa currently included.

alltrees;
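To see how quickly the search space grows, the Python sketch below counts unrooted bifurcating trees using the standard formula (2n-5)!!, the product of the odd numbers from 1 to 2n-5, for n taxa. The function name `unrooted_tree_count` is a hypothetical helper for illustration.

```python
def unrooted_tree_count(ntax):
    """Number of unrooted bifurcating topologies for ntax taxa: (2*ntax - 5)!!"""
    if ntax < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * ntax - 4, 2):   # odd numbers 3, 5, ..., 2*ntax - 5
        count *= k
    return count


for n in (4, 10, 14):
    print(n, unrooted_tree_count(n))
# 4 3
# 10 2027025
# 14 316234143225
```

Going from 10 to 14 taxa multiplies the number of trees by roughly 156,000, which is why exhaustive searches become impractical so quickly.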

(Top)

### How do I evaluate 500 random-addition replicates but prevent PAUP* from branch swapping on each one?

hsearch addseq=random nreps=500 swap=none;

(Top)

### How do I set a maxtree limit for each random addition sequence replicate?

If you are doing a number of random addition sequence replicates you’ll need a way to get around the problem of hitting the maxtree limit on the first replicate and hence aborting the search before PAUP gets to the remaining replicates. For example, if you want to apply a maxtree limit of 100 to each of 10 random addition sequence replicates then you will need to set the maxtree limit to 1000 and use two options under the **hsearch** command. The syntax will look like this:

set maxtrees = 1000 increase=no;

hsearch addseq=random nreps=10 nchuck=100 chuckscore=1;

(Top)

### I have performed an heuristic likelihood search and specified 100 replicates within the hsearch command. When I examine the progress reports, it looks like PAUP* is finding many different tree islands, however the summary at the end says that only one island was found and that island was hit 100 times. What is going on here?

The problem is that PAUP* makes progress reports only once per minute by

default. Once PAUP* encounters a tree in the same island as trees it has

found previously, it immediately abandons the current replicate and begins

working on the next replicate. Thus, even if you set the progress report

interval to 1 second as follows:

set dstatus=1;

you will probably never catch PAUP* at just the moment when it is finishing

one replicate and about to begin the next. As a result, it is very common

for the last entry of a replicate to report a likelihood score that is

worse than the best likelihood score found thus far.

(Top)

### Do you have equations for estimating the relative (or actual) time required for heuristic searches for sequences of different length and for different numbers of sequences?

Unfortunately, the time required to complete a heuristic search cannot be estimated based on the size of a data set. There are a number of reasons why this is so; however, one important reason has to do with the quality of the data (i.e., how homoplastic the data are).

Another important reason is that there is no simple expression for calculating the number of tree bisection-reconnection (TBR) or subtree pruning-regrafting (SPR) rearrangements that will be made on a given tree. That is, the shape of a starting tree will determine the total number of rearrangements that can be made using one of the aforementioned swapping techniques. The problem is further complicated by the fact that it is not known how many suboptimal trees will be found during a search before optimal trees are found, and what portion of potential rearrangements of a given tree will be performed before a better tree is found.

(Top)

### Is there a version of PAUP* that will work on my new Intel-based Mac?

Yes, Mac users who have upgraded to an Intel-based Mac must follow the instructions on this page to get a version of PAUP* that will work on this platform.

(Top)

### Is there a version of PAUP* that will work natively under Mac OS X?

Yes, we have compiled a command-line only version of PAUP* 4.0 beta that will run on Mac OS X in

a terminal window. Note, this version takes full advantage of Mac OS X’s memory protection

and preemptive multitasking but LACKS a Graphical User Interface (GUI). Starting with the forthcoming release of Beta 11,

Mac users will be given a choice to install the command-line version of PAUP* as well as the classic Mac GUI version.

Work is currently underway to “carbonize” the GUI Mac version of PAUP*; however, at this time, we cannot speculate on

when this version will be available. The Mac GUI version of PAUP* is compatible with Mac OS X when run in the classic layer.

If you are only interested in the command-line version of PAUP* then you may purchase

the portable version http://www.paup.csit.fsu.edu/port.html

(Top)

### I just purchased a new Mac and Classic support is not installed on the system. How do I run PAUP* without classic support?

You have two choices. The first is to install classic support on your machine. While Apple no longer installs classic support by default on new systems, you can install it yourself with very little effort. A classic support installer is included on the “Additional Software & Apple Hardware Test” CD. This CD is included with your set of system CDs. Open the file labeled “About the Additional Software & Apple Hardware Test Disc” on the “Additional Software & Apple Hardware Test” CD and you will find concise instructions for installing classic support.

Your second choice is to use the command-line version of PAUP* for OS X. Starting with the forthcoming release of Beta 11, Mac users can use a command-line version of PAUP* in addition to the classic Mac GUI version. The command-line version runs on Mac OS X in a terminal window and takes full advantage of Mac OS X’s memory protection and preemptive multitasking but LACKS a Graphical User Interface (GUI). The Beta 11 installer and updater will automatically add the command-line program to your system path. To start the command-line program, type “paup” in a terminal window. See the quick-start document http://paup.csit.fsu.edu/quickstart.pdf for more details regarding the use of the command-line version of PAUP*.

(Top)

### I get an error when I try to print from the classic version of PAUP*. How do I print from the classic version of PAUP*?

This is probably happening because you do not have a printer set up for the classic layer. A complete description of how to set up printing can be found in Apple’s “Knowledge Base”. The short version of that description is:

- If you plan to use an AppleTalk printer then you will need to turn on AppleTalk. Go to your System Preferences > Network > Configure … Select the AppleTalk tab and then the “Make AppleTalk active” toggle.
- Open the Desktop Printer Utility. This is typically located in the Utilities folder within the Applications (Mac OS 9) folder. A window named “New Desktop Printer” should open after a few seconds (give it some time).
- Select the printer type that you would like to use and follow the instructions.

If you are only interested in using the Mac tree preview window in PAUP*, you can also set up a “dummy” printer. Open the Desktop Printer Utility. Under “Create Desktop …” select Translator and then click OK. After you do this you should be in business.

(Top)

### How do I increase the amount of memory available to PAUP*?

This is pretty much straight out of Mac’s online help: First, quit PAUP* if it is open. Click the program’s icon to select it. (Make sure to click the program icon itself, not an alias.) Open the File menu and choose Get Info. For Mac OS 8.1 and below, double-click the “Preferred size” box and type a new number. For Mac OS 8.5 and up, you’ll need to select memory under the “Show” pull-down menu to get to the “Preferred Size” box. The program can use this amount of memory if enough memory is available.

(Top)

### Can I download a Mac updater to a PC and transfer the updater to a Mac that is not online?

Yes, download the BinHexed updater for the appropriate Mac version. From your PC click the BinHex

link. Your browser will ask you if you want to save the file or run it. Select the save option.

You’ll get another dialog box allowing you to select a save location. Save the updater to a PC

formatted floppy disk. Mount the floppy on your Mac’s desktop. If your Mac doesn’t already have

one, you’ll need a utility to decompress the BinHexed file. After the file is decompressed double

click the updater icon and you should be good to go.

(Top)

### How do I tell PAUP to automatically close the heuristic search status window at the end of the search?

set autoclose=yes;

(Top)

### How do I keep information from scrolling off the screen before I have read it?

If the PAUSE option of the SET command equals Silent, Beep, or Msg the output will stop after every screenful and wait for you to press the return key.

set pause=No|Silent|Beep|msg

(Top)

### How do I recall a PAUP* command?

We strongly recommend using the public domain command-line editor CED, which provides command-line editing and recall capabilities within PAUP*.

(Top)

### How do I print trees using the Windows Interface?

To print ASCII trees, direct the general output to a file using the **log** command, issue the showtrees command, stop the log, and print the log file using your favorite text editor.

log file=tree.log;

showtree 1;

log stop;

Note: The Windows interface of PAUP* 4.0 does not print graphical trees. We plan to make graphical printing a part of the Windows package, but this feature will not be available in 4.0.

The program TreeView, written by Rod Page, is an excellent program for creating and manipulating graphical trees from NEXUS files. To output NEXUS trees from any version of PAUP*, use the savetrees command.

savetrees file=mytree.trees;

(Top)

### How does PAUP* deal with missing characters under the parsimony criterion?

The way that PAUP* deals with missing characters under the parsimony criterion is to assign to the taxa the character state that would be most parsimonious given its placement on the tree. Therefore, only the characters with no missing data will affect the placement of the taxa.

(Top)

### What options are available in PAUP* for dealing with multi-state taxa?

Under the “Set” or “Pset” commands you are given an option to change the way in which PAUP* deals with multi-state taxa. When the data set below is analyzed under the parsimony criterion, changing the designation of multi-state taxa to uncertain (default), variable, and polymorphic gives three different scores: 5, 6, and 7, respectively. For “Pset mstaxa=uncertain”, PAUP* picks the variable state that minimizes the tree length; for “Pset mstaxa=polymorphic”, PAUP* assumes that variable characters are a heterogeneous terminal group; and for “Pset mstaxa=variable”, PAUP* treats the characters inside the curly braces as uncertain and those inside the parentheses as polymorphic.

NOTE: For display reason, the curly braces are replaced by square brackets. To get the results described above replace the square brackets with curly braces.

#NEXUS

begin data;

dimensions ntax=4 nchar=4;

format symbols="012";

matrix

t1 11 00

t2 1[12] 10

t3 02 1(01)

t4 00 11

;

end;

(Top)

### How do I define multistate characters as ordered in PAUP?

There are several ways to assign character types to specific characters in the data matrix. One way is to define a typeset in an assumptions block and then use the assume command to set the character type. For example:

begin assumptions;

typeset myTypesetName = ord: 1 4 5;

end;

begin paup;

assume typeset = myTypesetName;

end;

You can skip the assume command and set the character type from within the assumptions block if you precede the typeset name with an asterisk (“*”). For example:

begin assumptions;

typeset *myTypesetName = ord: 1 4 5;

end;

Yet another way to set character types is by using the ctype command from within a paup block or at the command line. For example the following command has the same effect as those given above:

ctype ord:1 4 5;

(Top)

### If a patristic distance is the sum of branch lengths on a path between a pair of taxa, why do the summed branch lengths between a pair of taxa not add up to the patristic distance reported under the “describetrees” command?

The most likely reason for this is that you have unordered multistate characters in your data matrix. PAUP does not include unordered multistate characters in the patristic distance calculation, because reconstruction of these characters can be ambiguous. To calculate branch lengths and by extension the entire tree length, PAUP will arbitrarily accept one of the possible ancestral state assignments. Therefore, the sum of the branch lengths is greater than the patristic distance because the branch length calculations included the multistate characters.

If you don’t care about what ancestral states PAUP has used there is a way to get a patristic distance

for all of the characters in your data set. First, save the tree in matrix representation

including the branch lengths as a weight set.

matrixrep brlens=yes file=mytreefile.nex;

Next, open the matrix tree file and apply the weight set to all of the characters.

execute mytreefile.nex;

assume wtset=brlens;

Finally, rebuild the tree and generate the patristic distance matrix:

hs;

describetrees 1/ patristic=yes;

The patristic distances will now equal the summed branch lengths.

(Top)

### I did a search under the parsimony criterion and got two trees that look just alike. Why does PAUP consider them to be different?

The answer involves how PAUP collapses zero-length branches. The default collapsing rule is that a branch is retained if it is supported under at least one most-parsimonious reconstruction (MPR) of the ancestral states, for at least one character.

Here is a simple data matrix that will generate this result.

characters

taxa 1 23 45

—————–

A 0 00 00

B 0 11 11

C 0 11 11

D 1 00 11

E 1 00 00

F 1 00 11

Analysis of this matrix using PAUP gives two most-parsimonious trees:

: A B C D F E

: \ \ / \ / /

: \ * * /

: \ \ / /

: \ \ / /

: tree1 \ * /

: \ | /

: \|/

: *

:

: A B C D F E

: \ \ / / / /

: \ * / / /

: \ \ / / /

: \ * / /

: \ \ / /

: tree2 \ * /

: \ | /

: \|/

: *

An MPR on tree1 for character 1 requires two steps, and there are two of them:

: A B C D F E A B C D F E

: 0 0 0 1 1 1 0 0 0 1 1 1

: \ \ / \ / / \ \ / \ / /

: \ 0 1 / \ 0 1 /

: \ \ / / \ \ / /

: \ \ / / \ \ / /

: \ 0 / \ 1 /

: \ | / \ | /

: \|/ \|/

: 0 1

Because one of these two MPRs assigns a change leading to the group DF, PAUP does not collapse the branch connecting DF to the remainder of the tree.

On the other hand, tree2 has only a single MPR for character 1:

: A B C D F E

: 0 0 0 1 1 1

: \ \ / / / /

: \ 0 / / /

: \ \ / / /

: \ 1 / /

: \ \ / /

: \ 1 /

: \ | /

: \|/

: 1

This character does not provide support for the BCD group, and since there are no other characters that support it, the branch leading to BCD is collapsed, yielding the tree:

: A B C D F E

: 0 0 0 1 1 1

: \ \ / | / /

: \ 0 | / /

: \ \ | / /

: \ \ | / /

: \ \ | / /

: \ \ | / /

: \ \|/ /

: \ 1 /

: \ | /

: \|/

: 1

PAUP considers both of these trees to be distinct, recognizing that there is a tree for which the group DF receives support (albeit ambiguous support) and another tree for which DF receives no support.
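The step counts quoted above can be checked with a small Fitch-parsimony sketch in Python. This is only an illustration and not PAUP*'s own code: the nested-tuple tree encoding, the function names, and the decision to root both trees at taxon A (which does not change the length for unordered characters) are all assumptions made for the example, and the two topologies are my reading of the diagrams for tree1 and tree2.

```python
def fitch(node, states):
    """Return (state_set, steps) for a rooted binary tree given as nested tuples."""
    if isinstance(node, str):                 # a tip: use its observed state
        return {states[node]}, 0
    left, right = node
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    common = lset & rset
    if common:                                # children can agree: no extra step
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1   # children disagree: one more step


def char_length(tree, matrix, index):
    """Number of steps character `index` (0-based) requires on `tree`."""
    states = {taxon: row[index] for taxon, row in matrix.items()}
    return fitch(tree, states)[1]


def tree_length(tree, matrix):
    """Sum of the step counts over all characters."""
    nchar = len(next(iter(matrix.values())))
    return sum(char_length(tree, matrix, i) for i in range(nchar))


# The six-taxon, five-character matrix from the answer above.
MATRIX = {
    "A": "00000", "B": "01111", "C": "01111",
    "D": "10011", "E": "10000", "F": "10011",
}

TREE1 = ("A", ("E", (("B", "C"), ("D", "F"))))
TREE2 = ("A", ("E", ("F", ("D", ("B", "C")))))

for name, tree in (("tree1", TREE1), ("tree2", TREE2)):
    print(name, "character 1:", char_length(tree, MATRIX, 0),
          "total:", tree_length(tree, MATRIX))
```

Both trees come out at the same total length, and character 1 needs two steps on each, which is consistent with PAUP* reporting them as equally parsimonious. The sketch only counts steps; it does not enumerate the alternative MPRs that decide whether a zero-length branch is collapsed.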

(Top)

### How do I perform a Kishino-Hasegawa test to see if the support for the first and second trees stored in memory is significantly different?

set criterion=parsimony;

pscores 1-2 / khtest;

(Top)

### How do I perform a partition homogeneity (congruence) test?

The following example uses the partition definition

named “foo”, specifies 1000 randomizations using the random number seed

1234567, and uses a branch and bound search to obtain the sum of tree lengths

for each partition.

set criterion=parsimony;

hompart partition=foo nreps=1000 seed=1234567 search=bandb;

(Top)

### How do I downweight third position transitions only in a parsimony analysis?

First you need to identify the codon positions. Probably the most efficient way to

do this is to set up a codons block where the reading frame for the coding genes

is identified. Then you need to define the weighting for transitions and transversions by

creating a step matrix within an assumptions block. Finally, use the **ctype** command within a paup block to apply the stepmatrix to 3rd position sites only.

begin assumptions;

charset coding = 2-457 660-896;

charset noncoding = 1 458-659 897-898;

charset 1stpos = 2-457\3 660-896\3;

charset 2ndpos = 3-457\3 661-896\3;

charset 3rdpos = 4-457\3 662-.\3;

usertype 5_1 stepmatrix = 4 acgt

– 5 1 5

5 – 5 1

1 5 – 5

5 1 5 –

;

end;

begin paup;

ctype 5_1:3rdpos;

end;

(Top)

### How do I weight specific character positions in my alignment?

You can give different weights to different character positions by using the “weights” command.

There are several ways to identify the characters to be weighted. One efficient way to

identify characters is to include them in a character set, which must be defined within an

assumptions block. For example:

begin assumptions;

charset coding = 2-457 660-896;

charset noncoding = 1 458-659 897-898;

end;

Next, you can issue the “weights” command at the command line or within a paup block. In the example below, the first “weights” command assigns a weight of three to all characters defined as coding. The second “weights” command does the same thing except that the characters are identified directly.

begin paup;

weights 3:coding;

end;

or

weights 3:2-457, 3:660-896;

(Top)

### Do stepmatrices for character state transformations have to be symmetric?

User-defined stepmatrices do not need to be symmetric. The only requirement imposed on a stepmatrix is that it may not violate the triangle inequality.

(Top)

### Why does PAUP* tell me that my stepmatrix violates the triangle inequality?

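The triangle inequality requires that a single edge of a triangle not be greater than the sum of the other two edges. In terms of step matrices, this means that the cost of changing directly from one state to another cannot exceed the cost of making the same change by way of a third state. For example:

d(ac) <= d(at) + d(tc)

According to this rule the stepmatrix given below qualifies, since for the a to c transformation:

3 < 3+1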

Stepmatrix “asym” (asymmetric):

TO: a c g t

FROM: a – 3 1 3

c 2 – 3 1

g 1 4 – 3

t 3 1 3 –

Whereas the following matrix would be inconsistent with the triangle inequality:

Stepmatrix “asymNT” (asymmetric triangle violation):

TO: a c g t

FROM: a – 5 1 3

c 2 – 3 1

g 1 4 – 3

t 3 1 3 –

and PAUP* would adjust the a to c transformation from 5 to 4.
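The check described above can be sketched in a few lines of Python. This is an illustration of the rule only, not PAUP*'s implementation; the function names and the dictionary layout are assumptions made for the example. Each row of the input gives the costs FROM one state TO each of a, c, g, and t, with the diagonal ignored.

```python
STATES = "acgt"


def matrix_from_rows(rows):
    """rows[i][j] is the cost FROM STATES[i] TO STATES[j]; the diagonal is ignored."""
    return {frm: dict(zip(STATES, row)) for frm, row in zip(STATES, rows)}


def triangle_violations(cost):
    """List (i, k, j, direct, detour) where i->j costs more than i->k->j."""
    found = []
    for i in STATES:
        for j in STATES:
            if i == j:
                continue
            for k in STATES:
                if k in (i, j):
                    continue
                detour = cost[i][k] + cost[k][j]
                if cost[i][j] > detour:
                    found.append((i, k, j, cost[i][j], detour))
    return found


ASYM = matrix_from_rows([
    [0, 3, 1, 3],
    [2, 0, 3, 1],
    [1, 4, 0, 3],
    [3, 1, 3, 0],
])

ASYM_NT = matrix_from_rows([
    [0, 5, 1, 3],
    [2, 0, 3, 1],
    [1, 4, 0, 3],
    [3, 1, 3, 0],
])

print(triangle_violations(ASYM))     # no violations expected
print(triangle_violations(ASYM_NT))  # a->c (5) is dearer than a->t->c (3 + 1)
```

For “asymNT” the only violation found is the a to c cost of 5, which exceeds the detour through t (cost 4); that detour cost is the value the a to c entry is lowered to.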

(Top)

### Why does PAUP* warn me that the stepmatrix supplied in Xu and Miranker (2004, “A metric model of amino acid substitution”, Bioinformatics 20:1214-1221) is “internally inconsistent”?

Symmetric stepmatrices in PAUP* are required to satisfy the triangle inequality. If they fail to do so, a warning is issued and the costs in the matrix are adjusted until the triangle inequality is satisfied for all possible triplets of states. Unfortunately, the matrix given in the paper by Xu and Miranker contained a minor error. A corrected matrix is available at the following location: http://www.cs.utexas.edu/users/mobios/Publications/mPAMErrata.pdf.

(Top)

### What do the indices under the “pscores” command mean?

PAUP outputs several indices that measure the “fit” of characters to particular trees.

The indices can be defined in terms of the following three parameters:

- s = length (number of steps) required by the characters on the tree being evaluated
- m = minimum amount of change that the character may show on any conceivable tree
- g = maximum possible amount of change that a character could possibly require on any conceivable tree (i.e., the length of the character on a completely unresolved bush).

You can calculate a value for each character using the following formulae:

ci = m/s

ri = (g-s)/(g-m)

rc = ri*ci

hi = 1-ci

To get the overall value for a suite of characters you’ll simply calculate the sums of s, m, and g for all the characters in the suite and use the summed values in the equations described above.
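As a worked illustration of these formulae, here is a small Python sketch. The function names and the example values of s, m, and g are hypothetical, and returning None for ri and rc when g equals m (where the ratio is undefined) is a choice made for this example, not a description of what PAUP* prints.

```python
def fit_indices(s, m, g):
    """Return ci, ri, rc and hi for step count s, minimum m and maximum g."""
    ci = m / s
    # ri is undefined when g == m (the character cannot vary in length).
    ri = (g - s) / (g - m) if g != m else None
    rc = ri * ci if ri is not None else None
    hi = 1 - ci
    return {"ci": ci, "ri": ri, "rc": rc, "hi": hi}


def ensemble_indices(characters):
    """Sum s, m and g over (s, m, g) triples, then apply the same formulae."""
    s = sum(c[0] for c in characters)
    m = sum(c[1] for c in characters)
    g = sum(c[2] for c in characters)
    return fit_indices(s, m, g)


# One hypothetical character: 4 steps on the tree, minimum 1, maximum 5.
print(fit_indices(4, 1, 5))

# A hypothetical suite of two characters.
print(ensemble_indices([(4, 1, 5), (2, 1, 3)]))
```

Note that the ensemble values are computed from the summed s, m, and g, not by averaging the per-character indices.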

(Top)

### How does PAUP* deal with missing characters under the likelihood criterion?

The likelihood is computed by summing the likelihoods over each possible assignment of A, C, G, or T to the taxon with the missing datum. Generally, if all of the nearby taxa have the same state, this sum will be dominated by the term with this same state assigned to the “missing” value, but each of the other states will contribute some small, nonzero, value to the likelihood.

On the other hand, if there is considerable ambiguity in the sense that the surrounding taxa have different states, or the branch leading to a missing-data taxon is very long, each of the possible assignments makes a larger contribution to the total likelihood.

It’s all in the same spirit as likelihood in the absence of missing data–there are lots of ways that the pattern of nucleotides at the tips of the tree could have been generated, and all of them contribute something to the total likelihood (generally some much more than others).

With missing data, there are several states that a taxon might have taken if an insertion/deletion event had not happened (or an ambiguity in the sequencing hadn’t occurred) and likelihood considers the probability of each of those alternatives.
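The summation over possible assignments can be illustrated with a deliberately tiny Python sketch. Everything about the setup is an assumption chosen to keep the arithmetic visible: a three-taxon star tree with equal branch lengths, the Jukes-Cantor model, and hypothetical function names. PAUP* performs the equivalent calculation on the actual tree and substitution model in use.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability of observing base j after branch length t, starting from i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(tips, t):
    """Likelihood of one site on a star tree; '?' marks a missing tip state."""
    total = 0.0
    for center in BASES:                      # sum over the unobserved central state
        term = 0.25                           # JC69 equilibrium frequency
        for obs in tips:
            if obs == "?":
                # Missing datum: sum over every base the tip might have had.
                term *= sum(jc69_prob(center, x, t) for x in BASES)
            else:
                term *= jc69_prob(center, obs, t)
        total += term
    return total


t = 0.1
with_missing = site_likelihood(["A", "A", "?"], t)
per_assignment = {x: site_likelihood(["A", "A", x], t) for x in BASES}

print("likelihood with missing tip:", with_missing)
for base, value in per_assignment.items():
    print("  contribution if the tip were", base, ":", value)
```

With both neighbours showing A, the term for A dominates the sum, while C, G, and T each add a small nonzero amount, as described above. Lengthening the branches (a larger t) makes the four contributions more even.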

(Top)

### How do I tell PAUP* I want to use the JC69 model (Jukes & Cantor, 1969)?

set criterion=likelihood;

lset nst=1 basefreq=equal;

(Top)

### How do I tell PAUP* I want to use the K2P model (Kimura, 1980)?

set criterion=likelihood;

lset nst=2 basefreq=equal;

(Top)

### How do I tell PAUP* I want to use the F81 model (Felsenstein, 1981)?

set criterion=likelihood;

lset nst=1 basefreq=empirical;

(Top)

### How do I tell PAUP* I want to use the F84 model (i.e., the model used in DNAML)?

set criterion=likelihood;

lset nst=2 basefreq=empirical variant=f84;

(Top)

### How do I tell PAUP* I want to use the HKY model (Hasegawa, Kishino, & Yano, 1985)?

set criterion=likelihood;

lset nst=2 basefreq=empirical variant=hky;

(Top)

### How do I tell PAUP* I want to use the GTR model (i.e., the general time reversible model)?

set criterion=likelihood;

lset nst=6 basefreq=empirical;

(Top)

### How do I obtain likelihoods for all trees in memory?

lscores all;

Notes: you must first instruct PAUP* to use the likelihood criterion and you

may also wish to change the current substitution model before issuing the

above command.

(Top)

### How do I obtain likelihoods corresponding to each individual nucleotide site in my data using the first tree in memory?

lscores 1 / sitelikes;

Notes: you must first instruct PAUP* to use the likelihood criterion and you

may also wish to change the current substitution model before issuing the

above command.

(Top)

### How do I force PAUP* to use the branch lengths I specify when computing site likelihoods?

Assuming that you have a tree file (for example, “foo.tre”) in which descriptions

of trees contain branch length information, you could read in the trees from

this file and preserve the branch length information as follows:

gettrees file=foo.tre storebrlens;

lscores 1 / sitelikes userbrlens;

Notes: you must first instruct PAUP* to use the likelihood criterion and you

may also wish to change the current substitution model before issuing the

above commands.

An example of a tree file containing one unrooted tree with branch length

information is shown below. In this example, all branches in the four-taxon

unrooted tree have length 0.1 except for the central branch, which has length 0.2

#nexus

begin trees;

utree best = (taxonA:0.1,taxonB:0.1,(taxonC:0.1,taxonD:0.1):0.2);

end;

(Top)

lscores 1-2 / khtest;

Notes: you must first instruct PAUP* to use the likelihood criterion and you

may also wish to change the current substitution model before issuing the

above command.

(Top)

### What is the difference between the transition/transversion *ratio* and the transition/transversion *rate ratio*?

The transition/transversion *rate ratio* is simply the instantaneous rate of transitions divided by the instantaneous rate of transversions. I will refer to this quantity as k. If k is 1.0, this means that transitions are occurring at the same rate as transversions. The transition/transversion *ratio*, however, is the probability of *any* transition (over a single unit of time) divided by the probability of *any* transversion (over a single unit of time). To find the probability of any transition during a single unit of time, one must consider each of the ways a transition can occur (i.e., A to G, G to A, C to T, and T to C) and add together the probabilities of each (note that this will be a sum of four terms). Likewise, finding the probability of any transversion during a single unit of time involves a sum of eight terms (i.e., A to C, A to T, G to C, G to T, C to A, C to G, T to A, and T to G). The probability of the specific transition A to G can be determined as follows: it is the probability that one begins in state A *and* changes from state A to state G in a single unit of time. Using the Felsenstein 1981 substitution model, the probability of the second part of the above statement, namely the probability of changing from state A to state G, can be written as p_{G}b. The first part of the statement, namely the probability of starting with state A, is simply the equilibrium nucleotide frequency of A, or p_{A}. The transition/transversion ratio, then, involves the equilibrium base frequencies, whereas the transition/transversion rate ratio does not.

Still another definition of transition/transversion ratio exists. That definition is that this ratio is the observed number of transitions between two sequences divided by the observed number of transversions between two sequences. This definition is problematic because the magnitude of this measure depends on the amount of time separating the two sequences being considered. It is thus difficult to compare meaningfully transition/transversion ratios obtained in this way across different pairs of sequences, since these will generally be separated by different amounts of time. Also, one should be aware that the symbol k has been used in other contexts; for example, k as used in the model implemented in the program DNAML is not comparable to k as described here.
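The four-term and eight-term sums can be written out in a short Python sketch. It assumes the usual HKY-style parameterization, in which the rate of change from x to y is b times the equilibrium frequency of y, multiplied by k when the change is a transition, and it treats the probability of a change over one short unit of time as proportional to that rate. The function and variable names are illustrative only.

```python
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def ti_tv_ratio(k, freqs, b=1.0):
    """Transition/transversion ratio implied by rate ratio k and base frequencies."""
    ti = tv = 0.0
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                continue
            # Start in x (frequency freqs[x]), then change to y.
            rate = b * freqs[y] * (k if is_transition(x, y) else 1.0)
            if is_transition(x, y):
                ti += freqs[x] * rate     # one of the four transition terms
            else:
                tv += freqs[x] * rate     # one of the eight transversion terms
    return ti / tv


equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
skewed = {"A": 0.4, "C": 0.1, "G": 0.3, "T": 0.2}

print(ti_tv_ratio(1.0, equal))    # rate ratio 1 with equal frequencies
print(ti_tv_ratio(2.0, equal))
print(ti_tv_ratio(2.0, skewed))   # same k, different frequencies, different ratio
```

With equal base frequencies a rate ratio of 1.0 corresponds to a ratio of 0.5, simply because there are twice as many kinds of transversion as transition. The value of b cancels out, and setting k to 1.0 gives the Felsenstein 1981 case mentioned above.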

(Top)

### How do I tell PAUP* to estimate the transition/transversion ratio when using the HKY substitution model?

set criterion=likelihood;

lset nst=2 basefreq=empirical variant=hky;

lset tratio=estimate;

(Top)

### How do I take account of rate heterogeneity across sites using a discrete gamma distribution, four rate categories, and a shape value of 0.2?

set criterion=likelihood;

lset rates=gamma ncat=4 shape=0.2;

(Top)

### How do I estimate the shape parameter when I am using a four-category discrete gamma distribution to account for heterogeneity in rates across sites?

set criterion=likelihood;

lset rates=gamma ncat=4 shape=estimate;

(Top)

### How do I tell PAUP* to estimate the proportion of invariant sites?

set criterion=likelihood;

lset pinvar=estimate;

(Top)

### How do I tell PAUP* to assume there are no invariant sites?

set criterion=likelihood;

lset pinvar=0;

(Top)

### How do I tell PAUP* to estimate the proportion of invariant sites *and* estimate the shape parameter of a discrete, four-category gamma distribution applied to the sites that are not invariant?

set criterion=likelihood;

lset pinvar=estimate;

lset rates=gamma ncat=4 shape=estimate;

(Top)

### I think most of the rate heterogeneity in my sequences is the result of codon structure. How can I tell PAUP* to assume a different rate for each codon position (i.e., estimate site-specific rates)?

set criterion=likelihood;

charpartition codons = firstpos:1-.\3, secondpos:2-.\3, thirdpos:3-.\3;

lset rates=sitespec siterates=partition:codons;

At this point, any command that causes likelihoods to be computed will make use of the charpartition named *codons* and a different rate will be estimated for each codon position class of sites.

(Top)

### How do I tell paup to use site-specific rates that I have already estimated?

You can do this a couple of different ways. The first way is to estimate the rates on a given tree and then apply the estimated rates by using the previous option. In the following example, a character partition defines three genes and the site specific rates for each gene are estimated on a neighbor joining tree. Finally, a heuristic search is executed using the site-specific rates estimated on the neighbor joining tree.

charpartition genes=g1:1-300, g2:301-600, g3:601-700;

nj;

lscore 1/rates=sitespec siterates=partition:genes;

lset rates=sitespec siterates=previous;

hsearch;

The second way to use previously estimated site-specific rates is to define them explicitly in a rate set. In the following example 1st, 2nd, and 3rd positions are assigned a rate of 2, 1, and 3, respectively. Character sets are used to define which characters represent the codon positions.

charset 1stpos = 2-457\3 660-896\3;

charset 2ndpos = 3-457\3 661-896\3;

charset 3rdpos = 4-457\3 662-.\3;

rateset codonrates = 2.0:1stpos, 1.0:2ndpos, 3.0:3rdpos;

lscore / rates=sitespec siterates = rateset:codonrates;

(Top)

### When I estimate the shape parameter of the gamma-distributed rates model and the proportion of invariable sites simultaneously, PAUP tells me that pinvar is zero even though the empirical number of invariable sites is about 30 percent. Why?

When you use gamma-distributed rates, invariable sites can sometimes be accommodated by the left tail of the gamma distribution (i.e., while these sites are technically not “invariable”, they are changing slowly enough that a fair number of constant sites are expected when the gamma shape parameter is small). The two parameters are highly correlated; often similar likelihood scores can be achieved with a small pinv and small gamma shape or a larger pinv with a correspondingly larger gamma shape. When the gamma shape parameter is larger, fewer low-rate sites are expected, and the pinv must increase to account for the presence of these low-rate sites. The following article deals with this issue in more depth:

Sullivan, J.; Swofford, D. L., and Naylor, G. J. P. The effect of taxon sampling on estimating rate heterogeneity parameters of maximum-likelihood models. Molecular Biology and Evolution. 1999; 16:1347-1356.

(Top)

### How do I get the relative probabilities for each ancestral base assignment?

There are basically two steps to getting the relative probabilities of each base assignment. First you need to tell PAUP to display the values when characters are reconstructed, and then you’ll need to reconstruct the characters. The following block shows how this may be done.

begin paup;

nj;

set crit=like;

lset allprobs=yes;

describetrees 1/plot=no xout=internal;

end;

(Top)

### How do I import a pairwise distance matrix from another program into PAUP*?

The easiest way to do this is to include the custom distance in a NEXUS formatted distance block.

For example, below is a distance matrix for four sequences followed by a paup block that uses the

distances to build a neighbor joining tree.

#NEXUS

[!user defined distances]

Begin distances;

Dimensions ntax=4;

format nodiagonal;

matrix

t1

t2 4

t3 3 4

t4 2 3 4 ;

end;

[! nj with user defined distances]

Begin paup;

dset distance=user;

nj;

end;

A more detailed description of the distance block is given in the command reference pdf document.

(Top)

### How does PAUP* distribute missing or ambiguous changes proportionally to unambiguous changes?

Take for example the following sequences:

t1 aaaaaccg

t2 tgca-gtt

t3 tgcaagtt

The p-distance, or dissimilarity, between sequences t1 and t3 is pretty easy to calculate. That is, 6 of the 8 comparisons do not match, therefore the p-distance between t1 and t3 is 3/4 or .75. If you chose to ignore missing sites, the comparison between sequences t1 and t2 would be equally straightforward; 6 of the 7 comparisons do not match, giving a p-distance of .85714. Deciding to distribute the missing comparisons to the unambiguous changes tells PAUP* to look at all the “a” pairs between sequence t1 and t2. For the example above these are:

1 a-t

1 a-g

1 a-c

1 a-a

Distributing the changes proportionally to each unambiguous change would give 1/4 to each “a” comparison. Therefore if we tallied the number of comparisons between sequence t1 and t2 we would get a matrix that looked like this:

| | a | c | g | t |
|---|---|---|---|---|
| a | 1.25 | 1.25 | 1.25 | 1.25 |
| c | | 0 | 1.00 | 1.00 |
| g | | | 0 | 1.00 |
| t | | | | 0 |

To get the p-distance we add up the off diagonals to get 6.75 differences out of 8 comparisons or .84375.
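The arithmetic in this example can be reproduced with the Python sketch below. It is a guess at the general rule based only on the worked example above, not PAUP*'s actual code: a site where one sequence is missing is shared out among the unambiguous pairs that involve the observed base, in proportion to how often each pair occurs. The function name, the treatment of “-” and “?” as missing, and the handling of edge cases (sites missing in both sequences, or bases with no unambiguous pairs, are skipped) are assumptions.

```python
from collections import Counter


def p_distance(seq1, seq2, distribute_missing=True, missing="-?"):
    """p-distance between two aligned sequences, optionally distributing missing sites."""
    counts = Counter()   # unambiguous (base1, base2) comparisons
    pending = []         # comparisons where exactly one side is missing
    for x, y in zip(seq1, seq2):
        x_miss, y_miss = x in missing, y in missing
        if x_miss and y_miss:
            continue
        if x_miss or y_miss:
            pending.append((x, y))
        else:
            counts[(x, y)] += 1

    differences = float(sum(n for (x, y), n in counts.items() if x != y))
    compared = float(sum(counts.values()))

    if distribute_missing:
        for x, y in pending:
            # Unambiguous pairs that share the observed base at this site.
            if y in missing:
                similar = {p: n for p, n in counts.items() if p[0] == x}
            else:
                similar = {p: n for p, n in counts.items() if p[1] == y}
            total = sum(similar.values())
            if total == 0:
                continue
            differing = sum(n for (a, b), n in similar.items() if a != b)
            differences += differing / total   # fraction counted as a difference
            compared += 1.0

    return differences / compared


t1, t2, t3 = "aaaaaccg", "tgca-gtt", "tgcaagtt"
print(p_distance(t1, t3))                            # no missing data
print(p_distance(t1, t2, distribute_missing=False))  # missing site ignored
print(p_distance(t1, t2))                            # missing site distributed
```

For t1 and t2 the missing site adds 3/4 of a difference (three of the four unambiguous “a” pairs are mismatches), giving 6.75 differences out of 8 comparisons.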

(Top)

### We need to do a likelihood search on a UNIX machine with a general time reversible model (I+Gamma), i.e. some sites assumed to be invariable with gamma distributed rates at variable sites, with a heuristic search with 10 repetitions random addition taxa and TBR branch swapping ?

begin paup;

set criterion=likelihood;

lset nst=6 basefreq=empirical;

lset pinvar=estimate;

lset rates=gamma ncat=4 shape=estimate;

hsearch nreps=10 addseq=random swap=tbr;

end;

Notes: This analysis would be expected to take a *very* long time if more than four or five taxa are included in the analysis. Simply using the GTR model is going to cost a lot in terms of computation time, since there are many more rate parameters that need estimating in GTR compared with HKY or even simpler models. The amount of time could be reduced considerably by not estimating both the gamma shape parameter and the pinvar parameter. Instead of `pinvar=estimate`, for example, use `pinvar=0.1`, and instead of `shape=estimate`, use `shape=0.25`. These values need not come out of thin air, however. One could supply a pretty good tree, estimate these parameters using that tree, and then set the pinvar and shape parameters to those estimates for purposes of conducting a search. Once the search is finished, these parameters could be estimated again to see if they change much. If so, it might be worth redoing the search using the new, better estimates.

(Top)

### I have a sequence data set for which I would like to infer the phylogeny. What is a sequence of analyses that I can perform that will cover most potential pitfalls I am likely to encounter?

begin paup;

log file=log.txt start;

set criterion=parsimony;

hsearch nreps=10 addseq=random swap=tbr;

savetrees file=mp.tre brlens;

set criterion=distance;

dset distance=logdet objective=me;

hsearch nreps=10 addseq=random swap=tbr;

savetrees file=me.tre brlens;

set criterion=likelihood;

lset nst=2 basefreq=empirical rates=gamma ncat=4;

lset tratio=estimate shape=estimate;

lscore 1;

lset tratio=previous shape=previous;

hsearch nreps=1 swap=tbr start=1;

savetrees file=ml.tre brlens;

log stop;

end;

Notes: This PAUP block infers phylogeny using three different optimality criteria and stores all the output in a log file named log.txt. The first analysis uses the criterion of maximum parsimony to obtain a tree (or set of trees), which are then saved to a tree file named mp.tre. The second analysis uses the minimum evolution criterion in conjunction with LogDet/paralinear pairwise distances and saves the resulting tree(s) in a tree file named me.tre. The third analysis makes use of the maximum likelihood criterion in conjunction with the HKY-gamma substitution model. Estimates of the tratio (the transition/transversion ratio) parameter and the gamma shape parameter are obtained using the LogDet tree already in memory. Then, these two parameters are fixed at these estimated values for the duration of the heuristic search. The tree(s) resulting from the hsearch command are saved in the tree file ml.tre.

Each phylogeny method has its Achilles' heel. Maximum parsimony can be misled if there is too much heterogeneity in substitution rates among lineages (the classic “long edges attract” problem) in the underlying true phylogeny. Minimum evolution using LogDet distances can be misled if there is too much site-to-site rate heterogeneity, or if some of the pairwise distances are undefined (use the “showdist” command to check). Maximum likelihood under the HKY-gamma model can be misled if parameters that are assumed to be constant across the phylogeny (such as the tratio or base frequencies) actually vary among lineages in the true phylogeny. Because of these inherent weaknesses in individual methods, it is a good idea to try several methods that have strengths in different areas.

If you get the same tree under all methods, then you are in good shape because apparently there are no major pitfalls in your data. Of course, there may be a major unknown pitfall affecting all methods, but there is not much you can do about that. You may get trees that are not identical, but are also not significantly different (in terms of data support) from one another. The Kishino-Hasegawa test can be used to see whether one tree is supported significantly less by the data than a second tree. The last possibility is that you get truly different trees from the different methods. In this case, it is in your best interest to examine these trees carefully for evidence that a particular method has fallen victim to its particular Achilles' heel.

For example, if your log.txt file shows that there is strong rate heterogeneity in your data (let’s say the shape parameter is estimated to be 0.01), then the LogDet and parsimony trees fall under a certain degree of suspicion compared to the likelihood tree, which should be relatively immune to this pitfall since the model used allows for rate heterogeneity. If the parsimony tree differs from the LogDet and likelihood trees, look for evidence of long branch (edge) attraction in the parsimony tree. If the LogDet tree differs from the parsimony and likelihood trees, see if the base frequencies vary considerably between tip taxa (a useful tool for this purpose is the basefreq command). In other words, use PAUP* as a tool for discovering what evolutionary factors are at work in your particular set of sequences, and use this knowledge to make an intelligent choice between the alternatives presented to you by different phylogeny methods.
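To make the base-composition check mentioned above concrete, here is a minimal sketch of what "base frequencies vary considerably between tip taxa" can look like numerically. It is written outside PAUP* and is not a description of how the basefreq command works internally: the alignment is hypothetical, the function names are illustrative, and the chi-square homogeneity statistic is just one simple summary chosen for this example. It also treats sites as independent and ignores the phylogeny, so it should be read as a rough screen only.

```python
from collections import Counter

BASES = "acgt"


def base_counts(seq):
    """Count a, c, g and t in one sequence, skipping gaps, missing and ambiguity codes."""
    counts = Counter(ch for ch in seq.lower() if ch in BASES)
    return [counts[b] for b in BASES]


def composition_chi_square(alignment):
    """Chi-square statistic (and degrees of freedom) for equal base composition across taxa."""
    table = {name: base_counts(seq) for name, seq in alignment.items()}
    col_totals = [sum(row[i] for row in table.values()) for i in range(len(BASES))]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in table.values():
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(BASES) - 1)
    return stat, dof


# Hypothetical alignment: t1 and t2 have identical composition, t3 is a-rich.
alignment = {
    "t1": "acgtacgt",
    "t2": "tgcatgca",
    "t3": "aaaaaacg",
}

for name, seq in alignment.items():
    counts = base_counts(seq)
    total = sum(counts)
    print(name, [round(c / total, 3) for c in counts])

stat, dof = composition_chi_square(alignment)
print("chi-square =", round(stat, 3), "df =", dof)
```

A statistic that is large relative to its degrees of freedom suggests that composition differs among taxa, which is the situation in which the LogDet tree may deserve more trust than the parsimony and likelihood trees.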

(Top)