Calculates and maintains patristic distance information of taxa on a tree.
Convenience method to return list of community sets from a delimited file that lists taxon (labels) in columns and community presence/absences or abundances in rows.
Calculates the distances. Note that the path length (in number of steps) between taxa that span the root will be off by one if the tree is unrooted.
Returns distance between taxon1 and taxon2.
Returns list of patristic distances.
Iterates over all distinct pairs of taxa in matrix.
Instantiates a new PhylogeneticDistanceMatrix instance with data from an external source.
| Parameters: | |
|---|---|
| Returns: | pdm (a PhylogeneticDistanceMatrix instance) |
Examples
import dendropy
pdm1 = dendropy.PhylogeneticDistanceMatrix.from_csv(
        src=open("data.csv"),
        delimiter=",")
# tab-separated values need a tab delimiter
pdm2 = dendropy.PhylogeneticDistanceMatrix.from_csv(
        src=open("data.tsv"),
        delimiter="\t")
Creates and returns a PhylogeneticDistanceMatrix based on the given tree.
Note that this creates a “snapshot” of the current state of the tree. Subsequent changes to the tree will not be reflected in PhylogeneticDistanceMatrix instances previously created.
Also note that syntactically you may prefer to use:
pdm = tree.phylogenetic_distance_matrix()
instead of:
pdm = PhylogeneticDistanceMatrix.from_tree(tree)
| Parameters: | tree (a Tree instance) – The Tree from which to get the phylogenetic distances. |
|---|---|
| Returns: | pdm (a PhylogeneticDistanceMatrix instance) |
Examples
import dendropy
tree = dendropy.Tree.get(path="tree.nex",
schema="nexus")
pdm1 = dendropy.PhylogeneticDistanceMatrix.from_tree(tree)
# following is equivalent to above and probably preferred:
pdm2 = tree.phylogenetic_distance_matrix()
Calculates the phylogenetic ecology statistic “MNTD” [1,2] for the tree. If filter_fn is specified, only taxa for which filter_fn returns True are considered.
The mean nearest taxon distance (mntd) is given by:
\[mntd = \frac{ \sum_{i=1}^{n} \min_{j \neq i}(\delta_{i,j}) }{n},\]
where \(i \neq j\), \(\delta_{i,j}\) is the phylogenetic distance between species \(i\) and \(j\), and \(n\) is the number of species in the sample.
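The formula can be checked by hand on a small distance table. The following is a minimal sketch in plain Python, not the library's implementation: the function name, the `frozenset`-keyed dictionary layout, and the toy distances are all assumptions made for illustration.

```python
def mean_nearest_taxon_distance(distances, taxa):
    """Average, over each taxon, of the distance to its nearest other taxon.

    distances: dict mapping frozenset({a, b}) -> distance (hypothetical layout).
    taxa: the taxa in the sample; at least two are required.
    """
    taxa = list(taxa)
    if len(taxa) < 2:
        raise ValueError("need at least two taxa")
    nearest = [
        min(distances[frozenset((i, j))] for j in taxa if j != i)
        for i in taxa
    ]
    return sum(nearest) / len(nearest)


# Toy pairwise distances for four taxa (made-up values).
DIST = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 8.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 8.0,
    frozenset(("C", "D")): 4.0,
}

# Nearest distances are A:2, B:2, C:4, D:4, so the mean is 12 / 4.
print(mean_nearest_taxon_distance(DIST, "ABCD"))  # 3.0
# Restricting the sample plays the role of filter_fn.
print(round(mean_nearest_taxon_distance(DIST, ["A", "C", "D"]), 3))  # 4.667
```

Passing a subset of taxa here corresponds to what filter_fn does in the method: taxa outside the subset are ignored both as focal taxa and as candidate nearest neighbours.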
| Parameters: | |
|---|---|
| Returns: | mntd (float) – The Mean Nearest Taxon Distance (MNTD) statistic for the data. |
Examples
import dendropy
tree = dendropy.Tree.get(path="data.nex",
schema="nexus")
pdm = tree.phylogenetic_distance_matrix()
# consider all tips
mntd = pdm.mean_nearest_taxon_distance()
# only tips within the same community, based on the node annotation
# "community"
mntds_by_community = {}
for community_label in ("1", "2", "3",):
    filter_fn = lambda x: x.annotations["community"] == community_label
    mntd = pdm.mean_nearest_taxon_distance(filter_fn=filter_fn)
    mntds_by_community[community_label] = mntd
References
[1] Webb, C.O. 2000. Exploring the phylogenetic structure of ecological communities: An example for rainforest trees. The American Naturalist 156: 145-155.
[2] Swenson, N.G. Functional and Phylogenetic Ecology in R.
Calculates the phylogenetic ecology statistic “MPD” [1,2] for the tree. If filter_fn is specified, only taxa for which filter_fn returns True are considered.
The mean pairwise distance (mpd) is given by:
\[mpd = \frac{ \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \delta_{i,j} }{n \choose 2},\]
where \(\delta_{i,j}\) is the phylogenetic distance between species \(i\) and \(j\), and \(n\) is the number of species in the sample. Each unordered pair of distinct species (\(i < j\)) is counted once, so the sum has \({n \choose 2}\) terms.
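As a worked illustration of the statistic, here is a minimal sketch in plain Python. It is not the library's code; the function name, the `frozenset`-keyed dictionary, and the toy distances are assumptions. It averages over each unordered pair once, and also shows that averaging over ordered pairs with \(i \neq j\) gives the same value when the divisor is \(n(n-1)\).

```python
from itertools import combinations, permutations


def mean_pairwise_distance(distances, taxa):
    """Mean distance over all unordered pairs of distinct taxa.

    distances: dict mapping frozenset({a, b}) -> distance (hypothetical layout).
    """
    pairs = list(combinations(sorted(taxa), 2))
    if not pairs:
        raise ValueError("need at least two taxa")
    return sum(distances[frozenset(p)] for p in pairs) / len(pairs)


# Toy pairwise distances for four taxa (made-up values).
DIST = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 8.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 8.0,
    frozenset(("C", "D")): 4.0,
}

mpd_all = mean_pairwise_distance(DIST, "ABCD")          # 34 / 6
mpd_abc = mean_pairwise_distance(DIST, ["A", "B", "C"])  # 14 / 3
print(round(mpd_all, 3))  # 5.667
print(round(mpd_abc, 3))  # 4.667

# Same statistic via ordered pairs: every pair appears twice, so divide by n(n-1).
ordered = list(permutations("ABCD", 2))
mpd_ordered = sum(DIST[frozenset(p)] for p in ordered) / len(ordered)
```

Here `mpd_ordered` agrees with `mpd_all`, which is why the double sum must either run over \(i < j\) with divisor \({n \choose 2}\) or over \(i \neq j\) with divisor \(n(n-1)\).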
| Parameters: | |
|---|---|
| Returns: | mpd (float) – The Mean Pairwise Distance (MPD) statistic for the data. |
Examples
import dendropy
tree = dendropy.Tree.get(path="data.nex",
schema="nexus")
pdm = tree.phylogenetic_distance_matrix()
# consider all tips
mpd1 = pdm.mean_pairwise_distance()
# only tips within the same community, based on the node annotation
# "community"
mpds_by_community = {}
for community_label in ("1", "2", "3",):
    filter_fn = lambda x: x.annotations["community"] == community_label
    mpd = pdm.mean_pairwise_distance(filter_fn=filter_fn)
    mpds_by_community[community_label] = mpd
References
[1] Webb, C.O. 2000. Exploring the phylogenetic structure of ecological communities: An example for rainforest trees. The American Naturalist 156: 145-155.
[2] Swenson, N.G. Functional and Phylogenetic Ecology in R.
Returns a Neighbor-Joining (NJ) tree based on the distances in the matrix.
Calculates and returns a tree under the Neighbor-Joining algorithm of Saitou and Nei (1987) for the data in the matrix.
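For readers unfamiliar with the algorithm, the sketch below walks through neighbor joining in plain Python using the commonly used Q-criterion formulation. It is an illustration under stated assumptions, not the library's implementation: the function name, the `frozenset`-keyed input, the edge-list output, and the internal node names `u1`, `u2`, ... are all hypothetical, and leaf names are assumed not to clash with those internal names.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Return an unrooted tree as a list of (node, node, edge_length) tuples.

    dist: dict mapping frozenset({a, b}) -> distance (hypothetical layout).
    """
    d = dict(dist)
    nodes = sorted({name for pair in d for name in pair})
    edges = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        f, g = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_fg = d[frozenset((f, g))]
        len_f = d_fg / 2 + (r[f] - r[g]) / (2 * (n - 2))
        len_g = d_fg - len_f
        u = "u%d" % next_id  # new internal node
        next_id += 1
        edges.append((f, u, len_f))
        edges.append((g, u, len_g))
        # Distance from the new node to every remaining node.
        for k in nodes:
            if k not in (f, g):
                d[frozenset((u, k))] = (
                    d[frozenset((f, k))] + d[frozenset((g, k))] - d_fg
                ) / 2
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Small additive five-taxon example (made-up distances).
rows = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
DIST = {frozenset(pair): float(value) for pair, value in rows.items()}

edges = neighbor_joining(DIST)
for x, y, length in edges:
    print(x, y, length)
```

Because the example distances are additive, the edge lengths along the path between any two leaves of the resulting tree sum back to the input distance, for example 2 + 3 = 5 for `a` and `b`.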
| Parameters: | is_weighted_edge_distances (bool) – If True then edge lengths will be considered for distances. Otherwise, just the number of edges. |
|---|---|
| Returns: | t (Tree) – A Tree instance corresponding to the Neighbor-Joining (NJ) tree for this data. |
Examples
import dendropy
# Read data from a CSV file into a PhylogeneticDistanceMatrix
# object
with open("distance_matrix.csv") as src:
    pdm = dendropy.PhylogeneticDistanceMatrix.from_csv(
            src,
            is_first_row_column_names=True,
            is_first_column_row_names=True,
            is_allow_new_taxa=True,
            delimiter=",",
            )
# Calculate the tree
nj_tree = pdm.nj_tree()
# Print it
print(nj_tree.as_string("nexus"))
References
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4: 406-425.
Returns the number of edges between two taxon objects.
Returns patristic distance between two taxon objects.
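To make the two quantities concrete, the following minimal sketch computes both the edge count and the patristic distance (summed edge lengths) between tips of a toy rooted tree. It uses plain Python rather than the library; the `PARENT` table, node names, and function names are assumptions made for illustration.

```python
# Toy rooted tree stored as child -> (parent, edge_length); made-up values.
PARENT = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}


def path_to_root(node):
    """Return [(ancestor, cumulative_length, edge_count), ...] starting at node."""
    path = [(node, 0.0, 0)]
    length, steps = 0.0, 0
    while node in PARENT:
        parent, edge_length = PARENT[node]
        length += edge_length
        steps += 1
        path.append((parent, length, steps))
        node = parent
    return path


def patristic(taxon1, taxon2):
    """Return (summed edge length, number of edges) on the path between two tips."""
    seen = {name: (length, steps) for name, length, steps in path_to_root(taxon1)}
    for name, length, steps in path_to_root(taxon2):
        if name in seen:  # first shared ancestor is the most recent common ancestor
            return seen[name][0] + length, seen[name][1] + steps
    raise ValueError("taxa are not on the same tree")


print(patristic("A", "B"))  # (3.0, 2)
print(patristic("A", "C"))  # (4.5, 3)
```

The path from `A` to `C` passes through the root node, which is the situation in which the edge count depends on whether the root is treated as a real node.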
Randomly shuffles taxa in-situ.
Returns the standardized effect size value for the MNTD statistic under a null model under various community compositions.
The S.E.S. is given by:
\[SES(statistic) = \frac{observed - mean(model_{null})}{sd(model_{null})}\]
This removes any bias associated with the decrease in variance in the MNTD statistic value as species richness increases to the point where communities become saturated. Equivalent to -1 times the Nearest Taxon Index (NTI) when using phylogenetic distances.
In contrast to the function calculating the non-standardized version of this statistic, which uses a filter function to specify the subset of taxa to be considered, here a collection of (multiple) sets of taxa, each constituting a community, is specified. This allows the null model statistic to be calculated across all community sets for each randomization replicate.
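The procedure can be sketched in plain Python as follows. This is an illustration only, not the library's implementation, and it rests on several assumptions: the null model shuffles taxon labels across the tips, the standard deviation is the population standard deviation, and the function names, tuple field names, and toy distances are hypothetical. Each shuffle is applied to every community before the next replicate is drawn.

```python
import random
import statistics
from collections import namedtuple

# Hypothetical result fields; the library's named tuple may differ.
SesResult = namedtuple("SesResult", ["obs", "null_mean", "null_sd", "z"])


def mntd(dist, taxa):
    """Mean nearest taxon distance for the given taxa."""
    taxa = list(taxa)
    return sum(
        min(dist[frozenset((i, j))] for j in taxa if j != i) for i in taxa
    ) / len(taxa)


def ses_mntd(dist, all_taxa, communities, num_replicates=200, seed=1):
    """Standardized effect size of MNTD for each community set."""
    rng = random.Random(seed)
    all_taxa = sorted(all_taxa)
    observed = [mntd(dist, c) for c in communities]
    null_values = [[] for _ in communities]
    for _ in range(num_replicates):
        shuffled = all_taxa[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(all_taxa, shuffled))  # one shuffle per replicate
        for values, community in zip(null_values, communities):
            values.append(mntd(dist, [relabel[t] for t in community]))
    results = []
    for obs, values in zip(observed, null_values):
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        z = (obs - mean) / sd if sd > 0 else float("nan")
        results.append(SesResult(obs, mean, sd, z))
    return results


# Toy pairwise distances for four taxa (made-up values).
DIST = {
    frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 8.0, frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 8.0, frozenset(("C", "D")): 4.0,
}

results = ses_mntd(DIST, "ABCD", [{"A", "B"}, {"A", "D"}])
for result in results:
    print(result)
```

In this toy data the community `{A, B}` is the closest possible pair, so its effect size is negative (phylogenetically clustered), while `{A, D}` is the most distant pair and has a positive effect size.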
| Parameters: | |
|---|---|
| Returns: | r (list of results) – A list of results, with each result corresponding to a community set given in assemblage_memberships. Each result consists of a named tuple with the following elements: |
Examples
import dendropy
tree = dendropy.Tree.get_from_path(
src="data/community.tree.newick",
schema="newick",
rooting="force-rooted")
pdm = dendropy.PhylogeneticDistanceMatrix.from_tree(tree)
assemblage_memberships = pdm.assemblage_membership_definitions_from_csv("data/comm1.csv")
results = pdm.standardized_effect_size_mean_nearest_taxon_distance(
        assemblage_memberships=assemblage_memberships.values())
print(results)
Returns the standardized effect size value for the MPD statistic under a null model under various community compositions.
The S.E.S. is given by:
\[SES(statistic) = \frac{observed - mean(model_{null})}{sd(model_{null})}\]
This removes any bias associated with the decrease in variance in the MPD statistic value as species richness increases to the point where communities become saturated. Equivalent to -1 times the Net Relatedness Index (NRI) when using phylogenetic distances.
In contrast to the function calculating the non-standardized version of this statistic, which uses a filter function to specify the subset of taxa to be considered, here a collection of (multiple) sets of taxa, each constituting a community, is specified. This allows the null model statistic to be calculated across all community sets for each randomization replicate.
| Parameters: | |
|---|---|
| Returns: | r (list of results) – A list of results, with each result corresponding to a community set given in assemblage_memberships. Each result consists of a named tuple with the following elements: |
Examples
import dendropy
tree = dendropy.Tree.get_from_path(
src="data/community.tree.newick",
schema="newick",
rooting="force-rooted")
pdm = tree.phylogenetic_distance_matrix()
assemblage_membership_definitions = pdm.assemblage_membership_definitions_from_csv("data/comm1.csv")
results = pdm.standardized_effect_size_mean_pairwise_distance(assemblage_memberships=assemblage_membership_definitions.values())
print(results)
Returns sum of patristic distances on tree.
Iterates over taxa in matrix. Note that this could be a subset of the taxa in the associated taxon namespace.
Returns an Unweighted Pair Group Method with Arithmetic Mean (UPGMA) tree based on the distances in the matrix.
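As a rough guide to what the clustering does, here is a minimal UPGMA sketch in plain Python. It is not the library's implementation; the function name, the `frozenset`-keyed input, the Newick string output, and the toy distances are assumptions, and leaf names are assumed not to contain `+`, which the sketch uses to build cluster keys.

```python
from itertools import combinations


def upgma(dist):
    """Return a Newick string for the UPGMA tree of the given distances.

    dist: dict mapping frozenset({a, b}) -> distance (hypothetical layout).
    """
    d = dict(dist)
    leaves = sorted({name for pair in d for name in pair})
    # cluster key -> (newick fragment, number of leaves, height above the tips)
    clusters = {name: (name, 1, 0.0) for name in leaves}
    while len(clusters) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        nwk_a, size_a, h_a = clusters.pop(a)
        nwk_b, size_b, h_b = clusters.pop(b)
        height = d[frozenset((a, b))] / 2
        merged = a + "+" + b
        newick = "(%s:%g,%s:%g)" % (nwk_a, height - h_a, nwk_b, height - h_b)
        # Size-weighted average, i.e. the arithmetic mean over all leaf pairs.
        for k in clusters:
            d[frozenset((merged, k))] = (
                size_a * d[frozenset((a, k))] + size_b * d[frozenset((b, k))]
            ) / (size_a + size_b)
        clusters[merged] = (newick, size_a + size_b, height)
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


# Toy ultrametric distances for four taxa (made-up values).
DIST = {
    frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0, frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0, frozenset(("C", "D")): 10.0,
}

tree_string = upgma(DIST)
print(tree_string)  # (((A:1,B:1):2,C:3):2,D:5);
```

Every tip ends up at the same distance from the root (5 in this example), reflecting the constant-rate assumption built into UPGMA.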
| Parameters: | is_weighted_edge_distances (bool) – If True then edge lengths will be considered for distances. Otherwise, just the number of edges. |
|---|---|
| Returns: | t (Tree) – A Tree instance corresponding to the UPGMA tree for this data. |
Examples
import dendropy
# Read data from a CSV file into a PhylogeneticDistanceMatrix
# object
with open("distance_matrix.csv") as src:
    pdm = dendropy.PhylogeneticDistanceMatrix.from_csv(
            src,
            is_first_row_column_names=True,
            is_first_column_row_names=True,
            is_allow_new_taxa=True,
            delimiter=",",
            )
# Calculate the tree
upgma_tree = pdm.upgma_tree()
# Print it
print(upgma_tree.as_string("nexus"))