*&color(,yellow){This page is under construction}; [#hf6160ea]
*Exercise 2: Phylogenetic analysis in a Unix-like environment: working with real data [#p1e5cf33]

**Preparing the software: analysis software for Windows [#rbc9a738]
-MESQUITE http://mesquiteproject.org/mesquite/mesquite.html
--JRE5: the Java environment required by Mesquite http://java.sun.com/javase/downloads/index_jdk5.jsp~
Download and install Java Runtime Environment (JRE) 5.0 (Update 17 as of January 2009).
-BioEdit http://www.mbio.ncsu.edu/BioEdit/bioedit.html

**Choosing a paper [#bb8e3860]
This exercise uses the data published in Ted R. Schultz and Sean G. Brady. 2008. [[PNAS. 105(14): 5435-5440.>http://www.pnas.org/content/105/14/5435.full]], "Major evolutionary transitions in ant agriculture": download the sequence data from GenBank and run a phylogenetic analysis on it.~

**Preparing the nucleotide sequence data: downloading from GenBank by accession number [#w378e38b]
The paper chosen above analyzes about 90 samples and four regions, about 2,500 bp of sequence in total. [[Supporting information, Table S2>http://www.pnas.org/content/suppl/2008/03/24/0711024105.DCSupplemental/0711024105SI.pdf#nameddest=ST1]] of the paper lists the GenBank accession numbers for each gene region.~
The target analysis uses all of these sequences concatenated into a single dataset about 2,500 bp long, so the downloaded sequences have to be %%%concatenated%%%.~
+Get the accession numbers for each gene region from the paper ([[Table S2>http://www.pnas.org/content/suppl/2008/03/24/0711024105.DCSupplemental/0711024105SI.pdf#nameddest=ST1]])
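As a rough sketch of the download step, a list of accession numbers can be sent to the NCBI E-utilities `efetch` endpoint, which returns the records as FASTA text. The helper names `efetch_url` and `fetch_fasta` are hypothetical, introduced only for this illustration, and `fetch_fasta` needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records by ID or accession number.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, rettype="fasta"):
    """Build an efetch URL for a list of nucleotide accession numbers."""
    params = {
        "db": "nuccore",              # GenBank nucleotide records
        "id": ",".join(accessions),   # comma-separated accession numbers
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accessions):
    """Download the records as one FASTA-formatted string (needs network access)."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode("utf-8")
```

For example, `efetch_url(["EU204345", "EU204298"])` builds a URL for the first two accession numbers of the EF1aF1_e1 list on this page; the URL can also be opened directly in a browser. Fetching one gene region at a time keeps the sequences grouped for the later concatenation step.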

**Major evolutionary transitions in ant agriculture [#ma7c1896]
-Ted R. Schultz and Sean G. Brady. 2008~
[[PNAS. 105(14): 5435-5440.>http://www.pnas.org/content/105/14/5435.full]]
***Getting the accession numbers for each gene region [#bf04d98e]
+Copy the data from [[Table S2>http://www.pnas.org/content/suppl/2008/03/24/0711024105.DCSupplemental/0711024105SI.pdf#nameddest=ST1]]
--In [[Table S2>http://www.pnas.org/content/suppl/2008/03/24/0711024105.DCSupplemental/0711024105SI.pdf#nameddest=ST1]], the columns are separated by spaces. However, spaces also occur in places other than column boundaries, as in "no seq" and " sp.", so replace those first and then convert the column separators to tabs.
 no seq → no_seq
 cf.  → cf_
   + → \t
--Replace every newline with a comma, then delete ",no_seq"
 \n → ,
 ,no_seq → <leave empty>
---EF1aF1_e1: EU204345, EU204298, EU204378, EU204363, EU204364, EU204331, EU204360, EU204361, EU204323, EU204377, EU204348, EU204317, EU204347, EU204374, EU204349, EU204334, EU204318, EU204367, EU204335, EU204350, EU204314, EU204324, EU204359, EU204379, EU204315, EU204380, EU204313, EU204299, EU204328, EU204355, EU204366, EU204321, EU204320, EU204330, EU204342, EU204369, EU204354, EU204368, EU204365, EU204376, EU204346, EU204326, EU204351, EU204307, EU204371, EU204319, EU204358, EU204311, EU204370, EU204312, EU204341, EU204310, EU204357, EU204343, EU204305, EU204381, EU204375, EU204344, EU204373
---EF1aF1_e2: EU204436, EU204389, EF013211, EU204453, EU204454, EU204422, EU204450, EU204451, EU204414, EF013230, EU204439, EU204408, EU204438, EU204464, EU204440, EU204425, EU204409, EU204457, EU204426, EU204441, EU204405, EU204415, EU204449, EF013232, EU204406, EF013240, EU204404, EU204390, EU204419, EU204445, EU204456, EU204412, EU204411, EU204421, EU204433, EU204459, EU204458, EU204455, EF013251, EU204437, EU204417, EU204442, EU204398, EU204461, EU204410, EU204448, EU204402, EU204460, EU204403, EU204432, EU204401, EU204447, EU204434, EU204396, EF013296, EF013299, EU204435, EU204463
---EU204586, EU204541, EF013373, EU204604, EU204605, EU204573, EU204601, EU204602, EU204565, EF013392, EU204589, EU204559, EU204588, EU204615, EU204590, EU204576, EU204560, EU204608, EU204577, EU204591, EU204556, EU204566, EU204600, EF013394, EU204557, EF013402, EU204555, EU204570, EU204596, EU204607, EU204563, EU204562, EU204572, EU204583, EU204610, EU204595, EU204609, EU204606, EF013414, EU204587, EU204568, EU204592, EU204549, EU204612, EU204561, EU204599, EU204553, EU204611, EU204554, EU204582, EU204552, EU204598, EU204584, EU204547, EF013458, EF013461, EU204585, EU204614
---EU204511, EU204465, EF013534, EU204529, EU204530, EU204497, EU204526, EU204527, EU204490, EF013549, EU204514, EU204484, EU204513, EU204540, EU204515, EU204500, EU204485, EU204533, EU204501, EU204516, EU204481, EU204491, EU204525, EF013551, EU204482, EF013558, EU204480, EU204466, EU204494, EU204521, EU204532, EU204488, EU204487, EU204496, EU204508, EU204535, EU204520, EU204534, EU204531, EF013565, EU204512, EU204517, EU204474, EU204537, EU204486, EU204524, EU204478, EU204536, EU204479, EU204507, EU204477, EU204523, EU204509, EU204472, EF013598, EF013600, EU204510, EU204539
---EU204268, EU204222, EF013534, EU204286, EU204287, EU204254, EU204283, EU204284, EU204247, EF013549, EU204271, EU204241, EU204270, EU204297, EU204272, EU204257, EU204242, EU204290, EU204258, EU204273, EU204238, EU204248, EU204282, EF013551, EU204239, EF013558, EU204237, EU204223, EU204251, EU204278, EU204289, EU204245, EU204244, EU204253, EU204265, EU204292, EU204277, EU204291, EU204288, EF013565, EU204269, EU204274, EU204231, EU204294, EU204243, EU204281, EU204235, EU204293, EU204236, EU204264, EU204234, EU204280, EU204266, EU204229, EF013598, EF013600, EU204267, EU204296
---EU204192, EU204145, EF013662, EU204210, EU204211, EU204178, EU204207, EU204208, EU204170, EF013677, EU204195, EU204164, EU204194, EU204221, EU204196, EU204181, EU204165, EU204214, EU204182, EU204197, EU204161, EU204171, EU204206, EF013679, EU204162, EF013686, EU204160, EU204146, EU204175, EU204202, EU204213, EU204168, EU204167, EU204177, EU204189, EU204216, EU204201, EU204215, EU204212, EF013693, EU204193, EU204173, EU204198, EU204154, EU204218, EU204166, EU204205, EU204158, EU204217, EU204159, EU204188, EU204157, EU204204, EU204190, EU204152, EF013726, EF013728, EU204191, EU204220
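The search-and-replace steps above could also be scripted. The following is a minimal sketch, assuming that the third pattern means "two or more consecutive spaces" and that one column of the table has already been pasted as one value per line. The function names and the sample rows (with placeholder accessions such as `XX000001`) are hypothetical and are not taken from Table S2.

```python
import re


def clean_table(text):
    """Protect spaces inside cells, then turn column separators into tabs."""
    text = text.replace("no seq", "no_seq")
    text = text.replace("cf. ", "cf_")
    # Assumption: columns are separated by runs of two or more spaces.
    return re.sub(r" {2,}", "\t", text)


def column_to_list(column_text):
    """Join one pasted column with commas and drop the no_seq entries."""
    joined = column_text.strip().replace("\n", ",")
    # Mirrors the manual step: only entries preceded by a comma are removed.
    return joined.replace(",no_seq", "")


# Made-up rows that only imitate the layout described above.
sample = "Taxon_a cf. b  XX000001  no seq\nTaxon_c  XX000002  XX000003"
print(clean_table(sample))
print(column_to_list("XX000001\nno_seq\nXX000002\n"))
```

As in the manual procedure, a `no_seq` entry on the very first line of a column is not preceded by a comma and would have to be removed by hand.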

***Data [#l57a712c]
-2,459 aligned nucleotide sites from the coding regions of four nuclear genes:
--elongation factor 1α-F1 (EF1α-F1) (1,075 bp)
--elongation factor 1α-F2 (EF1α-F2) (517 bp)
--wingless (409 bp)
--long-wavelength rhodopsin (opsin) (458 bp)
-All data in this study represent protein-coding (exon) sequences;~
intervening introns in opsin and EF1α-F1 were not used because they could not be aligned confidently.
-Sample: 65 attine taxa and 26 nonattine outgroups.~
Primers used for PCR amplification and sequencing are found in supporting information (SI) Table S1.
-Of the total 2,459 included nucleotide positions from all genes, 952 were variable and 847 parsimony informative. Sequences are deposited in GenBank; taxa and accession numbers are listed in Table S2.
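The four gene lengths listed above add up to the 2,459 aligned sites, which gives the site ranges of each gene in the concatenated alignment. A small sketch of that arithmetic follows; the order in which the genes are concatenated is an assumption made here for illustration, and `partition_ranges` is a hypothetical helper name.

```python
# Gene lengths (bp) as listed above; the concatenation order is an assumption.
GENE_LENGTHS = [
    ("EF1a-F1", 1075),
    ("EF1a-F2", 517),
    ("wingless", 409),
    ("opsin", 458),
]


def partition_ranges(gene_lengths):
    """Return {gene: (start, end)} using 1-based, inclusive site numbers."""
    ranges = {}
    start = 1
    for name, length in gene_lengths:
        end = start + length - 1
        ranges[name] = (start, end)
        start = end + 1
    return ranges


ranges = partition_ranges(GENE_LENGTHS)
total = sum(length for _, length in GENE_LENGTHS)
for name, (start, end) in ranges.items():
    print(f"{name}: {start}-{end}")
print("total:", total)
```

With this order the ranges are 1-1075, 1076-1592, 1593-2001 and 2002-2459. Ranges of this kind are what character-set definitions in programs such as PAUP* or MrBayes are built from.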

***Phylogenetic Analyses [#u92e7045]
-(i) Maximum parsimony (MP) analyses
--PAUP* v4.0b10
---heuristic searches with tree bisection-reconnection (TBR) and 1,000 random-taxon-addition replicates.~
Analyses identified 12 most-parsimonious trees (MPTs) of length = 4,383, CI = 0.270, RI = 0.704. Successive-approximations-weighting analyses identified a single tree, one of the MPTs.
---Nonparametric bootstrap analyses used TBR branch-swapping and consisted of 1,000 pseudoreplicates, with 10 random-taxon-addition replicates per pseudoreplicate. 
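To illustrate what one nonparametric bootstrap pseudoreplicate is, the sketch below resamples the columns of an alignment with replacement, keeping the alignment length unchanged. It shows only the resampling idea, not PAUP*'s implementation; the function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that runs are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw n_sites column indices; the same column may be drawn repeatedly.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


# Toy data, invented for this example.
toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ATGTAC"}
replicate = bootstrap_replicate(toy, random.Random(1))
```

A tree search is then run on each pseudoreplicate, and the support for a clade is the proportion of pseudoreplicates whose trees contain it.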

-(ii) Maximum likelihood (ML)
--ModelTest v3.06~
The data and the MPT identified by successive-approximations weighting were evaluated under the Akaike information criterion (AIC) as calculated in ModelTest,
---identifying the GTR model of evolution.
--GARLI v0.951 using the GTR model (with six Γ rate categories), with a heuristic search yielding a log likelihood of -24,868.84927.
---Nonparametric bootstrap analyses consisted of 500 pseudoreplicates in GARLI under the same conditions as the ML search.
---A subsequent search in PAUP* using the most likely tree identified by the GARLI searches as the starting tree and employing TBR branch-swapping and the GTR+I+Γ model (with six Γ rate categories) resulted in exactly the same topology and likelihood score.
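The AIC used for model selection is computed as AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log likelihood; the model with the lowest AIC is preferred. A minimal sketch follows, with invented placeholder numbers that are not ModelTest output and with hypothetical function names.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """candidates: dict mapping model name -> (ln L, number of free parameters)."""
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Placeholder values for illustration only.
candidates = {
    "model_A": (-1000.0, 5),   # AIC = 10 + 2000 = 2010
    "model_B": (-990.0, 10),   # AIC = 20 + 1980 = 2000
}
print(best_by_aic(candidates))
```

Here `model_B` is selected: its five extra parameters cost 10 AIC units, but the gain in log likelihood is worth 20.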

-(iii) Bayesian nucleotide-model Markov Chain Monte Carlo (MCMC):~
MrBayes v3.1.2 (59). 
--Burn-in and run convergence were assessed by comparing the mean and variance of log likelihoods, both by eye and by using
---the program Tracer v1.3
---the MrBayes ".stat" output file
---the MrBayes split frequencies diagnostic.
--Eight character partitions for nucleotide-model analyses:
---four partitions consisting of the combined first and second codon positions for each of the four genes
---four partitions consisting of the third codon position for each of the four genes.
---based on ModelTest results~
the wingless third-position - GTR model~
opsin and EF1α-F2 third positions - separately assigned the HKY+I+Γ model~
all other character partitions - separately assigned the GTR+I+Γ model
-(iv) Bayesian codon-model MCMC
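The eight-partition scheme of the nucleotide-model analysis (first plus second codon positions, and third codon positions, for each of the four genes) can be sketched as site lists. This assumes the gene order and lengths from the Data section and, unless an offset is supplied, that every gene starts at a first codon position. Both are assumptions for illustration and should be checked against the real alignment; `codon_partitions` and `frame_offsets` are hypothetical names.

```python
GENES = [("EF1a-F1", 1075), ("EF1a-F2", 517), ("wingless", 409), ("opsin", 458)]


def codon_partitions(genes, frame_offsets=None):
    """Return {partition name: list of 1-based site numbers}.

    frame_offsets maps a gene name to the codon position (0, 1 or 2,
    zero-based) of that gene's first site; genes not listed default to 0.
    """
    frame_offsets = frame_offsets or {}
    partitions = {}
    start = 1
    for name, length in genes:
        offset = frame_offsets.get(name, 0)
        pos12, pos3 = [], []
        for i in range(length):
            codon_pos = (i + offset) % 3  # 0, 1 -> first/second; 2 -> third
            (pos3 if codon_pos == 2 else pos12).append(start + i)
        partitions[name + "_pos12"] = pos12
        partitions[name + "_pos3"] = pos3
        start += length
    return partitions


parts = codon_partitions(GENES)
for name, sites in parts.items():
    print(name, len(sites))
```

Each site falls into exactly one of the eight partitions, so the partition sizes again sum to 2,459.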

**Phylogenetic Mapping of Agricultural Systems. [#m346c506]
//-Terminal taxa were assigned states for a single six-state character representing the four attine agricultural systems and leaf-cutter agriculture (i.e., no agriculture, lower agriculture, yeast agriculture, higher agriculture, leaf-cutter agriculture, coral-fungus agriculture).
//-Five species (Myrmicocrypta n. sp. Brazil, Mycetagroicus triangularis, Cyphomyrmex n. sp., Cyphomyrmex morschi, Trachymyrmex irmgardae, and Pseudoatta n. sp.) received "unknown" (i.e., "?") state assignments, and Trachymyrmex papulatus received a "lower agriculture" state assignment based on a single garden collection from Argentina (a second colony from the same locality cultivated a typical higher attine garden).
//-Character evolution was optimized onto the Bayesian codon-model consensus tree (with branch lengths) under both parsimony using MacClade and maximum likelihood using the StochChar module provided in the Mesquite package.
//-Under parsimony, ancestral-state optimizations were unambiguous. 
//-Under the Markov k-state 1-parameter model, the likelihood that each agricultural system arose in the most recent common ancestor of the corresponding ant clade was, as a proportion of the total probability (= 1.0) distributed across the six character states, 0.9831 for lower agriculture, 0.9995 for yeast agriculture, 0.9905 for higher agriculture, 0.9924 for leaf-cutter agriculture, and 0.9998 for coral-fungus agriculture.

**Divergence Dating [#gae0451f]
//-We inferred divergence dates using both semiparametric and Bayesian relaxed clock methods.
//-The first method used was the semiparametric penalized likelihood approach implemented in r8s v1.7 (64, 65). 
//--Branch lengths were first estimated on the ML topology using PAUP* under a GTR+I+Γ model.
//--The Pogonomyrmex and two Myrmica species were used to root the tree during branch length estimation and were subsequently removed from all dating analyses. 
//--Thus, the root of the tree for all dating analyses represents the origin of the "core myrmicines," a well supported clade established by previous work (33).
//--Smoothing parameters were estimated by using the cross-validation feature in r8s. 
//--Confidence intervals were calculated by using 100 nonparametric bootstrap replicates of the dataset generated by Mesquite, followed by reestimation of branch lengths and divergence times for each replicate.
//-We calibrated three nodes with minimum-age constraints using attine Dominican amber fossils. 
//--These fossils are:
//---(i) Apterostigma electropilosum, a member of the A. pilosum group 
//---(ii) Cyphomyrmex maya and Cyphomyrmex taino, both members of the C. rimosus group 
//---(iii) Trachymyrmex primaevus, a fossil of uncertain placement within the genus (but see below).