*Exercise 1: Trying out a Unix-like environment with Cygwin [#ob72433f]
Unlike the Macintosh version of PAUP* or Mesquite, much phylogenetic analysis software does not come with an easy-to-use graphical interface. Many programs are also distributed as source code and must be compiled for your own environment before they can be used. To operate such software and get it running on your own computer, you need to be comfortable with the Unix command line. In this lecture we use Cygwin, a Unix environment emulator that runs on Windows, to learn command-line operation and to run r8s, PAML, and other programs.
#contents

**Installation of Cygwin [#mef1baf8]
-Download Cygwin from http://cygwin.com/
-Double-click the Setup icon on your desktop to install.
--Cygwin will be installed under c:\cygwin.
--When the package selection list appears, select the following packages under Devel (development tools) and Database.
~Under Devel, choose:
 make
 gcc       #C compiler. Clicking one package selects all related packages as well.
 gcc-g77
//&ref(./WS000000.JPG,50%);
~Under Database, choose sqlite3.
-When the installation finishes, click the Cygwin icon on the desktop or in the Start menu. A command-line window opens.
-Starting Cygwin once creates a directory named /home/(your user name) under c:\cygwin. In the examples below the user name is user1, so the directory is c:\cygwin/home/user1.
-You now have a Unix-like environment running on a Windows PC.
--[[See Tutorials for Unix/Linux of Marine Biological Laboratory>http://workshop.molecularevolution.org/mbl/resources/computing.php]]


-Download r8s from http://loco.biosci.arizona.edu/r8s/~
If an archive extraction tool such as eo is installed, the file is unpacked automatically and a folder named r8s1.71 is created on your desktop.
-Move this folder to c:\cygwin/home/user1.


**Installation of r8s and analysis of a sample file [#kb0c2d51]
-Start Cygwin and enter the following commands in order. Each command is the text after $.
 $ pwd
 /home/user1	#Your current directory
 
 $ ls		#List the contents of the current directory
 r8s1.71  sample
 
 
 $ cd r8s1.71	#Move to the directory "r8s1.71"
 
 $ ls
 bin  doc  sample  src
 
 $ cd src	#Move to the directory "src"
 
 $ make		#Compile. If it succeeds, r8s.exe is created in the src directory.
 
 $ mv r8s.exe /usr/local/bin	#Move the executable file to /usr/local/bin.
 				#Files placed there can be run by name alone from any directory.
 $ ls /usr/local/bin		#Confirm that the file was moved.
 r8s.exe
 
 $ r8s -v			#Startup test. The version is displayed.
 r8s version 1.71 (compiled Dec 12 2007)
 r8s version 1.71 (Dec 12 2007)
 
 r8s>quit			#End the startup test and quit r8s.
 
 $ cd /home/user1/r8s1.71/sample    #Move to the directory containing the sample files
 
 $ ls
 SAMPLE_1.7          SAMPLE_CROSSVAL  SAMPLE_LOCAL_CLOCK  SAMPLE_SUPERTREE
 SAMPLE_CONSTRAINTS  SAMPLE_FLU       SAMPLE_SIMPLE
 
 $ r8s -f SAMPLE_SIMPLE                      #Analyze one of the sample files. The result is displayed right away.
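
Running r8s by name alone works because /usr/local/bin is on the shell's search path. The sketch below is one way to check this for yourself. It assumes a POSIX-style shell such as the Cygwin default, and dir_on_path is a hypothetical helper name introduced only for illustration.

```shell
# Return success (0) if directory $1 appears in the colon-separated list $2.
# If $2 is omitted, the current $PATH is used.
# dir_on_path is a hypothetical helper name, not a standard command.
dir_on_path() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_on_path /usr/local/bin; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is NOT on PATH"
fi

# Show which file the shell would run for the name r8s, if any.
command -v r8s || echo "r8s not found on PATH"
```

If the directory is reported as missing, the executable can still be run by giving its full path, for example /usr/local/bin/r8s.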

**Installation of PAML and Multidivtime to /usr/local/bin/ [#of7b523a]
These programs can also be run from the Windows command prompt, but setting the path can be tedious, and the Cygwin shell is often more convenient for entering commands and running analyses. We therefore collect the executable (.exe) files in c:\cygwin/usr/local/bin, which is already on the path.
//(Note: For analyses that require a large number of iterations, running under Cygwin may be slower. In that case it may be better to run them from the command prompt.)

***Downloading PAML and checking that it works [#c17db6c1]
-Download PAML from http://abacus.gene.ucl.ac.uk/software/paml.html~
If a tool such as eo is installed, a folder named paml4 is unpacked automatically. Move all nine of the following .exe files from its bin folder to c:\cygwin/usr/local/bin.
 baseml.exe
 basemlg.exe
 chi2.exe
 codeml.exe
 evolver.exe
 mcmctree.exe
 pamp.exe
 TreeTimeJeff.exe
 yn00.exe
-Confirm in Cygwin with ls that the files were moved. The output should look like this:
 $ ls /usr/local/bin
     TreeTimeJeff.exe  basemlg.exe  codeml.exe   mcmctree.exe  r8s.exe
     baseml.exe        chi2.exe     evolver.exe  pamp.exe      yn00.exe
-Copy the paml4 folder to c:\cygwin/home/user1/. The doc, bin, src, and Technical directories are not needed and can be deleted.
 $ ls /home/user1
    paml4
-Enter the following in Cygwin to check that codeml works.
 $ cd /home/user1/paml4
 $ codeml   #The analysis results are displayed on screen


***Downloading Multidivtime and installing it to the bin directory [#c4067225]
-Download multidivtime from http://statgen.ncsu.edu/thorne/multidivtime.html~
A folder named multidistribute is created on the desktop. Open the folder, sort by file type, and move the following four files to c:\cygwin/usr/local/bin.
 estbranches_aa.exe
 estbranches_dna.exe
 multidivtime.exe
 paml2modelinf.exe
-Confirm in Cygwin. The output should look like this:
  $ ls /usr/local/bin
     TreeTimeJeff.exe  chi2.exe            estbranches_dna.exe  multidivtime.exe   r8s.exe
     baseml.exe        codeml.exe          evolver.exe          paml2modelinf.exe  yn00.exe
     basemlg.exe       estbranches_aa.exe  mcmctree.exe         pamp.exe
-Copy the multidistribute folder, which contains the sample files, to /home/user1.~
As with PAML, the multidivtime executables can then be run from inside the folder that contains the sample files.
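
Instead of reading the ls listing by eye, the installed programs can be checked in one pass. The following is a minimal sketch: missing_tools is a hypothetical helper name, and the program names are taken from the lists above, written without the .exe suffix as in the r8s and codeml examples.

```shell
# Print each name from the argument list that cannot be found on PATH.
# Prints nothing when every tool is found.
# missing_tools is a hypothetical helper, not part of any package.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# A few of the programs installed in the steps above.
missing_tools r8s baseml codeml yn00 estbranches_dna multidivtime
```

Any name that is printed was not found, which usually means the file was not moved to /usr/local/bin.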

**Other useful software under the Cygwin environment [#xb512a1a]
-MrBayes http://mrbayes.scs.fsu.edu/
-MrModeltest  http://www.abc.se/~nylander/

As with the other software described above, after downloading from these sites, move the executable files to c:\cygwin/usr/local/bin/. It is a good idea to always keep the data files for analyses that require Unix-style operation in your home directory (/home/user1, or your own user name). Creating a subdirectory for each analysis makes the data easier to organize.
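
One possible way to keep one subdirectory per analysis is sketched below. The helper name new_analysis_dir, the default folder name "analyses", and the example directory names are all assumptions made for illustration; use whatever layout suits your own work.

```shell
# Create a per-analysis subdirectory and print its path.
#   $1 = analysis name
#   $2 = base directory (optional; defaults to $HOME/analyses)
# new_analysis_dir and the "analyses" folder name are illustrative choices.
new_analysis_dir() {
    base="${2:-$HOME/analyses}"
    mkdir -p "$base/$1" && echo "$base/$1"
}

# Example calls (directory names are made up for illustration):
# new_analysis_dir r8s_sample
# new_analysis_dir paml_codeml_test
```

Because mkdir -p is used, calling the function again for an existing analysis does no harm.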



**Major evolutionary transitions in ant agriculture [#h0c5f655]
-Ted R. Schultz and Sean G. Brady. 2008~
[[PNAS. 105(14): 5435-5440.>http://www.pnas.org/content/105/14/5435.full]]


***Data [#tbd7b358]
-2,459 aligned nucleotide sites from the coding regions of four nuclear genes:
--elongation factor 1α-F1 (EF1α-F1) (1,075 bp)
--elongation factor 1α-F2 (EF1α-F2) (517 bp)
--wingless (409 bp)
--long-wavelength rhodopsin (opsin) (458 bp)
-All data in this study represent protein-coding (exon) sequences.~
Intervening introns in opsin and EF1α-F1 were not used because they could not be aligned confidently.
-Sample: 65 attine taxa and 26 nonattine outgroups.~
Primers used for PCR amplification and sequencing are found in supporting information (SI) Table S1.
-Of the total 2,459 included nucleotide positions from all genes, 952 were variable and 847 parsimony informative. Sequences are deposited in GenBank; taxa and accession numbers are listed in Table S2.
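
As a quick consistency check, the four gene lengths listed above can be summed in the shell and compared with the stated total of 2,459 sites. sum_bp is a hypothetical helper name used only for this sketch.

```shell
# Sum the integer arguments and print the total.
# sum_bp is an illustrative name, not an existing command.
sum_bp() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# Lengths in bp of the two elongation factor fragments, wingless, and opsin, as listed above.
sum_bp 1075 517 409 458
```

The command prints 2459, in agreement with the total given in the text.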

***Phylogenetic Analyses [#f15b24f4]
-(i) Maximum parsimony (MP) analyses
--PAUP* v4.0b10
---heuristic searches with tree bisection-reconnection (TBR) and 1,000 random-taxon-addition replicates. ~
Analyses identified 12 most-parsimonious trees (MPTs) of length = 4,383, CI = 0.270, RI = 0.704. Successive-approximations-weighting analyses identified a single tree, one of the MPTs.
---Nonparametric bootstrap analyses used TBR branch-swapping and consisted of 1,000 pseudoreplicates, with 10 random-taxon-addition replicates per pseudoreplicate. 

-(ii) Maximum likelihood (ML)
--ModelTest v3.06~
The data and the MPT identified by successive-approximations weighting were evaluated under the Akaike information criterion (AIC) as calculated in ModelTest,
---identifying the GTR model of evolution. 
--GARLI v0.951 using the GTR model (with six Γ rate categories), with a heuristic log likelihood of -24,868.84927. 
---Nonparametric bootstrap analyses consisted of 500 pseudoreplicates in GARLI under the same conditions as the ML search.
---A subsequent search in PAUP* using the most likely tree identified by the GARLI searches as the starting tree and employing TBR branch-swapping and the GTR+I+Γ model (with six Γ rate categories) resulted in exactly the same topology and likelihood score. 

-(iii) Bayesian nucleotide-model Markov Chain Monte Carlo (MCMC):~
MrBayes v3.1.2 (59). 
--Burn-in and run convergence were assessed by comparing the mean and variance of log likelihoods, both by eye and by using:
---Tracer v1.3
---the MrBayes ".stat" output file
---the MrBayes split frequencies diagnostic.
--Eight character partitions for nucleotide-model analyses:
---four partitions consisting of the combined first and second codon positions for each of the four genes
---four partitions consisting of the third codon position for each of the four genes.
---based on ModelTest results:~
the wingless third position - GTR model~
opsin and EF1α-F2 third positions - separately assigned the HKY+I+Γ model~
all other character partitions - separately assigned the GTR+I+Γ model
-(iv) Bayesian codon-model MCMC

**Phylogenetic Mapping of Agricultural Systems. [#s89bfd51]
//-Terminal taxa were assigned states for a single six-state character representing the four attine agricultural systems and leaf-cutter agriculture (i.e., no agriculture, lower agriculture, yeast agriculture, higher agriculture, leaf-cutter agriculture, coral-fungus agriculture).
//-Five species (Myrmicocrypta n. sp. Brazil, Mycetagroicus triangularis, Cyphomyrmex n. sp., Cyphomyrmex morschi, Trachymyrmex irmgardae, and Pseudoatta n. sp.) received "unknown" (i.e., "?") state assignments, and Trachymyrmex papulatus received a "lower agriculture" state assignment based on a single garden collection from Argentina (a second colony from the same locality cultivated a typical higher attine garden). 
//-Character evolution was optimized onto the Bayesian codon-model consensus tree (with branch lengths) under both parsimony using MacClade and maximum likelihood using the StochChar module provided in the Mesquite package.
//-Under parsimony, ancestral-state optimizations were unambiguous. 
//-Under the Markov k-state 1-parameter model, the likelihood that each agricultural system arose in the most recent common ancestor of the corresponding ant clade was, as a proportion of the total probability (= 1.0) distributed across the six character states, 0.9831 for lower agriculture, 0.9995 for yeast agriculture, 0.9905 for higher agriculture, 0.9924 for leaf-cutter agriculture, and 0.9998 for coral-fungus agriculture.

**Divergence Dating [#b0e583dd]
//-We inferred divergence dates using both semiparametric and Bayesian relaxed clock methods.
//-The first method used was the semiparametric penalized likelihood approach implemented in r8s v1.7 (64, 65). 
//--Branch lengths were first estimated on the ML topology using PAUP* under a GTR+I+Γ model. 
//--The Pogonomyrmex and two Myrmica species were used to root the tree during branch length estimation and were subsequently removed from all dating analyses. 
//--Thus, the root of the tree for all dating analyses represents the origin of the "core myrmicines," a well supported clade established by previous work (33). 
//--Smoothing parameters were estimated by using the cross-validation feature in r8s. 
//--Confidence intervals were calculated by using 100 nonparametric bootstrap replicates of the dataset generated by Mesquite, followed by reestimation of branch lengths and divergence times for each replicate.
//-We calibrated three nodes with minimum-age constraints using attine Dominican amber fossils. 
//--These fossils are:
//---(i) Apterostigma electropilosum, a member of the A. pilosum group 
//---(ii) Cyphomyrmex maya and Cyphomyrmex taino, both members of the C. rimosus group 
//---(iii) Trachymyrmex primaevus, a fossil of uncertain placement within the genus (but see below).