Conservation Genetics of Mangroves/Seminars/Data Analysis

Data Analysis

DNA sequence

1. Open the sequence data with Sequence Scanner Software v1.0. You can view and edit your sequence data with this software. It also shows the quality of each base call in graphical reports.

2. BioEdit is an excellent alternative for editing ABI sequence data. It can edit, align, and analyze sequences, including the electropherograms generated by an ABI sequencer. The reverse-complement view of the electropherogram is very useful when you compare the data from both strands. To use it,
- 1. Open the ABI sequence data file (.ab1) with BioEdit. You will see the electropherogram.
- 2. Menu Bar > View > Reverse Complement: You will see the reverse-complemented electropherogram.
- Mac users may want to use Se-Al (http://iubio.bio.indiana.edu/soft/iubionew/molbio/dna/analysis/Pist/main.html) or Mesquite (http://mesquiteproject.org/mesquite/mesquite.html) to edit sequence data.

When you have finished editing the data in BioEdit, run a multiple alignment with
Menu Bar > Accessory Application > ClustalW Multiple Alignment.
Save the alignment as a FASTA-formatted file.
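The reverse-complement view mentioned above can also be reproduced on a plain sequence string when you want to check by hand that the reads from the two strands agree. The following is a minimal Python sketch, not part of BioEdit; the function name `reverse_complement` and the example read are hypothetical, and only A, C, G, T, N and the gap character are handled.

```python
# Complement table for unambiguous bases, N, and the gap character.
# IUPAC ambiguity codes other than N are not handled in this sketch.
COMPLEMENT = str.maketrans("ACGTNacgtn-", "TGCANtgcan-")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    # Complement every base, then reverse the whole string.
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    forward_read = "ATGCAAGTCGAGCG"  # hypothetical example read
    print(reverse_complement(forward_read))
```

Applying the function twice returns the original sequence, which is a quick sanity check.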

3. Notepad++ is a free text editor and Notepad replacement that supports several languages. It runs in the MS Windows environment and is distributed under the GPL license. Regular expressions can be used in find/replace. See the explanation of regular expressions at http://www.scintilla.org/SciTERegEx.html . This is an extremely useful way to find and replace text using something like "wild cards." For example, FASTA format data obtained from GenBank,

>gi|125634588|gb|EF054876.1| Sinorhizobium terangae strain ORS 3520 16S ribosomal..
-------------TGCGCGGCTACC--ATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
>gi|110002849|emb|AM181743.1| Sinorhizobium terangae partial 16S rRNA gene, strain..
-----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
>gi|110002847|emb|AM181741.1| Sinorhizobium terangae partial 16S rRNA gene, strain..
-----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
>gi|13872799|emb|AJ295074.1| Sinorhizobium terangae 16S rRNA gene, strain ORS1082
-----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA

can be rewritten in Notepad++ using the following find/replace strings with the regular expression option turned ON.

find string: ^>gi\|[0-9]+\|[^|]+\|([A-Z0-9]+)[^\|]+\| ([A-Z])[a-zA-Z]+ ([a-z][a-z][a-z][a-z]).*
replace string: >\2_\3\1

This makes each sample name short and unique (which is required by some software):

>S_teraEF054876
-------------TGCGCGGCTACC--ATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
>S_teraAM181743
-----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
>S_teraAM181741
-----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
>S_teraAJ295074
-----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
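The same find/replace can be scripted when many files need renaming. The sketch below applies the find and replace strings shown above using Python's `re` module; the function name `shorten_header` is hypothetical, and `[a-z]{4}` is simply a shorter way of writing the four `[a-z]` classes. Lines that do not match the pattern (the sequence lines) are returned unchanged.

```python
import re

# Same pattern as the Notepad++ find string above:
# group 1 = accession without version, group 2 = genus initial,
# group 3 = first four letters of the species epithet.
HEADER = re.compile(
    r"^>gi\|[0-9]+\|[^|]+\|([A-Z0-9]+)[^|]+\| ([A-Z])[a-zA-Z]+ ([a-z]{4}).*$"
)


def shorten_header(line):
    """Rewrite a GenBank-style FASTA header; leave other lines untouched."""
    return HEADER.sub(r">\2_\3\1", line)


if __name__ == "__main__":
    example = [
        ">gi|125634588|gb|EF054876.1| Sinorhizobium terangae strain ORS 3520 16S ribosomal..",
        "-------------TGCGCGGCTACC--ATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA",
    ]
    for line in example:
        print(shorten_header(line))
```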

4. DnaSP is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. It can estimate DNA sequence variation within and between populations, linkage disequilibrium, gene flow, and so on. In this training course, you will use it to obtain the list of haplotypes from the chloroplast data.

Connecting sequence data using a text editor and Excel
- Replace the delimiter with a tab
- Copy and paste into Excel
- Sort and join the two sets of sequence data
- Some Excel functions are useful
```
left
exact
concatenate
```
- Copy and paste back into the text editor
- Replace each tab with a line break


Example: workflow for Xylocarpus data

Koji has kindly provided us with data for Xylocarpus:

XylocarpusKT_trnD-trnT.fas
XylocarpusKT_accD-psaI.fas

Open these files with BioEdit

Menu > File > Export > tab delimited text file

Open the file with Excel (or open it with a text editor and copy and paste it into Excel).
Do the same for the other file.
In Excel, paste the two data sets side by side in one spreadsheet, so that each data set occupies two columns: sample name and sequence.
- Sort each pair of columns by sample name
- Join the two sequences with the CONCATENATE function, for example
```
=CONCATENATE(C1,F1)
```
- Save the two columns (sample name and concatenated sequence) to a tab-delimited file
In BioEdit, Menu Bar > File > New Alignment
- Import the text data
```
Menu Bar > File > Import Data > from tab-delimited file
```
- If your data are not aligned, run a complete multiple alignment,
```
Menu Bar > Accessory Application > ClustalW Multiple Alignment
```
- When the data are aligned, save them as a FASTA-formatted file

In DnaSP, open the FASTA-formatted file,

Menu Bar > Generate > Haplotype data file

Now you have the haplotype data with sample names.
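To see what a haplotype list contains, the idea can be sketched in Python: samples whose aligned sequences are identical are assigned to the same haplotype. This is a simplified illustration, not DnaSP's algorithm. It assumes an already aligned data set held as a dict of sample name to sequence, treats gaps as ordinary characters, and uses a hypothetical function name `group_haplotypes` and hypothetical `Hap_N` labels; DnaSP has its own options for gaps and missing data, so its output may differ.

```python
from collections import defaultdict


def group_haplotypes(alignment):
    """Group sample names by identical aligned sequence.

    alignment: dict mapping sample name -> aligned sequence.
    Returns a dict mapping a haplotype label -> list of sample names,
    numbered in order of first appearance.
    """
    by_sequence = defaultdict(list)
    for name, seq in alignment.items():
        # Compare case-insensitively; gaps count as characters here.
        by_sequence[seq.upper()].append(name)
    return {
        "Hap_%d" % i: names
        for i, names in enumerate(by_sequence.values(), start=1)
    }


if __name__ == "__main__":
    aligned = {  # hypothetical aligned data
        "sample1": "ACGT-A",
        "sample2": "ACGTTA",
        "sample3": "acgt-a",
    }
    for label, names in group_haplotypes(aligned).items():
        print(label, ", ".join(names))
```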


Analyses of population differentiation

1. GenAlEx http://www.anu.edu.au/BoZo/GenAlEx/
This is an Excel add-in for analyzing population genetic data.
- GenAlEx 6.3 is now available: http://www.anu.edu.au/BoZo/GenAlEx/new_version.php
- Download and documentation: http://www.anu.edu.au/BoZo/GenAlEx/new_version.php
