Data Analysis

DNA sequence

  • 1. Open the sequence data with Sequence Scanner Software v1.0. You can view and edit your sequence data with this software. It also shows the quality of each base call in graphical reports.
  • 2. An excellent alternative for editing ABI sequence data is BioEdit, which can edit, align, and analyze sequences. You can use it to edit sequence data (electropherograms) generated by an ABI sequencer. The reverse-complement view of the electropherogram is very useful when you compare the data from both strands. To do this,

When you finish editing the data in BioEdit, run a multiple alignment via
Menu Bar > Accessory Application > ClustalW Multiple Alignment.
Save the alignment as a FASTA-formatted file.

  • 3. Notepad++ is a free text editor and Notepad replacement that supports several languages. It runs in the MS Windows environment, and its use is governed by the GPL. Regular expressions can be used in find/replace; see the explanation of regular expressions at http://www.scintilla.org/SciTERegEx.html . This is an extremely useful way to find and replace text using something like "wild cards." For example, FASTA format data obtained from GenBank,
    >gi|125634588|gb|EF054876.1| Sinorhizobium terangae strain ORS 3520 16S ribosomal..
    -------------TGCGCGGCTACC--ATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
    >gi|110002849|emb|AM181743.1| Sinorhizobium terangae partial 16S rRNA gene, strain..
    -----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
    >gi|110002847|emb|AM181741.1| Sinorhizobium terangae partial 16S rRNA gene, strain..
    -----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
    >gi|13872799|emb|AJ295074.1| Sinorhizobium terangae 16S rRNA gene, strain ORS1082
    -----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
    can be rewritten in Notepad++ using the following find/replace strings, with regular expression search turned ON.
    find string: ^>gi\|[0-9]+\|[^|]+\|([A-Z0-9]+)[^\|]+\| ([A-Z])[a-zA-Z]+ ([a-z][a-z][a-z][a-z]).*
    replace string: >\2_\3\1
    This makes each sample name short and unique (which some software requires):
    >S_teraEF054876
    -------------TGCGCGGCTACC--ATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
    >S_teraAM181743
    -----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
    >S_teraAM181741
    -----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
    >S_teraAJ295074
    -----------GCGGCAGGCTTAACACATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA
  • 4. DnaSP is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DNA sequence variation within and between populations, linkage disequilibrium, gene flow, etc. can be estimated with this excellent software. In this training course, you can use it to obtain the list of haplotypes from chloroplast data.
  • Connecting sequence data using a text editor and Excel
    • Replace the delimiter with a tab
    • Copy and paste into Excel
    • Sort and connect the two sets of sequence data
    • Some functions are useful:
      LEFT
      EXACT
      CONCATENATE
    • Copy and paste back into the text editor
    • Replace each tab with a return
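
The header-shortening find/replace from item 3 can also be sketched in Python, which may help when many files need the same treatment. This is only an illustration, not part of the course material: the function names shorten_header and shorten_fasta are hypothetical, and the pattern is the Notepad++ find string carried over to Python's re module (with the four [a-z] classes written as [a-z]{4}).

```python
import re

# Same pattern as the Notepad++ find string, adapted for Python's re module.
# Group 1: accession without version, group 2: first letter of the genus,
# group 3: first four letters of the species name.
HEADER = re.compile(
    r"^>gi\|[0-9]+\|[^|]+\|([A-Z0-9]+)[^|]+\| ([A-Z])[a-zA-Z]+ ([a-z]{4}).*"
)


def shorten_header(line: str) -> str:
    """Return a shortened FASTA header; non-matching lines are returned unchanged."""
    return HEADER.sub(r">\2_\3\1", line)


def shorten_fasta(text: str) -> str:
    """Apply shorten_header to every line of a FASTA-formatted string."""
    return "\n".join(shorten_header(line) for line in text.splitlines())


example = (
    ">gi|125634588|gb|EF054876.1| Sinorhizobium terangae strain ORS 3520 16S ribosomal..\n"
    "-------------TGCGCGGCTACC--ATGCAAGTCGAGCGCGTAGCAATACGAGCGGCA"
)
print(shorten_fasta(example))
```

Sequence lines do not start with ">gi|", so they pass through untouched; only header lines in the GenBank style shown above are rewritten.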

Example: workflow for Xylocarpus data

  • Koji has kindly provided us data for Xylocarpus
    XylocarpusKT_trnD-trnT.fas
    XylocarpusKT_accD-psaI.fas
  • Open these files in BioEdit
    Menu > File > Export > tab delimited text file
  • Open the exported file with Excel (or open it with a text editor and copy and paste it into Excel)
  • Do the same for the other file
  • In Excel, paste the two data sets side by side in one spreadsheet. Each pair of columns contains the sample names and the sequence data.
    • Sort each pair of columns by sample name
    • Connect the two sequences with the CONCATENATE function, for example
      =CONCATENATE(C1,F1)
    • Save the two columns of sample names and concatenated sequences to a tab-delimited file
  • In BioEdit, Menu Bar > File > New Alignment
    • Import the text data
      Menu Bar > File > Import Data > from tab-delimited file
    • If your data are not aligned, run a complete multiple alignment,
      Menu Bar > Accessory Application > ClustalW Multiple Alignment
    • When the data are aligned, save them as a FASTA-formatted file
  • In DnaSP, open the FASTA-formatted file,
    Menu Bar > Generate > Haplotype data file
  • Now you have the haplotype data with sample names
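
The sort-and-concatenate steps of this workflow can be sketched in Python as a cross-check on the Excel result. This is a minimal sketch under stated assumptions, not the course's method: all function names are hypothetical, the sample names and sequences in the example are made up for illustration (they are not the Xylocarpus data), and the haplotype grouping simply compares whole sequence strings, so it may not match how DnaSP treats gaps or missing data.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {sample name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the first word as the name
            records[name] = ""
        elif name is not None:
            records[name] += line  # join sequences wrapped over several lines
    return records


def concatenate(first, second):
    """Join the two regions for samples found in both, sorted by sample name."""
    shared = sorted(set(first) & set(second))
    return {name: first[name] + second[name] for name in shared}


def haplotypes(records):
    """Group sample names that carry exactly the same sequence string."""
    groups = {}
    for name, seq in records.items():
        groups.setdefault(seq.upper(), []).append(name)
    return groups


def to_fasta(records):
    """Write the records back out as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Made-up toy alignments standing in for the two exported regions.
region_a = ">X01\nACGT\n>X02\nACGA\n>X03\nACGT\n"
region_b = ">X02\nTTG\n>X01\nTTG\n>X03\nTTG\n"

joined = concatenate(read_fasta(region_a), read_fasta(region_b))
print(to_fasta(joined))
for seq, names in haplotypes(joined).items():
    print(seq, names)
```

Matching by sample name replaces the manual sort in Excel, so the order of records in the two files does not matter. Samples missing from either file are dropped here, which is worth checking against the original sample list.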

Analyses of population differentiation


Last-modified: 2015-05-11 (Mon) 04:44:00