Mapping with TopHat2 (RNA-seq)

Building the index
Basic syntax
Option settings
References

Building the index

TopHat2 aligns reads with Bowtie internally (Bowtie 2 by default), so first build a genome index with bowtie2-build.

bowtie2-build -f [genome FASTA file] [output index prefix]
bowtie2-build -f TAIR10_genome.fasta TAIR10 #example

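Note: if you plan to use the -G option described below, make sure the chromosome names in the genome FASTA match those in the annotation file.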
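Before mapping, it can help to confirm that the index was actually built. The sketch below assumes a small (non-large) Bowtie 2 index, for which bowtie2-build writes six files ending in .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2. The function name index_complete is a hypothetical helper, not part of Bowtie 2.

```shell
# Sketch: succeed only if all six small-index files exist for a prefix.
# index_complete is a hypothetical helper name, not a Bowtie 2 command.
index_complete() {
  for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    # -s is true only for files that exist and are not empty
    [ -s "$1.$suffix" ] || return 1
  done
  return 0
}

# Usage (commented out so the sketch does nothing on its own):
# index_complete TAIR10 || bowtie2-build -f TAIR10_genome.fasta TAIR10
```

Very large genomes produce .bt2l files instead, so the suffix list would need adjusting in that case.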

Basic syntax

tophat2 [option] [index] input.fastq

Option settings

The options below are a selection of the ones you are most likely to use.

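option	description
-N	Number of mismatches allowed per read; default is 2
-o	Output directory
-i	Minimum intron length; default is 70
-I	Maximum intron length; default is 500000
-p	Number of threads; default is 1
-g	Maximum number of multihits allowed per read
-G/--GTF	Map using the gene annotations in a GTF 2.2 or GFF3 file
-j/--raw-juncs	Map using the junctions listed in a supplied junctions file
--no-novel-juncs	Use only the junctions supplied with -G or -j; do not search for novel ones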
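As an illustration of how these options combine, the following sketch assembles a single command line and prints it. Every value here is a placeholder rather than a recommendation: the thread count, the output directory name and the annotation file name annotation.gtf are assumptions made for the example.

```shell
# Sketch: assemble a tophat2 command line from a few of the options above.
# All values and file names are placeholders, not recommendations.
THREADS=4
OUT_DIR=tophat_out
INDEX=TAIR10
READS=input.fastq
GTF=annotation.gtf   # hypothetical annotation file name

CMD="tophat2 -p $THREADS -o $OUT_DIR -G $GTF --no-novel-juncs $INDEX $READS"

# Print the command; run it only if RUN=1 is set and tophat2 is installed.
echo "$CMD"
if [ "${RUN:-0}" = "1" ] && command -v tophat2 >/dev/null 2>&1; then
  $CMD
fi
```

Printing the command first makes it easy to review the options before committing to a long mapping run.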

-G/--GTF
TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output
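In other words, reads are first aligned to the transcriptome derived from the specified file; only reads that do not fully map there are then aligned to the genome, where novel splice junctions can also be found.
Source: TopHat Manual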

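Note that TopHat reports an error if the chromosome and contig names in the GTF/GFF file do not match those in the genome index.

If this error occurs, check the sequence names stored in the index with bowtie2-inspect --names your_index (bowtie-inspect for a Bowtie 1 index).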
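The comparison can also be automated. The sketch below lists sequence names that appear in the annotation but not in the index, assuming the index names have been saved one per line to a text file. The function name names_missing_from_index and the file names in the usage lines are hypothetical.

```shell
# Sketch: print sequence names used in a GTF/GFF3 file but absent from the index.
# names_missing_from_index is a hypothetical helper.
#   $1: file of index sequence names, one per line
#   $2: GTF or GFF3 annotation file
names_missing_from_index() {
  tmpdir=$(mktemp -d)
  # Keep only the first token of each index name, in case a description follows it.
  awk 'NF {print $1}' "$1" | LC_ALL=C sort -u > "$tmpdir/index"
  # Column 1 of GTF/GFF3 holds the sequence name; skip comments and blank lines.
  awk -F'\t' '!/^#/ && NF {print $1}' "$2" | LC_ALL=C sort -u > "$tmpdir/annot"
  # comm -13 prints lines that occur only in the second (annotation) list.
  LC_ALL=C comm -13 "$tmpdir/index" "$tmpdir/annot"
  rm -rf "$tmpdir"
}

# Usage (commented out; requires Bowtie 2 and real files):
# bowtie2-inspect --names TAIR10 > index_names.txt
# names_missing_from_index index_names.txt annotation.gtf
```

An empty result means every sequence name in the annotation is present in the index.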

References

TopHat Manual
