小编给大家分享一下perl如何提取miRNA前后500bp序列,相信大部分人都还不怎么了解,因此分享这篇文章给大家参考一下,希望大家阅读完这篇文章后大有收获,下面让我们一起去了解一下吧!

在没有位置信息时,提取miRNA前后500bp的序列

在提取miRNA前后500bp序列时,若没有其位置信息,提取会比较麻烦,可用以下方法提取:

1.首先利用blast软件将miRNA的前体序列与基因组比对,获取前体序列的位置信息,比对方法:

makeblastdb-inDonkey_Hic_genome.20180408.fa-dbtypenucl-titleDonkey_Hic_genome.20180408.fablastall-imiRNA_Pre.fa-dDonkey_Hic_genome.20180408.fa-pblastn-e0.01-b5-v5-m8-oblast.out#Donkey_Hic_genome.20180408.fa参考基因组染色体序列#miRNA_Pre.famiRNA的前体序列

由于序列可能较短,所以e-value值可调高一些。比对结果如下:

bta-miR-34cChr20100.0011300111339385825393857136e-57224bta-miR-370Chr07100.0010900110970327160703272681e-54216bta-miR-34bChr20100.0011200111239386420393863092e-56222bta-miR-383Chr27100.0011000111021958895219590044e-55218bta-miR-449aChr01100.001120011121036036651036035542e-56222bta-miR-378Chr09100.0011100111151049164510490549e-56220bta-miR-378Chr1396.883210679832631593326316243e-0656.0bta-miR-378Chr1393.943320629429900614299005822e-0450.1bta-miR-378Chr07100.002700719765072541650725151e-0554.0bta-miR-378Chr3096.552910699728926810289267822e-0450.1bta-miR-378Chr1893.943320629421721834217218662e-0450.1bta-miR-206Chr08100.0011200111278794660787947712e-56222bta-miR-1Chr15100.0010800110848711439487115465e-54214bta-miR-1Chr0788.896370329426712206267122682e-1069.9bta-let-7cChr18100.0011200111232412721324126102e-56222bta-let-7cChr2089.197480189129703021297030941e-1483.8

然后对比对结果筛选,尽量选择完全匹配,并且匹配长度最长的。筛选后的blast.out结果如下:

bta-miR-34cChr20100.0011300111339385825393857136e-57224bta-miR-370Chr07100.0010900110970327160703272681e-54216bta-miR-34bChr20100.0011200111239386420393863092e-56222bta-miR-383Chr27100.0011000111021958895219590044e-55218bta-miR-449aChr01100.001120011121036036651036035542e-56222bta-miR-378Chr09100.0011100111151049164510490549e-56220bta-miR-206Chr08100.0011200111278794660787947712e-56222bta-miR-1Chr15100.0010800110848711439487115465e-54214bta-let-7cChr18100.0011200111232412721324126102e-56222

3.接下来就可以提取序列啦,我有提供脚本哦,用法:

perl/share/work/wangq/script/lv/miRNA_500_fasta.pl-idblast.out-faDonkey_Hic_genome.20180408.fa-outresult.fa-miRmiRNA.fa-l500

-id 后跟筛选的blast结果;-fa后跟参考基因组染色体序列;-miR后跟miRNA的序列文件;-l后跟提取miRNA前后多长序列,默认500。

提取结果(miRNA序列大写标注,其余小写):

>bta-miR-34cacaaggcacagcatcacccgccgccctgctgggaggaggccgccaccctccgcggcgaactgccgacccgaagcgtttgagaggagaaagctgcgcttcgagactggatgcgtcagcatttcttccgcgcggcgcggcgagcgcgtggtccgcgtgcgagcaaaccccctgcaaacgcaggcgggctgatgcttctagctggagttaaacagttagctatcactaggagtaaaagcaagattgggcgatcccatgtaaggaaagcaaaatctggggcgtttagtagctttaattgtaaggtgtaaagacatttgaattgctgggggaagccccctgtgtaacgttcagagctgtgagtcactgtgcctattttccattgtttatcagggtactcaccaatccaccaactaaagtgagtaccacggagccagtgatctgcctgtcacgacgcatgggggcaccaacttgagactgaagtttgtgatgaaagtaaagcttttttgctgtgagtctagttactAGGCAGTGTAGTTAGCTGATTGctaataataccaatcactaaccacacggccaggtaaaaagatttgggaattcatccagatgagctgcgtgtgcacaccagtgggtttgggggcaagaaggggattggaataccctaatagtacgcattgcctgtttatccatagctcagccaagagagaaatcagcattttagctgctaaatatacaacatatgtagtaaatatacatttttaacatataatttttcaatattccttcaggcgcttaaccaaaaaaattttcagatatattgaggaatagaggttttctctcttagctcatttatgtcattgtgtaaacttgtgattattttgaactatctgtaagactgtattactattttgaaagaataaagtgccctaaagtcataaattgtggtcattcatgtgtccattgcctttcctaagttggctttatgatgtccttctcagcgcctgccacagtaattataactttcctatcgctaaaatggttaaattgttcctagattaacaaaatctcccgggaaggctattcacaagcactttttagtatttttctaagaacacgtta>bta-miR-370tgtttattgttcgtgtctcagagttgctctgaggccagtgtgctgggcacccagcaggccatctgggaggggagaaaggaagctagccatgtatgtcagtaggggtcagcgtggaggcctttggggtttctgggaaacggttcgaacatggagaatctcgtgatgtggacgcccccgagggcagccccatttcatgaactggatcccttagtcggatttctgttttccagggcacttgtttaagctttcatgccgtgtccaaaaaaaagagcagatcccaagagtttcctgcccgaggcccgtcttgccagaagccctcggcttggcctgagcatcgagatcattcctctaatcagcctgggggtggtctgacccgcggggtgggcggcgtggggcggggaggggcagggcgtgtgcggggcgggagctgctgggggagctggcggcggttcctttcaccctcgccgtggacccgcgggggcgtggctgtcctcggtctacaaatcgcgcaagtcggggcacaagacagagaggccaggtcacgtctctgcagttacacagctcatgagtGCCTGCTGGGGTGGAACCTGGTctgtctgtctgtctagcaccacagctcgggcgctgctgcagagggaacaaagatttgggtgagggcctcagagacgggctggggaaggggtttactcgggctgactttgacatgaaaacaatagctaattctgcttggggcctagtgctgtgactatcttacagatgggaaacaggcacagacaggttagttaactttcctgatgctttccagctcatcaggattggaggctggatttggaggcaggcggtgtggttcgagcatatgtgctctaaccattgggaaacactgcctcctgactgtgagcacacgtagagatggcacatggagtccaagaatctgggtttgagtccttttaccactgcctggtgggtgtactgtccagtcaatcatacatatttcattcttcctcttgacagagtctctgcagaggcccagcgagtcgttggcccgtgggaaatatttttttaagaatgcatgtgtgtgatagaattttatatctatcgaaggggagggg

脚本代码如下:

#!/usr/bin/perl-wusestrict;useGetopt::Long;useBio::SeqIO;useBio::Seq;my$version="1.3";my%opts;GetOptions(\%opts,"id=s","fa=s","miR=s","l=s","out=s","h");if(!defined($opts{out})||!defined($opts{fa})||!defined($opts{miR})||!defined($opts{id})||defined($opts{h})){print<<"UsageEnd.";Description:$version:lefseanalysisUsageForcedparameter:-idblast.out<infile>mustbegiven-outoutfile<outfile>mustbegiven-fagenomefastafile<infile>mustbegiven-miRmiRNAfastafile<infile>mustbegiven-llength<int>Otherparameter:-hHelpdocumentUsageEnd.exit;}my$len=$opts{l};$len||=500;my$in=Bio::SeqIO->new(-file=>"$opts{fa}",-format=>'Fasta');my%fasta;while(my$seq=$in->next_seq()){my($id,$sequence)=($seq->id,$seq->seq);$fasta{$id}=$sequence;}my$ina=Bio::SeqIO->new(-file=>"$opts{miR}",-format=>'Fasta');my%mi;while(my$seq=$ina->next_seq()){my($id,$sequence)=($seq->id,$seq->seq);$mi{$id}=$sequence;}open(IN,"$opts{id}")||die"openfile$opts{id}faild.\n";open(OUT,">$opts{out}")||die"openfile$opts{out}faild.\n";while(<IN>){chomp;my@line=split("\t");if($line[8]<$line[9]){my$miR=lc(substr($fasta{$line[1]},$line[8]-1,$line[9]-$line[8]+1));my$small=lc($mi{$line[0]});$miR=~s/$small/$mi{$line[0]}/;my$before=lc(substr($fasta{$line[1]},$line[8]-$len-1,$len));my$laft=lc(substr($fasta{$line[1]},$line[9],$len));printOUT">$line[0]\n$before$miR$laft\n";}elsif($line[8]>$line[9]){my$miR=lc(substr($fasta{$line[1]},$line[9]-1,$line[8]-$line[9]+1));my$before=lc(substr($fasta{$line[1]},$line[9]-$len-1,$len));my$laft=lc(substr($fasta{$line[1]},$line[8],$len));my$gene=$before.$miR.$laft;$gene=&reverse_complement_IUPAC($gene);my$small=lc($mi{$line[0]});$gene=~s/$small/$mi{$line[0]}/;printOUT">$line[0]\n$gene\n";}}close(IN);close(OUT);subreverse_complement_IUPAC{my$dna=shift;#reversetheDNAsequencemy$revcomp=reverse($dna);#complementthereversedDNAsequence$revcomp=~tr/ABCDGHMNRSTUVWXYabcdghmnrstuvwxy/TVGHCDKNYSAABWXRtvghcdknysaabwxr/;return$revcomp;}

以上是“perl如何提取miRNA前后500bp序列”这篇文章的所有内容,感谢各位的阅读!相信大家都有了一定的了解,希望分享的内容对大家有所帮助,如果还想学习更多知识,欢迎关注亿速云行业资讯频道!