
Puriney's Notes

Puriney=purine+Y, my Wonderland

 
 
 


 
 

[bio] Accounting for problems in mapping reads to repetitive regions

2012-10-26 01:36:55 | Category: Bio

Here is a situation that you will probably come across. For highly expressed, newly transcribed pre-mRNAs, some introns appear to be absent, as indicated by the read coverage in RNA-seq experiments.

Three explanations come to mind:
1) Rapid co-transcriptional splicing, which means that introns are excised while transcription is still taking place and then decay very rapidly.
2) Sequencing problems.
3) Problems in mapping these parts of the pre-mRNA, e.g., due to repetitive sequences. 

Basically, the first explanation makes biological sense. Because the surrounding exons are well covered by reads and thus well delimited, a sequencing problem (the second explanation) is unlikely to be to blame. Finally, the absent introns show no significant increase in the frequency of repetitive sequences identified by the tool RepeatMasker, or in the frequency of non-unique read mappings, so the third explanation is excluded.

More interestingly, the exon-intron junctions seem to be retained, as shown for the gene RPL23A [1]. Since introns should be excised as a whole, how could only the internal region be removed while the two ends stay? Could this be an artifact of removing non-unique reads before mapping?

[Figure 1: image not available]
 

Suggested approaches for investigating the issue might be:

1) Can the sequence be identified as (or does it contain) known repetitive sequences?
  • If no, it is probably not an artifact.
  • If yes, check whether the divergence from the corresponding repeat consensus is high or low.
    • If high, these sequences may not really be repetitive sequences.
    • If low, it is perhaps an artifact.

2) The non-unique reads become useful now. Assess how many non-unique reads can be mapped to this mysterious region.
  • If not many, the missing region can be considered a non-artifact.
  • If far too many, you had better think twice.
- - - 

Following on from the issue above, where the effect of repetitive sequences must sometimes be excluded, here is a related issue: RPKM.

RPKM = Number of reads * 10^9 / (Actual block length * Total number of reads)

When calculating RPKM for a given gene (exon or intron), how do we account for problems in mapping reads to repetitive regions? Here the actual length of the given exon (or intron) can be replaced by its "effective length".
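As a rough illustration of how much the choice of length matters, here is a minimal Python sketch of the RPKM formula above. The function name `rpkm` and all the numbers are hypothetical and chosen only for the example; they do not come from any real data set.

```python
def rpkm(read_count, length_nt, total_reads):
    """RPKM = reads * 10^9 / (block length * total reads)."""
    if length_nt <= 0 or total_reads <= 0:
        raise ValueError("length_nt and total_reads must be positive")
    return read_count * 1e9 / (length_nt * total_reads)


# Hypothetical values, for illustration only.
reads_in_block = 300             # reads assigned to the exon or intron
total_mapped_reads = 20_000_000  # total reads in the experiment
actual_length = 2000             # annotated length in nt
effective_length = 1500          # uniquely mappable positions in nt

rpkm_actual = rpkm(reads_in_block, actual_length, total_mapped_reads)
rpkm_effective = rpkm(reads_in_block, effective_length, total_mapped_reads)

print(rpkm_actual, rpkm_effective)
```

With these made-up numbers, a quarter of the block is not uniquely mappable, so dividing by the actual length would underestimate the expression level compared with dividing by the effective length.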

The effective length can be assessed as follows.
[Figure 2: image not available]

First, reads are simulated in silico by sliding a window across the gene regions, with the window size equal to the read length of the sequencing experiment (for example, 35 nt). Thus, the simulated read set contains exactly one read starting at each position in each given gene, as shown in the figure above.
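The sliding-window step can be sketched in a few lines of Python. This is only a minimal illustration: the function name `simulate_reads` and the toy sequence are assumptions, and a real run would read gene sequences from a FASTA file and write the reads out (for example as FASTQ) for the mapper.

```python
def simulate_reads(sequence, read_length=35):
    """Return (start, read) for every full-length window in the sequence.

    One read is produced per start position, so a sequence of length n
    gives n - read_length + 1 reads (none if the sequence is too short).
    """
    return [
        (start, sequence[start:start + read_length])
        for start in range(len(sequence) - read_length + 1)
    ]


# Toy 40-nt sequence, for illustration only.
toy_gene = "ACGT" * 10
reads = simulate_reads(toy_gene, read_length=35)

for start, read in reads:
    print(start, read)
```

Keeping the start position with each read matters for the later steps, because it is what lets you check whether a read was mapped back to the place it came from.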

Second, all the simulated (i.e., artificial) reads are mapped to the reference genome. With the (G)ENCODE project data published, both the RefGene annotation and the ENCODE.v12 annotation are suggested: reads are first mapped against RefGene, and the reads left unmapped can then be mapped against ENCODE.v12.

Finally, with the resulting .bam file recording all the mapping results, we can assess the effective length, defined as the number of positions within the respective region at which exactly one read starts that is mapped both correctly and uniquely.
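The counting itself might look like the sketch below. It assumes the mapping results have already been extracted from the .bam file into simple records; the `Mapping` record, its fields and the toy values are hypothetical, and chromosome and strand are ignored for brevity.

```python
from collections import namedtuple

# Hypothetical record for one simulated read:
#   true_start   - position the read was simulated from
#   mapped_start - position reported by the mapper (None if unmapped)
#   n_hits       - number of locations the read mapped to
Mapping = namedtuple("Mapping", "true_start mapped_start n_hits")


def effective_length(mappings, region_start, region_end):
    """Count positions in [region_start, region_end) whose simulated read
    was mapped uniquely and back to its true origin."""
    positions = set()
    for m in mappings:
        in_region = region_start <= m.true_start < region_end
        correct = m.mapped_start == m.true_start
        unique = m.n_hits == 1
        if in_region and correct and unique:
            positions.add(m.true_start)
    return len(positions)


# Toy results, for illustration only.
toy_mappings = [
    Mapping(0, 0, 1),     # unique and correct
    Mapping(1, 1, 1),     # unique and correct
    Mapping(2, 2, 1),     # unique and correct
    Mapping(3, 3, 3),     # multi-mapped (repetitive)
    Mapping(4, 900, 1),   # unique but mapped to the wrong place
    Mapping(5, None, 0),  # unmapped
]

toy_effective_length = effective_length(toy_mappings, 0, 6)
print(toy_effective_length)
```

In this toy case only three of the six positions count, so the effective length is half of the actual length.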


 

- - - end && ref
  • Ultrashort and progressive 4sU-tagging reveals key characteristics of RNA processing at nucleotide resolution

 