
Puriney's Notes

Puriney=purine+Y, my Wonderland


[bio]Reproduce ENCODE/CSHL Long RNA-seq data visualization viewed in UCSC  

2012-10-05 16:41:21 | Category: Bio


Motivation

The ENCODE data has been released, and luckily it includes both the .bam files and the .bigwig files. So I decided to try to reproduce the official data visualization from the .bam files with BEDtools and related tools.

Result

First, here is a comparison between my version and the official version:

[Image: UCSC Genome Browser tracks, my version vs. the official version]

Top to Bottom:

  • Black: my-version-POSitive-strand.bigwig
  • Blue: Official-version-POSitive-strand.bigwig
  • Red: Official-version-REVerse-strand.bigwig
  • Grey: my-version-REVerse-strand.bigwig

The image shows that my version and the official version share roughly the same peaks; however, the peaks in my version are masked by a fairly uniform background noise. And it drives me crazy.

I know that not all bioinformatics work can be reproduced, but this task involves few algorithms or judgment calls, so I think it should be reproducible.

Data Set

The ENCODE/CSHL long RNA-seq data set can be found here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/

As an example, I use the K562 chromatin subcellular fraction (Rep4).

Data Processing

BAM sort

samtools sort -o wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqK562ChromatinTotalAlnRep4.bam.sort wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqK562ChromatinTotalAlnRep4.bam

Genome Coverage

Following the BEDtools manual, I use the forward strand as an example; the reverse-strand signal is generated in the same way with -strand -.

genomeCoverageBed -bg -ibam wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqK562ChromatinTotalAlnRep4.bam.sort -g hg19.chromInfo -strand + >K562-Chromatin-POS-4.bedgraph

Note that I use the -strand flag to separate the two strands.
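To make explicit what a strand-specific bedGraph from genomeCoverageBed -bg -strand + contains, here is a minimal Python sketch (not BEDtools' implementation). It assumes hypothetical, simplified alignments given as (chrom, start, end, strand) tuples in 0-based, half-open coordinates, with each read counted as one contiguous block, and it reports only nonzero-depth runs, merging adjacent runs of equal depth.

```python
from collections import defaultdict


def strand_bedgraph(alignments, strand):
    """Return bedGraph-style (chrom, start, end, depth) runs for one strand.

    alignments: iterable of (chrom, start, end, strand) tuples, 0-based and
    half-open; each read is treated as ONE contiguous block (no splitting).
    """
    # Difference array per chromosome: +1 where a read starts, -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, s in alignments:
        if s != strand:
            continue  # mimic -strand: only count reads on the requested strand
        events[chrom][start] += 1
        events[chrom][end] -= 1

    rows = []
    for chrom in sorted(events):
        depth, prev = 0, None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0:  # like -bg: skip zero-depth gaps
                last = rows[-1] if rows else None
                if last and last[0] == chrom and last[2] == prev and last[3] == depth:
                    rows[-1] = (chrom, last[1], pos, depth)  # merge equal-depth neighbours
                else:
                    rows.append((chrom, prev, pos, depth))
            depth += events[chrom][pos]
            prev = pos
    return rows


reads = [("chr1", 0, 10, "+"), ("chr1", 5, 15, "+"), ("chr1", 20, 30, "-")]
for row in strand_bedgraph(reads, "+"):
    print(*row, sep="\t")
```

Here the two overlapping plus-strand reads yield three runs (depth 1, 2, 1), and the minus-strand read is ignored.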

bedGraphToBigWig

The bedGraphToBigWig executable is available from UCSC's list of precompiled executables.

bedGraphToBigWig K562-Chromatin-POS-4.bedgraph hg19.chromInfo K562-Chromatin-POS-4.bigwig
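bedGraphToBigWig expects its input sorted by chromosome and then by start position; its usage message recommends sort -k1,1 -k2,2n. The Python sketch below only illustrates that ordering, assuming tab-separated lines and ASCII chromosome names (so Python's string order matches the C-locale byte order); in practice the Unix sort command does the job.

```python
def sort_bedgraph_lines(lines):
    """Order bedGraph lines by chromosome name, then numerically by start.

    Mirrors `sort -k1,1 -k2,2n` in the C locale for ASCII chromosome names;
    track/browser/comment lines and blank lines are dropped.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), line))
    # str comparison is by code point, i.e. byte order for ASCII names
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return [rec[2] for rec in records]


unsorted = ["chr2\t0\t10\t1", "chr10\t5\t8\t2", "chr1\t100\t200\t3", "chr1\t20\t30\t1"]
print("\n".join(sort_bedgraph_lines(unsorted)))
```

Note the lexicographic chromosome order (chr1, chr10, chr2) combined with numeric start order within each chromosome.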

Finally, upload the .bigwig file to an FTP server and load it into the UCSC Genome Browser.
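A bigWig file is not pasted into the UCSC Genome Browser directly: it stays on a web or FTP server and a custom track line points to it through bigDataUrl. This small Python sketch builds such a line; the track name and the URL in the example are hypothetical placeholders.

```python
def bigwig_track_line(name, url, description=None):
    """Build a UCSC custom-track line that references a remotely hosted bigWig."""
    parts = ["track", "type=bigWig", f'name="{name}"']
    if description:
        parts.append(f'description="{description}"')
    parts.append(f"bigDataUrl={url}")  # the browser fetches the file from here
    return " ".join(parts)


# Hypothetical server URL, for illustration only.
print(bigwig_track_line("K562-Chromatin-POS-4",
                        "http://example.com/K562-Chromatin-POS-4.bigwig"))
```

The printed line is what gets pasted into the custom track form.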

Discussion

I wondered which filtering step I had missed.

I checked whether all the reads in the .bam file are uniquely mapped. The reads were mapped to the genome with STAR, and according to its manual a mapping quality (MAPQ) of 255 in the SAM output means a uniquely mapped read. After checking the mapping quality, I found that all the reads in the .bam file are uniquely mapped.
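The MAPQ check can be sketched in Python over SAM text lines (for example, the output of samtools view, which omits the header by default). It assumes STAR's convention that MAPQ 255 marks a unique mapping and skips unmapped records (FLAG bit 0x4), whose MAPQ carries no information.

```python
def mapq_summary(sam_lines, unique_mapq=255):
    """Count mapped SAM records with MAPQ == unique_mapq versus any other MAPQ."""
    unique = other = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # FLAG is column 2, MAPQ column 5
        if flag & 4:
            continue  # unmapped read
        if mapq == unique_mapq:
            unique += 1
        else:
            other += 1
    return unique, other


sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(mapq_summary(sam))
```

A result with other == 0 corresponds to the "all reads are uniquely mapped" observation.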

Eh... hey guys, the -split flag did come to my mind at first when I used genomeCoverageBed, but from the BEDtools manual I mistakenly understood that -split would throw away all reads spanning introns (maybe because my mother tongue is not English). Nevertheless, after adding the -split flag, I managed to reproduce the ENCODE data visualization. Here is the image:
[Image: UCSC Genome Browser tracks reproduced with the -split flag]

The green track is the POSitive-strand signal, and it is exactly the same as the official version.

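The key is to add the -split option when running genomeCoverageBed: it treats spliced ("split") BAM alignments as separate blocks, so the intron skipped by a read is no longer counted as covered.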
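Why -split matters: a spliced RNA-seq read carries an N operation in its CIGAR string for the skipped intron. The Python sketch below illustrates the idea (it is not BEDtools' code) by computing the reference intervals one alignment covers, with and without breaking at N. As a simplifying assumption it breaks only at N and keeps deletions (D) inside a block. Without splitting, the intron between the two exonic blocks is counted as covered, which is consistent with the uniform background in my first attempt.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_blocks(pos, cigar, split=True):
    """Return 0-based, half-open reference intervals covered by one alignment.

    pos is the 1-based SAM POS field. With split=True, N (skipped region,
    i.e. an intron) separates blocks; with split=False the whole span from the
    first to the last aligned base is returned as a single block.
    """
    ref = block_start = pos - 1  # convert to 0-based
    blocks = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=XD":
            ref += length  # consumes reference; assumption: D stays in the block
        elif op == "N":
            if ref > block_start:
                blocks.append((block_start, ref))
            ref += length
            block_start = ref  # next block starts after the intron
        # I, S, H and P do not consume reference bases
    if ref > block_start:
        blocks.append((block_start, ref))
    if not split and blocks:
        return [(blocks[0][0], blocks[-1][1])]
    return blocks


print(reference_blocks(100, "20M1000N30M"))               # with splitting
print(reference_blocks(100, "20M1000N30M", split=False))  # intron counted too
```

For the read above, splitting gives [(99, 119), (1119, 1149)], while the unsplit version gives [(99, 1149)], i.e. 1000 intronic bases of spurious coverage from a single read.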

Enjoy


- - - end & reference

http://www.biostars.org/post/show/54351/reproduce-encodecshl-long-rna-seq-data-visualization-viewed-in-ucsc-but-failed/#54357

- - - 

Off-topic

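This is already the second time I have answered my own question on Biostar. I struggled with it for a long time without getting anywhere, and I neglected my girlfriend in the process.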

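In the evening, while waiting for the results with the -split option (entirely a product of my suspicious nature), I went over samtools and all the details of BAM/SAM again, and read up on STAR, the mapping tool used by the ENCODE project. Finally, after I had asked on Biostar, the data came out, I visualized it, and, well, I could relax. Although much of that reading turned out to be unnecessary, I still picked up quite a few new things.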

Also, the clothes I bought for my girlfriend had better arrive safely! Come on, SF Express!
