
Puriney's Notes

Puriney=purine+Y, my Wonderland

 
 
 


 
 

[bio]Reproduce ENCODE/CSHL Long RNA-seq data visualization[2]--coSI  

2012-10-09 18:35:26 | Category: Bio


Motivation

Reproduce the coSI boxplot shown in Figure 2 of the original paper, using the ggplot2 graphics package.

Data Set

It is best to reformat the raw data first so that it is easy to work with in the R console.

Data Processing

Here I calculate the relative distance to the polyA site.

Relative Distance = Absolute Distance to polyA / Gene Length

Thus the Relative Distance ranges from 0 to 1.0.

Note that this does not actually make any biological sense.
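As a minimal sketch of this step, assuming the raw table has one row per exon with an absolute distance to the polyA site and the gene length (the column names `abs_dist` and `gene_length` and the values are hypothetical, not taken from the real data set):

```r
# Hypothetical raw table: column names and values are made up for illustration
raw <- data.frame(
  abs_dist    = c(150, 1200, 0),     # absolute distance to the polyA site
  gene_length = c(1000, 1500, 800)   # length of the gene
)

# Relative Distance = Absolute Distance to polyA / Gene Length
raw$position <- raw$abs_dist / raw$gene_length

# Sanity check: every relative distance should fall within [0, 1]
stopifnot(all(raw$position >= 0 & raw$position <= 1))

print(raw)
```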

Finally, the data will be formatted as follows:

> head(df)
  bin position     coSI Bin
1 2nd 0.833110 0.897727 2nd
2 2nd 0.837699 0.897727 2nd
3 2nd 0.837699 0.897727 2nd
4 2nd 0.837699 0.897727 2nd
5 2nd 0.843272 0.897727 2nd
6 1st 0.980673 0.897727 1st

 

Classifying Position

In the R console, enter the following commands:

> position <- df$position
> # Fill a separate character vector so that every comparison stays numeric
> bin <- character(length(position))
> bin[position < 0.1] <- "10th"
> bin[position >= 0.1 & position < 0.2] <- "9th"
> bin[position >= 0.2 & position < 0.3] <- "8th"
> bin[position >= 0.3 & position < 0.4] <- "7th"
> bin[position >= 0.4 & position < 0.5] <- "6th"
> bin[position >= 0.5 & position < 0.6] <- "5th"
> bin[position >= 0.6 & position < 0.7] <- "4th"
> bin[position >= 0.7 & position < 0.8] <- "3rd"
> bin[position >= 0.8 & position < 0.9] <- "2nd"
> bin[position >= 0.9 & position <= 1.0] <- "1st"
> df <- cbind(bin = bin, df)
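The same ten-way binning can also be sketched with base R's `cut()`, which avoids repeating the comparison ten times. This is an alternative to the commands above, not what the post originally used, and the `position` vector here is a small hypothetical stand-in for `df$position`:

```r
# Hypothetical stand-in for df$position: relative distances in [0, 1]
position <- c(0.05, 0.1, 0.35, 0.833110, 0.980673, 1.0)

# Labels listed in the order of the intervals [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]
bin_labels <- c("10th", "9th", "8th", "7th", "6th",
                "5th", "4th", "3rd", "2nd", "1st")

bin <- cut(position,
           breaks = (0:10) / 10,   # 0, 0.1, ..., 1.0
           labels = bin_labels,
           right = FALSE,          # intervals are closed on the left: [a, b)
           include.lowest = TRUE)  # with right = FALSE, this also includes 1.0 in the last bin

# cut() returns a factor; convert to plain strings to match the bin column above
bin <- as.character(bin)
print(bin)
```

Values exactly on a boundary (such as 0.1) go to the upper interval, matching the `>=` / `<` comparisons used above.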

 

Factoring bin

Without converting bin to a factor with explicit levels, the x-axis labels are sorted alphabetically ("10th", "1st", "2nd", ...) instead of in bin order.

df$Bin<- factor(df$bin, levels=c("1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th"), labels=c("1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th"))
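To see why the explicit levels matter, here is a small sketch on a hypothetical vector of bin labels (not the real data). By default `factor()` sorts its levels alphabetically, so "10th" ends up first:

```r
# Hypothetical bin labels, for illustration only
bin <- c("2nd", "10th", "1st", "9th")

# Default levels are sorted alphabetically, so "10th" comes before "1st"
default_levels <- levels(factor(bin))
print(default_levels)

# Explicit levels keep the intended 1st ... 10th order
ordered_levels <- c("1st", "2nd", "3rd", "4th", "5th",
                    "6th", "7th", "8th", "9th", "10th")
Bin <- factor(bin, levels = ordered_levels)
print(levels(Bin))
```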


 

Boxplotting

Let's compare two versions. The first one is the one we want.

 
ggplot(df, aes(x=Bin, y=coSI, fill=Bin)) + geom_boxplot()+theme_bw()+ xlab('Notice x-label order')+ggtitle('Exon bins by relative distance to polyA site')
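Only the command for the wanted plot is shown here. A minimal sketch of the comparison, assuming the unwanted version simply maps the unfactored `bin` column to the x-axis (that is a guess, not the original command), on made-up toy data rather than the real coSI values:

```r
library(ggplot2)

# Hypothetical toy data standing in for df; the coSI values are random
set.seed(1)
ordered_levels <- c("1st", "2nd", "3rd", "4th", "5th",
                    "6th", "7th", "8th", "9th", "10th")
toy <- data.frame(
  bin  = rep(ordered_levels, each = 20),
  coSI = runif(200),
  stringsAsFactors = FALSE
)
toy$Bin <- factor(toy$bin, levels = ordered_levels)

# Unfactored bin: the x-axis is sorted alphabetically ("10th" comes first)
p_unordered <- ggplot(toy, aes(x = bin, y = coSI, fill = bin)) +
  geom_boxplot() + theme_bw() + ggtitle("bin as character")

# Factored Bin: the x-axis follows the declared level order
p_ordered <- ggplot(toy, aes(x = Bin, y = coSI, fill = Bin)) +
  geom_boxplot() + theme_bw() + ggtitle("Bin as factor")
```

Printing `p_unordered` and `p_ordered` in turn should make the difference in x-axis order visible.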
