R

From Research
Jump to navigation Jump to search

R is a statistical analysis and visualization package similar to the commercial "S".

I used R to load the data files created by the python program and create the plots.

Command Summary

You can get help on R functions:

help.search("anova")
help(plot)

Load the time series Library into R:

library(tseries)

Load the python data file: ("d1-001" is the code I used for episodes, it refers to disk-1, title 1 of the DVD.

d1.001 = read.ts(file="data/d1-001.log")

Get some descriptives on the data:

summary(d1.001)

Do a line plot of the data in blue, with custom labels for the plot (main) and the Y Axix (ylab)

plot(d1.001,type="l",col="blue",ylab="Pixel mean of interframe difference",main="mean(csi) : Season 1, Epsiode 1")

Plot a nice histogram of the data with 3000 bins, for x values between 0 and 30:

hist(d1.001,col="blue",breaks=3000,xlim=c(0,30))

Find the location of one click of the mouse:

locator(1)

Save a 11 point plots vertically to a single PDF with 0.3 inch margins on all sides:

pdf("Season-1-raw.pdf",width=8.5,height=11) # create a PDF file rather than plotting to screen.
par(mfrow=c(11,1),mai=c(0.3,0.3,0.3,0.3))   # set the page to 11 rows, 1 column, and 0.3 margins.
plot(d1.001,type="p",pch=".")
plot(d1.004,type="p",pch=".")
plot(d1.007,type="p",pch=".")
plot(d1.010,type="p",pch=".")
plot(d2.001,type="p",pch=".")
plot(d2.004,type="p",pch=".")
plot(d2.007,type="p",pch=".")
plot(d2.010,type="p",pch=".")
plot(d3.001,type="p",pch=".")
plot(d3.004,type="p",pch=".")
plot(d3.007,type="p",pch=".")
plot(d3.010,type="p",pch=".")
dev.off() # close the file

Do a linear filter of the data, using a moving average function with equal weights on all coefficients, using blocks of 1min (3600 frames). This gives us the following coefficient: 1/(2*3600+1)=7201. "filter()" applies the function where the coeffecients are 1/7201 copied 7201 times. The repetion is done by the rep() function. The results of the filter are stored in the variable d1.004.filter_1min where any NA's (missing values) returned by the filter are ignored using na.exclude().

d1.004.filter_1min = na.exclude(filter(d1.004,filter=rep(1/7201,7201)))

We can write this filtered timeseries to disk:

write(d1.004.filter_1min,file="data/d1-004.filter_1min.data",ncolumns=1)

To autocorrelate (for all possible lag times) the filtered data from one episode to another you can use the "acf()" function. This makes a copy of the time series and time shifts it. It compares the correlation between the timeseries and the lagged version of itself. If you see a repeating pattern in the result then there us a periodic component for that particular lag.

acf(d1.004.filter_1min,lag.max=length(d1.004.filter_1min))

We can also do a crosscorelation, where two timeseries are compared to time lagged versions of each other rather than themselves. When we see a periodic pattern in this case it means that there is some temporal commonality between them. Usually this means one series contributes components to another.

ccf(d1.001.filter_1min,d1.004.filter_1min,lag.max=80000)