Exercise 1 – statistical observations course


PDFs (probability density functions) and the importance of removing the mean signal.


Download the MATLAB file TAT1000mb.mat and load it into MATLAB.

The relevant command is listed in TAsfcT.m.

A more general alternative is to use the netCDF format. Download the file TAT1000mb.cdf and, in TAsfcT.m, change the command "load TAT1000mb" to "ncload TAT1000mb.cdf". The result is the same, but the netCDF file was created directly by the web data tool, whereas the .mat file had to be created from the netCDF data after loading it.


The file contains the NCEP/NCAR reanalysis daily temperature at 1000 mb for Tel Aviv (more precisely, for the 2.5x2.5 degree grid box centered at 32.5N, 35E, which includes Tel Aviv).

The data cover the 60-year period from January 1, 1948 to December 31, 2007, with one value per day.


1. First, verify that the length of the data set matches this time span by calculating how many days the period should contain. Remember leap years.
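As a cross-check, the expected number of days can be computed from the calendar instead of by hand. This is a minimal sketch in Python rather than MATLAB, using only the standard library (datetime, calendar); the function names are hypothetical helpers, not part of the course files. Compare its result with the length of the loaded temperature vector (the variable name inside TAT1000mb.mat is not given here).

```python
from datetime import date
import calendar

def expected_days(first_year: int, last_year: int) -> int:
    """Days from 1 January of first_year through 31 December of last_year, inclusive."""
    return (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days

def count_leap_years(first_year: int, last_year: int) -> int:
    # calendar.leapdays counts leap years in [y1, y2), so add 1 to include last_year
    return calendar.leapdays(first_year, last_year + 1)

n_years = 2007 - 1948 + 1
n_leap = count_leap_years(1948, 2007)
n_days = expected_days(1948, 2007)
print(n_years, n_leap, n_days)  # → 60 15 21915

# consistency check: every year has 365 days plus one extra day per leap year
assert n_days == 365 * n_years + n_leap
```

For 1948-2007 this gives 15 leap years (1948, 1952, ..., 2004; 2000 is a leap year because it is divisible by 400) and 21,915 days in total.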


2. Plot the observations as a function of time and describe the signal. Calculate the range (minimum, maximum and their difference), the mean and the standard deviation of the temperature. Plot a histogram of the observations and describe it. Then plot the pdf of the observations (the histogram divided by the total number of "events").
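The statistics and the pdf estimate described above can be sketched as follows. This is an illustrative Python version (the exercise itself uses MATLAB); summary and empirical_pdf are hypothetical helper names, the bins are assumed to be equal-width and span [min, max], and the toy numbers are made up, not taken from the Tel Aviv record. Plotting is omitted.

```python
import statistics

def summary(temps):
    """Minimum, maximum, range, mean and standard deviation of a list of temperatures."""
    t_min, t_max = min(temps), max(temps)
    return {
        "min": t_min,
        "max": t_max,
        "range": t_max - t_min,
        "mean": statistics.fmean(temps),
        # sample standard deviation (normalised by N-1), as MATLAB's std does by default
        "std": statistics.stdev(temps),
    }

def empirical_pdf(values, n_bins):
    """Equal-width histogram over [min, max], divided by the total number of events."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # clamp so that the maximum value lands in the last bin
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, [c / len(values) for c in counts]

# Toy input (hypothetical values, not the Tel Aviv record)
toy = [10.0, 12.0, 14.0, 16.0, 18.0]
print(summary(toy))
edges, pdf = empirical_pdf(toy, 4)
print(pdf)  # → [0.2, 0.2, 0.2, 0.4]
```

The bin values sum to 1. Dividing each one additionally by the bin width turns it into a density whose area is 1; in MATLAB, histogram(x, 'Normalization', 'probability') corresponds to the division by the number of events, while 'Normalization', 'pdf' also divides by the bin width.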


3. Next, we want to remove the climatological seasonal cycle. To compute it, we average each calendar day over all years. MATLAB makes this easy: reshape the data vector into a 365x60 matrix. Leap years complicate this a bit. A simple "dirty" way to deal with them is to remove all February 29ths, which is done by the program leapyear.m.
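Since MATLAB's reshape fills a matrix column by column, reshape(x, 365, 60) on the leap-day-free vector puts one year in each column. The Python sketch below mirrors that step with one list per year instead. It does not reproduce leapyear.m; drop_feb29 and reshape_by_year are hypothetical helpers that assume the series starts on 1 January 1948 and has exactly one value per day with no gaps.

```python
from datetime import date, timedelta

def drop_feb29(values, start=date(1948, 1, 1)):
    """Drop entries that fall on 29 February, assuming one gap-free value per day from start."""
    kept = []
    for k, v in enumerate(values):
        day = start + timedelta(days=k)
        if not (day.month == 2 and day.day == 29):
            kept.append(v)
    return kept

def reshape_by_year(values, days_per_year=365):
    """Split a leap-day-free series into consecutive years: a list of 365-element rows."""
    if len(values) % days_per_year != 0:
        raise ValueError("length is not a whole number of 365-day years")
    return [values[i:i + days_per_year] for i in range(0, len(values), days_per_year)]

# Toy series: each value is simply its day index, so the bookkeeping is easy to check
series = list(range(21915))  # 1 Jan 1948 .. 31 Dec 2007
no_leap = drop_feb29(series)
years = reshape_by_year(no_leap)
print(len(no_leap), len(years), len(years[0]))  # → 21900 60 365
```

Using the day index as the toy value makes it easy to confirm that 29 February 1948 (index 59) is gone and that the second row starts on 1 January 1949.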

So, remove the leap days, reshape the vector into a matrix and calculate the climatological seasonal cycle. Plot all the yearly time series on one figure, then plot the climatological cycle on top of them to check that it indeed lies in the middle of the yearly curves.
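The climatological cycle is the mean of each calendar day across the years. For a 365x60 MATLAB matrix M with one year per column, that is mean(M, 2). The Python sketch below assumes instead a list of equal-length rows, one per year; climatology is a hypothetical helper and the toy numbers are invented.

```python
import statistics

def climatology(years):
    """Mean of each calendar day across all years (years: list of equal-length rows)."""
    n_days = len(years[0])
    return [statistics.fmean(row[d] for row in years) for d in range(n_days)]

# Toy example: three "years" of four days each (hypothetical numbers)
toy_years = [
    [10.0, 11.0, 13.0, 12.0],
    [12.0, 13.0, 15.0, 14.0],
    [14.0, 15.0, 17.0, 16.0],
]
print(climatology(toy_years))  # → [12.0, 13.0, 15.0, 14.0]
```

For the plot, MATLAB's plot(M) already draws one curve per column (that is, per year), and hold on lets you overlay the climatology.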

Calculate the daily standard deviation (the standard deviation of each calendar day over the 60 years of data) and plot it as a function of day of the year. Describe the plot. When is the standard deviation largest? Does this make sense?
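The daily standard deviation uses the same year-by-day layout as the climatology; in MATLAB, for a 365x60 matrix M, it is std(M, 0, 2), and [~, iday] = max(s) locates its peak. The Python sketch below assumes one row per year; daily_std and day_of_max are hypothetical helpers, and the toy numbers are invented. With February 29 removed, each day index corresponds to a fixed calendar date.

```python
import statistics

def daily_std(years):
    """Sample standard deviation of each calendar day across all years."""
    n_days = len(years[0])
    return [statistics.stdev([row[d] for row in years]) for d in range(n_days)]

def day_of_max(values):
    """1-based day of year at which values is largest (first occurrence on ties)."""
    return max(range(len(values)), key=lambda d: values[d]) + 1

# Toy example: two "years" of three days each (hypothetical numbers)
toy_years = [
    [1.0, 2.0, 3.0],
    [3.0, 6.0, 5.0],
]
s = daily_std(toy_years)
print(day_of_max(s))  # → 2
```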


4. Calculate the daily anomalies (deviations from the seasonal cycle). Create a long vector of the anomalies and calculate the anomaly pdf (the histogram divided by the total number of "events").
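The anomalies are each value minus the climatological value for its calendar day. In MATLAB, with a 365x60 matrix M and a 365x1 climatology, M - clim works through implicit expansion (R2016b or later), and A(:) stacks the columns into one chronological vector. The self-contained Python sketch below assumes one row per year; anomaly_vector and empirical_pdf are hypothetical helpers, the bins are assumed equal-width over [min, max], and the toy numbers are invented.

```python
import statistics

def anomaly_vector(years):
    """Deviations from the daily climatology, flattened year after year into one list."""
    n_days = len(years[0])
    clim = [statistics.fmean(row[d] for row in years) for d in range(n_days)]
    return [row[d] - clim[d] for row in years for d in range(n_days)]

def empirical_pdf(values, n_bins):
    """Equal-width histogram over [min, max], divided by the total number of events."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # clamp so that the maximum value lands in the last bin
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, [c / len(values) for c in counts]

# Toy input: two "years" of two days each (hypothetical numbers)
toy_years = [[10.0, 20.0], [12.0, 22.0]]
anom = anomaly_vector(toy_years)
print(anom)                        # → [-1.0, -1.0, 1.0, 1.0]
print(empirical_pdf(anom, 2)[1])   # → [0.5, 0.5]
```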

How does this pdf differ from the pdf of the full time series? Why?