Regression Modeling

Description

Statistical regression models are used to establish a relationship between a particular climate variable and the tree-ring chronologies (Fritts 1976). Typically, the period of overlap between the climate and the tree-ring data is split, and the regression model is "trained" or calibrated on one portion of the overlap period. Another portion of the overlap is used as a verification period, which evaluates the skill of the regression model on data purposely withheld from the calibration period (Fritts 1976; Cook and Pederson 2011). Dendro Tools performs simple linear, multiple linear, and principal components regression modeling. Leading and lagging of the predictor data are supported. If multiple linear or principal components regression modeling is performed, then the variables are added to the model in a forward stepwise fashion. See regression modeling texts for further details (e.g., Draper and Smith 1981; Kutner et al. 2005; Wilks 2006). This tool is based on the program PcReg developed by Ed Cook and Paul Krusic.

Running the Tool
This complex tool performs key procedures in a series of steps. These steps and the files that are output are detailed in the following outline:

  1. Prewhitening:
    • If prewhitening is checked for either the predictor or predictand data, then autoregressive models are selected for each variable based on the criterion selected by the user (see "Criterion for Model Selection" under "Parameters" below). Otherwise, Dendro Tools proceeds to step 2.
    • The whitened variables proceed on to step 2.
    • The whitened variables are output to a separate output file with the words "Whitened Variables" appended to the end of the filename.
    • Autoregressive modeling results for each variable are output to a separate output file with the word "Statistics" appended to the end of the filename.
  2. Leading and Lagging of Predictors:
    • If leading and lagging is checked, then additional predictor variables are derived. Otherwise, Dendro Tools proceeds to step 3.
    • The new leading and lagging predictor variables proceed on to step 3.
    • Predictand variables are left unchanged and proceed on to step 3.
    • All variables are output to a separate output file with the words "All Variables with Leads and Lags" appended to the end of the filename.
  3. Predictor Screening:
    • If "Screen Predictors Prior to Regression Modeling" is checked, then all predictors are correlated with the predictand. Predictors will pass if their p-values are lower than the probability criterion (see "Screen Predictors Prior to Regression Modeling" under "Parameters" below). Otherwise, Dendro Tools proceeds to step 4.
    • Only predictors that pass the screening proceed on to step 4.
    • Correlations and p-values for each predictor are output to a separate output file with the word "Statistics" appended to the end of the filename.
  4. Principal Components Analysis of Predictors:
    • If "Perform Principal Components Regression" is checked, then principal components analysis is run on the predictors during the calibration period. Otherwise, Dendro Tools proceeds to step 5.
    • Principal components (PCs) that satisfy the eigenvalue cutoff criterion proceed on to step 5 (see "Perform Principal Components Regression" under "Parameters" below).
    • PCs that satisfy the eigenvalue cutoff criterion are output to a separate output file with the words "PC Scores Considered for Regression Modeling" appended to the end of the filename.
    • The results of the principal components analysis are output to a separate output file with the word "Statistics" appended to the end of the filename.
  5. Regression Model Variable Selection:
    • If two or more predictors are passed to regression, then forward stepwise regression will be run to determine the predictor(s) that will be utilized for regression modeling. Otherwise, Dendro Tools proceeds to step 6.
    • The predictor(s) will be determined by the "Criterion for Model Selection" selected by the user, and the selected predictor(s) will be passed on to step 6.
  6. Regression Model Calibration:
    • Regression modeling is performed between the predictor and the predictand data during the calibration period.
    • The predictors, predictand, y-hat (the predicted values), and residuals during the calibration period are output to a separate output file with the word "Residuals" appended to the end of the filename.
    • Coefficients of the regression model, adjusted R-squared, standard error of the residuals, and the Durbin-Watson test results are output to a separate output file with the word "Statistics" appended to the end of the filename. The null hypothesis for the Durbin-Watson test is that autocorrelation is zero, so small p-values suggest significant autocorrelation. R uses the Applied Statistics Algorithm AS 153 by Farebrother (1980, 1984) to compute p-values for the Durbin-Watson test.
  7. Regression Model Verification:
    • If a verification period, completely independent of the calibration period, is entered, then Dendro Tools will assess the skill of the derived regression model. Otherwise, Dendro Tools proceeds to step 8.
    • Correlations between the reconstructed and observed data, the reduction of error (RE), and the coefficient of efficiency (CE) are computed and output to a separate output file with the word "Statistics" appended to the end of the filename.
  8. Output Whitened Reconstruction:
    • If prewhitening was performed on the predictor data, then the reconstruction at this stage is a white reconstruction and is output to a separate output file with the words "White Reconstruction" appended to the end of the filename.
    • If prewhitening was not performed on the predictor data, Dendro Tools proceeds to step 9.
  9. Reredden Reconstruction, Recalibrate and Reverify:
    • If prewhitening was performed on the predictor data, then the resulting white reconstruction is rereddened using the autoregressive model of the predictand data, which allows the reconstruction to closely "mimic" the predictand.
    • The rereddened reconstruction is then recalibrated and reverified over the same calibration and verification periods as before. The same statistics as noted in steps 6 and 7 are computed, but no residuals are output. Calibration and verification results typically improve due to rereddening, which suggests that this step adds value to the final reconstruction.
    • The statistical results are output to a separate output file with the word "Statistics" appended to the end of the filename.
  10. Rescale Reconstruction:
    • In order to recover lost variance due to regression modeling, the final reconstruction is rescaled by multiplying the reconstruction by the quantity sdObs / sdRecon, where sdObs is the standard deviation of the observed data and sdRecon is the standard deviation of the reconstructed data in the calibration period.
  11. Output Reconstruction:
    • The final reconstruction is output to the file specified by the user along with the observed data.
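
The calibration and rescaling steps (6 and 10) can be illustrated with a minimal sketch for the simple linear case. It assumes one predictor, uses made-up data, and the function names are hypothetical; it is not the Dendro Tools implementation. The Durbin-Watson statistic is computed, but not its p-value (Dendro Tools obtains that through R). The rescaling multiplies the values literally by sdObs / sdRecon as described in step 10; whether the mean is removed beforehand is not stated here, so that detail is left out.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def durbin_watson(residuals):
    """Durbin-Watson statistic d; values near 2 indicate little lag-1 autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)

def rescale(recon, observed_cal, recon_cal):
    """Multiply the reconstruction by sdObs / sdRecon from the calibration period."""
    factor = statistics.stdev(observed_cal) / statistics.stdev(recon_cal)
    return [factor * v for v in recon]

# Made-up calibration data: one tree-ring predictor and one climate predictand.
rings = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 1.4]
climate = [10.0, 13.5, 11.0, 14.0, 13.0, 13.0, 9.5, 16.0]

a, b = fit_line(rings, climate)
y_hat = [a + b * r for r in rings]                       # predicted values
residuals = [obs - est for obs, est in zip(climate, y_hat)]
d = durbin_watson(residuals)
rescaled = rescale(y_hat, climate, y_hat)                # calibration-period values only

print(f"intercept={a:.3f} slope={b:.3f} DW={d:.3f}")
```

After rescaling, the standard deviation of the reconstruction over the calibration period matches that of the observed data, which is the variance recovery described in step 10.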

Input Predictor Data

File
Select the input data to analyze by clicking on "Browse" and navigating to the proper file.

Input Data Format
Year / Data

Each "/" is a tab (i.e., data should be tab-delimited). Missing data should be denoted as "-99".

Start Row
This input is the row number where the main data to be processed begin in the input file selected above. An example of how row numbers are entered is shown in Figure 2 located in the Running Dendro Tools topic.

Number of Variables
Enter the number of variables located in the input file.
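
As an illustration of the file layout described above, the following sketch parses tab-delimited rows of a year followed by data values, skipping rows before a 1-based start row and treating "-99" as missing. The function name `read_series` and its arguments are hypothetical and only mirror the inputs in this section; Dendro Tools' own reader may behave differently.

```python
import io

MISSING = "-99"  # missing-value flag described above

def read_series(handle, start_row=1):
    """Parse tab-delimited 'Year<TAB>Data...' rows into {year: [values]}.

    start_row is 1-based, mirroring the "Start Row" input (an assumption).
    Values flagged as missing become None.
    """
    records = {}
    for row_number, line in enumerate(handle, start=1):
        if row_number < start_row or not line.strip():
            continue  # skip header rows and blank lines
        fields = line.rstrip("\n").split("\t")
        year = int(fields[0])
        records[year] = [None if f.strip() == MISSING else float(f) for f in fields[1:]]
    return records

# Made-up file content: one header row, then two variables per year.
sample = "Year\tChron1\tChron2\n1900\t0.95\t1.10\n1901\t-99\t0.87\n"
data = read_series(io.StringIO(sample), start_row=2)
print(data)
```

In this example the data begin on row 2 and the file holds two variables, so "Start Row" would be 2 and "Number of Variables" would be 2.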

Prewhiten
Check this box to perform autoregressive modeling on the input predictor data. Dendro Tools will also ask for the years to start and end the autoregressive modeling. These are often the same years as the calibration period.

Lead / Lag
Check this box to create predictor data with leads and/or lags. Dendro Tools will then activate the lead and lag checkboxes. Check boxes with a minus sign to lead the predictor data (e.g., -1 will move each predictor series back one year, so year t of the predictand is directly compared with year t + 1 of the predictor). Check boxes with a plus sign to lag the predictor data (e.g., +1 will move each predictor series forward one year, so year t of the predictand is directly compared with year t - 1 of the predictor). Checking the box "0" tells Dendro Tools to also consider the original predictor data in year t with no lead or lag.
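
The sign convention can be made concrete with a short sketch. It assumes each series is held as a mapping from year to value; the function name `shift_predictor` and the data are made up for illustration.

```python
def shift_predictor(series, offset):
    """Re-index {year: value} so that year t holds the value from year t - offset.

    offset = -1 (lead): year t holds the predictor value from year t + 1.
    offset = +1 (lag):  year t holds the predictor value from year t - 1.
    """
    return {year + offset: value for year, value in series.items()}

rings = {1900: 1.0, 1901: 2.0, 1902: 3.0}

lead1 = shift_predictor(rings, -1)  # series moved back one year
lag1 = shift_predictor(rings, +1)   # series moved forward one year

# Values lined up against predictand year 1901:
print(lead1[1901], rings[1901], lag1[1901])
```

Note that each shifted series loses one year of overlap with the predictand at one end, so the common period shrinks as more leads and lags are added.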

Input Predictand Data

File
Select the input data to analyze by clicking on "Browse" and navigating to the proper file.

Input Data Format
Year / Data

Each "/" is a tab (i.e., data should be tab-delimited). Missing data should be denoted as "-99".

Start Row
This input is the row number where the main data to be processed begin in the input file selected above. An example of how row numbers are entered is shown in Figure 2 located in the Running Dendro Tools topic.

Prewhiten
Check this box to perform autoregressive modeling on the input predictand data. Dendro Tools will also ask for the years to start and end the autoregressive modeling. These are often the same years as the calibration period.
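
The idea behind prewhitening and the later rereddening can be sketched with a first-order autoregressive model. This is a simplification: Dendro Tools selects the autoregressive order with the chosen model selection criterion, whereas the sketch below fixes the order at 1. The function names and the data are hypothetical.

```python
def ar1_coefficient(x):
    """Lag-1 autocorrelation estimate, used here as the AR(1) coefficient."""
    mean = sum(x) / len(x)
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    return num / sum(d * d for d in dev)

def whiten(x, phi):
    """Remove AR(1) persistence: e[t] = d[t] - phi * d[t-1], with d = x - mean.

    The first value has no predecessor, so the output is one element shorter.
    """
    mean = sum(x) / len(x)
    dev = [v - mean for v in x]
    return [dev[t] - phi * dev[t - 1] for t in range(1, len(dev))]

def redden(e, phi, start=0.0):
    """Add AR(1) persistence back: y[t] = e[t] + phi * y[t-1]."""
    out, prev = [], start
    for value in e:
        prev = value + phi * prev
        out.append(prev)
    return out

# Made-up series for illustration.
x = [1.0, 1.4, 1.2, 1.6, 1.5, 1.1, 0.9, 1.0, 1.3, 1.7]
phi = ar1_coefficient(x)
white = whiten(x, phi)

# Round trip: reddening the white series with the same coefficient recovers x[1:].
mean = sum(x) / len(x)
restored = [mean + v for v in redden(white, phi, start=x[0] - mean)]
print(round(phi, 3), [round(v, 3) for v in restored])
```

In the tool's workflow the white reconstruction is reddened with the autoregressive model of the predictand rather than with its own coefficient; the round trip above only shows that the two operations are inverses of each other.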

Output Data

File
Select or enter the name of the file where the results will be stored.

Header
Information entered here will be output in the first line of the output file.

Parameters

Perform Principal Components Regression
Check this box to perform principal components regression. Dendro Tools will then activate the eigenvalue cutoff criterion. The following five options are available: eigenvalue < 1, a specific number of eigenvalues, a proportional variance threshold, a cumulative variance threshold, and all eigenvalues. Principal components that fail the selected cutoff criterion will not be considered in the regression modeling phase. "Eigenvalue < 1" is selected by default.
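
The five cutoff options can be sketched as a selection over a list of eigenvalues sorted in descending order. The rule names, the function `retained_components`, and the exact interpretation of the proportional and cumulative thresholds are assumptions made for illustration; they are not taken from Dendro Tools.

```python
def retained_components(eigenvalues, rule="eigenvalue", value=1.0):
    """Return 0-based indices of the PCs kept, given eigenvalues in descending order."""
    total = sum(eigenvalues)
    if rule == "eigenvalue":   # cut PCs whose eigenvalue is below `value` (default 1)
        return [i for i, ev in enumerate(eigenvalues) if ev >= value]
    if rule == "count":        # keep a fixed number of leading PCs
        return list(range(min(int(value), len(eigenvalues))))
    if rule == "proportion":   # keep PCs that each explain at least `value` of the variance
        return [i for i, ev in enumerate(eigenvalues) if ev / total >= value]
    if rule == "cumulative":   # keep leading PCs until `value` of the variance is reached
        kept, running = [], 0.0
        for i, ev in enumerate(eigenvalues):
            kept.append(i)
            running += ev / total
            if running >= value:
                break
        return kept
    if rule == "all":
        return list(range(len(eigenvalues)))
    raise ValueError(f"unknown rule: {rule}")

# Made-up eigenvalues from a four-variable correlation matrix (they sum to 4).
eigenvalues = [2.0, 1.2, 0.5, 0.3]

by_eigenvalue = retained_components(eigenvalues)
by_count = retained_components(eigenvalues, "count", 3)
by_proportion = retained_components(eigenvalues, "proportion", 0.25)
by_cumulative = retained_components(eigenvalues, "cumulative", 0.9)
print(by_eigenvalue, by_count, by_proportion, by_cumulative)
```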

Screen Predictors Prior to Regression Modeling
Check this box to correlate all predictors against the predictand. A probability criterion is also required. This routine will examine the Pearson correlation between each predictor and the predictand for statistical significance. If p is less than the value entered for the probability criterion, then the specific predictor passes the screening test. Predictors that fail to pass screening will not be considered in the regression modeling phase. This parameter is checked by default with a probability criterion of 0.10 automatically entered.
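
A minimal sketch of the screening logic follows. The significance test that Dendro Tools applies to the Pearson correlation is not specified here, so the sketch substitutes the Fisher z approximation, which needs only the standard library; its p-values will differ somewhat from an exact test, especially for short series. All names and data are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (an assumption; needs n > 3)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def screen(predictors, predictand, criterion=0.10):
    """Keep predictors whose correlation p-value is below the probability criterion."""
    passed = {}
    for name, series in predictors.items():
        r = pearson_r(series, predictand)
        p = approx_p_value(r, len(predictand))
        if p < criterion:
            passed[name] = (r, p)
    return passed

# Made-up data: chron_a tracks the predictand closely, chron_b does not.
predictand = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predictors = {
    "chron_a": [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1],
    "chron_b": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}

passed = screen(predictors, predictand, criterion=0.10)
print(sorted(passed))
```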

Criterion for Model Selection
Autoregressive and forward stepwise regression models are selected based on a specific criterion that maximizes the explained variance while minimizing the number of variables fit to the model. Two options are available: the minimum AICc and the minimum BIC. See Cook et al. (1999) for a discussion of the minimum AICc, which is selected by default. Note that there can be multiple minima in the AICc; the first minimum value is the one selected by Dendro Tools.
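
The "first minimum" rule can be illustrated with a small sketch. The AICc and BIC expressions below are the common least-squares forms based on the residual sum of squares; the exact formulas used by Dendro Tools are not given here, so treat them as an assumption. The residual sums of squares are made up so that the criterion has two minima.

```python
import math

def aicc(rss, n, k):
    """Corrected Akaike criterion for a least-squares fit with k fitted parameters."""
    return n * math.log(rss / n) + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

def bic(rss, n, k):
    """Bayesian information criterion for a least-squares fit with k fitted parameters."""
    return n * math.log(rss / n) + k * math.log(n)

def first_minimum(values):
    """Index of the first local minimum: stop as soon as the next value is not smaller."""
    for i in range(len(values) - 1):
        if values[i + 1] >= values[i]:
            return i
    return len(values) - 1

n = 50
rss_by_step = [40.0, 30.0, 29.5, 29.4, 20.0]  # models with k = 1..5 parameters

scores = [aicc(rss, n, k) for k, rss in enumerate(rss_by_step, start=1)]
chosen = first_minimum(scores)
global_best = scores.index(min(scores))
print(chosen, global_best)
```

Here the criterion rises after the second model and only later drops to its overall minimum at the fifth, so the first-minimum rule stops at the more parsimonious second model.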

Calibration Period
Enter the years to start and end the regression model calibration.

Verification Period
Enter the years to start and end the regression model verification. These input boxes can be left blank if verification will not be performed (e.g., use the entire overlapping period to calibrate the regression model).
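
The two verification skill scores reported by the tool, the reduction of error (RE) and the coefficient of efficiency (CE), can be computed as in the sketch below. Both compare the squared reconstruction errors in the verification period against a reference: RE uses the calibration-period mean of the observed data and CE uses the verification-period mean. Function names and data are made up for illustration.

```python
def reduction_of_error(observed, reconstructed, calibration_mean):
    """RE = 1 - SSE / sum((obs - calibration-period mean)^2) over the verification period."""
    sse = sum((o - r) ** 2 for o, r in zip(observed, reconstructed))
    ref = sum((o - calibration_mean) ** 2 for o in observed)
    return 1.0 - sse / ref

def coefficient_of_efficiency(observed, reconstructed):
    """CE = 1 - SSE / sum((obs - verification-period mean)^2) over the verification period."""
    ver_mean = sum(observed) / len(observed)
    sse = sum((o - r) ** 2 for o, r in zip(observed, reconstructed))
    ref = sum((o - ver_mean) ** 2 for o in observed)
    return 1.0 - sse / ref

# Made-up verification-period data and a calibration-period mean of 11.
observed_ver = [10.0, 12.0, 14.0, 16.0]
reconstructed_ver = [11.0, 12.0, 13.0, 15.0]
calibration_mean = 11.0

re_score = reduction_of_error(observed_ver, reconstructed_ver, calibration_mean)
ce_score = coefficient_of_efficiency(observed_ver, reconstructed_ver)
print(round(re_score, 3), round(ce_score, 3))
```

Positive values indicate that the reconstruction has more skill than simply using the respective mean. CE can never exceed RE, which makes it the more demanding of the two.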

Output Statistics

Statistics Information
Once the regression modeling tool has been run, the output file with the word "Statistics" appended to the end of the filename is read into Dendro Tools and displayed for the user to scroll through (Fig. 1).

Figure 1. The regression modeling tool has been successfully run. The results are output in a series of output files as detailed in the outline of steps located under "Running the Tool" above. The main statistics needed for assessment of the regression model quality are output to their own output file and are displayed under "Output Statistics" as illustrated by the red box. Two buttons underneath this textbox allow for further visual inspection (see below).

Plot...
Clicking this button will display the observed and reconstructed data during the calibration and verification periods and the final reconstruction for visual inspection (Fig. 2). The observed and reconstructed data during the calibration and verification periods are displayed by default. The user can display the final reconstruction by selecting "Reconstruction" from the drop-down list next to "Displayed Data". In Figure 2 below, the calibration was performed on the older data (denoted by the orange arrow; left of the gray vertical line) and verification analysis was performed on the more recent data (denoted by the green arrow; right of the gray vertical line). The calibration and verification periods will always be separated by the gray vertical line, and the text in the header will change depending on which side of the gray line (left or right) the calibration and verification periods reside. If the calibration period is left of the gray line, the text will read "Calibration vs. Verification" (i.e., like Fig. 2), but if the verification period is left of the gray line, the text will read "Verification vs. Calibration" (i.e., opposite of Fig. 2). See the Running Dendro Tools topic for more general details about plots in Dendro Tools.

Figure 2. Plots of the observed and reconstructed data during the calibration and verification periods and the final reconstruction can be accessed by clicking on the "Plot..." button shown above in Figure 1.

Residuals...
Clicking this button will display various residual plots (Fig. 3). Residual plots can be used to check for potential biases in the regression model (e.g., residuals that are asymmetric relative to the zero line, suggesting non-randomness). Each variable used in the reconstruction, the estimated value (y-hat), and the year can be plotted against the observed data or the residuals. The first variable vs. the observed data is displayed by default. Drop-down lists next to "Displayed Data" can be used to select other variables. Once the proper selection has been made, press the "Replot" button to show the specified plot. Hovering the mouse arrow over a red dot will display the year when that red dot occurs. See the Running Dendro Tools topic for more general details about plots in Dendro Tools.

Figure 3. Example plot of the residuals against a predictor variable.