Reconstruct Temperature

Description

This tool reconstructs daily temperature from several homogeneous stations or station segments. Station data are processed in a series of steps. Each step is followed by a pause in the run of the tool so the user can assess the quality of the regression models and analyze plots of the residuals. Outliers can then be purged, so that the quality of the reconstruction is not impacted by any temperature data of questionable quality. HOB Tools does not purge outliers blindly, but instead assesses all identified outliers relative to a network of stations input as a "screening file" by the user.

Note: This is a complex tool and can take a long time to run, depending on the number and length of the stations or station segments (e.g., for a run with 11 station segments, analyzing the regression models and all residuals can take four hours to complete). It is not possible to close the tool and later resume from where you left off. If you close the tool before the run is complete, you will have to start over from the beginning.

Running the Tool
The structure of the input datasets should be carefully thought out before running this tool. HOB Tools assumes "Station 1" in the reconstruction file selected by the user is the dependent station that will be reconstructed (i.e., the column of data immediately after the day of the month in the reconstruction file; see "Input Data Format" below). The reconstruction then proceeds from left to right. The first station that will be modeled and transferred is "Station 2", followed by "Station 3", and so on up to Station N. The general procedure for this tool is as follows:

  1. The user enters the required input and output information in the upper-left corner of the tool (see Fig. 1 and Input and Output sections below).
  2. In the lower-right corner of the tool, the option to "Proceed to First Station" will already be selected. Here the option to "Perform Inter-station Screening on Outliers" can be checked.
  3. Click the "Run" button.
  4. HOB Tools reads in both the reconstruction and screening data files. This step can take some time to run, so do not be alarmed if no progress is indicated on the progress bar for several minutes.
  5. Station 1 is assigned as the dependent station and Station 2 is assigned as the independent station.
  6. The overlap between the dependent and independent stations is separated into approximately equal halves.
  7. If "Perform Inter-station Screening on Outliers" is checked, then the two overlapping periods between the dependent and independent stations are assessed separately by subtracting the independent station from the dependent station to create a difference series. Each difference series is then sorted, and the values above the 90th percentile and below the 10th percentile are flagged as outliers (cf. Cocheo and Camuffo 2002). Each outlier is assessed one at a time by using the screening file input by the user. The average daily temperature is computed from the screening file on the date in question. The daily values for the dependent and independent stations are then assessed relative to the computed average, and the value farthest away from the average is purged. If "Perform Inter-station Screening on Outliers" is not checked, then HOB Tools skips step 7 and proceeds to step 8.
  8. Forward and backward regression calibration/verification is run using an "every third day per month" approach on two halves of the overlapping record (Burnette et al. 2010). If there is insufficient data to run the forward and backward calibration and verification, then HOB Tools will display a message and proceed to step 12 to derive the final models.
  9. Results of the forward and backward calibration/verification are displayed, along with the names of the dependent and independent stations, in the textbox under "Regression Model Statistics" in the lower-left corner of the tool (Fig. 1). These results include the sample size (N), estimates (B0 and B1), standard error (StdErr), coefficient of determination (R-Sq), and the Durbin-Watson statistic (DurWat) for each model in the calibration period and the sample size (N), coefficient of efficiency (CE), and Pearson correlation between actual and estimated values (r) for each model in the verification period. See regression papers and textbooks for details about these metrics (e.g., Draper and Smith 1981; Cook et al. 1994; Kutner et al. 2005; Wilks 2006). Residuals are output under "Residual Analysis", and those from the forward calibration are shown by default. If "Show Backward Regression Residuals" is selected, then HOB Tools will display the residuals from the backward calibration under "Residual Analysis". Clicking the "View Plots" button will display plots of the forward or backward regression residuals (e.g., Fig. 2).
  10. A modeling cycle is now complete, and HOB Tools awaits user feedback on what to do next. Options are in the lower-right corner of the tool and include: performing inter-station screening on outliers again, checking specific residuals under "Residual Analysis" for inter-station screening, or deriving the final regression models (Fig. 1). The user makes the appropriate selection and then clicks the "Continue Run" button.
  11. If "Perform Inter-station Screening on Outliers" is checked, then HOB Tools returns to step 6. If specific residuals are checked for inter-station screening, then each checked residual is assessed one at a time by using the screening file input by the user. The average daily temperature is computed from the screening file on the date in question. The daily values from the dependent and independent stations are then assessed relative to the computed average, and the value most different from the average is purged. HOB Tools then returns to step 6. If "Derive Final Models" is selected, then HOB Tools skips step 11 and goes to step 12.
  12. The entire overlap period between the dependent and the independent stations is used to calibrate a suite of final regression models using the "every third day per month" method (Burnette et al. 2010). If there is insufficient data to calibrate the final regression models, then HOB Tools will present the user with a critical error message and the entire reconstruction process will end. The user will need to address the small overlap detected by HOB Tools and then restart the temperature reconstruction from the beginning.
  13. Results from the calibration of the final models are displayed under "Regression Model Statistics" in the lower-left corner of the tool (Fig. 1). These results include the sample size (N), estimates (B0 and B1), standard error (StdErr), coefficient of determination (R-Sq), and the Durbin-Watson statistic (DurWat) for each model. See regression textbooks for details about these metrics (e.g., Draper and Smith 1981; Kutner et al. 2005; Wilks 2006). Residuals are output under "Residual Analysis", and clicking the "View Plots" button will display plots of these residuals (e.g., Fig. 2).
  14. A modeling cycle is now complete, and HOB Tools awaits further instruction from the user. Options are in the lower-right corner of the tool and include checking specific residuals under "Residual Analysis" for inter-station screening or proceeding to the next station. The user makes the appropriate selection and then clicks the "Continue Run" button. It is also possible to perform inter-station screening on outliers again, but the option "Remove Selected Residuals / Calibrate and Verify Models" must be selected first; the "Perform Inter-station Screening on Outliers" option can then be checked. Finally, if the independent station currently being modeled is the last station in the reconstruction file, then instead of "Proceed to Next Station", HOB Tools will present the option "Complete Reconstruction". Selecting this option will change the text of the button to "Finish Run".
  15. If specific residuals are checked for inter-station screening, then each checked residual is assessed one at a time by using the screening file input by the user. The average daily temperature is computed from the screening file on the date in question. The daily values from the dependent and independent stations are then assessed relative to the computed average, and the value most different from the average is purged. HOB Tools then returns to step 6. If the "Perform Inter-station Screening on Outliers" option is checked, then HOB Tools returns to step 6. If "Proceed to Next Station" is selected, then the final models are used to transfer the daily temperature data from the independent to the dependent station to fill gaps and extend the dependent station's record forward and/or backward. The next station number is assigned as the independent station and HOB Tools returns to step 6. If "Complete Reconstruction" is selected, then the final models are used to transfer daily temperature from the independent to the dependent station to fill gaps and extend the dependent station's record forward and/or backward. The final results are output, and the reconstruction is complete.
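The inter-station screening described in steps 7, 11, and 15 can be sketched as follows. This is a minimal illustration in Python, not the code HOB Tools actually runs. The function names, the linear-interpolation percentile, the reading of the outlier rule as "below the 10th or above the 90th percentile of the difference series", and the tie-breaking choice are all assumptions made for the example.

```python
from statistics import mean

MISSING = -99.0  # missing-value code used in the input files


def percentile(sorted_vals, pct):
    """Percentile by linear interpolation (an assumed convention)."""
    k = (len(sorted_vals) - 1) * pct / 100.0
    f = int(k)
    c = min(f + 1, len(sorted_vals) - 1)
    return sorted_vals[f] + (sorted_vals[c] - sorted_vals[f]) * (k - f)


def flag_outliers(dep, ind, lower_pct=10, upper_pct=90):
    """Return indices whose (dependent - independent) difference falls
    outside the percentile band. Call once per half of the overlap."""
    pairs = [(i, d - x) for i, (d, x) in enumerate(zip(dep, ind))
             if d != MISSING and x != MISSING]
    diffs = sorted(v for _, v in pairs)
    if len(diffs) < 2:
        return []
    lo = percentile(diffs, lower_pct)
    hi = percentile(diffs, upper_pct)
    return [i for i, v in pairs if v < lo or v > hi]


def screen(dep_val, ind_val, screening_vals):
    """Decide which value to purge on one date: the one farthest from the
    screening-network average. Returns None if no screening data exist.
    Ties purge the independent value (an arbitrary choice for this sketch)."""
    valid = [v for v in screening_vals if v != MISSING]
    if not valid:
        return None
    avg = mean(valid)
    if abs(dep_val - avg) > abs(ind_val - avg):
        return "dependent"
    return "independent"


# Hypothetical data for one half of an overlap period.
dep = [10, 11, 12, 30, 13, 14, 15, 16, 17, 18, 19]
ind = [10, 11, 12, 12, 13, 14, 15, 16, 17, 18, 25]
flagged = flag_outliers(dep, ind)
```

Each flagged index would then be passed, with that date's screening-file values, to `screen` to decide which of the two station values is replaced by the missing-value code.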
Figure 1. In this example, the overlap between Ottawa and Manhattan was modeled and the results from the forward and backward calibration and verification phase are located in the textbox under "Regression Model Statistics". All residuals from the forward calibration are displayed under "Residual Analysis". The user can select "Show Backward Regression Residuals" to display the other set of residuals. Plots of the currently displayed residuals can be consulted by clicking the "View Plots" button (see Fig. 2). Clicking on any residuals within the box under "Residual Analysis" will place a checkmark by the residual indicating that it has been selected. Any selected residuals are assessed with inter-station screening if "Remove Selected Residuals" is selected when the "Continue Run" button is pressed. If "Perform Inter-station Screening on Outliers" is checked, then the option to "Remove Selected Residuals" will not be available.

Figure 2. Four different plots can be constructed for each model on a monthly basis, which can be specified at the bottom. The four different plots are: X vs. Y, Y vs. Y-Hat, Y-Hat vs. Residuals, and X vs. Residuals. Once the proper selection has been made, press the “Plot” button to show the specified plot.
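
For readers who want to check the values reported under "Regression Model Statistics" or reproduce the quantities plotted in Figure 2, the sketch below computes them for a single simple linear regression (Y = B0 + B1 * X). It is an illustration under stated assumptions, not the HOB Tools implementation: StdErr is taken to be the standard error of the estimate, CE uses the verification-period mean (as in Cook et al. 1994), "forward" is assumed to mean calibrating on the first half and verifying on the second, and the "every third day per month" subsampling and any per-month grouping of models are left out. All function names are hypothetical.

```python
from math import sqrt


def calibrate(x, y):
    """Fit y = B0 + B1*x by least squares and return calibration statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]                    # Y-Hat
    resid = [b - f for b, f in zip(y, fitted)]           # residuals
    sse = sum(e * e for e in resid)
    sst = sum((b - my) ** 2 for b in y)
    dw_num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n))
    return {
        "N": n, "B0": b0, "B1": b1,
        "StdErr": sqrt(sse / (n - 2)),                   # std. error of estimate
        "R-Sq": 1 - sse / sst,
        "DurWat": dw_num / sse if sse else float("nan"),
        "YHat": fitted, "Residuals": resid,
    }


def verify(x, y, b0, b1):
    """Apply a calibrated model to withheld data; return N, CE, and r."""
    n = len(x)
    est = [b0 + b1 * a for a in x]
    my, me = sum(y) / n, sum(est) / n
    sse = sum((b - e) ** 2 for b, e in zip(y, est))
    sst = sum((b - my) ** 2 for b in y)
    cov = sum((b - my) * (e - me) for b, e in zip(y, est))
    var_e = sum((e - me) ** 2 for e in est)
    return {"N": n, "CE": 1 - sse / sst, "r": cov / sqrt(sst * var_e)}


def forward_backward(x, y):
    """Split the overlap into halves; calibrate on one, verify on the other."""
    half = len(x) // 2
    first, second = (x[:half], y[:half]), (x[half:], y[half:])
    results = {}
    for label, cal, ver in (("forward", first, second),
                            ("backward", second, first)):
        model = calibrate(*cal)
        results[label] = (model, verify(*ver, model["B0"], model["B1"]))
    return results
```

The four plots in Figure 2 correspond to pairing the inputs and outputs of `calibrate`: X with Y, Y with `YHat`, `YHat` with `Residuals`, and X with `Residuals`.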

Final Output Files
The output file specified by the user (see "Output" section below) contains the daily temperature reconstruction along with the standard error of each estimate and the name of the station used. Two other files are also output. The first is named similarly to the output file specified by the user except with the words "Test Models" appended to the end of the name. This file contains the forward and backward calibration and verification results for each station separated by a dashed line. The second is named similarly to the output file specified by the user except with the words "Final Models" appended to the end of the name. This file contains the models that were used to transfer daily temperature data from each independent station to the dependent station. Each station is separated by a dashed line.
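
To make the content of the main output file concrete, the sketch below shows how one independent station might be transferred onto the dependent record, yielding a value, a standard error, and a source station name for each day. It is a rough illustration only: the function name, the tuple layout, the use of a single model rather than a suite of final models, and the choice to report no standard error for observed (untransferred) days are assumptions, and the actual file layout written by HOB Tools may differ.

```python
MISSING = -99.0  # missing-value code used in the input files


def transfer(dep, ind, b0, b1, std_err, dep_name, ind_name):
    """Fill gaps in the dependent record with estimates from the
    independent station, using a final model y = b0 + b1 * x.

    Returns one (value, standard_error, source_station) tuple per day.
    """
    rows = []
    for d, x in zip(dep, ind):
        if d != MISSING:
            # Observed value at the dependent station is kept as-is.
            rows.append((d, None, dep_name))
        elif x != MISSING:
            # Gap filled by an estimate from the independent station.
            rows.append((b0 + b1 * x, std_err, ind_name))
        else:
            # Neither station has data; the gap remains.
            rows.append((MISSING, None, None))
    return rows


# Hypothetical three-day example with a made-up model and station names.
rows = transfer([10.0, MISSING, MISSING], [9.0, 12.0, MISSING],
                b0=1.0, b1=1.0, std_err=0.5,
                dep_name="Station 1", ind_name="Station 2")
```

Repeating this for Station 3 through Station N, each time treating the partly filled record as the dependent series, mirrors the left-to-right order described under "Running the Tool".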

Input

Reconstruction File
Select the reconstruction data by clicking on "Browse" and navigating to the proper file.

Input Data Format
Year / Month / Day / Station 1 / Station 2 / ... / Station N

Each "/" is a tab (i.e., data should be tab-delimited). Station 1 through Station N are the names of the individual stations. All columns must be filled with observations and in the same units. If a station does not have observations during a particular day, then a “-99” should be entered indicating a missing value.
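
A file in this layout can be checked before a long run with a few lines of Python. The sketch below is only an illustration of the format described above. The function name is hypothetical, the start row is assumed to be counted from 1, and the sample station names and values are made up.

```python
import csv
import io

MISSING = -99.0  # missing-value code described above


def read_station_file(handle, start_row, n_stations):
    """Read tab-delimited Year/Month/Day/Station 1..N rows.

    start_row is the 1-based row where the data begin (rows before it,
    such as a header, are skipped). Missing values become None.
    """
    rows = []
    for row_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if row_no < start_row or not fields:
            continue
        year, month, day = (int(f) for f in fields[:3])
        values = [float(f) for f in fields[3:3 + n_stations]]
        if len(values) != n_stations:
            raise ValueError(f"row {row_no}: expected {n_stations} station columns")
        rows.append(((year, month, day),
                     [None if v == MISSING else v for v in values]))
    return rows


# Hypothetical two-station file with one header row.
sample = (
    "Year\tMonth\tDay\tOttawa\tManhattan\n"
    "1900\t1\t1\t-5.0\t-99\n"
    "1900\t1\t2\t-4.5\t-3.0\n"
)
rows = read_station_file(io.StringIO(sample), start_row=2, n_stations=2)
```

Raising an error on a short row is a design choice for this sketch; it surfaces unfilled columns early rather than partway through a reconstruction.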

Start Row
This input is the row number where the main data to be processed begin in the reconstruction dataset. An example of how row numbers are entered is shown in Figure 2 located in the Running HOB Tools topic.

Number of Stations
This input is the number of stations or station segments with temperature data in the reconstruction file selected above.

Screening File
Select the screening data by clicking on "Browse" and navigating to the proper file.

Input Data Format
Year / Month / Day / Station 1 / Station 2 / ... / Station N

Each "/" is a tab (i.e., data should be tab-delimited). Station 1 through Station N are the names of the individual stations. All columns must be filled with observations and in the same units. If a station does not have observations during a particular day, then a “-99” should be entered indicating a missing value.

Start Row
This input is the row number where the temperature data to be processed begin in the input file selected above. An example of how row numbers are entered is shown in Figure 2 located in the Running HOB Tools topic.

Number of Stations
This input is the number of stations or station segments with temperature data in the screening file selected above.

Output

Output File
Select the file where the numerical results will be stored.

Output File Header
Information entered here will be output in the first line of the output file.