Data Entry and Gap Quality Control

Description

Historical weather observations can be difficult to read, which can result in daily data being missed or the same daily data being entered twice. Modern data from the National Climatic Data Center can also have missing days (gaps) within the dataset that are worth detecting. This basic tool assumes the input dataset is meant to be continuous and searches it for gaps and duplicate entries.

Discussion of the Output Data
The results are output to two separate text boxes: gaps in the left box and duplicate entries in the right, as shown in Figure 1 below. Gaps take the format Date 1 > Date 2, where Date 1 is the last date before the gap and Date 2 is the first date after the gap. All days between Date 1 and Date 2 are missing. Duplicate entries take the format Date 1 - Date 2, where the same data were entered for both Date 1 and Date 2.
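The checks described above can be illustrated with a short Python sketch. This is not the tool's actual implementation. It assumes that records are already in chronological order, that a duplicate means two consecutive rows carrying identical data values, and that dates are printed in ISO format; the function name find_gaps_and_duplicates is hypothetical.

```python
from datetime import date, timedelta


def find_gaps_and_duplicates(records):
    """Scan (date, data) pairs and report gaps and duplicate entries.

    Assumptions (not confirmed by the tool): records are sorted by date,
    and only consecutive rows are compared for duplicate data.
    """
    gaps, duplicates = [], []
    for (d1, v1), (d2, v2) in zip(records, records[1:]):
        # More than one day between neighbouring rows means days are missing.
        if d2 - d1 > timedelta(days=1):
            gaps.append(f"{d1.isoformat()} > {d2.isoformat()}")
        # Identical data on two different dates suggests a double entry.
        if v1 == v2:
            duplicates.append(f"{d1.isoformat()} - {d2.isoformat()}")
    return gaps, duplicates


records = [
    (date(1900, 1, 1), ("5.0",)),
    (date(1900, 1, 2), ("5.0",)),  # same data as the previous day
    (date(1900, 1, 5), ("7.1",)),  # January 3 and 4 are missing
]
gaps, duplicates = find_gaps_and_duplicates(records)
```

Here gaps would hold one entry for the jump from January 2 to January 5, and duplicates one entry for January 1 and 2, mirroring the left and right text boxes.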

Figure 1. Five gaps and 961 duplicate entries are reported in this example, one per row.

Input

Input File
Select the input file by clicking on "Browse" and navigating to the proper file.

Input Data Format
Year / Month / Day / Data

Each "/" is a tab (i.e., data should be tab-delimited).
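As an illustration of this layout, the following minimal Python sketch splits one tab-delimited row into a date and its data values. The function name parse_row is hypothetical, and keeping the data values as text is an assumption made for simplicity.

```python
from datetime import date


def parse_row(line):
    """Split a tab-delimited 'Year / Month / Day / Data' row."""
    fields = line.rstrip("\r\n").split("\t")
    # The first three fields are the date; everything after is data.
    year, month, day = (int(f) for f in fields[:3])
    return date(year, month, day), tuple(fields[3:])


row_date, row_data = parse_row("1895\t7\t4\t71.5\n")
```

A row that used spaces or commas instead of tabs would not split into the expected fields, which is why the input should be tab-delimited.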

Start Row
The row number at which the main data to be processed begin in the dataset. An example of how row numbers are entered is shown in Figure 2 of the Running HOB Tools topic.

Number of Data Columns
The number of columns in the main data that should be checked for gaps and duplicate entries. An example of how column numbers are entered is shown in Figure 2 of the Running HOB Tools topic. Note that this field asks for the total number of data columns, not the data columns themselves, and that the count does not include the date columns (e.g., there would be four data columns in the example in the Running HOB Tools topic).
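The sketch below shows one way the Start Row and Number of Data Columns inputs could be applied when reading a file. It is an illustration, not the tool's code. It assumes row numbers start at 1, that the three date columns come first, and that blank or short rows are skipped; the name read_records and its parameters are hypothetical.

```python
from datetime import date


def read_records(lines, start_row, n_data_columns):
    """Return (date, data) pairs from tab-delimited lines.

    start_row is assumed to be 1-based; n_data_columns excludes the
    three date columns (Year, Month, Day).
    """
    records = []
    for line in lines[start_row - 1:]:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3 + n_data_columns:
            continue  # assumption: skip blank or incomplete rows
        year, month, day = (int(f) for f in fields[:3])
        # Keep only the requested number of data columns after the date.
        data = tuple(fields[3:3 + n_data_columns])
        records.append((date(year, month, day), data))
    return records


lines = [
    "Station X\n",
    "Year\tMonth\tDay\tTmax\tTmin\n",
    "1900\t1\t1\t30\t12\textra\n",
    "1900\t1\t2\t31\t14\n",
]
records = read_records(lines, start_row=3, n_data_columns=2)
```

In this example the two header rows are skipped because the data begin on row 3, and only the two data columns after the date are kept for comparison.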