Double Mass Analysis
Description
This tool, inspired by the Interactive Double Mass Analysis program used by the Office of Hydrologic Development at the National Oceanic and Atmospheric Administration (NOAA 1999), computes the cumulative sums of two datasets, plots the cumulated data against each other, and then uses simple linear regression and user-specified inputs either to detect breakpoints or to analyze a suspected breakpoint in the cumulated data. All breakpoints are listed in the output text file and shown in the double mass plot.
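The cumulation step can be sketched in Python as follows. This is a minimal illustration, not the HOB Tools implementation. The function name `double_mass_pairs` is hypothetical, and the sketch assumes that Station 1 supplies the x-axis values, that Station 2 supplies the y-axis values, and that no values are missing.

```python
from itertools import accumulate


def double_mass_pairs(station1, station2):
    """Return (cumulative station 1, cumulative station 2) pairs for a double mass plot.

    Hypothetical helper: assumes station 1 is plotted on the x-axis and
    station 2 on the y-axis, with no missing values.
    """
    if len(station1) != len(station2):
        raise ValueError("both stations need the same number of observations")
    cum1 = list(accumulate(station1))  # running totals for station 1
    cum2 = list(accumulate(station2))  # running totals for station 2
    return list(zip(cum1, cum2))


pairs = double_mass_pairs([2.0, 3.0, 1.0], [1.0, 4.0, 2.0])
print(pairs)  # [(2.0, 1.0), (5.0, 5.0), (6.0, 7.0)]
```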
Note: This tool can be extremely sensitive to the "Initial Points to Analyze per Segment" and "Tolerance Level" settings. Using the prediction interval as the tolerance level is highly recommended because it is less subjective and less sensitive than a specific value. Kohler (1949) recommends examining any available metadata for the times identified as potential breakpoints.
Discussion of the Output Data
Depending on the type of breakpoint analysis selected, either all potential breakpoints detected or the single suspected breakpoint will be flagged in the output text file and in the double mass plot (see "Tolerance Level" below for how breakpoints are detected). The slopes of the segments before and after each breakpoint are tested to determine whether they differ significantly (p < 0.05), and the p-value is output with the flag, along with the slope before the break and the slope after the break. Flags in the output text file generally take the form "* p = 0.xxxx, Slope 1 = x.xxx, Slope 2 = x.xxx". Each flag marks the last data point of the "Slope 1" segment; the next data point begins the segment associated with "Slope 2".
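This help text does not say which significance test HOB Tools applies to the two slopes. One common choice, shown here only as an assumption, is a t test for the difference between two independent regression slopes: t = (b1 - b2) / sqrt(SE1^2 + SE2^2) with n1 + n2 - 4 degrees of freedom. The helper names `fit_line` and `slope_difference_t` are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept, standard error of slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope


def slope_difference_t(seg1, seg2):
    """t statistic and degrees of freedom for H0: slope 1 == slope 2.

    seg1 and seg2 are (x_values, y_values) tuples for the segments before
    and after the break (assumed test, not necessarily the one HOB Tools uses).
    """
    b1, _, se1 = fit_line(*seg1)
    b2, _, se2 = fit_line(*seg2)
    t = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    df = len(seg1[0]) + len(seg2[0]) - 4
    return b1, b2, t, df
```

A two-sided p-value comparable to the one reported in a flag could then be obtained from Student's t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.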
The last data segment analyzed at the end of the double mass analysis may have fewer data points than the "Initial Points to Analyze per Segment" input by the user. This requires HOB Tools to calculate "Slope 2" differently as follows:
This type of analysis is most effective with datasets that are strongly correlated (Kohler 1949). Therefore, the correlation between Station 1 and Station 2 is computed and written to the output text file before the cumulated data (i.e., look for "r = 0.xxx", where xxx represents the correlation, near the top of the output text file).
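A Pearson correlation coefficient can be computed as in the sketch below. It assumes, without confirmation from this page, that the correlation is calculated on the raw (uncumulated) station values, since cumulated series are almost always highly correlated. The name `pearson_r` is hypothetical; on Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


r = pearson_r([2.0, 3.0, 1.0, 4.0], [2.1, 2.9, 1.2, 4.2])
print(f"r = {r:.3f}")  # same "r = 0.xxx" layout as the output file
```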
HOB Tools then uses the output text file to generate the double mass plot (Fig. 1). A straight-line plot with no breakpoints is the strongest possible result (Kohler 1949). Together, the double mass plot and the output text file can be used to assess problematic data segments. Kohler (1949) suggests that the ratio between the slope of the best data segment and the slope of each poorer data segment could be used to correct the problematic data.
Figure 1. Double mass plot of monthly precipitation data in which one breakpoint was detected; the breakpoint marks the end of a line segment.
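Kohler's slope-ratio correction described above might be applied as in this hypothetical helper. It assumes that the station being corrected is the one plotted on the y-axis, that the ratio is taken as the best segment's slope divided by the poorer segment's slope, and that the factor is applied to the raw (not cumulated) observations in the poorer segment.

```python
def adjust_segment(values, slope_best, slope_poor):
    """Scale raw observations in a poorer segment by the ratio of slopes.

    Assumes the corrected station is on the y-axis, so multiplying its values
    by slope_best / slope_poor brings the segment's slope to slope_best.
    """
    factor = slope_best / slope_poor
    return [v * factor for v in values]


# A segment plotting at slope 2.0 against a best slope of 1.0 is halved.
print(adjust_segment([2.0, 4.0], 1.0, 2.0))  # [1.0, 2.0]
```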
Input
Input File
Select the input file by clicking on "Browse" and navigating to the proper file.
Start Row
This input is the row number where the main data to be processed begin in the dataset. An example of how row numbers are entered is shown in Figure 2 located in the Running HOB Tools topic.
Station 1 Column
This input is the column number where the station 1 data are located in the input file selected above. An example of how column numbers are entered is shown in Figure 2 located in the Running HOB Tools topic.
Station 2 Column
This input is the column number where the station 2 data are located in the input file selected above. An example of how column numbers are entered is shown in Figure 2 located in the Running HOB Tools topic.
Start Analysis
Enter the year and, if necessary, the month when the cumulated series analysis should start. If "Year / Data" is selected under "Input Data Format," then the month field will be disabled.
End Analysis
Enter the year and, if necessary, the month when the cumulated series analysis should end. If "Year / Data" is selected under "Input Data Format," then the month field will be disabled.
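Restricting the records to the Start Analysis and End Analysis dates can be sketched as an inclusive comparison of (year, month) pairs. The record layout and the name `select_period` are assumptions for illustration; for "Year / Data" input the month is simply left as `None`.

```python
def select_period(records, start, end):
    """Keep records whose (year, month) falls within [start, end], inclusive.

    records: list of (year, month, station1, station2); month is None for
    Year / Data input (hypothetical layout).
    start, end: (year, month) tuples, or (year, None) for Year / Data input.
    """
    def key(year, month):
        # Treat a missing month as 0 so yearly data compare by year alone.
        return (year, month if month is not None else 0)

    lo = key(*start)
    hi = key(*end)
    return [r for r in records if lo <= key(r[0], r[1]) <= hi]


monthly = [(2000, 12, 1.0, 2.0), (2001, 1, 3.0, 4.0), (2001, 2, 5.0, 6.0)]
print(select_period(monthly, (2001, 1), (2001, 2)))
```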
Input Data Format
If Year / Month / Data is selected:
If Year / Data is selected:
Each "/" represents a tab (i.e., the data must be tab-delimited). Station 1 through Station N are the names of the individual stations. All columns must be filled with observations, and all observations must be in the same units. If a station has no observation for a particular year or month, enter "-99" to indicate a missing value.
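Reading such a tab-delimited file might look like the sketch below. The 1-based `start_row`, `col1`, and `col2` arguments mirror the "Start Row" and "Station 1/2 Column" inputs. The name `read_station_columns` is hypothetical, and the sketch assumes the year is in the first column and, when present, the month in the second. It maps "-99" to `None`; this page does not describe how HOB Tools itself treats missing values during cumulation.

```python
import csv
import io

MISSING = -99.0  # missing-value code described above


def read_station_columns(text, start_row, col1, col2, has_month=True):
    """Parse tab-delimited input into (year, month, station1, station2) records.

    start_row, col1 and col2 are 1-based. Assumes year in column 1 and month
    in column 2 (when has_month is True). Missing values (-99) become None.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    records = []
    for row in rows[start_row - 1:]:
        if not row:
            continue  # skip blank lines
        year = int(row[0])
        month = int(row[1]) if has_month else None
        values = []
        for col in (col1, col2):
            v = float(row[col - 1])
            values.append(None if v == MISSING else v)
        records.append((year, month, values[0], values[1]))
    return records


sample = "Year\tMonth\tA\tB\n2000\t1\t2.5\t-99\n2000\t2\t3.0\t4.0\n"
print(read_station_columns(sample, start_row=2, col1=3, col2=4))
```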
Analysis Parameters
Type of Breakpoint Analysis
This tool can either attempt to detect breakpoints or analyze a suspected breakpoint for significance (e.g., a known date of a station move). "Detect Possible Breakpoints" is selected by default and enables the "Initial Points to Analyze per Segment" and "Tolerance Level" fields (see below). Selecting "Analyze Suspected Breakpoint" activates the "Date of Suspected Breakpoint" field (see below).
Initial Points to Analyze per Segment
Enter the minimum number of points used to estimate the least-squares line of best fit for the double mass plot (the default is nine). This regression line and the "Tolerance Level" are used to identify potential breakpoints.
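For reference, the standard ordinary least-squares estimates for such a line are given below. The notation is ours: X_i and Y_i are the cumulated Station 1 and Station 2 values over the n points of the current segment (the axis assignment is an assumption), and \hat{Y}_{n+1} is the value predicted for the next point.

```latex
\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{a} = \bar{Y} - \hat{b}\,\bar{X},
\qquad
\hat{Y}_{n+1} = \hat{a} + \hat{b}\,X_{n+1}
```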
Tolerance Level
Select the tolerance level used to detect potential breakpoints. The prediction interval, with an alpha value of 0.05, is selected by default, but the user can select "specific value" and enter a number to use as the tolerance level instead. Detection begins by reading the number of initial points to analyze and computing a least-squares line of best fit. This regression line is then used to predict the next data point of the dependent dataset. If the observed value falls within the tolerance level of this prediction, no slope change is assumed; the new data point is added to the previous points, a new line of best fit is computed, and the process repeats. If the observed value breaches the tolerance level in either the positive or negative direction, a breakpoint may have occurred and a flag is generated (see "Discussion of the Output Data" above for additional details). Using the prediction interval as the tolerance level is highly recommended because specific values are highly subjective and can become overly sensitive to breakpoints as the datasets are cumulated.
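The detection loop described above can be sketched as follows. This is an illustration under stated assumptions, not the HOB Tools code: `fit` and `detect_breakpoints` are hypothetical names, the cumulated Station 2 values are treated as the dependent series, and after a flag a new segment starts at the breaching point with a fresh set of initial points. For the prediction interval, the t critical value is replaced by a large-sample normal approximation; with SciPy, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` would give the exact value.

```python
import math
from statistics import NormalDist


def fit(xs, ys):
    """Least-squares line; returns slope, intercept, mean of x, Sxx, residual std. error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, mx, sxx, math.sqrt(sse / (n - 2))


def detect_breakpoints(cum1, cum2, n_init=9, tolerance=None, alpha=0.05):
    """Return indices of points flagged as the last point of a segment.

    tolerance=None uses a prediction interval; a number uses a fixed band.
    """
    if n_init < 3:
        raise ValueError("need at least 3 points per segment")
    # Large-sample (normal) approximation to the t critical value.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    breaks = []
    start, end = 0, n_init  # current segment is cum1[start:end], cum2[start:end]
    while end < len(cum1):
        slope, intercept, mx, sxx, s = fit(cum1[start:end], cum2[start:end])
        x_next = cum1[end]
        predicted = intercept + slope * x_next
        if tolerance is None:
            n = end - start
            half_width = crit * s * math.sqrt(1 + 1 / n + (x_next - mx) ** 2 / sxx)
        else:
            half_width = tolerance
        if abs(cum2[end] - predicted) <= half_width:
            end += 1  # within tolerance: grow the segment and refit
        else:
            breaks.append(end - 1)  # flag the last point of the current segment
            start, end = end, end + n_init  # the breaching point starts a new segment
    return breaks
```

If fewer than `n_init` points remain after the last flag, the loop simply stops, which mirrors the note above that the final segment may be shorter than the initial point count.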
Date of Suspected Breakpoint
Enter the year and, if necessary, the month when the suspected breakpoint occurs. If "Year / Data" is selected under "Input Data Format," then the month field will be disabled.
Output
Output File
Select the file where the new data will be stored.
Output File Header
Information entered here will be output in the first line of the output file.
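Assembling the output text might look like the sketch below. Only three details come from this page: the header goes on the first line, the "r = 0.xxx" line comes before the cumulated data, and flags use the "* p = 0.xxxx, Slope 1 = x.xxx, Slope 2 = x.xxx" form. The tab-separated row layout and the names `format_flag` and `build_output` are assumptions.

```python
def format_flag(p, slope1, slope2):
    """Format a breakpoint flag in the documented output style."""
    return f"* p = {p:.4f}, Slope 1 = {slope1:.3f}, Slope 2 = {slope2:.3f}"


def build_output(header, r, rows, flags):
    """Build the output text.

    rows: list of (date_label, cum1, cum2); flags: dict mapping a row index
    to (p, slope1, slope2). The row layout is hypothetical.
    """
    lines = [header, f"r = {r:.3f}"]
    for i, (label, c1, c2) in enumerate(rows):
        fields = [label, f"{c1:g}", f"{c2:g}"]
        if i in flags:
            fields.append(format_flag(*flags[i]))  # flag marks the last point of Slope 1
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


text = build_output("My run", 0.95, [("2000-01", 1.0, 2.0), ("2000-02", 3.0, 5.0)],
                    {0: (0.01, 1.0, 2.0)})
print(text)
```

The resulting string could then be written to the selected output file with a standard `open(path, "w")` call.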