Java Applets - Linear Regression

A series of Java applets was written to help explain concepts in Probability and Statistics. One of these is shown here.

The Linear Regression Applet

Purpose

In setting up problems of this type, we assume that the data (xi, yi) arise from a physical process in which two variables, denoted by the upper-case letters X and Y, are linearly related. The relationship between them is assumed to take the form

(1) Y = A + BX.

This is called the population regression equation because it describes the population or process from which the data points (xi, yi) are taken. In the Java applet shown below, the parameters A and B are set in the Population Regression Equation section at the upper left of the applet. The default values have been set to A=0 and B=1.

Standard Error of Estimate

The actual population or process may contain some variability. The standard error of estimate, Sy|x, is defined as the standard deviation of the spread of the points about the population regression equation. It may arise from variability in the original population or process, or from measurement errors introduced during data collection. Thus the actual process might be described by the equation

(2) Y = A + BX + E,

where E is a random variable with a standard deviation equal to Sy|x. Usually, E would be assumed to be normally distributed. Note also that, in this approach, the error is implicitly assumed to arise entirely in y, while x is assumed to be without error. Individual data points that arise from the above equation would be

(3) yi = A + Bxi + Ei.

In the Linear Regression Applet shown below, Sy|x can be set by the user.

Generation of Data Points

The Generation of Data Points section of the applet allows the user to specify how many data points are to be generated using equation (3) and to specify the range of the data (X Minimum and X Maximum). The user can also specify whether data points are to be distributed Uniformly or Randomly over the specified range.
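As an illustration of equation (3), the following Python sketch generates a data set in the same spirit. It is not the applet's source code. The function name `generate_points` and its parameters are hypothetical, and it assumes that "Uniformly" means evenly spaced x values, that "Randomly" means x values drawn from a uniform distribution over the range, and that E is normally distributed with mean zero.

```python
import random


def generate_points(n, x_min, x_max, A=0.0, B=1.0, s_yx=1.0,
                    spacing="uniform", rng=None):
    """Return (xs, ys) with y_i = A + B*x_i + E_i, E_i ~ Normal(0, s_yx).

    Hypothetical helper: 'uniform' is assumed to mean evenly spaced x values,
    'random' is assumed to mean x values drawn uniformly from the range.
    """
    rng = rng or random.Random()
    if spacing == "uniform":
        if n == 1:
            xs = [(x_min + x_max) / 2.0]
        else:
            step = (x_max - x_min) / (n - 1)
            xs = [x_min + i * step for i in range(n)]
    elif spacing == "random":
        xs = sorted(rng.uniform(x_min, x_max) for _ in range(n))
    else:
        raise ValueError("spacing must be 'uniform' or 'random'")
    # All of the error is placed in y; x is treated as exact.
    ys = [A + B * x + rng.gauss(0.0, s_yx) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = generate_points(10, 0.0, 10.0, A=0.0, B=1.0, s_yx=1.0,
                             rng=random.Random(42))
    for x, y in zip(xs, ys):
        print(f"{x:6.2f} {y:8.3f}")
```

Setting `s_yx=0.0` places every point exactly on the population regression line, which is a convenient check of the sketch.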

Regression Analysis

The regression analysis is run by clicking on the Run Regression Simulation button. A different set of data is generated and a new regression analysis is carried out each time the button is clicked. Some intermediate and final calculations are given in the sidebar to the right of the output graph. Note that to facilitate programming, all confidence intervals are calculated using a normal distribution, even though a Student-t distribution would be more appropriate when the number of data points is less than about 30.
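The core of such an analysis can be sketched with the standard least-squares formulas. This is a minimal illustration and not the applet's implementation. The function name `fit_line` is hypothetical, and the sample standard error of estimate is assumed to use n - 2 degrees of freedom, as is conventional for simple linear regression.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x.

    Returns (a, b, s_yx), where s_yx is the sample standard error of
    estimate computed with n - 2 degrees of freedom (an assumption here).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate s_yx")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                 # slope estimate
    a = y_bar - b * x_bar           # intercept estimate
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))
    return a, b, s_yx


if __name__ == "__main__":
    a, b, s_yx = fit_line([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8])
    print(f"a = {a:.4f}, b = {b:.4f}, s_yx = {s_yx:.4f}")
```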

The user can determine whether or not to show the calculated regression line (y=a+bx), the 90% Confidence Interval for the Conditional Mean, and the 90% Confidence Interval for an Individual y Given x (i.e., for an additional data point).
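The two kinds of interval differ only in one term under the square root. The sketch below computes their half-widths at a chosen x using the textbook formulas with a normal critical value, in line with the normal-distribution simplification described above. The function name `interval_half_widths` is hypothetical, and the code is an illustration under these assumptions, not the applet's code.

```python
import math
from statistics import NormalDist


def interval_half_widths(x0, xs, s_yx, level=0.90):
    """Half-widths of the two intervals at x0, using a normal critical value.

    Returns (mean_hw, indiv_hw):
      mean_hw  -- confidence interval for the conditional mean at x0
      indiv_hw -- interval for an individual (additional) y at x0
    Both are centred on the fitted value a + b*x0.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.645 for 90%
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / s_xx
    mean_hw = z * s_yx * math.sqrt(leverage)
    indiv_hw = z * s_yx * math.sqrt(1.0 + leverage)
    return mean_hw, indiv_hw


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    for x0 in (2.0, 4.0):
        mean_hw, indiv_hw = interval_half_widths(x0, xs, s_yx=1.0)
        print(f"x0 = {x0}: mean +/- {mean_hw:.3f}, individual +/- {indiv_hw:.3f}")
```

Both intervals are narrowest at the mean of the x values and widen toward the ends of the range, and the interval for an individual y is always the wider of the two.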

Suggestions for Use

Basic Operation:

  1. Deselect all Regression Analysis output. (Do this by unchecking any checked boxes in the Regression Analysis section of the applet.)
  2. Uncheck the Show Pop. Reg. Line (Y=A+BX) box in the Population Regression Equation section.
  3. Click on the Run Regression Simulation button and display only the resulting data points (Show Data Points should be checked).
  4. Then add the regression line to the display by checking Show Regression Line (y=a+bx), and discuss it.
  5. Then add the 90% Confidence Intervals one at a time and discuss them.
  6. Since the population regression line is usually not known, it should normally be revealed last.

Other Suggestions:

With all boxes checked except for the 90% Confidence Interval for Individual Y Given X, run the regression analysis several times and determine the fraction of cases where Y=A+BX crosses the 90% Confidence Interval for the Conditional Mean. You would expect it to cross these curves approximately 10% of the time, since what is shown are the 90% confidence curves. Strictly, the 90% level applies at each individual value of x, so the fraction of runs in which the line crosses the curves somewhere in the range can be somewhat higher than 10%.
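The same experiment can be repeated many times in code. The following Monte Carlo sketch is a hypothetical stand-in for clicking the button repeatedly, not the applet's code. It assumes evenly spaced x values, normally distributed errors and a normal critical value, and the function name `crossing_fractions` is illustrative. It reports two fractions: how often the population line falls outside the band at a single mid-range x, and how often it falls outside at any of the data x positions (a grid approximation to "crosses somewhere in the range").

```python
import math
import random
from statistics import NormalDist


def crossing_fractions(trials=2000, n=50, x_min=0.0, x_max=10.0,
                       A=0.0, B=1.0, s_yx=1.0, level=0.90, seed=1):
    """Return (fraction outside at mid-range x, fraction outside anywhere on grid)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    xs = [x_min + i * (x_max - x_min) / (n - 1) for i in range(n)]
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    x_mid = xs[n // 2]
    miss_mid = 0
    miss_any = 0
    for _ in range(trials):
        # New data set from equation (3), then a least-squares fit.
        ys = [A + B * x + rng.gauss(0.0, s_yx) for x in xs]
        y_bar = sum(ys) / n
        b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
        a = y_bar - b * x_bar
        sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
        s = math.sqrt(sse / (n - 2))

        def outside(x0):
            # True if the population line lies outside the band at x0.
            half = z * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / s_xx)
            return abs((A + B * x0) - (a + b * x0)) > half

        if outside(x_mid):
            miss_mid += 1
        if any(outside(x0) for x0 in xs):
            miss_any += 1
    return miss_mid / trials, miss_any / trials


if __name__ == "__main__":
    at_mid, anywhere = crossing_fractions()
    print(f"outside at mid-range x: {at_mid:.3f}")
    print(f"outside somewhere on grid: {anywhere:.3f}")
```

Because the mid-range x is one of the grid positions, the second fraction can never be smaller than the first.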

Use the simulation to investigate the effect of changes to such parameters as Sy|x and the Number of Data Points.

Note that the HTML source code of this page allows the user to adjust the colors and line widths in the output graphics. This feature allows the user to customize the graphics for classroom or desktop use. Colors and line widths in the original version (the version on the original web site) were optimized for use in a large classroom with a high-resolution video projector (at least 1024x768 pixels).

Comments on this applet are welcome at brodland@uwaterloo.ca