Dependent & Independent Variables
For data sets involving two variables:
the independent variable (controlled) is the variable whose value is deliberately manipulated or altered
the dependent variable (outcome) is the observed result, which may change in response to the independent variable
Example
In an experiment, the dependent variable is the factor that is influenced by changes to the independent variable.
For example, consider an experiment to measure the effect of sunlight on plant growth.
Varying the amount of sunlight is a controlled action, so sunlight is the independent variable.
The amount of plant growth produced for a given level of light is an outcome. Therefore plant growth is the dependent variable.
By convention, the dependent variable is plotted on the y-axis and the independent variable on the x-axis.
Scatter Diagrams
When two sets of data are plotted against each other, a scatter of points is produced. Correlation describes the strength and direction of the linear relationship between one set of data and the other. It is expressed as a number called the correlation coefficient (r), which takes values between -1 and +1.
The first graph indicates a positive correlation: the points are scattered about a straight line with a positive gradient.
The second graph indicates a negative correlation: the points are scattered about a line with a negative gradient.
The third graph indicates a correlation close to zero: the points show no trend in any particular direction.
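The following is a minimal sketch of how r might be computed for paired data. It assumes Pearson's product-moment definition r = S_{xy} / √(S_{xx} S_{yy}), where S_{yy} = Σy^{2} − (Σy)^{2}/n is defined in the same way as S_{xx}. The function name `pearson_r` is hypothetical, and the three small data sets are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (illustrative sketch)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    # Sums of products / squares of deviations from the means
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    # r is undefined if either variable has no spread
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (positive correlation)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (negative correlation)
print(pearson_r([1, 2, 3, 4], [2, 1, 1, 2]))  # 0.0  (no linear trend)
```

The three calls mirror the patterns described for the graphs. A perfect positive line gives +1 and a perfect negative line gives -1. A set with no linear trend gives 0.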
Linear Regression
Regression is a method of describing the relationship between two variables by formulating an equation. Linear regression is the case where that equation is a straight line of the form y = a + bx, where a is the intercept on the y-axis and b is the gradient.
The Method of Least Squares
The method relies on the vertical component (y) being the dependent variable and the horizontal component (x) being the independent variable. This is because only vertical distances from the line are minimised, and regressing x on y instead would generally give a different line. This distinction is important when the correlation coefficient is considered in detail later.
The 'line of best fit' is formulated by examining each point in the scatter: the vertical distance e_{i} of each point above or below the line is measured and squared.
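The equation of the line is obtained by choosing a and b so that the sum of the squares
$$\sum_{i=1}^{n} e_{i}^{2} = \sum_{i=1}^{n} \left(y_{i} - a - b x_{i}\right)^{2}$$
is a minimum.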
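The quantity being minimised can be written directly as a function of a candidate intercept a and gradient b. This sketch evaluates it for the example data used later in this section. The function name `sum_squared_residuals` is hypothetical. The values a = 381/186 and b = 47/62 are taken as given: they are what equations (iii) and (iv) produce for that data. The code then checks that nudging either coefficient only increases the sum.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of e_i^2, where e_i = y_i - (a + b*x_i) is the vertical distance
    of point i from the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 4, 4, 6, 9]
ys = [2, 3, 5, 7, 7, 8]

# Least-squares coefficients for this data, taken as given here
a_ls, b_ls = 381 / 186, 47 / 62

best = sum_squared_residuals(a_ls, b_ls, xs, ys)

# Moving either coefficient away from its least-squares value raises the sum
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sum_squared_residuals(a_ls + da, b_ls + db, xs, ys) > best

print(round(best, 3))  # 5.581
```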
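When this sum is a minimum, the gradient b can be expressed in terms of two derived quantities. The first is S_{xy}, the sum of the products of the deviations of x and y from their means (n times the covariance of x and y). The second is S_{xx}, the sum of the squared deviations of x from its mean (n times the variance of x).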
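$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} \tag{i}$$
$$S_{xx} = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n} \tag{ii}$$
(derivations not shown due to lack of space)
$$b = \frac{S_{xy}}{S_{xx}} \tag{iii}$$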
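The gradient formulas can be applied directly to a set of precomputed sums. This sketch assumes the following forms:
(i) S_{xy} = Σxy − ΣxΣy/n
(ii) S_{xx} = Σx^{2} − (Σx)^{2}/n
(iii) b = S_{xy}/S_{xx}
The function name `gradient_from_sums` is hypothetical. The sums in the usage line come from a small illustrative data set (x = 1, 2, 3 and y = 2, 4, 5).

```python
def gradient_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (S_xy, S_xx, b) from the four column sums and n."""
    s_xy = sum_xy - sum_x * sum_y / n   # equation (i)
    s_xx = sum_x2 - sum_x ** 2 / n      # equation (ii)
    if s_xx == 0:
        raise ValueError("all x values are equal; the gradient is undefined")
    return s_xy, s_xx, s_xy / s_xx      # equation (iii)


# x = 1, 2, 3 and y = 2, 4, 5 give n=3, Σx=6, Σy=11, Σxy=25, Σx^2=14
print(gradient_from_sums(3, 6, 11, 25, 14))  # (3.0, 2.0, 1.5)
```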
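The intercept a on the y-axis is found from:
$$\sum y = na + b\sum x$$
If the mean x and y values are given by:
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$
then, dividing through by n,
$$\bar{y} = a + b\bar{x}$$
rearranging,
$$a = \bar{y} - b\bar{x} \tag{iv}$$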
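For readers who want the step that is omitted above, the usual calculus argument is outlined below. It is a sketch of the standard textbook derivation, not necessarily the author's own working. Setting the partial derivatives of the sum of squares with respect to a and b to zero gives two "normal equations". The first, divided by n, gives the intercept result. Substituting that result into the second gives the gradient result.

```latex
\begin{align*}
\frac{\partial}{\partial a}\sum (y - a - bx)^{2} = -2\sum (y - a - bx) = 0
  &\;\Rightarrow\; \sum y = na + b\sum x \\
\frac{\partial}{\partial b}\sum (y - a - bx)^{2} = -2\sum x(y - a - bx) = 0
  &\;\Rightarrow\; \sum xy = a\sum x + b\sum x^{2}
\end{align*}
% Dividing the first normal equation by n gives a = \bar{y} - b\bar{x};
% substituting this into the second normal equation:
\begin{align*}
\sum xy &= \left(\frac{\sum y}{n} - b\,\frac{\sum x}{n}\right)\sum x + b\sum x^{2} \\
\sum xy - \frac{\sum x \sum y}{n} &= b\left(\sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}\right)
  \quad\Longrightarrow\quad b = \frac{S_{xy}}{S_{xx}}
\end{align*}
```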
Method for solving problems
Given a list of xy data, results are found for:
the sum of x (Σx)
the sum of y (Σy)
the sum of xy (Σxy)
the sum of x^{2} (Σx^{2})
S_{xy} and S_{xx} are calculated from equations (i) and (ii).
Hence the gradient 'b' is found from (iii).
The mean value of x (x̄) is found by dividing Σx by the number of values 'n'.
The mean value of y (ȳ) is found by dividing Σy by the number of values 'n'.
Hence the intercept 'a' is found from (iv).
The linear equation is then presented as y = a + bx.
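The steps above can be sketched as a single Python function. It assumes the data arrive as two equal-length lists of numbers, and the name `least_squares_line` is hypothetical. Each line is commented with the step or equation it corresponds to.

```python
def least_squares_line(xs, ys):
    """Fit y = a + bx by the method of least squares; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)                                # Σx
    sum_y = sum(ys)                                # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # Σxy
    sum_x2 = sum(x * x for x in xs)                # Σx^2
    s_xy = sum_xy - sum_x * sum_y / n              # equation (i)
    s_xx = sum_x2 - sum_x ** 2 / n                 # equation (ii)
    if s_xx == 0:
        raise ValueError("all x values are equal; no unique line exists")
    b = s_xy / s_xx                                # equation (iii)
    x_bar, y_bar = sum_x / n, sum_y / n            # mean values of x and y
    a = y_bar - b * x_bar                          # equation (iv)
    return a, b


# Points lying exactly on y = 1 + 2x recover that line
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {a:.1f} + {b:.1f}x")  # y = 1.0 + 2.0x
```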
Example
Given the values of x and y below, find by linear regression an equation representing the data.
(gradient & intercept to 1 d.p.)
x | 1 | 2 | 4 | 4 | 6 | 9
y | 2 | 3 | 5 | 7 | 7 | 8

Number of data pairs: n = 6
x  | y  | xy  | x^{2}
1  | 2  | 2   | 1
2  | 3  | 6   | 4
4  | 5  | 20  | 16
4  | 7  | 28  | 16
6  | 7  | 42  | 36
9  | 8  | 72  | 81
Σx = 26 | Σy = 32 | Σxy = 170 | Σx^{2} = 154
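Using equations (i) to (iv), with unrounded intermediate values carried through:
$$S_{xy} = 170 - \frac{26 \times 32}{6} \approx 31.333$$
$$S_{xx} = 154 - \frac{26^{2}}{6} \approx 41.333$$
$$b = \frac{S_{xy}}{S_{xx}} \approx 0.758$$
$$\bar{x} = \frac{26}{6} \approx 4.333, \qquad \bar{y} = \frac{32}{6} \approx 5.333$$
$$a = \bar{y} - b\bar{x} \approx 2.048$$
To 1 d.p., the equation representing the data is:
$$y = 2.0 + 0.8x$$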