Section 1: Linear Regression
 

[dependent & independent variables]

[scatter diagrams][linear regression]

 

 

 

Dependent & Independent Variables

Regarding data sets of two variables:

the independent variable (controlled) is the variable whose value is deliberately manipulated or altered

the dependent variable (outcome) is the observed result

Example

In an experiment, the dependent variable is the factor that is influenced by changes to the independent variable.
Consider an experiment to measure the effect of sunlight on plant growth.

Varying the amount of sunlight is a controlled action, so the amount of sunlight is the independent variable.

The amount of plant growth produced for a given level of light is an outcome. Therefore plant growth is the dependent variable.

By convention, the dependent variable is on the y-axis, while the independent variable is on the x-axis.

[Figure: dependent and independent variables]

 


 

Scatter Diagrams

[Figure: scatter diagrams]

The first graph shows a positive correlation: the points are scattered about a straight line with a positive gradient.

The second graph shows a negative correlation: the points are scattered about a line with a negative gradient.

The third graph shows a correlation close to zero: the points do not follow a line in any particular direction.

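When two sets of data are plotted against each other, a scatter of points is produced. Correlation describes the relationship between one set of data and the other. It is expressed as a number called the correlation coefficient (r), which takes values between -1 and +1: r = +1 when every point lies exactly on a straight line with positive gradient, r = -1 when every point lies exactly on a line with negative gradient, and r is close to 0 when there is no linear trend.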
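This section does not give a formula for r. As a hedged sketch, the usual Pearson product-moment coefficient can be computed from deviations about the means, $r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}}$. The function name `pearson_r` and the small data sets below are illustrative assumptions, not taken from the original; they mirror the three kinds of scatter diagram described above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (hypothetical helper).

    Assumes two equal-length lists with at least 2 values, and that
    neither variable is constant (otherwise the denominator is zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations from the two means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Points exactly on a line with positive gradient: r = +1
print(round(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]), 6))  # 1.0
# Points exactly on a line with negative gradient: r = -1
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
# A symmetric "hump" with no linear trend: r = 0
print(round(pearson_r([1, 2, 3, 4, 5], [1, 4, 5, 4, 1]), 6))  # 0.0
```

Applied to the worked example data later in this section, the same function gives r ≈ 0.90, a strong positive correlation like the first graph.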

 


 

Linear Regression

Regression is a method of describing the relationship between two variables by formulating an equation. Linear regression is the case where the equation is that of a straight line, of the form $y = a + bx$, where a is the intercept on the y-axis and b is the gradient.

The Method of Least Squares

[Figure: regression explained]

The method relies on the vertical component (y) being the dependent variable and the horizontal component (x) being the independent variable. This distinction matters when the correlation coefficient is considered in detail later.

The 'line of best fit' is found by examining each point in the scatter. The vertical distance $e_i = y_i - (a + bx_i)$ of each point above or below the line is recorded and squared.
The equation of the line is chosen so that the sum of these squares,

$\sum_{i=1}^{n} e_i^2$, is a minimum.
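The least-squares criterion can be checked numerically. This sketch uses the example data from later in this section and assumes the unrounded least-squares values for that data, $a = 381/186 \approx 2.048$ and $b = 47/62 \approx 0.758$ (the values the method described below produces before rounding). The helper name `sum_squared_errors` is hypothetical. Nudging either coefficient away from these values should always increase $\sum e_i^2$.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical errors e_i = y_i - (a + b*x_i) for the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 4, 4, 6, 9]
ys = [2, 3, 5, 7, 7, 8]

# Assumed unrounded least-squares intercept and gradient for this data
a_ls, b_ls = 381 / 186, 47 / 62

best = sum_squared_errors(xs, ys, a_ls, b_ls)
print(f"least-squares line: sum of e^2 = {best:.3f}")

# Moving the intercept and/or gradient away from the least-squares values
# gives a larger total for every trial line below.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.1, -0.05)]:
    other = sum_squared_errors(xs, ys, a_ls + da, b_ls + db)
    print(f"a = {a_ls + da:.3f}, b = {b_ls + db:.3f}: sum of e^2 = {other:.3f}")
```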

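When this condition holds, the gradient b can be expressed in terms of two derived quantities: $S_{xy}$, the sum of products of the deviations of x and y from their means (proportional to the covariance of x and y), and $S_{xx}$, the sum of squared deviations of x from its mean (proportional to the variance of x).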

$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$        (i)

$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$        (ii)

(derivations not shown due to lack of space)

The regression coefficient (the gradient) is then:

$b = \frac{S_{xy}}{S_{xx}}$        (iii)
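As a sketch, assuming the computational forms $S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$ and $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ (the forms that reproduce the figures in the worked example below), the code checks that the shortcut for $S_{xy}$ equals the sum of products of deviations from the means, then forms the gradient $b = S_{xy}/S_{xx}$. The function names are hypothetical.

```python
def s_xy(xs, ys):
    """Computational form: S_xy = Σxy - (Σx)(Σy)/n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def s_xx(xs):
    """Computational form: S_xx = Σx² - (Σx)²/n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


def s_xy_deviations(xs, ys):
    """Equivalent definition: Σ(x - x̄)(y - ȳ), the sum of products of deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


xs = [1, 2, 4, 4, 6, 9]
ys = [2, 3, 5, 7, 7, 8]

b = s_xy(xs, ys) / s_xx(xs)  # regression coefficient b = S_xy / S_xx
print(round(s_xy(xs, ys), 2), round(s_xy_deviations(xs, ys), 2), round(b, 3))  # 31.33 31.33 0.758
```

The shortcut form needs only the four column totals, which is why the worked method below tabulates Σx, Σy, Σxy and Σx² first; the deviation form shows why $S_{xy}$ behaves like a covariance.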

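The intercept on the y-axis, 'a', is found from the regression line of y on x:

$y = a + bx$

If the mean x and y values are given by:

$\bar{x} = \frac{\sum x}{n}$            $\bar{y} = \frac{\sum y}{n}$

 

then, since the least-squares line always passes through the mean point $(\bar{x}, \bar{y})$,

$\bar{y} = a + b\bar{x}$

and rearranging,

$a = \bar{y} - b\bar{x}$        (iv)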
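Why the line passes through the mean point can be seen in a short sketch (one standard argument, not the derivation omitted from the original). Writing $E(a, b)$ for the sum of squared errors (the symbol $E$ is introduced here for illustration), its minimum requires the derivative with respect to $a$ to be zero:

```latex
\begin{aligned}
E(a, b) &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2 \\
\frac{\partial E}{\partial a} &= -2 \sum_{i=1}^{n} \left(y_i - a - b x_i\right) = 0 \\
\Rightarrow \quad \sum y &= n a + b \sum x \\
\Rightarrow \quad \bar{y} &= a + b \bar{x}
\end{aligned}
```

Setting $\partial E / \partial b = 0$ as well and substituting $a = \bar{y} - b\bar{x}$ leads to $b = S_{xy}/S_{xx}$, equation (iii).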

 

Method for solving problems

Given a list of x-y data, results are found for:

the sum of x (Σx)

the sum of y (Σy)

the sum of xy (Σxy)

the sum of x² (Σx²)

$S_{xy}$ and $S_{xx}$ are calculated from equations (i) and (ii).

Hence the gradient 'b' is found from (iii).

The mean value of x ($\bar{x}$) is found by dividing Σx by the number of values, n.

The mean value of y ($\bar{y}$) is found by dividing Σy by the number of values, n.

Hence the intercept 'a' is found from (iv).

The linear equation is then presented as the regression line of y on x, $y = a + bx$.
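The steps above can be sketched as a single function. This is a minimal illustration, assuming the computational forms of $S_{xy}$ and $S_{xx}$ that match the worked example; the name `linear_regression` is hypothetical.

```python
def linear_regression(xs, ys):
    """Least-squares line y = a + bx of y on x (hypothetical helper); returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    if len(set(xs)) < 2:
        raise ValueError("all x values are identical, so S_xx = 0 and b is undefined")

    # Step 1: the four column totals
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Step 2: S_xy and S_xx
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n

    # Step 3: gradient b = S_xy / S_xx
    b = s_xy / s_xx

    # Step 4: means, then intercept a = ȳ - b·x̄ (b is NOT rounded first)
    x_bar = sum_x / n
    y_bar = sum_y / n
    a = y_bar - b * x_bar
    return a, b


a, b = linear_regression([1, 2, 4, 4, 6, 9], [2, 3, 5, 7, 7, 8])
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.0 + 0.8x
```

Rounding happens only when printing. Rounding b to 0.8 before computing a would give a ≈ 1.9 instead of 2.0, so the full-precision gradient is carried into step 4.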

Example

Given the values of x and y below, find by linear regression an equation representing the data.
(gradient & intercept to 1 d.p.)

x | 1 | 2 | 4 | 4 | 6 | 9
y | 2 | 3 | 5 | 7 | 7 | 8

no. data pairs (n) = 6

x | y | xy | x²
1 | 2 | 2 | 1
2 | 3 | 6 | 4
4 | 5 | 20 | 16
4 | 7 | 28 | 16
6 | 7 | 42 | 36
9 | 8 | 72 | 81
Σx = 26 | Σy = 32 | Σxy = 170 | Σx² = 154

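$\bar{x} = \frac{26}{6} = 4.33$

$\bar{y} = \frac{32}{6} = 5.33$

 

$S_{xy} = 170 - \frac{26 \times 32}{6} = 170 - 138.67 = 31.33$

$S_{xx} = 154 - \frac{26^2}{6} = 154 - 112.67 = 41.33$

 

$b = \frac{31.33}{41.33} = 0.758 \approx 0.8$

 

intercept $a = 5.33 - 0.758 \times 4.33 = 2.05 \approx 2.0$ (using b unrounded; substituting the rounded b = 0.8 would give a ≈ 1.9)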

 

Using the regression line of y on x, the equation representing the data is:

$y = 2.0 + 0.8x$

 

 
