STATISTICS - Section 1

 

Regression

 

 

Dependent & Independent Variables

Scatter Diagrams

Linear Regression

 

 

 

Dependent & Independent Variables

 

Regarding data sets of two variables:

 

The independent variable (controlled) is the variable whose value is deliberately manipulated or altered.

 

The dependent variable (outcome) is the observed result.

 

 

Example

 

In an experiment, the dependent variables are the quantities that respond to changes in the independent variable.


So consider an experiment to measure the effect of sunlight on plant growth.

 

Varying the amount of sunlight is a controlled action.

So this is the independent variable.

 

The amount of plant growth produced for a given level of light is an outcome.

Therefore plant growth is the dependent variable.

 

By convention, the dependent variable is on the y-axis, while the independent variable is on the x-axis.

 

 

[Figure: dependent and independent variables]

 

 


 

 

Scatter Diagrams

 

 

[Figure: three scatter diagrams]

 

 

The first graph indicates a positive correlation. Points are scattered about a straight line with a positive gradient.

 

The second graph indicates a negative correlation. Points are scattered about a straight line with a negative gradient.

The third graph indicates a correlation close to zero. The points show no trend in any particular direction.

 

 

When two sets of data are plotted against each other, a scatter of points is produced.

 

Correlation describes the strength and direction of the linear relationship between one set of data and the other.

 

It is expressed as a number called the correlation coefficient (r), with values in the range -1 ≤ r ≤ +1.
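As a rough illustration of the three cases described for the scatter diagrams, the sketch below computes r for small made-up data sets. It assumes that r means the usual product-moment (Pearson) coefficient; the function name `correlation` and all data values are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Product-moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # exactly on a line with positive gradient
falling = [10, 8, 6, 4, 2]    # exactly on a line with negative gradient
scattered = [3, 1, 4, 1, 3]   # no trend in either direction

print(correlation(xs, rising))
print(correlation(xs, falling))
print(correlation(xs, scattered))
```

The first two data sets lie exactly on straight lines, so r sits at the ends of its range (+1 and -1). The third has no linear trend, so r is zero.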

 

 


 

 

Linear Regression

 

Regression is a method of describing the relationship between two variables by formulating an equation.

 

Linear regression is simply where the equation is that of a straight line and has the form:

\[ y = a + bx \]

where 'a' is the intercept on the y-axis and 'b' is the gradient of the line.

 

 

The Method of Least Squares

 

 

[Figure: regression explained]

 

 

The method relies on the vertical component (y) being the dependent variable and the horizontal component (x) being the independent variable.

This is important when we consider the correlation coefficient in detail later.

 

The formulation of a 'line of best fit' is achieved by examining each point in the scatter.

The vertical distance (e_i) of each point above or below the line is recorded and squared.

An equation for the line can be obtained when the sum of the squares, given by:

\[ \sum e_i^2 = \sum (y_i - a - b x_i)^2 \]

is at a minimum.
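To make the idea of minimising the sum of squares concrete, here is a minimal sketch that evaluates the sum for two candidate lines. The data and the helper name `sum_of_squares` are hypothetical. For this data the least squares line works out to a = 0.7, b = 2.2.

```python
def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances e_i = y_i - (a + b*x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 8]

best = sum_of_squares(xs, ys, a=0.7, b=2.2)    # least squares line
other = sum_of_squares(xs, ys, a=1.0, b=2.0)   # a nearby alternative line

print(best, other)
```

The least squares line gives a sum of about 1.8 and the alternative gives 2.0. Any other choice of a and b would also give a sum larger than the least squares value.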

 

 

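When this applies, the gradient b can be expressed in terms of two derived quantities:

Sxy, the sum of products of the deviations of x and y (n times the covariance of x and y)

Sxx, the sum of squared deviations of x (n times the variance of x)

These are given by:

\[ S_{xy} = \sum xy - \frac{\sum x \sum y}{n} \]            (i)

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \]            (ii)

while 'b', the gradient, is given by:

\[ b = \frac{S_{xy}}{S_{xx}} \]            (iii)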

 

 

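The intercept on the y-axis, 'a', is found from the fact that the least squares line passes through the point of means (x̄, ȳ).

If the mean x and y values are given by:

\[ \bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n} \]

then,

\[ \bar{y} = a + b\bar{x} \]

rearranging,

\[ a = \bar{y} - b\bar{x} \]            (iv)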
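A brief sketch of where the results for 'b' and 'a' come from, using standard calculus and taking Sxy = Σxy - (Σx Σy)/n and Sxx = Σx² - (Σx)²/n. Write S for the sum of squares and set its partial derivatives with respect to a and b to zero:

\[
\begin{aligned}
S &= \sum (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = 0 \;&\Rightarrow\; \sum y = na + b \sum x \\
\frac{\partial S}{\partial b} = 0 \;&\Rightarrow\; \sum xy = a \sum x + b \sum x^2
\end{aligned}
\]

Dividing the first result by n gives \( \bar{y} = a + b\bar{x} \), so \( a = \bar{y} - b\bar{x} \). Substituting this into the second result:

\[
\sum xy - \frac{\sum x \sum y}{n} = b \left( \sum x^2 - \frac{(\sum x)^2}{n} \right)
\;\Rightarrow\; b = \frac{S_{xy}}{S_{xx}}
\]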

 

 

 

Method for Solving Problems

 

Given a list of x-y data, results are found for:

 

the sum of x (Σx)

the sum of y (Σy)

the sum of xy (Σxy)

the sum of x² (Σx²)

 

Sxy and Sxx are calculated from equations (i) and (ii).

Hence the gradient 'b' is found from (iii).

The mean value of x (x̄) is found by dividing Σx by the number of values 'n'.

The mean value of y (ȳ) is found by dividing Σy by the number of values 'n'.

Hence the intercept 'a' is found from (iv).

The linear equation is then presented as y = a + bx.
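The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. It assumes (i) and (ii) take the forms Sxy = Σxy - (Σx Σy)/n and Sxx = Σx² - (Σx)²/n; the function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    s_xy = sum_xy - sum_x * sum_y / n   # assumed form of (i)
    s_xx = sum_x2 - sum_x ** 2 / n      # assumed form of (ii)
    if s_xx == 0:
        raise ValueError("all x values are equal; gradient is undefined")

    b = s_xy / s_xx                     # gradient, (iii)
    a = sum_y / n - b * sum_x / n       # intercept, (iv): mean of y minus b times mean of x
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # prints 1.0 2.0
```

Because the sample points lie exactly on a straight line, the fitted intercept and gradient reproduce that line.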

 

 


 

 

Example

 

Given the values of x and y below, find by linear regression an equation representing the data
(gradient & intercept to 1 d.p.).

 

x       1       2       4       4       6       9
y       2       3       5       7       7       8

no. data pairs (n) = 6

 

 

x       y       xy      x²
1       2       2       1
2       3       6       4
4       5       20      16
4       7       28      16
6       7       42      36
9       8       72      81
26      32      170     154
(Σx)    (Σy)    (Σxy)   (Σx²)

 

 

 

 

 

 

 

 

 

 

 

 

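From the totals, with n = 6:

\[ S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 170 - \frac{26 \times 32}{6} = 31.33 \]

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 154 - \frac{26^2}{6} = 41.33 \]

\[ b = \frac{S_{xy}}{S_{xx}} = \frac{31.33}{41.33} = 0.758 \approx 0.8 \]

\[ \bar{x} = \frac{26}{6} = 4.33, \qquad \bar{y} = \frac{32}{6} = 5.33 \]

\[ a = \bar{y} - b\bar{x} = 5.33 - 0.758 \times 4.33 = 2.05 \approx 2.0 \]

Using y = a + bx, the equation representing the data is:

\[ y = 2.0 + 0.8x \]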
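As an independent check on this example, the sketch below recomputes the gradient and intercept from the same six data pairs. It uses the deviation form, summing (x - x̄)(y - ȳ) and (x - x̄)², which is algebraically equivalent to working from Σx, Σy, Σxy and Σx². The variable names are illustrative only.

```python
xs = [1, 2, 4, 4, 6, 9]
ys = [2, 3, 5, 7, 7, 8]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Deviation form of Sxy and Sxx
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)

b = s_xy / s_xx          # gradient
a = mean_y - b * mean_x  # intercept

print(round(b, 1), round(a, 1))  # prints 0.8 2.0
```

Note that the intercept is calculated with the unrounded gradient. Rounding b to 0.8 first would give an intercept of about 1.9 rather than 2.0.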

 

 

 

 

 


 

 

 


 

 

creative commons license

All downloads are covered by a Creative Commons License.
These are free to download and to share with others provided credit is shown.
Files cannot be altered in any way.
Under no circumstances is content to be used for commercial gain.

 

 

 

 

©copyright a-levelmathstutor.com 2020 - All Rights Reserved