Least square method solved examples pdf

The given example explains how to find the equation of a straight line or a least square line by using the method of least square, which is very useful in statistics as well as in mathematics.

Example:

Fit a least square line for the following data. Also find the trend values and show that $$\sum \left( {Y – \widehat Y} \right) = 0$$.

| $$X$$ | 1 | 2 | 3 | 4 | 5 |
| ----- | - | - | - | - | - |
| $$Y$$ | 2 | 5 | 3 | 8 | 7 |

Solution:

| $$X$$ | $$Y$$ | $$XY$$ | $${X^2}$$ | $$\widehat Y = 1.1 + 1.3X$$ | $$Y - \widehat Y$$ |
| --- | --- | --- | --- | --- | --- |
| 1 | 2 | 2 | 1 | 2.4 | $$-0.4$$ |
| 2 | 5 | 10 | 4 | 3.7 | $$+1.3$$ |
| 3 | 3 | 9 | 9 | 5.0 | $$-2.0$$ |
| 4 | 8 | 32 | 16 | 6.3 | $$+1.7$$ |
| 5 | 7 | 35 | 25 | 7.6 | $$-0.6$$ |
| $$\sum X = 15$$ | $$\sum Y = 25$$ | $$\sum XY = 88$$ | $$\sum {X^2} = 55$$ | Trend values | $$\sum \left( {Y - \widehat Y} \right) = 0$$ |

The equation of the least square line is $$Y = a + bX$$.

Normal equation for ‘a’: $$\sum Y = na + b\sum X$$, i.e., $$25 = 5a + 15b$$ …… (1)

Normal equation for ‘b’: $$\sum XY = a\sum X + b\sum {X^2}$$, i.e., $$88 = 15a + 55b$$ …… (2)

To eliminate $$a$$ from equations (1) and (2), multiply equation (1) by 3 and subtract the result from equation (2): $$88 - 75 = 10b$$, so $$b = 1.3$$. Substituting in equation (1), $$25 = 5a + 15\left( {1.3} \right)$$ gives $$a = 1.1$$.

Here    $$a = 1.1$$ and $$b = 1.3$$, the equation of least square line becomes $$Y = 1.1 + 1.3X$$.

For the trend values, substitute the values of $$X$$ into this equation (see the $$\widehat Y$$ column in the table above).
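The worked example above can be checked with a short Python sketch (the function name is illustrative); it solves the same normal equations and reproduces a = 1.1, b = 1.3, and a residual sum of zero:

```python
def fit_least_squares(xs, ys):
    # Solve the normal equations
    #   sum(y)  = n*a + b*sum(x)
    #   sum(xy) = a*sum(x) + b*sum(x^2)
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]
a, b = fit_least_squares(X, Y)                        # a = 1.1, b = 1.3
trend = [a + b * x for x in X]                        # 2.4, 3.7, 5.0, 6.3, 7.6
residual_sum = sum(y - t for y, t in zip(Y, trend))   # 0 (up to floating error)
```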

METHOD OF LEAST SQUARES

In most cases, the data points do not fall exactly on a straight line (the variables are not perfectly correlated), so the relationship between the two variables could be depicted by several different lines. Each candidate line will be closer to some points and farther from others, and by inspection alone we cannot decide which line provides the best fit to the data.

The method of least squares determines the line of best fit in such cases: it chooses the line that minimizes the sum of the squares of the vertical deviations from each data point to the line.

1. Method of Least Squares

To obtain the estimates of the coefficients ‘a’ and ‘b’, the least squares method minimizes the sum of squares of residuals. The residual for the ith data point, ei, is defined as the difference between the observed value of the response variable, yi, and its estimate, ŷi; it is the error associated with that data point, i.e., ei = yi − ŷi, i = 1, 2, ..., n.

The method of least squares helps us to find the values of unknowns ‘a’ and ‘b’ in such a way that the following two conditions are satisfied:

Sum of the residuals is zero. That is,

$$\sum\limits_{i = 1}^n {{e_i}} = \sum\limits_{i = 1}^n {\left( {{y_i} - {{\widehat y}_i}} \right)} = 0$$

Sum of the squares of the residuals,

$$E\left( {a,b} \right) = \sum\limits_{i = 1}^n {e_i^2} = \sum\limits_{i = 1}^n {{{\left( {{y_i} - {{\widehat y}_i}} \right)}^2}}$$

is the least.

2. Fitting of Simple Linear Regression Equation

The method of least squares can be applied to determine the estimates of ‘a’ and ‘b’ in the simple linear regression equation using the given data (x1,y1), (x2,y2), ..., (xn,yn) by minimizing

$$E\left( {a,b} \right) = \sum\limits_{i = 1}^n {{{\left( {{y_i} - {{\widehat y}_i}} \right)}^2}} = \sum\limits_{i = 1}^n {{{\left( {{y_i} - a - b{x_i}} \right)}^2}}$$

Here, ŷi = a + bxi is the expected (estimated) value of the response variable for the given xi.

It is obvious that if the expected value (y^ i) is close to the observed value (yi), the residual will be small. Since the magnitude of the residual is determined by the values of a’ and ‘b’, estimates of these coefficients are obtained by minimizing the sum of the squared residuals, E(a,b).

Differentiating E(a,b) with respect to ‘a’ and ‘b’ and equating the derivatives to zero gives a set of two equations, described below:

$$\sum\limits_{i = 1}^n {{y_i}} = na + b\sum\limits_{i = 1}^n {{x_i}}$$

$$\sum\limits_{i = 1}^n {{x_i}{y_i}} = a\sum\limits_{i = 1}^n {{x_i}} + b\sum\limits_{i = 1}^n {x_i^2}$$

These equations are popularly known as normal equations. Solving these equations for ‘a’ and ‘b’ yields the estimates â and b̂:

$$\widehat b = \frac{{n\sum {{x_i}{y_i}} - \left( {\sum {{x_i}} } \right)\left( {\sum {{y_i}} } \right)}}{{n\sum {x_i^2} - {{\left( {\sum {{x_i}} } \right)}^2}}},\qquad \widehat a = \bar y - \widehat b\,\bar x$$

It may be seen that, in the estimate of ‘b’, the numerator and denominator are, apart from a common factor of n², the sample covariance between X and Y and the sample variance of X, respectively. Hence, the estimate of ‘b’ may be expressed as

$$\widehat b = \frac{{Cov\left( {X,Y} \right)}}{{V\left( X \right)}} = \frac{{{S_{XY}}}}{{S_X^2}}$$

The definition of the sample variance remains as given in Chapter 1, that is,

$$S_X^2 = \frac{1}{n}\sum\limits_{i = 1}^n {{{\left( {{x_i} - \bar x} \right)}^2}}$$

From Chapter 4, the above estimate can be expressed using rXY, Pearson’s coefficient of simple correlation between X and Y, as

$$\widehat b = {r_{XY}}\,\frac{{{S_Y}}}{{{S_X}}}$$
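The three forms of the slope estimate b̂ given above (normal-equations form, covariance/variance form, and rXY·SY/SX) can be confirmed to agree numerically; the sketch below uses illustrative names and the data of the opening example:

```python
def slope_forms(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Sample covariance and variances (dividing by n, as in the text).
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    varx = sum((x - xbar) ** 2 for x in xs) / n
    vary = sum((y - ybar) ** 2 for y in ys) / n
    r = cov / ((varx * vary) ** 0.5)          # Pearson correlation
    b_normal = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
               (n * sum(x * x for x in xs) - sum(xs) ** 2)
    b_cov = cov / varx                        # b = Cov(X,Y) / V(X)
    b_corr = r * (vary ** 0.5) / (varx ** 0.5)  # b = r * S_Y / S_X
    return b_normal, b_cov, b_corr

bs = slope_forms([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])   # all three equal 1.3
```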

Important Considerations in the Use of Regression Equation:

1. The regression equation exhibits only the relationship between the respective two variables; cause-and-effect conclusions shall not be drawn from regression analysis alone.

2. The regression equation is fitted to the given values of the independent variable. Hence, the fitted equation can be used for prediction only for values of the regressor within its observed range: interpolation of values of the response variable may be done for values of the regressor from this range only. Results obtained by extrapolation outside this range cannot be reliably interpreted.

Example 5.1

Construct the simple linear regression equation of Y on X if

$$n = 7,\quad \sum x = 113,\quad \sum y = 182,\quad \sum {x^2} = 1983,\quad \sum xy = 3186$$

Solution:

The simple linear regression equation of Y on X to be fitted for given data is of the form

$$\widehat Y = a + bx$$ ……(1)

The values of ‘a’ and ‘b’ have to be estimated from the sample data by solving the following normal equations:

$$\sum y = na + b\sum x$$ ……(2)

$$\sum xy = a\sum x + b\sum {x^2}$$ ……(3)

Substituting the given sample information in (2) and (3), the above equations can be expressed as

7 a + 113 b = 182 (4)

113 a + 1983 b = 3186 (5)

(4) × 113: 791 a + 12769 b = 20566

(5) × 7: 791 a + 13881 b = 22302

Subtracting the first of these from the second, 1112 b = 1736, so b = 1736/1112 ≈ 1.56.

Substituting this in (4), it follows that

7 a + 113 × 1.56 = 182

7 a + 176.28 = 182

7 a = 182 − 176.28 = 5.72

Hence, a = 0.82, and the required regression equation is Ŷ = 0.82 + 1.56x.
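The elimination above can be checked by solving the 2×2 system exactly, e.g. with Cramer's rule (a minimal sketch; the exact solution is b̂ ≈ 1.561 and â ≈ 0.80, and the printed 0.82 arises from rounding b̂ to 1.56 before back-substituting):

```python
def solve_2x2(a11, a12, c1, a21, a22, c2):
    # Cramer's rule for the system a11*a + a12*b = c1, a21*a + a22*b = c2.
    det = a11 * a22 - a12 * a21
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - c1 * a21) / det

# Normal equations (4) and (5) of Example 5.1.
a, b = solve_2x2(7, 113, 182, 113, 1983, 3186)
# b = 1736/1112 ≈ 1.561;  a = 888/1112 ≈ 0.799 exactly
```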

Example 5.2

Number of man-hours and the corresponding productivity (in units) are furnished below. Fit a simple linear regression equation ˆY = a + bx applying the method of least squares.

(Data table not reproduced: man-hours, x, and corresponding productivity, y, for n = 9 operators.)

Solution:

The simple linear regression equation to be fitted for the given data is

$$\widehat Y = a + bx$$

Here, the estimates of a and b can be calculated using their least squares formulae

$$\widehat b = \frac{{n\sum xy - \left( {\sum x} \right)\left( {\sum y} \right)}}{{n\sum {x^2} - {{\left( {\sum x} \right)}^2}}},\qquad \widehat a = \bar y - \widehat b\,\bar x$$

From the given data, the following calculations are made with n = 9. (The computation table is not reproduced; its column totals include $$\sum x = 62.1$$ and $$\sum y = 121$$.)

Substituting the column totals in the respective places in the expressions for the estimates â and b̂, their values can be calculated as follows.

Thus, b̂ = 1.48.

Now â can be calculated using b̂ as

â = 121/9 − (1.48 × 62.1/9)

= 13.44 − 10.21

Hence, â = 3.23

Therefore, the required simple linear regression equation fitted to the given data is

$$\widehat Y = 3.23 + 1.48x$$

It should be noted that the value of Y can be estimated using the above fitted equation for the values of x in its range i.e., 3.6 to 10.7.
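As a check, â can be recovered from b̂ and the column totals quoted in the solution (a sketch; exact arithmetic gives â ≈ 3.23, so any small difference from a printed figure is intermediate rounding):

```python
# Totals taken from the text of Example 5.2.
n = 9
sum_x, sum_y, b_hat = 62.1, 121, 1.48
x_bar, y_bar = sum_x / n, sum_y / n   # 6.9 and 13.44...
a_hat = y_bar - b_hat * x_bar         # a = y_bar - b*x_bar ≈ 3.23
```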

In the estimated simple linear regression equation of Y on X,

$$\widehat Y = \widehat a + \widehat b\,x$$

we can substitute the estimate $$\widehat a = \bar y - \widehat b\,\bar x$$. Then, the regression equation becomes

$$\widehat Y - \bar y = \widehat b\left( {x - \bar x} \right)$$

It shows that the simple linear regression equation of Y on X has slope b̂ and that the corresponding straight line passes through the point of averages $$\left( {\bar x,\bar y} \right)$$. This representation of a straight line is popularly known in the field of coordinate geometry as the ‘slope-point form’. It can be applied in fitting the regression equation for a given regression coefficient b̂ and the averages $$\bar x$$ and $$\bar y$$.

As mentioned in Section 5.3, there may be two simple linear regression equations for each X and Y. Since the regression coefficients of these regression equations are different, it is essential to distinguish the coefficients with different symbols. The regression coefficient of the simple linear regression equation of Y on X may be denoted as bYX and the regression coefficient of the simple linear regression equation of X on Y may be denoted as bXY.

Using the same argument as for fitting the regression equation of Y on X, we have the simple linear regression equation of X on Y with best fit as

$$\widehat X = \widehat a' + {\widehat b_{XY}}\,y,\qquad {\widehat b_{XY}} = \frac{{n\sum xy - \left( {\sum x} \right)\left( {\sum y} \right)}}{{n\sum {y^2} - {{\left( {\sum y} \right)}^2}}}$$

The slope-point form of this equation is

$$\widehat X - \bar x = {\widehat b_{XY}}\left( {y - \bar y} \right)$$

Also, the relationships between the Karl Pearson’s coefficient of correlation and the regression coefficients are

$${b_{YX}} = {r_{XY}}\,\frac{{{S_Y}}}{{{S_X}}},\qquad {b_{XY}} = {r_{XY}}\,\frac{{{S_X}}}{{{S_Y}}}$$




PROPERTIES OF REGRESSION COEFFICIENTS

1. Correlation coefficient is the geometric mean of the regression coefficients:

$${r_{XY}} = \pm \sqrt {{b_{YX}} \cdot {b_{XY}}}$$

2. It is clear from property 1 that both regression coefficients must have the same sign, i.e., either both are positive or both are negative.

3. If one of the regression coefficients is greater than unity, the other must be less than unity.

4. The correlation coefficient will have the same sign as that of the regression coefficients.

5. Arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient:

$$\frac{{{b_{YX}} + {b_{XY}}}}{2} \geqslant {r_{XY}}$$

6. Regression coefficients are independent of the change of origin but not of scale.
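Properties 1 and 5 can be verified numerically on any data set; the sketch below (illustrative data and names) computes both regression coefficients and r from deviations about the means:

```python
import math

def regression_coefficients(xs, ys):
    # Deviations from the means give S_xy, S_xx, S_yy directly.
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)

b_yx, b_xy, r = regression_coefficients([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
gm = math.sqrt(b_yx * b_xy)   # geometric mean of the coefficients (property 1)
am = (b_yx + b_xy) / 2        # arithmetic mean, at least r here (property 5)
```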

Properties of regression equation

1. If r = 0, the variables are uncorrelated, and the two lines of regression are perpendicular to each other.

2. If r = ±1, the two lines of regression coincide (both pass through the point of averages, so they cannot be distinct parallel lines).

3. The angle between the two regression lines is θ = tan−1 [ |m1 − m2| / (1 + m1m2) ], where m1 and m2 are the slopes of the regression lines of X on Y and Y on X respectively.

4. The angle between the regression lines indicates the degree of dependence between the variables.

5. The two regression equations intersect at the point of averages $$\left( {\bar x,\bar y} \right)$$.
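Property 3 can be sketched in code; note that when the X-on-Y line x = a′ + bXY·y is drawn on the same axes as y against x, its slope is 1/bXY (the function name is illustrative):

```python
import math

def angle_between(b_yx, b_xy):
    # Acute angle (in degrees) between the two regression lines drawn on the
    # same axes; this form is valid when 1 + m1*m2 > 0.
    m1 = b_yx        # slope of the Y-on-X line  y = a + b_yx * x
    m2 = 1 / b_xy    # slope w.r.t. x of the X-on-Y line  x = a' + b_xy * y
    return math.degrees(math.atan(abs(m1 - m2) / (1 + m1 * m2)))

# When r = ±1 the two lines coincide, so the angle is 0:
print(angle_between(1.0, 1.0))   # 0.0
```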

Example 5.3

Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations from actual means of X and Y.

(Data table not reproduced.)

Estimate the likely demand when X = 25.

Solution:

(The computation tables, in deviations from the actual means of X and Y, are not reproduced. From them, $$\bar x = 15$$, $$\bar y = 43$$, $${b_{XY}} = -0.12$$ and $${b_{YX}} = -0.25$$.)

The regression line of U on V, that is, of X on Y in deviation form, is U = b̂UV v + â = −0.12v, so that

X − 15 = −0.12 (y − 43)

Similarly, the regression line of Y on X is

Y − 43 = −0.25 (x − 15)

When x = 25, y − 43 = −0.25 (25 − 15), so y = 40.5.

Important Note: If $$\bar x$$ and $$\bar y$$ are not integers, then the above method is tedious and time-consuming for calculating bYX and bXY. The following modified formulae are easier for calculation:

$${b_{YX}} = \frac{{n\sum xy - \left( {\sum x} \right)\left( {\sum y} \right)}}{{n\sum {x^2} - {{\left( {\sum x} \right)}^2}}},\qquad {b_{XY}} = \frac{{n\sum xy - \left( {\sum x} \right)\left( {\sum y} \right)}}{{n\sum {y^2} - {{\left( {\sum y} \right)}^2}}}$$

Example 5.4

The following data gives the experience of machine operators and their performance ratings as given by the number of good parts turned out per 50 pieces.

(Data table not reproduced: operator experience, x, and performance rating, y; from the solution, $$\bar x = 7.875$$ and $$\bar y = 27.5$$.)

Obtain the regression equations and estimate the ratings corresponding to the experience x=15.

Solution:

(The computation table is not reproduced; from it, $$\bar x = 7.875$$ and $$\bar y = 27.5$$.)

Regression equation of Y on X:

$$\widehat Y - \bar y = {b_{YX}}\left( {x - \bar x} \right)$$

The two means are in decimal places, so for simplicity we use the modified formula to compute bYX:

$${b_{YX}} = \frac{{n\sum xy - \left( {\sum x} \right)\left( {\sum y} \right)}}{{n\sum {x^2} - {{\left( {\sum x} \right)}^2}}} = 2.098$$

The regression equation of Y on X is then

Ŷ − 27.5 = 2.098 (x − 7.875)

Ŷ − 27.5 = 2.098x − 16.52

Ŷ = 2.098x + 10.98

When x = 15,

Ŷ = 2.098 × 15 + 10.98 = 31.47 + 10.98 = 42.45

Regression equation of X on Y:

$$\widehat X - \bar x = {b_{XY}}\left( {y - \bar y} \right),\qquad {b_{XY}} = \frac{{n\sum xy - \left( {\sum x} \right)\left( {\sum y} \right)}}{{n\sum {y^2} - {{\left( {\sum y} \right)}^2}}} = 0.169$$

The regression equation of X on Y is then

X̂ − 7.875 = 0.169 (y − 27.5)

X̂ − 7.875 = 0.169y − 4.648

X̂ = 0.169y + 3.228
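Both fitted lines of this example can be re-checked from the quoted coefficients and means (a sketch; exact arithmetic gives the X-on-Y intercept as about 3.23, so a slightly different printed value reflects intermediate rounding):

```python
# Coefficients and means quoted in the text of Example 5.4.
b_yx, b_xy = 2.098, 0.169
x_bar, y_bar = 7.875, 27.5

a_y = y_bar - b_yx * x_bar     # intercept of the Y-on-X line, ≈ 10.98
y_at_15 = a_y + b_yx * 15      # estimated rating at x = 15, ≈ 42.45
a_x = x_bar - b_xy * y_bar     # intercept of the X-on-Y line, ≈ 3.23
```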

Example 5.5

A random sample of 5 school students is selected, and their marks in Statistics and Accountancy are found to be as follows. (Data table not reproduced; from the solution, $$\bar x = 69.6$$ and $$\bar y = 72.6$$.)

Find the two regression lines.

Solution:

The two regression lines are:

Regression equation of Y on X:

$$\widehat Y - \bar y = {b_{YX}}\left( {x - \bar x} \right)$$

Regression equation of X on Y:

$$\widehat X - \bar x = {b_{XY}}\left( {y - \bar y} \right)$$

Since the mean values are in decimals, not integers, and the numbers are large, we shift the origins of x and y and then solve the problem. (The computation table in the shifted variables u and v is not reproduced.)

Regression equation of X on Y, with bXY = 1.038:

X̂ − 69.6 = 1.038 (y − 72.6)

X̂ − 69.6 = 1.038y − 75.359

X̂ = 1.038y − 5.759

(The regression equation of Y on X is obtained similarly from the same table.)
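A quick check of the quoted line, expanding the slope-point form with the values given in the text (a sketch):

```python
# Values quoted in Example 5.5: coefficient and the two means.
b_xy, x_bar, y_bar = 1.038, 69.6, 72.6

slope_term = b_xy * y_bar        # 1.038 × 72.6 ≈ 75.359
intercept = x_bar - slope_term   # ≈ −5.759
```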

Example 5.6

Is there any mistake in the data provided about the two regression lines Y = −1.5 X + 7, and X = 0.6 Y + 9? Give reasons.

Solution:

The regression coefficient of Y on X is bYX = –1.5

The regression coefficient of X on Y is bXY = 0.6

Both regression coefficients are of different signs, which is a contradiction: by property 2, the two regression coefficients must have the same sign. So the given equations cannot be the two regression lines.
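The sign argument of this example can be packaged as a quick validity check (a sketch; the function name is illustrative):

```python
def could_be_regression_pair(b_yx, b_xy):
    # Both coefficients must share a sign, and their product (which equals r^2)
    # cannot exceed 1.
    product = b_yx * b_xy
    return product > 0 and product <= 1

print(could_be_regression_pair(-1.5, 0.6))   # False: opposite signs
print(could_be_regression_pair(0.5, 0.8))    # True: r^2 = 0.4
```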

Example 5.7

(Data table not reproduced: means and standard deviations of annual rainfall, X, in inches and yield, Y, in kg per unit area; from the solution, $$\bar x = 8$$, $$\bar y = 10$$ and $${S_Y}/{S_X} = 4$$.)

Correlation coefficient: 0.5

Estimate the yield when rainfall is 9 inches

Solution:

Let us denote the dependent variable yield by Y and the independent variable rainfall by X.

Regression equation of Y on X is given by

$$\widehat Y - \bar y = {r_{XY}}\,\frac{{{S_Y}}}{{{S_X}}}\left( {x - \bar x} \right)$$

Substituting the given values, Ŷ − 10 = 2 (x − 8).

When x = 9,

Y – 10 = 2 (9 – 8)

Y = 2 + 10

= 12 kg (per unit area)

Corresponding to the annual rainfall of 9 inches, the expected yield is 12 kg (per unit area).

Example 5.8

For 50 students of a class, the regression equation of marks in Statistics (X) on marks in Accountancy (Y) is 3Y − 5X + 180 = 0. The mean marks in Accountancy is 50, and the variance of marks in Statistics is 16/25 of the variance of marks in Accountancy.

Find the mean marks in statistics and the coefficient of correlation between marks in the two subjects when the variance of Y is 25.

Solution:

We are given that:

n = 50, the regression equation of X on Y is 3Y − 5X + 180 = 0,

$$\bar Y = 50$$, V(X) = (16/25) V(Y), and V(Y) = 25.

We have to find (i) $$\bar X$$ and (ii) rXY.

(i) Calculation of $$\bar X$$:

Since $$\left( {\bar X,\bar Y} \right)$$ is the point of intersection of the two regression lines, it lies on the regression line 3Y − 5X + 180 = 0. Hence,

$$3\left( {50} \right) - 5\bar X + 180 = 0\;\; \Rightarrow \;\;5\bar X = 330\;\; \Rightarrow \;\;\bar X = 66$$

(ii) Calculation of the coefficient of correlation:

3Y − 5X + 180 = 0 gives 5X = 3Y + 180, i.e., X = 0.6Y + 36, so that bXY = 3/5.

Since V(Y) = 25 and V(X) = (16/25)(25) = 16, we have SY = 5 and SX = 4. Therefore,

$${r_{XY}} = {b_{XY}}\,\frac{{{S_Y}}}{{{S_X}}} = \frac{3}{5} \times \frac{5}{4} = 0.75$$
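The two calculations of this example can be written as a short sketch, using only the quantities given in the problem:

```python
# Given: X-on-Y line 3Y - 5X + 180 = 0, mean of Y = 50,
# V(Y) = 25 and V(X) = (16/25) V(Y).
y_bar, var_y = 50, 25
var_x = (16 / 25) * var_y             # = 16, so S_X = 4 and S_Y = 5
x_bar = (3 * y_bar + 180) / 5         # means lie on the line: 5*X_bar = 3*Y_bar + 180
b_xy = 3 / 5                          # from X = 0.6Y + 36
r = b_xy * (var_y ** 0.5) / (var_x ** 0.5)   # r = b_XY * S_Y / S_X = 0.75
```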

Example 5.9

If two regression coefficients are bYX = 5/6 and bXY = 9/20 , what would be the value of rXY?

Solution:

The correlation coefficient is

$${r_{XY}} = \sqrt {{b_{YX}} \cdot {b_{XY}}} = \sqrt {\frac{5}{6} \times \frac{9}{20}} = \sqrt {\frac{3}{8}} \approx 0.61$$

Since both the signs in bYX and bXY are positive, correlation coefficient between X and Y is positive.
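The same computation as a sketch:

```python
import math

b_yx, b_xy = 5 / 6, 9 / 20
# Positive square root, since both regression coefficients are positive.
r = math.sqrt(b_yx * b_xy)   # sqrt(3/8) ≈ 0.6124
```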

Example 5.10

Given that bYX = 18/7 and bXY = -5/6 . Find r ? 

Solution:

Here bYX = 18/7 is positive while bXY = −5/6 is negative. Since the two regression coefficients must always have the same sign (their product equals r²), such a pair of values cannot occur together, and r = ±√(bYX · bXY) would not be a real number. Hence, the given coefficients are inconsistent and r cannot be computed.



DIFFERENCE BETWEEN CORRELATION AND REGRESSION


Correlation

1. It indicates only the nature and extent of linear relationship

2. If the linear correlation coefficient is positive/negative, then the two variables are positively/negatively correlated.

3. One of the variables can be taken as x and the other one can be taken as the variable y.

4. It is symmetric in x and y, i.e., rXY = rYX.

Regression

1. It is the study about the impact of the independent variable on the dependent variable. It is used for predictions.

2. If the regression coefficient is positive, then for every unit increase in x, the corresponding average increase in y is bYX. Similarly, if the regression coefficient is negative, then for every unit increase in x, the corresponding average decrease in y is |bYX|.

3. Care must be taken in the choice of independent and dependent variables. We cannot arbitrarily assign x as the independent variable and y as the dependent variable.

4. It is not symmetric in x and y, that is, bXY and bYX  have different meaning and interpretations.




INDEX NUMBERS

Irving Fisher (1867–1947) was an American statistician born in New York; his father was a teacher. As a child, he had remarkable mathematical ability and a flair for invention. In 1891, Fisher received the first Ph.D. in economics granted by Yale University. Fisher had shown particular talent and inclination for mathematics, but he found that economics offered greater scope for his ambition and social concerns. He made important contributions to economics, including index numbers. He edited the Yale Review from 1896 to 1910 and was active in many learned societies, institutes, and welfare organizations. He was a president of the American Economic Association. He died in New York City in 1947, at the age of 80.


LEARNING OBJECTIVES

The students will be able to

· understand the concept and purpose of Index Numbers.

· calculate the indices to measure price and quantity changes over a period of time.

· understand the different tests an ideal Index Number satisfies.

· understand consumer price Index Numbers.

· understand the limitations of the construction of Index Numbers.


Introduction

The index number is a technique for measuring changes in a variable or a group of variables with respect to time, location or other characteristics, and it is one of the most widely used statistical methods. An index number is a specialized average designed to measure the change in a group of related variables over a period of time. For example, the price of cotton in 2010 may be studied with reference to its price in 2000. Index numbers are used to feel the pulse of the economy, revealing inflationary or deflationary tendencies. In reality, they are viewed as barometers of economic activity: anyone who wants an idea of what is happening in an economy should check important indicators such as the index number of agricultural production, the index number of industrial production, and the index number of business activity. There are several types of index numbers, which the students will learn in this chapter.



DEFINITION AND USES OF INDEX NUMBERS

Definition of Index Numbers

An Index Number is defined as a relative measure to compare and describe the average change in the price, quantity or value of an item or a group of related items with respect to time, geographic location or other characteristics.

In the words of Maslow “An index number is a numerical value characterizing the change in complex economic phenomenon over a period of time or space”

Spiegel defines, “An index number is a statistical measure designed to show changes in a variable or a group of related variables with respect to time, geographical location or other characteristics”.

According to Croxton and Cowden “Index numbers are devices for measuring differences in the magnitude of a group of related variables”.

Bowley describes “Index Numbers as a series which reflects in its trend and fluctuations the movements of some quantity”.

Uses of Index Numbers

The various uses of index numbers are:

Economic Parameters

The Index Numbers are one of the most useful devices to know the pulse of the economy.

It is used as an indicator of inflationary or deflationary tendencies.

Measures Trends

Index numbers are widely used for measuring relative changes over successive periods of time. This enables us to determine the general tendency. For example, changes in the levels of prices, population, production, etc. over a period of time can be analysed.

Useful for comparison

The index numbers are given in percentages, so they are useful for comparison and make it easy to understand the changes between two points of time.

Help in framing suitable policies

Index numbers are useful in framing economic and business policies. For example, consumer price index numbers are useful in fixing dearness allowance for employees.

Useful in deflating

Price index numbers are used for correcting (deflating) the original data for changes in prices. The price index is used to determine the purchasing power of the monetary unit.

Compares standard of living

Cost of living index of different periods and of different places will help us to compare the standard of living of the people. This enables the government to take suitable welfare measures.

Special type of average

All the basic ideas of averages are employed in the construction of index numbers. In averages, the data are homogeneous (in the same units), but in index numbers we average variables which have different units of measurement. Hence, an index number is a special type of average.
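As a simple illustration of an index number as an average of price relatives, the sketch below uses invented prices: each item's current price is divided by its base-period price, the resulting price relatives are unit-free, and their mean gives an unweighted index:

```python
# Invented prices for illustration only.
base = {"rice (kg)": 40, "milk (litre)": 25, "fuel (litre)": 80}      # base-year prices
current = {"rice (kg)": 50, "milk (litre)": 30, "fuel (litre)": 96}   # current prices

# Each price relative is unit-free, so items priced per kg and per litre
# can be averaged together.
relatives = [current[item] / base[item] * 100 for item in base]   # 125.0, 120.0, 120.0
index = sum(relatives) / len(relatives)   # ≈ 121.67: prices rose about 21.7% on average
```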