# Correlation, Regression, and Chi-Square

1) What information is provided by the numerical value of the Pearson correlation?

The Pearson correlation measures the degree and direction of the linear relationship between two variables. Its magnitude gives the strength of the relationship: the closer the value is to 1 or -1, the stronger the association, while values near 0 indicate little or no linear relationship. Its sign gives the direction: positive when the two variables increase together, negative when one decreases as the other increases.

2) In the following data, there are three scores (X, Y, and Z) for each of the n = 5 individuals.

a. Sketch a graph showing the relationship between X and Y. Compute the Pearson correlation between X and Y.

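$$
\begin{aligned}
r &= \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}} \\
&= \frac{36 - (10)(15)/5}{\sqrt{(30 - 100/5)(55 - 225/5)}} \\
&= \frac{36 - 30}{\sqrt{(30 - 20)(55 - 45)}} \\
&= \frac{6}{\sqrt{(10)(10)}} = \frac{6}{10} = 0.6
\end{aligned}
$$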

b. Sketch a graph showing the relationship between Y and Z. Compute the Pearson correlation between Y and Z.

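$$
\begin{aligned}
r &= \frac{\sum YZ - \dfrac{\sum Y \sum Z}{N}}{\sqrt{\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)\left(\sum Z^2 - \dfrac{(\sum Z)^2}{N}\right)}} \\
&= \frac{66 - (15)(20)/5}{\sqrt{(55 - 225/5)(90 - 400/5)}} \\
&= \frac{66 - 60}{\sqrt{(55 - 45)(90 - 80)}} \\
&= \frac{6}{\sqrt{(10)(10)}} = \frac{6}{10} = 0.6
\end{aligned}
$$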

c. Given the result of parts A and B, what would you predict for the correlation between X and Z?

The Pearson correlations in parts a and b are equal (r = 0.6 in both). On that basis alone, we might predict that the correlation between X and Z is the same, r = 0.6.

d. Sketch a graph showing the relationship between X and Z. Compute the Pearson correlation for these data.

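$$
\begin{aligned}
r &= \frac{\sum XZ - \dfrac{\sum X \sum Z}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Z^2 - \dfrac{(\sum Z)^2}{N}\right)}} \\
&= \frac{38 - (10)(20)/5}{\sqrt{(30 - 100/5)(90 - 400/5)}} \\
&= \frac{38 - 40}{\sqrt{(30 - 20)(90 - 80)}} \\
&= \frac{-2}{\sqrt{(10)(10)}} = \frac{-2}{10} = -0.2
\end{aligned}
$$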

e. What general conclusion can you make concerning relationship among correlations? If X is related to Y and Y is related to Z, does this necessarily mean that X is related to Z?

Correlations are not transitive. Transitivity would mean that a relationship holding between X and Y and between Y and Z must also hold between X and Z, as it does for equality: if a = b and b = c, then a = c. Correlation does not behave this way. In these data X and Y are positively correlated (r = 0.6) and Y and Z are positively correlated (r = 0.6), yet X and Z are negatively correlated (r = -0.2). So knowing that X is related to Y and that Y is related to Z does not necessarily tell us that X is related to Z, or in which direction; the correlation between two variables cannot, in general, be inferred from their correlations with a third variable.
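The three correlations above can be checked with a short script. This is a minimal sketch that works only from the summary sums quoted in parts a, b, and d (the raw scores are not reproduced here); the function name `pearson_from_sums` is a hypothetical helper, not something from the textbook.

```python
import math

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r from the computational (raw-sum) formula used above."""
    sp = sum_xy - sum_x * sum_y / n      # sum of products of deviations
    ss_x = sum_x2 - sum_x ** 2 / n       # sum of squares for the first variable
    ss_y = sum_y2 - sum_y ** 2 / n       # sum of squares for the second variable
    return sp / math.sqrt(ss_x * ss_y)

n = 5
# Sums taken from the worked solutions in parts a, b, and d.
r_xy = pearson_from_sums(n, 10, 15, 36, 30, 55)
r_yz = pearson_from_sums(n, 15, 20, 66, 55, 90)
r_xz = pearson_from_sums(n, 10, 20, 38, 30, 90)

print(round(r_xy, 2), round(r_yz, 2), round(r_xz, 2))
```

Two positive correlations that share the variable Y sit alongside a negative correlation between X and Z, which is the non-transitivity point made in part e.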

3) Sketch a graph showing the line for the equation $Y = 2X - 3$. On the same graph, show the line for $Y = -2X + 8$.

4) A set of scores produces a regression equation of $\hat{Y} = 7X - 2$. Use the equation to find the predicted value of Y for each of the following X scores: 0, 2, 5, 8, 10.

$$
\begin{aligned}
X = 0: \quad \hat{Y} &= 7(0) - 2 = 0 - 2 = -2 \\
X = 2: \quad \hat{Y} &= 7(2) - 2 = 14 - 2 = 12 \\
X = 5: \quad \hat{Y} &= 7(5) - 2 = 35 - 2 = 33 \\
X = 8: \quad \hat{Y} &= 7(8) - 2 = 56 - 2 = 54 \\
X = 10: \quad \hat{Y} &= 7(10) - 2 = 70 - 2 = 68
\end{aligned}
$$
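The same substitutions can be written as a one-line function. This is only an illustrative sketch; `predict` is a hypothetical name, and the slope and intercept are the ones given in the problem.

```python
def predict(x, slope=7, intercept=-2):
    """Predicted Y for a given X under the regression line in problem 4."""
    return slope * x + intercept

xs = [0, 2, 5, 8, 10]
predictions = [predict(x) for x in xs]
print(predictions)  # [-2, 12, 33, 54, 68]
```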

5) For the following data:

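a. Find the regression equation for predicting Y from X.

The regression equation has the form

$$
\hat{Y} = bX + A
$$

with slope $b = r\,\dfrac{s_Y}{s_X}$, where $r$ is the Pearson correlation and $s_Y$ and $s_X$ are the standard deviations of Y and X.

$$
\begin{aligned}
r &= \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}} \\
&= \frac{170 - (25)(30)/5}{\sqrt{(135 - 625/5)(346 - 900/5)}} \\
&= \frac{170 - 150}{\sqrt{(135 - 125)(346 - 180)}} \\
&= \frac{20}{\sqrt{(10)(166)}} = \frac{20}{\sqrt{1660}} = \frac{20}{40.7431} = 0.4909
\end{aligned}
$$

$$
b = 0.4909\left(\frac{6.4420}{1.5811}\right) = 2.0001
$$

$$
\begin{aligned}
A &= M_Y - bM_X \\
&= 6 - 2.0001(5) \\
&= 6 - 10.0005 \\
&= -4.0005
\end{aligned}
$$

$$
\hat{Y} = 2.0001X - 4.0005
$$

The small departures from $b = 2$ and $A = -4$ are rounding error carried over from the four-decimal values of $r$, $s_Y$, and $s_X$; computed directly, $b = SP/SS_X = 20/10 = 2$.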
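As a cross-check, the slope and intercept can be computed directly from deviation scores rather than through $r$. This sketch assumes the raw scores are X = 7, 5, 6, 3, 4 and Y = 16, 2, 1, 2, 9, which are read off the tables in parts b and c (the original data table is not reproduced in this document); those values match every sum used above.

```python
# Assumed raw data, recovered from the tables in parts b and c.
xs = [7, 5, 6, 3, 4]
ys = [16, 2, 1, 2, 9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sum of products of deviations and sum of squares for X.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)

slope = sp / ss_x                      # b = SP / SS_X
intercept = mean_y - slope * mean_x    # A = M_Y - b * M_X

print(slope, intercept)
```

Without intermediate rounding the line comes out as slope 2 and intercept -4, which agrees with the hand calculation to three decimal places.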

b. Use the regression equation to find a predicted Y for each X.

$$
\hat{Y} = 2.0001X - 4.0005
$$

| X | Ŷ |
|---|---|
| 7 | 10.0002 |
| 5 | 6.0000 |
| 6 | 8.0001 |
| 3 | 1.9998 |
| 4 | 3.9999 |

c. Find the difference between the actual Y value and the predicted Y value for each individual, square the differences, and add the squared values to obtain $SS_{\text{residual}}$.

| Ŷ | Y | Y − Ŷ | (Y − Ŷ)² |
|---|---|---|---|
| 10.0002 | 16 | 5.9998 | 35.9976 |
| 6.0000 | 2 | -4.0000 | 16.0000 |
| 8.0001 | 1 | -7.0001 | 49.0014 |
| 1.9998 | 2 | 0.0002 | 0.0000 |
| 3.9999 | 9 | 5.0001 | 25.0010 |

$$
SS_{\text{residual}} = \sum (Y - \hat{Y})^2 = 126
$$

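d. Calculate the Pearson correlation for these data. Use $r^2$ and $SS_Y$ to compute $SS_{\text{residual}}$ with Equation 15.13. You should obtain the same value as in part c.

$$
\begin{aligned}
r &= \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}} \\
&= \frac{170 - (25)(30)/5}{\sqrt{(135 - 625/5)(346 - 900/5)}} \\
&= \frac{170 - 150}{\sqrt{(135 - 125)(346 - 180)}} \\
&= \frac{20}{\sqrt{(10)(166)}} = \frac{20}{\sqrt{1660}} = \frac{20}{40.7431} = 0.4909
\end{aligned}
$$

$$
SS_Y = \sum (Y - M_Y)^2 = 166
$$

$$
\begin{aligned}
SS_{\text{residual}} &= (1 - r^2)\,SS_Y \\
&= (1 - 0.4909^2)(166) \\
&= (1 - 0.2410)(166) \\
&= (0.7590)(166) \\
&= 125.994 \approx 126
\end{aligned}
$$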
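The agreement between parts c and d can be confirmed numerically. The sketch below computes $SS_{\text{residual}}$ both ways, again assuming the raw scores X = 7, 5, 6, 3, 4 and Y = 16, 2, 1, 2, 9 recovered from the tables in parts b and c; all variable names are illustrative.

```python
# Assumed raw data, recovered from the tables in parts b and c.
xs = [7, 5, 6, 3, 4]
ys = [16, 2, 1, 2, 9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)

slope = sp / ss_x
intercept = mean_y - slope * mean_x

# Method 1 (part c): sum the squared residuals directly.
ss_res_direct = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Method 2 (part d): SS_residual = (1 - r^2) * SS_Y.
r_squared = sp ** 2 / (ss_x * ss_y)
ss_res_from_r = (1 - r_squared) * ss_y

print(round(ss_res_direct, 4), round(ss_res_from_r, 4))
```

Both routes give 126, since $(1 - r^2)SS_Y = SS_Y - SP^2/SS_X = 166 - 40$.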

Chapter 16.

6) A professor noticed that the representatives on the college student government consist of 31 males and only 9 females. The general college population, on the other hand, consists of 55% females and 45% males. Is the gender distribution for student government representatives significantly different from the distribution for the population? Test at the .05 level of significance.

Null Hypothesis: The gender distribution for student government representatives is not significantly different from the distribution for the population.

Alternative: The gender distribution for student government representatives is significantly different from the distribution for the population.

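The expected frequencies are the population proportions applied to the n = 40 representatives: 0.45(40) = 18 males and 0.55(40) = 22 females.

| Gender | Observed | Expected |
|---|---|---|
| Male | 31 | 18 |
| Female | 9 | 22 |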

$$
\begin{aligned}
\chi^2_{\text{computed}} &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(31 - 18)^2}{18} + \frac{(9 - 22)^2}{22} \\
&= 9.3889 + 7.6818 \\
&= 17.0707
\end{aligned}
$$

With df = 2 - 1 = 1 and α = .05, the critical value is χ² = 3.841.

Decision rule: reject the null hypothesis if χ² ≥ 3.841; otherwise, fail to reject the null hypothesis.

At α = .05, χ² = 17.0707 ≥ 3.841, so we reject the null hypothesis.

At the .05 level of significance, the gender distribution for student government representatives is significantly different from the distribution for the population.
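The goodness-of-fit calculation above follows a fixed recipe: scale the population proportions by n to get expected frequencies, then sum (observed − expected)²/expected. A minimal sketch, with `chi_square_gof` as a hypothetical helper name and the critical value hard-coded from the chi-square table value quoted in the text:

```python
def chi_square_gof(observed, proportions):
    """Return (statistic, expected) for a chi-square goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected

# Problem 6: 31 males and 9 females; population is 45% male, 55% female.
statistic, expected = chi_square_gof([31, 9], [0.45, 0.55])

CRITICAL_DF1_ALPHA_05 = 3.841  # table value for df = 1, alpha = .05
reject_null = statistic >= CRITICAL_DF1_ALPHA_05

print(round(statistic, 4), reject_null)
```

The statistic comes out near 17.07, well above 3.841, matching the decision to reject the null hypothesis.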

7) Data from the department of motor vehicles indicate that 80% of all licensed drivers are older than 25.

a. In a sample of n = 50 people who recently received speeding tickets, 32 were older than 25 years and the other 18 were age 25 or younger. Is the age distribution for this sample significantly different from the distribution for the population of licensed drivers? Use α = .05.

Null Hypothesis: The age distribution for this sample is not significantly different from the distribution for the population of licensed drivers.

Alternative: The age distribution for this sample is significantly different from the distribution for the population of licensed drivers.

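The expected frequencies are the population proportions applied to n = 50: 0.80(50) = 40 older than 25 and 0.20(50) = 10 age 25 or younger.

| Licensed drivers | Observed | Expected |
|---|---|---|
| > 25 years old | 32 | 40 |
| ≤ 25 years old | 18 | 10 |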

$$
\begin{aligned}
\chi^2_{\text{computed}} &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(32 - 40)^2}{40} + \frac{(18 - 10)^2}{10} \\
&= 1.6 + 6.4 \\
&= 8
\end{aligned}
$$

With df = 1 and α = .05, the critical value is χ² = 3.841.

Decision rule: reject the null hypothesis if χ² ≥ 3.841; otherwise, fail to reject the null hypothesis.

At α = .05, χ² = 8 ≥ 3.841, so we reject the null hypothesis.

At the .05 level of significance, the age distribution for this sample is significantly different from the distribution for the population of licensed drivers.

b. In a sample of n = 50 people who recently received parking tickets, 38 were older than 25 years and the other 12 were age 25 or younger. Is the age distribution for this sample significantly different from the distribution for the population of licensed drivers? Use α = .05.

Null Hypothesis: The age distribution for this sample is not significantly different from the distribution for the population of licensed drivers.

Alternative: The age distribution for this sample is significantly different from the distribution for the population of licensed drivers.

| Licensed drivers | Observed | Expected |
|---|---|---|
| > 25 years old | 38 | 40 |
| ≤ 25 years old | 12 | 10 |

$$
\begin{aligned}
\chi^2_{\text{computed}} &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(38 - 40)^2}{40} + \frac{(12 - 10)^2}{10} \\
&= 0.1 + 0.4 \\
&= 0.5
\end{aligned}
$$

With df = 1 and α = .05, the critical value is χ² = 3.841.

Decision rule: reject the null hypothesis if χ² ≥ 3.841; otherwise, fail to reject the null hypothesis.

At α = .05, χ² = 0.5 < 3.841, so we fail to reject the null hypothesis.

At the .05 level of significance, the data do not show that the age distribution for this sample is significantly different from the distribution for the population of licensed drivers.
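Parts a and b differ only in the observed counts, so they can be run through the same calculation side by side. The sketch below also reports a p-value instead of relying on a table lookup: for df = 1 the chi-square statistic is the square of a standard normal score, so the upper-tail probability is `erfc(sqrt(χ²/2))`. The function names are hypothetical, and this closed form applies only to df = 1.

```python
import math

def chi_square_stat(observed, proportions):
    """Chi-square goodness-of-fit statistic against population proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))

def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(statistic / 2))

population = [0.80, 0.20]  # older than 25, 25 or younger
samples = {"speeding": [32, 18], "parking": [38, 12]}

results = {}
for name, observed in samples.items():
    stat = chi_square_stat(observed, population)
    p = p_value_df1(stat)
    results[name] = (stat, p, p < 0.05)   # (statistic, p-value, reject H0?)
    print(name, round(stat, 4), round(p, 4), p < 0.05)
```

Comparing p with α = .05 gives the same decisions as comparing χ² with 3.841: reject for the speeding-ticket sample, fail to reject for the parking-ticket sample.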

8) A researcher obtained a random sample of n = 60 students to determine whether there were any significant preferences among three leading brands of colas. Each student tasted all the brands and then selected his or her favorite. The resulting frequency distribution is as follows:

| Brand A | Brand B | Brand C |
|---|---|---|
| 28 | 14 | 18 |

Are the data sufficient to indicate any preferences among the three brands? Test with α = .05.

Null hypothesis: There are no significant preferences among three leading brands of colas.

Alternative: There are significant preferences among three leading brands of colas.

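Under the null hypothesis of no preference, each brand is expected to be chosen by 60/3 = 20 students.

| Brand | Observed | Expected |
|---|---|---|
| A | 28 | 20 |
| B | 14 | 20 |
| C | 18 | 20 |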

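$$
\begin{aligned}
\chi^2_{\text{computed}} &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(28 - 20)^2}{20} + \frac{(14 - 20)^2}{20} + \frac{(18 - 20)^2}{20} \\
&= 3.2 + 1.8 + 0.2 \\
&= 5.2
\end{aligned}
$$

With df = 3 - 1 = 2 and α = .05, the critical value is χ² = 5.991.

Decision rule: reject the null hypothesis if χ² ≥ 5.991; otherwise, fail to reject the null hypothesis.

At α = .05, χ² = 5.2 < 5.991, so we fail to reject the null hypothesis.

At the .05 level of significance, the data are not sufficient to indicate any significant preferences among the three leading brands of colas.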

9) A social psychologist suspects that people who serve on juries tend to be much older than citizens in the general population. Jurors are selected from the list of registered voters, so the ages for jurors should have the same distribution as the ages for voters. The psychologist obtains voter registration records and finds that 20% of registered voters are between 18 and 29 years old, and 35% are age 50 or older. The psychologist also monitors jury composition over several weeks and observes the following distribution of ages for actual juries.

Age categories for jurors

| 18-29 | 30-49 | 50 and over |
|---|---|---|
| 12 | 36 | 32 |

Are the data sufficient to conclude that the age distribution for the jurors is significantly different from the distribution for the population of registered voters? Test with α = .05.

Null hypothesis: The age distribution for the jurors is not significantly different from the distribution for the population of registered voters.

Alternative: The age distribution for the jurors is significantly different from the distribution for the population of registered voters.

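The sample contains n = 12 + 36 + 32 = 80 jurors. The remaining 100% - 20% - 35% = 45% of registered voters are between 30 and 49, so the expected frequencies are 0.20(80) = 16, 0.45(80) = 36, and 0.35(80) = 28.

| Age | Observed | Expected |
|---|---|---|
| 18-29 | 12 | 16 |
| 30-49 | 36 | 36 |
| 50 and over | 32 | 28 |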

$$
\begin{aligned}
\chi^2_{\text{computed}} &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(12 - 16)^2}{16} + \frac{(36 - 36)^2}{36} + \frac{(32 - 28)^2}{28} \\
&= 1 + 0 + 0.5714 \\
&= 1.5714
\end{aligned}
$$

With df = 3 - 1 = 2 and α = .05, the critical value is χ² = 5.991.

Decision rule: reject the null hypothesis if χ² ≥ 5.991; otherwise, fail to reject the null hypothesis.

At α = .05, χ² = 1.5714 < 5.991, so we fail to reject the null hypothesis.

At the .05 level of significance, the data are not sufficient to conclude that the age distribution for the jurors is significantly different from the distribution for the population of registered voters.

10) A psychology professor is trying to decide which textbook to use for next year's introductory class. To help make the decision, the professor asks the current students to review three texts and identify which one they prefer. The distribution of preferences for the current class is as follows:

| Book 1 | Book 2 | Book 3 |
|---|---|---|
| 52 | 41 | 27 |

Do the data indicate any significant preferences among the three books? Test with α = .05.

Null hypothesis: There are no significant preferences among the three books.

Alternative: There are significant preferences among the three books.

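Under the null hypothesis of no preference, each book is expected to be chosen by 120/3 = 40 students.

| Book | Observed | Expected |
|---|---|---|
| 1 | 52 | 40 |
| 2 | 41 | 40 |
| 3 | 27 | 40 |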

$$
\begin{aligned}
\chi^2_{\text{computed}} &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(52 - 40)^2}{40} + \frac{(41 - 40)^2}{40} + \frac{(27 - 40)^2}{40} \\
&= 3.6 + 0.025 + 4.225 \\
&= 7.85
\end{aligned}
$$

With df = 3 - 1 = 2 and α = .05, the critical value is χ² = 5.991.

Decision rule: reject the null hypothesis if χ² ≥ 5.991; otherwise, fail to reject the null hypothesis.

At α = .05, χ² = 7.85 ≥ 5.991, so we reject the null hypothesis.

At the .05 level of significance, there are significant preferences among the three books.
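Problems 8, 9, and 10 all have three categories, so df = 2, and for df = 2 the chi-square upper-tail probability has the closed form $e^{-\chi^2/2}$. The sketch below uses that fact to recover the critical value 5.991 and to recheck the three decisions; the function names and dictionary keys are hypothetical, and the closed form applies only to df = 2.

```python
import math

def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with df = 2."""
    return math.exp(-statistic / 2)

ALPHA = 0.05
critical_df2 = -2 * math.log(ALPHA)  # solves exp(-x / 2) = alpha; about 5.991

# Observed and expected frequencies from the tables in problems 8, 9, and 10.
problems = {
    "colas": ([28, 14, 18], [20, 20, 20]),
    "jurors": ([12, 36, 32], [16, 36, 28]),
    "textbooks": ([52, 41, 27], [40, 40, 40]),
}

decisions = {}
for name, (observed, expected) in problems.items():
    stat = chi_square_stat(observed, expected)
    decisions[name] = p_value_df2(stat) < ALPHA   # True means reject H0
    print(name, round(stat, 4), round(p_value_df2(stat), 4), decisions[name])
```

Only the textbook data lead to rejecting the null hypothesis, in line with the three conclusions above.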