What is the method by which one makes a conclusion about the entire population based on information obtained from a sample?

Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution. First, a tentative assumption is made about the parameter or distribution. This assumption is called the null hypothesis and is denoted by H0. An alternative hypothesis (denoted Ha), which is the opposite of what is stated in the null hypothesis, is then defined. The hypothesis-testing procedure involves using sample data to determine whether or not H0 can be rejected. If H0 is rejected, the statistical conclusion is that the alternative hypothesis Ha is true.

For example, assume that a radio station selects the music it plays based on the assumption that the average age of its listening audience is 30 years. To determine whether this assumption is valid, a hypothesis test could be conducted with the null hypothesis given as H0: μ = 30 and the alternative hypothesis given as Ha: μ ≠ 30. Based on a sample of individuals from the listening audience, the sample mean age, x̄, can be computed and used to determine whether there is sufficient statistical evidence to reject H0. Conceptually, a value of the sample mean that is “close” to 30 is consistent with the null hypothesis, while a value of the sample mean that is “not close” to 30 provides support for the alternative hypothesis. What is considered “close” and “not close” is determined by using the sampling distribution of x̄.
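As a rough sketch of this procedure, the Python snippet below runs a two-sided one-sample t-test of H0: μ = 30 against Ha: μ ≠ 30 with SciPy; the listener ages are made-up numbers for illustration, not real survey data.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of listener ages (invented data, for illustration only)
ages = np.array([28, 35, 31, 40, 25, 37, 33, 29, 42, 36, 30, 38])

# Two-sided one-sample t-test of H0: mu = 30 versus Ha: mu != 30
t_stat, p_value = stats.ttest_1samp(ages, popmean=30)

print(f"sample mean x-bar = {ages.mean():.2f}")
print(f"t statistic = {t_stat:.3f}, p-value = {p_value:.4f}")

# Reject H0 at the chosen level of significance if the p-value is smaller
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```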

Ideally, the hypothesis-testing procedure leads to the acceptance of H0 when H0 is true and the rejection of H0 when H0 is false. Unfortunately, since hypothesis tests are based on sample information, the possibility of errors must be considered. A type I error corresponds to rejecting H0 when H0 is actually true, and a type II error corresponds to accepting H0 when H0 is false. The probability of making a type I error is denoted by α, and the probability of making a type II error is denoted by β.

In using the hypothesis-testing procedure to determine if the null hypothesis should be rejected, the person conducting the hypothesis test specifies the maximum allowable probability of making a type I error, called the level of significance for the test. Common choices for the level of significance are α = 0.05 and α = 0.01. Although most applications of hypothesis testing control the probability of making a type I error, they do not always control the probability of making a type II error. A graph known as an operating-characteristic curve can be constructed to show how changes in the sample size affect the probability of making a type II error.
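To make the idea behind the operating-characteristic curve concrete, here is a small simulation sketch that estimates β, the probability of failing to reject H0: μ = 30, for several sample sizes. The true mean (32), standard deviation (8), and α = 0.05 are values assumed purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, mu0, true_mu, sigma = 0.05, 30.0, 32.0, 8.0  # assumed values for the simulation

def estimated_beta(n, trials=2000):
    """Estimate beta: the share of samples for which H0: mu = 30 is NOT rejected."""
    failures_to_reject = 0
    for _ in range(trials):
        sample = rng.normal(true_mu, sigma, size=n)
        _, p = stats.ttest_1samp(sample, popmean=mu0)
        if p >= alpha:
            failures_to_reject += 1
    return failures_to_reject / trials

# Larger samples make a type II error less likely when H0 is false
for n in (10, 30, 100, 300):
    print(f"n = {n:4d}  estimated beta = {estimated_beta(n):.3f}")
```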

A concept known as the p-value provides a convenient basis for drawing conclusions in hypothesis-testing applications. The p-value is a measure of how likely the sample results are, assuming the null hypothesis is true; the smaller the p-value, the less likely the sample results. If the p-value is less than α, the null hypothesis can be rejected; otherwise, the null hypothesis cannot be rejected. The p-value is often called the observed level of significance for the test.

A hypothesis test can be performed on parameters of one or more populations as well as in a variety of other situations. In each instance, the process begins with the formulation of null and alternative hypotheses about the population. In addition to the population mean, hypothesis-testing procedures are available for population parameters such as proportions, variances, standard deviations, and medians.
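For instance, a hypothesis about a population proportion can be tested with an exact binomial test; the counts below are invented for illustration.

```python
from scipy import stats

# H0: the population proportion equals 0.50.
# Hypothetical sample: 62 "successes" out of 100 observations (invented data).
result = stats.binomtest(k=62, n=100, p=0.50, alternative="two-sided")
print(f"p-value = {result.pvalue:.4f}")  # a small p-value is evidence against H0
```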

Hypothesis tests are also conducted in regression and correlation analysis to determine if the regression relationship and the correlation coefficient are statistically significant (see below Regression and correlation analysis). A goodness-of-fit test refers to a hypothesis test in which the null hypothesis is that the population has a specific probability distribution, such as a normal probability distribution. Nonparametric statistical methods also involve a variety of hypothesis-testing procedures.
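As one concrete sketch, a chi-square goodness-of-fit test compares observed category counts with the counts expected under the hypothesized distribution. The die-roll counts below are invented, and a fair die is used rather than a normal distribution simply to keep the example small.

```python
from scipy import stats

# H0: the die is fair, i.e., each face has probability 1/6
observed = [18, 22, 16, 25, 16, 23]        # hypothetical counts from 120 rolls
expected = [sum(observed) / 6] * 6         # counts expected under H0 (20 per face)

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
```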

The methods of statistical inference previously described are often referred to as classical methods. Bayesian methods (so called after the English mathematician Thomas Bayes) provide alternatives that allow one to combine prior information about a population parameter with information contained in a sample to guide the statistical inference process. A prior probability distribution for a parameter of interest is specified first. Sample information is then obtained and combined through an application of Bayes’s theorem to provide a posterior probability distribution for the parameter. The posterior distribution provides the basis for statistical inferences concerning the parameter.
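A minimal sketch of this prior-to-posterior updating, assuming a binomial sampling model with a conjugate beta prior for an unknown proportion (the prior parameters and the data are purely illustrative):

```python
from scipy import stats

# Prior belief about an unknown proportion theta: Beta(2, 2), centred at 0.5
prior_a, prior_b = 2, 2

# Hypothetical sample information: 7 successes in 20 trials
successes, trials = 7, 20

# With a conjugate beta prior, Bayes' theorem gives a beta posterior directly
post_a = prior_a + successes
post_b = prior_b + (trials - successes)
posterior = stats.beta(post_a, post_b)

print(f"posterior mean = {posterior.mean():.3f}")
print(f"95% credible interval = {posterior.interval(0.95)}")
```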

A key, and somewhat controversial, feature of Bayesian methods is the notion of a probability distribution for a population parameter. According to classical statistics, parameters are constants and cannot be represented as random variables. Bayesian proponents argue that, if a parameter value is unknown, then it makes sense to specify a probability distribution that describes the possible values for the parameter as well as their likelihood. The Bayesian approach permits the use of objective data or subjective opinion in specifying a prior distribution. With the Bayesian approach, different individuals might specify different prior distributions. Classical statisticians argue that for this reason Bayesian methods suffer from a lack of objectivity. Bayesian proponents argue that the classical methods of statistical inference have built-in subjectivity (through the choice of a sampling plan) and that the advantage of the Bayesian approach is that the subjectivity is made explicit.

Bayesian methods have been used extensively in statistical decision theory (see below Decision analysis). In this context, Bayes’s theorem provides a mechanism for combining a prior probability distribution for the states of nature with sample information to provide a revised (posterior) probability distribution about the states of nature. These posterior probabilities are then used to make better decisions.

Statistical inference is the branch of statistics concerned with drawing conclusions and/or making decisions concerning a population based only on sample data.

Let’s consider a classic example from an introductory lecture. Suppose you are cooking a dish and you want to taste it before serving it to your guests, to get an idea of the dish as a whole. You would never eat the entire dish to do that; instead, you taste a small portion with a spoon.

  • Tasting that spoonful is exploratory analysis: you learn about what you cooked from the sample in your hand.
  • If you then conclude that the whole dish needs some extra sugar or salt, you are making an inference.
  • For that inference to be valid, the portion you taste must be representative of the whole dish; otherwise the conclusion will be wrong.

Population:

The term “population” is used in statistics to represent all possible measurements or outcomes that are of interest to us in a particular study.

Census:

A census attempts to gather information from each and every unit of the population of interest.

Sample:

The term “sample” refers to a portion of the population that is representative of the population from which it was selected.

Depending on the sampling method, a sample can have fewer observations than the population, the same number of observations, or even more (when sampling with replacement, the same unit can appear more than once). More than one sample can be derived from the same population.

Now the question is: why do we use a sample in statistics rather than going for a census?

Why use a sample? Why not a census?

  1. A sample is less time-consuming than a census;
  2. it is less costly to administer than a census;
  3. measuring the variable of interest may involve the destruction of the sampled unit;
  4. a population may be infinite.

Parameters and Statistics:

One goal of statistical inference is to estimate a population parameter from a sample statistic.

Parameter:

– Numerical characteristic of a population

– Constant (fixed) at any one moment

– Usually unknown

Statistic:

– Numerical summary of a sample

– Calculated from sample data (not constant)

– Used to estimate a parameter
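A tiny numeric sketch of the distinction, using an invented population:

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=50, scale=10, size=100_000)  # invented population
mu = population.mean()                                   # parameter: fixed, usually unknown in practice

sample = rng.choice(population, size=200, replace=False)
x_bar = sample.mean()                                    # statistic: varies from sample to sample

print(f"parameter mu = {mu:.2f}, statistic x-bar = {x_bar:.2f}")
```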

Sampling:

A sampling method is a procedure for selecting sample elements from a population. Sampling is necessary to make inferences about a population. If the sample is not representative, it is biased, and you cannot generalize from your sample statistics to the population.

Sampling Bias:

1. Convenience Sample:

Suppose you are conducting a survey on the employment of women and men. Your neighbors are easily accessible to you, so they are more likely to be included in your sample. If you build your sample that way, your inference will suffer from convenience-sampling bias.

“Statistical inference with convenience samples is a risky business.”- David A. Freedman, Statistical Models and Causal Inference, p. 23

 If a convenience sample is used, inferences are not as trustworthy as if a random sample is used.

2. Non-Response:

If only a fraction of the randomly sampled people respond to your survey, so that the sample is no longer representative of the population, then it suffers from non-response bias.

Suppose you are conducting a survey on the rate of drug use among young students. Some students might not reveal the information for personal reasons. This is called non-response bias.

3. Voluntary Response:

Voluntary response bias occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue. Voluntary response samples often oversample people who have strong opinions and undersample people who don’t care much about the topic of the survey. Thus inferences from a voluntary response sample are not as trustworthy as conclusions based on a random sample of the entire population under consideration. Note that in voluntary response sampling there is no initial random sample.

Sampling Method:

Simple Random Sample (SRS):

Simple random sampling refers to a sampling method that has the following properties.

  • The population consists of N objects.
  • The sample consists of n objects.
  • All possible samples of n objects are equally likely to occur.

Here we randomly select cases from the population such that each case is equally likely to be selected.
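For instance, a simple random sample can be drawn with NumPy by selecting n of the N units without replacement, so that every possible sample of size n is equally likely:

```python
import numpy as np

rng = np.random.default_rng(42)
population_ids = np.arange(1, 1001)                        # N = 1000 labelled units
srs = rng.choice(population_ids, size=25, replace=False)   # n = 25, a simple random sample
print(srs)
```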

Stratified Sampling:

In stratified sampling, we divide the population into homogeneous groups called strata and then sample randomly within each stratum. For example, if you conduct a survey and first divide the population into males and females, then randomly select 100 females and 100 males, that is stratified sampling.
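Here is a sketch of that male/female example in pandas, using a synthetic data frame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic population of 10,000 people with a gender stratum
people = pd.DataFrame({
    "gender": rng.choice(["male", "female"], size=10_000),
    "income": rng.normal(40_000, 12_000, size=10_000),
})

# Stratified sample: 100 drawn at random from within each stratum
stratified = people.groupby("gender", group_keys=False).sample(n=100, random_state=0)
print(stratified["gender"].value_counts())
```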

Cluster Sample:

In cluster sampling, we divide the population into clusters or groups, randomly select a few clusters, and then sample randomly within those clusters. The sampling error here is generally greater than with simple random sampling. The main difference between stratified sampling and cluster sampling is that clusters need not be homogeneous.
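A sketch of two-stage cluster sampling with NumPy, using synthetic clusters (first a few clusters are chosen at random, then units are sampled within them):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic population: 50 clusters (e.g., schools), each with 200 units (e.g., students)
clusters = {c: [f"school{c}_student{u}" for u in range(200)] for c in range(50)}

# Stage 1: randomly select a few clusters
chosen = rng.choice(list(clusters), size=5, replace=False)

# Stage 2: randomly sample units within each chosen cluster
sample = []
for c in chosen:
    sample.extend(rng.choice(clusters[int(c)], size=20, replace=False))

print(f"{len(sample)} units sampled from clusters {sorted(int(c) for c in chosen)}")
```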

What are Random Sampling and Random Assignment?

Random sampling and random assignment are commonly confused or used interchangeably, though the terms refer to entirely different processes.

If subjects are selected from the population at random, each member of the population has an equal chance of being selected, and the sample is representative of the entire population, then it is called random sampling. The study's results are therefore generalizable to the population at large. Random assignment is an aspect of experimental design in which study participants are assigned to the treatment or control group using a random procedure.
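Here is a sketch of random assignment, as distinct from random sampling: an already-recruited group of participants (placeholder names) is shuffled and split into treatment and control groups.

```python
import numpy as np

rng = np.random.default_rng(7)
participants = [f"participant_{i}" for i in range(20)]   # placeholder names

shuffled = rng.permutation(participants)                 # random assignment, not random sampling
treatment, control = shuffled[:10], shuffled[10:]
print("treatment:", list(treatment))
print("control:  ", list(control))
```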

What is Sampling with Replacement and Without Replacement?

Suppose you pick a card from a deck: you can set the card aside or put it back into the deck. If you put the card back into the deck, it may be selected more than once; if you set it aside, it can be selected only once.

When a population element can be selected more than one time, we are sampling with replacement. When a population element can be selected only one time, we are sampling without replacement.
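The card-deck illustration in NumPy; the two draws below differ only in the replace flag:

```python
import numpy as np

rng = np.random.default_rng(11)
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [f"{r} of {s}" for s in suits for r in ranks]   # 52 cards

with_replacement = rng.choice(deck, size=5, replace=True)      # a card may repeat
without_replacement = rng.choice(deck, size=5, replace=False)  # each card at most once

print("with replacement:   ", list(with_replacement))
print("without replacement:", list(without_replacement))
```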