# What is the name of the type of statistics that are used to indicate whether the results for a sample are likely to generalize to a population?

While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data.

When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample is taken.

Inferential statistics have two main uses:

• making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
• testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).

## Descriptive versus inferential statistics

Descriptive statistics allow you to describe a data set, while inferential statistics allow you to make inferences based on a data set.

### Descriptive statistics

Using descriptive statistics, you can report characteristics of your data:

• The distribution concerns the frequency of each value.
• The central tendency concerns the averages of the values.
• The variability concerns how spread out the values are.
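
As a minimal sketch, the three kinds of descriptive statistics listed above can be computed with Python's standard library (`collections.Counter` and the `statistics` module). The `scores` list is hypothetical illustration data, not data from this article.

```python
from collections import Counter
import statistics

# Hypothetical SAT scores for a small group of students (illustration only)
scores = [1050, 1120, 1120, 1200, 1250, 1250, 1250, 1310, 1400]

# Distribution: how often each value occurs
distribution = Counter(scores)

# Central tendency: different kinds of average
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability: how spread out the values are
value_range = max(scores) - min(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

print(distribution.most_common(2))
print(f"mean={mean:.1f}, median={median}, mode={mode}")
print(f"range={value_range}, sd={sd:.1f}")
```

Because these numbers describe only the values in `scores`, they carry no sampling uncertainty.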

In descriptive statistics, there is no uncertainty – the statistics precisely describe the data that you collected. If you collect data from an entire population, you can directly compare these descriptive statistics to those from other populations.

Example: Descriptive statistics

You collect data on the SAT scores of all 11th graders in a school for three years.

You can use descriptive statistics to get a quick overview of the school’s scores in those years. You can then directly compare the mean SAT score with the mean scores of other schools.

### Inferential statistics

Most of the time, you can only acquire data from samples, because it is too difficult or expensive to collect data from the whole population that you’re interested in.

While descriptive statistics can only summarize a sample’s characteristics, inferential statistics use your sample to make reasonable guesses about the larger population.

With inferential statistics, it’s important to use random and unbiased sampling methods. If your sample isn’t representative of your population, then you can’t make valid statistical inferences or generalize.

Example: Inferential statistics

You randomly select a sample of 11th graders in your state and collect data on their SAT scores and other characteristics.

You can use inferential statistics to make estimates and test hypotheses about the whole population of 11th graders in the state based on your sample data.

### Sampling error in inferential statistics

Since the size of a sample is always smaller than the size of the population, some of the population isn’t captured by sample data. This creates sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).

Sampling error arises any time you use a sample, even if your sample is random and unbiased. For this reason, there is always some uncertainty in inferential statistics. However, using probability sampling methods reduces this uncertainty by minimizing sampling bias, and larger samples generally reduce sampling error further.
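
A small simulation can make sampling error concrete. This sketch assumes a hypothetical, normally distributed population of 10,000 scores (the mean of 1050 and standard deviation of 200 are arbitrary choices), draws simple random samples from it, and compares each sample mean (a statistic) with the population mean (the parameter).

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

# Hypothetical population: 10,000 normally distributed scores (arbitrary mean and SD)
population = [random.gauss(1050, 200) for _ in range(10_000)]
parameter = statistics.fmean(population)  # population mean: a parameter

# One simple random sample of 100, drawn without replacement
sample = random.sample(population, 100)
statistic = statistics.fmean(sample)  # sample mean: a statistic
sampling_error = statistic - parameter
print(f"parameter={parameter:.1f}  statistic={statistic:.1f}  sampling error={sampling_error:+.1f}")


def mean_abs_error(n, trials=200):
    """Average absolute sampling error of the sample mean over repeated samples of size n."""
    errors = [abs(statistics.fmean(random.sample(population, n)) - parameter) for _ in range(trials)]
    return statistics.fmean(errors)


for n in (10, 100, 1000):
    print(f"n={n:>4}: typical sampling error about {mean_abs_error(n):.1f}")
```

Even a random sample misses the parameter by some amount, but the typical miss shrinks as the sample size grows.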

## Estimating population parameters from sample statistics

The characteristics of samples and populations are described by numbers called statistics and parameters:

• A statistic is a measure that describes the sample (e.g., sample mean).
• A parameter is a measure that describes the whole population (e.g., population mean).

Sampling error is the difference between a parameter and a corresponding statistic. Since in most cases you don’t know the real population parameter, you can use inferential statistics to estimate these parameters in a way that takes sampling error into account.

There are two important types of estimates you can make about the population: point estimates and interval estimates.

• A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
• An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.

Both types of estimates are important for gathering a clear idea of where a parameter is likely to lie.

### Confidence intervals

A confidence interval uses the variability around a statistic to come up with an interval estimate for a parameter. Confidence intervals are useful for estimating parameters because they take sampling error into account.

While a point estimate gives you a precise value for the parameter you are interested in, a confidence interval tells you the uncertainty of the point estimate. They are best used in combination with each other.

Each confidence interval is associated with a confidence level. The confidence level tells you the percentage of intervals that would contain the true population parameter if you repeated the study many times with new samples, calculating a new interval each time.

A 95% confidence interval means that if you repeated your study with a new sample in exactly the same way 100 times, you could expect about 95 of the 100 resulting intervals to contain the true population parameter.

For any single interval you calculate, however, you cannot say for sure whether it contains the actual population parameter. That’s because you can’t know the true value of the population parameter without collecting data from the full population.

However, with random sampling and a suitable sample size, you can reasonably expect your confidence interval to contain the parameter a certain percentage of the time.
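
The repeated-sampling meaning of a confidence level can be checked with a simulation. This sketch assumes a hypothetical normal population whose true mean is known (20, with a standard deviation of 5), repeats a study with samples of 50 observations 100 times, and counts how many of the resulting 95% intervals contain the true mean. For simplicity it uses a normal-approximation interval (sample mean plus or minus about 1.96 standard errors).

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed for reproducibility

TRUE_MEAN, TRUE_SD = 20.0, 5.0   # hypothetical population parameters
N, REPEATS = 50, 100             # sample size per study, number of repeated studies
Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def ci95(sample):
    """Normal-approximation 95% confidence interval for the mean."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m - Z * se, m + Z * se


hits = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = ci95(sample)
    if low <= TRUE_MEAN <= high:
        hits += 1  # this study's interval captured the true parameter

print(f"{hits} of {REPEATS} intervals contained the true mean")
```

The count should land near 95 but will not always equal it exactly, and each individual interval either contains the true mean or doesn't.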

Example: Point estimate and confidence interval

You want to know the average number of paid vacation days that employees at an international company receive. After collecting survey responses from a random sample, you calculate a point estimate and a confidence interval.

Your point estimate of the population mean paid vacation days is the sample mean of 19 paid vacation days.

With random sampling, a 95% confidence interval of [16, 22] means you can be reasonably confident that the population mean number of paid vacation days is between 16 and 22.
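
As a sketch of how a point estimate and a 95% confidence interval could be calculated, the snippet below uses hypothetical survey responses (not the data behind the example) and a normal-approximation interval based on `statistics.NormalDist`.

```python
import statistics
from statistics import NormalDist

# Hypothetical paid-vacation-day responses from a random sample (illustration only)
days = [15, 18, 20, 22, 19, 25, 12, 17, 21, 20, 16, 23, 19, 18, 24, 14, 20, 22, 17, 18]

point_estimate = statistics.mean(days)                      # sample mean
standard_error = statistics.stdev(days) / len(days) ** 0.5  # estimated SD of the sample mean
z = NormalDist().inv_cdf(0.975)                             # two-sided 95% critical value

low = point_estimate - z * standard_error
high = point_estimate + z * standard_error
print(f"point estimate: {point_estimate:.1f} days")
print(f"95% CI: [{low:.1f}, {high:.1f}]")
```

With only 20 responses, an interval based on the t distribution (critical value about 2.09 for 19 degrees of freedom) would be slightly wider; the normal approximation is used here only to keep the sketch dependency-free.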

## Hypothesis testing

Hypothesis testing is a formal process of statistical analysis using inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.

Hypotheses, or predictions, are tested using statistical tests. Statistical tests also estimate sampling errors so that valid inferences can be made.

Statistical tests can be parametric or non-parametric. Parametric tests are generally considered more statistically powerful: when their assumptions are met, they are more likely to detect an effect if one exists.

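Parametric tests make assumptions that include the following:

• The population that the sample comes from follows a normal distribution of scores.
• The variances of the groups being compared are similar (homogeneity of variance).
• The observations are independent of one another.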

When your data violates any of these assumptions, non-parametric tests are more suitable. Non-parametric tests are called “distribution-free tests” because they don’t assume that the population data follow a particular distribution.
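
To illustrate what “distribution-free” means in practice, here is a sketch of a permutation test for a difference in means. A permutation test is not one of the tests covered in this article; it is used here because it is short to write and makes no assumption about the shape of the population distribution. The group data are hypothetical.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Makes no assumption about the population distribution; it only assumes
    that group labels are exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator keeps the p-value from being exactly 0
    return (extreme + 1) / (n_permutations + 1)


group_a = [12, 15, 14, 10, 13, 18, 11, 16]  # hypothetical scores
group_b = [22, 19, 24, 17, 21, 25, 20, 23]
print(permutation_test(group_a, group_b))
```

The p-value is the share of label shufflings that produce a difference at least as large as the observed one; a small value suggests the difference is unlikely to be explained by sampling error alone.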

Statistical tests come in three main forms: tests of comparison, correlation, or regression.

### Comparison tests

Comparison tests assess whether there are differences in means, medians or rankings of scores of two or more groups.

To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.

Means can only be found for interval or ratio data, while medians and rankings are more appropriate measures for ordinal data.

| Comparison test | Parametric? | What’s being compared? | Samples |
|---|---|---|---|
| t test | Yes | Means | 2 samples |
| ANOVA | Yes | Means | 3+ samples |
| Mood’s median | No | Medians | 2+ samples |
| Wilcoxon signed-rank | No | Distributions | 2 paired samples |
| Wilcoxon rank-sum (Mann-Whitney U) | No | Sums of rankings | 2 independent samples |
| Kruskal-Wallis H | No | Mean rankings | 3+ samples |
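
The test statistics behind some of these comparison tests can be sketched with Python's standard library. The snippet below computes the pooled-variance (Student's) t statistic, the Mann-Whitney U statistic, and the one-way ANOVA F statistic for hypothetical data. Turning these statistics into p-values requires the t, U, or F sampling distributions, which the standard library does not provide; in practice a library such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.mannwhitneyu`, `scipy.stats.f_oneway`) reports them.

```python
import statistics


def student_t(a, b):
    """Independent two-sample t statistic with pooled variance (Student's t test)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a's value is larger, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores for three groups (illustration only)
control = [12, 15, 14, 10, 13]
treatment = [18, 17, 21, 16, 19]
other = [14, 16, 15, 13, 17]
print(student_t(control, treatment))
print(mann_whitney_u(control, treatment))
print(one_way_anova_f(control, treatment, other))
```

For two groups, the ANOVA F statistic equals the square of Student's t statistic, which makes a handy sanity check.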

### Correlation tests

Correlation tests determine the extent to which two variables are associated.

Although Pearson’s r is the most statistically powerful test, Spearman’s r is appropriate for interval and ratio variables when the data doesn’t follow a normal distribution.

Of the tests listed here, the chi square test of independence is the only one that can be used with nominal variables.

| Correlation test | Parametric? | Variables |
|---|---|---|
| Pearson’s r | Yes | Interval/ratio variables |
| Spearman’s r | No | Ordinal/interval/ratio variables |
| Chi square test of independence | No | Nominal/ordinal variables |
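
As a sketch of the statistics behind these correlation tests, the snippet below computes Pearson's r, Spearman's r (Pearson's r applied to ranks, with tied values given their average rank), and the chi square statistic for a contingency table of observed counts. All data are hypothetical, and p-values are again left to a statistics library. Python 3.10 and later also provide `statistics.correlation` for Pearson's r.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def chi_square_statistic(table):
    """Chi square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # rises with x, but not in a straight line
print(pearson_r(x, y), spearman_r(x, y))
print(chi_square_statistic([[10, 20], [20, 10]]))  # hypothetical 2x2 table of counts
```

Because `y` always rises with `x` but not linearly, Spearman's r is 1 while Pearson's r is slightly below 1.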

### Regression tests

Regression tests assess whether changes in one or more predictor variables are associated with changes in an outcome variable; whether such a relationship is causal depends on the study design. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.

Most of the commonly used regression tests are parametric. If your data is not normally distributed, you can perform data transformations.

Data transformations use mathematical operations, like taking the square root of each value, to make your data closer to a normal distribution.
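
As an example of a square-root transformation, this sketch applies `math.sqrt` to hypothetical right-skewed data and compares a moment-based skewness measure before and after. Skewness near 0 is one sign of a roughly symmetric distribution; a transformation is not guaranteed to make data normal.

```python
import math
import statistics


def skewness(values):
    """Moment-based (population) skewness: mean cubed deviation divided by SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


# Hypothetical right-skewed data, e.g. counts (illustration only)
raw = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]
transformed = [math.sqrt(v) for v in raw]  # square-root transformation

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```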

| Regression test | Predictor | Outcome |
|---|---|---|
| Simple linear regression | 1 interval/ratio variable | 1 interval/ratio variable |
| Multiple linear regression | 2+ interval/ratio variables | 1 interval/ratio variable |
| Logistic regression | 1+ any variable(s) | 1 binary variable |
| Nominal regression | 1+ any variable(s) | 1 nominal variable |
| Ordinal regression | 1+ any variable(s) | 1 ordinal variable |
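
Simple linear regression, the first row of the table, can be sketched as an ordinary least squares fit using only the standard library. The hours-and-scores data are hypothetical. Python 3.10 and later also provide `statistics.linear_regression`, and dedicated statistics libraries additionally report standard errors and p-values for the coefficients.

```python
import statistics


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of x
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean x, mean y)
    return intercept, slope


# Hypothetical data: hours studied (predictor) and test score (outcome)
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 72]
intercept, slope = simple_linear_regression(hours, score)
print(f"score = {intercept:.1f} + {slope:.2f} * hours")
```

The slope estimates how much the outcome changes, on average, for a one-unit change in the predictor; on its own it describes an association rather than proving causation.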