
Understanding the distribution of the sample mean (x_bar)

Say we have a huge population with mean Mu and variance Sigma^2. When studying it by sampling, we take a random sample of n items, perform the study on the sample, and then carry the conclusions back to the population.

From the Central Limit Theorem, we know that when the sample size n is large enough, the sample mean approximately follows a normal distribution regardless of the population distribution (as long as the population variance is finite), such that:

x_bar ~ N (Mu, Sigma^2/n)   (approximately, for large n)
which means:
Expected (x_bar) = Mu
Variance (x_bar) = Sigma^2/n
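
The two results above can be checked with a short derivation. This is only a sketch, assuming the sample items X_1, ..., X_n are drawn independently from the same population (the X_i notation is introduced here for illustration and is not part of the original post):

```latex
\[
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
E(\bar{x}) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu \\
\mathrm{Var}(\bar{x}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1}{n^{2}}\, n\sigma^{2} = \frac{\sigma^{2}}{n}
\end{aligned}
\]
```

Note that the variance step relies on independence, so the variance of the sum is the sum of the variances.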


Let's look at a simple illustrative example. Suppose we have a population with mean Mu = 100.
Now we take a sample and compute the sample mean, x_bar. Most likely x_bar will be near 100 but not exactly 100. Let's take another 9 separate samples and suppose we get these results:

First sample --> x_bar = 99.8
Second sample --> x_bar = 100.1
..
..
..
10th sample --> x_bar = 100.3

What we see is that the sample mean is usually close to the true population mean, sometimes a little above it and sometimes a little below. That is the meaning of saying the expected value of x_bar is Mu.
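
The ten-sample example can be tried out with a small simulation. This is a minimal sketch, not a definitive implementation: the normal population, the value Sigma = 2, the sample size n = 50, the seed and the helper name sample_means are all assumptions made for illustration (the example in the text only fixes Mu = 100).

```python
import random
import statistics


def sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n and return their means.

    Assumption: the population is normal with mean mu and standard
    deviation sigma. Any other population could be swapped in.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]


# Ten separate samples, as in the example (sigma=2 and n=50 are assumed values).
means = sample_means(mu=100, sigma=2, n=50, num_samples=10)

# Each x_bar should land near 100, but not exactly on it.
print([round(m, 1) for m in means])

# The average of the ten sample means should be closer still to 100.
print(round(statistics.fmean(means), 2))
```

Changing the seed gives a different set of ten values, but they should keep scattering around 100.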

Regarding the variance of the sample mean (x_bar), it always decreases as the sample size increases (variance of x_bar = Sigma^2/n), which is natural behavior. Note that this is the variance of x_bar across repeated samples, not the sample variance of the items within one sample. We may think of it this way: the larger the sample we use, the more precise our estimate of the population mean tends to be.
When the sample size goes to infinity (theoretically), the variance of x_bar goes to zero. The reason is that the sample becomes effectively the same as the population (all items). Thus, the sample mean gives the exact value of the population mean. There is no variability left in the sample mean because the sample fully represents the population.
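
The shrinking variance can also be seen numerically. The sketch below estimates the variance of x_bar for a few sample sizes and prints it next to Sigma^2/n. The normal population, Sigma = 2, the sample sizes, the number of repetitions and the helper name variance_of_sample_mean are assumptions chosen only for illustration.

```python
import random
import statistics


def variance_of_sample_mean(sigma, n, num_samples=2000, mu=0.0, seed=1):
    """Estimate Var(x_bar) by drawing many samples of size n.

    Assumption: a normal population with mean mu and standard deviation
    sigma. The formula Sigma^2/n itself does not require normality.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    # Variance of the collected sample means (not of the items in one sample).
    return statistics.pvariance(means)


sigma = 2.0
results = {n: variance_of_sample_mean(sigma, n) for n in (5, 20, 80)}

for n, estimated in results.items():
    # Columns: sample size, simulated Var(x_bar), theoretical Sigma^2/n
    print(n, round(estimated, 4), sigma**2 / n)
```

The simulated values will not match Sigma^2/n exactly, since they come from a finite number of repetitions, but they should fall by roughly a factor of four each time n is multiplied by four.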
___

