Now, let's look at this question: do husbands tend to be older than their wives? Well, we can collect some data, and here are the ages of five couples. There's a husband of age 43 whose wife is 41, so there's a two-year age difference. Then we have a couple where the husband is 71 and the wife is 70. Then we have 32 and 31, 68 and 66, and 27 and 26. So, it looks like the husband is always a little bit older than his wife.

So, how would we analyze those data? Clearly, we have two samples: there's the sample of the husbands' ages, and then there's the sample of the wives' ages. But of course the problem is that in this case the two samples are not independent. The reason is that, while the husband's age is typically a bit larger than the wife's age, the two ages within a couple tend to be close together. For example, here one couple is in their 70s, and then there's another couple in their 20s. This similarity in age makes the two samples dependent. So, the major assumption of the two-sample z-test, namely that we have two independent samples, is not met in this case.

But even if we could use the two-sample z-test, it would probably not be significant. The reason is that the age differences are quite small, but the variation within each group is quite big. For example, the husbands' ages are 43, 71, 32, and so on, so there's quite some fluctuation in there. What the two-sample z-test does is compare the difference between the groups to the fluctuations within each population, and in this case the differences are small compared to those fluctuations.

It turns out there's a very simple solution for these types of data. Since we have paired data, we can simply analyze the differences obtained from each pair with a simple t-test. So, again, let's start out by specifying our null hypothesis, which would be that the difference in population means is 0.
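To make the contrast between small differences and large within-group fluctuations concrete, here is a minimal Python sketch using the five couples from the example. The variable names are illustrative, and the two-sample z-statistic is computed only to show the scale of the problem, since its independence assumption does not hold for these data.

```python
import math
import statistics

# Ages of the five couples from the example.
husbands = [43, 71, 32, 68, 27]
wives = [41, 70, 31, 66, 26]

# Paired differences (husband minus wife).
diffs = [h - w for h, w in zip(husbands, wives)]

# Spread within each group versus spread of the paired differences.
sd_husbands = statistics.stdev(husbands)
sd_wives = statistics.stdev(wives)
sd_diffs = statistics.stdev(diffs)

# Two-sample z-statistic, computed AS IF the samples were independent.
# They are not, so this number is for illustration only.
n = len(husbands)
mean_gap = statistics.mean(husbands) - statistics.mean(wives)
se_two_sample = math.sqrt(sd_husbands**2 / n + sd_wives**2 / n)
z_two_sample = mean_gap / se_two_sample

print("differences:", diffs)
print("SD husbands, wives, differences:", sd_husbands, sd_wives, sd_diffs)
print("two-sample z:", z_two_sample)
```

Each group has a standard deviation of about 20 years, while the differences have a standard deviation of about 0.55 years, so the two-sample statistic comes out close to 0.1, nowhere near significant.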
Then the t-test would look at the average of the differences, which is 1.4 years here, subtract off the expected value, which is 0 under the null hypothesis, and divide by the standard error of the average. Now, the formula for the standard error of the average is the standard deviation in the population of the differences divided by the square root of the sample size. And again, we would simply estimate sigma by the sample standard deviation of the differences, which is 0.55. Plugging this in, we find that our t-statistic is 5.69.

Now, we look this up in the t-table or in some software. What we have to do is find the area to the right of 5.69. Again, this curve is the Student's t-distribution with four degrees of freedom, which is one less than the sample size, and we find that the area there is 0.2%. So, the p-value in this case is double that, which is 0.4%, and clearly this is strong evidence to reject the null. So, it turns out that the paired t-test is very powerful in that situation. Again, remember that we do not need an assumption of independence between the two samples. Rather, the independence assumption here applies to the sampling of the pairs.

Now, what would happen if we didn't know the age difference, but only whether the husband was older than his wife or not? In that case, we can still assess the evidence. We would have to write the null hypothesis a little bit differently. We would say that half the husbands in the population are older than their wives. The reason we write it down that way is that we are now only interested in whether the husband is older or not, and we specify the proportion of husbands who are older, which is one half. That hints that we are looking at 0/1 labels, because we are counting how many husbands are older than their wives. And then we can just use a simple z-test. Remember, this is really the same thing that we looked at when we tested whether a coin is fair.
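Going back to the paired t-test for a moment, the calculation described above can be sketched in Python with only the standard library. This is a sketch under stated assumptions, not a reference implementation: the helper `t_upper_tail` is a hypothetical name introduced here, and it approximates the tail area by numerically integrating the Student's t density rather than reading a table.

```python
import math
import statistics

# Ages of the five couples from the example.
husbands = [43, 71, 32, 68, 27]
wives = [41, 70, 31, 66, 26]

# Paired differences (husband minus wife).
diffs = [h - w for h, w in zip(husbands, wives)]
n = len(diffs)

mean_diff = statistics.mean(diffs)   # average of the differences
sd_diff = statistics.stdev(diffs)    # sample SD, estimates sigma
se_mean = sd_diff / math.sqrt(n)     # standard error of the average
t_stat = (mean_diff - 0) / se_mean   # expected value is 0 under the null


def t_upper_tail(t, df, steps=20000):
    """Area to the right of t (t >= 0) under the Student's t density.

    Hypothetical helper: integrates the density on [0, t] with
    Simpson's rule and subtracts the result from one half.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


one_sided = t_upper_tail(t_stat, df=n - 1)  # four degrees of freedom
p_value = 2 * one_sided

print("t-statistic:", t_stat)
print("one-sided area:", one_sided, "p-value:", p_value)
```

With the unrounded standard deviation the statistic comes out at about 5.72. The 5.69 quoted above results from rounding the standard deviation to 0.55 first. Either way the tail area is about 0.2% and the p-value is under half a percent.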
So, the z-statistic is simply the sum of the 1s, minus its expected value, divided by the standard error of the sum. We have n trials, and each has a chance of one half of coming up 1, so the expected value of the sum is n times one half. In this case, a label 1 means that the husband is older than the wife, and a label 0 would be the other way around. Here all five husbands are older, so the sum of the 1s is 5. So, the z-statistic is 2.24, and now we can draw a normal curve, and we have to look up the area to the right of 2.24, and that area is 1.25%. And the p-value is double that, which is 2.5%. So, it's still significant, but not quite as significant as the one before. The reason why it's not as significant is that we don't use the information that's contained in the age difference. We only use the information whether the husband is older or not.

This test is very famous. It's called the sign test. The way to think about it is the coin tossing example we looked at earlier. The coin tossing analogy makes it very easy to interpret the sign test, even for people who are not experts in statistics. And that's one of the main reasons why it's so popular.
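The sign test calculation can be sketched the same way. This is a minimal illustration using the normal approximation described above; the variable names are illustrative, and `statistics.NormalDist` from the Python standard library stands in for the normal table.

```python
import math
from statistics import NormalDist

# Ages of the five couples from the example.
husbands = [43, 71, 32, 68, 27]
wives = [41, 70, 31, 66, 26]

# Label 1 if the husband is older than the wife, 0 otherwise.
labels = [1 if h > w else 0 for h, w in zip(husbands, wives)]
n = len(labels)

observed_sum = sum(labels)        # number of husbands who are older
expected_sum = n * 0.5            # n trials, each with chance 1/2 of a 1
se_sum = math.sqrt(n) * 0.5       # SE of the sum: sqrt(n) * sqrt(0.5 * 0.5)

z_stat = (observed_sum - expected_sum) / se_sum

# Area to the right of z under the standard normal curve, then doubled.
upper_tail = 1 - NormalDist().cdf(z_stat)
p_value = 2 * upper_tail

print("z-statistic:", z_stat)
print("area to the right:", upper_tail, "p-value:", p_value)
```

The statistic is about 2.24 and the p-value about 2.5%, noticeably larger than the paired t-test's p-value, because only the sign of each difference is used and not its size.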