Using goodness of fit and Benford's Law for hypothesis testing to detect check fraud

5/12/2018

Intro

Howdy! I'm Professor Curtis of Aspire Mountain Academy here with more statistics homework help. Today we're going to learn how to use goodness of fit and Benford's law for hypothesis testing to detect check fraud. Here's our problem statement: An investigator analyzed the leading digits from 774 checks issued by seven suspect companies. The frequencies were found to be 222, 150, 180, 77, 66, 57, 48, 35, and 38 for the leading digits 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively. If the observed frequencies are substantially different from the frequencies expected with Benford's law (30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1%, and 4.6% for leading digits 1 through 9), the check amounts appear to result from fraud. Use a 10% significance level to test for goodness of fit with Benford's law. Does it appear that the checks are the result of fraud?
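For reference, the Benford's law proportions come from a simple formula: the probability that the leading digit is d equals log10(1 + 1/d). Here's a minimal Python sketch that computes them (the function name benford_proportions is just an illustrative choice, not part of StatCrunch or any library). Rounded to three decimals, it gives 0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, and 0.046.

```python
import math

def benford_proportions():
    """Return {leading digit: probability} under Benford's law, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

if __name__ == "__main__":
    for digit, p in benford_proportions().items():
        print(f"{digit}: {p:.3f}")
```

Because log10(1 + 1/d) = log10((d + 1)/d), the nine terms telescope to log10(10) = 1, so the proportions always sum to exactly 1.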

Part 1

OK, the first part asks us to determine the null and alternative hypotheses, which is the usual first step in any hypothesis test. Here we're testing goodness of fit, and goodness-of-fit tests have fixed hypotheses: no matter what data we're looking at, the null and the alternative always take the same form.

The null hypothesis is by definition a statement of equality, so in a goodness-of-fit test it always says that the data come from a population that fits some claimed distribution (in this case, Benford's law). Looking at our answer options, we want the one that says the leading digits are from a population that conforms to Benford's law, because that is the statement of equality.

The alternative hypothesis is then that at least one of the leading digits has a frequency that does not conform to Benford's law. In goodness-of-fit terms, at least one of the population proportions differs from the claimed distribution, so we can ignore “at least two” and “at most three.” The option saying the leading digits are from a population that conforms to Benford's law is our null hypothesis, so that's not what we want either. That leaves “at least one has a frequency that does not conform to Benford's law,” so that's the answer we select. Excellent!
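Written out symbolically, with p_d denoting the population proportion of checks whose leading digit is d (notation introduced here for illustration, not taken from the original problem), the two hypotheses are:

```latex
\begin{aligned}
H_0 &: p_d = \log_{10}\left(1 + \frac{1}{d}\right) \quad \text{for every } d = 1, 2, \dots, 9 \\
H_1 &: p_d \neq \log_{10}\left(1 + \frac{1}{d}\right) \quad \text{for at least one } d
\end{aligned}
```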

Part 2

Now the second part asks us to calculate the test statistic. To do that, we're going to put the data into StatCrunch and let StatCrunch do the heavy lifting for us. Here's my data in StatCrunch. Now I'm going to resize this window so we can get a better view of what's going on here. The first column holds the leading digits, the second column holds the actual frequencies (the counts we actually observed), and the last column holds the corresponding percentages from the Benford's law distribution.
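If you'd rather follow along outside StatCrunch, the same three columns can be kept as parallel Python lists. This is only a sketch: the variable names are hypothetical, and the Benford percentages are assumed to be the usual values rounded to one decimal place.

```python
# The three StatCrunch columns as parallel lists (names are illustrative).
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9]
observed = [222, 150, 180, 77, 66, 57, 48, 35, 38]
# Assumed: standard Benford's law percentages rounded to one decimal place.
benford_percent = [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]

# Sanity checks: one row per digit, and the percentages total 100%.
assert len(digits) == len(observed) == len(benford_percent) == 9
assert abs(sum(benford_percent) - 100.0) < 1e-9

# Sample size implied by the frequencies actually entered.
total_checks = sum(observed)
print("total checks:", total_checks)
```

Computing the sample size from the observed column, rather than typing it in separately, guarantees it matches the counts the test actually uses.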

To run the goodness-of-fit test, I go to Stat –> Goodness-of-fit –> Chi-square. We want the chi-square test because we're comparing observed counts in categories against a claimed distribution, not comparing the data with a normal distribution. Next we select our observed and expected frequencies. The observed frequencies are what we actually observe in the real world, which is the actual frequency column. The expected frequencies are what we would theoretically expect if the claimed distribution, Benford's law, holds: each digit's Benford proportion times the total number of checks. We have a column of Benford's law percentages for that, so I'm going to select it here.
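Here's a minimal sketch of how the expected frequencies are built: the total number of checks, n, times each digit's Benford proportion. It assumes the exact proportions log10(1 + 1/d) rather than the rounded percentages in the StatCrunch column, so values may differ slightly in the last decimal places. The helper name benford_expected_counts is hypothetical. It also checks the usual requirement that every expected frequency be at least 5.

```python
import math

def benford_expected_counts(observed):
    """Expected count for each leading digit 1..9 under Benford's law.

    E_d = n * log10(1 + 1/d), where n is the total of the observed counts.
    """
    if len(observed) != 9:
        raise ValueError("need one observed count per leading digit 1-9")
    n = sum(observed)
    return [n * math.log10(1 + 1 / d) for d in range(1, 10)]

if __name__ == "__main__":
    expected = benford_expected_counts([222, 150, 180, 77, 66, 57, 48, 35, 38])
    for d, e in enumerate(expected, start=1):
        print(f"digit {d}: expected {e:.2f}")
    # The chi-square approximation is usually considered adequate
    # only when every expected count is at least 5.
    print("all expected counts >= 5:", all(e >= 5 for e in expected))
```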

Everything else is just fine for our purposes, so I'm going to hit Compute!, and here is my results window, with the test statistic and the P-value in the table at the top. The problem asks us to round the test statistic to three decimal places, so that's what I'll do. Fantastic!
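The test statistic in the results table is the goodness-of-fit chi-square statistic, χ² = Σ (O - E)² / E, summed over the nine digits, where O is an observed frequency and E the matching expected frequency. Here's a minimal sketch of that formula (our own helper, not StatCrunch's code), checked on a small made-up example rather than the check data.

```python
def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum over categories of (O - E)**2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

if __name__ == "__main__":
    # Made-up example (not the check data): 3 categories, 20 expected in each.
    print(round(chi_square_statistic([25, 15, 20], [20, 20, 20]), 3))  # prints 2.5
```

Large differences between O and E make the statistic large, which is why only the right tail of the chi-square distribution matters for this test.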

Part 3

And now the P-value, as always, is in our results window right next to the test statistic. Here we're asked to round to four decimal places, so again that's what I'll do. Excellent!
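The P-value is the area to the right of the test statistic under a chi-square distribution with k - 1 degrees of freedom, where k is the number of categories; with nine leading digits, that's 8. For an even number of degrees of freedom the right tail has a closed form, so this sketch needs only the standard library. In practice you'd more likely call scipy.stats.chi2.sf(x, df), which handles any degrees of freedom; the helper below is a hypothetical stand-in.

```python
import math

def chi_square_right_tail_even_df(x, df):
    """P(X > x) for X ~ chi-square with a positive, even number of degrees of freedom.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)**k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even degrees of freedom")
    if x <= 0:
        return 1.0
    half = x / 2
    term = 1.0   # current term (x/2)**k / k!, starting at k = 0
    total = 0.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total

if __name__ == "__main__":
    categories = 9            # leading digits 1 through 9
    df = categories - 1       # no parameters estimated from the data
    # 13.362 is the tabulated chi-square critical value for df = 8 at the 10% level.
    print(round(chi_square_right_tail_even_df(13.362, df), 4))
```

As a sanity check, the tail area at x = 13.362 with 8 degrees of freedom should come out very close to 0.10.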

Part 4

Now, the last part asks us to state the conclusion. Our P-value (55%) is well above our significance level of 10%. Therefore, we are outside the region of rejection, and we fail to reject the null hypothesis. Failing to reject the null hypothesis means we don't have sufficient evidence to warrant rejection of the claim that the leading digits are from a population that conforms to Benford's law, because that claim is our null hypothesis.

Failing to reject doesn't prove the null hypothesis is true, but it does mean the data are consistent with it: nothing we see suggests the leading digits stray from the Benford's law distribution. So for the statement about sufficient evidence to warrant rejection of the claim that the leading digits conform to Benford's law, we choose “there is not sufficient evidence.” And since the leading digits appear consistent with Benford's law, the checks do not appear to be the result of fraud. I check my answer. Fantastic!
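The decision rule and its wording fit in a few lines: reject H0 when the P-value is less than or equal to α, otherwise fail to reject, and always phrase the conclusion in terms of the original claim. Here's a sketch with a hypothetical function name, using α = 0.10 as in this problem.

```python
def goodness_of_fit_conclusion(p_value, alpha=0.10):
    """Apply the P-value decision rule and word the conclusion about the Benford's law claim."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return ("Reject H0: there is sufficient evidence to warrant rejection of the claim "
                "that the leading digits conform to Benford's law; the checks appear to result from fraud.")
    return ("Fail to reject H0: there is not sufficient evidence to warrant rejection of the claim "
            "that the leading digits conform to Benford's law; the checks do not appear to result from fraud.")

if __name__ == "__main__":
    print(goodness_of_fit_conclusion(0.55))  # the 55% P-value reported in the walkthrough
```

Note that neither branch says the null hypothesis is proven; failing to reject only means the data don't contradict the claim.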

And that's how we do it at Aspire Mountain Academy. Be sure to leave your comments below to let us know how good a job we did or how we can improve. And if your stats teacher is boring or just doesn't care to help you learn stats, go to aspiremountainacademy.com, where you can find out more about accessing our lecture videos or provide feedback on what you'd like to see. Thanks for watching! We'll see you in the next video.

1 Comment

Ana Carol

5/6/2023 06:23:00 am

Thank you for explaining how to use goodness of fit and Benford's law for hypothesis testing to detect check fraud. It's important to note that in this case, since we fail to reject the null hypothesis, there is not sufficient evidence to suggest check fraud.