Wednesday, July 19, 2017

Understanding the Reliability of Educational and Psychological Tests

Why aren't tests reliable?

The reason tests are not reliable is that reliability is a property of test scores, not of the tests themselves.

This isn't a matter of semantics.

Think about it this way.

Give all the students in one school an achievement test. The test items don't change, so they appear stable, consistent, and reliable. However, when publishers report reliability values, they calculate the reliability statistics from scores, and scores vary from one administration to another. If you have ever taken a test twice and gotten a different score, you know what I mean. Individuals change from day to day, and we change from year to year. Even a representative national sample of students can differ from one year to the next.

Every time we calculate a reliability statistic on a new set of scores, the value is slightly different.

Reliability values vary with the sample.
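One way to see why the sample matters is the classical test theory definition, which the post does not spell out: an observed score is a true score plus error, and reliability is the share of observed-score variance that is true-score variance. The sketch below assumes that model and uses made-up variance values. It shows that the same amount of measurement error yields lower reliability in a group whose true scores are more alike.

```python
def ctt_reliability(true_var: float, error_var: float) -> float:
    """Classical test theory reliability: true-score variance / observed-score variance.

    Assumes observed = true + error, with error uncorrelated with true scores,
    so observed-score variance = true_var + error_var.
    """
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    observed_var = true_var + error_var
    if observed_var == 0:
        raise ValueError("observed-score variance is zero")
    return true_var / observed_var


# Hypothetical numbers: the same test (same error variance) given to two groups.
error_var = 20.0
diverse_group = ctt_reliability(true_var=180.0, error_var=error_var)  # wide range of ability
narrow_group = ctt_reliability(true_var=30.0, error_var=error_var)    # students more alike

print(f"diverse group: {diverse_group:.2f}")  # 0.90 (180 / 200)
print(f"narrow group:  {narrow_group:.2f}")   # 0.60 (30 / 50)
```

Nothing about the test changed between the two calls; only the spread of the students taking it did.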

Reliability values vary with the method of calculation.

Consider coefficient alpha. You can get high reliability values using coefficient alpha with scores from a single administration, and this method is common in research articles. Yet the same research team will often report different alpha values for different samples within the same article.
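Coefficient (Cronbach's) alpha is computed from one administration, using the variance of each item and the variance of the total scores. The sketch below applies the standard formula to two small, invented data sets (five students, four items each). The items are identical in both groups; only the responses differ.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha for a persons-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])  # number of items
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 1-5 ratings: five students (rows) answering the same four items (columns).
sample_a = [[4, 5, 4, 5], [2, 3, 2, 2], [5, 5, 4, 5], [3, 3, 3, 4], [1, 2, 2, 1]]
sample_b = [[4, 2, 5, 3], [2, 4, 3, 3], [5, 4, 3, 5], [3, 3, 2, 2], [2, 3, 4, 2]]

print(f"alpha, sample A: {cronbach_alpha(sample_a):.2f}")  # 0.97
print(f"alpha, sample B: {cronbach_alpha(sample_b):.2f}")  # 0.43
```

Population variances are used here; sample variances would give the same alpha, because the n versus n - 1 factor cancels in the ratio.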

If we use a split-half method, which usually calculates reliability from the correlation between two halves of one test, we can get a reliability value from a single administration. But that's only half a test! Researchers use the Spearman-Brown formula to correct for the shortened half test, but the result is still only an estimate of the full test's reliability.

There's also the test-retest reliability method. Give a test once, wait a while (maybe a week or several weeks), then retest. That gives you an estimate of stability. But on some tests, such as intelligence and achievement tests, a good memory can help you score higher the second time.

By now you get the point. Any one test can be associated with many reliability values. The reliability problem is not just about tests; part of the problem is understanding that a test does not have a single reliability value. As with many things in science, there are many variables to consider when answering a question.

Reputable test publishers include reliability values in their test manuals. Teachers, counselors, psychologists, and other test users ought to know about test score reliability.

Quick Notes on Reliability

Reliability is a property of scores, not tests.

Reliability may mean stability of scores over time.

Reliability may mean how consistently test questions measure whatever the test measures.

Test scores that are reliable in one culture are not necessarily reliable in another culture.

Reliable test scores do not guarantee that the scores are valid, but reliability places a limit on validity.

The statistical concepts of reliability apply to tests, quizzes, polls, surveys, and any other sets of questions that yield numerical scores.
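The note that reliability places a limit on validity has a precise form in classical test theory, which the post alludes to but does not state. A test's observed correlation with any criterion cannot exceed the square root of the product of the test's reliability and the criterion's reliability. Here is a minimal sketch, assuming that classical model and made-up reliability values:

```python
from math import sqrt


def max_validity(test_reliability: float, criterion_reliability: float = 1.0) -> float:
    """Upper bound on the observed test-criterion correlation under
    classical test theory: sqrt(r_xx * r_yy)."""
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must be between 0 and 1")
    return sqrt(test_reliability * criterion_reliability)


# Hypothetical reliabilities.
print(f"{max_validity(0.81):.2f}")        # 0.90: ceiling with a perfectly reliable criterion
print(f"{max_validity(0.81, 0.64):.2f}")  # 0.72: ceiling when the criterion is also imperfect
```

A test whose scores have a reliability of 0.81 cannot correlate above 0.90 with anything, however well chosen its items are.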

Note: This is a re-posting of an earlier post to this new blog.

