Recall that test score reliability is a necessary but insufficient condition for test score validity.
Many tests in psychology, medicine, and education are useful. The reliability of their scores varies with several factors: the properties of the test itself, how closely the user follows standard procedures when administering the test, environmental conditions that can affect the scores, and factors within the person taking the test.
The scores on many tests conform to the pattern called the normal curve, or bell curve. In classical test theory, the scores people obtain on tests are called obtained scores (symbol X). Statisticians use the variation in obtained scores to estimate a theoretical "true score" (symbol T). Because a perfectly reliable test ought to yield the same score every time it is used, deviations of the obtained scores from the true score indicate error (symbol E). In a formula, X = T + E.
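A short simulation can make the X = T + E model concrete. This is a minimal Python sketch under assumed values: a hypothetical test taker with a true score of T = 100 and normally distributed errors with a standard deviation of 4. The function name is illustrative and does not come from any testing package.

```python
import random
import statistics


def simulate_obtained_scores(true_score, error_sd, n_administrations, seed=0):
    """Return hypothetical obtained scores X = T + E for one test taker.

    Assumption: each error E is drawn independently from a normal
    distribution with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_administrations)]


true_score = 100  # assumed true score T
scores = simulate_obtained_scores(true_score, error_sd=4, n_administrations=10_000)
errors = [x - true_score for x in scores]  # E = X - T

# Over many administrations the errors roughly cancel out,
# so the mean obtained score should sit close to T,
# and the spread of the errors should sit close to the assumed error SD.
print(round(statistics.mean(scores), 1))
print(round(statistics.stdev(errors), 1))
```

Any single obtained score in the simulation can miss T by several points, which is why the error term matters when an individual score is interpreted.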
Theoretically, the reliability of test scores is the ratio of true-score variance to obtained-score variance. A perfectly reliable test would yield a reliability coefficient of rxx = 1.0. In practice, most of the better tests yield average reliability values above .90. Test publishers are obligated by professional ethics to report reliability values in their test manuals.
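The variance ratio can be sketched by simulation. The values here are hypothetical: true scores with a standard deviation of 15 and error scores with a standard deviation of 4, drawn independently of each other. Under those assumptions the theoretical reliability is 15² / (15² + 4²) = 225 / 241, or about .93, and the simulated ratio should land close to it.

```python
import random
import statistics


def simulated_reliability(true_sd, error_sd, n_people, seed=1):
    """Estimate r_xx as var(T) / var(X) for a simulated group of test takers.

    Assumptions: true scores are normal around a mean of 100, and errors are
    normal with mean 0, drawn independently of the true scores.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_people)]
    obtained = [t + rng.gauss(0.0, error_sd) for t in true_scores]  # X = T + E
    return statistics.pvariance(true_scores) / statistics.pvariance(obtained)


r_xx = simulated_reliability(true_sd=15, error_sd=4, n_people=20_000)
print(round(r_xx, 2))  # should land near the theoretical 225 / 241
```

Setting error_sd to 0 makes every obtained score equal its true score, so the ratio becomes exactly 1.0, the perfectly reliable case.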
Studies of score patterns allow statisticians to estimate the average variability of score error. Thus, for any given published test, there ought to be a statistic known as the Standard Error of Measurement (SEM), which is the standard deviation of the error scores.
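In classical test theory, the SEM is commonly estimated from the standard deviation of the obtained scores and the reliability coefficient: SEM = SD × √(1 − rxx). The inputs below (SD = 15, rxx = .93) are hypothetical, chosen only to show that such values yield a SEM of about 4.

```python
import math


def standard_error_of_measurement(sd_obtained, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - r_xx)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_obtained * math.sqrt(1.0 - reliability)


sem = standard_error_of_measurement(sd_obtained=15, reliability=0.93)
print(round(sem, 2))  # 3.97
```

A perfectly reliable test (rxx = 1.0) would have a SEM of 0; the lower the reliability, the wider the SEM.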
Once the SEM for a test has been established through large-scale studies, users can use that value to estimate how a test taker's scores might vary if the person took the same test again under similar conditions. These estimates rely on the properties of the normal curve; thus, a SEM based on this model should be used only with tests whose scores conform to the normal pattern.
For example, suppose a student obtains an IQ score of 100 and one SEM = 4. On future administrations of the same test, the student would be expected to score between 96 and 104 about 68% of the time.
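The band in the example is simply the obtained score plus and minus a chosen number of SEMs. A minimal sketch; the helper name score_band is hypothetical.

```python
def score_band(obtained_score, sem, n_sems=1):
    """Return (low, high) = obtained score minus/plus n_sems * SEM."""
    margin = n_sems * sem
    return (obtained_score - margin, obtained_score + margin)


print(score_band(100, 4))            # (96, 104): about 68% of retests
print(score_band(100, 4, n_sems=2))  # (92, 108): about 95% of retests
```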
Forming a range of values around the obtained score should remind users and test takers that scores are not fixed properties. Scores vary, and they tend to vary in a "standard" pattern: in this theory, the error variance has been standardized. A user who wants to be more cautious could use 2 SEMs, which allows a range of plus or minus 8 points. In the example, the IQ would then fall between 92 and 108 about 95% of the time.
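The 68% and 95% figures come from the normal curve: the share of a normal distribution lying within ±k standard deviations of its center is erf(k / √2). This sketch assumes normally distributed error, as the model does; the function name is illustrative.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within +/- k SDs (or SEMs) of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # prints 68.3%, 95.4%, 99.7% for k = 1, 2, 3
    print(k, f"{share_within(k):.1%}")
```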
It is important to keep in mind that tests are neither reliable nor unreliable, because reliability is a property of scores, not of tests. Thus, it is incorrect to call a test reliable or unreliable; we can speak only about the degree of reliability of the scores.
There are other theories about testing and reliability.
For the concept of how accurately a test identifies a criterion, see the discussion of validity.
Sutton, G.W. (2020, April 21). Measurement error standard error of measurement. Assessment, Statistics, & Research. https://statistics.suttong.com/2020/04/measurement-error-standard-error-of.html