
Measurement Error Standard Error of Measurement

In testing, measurement error usually refers to the fact that the same people can obtain different scores on the same test at different times. In a broad sense, measurement error can also refer to how accurately a test identifies a condition, which is discussed under test validity.

Recall that test score reliability is a necessary but insufficient condition for test score validity.

Many tests in psychology, medicine, and education are useful. The reliability of the scores varies with the properties of the test itself, how well the user follows standard procedures in administering the test, environmental factors that can affect the scores, and factors within the person taking the test.

The scores on many tests conform to the pattern called the normal curve or bell curve. In classical test theory, the scores people obtain on tests are simply called obtained scores (symbol X). Statisticians consider the variation in scores to estimate a "true score." Variations of obtained scores around the theoretical true score (symbol T) indicate error because a reliable test ought to yield the same score every time it is used. The deviations of those obtained scores are referred to as error (symbol E). In a formula, X = T + E.

Theoretically, the reliability of test scores is the ratio of the variance of the true scores to the variance of the obtained scores. A perfectly reliable test would yield a reliability value of rxx = 1.0. In reality, most of the better tests yield average reliability values above .90. Test publishers are obligated by professional ethics to include reliability values in their test manuals.
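The variance ratio can be illustrated with a small simulation of X = T + E. This is only a sketch: the mean of 100, the true-score SD of 15, the error SD of 4, and the function name simulate_reliability are assumptions chosen for illustration, not values from any published test.

```python
import random
import statistics


def simulate_reliability(n_people=10000, true_sd=15.0, error_sd=4.0, seed=0):
    """Simulate X = T + E and return var(T) / var(X).

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # True scores (T): what each person would score without any error.
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_people)]
    # Obtained scores (X): true score plus random error (E).
    obtained_scores = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(obtained_scores)


reliability = simulate_reliability()
print(round(reliability, 2))
```

With these assumed values the theoretical ratio is 225 / (225 + 16), or about .93, and the simulated value should land close to that. Setting error_sd to 0 gives a ratio of exactly 1.0, the perfectly reliable case.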

Studies of score patterns allow statisticians to calculate the average variability of score error. Thus, for any given published test, there ought to be a statistic known as the Standard Error of Measurement, which is abbreviated as SEM.
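The post does not give a formula for the SEM. In classical test theory it is commonly computed from the standard deviation of the obtained scores and the reliability coefficient as SEM = SD × sqrt(1 − rxx). The sketch below applies that formula. The SD of 15 and the reliability of .93 are assumed values for illustration only, and the function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Return SEM = SD * sqrt(1 - rxx), the usual classical test theory formula."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Assumed example values: SD = 15 and rxx = .93 (not taken from a test manual).
sem = standard_error_of_measurement(15.0, 0.93)
print(round(sem, 2))  # 3.97
```

Note that a higher reliability shrinks the SEM, and a reliability of 1.0 gives an SEM of 0.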

Once the SEM for a test has been established from large-scale studies, users can use that value to estimate how a test taker's scores might vary if the test taker were to take the same test again under similar conditions. The estimates are based on the properties of the normal curve. Thus, the test must yield scores that conform to the normal score pattern to use a SEM based on this model.

For example, suppose a student obtains an IQ score of 100 and one SEM = 4. On future administrations of the same test, the student would likely score between 96 and 104 about 68% of the time.

The process of forming a range of values around the obtained score should remind users and test takers that scores are not fixed properties. Scores vary and they tend to vary in a "standard" pattern. In this theory, the error variance has been standardized. Clearly, a user who wanted to be careful could use 2 SEMs, which would then allow a range of plus and minus 8 points and cover about 95% of the expected scores. In the example, the IQ could range between 92 and 108.
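The ranges in the IQ example can be reproduced with a few lines of Python. This is a minimal sketch that assumes normally distributed error, as the model above requires. The function names score_band and coverage are hypothetical.

```python
from statistics import NormalDist


def score_band(obtained_score, sem, n_sems=1):
    """Return the (low, high) range of n_sems SEMs around an obtained score."""
    margin = n_sems * sem
    return (obtained_score - margin, obtained_score + margin)


def coverage(n_sems):
    """Share of a normal distribution within +/- n_sems standard errors."""
    z = NormalDist()  # standard normal curve: mean 0, SD 1
    return z.cdf(n_sems) - z.cdf(-n_sems)


print(score_band(100, 4))      # (96, 104)
print(score_band(100, 4, 2))   # (92, 108)
print(round(coverage(1), 2))   # 0.68
print(round(coverage(2), 2))   # 0.95
```

A band of 1 SEM covers about 68% of the normal curve and a band of 2 SEMs covers about 95%, which is why the wider band is the more cautious choice.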

It is important to keep in mind that tests are neither reliable nor unreliable because reliability is a property of scores, not tests. Thus, it is incorrect to refer to a test as reliable or unreliable. We can speak only about the degree of reliability of the scores.

There are other theories about testing and reliability.

For the concept of how accurately a test identifies a criterion, see the discussion of validity.

Cite this Blog Post

Sutton, G.W. (2020, April 21). Measurement error standard error of measurement. Assessment, Statistics, & Research. /2020/04/measurement-error-standard-error-of.html

