
Understanding the Reliability of Educational and Psychological Tests


Why aren't tests reliable?

Tests are not reliable because reliability is a property of the scores and their interpretation, not of the tests themselves.
 

This isn't a matter of semantics.

Think about it this way.

Give all the students in one school an achievement test. The test items don't change, so they appear stable, consistent, and reliable. However, when publishers report reliability values, they calculate the reliability statistics from scores, and scores vary from one administration to another. If you ever took a test twice and got a different score, you know what I mean. Individuals change from day to day, and we change from year to year. Even a nationally representative sample of students can differ from one year to the next.


Every time we calculate a reliability statistic, the statistic is slightly different.

Reliability values vary with the sample.

Reliability values vary with the method of calculation.

For example, you can get high reliability values using coefficient alpha with scores from a one-time administration. This method is common in research articles. But even within a single article, the same research team will often report different alpha values for different samples.
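Here is a minimal sketch of how coefficient alpha can be computed from one administration, and how the value shifts from sample to sample. The two small data sets (`sample_a` and `sample_b`) are made-up illustrations, not data from any real test, and the function name is my own.

```python
from statistics import variance

def coefficient_alpha(item_scores):
    """Coefficient alpha from a table of scores.

    item_scores: one row per test taker, one column per item.
    """
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # scores grouped by item
    item_variances = [variance(col) for col in columns]
    total_scores = [sum(row) for row in item_scores]
    total_variance = variance(total_scores)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical right/wrong (1/0) scores on the same 4-item quiz
# from two different groups of five students.
sample_a = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
sample_b = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

alpha_a = coefficient_alpha(sample_a)
alpha_b = coefficient_alpha(sample_b)
print(round(alpha_a, 2), round(alpha_b, 2))
```

The items are identical in both runs, yet the two alpha values differ because the scores differ.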


If we use a split-half method, which usually calculates reliability from the correlation between two halves of one test, then we can get a reliability value based on one administration. But that's only half a test! Researchers use the Spearman-Brown formula to correct for the shortened half-test problem, but the result is just an estimate of what the reliability of the full test could be.
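The split-half idea can be sketched as follows. I am assuming an odd-even split of the items (other splits are possible and will give other values), and the scores are made-up numbers for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)

def spearman_brown(half_r):
    """Estimate full-test reliability from a half-test correlation."""
    return 2 * half_r / (1 + half_r)

def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown estimate)."""
    # Odd-even split: items 1, 3, 5... versus items 2, 4, 6...
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    half_r = pearson(odd_half, even_half)
    return half_r, spearman_brown(half_r)

# Hypothetical right/wrong (1/0) scores: five students, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

half_r, full_estimate = split_half_reliability(scores)
print(round(half_r, 2), round(full_estimate, 2))
```

The corrected value is higher than the half-test correlation, but it remains an estimate: it rests on the assumption that the two halves are equivalent.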


There's also a test-retest reliability method. Give a test one time, wait a while (maybe a week or several weeks), then retest. That gives you an estimate of stability. But on some tests, such as intelligence and achievement tests, people who remember the items can score higher the second time.


By now you get the point. Any one test can be associated with many reliability values. The reliability problem is not just about tests. The problem is often a failure to understand that a test does not have one reliability value. As with many things in science, there are many variables to consider when answering a question.

Reputable test publishers include reliability values in their test manuals. Teachers, counselors, psychologists, and other test users ought to know about test score reliability.

Learn more about assessment and statistical concepts in Applied Statistics: Concepts for Counselors, available from Amazon Books.








Quick Notes on Test Reliability

Reliability is a property of scores, not tests.

Reliability may mean stability of scores over time.

Reliability may mean how consistently test questions measure whatever the test measures.

Test scores that are reliable in one culture will not necessarily be reliable in another culture.

Reliable test scores do not guarantee that the scores are valid, but reliability places a limit on validity.

Reliability concepts apply to tests, quizzes, polls, and surveys: any set of questions yielding numerical scores.
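The note that reliability places a limit on validity can be written as a formula. In classical test theory, the correlation between test scores and a criterion cannot exceed the square root of the product of their reliabilities. The symbols are my own labels: $r_{xy}$ is the validity coefficient, $r_{xx}$ is the reliability of the test scores, and $r_{yy}$ is the reliability of the criterion scores.

```latex
r_{xy} \le \sqrt{r_{xx}\, r_{yy}}
```

As a made-up example, if score reliability is .81 and the criterion is measured perfectly ($r_{yy} = 1$), the validity coefficient can be no higher than $\sqrt{.81} = .90$.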



Note: This is a re-posting of a post to this new blog. 

Links to Connections

My Page: www.suttong.com

My Books: Amazon and Google Store

Follow Geoff W. Sutton on Facebook, Twitter (@Geoff.W.Sutton), and Pinterest (www.pinterest.com/GeoffWSutton)

Articles: Academia (Geoff W Sutton) and ResearchGate (Geoffrey W Sutton)
