
Neag School of Education

Educational Research Basics by Del Siegle

Instrument Reliability

Reliability (visit the concept map that shows the various types of reliability)

A test is reliable to the extent that whatever it measures, it measures it consistently. If I were to stand on a scale and the scale read 15 pounds, I might wonder. Suppose I were to step off the scale and stand on it again, and again it read 15 pounds. The scale is producing consistent results. From a research point of view, the scale seems to be reliable because whatever it is measuring, it is measuring it consistently. Whether those consistent results are valid is another question.  However, an instrument cannot be valid if it is not reliable.

There are three major categories of reliability for most instruments: test-retest, equivalent form, and internal consistency. Each measures consistency a bit differently and a given instrument need not meet the requirements of each. Test-retest measures consistency from one time to the next. Equivalent-form measures consistency between two versions of an instrument. Internal-consistency measures consistency within the instrument (consistency among the questions). A fourth category (scorer agreement)  is often used with performance and product assessments. Scorer agreement is consistency of rating a performance or product among different judges who are rating the performance or product. Generally speaking, the longer a test is, the more reliable it tends to be (up to a point). For research purposes, a minimum reliability of .70 is required for attitude instruments. Some researchers feel that it should be higher. A reliability of .70 indicates 70% consistency in the scores that are produced by the instrument. Many tests, such as achievement tests, strive for .90 or higher reliabilities.

Relationship of Test Forms and Testing Sessions Required for Reliability Procedures

Reliability Procedure         | Test Forms Required | Testing Sessions Required
Split-Half                    | 1                   | 1
Kuder-Richardson              | 1                   | 1
Cronbach’s Alpha              | 1                   | 1
Equivalent (Alternative)-Form | 2                   | 1
Test-Retest                   | 1                   | 2

Test-Retest Method (stability: measures error because of changes over time) The same instrument is given twice to the same group of people. The reliability is the correlation between the scores on the two instruments.  If the results are consistent over time, the scores should be similar. The trick with test-retest reliability is determining how long to wait between the two administrations. One should wait long enough so the subjects don’t remember how they responded the first time they completed the instrument, but not so long that their knowledge of the material being measured has changed. This may be a couple weeks to a couple months.

If one were investigating the reliability of a test measuring mathematics skills, it would not be wise to wait two months. The subjects probably would have gained additional mathematics skills during the two months and thus would have scored differently the second time they completed the test. We would not want their knowledge to have changed between the first and second testing.
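To make the calculation concrete, here is a minimal sketch (in Python, with made-up scores) of estimating test-retest reliability as the correlation between two administrations of the same instrument; the subjects, scores, and interval are purely illustrative.

    # Minimal sketch: test-retest reliability as the Pearson correlation between
    # two administrations of the same instrument (illustrative scores only).
    import numpy as np

    time1 = np.array([12, 15, 9, 20, 17, 14, 11, 18])   # scores at the first administration
    time2 = np.array([13, 14, 10, 19, 18, 13, 12, 17])  # scores a few weeks later

    r = np.corrcoef(time1, time2)[0, 1]  # Pearson r is the test-retest reliability estimate
    print(f"Test-retest reliability: {r:.2f}")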

Equivalent-Form (Parallel or Alternate-Form) Method (measures error because of differences in test forms) Two different versions of the instrument are created. We assume both measure the same thing. The same subjects complete both instruments during the same time period. The scores on the two instruments are correlated to calculate the consistency between the two forms of the instrument.

Internal-Consistency Method (measures error because of idiosyncrasies of the test items) Several internal-consistency methods exist. They have one thing in common. The subjects complete one instrument one time. For this reason, this is the easiest form of reliability to investigate. This method measures consistency within the instrument three different ways.

– Split-Half A total score for the odd-numbered questions is correlated with a total score for the even-numbered questions (although it might be the first half with the second half). This is often used with dichotomous variables that are scored 0 for incorrect and 1 for correct. The Spearman-Brown prophecy formula is applied to the correlation to determine the reliability.
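As an illustration of that procedure, the sketch below (Python, with hypothetical 0/1 item scores) correlates odd- and even-item totals and then applies the Spearman-Brown prophecy formula, r_full = 2r / (1 + r).

    # Minimal sketch: split-half reliability with the Spearman-Brown correction.
    # Items are scored 0/1; rows are subjects, columns are items (illustrative data).
    import numpy as np

    scores = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 1, 0, 1, 1],
    ])

    odd_total = scores[:, 0::2].sum(axis=1)   # total score on odd-numbered items
    even_total = scores[:, 1::2].sum(axis=1)  # total score on even-numbered items

    r_half = np.corrcoef(odd_total, even_total)[0, 1]
    r_full = (2 * r_half) / (1 + r_half)      # Spearman-Brown prophecy formula
    print(f"Half-test correlation: {r_half:.2f}, corrected reliability: {r_full:.2f}")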

– Kuder-Richardson Formula 20 (K-R 20) and  Kuder-Richardson Formula 21 (K-R 21) These are alternative formulas for calculating how consistent subject responses are among the questions on an instrument. Items on the instrument must be dichotomously scored (0 for incorrect and 1 for correct). All items are compared with each other, rather than half of the items with the other half of the items. It can be shown mathematically that the Kuder-Richardson reliability coefficient is actually the mean of all split-half coefficients (provided the Rulon formula is used) resulting from different splittings of a test. K-R 21 assumes that all of the questions are equally difficult. K-R 20 does not assume that. The formula for K-R 21 can be found on page 179.
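A minimal sketch of the two formulas, using hypothetical dichotomous data, is shown below; note that published texts vary slightly in whether sample or population variance is used for the total scores, and the population variance is used here.

    # Minimal sketch of KR-20 and KR-21 for dichotomously scored (0/1) items.
    # Rows are subjects, columns are items (illustrative data only).
    import numpy as np

    scores = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 1, 0, 1, 1],
    ])

    k = scores.shape[1]                  # number of items
    totals = scores.sum(axis=1)          # each subject's total score
    var_total = totals.var(ddof=0)       # variance of the total scores
    p = scores.mean(axis=0)              # proportion answering each item correctly
    q = 1 - p

    # KR-20: compares every item with every other item via the item variances p*q.
    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)

    # KR-21: simpler formula that assumes all items are equally difficult.
    mean_total = totals.mean()
    kr21 = (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))

    print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")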

– Cronbach’s Alpha When the items on an instrument are not scored right versus wrong, Cronbach’s alpha is often used to measure the internal consistency. This is often the case with attitude instruments that use the Likert scale. A computer program such as SPSS is often used to calculate Cronbach’s alpha. Although Cronbach’s alpha is usually used for scores which fall along a continuum, it will produce the same results as KR-20 with dichotomous data (0 or 1).
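If you prefer to compute it directly rather than with a package such as SPSS, a minimal sketch of the standard formula, alpha = (k/(k-1)) * (1 - sum of item variances / total-score variance), is shown below with made-up Likert ratings.

    # Minimal sketch of Cronbach's alpha for Likert-type items.
    # Rows are respondents, columns are items (illustrative data only).
    import numpy as np

    ratings = np.array([
        [4, 5, 4, 3, 4],
        [2, 2, 3, 2, 2],
        [5, 5, 4, 5, 5],
        [3, 4, 3, 3, 4],
        [1, 2, 2, 1, 2],
    ])

    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=0).sum()        # sum of the item variances
    total_var = ratings.sum(axis=1).var(ddof=0)          # variance of the total scores

    alpha = (k / (k - 1)) * (1 - item_vars / total_var)  # Cronbach's alpha
    print(f"Cronbach's alpha = {alpha:.2f}")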

I have created an Excel spreadsheet that will calculate Spearman-Brown, KR-20, KR-21, and Cronbach’s alpha. The spreadsheet will handle data for a maximum of 1,000 subjects with a maximum of 100 responses each.

Scoring Agreement (measures error because of the scorer) Performance and product assessments are often based on scores by individuals who are trained to evaluate the performance or product. The consistency between ratings can be calculated in a variety of ways.

– Interrater Reliability Two judges can evaluate a group of student products and the correlation between their ratings can be calculated (r=.90 is a common cutoff).

– Percentage Agreement Two judges can evaluate a group of products and a percentage for the number of times they agree is calculated (80% is a common cutoff).
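The sketch below illustrates both scorer-agreement indices just described, using hypothetical ratings from two judges on a 1-4 rubric.

    # Minimal sketch of two scorer-agreement indices: the correlation between two
    # judges' ratings and their simple percentage agreement (illustrative ratings).
    import numpy as np

    judge_a = np.array([4, 3, 2, 4, 3, 1, 2, 4, 3, 2])
    judge_b = np.array([4, 3, 2, 3, 3, 1, 2, 4, 2, 2])

    interrater_r = np.corrcoef(judge_a, judge_b)[0, 1]   # correlation between ratings
    pct_agreement = np.mean(judge_a == judge_b) * 100    # % of identical ratings

    print(f"Interrater r = {interrater_r:.2f}, agreement = {pct_agreement:.0f}%")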

———

All scores contain error. The error is what lowers an instrument’s reliability. Obtained Score = True Score + Error Score
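The identity above can be illustrated with a small simulation: if observed scores are generated as hypothetical true scores plus random error, reliability can be viewed as the proportion of obtained-score variance that is true-score variance.

    # Minimal sketch of the classical test theory identity Obtained = True + Error,
    # using simulated (hypothetical) true scores and random measurement error.
    import numpy as np

    rng = np.random.default_rng(0)
    true_scores = rng.normal(50, 10, size=1000)   # hypothetical true scores
    error = rng.normal(0, 5, size=1000)           # random measurement error
    obtained = true_scores + error                # Obtained Score = True Score + Error Score

    reliability = true_scores.var() / obtained.var()      # true variance / obtained variance
    print(f"Approximate reliability: {reliability:.2f}")  # about 100 / (100 + 25) = 0.80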

———-

There could be a number of reasons why the reliability estimate for a measure is low. Four common sources of inconsistencies of test scores are listed below:

  • Test Taker — perhaps the subject is having a bad day.
  • Test Itself — the questions on the instrument may be unclear.
  • Testing Conditions — there may be distractions during the testing that distract the subject.
  • Test Scoring — scorers may be applying different standards when evaluating the subjects’ responses.

Del Siegle, Ph.D. Neag School of Education – University of Connecticut [email protected] www.delsiegle.info

Created 9/24/2002 Edited 10/17/2013


The 4 Types of Reliability in Research | Definitions & Examples

Published on August 8, 2019 by Fiona Middleton . Revised on June 22, 2023.

Reliability tells you how consistently a method measures something. When you apply the same method to the same sample under the same conditions, you should get the same results. If not, the method of measurement may be unreliable or bias may have crept into your research.

There are four main types of reliability. Each can be estimated by comparing different sets of results produced by the same method.

Type of reliability  | Measures the consistency of…
Test-retest          | The same test over time.
Interrater           | The same test conducted by different people.
Parallel forms       | Different versions of a test which are designed to be equivalent.
Internal consistency | The individual items of a test.

Table of contents

  • Test-retest reliability
  • Interrater reliability
  • Parallel forms reliability
  • Internal consistency
  • Which type of reliability applies to my research?
  • Frequently asked questions about types of reliability

Test-retest reliability measures the consistency of results when you repeat the same test on the same sample at a different point in time. You use it when you are measuring something that you expect to stay constant in your sample.

Why it’s important

Many factors can influence your results at different points in time: for example, respondents might experience different moods, or external conditions might affect their ability to respond accurately.

Test-retest reliability can be used to assess how well a method resists these factors over time. The smaller the difference between the two sets of results, the higher the test-retest reliability.

How to measure it

To measure test-retest reliability, you conduct the same test on the same group of people at two different points in time. Then you calculate the correlation between the two sets of results.

Test-retest reliability example

You devise a questionnaire to measure the IQ of a group of participants (a property that is unlikely to change significantly over time). You administer the test two months apart to the same group of people, but the results are significantly different, so the test-retest reliability of the IQ questionnaire is low.

Improving test-retest reliability

  • When designing tests or questionnaires , try to formulate questions, statements, and tasks in a way that won’t be influenced by the mood or concentration of participants.
  • When planning your methods of data collection , try to minimize the influence of external factors, and make sure all samples are tested under the same conditions.
  • Remember that changes or recall bias can be expected to occur in the participants over time, and take these into account.


Interrater reliability (also called interobserver reliability) measures the degree of agreement between different people observing or assessing the same thing. You use it when data is collected by researchers assigning ratings, scores or categories to one or more variables , and it can help mitigate observer bias .

People are subjective, so different observers’ perceptions of situations and phenomena naturally differ. Reliable research aims to minimize subjectivity as much as possible so that a different researcher could replicate the same results.

When designing the scale and criteria for data collection, it’s important to make sure that different people will rate the same variable consistently with minimal bias . This is especially important when there are multiple researchers involved in data collection or analysis.

To measure interrater reliability, different researchers conduct the same measurement or observation on the same sample. Then you calculate the correlation between their different sets of results. If all the researchers give similar ratings, the test has high interrater reliability.

Interrater reliability example

A team of researchers observe the progress of wound healing in patients. To record the stages of healing, rating scales are used, with a set of criteria to assess various aspects of wounds. The results of different researchers assessing the same set of patients are compared, and there is a strong correlation between all sets of results, so the test has high interrater reliability.

Improving interrater reliability

  • Clearly define your variables and the methods that will be used to measure them.
  • Develop detailed, objective criteria for how the variables will be rated, counted or categorized.
  • If multiple researchers are involved, ensure that they all have exactly the same information and training.

Parallel forms reliability measures the correlation between two equivalent versions of a test. You use it when you have two different assessment tools or sets of questions designed to measure the same thing.

If you want to use multiple different versions of a test (for example, to avoid respondents repeating the same answers from memory), you first need to make sure that all the sets of questions or measurements give reliable results.

The most common way to measure parallel forms reliability is to produce a large set of questions to evaluate the same thing, then divide these randomly into two question sets.

The same group of respondents answers both sets, and you calculate the correlation between the results. High correlation between the two indicates high parallel forms reliability.

Parallel forms reliability example

A set of questions is formulated to measure financial risk aversion in a group of respondents. The questions are randomly divided into two sets, and the respondents are randomly divided into two groups. Both groups take both tests: group A takes test A first, and group B takes test B first. The results of the two tests are compared, and the results are almost identical, indicating high parallel forms reliability.

Improving parallel forms reliability

  • Ensure that all questions or test items are based on the same theory and formulated to measure the same thing.

Internal consistency assesses the correlation between multiple items in a test that are intended to measure the same construct.

You can calculate internal consistency without repeating the test or involving other researchers, so it’s a good way of assessing reliability when you only have one data set.

When you devise a set of questions or ratings that will be combined into an overall score, you have to make sure that all of the items really do reflect the same thing. If responses to different items contradict one another, the test might be unreliable.

Two common methods are used to measure internal consistency.

  • Average inter-item correlation : For a set of measures designed to assess the same construct, you calculate the correlation between the results of all possible pairs of items and then calculate the average.
  • Split-half reliability : You randomly split a set of measures into two sets. After testing the entire set on the respondents, you calculate the correlation between the two sets of responses.
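A minimal sketch of both methods, using hypothetical 1-5 ratings, is shown below; the items and the random split are purely illustrative.

    # Minimal sketch of two internal-consistency checks: average inter-item
    # correlation and split-half reliability (rows = respondents, columns = items).
    import numpy as np

    ratings = np.array([
        [5, 4, 5, 4, 5, 4],
        [2, 3, 2, 2, 3, 2],
        [4, 4, 5, 4, 4, 5],
        [1, 2, 1, 2, 2, 1],
        [3, 3, 4, 3, 3, 3],
    ])

    # Average inter-item correlation: mean of the correlations of all item pairs.
    corr = np.corrcoef(ratings, rowvar=False)
    pair_corrs = corr[np.triu_indices_from(corr, k=1)]
    avg_inter_item = pair_corrs.mean()

    # Split-half reliability: correlate total scores on two random halves of the items.
    rng = np.random.default_rng(1)
    cols = rng.permutation(ratings.shape[1])
    half_a = ratings[:, cols[:3]].sum(axis=1)
    half_b = ratings[:, cols[3:]].sum(axis=1)
    split_half_r = np.corrcoef(half_a, half_b)[0, 1]

    print(f"Average inter-item r = {avg_inter_item:.2f}, split-half r = {split_half_r:.2f}")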

Internal consistency example

A group of respondents are presented with a set of statements designed to measure optimistic and pessimistic mindsets. They must rate their agreement with each statement on a scale from 1 to 5. If the test is internally consistent, an optimistic respondent should generally give high ratings to optimism indicators and low ratings to pessimism indicators. The correlation is calculated between all the responses to the “optimistic” statements, but the correlation is very weak. This suggests that the test has low internal consistency.

Improving internal consistency

  • Take care when devising questions or measures: those intended to reflect the same concept should be based on the same theory and carefully formulated.


It’s important to consider reliability when planning your research design , collecting and analyzing your data, and writing up your research. The type of reliability you should calculate depends on the type of research  and your  methodology .

What is my methodology?                                                                 | Which form of reliability is relevant?
Measuring a property that you expect to stay the same over time.                        | Test-retest
Multiple researchers making observations or ratings about the same topic.               | Interrater
Using two different tests to measure the same thing.                                    | Parallel forms
Using a multi-item test where all the items are intended to measure the same variable.  | Internal consistency

If possible and relevant, you should statistically calculate reliability and state this alongside your results .


Reliability and validity are both about how well a method measures something:

  • Reliability refers to the  consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity   refers to the  accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

You can use several tactics to minimize observer bias .

  • Use masking (blinding) to hide the purpose of your study from all observers.
  • Triangulate your data with different data collection methods or sources.
  • Use multiple observers and ensure interrater reliability.
  • Train your observers to make sure data is consistently recorded between them.
  • Standardize your observation procedures to make sure they are structured and clear.

Reproducibility and replicability are related terms.

  • A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
  • A successful replication shows that the reliability of the results is high.

Research bias affects the validity and reliability of your research findings , leading to false conclusions and a misinterpretation of the truth. This can have serious implications in areas like medical research where, for example, a new form of treatment may be evaluated.

Cite this Scribbr article


Middleton, F. (2023, June 22). The 4 Types of Reliability in Research | Definitions & Examples. Scribbr. Retrieved June 24, 2024, from https://www.scribbr.com/methodology/types-of-reliability/


How to Determine the Validity and Reliability of an Instrument By: Yue Li

Validity and reliability are two important factors to consider when developing and testing any instrument (e.g., content assessment test, questionnaire) for use in a study. Attention to these considerations helps to ensure the quality of your measurement and of the data collected for your study.

Understanding and Testing Validity

Validity refers to the degree to which an instrument accurately measures what it intends to measure. Three common types of validity for researchers and evaluators to consider are content, construct, and criterion validities.

  • Content validity indicates the extent to which items adequately measure or represent the content of the property or trait that the researcher wishes to measure. Subject matter expert review is often a good first step in instrument development to assess content validity, in relation to the area or field you are studying.
  • Construct validity indicates the extent to which a measurement method accurately represents a construct (e.g., a latent variable or phenomena that can’t be measured directly, such as a person’s attitude or belief) and produces an observation, distinct from that which is produced by a measure of another construct. Common methods to assess construct validity include, but are not limited to, factor analysis, correlation tests, and item response theory models (including Rasch model).
  • Criterion-related validity indicates the extent to which the instrument’s scores correlate with an external criterion (i.e., usually another measurement from a different instrument) either at present ( concurrent validity ) or in the future ( predictive validity ). A common measurement of this type of validity is the correlation coefficient between two measures.

Oftentimes, when developing, modifying, and interpreting the validity of a given instrument, rather than view or test each type of validity individually, researchers and evaluators test for evidence of several different forms of validity, collectively (e.g., see Samuel Messick’s work regarding validity).

Understanding and Testing Reliability

Reliability refers to the degree to which an instrument yields consistent results. Common measures of reliability include internal consistency, test-retest, and inter-rater reliabilities.

  • Internal consistency reliability looks at the consistency of the score of individual items on an instrument, with the scores of a set of items, or subscale, which typically consists of several items to measure a single construct. Cronbach’s alpha is one of the most common methods for checking internal consistency reliability. Group variability, score reliability, number of items, sample sizes, and difficulty level of the instrument also can impact the Cronbach’s alpha value.
  • Test-retest measures the correlation between scores from one administration of an instrument to another, usually within an interval of 2 to 3 weeks. Unlike pre-post tests, no treatment occurs between the first and second administrations of the instrument when assessing test-retest reliability. A similar type of reliability, called alternate forms, involves using slightly different forms or versions of an instrument to see if different versions yield consistent results.
  • Inter-rater reliability checks the degree of agreement among raters (i.e., those completing items on an instrument). Common situations where more than one rater is involved may occur when more than one person conducts classroom observations, uses an observation protocol or scores an open-ended test, using a rubric or other standard protocol. Kappa statistics, correlation coefficients, and intra-class correlation (ICC) coefficient are some of the commonly reported measures of inter-rater reliability.
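As a small illustration of one of these statistics, the sketch below computes Cohen's kappa for two hypothetical raters assigning the same categorical codes; in practice you would typically rely on an established statistics package rather than hand-rolling this.

    # Minimal sketch of Cohen's kappa for two raters assigning the same set of
    # categorical codes (illustrative labels only).
    from collections import Counter

    rater1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
    rater2 = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # raw agreement

    # Chance agreement expected from each rater's marginal proportions.
    c1, c2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    p_expected = sum((c1[c] / n) * (c2[c] / n) for c in categories)

    kappa = (p_observed - p_expected) / (1 - p_expected)
    print(f"Observed agreement = {p_observed:.2f}, Cohen's kappa = {kappa:.2f}")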

Developing a valid and reliable instrument usually requires multiple iterations of piloting and testing, which can be resource intensive. Therefore, when available, I suggest using already established valid and reliable instruments, such as those published in peer-reviewed journal articles. However, even when using these instruments, you should re-check validity and reliability, using the methods of your study and your own participants’ data, before running additional statistical analyses. This process will confirm that the instrument performs as intended in your study with the population you are studying, even when they appear identical to the purpose and population for which the instrument was initially developed. Below are a few additional, useful readings to further inform your understanding of validity and reliability.

Resources for Understanding and Testing Reliability

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985).  Standards for educational and psychological testing . Washington, DC: Authors.
  • Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences . Mahwah, NJ: Lawrence Erlbaum.
  • Cronbach, L. (1990).  Essentials of psychological testing .   New York, NY: Harper & Row.
  • Carmines, E., & Zeller, R. (1979).  Reliability and Validity Assessment . Beverly Hills, CA: Sage Publications.
  • Messick, S. (1987). Validity . ETS Research Report Series, 1987: i–208. doi: 10.1002/j.2330-8516.1987.tb00244.x
  • Liu, X. (2010). Using and developing measurement instruments in science education: A Rasch modeling approach . Charlotte, NC: Information Age.

Grad Coach

Validity & Reliability In Research

A Plain-Language Explanation (With Examples)

By: Derek Jansen (MBA) | Expert Reviewer: Kerryn Warren (PhD) | September 2023

Validity and reliability are two related but distinctly different concepts within research. Understanding what they are and how to achieve them is critically important to any research project. In this post, we’ll unpack these two concepts as simply as possible.

This post is based on our popular online course, Research Methodology Bootcamp. In the course, we unpack the basics of methodology using straightforward language and loads of examples.

Overview: Validity & Reliability

  • The big picture
  • Validity 101
  • Reliability 101 
  • Key takeaways

First, The Basics…

First, let’s start with a big-picture view and then we can zoom in to the finer details.

Validity and reliability are two incredibly important concepts in research, especially within the social sciences. Both validity and reliability have to do with the measurement of variables and/or constructs – for example, job satisfaction, intelligence, productivity, etc. When undertaking research, you’ll often want to measure these types of constructs and variables and, at the simplest level, validity and reliability are about ensuring the quality and accuracy of those measurements .

As you can probably imagine, if your measurements aren’t accurate or there are quality issues at play when you’re collecting your data, your entire study will be at risk. Therefore, validity and reliability are very important concepts to understand (and to get right). So, let’s unpack each of them.


What Is Validity?

In simple terms, validity (also called “construct validity”) is all about whether a research instrument accurately measures what it’s supposed to measure .

For example, let’s say you have a set of Likert scales that are supposed to quantify someone’s level of overall job satisfaction. If this set of scales focused purely on only one dimension of job satisfaction, say pay satisfaction, this would not be a valid measurement, as it only captures one aspect of the multidimensional construct. In other words, pay satisfaction alone is only one contributing factor toward overall job satisfaction, and therefore it’s not a valid way to measure someone’s job satisfaction.


Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or statement can differ from how the study participants interpret it . Given that respondents don’t have the opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to crop up. Naturally, if the respondents are interpreting the question in the wrong way, the data they provide will be pretty useless . Therefore, ensuring that a study’s measurement instruments are valid – in other words, that they are measuring what they intend to measure – is incredibly important.

There are various types of validity and we’re not going to go down that rabbit hole in this post, but it’s worth quickly highlighting the importance of making sure that your research instrument is tightly aligned with the theoretical construct you’re trying to measure .  In other words, you need to pay careful attention to how the key theories within your study define the thing you’re trying to measure – and then make sure that your survey presents it in the same way.

For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define what you mean by job satisfaction within your study (and this definition would of course need to be underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in the types of questions or scales you’re using in your survey . Simply put, you need to make sure that your survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that your measurement instrument is capturing the necessary information that reflects your definition of the construct at hand.

If all of this talk about constructs sounds a bit fluffy, be sure to check out Research Methodology Bootcamp, which will provide you with a rock-solid foundational understanding of all things methodology-related.


What Is Reliability?

As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability . In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon , under the same conditions .

As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements . And that, of course, is what you want when undertaking empirical research. If you think about it within a more domestic context, just imagine if you found that your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure the variable that is your body weight 🙂

It’s worth mentioning that reliability also extends to the person using the measurement instrument . For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in terms of how one (or both) of them are using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha , which is a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct . In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept . 

Reliability reflects whether an instrument produces consistent results when applied to the same phenomenon, under the same conditions.

Recap: Key Takeaways

Alright, let’s quickly recap to cement your understanding of validity and reliability:

  • Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure
  • Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-quality, accurate data that help you answer your research questions . So, be sure to always pay careful attention to the validity and reliability of your measurement instruments when collecting and analysing data. As the adage goes, “rubbish in, rubbish out” – make sure that your data inputs are rock-solid.




Reliability and validity: Importance in Medical Research

Affiliations.

  • 1 Al-Nafees Medical College,Isra University, Islamabad, Pakistan.
  • 2 Fauji Foundation Hospital, Foundation University Medical College, Islamabad, Pakistan.
  • PMID: 34974579
  • DOI: 10.47391/JPMA.06-861

Reliability and validity are among the most important and fundamental domains in the assessment of any measuring methodology for data-collection in a good research. Validity is about what an instrument measures and how well it does so, whereas reliability concerns the truthfulness in the data obtained and the degree to which any measuring tool controls random error. The current narrative review was planned to discuss the importance of reliability and validity of data-collection or measurement techniques used in research. It describes and explores comprehensively the reliability and validity of research instruments and also discusses different forms of reliability and validity with concise examples. An attempt has been taken to give a brief literature review regarding the significance of reliability and validity in medical sciences.

Keywords: Validity, Reliability, Medical research, Methodology, Assessment, Research tools.


Reliability and Validity – Definitions, Types & Examples

Published by Alvin Nicolas at August 16th, 2021 , Revised On October 26, 2023

A researcher must test the collected data before making any conclusion. Every  research design  needs to be concerned with reliability and validity to measure the quality of the research.

What is Reliability?

Reliability refers to the consistency of the measurement. Reliability shows how trustworthy the score of the test is. If the collected data shows the same results after being tested using various methods and sample groups, the information is reliable. Reliability on its own does not guarantee that the results are valid, but a method cannot be valid unless it is reliable.

Example: If you weigh yourself on a weighing scale throughout the day, you’ll get the same results. These are considered reliable results obtained through repeated measures.

Example: If a teacher gives students the same math test and repeats it the next week with the same questions, and the students obtain the same scores, then the reliability of the test is high.

What is Validity?

Validity refers to the accuracy of the measurement. Validity shows how a specific test is suitable for a particular situation. If the results are accurate according to the researcher’s situation, explanation, and prediction, then the research is valid. 

If the method of measuring is accurate, then it’ll produce accurate results. A method can be reliable without being valid; however, if a method is not reliable, it is not valid.

Example:  Your weighing scale shows different results each time you weigh yourself within a day even after handling it carefully, and weighing before and after meals. Your weighing machine might be malfunctioning. It means your method had low reliability. Hence you are getting inaccurate or inconsistent results that are not valid.

Example: Suppose a questionnaire is distributed among a group of people to check the quality of a skincare product, and the same questionnaire is repeated with many groups. If you get the same response from the various participants, the reliability of the questionnaire is high.

Most of the time, validity is difficult to measure even though the process of measurement is reliable. It isn’t easy to interpret the real situation.

Example: If the weighing scale shows the same result, let’s say 70 kg each time, even though your actual weight is 55 kg, then the weighing scale is malfunctioning. It is producing consistent results, but those results are not accurate, so the scale is reliable but not valid.

Internal Vs. External Validity

One of the key features of randomised designs is that they have significantly high internal and external validity.

Internal validity  is the ability to draw a causal link between your treatment and the dependent variable of interest. It means the observed changes should be due to the experiment conducted, and any external factor should not influence the  variables .

Example: age, level, height, and grade.

External validity  is the ability to identify and generalise your study outcomes to the population at large. The relationship between the study’s situation and the situations outside the study is considered external validity.

Also, read about Inductive vs Deductive reasoning in this article.


Threats to Internal Validity

  • Confounding factors: Unexpected events during the experiment that are not part of the treatment. Example: you attribute the increased weight of your experiment participants to a lack of physical activity, but it was actually due to the consumption of coffee with sugar.
  • Maturation: The influence on the dependent variable due to the passage of time. Example: during a long-term experiment, subjects may feel tired, bored, and hungry.
  • Testing: The results of one test affect the results of another test. Example: participants of the first experiment may react differently during the second experiment.
  • Instrumentation: Changes in the instrument’s calibration. Example: a change in the calibration of the instrument may give different results instead of the expected results.
  • Statistical regression: Groups selected on the basis of extreme scores are not as extreme on subsequent testing. Example: students who failed the pre-final exam are likely to pass the final exam; they might be more confident and conscious than earlier.
  • Selection bias: Choosing comparison groups without randomisation. Example: a group of trained and efficient teachers is selected to teach children communication skills instead of being selected randomly.
  • Experimental mortality: Participants may leave the experiment because its duration is extended. Example: due to multi-tasking and various competition levels, participants may leave the competition because they are dissatisfied with the time extension, even if they were doing well.

Threats to External Validity

  • Reactive/interactive effects of testing: Participants of the pre-test may become aware of the upcoming experiment, and the treatment may not be effective without the pre-test. Example: students who failed the pre-final exam are likely to pass the final exam; they might be more confident and conscious than earlier.
  • Selection of participants: A group of participants is selected with specific characteristics, and the treatment of the experiment may work only on participants possessing those characteristics. Example: if an experiment is conducted specifically on the health issues of pregnant women, the same treatment cannot be given to male participants.

How to Assess Reliability and Validity?

Reliability can be measured by comparing the consistency of the procedure and its results. There are various methods to measure validity and reliability. Reliability can be measured through various statistical methods depending on the type of reliability, as explained below:

Types of Reliability

  • Test-retest: Measures the consistency of the results at different points in time; it identifies whether the results are the same after repeated measures. Example: suppose a questionnaire is distributed among a group of people to check the quality of a skincare product, and the same questionnaire is repeated with many groups. If you get the same response from the various groups of participants, the test-retest reliability of the questionnaire is high.
  • Inter-rater: Measures the consistency of the results obtained at the same time by different raters (researchers). Example: suppose five researchers measure the academic performance of the same student by incorporating various questions from all the academic subjects and obtain very different results. This shows that the assessment has low inter-rater reliability.
  • Parallel forms: Measures equivalence; it includes different forms of the same test performed on the same participants. Example: suppose the same researcher conducts two different forms of a test on the same topic with the same students, such as a written and an oral test on the same topic. If the results are the same, the parallel-forms reliability of the test is high; otherwise, it is low.
  • Inter-item: Measures the consistency of the measurement; the results of the same test are split into two halves and compared with each other. If there is a lot of difference in the results, the inter-item reliability of the test is low.

Types of Validity

As we discussed above, the reliability of the measurement alone cannot determine its validity. Validity is difficult to measure even if the method is reliable. The following types of tests are conducted to measure validity.

  • Content validity: Shows whether all the aspects of the test/measurement are covered. Example: a language test designed to measure writing, reading, listening, and speaking skills indicates high content validity.
  • Face validity: Concerns the validity of the appearance of a test or its procedure, for example the type of questions included in the question paper, the time and marks allotted, and the number of questions and their categories. Is it a good question paper for measuring the academic performance of students?
  • Construct validity: Shows whether the test is measuring the correct construct (ability, attribute, trait, or skill). Example: is a test conducted to measure communication skills actually measuring communication skills?
  • Criterion validity: Shows whether the test scores obtained are similar to other measures of the same concept. Example: the results obtained from a pre-final exam accurately predict the results of the later final exam, which shows that the test has high criterion validity.


How to Increase Reliability?

  • Use an appropriate questionnaire to measure the competency level.
  • Ensure a consistent environment for participants
  • Make the participants familiar with the criteria of assessment.
  • Train the participants appropriately.
  • Analyse the research items regularly to avoid poor performance.

How to Increase Validity?

Ensuring validity is also not an easy job. Some methods for ensuring validity are given below:

  • Reactivity should be minimised as a first concern.
  • The Hawthorne effect should be reduced.
  • The respondents should be motivated.
  • The intervals between the pre-test and post-test should not be lengthy.
  • Dropout rates should be avoided.
  • The inter-rater reliability should be ensured.
  • Control and experimental groups should be matched with each other.

How to Implement Reliability and Validity in your Thesis?

According to experts, it is helpful to implement the concepts of reliability and validity in your research, and this is especially common in theses and dissertations. The method for implementation is given below:

Segment     | Explanation
Methodology | All the planning about reliability and validity will be discussed here, including the chosen samples and size and the techniques used to measure reliability and validity.
Results     | Discuss the level of reliability and validity of your results and their influence on the values.
Discussion  | Discuss the contribution of other researchers to improving reliability and validity.

Frequently Asked Questions

What is reliability and validity in research?

Reliability in research refers to the consistency and stability of measurements or findings. Validity relates to the accuracy and truthfulness of results, measuring what the study intends to. Both are crucial for trustworthy and credible research outcomes.

What is validity?

Validity in research refers to the extent to which a study accurately measures what it intends to measure. It ensures that the results are truly representative of the phenomena under investigation. Without validity, research findings may be irrelevant, misleading, or incorrect, limiting their applicability and credibility.

What is reliability?

Reliability in research refers to the consistency and stability of measurements over time. If a study is reliable, repeating the experiment or test under the same conditions should produce similar results. Without reliability, findings become unpredictable and lack dependability, potentially undermining the study’s credibility and generalisability.

What is reliability in psychology?

In psychology, reliability refers to the consistency of a measurement tool or test. A reliable psychological assessment produces stable and consistent results across different times, situations, or raters. It ensures that an instrument’s scores are not due to random error, making the findings dependable and reproducible in similar conditions.

What is test retest reliability?

Test-retest reliability assesses the consistency of measurements taken by a test over time. It involves administering the same test to the same participants at two different points in time and comparing the results. A high correlation between the scores indicates that the test produces stable and consistent results over time.

How to improve reliability of an experiment?

  • Standardise procedures and instructions.
  • Use consistent and precise measurement tools.
  • Train observers or raters to reduce subjective judgments.
  • Increase sample size to reduce random errors.
  • Conduct pilot studies to refine methods.
  • Repeat measurements or use multiple methods.
  • Address potential sources of variability.

What is the difference between reliability and validity?

Reliability refers to the consistency and repeatability of measurements, ensuring results are stable over time. Validity indicates how well an instrument measures what it’s intended to measure, ensuring accuracy and relevance. While a test can be reliable without being valid, a valid test must inherently be reliable. Both are essential for credible research.

Are interviews reliable and valid?

Interviews can be both reliable and valid, but they are susceptible to biases. The reliability and validity depend on the design, structure, and execution of the interview. Structured interviews with standardised questions improve reliability. Validity is enhanced when questions accurately capture the intended construct and when interviewer biases are minimised.

Are IQ tests valid and reliable?

IQ tests are generally considered reliable, producing consistent scores over time. Their validity, however, is a subject of debate. While they effectively measure certain cognitive skills, whether they capture the entirety of “intelligence” or predict success in all life areas is contested. Cultural bias and over-reliance on tests are also concerns.

Are questionnaires reliable and valid?

Questionnaires can be both reliable and valid if well-designed. Reliability is achieved when they produce consistent results over time or across similar populations. Validity is ensured when questions accurately measure the intended construct. However, factors like poorly phrased questions, respondent bias, and lack of standardisation can compromise their reliability and validity.


J Grad Med Educ. v.3(2); 2011 Jun

A Primer on the Validity of Assessment Instruments

1. What is reliability? 1

Reliability refers to whether an assessment instrument gives the same results each time it is used in the same setting with the same type of subjects. Reliability essentially means consistent or dependable results. Reliability is a part of the assessment of validity.

2. What is validity? 1

Validity in research refers to how accurately a study answers the study question or the strength of the study conclusions. For outcome measures such as surveys or tests, validity refers to the accuracy of measurement. Here validity refers to how well the assessment tool actually measures the underlying outcome of interest. Validity is not a property of the tool itself, but rather of the interpretation or specific purpose of the assessment tool with particular settings and learners.

Assessment instruments must be both reliable and valid for study results to be credible. Thus, reliability and validity must be examined and reported, or references cited, for each assessment instrument used to measure study outcomes. Examples of assessments include resident feedback survey, course evaluation, written test, clinical simulation observer ratings, needs assessment survey, and teacher evaluation. Using an instrument with high reliability is not sufficient; other measures of validity are needed to establish the credibility of your study.

3. How is reliability measured? 2 – 4

Reliability can be estimated in several ways; the method will depend upon the type of assessment instrument. Sometimes reliability is referred to as internal validity or internal structure of the assessment tool.

For internal consistency, 2 to 3 questions or items are created that measure the same concept, and the difference among the answers is calculated. That is, the correlation among the answers is measured.

Cronbach alpha is a test of internal consistency and is frequently used to calculate the correlation values among the answers on your assessment tool. 5 Cronbach alpha calculates correlation among all the variables, in every combination; a high reliability estimate should be as close to 1 as possible.

For test/retest reliability, the test should give the same results each time, assuming there are no interval changes in what you are measuring; consistency is usually reported as a correlation, such as the Pearson r.

Test/retest is a more conservative estimate of reliability than Cronbach alpha, but it takes at least 2 administrations of the tool, whereas Cronbach alpha can be calculated after a single administration. To perform a test/retest, you must be able to minimize or eliminate any change (ie, learning) in the condition you are measuring, between the 2 measurement times. Administer the assessment instrument at 2 separate times for each subject and calculate the correlation between the 2 different measurements.
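As a small illustration of the correlation step described above, the following sketch computes a Pearson r between two hypothetical administrations of the same instrument; the scores are invented and SciPy is assumed to be available.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for 8 subjects on the same instrument, two weeks apart
time1 = np.array([72, 65, 88, 54, 90, 61, 77, 83])
time2 = np.array([70, 68, 85, 58, 92, 60, 75, 80])

r, p = stats.pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # values near +1 suggest stable scores over time
```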

Interrater reliability is used to study the effect of different raters or observers using the same tool and is generally estimated by percent agreement, kappa (for binary outcomes), or Kendall tau.
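The sketch below shows, for invented pass/fail ratings from two observers, how percent agreement and Cohen's kappa (chance-corrected agreement for a binary outcome) can be computed by hand; the ratings are hypothetical.

```python
import numpy as np

# Hypothetical pass/fail ratings (1 = pass, 0 = fail) by two observers of the same 10 performances
rater_a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
rater_b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 1, 1])

p_o = np.mean(rater_a == rater_b)  # observed (percent) agreement

# Expected agreement by chance, from each rater's marginal proportions
p_e = (np.mean(rater_a == 1) * np.mean(rater_b == 1) +
       np.mean(rater_a == 0) * np.mean(rater_b == 0))

kappa = (p_o - p_e) / (1 - p_e)
print(f"agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```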

Another method uses analysis of variance (ANOVA) to generate a generalizability coefficient, to quantify how much measurement error can be attributed to each potential factor, such as different test items, subjects, raters, dates of administration, and so forth. This model looks at the overall reliability of the results. 6

5. How is the validity of an assessment instrument determined? 4–7, 8

Validity of assessment instruments requires several sources of evidence to build the case that the instrument measures what it is supposed to measure. 9,10 Determining validity can be viewed as constructing an evidence-based argument regarding how well a tool measures what it is supposed to do. Evidence can be assembled to support, or not support, a specific use of the assessment tool. Evidence can be found in content, response process, relationships to other variables, and consequences.

Content includes a description of the steps used to develop the instrument. Provide information such as who created the instrument (national experts would confer greater validity than local experts, who in turn would have more validity than nonexperts) and other steps taken to ensure that the instrument has the appropriate content.

Response process includes information about whether the actions or thoughts of the subjects actually match the test and also information regarding training for the raters/observers, instructions for the test-takers, instructions for scoring, and clarity of these materials.

Relationship to other variables includes correlation of the new assessment instrument results with other performance outcomes that would likely be the same. If there is a previously accepted “gold standard” of measurement, correlate the instrument results to the subject's performance on the “gold standard.” In many cases, no “gold standard” exists and comparison is made to other assessments that appear reasonable (eg, in-training examinations, objective structured clinical examinations, rotation “grades,” similar surveys).

Consequences means that if there are pass/fail or cut-off performance scores, those grouped in each category tend to perform the same in other settings. Also, if lower performers receive additional training and their scores improve, this would add to the validity of the instrument.

Different types of instruments need an emphasis on different sources of validity evidence. 7 For example, for observer ratings of resident performance, interrater agreement may be key, whereas for a survey measuring resident stress, relationship to other variables may be more important. For a multiple choice examination, content and consequences may be essential sources of validity evidence. For high-stakes assessments (eg, board examinations), substantial evidence to support the case for validity will be required. 9

There are also other types of validity evidence, which are not discussed here.

6. How can researchers enhance the validity of their assessment instruments?

First, do a literature search and use previously developed outcome measures. If the instrument must be modified for use with your subjects or setting, modify and describe how, in a transparent way. Include sufficient detail to allow readers to understand the potential limitations of this approach.

If no assessment instruments are available, use content experts to create your own and pilot the instrument prior to using it in your study. Test reliability and include as many sources of validity evidence as are possible in your paper. Discuss the limitations of this approach openly.

7. What are the expectations of JGME editors regarding assessment instruments used in graduate medical education research?

JGME editors expect the validity of your assessment tools to be discussed explicitly in the methods section of your manuscript. If you are using a previously studied tool in the same setting, with the same subjects, and for the same purpose, citing the reference(s) is sufficient. Additional discussion about your adaptation is needed if you (1) have modified previously studied instruments; (2) are using the instrument for different settings, subjects, or purposes; or (3) are using different interpretation or cut-off points. Discuss whether the changes are likely to affect the reliability or validity of the instrument.

Researchers who create novel assessment instruments need to state the development process, reliability measures, pilot results, and any other information that may lend credibility to the use of homegrown instruments. Transparency enhances credibility.

In general, little information can be gleaned from single-site studies using untested assessment instruments; these studies are unlikely to be accepted for publication.

8. What are useful resources for reliability and validity of assessment instruments?

The references for this editorial are a good starting point.

Gail M. Sullivan, MD, MPH, is Editor-in-Chief, Journal of Graduate Medical Education .


Chapter 5: Psychological Measurement

Reliability and Validity of Measurement

Learning Objectives

  • Define reliability, including the different types and how they are assessed.
  • Define validity, including the different types and how they are assessed.
  • Describe the kinds of evidence that would be relevant to assessing the reliability and validity of a particular measure.

Again, measurement involves assigning scores to individuals so that they represent some characteristic of the individuals. But how do researchers know that the scores actually represent the characteristic, especially when it is a construct like intelligence, self-esteem, depression, or working memory capacity? The answer is that they conduct research using the measure to confirm that the scores make sense based on their understanding of the construct being measured. This is an extremely important point. Psychologists do not simply  assume  that their measures work. Instead, they collect data to demonstrate  that they work. If their research does not demonstrate that a measure works, they stop using it.

As an informal example, imagine that you have been dieting for a month. Your clothes seem to be fitting more loosely, and several friends have asked if you have lost weight. If at this point your bathroom scale indicated that you had lost 10 pounds, this would make sense and you would continue to use the scale. But if it indicated that you had gained 10 pounds, you would rightly conclude that it was broken and either fix it or get rid of it. In evaluating a measurement method, psychologists consider two general dimensions: reliability and validity.

Reliability

Reliability  refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).

Test-Retest Reliability

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time.  Test-retest reliability  is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the  same  group of people at a later time, and then looking at  test-retest correlation  between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing Pearson’s  r . Figure 5.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart. Pearson’s r for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.

Figure 5.2. Score at time 1 is on the x-axis and score at time 2 is on the y-axis, showing fairly consistent scores.

Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.
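As a rough illustration of the procedure described above, the sketch below plots two hypothetical administrations of a self-esteem scale against each other and reports the test-retest correlation; the scores are invented and matplotlib is assumed to be available.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 0-30 self-esteem scores at two administrations, one week apart
week1 = np.array([22, 18, 27, 15, 24, 20, 29, 17, 25, 21])
week2 = np.array([23, 17, 26, 16, 25, 19, 28, 18, 24, 22])

r = np.corrcoef(week1, week2)[0, 1]  # Pearson's r

plt.scatter(week1, week2)
plt.xlabel("Score at time 1")
plt.ylabel("Score at time 2")
plt.title(f"Test-retest correlation: r = {r:.2f}")
plt.show()
```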

Internal Consistency

A second kind of reliability is  internal consistency , which is the consistency of people’s responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people’s scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that that they have a number of good qualities. If people’s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioural and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants’ bets were consistently high or low across trials.

Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a  split-half correlation . This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined. For example, Figure 5.3 shows the split-half correlation between several university students’ scores on the even-numbered items and their scores on the odd-numbered items of the Rosenberg Self-Esteem Scale. Pearson’s  r  for these data is +.88. A split-half correlation of +.80 or greater is generally considered good internal consistency.

Figure 5.3. Score on even-numbered items is on the x-axis and score on odd-numbered items is on the y-axis, showing fairly consistent scores.
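The split-half computation can be sketched as follows for a made-up set of item-level responses: total the odd-numbered items, total the even-numbered items, and correlate the two totals. The data are invented for illustration only.

```python
import numpy as np

# Hypothetical responses of 6 people to a 10-item scale (1-4), ordered from low to high scorers
items = np.array([
    [1, 2, 1, 1, 2, 1, 1, 2, 1, 1],
    [2, 2, 2, 1, 2, 2, 2, 1, 2, 2],
    [2, 3, 2, 3, 2, 3, 2, 3, 3, 2],
    [3, 3, 3, 3, 4, 3, 3, 3, 3, 4],
    [4, 3, 4, 4, 3, 4, 4, 4, 3, 4],
    [4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
])

odd_total  = items[:, 0::2].sum(axis=1)  # 1st, 3rd, 5th, ... items
even_total = items[:, 1::2].sum(axis=1)  # 2nd, 4th, 6th, ... items

split_half_r = np.corrcoef(odd_total, even_total)[0, 1]
print(round(split_half_r, 2))  # a value of +.80 or greater suggests good internal consistency
```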

Perhaps the most common measure of internal consistency used by researchers in psychology is a statistic called  Cronbach’s α  (the Greek letter alpha). Conceptually, α is the mean of all possible split-half correlations for a set of items. For example, there are 252 ways to split a set of 10 items into two sets of five. Cronbach’s α would be the mean of the 252 split-half correlations. Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic. Again, a value of +.80 or greater is generally taken to indicate good internal consistency.

Interrater Reliability

Many behavioural measures involve significant judgment on the part of an observer or a rater.  Inter-rater reliability  is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students’ social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills. To the extent that each participant does in fact have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other. Inter-rater reliability would also have been measured in Bandura’s Bobo doll study. In this case, the observers’ ratings of how many acts of aggression a particular child committed while playing with the Bobo doll should have been highly positively correlated. Interrater reliability is often assessed using Cronbach’s α when the judgments are quantitative or an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.

Validity

Validity is the extent to which the scores from a measure represent the variable they are intended to. But how do researchers make this judgment? We have already considered one factor that they take into account—reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. As an absurd example, imagine someone who believes that people’s index finger length reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people’s index fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity. The fact that one person’s index finger is a centimetre longer than another’s would indicate nothing about which one had higher self-esteem.

Discussions of validity usually divide it into several distinct “types.” But a good way to interpret these types is that they are other kinds of evidence—in addition to reliability—that should be taken into account when judging the validity of a measure. Here we consider three basic kinds: face validity, content validity, and criterion validity.

Face Validity

Face validity  is the extent to which a measurement method appears “on its face” to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. The finger-length method of measuring self-esteem, on the other hand, seems to have nothing to do with self-esteem and therefore has poor face validity. Although face validity can be assessed quantitatively—for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to—it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people’s intuitions about human behaviour, which are frequently wrong. It is also the case that many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of over 567 different statements applies to them—where many of the statements do not have any obvious relationship to the construct that they measure. For example, the items “I enjoy detective or mystery stories” and “The sight of blood doesn’t frighten me or make me sick” both measure the suppression of aggression. In this case, it is not the participants’ literal answers to these questions that are of interest, but rather whether the pattern of the participants’ responses to a series of questions matches those of individuals who tend to suppress their aggression.

Content Validity

Content validity  is the extent to which a measure “covers” the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then his measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that he or she thinks positive thoughts about exercising, feels good about exercising, and actually exercises. So to have good content validity, a measure of people’s attitudes toward exercise would have to reflect all three of these aspects. Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.

Criterion Validity

Criterion validity  is the extent to which people’s scores on a measure are correlated with other variables (known as  criteria ) that one would expect them to be correlated with. For example, people’s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people’s scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people’s test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam. Or imagine that a researcher develops a new measure of physical risk taking. People’s scores on this measure should be correlated with their participation in “extreme” activities such as snowboarding and rock climbing, the number of speeding tickets they have received, and even the number of broken bones they have had over the years. When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity ; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as predictive validity (because scores on the measure have “predicted” a future outcome).

Criteria can also include other measures of the same construct. For example, one would expect new measures of test anxiety or physical risk taking to be positively correlated with existing measures of the same constructs. This is known as convergent validity .

Assessing convergent validity requires collecting data using the measure. Researchers John Cacioppo and Richard Petty did this when they created their self-report Need for Cognition Scale to measure how much people value and engage in thinking (Cacioppo & Petty, 1982) [1] . In a series of studies, they showed that people’s scores were positively correlated with their scores on a standardized academic achievement test, and that their scores were negatively correlated with their scores on a measure of dogmatism (which represents a tendency toward obedience). In the years since it was created, the Need for Cognition Scale has been used in literally hundreds of studies and has been shown to be correlated with a wide variety of other variables, including the effectiveness of an advertisement, interest in politics, and juror decisions (Petty, Briñol, Loersch, & McCaslin, 2009) [2] .

Discriminant Validity

Discriminant validity , on the other hand, is the extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct. For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people’s scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.

When they created the Need for Cognition Scale, Cacioppo and Petty also provided evidence of discriminant validity by showing that people’s scores were not correlated with certain other variables. For example, they found only a weak correlation between people’s need for cognition and a measure of their cognitive style—the extent to which they tend to think analytically by breaking ideas into smaller parts or holistically in terms of “the big picture.” They also found no correlation between people’s need for cognition and measures of their test anxiety and their tendency to respond in socially desirable ways. All these low correlations provide evidence that the measure is reflecting a conceptually distinct construct.

Key Takeaways

  • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
  • Practice: Ask several friends to complete the Rosenberg Self-Esteem Scale. Then assess its internal consistency by making a scatterplot to show the split-half correlation (even- vs. odd-numbered items). Compute Pearson’s  r too if you know how.
  • Discussion: Think back to the last college exam you took and think of the exam as a psychological measure. What construct do you think it was intended to measure? Comment on its face and content validity. What data could you collect to assess its reliability and criterion validity?
  • Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131.
  • Petty, R. E., Briñol, P., Loersch, C., & McCaslin, M. J. (2009). The need for cognition. In M. R. Leary & R. H. Hoyle (Eds.), Handbook of individual differences in social behaviour (pp. 318–329). New York, NY: Guilford Press.

Glossary

Reliability: The consistency of a measure.

Test-retest reliability: The consistency of a measure over time.

Test-retest correlation: The consistency of a measure on the same group of people at different times.

Internal consistency: Consistency of people’s responses across the items on a multiple-item measure.

Split-half correlation: Method of assessing internal consistency through splitting the items into two sets and examining the relationship between them.

Cronbach’s α: A statistic in which α is the mean of all possible split-half correlations for a set of items.

Inter-rater reliability: The extent to which different observers are consistent in their judgments.

Validity: The extent to which the scores from a measure represent the variable they are intended to.

Face validity: The extent to which a measurement method appears to measure the construct of interest.

Content validity: The extent to which a measure “covers” the construct of interest.

Criterion validity: The extent to which people’s scores on a measure are correlated with other variables that one would expect them to be correlated with.

Criteria: In reference to criterion validity, variables that one would expect to be correlated with the measure.

Concurrent validity: When the criterion is measured at the same time as the construct.

Predictive validity: When the criterion is measured at some point in the future (after the construct has been measured).

Convergent validity: When new measures positively correlate with existing measures of the same constructs.

Discriminant validity: The extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Validity and Reliability of the Research Instrument; How to Test the Validation of a Questionnaire/Survey in a Research


Hamed Taherdoost

Hamta Group

Date Written: August 10, 2016

Questionnaires are one of the most widely used tools to collect data, especially in social science research. The main objective of a questionnaire in research is to obtain relevant information in the most reliable and valid manner. Thus the accuracy and consistency of a survey/questionnaire form a significant aspect of research methodology, known as validity and reliability. Often new researchers are confused about selecting and conducting the proper validity type to test their research instrument (questionnaire/survey). This review article explores and describes the validity and reliability of a questionnaire/survey and also discusses various forms of validity and reliability tests.

Keywords: Research Instrument, Questionnaire, Survey, Survey Validity, Questionnaire Reliability, Content Validity, Face Validity, Construct Validity, Criterion Validity



Measuring Reliability and Validity of Evaluation Instruments

How do you know if your evaluation instrument is “good”? Or if the instrument you find on CSEdResearch.org is a decent one to use in your study?

Evaluation instruments (like surveys, questionnaires, and interview protocols) can go through their own evaluation to assess whether or not they have evidence of reliability or validity. In the Filters section on the Evaluation Instruments page, you can find a category called Assessed where you can include instruments in your search that have been previously shown to have evidence of reliability and validity. So, what do these measures mean? And, what is the difference between them?

Evaluation instruments are often designed to measure the impact of outreach activities, curriculum, and other interventions in computing education. But how do you know if these evaluation instruments actually measure what they say they are measuring? We gain confidence in these instruments by assessing evidence of their reliability and validity.

Instruments with evidence of reliability yield the same results each time they are administered. Let’s say that you created an evaluation instrument in computing education research, and you gave it to the same group of high school students four times at (nearly) the same time. If the instrument was reliable, you would expect that the results of these tests to be the same, statistically speaking.

Instruments with evidence of validity are those that have been checked in one or more ways to determine whether or not the instrument measures what it is supposed to measure. So, if your instrument is designed to measure whether or not parental support of high school students taking computer science courses is positively correlated with their grades in these courses, then statistical tests and other steps can be taken to ensure that the instrument does exactly that.

Those are still very broad definitions. Let’s break it down some more. But before we do, there is one very important caveat.

Evidence of reliability and/or validity are assessed for a specified particular demographic in a particular setting. Using an instrument that has evidence for reliability and/or validity does not mean that the evidence applies to your usage of the instrument. It can provide, however, a greater measure of confidence than an instrument that has no evidence of validity or reliability. And, if you are able to find an instrument that has evidence of validity with a population similar to your own (e.g. Hispanic students in an urban middle school), this can provide even greater confidence.

Now, let’s take a look at what each of these terms mean and how they can be measured.


Penn State University Libraries

Educational and Psychological Instruments


Reliability and Validity

How do you know whether you've found a “good” or “bad” instrument? Is the instrument well-designed?

Researchers often discuss the "reliability" and “validity” of instruments, rather than whether they are “good” or “bad.” According to this video  and other resources from Sage Research Methods Core , reliability is about the consistency of test results. Validity is about whether test results represent what they are supposed to represent.

At this point, the library doesn’t have staff with expertise to recommend or evaluate instruments. So, please contact your professor.

Using Instruments Ethically

If you find a copy of an instrument, can you just go ahead and use it?

No. Some instruments can only be purchased, administered, or interpreted by a licensed or certified professional.

Even if you are qualified, there are other things you may need to do first. These include, but are not limited to:

  • Talking with your professor about whether the instrument is appropriate for your project.
  • Getting Penn State IRB training and approval for your project.
  • Getting the author’s/publisher’s permission to use the instrument.
  • Getting any training or certification that is required to administer the instrument properly.
  • Recruiting test subjects in a proper and ethical manner.
  • Finding an appropriate environment to test them.
  • Making arrangements for storing and analyzing your data.

Always consult with your professor about the design of your research project, before you undertake it!

Regulations, Policies, and Guidelines

CAUTION: Various government regulations, professional codes, and institutional policies determine how educational/psychological testing must be conducted. Below are only some of the documents and agencies that pertain to Penn State faculty and students:

  • Protection of Human Subjects (45 CFR Part 46), U.S. Department of Health and Human Services
  • Common Rule (Federal Policy for the Protection of Human Subjects)
  • Standards for Educational and Psychological Testing
  • American Educational Research Association Code of Ethics
  • Professional Psychologists Practice Act (63 Pennsylvania Statutes §§ 1201-1218), State of Pennsylvania
  • State Board of Psychology (49 Pennsylvania Code, Chapter 41), State of Pennsylvania
  • Research Administration Policies and Guidelines, Penn State
  • Office of Research Protections, U.S. Department of Health and Human Services
  • National Council on Measurement in Education
  • Science Directorate, American Psychological Association
  • State Board of Psychology, State of Pennsylvania
  • Office for Research Protections, Penn State

Encyclopedia



1. Introduction

Validity explains how well the collected data covers the actual area of investigation [ 1 ] . Validity basically means “measure what is intended to be measured” [ 2 ] .

2. Face Validity

Face validity is a subjective judgment on the operationalization of a construct. Face validity is the degree to which a measure appears to be related to a specific construct, in the judgment of non-experts such as test takers and representatives of the legal system. That is, a test has face validity if its content simply looks relevant to the person taking the test. It evaluates the appearance of the questionnaire in terms of feasibility, readability, consistency of style and formatting, and the clarity of the language used.

In other words, face validity refers to researchers’ subjective assessments of the presentation and relevance of the measuring instrument as to whether the items in the instrument appear to be relevant, reasonable, unambiguous and clear [ 3 ] .

In order to examine the face validity, the dichotomous scale can be used with the categorical options of “Yes” and “No”, which indicate a favourable and an unfavourable item respectively. A favourable item is one that is objectively structured and can be positively classified under the thematic category. The collected data are then analysed using Cohen’s Kappa Index (CKI) to determine the face validity of the instrument. Gelfand et al. [ 4 ] recommended a minimally acceptable Kappa of 0.60 for inter-rater agreement. Unfortunately, face validity is arguably the weakest form of validity and many would suggest that it is not a form of validity in the strictest sense of the word.

3. Content Validity

Content validity is defined as “the degree to which items in an instrument reflect the content universe to which the instrument will be generalized” (Straub, Boudreau et al. [ 5 ] ). In the field of IS, it is highly recommended to apply content validity when a new instrument is developed. In general, content validity involves evaluation of a new survey instrument in order to ensure that it includes all the items that are essential and eliminates undesirable items for a particular construct domain [ 6 ] . The judgemental approach to establishing content validity involves literature reviews and then follow-up evaluation by expert judges or panels. This judgemental approach requires researchers to be present with the experts in order to facilitate validation. However, it is not always possible to have many experts on a particular research topic in one location, which limits the ability to validate a survey instrument when experts are located in different geographical areas (Choudrie and Dwivedi [ 7 ] ). Contrastingly, a quantitative approach may allow researchers to send content validity questionnaires to experts working at different locations, whereby distance is not a limitation. In order to apply content validity, the following steps are followed:

1. An exhaustive literature review is conducted to extract the related items.

2. A content validity survey is generated (each item is assessed on a three-point scale: not necessary, useful but not essential, and essential).

3. The survey is sent to experts in the same field as the research.

4. The content validity ratio (CVR) is then calculated for each item by employing Lawshe’s [ 8 ] (1975) method (a minimal computation sketch follows this list).

5. Items that do not reach the critical CVR value for the given panel size are eliminated.
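As a minimal sketch of step 4, Lawshe's content validity ratio for an item is CVR = (n_e - N/2) / (N/2), where n_e is the number of panellists rating the item “essential” and N is the panel size. The panel size and counts below are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe (1975): CVR = (n_e - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts; number of "essential" ratings for 4 draft items
essential_counts = [9, 7, 5, 10]
for item_no, n_e in enumerate(essential_counts, start=1):
    print(f"item {item_no}: CVR = {content_validity_ratio(n_e, 10):.2f}")
# Items whose CVR falls below the critical value for the panel size would be eliminated (step 5).
```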

4. Construct Validity

If a relationship is causal, what are the particular cause and effect behaviours or constructs involved in the relationship? Construct validity refers to how well you translated or transformed a concept, idea, or behaviour that is a construct into a functioning and operating reality, the operationalization. Construct validity has two components: convergent and discriminant validity.

4.1 Discriminant Validity

Discriminant validity is the extent to which latent variable A discriminates from other latent variables (e.g., B, C, D). Discriminant validity means that a latent variable is able to account for more variance in the observed variables associated with it than a) measurement error or similar external, unmeasured influences; or b) other constructs within the conceptual framework. If this is not the case, then the validity of the individual indicators and of the construct is questionable (Fornell and Larcker [ 9 ] ). In brief, Discriminant validity (or divergent validity) tests that constructs that should have no relationship do, in fact, not have any relationship.

4.2 Convergent Validity

Convergent validity, a parameter often used in sociology, psychology, and other behavioural sciences, refers to the degree to which two measures of constructs that theoretically should be related, are in fact related.  In brief, Convergent validity tests that constructs that are expected to be related are, in fact, related.

With the purpose of verifying the construct validity (discriminant and convergent validity), a factor analysis can be conducted utilizing principal component analysis (PCA) with the varimax rotation method (Koh and Nam [ 9 ] , Wee and Quazi [ 10 ] ). Items loading above 0.40, which is the minimum recommended value in research, are considered for further analysis. Also, items cross-loading above 0.40 should be deleted. Therefore, the factor analysis results will satisfy the criteria of construct validity, including both discriminant validity (loading of at least 0.40, no cross-loading of items above 0.40) and convergent validity (eigenvalues of 1, loading of at least 0.40, items that load on posited constructs) (Straub et al. [ 11 ] ). There are also other methods to test convergent and discriminant validity.
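A rough sketch of this procedure is given below using synthetic data with two underlying constructs. It assumes the third-party factor_analyzer Python package is installed (its FactorAnalyzer class with a principal-component extraction method and varimax rotation); the item structure, sample size and variable names are invented. Loadings of at least 0.40 on the intended factor, with no cross-loadings above 0.40, would be read as evidence of convergent and discriminant validity as described above.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # third-party package, assumed available

rng = np.random.default_rng(1)
n = 300
construct_a = rng.normal(size=n)  # latent construct A
construct_b = rng.normal(size=n)  # latent construct B

def noise():
    return rng.normal(scale=0.5, size=n)

# Six observed items: the first three are driven by A, the last three by B
X = np.column_stack([construct_a + noise(), construct_a + noise(), construct_a + noise(),
                     construct_b + noise(), construct_b + noise(), construct_b + noise()])

fa = FactorAnalyzer(n_factors=2, rotation="varimax", method="principal")
fa.fit(X)

# Inspect the rotated loading matrix against the 0.40 rule of thumb
print(np.round(fa.loadings_, 2))
```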

5. Criterion Validity

Criterion or concrete validity is the extent to which a measure is related to an outcome.  It measures how well one measure predicts an outcome for another measure. A test has this type of validity if it is useful for predicting performance or behavior in another situation (past, present, or future).

Criterion validity is an alternative perspective that de-emphasizes the conceptual meaning or interpretation of test scores. Test users might simply wish to use a test to differentiate between groups of people or to make predictions about future outcomes. For example, a human resources director might need to use a test to help predict which applicants are most likely to perform well as employees. From a very practical standpoint, she focuses on the test’s ability to differentiate good employees from poor employees. If the test does this well, then the test is “valid” enough for her purposes. From the traditional three-faceted view of validity, criterion validity refers to the degree to which test scores can predict specific criterion variables. The key to validity is the empirical association between test scores and scores on the relevant criterion variable, such as “job performance.”

Messick [ 12 ] suggests that “even for purposes of applied decision making, reliance on criterion validity or content coverage is not enough. The meaning of the measure, and hence its construct validity, must always be pursued – not only to support test interpretation but also to justify test use”. There are three types of criterion validity, namely concurrent, predictive and postdictive validity.

6. Reliability

Reliability concerns the extent to which a measurement of a phenomenon provides stable and consistent results (Carmines and Zeller [ 13 ] ). Reliability is also concerned with repeatability. For example, a scale or test is said to be reliable if repeated measurements made by it under constant conditions give the same result (Moser and Kalton [ 14 ] ).

Testing for reliability is important as it refers to the consistency across the parts of a measuring instrument (Huck [ 15 ] ). A scale is said to have high internal consistency reliability if the items of a scale “hang together” and measure the same construct (Huck [ 16 ] , Robinson [ 17 ] ). The most commonly used internal consistency measure is the Cronbach Alpha coefficient. It is viewed as the most appropriate measure of reliability when making use of Likert scales (Whitley [ 18 ] , Robinson [ 19 ] ). No absolute rules exist for internal consistency; however, most agree on a minimum internal consistency coefficient of .70 (Whitley [ 20 ] , Robinson [ 21 ] ).

For an exploratory or pilot study, it is suggested that reliability should be equal to or above 0.60 (Straub et al. [ 22 ] ). Hinton et al. [ 23 ] have suggested four cut-off points for reliability: excellent reliability (0.90 and above), high reliability (0.70-0.90), moderate reliability (0.50-0.70) and low reliability (0.50 and below) (Hinton et al. [ 24 ] ). Although reliability is important for a study, it is not sufficient unless combined with validity. In other words, a test can be reliable without being valid, but it cannot be valid unless it is also reliable [ 25 ] .

  • ACKOFF, R. L. 1953. The Design of Social Research, Chicago, University of Chicago Press.
  • BARTLETT, J. E., KOTRLIK, J. W. & HIGGINS, C. C. 2001. Organizational research: determining appropriate sample size in survey research. Learning and Performance Journal, 19, 43-50.
  • BOUDREAU, M., GEFEN, D. & STRAUB, D. 2001. Validation in IS research: A state-of-the-art assessment. MIS Quarterly, 25, 1-24.
  • BREWETON, P. & MILLWARD, L. 2001. Organizational Research Methods, London, SAGE.
  • BROWN, G. H. 1947. A comparison of sampling methods. Journal of Marketing, 6, 331-337.
  • BRYMAN, A. & BELL, E. 2003. Business research methods, Oxford, Oxford University Press.
  • CARMINES, E. G. & ZELLER, R. A. 1979. Reliability and Validity Assessment, Newbury Park, CA, SAGE.
  • CHOUDRIE, J. & DWIVEDI, Y. K. Investigating Broadband Diffusion in the Household: Towards Content Validity and Pre-Test of the Survey Instrument. Proceedings of the 13th European Conference on Information Systems (ECIS 2005), May 26-28, 2005 2005 Regensburg, Germany.
  • DAVIS, D. 2005. Business Research for Decision Making, Australia, Thomson South-Western.
  • GELFAND, D. M., HARTMANN, D. P., CROMER, C. C., SMITH, C. L. & PAGE, B. C. 1975. The effects of instructional prompts and praise on children's donation rates. Child Development, 46, 980-983.
  • ENGELLANT, K., HOLLAND, D. & PIPER, R. 2016. Assessing Convergent and Discriminant Validity of the Motivation Construct for the Technology Integration Education (TIE) Model. Journal of Higher Education Theory and Practice 16, 37-50.
  • FIELD, A. P. 2005. Discovering Statistics Using SPSS, Sage Publications Inc.
  • FORNELL, C. & LARCKER, D. F. 1981. Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18, 39-50.
  • FOWLER, F. J. 2002. Survey research methods, Newbury Park, CA, SAGE.
  • GHAURI, P. & GRONHAUG, K. 2005. Research Methods in Business Studies, Harlow, FT/Prentice Hall.
  • GILL, J., JOHNSON, P. & CLARK, M. 2010. Research Methods for Managers, SAGE Publications.
  • HINTON, P. R., BROWNLOW, C., MCMURRAY, I. & COZENS, B. 2004. SPSS explained, East Sussex, England, Routledge Inc.
  • HUCK, S. W. 2007. Reading Statistics and Research, United States of America, Allyn & Bacon.
  • KOH, C. E. & NAM, K. T. 2005. Business use of the internet: a longitudinal study from a value chain perspective. Industrial Management & Data Systems, 105 85-95.
  • LAWSHE, C. H. 1975. A quantitative approach to content validity. Personnel Psychology, 28, 563-575.
  • LEWIS, B. R., SNYDER, C. A. & RAINER, K. R. 1995. An empirical assessment of the Information Resources Management construct. Journal of Management Information Systems, 12, 199-223.
  • MALHOTRA, N. K. & BIRKS, D. F. 2006. Marketing Research: An Applied Approach, Harlow, FT/Prentice Hall.
  • MAXWELL, J. A. 1996. Qualitative Research Design: An Intractive Approach London, Applied Social Research Methods Series.
  • MESSICK, S. 1989. Validity. In: LINN, R. L. (ed.) Educational measurement. New York: Macmillan.
  • MOSER, C. A. & KALTON, G. 1989. Survey methods in social investigation, Aldershot, Gower.


Research-Methodology

Research Reliability

Reliability refers to whether or not you get the same answer by using an instrument to measure something more than once. In simple terms, research reliability is the degree to which research method produces stable and consistent results.

A specific measure is considered to be reliable if its application on the same object of measurement number of times produces the same results.

Research reliability can be divided into four categories:

1. Test-retest reliability relates to the measure of reliability obtained by conducting the same test more than once over a period of time with the participation of the same sample group.

Example: Employees of ABC Company may be asked to complete the same questionnaire about   employee job satisfaction two times with an interval of one week, so that test results can be compared to assess stability of scores.


2. Parallel forms reliability relates to a measure that is obtained by conducting assessment of the same phenomena with the participation of the same sample group via more than one assessment method.

Example: The levels of employee satisfaction of ABC Company may be assessed with questionnaires, in-depth interviews and focus groups and results can be compared.


3. Inter-rater reliability, as the name indicates, relates to the measure of sets of results obtained by different assessors using the same methods. The benefits and importance of assessing inter-rater reliability can be explained by referring to the subjectivity of assessments.

Example: Levels of employee motivation at ABC Company can be assessed using observation method by two different assessors, and inter-rater reliability relates to the extent of difference between the two assessments.


4. Internal consistency reliability is applied to assess the extent to which test items that explore the same construct produce similar results. It can be represented in two main formats.

a) average inter-item correlation is a specific form of internal consistency that is obtained by correlating all the items of a test that measure the same construct with one another and averaging those correlations

b) split-half reliability, another type of internal consistency reliability, involves all items of a test being ‘split in half’ and the scores on the two halves being compared.



Principles and Methods of Validity and Reliability Testing of Questionnaires Used in Social and Health Science Researches

Bolarinwa, Oladimeji Akeem

From the Department of Epidemiology and Community Health, University of Ilorin and University of Ilorin Teaching Hospital, Ilorin, Nigeria


The importance of measuring the accuracy and consistency of research instruments (especially questionnaires) known as validity and reliability, respectively, have been documented in several studies, but their measure is not commonly carried out among health and social science researchers in developing countries. This has been linked to the dearth of knowledge of these tests. This is a review article which comprehensively explores and describes the validity and reliability of a research instrument (with special reference to questionnaire). It further discusses various forms of validity and reliability tests with concise examples and finally explains various methods of analysing these tests with scientific principles guiding such analysis.

INTRODUCTION

The different measurements in social science research require quantification of abstract, intangible constructs that may not be directly observable. [ 1 ] However, these quantifications will come in different forms of inference. In addition, the inferences made will depend on the type of measurement. [ 1 ] These can be observational, self-report, interview and record review. The various measurements will ultimately require measurement tools through which the values will be captured. One of the most common tasks often encountered in social science research is ascertaining the validity and reliability of a measurement tool. [ 2 ] Researchers always wish to know if the measurement tool employed actually measures the intended research concept or construct (is it valid? or a true measure?) or if the measurement tool used to quantify the variables provides stable or consistent responses (is it reliable? or repeatable?). As simple as this may seem, it is often omitted or just mentioned passively in the research proposal or report. [ 2 ] This has been attributed to the dearth of skills and knowledge of validity and reliability test analysis among social and health science researchers. From the author's personal observation among researchers in developing countries, most students and young researchers are not able to distinguish validity from reliability. Likewise, they do not have the prerequisite knowledge to understand the principles that underlie validity and reliability testing of a research measurement tool.

This article therefore sets out to review the principles and methods of validity and reliability measurement tools used in social and health science researches. To achieve the stated goal, the author reviewed current articles (both print and online), scientific textbooks, lecture notes/presentations and health programme papers. This is with a view to critically reviewing current principles and methods of reliability and validity tests as they are applicable to questionnaire use in social and health researches.

Validity expresses the degree to which a measurement measures what it purports to measure. Several varieties have been described, including face validity, construct validity, content validity and criterion validity (which could be concurrent and predictive validity). These validity tests are categorised into two broad components, namely internal and external validity. [ 3 ][ 4 ][ 5 ] Internal validity refers to how accurately the measures obtained from the research were actually quantifying what they were designed to measure, whereas external validity refers to how accurately the measures obtained from the study sample described the reference population from which the study sample was drawn. [ 5 ]

Reliability refers to the degree to which the results obtained by a measurement and procedure can be replicated. [ 3 ][ 4 ][ 5 ] Though reliability importantly contributes to the validity of a questionnaire, it is however not a sufficient condition for the validity of a questionnaire. [ 6 ] Lack of reliability may arise from divergence between observers or instruments of measurement, such as a questionnaire, or from instability of the attribute being measured, [ 3 4 ] which will invariably affect the validity of such a questionnaire. There are three aspects of reliability, namely: equivalence, stability and internal consistency (homogeneity). [ 5 ] It is important to understand the distinction between these three aspects as it will guide the researcher on the proper assessment of reliability of a research tool such as a questionnaire. [ 7 ] Figure 1 shows a graphical presentation of possible combinations of validity and reliability. [ 8 ]

Figure 1. Possible combinations of validity and reliability.

A questionnaire is a predetermined set of questions used to collect data. [ 2 ] There are different formats of questionnaire, such as clinical data, social status and occupational group. [ 3 ] It is a data collection ‘tool’ for collecting and recording information about a particular issue of interest. [ 2 5 ] It should always have a definite purpose that is related to the objectives of the research, and it needs to be clear from the outset how the findings will be used. [ 2 5 ] Structured questionnaires are usually associated with quantitative research, which means research that is concerned with numbers (how many? how often? how satisfied?). The questionnaire is the most commonly used data collection instrument in health and social science research. [ 9 ]

In the context of health and social science research, questionnaires can be used in a variety of survey situations such as postal, electronic, face-to-face (F2F) and telephone. [ 9 ] Postal and electronic questionnaires are known as self-completion questionnaires, i.e., respondents complete them by themselves in their own time. F2F and telephone questionnaires are used by interviewers to ask a standard set of questions and record the responses that people give to them. [ 9 ] Questionnaires that are used by interviewers in this way are sometimes known as interview schedules. [ 9 ] It could be adapted from an already tested one or could be developed as a new data tool specific to measure or quantify a particular attribute. These conditions therefore warrant the need to test validity and reliability of questionnaire. [ 2 5 9 ]

METHODS USED FOR VALIDITY TEST OF A QUESTIONNAIRE

A drafted questionnaire should always be subjected to validity testing. Validity reflects the amount of systematic or built-in error in a questionnaire. [ 5 ][ 9 ] Validity of a questionnaire can be established using a panel of experts who explore the theoretical construct, as shown in Figure 2. This form of validity assesses how well the idea of a theoretical construct is represented in an operational measure (the questionnaire); it is called translational or representational validity. Two subtypes belong to this form, namely face validity and content validity. [ 10 ] On the other hand, questionnaire validity can be established with the use of another survey in the form of a field test, which examines how well a given measure relates to one or more external criteria, based on empirical constructs, as shown in Figure 2. These forms are criterion-related validity [ 10 ][ 11 ] and construct validity. [ 11 ] While some authors believe that criterion-related validity encompasses construct validity, [ 10 ] others treat the two as separate entities. [ 11 ] According to the authors who treat them as separate entities, predictive validity and concurrent validity are subtypes of criterion-related validity, while convergent validity, discriminant validity, known-group validity and factorial validity are subtypes of construct validity [Figure 2]. [ 10 ] In addition, some authors include hypothesis-testing validity as a form of construct validity. [ 12 ] The subtypes are described in detail in the following paragraphs.

[Figure 2: Forms and subtypes of questionnaire validity]

FACE VALIDITY

Some authors [ 7 ][ 13 ] are of the opinion that face validity is a component of content validity, while others believe it is not. [ 2 ][ 14 ][ 15 ] Face validity is established when an individual (and/or researcher) who is an expert on the research subject reviews the questionnaire (instrument) and concludes that it measures the characteristic or trait of interest. [ 7 ][ 13 ] Face validity involves the expert looking at the items in the questionnaire and agreeing that the test is a valid measure of the concept being measured, simply on the face of it. [ 15 ] That is, the expert evaluates whether each of the measuring items matches the given conceptual domain of the concept. Face validity is often considered casual and soft, and many researchers do not regard it as an active measure of validity. [ 11 ] However, it is the most widely used form of validity in developing countries. [ 15 ]

CONTENT VALIDITY

Content validity pertains to the degree to which the instrument fully assesses or measures the construct of interest. [ 7 ][ 15 ][ 16 ][ 17 ] For example, a researcher interested in evaluating employees' attitudes towards a training programme on hazard prevention within an organisation wants to ensure that the questions in the questionnaire fully represent the domain of attitudes towards occupational hazard prevention. The development of a content-valid instrument is typically achieved by a rational analysis of the instrument by raters (experts) familiar with the construct of interest or experts on the research subject. [ 15 ][ 16 ][ 17 ] Specifically, raters review all of the questionnaire items for readability, clarity and comprehensiveness and come to some level of agreement as to which items should be included in the final questionnaire. [ 15 ] The rating can be dichotomous, where the rater indicates whether an item is 'favourable' (assigned a score of +1) or 'unfavourable' (assigned a score of 0). [ 15 ] Over the years, however, different rating schemes have been proposed and developed; these may use Likert scaling or absolute number ratings. [ 18 ][ 19 ][ 20 ][ 21 ] Both item-level and scale-level ratings have been proposed for content validity. The item-rated content validity indices are usually denoted I-CVI, [ 15 ] while the scale-level CVI, termed S-CVI, is calculated from the I-CVIs [ 15 ] and reflects the level of agreement between raters. Sangoseni et al. [ 15 ] proposed an S-CVI of ≥0.78 as the threshold for inclusion of an item in the study. The Fog Index, Flesch Reading Ease, Flesch-Kincaid readability formula and Gunning-Fog Index are formulas that have also been used to assess readability during validation. [ 7 ][ 12 ] A major drawback of content validity is that, like face validity, it is judged to be highly subjective. However, researchers can combine more than one form of validity to increase the validity strength of the questionnaire; for instance, face validity has been combined with content validity [ 15 ][ 22 ][ 23 ] or criterion validity. [ 13 ]
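
As an illustration, the short Python sketch below computes an I-CVI for each item and an average S-CVI from dichotomous expert ratings. The items, raters and scores are hypothetical, and applying the 0.78 cut-off cited above at the item level is an illustrative choice, not a prescription from the article.

```python
# Minimal sketch (hypothetical data): item-level (I-CVI) and scale-level
# (S-CVI/Ave) content validity indices from dichotomous expert ratings,
# where 1 = rated relevant/favourable and 0 = not relevant.
ratings = {                     # rows: items, columns: five expert raters
    "item_1": [1, 1, 1, 1, 0],
    "item_2": [1, 1, 1, 1, 1],
    "item_3": [1, 0, 1, 0, 1],
}

i_cvi = {item: sum(r) / len(r) for item, r in ratings.items()}
s_cvi_ave = sum(i_cvi.values()) / len(i_cvi)    # average of the I-CVIs

for item, value in i_cvi.items():
    decision = "retain" if value >= 0.78 else "revise or drop"  # 0.78 threshold from the text
    print(f"{item}: I-CVI = {value:.2f} -> {decision}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
```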

CRITERION-RELATED VALIDITY

Criterion-related validity is assessed when one is interested in determining the relationship of scores on a test to a specific criterion. [ 24 ][ 25 ] It is a measure of how well questionnaire findings stack up against another instrument or predictor. [ 5 ][ 25 ] Its major disadvantage is that such a predictor may not be available or easy to establish. There are two variants of this validity type, as follows:

Concurrent validity

This assesses the newly developed questionnaire against a highly rated existing standard (gold standard). When the criterion exists at the same time as the measure, we speak of concurrent validity. [ 24 ][ 25 ][ 26 ][ 27 ] Concurrent validity refers to the ability of a test to predict an event in its present form. For instance, in its simplest form, a researcher might use a questionnaire to elicit diabetic patients' blood sugar readings at their last hospital follow-up visit and compare these responses with the laboratory blood glucose readings for the same patients.

Predictive validity

This assesses the ability of the questionnaire (instrument) to forecast future events, behaviour, attitudes or outcomes, and it is evaluated using a correlation coefficient. Predictive validity is the ability of a test to measure some event or outcome in the future. [ 24 ][ 28 ] A good example of predictive validity is the use of a questionnaire on hypertensive patients' medication adherence to predict a future medical outcome such as systolic blood pressure control. [ 28 ][ 29 ]

CONSTRUCT VALIDITY

Construct validity is the degree to which an instrument measures the trait or theoretical construct that it is intended to measure. [ 5 ][ 16 ][ 30 ][ 31 ][ 32 ][ 33 ][ 34 ] It does not have a criterion for comparison; rather, it utilises a hypothetical construct for comparison. [ 5 ][ 11 ][ 30 ][ 31 ][ 32 ][ 33 ][ 34 ] It is the most valuable and most difficult measure of validity. Essentially, it is a measure of how meaningful the scale or instrument is when it is in practical use. [ 5 ][ 24 ] There are four main types of evidence that can be obtained for the purpose of construct validity, depending on the research problem (with hypothesis-testing validity sometimes added as a fifth), as discussed below:

Convergent validity

Convergent validity is supported by evidence that the same concept measured in different ways yields similar results. In this case, one could include two different measures of the same concept, for example self-report versus observation. [ 12 ][ 33 ][ 34 ][ 35 ][ 36 ] The two scenarios given below illustrate this concept.

Scenario one

A researcher could place meters on respondents' television (TV) sets to record the time that people spend watching certain health programmes on TV. This record can then be compared with survey results on 'exposure to health programmes on television' obtained using a questionnaire.

Scenario two

The researcher could send someone to observe respondents' TV use at their homes and compare the observation results with the survey results obtained using a questionnaire.

Discriminant validity

Discriminant validity is supported by evidence that one concept is different from other, closely related concepts. [ 12 ][ 34 ][ 36 ] Using the TV health programme exposure scenarios above, the researcher could also measure exposure to TV entertainment programmes and determine whether these measures differ from the TV health programme exposure measures. In this case, the measures of exposure to TV health programmes should not be highly related to the measures of exposure to TV entertainment programmes.

Known-group validity

In known-group validity, a group with an already established attribute of the outcome of the construct is compared with a group in which the attribute is not present. [ 11 ][ 37 ] Since the attribute of the two groups of respondents is known, the measured construct is expected to be higher in the group with the related attribute and lower in the group without it. [ 11 ][ 36 ][ 37 ][ 38 ] For example, consider a survey that uses a questionnaire to explore depression in two groups of patients, one with a clinical diagnosis of depression and one without. In known-group validity, the depression construct in the questionnaire is expected to be scored higher among the patients with clinically diagnosed depression than among those without the diagnosis. Another example is the study by Singh et al., [ 38 ] in which a cognitive interview study was conducted among school pupils in six European countries.
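
A minimal sketch of such a known-group comparison is given below. The depression scores and group sizes are hypothetical, and the use of an independent-samples t-test is simply one illustrative way to check that the diagnosed group scores higher, not a requirement stated in the article.

```python
# Minimal sketch (hypothetical scores): known-group validity check comparing
# questionnaire depression scores between patients with and without a
# clinical diagnosis of depression.
from scipy import stats

diagnosed = [22, 25, 19, 27, 24, 21, 26]       # hypothetical scale scores
not_diagnosed = [10, 12, 9, 14, 11, 8, 13]

t_stat, p_value = stats.ttest_ind(diagnosed, not_diagnosed)
print(f"mean (diagnosed)     = {sum(diagnosed) / len(diagnosed):.1f}")
print(f"mean (not diagnosed) = {sum(not_diagnosed) / len(not_diagnosed):.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Known-group validity is supported if the diagnosed group scores
# significantly higher, as expected from the construct.
```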

Factorial validity

Factorial validity is an empirical extension of content validity, in that it validates the contents of the construct using the statistical model called factor analysis. [ 11 ][ 39 ][ 40 ][ 41 ][ 42 ] It is usually employed when the construct of interest has many dimensions that form different domains of a general attribute. In the analysis of factorial validity, the several items written to measure a particular dimension within the construct of interest should be more highly related to one another than to the items measuring other dimensions. [ 11 ][ 39 ][ 40 ][ 41 ][ 42 ] For instance, consider the health-related quality of life questionnaire Short Form-36 version 2 (SF-36v2). This tool has eight dimensions, so all the SF-36v2 items measuring social function (SF), which is one of the eight dimensions, should be more highly related to one another than to items measuring the mental health domain, which measures another dimension. [ 43 ]
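
The idea can be sketched with an exploratory factor analysis on simulated responses. The items, the two-dimension structure and the use of scikit-learn's FactorAnalysis are illustrative assumptions for this sketch, not part of the article; in practice, dedicated factor-analysis software would be used on real questionnaire data.

```python
# Minimal sketch (simulated data): factorial validity via exploratory factor
# analysis. Items 1-3 are written to measure one dimension and items 4-6
# another; we check that they load on separate factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200
dim_a = rng.normal(size=n)                    # latent score, dimension A
dim_b = rng.normal(size=n)                    # latent score, dimension B

def noise():
    return rng.normal(scale=0.5, size=n)      # item-specific error

X = np.column_stack([dim_a + noise(), dim_a + noise(), dim_a + noise(),
                     dim_b + noise(), dim_b + noise(), dim_b + noise()])

fa = FactorAnalysis(n_components=2).fit(X)
# Loadings: rows are items, columns are factors; items 1-3 should load
# mainly on one factor and items 4-6 on the other.
print(np.round(fa.components_.T, 2))
```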

Hypothesis-testing validity

Hypothesis-testing validity is supported by evidence that a research hypothesis about the relationship between the measured concept (variable) and other concepts (variables), derived from a theory, is confirmed. [ 12 ][ 44 ] In the case of TV viewing, for example, social learning theory states that violent behaviour can be learned from observing and modelling televised physical violence. From this theory, we could derive a hypothesis stating that there is a positive correlation between physical aggression and the amount of televised physical violence viewed. If the evidence collected supports the hypothesis, we can conclude that there is a high degree of construct validity in the measurements of physical aggression and viewing of televised physical violence, since the two theoretical concepts are measured and examined in the hypothesis-testing process.

METHODS USED FOR RELIABILITY TEST OF A QUESTIONNAIRE

Reliability is the extent to which a questionnaire, test, observation or any measurement procedure produces the same results on repeated trials. In short, it is the stability or consistency of scores over time or across raters. [ 7 ] Keep in mind that reliability pertains to scores, not people; thus, in research, one would never say that a person was reliable. As an example, consider judges in a platform diving competition. The extent to which they agree on the scores for each contestant is an indication of reliability. Similarly, the degree to which an individual's responses (i.e., their scores) on a survey would stay the same over time is also a sign of reliability. [ 7 ] It is worth noting that lack of reliability may arise from divergences between observers or instruments of measurement, or from instability of the attribute being measured. [ 3 ] Reliability of a questionnaire is usually assessed using a pilot test. Reliability can be assessed in three major forms: test-retest reliability, alternate-form reliability and internal consistency reliability. These are discussed below.

TEST-RETEST RELIABILITY (OR STABILITY)

Test-retest correlation provides an indication of stability over time. [ 5 ][ 12 ][ 27 ][ 37 ] This aspect of reliability, or stability, is said to occur when the same or similar scores are obtained with repeated testing of the same group of respondents. [ 5 ][ 25 ][ 35 ][ 37 ] In other words, the scores are consistent from one time to the next. Stability is assessed through a test-retest procedure that involves administering the same measurement instrument, such as a questionnaire, to the same individuals under the same conditions after some period of time. It is the most commonly used form of reliability testing for survey questionnaires.

Test-retest reliability is estimated with correlations between the scores at time 1 and those at time 2 (up to time x). Two assumptions underlie the use of the test-retest procedure: [ 12 ]

  • The first assumption is that the characteristic being measured does not change over the time period (the 'testing effect') [ 11 ]
  • The second assumption is that the interval is long enough that respondents' memories of taking the test at time 1 do not influence their scores at time 2 and at subsequent administrations (the 'memory effect'), yet short enough that the first assumption still holds.

It is measured by having the same respondents complete a survey at two different points in time to see how stable the responses are. In general, correlation coefficient (r) values are considered good if r ≥ 0.70. [ 38 ][ 45 ]
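
As a minimal illustration, the Python sketch below computes a test-retest correlation from hypothetical total scores for the same respondents at two administrations of the same questionnaire.

```python
# Minimal sketch (hypothetical scores): test-retest reliability as the
# Pearson correlation between scores from the same respondents at two
# administrations of the same questionnaire.
from scipy import stats

time_1 = [34, 28, 41, 37, 25, 30, 39, 33]   # total scores at time 1
time_2 = [36, 27, 40, 35, 27, 31, 38, 34]   # total scores at time 2 (same people)

r, p = stats.pearsonr(time_1, time_2)
print(f"test-retest r = {r:.2f} (values >= 0.70 are usually considered good)")
```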

If data are recorded by an observer, one can have the same observer make two separate measurements; the comparison between the two measurements is intra-observer reliability. In using this form of reliability, one needs to be careful with questionnaires or scales that measure variables likely to change over a short period of time, such as energy, happiness and anxiety, because of maturation effects. [ 24 ] If the researcher has to use such variables, then the test-retest must be done over a very short period of time. A potential problem with test-retest is the practice effect, whereby individuals become familiar with the items and simply answer based on their memory of their last answers. [ 45 ]

ALTERNATE-FORM RELIABILITY (OR EQUIVALENCE)

Alternate form refers to the amount of agreement between two or more research instruments, such as two different questionnaires addressing the same research construct, that are administered at nearly the same point in time. [ 7 ] It is measured through a parallel-forms procedure in which one administers alternative forms of the same measure to either the same group or a different group of respondents. It uses differently worded questionnaires to measure the same attribute or construct. [ 45 ] Questions or responses are reworded, or their order is changed, to produce two items that are similar but not identical. The administration of the various forms occurs at the same time or following some time delay. The higher the degree of correlation between the two forms, the more equivalent they are. In practice, the parallel-forms procedure is seldom implemented, as it is difficult, if not impossible, to verify that two tests are indeed parallel (i.e., have equal means, variances and correlations with other measures). Indeed, it is difficult enough to develop one good instrument or questionnaire to measure the construct of interest, let alone two. [ 7 ]

Another situation in which equivalence will be important is when the measurement process entails subjective judgements or ratings being made by more than one person. [ 5 7 ] Say, for example, that we are a part of a research team whose purpose is to interview people concerning their attitudes towards health educational curriculum for children. It should be self-evident to the researcher that each rater should apply the same standards towards the assessment of the responses. The same can be said for a situation in which multiple individuals are observing health behaviour. The observers should agree as to what constitutes the presence or absence of a particular health behaviour as well as the level to which the behaviour is exhibited. In these scenarios, equivalence is demonstrated by assessing inter-observer reliability which refers to the consistency with which observers or raters make judgements. [ 7 ]

The procedure for determining inter-observer reliability is:

(Number of agreements / number of opportunities for agreement) × 100.

Thus, a situation in which raters agree a total of 75 times out of 90 opportunities (i.e., unique observations or ratings) produces 83% agreement: 75/90 = 0.83, and 0.83 × 100 = 83%.
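
The same calculation can be scripted. The sketch below uses hypothetical ratings from two raters and applies the percentage-agreement formula above.

```python
# Minimal sketch (hypothetical ratings): percentage agreement between two
# raters, i.e. agreements / opportunities for agreement x 100.
rater_1 = ["yes", "no", "yes", "yes", "no", "yes"]
rater_2 = ["yes", "no", "no",  "yes", "no", "yes"]

agreements = sum(a == b for a, b in zip(rater_1, rater_2))
opportunities = len(rater_1)
percent_agreement = agreements / opportunities * 100
print(f"{agreements}/{opportunities} -> {percent_agreement:.0f}% agreement")
```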

INTERNAL CONSISTENCY RELIABILITY (OR HOMOGENEITY)

Internal consistency concerns the extent to which items on the test or instrument are measuring the same thing. The appeal of an internal consistency index of reliability is that it is estimated after only one test administration and therefore avoids the problems associated with testing over multiple time periods. [ 5 ] Internal consistency is estimated via the split-half reliability index [ 5 ] and the coefficient alpha index, [ 22 ][ 23 ][ 25 ][ 37 ][ 42 ][ 46 ][ 47 ][ 48 ][ 49 ] which is the most commonly used form of internal consistency reliability. Sometimes the Kuder-Richardson formula 20 (KR-20) index is used. [ 7 ][ 50 ]

The split-half estimate entails dividing up the test into two parts (e.g. odd/even items or first half of the items/second half of the items), administering the two forms to the same group of individuals and correlating the responses. [ 7 10 ] Coefficient alpha and KR-20 both represent the average of all possible split-half estimates. The difference between the two is when they would be used to assess reliability. Specifically, coefficient alpha is typically used during scale development with items that have several response options (i.e., 1 = strongly disagree to 5 = strongly agree) whereas KR-20 is used to estimate reliability for dichotomous (i.e., yes/no; true/false) response scales. [ 7 ]

The formula to compute KR-20 is:

KR-20 = [n/(n − 1)] × [1 − Sum(piqi)/Var(X)]

where:

n = total number of items

Sum(piqi) = the sum, across items, of pi × qi, where pi is the proportion of respondents endorsing (or answering correctly) item i and qi = 1 − pi

Var(X) = variance of the composite (total) scores.
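
A minimal sketch of the KR-20 calculation on hypothetical dichotomous item data is shown below; the data matrix is invented, and the degrees-of-freedom convention used for the total-score variance is an assumption, so values reported by different packages may differ slightly.

```python
# Minimal sketch (hypothetical data): KR-20 for dichotomous (0/1) items,
# following the formula above. Rows are respondents, columns are items.
import numpy as np

X = np.array([[1, 1, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 1, 1],
              [0, 0, 0, 1, 0],
              [1, 1, 1, 0, 1]])

n_items = X.shape[1]
p = X.mean(axis=0)                        # proportion endorsing each item
q = 1 - p
var_total = X.sum(axis=1).var(ddof=1)     # Var(X): variance of total scores
kr20 = n_items / (n_items - 1) * (1 - (p * q).sum() / var_total)
print(f"KR-20 = {kr20:.2f}")
```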

Coefficient alpha (α), following Allen and Yen (1979), is calculated as: [ 51 ]

α = [n/(n − 1)] × [1 − Sum Var(Yi)/Var(X)]

where:

n = number of items

Sum Var(Yi) = sum of the item variances

Var(X) = variance of the composite (total) scores.
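
The sketch below applies this formula to hypothetical Likert-type responses; the data matrix is invented for illustration.

```python
# Minimal sketch (hypothetical Likert responses): coefficient (Cronbach's)
# alpha following the formula above. Rows are respondents, columns are items.
import numpy as np

X = np.array([[4, 5, 4, 4],
              [3, 3, 4, 3],
              [5, 5, 5, 4],
              [2, 3, 2, 3],
              [4, 4, 5, 4]])

n_items = X.shape[1]
item_vars = X.var(axis=0, ddof=1)          # Var(Yi) for each item
total_var = X.sum(axis=1).var(ddof=1)      # Var(X): variance of total scores
alpha = n_items / (n_items - 1) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```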

It should be noted that KR-20 and Cronbach's alpha can easily be estimated using statistical analysis software, so researchers do not have to work through the formulas above by hand. As a rule of thumb, the higher the reliability value, the more reliable the measure. The general convention in research, prescribed by Nunnally and Bernstein, [ 52 ] is that one should strive for reliability values of 0.70 or higher. It is worth noting that reliability values increase as test length increases; [ 53 ] that is, the more items we have in our scale to measure the construct of interest, the more reliable our scale will tend to become. However, the problem with simply increasing the number of scale items in applied research is that respondents are less likely to participate and answer completely when confronted with a lengthy questionnaire. [ 7 ] Therefore, the best approach is to develop a scale that completely measures the construct of interest and yet does so in as parsimonious or economical a manner as possible. A well-developed yet brief scale may lead to higher levels of respondent participation and more complete responses, so that one acquires a rich pool of data with which to answer the research question.

SHORT NOTE ON SPSS AND RELIABILITY TEST

Reliability can be established using a pilot test by collecting data from 20 to 30 subjects not included in the main sample. Data collected from the pilot test can be analysed using SPSS (Statistical Package for the Social Sciences, IBM Corporation) or other related software. SPSS provides two key pieces of information in the output viewer: the correlation matrix and the 'alpha if item deleted' column. [ 54 ][ 55 ] Cronbach's alpha (α) is the most commonly used measure of internal consistency reliability, [ 45 ] so it is discussed here. Conditions that can affect Cronbach's alpha values are: [ 54 ][ 55 ]

  • Number of items: a scale with fewer than 10 items can cause Cronbach's alpha to be low
  • Distribution of scores: normally distributed scores increase the Cronbach's alpha value, while skewed data reduce it
  • Timing: Cronbach's alpha does not indicate the stability or consistency of the test over time
  • Wording of the items: negatively worded items should be reverse-scored before analysis
  • Items with 0, 1 and negative scores: ensure that items/statements scored as 0s, 1s and negatives are eliminated.

The detailed step-by-step procedure for reliability analysis using SPSS can be found on the internet and in standard texts. [ 54 ][ 55 ] Note that the reliability coefficient (alpha) can range from 0 to 1, with 0 representing a questionnaire that is not reliable at all and 1 representing an absolutely reliable questionnaire. A reliability coefficient (alpha) of 0.70 or higher is considered acceptable.
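
For readers not using SPSS, the hypothetical sketch below reproduces the idea behind the 'alpha if item deleted' column by recomputing alpha with each item removed in turn; the data are invented, and item 4 is deliberately written to be inconsistent with the others so that dropping it should raise alpha.

```python
# Minimal sketch (hypothetical data): "alpha if item deleted", i.e. Cronbach's
# alpha recomputed with each item left out, one at a time.
import numpy as np

def cronbach_alpha(X):
    n_items = X.shape[1]
    return n_items / (n_items - 1) * (
        1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

X = np.array([[4, 5, 4, 2],
              [3, 3, 4, 5],
              [5, 5, 5, 1],
              [2, 3, 2, 4],
              [4, 4, 5, 2]])   # item 4 is deliberately inconsistent

print(f"alpha (all items) = {cronbach_alpha(X):.2f}")
for i in range(X.shape[1]):
    reduced = np.delete(X, i, axis=1)
    print(f"alpha if item {i + 1} deleted = {cronbach_alpha(reduced):.2f}")
```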

This article has reviewed the validity and reliability of the questionnaire as an important research tool in social and health science research. It noted the importance of validity and reliability tests in research and gave both literary and technical meanings of these tests. Various forms and methods of analysing the validity and reliability of questionnaires were discussed, with the main aim of improving the skills and knowledge of these tests among researchers in developing countries.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.


Keywords: questionnaire; reliability; social and health; validity.


Reliability and Validity

Reliability and validity are important aspects of selecting a survey instrument.  Reliability refers to the extent that the instrument yields the same results over multiple trials.  Validity refers to the extent that the instrument measures what it was designed to measure.  In research, there are three ways to approach validity and they include content validity, construct validity, and criterion-related validity.

Content validity measures the extent to which the items that comprise the scale accurately represent or measure the information that is being assessed.  Are the questions that are asked representative of the possible questions that could be asked?

Construct validity measures what the calculated scores mean and whether they can be generalized.  Construct validity uses statistical analyses, such as correlations, to verify the relevance of the questions.  Questions from an existing, similar instrument that has been found reliable can be correlated with questions from the instrument under examination to determine if construct validity is present.  If the scores are highly correlated, it is called convergent validity.  If convergent validity exists, construct validity is supported.

Criterion-related validity has to do with how well the scores from the instrument predict a known outcome they are expected to predict.  Statistical analyses, such as correlations, are used to determine if criterion-related validity exists.  Scores from the instrument in question should be correlated with an item they are known to predict.  If a correlation of > .60 exists, criterion-related validity exists as well.

Reliability can be assessed with the test-retest method, alternative form method, internal consistency method, the split-halves method, and inter-rater reliability.


Test-retest is a method that administers the same instrument to the same sample at two different points in time, perhaps at one-year intervals.  If the scores at both time periods are highly correlated, > .60, they can be considered reliable.  The alternative form method requires two different instruments consisting of similar content.  The same sample must take both instruments, and the scores from the two instruments are correlated; if the correlation is high, the instrument is considered reliable.  Internal consistency uses one instrument administered only once.  The coefficient alpha (or Cronbach’s alpha) is used to assess the internal consistency of the items; if the alpha value is .70 or higher, the instrument is considered reliable.  The split-halves method also requires one test administered once.  The items in the scale are divided into halves, and the correlation between the halves estimates the reliability of half of the test; to estimate the reliability of the entire survey, the Spearman-Brown correction must be applied.  Inter-rater reliability involves comparing the observations of two or more individuals and assessing the agreement of the observations; kappa values can be calculated in this instance.
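
A minimal sketch of the split-half procedure with the Spearman-Brown correction, using hypothetical item responses, is shown below; the odd/even split is one common choice among several.

```python
# Minimal sketch (hypothetical data): split-half reliability with the
# Spearman-Brown correction. Items are split into odd- and even-numbered
# halves, half scores are correlated, and the correlation is stepped up.
import numpy as np
from scipy import stats

X = np.array([[4, 5, 4, 4, 5, 4],
              [3, 3, 4, 3, 2, 3],
              [5, 5, 5, 4, 5, 5],
              [2, 3, 2, 3, 2, 2],
              [4, 4, 5, 4, 4, 4]])            # respondents x items

odd_half = X[:, 0::2].sum(axis=1)             # items 1, 3, 5
even_half = X[:, 1::2].sum(axis=1)            # items 2, 4, 6
r_half, _ = stats.pearsonr(odd_half, even_half)
spearman_brown = 2 * r_half / (1 + r_half)    # reliability of the full test
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {spearman_brown:.2f}")
```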


Research Instrument


A research instrument is a tool or device used by researchers to collect, measure, and analyze data relevant to their study. Common examples include surveys, questionnaires, tests, and observational checklists. These instruments are essential for obtaining accurate, reliable, and valid data, enabling researchers to draw meaningful conclusions and insights. The selection of an appropriate research instrument is crucial, as it directly impacts the quality and integrity of the research findings.

What is a Research Instrument?

A research instrument is a tool used by researchers to collect and analyze data. Examples include surveys, questionnaires, and observation checklists. Choosing the right instrument is essential for ensuring accurate and reliable data.

Examples of Research Instruments


  • Surveys: Structured questionnaires designed to gather quantitative data from a large audience.
  • Questionnaires: Sets of written questions used to collect information on specific topics.
  • Interviews: Structured or semi-structured conversations used to obtain in-depth qualitative data.
  • Observation Checklists: Lists of specific behaviors or events that researchers observe and record.
  • Tests: Standardized exams used to assess knowledge, skills, or abilities.
  • Scales: Tools like Likert scales to measure attitudes, perceptions, or opinions.
  • Diaries: Participant logs documenting activities or experiences over time.
  • Focus Groups: Group discussions facilitated to explore collective views and experiences.

Examples of Quantitative Research Instruments

  • Structured Surveys: These are detailed questionnaires with predefined questions and response options, designed to collect numerical data from a large sample. They are often used in market research and social sciences to identify trends and patterns.
  • Standardized Tests: These are assessments that measure specific knowledge, skills, or abilities using uniform procedures and scoring methods. Examples include IQ tests, academic achievement tests, and professional certification exams.
  • Closed-Ended Questionnaires: These questionnaires contain questions with a limited set of response options, such as multiple-choice or yes/no answers. They are useful for gathering specific, quantifiable data efficiently.
  • Rating Scales: These tools ask respondents to rate items on a fixed scale, such as 1 to 5 or 1 to 10. They are commonly used to measure attitudes, opinions, or satisfaction levels.
  • Structured Observation Checklists: These checklists outline specific behaviors or events that researchers observe and record in a systematic manner. They are often used in studies where direct observation is needed to gather quantitative data.
  • Statistical Data Collection Tools: These include various instruments and software used to collect and analyze numerical data, such as spreadsheets, databases, and statistical analysis programs like SPSS or SAS.
  • Likert Scales: A type of rating scale commonly used in surveys to measure attitudes or opinions. Respondents indicate their level of agreement or disagreement with a series of statements on a scale, such as “strongly agree” to “strongly disagree.”

Examples of Qualitative Research Instruments

  • Open-Ended Interviews: These interviews involve asking participants broad, open-ended questions to explore their thoughts, feelings, and experiences in depth. This method allows for rich, detailed data collection.
  • Focus Groups: A small, diverse group of people engage in guided discussions to provide insights into their perceptions, opinions, and attitudes about a specific topic. Focus groups are useful for exploring complex behaviors and motivations.
  • Unstructured Observation: Researchers observe participants in their natural environment without predefined criteria, allowing them to capture spontaneous behaviors and interactions in real-time.
  • Case Studies: In-depth investigations of a single individual, group, event, or community. Case studies provide comprehensive insights into the subject’s context, experiences, and development over time.
  • Ethnographic Studies: Researchers immerse themselves in the daily lives of participants to understand their cultures, practices, and perspectives. This method often involves long-term observation and interaction.
  • Participant Diaries: Participants keep detailed, personal records of their daily activities, thoughts, and experiences over a specific period. These diaries provide firsthand insights into participants’ lives.
  • Field Notes: Researchers take detailed notes while observing participants in their natural settings. Field notes capture contextual information, behaviors, and interactions that are often missed in structured observations.
  • Narrative Analysis: This method involves analyzing stories and personal accounts to understand how people make sense of their experiences and the world around them.
  • Content Analysis: Researchers systematically analyze textual, visual, or audio content to identify patterns, themes, and meanings. This method is often used for analyzing media, documents, and online content.
  • Document Analysis: Researchers review and interpret existing documents, such as reports, letters, or official records, to gain insights into the context and background of the research subject.

Characteristics of a Good Research Instrument

  • Validity: A good research instrument accurately measures what it is intended to measure. This ensures that the results are a true reflection of the concept being studied.
  • Reliability: The instrument produces consistent results when used repeatedly under similar conditions. This consistency is crucial for the credibility of the research findings.
  • Objectivity: The instrument should be free from researcher bias, ensuring that results are based solely on the data collected rather than subjective interpretations.
  • Sensitivity: The instrument is capable of detecting subtle differences or changes in the variable being measured, allowing for more nuanced and precise data collection.
  • Practicality: It is easy to administer, score, and interpret. This includes being time-efficient, cost-effective, and user-friendly for both researchers and participants.
  • Ethical Considerations: The instrument respects the rights and confidentiality of participants, ensuring informed consent and protecting their privacy throughout the research process.
  • Comprehensiveness: It covers all relevant aspects of the concept being studied, providing a complete and thorough understanding of the research topic.
  • Adaptability: The instrument can be modified or adapted for different contexts, populations, or research settings without losing its effectiveness.
  • Clarity: The questions or items in the instrument are clearly worded and unambiguous, ensuring that participants understand what is being asked without confusion.
  • Cultural Sensitivity: The instrument is appropriate for the cultural context of the participants, avoiding language or content that may be misinterpreted or offensive.

Research Instrument Questionnaire

A questionnaire is a versatile and widely used research instrument composed of a series of questions aimed at gathering information from respondents. It is designed to collect both quantitative and qualitative data through a mix of open-ended and closed-ended questions. Open-ended questions allow respondents to express their thoughts in their own words, providing rich, detailed insights, while closed-ended questions offer predefined response options, facilitating easier statistical analysis. Questionnaires can be administered in various formats, including paper-based, online, or via telephone, making them accessible to a wide audience and suitable for large-scale studies.

The design of a questionnaire is crucial to its effectiveness. Clear, concise, and unbiased questions are essential to ensure reliable and valid results. A well-crafted questionnaire minimizes respondent confusion and reduces the risk of biased answers, which can skew data. Moreover, the order and wording of questions can significantly impact the quality of the responses. Properly designed questionnaires are invaluable tools for a range of research purposes, from market research and customer satisfaction surveys to academic studies and social science research. They enable researchers to gather a broad spectrum of data efficiently and effectively, making them a cornerstone of data collection in many fields.

Research Instrument Sample Paragraph

A research instrument is a vital tool used by researchers to collect, measure, and analyze data from participants. These instruments vary widely and include questionnaires, surveys, interviews, observation checklists, and standardized tests, each serving distinct research needs. For example, questionnaires and surveys are commonly employed to gather quantitative data from large groups, providing statistical insights into trends and patterns. In contrast, interviews and focus groups are used to delve deeper into participants' experiences and perspectives, yielding rich qualitative data. The careful selection and design of a research instrument are crucial, as they directly impact the accuracy, reliability, and validity of the collected data.

How to Make a Research Instrument

Creating an effective research instrument involves several key steps to ensure it accurately collects and measures the necessary data for your study:

1. Define the Research Objectives

  • Identify the Purpose : Clearly outline what you aim to achieve with your research.
  • Specify the Variables : Determine the specific variables you need to measure.

2. Review Existing Instruments

  • Literature Review : Look at existing studies and instruments used in similar research.
  • Evaluate Suitability : Assess if existing instruments can be adapted for your study.

3. Select the Type of Instrument

  • Choose the Format : Decide whether a survey, questionnaire, interview guide, test, or observation checklist best fits your needs.
  • Determine the Method : Consider whether your data collection will be qualitative, quantitative, or mixed-methods.

4. Develop the Content

  • Draft Questions or Items : Write questions that align with your research objectives and variables.
  • Ensure Clarity and Relevance : Make sure each question is clear, concise, and directly related to the research objectives.
  • Use Simple Language : Avoid jargon to ensure respondents understand the questions.

5. Validate the Instrument

  • Expert Review : Have experts in your field review the instrument for content validity.
  • Pilot Testing : Conduct a pilot test with a small, representative sample to identify any issues.

6. Refine the Instrument

  • Revise Based on Feedback : Modify the instrument based on feedback from experts and pilot testing.
  • Check for Reliability : Ensure the instrument consistently measures what it is supposed to.

7. Finalize the Instrument

  • Create Instructions : Provide clear instructions for respondents on how to complete the instrument.
  • Format Appropriately : Ensure the layout is user-friendly and the instrument is easy to navigate.

8. Implement and Collect Data

  • Administer the Instrument : Distribute your instrument to the target population.
  • Monitor Data Collection : Ensure the data collection process is conducted consistently.

FAQs

How do you choose a research instrument?

Select based on your research goals, type of data needed, and the target population.

What is the difference between qualitative and quantitative research instruments?

Qualitative instruments collect non-numerical data, while quantitative instruments collect numerical data.

Can you use multiple research instruments in one study?

Yes, using multiple instruments can provide a more comprehensive understanding of the research problem.

How do you ensure the reliability of a research instrument?

Test the instrument multiple times under the same conditions to check for consistent results.

What is the validity of a research instrument?

Validity refers to how well an instrument measures what it is intended to measure.

How can you test the validity of a research instrument?

Use methods like content validity, criterion-related validity, and construct validity to test an instrument.

What is a pilot study?

A pilot study is a small-scale trial run of a research instrument to identify any issues before the main study.

Why is a pilot study important?

It helps refine the research instrument and improve its reliability and validity.

What is an unstructured interview?

An unstructured interview allows more flexibility, with open-ended questions that can adapt based on responses.

What is the role of observation in research?

Observation allows researchers to collect data on behaviors and events in their natural settings.




