Reliability analysis of binary outcomes: sample size and calculations of kappa statistic
https://doi.org/10.22328/2413-5747-2023-9-3-102-112
Abstract
Reliability analysis is an important methodological tool in medical research for assessing the degree of agreement between measurements obtained by different methods or by different investigators. This article gives an accessible overview of the basic concepts of reliability analysis and of the statistical criteria used to apply it in biomedical research. It also outlines the similarities and differences between validity analysis and reliability analysis. The calculation of Cohen’s kappa is demonstrated for the simplest case of two researchers and a binary variable, both by formula and in SPSS. The advantages and disadvantages of the kappa statistic are discussed. The article is intended for novice researchers and young scientists and will be useful for planning research projects and training data collectors.
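The formula-based calculation mentioned in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the standard Cohen's kappa for two raters and a binary outcome, kappa = (p_o - p_e) / (1 - p_e), and not a reproduction of the article's worked example: the function name `cohen_kappa_2x2`, the cell labels `a`, `b`, `c`, `d`, and the counts in the usage lines are all hypothetical.

```python
def cohen_kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two raters and a binary outcome.

    Cell labels are an assumption made for this sketch:
      a: both raters say "positive"
      b: rater 1 "positive", rater 2 "negative"
      c: rater 1 "negative", rater 2 "positive"
      d: both raters say "negative"
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")

    # Observed agreement: share of subjects on the main diagonal.
    p_o = (a + d) / n

    # Agreement expected by chance, from the marginal totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2

    if p_e == 1:
        # Both raters used a single category for every subject.
        raise ValueError("kappa is undefined when expected agreement is 1")

    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts, chosen only to make the arithmetic easy to follow:
# p_o = 35/50 = 0.7, p_e = (25*30 + 25*20)/2500 = 0.5, kappa = 0.2/0.5 = 0.4
kappa = cohen_kappa_2x2(a=20, b=5, c=10, d=15)
print(round(kappa, 3))
```

With these made-up counts the raters agree on 70% of subjects, yet kappa is only 0.4 because half of that agreement would be expected by chance alone.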
About the Authors
Ekaterina A. Mitkina
Russian Federation
student of the faculty of dentistry
Yulia G. Kozlova
Russian Federation
student of the faculty of dentistry
Maria A. Gorbatova
Russian Federation
Cand. of Sci. (Med.), MPH, Associate professor at the Department of Pediatric Dentistry
Andrej M. Grjibovski
Russian Federation
Dr. of Sci. (Med.), Master of International Community Health, Head of the Directorate for Research and Innovations, Director of the Central Scientific Research Laboratory; Professor at the Department of Public Health, Public Health, General Hygiene and Bioethics
Supplementary files
1. Fig. 1. Two-by-two table for calculating coefficients used in reliability analysis.
2. Fig. 2. Two-by-two table for manual calculation of kappa statistic.
3. Fig. 3. Example from Table 1 in SPSS data window.
4. Fig. 4. Dialog box for crosstabulation.
5. Fig. 5. Dialog box Crosstabs: Statistics with selection of kappa statistic.
6. Fig. 6. Contingency table in SPSS with the data from Table 1.
7. Fig. 7. SPSS output with the results of kappa statistic calculation.
8. Fig. 8. Sample size for kappa statistic calculation.
Review
For citations:
Mitkina E.A., Kozlova Yu.G., Gorbatova M.A., Grjibovski A.M. Reliability analysis of binary outcomes: sample size and calculations of kappa statistic. Marine Medicine. 2023;9(3):102-112. (In Russ.) https://doi.org/10.22328/2413-5747-2023-9-3-102-112