Comment on Reliability and validity of the Chinese version of the Mild Behavioral Impairment Checklist for screening for Alzheimer’s disease

6 September 2019

I read with interest the study of Cui et al. [1] assessing the reliability and validity of the Chinese version of the Mild Behavioral Impairment Checklist (MBI-C) in Alzheimer’s disease (AD) patients. The authors address key elements of the validation process. However, their conclusions, namely that the Chinese MBI-C has high validity and reliability and is superior to the Neuropsychiatric Inventory Questionnaire (NPI-Q), are unfortunately weakened by some persistent misconceptions about scale validation in general and by some common issues in neurodegenerative disease research. I raise these issues because the validity of neuropsychiatric symptom measures may have implications for clinical trials and diagnostics.

The authors present Cronbach’s alphas for the MBI-C: 0.936 for the entire scale, and 0.878, 0.837, 0.863, 0.664 and 0.824 for the five subscales, respectively [1]. For decades, psychometricians and methodologists have encouraged researchers to avoid relying on Cronbach’s alpha for scale validation, because its assumptions are hardly ever met or even tested: unidimensionality, tau equivalence, normally distributed items and uncorrelated item errors [2–4]. Alpha also tends to increase with the number of items, which means that a “good” alpha may arise in multidimensional data with low inter-item correlations simply because the scale has many items [2,3]. The 34 MBI-C items, which should theoretically represent five constructs rather than one, could be one such scenario, in which an alpha above 0.9 was observed without being psychometrically informative.
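To make the dependence on the number of items concrete, the following Python sketch (an editorial illustration, not an analysis from either study) uses the standardized form of alpha, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean inter-item correlation, and also computes alpha directly from a correlation matrix. All correlation values and the split of 34 items into five clusters are hypothetical and chosen only for illustration; they are not estimates from the MBI-C data.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def alpha_from_correlations(r):
    """Cronbach's alpha from a k x k item correlation (or covariance) matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score),
    where the total-score variance is the sum of all matrix entries.
    """
    k = len(r)
    item_var = sum(r[i][i] for i in range(k))
    total_var = sum(sum(row) for row in r)
    return k / (k - 1) * (1 - item_var / total_var)


def block_correlation_matrix(sizes, r_within, r_between):
    """Hypothetical multidimensional correlation matrix: items in the same cluster
    correlate r_within, items in different clusters correlate r_between."""
    labels = [c for c, n in enumerate(sizes) for _ in range(n)]
    k = len(labels)
    return [
        [1.0 if i == j else (r_within if labels[i] == labels[j] else r_between)
         for j in range(k)]
        for i in range(k)
    ]


if __name__ == "__main__":
    # Same weak mean inter-item correlation (0.2), growing number of items.
    for k in (5, 10, 20, 34):
        print(k, round(standardized_alpha(k, 0.2), 3))

    # Five hypothetical clusters totalling 34 items: strong correlations within
    # clusters, weak correlations between them (clearly not unidimensional).
    sizes = [6, 6, 12, 5, 5]
    r = block_correlation_matrix(sizes, r_within=0.5, r_between=0.15)
    print("five-cluster alpha:", round(alpha_from_correlations(r), 3))
```

Under these assumed values, holding r̄ at 0.2 while increasing k from 5 to 34 raises alpha from about 0.56 to about 0.89, and the hypothetical five-cluster matrix, in which items from different clusters correlate only 0.15, still yields an alpha of roughly 0.91.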

The authors acknowledge the need to conduct a confirmatory factor analysis in a larger sample to assess construct validity in future work, as their current sample of 96 individuals (50 controls, 46 AD patients) is meager compared with common recommendations [5]. However, the authors used principal components analysis to extract seven principal components, which they appear to interpret as evidence of construct validity, even though a seven-component solution is at odds with their computation of Cronbach’s alphas for the original five subscales. In short, the construct validity evidence for the Chinese MBI-C remains inconclusive, because the small sample renders confirmatory factor analysis inapplicable. Curiously enough, the study did provide some data on reliability, as well as on sensitivity and specificity for AD diagnosis, which lends some merit to the authors’ notion that the MBI-C could serve as an auxiliary part of the AD diagnostic toolkit in China, even without knowing whether it actually measures what it is supposed to measure. However, problems arise when phrases such as ‘MBI-C has high reliability and validity’ [1] are taken out of context.

The interplay between painstaking data collection and the urgent demand for more sensitive methods is ever present in scale development and validation for neurodegenerative diseases. However, it should be borne in mind that only with large and representative samples should we presume to have validity evidence for a scale. It benefits both researchers and patients if we allocate enough resources to conduct validation studies capable of providing convincing validity evidence for a scale in at least one setting. In general, accessible, non-technical guidelines for developing and validating scales (e.g., [5,6]) should be preferred over heuristics or adherence to the sometimes questionable practices set forth by previous measures.

Toni T. Saari
Institute of Clinical Medicine, Neurology, School of Medicine, University of Eastern Finland, Kuopio, Finland
School of Educational Sciences and Psychology, University of Eastern Finland, Joensuu, Finland

[1] Cui Y, Dai S, Miao Z, Zhong Y, Liu Y, Liu L, Jing D, Bai Y, Kong Y, Sun W, Li F, Guo Q, Rosa-Neto P, Gauthier S, Wu L (2019) Reliability and validity of the Chinese version of the Mild Behavioral Impairment Checklist for screening for Alzheimer’s disease. J Alzheimers Dis 70, 747–756.
[2] Cortina JM (1993) What is coefficient alpha? An examination of theory and applications. J Appl Psychol 78, 98–104.
[3] Tavakol M, Dennick R (2011) Making sense of Cronbach’s alpha. Int J Med Educ 2, 53–55.
[4] McNeish D (2017) Thanks coefficient alpha, we’ll take it from here. Psychol Methods 23, 1–23.
[5] Clark LA, Watson D (1995) Constructing validity: basic issues in objective scale development. Psychol Assess 7, 309–319.
[6] Flake JK, Pek J, Hehman E (2017) Construct validation in social and personality research: current practice and recommendations. Soc Psychol Personal Sci 8, 370–378.


We have carefully read the Letter to the Editor by Toni T. Saari, who made several suggestions regarding our recently published article. We have rechecked our methods for calculating internal consistency reliability and construct validity, and provide the following response.

As the author notes, alpha increases with the number of items, and the high Cronbach’s alpha coefficient of the whole scale may be related to its large number of items. We also calculated alpha for each subscale and obtained good results for most of the dimensions. Only one coefficient was merely fair (greater than 0.6 and less than 0.7), and the reasons for this are explained in the Discussion of our article. Indeed, some studies show that there are better alternatives to Cronbach’s alpha for estimating the reliability of a scale; in particular, omega is more suitable for multidimensional scales and for items that are not tau-equivalent. However, many studies also note that, when tau equivalence does not hold, Cronbach’s alpha tends to underestimate rather than overestimate reliability compared with omega [1–3]. Cronbach’s alpha also remains the most widely used reliability coefficient today.
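As an editorial illustration of the point about tau equivalence (not an analysis reported by either party), the Python sketch below computes alpha and a model-based omega from the covariance matrix implied by a one-factor model with known, hypothetical parameters. It assumes a factor variance of 1 and item variances of 1; the loadings and the residual covariance are invented for illustration. With uncorrelated errors, alpha equals omega when all loadings are equal (tau equivalence) and falls below omega when they differ. The last scenario adds one correlated error pair, violating the uncorrelated-errors assumption listed in the preceding letter, to show that alpha can then exceed the true reliability.

```python
def implied_covariance(loadings, error_vars, error_cov=None):
    """Covariance matrix implied by a one-factor model with factor variance 1.

    error_cov optionally maps an item pair (i, j) to a residual covariance,
    which violates the uncorrelated-errors assumption.
    """
    k = len(loadings)
    cov = [[loadings[i] * loadings[j] for j in range(k)] for i in range(k)]
    for i in range(k):
        cov[i][i] += error_vars[i]
    for (i, j), c in (error_cov or {}).items():
        cov[i][j] += c
        cov[j][i] += c
    return cov


def cronbach_alpha(cov):
    """Cronbach's alpha from an item covariance matrix."""
    k = len(cov)
    item_var = sum(cov[i][i] for i in range(k))
    total_var = sum(sum(row) for row in cov)
    return k / (k - 1) * (1 - item_var / total_var)


def omega_total(loadings, cov):
    """Model-based omega: true-score variance over total-score variance."""
    true_var = sum(loadings) ** 2
    total_var = sum(sum(row) for row in cov)
    return true_var / total_var


if __name__ == "__main__":
    # Hypothetical congeneric (tau non-equivalent) loadings; item variances fixed at 1.
    congeneric = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3]
    congeneric_errors = [1 - lam_i ** 2 for lam_i in congeneric]

    scenarios = {
        "tau-equivalent": ([0.7] * 6, [1 - 0.7 ** 2] * 6, None),
        "congeneric": (congeneric, congeneric_errors, None),
        # Residual covariance between the 4th and 5th items (zero-based 3 and 4).
        "congeneric + correlated errors": (congeneric, congeneric_errors, {(3, 4): 0.3}),
    }
    for name, (lam, theta, ecov) in scenarios.items():
        cov = implied_covariance(lam, theta, ecov)
        print(f"{name}: alpha={cronbach_alpha(cov):.3f}, omega={omega_total(lam, cov):.3f}")
```

With the assumed congeneric loadings, alpha is about 0.76 against an omega of about 0.78; adding a residual covariance of 0.3 between the fourth and fifth items pushes alpha to about 0.78 while the true reliability drops to about 0.76.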

In addition, we have acknowledged from the outset that the construct validity results of the study are not ideal, which may be related to subject selection and cultural differences. We also hope to expand the sample in future studies in order to perform confirmatory factor analysis and optimize the scale items. Although the content validity and criterion validity of the scale are good, the conclusion that the Mild Behavioral Impairment Checklist (MBI-C) has high reliability and validity is not accurate and may cause misunderstanding. Because the construct validity is not ideal, it is indeed imprecise to draw this conclusion directly.

We thank Professor Toni T. Saari for the attention paid to our research and for the useful suggestions. Our study aimed to explore whether the MBI-C, as a new scale for assessing behavioral impairment, can replace the Neuropsychiatric Inventory Questionnaire (NPI-Q) as an effective tool for screening patients with Alzheimer’s disease (AD). Although the small sample is a limitation of this study, the MBI-C still demonstrated superior screening performance. Considering that the internal consistency of each dimension of the scale and the structural validity are not sufficiently ideal, we sincerely accept the professor’s suggestion and modify the conclusion of our study as follows: "This study showed that the Chinese version of the MBI-C has good reliability and validity, and could be used as an alternative scale to the NPI-Q for AD dementia screening in the Chinese population, but further large-sample studies examining its construct validity are necessary". We hope to address the shortcomings of this study in future work, and we still believe that the MBI-C has good prospects for implementation in screening patients with AD in China.

Yue Cui, Fang Li, and Liyong Wu


[1] Deng L, Chan W (2017) Testing the difference between reliability coefficients alpha and omega. Educ Psychol Meas 77, 185–203.
[2] Peterson RA, Kim Y (2013) On the relationship between coefficient alpha and composite reliability. J Appl Psychol 98, 194–198.
[3] Tavakol M, Dennick R (2011) Making sense of Cronbach’s alpha. Int J Med Educ 2, 53–55.