Abstract
Historical relationship between quantitative and qualitative research
Newton's era and the Renaissance established two permanently important worldviews from which modern traditions and methods of inquiry have proceeded. These two worldviews, and the position of scientists within them, roughly characterize quantitative and qualitative research. The quantitative worldview posits an objective reality characterized by stable, predictable, and commensurable phenomena that operate through the laws of cause and effect. The qualitative worldview emphasizes a subjective reality characterized by complexity, apparently infinite variation, and incommensurability without perturbation. The laws of cause and effect still operate, but always within a unique set of circumstances determined by multiple factors (Thomas, 1996).
Historically it might have been difficult for many health care practitioners and researchers, the majority with educational backgrounds strictly in the natural sciences or biology, to take a qualitative approach, because qualitative methods are rooted in social science (Pope & Mays, 2001). The wide acceptance of qualitative research might also have been slowed by the influence of views such as those of Kuhn (1970) and Bernstein and Freeman (1975), who believed quantitative prediction to be preferable to qualitative prediction. They considered research that included both quantitative and qualitative data to be methodologically less valuable than research that employed quantitative data only.
The scientific community was often seen as divided into two groups based on the use of quantitative versus qualitative methods, and the schism between these seemingly conflicting traditions appeared profound and unlikely to be bridged. In recent years, however, in the fields of both social sciences and health care, researchers have come to believe that rigidly separating quantitative and qualitative research serves little purpose (Abell, 1990; Barbour, 1999; Hammersley, 1992; Mechanic, 1989; Pearlin, 1992).
The realization that quantitative and qualitative methods are not incompatible was an important one. Although the two approaches are useful on their own, they are also powerful when used in conjunction with each other. For example, qualitative research can be used to narrow a research focus, such as in developing a questionnaire, in preparation for the use of quantitative methods. Qualitative research is also useful for deeper interpretation of quantitative findings; for instance, large variances in quantitative analyses of response data can be explained through focused interviews. Mixed-method research (Johnson & Onwuegbuzie, 2004; O'Byrne, 2007), in which the researcher uses the qualitative research paradigm for one phase of a study and the quantitative research paradigm for another, is also important in health care. Systematic reviews of multimethod studies have demonstrated that significant results would not have been possible without equal evaluation of both quantitative and qualitative data (Bravata, McDonald, Shojania, Sundaram, & Owens, 2005; Mulrow, Langhorne, & Grimshaw, 1997). These findings suggest the need to create new integrative criteria by which to evaluate the scientific adequacy of both quantitative and qualitative analysis.
Development of the debate on an integrative framework for evaluation
There are three main perspectives—method, use, and value—regarding evaluation criteria (Alkin, 2004). Scientific adequacy, one of the most important criteria of method, has been well organized in the field of social experimentation. Although many researchers recognize the importance of scientific adequacy in qualitative research, it has not, until now, been equally well organized in the qualitative research field. Recently some researchers have developed criteria that include not only method but also use or value (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999; Messick, 1995). In this study, however, we focus on method and examine an integrative framework of quantitative and qualitative criteria. Although many words and concepts (Koch, 1994; Watson & Girard, 2004) are used regarding method in the qualitative research field (e.g., rigor, integrity), we use scientific adequacy in this article.
Validity, reliability, and generalizability are widely used as criteria for the evaluation of quantitative analysis. However, many qualitative researchers, who do not assume an objective reality or a confirmatory perception, tend to question this holy trinity (Kvale, 1995). Even now, more than two decades since it began, the interdisciplinary discussion regarding the roles of validity, reliability, and generalizability in the qualitative paradigm continues (Mays & Pope, 2001). Debate concerning the relationship between quantitative and qualitative paradigms is often muddled and confused, and the clutter of terms and arguments has resulted in the concepts' becoming obscure and unrecognizable (Morse, Barrett, Mayan, Olson, & Spiers, 2002).
The 1980s saw the first main wave of qualitative literature and the emergence of a new language for research. The introduction of Lincoln and Guba's (1985) ideas on trustworthiness provided an opportunity for qualitative researchers to explore new ways of expressing validity, reliability, and generalizability outside the semantics of the quantitative paradigm (Guba & Lincoln, 1981). Lincoln and Guba recognized that their new criteria might be imperfect and would not be readily accepted, as they stood in marked contrast to those of quantitative inquiry, which claimed to be utterly unassailable (Tobin, 2004). Lincoln and Guba later refined the trustworthiness construct by introducing the criteria of credibility, dependability, confirmability, and transferability. These concepts were innovative and challenging, and they provided the initial platform from which much of the current debate on scientific adequacy emerged (Lincoln & Guba, 1985).
We also recognize and take into consideration the fact that Lincoln and Guba's (1985) paradigms are not the only perspectives regarding scientific adequacy. Because Messick (1989, 1995) regarded validity and values as one imperative, he considered that test validation implicates both the science and the ethics of assessment. As our purpose is to develop an integrative framework of quantitative and qualitative criteria, we adopted the conventional definition of validity.
The extant literature highlights the increasing need to reconsider evaluation criteria for scientific adequacy in health care research and to explore an integrative framework of quantitative and qualitative criteria. In this article, we compare quantitative criteria with qualitative criteria and examine their underlying epistemologies.
Review method
As for review perspective, we first examine Patton's (1990) “paradigm of choices” (p. 39), which supports methodological appropriateness as the primary criterion for judging methodological quality. A paradigm of choices allows that different methods might be appropriate in different situations, a view that is quite different from the conventional pragmatic perspectives.
Using a paradigm of choices, in this article we explore the epistemologies underlying quantitative and qualitative evaluation criteria and discuss the issue of appropriateness between paradigm and type of study. Conducting research with an understanding of the strengths and limitations of both quantitative and qualitative methodologies is also important from the viewpoint of critical multiplism (Letourneau & Allen, 1999). To date, however, no integrative framework of quantitative and qualitative criteria exists (Miles & Huberman, 1994). Unclear methodology can lead to a lack of scientific adequacy in research (Mays & Pope, 2001). Given this gap, we aim in this article to develop a clear definition of each evaluation criterion.
A MEDLINE search was conducted to identify studies that discuss evaluation criteria for research. We excluded studies that described evaluation criteria but did not mention validity, credibility, reliability, dependability, objectivity, confirmability, generalizability, or transferability. We also included articles recommended by health care research experts.
Validity of setting the research frameworks (validity/credibility)
Validity is the strength of research conclusions, inferences, or propositions. More formally, Cook and Campbell (1979) defined it as the “best available approximation to the truth or falsity of a given inference, proposition or conclusion” (p. 37).
If a measured value and a criterion score are determined at essentially the same time, a researcher can establish criterion-oriented validity. Criterion-oriented validity is also established when one test is proposed as a substitute for another, or when a test is shown to correlate with some contemporary criterion. Content validity is established by showing that the test items are samples of a universe in which the investigator is interested; the researcher defines a universe and samples systematically within it in order to develop a framework. Establishing content validity has significant limitations, however, as it is difficult to avoid error (Bohrnstedt, 1985; Kirk & Miller, 1986). Construct validity is involved whenever a test is to be interpreted as a measure of some attribute or quality that is not operationally defined; the problem faced by the investigator is, What constructs account for variance in test performance? Implicit in all three types is the recognition that a research framework can be established in advance.
Alternatively, Lincoln and Guba (1985) have argued that there is actually a set of multiple constructs formed in the minds of people and that to ensure methodological scientific adequacy, researchers must demonstrate that they have recreated these multiple constructs adequately; that is, that the recreations, arrived at via inquiry, are credible to the creators of the original multiple realities. In this view,
As for the validity/credibility paradigm, the major difference concerns the creation of research frameworks (Table 1). When a research framework can be established in advance, it is used to make observations. Because developing a research framework in advance means conducting research from a certain perspective, doing so is broader than testing a hypothesis. When a research framework cannot be established in advance, it is formed in the process of the study. Whereas quantitative studies employ numbers, which allow limited interpretation, most qualitative studies use words, which might allow for wider interpretation. Thus, quantitative studies are well suited to validity as an evaluation criterion, whereas qualitative studies are well suited to credibility.
Paradigm of criteria and underlying epistemology
In some cases, however, qualitative studies, such as participant observations, do test hypotheses (Stake, 1978). In these instances it might be valuable to apply the validity criterion, as a research framework is formed in advance. Additionally, if the research objective is to form a new hypothesis, researchers might create only a partial framework in advance to, for example, prepare an interview protocol or interview structure. In this case, both credibility and validity might be appropriate criteria for evaluation.
The same is true for quantitative research. It is not always possible to create a rigorous framework in advance of every quantitative study. For example, some statistical analyses, such as exploratory factor analysis or data mining, allow room for interpretation, and the use of credibility as an evaluation criterion can contribute to the robustness of the research. However, current techniques for enhancing credibility are limited to text-based data, and new techniques are needed for application to numerical data. The use of questionnaires in quantitative research is another case in which credibility might be appropriate: The objective of a text-based questionnaire is to capture reality, so there might be a strong need to use the credibility criterion for evaluation. Because credibility can be a useful evaluation criterion for quantitative research, further exploration of this use is warranted.
The validity/credibility paradigm is centered on the establishment of a research framework. Validity is used to evaluate frameworks that are set in advance. Credibility is used to evaluate frameworks that are created in the process of research. After assessing the nature of the research framework, a researcher should use either validity or credibility, or both, as appropriate.
Stability of the phenomena and methods (reliability/dependability)
Dependability is used as an evaluation criterion when an observed phenomenon is likely to change depending on a research method, time, or environment and it is difficult to assume stability in the research (Lincoln & Guba, 1985). As a criterion dependability takes into account both factors related to instability and factors related to phenomenal or design-induced changes (Schwandt, 2001). In research studies where dependability is used as the evaluation criterion, it is often the case that researchers have to certify that the collection and analysis of data fall within acceptable professional, legal, and ethical limits (Schwandt, 2001). Dependability is assessed mainly by the use of a dependability audit, in which a third-party auditor systematically checks the researcher's work through review of the audit trail (Schwandt, 1997). The audit trail includes tape recordings, transcripts, interviewer's guides, data reduction and analysis products, and the list of working hypotheses. The auditor ascertains fairness in the representation of the research process and determines whether researchers' accounts are supported by corroborative documents. Dependability is an evaluation criterion focused on the consistency of the research process and is applicable in cases where both method and phenomena might prove to be unstable.
The stability of phenomena and methods varies greatly in the reliability/dependability paradigm (see Table 1). Phenomena and methods that can assume stability are evaluated by the stability of results, whereas phenomena and methods that are not stable are evaluated by the consistency of the research process. Reliability as an evaluation criterion is highly suited to studies with controlled research environments, such as a laboratory observation, whereas dependability is more applicable to studies measuring less controllable events, such as those dealing with human emotion.
In quantitative research the stability of phenomena and methods is often assumed; however, stability cannot be assumed automatically, and there are instances in quantitative studies when the method and phenomena are actually unstable. Mental health research is a prime example. Symptoms of mental disorders often differ greatly across a group of patients; therefore, clinical condition is not a stable variable (American Psychiatric Association, 1994). To ensure dependability between symptoms and a diagnosis, the researcher must define a mental disorder based on diagnostic criteria. After the diagnosis of a mental disorder has been defined, the effects of treatment can be evaluated quantitatively and the characteristics of symptoms can be examined qualitatively. This is also an example in which the dependability criterion is used to evaluate the scientific adequacy of both quantitative and qualitative results.
In health care research, rather than choosing a single evaluation criterion based on an assumption of stability (or instability), it is necessary to assess stability in terms of observational method, observed phenomenon, and data analysis in order to select the appropriate criterion. In this paradigm reliability is used to evaluate phenomena and methods that can be assumed stable, whereas dependability is used to evaluate those that cannot.
Neutrality of the observations and interventions (objectivity/confirmability)
Regarding the distance between the observer and the observed, two perspectives are of principal importance: The first is based on the notion that a short distance harms objectivity, and the second on the notion that a long distance causes a lack of understanding of the observed. Quantitative research generally adopts the former perspective, keeping a certain distance from the observed. Objectivity assumes three things: (a) there is an isomorphism between the study data and reality, (b) observers can keep adequate distance from the observed, and (c) inquiry is value free (Lincoln & Guba, 1985). When observations and interventions are conducted in a neutral way, human errors made in the process of research need to be vetted to ensure scientific adequacy.
Conversely, qualitative research sometimes addresses issues that require researchers to have direct emotional involvement or to experience sympathy, such as when conducting a one-on-one survey and having to develop a cordial relationship with study participants (Lofland, 1971). In such cases, observations and interventions have an effect not only on the observed but also on the observers. When the three assumptions regarding objectivity cannot be held true, researchers need to establish confirmability by controlling for the effects of observations and interventions on the process and consequences of the research (Lincoln & Guba, 1985). This is done by certifying that the findings and interpretations are based on raw data and by making the methods and process of the research transparent (e.g., raw data, data reduction and analysis products, data reconstruction and synthesis products, process notes) (Miles & Huberman, 1994).
In the objectivity/confirmability paradigm the difference in evaluation methods lies in the recognition of the neutrality of observations and interventions (see Table 1). When observations and interventions are conducted in a neutral way, errors made in the process of research need to be controlled for, and when observations or interventions are not conducted in a neutral way, their effects on the process and consequences of the research need to be controlled for.
In health care research the majority of participants are human beings, and research often has an effect on the observed, even in quantitative studies. Interviews, for example, have various effects on both the observers and the observed because of interpersonal contact, and in survey studies the wording, construction, and arrangement of questions in questionnaires might influence participants' answers. Other factors that can have a confounding effect on results include the researchers themselves, who might stand to gain academically by reporting the research results, and the research sponsor, who might have direct interests in the research results. Although a limited number of scientific journals currently require authors to disclose their sources of funding, many journals seem insufficiently aware that findings might be affected by sponsors. Thus, it might be useful to further examine the possible influence of a sponsor or parent organization on research findings.
Range and applicability of findings (generalizability/transferability)
Shadish (1995) argued that the core principles of generalization apply to both quantitative and qualitative research. Findings from either type of study can be generalized according to five principles: (a) the principle of proximal similarity, (b) the principle of heterogeneity of irrelevancies, (c) the principle of discriminant validity, (d) the principle of empirical interpolation and extrapolation, and (e) the principle of explanation. If researchers recognize that certain study findings can be applied to another context, then exploration of generalization is useful. According to the generalizability paradigm, researchers need to determine the range of application of findings to evaluate the generalizability of those findings.
Alternatively, some qualitative researchers have argued that, at best, only working hypotheses may be abstracted and that the transferability of these hypotheses is determined empirically, depending on the degree of similarity between sending and receiving contexts. Donmoyer (1990) argued that researchers should reject the conventional paradigm of generalizability in qualitative research. Researchers need to learn about both sending and receiving contexts to ensure the transferability of their inferences (Lincoln & Guba, 1985). The original inquirer cannot specify the sites to which transferability might be sought, but the implementers can. The responsibility of the original investigator is to enable someone interested in making a transfer to reach a conclusion about whether the transfer can be contemplated as a possibility. To improve the quality of transferability, original researchers are responsible for providing sufficient descriptive data for implementers to make sound transferability judgments, and they should suggest possible methods of verification. As mentioned above, when researchers recognize that it could be difficult to generalize certain research findings, they should evaluate the possibility of extrapolation based on the transferability paradigm.
The contrast in the generalizability/transferability paradigm lies in the difference in the range of application (see Table 1). In a context where study findings can be applied, evaluation of generalizability is needed. In a context where study findings are not directly applicable, information regarding extrapolation is needed. Compared with basic research in biology or physics, it is difficult for applied science to attain generalizability, mainly because of its vulnerability to social contexts, regional environments, and the characteristics of participants. Evaluating the quality of research solely from the viewpoint of generalizability should be avoided, as the findings of applied science might lend themselves more readily to the direct return of research results to participants and society.
If a pilot intervention study is conducted for district “A,” the findings need to be generalizable at least to the whole of district A, even though it might be difficult to simply apply the program findings to other countries that have different social systems and characteristics of residents. In such an instance, it might be useful to discuss not only the limitations to generalizability but also the transferability to another context. In quantitative research, it is imperative to establish techniques of transferability.
Although it might not be useful to evaluate the generalizability of certain types of qualitative research that report detailed findings on single cases, some techniques for generalization within a limited range have been proposed for studies using the grounded theory approach, a qualitative method often intended to form middle-range theory. Even in qualitative research, it is useful to evaluate the generalizability of findings after determining the range of application.
The generalizability/transferability paradigm is concerned with the range of application of findings. Whether research is quantitative or qualitative, it is important to recognize and identify the range of application of its findings. After the range of application has been set, generalizability is used to evaluate generalization, and transferability is used to evaluate extrapolation. Unless research findings can be assumed to generalize universally or to have no possibility of extrapolation, it is useful to use both paradigms for evaluation.
Conclusion
In this article we examined the epistemological issues behind evaluation criteria for scientific adequacy in research. Comparing quantitative research paradigms with those of qualitative research, we discussed the possible use of evaluation criteria based on a pragmatic perspective. Validity/credibility concerns the research framework, reliability/dependability refers to the stability of observations, objectivity/confirmability reflects influences between observers and subjects, and generalizability/transferability differ epistemologically in the way findings are applied. Qualitative studies, however, do not always call for qualitative paradigms; if stability can be assumed to some extent in a qualitative study, it might be better to use a quantitative paradigm (reliability). Similarly, it is useful to employ qualitative paradigms to enhance the scientific adequacy of a quantitative study that did not have a stable research framework in all phases of observation. Regardless of whether research is quantitative or qualitative, it is important to recognize the four epistemological axes. Evaluating scientific adequacy in a study should be done after establishing the framework(s), assessing the stability of the phenomena and methods, ensuring the neutrality of the observations and interventions, and determining the range of application of the findings.
