Abstract
Introduction
Usability has become an increasingly important factor that influences how consumers and designers choose among different systems or products [1–3]. Usability evaluation is a specialized process that has been shown to require expertise from a wide range of knowledge domains [4]. However, according to Hornbæk’s paper on current practice in measuring usability [5], a major weakness of current methods is that no principal technique addresses the vagueness and uncertainties inherent in the various components that contribute to the concept of usability. Indeed, most usability methods rely heavily on human judgment about the various attributes of a product, but often fail to take account of the inherent uncertainties in these judgments during the evaluation process. The main goal of this study was to demonstrate how these uncertainties can be elicited, captured, and combined by using a fuzzy method integrated with the analytic hierarchy process (AHP). Section 2 provides a brief review of existing usability evaluation techniques, such as those conducted using general mathematical methods, questionnaires, the AHP method, and the fuzzy approach. In Section 3, the general methodological steps of the fuzzy evaluation and AHP methods are described. Section 4 considers the theoretical framework of the proposed fuzzy usability evaluation technique based on the AHP method. A discussion is provided in Section 5.
A brief review of existing usability evaluation techniques
As a core term in human factors and ergonomics, usability has been defined by researchers in different ways [4, 6–10]. By focusing on product perception and acceptance, Shackel proposed an operational definition of usability which provided a set of usability criteria, including effectiveness (level of interaction in terms of speed and errors), learnability (level of learning needed to accomplish a task), flexibility (level of adaptation to various tasks) and attitude (level of user satisfaction with a system) [6]. This definition has been generally accepted by the usability community [11]. Another well-accepted definition of usability was offered by Nielsen [4], which described usability as ‘the measure of the quality of the user experience when interacting with something whether a Web site, a traditional software application, or any other device the user can operate in some way or another’ [7]. Nielsen suggested several operational usability dimensions such as learnability, memorability, efficiency, user satisfaction (subjective assessment of how pleasurable it is to use) and error (number of errors, ability to recover from errors, existence of serious errors) [4]. To consolidate the definitions, the International Organization for Standardization (ISO) defined usability as ‘the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use’ [8]. However, these various definitions of usability contain concepts that are far from concrete, and they are often highly context dependent in the sense that no single set of measurements can be applied to all products or services. Indeed, many practitioners lament that usability can mean different things to different people, and even when it is defined, it still remains intuitive, uncertain, and ambiguous [5, 12].
Therefore, in the usability community, in order to compare existing usability data with ideal goals or with the data of other products, practitioners have tried to develop a framework for deriving a single, integrated metric from the various aforementioned metrics with the use of different evaluation techniques [13, 14].
The general weighted additive method
The definition of usability is highly dependent on the measurement method. One of the most direct measurements is user performance, which is widely used for evaluating product usability. Practitioners can easily measure the task success rate of users actually using a product and derive an average accuracy or error rate that reflects product effectiveness [9]. However, different products may require different sets of metrics to measure their effectiveness, and it is always difficult to make comparisons between evaluations of different products. A number of attempts have been made to derive a single usability score that combines the different metrics in order to facilitate comparisons. Babiker et al. suggested assigning different weights to a set of metrics such as ease of access, navigation, orientation, and user interaction for evaluating the usability of hypertext systems, and then integrating these metrics into a simple weighted additive score [15]. Although they found that the combined metric correlated well with subjective assessment measures, whether the method could be easily generalized or transferred to other systems is questionable, because the weights were based on the specific assessment criteria of a product’s use. Other methods based on this kind of weighted additive model have been used by various researchers [5, 16]. However, a common problem with this method is that the measurements depend critically on specific products and on the practitioner’s subjective judgment. Also, there is always the problem that it may be too simplistic to assume that a single weight can be assigned to each of the evaluated attributes.
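The weighted additive combination described above can be sketched in a few lines of code. The metric names, scores, and weights below are invented for illustration only; they are not the criteria or weights used by Babiker et al. [15].

```python
# Minimal sketch of a weighted additive usability score.
# All metric names, scores, and weights are hypothetical.

def weighted_additive_score(metrics, weights):
    """Combine normalized metric scores (0..1) into one usability score."""
    assert set(metrics) == set(weights)
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(metrics[name] * weights[name] for name in metrics)

# Example with made-up numbers:
metrics = {"ease_of_access": 0.8, "navigation": 0.6,
           "orientation": 0.7, "interaction": 0.9}
weights = {"ease_of_access": 0.3, "navigation": 0.3,
           "orientation": 0.2, "interaction": 0.2}
score = weighted_additive_score(metrics, weights)
```

The simplicity of this combination is exactly what the text criticizes: each attribute receives a single fixed weight, and no uncertainty in the judgments survives into the final score.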
The questionnaire method
In contrast to objective performance measurements, usability evaluation can be made with subjective evaluations [17–19]. In the ergonomics community, several well-known subjective usability questionnaires have been developed based on users’ personal experience of interacting with products. These methods include the Post-Study System Usability Questionnaire (PSSUQ) [17, 18], the Software Usability Measurement Inventory (SUMI) [20, 21], and the Questionnaire for User Interaction Satisfaction (QUIS) [22, 23]. The primary advantage of using questionnaires over other usability evaluation methods is that they can be readily applied and have a high benefit-to-cost ratio. All three questionnaire methods are claimed to have high reliability and validity for usability testing in practice. However, as with the weighted additive method, these questionnaires suffer from the same problem: it is not clear how multiple metrics (either subjective or objective) derived from the responses can be weighted and combined to provide an overall product usability index.
The analytic hierarchy process (AHP) method
The AHP method was developed by Saaty [24] and has been generally accepted as a robust and flexible multi-criteria decision-making tool for dealing with complex decision problems in various research domains [25–27]. In usability engineering, the AHP method has been used to determine the weights of different components during the evaluation process, as well as to conduct synthetic comparative evaluation of multiple products or prototypes [28, 29]. With a structurally hierarchical model, this method requires experts to provide only the rank orders of different metrics of usability, such as learnability and ease of use, and the corresponding weights for these metrics can then be derived. The AHP is a technique that focuses directly on deriving the appropriate weights based on expert judgments. It is well suited to comparing the relative usability of different alternatives, and thus is a powerful multi-criteria decision-making tool for usability testing purposes. In later sections it will be shown how this method can be coupled with a fuzzy approach to enhance its ability to capture the uncertainties and vagueness of usability perceptions expressed by the experts.
The fuzzy evaluation method
In the discipline of ergonomics, the role of fuzzy set theory in quantifying the uncertainty inherent in human judgment is well understood [26, 31]. The fuzzy evaluation method is based on fuzzy set theory, developed by Zadeh [32] for capturing the uncertainties inherent in a system. As discussed above, the processes in usability evaluation inherently involve fuzzy, uncertain, dynamic, and changing information. In the usability engineering field, some early attempts at using the fuzzy evaluation method were made. Cai et al. applied the method to capture the perceived shape and color aesthetics of different products [31]. To compare design alternatives, the imprecise preference structures of the alternatives were modeled by a set of fuzzy preference relations. These relations not only specified whether one attribute was preferred over another attribute, but also how confidently this particular preference order was expressed by the user. For Web page design, Hsiao et al. proposed a Gestalt-like perceptual measure method by combining Gestalt grouping principles and fuzzy entropy [26]. They developed a set of fuzzy relations that captured the layout of graphics, arrangement of texts, and combinations of colors. Both studies showed that the fuzzy evaluation approach can provide a powerful mathematical tool for quantifying imprecise information in human judgments.
The methodological framework
Based on these previous efforts in structuring user experience or usability evaluations, in this paper a universal method of usability evaluation for products will be proposed. This universal method combines the AHP and fuzzy evaluation methods for synthesizing performance data and subjective response data. The aim of this universal method is to derive an index that is structured hierarchically within the framework of ISO 9241 part 11 [8], which defines usability in terms of three major components, viz. effectiveness, efficiency, and user satisfaction. An additional goal is to demonstrate the generality of the fuzzy usability evaluation method by showing that any set of standard usability attributes can be adopted and the same process applied to obtain a comprehensive evaluation. The general methodological framework will be described in the next section.
The general fuzzy evaluation model
The general fuzzy evaluation model aims at providing a fuzzy mapping between each of the evaluation factors (e.g. effectiveness, efficiency, and user satisfaction) and a set of categorical appraisal grades (e.g. good, excellent). The idea is to define fuzzy sets over the evaluation factors, such that a particular usability rating, e.g. a 5 on a 7-point scale, could belong to both the grades ‘good’ and ‘excellent’. However, the extent to which the usability rating belongs to each grade may vary, i.e. there are different degrees of membership in each grade, depending on the weights given to each evaluation factor and the average ratings given by different raters. In the above example, one may find that a rating of 5 for effectiveness can be mapped to the fuzzy sets ‘fair’, ‘good’, and ‘excellent’ with degrees of membership of 0.2, 0.7, and 0.5, respectively. By assigning degrees of membership to multiple ‘fuzzy grades’, more of the uncertainties inherent in the rating process can be captured and retained, which is particularly useful for comparing two products. The formal procedure of the general fuzzy evaluation model can be described by the following steps.
A set of evaluation factors U = {u1, u2, …, um} can be defined according to the objectives of the product evaluation process.
The appraisal set can be represented as a vector V = {v1, v2, …, vn}, where each vj denotes a categorical appraisal grade.
The goal of the evaluation process is to provide a mapping from U to V.
In general, the fuzzy appraisal matrix of all m factors can be written as R = (rij)m×n.
In the above matrix notation for R, each element rij denotes the degree of membership of factor ui in appraisal grade vj, so the i-th row of R is the fuzzy appraisal vector of factor ui.
To obtain a comprehensive usability evaluation, the relative importance of each evaluation factor to the overall grading of the product should be quantified. The weight vector can be represented by W = (w1, w2, …, wm), with the weights normalized so that they sum to 1.
The overall appraisal result can be obtained by taking into account the relative weights of each evaluation factor, such that a single vector over the same set of appraisal grades is produced: B = W ∘ R = (b1, b2, …, bn),
where ‘∘’ is a composition operator, such as the max–min operator or the weighted-average operator.
The weight vector W can be determined with the AHP method, as described in the following steps.
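The composition step just described can be sketched as follows. This is a minimal sketch using the weighted-average composition operator (one common choice; max–min composition is another); the factor weights and membership values are illustrative numbers, not results from the paper.

```python
# Sketch of fuzzy comprehensive evaluation: B = W ∘ R with a
# weighted-average composition operator. All numbers are illustrative.

def compose(weights, R):
    """Weighted-average composition: b_j = sum_i w_i * r_ij, then normalize."""
    n_grades = len(R[0])
    B = [sum(w * row[j] for w, row in zip(weights, R)) for j in range(n_grades)]
    total = sum(B)
    return [b / total for b in B] if total else B

# Three factors (rows) mapped onto five appraisal grades (columns):
R = [
    [0.0, 0.1, 0.2, 0.5, 0.2],   # effectiveness
    [0.0, 0.0, 0.3, 0.6, 0.1],   # efficiency
    [0.1, 0.2, 0.3, 0.3, 0.1],   # satisfaction
]
W = [0.4, 0.3, 0.3]              # factor weights (would come from AHP)
B = compose(W, R)                # overall appraisal over the five grades
```

Because the weighted-average operator retains contributions from every factor, the resulting vector B preserves more of the spread of the individual appraisals than a single averaged score would.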
First, state an overall objective for the problem and list the factors that affect this objective. Then structure a hierarchy of criteria for the problem: within each cluster or level of the hierarchy, the factors are compared with respect to the corresponding objective at the level above. According to previous suggestions [24], it is desirable to have no more than seven elements in each cluster at each level of the hierarchy.
The major advantage of the AHP method is that, instead of asking experts to directly assign a weight to a particular evaluation factor, they are asked to rate the relative importance of the different factors. Assuming that there are n factors to be compared, the experts’ pairwise judgments form an n × n comparison matrix A = (aij).
Each aij expresses, on Saaty’s nine-point scale, how much more important factor i is judged to be than factor j, so that aii = 1 and aji = 1/aij.
These imply that matrix A is a positive reciprocal matrix.
To calculate the weight vector of the evaluated factors, the common ANC (average of normalized columns) method is used. In the ANC method, each column of A is first normalized by its column sum, and the normalized entries are then averaged across each row: wi = (1/n) Σj (aij / Σk akj), for i = 1, …, n.
The weight vector W = (w1, w2, …, wn) can therefore be obtained directly from matrix A.
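The ANC computation can be sketched as follows. The 3 × 3 comparison matrix here is an illustrative reciprocal matrix, not the paper’s judgment data.

```python
# Sketch of ANC (average of normalized columns) weight derivation
# from a pairwise comparison matrix. The matrix is illustrative.

def anc_weights(A):
    """w_i = (1/n) * sum_j a_ij / (column sum of column j)."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]

# Reciprocal matrix: a_ii = 1, a_ji = 1 / a_ij (Saaty's scale):
A = [
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
]
w = anc_weights(A)   # weights for the three factors, summing to 1
```

Note that the experts never state the weights directly; the weights emerge from the pairwise judgments, which is the advantage the text attributes to AHP.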
From Step 3, the numerical weights (wi) are obtained, but the consistency of the experts’ pairwise judgments should also be checked. The maximum eigenvalue λmax of matrix A is first estimated, and the consistency index is computed as CI = (λmax − n)/(n − 1). The consistency ratio is then CR = CI/RI,
where RI is the random index, a tabulated value that depends only on the order n of the matrix.
If the value of the consistency ratio CR is below 0.10, the judgment matrix can be considered acceptably consistent; otherwise, the experts should be asked to revise their pairwise judgments.
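The consistency check can be sketched as follows, using Saaty’s published random-index values for small matrices. The comparison matrix and its (approximate ANC) weights are illustrative.

```python
# Sketch of the AHP consistency check: estimate lambda_max from A·w,
# then CI = (lambda_max - n)/(n - 1) and CR = CI / RI.

# Saaty's random indices for matrix orders 1..7:
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}

def consistency_ratio(A, w):
    n = len(A)
    # lambda_max estimated as the mean of (A·w)_i / w_i over all rows:
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]

A = [[1.0, 3.0, 5.0], [1/3, 1.0, 3.0], [1/5, 1/3, 1.0]]
w = [0.633, 0.260, 0.106]       # approximate ANC weights for this A
cr = consistency_ratio(A, w)    # CR < 0.1 -> acceptably consistent
```

A CR above the 0.10 threshold would indicate that the pairwise judgments contradict each other too strongly to trust the derived weights.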
Two-layer comprehensive evaluation indices
In the usability community, there are many ways to determine an evaluation index. For example, the International Organization for Standardization combined and consolidated several definitions of usability in ISO 9241 [8]. Here, it was decided to adopt the ISO usability framework as the basis for usability evaluation, as it contains the set of attributes most commonly used by practitioners [14, 19]. In this framework, usability is measured by effectiveness, efficiency, and user satisfaction.
In order to structure a universal usability evaluation index for different systems, the common performance measures of task success (for effectiveness) and task time (for efficiency), together with subjective ratings (for satisfaction), are adopted at the lower layer of the hierarchy.
Determining the fuzzy membership functions of the evaluation matrix R
When determining the grade mapping for the factors, the corresponding measured or linguistic value of each factor is mapped onto appraisal grades. Five levels of appraisal grades are used, represented as V = {v1, v2, v3, v4, v5}, e.g. ranging from ‘very poor’ to ‘excellent’.
In order to address this issue, several geometric mapping functions have been proposed, such as variations of the triangular or trapezoidal mapping functions. In line with previous studies [37, 38], semi-trapezoidal and trapezoidal distributions are used here to construct the mapping functions that characterize the fuzzy measure values. In this way, the ambiguity and vagueness involved in the process of usability estimation can be retained. If x denotes the measured value of a factor and k1 < k2 < … < k6 denote the grade thresholds, the membership function of each grade is piecewise linear over the threshold intervals: it rises linearly from 0 to 1 as x approaches the grade’s interval, remains at 1 over the grade’s plateau, and falls linearly back to 0 as x leaves it, with semi-trapezoidal functions at the two extreme grades.
In this formulation, the thresholds k1, …, k6 determine where each grade begins, peaks, and ends, and must be specified for each measure.
After this preparatory work, a six-expert panel was used to determine the thresholds. All experts had more than one year of professional experience in the field of usability engineering, and were introduced to basic knowledge about how the threshold values are used in the present evaluation model. The threshold values were determined to be (0, 0.3, 0.6, 0.8, 0.95, 1) for both task success and task time (whose values lie in the interval [0, 1]), and (1, 2, 3.5, 5.5, 6.5, 7) for satisfaction. The fuzzy membership functions could then be obtained for task success, task time, and satisfaction.
For example, if the success measure for a specific task was scored as 0.47, then, in line with the membership functions above, its memberships in the subsets of V could be obtained as:
Therefore, according to Equation (1), the quality of the task success measure can be presented as:
Normalizing,
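A worked example of this kind of membership computation can be sketched as follows. The exact piecewise definitions used in the paper are not reproduced here; this sketch assumes one standard construction in which each grade peaks at the midpoint of its threshold interval, so the resulting membership values may differ from the paper’s.

```python
# Sketch of semi-trapezoidal/triangular membership functions built from
# six increasing grade thresholds. The peak-at-interval-midpoint rule is
# an assumption of this sketch, not necessarily the paper's formulation.

def membership_vector(x, k):
    """Degrees of membership of measure x in the five appraisal grades."""
    peaks = [(k[i] + k[i + 1]) / 2 for i in range(5)]  # one peak per grade
    mu = []
    for i in range(5):
        if i > 0 and peaks[i - 1] <= x < peaks[i]:
            m = (x - peaks[i - 1]) / (peaks[i] - peaks[i - 1])  # rising edge
        elif i < 4 and peaks[i] <= x < peaks[i + 1]:
            m = (peaks[i + 1] - x) / (peaks[i + 1] - peaks[i])  # falling edge
        elif (i == 0 and x < peaks[0]) or (i == 4 and x >= peaks[4]):
            m = 1.0  # semi-trapezoidal tails at the two extreme grades
        else:
            m = 0.0
        mu.append(m)
    return mu

# Thresholds reported in the text for task success / task time:
k = (0.0, 0.3, 0.6, 0.8, 0.95, 1.0)
mu = membership_vector(0.47, k)  # a task-success score of 0.47
```

With these thresholds, a score of 0.47 falls between two adjacent grade peaks, so its membership is split between those two grades, which is exactly the kind of graded, overlapping judgment the fuzzy approach is meant to retain.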
In this way, the degree of membership of each factor in each appraisal grade can be obtained.
Because only one measure was employed for effectiveness and one for efficiency, weight vectors need only be determined for the factors within user satisfaction and for the three top-level usability components.
In accordance with Equation (5), the vector in each column of the matrix of numerical judgments in Table 3 was first normalized, and the weights were then obtained by averaging the normalized values across the rows.
The resulting weight vector for the three factors used to evaluate user satisfaction could thus be obtained.
In order to test whether the resulting eigenvector corresponds to a consistent numerical judgment matrix in Table 3, the value of the consistency ratio CR was computed.
First, the maximum eigenvalue λmax of the matrix was estimated.
From λmax, the consistency index CI and the consistency ratio CR follow.
Since the resulting CR was below the 0.10 threshold, the judgment matrix in Table 3 can be considered acceptably consistent.
In the same way, Table 4 presents the pairwise comparison with respect to overall usability, from which the weights of effectiveness, efficiency, and user satisfaction can be obtained.
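The full two-layer aggregation can be sketched as follows: the lower-layer appraisal vectors become the rows of the top-layer fuzzy matrix, which is then composed with the top-layer AHP weights. All vectors and weights below are illustrative, not values from the paper’s tables.

```python
# Sketch of the two-layer comprehensive evaluation. Lower-layer results
# feed the top layer as rows of its fuzzy matrix. Numbers are illustrative.

def compose(weights, R):
    """Weighted-average composition, normalized over the grade vector."""
    B = [sum(w * row[j] for w, row in zip(weights, R)) for j in range(len(R[0]))]
    s = sum(B)
    return [b / s for b in B] if s else B

# Lower-layer results: one membership vector over five grades per component.
B_effectiveness = [0.0, 0.9, 0.1, 0.0, 0.0]
B_efficiency    = [0.0, 0.2, 0.6, 0.2, 0.0]
B_satisfaction  = [0.0, 0.1, 0.4, 0.4, 0.1]

# Top-layer AHP weights for effectiveness, efficiency, satisfaction:
W_top = [0.4, 0.35, 0.25]
B_overall = compose(W_top, [B_effectiveness, B_efficiency, B_satisfaction])
```

The final vector B_overall assigns the product a degree of membership in each appraisal grade, so two products can be compared grade by grade rather than by a single collapsed score.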
Based on fuzzy evaluation theory, a model for evaluating the usability of a product or system was proposed in this study. We believe that the fuzzy comprehensive evaluation technique provides an appropriate and promising path for evaluating the overall usability of a product, and the technique is in line with the existing frameworks used in conducting usability evaluations. This technique will be particularly useful for comparing the advantages and disadvantages of different products, or of different versions across one product’s life cycle.
Compared to existing usability evaluation methods, this fuzzy evaluation model provides advantages over the conventional methods and can benefit usability evaluation in two major ways. (1) The fuzzy evaluation method is based on fuzzy set theory and is an attractive means for modeling the uncertainty or lack of precision that arises from human information processing and that is neither random nor stochastic [37, 43]: usability is often labeled as intuitive, uncertain, or elusive, especially with respect to user satisfaction. In the model evaluated here, trapezoidal fuzzy numbers were chosen to determine the fuzzy membership functions for structuring the fuzzy evaluation matrix, and this family of functions was successful in capturing the uncertainties inherent in the usability evaluation. In addition, the weights obtained with the AHP method can be combined with the fuzzy evaluation results to provide an overall usability assessment of the product. (2) The current method uses a hierarchical evaluation index that allows iterative measurements on multiple dimensions. Usability can be defined along many dimensions and attributes, and there are many metrics that can be used to measure each of these dimensions or attributes. The choice of the appropriate dimensions and measures often depends on specific business objectives and available resources. Although there is no general rule for how measures should be chosen or combined, the evaluation framework proposed here showed how the three most common and important elements, i.e. effectiveness, efficiency, and user satisfaction, can be measured and combined in the hierarchical evaluation process.
Although fuzzy evaluation and AHP are commonly used to determine different aspects of product quality, the fuzzy technique proposed here still needs to be tested to establish how successfully it can be applied in practical cases. More importantly, the advantage of the proposed fuzzy approach over traditional evaluation methods, such as simple averaging, should be verified. The mathematical process in the fuzzy approach may be challenging for practical use, which is why we suggest structuring a common evaluation index within the ISO usability framework, together with weights for the different factors evaluated. However, because usability or user experience can mean different things for different goals in practice, the evaluation index, along with the mapping functions and their parameters, may need to be reconfigured for each context. As noted above, this fuzzy evaluation technique is well suited to integrating both quantitative and qualitative data in an overall usability evaluation. Taking all of these considerations together, we suggest applying this evaluation procedure to summative evaluations such as benchmarking or comparing product usability. The evaluation index and its metrics tend to be stable in a specific usability practice, so software can be used to run all the procedures within the fuzzy technical framework. In another study, we will use a summative usability test case to examine the application and strengths of the general fuzzy usability framework, as well as to illustrate the effectiveness of the proposed evaluation technique [44].
Conflict of interest
The authors have no conflict of interest to report.
