Abstract
Keywords
1. Introduction
With the growing popularity of the Web, the number of reviews containing opinions, feedback, comments and appraisals is increasing rapidly [1]. Such reviews provide valuable information which can help people decide whether or not to purchase a product or service [1,2]. For example, customers’ reviews are essential to other potential customers, retailers and product manufacturers in their efforts to understand the general opinions of customers and to make better decisions [3]; assessments of politics can help politicians interpret the situation and plan their strategy (for example, governments want to know what the people who voted for them think about their activities); and restaurant reviews can assist customers in picking out a place based on their needs [4–6]. As the number of reviews expands, it becomes very hard for users to obtain a comprehensive view of others’ opinions about the various aspects of objects, products or services through manual analysis [1,2,7–10]. A proper analysis and summarisation of the opinions expressed can enable potential users to check previous positive and negative opinions about specific features or aspects of objects [1,10]. This functionality is commonly referred to as sentiment analysis.
In the past few years, sentiment analysis has become a rapidly developing field of study, aiming at models and approaches for the automated analysis of people’s opinions expressed in reviews [1,11–18]. Sentiment analysis may involve techniques from a number of fields including text mining, natural language processing and computational linguistics [1].
A crucial subtask in sentiment analysis is aspect detection: isolating the aspects on which opinions are expressed [3,5,8,14,19]. This step precedes determining whether the opinions on those aspects are positive or negative [1,4,6,7]; it is related to information extraction and consists of extracting structured representations of aspect-oriented opinions from review documents [12,13,18]. In the context of reviews, aspects can be considered to coincide with the topics on which opinions are expressed, and sentiment-aspect classification determines whether the opinions on the detected aspects (or topics) are positive, negative or neutral [14,19]. In the remainder of this article, we use the notions of aspect and topic interchangeably, and the focus is on detecting aspects and determining the orientation of the sentiment expressed for the aspects in a review. In terms of the approach adopted, there are three types of approaches to the aspect detection and sentiment-aspect classification tasks [1]: (1) the supervised learning approach, (2) the frequency- and relation-based approach and (3) the topic modelling-based approach.
To date, many supervised learning algorithms, such as support vector machines, naïve Bayes, neural networks and decision trees, have been proposed for sentiment analysis tasks [4,20–22]. Supervised algorithms depend on a set of pre-labelled training data. Although supervised approaches can achieve reasonable effectiveness, building sufficient labelled data is often expensive and requires considerable human labour [3]. In addition, a model trained on labelled data in one domain often performs poorly in another domain [1]. Frequency- and relation-based approaches can analyse the opinions in review documents in an unsupervised or semi-supervised way [5,14]; hence, these approaches overcome the need for labelled data that is inherent to supervised learning. Their major shortcoming is that they require manual tuning of various parameters, which makes them hard to transfer to another domain [1,14]. In contrast, topic modelling-based methods are unsupervised approaches which benefit from the relation between words and topics [23–28]. The basic idea in topic modelling is that documents are represented as random mixtures over latent topics, where each topic is characterised by a distribution over words [29–32]. One main advantage of topic modelling-based approaches is that they view the text as a mixture of global aspects (or topics) and then analyse the sentiment at the more detailed topic or domain level [27,33].
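This generative view can be made concrete with a small sketch. The distributions below are invented toy values, not taken from any model in this article; the sketch only illustrates the idea that a document is a mixture over topics and each topic is a distribution over words.

```python
import random

random.seed(0)

# Toy distributions (illustrative only; not from the article).
doc_topic = {"food": 0.7, "service": 0.3}            # per-document topic mixture
topic_word = {
    "food": {"pizza": 0.6, "tasty": 0.4},            # per-topic word distributions
    "service": {"waiter": 0.5, "slow": 0.5},
}

def generate_word():
    """Sample one word: first draw a topic from the document's mixture,
    then draw a word from that topic's word distribution."""
    topic = random.choices(list(doc_topic), weights=list(doc_topic.values()))[0]
    words = topic_word[topic]
    return random.choices(list(words), weights=list(words.values()))[0]

sample = [generate_word() for _ in range(5)]
```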
Due to the wide range of products and services being reviewed on the Internet, supervised, domain-specific or language-dependent models are often impractical [5,14,27], and it is desirable to develop models that work with unlabelled data [23]. Therefore, a framework for aspect-based sentiment analysis must be robust and easily transferable between domains or languages. Unsupervised topic modelling using approaches such as latent Dirichlet allocation (LDA) has gained considerable popularity as a way to model latent aspects and sentiments in textual data [23]. Although topic models benefit from the correlation between words and topics, their inherent assumption that texts are a bag of words, so that the order of words in a sentence can be ignored, is questionable. In this article, we present a novel unsupervised model based on topic modelling which addresses the core tasks necessary to detect aspects and sentiments from review sentences for a document-level sentiment analysis system. The proposed model is expected to yield better results because it relaxes the bag-of-words assumption by considering word status. Improvements are expected in inferring latent aspects from the structure of sentences in a document, in extracting multi-word aspects and in predicting the topic of previously unobserved words in documents.
One of the main issues in aspect-based sentiment analysis is that in review sentences, the sentiment polarities are dependent on aspects, topics or domains [27,28]. For example, the opinion (sentiment) word ‘
The remainder of this article is organised as follows. The detailed discussions of existing work are given in section 2. Section 3 proceeds by reviewing the formalism of LDA. Section 4 describes the proposed SAM model, including the overall architecture and specific design aspects. Subsequently, we describe an empirical evaluation and discuss the major experimental results in section 5. Finally, we conclude with a summary and some future research directions in section 6.
2. Related work
The phrase ‘sentiment analysis’ perhaps first appeared in the study of Yi et al. [34]. However, studies of sentiment analysis or opinion mining appeared earlier [20,35]. In this article, we use the terms sentiment and opinion interchangeably. The term sentiment can denote opinion, sentiment, evaluation, appraisal, attitude and emotion [1]. Aspect-based sentiment analysis mainly focuses on detecting aspects and opinions which express or imply positive or negative sentiments [1,27].
Various approaches have been proposed for aspect-based sentiment analysis of review texts. Previous work is based on, for example, association rule mining [2], double propagation [10], unsupervised aspect detection [5,14] and supervised learning methods [4]. These approaches have the limitation that they do not group semantically related aspect expressions together [1,25,36]. However, recent approaches using deep neural networks [22,37] have shown significant performance improvements over state-of-the-art supervised methods on a range of sentiment analysis tasks. Soujanya et al. [22] deployed a seven-layer deep convolutional neural network for aspect-based sentiment analysis to tag each word in opinionated sentences as either an aspect or a non-aspect word. In addition, supervised methods are often not practical because creating sufficient volumes of labelled data is expensive and requires considerable human labour [3]. Unsupervised topic modelling approaches for identifying aspect words have been shown to be more effective [23,27,36,38]. They involve topic models: a suite of probabilistic algorithms whose aim is to extract latent structure from large collections of documents [29,30]. These models all share the idea that documents are mixtures of topics and that each topic is a distribution over words [32].
Current topic modelling approaches are computationally efficient and capture correlations between words and topics, but they share one main limitation [39–43]: they assume that words are generated independently of each other. This is known as the bag-of-words assumption [39]. In other words, such topic models only extract unigrams for topics in a corpus; the bag-of-words assumption ignores the order of words [40], an unrealistic simplification. In the past few years, several studies have used topic models for aspect-based sentiment analysis in ways that overcome this limitation [23,39,40,42,44]. Wallach [39] developed a bigram topic model based on the hierarchical Dirichlet language model, using a hierarchical Bayesian model that integrates bigram- and topic-based techniques for document modelling. Wallach’s model does not consider unigram aspects and always generates bigrams. Griffiths et al. [44] proposed the LDA collocation model, which introduces a new set of parameters to decide whether to generate a unigram or a bigram aspect, although it does not always generate a reasonable topic for a word or phrase. Wang et al. [42] improved the LDA collocation model to make it possible to decide whether to form an
Motivated by these observations, we follow this promising line of research. This article is based on the assumption that a topic model that considers unigrams and phrases jointly while extracting sentiments is more realistic and more useful in applications. Several integrated models of topics and sentiments have been proposed [26–28,33,45–47]. These models extend the basic topic models’ idea that an opinionated document is a mixture of topics and a topic is a distribution over words. The topic-sentiment model (TSM) [26] is one of the first proposed approaches that jointly models the mixture of topics and sentiments for documents. There are several differences between TSM and our model. TSM is based on probabilistic latent semantic indexing (PLSI) [30] and has shortcomings in inference for new documents and in overfitting the data; our LDA-based model overcomes these. Also, TSM represents sentiment through a language model separate from topics, while SAM considers a topic-sentiment pair as a single unit for the language model. Titov and McDonald [19] proposed a multi-aspect sentiment model (MAS) which is claimed to extract topics representative of ratable aspects from user reviews and to obtain a sentiment summary for each aspect. Our model differs from MAS in several ways: MAS is a supervised sentiment-topic model, as it requires that every aspect is rated in the training documents, while SAM is a weakly supervised model incorporating only minimal seed information and does not use any labelled training data. MAS tries to extract sentiment summaries for each aspect, while our model can present the outcome at the aspect, sentence or document level. Both the joint sentiment-topic model (JST) [27] and the aspect-sentiment unification model (ASUM) [28], shown in Figure 1, are highly similar to the SAM model. In these models, sentiments are unified with topics in a single language model.
Both JST and ASUM are based on LDA and are fully unsupervised approaches, including only minimal prior information for sentiment-aspect classification. JST views topics and sentiments at the document level, whereas ASUM constrains the words in a sentence to come from the same language model. ASUM relaxes the assumption that subsequent words in a document or a sentence have different aspects, but it does not utilise the structure of sentences in terms of word order, nor does it focus on the extraction of multi-word aspects from text data. Our model, SAM, relaxes the bag-of-words assumption for the extraction of unigram and multi-word aspects.

(a) JST model and (b) ASUM model.
3. LDA
Topic models are based on the assumption that documents are mixtures of topics, where a topic is a probability distribution over words. A topic model is a generative model for documents: it specifies a probabilistic procedure, based on probabilistic sampling rules, that describes how the words in documents might be generated from random variables [32]. LDA is one of the most popular topic models; its probabilistic procedure ties the parameters of documents together via a hierarchical generative model [29].
Figure 2 shows the graphical model of LDA [29]. In this graphical notation, nodes are random variables and edges indicate conditional dependencies between variables. Shaded and unshaded variables indicate observed and latent (i.e. unobserved) variables, respectively, while plates refer to repetitions of sampling steps, with the variable in the lower right corner of a plate giving the number of samples [32].

Graphical representation of the LDA model.
Given a corpus with a collection of
Each document in the corpus is a sequence of
where

Definition of generative process in LDA.
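As a hedged illustration of the LDA generative process described in this section, the following sketch draws symmetric Dirichlet samples (via normalised Gamma draws) and then generates a toy document word by word. All sizes and hyperparameter values are invented for illustration and are not the article's settings.

```python
import random

random.seed(1)

def dirichlet(alpha, k):
    """Sample a k-dimensional symmetric Dirichlet(alpha) via normalised Gamma draws."""
    g = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]

# Toy sizes and symmetric Dirichlet hyperparameters (illustrative only).
K, V, n_words = 3, 6, 8          # topics, vocabulary size, words per document
alpha, beta = 0.1, 0.01

phi = [dirichlet(beta, V) for _ in range(K)]   # per-topic word distributions
theta = dirichlet(alpha, K)                    # per-document topic mixture

doc = []
for _ in range(n_words):
    z = random.choices(range(K), weights=theta)[0]   # draw topic z_n from theta
    w = random.choices(range(V), weights=phi[z])[0]  # draw word w_n from phi_z
    doc.append(w)
```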
4. SAM: joint sentiment-aspect detection model
4.1. SAM description
LDA is a hierarchical model that treats data as arising from a generative process on random variables in order to extract topics. SAM is an extension of LDA that uses a mixture of multinomials to model both topics and sentiments. To model sentiments, we propose SAM as a joint sentiment-aspect model obtained by adding a sentiment layer to the LDA model, where sentiment labels are associated with documents, aspects are associated with sentiment labels and words are associated with both sentiment labels and aspects. SAM tries to extract latent aspects and sentiments simultaneously from reviews using word co-occurrences, frequencies and the order of words in each document. The proposed unsupervised approach can easily be transferred between domains or languages, and it can detect the polarity of text data at the document, sentence or aspect level.
The proposed model tries to jointly extract latent sentiment-aspects from reviews by making use of document information as well as the order of words in each document. The SAM model is similar to the LDA model in tying together the parameters of different documents via a hierarchical generative model but, unlike LDA, it does not assume that documents are a ‘bag of words’ [29,40]. In other words, in LDA, the positions of individual words are neglected for topic inference, while SAM assumes that the topics of words in a document form a Markov chain and that subsequent words are more likely to have the same topic or sentiment. Figure 4 shows the graphical representation of SAM.
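The Markov-chain assumption can be illustrated with a minimal sketch in which each word either inherits its predecessor's aspect or draws a fresh one. The `stay` probability and aspect count below are invented for illustration; in SAM itself this choice is governed by the inferred status variables rather than a fixed constant.

```python
import random

random.seed(2)

K = 4          # number of aspects (toy value)
stay = 0.8     # assumed probability that a word keeps its predecessor's aspect

def sample_aspect_chain(n_words):
    """Aspect assignments as a first-order Markov chain: each word either
    inherits the previous word's aspect (with prob `stay`), which is what
    allows consecutive words to form a multi-word aspect, or draws a new one."""
    aspects = [random.randrange(K)]
    for _ in range(n_words - 1):
        if random.random() < stay:
            aspects.append(aspects[-1])        # inherit previous aspect
        else:
            aspects.append(random.randrange(K))  # start a new aspect
    return aspects

chain = sample_aspect_chain(10)
```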

Graphical representation of SAM.
As shown in Figure 4, SAM is a four-layer hierarchical model, where sentiment labels are defined for review documents, aspects are associated with sentiment labels and the aspect of previous word, and words are related to sentiment labels and aspects. Assume that we have a dataset collection of

Formal definition of the generative process in SAM.
SAM, in addition to the two sets of random variables
Notations used in this article.
SAM: sentiment-aspect detection model.
With defining new sets of variables
The procedure in Figure 5 for generating a word
The hyperparameters
where
Based on the procedure in Figure 5, after model parameters have been determined, given a document
There are five sets of latent variables that we need to infer in SAM, that is, the per-document sentiment label–based aspect distribution
Before discussing the inference problem of our model, it is worth noting that, to increase the consistency of the model, we could add two more sets of associations
These two new associations define transition probabilities between sentiment labels and status variables, which means that a word and its predecessor could have the same sentiment label. Hence, a word has the option to inherit the sentiment label assignment of its previous word. Accordingly, we could substitute step 3(d)(i) in the generative process of Figure 5 as follows
4.2. Model inference
In order to estimate the distribution of
Letting the subscript
The pseudo-code for Gibbs sampling procedure of SAM is shown in Figure 6.

Gibbs sampling procedure of SAM.
In this method, we initialise the model by randomly assigning initial aspects and sentiment labels and by setting up the count matrices and the topic and sentiment assignment dictionaries. The Gibbs sampling procedure is run until a stationary state of the Markov chain has been reached [48,49]. The Markov chain samples are then used to approximate the SAM parameters. The approximate probability of aspect
The approximate probability of sentiment label
The approximate probability of word
The approximate probability of word
The approximate probability of the status variable
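As a rough illustration of the count-based Gibbs scheme described above, the following sketch implements collapsed Gibbs sampling for a plain LDA-style model only; SAM's actual sampler additionally tracks sentiment labels and status variables, which are omitted here. The corpus, sizes and priors are toy values invented for the sketch.

```python
import random

random.seed(3)

# Toy corpus of word ids (illustrative only).
docs = [[0, 1, 1, 2], [2, 3, 3, 0]]
K, V, alpha, beta = 2, 4, 0.5, 0.1

# Count matrices: document-topic, topic-word, and per-topic totals.
ndk = [[0] * K for _ in docs]
nkw = [[0] * V for _ in range(K)]
nk = [0] * K
z = []  # current topic assignment of each token

for d, doc in enumerate(docs):                 # random initial assignments
    z.append([])
    for w in doc:
        k = random.randrange(K)
        z[d].append(k)
        ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

for _ in range(200):                           # Gibbs sweeps over all tokens
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                        # remove the token's current counts
            ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
            # Full conditional p(z_i = t | rest), up to a constant.
            weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                       for t in range(K)]
            k = random.choices(range(K), weights=weights)[0]
            z[d][i] = k                        # add back with the new assignment
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
```

After convergence, the parameter estimates are read off the count matrices, in the same way the approximate probabilities above are computed from SAM's counts.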
5. Experimental results
In this section, we describe the evaluation of the proposed SAM model in a variety of settings and compare it with the JST and ASUM models both qualitatively and quantitatively. For the qualitative analysis, we present a number of aspects and sentiments generated by all models including SAM to show that the SAM results are more informative, coherent and better correlated with the features of an object. For the quantitative comparison, the sentiment classification accuracy at document level for SAM was compared with the accuracy results for JST and ASUM for two different languages. All the experiments reported in this article have been carried out on a PC with an Intel(R) Core(TM) i7-2600 CPU at 3.40 GHz and 8GB memory running on a Windows 7 operating system. The application of the SAM model was programmed using Java programming language with J2SDK version 1.8.0 as the development environment. In the following sections, sentiment seed lexicon, sentiment classification, data collection, evaluation measure and the major experimental results will be discussed.
5.1. Sentiment classification
The main strategy of this article is to infer the semantic orientation of documents based on positive and negative polarity labels. This task is also known as document-level sentiment classification [20,35]. In the field of product review mining, with the results of document-level sentiment classification, users obtain the information necessary to decide which products to purchase, and companies learn the responses of their customers and the performance of their competitors [20]. This article examines the accuracy of document-level sentiment classification of customer reviews as the quantitative analysis. Specifically, SAM uses a majority vote of sentence-level sentiment counts: it utilises the sentiment distribution π of the sentences in a review to assign a sentiment label to the review.
5.2. Data collection
We deployed datasets of reviews for two different languages, English and Persian, and for three different domains, movie reviews, electronic device reviews and restaurant reviews. The English movie review dataset [35],
Summary of review datasets.
5.3. Sentiment seed lexicon
We chose global opinion and evaluative words for the sentiment seed lexicon which can be treated as the paradigm for detecting the positive and negative sentiment orientation. These sentiment seed words are based on paradigms defined by Turney [20]. The seed lexicon consists of 20 positive and 20 negative sentiment words as shown in Table 3. The sentiment seed lexicon for Persian is the translated version of the English lexicon.
Sentiment seed lexicon.
5.4. Evaluation measure
The performance of SAM is evaluated using the accuracy measure, which has been used by several other researchers working on aspect-based sentiment analysis and sentiment classification [27,28]. Accuracy is the proportion of true results, both true positives (TPs) and true negatives (TNs), in the population. A TP means that a sentence with positive polarity is classified correctly by the system. Accuracy is computed based on Table 4 as in equation (12)

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (12)
where the sum
Contingency table.
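The accuracy measure can be read off the contingency table directly; the confusion counts below are invented for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy: proportion of true results (TP + TN) among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Toy confusion counts (illustrative only): 75 correct out of 100 predictions.
acc = accuracy(tp=40, tn=35, fp=15, fn=10)
```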
5.5. Applying status variable
Based on the observation that aspects are nouns and sentiments are adjectives or adverbs [2], in the model, we examine the combination of noun phrases [23], adjectives and adverbs from review sentences. We use several experimentally extracted part-of-speech (POS) patterns, which we introduce as heuristic combinations in Table 5. In this article, we focus on five POS tags: NN, JJ, DT, NNS and VBG, for nouns, adjectives, determiners, plural nouns and verb gerunds, respectively.
Heuristic combinations of POS patterns.
From Table 5, the heuristic combinations of the first row select the candidate aspects from the noun phrase patterns such as ‘

Sample review.
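A minimal matcher of the kind Table 5 describes can be sketched as follows. The pattern list here is a plausible stand-in for noun-phrase patterns, not the article's exact Table 5, and the tagged sentence is invented.

```python
# Hypothetical adjacent-tag patterns for candidate multi-word aspects
# (stand-ins; the article's actual combinations are given in Table 5).
PATTERNS = [("NN", "NN"), ("JJ", "NN"), ("NN", "NNS")]

def candidate_aspects(tagged):
    """Scan a POS-tagged sentence for adjacent word pairs whose tags
    match one of the heuristic patterns; return the matched phrases."""
    found = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in PATTERNS:
            found.append(f"{w1} {w2}")
    return found

# Invented example sentence with Penn Treebank-style tags.
tagged = [("the", "DT"), ("battery", "NN"), ("life", "NN"),
          ("is", "VBZ"), ("great", "JJ")]
```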
5.6. Comparative study
In our experiments, after preprocessing and extracting the sentences from the textual datasets, and after POS tagging, we used the Gibbs sampling algorithm for all the models and ran each chain for 1000 iterations to produce a sample of latent variables for each experiment. Previous studies have shown that topic models are not sensitive to hyperparameters and can produce reasonable results with a simple symmetric Dirichlet prior [27,39–43]. During Gibbs sampling, we used empirical values for the symmetric priors, as given in Table 6. The distributions of each word in each aspect were then estimated by the models. We varied the number of extracted aspects in SAM, JST and ASUM over 1, 5, 10, 20, 50 and 80 aspects.
Hyperparameters setting.
5.6.1. Qualitative analysis
For the qualitative analysis, examples of the top extracted words of the models are presented in Tables 7 and 8. Table 7 shows examples of aspects detected by SAM, JST and ASUM under the positive sentiment label from the ERestaurants reviews, and Table 8 shows the results under the negative sentiment label. In addition to the words, these tables present the probability of each word under its aspect assignment and sentiment label. In all models, the aspects seem to fit the ERestaurants review data. For example, words such as ‘barbecued’, ‘codfish’, ‘desserts’, ‘garlic’, ‘spice’, ‘lunch’ and ‘service’ are related to the ERestaurants reviews and are extracted by the models. In terms of aspect sentiment, by examining each of the aspects in Tables 7 and 8, it is quite evident that the aspects under the positive and negative sentiment labels indeed bear positive and negative sentiments, respectively. As for the words without sentiment, these appear in the contexts of both sentiment labels.
Example of aspects detected by the models SAM, JST and ASUM under positive sentiment label from ERestaurant’s reviews.
SAM: sentiment-aspect detection model; JST: joint sentiment-topic model; ASUM: aspect-sentiment unification model.
Example of aspects detected by the models SAM, JST and ASUM under negative sentiment label from ERestaurant’s reviews.
SAM: sentiment-aspect detection model; JST: joint sentiment-topic model; ASUM: aspect-sentiment unification model.
From the examples in Tables 7 and 8, it can be observed that SAM emphasises extracting phrases and multi-word aspects along with unigram ones. Also, owing to their shared sentence-level assumption, SAM and ASUM extract more local topics and aspect-specific sentiment words. However, SAM overcomes ASUM’s shortcoming by considering word status, the structure of the sentences and the positions of words in the documents.
5.6.2. Quantitative analysis
For the quantitative analysis, we compare the results of sentiment classification for the three models including SAM, JST and ASUM. All the models are evaluated using document-level sentiment classification accuracy. In SAM, a document is classified as positive if the positive sentiment sentence count is greater than negative sentences; otherwise, the document is labelled as negative. To determine the sentiment of a sentence, SAM uses sentiment distribution π of the sentence, such that the sentence is set to be positive if the probability of positive sentiment is higher or equal to the probability of negative sentiment, and vice versa. In contrast to SAM, JST and ASUM classify a document by viewing the document as a whole text and comparing the sentiment probabilities in the document level.
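The majority-vote classification rule described here can be sketched directly; the per-sentence (P(positive), P(negative)) pairs below are invented values standing in for the inferred sentence-level distributions π.

```python
def classify_document(sentence_pi):
    """Majority vote over per-sentence sentiment distributions pi:
    a sentence counts as positive if P(pos) >= P(neg); the document
    is labelled positive if positive sentences outnumber negative ones."""
    pos = sum(1 for p_pos, p_neg in sentence_pi if p_pos >= p_neg)
    neg = len(sentence_pi) - pos
    return "positive" if pos > neg else "negative"

# Toy per-sentence (P(pos), P(neg)) pairs (illustrative only).
label = classify_document([(0.7, 0.3), (0.4, 0.6), (0.55, 0.45)])
```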
The models were run using different number of aspects, 1, 5, 10, 20, 50 and 80 in four datasets. When the aspect number is set to 1, all models, SAM, JST and ASUM, become very close to the standard LDA topic model with only

Sentiment classification accuracy results of SAM, JST and ASUM for EElectronics dataset.

Sentiment classification accuracy results of SAM, JST and ASUM for ERestaurants dataset.

Sentiment classification accuracy results of SAM, JST and ASUM for EMovies dataset.

Sentiment classification accuracy results of SAM, JST and ASUM for PElectronics dataset.
As can be seen from the figures, SAM shows a significant improvement over the other models on all the datasets, and JST shows the lowest results among the models. It can also be observed in all domains that, when the number of aspects equals one, all the models are close together, because they then act like the classic LDA topic model.
From Figure 8, for EElectronics, we can see that the sentiment classification accuracy of SAM, JST and ASUM increases as the number of aspects grows. Specifically, the JST model improves from an accuracy of 66.99% to 70.29% when the number of aspects changes from 1 to 10; the ASUM model improves from 72.27% to 79.53% when the number of aspects changes from 1 to 20; and SAM improves from 72.27% to 86.79% when the number of aspects changes from 1 to 50. Similarly, for ERestaurants, EMovies and PElectronics, we can see that the proposed model, SAM, performs better than JST and ASUM. For the ERestaurants and PElectronics reviews, the curves are smoother, while on EMovies, all models attain their lowest results. Also, on EMovies, when the number of aspects increases, model accuracy drops dramatically.
By assessing the results for EElectronics and PElectronics, it can be found that the results for EElectronics are slightly better. The complex script is the main challenge in processing Persian text [21,51,52]. For example, one of the issues in Persian text mining is the wide variety of declensional suffixes. Another common problem of Persian text is word spacing. In Persian, in addition to white space as an inter-word separator, an intra-word space called pseudo-space separates the parts of a word [21,51,52]. Using white space, or no space at all, instead of pseudo-space is a challenge in Persian reviews. For example, in the sentence ‘این گوشی قابلیت تشخیص دستخط خوبی دارد. / This phone has good handwriting recognition ability’, the word ‘دستخط / handwriting’ uses pseudo-space and contains two other words, ‘دست / hand’ and ‘خط / line’. If the algorithm interprets the word ‘دستخط / handwriting’ as united or separated, the feature space and the results will differ. These challenges affect Persian sentiment classification accuracy [51].
By analysing Figures 8–11, the best sentiment classification accuracies of different settings for SAM, JST and ASUM models for four datasets are extracted in Table 9.
Best sentiment classification accuracy for each model in the datasets.
SAM: sentiment-aspect detection model; JST: joint sentiment-topic model; ASUM: aspect-sentiment unification model.
Table 9 shows, for each model, the number of aspects that obtains the best accuracy on the corresponding dataset. According to this table, comparing the domains, the highest performance among the models on average is found for the EElectronics review dataset. The reason is that topic model–based techniques perform better when the size of the training dataset is larger. The results for the ERestaurants and PElectronics reviews are very close to the EElectronics results, and EMovies has the lowest results. When examining the datasets, it can be observed that EMovies is the largest in size, but its poor results stem from the many review sentences that bear no sentiment or opinion information. In Table 9, the lowest accuracy value is achieved by JST and the highest score by SAM. Overall, from the results, it can be found that SAM outperforms the other models and that the aspects and sentiments generated by SAM significantly improve performance over JST and ASUM.
5.7. Optimal number of aspects
From Table 9, it can be derived, on average, that when the number of aspects is between 10 and 50, the models obtain the best performance. Tables 10 and 11 show the sentiment classification accuracy for SAM, JST and ASUM in three domains with 10 and 50 aspects, respectively.
Sentiment classification accuracy in the datasets for each model with 10 aspects.
SAM: sentiment-aspect detection model; JST: joint sentiment-topic model; ASUM: aspect-sentiment unification model.
Sentiment classification accuracy in the datasets for each model with 50 aspects.
SAM: sentiment-aspect detection model; JST: joint sentiment-topic model; ASUM: aspect-sentiment unification model.
From Tables 10 and 11, it can be concluded that, comparing the models at 10 and 50 aspects, the optimum number of aspects for the SAM model is 50, whereas 10 is the best number for JST and ASUM. The results with different numbers of aspects show that 50 aspects per sentiment label capture more informative aspects and sentiments with few redundancies.
Finally, we can conclude that, for the problem of aspect-based sentiment analysis, SAM outperforms the other models on all review datasets with multiple aspect settings. The bag-of-words assumption causes JST to capture more global aspects from reviews, while ASUM assumes that each sentence represents one aspect, which leads to a better model for aspect sentiment classification. In contrast, SAM relaxes the bag-of-words assumption by using word co-occurrences within sentence boundaries, assuming that each sentence represents one aspect and that successive words may come from the same aspect to form a multi-word aspect. Therefore, SAM attains better effectiveness in detecting aspect and sentiment words.
This comparative evaluation shows that an unsupervised approach to joint sentiment-aspect detection that assumes the order of words matters, exploits the structure of review sentences and the inter-relations between words, and gives more weight to multi-word aspects achieves promising performance.
6. Conclusion
In this article, we have presented an unsupervised, language-independent model for an aspect-based approach to the sentiment analysis of review documents. This model, SAM, involves a probabilistic topic model which jointly detects aspects and sentiments from online reviews. SAM assumes that each sentence in a review contains an aspect and that consecutive words may form a phrase and share the same aspect. In other words, SAM is a probabilistic generative model which tries to detect aspects and sentiments from review documents by considering the underlying structure of a document. The proposed approach models the aspect distributions with a Markov chain and relaxes the assumption that the aspect distribution within a document is conditionally independent. SAM differs from previous studies in that it needs no labelled training data; it uses word status in review sentences, detects aspects and sentiments simultaneously and extracts coherent words for aspects and sentiments. Our experimental results indicate that SAM is effective at sentiment classification and outperforms other models that combine aspect and sentiment detection, such as JST and ASUM.
There are several ways in which the research described here can be extended. One direction is to further improve and refine the proposed model by incorporating domain knowledge for aspect and sentiment detection. Another is to apply statistical techniques for learning the parameters of the model; in other words, the model can be developed to extract aspects and sentiments jointly from review documents while choosing the hyperparameters automatically. A further variant is to evaluate sentiment analysis systems in online social media domains such as Twitter, to investigate performance for large-scale data processing. Finally, we aim to build a language-independent sentiment summarisation system.
