Abstract
Exception mining in large datasets is an important task in traditional data mining, with applications in credit card fraud detection, weather prediction, intrusion detection, and cellular phone cloning fraud detection, among others. Sifting through dynamic, unstructured, and ever-growing web data for outliers is more challenging than finding outliers in numeric datasets. Existing outlier mining algorithms, however, are restricted to numeric datasets, which leaves web outlier mining an open research issue. Web outliers are web data that show significantly different characteristics from other web data in the same category. Although web outliers evidently exist, algorithms for mining them are currently unavailable. Moreover, traditional outlier mining algorithms designed solely for numeric datasets cannot be applied to web datasets, because web data typically contain multimedia. This paper establishes the presence of outliers on the web, called web outliers, and proposes a general framework for mining them. We also report a web outlier taxonomy that supports the development of content-specific algorithms for mining web outliers. Finally, we propose the WCO-Mine algorithm for mining web content outliers. Experimental results demonstrate that WCO-Mine is capable of finding web outliers in web datasets.
