An automatic document clustering procedure is described which does not require the use of an inter-document similarity matrix and which is independent of the order in which the documents are processed. The procedure makes use of an initial set of clusters which is derived from certain of the terms in the indexing vocabulary used to characterise the documents in the file. The retrieval effectiveness obtained using the clustered file is compared with that obtained from serial searching and from use of the single-linkage clustering method.
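The idea of seeding clusters from index terms can be illustrated with a minimal sketch. It is not the procedure evaluated in the paper, whose details are not given here. The function name `seed_clusters` and the document-frequency bounds `min_df` and `max_df` are hypothetical, and selecting seed terms by document frequency is an assumption made purely for illustration. The sketch shows only that clusters built from term postings need no inter-document similarity matrix and do not depend on the order in which documents are read.

```python
from collections import defaultdict


def seed_clusters(docs, min_df=2, max_df=3):
    """Form initial clusters from index terms (illustrative only).

    docs maps a document id to the set of index terms assigned to it.
    Each term whose document frequency lies in [min_df, max_df] seeds one
    cluster containing every document indexed by that term. The frequency
    bounds are an assumed selection rule, not one taken from the paper.
    """
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    # No pairwise document comparison is made: clusters come from postings.
    return {
        term: frozenset(ids)
        for term, ids in postings.items()
        if min_df <= len(ids) <= max_df
    }


if __name__ == "__main__":
    example = {
        "d1": {"cluster", "retrieval"},
        "d2": {"cluster", "index"},
        "d3": {"retrieval", "index", "hash"},
        "d4": {"index"},
    }
    for term, members in sorted(seed_clusters(example, 2, 2).items()):
        print(term, sorted(members))
```

Because each cluster is defined by the set of documents sharing a term, feeding the documents in a different order yields the same clusters. A full procedure would still need a rule for handling documents left outside every seed cluster, which this sketch does not attempt.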