Text mining is the process of extracting high-quality information from text. It is widely used in applications such as text categorization, text clustering, and text classification. Text clustering, which groups text documents, has recently been regarded as both a useful and a challenging task. Its accuracy is reduced by a few irrelevant terms and by the high dimensionality of the feature space. In this work, semantic word processing and an Enhanced CSO algorithm are presented for automatic text clustering. First, the input documents are passed to a preprocessing step that provides the useful keywords for feature extraction and clustering. Next, the resulting keywords are applied to the WordNet ontology to find the synonyms and hyponyms of every keyword. Then, the frequency of every keyword is determined and used to build the text feature library. Since this library has a large dimension, entropy is used to select the most significant features. Finally, the proposed approach assigns class labels to generate different clusters of text documents. The experimental results and performance are examined and compared with conventional algorithms such as ABC, GA, and PSO.
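The frequency and entropy steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the entropy formula or the selection rule, so the code assumes Shannon entropy of each keyword's distribution across documents and assumes that lower entropy marks a more significant feature. The function names (term_frequencies, term_entropy, select_features) are hypothetical, and the WordNet expansion and the Enhanced CSO clustering step are omitted.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Count keyword occurrences in each document (a document is a list of keywords)."""
    return [Counter(doc) for doc in docs]


def term_entropy(term, counts):
    """Shannon entropy (in bits) of a keyword's distribution across documents."""
    freqs = [c[term] for c in counts if c[term] > 0]
    total = sum(freqs)
    probs = [f / total for f in freqs]
    return sum(p * math.log2(1 / p) for p in probs)


def select_features(docs, k):
    """Return the k keywords with the lowest entropy.

    Assumption (not stated in the source): a keyword concentrated in few
    documents (low entropy) is treated as more significant than one spread
    evenly over many documents. Ties are broken alphabetically.
    """
    counts = term_frequencies(docs)
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (term_entropy(t, counts), t))
    return ranked[:k]


# Toy keyword lists standing in for preprocessed documents.
docs = [
    ["cat", "dog", "cat"],
    ["cat", "fish"],
    ["dog", "dog"],
]
selected = select_features(docs, 1)
```

In this toy input, "fish" occurs in a single document, so its entropy is zero and it ranks first under the assumed rule. A real feature library would also need a minimum document frequency so that rare noise terms are not favored.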