A Study on Text Classification: Term Weighting Algorithm Analysis
Abstract
With advances in digital recording and storage technology and the rapid growth of the World Wide Web, people now write and record with digital text rather than paper. To enable more text applications, text classification has been gaining attention. Automatic text classification through machine learning involves five related technologies that are widely discussed in the literature: pre-processing, feature extraction, feature selection, term weighting, and the classification algorithm. In this paper, we explore the impact of term weighting on text classification.
Term weighting is an important part of text classification. The calculated weight should directly reflect the importance of a term in the entire text so that machine learning can achieve the best classification result. We applied several common term weighting methods to several pre-defined datasets and conducted experiments. Contrary to the intuitive assumption that the value of a weight represents how important a term is, the results show that a term may not actually be as important as its high weight suggests.
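As an illustration of what a term weighting method computes, the following is a minimal sketch of TF-IDF, one widely used scheme. The abstract does not name the methods that were evaluated, so the choice of TF-IDF, the function name tf_idf, and the toy documents are assumptions made for this example only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized on whitespace.
docs = [
    "the match ended in a draw".split(),
    "the market ended lower".split(),
    "the team won the match".split(),
]
weights = tf_idf(docs)
```

In this variant a term that occurs in every document (here "the") receives a weight of zero, while a term confined to one document (here "market") receives the highest weight in that document. The sketch only shows how a weight is computed; whether a high weight corresponds to a term that actually helps classification is the question the paper examines.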
Kuan-Hua Tseng, Chun-Hung Richard Lin, Jain-Shing Liu, Chih-Ming Andrew Huang, Yue-Han Wang, "A Study on Text Classification: Term Weighting Algorithm Analysis," Journal of Internet Technology, vol. 22, no. 2, pp. 311-325, Mar. 2021.
Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314 E-mail: jit.editorial@gmail.com