Enhanced New Words Recognition Based on Multi-level Semantic Vectors and Multi-task Learning Models

Jin Pan,
Yang Chen,
Chunlu Zhao,
Yang Liu,
Jie Chu

Abstract


Neologism discovery is a fundamental task in natural language processing and is important for improving the performance of downstream tasks. Existing techniques are prone to word segmentation errors, capture word semantics incompletely, and struggle with dynamic recognition. To address these problems, this paper proposes an enhanced new word recognition method based on multi-level semantic vectors and multi-task learning models. First, an improved hash algorithm is used to generate dictionaries. Next, a multi-level time series model identifies potential neologism candidates and maps them into a high-dimensional vector space to generate synthetic semantic vectors. Then, a context-semantic graph model is constructed to analyze the contextual compatibility of words, and a multi-task learning model computes a sentiment score and a domain relevance score. Finally, the comprehensive score is used to identify new words. Experimental results show that this method has significant advantages in accuracy, semantic understanding and application range.
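
The abstract does not state how the comprehensive score is formed. As a minimal sketch only, the final step could be read as a weighted combination of the three per-candidate scores followed by a threshold. The weighted-sum form, the weights, the threshold, and every name below (`CandidateScores`, `comprehensive_score`, `select_new_words`) are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Per-candidate scores, each assumed to lie in [0, 1]."""
    context_compatibility: float  # from the context-semantic graph model
    sentiment: float              # from the multi-task learning model
    domain_relevance: float       # from the multi-task learning model


def comprehensive_score(scores, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of the three scores (hypothetical weights)."""
    w_ctx, w_sent, w_dom = weights
    return (
        w_ctx * scores.context_compatibility
        + w_sent * scores.sentiment
        + w_dom * scores.domain_relevance
    )


def select_new_words(candidates, threshold=0.6):
    """Keep candidates whose comprehensive score reaches the (assumed) threshold."""
    return [
        word
        for word, scores in candidates.items()
        if comprehensive_score(scores) >= threshold
    ]


candidates = {
    "candidate_a": CandidateScores(0.9, 0.5, 0.8),
    "candidate_b": CandidateScores(0.2, 0.1, 0.3),
}
accepted = select_new_words(candidates)
```

With these illustrative inputs, only `candidate_a` passes the threshold. In practice the weights and threshold would need to be tuned on labeled data.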

Keywords


New word discovery, Word segmentation, Multi-task learning, Multi-level semantic vector

Citation Format:
Jin Pan, Yang Chen, Chunlu Zhao, Yang Liu, Jie Chu, "Enhanced New Words Recognition Based on Multi-level Semantic Vectors and Multi-task Learning Models," Journal of Internet Technology, vol. 26, no. 3, pp. 327-335, May 2025.


Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314  E-mail: jit.editorial@gmail.com