A Supervised Named Entity Recognition Method Based on Pattern Matching and Semantic Verification

Nan Gao, Zhenyang Zhu, Zhengqiu Weng, Guolang Chen, Min Zhang

Abstract


Named entity recognition is a fundamental task in natural language processing and plays a pivotal role in tasks such as information extraction, machine translation, and knowledge graph construction. It has also received widespread attention in the financial, biological, and pharmaceutical industries. This paper proposes a weakly supervised learning method for recognizing complex named entities (hereinafter CNEs) in a corpus. CNEs are commonly composed of sequences of multiple smaller entities, which makes their boundaries difficult to determine. To improve recognition accuracy, the proposed method, Masked-BiLSTM-CRF, separates the judgment of contextual semantic relationships from the confirmation of entity boundaries. It addresses the problem in two parts: (1) A semantic model based on CNE masking. Before training, the CNEs in the corpus are masked, and the masked corpus is used to train a semantic model with BiLSTM-CRF, which can then verify whether the contextual semantics at the location of a candidate entity are correct. (2) A weakly supervised CNE boundary confirmation model based on sequential patterns. On a small-sample data set, a candidate set of target CNEs is found by combining a sliding window with sequential pattern matching, and the candidates are then screened and judged by the semantic model obtained in (1). Experimental results on weakly supervised named entity recognition in the financial domain show that, compared with a named entity recognition method based directly on BiLSTM-CRF, the proposed method improves the F1-score on the small training set by nearly 9% and shows some generalization ability.
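The two stages summarized above (masking CNEs to learn context, then generating boundary candidates with a sliding window and pattern matching and screening them by context) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `[CNE]` mask token, the per-token small-entity tags, the sequential patterns, and all helper names (`mask_cnes`, `window_candidates`, `ContextVerifier`, `recognize`) are hypothetical. `ContextVerifier` is a deliberately simple stand-in that only remembers words seen next to a masked entity, whereas the paper trains a BiLSTM-CRF on the masked corpus.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple

MASK = "[CNE]"  # hypothetical placeholder token standing in for a whole CNE
Span = Tuple[int, int]  # half-open token span [start, end)


def mask_cnes(tokens: Sequence[str], cne_spans: Iterable[Span]) -> List[str]:
    """Collapse each annotated CNE span (non-overlapping, end > start) into one MASK token."""
    ends = {start: end for start, end in cne_spans}
    masked: List[str] = []
    i = 0
    while i < len(tokens):
        if i in ends:
            masked.append(MASK)
            i = ends[i]  # skip the remaining tokens of the entity
        else:
            masked.append(tokens[i])
            i += 1
    return masked


def window_candidates(types: Sequence[str], patterns: Set[Tuple[str, ...]],
                      max_len: int) -> List[Span]:
    """Slide windows of length 1..max_len; keep those whose tag sequence is a known pattern."""
    found: List[Span] = []
    for start in range(len(types)):
        for end in range(start + 1, min(start + max_len, len(types)) + 1):
            if tuple(types[start:end]) in patterns:
                found.append((start, end))
    return found


class ContextVerifier:
    """Toy stand-in for the masked BiLSTM-CRF: remembers words seen next to MASK."""

    def __init__(self) -> None:
        self.left: Set[str] = set()
        self.right: Set[str] = set()

    def fit(self, masked_corpus: Iterable[Sequence[str]]) -> "ContextVerifier":
        for sent in masked_corpus:
            for i, tok in enumerate(sent):
                if tok != MASK:
                    continue
                if i > 0:
                    self.left.add(sent[i - 1])
                if i + 1 < len(sent):
                    self.right.add(sent[i + 1])
        return self

    def accepts(self, tokens: Sequence[str], span: Span) -> bool:
        """Mask the candidate and check whether its context was seen around a CNE."""
        masked = mask_cnes(tokens, [span])
        i = span[0]  # with a single span masked, MASK sits at index span[0]
        left: Optional[str] = masked[i - 1] if i > 0 else None
        right: Optional[str] = masked[i + 1] if i + 1 < len(masked) else None
        return left in self.left or right in self.right


def recognize(tokens: Sequence[str], types: Sequence[str],
              patterns: Set[Tuple[str, ...]], verifier: ContextVerifier,
              max_len: int = 6) -> List[Span]:
    """Pattern-matched candidates, screened by the verifier, longest spans preferred."""
    accepted = [sp for sp in window_candidates(types, patterns, max_len)
                if verifier.accepts(tokens, sp)]
    chosen: List[Span] = []
    # Sort by descending length, then by position; drop spans overlapping a chosen one.
    for start, end in sorted(accepted, key=lambda sp: (sp[0] - sp[1], sp[0])):
        if all(end <= s or start >= e for s, e in chosen):
            chosen.append((start, end))
    return sorted(chosen)


train_tokens = ["shares", "of", "Bank", "of", "Ningbo", "Co", "rose"]
verifier = ContextVerifier().fit([mask_cnes(train_tokens, [(2, 6)])])

test_tokens = ["stock", "of", "Hangzhou", "Bank", "fell"]
test_types = ["O", "O", "LOC", "ORG", "O"]  # hypothetical small-entity tags
patterns = {("LOC", "ORG")}  # hypothetical mined sequential pattern
print(recognize(test_tokens, test_types, patterns, verifier))  # -> [(2, 4)]
```

Keeping candidate generation (pattern matching over small-entity tags) apart from verification (context semantics around the masked span) mirrors the split between boundary confirmation and semantic judgment described in the abstract; in a realistic system the toy verifier would be replaced by a trained sequence model.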


Citation Format:
Nan Gao, Zhenyang Zhu, Zhengqiu Weng, Guolang Chen, Min Zhang, "A Supervised Named Entity Recognition Method Based on Pattern Matching and Semantic Verification," Journal of Internet Technology, vol. 21, no. 7, pp. 1917-1928, Dec. 2020.

Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C.
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314  E-mail: jit.editorial@gmail.com