基于多信息资源的汉语复合词自动生成研究
On Automatic Generation of Chinese Compound Words Based on Multi-Information Resources
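Document type: Journal article
Chinese title: 基于多信息资源的汉语复合词自动生成研究
English title: On Automatic Generation of Chinese Compound Words Based on Multi-Information Resources
Author: 汪梦翔 (Wang Mengxiang) [1]
First author: 汪梦翔 (Wang Mengxiang)
Affiliation: [1] Teachers' College, Beijing Union University, Beijing 100010
First affiliation: Teachers' College, Beijing Union University
Year: 2024
Issue: 4
Pages: 127-138
Chinese journal name: 语言文字应用
Foreign-language journal name: Applied Linguistics
Indexed in: PKU Core (北大核心 2023); CSSCI (CSSCI 2023-2024)
Funding: Supported by the Ministry of Education Humanities and Social Sciences Fund Youth Project "Research on the Semantic Combination Mechanism and Processing Methods of Elliptical Chinese Verb-Noun Unconventional Collocations" (22YJC740073).
Language: Chinese
Chinese keywords: 多信息资源; 复合词; 自动生成; 平行周遍规则; 词向量
English keywords: multi-information resources; compound words; automatic generation; parallel rule; word vector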
Abstract: This paper explores a method for automatically generating Chinese compound words from multiple information resources. The method targets two kinds of output: existing near-synonyms and new words not yet recorded in the lexicon, both of which are produced by morpheme substitution. Working at the morpheme level and relying on the parallel rule (平行周遍规则) from linguistics, the paper first delimits the range of replaceable morphemes. It then combines the definition texts of a morpheme dictionary with HowNet's knowledge of relations between morphemes to build MIFSSM (Multi-Information Fusion Semantic Similarity Model), which generates sets of near-synonymous morphemes that are ordered by semantic closeness and distinguish between senses. Finally, appropriate combination rules are selected to recombine the substituted morphemes, and the results are evaluated interactively by humans and ChatGPT. The experiments show that, even with a limited pre-training dataset, the method generates Chinese compound words effectively and with high quality.
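The pipeline outlined in the abstract (score morpheme similarity from several resources, substitute near-synonymous morphemes, then sort the candidates into existing words and new words) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy glosses, the `RELATED` pairs standing in for HowNet-style relations, the tiny `LEXICON`, the equal-weight fusion in `fused_similarity`, and the "no repeated morpheme" filter standing in for the combination rules. It does not model sense distinctions or word vectors, and it is not the MIFSSM model itself.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources, invented for this sketch.
GLOSSES = {
    "观": "look at; watch",
    "看": "look at; see",
    "望": "look into the distance",
    "听": "hear; listen",
}
# Stand-in for relation knowledge between morphemes (unordered pairs).
RELATED = {frozenset(("观", "看")), frozenset(("看", "望"))}
# Stand-in for a lexicon of already recorded words.
LEXICON = {"观看", "观望", "看望"}


def gloss_vector(morpheme):
    """Bag-of-words counts over the morpheme's gloss text."""
    return Counter(re.findall(r"[a-z]+", GLOSSES[morpheme].lower()))


def gloss_similarity(a, b):
    """Cosine similarity between two gloss vectors."""
    va, vb = gloss_vector(a), gloss_vector(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def fused_similarity(a, b, alpha=0.5):
    """Weighted mix of gloss similarity and a 0/1 relation score (assumed scheme)."""
    relation = 1.0 if frozenset((a, b)) in RELATED else 0.0
    return alpha * gloss_similarity(a, b) + (1 - alpha) * relation


def near_synonyms(morpheme, threshold=0.1):
    """Other morphemes scoring at or above the threshold, best first."""
    scored = [
        (other, fused_similarity(morpheme, other))
        for other in GLOSSES
        if other != morpheme
    ]
    kept = [(m, s) for m, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def generate(word, threshold=0.1):
    """Substitute each morpheme in turn; split results into existing and new words."""
    existing, new = [], []
    for i, morpheme in enumerate(word):
        for candidate, _ in near_synonyms(morpheme, threshold):
            formed = word[:i] + candidate + word[i + 1:]
            # Toy combination rule: drop the input word and repeated morphemes.
            if formed == word or len(set(formed)) < len(formed):
                continue
            (existing if formed in LEXICON else new).append(formed)
    return existing, new


existing_words, new_words = generate("观看")
print("existing:", existing_words)
print("new:", new_words)
```

In this toy setup the candidates found in `LEXICON` correspond to the "existing near-synonym" outcome and the rest to the "new word" outcome; a real system would still need the evaluation step the abstract describes to judge whether a new candidate is acceptable.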