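Monthly Presentation Materials
March 2018: Yong-hun Lee (Chungnam National University) & Ji-Hye Kim (Korea National University of Education)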
Posted by Administrator, 2018.03.25


Experimental Data and Deep Learning: Subjecthood Tests in Korean



Yong-hun Lee (Chungnam National University)


Ji-Hye Kim (Korea National University of Education)






As machine learning and deep learning techniques have advanced, there have been
a number of attempts to apply them to the study of language and to natural
language processing. This paper shows how deep-learning techniques can be
applied to experimental data so that the resulting language model can predict
interesting properties of natural language.



This paper takes the data sets from six subjecthood tests in Korean (Kim et
al., 2015; Lee et al., 2015; Kim et al., 2017) and builds a language model with
a deep neural network (DNN). The z-scores are first converted into scores
between 0 and 100, and these scores are then classified into 10 groups (0-10,
11-20, ..., 91-100). Next, a DNN model is constructed in which the 10 score
groups are the output categories and the linguistic features are the input
values. The input features include the type of diagnostic test, Case markers,
single- vs. multiple-subject constructions (SSC vs. MSC), presence/absence of
the diagnostic, animacy of the subject, and so on. The constructed model then
calculates a predicted score for each sentence given a particular combination
of these linguistic features. The goal of this paper is to demonstrate how
DNNs trained on experimental data can uncover interesting properties of
natural language.
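The pipeline described in the abstract can be sketched in dependency-free Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the abstract does not say how z-scores were mapped onto 0-100, so min-max scaling is assumed in `rescale`; the feature names and values in `FEATURES` are hypothetical (the six test names follow the titles of the cited studies, while the Case values are placeholders); and `TinyMLP` is a hypothetical one-hidden-layer network trained by full-batch gradient descent, far smaller than the DNN used in the study.

```python
import math
import random

# HYPOTHETICAL feature inventory: the six test names follow the cited studies'
# titles, but the value labels and this encoding are illustrative assumptions.
FEATURES = {
    "test": ["obligatory_control", "coordinated_deletion", "honorific_agreement",
             "plural_copying", "reflexive_binding", "adjunct_control"],
    "case": ["NOM", "DAT"],  # assumed Case-marker values
    "construction": ["SSC", "MSC"],
    "diagnostic": ["present", "absent"],
    "animacy": ["animate", "inanimate"],
}


def rescale(z_scores):
    """Map z-scores onto 0-100 by min-max scaling (an assumed method)."""
    lo, hi = min(z_scores), max(z_scores)
    if hi == lo:  # degenerate case: every item got the same z-score
        return [50.0] * len(z_scores)
    return [100 * (z - lo) / (hi - lo) for z in z_scores]


def score_group(score):
    """Return group 0..9 for the bands 0-10, 11-20, ..., 91-100 (after rounding)."""
    s = round(score)
    return 0 if s <= 10 else (s - 1) // 10


def encode(item):
    """One-hot encode a dict of categorical features into a flat 0/1 vector."""
    vec = []
    for name, values in FEATURES.items():
        vec.extend(1.0 if item[name] == v else 0.0 for v in values)
    return vec


class TinyMLP:
    """Hypothetical toy network: one tanh hidden layer + softmax over 10 score groups."""

    def __init__(self, n_in, n_hidden=8, n_out=10, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return hidden activations and class probabilities for one input vector."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b
                  for row, b in zip(self.W2, self.b2)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return h, [e / total for e in exps]

    def loss(self, X, y):
        """Mean cross-entropy of the true score groups."""
        return -sum(math.log(self.forward(x)[1][t]) for x, t in zip(X, y)) / len(X)

    def train(self, X, y, epochs=300, lr=0.1):
        """Full-batch gradient descent with hand-written backpropagation."""
        n = len(X)
        for _ in range(epochs):
            gW1 = [[0.0] * len(row) for row in self.W1]
            gb1 = [0.0] * len(self.b1)
            gW2 = [[0.0] * len(row) for row in self.W2]
            gb2 = [0.0] * len(self.b2)
            for x, t in zip(X, y):
                h, p = self.forward(x)
                # softmax + cross-entropy gradient with respect to the logits
                d_out = [pk - (1.0 if k == t else 0.0) for k, pk in enumerate(p)]
                for k, dk in enumerate(d_out):
                    gb2[k] += dk
                    for j, hj in enumerate(h):
                        gW2[k][j] += dk * hj
                # backpropagate through tanh: d tanh(a)/da = 1 - tanh(a)^2
                d_hid = [(1 - hj * hj) * sum(d_out[k] * self.W2[k][j] for k in range(len(d_out)))
                         for j, hj in enumerate(h)]
                for j, dj in enumerate(d_hid):
                    gb1[j] += dj
                    for i, xi in enumerate(x):
                        gW1[j][i] += dj * xi
            # apply the averaged gradients to both layers
            for W, b, gW, gb in ((self.W1, self.b1, gW1, gb1), (self.W2, self.b2, gW2, gb2)):
                for r in range(len(W)):
                    b[r] -= lr * gb[r] / n
                    for c in range(len(W[r])):
                        W[r][c] -= lr * gW[r][c] / n

    def predict_group(self, item):
        """Most probable score group (0..9) for a feature combination, plus probabilities."""
        probs = self.forward(encode(item))[1]
        return max(range(len(probs)), key=probs.__getitem__), probs
```

In use, each experimental item would be encoded with `encode`, its z-score turned into a target group with `rescale` and `score_group`, and a `TinyMLP` trained on the results; `predict_group` then gives the predicted score band for any feature combination of interest. A real study would use a deep-learning library and more layers; the hand-written version only keeps the sketch self-contained.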






References


Kim, J., Y. Lee, and E. Kim. 2015. Obligatory Control and Coordinated Deletion as Korean Subject Diagnostics: An Experimental Approach. Language and Information 19:75-101.



Kim, J., Y. Lee, and E. Kim. 2017. Honorific Agreement and Plural Copying as Korean Subjecthood Diagnostics: An Experimental Approach. Studies in Modern Grammar 93:119-144.

Lee, Y., E. Kim, and J. Kim. 2015. Reflexive Binding and Adjunct Control as Subject Diagnostics in Korean: An Experimental Approach. Studies in Language 31:427-449.
