KSLI
    March 2018: Yong-hun Lee (Chungnam National University) & Ji-Hye Kim (Korea National University of Education)
    

Experimental Data and Deep Learning: Subjecthood Tests in Korean

Yong-hun Lee (Chungnam National University)

Ji-Hye Kim (Korea National University of Education)

 

With recent advances in machine learning and deep learning, there have been a number of attempts to apply these techniques to the study of language and to natural language processing. This paper shows how a deep-learning technique can be applied to experimental data so that the resulting language model can predict some interesting properties of natural languages.

          This paper takes the data sets from six subjecthood tests in Korean (Kim et al., 2015; Lee et al., 2015; Kim et al., 2017) and builds a language model with a deep neural network (DNN). After the z-scores are converted into scores between 0 and 100, the scores are classified into 10 groups (0-10, 11-20, ..., 91-100). A DNN model is then constructed in which the 10 score groups are the output categories and the other linguistic features are the input values. The input values include the type of diagnostic test, Case markers, SSC vs. MSC, the presence or absence of the diagnostic, the animacy of the subject, and so on. From the constructed model, a predicted score is calculated for each sentence under a given combination of linguistic conditions. The goal of this paper is to demonstrate how DNNs can be used to uncover some interesting properties of natural languages from experimental data.
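
The preprocessing described above can be illustrated with a minimal Python sketch. The abstract does not say how the z-scores are mapped onto the 0-100 range, so the min-max rescaling used here is an assumption, as are the function names and the feature labels in the example rows, which are hypothetical and not taken from the experimental data. The sketch only prepares output categories and input vectors; it does not build the DNN itself.

```python
import math


def rescale_to_percent(z_scores):
    """Rescale z-scores linearly to the range 0..100.

    Assumption: min-max scaling. The source does not specify the mapping.
    """
    lo, hi = min(z_scores), max(z_scores)
    if hi == lo:
        # All judgments identical: place them at the midpoint.
        return [50.0 for _ in z_scores]
    return [100.0 * (z - lo) / (hi - lo) for z in z_scores]


def score_group(score):
    """Map a 0..100 score to a group index 0..9.

    Index 0 covers 0-10, index 1 covers 11-20, ..., index 9 covers 91-100.
    Non-integer scores between two bands (e.g. 10.5) go to the upper band.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    return max(0, math.ceil(score / 10) - 1)


def one_hot_rows(rows, feature_names):
    """One-hot encode categorical linguistic features for use as DNN input."""
    # Collect the observed levels of each feature in a fixed (sorted) order.
    levels = {name: sorted({row[name] for row in rows}) for name in feature_names}
    encoded = []
    for row in rows:
        vector = []
        for name in feature_names:
            vector.extend(1.0 if row[name] == level else 0.0
                          for level in levels[name])
        encoded.append(vector)
    return encoded, levels


if __name__ == "__main__":
    # Hypothetical rows for illustration only; not data from the experiments.
    rows = [
        {"case": "NOM", "construction": "SSC", "z": 1.5},
        {"case": "DAT", "construction": "MSC", "z": -1.5},
        {"case": "NOM", "construction": "MSC", "z": 0.0},
    ]
    scores = rescale_to_percent([row["z"] for row in rows])
    targets = [score_group(s) for s in scores]
    inputs, levels = one_hot_rows(rows, ["case", "construction"])
    print(targets)
    print(inputs)
```

The group indices would serve as the 10 output classes of a classifier, and the one-hot vectors as its input layer; any standard feed-forward network with a 10-way softmax output could then be trained on these pairs.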

 

References

Kim, J., Y. Lee and E. Kim. 2015. Obligatory Control and Coordinated Deletion as Korean Subject Diagnostics: An Experimental Approach. Language and Information 19:75-101.

Kim, J., Y. Lee and E. Kim. 2017. Honorific Agreement and Plural Copying as Korean Subjecthood Diagnostics: An Experimental Approach. Studies in Modern Grammar 93:119-144.

Lee, Y., E. Kim and J. Kim. 2015. Reflexive Binding and Adjunct Control as Subject Diagnostics in Korean: An Experimental Approach. Studies in Language 31:427-449.