Lee, Yong-hun and Yu, JeeHee. 2017. A Random Forest Analysis of Can and May in Korean EFL Learners’ Writings. English Language and Linguistics 23.3, 81-100.

It is to be expected that the use of English modal auxiliaries by learners of English as a Foreign Language (EFL) differs from their use by speakers of English as a Native Language (ENL). Following Deshors (2010) and Deshors & Gries (2014), Yoon and Lee (2016) adopted a corpus-based method to examine how ENL speakers and Korean EFL learners use the two modal auxiliaries can and may. Using the same corpus data as Yoon and Lee (2016), this study investigates where the discrepancies between the two groups originate. It conducts a random forest analysis and calculates the variable importance of each linguistic factor in each group’s corpus data, revealing which factors play a crucial role in determining the choice between can and may in each group. The analysis yields the following observations: (i) sense (deontic vs. epistemic vs. dynamic) played the most important role in the choice of can vs. may in both groups; (ii) Vendler’s classification, subject morpheme type, verb semantics, and animate type (of the subject) played essential roles in the ENL speakers’ group; (iii) subject person, subject morpheme type, animate type, and clause type played critical roles in the Korean EFL learners’ group; and (iv) Korean EFL learners gave more importance to sense, subject morpheme type, animate type, and subject person, whereas ENL speakers gave more importance to Vendler’s classification.
Key words: EFL writings, corpus, modal, random forest, variable importance