The development of formulaic sequences in first and second language writing
Formulaic sequences are recognised as having important roles in language acquisition, processing, fluency, idiomaticity, and instruction. However, there is little agreement on their definition and measurement, or on methods of corpus comparison. We argue that replicable research must be
grounded in operational definitions stated in statistical terms. We adopt an experimental design and apply four different corpus-analytic measures, variously based upon n-gram frequency (Frequency-grams), association (MI-grams), phrase-frames (P-frames), and native norm (items in the Academic Formulas
List – AFL-grams), to samples of first and second language writing in order to examine and compare knowledge of formulas in first and second language acquisition as a function of proficiency and language background. We find that these different operationalizations produce different patterns
of effects for expertise and L1/L2 status. We consider the implications for corpus design and methods of analysis.
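
To make the four operationalizations concrete, the following is a minimal Python sketch of how each measure might be computed over a tokenised text. It is illustrative only and is not the procedure used in the study: the function names, the `min_count` threshold, the choice to let the P-frame slot occupy any position, and the particular MI formulation (log2 of the observed n-gram probability over the product of its unigram probabilities) are all assumptions, and the formula list passed to `afl_grams` is a hypothetical stand-in for the Academic Formulas List.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_grams(tokens, n, min_count=2):
    """Frequency-grams: n-grams occurring at least min_count times.

    min_count is an assumed, illustrative threshold.
    """
    counts = Counter(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


def mi_score(gram, tokens):
    """MI-grams: association score for one n-gram (assumed formulation).

    Computes log2(P(gram) / product of P(word)), with probabilities
    estimated from raw counts in the same token list. Raises KeyError
    if the n-gram does not occur in the tokens.
    """
    gram_counts = Counter(ngrams(tokens, len(gram)))
    unigram_counts = Counter(tokens)
    p_gram = gram_counts[gram] / sum(gram_counts.values())
    p_independent = math.prod(unigram_counts[w] / len(tokens) for w in gram)
    return math.log2(p_gram / p_independent)


def p_frames(tokens, n, slot="*"):
    """P-frames: n-grams with one variable slot, mapped to their fillers.

    Here the slot may fall in any position; restricting it to internal
    positions is an equally plausible choice.
    """
    frames = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        for i in range(n):
            frame = gram[:i] + (slot,) + gram[i + 1:]
            frames[frame][gram[i]] += 1
    return frames


def afl_grams(tokens, formula_list):
    """AFL-grams: counts of items from a reference formula list.

    formula_list is a hypothetical list of token tuples standing in
    for the Academic Formulas List.
    """
    counts = Counter()
    for formula in formula_list:
        target = tuple(formula)
        counts[target] = ngrams(tokens, len(target)).count(target)
    return counts
```

Applied to the same sample, the four functions can rank or select quite different sequences, which is the kind of divergence between operationalizations that the abstract reports. A real analysis would also need normalisation for text length and a principled choice of thresholds.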