Document Type: Original Article

Authors

Department of Foreign Languages, Isfahan (Khorasgan) Branch, Islamic Azad University

Abstract

The present study sought to identify the similarities and differences in syntactic complexity between texts written by Iranian university students majoring in English teaching and texts written by native speakers of English. To this end, Coh-Metrix, an automated computational web tool, was used to analyze a corpus of 83 text excerpts extracted from 10 dissertations written by Iranian Ph.D. students and a comparison corpus of 94 text excerpts selected from 10 Ph.D. dissertations written by native speakers of English, using four specific measures of syntactic complexity. The results indicated that, among the four measures, Mean Number of Modifiers and Sentence Syntax Similarity differentiated the first language (L1) texts from the second language (L2) texts, whereas Left Embeddedness and Minimal Edit Distance were similar across the two corpora. The findings may have several implications for EFL practitioners.

1. Introduction

Over the past half century, a plethora of comparative studies has investigated the similarities and differences between the properties of L2 texts and those found in the L1 writing of native English speakers. The shared purpose of such studies has been to adapt L2 writing instruction so as to bridge the gaps between texts written by foreign/second language learners and those written by native speakers of English. In such examinations, comparisons have been made with regard to either global (macro) arrangements of ideas, discourse construction, cohesion, and coherence (e.g., Grabe & Kaplan, 1996; Hinkel, 1997, 1999; Indrasuta, 1988; Ramanathan & Kaplan, 2000) or textual (micro) features that serve to mark discourse organization and aid in the development of cohesive and coherent prose (e.g., Connor & Johns, 1990; Crossley & McNamara, 2009, 2011; Field & Oi, 1992; Flowerdew, 2000; Hinkel, 1995, 2001; Johns, 1984; Johnson, 1992; Khalil, 1989; Mauranen, 1996; Reid, 1992; Swales, 1990).

As claimed by Halliday (1991), syntactic complexity is the only linguistic feature that could be considered representative of both linguistic processing (system) and product (instance). Syntactic complexity, also named ‘syntactic maturity’ or ‘linguistic complexity’, is defined by Ortega (2003) as “the range of forms that surface in language production and the degree of sophistication of such forms” (p. 492). In fact, higher levels of syntactic complexity represent a wider variety of sentence patterns, or increasingly more elaborate language (Foster & Skehan, 1996).

In spite of its importance, the syntactic information of language production has not yet received adequate attention in corpus linguistics owing to the difficulty of extracting such information from corpora (Gilquin, 2003). Moreover, limitations in the selection of either corpora or measures for analysis are easy to detect among previous corpus-based studies on language production. The most serious limitation has been the use of different approaches to conceptualizing syntactic complexity, given the plenitude of indices measuring the feature. Another salient drawback of many previous corpus-based studies that explored second language writing proficiency in terms of syntactic complexity was the absence of a native-speaker (NS) baseline against which to examine the performance of non-native speakers (NNS) (Foster & Tavakoli, 2009). According to Hinkel (2003), careful comparison of the two groups’ (i.e., NS and NNS) performance can shed light on the academic language issues that L2 learners struggle with and can help teachers and materials developers adopt appropriate methods for addressing those problems.

The pedagogical need for effective L2 writing instruction is a matter of great urgency in the context of English academic writing, where FL/L2 university students and language learners experience serious difficulties in academic endeavors such as writing Ph.D. dissertations and research articles. Although several researchers have studied syntactic complexity in various parts of research articles such as Introduction (e.g., Jalilifar, 2010; Shirani & Chalak, 2016), Method (e.g., Lim, 2006), Results (e.g., Brett, 1994), and Discussion (e.g., Jalilifar, Hayati, & Namdari, 2012; Peacock, 2002), just a few studies have compared English academic texts (dissertations) written by Iranian writers and those written by English native speakers (e.g., Jalilifar et al., 2012). Moreover, these studies have been genre analyses focusing on the discourse moves of one or more sections of Master’s (M.A.) dissertations or research articles. Ph.D. dissertations, which would allow a fuller examination of students’ academic writing, have not been included.

2. Literature Review

Based on Bachman’s (1990) conceptual model of language ability, syntactic complexity is a vital factor in second language assessment and is therefore often used as an index of language proficiency and of the developmental status of L2 learners’ writing. Working within a variety of functional frameworks, several linguists (e.g., Bachman, 1990; Ferreira, 1991; Givón, 1991; Ortega, 2003) have devoted a good deal of scholarly attention to characterizing the concept of syntactic complexity. For instance, Ortega (2003) defined syntactic complexity as “the range of forms that surface in language production and the degree of sophistication of such forms” (p. 492). As asserted by Kyle (2016), the distinction between syntactic complexity and syntactic sophistication must be taken into consideration when operationalizing the syntactic patterns of any given text. In contrast to syntactic complexity, which represents the formal characteristics of syntax, syntactic sophistication reflects the relative difficulty of learning particular syntactic structures (Bulté & Housen, 2012).

A wide range of syntactic complexity measures has surfaced in the second language writing development literature. Consequently, numerous studies have examined how valid and reliable the various syntactic complexity metrics are as indices of second language learners’ developmental level (e.g., Bardovi-Harlig & Bofman, 1989; Ferris, 1994; Henry, 1996; Larsen-Freeman, 1978, 2009; Ortega, 2003; Lu, 2011; Wolfe-Quintero et al., 1998). Given its crucial role in second language writing instruction and assessment, syntactic complexity has received remarkable attention (Buckingham, 1979; Perkins, 1983). Nevertheless, studies that systematically compare syntactic complexity in the performance of NS and NNS are scarce, with a few notable exceptions. For instance, in her quantitative analysis of 1,083 NS and NNS English academic texts, Hinkel (2003) found that advanced NNS students in U.S. universities tended to overuse simple syntactic constructions. In a more recent study, Qi (2014) examined to what extent and how syntactic complexity was related to the proficiency of ENL, EFL, and ESL learners based on three highly comparable sub-corpora from the International Corpus Network of Asian Learners of English (ICNALE). Having included both global and specific complexity measures (e.g., subordination-based and coordination-based measures) in the study, he concluded that global complexity measures are stable indicators of proficiency levels.

While there is no doubt that previous studies have offered very useful insights into the links and gaps between texts written by NS and NNS in terms of syntactic complexity, the results of most of them need to be interpreted with caution owing to the limited number of syntactic complexity measures applied to small amounts of data. One factor that may have contributed to this situation is the absence of computational tools for automating the syntactic complexity analysis of second language writing.

As they expand and develop, computational tools have begun to provide a more accessible and theoretically tangible approach to the quantitative assessment of texts. Adopting a computational approach to scrutinizing text features in general, and syntactic complexity in particular, has made it possible for researchers to classify the differences between L1 and L2 writers on the basis of the surface-level features used in their texts. Lu (2010), for instance, designed a computational system, the L2 Syntactic Complexity Analyzer (SCA), which automatically measures the syntactic complexity of English writing samples produced by college-level English learners using 14 measures. Taking advantage of the analyzer developed by Lu (2010), Ai and Lu (2013) designed a computational system to automate the analysis of syntactic complexity in writing samples produced by college-level L2 English learners, applying a comprehensive set of 10 syntactic complexity measures. Several years earlier, however, an online computational tool, namely Coh-Metrix, had been developed by Graesser et al. (2004) for evaluating various text features including syntactic complexity. Capable of producing as many as 108 indices related to different linguistic features, representing descriptive qualities, syntactic complexity, lexical sophistication, and cohesion, the tool has been validated in a number of studies distinguishing text types (e.g., Crossley et al., 2007; McCarthy et al., 2006; McCarthy et al., 2007).

Utilizing Coh-Metrix in a contrastive analysis of the linguistic similarities and differences between L1 and L2 essays, Crossley and McNamara (2011) investigated the potential of linguistic features related to lexical sophistication, text cohesion, and syntactic complexity to distinguish between texts written by high-intermediate and advanced L1 and L2 writers. The corpus comprised four L2 sub-corpora of essays written by English learners (university students in their twenties) from four language backgrounds (German, Finnish, Czech, and Spanish) and a sub-corpus of L1 essays. With the number of words before the main verb as the only proxy for syntactic complexity, the results showed a significant difference between the L1 group and the Finnish and Czech groups; however, no difference was reported between the Spanish and German groups. These findings demonstrated that the L1 texts contained statements with significantly more words before the main verb than the Finnish and Czech essays. Overall, the results of Crossley and McNamara’s (2011) study suggested that some features of L2 writing (such as syntactic complexity) may not be cultural, but rather may depend on L2 learners’ proficiency level.

In sum, although several studies have used a native baseline to examine non-natives’ performance in second language writing (e.g., Reid, 1992; Ferris, 1994; Crossley & McNamara, 2009, 2011), studies that specifically compare syntactic complexity in NS and NNS students’ academic writing are still a rarity. Accordingly, the current study aimed to explore the similarities and differences between Ph.D. dissertations written by Iranian university students and those of English native speakers in terms of syntactic complexity. To this end, the following research questions were formulated.

  1. Are there any significant differences between Ph.D. dissertations written by English native speakers and those written by Iranian EFL writers in terms of syntactic complexity?
  2. Which syntactic complexity measures discriminate between Ph.D. dissertations written by English native speakers and those written by Iranian EFL writers?

 

3. Methodology

3.1. Design of the Study

The current investigation is a corpus-based comparative study of syntactic complexity in academic texts (Ph.D. dissertations). To address the research questions, a quantitative analytical approach involving both descriptive and inferential analyses was adopted. Having operationalized syntactic complexity as four major syntactic components, namely Left Embeddedness, Mean Number of Modifiers, Minimal Edit Distance, and Sentence Syntax Similarity, the study compared two corpora written by English native speakers and Iranian Ph.D. students to highlight the similarities and differences between them.

3.2. Corpus

Two corpora were employed to address the questions posed in the current study: the main corpus and a comparison corpus. The main corpus comprised 83 text excerpts extracted from 20 dissertations (containing 46,757 words) written from 2011 to 2014 by Iranian Ph.D. students of Islamic Azad University (IAU), Isfahan (Khorasgan) Branch, whose major was English teaching. Of all the chapters, Chapter Five was targeted for analysis because the other chapters were likely to include passages that could not be reliably attributed to the students themselves. In line with the dissertations’ thematic structure, Chapter Five is often subdivided under headings such as Overview, Discussion, Conclusion, and Implications, each of which served as a text excerpt.

The comparison corpus included 94 text excerpts (containing 16,262 words) written in a similar time span by English native speakers from the United States and the United Kingdom. Since these are two of the most internationalized countries in the world, the researchers confirmed that the dissertations had been written by native speakers by requesting this information via e-mail.

The dissertations written by Iranian Ph.D. students were selected randomly from among all L2 dissertations registered from 2011 to 2014 at Islamic Azad University (IAU), Isfahan (Khorasgan) Branch, in the field of English-language teaching. To promote the homogeneity of the two corpora, the L1 texts were selected from among Ph.D. dissertations written by native English speakers in the field of applied linguistics. Once soft copies of the two corpora had been obtained, the texts were cleaned and formatted by removing oddities such as stray foreign (non-English) letters and strings of mathematical symbols, as well as pictures, charts, and diagrams. The texts were finally converted into a Coh-Metrix-readable format (files with the .txt extension) using TextPad, a software program recommended by the Coh-Metrix team.

3.3. Data Collection Procedure

Coh-Metrix (version 3.0, 2013), an automated computational web tool, was selected to analyze the syntactic patterns of the two corpora. In addition to 11 descriptive indices that help users interpret patterns in the data, the tool computes as many as 97 indices representing various linguistic features of a text such as readability, easability, lexical sophistication, cohesion, and syntactic complexity. For syntactic complexity, Coh-Metrix computes seven indices: one index measuring the mean number of words before the main verb, entitled ‘Left Embeddedness’; one index evaluating the mean number of modifiers per noun phrase, entitled ‘Mean Number of Modifiers’; three indices estimating the minimal edit distance score for words, part-of-speech tags, and lemmas, called ‘Minimal Edit Distance’; and two indices measuring the proportion of intersecting tree nodes across all adjacent sentences as well as between all combinations across paragraphs, called ‘Sentence Syntax Similarity’.
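To make the idea behind an edit-distance index more concrete, the following Python sketch computes a Levenshtein distance between the part-of-speech tag sequences of two adjacent sentences. It is only an illustration: the tag sequences are invented, the function names are hypothetical, and dividing by the longer sequence length is an assumed normalization rather than the procedure documented for Coh-Metrix.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Each insertion, deletion, or substitution costs 1.
    """
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance scaled to 0..1 by the longer sequence (an assumption)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Invented part-of-speech tags for two adjacent sentences.
s1 = ["DT", "NN", "VBZ", "JJ"]
s2 = ["DT", "JJ", "NN", "VBZ", "JJ"]
print(edit_distance(s1, s2), normalized_distance(s1, s2))
```

Under this reading, a lower value means that adjacent sentences share more of their tag sequence, that is, that the text is syntactically more uniform.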

Before the syntactic complexity measures enumerated above were included in the final analysis, collinearity between them was assessed so that redundant indices would not waste the model’s power. Among the seven indices, Left Embeddedness, Mean Number of Modifiers (per noun phrase), Minimal Edit Distance for part-of-speech tags, and Sentence Syntax Similarity for adjacent sentences met the collinearity assumption and were, therefore, retained in the final analysis of the data.
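A collinearity screen of this kind can be sketched as a pairwise correlation check. The Python example below flags any pair of indices whose absolute Pearson correlation reaches .70, the cut-off used for the MANOVA assumption check in this study. The index names and scores are made up for illustration and are not the study’s data.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(indices, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(indices.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical scores for five texts (illustrative only).
scores = {
    "med_words": [0.80, 0.85, 0.90, 0.78, 0.88],
    "med_pos":   [0.70, 0.74, 0.79, 0.69, 0.77],  # tracks med_words closely
    "left_emb":  [2.1, 1.4, 1.9, 1.6, 1.7],
}
print(collinear_pairs(scores))
```

When a pair is flagged, one member of the pair is kept and the other dropped, which is presumably how the seven Coh-Metrix indices were reduced to the four retained here.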

 

3.4. Data Analysis Procedure

To examine whether there was any significant difference between the L1 and L2 texts in terms of syntactic complexity, a Multivariate Analysis of Variance (MANOVA) was performed. Moreover, Discriminant Function Analysis (DFA), an approach commonly used in previous research to explore the features distinguishing different texts (e.g., Biber, 1993; Crossley & McNamara, 2009; Crossley & McNamara, 2011), was used to determine the syntactic complexity measures distinguishing texts written by Iranian writers from the L1 texts.

4. Results

4.1. Results of Descriptive Analysis

The first research question was intended to investigate whether or not there was any significant contrast between the L1 and L2 texts in terms of syntactic complexity. To answer the question, first, the data were analyzed descriptively. Table 1 presents the descriptive statistics of the four indices representing syntactic complexity in texts written by both natives and Iranian EFL learners at Ph.D. levels. 

Table 1.

Descriptive Statistics of the Syntactic Complexity Measures in the L1 and L2 Texts

Variable                     L1/L2   Min     Max     Mean    SD      Skewness   Kurtosis
Left Embeddedness            L2      .400    4.333   1.939   .657    1.361      1.663
                             L1      .545    3.273   1.732   .551    .412       .664
Mean Number of Modifiers     L2      .634    1.426   .969    .162    .497       -.024
                             L1      .509    1.519   .832    .152    1.074      1.989
Minimal Edit Distance        L2      .521    .954    .780    .074    -.788      1.164
                             L1      .511    .933    .771    .079    -1.316     1.624
Sentence Syntax Similarity   L2      .025    .171    .069    .024    1.043      1.881
                             L1      .020    .107    .059    .015    .279       .945

Note. Min = Minimum; Max = Maximum; SD = Standard Deviation

As depicted in Table 1, the skewness and kurtosis values for all the data sets were within the range of -2 to +2, indicating that all sets of data were approximately normal on a descriptive level (Tabachnick & Fidell, 2007). The L2 texts, on average, contained more words before the main verb (M = 1.939) than the L1 texts (M = 1.732). In addition, the average number of modifiers (per noun phrase) was greater in the L2 texts (M = .969) than in the L1 texts (M = .832). Furthermore, the average score for the Minimal Edit Distance index was slightly higher in the L2 texts (M = .780) than in the L1 texts (M = .771). Finally, the L2 writers, on average, used more similar syntactic structures (M = .069) than the L1 writers of English did (M = .059).
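The ±2 screening rule can be expressed in a few lines of Python. The sketch below uses simple moment-based estimators of skewness and excess kurtosis; statistical packages typically report bias-corrected versions, so values for small samples may differ slightly from those in a table such as Table 1. The function names and the sample data are illustrative assumptions.

```python
def _central_moment(xs, k):
    """k-th central moment of a list of numbers (population form)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal curve)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


def roughly_normal(xs, limit=2.0):
    """Descriptive screen: both statistics must fall within [-limit, +limit]."""
    return abs(skewness(xs)) <= limit and abs(excess_kurtosis(xs)) <= limit


# Illustrative data only: a symmetric sample and one with an extreme value.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1] * 9 + [20]
print(roughly_normal(symmetric), roughly_normal(with_outlier))
```

Such a screen is descriptive only; it does not replace a formal test of normality, which is why the inferential analysis relies on further assumption checks.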

 

4.2. Results of Inferential Analysis

To address the first research question, a one-way MANOVA was conducted after checking the assumptions required for valid results. The first assumption is that the data represent a sample from a multivariate normal distribution. Because multivariate normality is particularly difficult to test, the normality of each dependent variable (the indices related to syntactic complexity) within each group of the independent variable (L1/L2 texts) was used as the best available indication of multivariate normality (see Table A1 in the Appendix). The second assumption is that there should be no multicollinearity between the dependent variables; that is, the correlations between the dependent variables should be low to moderate. The results confirmed that no index pair correlated at r ≥ .70 (see Table A2 in the Appendix). The third assumption is the homogeneity of variance-covariance matrices, which was checked using Box's M test of equality of covariance matrices as well as Levene's test of homogeneity of variance (see Tables A3 and A4 in the Appendix).

Since all assumptions were satisfied, a one-way MANOVA examined the significance of the difference between the L1 and L2 texts in terms of a linear combination of the four indices representing syntactic complexity. The results are displayed in Table 2 below.

 

Table 2.

Multivariate Tests’ Results for the Syntactic Complexity Measures

Effect      Test                  Value     F          Hypothesis df   Error df   Sig.   Partial Eta Squared
Intercept   Pillai's Trace        .995      8107.652   4.000           172.000    .000   .995
            Wilks' Lambda         .005      8107.652   4.000           172.000    .000   .995
            Hotelling's Trace     188.550   8107.652   4.000           172.000    .000   .995
            Roy's Largest Root    188.550   8107.652   4.000           172.000    .000   .995
Group       Pillai's Trace        .221      12.301     4.000           172.000    .000   .222
            Wilks' Lambda         .779      12.301     4.000           172.000    .000   .222
            Hotelling's Trace     .283      12.301     4.000           172.000    .000   .222
            Roy's Largest Root    .283      12.301     4.000           172.000    .000   .222

As shown in Table 2, the test was significant (Wilks’ Λ = .779, F(4, 172) = 12.301, p < .001, multivariate η² = .222). This significant F indicated that the L1 and L2 texts differed significantly on a linear combination of the indices representing syntactic complexity. The multivariate partial eta squared value (.222) indicated that approximately 22% of the multivariate variance of the dependent variables was associated with the group factor (L1/L2 texts).
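For readers who wish to see how these statistics relate to one another, the short Python sketch below converts a Wilks’ Λ into an F ratio and a partial eta squared. It assumes a two-group design, in which there is a single discriminant dimension and the conversion is exact; the function names are hypothetical. Because it starts from the rounded value Λ = .779, its results should land near, but not exactly on, the reported figures.

```python
def f_from_wilks(wilks_lambda, df_hyp, df_err):
    """F ratio implied by Wilks' lambda when only two groups are compared."""
    return (1.0 - wilks_lambda) / wilks_lambda * (df_err / df_hyp)


def partial_eta_squared(wilks_lambda):
    """Multivariate effect size for a two-group design: 1 - lambda."""
    return 1.0 - wilks_lambda


lam = 0.779  # rounded value reported for the group effect
print(round(f_from_wilks(lam, 4, 172), 2))
print(round(partial_eta_squared(lam), 3))
```

The same conversion applies to any two-group Wilks’ Λ with its own degrees of freedom, so it offers a quick consistency check on reported multivariate results.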

Stepwise discriminant function analysis (DFA) was used to answer the second research question intended to determine the syntactic complexity measures discriminating between the L1 and L2 texts. Discriminant analysis was actually used to determine the best predictors of whether a text is written by Ph.D. students who are native speakers of English or Iranian Ph.D. students. Table 3 depicts the best predictors based on different steps regarded in the DFA model.

 

Table 3.

Variables Retained in the DFA Model

                                     Wilks' Lambda                        Exact F
Step   Entered                       Statistic   df1   df2   df3         Statistic   df1   df2       Sig.
1      Mean Number of Modifiers      .838        1     1     175.000     33.758      1     175.000   .000
2      Sentence Syntax Similarity    .796        2     1     175.000     22.333      2     174.000   .000

As displayed in Table 3, among the four indices included in the DFA model, Mean Number of Modifiers and Sentence Syntax Similarity were retained and the other two (Left Embeddedness and Minimal Edit Distance) were removed from the model. According to the results in Table 3, Mean Number of Modifiers was the best single predictor and Sentence Syntax Similarity was the next best. Table 4 presents the multivariate test (Wilks’ lambda) results for the indices retained in the DFA model.

 

Table 4.

Wilks’ Lambda Test on Retained Variables in the DFA Model

                                                           Exact F
Step   Number of Variables   Lambda   df1   df2   df3      Statistic   df1   df2       Sig.
1      1                     .838     1     1     175      33.758      1     175.000   .000
2      2                     .796     2     1     175      22.333      2     174.000   .000

As illustrated in Table 4, the model was a good fit for the data with just one predictor (Mean Number of Modifiers), Wilks’ Λ = .838, F(1, 175) = 33.758, p < .001, or with two predictors (Mean Number of Modifiers and Sentence Syntax Similarity), Wilks’ Λ = .796, F(2, 174) = 22.333, p < .001.
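The logic by which a stepwise procedure picks its first predictor can be illustrated with a small Python sketch. Assuming the common Wilks’ lambda entry criterion (which the Wilks’ Lambda columns of the DFA tables suggest), the first variable entered is the one with the smallest univariate Λ, that is, the smallest ratio of within-group to total sum of squares. The data and names below are invented for illustration, and later steps, which condition on the variables already entered, are not covered.

```python
def univariate_wilks(group_a, group_b):
    """Wilks' lambda for a single predictor: within-group SS / total SS."""
    def sum_sq(values, center):
        return sum((v - center) ** 2 for v in values)

    pooled = list(group_a) + list(group_b)
    grand_mean = sum(pooled) / len(pooled)
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    ss_within = sum_sq(group_a, mean_a) + sum_sq(group_b, mean_b)
    ss_total = sum_sq(pooled, grand_mean)
    return ss_within / ss_total


def first_entry(candidates):
    """Pick the predictor with the smallest lambda (strongest separation).

    candidates maps a predictor name to a pair (L1 scores, L2 scores).
    """
    lambdas = {name: univariate_wilks(a, b)
               for name, (a, b) in candidates.items()}
    best = min(lambdas, key=lambdas.get)
    return best, lambdas


# Invented scores for four L1 and four L2 texts (illustrative only).
toy = {
    "modifiers": ([0.80, 0.85, 0.82, 0.86], [0.95, 1.00, 0.97, 0.96]),
    "left_emb":  ([1.7, 1.9, 1.6, 1.8], [1.8, 1.7, 2.0, 1.9]),
}
best, lambdas = first_entry(toy)
print(best, {name: round(value, 3) for name, value in lambdas.items()})
```

A smaller Λ means that a larger share of the variable’s total variation lies between the groups, which is why the predictor with the lowest value is entered first.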

 

5. Discussion

The present study aimed to identify whether there was any significant difference between Ph.D. dissertations written by English L1 and Iranian writers in terms of syntactic complexity. As its secondary aim, the study sought to explore how different syntactic complexity measures could differentiate the L1 and L2 texts. As mentioned earlier, among all the Coh-Metrix indices measuring syntactic complexity, four indices (i.e., Left Embeddedness, Mean Number of Modifiers, Minimal Edit Distance, and Sentence Syntax Similarity) were included in the final analysis. Quantitative analysis of the data revealed a significant difference between the L1 and L2 texts based on a linear combination of the syntactic complexity measures. Accordingly, the Ph.D. dissertations written by English L1 writers differed significantly from those written by Iranian university students in terms of syntactic complexity. Notwithstanding the differences in how syntactic complexity measures were conceptualized in the current study and in Ai and Lu’s (2013) corpus-based study of syntactic complexity in NNS and NS university students’ writing, both studies found a significant difference between the L1 and L2 texts in terms of syntactic complexity.

The research findings generally seem to echo those of Qi (2014). To further develop the scope of corpus-based research, Qi (2014) investigated the syntactic complexity of texts written by EFL learners, ESL learners, and English native speakers and concluded that “global syntactic complexity measures as well as subordination-based complexity measures” are capable of differentiating between the three groups of learners.

To determine the syntactic complexity measures that distinguished the L1 and L2 texts, discriminant function analysis was carried out. The results indicated that, among the four indices representing syntactic complexity, Mean Number of Modifiers and Sentence Syntax Similarity contributed meaningfully to discriminating between the L1 and L2 texts; however, the former played a more prominent role than the latter.

Regarding the Mean Number of Modifiers index, the results of the study showed that the average number of modifiers was greater in the L2 texts than in the L1 texts. The following samples extracted from the two corpora illustrate the different distribution of the index in the L1 and L2 texts. Modifiers are marked in italics.

 

Text 1 written by an Iranian student:

Also, the general writing quality was assumed to be a combination of accuracy, complexity, and fluency of the samples the participants had developed. Lastly, the participants of the study may have had a quite different writing quality if they had been assigned topics which are descriptive, explanatory, argumentative, or other types.

 

Text 2 written by an English native speaker:

This connection with teachers, students, and classrooms, however, is typically not the main focus for most researchers. One way to make stronger connections between the research and language teaching/learning might be to create a new role that is situated between corpus-based research and the classroom.

As illustrated in the above examples, using modifiers to describe a noun or noun phrase, or to make its meaning more specific, is more prevalent among Iranian L2 writers than among their native counterparts. Given that native speakers of a language are generally more proficient in using it than non-native speakers, this finding contradicts the results of Parkinson and Musgrave’s (2014) examination of pre-modification and post-modification of noun phrases in academic writing, which showed that more proficient writers used modifiers more than less proficient ones. The finding also seems to be inconsistent with previous studies carried out using Coh-Metrix (e.g., Crossley & McNamara, 2014; Guo, Crossley, & McNamara, 2013; McNamara, Crossley, & McCarthy, 2010), which found that essays with more words before the main verb (including nominal subjects), as well as essays with more modifiers per noun phrase, are more likely to have been written by more proficient learners.

The discrepancy between the present study’s findings and those of the previous studies could be attributed to the existence of a benchmark in Iranian academic context for writing Ph.D. dissertations. Being provided with well-structured samples of academic writing, Iranian L2 writers generally follow syntactic structures used by professionals in the field of academic writing. This may account for the increased use of modifiers among Iranian writers. As Biber, Gray, & Poonpon (2011) suggested, phrasal elaboration, particularly noun phrase elaboration, is a key feature of academic writing. Moreover, the fact that the main corpus of the study was comprised of texts written by students studying at a higher level of English education (Ph.D. level) could explain the disagreement between the current study’s finding and those of the previous ones. According to Biber et al.’s (2011) writing development hypothesis, as writers become more proficient, their essays tend to be more characteristic of academic writing and, therefore, their texts include more prepositional phrases as modifiers.

Further support for this finding could lie in the fact that people often expend more effort to be understood while communicating in a second or foreign language. Taking the attributive nature of modifiers into account, Greenbaum and Quirk (2010) pointed out that L2 writers use modifiers as an influential strategy not only to highlight different text units but also to provide extra information required to clarify their intended meaning. 

Sentence Syntax Similarity was the second syntactic complexity measure contributing to the differentiation between the L1 and L2 texts. This index was significantly greater in the L2 texts than in the L1 ones. In other words, dissertations written by Iranian Ph.D. students included a higher proportion of intersecting syntactic nodes among adjacent sentences and, as a result, a higher degree of consistency and uniformity of syntactic constructions in the text. In the example texts below, Text 1, written by an Iranian Ph.D. student, recorded a higher level of Sentence Syntax Similarity (.074) than Text 2, written by an English native speaker (.036).

 

Text 1 written by an Iranian student:

As far as words are concerned, it should be highlighted that explicit vocabulary instruction without tasks does not lead to observably sufficient gains at the end of an individual class. It is essential newly learnt target words be practiced through either receptive and/or productive tasks following explicit instruction.

 

Text 2 written by an English native speaker:

There is considerable risk in making overly broad generalizations from the present study for several reasons. There is no underlying ‘truth’ that is waiting to be found in a study of this kind. What emerges from the data are reflections and insights of these particular teachers within this particular setting, filtered through my own particular research agenda and methodology.

Given the Iranian students’ lower level of proficiency in English compared to their native speaker counterparts, this finding is readily explained. It is also consistent with the results of Crossley and McNamara (2014), who found that values of the Sentence Syntax Similarity index decrease as English writing proficiency grows over time.

Regarding the other two indices (i.e., Left Embeddedness and Minimal Edit Distance), the data analysis showed no significant difference between the mean values for the two corpora. As the following two samples extracted from the corpora illustrate, the two texts contained an approximately similar mean number of words (shown in italics) before the main verb of the main clause. Additionally, both texts recorded approximately similar values (.079) for the minimal edit distance score for part-of-speech tags.

 

Text 1 written by an Iranian student:

The findings of this study have implications for the teaching of Arabic, but these implications apply not only to the classroom practices of the teachers, but also to the support that families can provide for young learners and ways that language learning can be integrated into the school community as a whole. Further, these findings highlight the importance of establishing routes to proficiency that meet the needs of diverse learners and of the society that stands to gain from their proficiency. In the previous chapter, I discussed four patterns from the collected data, each of which may powerfully influence learners’ investment in Arabic language learning: construction of Arabic as heritage in families, language and literacy in the learning of Arabic, the role of religion in Arabic language learning, and learning Arabic in local and global context. I will follow these themes in suggesting implications for the findings of this study.

 

Text 2 written by an English native speaker:

Analyzing the respondents’ answers to the questionnaires, it can be generalized that 72.6% of the Iranian teacher educators disagreed with the impact of their learning experience on their teacher education at university. In other words, they believed that teachers’ prior learning experience hasn’t influenced their pedagogical knowledge. Moreover, they disagreed with the positive impact of teacher training courses on their current practices and they called those courses impractical. This can prove their belief on the impact of prior teacher education courses on their current teaching practice.
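
As a rough illustration of the two indices compared in this section, the following Python sketch computes a normalized edit distance between the part-of-speech sequences of adjacent sentences and a mean count of words preceding the main verb. It is not the Coh-Metrix implementation: the function names, the normalization by the longer sequence, and the hand-assigned Penn-style tags and main-verb positions are assumptions made for illustration only.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of POS tags."""
    prev = list(range(len(b) + 1))
    for i, tag_a in enumerate(a, start=1):
        curr = [i]
        for j, tag_b in enumerate(b, start=1):
            cost = 0 if tag_a == tag_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_adjacent_pos_distance(tagged: List[List[str]]) -> float:
    """Mean edit distance between adjacent sentences' POS sequences.

    Assumption: each distance is divided by the longer sequence length,
    so every score falls between 0 (identical) and 1 (fully different).
    """
    pairs = list(zip(tagged, tagged[1:]))
    if not pairs:
        return 0.0
    scores = [edit_distance(a, b) / max(len(a), len(b), 1) for a, b in pairs]
    return sum(scores) / len(scores)


def mean_left_embeddedness(main_verb_positions: List[int]) -> float:
    """Mean number of words before the main verb of each main clause.

    Assumption: positions are 0-based token indices supplied by hand
    (or by a parser), so the index equals the count of preceding words.
    """
    if not main_verb_positions:
        return 0.0
    return sum(main_verb_positions) / len(main_verb_positions)


# Hypothetical, hand-tagged sentences (not taken from either corpus).
s1 = ["DT", "NN", "VBZ", "JJ"]        # e.g. "The result is clear"
s2 = ["DT", "JJ", "NN", "VBZ", "JJ"]  # e.g. "The final result is clear"
pos_score = mean_adjacent_pos_distance([s1, s2])
left_score = mean_left_embeddedness([2, 3])
```

In this toy case the two sentences differ by a single inserted adjective, so the sequences are close; lower scores of this kind indicate more uniform sentence structure across a text.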

 

6. Conclusion

Adopting a corpus-based cross-sectional research design, this study compared syntactic complexity in the academic writing of English L1 and Iranian Ph.D. students. Detailed analysis of the data revealed a significantly higher mean number of modifiers in the L2 texts than in the L1 texts. Furthermore, it was speculated that Iranian L2 writers’ lower level of English proficiency compared to their English native counterparts resulted in sentences with higher degrees of similarity in terms of syntactic structure. Concerning the other two indices related to syntactic complexity (i.e., Left Embeddedness and Minimal Edit Distance), a similar distribution was found in the L1 and L2 texts written at the Ph.D. level.

Taking the findings of the current study into account, advanced L2/FL writing pedagogy may benefit from a central focus on the specific measures which differ significantly between texts written by native speakers (NS) and non-native speakers (NNS) of English. Teachers can emphasize the function of these measures (English modifiers, for instance) more explicitly while teaching grammar in classrooms. Depending on the classroom level, needs, and requirements, teachers can make these features more overt or covert throughout the language course. This, in turn, may help L2/FL learners approximate NS writing quality more easily. It may also encourage more cliché avoidance and less reliance on practiced patterns while writing (a Ph.D. dissertation, for example).

The findings also shed light on Coh-Metrix, which offers significant implications for non-native English teachers, students, and researchers whose academic success and careers depend on the quality of the texts they produce in English. By affording specific metrics, it provides non-native English teachers with a deeper understanding of the stylistic and linguistic features of L2 texts and helps them evaluate the quality of academic writing assignments more objectively. For non-native EFL/ESL students, Coh-Metrix can serve as a useful text modelling resource whereby they learn about discourse features like syntactic complexity, co-reference, deictic elements of spatiality and temporality, lexical diversity, as well as other essential lexical characteristics. Such measures of language sophistication can help students identify the key factors underlying more complex uses of linguistic features that have a great bearing on L2 text production and comprehension. For researchers, computational tools like Coh-Metrix can potentially provide signposts by which non-native researchers can improve their L2 text production models. Consequently, they can manage stylistic problems which are a great impediment to publishing their papers in prestigious academic journals.

Finally, it is worth mentioning that several limitations and delimitations, such as employing a corpus limited in source and size as well as a specific writing genre (academic writing), will inevitably limit the degree to which generalizations can be drawn from the data. Further studies are therefore recommended that use Coh-Metrix to investigate the syntactic characteristics of texts in other writing genres and to examine other textual features important to academic writing, such as cohesion and lexical sophistication.

References

 

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bardovi-Harlig, K., & Bofman, T. (1989). Attainment of syntactic and morphological accuracy by advanced language learners. Studies in Second Language Acquisition, 11(1), 17-34. Retrieved from https://doi.org/10.1017/S0272263100007816

Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8, 243-257. Retrieved from https://doi.org/10.1093/llc/8.4.243

Biber, D., Gray, B., & Poonpon, K. (2011). Should we use the characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45(1), 5-35. Retrieved from https://doi.org/10.5054/tq.2011.244483

Buckingham Jr, H. W. (1979). Linguistic aspects of lexical retrieval disturbances in the posterior fluent aphasias. Academic Press.

Bulté, B., & Housen, A. (2012). Defining and operationalising L2 complexity (pp. 21-46). Amsterdam/Philadelphia, PA: John Benjamins.

Brett, P. (1994). A genre analysis of the results section of sociology articles. English for Specific Purposes, 13, 47-59.

Connor, U., & Johns, A. M. (1990). Coherence in writing: Research and pedagogical perspectives. Alexandria, VA: TESOL.

Crossley, S. A., & McNamara, D. S. (2009). Computationally assessing lexical differences in L2 writing. Journal of Second Language Writing, 17(2), 119-135. Retrieved from https://doi.org/10.1016/j.jslw.2009.02.002

Crossley, S. A., & McNamara, D. S. (2011). Understanding expert ratings of essay quality: Coh-Metrix analyses of first and second language writing. International Journal of Continuing Engineering Education and Life-Long Learning, 21(3), 170-191. Retrieved from https://doi.org/10.1504/IJCEELL.2011.040197

Crossley, S. A., & McNamara, D. S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26, 66-79. Retrieved from https://doi.org/10.1016/j.jslw.2014.09.006

Crossley, S. A., McCarthy, P. M., & McNamara, D. S. (2007). Discriminating between second language learning text-types. In D. Wilson & G. Sutcliffe (Eds.), Proceedings of the 20th international Florida artificial intelligence research society (pp. 205-210). Menlo Park, California: AAAI Press.

Gilquin, G. (2003). Causative get and have: So close, so different. English Linguistics, 31(2), 125-148.

Givón, T. (1991). Markedness in grammar: Distributional, communicative and cognitive correlates of syntactic structure. Studies in Language, 15(2), 335-370.

Grabe, W., & Kaplan, R. B. (1996). Theory and practice of writing: An applied linguistic perspective. New York: Longman.

Ferreira, F. (1991). Effects of length and syntactic complexity on initiation times for prepared utterances. Journal of Memory and Language, 30(2), 210-233.

Ferris, D. R. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28(2), 414-420. Retrieved from https://doi.org/10.2307/3587446

Foster, P., & Skehan, P. (1996). The influence of planning and task type on second language performance. Studies in Second Language Acquisition, 18(3), 299-323. Retrieved from https://doi.org/10.1017/S0272263100015047

Field, Y., & Oi, Y. L. M. (1992). A comparison of internal conjunctive cohesion in the English essay writing of Cantonese speakers and native speakers. RELC Journal, 23(1), 15-28.

Flowerdew, L. (2000). Investigating errors in a learner corpus. In L. Burnard & T. McEnery (Eds.), Proceedings in the teaching and language corpora conference (pp. 145-154).

Foster, P., & Tavakoli, P. (2009). Native speakers and task performance: Comparing effects on complexity, fluency and lexical diversity. Language Learning, 59(4), 866-896. Retrieved from https://doi.org/10.1111/j.1467-9922.2009.00528.x

Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193-202. Retrieved from https://doi.org/10.3758/BF03195564

Greenbaum, S., & Quirk, R. (2010). A student’s grammar of the English language. Harlow: Longman.

Halliday, M. A. K. (1991). Language as system and language as instance: The corpus as a theoretical construct. In J. Svartvik (Ed.), Directions in corpus linguistics: Proceedings of Nobel symposium (pp. 61-78). Berlin: Mouton De Gruyter.

Henry, K. (1996). Early L2 writing development: A study of autobiographical essays by university-level students of Russian. The Modern Language Journal, 80(3), 309-326. Retrieved from https://doi.org/10.1111/j.1540-4781.1996.tb01613.x

Hinkel, E. (1995). The use of modal verbs as a reflection of cultural values. TESOL Quarterly, 29, 235-343.

Hinkel, E. (1997). Indirectness in L1 and L2 academic writing. Journal of Pragmatics, 27(3), 360-386.

Hinkel, E. (1999). Objectivity and credibility in L1 and L2 academic writing. In E. Hinkel (Ed.), Culture in second language teaching and learning (pp. 90-108). Cambridge: Cambridge University Press.

Hinkel, E. (2001). Matters of cohesion in L1 and L2 academic texts. Applied Language Learning, 12, 111–132.

Hinkel, E. (2003). Simplicity without elegance: Features of sentences in L2 and L1 academic texts. TESOL Quarterly, 37(2), 275-301. Retrieved from https://doi.org/10.2307/3588505

Indrasuta, C. (1988). Narrative styles in the writing of Thai and American students. In A. C. Purves (Ed.), Writing across languages and cultures: Issues in contrastive rhetoric (pp. 206-226). Newbury Park: SAGE.

Jalilifar, A. R. (2010). Research article introductions: Sub-disciplinary variations in applied linguistics. The Journal of Teaching Language Skills (JTLS), 2(2), 29-55.

Jalilifar, A. R., Hayati, A. M., & Namdari, N. (2012). A comparative study of research article discussion sections of local and international applied linguistic journals. The Journal of Asia TEFL, 9(1), 1-29.

Johns, A. M. (1984). Textual cohesion and the Chinese speaker of English. Language Learning and Communication, 3, 69-74.

Johnson, P. (1992). Cohesion and coherence in Malay and English. RELC Journal, 23(2), 1-17. Retrieved from https://doi.org/10.1177/003368829202300201

Khalil, A. (1989). A study of cohesion and coherence in Arab college students’ writing. System, 17(3), 359-371. Retrieved from https://doi.org/10.1016/0346-251X(89)90008-0

Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication (Unpublished doctoral dissertation). Georgia State University.

Larsen-Freeman, D. (2009). Adjusting expectations: The study of complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30(4), 579-589. Retrieved from https://doi.org/10.1093/applin/amp043

Lim, J. M. H. (2006). Method sections of management research articles: A pedagogically motivated qualitative study. English for Specific Purposes, 25, 282-309.

Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474-496. Retrieved from https://doi.org/10.1075/ijcl.15.4.02lu

Lu, X., & Ai, H. (2013). A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing. Studies in Corpus Linguistics, 59, 249-264. Retrieved from https://doi.org/10.1075/scl.59.15ai

Mauranen, A. (1996). Discourse competence: Evidence from a thematic development in native and non-native texts. In E. Ventola & A. Mauranen (Eds.), Academic writing: Intercultural and textual issues (pp. 195-230). Amsterdam: John Benjamins. Retrieved from https://doi.org/10.1075/pbns.41.13mau

McCarthy, P. M., Lehenbauer, B. M., Hall, C., Duran, N. D., Fujiwara, Y., & McNamara, D. S. (2007). A Coh-Metrix analysis of discourse variation in the texts of Japanese, American, and British scientists. Foreign Languages for Specific Purposes, 6, 46-77.

McCarthy, P. M., Lewis, G. A., Dufty, D. F., & McNamara, D. S. (2006). Analyzing writing styles with Coh-Metrix. In G. Sutcliffe & R. Goeble (Eds.), Proceedings of the Florida Artificial Intelligence Research Society International Conference (FLAIRS).

McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of writing quality. Written Communication, 27(1), 25-43.

Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24(4), 492-518. Retrieved from https://doi.org/10.1093/applin/24.4.492

Perkins, K. (1980). Using objective methods of attained writing proficiency to discriminate among holistic evaluations. TESOL Quarterly, 14(1), 61-69.

Parkinson, J., & Musgrave, J. (2014). Development of noun phrase complexity in the writing of English for academic purposes students. Journal of English for Academic Purposes, 14, 48-59. Retrieved from https://doi.org/10.1016/j.jeap.2013.12.001

Qi, D. (2014). Syntactic complexity of EFL, ESL and ENL: Evidence of the international corpus network of Asian learners of English (Unpublished master’s thesis). National University of Singapore, Singapore.

Ramanathan, V., & Kaplan, R. B. (2000). Genres, authors, discourse communities: Theory and application for (L1 and) L2 writing instructors. Journal of Second Language Writing, 9, 171-191. Retrieved from https://doi.org/10.1016/S1060-3743(00)00021-7

Reid, J. R. (1992). A computer text analysis of four cohesion devices in English discourse by native and nonnative writers. Journal of Second Language Writing, 1, 79-107.

Shirani, S., & Chalak, A. (2016). A genre analysis study of Iranian EFL learners’ master theses with a focus on the introduction section. Theory and Practice in Language Studies, 6(10), 1982-1987. Retrieved from https://doi.org/10.17507/tpls.0610.13

Swales, J. (1990). Non-native speaker graduate engineering students and their introductions: Global coherence and local management. In U. Connor & A. Johns (Eds.), Coherence in writing (pp. 189-207). Alexandria, VA: TESOL.

Wolfe-Quintero, K., Inagaki, S., & Kim, H. Y. (1998). Second language development in writing: Measures of fluency, accuracy, & complexity. Honolulu, HI: University of Hawaii Press.