Document Type: Original Article


Department of English Language and Literature, Payame Noor University, Iran


This study examined visual and verbal characteristics in the Touchstone textbook series. For this purpose, four reading comprehension texts with similar topics were selected from the four Touchstone textbooks. Seven pictures accompanying the four texts were analyzed based on Kress and van Leeuwen’s social semiotics, and the four texts were analyzed with reference to Halliday’s systemic functional linguistics. The results showed fairly high functionality of the visuals in the Touchstone series as well as their humanistic communicative trends. Moreover, it was found that the pictures supported the linguistic text, thereby helping learners to comprehend the textual content. The findings also showed that an increase in text difficulty was accompanied by a higher occurrence of material and relational processes. Lastly, the results of the verbal analysis suggested that the reading comprehension texts of the Touchstone series chiefly describe real-world experiences and actions rather than behaviors, thoughts, or feelings. Based on the outcomes, it can be concluded that the visuals are pertinent to the linguistic passages and help learners to gain a better understanding of the texts.


1. Introduction

The focus of this study is on the visual and textual elements of reading comprehension sections, which are of central importance in English as a Foreign Language (EFL) textbooks. Embedding suitable visuals alongside reading texts can improve comprehension and learning. According to Tahririan and Sadri (2013), “pictures convey information more efficiently and effectively than words do” (p. 2). In other words, the readability and comprehensibility of a book depend largely on its pictures because pictures can convey detailed information in visual form (Tahririan & Sadri, 2013).

Textbooks are commodities, cultural representations, and political objects (Shannon, 2010). For this reason, they may be considered as sources through which the reader can discover the method and the purposes of their production, the origin of their content, and the way teachers and students might use them. Sheldon (1988) argues that course books are in fact recognized as route maps by any EFL program. Though judging a book by its appearance does not sound sensible, learners’ perception of a book’s content may be affected by its appearance. As stated by Kress and van Leeuwen (1996), visual grammar cannot be separated from verbal grammar or any other grammar. Therefore, in the field of teaching and learning foreign languages, presenting the visual aspect of the content properly is an important consideration.

English is practiced as a foreign language in Iran, and many Iranians learn English through formal instruction at high schools or at private language institutes where English textbooks from international publishers are taught. Nowadays, the Touchstone series is one of the most widely used English textbook series in the Iranian English teaching market. As such, this study built on a social semiotic perspective to examine the verbal and visual modes of the Touchstone series as an example of popular textbooks for English language learners in the Iranian EFL context. The theoretical approach taken in this study, then, is applied to two semiotic systems: language (Halliday & Matthiessen, 2004) and the image (Kress & van Leeuwen, 1996). The following research questions guided this study:

  1. How are the reading comprehension texts in the Touchstone series presented visually?
  2. How are the reading comprehension texts in the Touchstone series presented verbally?

2. Literature Review

In social semiotics (Halliday, 1996), language is viewed as “a product of social practice” (Halliday, 1996, p. 86), as opposed to the view of Saussure, who regards language as a code (van Leeuwen, 2005, p. 3). As stated by Halliday (1996), language is a social semiotic system serving as a resource for meaning through varying and shifting contexts of human interaction. Systemic functional linguistics (henceforth, SFL), expanded and elaborated by Halliday (1994), assumes that language is a social semiotic system which is not decontextualized. Contrary to previous views which conceived of language as a decontextualized phenomenon, Halliday posited that language cannot be detached from its use and situation.

Through SFL, Halliday presents specific details such as the context and contextual organization of the written text, whose application in the study of texts can provide insight into why a written text is presented in the way it is. SFL places written and spoken language within the domain of social interaction, where a set of options is created with regard to social context. SFL introduces a framework which helps to arrive at a contextualized interpretation of structure and its intended meaning, realize the aims of a text, examine the meaning potential underlying linguistic production, and detect the deeper association between elements within a given discourse production (Bednarek & Martin, 2010).

In SFL, Halliday (1994) proposed three metafunctions which he claimed to exist in all languages: ideational, interpersonal, and textual. The ideational metafunction, realized in transitivity, deals with things in the world, whether real or imagined. The interpersonal metafunction is concerned with relationships with people and, in Halliday’s words, constitutes the ‘participatory function of language’ (Halliday, 2007, p. 184), realized by mood and modality. Finally, the textual metafunction, realized in information structure, relates to how a text is constructed.

Halliday’s metafunctional approach to language was extended and applied to other modes of semiotics including visual modes (Kress & van Leeuwen, 2006) and color (Kress & van Leeuwen, 2002). According to Kress and van Leeuwen (2006), the capability to apply Halliday’s metafunctions to other modes of semiotics, such as moving images (Iedema, 2001) and sound and music (Kress & van Leeuwen, 2006), makes his approach multimodal, suggesting it can attend to “the full range of communicational forms people use […] and the relationships between them” (Jewitt, 2009, p. 14).

Kress and van Leeuwen (2006) adopted Halliday’s (1978) metafunctions of language to present a visual grammar classification including representational, interactive, and compositional modes. Kress and van Leeuwen’s (2006) classification is applicable to visual analysis of different humanistic subjects including English textbooks. In Kress and van Leeuwen’s (2006) model, a vast scope of features and types of visuals are presented including children’s drawings, book illustrations, photo-journalism, fine art, etc. 

Based on the multimodal approach, the present study examines the reading comprehension passages of the Touchstone English series along with their associated illustrations. For this purpose, Kress and van Leeuwen’s model of visual grammar was used to examine the textbook illustrations, and Halliday and Matthiessen’s (2004) transitivity analysis was used to investigate the reading passages. Table 1 shows the correspondence between Halliday’s (1978) metafunctions of language and Kress and van Leeuwen’s (2006) model of visual meaning. As can be seen in Table 1, Kress and van Leeuwen used different terms to refer to the same concepts in Halliday’s model: representational instead of ideational; interactive instead of interpersonal; and compositional instead of textual.


Table 1.

Theoretical Framework

Halliday’s (1978) Model of Metafunctions of Language | Kress and van Leeuwen’s (1996, 2006) Framework of Visual Meaning
Ideational | Representational
Interpersonal | Interactive
Textual | Compositional

One of the core contents of a textbook is the written text. In fact, some teachers or administrators rate a textbook according to its written material. The type of discourse and its features can significantly affect the quality of language teaching and learning. The efficiency of teaching and learning languages is also affected by the multimodality of language textbooks.

The literature is replete with studies focused on the verbal and visual modes of textbooks. To cite an example, Bezemer and Kress (2010) conducted a social semiotic analysis of textbooks in various subjects (i.e., English, Science, and Mathematics) across varying spans of time (i.e., the 1930s, 1980s, 2000s). Their findings showed a shift in the design of the books with regard to their social/pedagogic relations. In another social semiotic study, Torres (2015) drew upon Kress and van Leeuwen’s (2006) visual grammar to examine an EFL textbook taught in a South Korean university context. Her findings showed “some instances where the visual message was in contradiction with the verbal message” (p. 250), reflecting the embedded ideologies in the texts and images. Chu (2011) examined picture books used to teach reading in an upper-primary classroom. The findings pointed to the dominance of focus on the verbal mode compared to visual images.

Among local Iranian scholars, Tahririan and Sadri (2013) carried out a study aimed at analyzing images in Iranian high school EFL course books. Their study analyzed three high school EFL course books used in an Iranian secondary school with reference to Kress and van Leeuwen’s (2006) visual grammar. These researchers deduced that the images in the aforementioned textbooks mostly served informative or illustrative functions. In addition, the images were of high functional value. However, poor modality and plain graphic design, along with an out-of-date depiction of current Iranian lifestyle and society, were among the observed defects of the textbooks. Alaei and Ahangari (2016) used transitivity analysis to investigate the expression of ideology or opinion in Joseph Conrad’s Heart of Darkness. The findings revealed that material (40.4%), relational (27.2%), and mental (20.4.5) processes had the highest observed frequencies, respectively. They pointed out that most actions are done by animate Actors, where the writer tries to bring the reader to the point that the main character of the story (Marlow) has the main responsibility of informing others about the colonization of Africa.

Although some studies conducted in the Iranian context (e.g., Alaei & Ahangari, 2016; Tahririan & Sadri, 2013) carried out visual or verbal analyses, the importance of multimodality and its effect on language learning has not been accorded much attention in textbooks widely used in Iranian foreign language institutes. Nearly all studies have considered one or some of the features of the visual or textual modes of the framework and, as a result, other aspects seem to have been overlooked. In other words, applying both visual and textual analysis to EFL textbooks has received scant attention in Iranian studies. Besides, previous studies have mainly been conducted on Iranian high school English textbooks, which are not designed for communicative purposes. Consequently, evaluating both visual and verbal aspects of the Touchstone series, currently taught in many Iranian foreign language institutes as an international source for learning English, would be a worthwhile undertaking.


3. Methodology

This section explains the design and the corpus of the study. The section ends with procedure and data analysis.


3.1. Design and Context of the Study

This research was designed as a descriptive study using both qualitative and quantitative data to examine visual and textual elements in the Touchstone reading comprehension sections. Whilst the visual elements were analyzed qualitatively according to Kress and van Leeuwen’s (1996, 2006) model of visual grammar, the reading comprehension texts were analyzed quantitatively based on Halliday and Matthiessen’s (2004) transitivity system. The results were then compared to see how the two elements assisted language learners’ understanding.


3.2. Corpus of the Study

The Touchstone series is widely used in many Iranian language schools as one of the most popular sources of language learning for EFL learners. Written by McCarthy, McCarten, and Sandiford (2nd edition, 2014), the series is published by Cambridge University Press. Except for Book 1, which contains nine reading comprehension passages, each book consists of twelve reading comprehension passages with colored pictures accompanying the reading texts.

For the purpose of this study, four reading comprehension passages were selected based on topic similarity, alongside their seven corresponding pictures, from the Touchstone students’ books 1 to 4, which target beginning, high beginning, low-intermediate, and intermediate level language learners, respectively. Table 2 presents the details of the selected corpus.

Table 2.

Touchstone Series Selected Reading Texts for Analysis

Book | Level | Topic | Number of Words | Number of Pictures
Book 1 | Beginning | | |
Book 2 | High Beginning | Technology in the Future | |
Book 3 | Low Intermediate | | |
Book 4 | Intermediate | | |

3.3. Data Analysis Procedures

To answer the research questions, Halliday and Matthiessen’s (2004) transitivity system and Kress and van Leeuwen’s (1996, 2006) model of visual grammar were used (see Table 3).


Table 3.

Analytical Framework for Visual and Verbal Analysis

Research Question | Analytical Framework | Elements Analyzed
How are the reading comprehension texts in the Touchstone series presented visually? | Model of Visual Grammar (Kress & van Leeuwen, 1996, 2006) |
How are the reading comprehension texts in the Touchstone series presented verbally? | Transitivity Analysis (Halliday & Matthiessen, 2004) |

The reading comprehension texts were analyzed to examine the use of transitivity following Halliday and Matthiessen’s (2004) framework of the Transitivity System. The accompanying visual elements (i.e., pictures) were analyzed based on Kress and van Leeuwen’s (1996, 2006) classification of visual analysis. Table 3 illustrates details of the framework. Table 4 displays Kress and van Leeuwen’s (2006) model of visual grammar.


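Table 4.

Kress and Van Leeuwen’s (2006) Model of Visual Grammar

Mode | Aspect | Categories
Representational | | Sociocultural Portrayal
Interactive | Distance | Medium Shot; Long Shot
Interactive | Perspective | Frontal / Oblique; High / Low / Eye-leveled
Compositional | Information Value |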

The selected reading comprehension texts were examined for the use of the six types of clauses in the Transitivity System: material, relational, mental, verbal, behavioral, and existential processes. Table 5 presents a description of Halliday and Matthiessen’s (2004) framework. The data were then statistically analyzed.

Table 5.

Description of Material, Mental, and Verbal Processes of Transitivity System

Process Types | Description | Type of Verbs
Material | Physical action in the real world | Doing, happening
Mental | Process of perception, cognition, emotion, affection | Sensing, seeing, feeling, thinking, wanting
Verbal | Process of communication |
Relational | Representing possession, equivalence, attribute | Being, attributing, identifying
Behavioral | Introducing indirect speech with verbs like laugh, talk, breathe, cry |
Existential | | Exist – there is ….


According to Halliday and Matthiessen (2004, p. 179), “A material clause construes a quantum of change in the flow of events as taking place through some input of energy”. In material clauses, the source of energy that causes the change is generally a participant (the actor). Material processes include events, activities, and actions for both animate and inanimate actors. Types of ‘doing’ in the material realm consist of creative and transformative clauses. In creative clauses, the actor or goal is seen as being brought into existence as the process unfolds. In transformative clauses, a pre-existing actor or goal is considered as being changed as the process develops (Halliday & Matthiessen, 2004). Verbs used in creative clauses include form, emerge, make, create, produce, construct, build, design, write, compose, draw, paint, and bake.

Mental processes relate to “our experience of the world of our own consciousness” (Halliday & Matthiessen, 2004, p. 197). Mental clauses are clauses of sensing, and the tense of the verbal group is simple present rather than present-in-present, which is one of the differences between mental and material clauses. The subject of all mental clauses is ‘I’, except ‘don’t worry’ in which the subject is ‘you’. The participant roles for mental clauses include ‘senser’ and ‘phenomenon’.

Verbal process refers to the process of saying. Verbal process contains the features of mental and relational processes and is symbolized in the action of saying. In verbal clauses there is always one participant that represents the speaker. Also, there might be additional participants, which represent the addressee.

Relational clauses serve to characterize and to identify. The process is realized mostly with the verb ‘be’ in simple present or simple past. Relational processes comprise three types of clauses. The first attributes a relationship of sameness between two entities (‘x is y’). The second defines the entity in terms of time, manner, or location (circumstantial, ‘x is at y’). The third indicates that an entity owns another (‘x has y’). They are named intensive, circumstantial, and possessive, respectively. Generally, intensive and circumstantial clauses contain ‘be’ verbs, but possessive clauses contain verbs such as possess, belong, own, and have. The term ‘being’ in relational processes does not refer to existence. Therefore, in relational clauses with the verb ‘be’, there must be two participants instead of one, and a relationship of being is set up between the two entities.

Behavioral processes, according to Halliday and Matthiessen (2004), lie on the borderline between material and mental processes. Behavioral processes are the outer manifestation of inner workings, expressing processes of consciousness and psychological states. However, behavioral processes are not considered a clear-cut class of processes. Indeed, they are generally looked upon as a cluster of small subtypes blending the material and the mental into a continuum. A behavioral process displays an action in which both mental and physical features are indivisible and essential. The most typical pattern is a clause containing a process and a single participant, named the behaver, which is generally a conscious being; the verb is therefore intransitive.


4. Results

In this section, the analyses of visual elements followed by analyses of reading comprehension passages are presented. 


4.1. Visual Analysis

This section reports the results for the visual analysis of the Touchstone series. In this regard, representational, interactive, and compositional meanings of the visuals are presented.

4.1.1. Representational Mode

According to Kress and van Leeuwen (2006), representational mode pertains to the presentation of participants, their actions, and the details related to them. The results revealed that participants play an important role in the pictures by conveying information related to the textual parts. In fact, the details of participants’ activities illustrate and support the textual parts of reading comprehension texts.

The analyses of the corpus under study showed that all participants were human, with their ages ranging between adolescents and young adults. Therefore, a realistic orientation toward the viewers’ lives, experiences, and actions is realized. Since there were 3 male and 3 female participants, it can be claimed that no bias or gender stereotyping exists. The equal distribution of technology use by males and females through all books also shows there is no pattern of gender stereotyping in the corpus. The corpus comprised 4 drawings and 3 photos, which displays a balance in the presentation of naturalistic pictures and drawings.

Participant-related features, including appearance (such as outfit and clothing) and possessions (such as electronic devices), are modern and familiar to young language learners. Participant activities such as working with cell phones or computers are common and ordinary for Iranian learners and match the learners’ present lifestyle. Participants’ appearance is portrayed in accordance with western cultural norms, as the female participants do not wear head coverings. No attempt is made at conveying any political, ethical, or religious messages through participants’ appearance or actions.


4.1.2. Interactive Mode

Interactive meaning refers to the relation between the image and the viewer. The images are analyzed based on distance, perspective, and modality. Distance is an important component of the visual frame and refers to the size of the frame. Kress and van Leeuwen (2006) point out that a close-up picture, which includes the head and shoulders of the participant, indicates a closer or friendly relationship. Besides, while a medium shot, which shows the subject up to the waist, implies a social connection and far personal distance, a long shot frame displays social distance between the viewers and the visuals.

The results of the analyses indicated that the corpus images include three close-ups (book 2 and book 4), a medium shot (book 1), and two long shot frames (books 1 and 3). In other words, the social distance between the viewer and the participant is balanced. Although long shot frames imply an impersonal relationship, the two long shot pictures observed in the corpus of this study increased the modality of the pictures. For instance, book 3 represents an internet swindler in a long shot frame, reflecting the difference between the viewer’s and the participant’s worlds. In fact, the implication is that there is no close relationship between the viewer and the swindler. As a result, the modality of the picture is increased.

In terms of distance, visual evaluation of the Touchstone series suggests that various forms of distance, that is, close-up, medium shot, and long shot, are used to present the visuals. There was no difference between male and female participants in distance features.

Perspective, as the second aspect of interactive mode, is unique to images and refers to the selection of an angle, a ‘point of view’ (Kress & van Leeuwen, 2006). The analysis indicated 1 vertical and 6 horizontal angles across the visuals, with an eye-leveled angle on the vertical axis. Kress and van Leeuwen (2006) state that an eye-leveled angle indicates power equality between the pictures’ participants and the viewers. Therefore, no superiority is felt between the learners and the visuals’ participants through all 4 levels of the Touchstone series. Thus, the learners can develop a sense of involvement with the participants’ world. The findings showed that 2 participants in books 1 and 2 appear in back view. According to Kress and van Leeuwen (2006), back view implies a measure of trust which, despite the abandonment, is complex and ambivalent.

The results further display that complex feelings are transmitted to the viewers in a way that the two cases of back view direct the learners’ attention to the participants’ activities rather than themselves. Therefore, more relevant information is conveyed to support the textual part. Book 1 includes a picture of a female participant who is doing yoga while focused on a monitor. She is presented in back view and the picture holds necessary details related to the textual part, that is, exergaming. One plausible interpretation could be that although the participant does not have frontal view and eye contact with the viewer, here the focus is mainly on exercising itself and not the communication with the participant. As a result, the picture has high modality.

Modality, as the third aspect of interactive mode, was analyzed in terms of the colors of the pictures (color saturation, color differentiation, and color modulation) and their contextualization. Kress and van Leeuwen (2006) assume that the concept of modality in visual communication is socially dependent, referring to the way people, places, and things are presented as if they were real, really existed in this way, or did not. The results indicated that all drawings have high color saturation and varied colors with low color modulation. However, the photos showed naturalistic (medium) color saturation and color differentiation with high modality. The implication is that the visuals of the Touchstone series show high modality considering color saturation, color differentiation, and color modulation.

Kress and van Leeuwen (2006) assume that “… modality judgments are social, dependent on what is considered real (or true, or sacred) in the social group for which the representation is primarily intended” (p. 156). The Touchstone series is developed for adults and young adults, and the high modality of the visuals makes it possible to communicate meaning clearly to the learners. No black and white pictures are used alongside the reading texts of the Touchstone series; the pictures are pertinent to the linguistic part and reflect the underlying meaning of the reading texts. Overall, evaluation of the modality of the visuals suggests that the high color saturation and color differentiation of the pictures have increased their modality.

Contextualization refers to the background. Analysis of both photos and drawings of the corpus showed low contextualization. As a result, in terms of contextualization the modality of the visuals was low, and the interaction between the pictures and learners was limited to the participants and their acts. However, depth in the pictures was high. The shading of light and perspective create depth and make photos seem more real and more naturalistic than drawings.

The findings displayed poor contextualization in the photos and drawings under study. The participants are depicted in the foreground and the backgrounds are plain or vague. As an example, in book 4 the photo of a female participant accompanies the text, but the photo is decontextualized and the details of location or time are not presented. This suggests that low or plain contextualization separates the participants from a particular location and a specific moment in time. The results of the corpus analysis indicated that the foreground, that is, the participants, is sharper and more defined than the background, making the background look artificial, less naturalistic, and less real. The communication between the learners and the pictures is facilitated by presenting enough details in the background, as such details convey meaning to the learners.


4.1.3. Compositional Mode

In the compositional mode, the pages were analyzed for information value and salience. Information value attributes specific informational values to the visuals according to the placement of the elements of the image. It refers to the placement of the visual elements on the left, right, top, bottom, center, and margin, or different pictorial zones. Based on Kress and van Leeuwen (2006), in verbal-visual texts, left side items carry the meaning of a familiar piece of information or the ‘Given’ (the viewer is already familiar with it) and the right side presents a piece of new information or the ‘New’. The results revealed that three pictures were placed on the left side of the texts, and four were embedded on the right. The analysis also revealed that two pictures appeared on the top and five were placed below the texts.

Placement of the elements on top of the page offers them as Ideal and the lower part as Real. According to Unsworth (2008), the textbooks’ Ideal or Real recognition implies different meanings. Kress and van Leeuwen (2006) state that the top part represents abstract, emotive, and general information to show us ‘what might be’, while the bottom part contains concrete, specific, detailed, down-to-earth, and practical information showing us ‘what is’. Kress and van Leeuwen (2006) also assert that when the text is placed on the upper part and the picture on the lower part of the page, the text plays the lead role and the picture is subordinate to the text. Accordingly, the main part of the message is communicated to the learners textually. However, when the picture is placed on the top and the text on the lower section of the page, the Ideal is the picture, which communicates the main message to the learners visually. In this respect, the results of the present study suggest that the texts in the Touchstone series have the lead role, being presented “as the idealized or generalized essence of the information” (Kress & van Leeuwen, 2006, p. 187). The pictures are subordinated to the texts. Hence, the texts are emphasized, with the visuals supporting the textual information.
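The coding of picture placement into information values described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure used in the study: the function names and the toy placements are hypothetical, and only the Given/New and Ideal/Real labels come from the framework.

```python
from collections import Counter

# Information-value labels for page zones, following the Given/New and
# Ideal/Real distinctions described in the text.
HORIZONTAL_VALUE = {"left": "Given", "right": "New"}
VERTICAL_VALUE = {"top": "Ideal", "bottom": "Real"}


def information_value(horizontal, vertical):
    """Return the (horizontal, vertical) information-value labels for one picture."""
    return HORIZONTAL_VALUE[horizontal], VERTICAL_VALUE[vertical]


def tally_information_values(placements):
    """Count how many pictures fall under each information-value label."""
    counts = Counter()
    for horizontal, vertical in placements:
        counts.update(information_value(horizontal, vertical))
    return counts


# Hypothetical placements for illustration only; not the coded corpus.
toy_placements = [("left", "top"), ("right", "bottom"), ("right", "bottom")]
toy_counts = tally_information_values(toy_placements)
```

Coding each picture as a pair of zones keeps the horizontal and vertical dimensions separate, so the Given/New and Ideal/Real tallies can be reported independently, as they are in this section.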

As the second aspect of compositional mode, salience refers to the way the elements of the picture are presented to get the viewer’s attention to different degrees (Kress & van Leeuwen, 2006). Factors such as placement of different elements in the background or foreground, color contrast and sharpness, relative size of the picture, etc. can influence the salience of a picture.

The results indicated that the texts are the most dominant parts of the pages and the visuals support them, in that the pictures are not as large as the texts. Therefore, the texts occupy most of the space on the reading pages, and densely printed pages in which textual parts are dominant were observed in the corpus. Eye-catching color differentiation and saturation enhance the salience of the pictures. The underlying meaning of the images accompanies the texts but does not add more information to them. In this way, the images are of an illustrative type used purposefully. Moreover, the results indicated that the pictures of the corpus attract the viewers’ attention because of their color density, color saturation, and color differentiation. Hence, these factors increase the pictures’ salience in order to get the learners’ attention. All pictures are also set apart from the textual parts by framing lines, shading, or color. In this respect, the high salience of the visuals could be attributed to the fact that they are intended to get the learners’ attention and not to be ignored.


4.2. Textual Analysis

For the verbal analysis, Halliday and Matthiessen’s (2004) transitivity system was employed, which yielded the identification and analysis of six process types in the four texts. In the words of Halliday and Matthiessen (2004, p. 170), “The transitivity system construes the world of experience into a manageable set of process types”. Halliday and Matthiessen (2004) add that “…the clause is also a mode of reflection, of imposing order on the endless variation and flow of events” (p. 170). They also maintain that each process type provides its own schema for understanding a particular domain of experience. Therefore, analyzing clauses is the most helpful way to understand and interpret the inner and outer world of experience and the flow of events expressed in the text by the writer and, for this reason, it was adopted for the verbal analysis in this study.

A summary of the findings is presented in Table 7. As shown in the table, material processes with 127 occurrences (50%), relational processes with 81 occurrences (31.8%), and mental processes with 29 occurrences (11.4%) have the highest frequencies in the four texts, respectively. This observation reveals that the texts mostly explain physical actions in the real world, including clauses of doing and happening. Relational clauses, as the second most frequent processes, show that the texts have also served to characterize and identify things, covering both attribution and identification. Compared with material and relational processes, mental and verbal processes had fewer occurrences in the four levels of the Touchstone series. The low occurrence of mental processes indicated that the texts were not focused on the inner world or feelings, and verbs such as “like, know, think, want, or perceive” were not used much in the texts to represent a conscious senser or being.


Table 7.

Distribution of Textual Elements in the Corpus

Halliday and Matthiessen (2004) claim that “they [verbal clauses] contribute to the creation of narrative by making it possible to set up dialogic passages” (p. 252). Thus, the low occurrence of verbal clauses in the corpus showed that the four texts were not of the narrative type. According to Halliday and Matthiessen (2004), behavioral clauses “often appear in fictional narrative introducing direct speech, as a means of attaching a behavioral feature to the verbal process of saying” (p. 252). Likewise, the low occurrence of behavioral and existential processes in the corpus indicates that the texts did not refer to human behavior or existence.

The analysis of the texts suggested that from book 1 to book 4, as text difficulty increases, so does the frequency of material processes. As illustrated in Table 7, verbal, behavioral, and existential processes were less frequent than material, relational, and mental processes in the linguistic text. Furthermore, the behavioral process type had the lowest frequency among the six processes. Table 8 presents the results of the textual evaluation of the six process types in the corpus.


Table 8.

Textual Evaluation Results in the Corpus

Process type | Percentage
Material | 50%
Relational | 31.8%
Mental | 11.4%
Verbal | 4.7%
Behavioral | 0.7%
Existential | 1.1%

As shown in Table 8, material processes occur most often in the four texts of the Touchstone series, while behavioral and existential processes are the least frequent process types in the reading comprehension texts.


5. Discussion

To answer the first research question, that is, how the reading comprehension texts in Touchstone series are presented visually, Kress and van Leeuwen’s (2006) theoretical model was adopted, and the visual elements were examined in terms of representational, interactional, and compositional modes. The visual evaluation revealed naturalistic and humanistic trends. The pictures were of different kinds, such as drawings and photos, which echoes Bezemer, Diamantopoulou, Jewitt, Kress, and Mavers’s (2012) findings about the effect of a multimodal social semiotic approach on learning. They claimed that two types of visual mode (i.e., drawings and pictures) promote learning. Therefore, using both photos and drawings in Touchstone series can exert positive effects on the learning process. However, the findings were contrary to Tahririan and Sadri’s (2013) analysis of local English textbooks, in which old Iranian high school textbooks were found to use outdated portrayals of objects, overdramatize national identity, have poor modality, and use grayscale printing. By contrast, the present study indicated that up-to-date themes and topics were used in Touchstone textbooks, and that all drawings and photos were in color and of fairly high modality. Both similarities and differences were thus found between the findings of this study and those of Tahririan and Sadri (2013). They found that most images in the old high school books were placed on the right side of the page, which is taken as ‘new’ for the viewer, an observation compatible with the findings of this study, where 4 out of 7 pictures were presented on the right side of the text. While Tahririan and Sadri showed that most pictures were presented at the bottom of the page, the findings of this study revealed that 3 pictures were placed at the bottom of the text, a position that pertains to providing the learners with real facts. The optimum layout involves either putting the pictures at the bottom, which implies providing the students with reality or real facts, or presenting them on the right side of the page, which suggests that new information is provided for the learner.

The findings of this study ran counter to Tahririan and Sadri’s (2013), Marefat and Marzban’s (2014), and Roohani and Heidari’s (2012) studies in terms of gender stereotyping in textbooks. These studies pointed out bias and gender stereotyping in old Iranian high school textbooks, the Iran Language Institute (ILI) textbooks, and Summit 2B. By contrast, the findings of this study showed no gender stereotyping or bias between male and female pictorial participants in Touchstone series. Moreover, the reading comprehension text about exergaming represents a sport that is less familiar to Iranian students. The pictures accompanying this text fail to provide enough detail about the sport and may therefore look odd to Iranian learners who do not share a similar experience.

Regarding perspective, the findings were compatible with those reported by Tahririan and Sadri (2013), according to which all images were represented at eye level. This equality of power is likely to help the learners connect with the pictures. While Tahririan and Sadri (2013) maintained that most pictures were represented in long shot, in this study most pictures were close-up or medium shots. In this way, Touchstone series presents pictures that establish a personal, intimate connection with the learners. In addition, the findings were in line with those of Tahririan and Sadri (2013), who found that about half of the pictures in the old high school books included no contextualization. In Touchstone series, contextualization was of low modality: all pictures fall within the category of ellipsis, lacking sufficient detail of the setting or background, and some pictures were decontextualized altogether. Such pictures produce less realistic scenes, or abstraction from reality, which weakens the connection between the learner and the visual. In terms of text-image status, the outcomes were compatible with Tahririan and Sadri’s (2013) findings, since the pictures were illustrative and informative. The visuals were also of high salience, which in turn enhanced modality. They thus appear to be designed purposefully to convey the underlying meaning of the text or to illustrate the textual part.

To respond to the second research question, i.e., how the reading comprehension texts in Touchstone series are presented verbally, Halliday and Matthiessen’s (2004) transitivity system framework was adopted. In this scheme, there are six process types, namely, Material, Relational, Verbal, Mental, Behavioral, and Existential. The analysis of the six process types suggested that material processes had the highest frequency, followed by relational processes. These findings support the results of Zheng, Yang, and Ge (2014), who investigated medical research articles and found that material processes occurred most often, followed by relational, mental, verbal, and existential processes, respectively. This finding implies that Touchstone reading texts are written about real-world activities and experiences, drawing mostly on material and relational processes. Thus, the beginning to intermediate level learners who use these books are likely to comprehend the texts more efficiently because the content is more tangible for them. Moreover, the lower usage of mental verbs shows that communication with the inner world, i.e., thinking and feeling, is not the main focus of the texts; the outer world is emphasized rather than the inner world.

The least frequent process types in Touchstone texts were existential and behavioral processes. According to Halliday and Matthiessen (2004), existential processes are on the borderline between relational and material processes and refer to existence or happening, mainly recognized by the verb ‘be’. Halliday and Matthiessen (2004, p. 174) also state that “the setting or orientation of a narrative is often dominated by ‘existential’ and ‘relational’ clauses, but the main event line is construed predominantly by ‘material’ clauses”. Accordingly, it can be deduced that Touchstone reading texts are not of the narrative type. In this regard, the results are in accordance with Halliday and Matthiessen’s (2004, p. 257) claim that “… existential clauses are not, overall, very common in discourse…” and that about 3 to 4 percent of all clauses are of the existential type. Halliday and Matthiessen (2004, p. 248) state that behavioral clauses pertain to “physiological and psychological behavior, like breathing, coughing,…”. Only 0.7% of the processes in the Touchstone texts were behavioral, revealing that the writer’s focus is not on human behavior and that the learners are not faced with fictional narrative.

According to Halliday and Matthiessen (2004), material processes relate to happening and doing and are used to construe the procedures or events that take place. The findings of the study indicated that in the reading comprehension texts of Touchstone series, material clauses were the most frequent, describing actions, changes, or events in the real world.

Using high-modality pictures and photos is another merit of the textbooks. Kress and van Leeuwen (2006) claim that “what is regarded as real depends on how reality is defined by a particular social group” (p. 158). Since these textbooks are mostly used by young learners who are familiar with social media and technological devices, providing both drawings and photographs for this group of learners is appropriate and helps convey the intended meaning. As a result, the visuals of the textbooks are of high modality and seem to be a proper medium for representing the real world. Using young participants in all images is yet another merit: the Touchstone series is written for young learners, and young participants in the visuals help the learners relate to the characters by fostering sympathy between the learners and the book characters. The participants are engaged in typical activities that teenagers of the same age usually do.


6. Conclusions

According to Kress and van Leeuwen (1996), visual grammar cannot be separated from verbal or any other grammar. In teaching foreign languages, presenting proper visual and textual elements shapes the process of meaning making for the learners, which is noteworthy. Therefore, introducing visual grammar in an English language course through multimodal texts, and involving students in analyzing a text in relation to the accompanying picture, can develop their language skills and help them better understand the text.

The findings of the present study suggest that the writer’s idea is expressed in the texts by employing a specific process type. Both drawings and photos are used in the reading comprehension texts in a balanced way. Although drawings are not naturalistic and do not have the quality of real photos, the visuals are relevant to the linguistic passages and help the learners arrive at a more thorough understanding of the texts. The design and depiction of the drawings seem purposeful and suitable.

Textbook evaluation can help elucidate the strengths and weaknesses of textbooks and provide suggestions for improving, developing, designing, and presenting suitable textbooks. The selection of a textbook can influence the whole EFL syllabus around it (Garinger, 2002; Harmer, 1991), yet the diversity of textbooks available on the market has made the selection of proper books difficult (Cunningsworth, 1995; Green, 1926). Therefore, the success or failure of an ELT course might depend in part upon the quality of the textbook chosen for instruction (Green, 1926).

This study reported the analysis of four texts selected for their topic similarity. Researchers are therefore encouraged to build on the results reported here and conduct similar studies comparing different textbooks with each other. Further research is also needed on the audio components and communicative activities of the textbooks. Future studies may also investigate the effect of different aspects of textbook multimodality on communicative needs and interpersonal interactions. Moreover, researchers can undertake similar studies with a wider range and larger number of reading texts, or compare sets of texts on different subjects, visually and verbally. Other sections of the English textbooks taught in language institutes, such as new vocabulary, conversation strategies, listening, and free talk, can also be examined to determine their communicative effectiveness, and such investigations can include comparisons between two or more textbooks.



References

Alaei, M., & Ahangari, S. (2016). A study of ideational metafunction in Joseph Conrad’s “Heart of Darkness”: A critical discourse analysis. English Language Teaching, 9(4), 203-213.

Bednarek, M. & Martin, J. (2010). New discourse on language: Functional perspectives on multimodality, identity, and affiliation. London, United Kingdom: Continuum.

Bezemer, J., Diamantopoulou, S., Jewitt, C., Kress, G., & Mavers, D. (2012). Using a social semiotic approach to multimodality: Researching learning in schools, museums and hospitals. NCRM Working Paper 1/12, 14 March. London: National Centre for Research Methods. Retrieved from

Bezemer, J., & Kress, G. (2010). Changing text: A social semiotic analysis of textbooks. Designs for Learning, 3(1-2), 10–29. DOI:

Chu, P. Y. (2011). Picture book reading in a new arrival context: a multimodal perspective on teaching reading. Unpublished PhD thesis, University of Adelaide.

Cunningsworth, A. (1995). Choosing your coursebook. UK: Heinemann English Language Teaching.

Garinger, D. (2002). Textbook selection for the ESL classroom (Report No. EDO-FL-02-10). The U.S. Dep. of Education, Office of Educational Research and Improvement, National Library of Education (ERIC Document Reproduction in Service No. ED-99-CO-0008). Retrieved from:

Green, A. (1926). The measurement of modern language books. The Modern Language Journal, 10(5), 259-269.

Halliday, M. (1978). Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.

Halliday, M. (1994). An introduction to functional grammar. London: Edward Arnold.

Halliday, M. (1996). Introduction: Language as a social semiotic. The social interpretation of language and meaning. In P. Cobley (Ed.), The communication theory reader (pp. 88-93). London: Routledge.

Halliday, M. (2007). Language and education. London: Continuum.

Halliday, M., & Matthiessen, C. (2004). An introduction to functional grammar. London: Edward Arnold.

Harmer, J. (1991). The practice of English language teaching. New York: Longman Publishing.

Iedema, R. (2001). Analyzing film and television: A social semiotic account of hospital: An unhealthy business. In T. van Leeuwen & C. Jewitt (Eds.), Handbook of visual analysis (pp. 183-204). London: Sage.

Jewitt, C. (2009). The Routledge handbook of multimodal analysis. London: Routledge.

Kress, G., & van Leeuwen, T. (2002). Color as a semiotic mode: Notes for a grammar of color. Visual Communication, 1(3), 343-368.

Kress, G., & van Leeuwen, T. (1996/2006). Reading images: The grammar of visual design. London: Routledge.

McCarthy, M., McCarthy, J., & Sandiford, H. (2014). Touchstone Student Book 1 (2nd edition). Cambridge: Cambridge University Press.

McCarthy, M., McCarthy, J., & Sandiford, H. (2014). Touchstone Student Book 2 (2nd edition). Cambridge: Cambridge University Press.

McCarthy, M., McCarthy, J., & Sandiford, H. (2014). Touchstone Student Book 3 (2nd edition). Cambridge: Cambridge University Press.

McCarthy, M., McCarthy, J., & Sandiford, H. (2014). Touchstone Student Book 4 (2nd edition). Cambridge: Cambridge University Press.

Marefat, F., & Marzban, S. (2014). Multimodal analysis of gender representation in ELT textbooks: Reader's perceptions. Procedia - Social and Behavioral Sciences, 98, 1093-1099.

Roohani, A., & Heidari, N. (2012). Evaluating an instructional textbook: A critical discourse perspective. Issues in Language Teaching, 1(1), 123-159.

Shannon, P. (2010). Textbook development and selection. In International Encyclopedia of Education (3rd Edition). DOI: 10.1016/B978-0-08-044894-7.00065-8

Sheldon, L. (1988). Evaluating ELT textbooks and materials. ELT Journal, 42(2), 237-246. Retrieved from

Tahririan, M., & Sadri, E. (2013). Analysis of images in Iranian high school EFL course books. Iranian Journal of Applied Linguistics, 16(2), 137-160.

Torres, G. (2015). ‘Reading’ world link: A visual social semiotic analysis of an EFL textbook. International Journal of English Language Education, 3(1), 239-253.

Unsworth, L. (2008). Multimodal semiotic analyses and education. In L. Unsworth (Ed.), Multimodal semiotics: Functional analysis in contexts of education (pp. 1-12). London: Continuum.

van Leeuwen, T. (2005). Introducing social semiotics. London: Routledge.

Zheng, S., Yang, A., & Ge, G. (2014). Functional stylistic analysis: Transitivity in English medium medical research articles. International Journal of English Linguistics, 4(2), 12-25.