SLI Diagnostics in Narratives: Exploring the CLARIN-NL VALID Data Archive

In 2014 the Vulnerability in Acquisition: Language Impairments in Dutch (VALID)1 Data Archive for pathological language data (CLARIN-NL-12-010 grant) was launched. The aim of the VALID Data Archive is to unite various available datasets, ranging from metadata, experimental results, and test outcomes to spontaneous speech data, including video recordings, and to develop unambiguous protocols to ascertain the interpretation of research outcomes. In this chapter we report a study that we carried out using the VALID Data Archive. In an earlier project the language development of children with Specific Language Impairment (SLI) had been investigated using a narrative task (retelling of a picture story); the VALID database thus contains transcripts and audio files of the speech of 50 children with SLI and 24 age-matched typically developing (TD) children aged between 5;6 and 12;0 years, who all participated in this earlier project. Our study focused on morphosyntactic and lexical accuracy and complexity, in order to determine which language measures are diagnostic indicators of SLI on the basis of these narrative data. Results showed that SLI children performed less well than TD children for morphosyntactic and lexical accuracy and complexity. The results obtained can be compared to results found in three other studies on narratives performed by SLI and TD children. The similarities and differences in the outcomes reveal the urgency of having identical, precise protocols for handling and analysing complex data.

1 http://validdata.org/clarin-project/datasets/

How to cite this book chapter: Bergmann, L, van Hout, R and Klatter-Folmer, J. 2017. SLI Diagnostics in Narratives: Exploring the CLARIN-NL VALID Data Archive. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 167–180. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.14.
License: CC-BY 4.0

2 http://childes.talkbank.org/


The VALID Data Archive
The Vulnerability in Acquisition: Language Impairments in Dutch (VALID) Data Archive (CLARIN-NL-12-010 grant), launched in 2014, is an open multimedia data archive with data from speakers suffering from language impairments. The aim of the VALID Data Archive is to unite various available datasets, ranging from metadata, experimental results, and test outcomes to spontaneous speech data, including video recordings, and to develop unambiguous protocols to ascertain the interpretation of research outcomes. In the CLARIN-NL framework five VALID data resources were curated; an overview of the key information on each of these five data resources is provided in the Appendix. For all datasets concerned, written informed consent was obtained from the participants or their carers. All materials were anonymised. The audio files were converted into wav (linear PCM) files and the transcriptions into CHAT or ELAN format. Research data that consisted of test, SPSS and Excel files were documented and converted into CSV files. All datasets were provided with appropriate CMDI metadata files. A new CMDI metadata profile for this type of data resource was established, and care was taken that ISOcat metadata categories were used to optimise interoperability. A full overview of VALID metadata categories can be found in Klatter-Folmer et al. (2014). After curation all data were deposited at the Max Planck Institute for Psycholinguistics in Nijmegen, where persistent identifiers are linked to all resources. The content of the transcriptions in CHAT and plain text format can be searched with the TROVA search engine (cf. Klatter-Folmer et al., 2014; van den Heuvel et al., 2014).
The most important difference from the Child Language Data Exchange System (CHILDES)2 is that VALID is a specialised structured database for all types of data related to pathological language, ranging from metadata, experimental results and test outcomes to spontaneous speech data, including video recordings. CHILDES, on the other hand, covers the spectrum of first-language acquisition research data, focusing in particular on spontaneous speech data, and with fewer datasets from child clinical groups. Moreover, the VALID Data Archive covers all age groups.
The realisation of the data archive was made possible by a CLARIN-NL grant (12-010) for a pilot project. This pilot enabled us to build up experience in conserving different kinds of pathological language data in a searchable and persistent manner. The conserved datasets reflect current research in language pathology rather well, both in the range of designs and in the variety of pathological problems, such as Specific Language Impairment (SLI), deafness, dyslexia and ADHD (Klatter-Folmer et al., 2014; van den Heuvel et al., 2014). The first author of the present contribution carried out the study presented below (Bergmann, 2015), monitored by Roeland van Hout (VALID data provider) and Jetske Klatter (VALID project leader). A main goal of this study was to test the accessibility of the VALID Data Archive and to signal problems met in extracting the data.

SLI Diagnostics in Narratives
SLI is a set of speech and language disorders with high co-morbidity with other disorders and impairments. Its definition is based on exclusion criteria and is related to a mix of linguistic, sensory, cognitive, neural-motor, and emotional restrictions. This rather unsatisfactory definition is largely due to the heterogeneous speech and language behaviour of SLI children (Manders, De Bal and Van den Heuvel, 2013), while at the same time no specific causes of SLI have been detected yet (Archibald and Gathercole, 2006). The co-morbidity patterns found do support the idea that SLI is a multi-factorial disorder (Bishop, 2006).
SLI children display a problematic and delayed development in language form, function, and use, where impairments may occur in all language domains, such as phonology, semantics, morphosyntax, and pragmatics (Casalini et al., 2007). Bishop (2006) concluded that SLI children have difficulties in adequately processing information that is offered in a short time span, as is the case in spoken conversations. For the majority of SLI children, grammar is a difficult area, and weak morphosyntactic skills are correlated with poor lexical-semantic skills (Simon-Cereijido and Gutiérrez-Clellen, 2009; Toppelberg and Shapiro, 2000; Bishop, 2013).
Studies focusing on the complexity and accuracy of morphosyntax address a range of features. Smith-Lock (1993) already pointed to differences between SLI and typically developing (TD) children in passive sentence constructions, and Rice performed several investigations into mean length of utterance (MLU), showing that SLI children lag behind in MLU, partly because of the absence of complex morphosyntactic constructions, e.g. subordinate clauses and question clauses (Rice, Redmond and Hoffman, 2006). As for accuracy, research by Vandewalle et al. (2012) showed errors in verb inflection, articles, and word order, when compared to TD children, and Simon-Cereijido and Gutiérrez-Clellen (2009) mentioned deletion of function words. In SLI, the production of complex utterances triggers an increase in morphosyntactic errors, as complex utterances are more demanding (Colozzo et al., 2011).
Considering lexical complexity and accuracy, Bishop (1992) argued that SLI children have difficulties processing linguistic input as a whole, resulting in weak and inefficient connections between words, which in turn leads to longer retrieval time and more errors in word choice (Kambanaros et al., 2014). In a longitudinal study of 500 SLI and TD children, Rice and Hoffman (2015) found that SLI children consistently performed less well than age-matched TD children on lexical tasks.
Gaining more insight into the causes and characteristics of SLI requires a detailed diagnostic procedure. The usual battery of SLI test materials focuses on communication in structured settings, such as inviting participants to select the image that best represents a stimulus word. These experimental settings are unnatural and provide little information about linguistic skills in a spontaneous or semi-structured conversation (Peña et al., 2006). This argues for using narrative tests that combine spontaneous quality with structured content. Retelling a picture story requires competencies quite different from those used in structured settings, such as introducing the characters, explaining the topic and structuring the text. Also, (re)telling a story challenges people to be more explicit and to produce longer linguistic units (Treurniet, 2011; Treurniet and Orgassa, 2011). As children tend to show more linguistic variation and produce more utterances in these tasks, they are an appropriate means for collecting data on morphosyntactic and lexical skills. Several studies confirm that narratives demonstrate the morphosyntactic and lexical problems of SLI children (e.g. Kambanaros et al., 2014; Vandewalle et al., 2012).
The morphosyntactic and/or lexical accuracy and complexity of narratives by SLI children were analysed in three earlier studies in the Netherlands: that of Treurniet (2011); Verhoeven, Steenge and Van Balkom (2011); and Zwitserlood et al. (2015). All three studies mention problems in the morphosyntactic and/or lexical domain for SLI children. Each used a different set of narrative data. The VALID Data Archive contains yet another, new narrative dataset. Departing from the Dutch studies, the following research questions and hypotheses were formulated for our study:

Method
All the tasks performed by participants in a collective research project led by Radboud University Nijmegen and Kentalis on the expression of spatial relations by SLI children in oral language production were stored in the CLARIN-NL VALID Data Archive; this so-called SLI RU-Kentalis database was one of the five sets that were curated. This database contains narratives by SLI and TD children. In this contribution we discuss a new analysis of the Frog goes to dinner narrative. Analysing these data from a perspective that differs from the main aim of the original project (which was to study how SLI children expressed spatial relations in this narrative) was a good test to explore whether the VALID Data Archive is easily accessible and usable for new researchers with new questions.
The Netherlands has special schools for SLI children. To be eligible for special education and extra care, children with SLI have to meet certain criteria that have been acknowledged by the Ministry of Education, Cultural Affairs and Science of the Dutch government. A child's communicative and cognitive abilities are assessed in an examination by a speech therapist, a psychologist and, if necessary, an audiologist. The SLI diagnosis is made when a child has speech or language impairments that cannot be attributed to limited cognitive abilities. Furthermore, it has to be established that the child has problems in two or more of the following language areas: speech production, auditory processing, and grammatical development or lexical-semantic knowledge. Only the children whose scores on standardised language tests for at least two of these aspects of language are 1.5 standard deviations below average are admitted to a special form of education.
Only the children with grammatical and lexical-semantic problems were included in the original research project. The selection was made based on the children's achievements on standardised language tests given by the SLI schools: children whose language scores were 1.5 standard deviations or more below average on at least one subtest measuring syntactic and semantic development met the criteria to be included in the research project. The children who participated in the study all have Dutch as a first language.
The sample we used included 74 children out of the total of 93 participants in the original research project: 50 SLI children and 24 TD children, 40 boys and 34 girls. The main reasons for excluding children from the sample were that we constrained the analysis to three age groups, or that fewer than 30 utterances that could be analysed for the narrative were available for some participants (see Bishop and McDonald, 2010). The children came from primary school classes 2 to 7 (Dutch school system), and both the SLI and the TD children were divided into three age groups in order to investigate age effects in the development of their language proficiency (Table 14.1).
The picture book that was used in the narrative was Frog goes to dinner (Mayer, 1974). The reason for choosing this book instead of the more famous Frog, where are you? (Mayer, 1969) was that the Frog goes to dinner book contained a large and varied number of spatial elements and relations, which suited the research perspective of the main project much better. In this black-and-white illustrated story, a little boy brings his frog to a fancy restaurant. The VALID Data Archive contains not only the audio files, but also the transcripts and TextGrids for the PRAAT analysis tool (Boersma and Weenink, 2004). A TextGrid is a transcript of the audio file that can be made visible simultaneously with the audio file in PRAAT. The transcripts are available in CHAT format (MacWhinney, 2000), which makes it possible to calculate MLU5 and Guiraud's index with the CLAN tools.
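For readers who want to inspect the CHAT transcripts outside CLAN, the following is a minimal sketch of pulling one speaker's utterances out of a CHAT file. It relies only on the general CHAT layout (header lines starting with `@`, main tiers starting with `*` and a speaker code, dependent tiers starting with `%`, continuation lines starting with a tab). The function name `child_utterances` and the speaker code `CHI` as default are assumptions for illustration, not part of the VALID protocols, and the sketch does not interpret CHAT's special codes.

```python
def child_utterances(chat_text, speaker="CHI"):
    """Return the main-tier utterances of one speaker from CHAT-formatted text."""
    utterances = []
    current = None  # text of the speaker's main tier being assembled, if any
    for line in chat_text.splitlines():
        if line.startswith("*"):
            # A new main tier closes the previous one.
            if current is not None:
                utterances.append(current)
            code, _, text = line[1:].partition(":")
            current = text.strip() if code == speaker else None
        elif line.startswith(("@", "%")):
            # A header or dependent tier also ends any open main tier.
            if current is not None:
                utterances.append(current)
            current = None
        elif line.startswith("\t") and current is not None:
            # Continuation line of the current main tier.
            current += " " + line.strip()
    if current is not None:
        utterances.append(current)
    return utterances
```

The returned strings still contain the utterance terminators and any CHAT annotation codes, so further cleaning would be needed before counting words.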
Before starting the analysis, all transcripts were processed to mark and select the utterances appropriate for the analyses to be performed. Examples of utterances labelled as not appropriate for deeper analyses on the utterance level were 'yes' and 'no' answers and straightforward formulaic utterances (e.g. 'ik weet het niet', 'I don't know'). The steps of analysis are depicted in Figure 14.1. In general, in all three age groups, the TD group produced more appropriate or usable utterances than the SLI group. Both the transcripts and the audio files were used to analyse the data.
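The selection step can be sketched as a simple filter. The exclusion lists below are hypothetical and contain only the examples named in the text ('yes'/'no' answers and 'ik weet het niet'); the actual marking was done by hand on transcripts and audio. The threshold of 30 usable utterances is the exclusion criterion mentioned for the sample; all function and constant names are illustrative assumptions.

```python
# Hypothetical exclusion lists: only the examples given in the chapter.
MINIMAL_ANSWERS = {"ja", "nee"}
FORMULAIC = {"ik weet het niet"}
MIN_USABLE = 30  # participants with fewer usable utterances were excluded


def normalise(utterance):
    """Lower-case, drop surrounding terminators and collapse whitespace."""
    return " ".join(utterance.lower().strip(" .?!").split())


def usable_utterances(utterances):
    """Keep utterances that are neither minimal answers nor formulaic."""
    kept = []
    for utterance in utterances:
        norm = normalise(utterance)
        if norm and norm not in MINIMAL_ANSWERS and norm not in FORMULAIC:
            kept.append(utterance)
    return kept


def has_enough_data(utterances, minimum=MIN_USABLE):
    """True if the participant has at least `minimum` usable utterances."""
    return len(usable_utterances(utterances)) >= minimum
```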
Morphosyntactic complexity was measured with MLU5, which stands for the mean length of utterances (in words) of the five longest utterances of a child, using CLAN.
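As an illustration of the measure, MLU5 can be sketched in a few lines. The study itself used CLAN; this sketch assumes that utterances are already cleaned strings and that words are separated by whitespace, which is a simplification of CLAN's word-counting rules. The function name `mlu5` is an assumption for illustration.

```python
def mlu5(utterances, n_longest=5):
    """Mean length in words of the n longest utterances of one child."""
    lengths = sorted((len(u.split()) for u in utterances), reverse=True)
    longest = lengths[:n_longest]
    if not longest:
        raise ValueError("at least one utterance is required")
    # If fewer than n utterances are available, average over what there is.
    return sum(longest) / len(longest)
```

Restricting the input list to utterances coded as correct, before calling the function, would give the variant in which only correct utterances count towards the five longest.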
Morphosyntactic accuracy, as the percentage of correct utterances, was measured by marking utterances containing errors in: position/inflection of verbs, noun form, word order, omission of function words, and grammatical gender. Utterances that were rectified by the child (self-correction) were labelled as correct.
For lexical complexity Guiraud's index was calculated. First a list was drawn up of all words in the transcripts, excluding proper names, onomatopoeia, and non-interpretable words. Words like 'ja' ('yes') and 'ok' were also left out, because they are not suitable to establish a child's vocabulary size (Schaerlaekens, 2008). Guiraud's index is computed by dividing the number of types by the square root of the number of tokens, resulting in a measure of richness of the productive lexicon.
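The computation described above, types divided by the square root of tokens, can be sketched as follows. The tokenisation (a pre-split list of words, compared case-insensitively) and the `excluded` parameter are assumptions for illustration; in the study the word list was drawn up from the transcripts with proper names, onomatopoeia and non-interpretable words removed.

```python
import math


def guiraud_index(tokens, excluded=frozenset()):
    """Guiraud's index: number of types / sqrt(number of tokens)."""
    kept = [t.lower() for t in tokens if t.lower() not in excluded]
    if not kept:
        raise ValueError("no tokens left after exclusion")
    return len(set(kept)) / math.sqrt(len(kept))
```

For example, a sample of six tokens containing five distinct words gives 5 / √6, roughly 2.04. Because the denominator grows only with the square root of the sample size, the index is less sensitive to text length than a plain type/token ratio, although it is not fully independent of it.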
Lexical accuracy was again expressed by the percentage of correct utterances. Utterances containing incorrect function words (e.g. prepositions, conjunctions) and neologisms were categorised as incorrect. The same procedure was applied if content words were used with a wrong meaning.
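Both accuracy measures reduce to the same calculation: the share of usable utterances without an error of the relevant kind. The sketch below assumes that each utterance has already been hand-coded with a set of error labels; the label names and the `CodedUtterance` structure are hypothetical, not the coding scheme of the original project. Treating self-corrected utterances as correct is stated in the text for morphosyntactic accuracy; applying the same rule to lexical accuracy is an assumption made here.

```python
from dataclasses import dataclass, field

# Hypothetical error labels, loosely following the categories in the text.
MORPHOSYNTACTIC = frozenset(
    {"verb", "noun_form", "word_order", "function_word_omission", "gender"}
)
LEXICAL = frozenset({"function_word_choice", "neologism", "content_word_meaning"})


@dataclass
class CodedUtterance:
    text: str
    errors: set = field(default_factory=set)  # labels assigned by the coder
    self_corrected: bool = False              # child repaired the error


def percent_correct(utterances, error_types):
    """Percentage of utterances without an uncorrected error of the given types."""
    if not utterances:
        raise ValueError("at least one utterance is required")
    correct = sum(
        1 for u in utterances
        if u.self_corrected or not (u.errors & error_types)
    )
    return 100.0 * correct / len(utterances)
```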

Morphosyntactic Complexity
This variable was investigated by selecting the five longest utterances in words that were lexically and morphosyntactically correct: the MLU5. Figure 14.2 gives the relevant box plots for TD and SLI children.

Morphosyntactic Accuracy
In Figure 14.3 accuracy in morphosyntax is shown by the box plots of the percentage of correct utterances.

Lexical Accuracy
The box plots of accuracy in vocabulary are given in Figure 14.5.

Discussion
In this section, we first relate the results to our hypotheses. Table 14.2 gives the partial eta squared values (η²) for the effects that turned out to be significant.
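The chapter does not spell out how partial eta squared is obtained. Under the standard definition it is the effect's sum of squares divided by the sum of the effect and error sums of squares from the analysis of variance. The sketch below implements only that definition; the function name is illustrative and the input values in the usage note are made up, not taken from Table 14.2.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    if ss_effect < 0 or ss_error < 0:
        raise ValueError("sums of squares cannot be negative")
    total = ss_effect + ss_error
    if total == 0:
        raise ValueError("sums of squares cannot both be zero")
    return ss_effect / total
```

With made-up values, `partial_eta_squared(27.0, 73.0)` gives 0.27, meaning that the effect accounts for 27% of the variance that is not attributed to the other effects in the model.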

Morphosyntactic skills
H1: SLI children use morphosyntactically less complex language.
H2: SLI children are morphosyntactically less accurate.
The results of our study support both hypotheses on morphosyntactic skills. In all age groups SLI children produced morphosyntactically less complex and less accurate utterances than age-matched TD children. There is an age effect as well, indicating development over time in both groups. The morphosyntactic skills of SLI children were lower than those of their TD peers. These findings correspond to findings in other studies: Heilmann, Miller and Nockerts (2010) consider MLU a valid diagnostic variable for SLI, and so do Dunn, Flax and Sliwinski (1996). Smith-Lock (1993) advised interpreting MLU with care, because raw data do not warrant any direct conclusions about the complexity of language, but in combination with other linguistic variables, e.g. the percentage of morphosyntactically correct utterances, MLU is useful in identifying SLI (Moyle et al., 2011; Simon-Cereijido and Gutiérrez-Clellen, 2009). Colozzo et al. (2011) also concluded that SLI children have problems in telling a grammatically accurate story; they therefore investigated the content quality of the story, and observed that children with weak morphosyntactic skills delivered less consistent narratives. In our study an assessment of the content quality was not carried out, but in future research (using the CLARIN-NL VALID Data Archive) content quality could be used to better assess the language production processes in SLI children.
Lexical skills
H3: SLI children use lexically less complex language.
H4: SLI children are lexically less accurate.
The two lexical hypotheses were also corroborated by the data. In comparison to their TD peers, children with SLI produced lexically less complex and less accurate utterances. The age effect was straightforward for complexity: complexity increased with age. There was no main effect of age for accuracy, but there was an interaction between group and age, indicating that the age difference only affected the SLI group. The TD children already had high lexical accuracy scores in the youngest group, and the mean score does not show a pattern of change over time. The SLI lexical scores are lower than the TD lexical scores, but importantly the SLI lexical accuracy scores are much higher (all above 80%) than the SLI morphosyntactic accuracy scores (with scores as low as 20%). Lexical problems apparently are less strong, or surface less strongly, than morphosyntactic problems in SLI children, a conclusion supported by the lower partial eta squared values in the lexical outcomes. The morphosyntactic and lexical accuracy scores are marked by outliers, i.e. children who have scores far higher or lower than those of their age and child group. The variability in scores seems more typical of SLI children, who can have extremely severe impairments in specific linguistic domains. Remarkably, there are high correlations between the lexical and the morphosyntactic accuracy scores, i.e. 0.82 for the TD children and 0.92 for the SLI children. This result deviates from the findings in Kambanaros et al. (2014), who investigated lexical proficiency with similar variables: SLI children performed less accurately on the lexical and morphosyntactic level, but contrary to our study no relation between the two skills was observed.
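The text does not state which correlation coefficient was used. Assuming a Pearson product-moment correlation over per-child accuracy percentages, the calculation within one group can be sketched as follows; the function name is illustrative and no data from the study are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold the lexical accuracy scores and `ys` the morphosyntactic accuracy scores of the same children, in the same order, computed separately for the TD and the SLI group.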
How do our results compare to results found earlier in analyses of narratives of monolingual Dutch TD and SLI children? Treurniet (2011; see also Treurniet and Orgassa, 2011) compared 7-year-old SLI children to 5-year-old TD children. The TD children, although two years younger, turned out to have the same scores as the SLI children for lexical and morphosyntactic accuracy. As in our study, the scores on morphosyntactic accuracy (53%, a score lower than our scores in the same age groups) were much lower than the lexical accuracy ones (88%, a score comparable to our scores). Their average Guiraud scores (5.6) seem a bit lower than ours, but they used a different frog story.
A second study including monolingual TD and SLI children is the one performed by Verhoeven et al. (2011), who measured MLU on all usable utterances and grammatical accuracy, with criteria similar to ours. Their MLU returned a significant SLI effect (η² = .06), a significant age effect (two age groups: 7- and 9-year-olds; η² = .19) and no interaction effect. Although they computed the MLU on all usable utterances, their effect sizes were lower than ours; this perhaps demonstrates that it is preferable to restrict the MLU to the subset of longest utterances. Their grammatical accuracy scores give an SLI effect (η² = .38), an age effect (η² = .06), and no interaction. The scores (TD with an average of 80% grammatical accuracy and SLI with an average of 46% grammatical accuracy) are lower than we found, and the gap between SLI and TD children seems wider.
A third study on narratives is the longitudinal study by Zwitserlood et al. (2015), which included monolingual SLI and TD children as well, ranging in age from 6.5 to 8.5 years. The authors found an SLI effect (η² = .27), an age effect (η² = .16), and no interaction effect between age and SLI for MLU. They found particularly strong effects for grammatical accuracy: the SLI effect was η² = .72, the age effect η² = .34, and there was no interaction effect; the TD scores, with an average of 85%, are similar to ours, while the SLI scores, with an average of 48%, are much lower.
The positive outcome of all studies, including ours, is that the effects found are comparable. The effects all show that our measures, in particular morphosyntactic/grammatical accuracy, proved to be useful in diagnosing SLI children. On the other hand, there is an obvious overlap in scores between the TD and SLI groups, restricting the diagnostic value of our measures. At the same time the results of the four studies show substantial variation in effect sizes. Stronger effect sizes are crucial when it comes to powerful diagnostics. The differences in outcomes between the four studies may have multiple sources, e.g. the homogeneity of the groups of participants, the design type (cross-sectional vs. longitudinal), and the coding schemes applied to the complicated, rich narrative data. The data curated in the VALID Data Archive, accessible to all interested researchers, demonstrate the necessity to combine and accumulate data not only in their raw format but also through coding schemes and coded data, to increase the analytic power and to calibrate our research tools. That seems particularly relevant when the data are intricate, as in the case of narratives.
Finally, it is important to observe that precise details on data analysis are often lacking in published articles, including the ones we discussed. One often needs the detailed, original protocols to understand how specific decisions are being taken in defining utterances (or perhaps T-units) and subordination (directly relevant to the MLU). The same applies to the definition of words, morphemes and accuracy measures. These decisions have direct consequences for the outcomes and may obscure the (dis)similarities between different studies.

Conclusion
As frequently stated in the research literature on language development, narratives are an appropriate and attractive option for gathering rich data on the linguistic, cognitive, and social competencies of SLI children (Befi-Lopes, Bento and Perissinoto, 2008). Gathering, transcribing and coding narratives is laborious and time-consuming, however. Given their richness it seems self-evident to store such data sources in accessible, standardised formats. The CLARIN goals made the VALID Data Archive possible and we see this as a first step in establishing the availability of data sources to improve and widen the research perspectives on language and speech pathology (Rietveld et al., 2005).
We analysed the dataset on SLI children available in the VALID Data Archive and found several small infelicities that could be remedied; the Data Archive is now easier to use. Using and trying out the Data Archive is an important step not only in improving the database, but also in adding to the data new information that came out of the new analyses. It also shows how important it is to have other research data available. CLARIN also made available the data used in Treurniet (2011) and the Functional Elements in Specific Language Impairment (FESLI) data (Treurniet and Orgassa, 2011); it seems self-evident to link the FESLI corpus (12 bilingual children without SLI, 25 monolingual children with SLI, 20 bilingual children with SLI) more directly to the VALID Data Archive. The Verhoeven et al. (2011) and Zwitserlood et al. (2015) data are unfortunately not accessible. The main goal of the VALID enterprise is to include other databases in the archive. An archive is pivotal in evaluating the robustness of experimental outcomes in terms of reproducibility and replicability. VALID is a proper medium to guarantee the quality and comparability of datasets.
The results of the present investigation motivate more in-depth research on morphosyntactic and lexical variables in order to improve the diagnostics and treatment of SLI children. The effects for morphosyntax were strongest, but in absolute terms were still weak. Morphosyntactic variables are generally considered important indicators for identifying SLI (Dunn, Flax and Sliwinski, 1996). The availability of larger amounts of standardised and enriched data sources might sharpen our analyses by enabling us to focus more on correlational patterns (for example, the relation between morphosyntactic and lexical problems, a correlation we found in our analysis but which is not reported in the other Dutch studies) and on actual speech patterns (by applying, for instance, machine learning techniques to string data). CLARIN and its data formats offer the proper perspective on establishing the rich data sources we need and, hopefully, will motivate researchers to make their data available on the internet.

eye gaze to corresponding pictures (one out of two per trial) was recorded; lexical decision experiment: words (presented in combination with pictures) were correctly or incorrectly pronounced (phonemic errors); many speech elicitation experiments (various designs; digital audio recording; partial transcriptions); auditory grammaticality judgement task; all coded responses in Excel / SPSS formats; WISC digit span task; Snijders-Oomen nonverbal intelligence test; N-CDI's: standardised communicative development inventory, completed by participants' parents; Size: raw estimate of 60 GB.

A. How do SLI children perform with regard to morphosyntactic accuracy and complexity in a narrative in comparison to their typically developing peers?
H1: SLI children use morphosyntactically less complex language
H2: SLI children are morphosyntactically less accurate

B. How do SLI children perform with regard to lexical accuracy and complexity in a narrative in comparison to their typically developing peers?
H3: SLI children use lexically less complex language
H4: SLI children are lexically less accurate

Figure 14.2: Box plots of the mean length in words of the five morphosyntactically longest utterances (MLU5), by child (TD or SLI) and age group.

Figure 14.3: Box plots showing the percentage of morphosyntactically correct utterances by child (TD or SLI) and age group.

Figure 14.4: Box plots for Guiraud's index, split out by child (TD or SLI) and age group.

Figure 14.5: Box plots showing the percentage of lexically correct utterances, by child (TD or SLI) and age group.

(3) The Bilingual Deaf Children RU-Kentalis Database
Informants: 11 deaf children, longitudinal;
Characteristics: 5 boys and 6 girls; 3-6 years old; prelingual deafness (hearing loss of minimally 80 dB Fletcher Index on the best ear), no mental restrictions;
Aim of data collection: investigation of the bilingual language and communication development of young deaf children in Sign Language of the Netherlands (SLN) and Dutch (D);
Materials available: Tests: Nijmeegse Observatieschaal voor Kleuters (NOK; SLN & D), Reynell Test voor Taalbegrip (SLN & D), Dutch version Assessing British Sign Language Development (SLN): data processed and coded in SPSS; Spider Story (SLN & D): data processed and coded in SPSS; Semi-structured conversations with deaf and hearing adults, video recorded (SLN & D): five minutes of communication per recording have been selected and transcribed in a CHAT-like format (104 recordings);
Size: 4 GB complete video recordings; 1 GB selected parts video recordings; 0.1 GB selected parts transcripts; 0.5 GB test and background data.

(4) The ADHD and SLI Corpus UvA Database
Informants: 26 Dutch children with ADHD, 19 Dutch children with SLI, 22 Dutch children controls;
Characteristics: ages between 7 and 8 years; 80% male, 20% female; intelligence within normal ranges;
Aim of data collection: to compare the language and executive functioning profiles of children with ADHD to those of children with SLI and TD children;
Materials available: Tests: Sentence repetition task; Non-Word repetition task; Frog story narratives, processed in SPSS on morphological, syntactic and pragmatic measures; Children's Communicative Check-list II; CANTAB EF tasks for executive functioning;
Size: 4 GB (67 recordings).

(5) The Deaf Adults RU Database
Informants: 46 deaf Dutch adults, 38 hearing Turkish adults, 24 hearing Moroccan adults, 10 Dutch controls;
Characteristics: males: 22 deaf + 31 Turkish/Moroccan + 5 controls; females: 24 deaf + 31 Turkish/Moroccan + 5 controls;
Aim of data collection: investigation of the acquisition of Dutch by deaf Dutch adults (late L1/early L2) and comparison to hearing Turkish and Moroccan-Arabic L2-learners of Dutch (late L2) on morphosyntactic aspects;
Materials available: Test: standardised C-test Instaptoets Anderstalige Volwassenen (IAV); coded and processed in SPSS; Writing task The Frog Story: recorded and stored in ScriptLog (Holmquist), data coded and processed in Excel and SPSS;
Size: 2 GB.

Table 14.1: Overview of subjects: gender; age group (N, mean age, and SD).

the boy's coat pocket and causes a number of incidents. The plot is worked out in 30 pictures. Before starting the audio recording, the children were invited to leaf through the book. During the actual retelling the researcher, if necessary, asked open questions to motivate the child to continue the narrative. The narrative was recorded using a Sony MZ-NH 700 minidisk recorder. An external microphone was added to enhance the quality of the recordings.