Abstract

Recent estimates indicate that French is spoken as a first or additional language by over 220 million people. French is an official language in 29 countries and in many organizations such as the United Nations and the Red Cross. French is also, after English, the most widely taught language in educational systems around the world, with an estimated 120 million students and 500,000 teachers. It is hardly surprising then that there is a strong international demand for official certification of French competence and that a range of tests are on offer to meet this goal.
Among the recognized tests available for this purpose are the DELF (Diplôme d’études en langue française) and DALF (Diplôme approfondi de langue française). These are official qualifications awarded by the French Ministry of Education to certify the French competence of non-French citizens or of French citizens from non-francophone countries who have not completed a French secondary or higher education diploma.
There are six independent diplomas: three for children or adolescents (DELF Prim, DELF Junior and DELF Scolaire) and three for adults (DELF tout public, a general proficiency qualification for those over 16 years of age, DELF Pro, a work-related test for those seeking initial employment opportunities or promotion, and DALF for higher level candidates). Each test is oriented to the CEFR scale with DELF Prim pitched at the pre-A1–A2 levels for immigrants with limited literacy backgrounds, the other DELF tests spanning the A1 to B2 levels and the DALF assessing proficiency at the more advanced C1 and C2 levels. Each test battery covers the four skill components of Listening, Speaking, Reading and Writing.
DELF tout public B2
This review will confine itself to the DELF tout public at the B2 (Independent User) level, which is the minimum standard required for studying at most French-medium universities. Candidates can register for and take this exam (approved by French law in 1985) on specified dates at any of the 1186 officially accredited examination centres located both within and outside France. Although there is a wide variation in the size of the candidature across centres, the global demand for the DELF B2 has increased steadily in recent years from 18,244 in 2007 to 62,049 in 2016, making it the most popular of all the available diplomas in the DELF suite. The reason for this growth is unclear, but may have something to do with the reform of 2005, after which the qualification could be obtained by passing a single exam, rather than a series of unit diplomas (Vincent Folny, pers. comm.). Test data for 2016 show that the mean age of the candidature was 26 years, with a slightly higher ratio of females to males (60:40). Test takers came from 152 source countries, including those where French is an official language or a local lingua franca and others where it is a foreign language. In 2016, the largest candidatures were from Spain (12.6%), Italy (9.1%), Switzerland (5.3%) and Germany (5.2%).
Test description
DELF B2 is a two-and-a-half-hour pen-and-paper exam with a 30-minute Listening section followed by Reading and Writing sections, each lasting one hour. A one-on-one Speaking test of 20 minutes’ duration is scheduled separately, either before or after completion of the other three components.
The B2 Listening test (Compréhension de l’oral) comprises a range of forced-choice and open-ended short-answer questions in French testing comprehension of two to three quasi-authentic (unmodified) audio-recorded texts. Speakers’ accents may vary according to region (e.g., Marseillais or Parisian) or nation (e.g., Belgian and Moroccan), depending on the topic, with intelligibility established via pre-testing. Topics are of general interest and represent different text types. The first text is played once only and the second, more complex, text is played twice. Questions are previewed briefly by candidates before listening and time is given after the recording to complete responses.
The Reading component (Compréhension des écrits) assesses comprehension of two written texts. One of these texts is expository, dealing with a topic related to France or the French-speaking world and the other is more argumentative in nature. Question types vary from four-option multiple-choice items to open-ended questions, with some requiring extended responses of a line or more of text. Where true–false items are used, the candidate must support the choice of answer by citing from the relevant part of the text.
The Writing task (Production écrite) requires a personal response to a written prompt, which could take the form of a formal letter, a book or film review, or a debate. The length of the required response is 250 words. Performance is assessed analytically for linguistic and discourse features with roughly equal weighting for each. The linguistic criteria are Breadth of vocabulary, Appropriateness of vocabulary choice, Spelling, Grammatical control and Grammatical complexity, with a slightly greater weighting given to grammatical control than to the other linguistic elements. The discourse features assessed are Relevance of response (to task requirements), Sociolinguistic appropriateness, Clarity of content, Capacity to argue a position, and Coherence and cohesion.
For the Speaking test (Production orale) the candidate must “state and defend an opinion based on a short document designed to elicit a reaction”. Thirty minutes’ preparation time is allowed for the candidate to choose one of two stimulus text options of around 150 words each. The text opens up the ground for discussion on the chosen topic about which the candidate is required first to express his or her point of view in the form of a monologue and then to defend or elaborate in response to questions from the two examiners. As with Writing, speaking performance is assessed analytically with points awarded for Lexis, Morphosyntax and Phonology as well as for Topic handling, Linking of ideas and Ability to respond appropriately to interlocutor input. Again, as for Writing, a slightly greater weighting is given to morphosyntax than to the other linguistic features.
Sample papers, which can be used to acquaint test users with the nature of the tasks and test materials, are available on the CIEP website: www.ciep.fr/en/delf-tout-public/sample-papers.
Test delivery
In France, responsibility for test administration is assigned to the French Local Education authorities (rectorats). Outside France, this responsibility falls under the authority of the French Cultural Office or the relevant French embassy in the country concerned. Registration costs are set by the Department for Cooperation and Cultural Affairs (SCAC) of the French embassy and the National Commission, on the basis of local considerations such as administrative costs and, presumably, affordability for the test taker. In major Australian cities like Melbourne, where this author decided to take the test, the local branch of the Alliance Française charges the sum of AU$200 (around 140 euros), whereas in Madagascar the fees are set as low as 10 euros.
A range of accommodations is available for candidates with disabilities, such as two alternate test versions in braille and large-print versions of test papers for blind or visually impaired candidates, as well as the provision of headphones for those with hearing difficulties.
Scoring procedures
Listening and reading items are clerically scored by locally trained raters based on a pre-established marking key, with clear procedures in place at the level of the administering centre to resolve any uncertainties in relation to open-ended responses. Rater feedback is gathered routinely and used to refine marking keys as required.
Both Writing and Speaking are marked independently by two trained rater/interlocutors against the above-listed analytic criteria. Marking for the Speaking is undertaken during the examination. The final mark for both Speaking and Writing is generally the average of the total scores awarded independently by the two raters. When the discrepancy between these two raters exceeds five points, a third rater is called in to provide an additional set of ratings, and the final mark awarded to the candidate is then the average of the two closest ratings.
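The double-marking rule just described can be illustrated with a minimal sketch. It is not the CIEP's own procedure or software: the function names are hypothetical, and the way ties are broken when two pairs of ratings are equally close is an assumption, since the documentation available to this reviewer does not address that case.

```python
from itertools import combinations

# Gap between the two raters' totals above which a third rating is required
# (the review states "exceed five points").
DISCREPANCY_THRESHOLD = 5


def needs_third_rater(first, second, threshold=DISCREPANCY_THRESHOLD):
    """Return True when the two independent totals differ by more than the threshold."""
    return abs(first - second) > threshold


def final_mark(first, second, third=None, threshold=DISCREPANCY_THRESHOLD):
    """Combine rater totals for one component (Speaking or Writing).

    Hypothetical helper: averages the two ratings when they agree closely
    enough, otherwise averages the two closest of three ratings.
    """
    if not needs_third_rater(first, second, threshold):
        return (first + second) / 2
    if third is None:
        raise ValueError("discrepancy exceeds threshold: a third rating is required")
    # Assumption: on a tie, the first closest pair in the order
    # (first, second), (first, third), (second, third) is used.
    closest = min(combinations((first, second, third), 2),
                  key=lambda pair: abs(pair[0] - pair[1]))
    return sum(closest) / 2


if __name__ == "__main__":
    print(final_mark(14, 17))      # two raters within five points
    print(final_mark(10, 18, 16))  # third rater called in
```

In the second call the first two ratings differ by eight points, so the third rating is taken into account and the two closest ratings (18 and 16) are averaged.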
Quality control issues pertaining to rater behavior are delegated to local Embassies or Cultural Centres who use a spreadsheet provided by the CIEP to monitor reliability on a regular basis.
Reporting
The overall pass mark for DELF B2 and all other examinations in the DELF–DALF suite is set at 50 out of a total of 100 marks. A minimum score of five (out of 25) is also required for any single test component. Candidates are informed of their result (including the total and section scores and whether they have achieved the B2 level) within a month of taking the test, although the official certificate takes some months to be issued. A candidate reaching or exceeding the pass threshold on the DELF B2 is deemed to have the following: “a degree of independence that allows him/her to construct arguments, to defend his/her opinion, explain his/her viewpoint and negotiate. At this level, the candidate has a degree of fluency and spontaneity in regular interactions and is capable of correcting his/her own mistakes.” (www.ciep.fr/en/delf-tout-public/detailed-information-the-examinations)
The qualification, once achieved, is valid for life.
Candidates who fail to obtain the level are not able to appeal their results but can apply to view their examination papers and the marks obtained at the local test centre. There is no limit to the number of times that a candidate can sit the exam, unless they have already passed, in which case they can resit only if they apply to revoke the existing diploma in writing before the examination session. This latter action carries some risk because, if a candidate does not pass the diploma when reattempting it, the previously issued diploma will remain invalid.
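The pass rule set out in this section (an overall total of at least 50 out of 100, with no component below five out of 25) can be expressed as a short sketch. The function and constant names are hypothetical, and the assumption that each of the four components is marked out of 25 follows from the figures given above rather than from any published scoring specification.

```python
PASS_MARK = 50       # overall threshold out of 100
COMPONENT_MIN = 5    # minimum required on any single component
COMPONENT_MAX = 25   # assumed maximum per component (4 x 25 = 100)
COMPONENTS = ("listening", "reading", "writing", "speaking")


def delf_result(scores):
    """Return (total, passed) for a dict of component scores.

    Hypothetical helper illustrating the rule described in the review;
    it is not taken from CIEP documentation.
    """
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    for name in COMPONENTS:
        if not 0 <= scores[name] <= COMPONENT_MAX:
            raise ValueError(f"{name} score must lie between 0 and {COMPONENT_MAX}")
    total = sum(scores[name] for name in COMPONENTS)
    # Both conditions must hold: the overall threshold and the per-component floor.
    passed = total >= PASS_MARK and all(
        scores[name] >= COMPONENT_MIN for name in COMPONENTS
    )
    return total, passed


if __name__ == "__main__":
    print(delf_result({"listening": 4, "reading": 25, "writing": 25, "speaking": 25}))
```

The example shows why the component minimum matters: a candidate with a total of 79 would still fail because the Listening score falls below five.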
Evaluation
The DELF has been awarded the Association of Language Testers in Europe (ALTE) Q-mark, a quality indicator attesting to the fact that this exam has been audited and found to meet all 17 of ALTE’s quality standards (see the ALTE website, www.alte.org, for further information about these standards, which cover the five main areas of test construction, administration and logistics, marking and grading, test analysis and communication with stakeholders). Although the award of the ALTE Q-mark will be reassuring for European test users in particular, the lack of a publicly available technical manual or any published research which might attest to the quality of the DELF exam makes it difficult for language testing researchers and other users to make an informed independent evaluation of the exam. The commentary which follows is based on publicly available information on the website of the Centre international d’études pédagogiques (CIEP) (www.ciep.fr/en), the entity charged with development and maintenance of the DELF/DALF exams, and on internal documentation and commentary generously supplied to the reviewer by the testing agency and the local test administrator at the Alliance Française in Melbourne, Victoria. This author’s own experience as a recent test candidate is drawn upon to provide further insights where relevant (see Appendix for the score report).
Validity and reliability
According to the CIEP website the principles underpinning the design of the DELF are in accordance with international standards for effective language testing. There are some statements indicating that the theory of language underlying the test is compatible with the CEFR’s conceptualization of language users as social actors operating in personal, public, academic or professional spheres. The test construct is said to embody a view of language as an interactional resource and to encompass linguistic, sociolinguistic and pragmatic aspects of competence in the French language. Although the absence of any publicly available formal account of the test development and validation process makes it difficult to judge the extent to which this view of language has been captured in the actual test design, this reviewer’s own experience of preparing and sitting for the test suggests that it does indeed tap into a broad range of language resources, and involves contexts and topics that are both interesting and more cognitively and linguistically demanding than those normally broached in everyday interaction. Instructions are clear and item types are various and generally familiar, reducing the likelihood of method effects on test outcomes.
Documentation supplied to this reviewer by the CIEP also indicates a high level of professionalism in test development practices, including due attention to the matching of test items to CEFR levels at the design stage, pre-testing, statistical monitoring of item consistency, discrimination and item bias (by age, gender and nationality), using classical and Rasch analyses. Test equating has been undertaken routinely since 2011 when a large-scale study involving 10 different forms and 4300 candidates from 15 different countries was used to compare difficulty and locate anchor items as a basis for building new comparable forms. Efforts are also made to ensure the stability of pass/fail decisions for each version by boosting the weightings of highly discriminating items around the cut-score. All raters for the DELF undergo training and have to pass a test in order to be accredited. Accreditation is given for five years.
One of the documents provided to this reviewer was a report of a study comparing DELF and DALF exam scores with self-assessments of proficiency and with levels achieved on the Test de connaissance du français (TCF), a shorter French placement test (also developed at the CIEP), which spans all levels of the CEFR. The study drew on a sample of 236 students who took the TCF and one or other of the DELF/DALF exams along with a self-assessment questionnaire. A comparison of test scores showed the TCF level to be a good predictor of candidates’ likelihood of success on the corresponding level of the DELF–DALF exam (i.e., a minimum of 82% of students classified at each TCF level received 50% or more on the respective DELF–DALF exam). Although this finding offers some support to the progression of difficulty implied in the different DELF examination levels (A1–C2), the sample size is rather small. Vertical test equating procedures would provide more robust evidence for the DELF-level classification.
The report also vouches for the close alignment between self-assessments of proficiency and actual levels of performance on the DELF, with the exception of the oral component where candidates are inclined to overrate their speaking ability. Concluding that the candidates’ estimates are either on or off the mark of course rests on the assumption that the exam results (including the arbitrary 50% pass mark) are themselves a valid indicator of attainment of the respective CEFR levels as conceptualized by the designers of the CEFR (or as embodied in other well-known proficiency tests). Whether such an assumption is defensible is difficult to determine in the absence of published information about any formal processes whereby the DELF–DALF tests have been aligned to the CEFR or to other tests claiming links to the CEFR. Clearly, this is an area in which further research and/or documentation is needed.
Utility
Under this heading I deal with issues relating to test preparation, test delivery and reporting, mainly from a personal perspective as a recent DELF B2 test taker.
The CIEP website contains a great deal of useful information including an online placement test to verify the choice of exam level, an overview of the test structure, a set of practice tests, a set of FAQs with helpful responses and links to published test preparation materials, many of them authored by CIEP employees. These materials and two CIEP-sanctioned texts borrowed from the Alliance Française library were particularly helpful to this reviewer in preparing for the exam, especially after receiving information that an intensive test preparation course I had planned to attend was to be cancelled owing to lack of demand.
The score report, emailed to me as a pdf attachment within a month of taking the test, provided numerical information only: the four sub-skill scores and the overall total (see Appendix). There is no indication on this report of what these particular scores mean or any link to the relevant CEFR profile which might have been helpful for users not already familiar with CEFR. Had I not succeeded in meeting the B2 level requirement I might also have welcomed advice about areas and avenues for improvement in the particular skill areas needing attention.
Although I was pleased to learn that I had attained the B2 diploma, I wondered about its lifelong validity, a status which holds for all official diplomas in France. This makes sense where the qualification is viewed merely as a marker of learning achievement. However, given that (a) a DELF score can also serve as evidence of readiness for university entry and for other important professional functions, and (b) attrition may be an issue for those like myself who do not have occasion to use French on a regular basis, one would hope that institutional users are made aware of the potential limits to the durability of the qualification and of the need, where the stakes are high, for candidates to provide recent evidence of their level. That said, there is no strong basis in research for the alternative convention adopted by many language testing agencies of a two-year shelf life for test results. Given individual variability in attrition rates, it is difficult to provide hard and fast advice on this issue.
Fairness/integrity
Here I consider a number of factors that might raise fairness concerns among test candidates or result in the integrity of the DELF scores being called into question. Again these are considered largely from a personal perspective based on my experience of a particular exam session.
A potential fairness limitation in relation to the particular Listening version I encountered was that of the sound quality of one of the listening texts, which, while audible, was less than optimum. This is likely to have been a site-specific delivery issue caused by poor-quality equipment, a situation I might have queried had the stakes for me been higher. As noted above, the responsibility for test administration is with the local Cultural Office or French Embassy, and therefore not subject to direct scrutiny by the CIEP.
Another issue that might have worried me if using the DELF B2 result for academic or professional advancement was the absence of any avenue for appeal should my performance have been rated as unsatisfactory. Although unsuccessful candidates have the right to request a review of their examination papers, this is for feedback purposes only, as the examiners’ decision is considered final. Since responsibility for marking and monitoring of rater quality outside France is assigned to local jurisdictions, central verification of results by the CIEP appears not to be a legal option.
I was also surprised, given my knowledge regarding the lengths that other major testing agencies go to in order to avoid security breaches, that there was no verification of my identity when presenting for the one-on-one oral exam. Could I have sent a proxy to speak on my behalf? It would seem so. Proof of identity was requested in the subsequent written exam but a different invigilator was involved. My result could therefore quite easily have reflected the efforts of two different people. Moreover, halfway through the written exam session one of the candidates asked if she could take a small break. She was allowed to leave the room, apparently without an escort, and returned some minutes later. Had she been consulting her dictionary?
These may be simply benign site-specific lapses in an environment where the candidature is very small (only 49 candidates took the DELF B2 in Melbourne in 2016) and where, according to the local DELF test administrator, 60% of test takers in recent years have taken the test for reasons of personal satisfaction as a mark of achievement in their language study trajectory. Nevertheless, given that the remaining 40% may have taken the test as a requirement for high-stakes purposes such as study or work in a French-speaking environment, the tightening of test security is clearly an issue that needs further attention. While the CIEP runs a two-day annual briefing meeting with cultural services in Paris and conducts regular test centre audits around the world, guaranteeing security across more than 1000 test locations in different jurisdictions is a daunting task. It is also a strain on resources, given the relatively small size of the candidature compared with that of large-scale English proficiency tests, a constraint we should be mindful of in relation to many tests in languages other than English.
Concluding remarks
The DELF B2 is a well-designed, professionally developed exam that is communicative in its orientation. It fulfils the important dual functions of certifying achievement on French language courses and at the same time measuring proficiency or readiness to use the language in real-world contexts such as academia. The latter use comes with particular challenges. The fact that important decisions may hinge on DELF B2 results places the onus on DELF administering authorities to closely monitor test security and to ensure fairness and transparency for all test users.
An extract from the ILTA guidelines for practice (www.iltaonline.com/index.php/enUS/component/content/article?id=122) is cited below as a reminder to all language testing agencies of what might be expected with regard to the obligations of those preparing and administering publicly available tests:
1. Make a clear statement as to what groups the test is appropriate for and for which groups it is not appropriate.
2. Make a clear statement of the construct the test is designed to measure in terms a layperson can understand.
3. Publish validity and reliability estimates and bias reports for the test along with sufficient explanation to allow potential test takers and test users to decide if the test is suitable in their situation.
4. Report the results in a form that will allow test users to draw the correct inferences from them.
5. Refrain from making any false or misleading claims about the test.
6. Publish a handbook for test takers which:
6.1. Explains the relevant measurement concepts so that they can be understood by non-specialists.
6.2. Reports evidence of the reliability and validity of the test for the purpose for which it was designed.
6.3. Describes the scoring procedure and, if multiple forms exist, the steps taken to ensure consistency of results across forms.
6.4. Explains the proper interpretation of test results and any limitation on their accuracy.
Clearly, many resources have been poured into the development of the French certification system and the current DELF B2, having satisfied the auditing requirements for an ALTE Q rating, can be considered sound in quality. Most of the above guidelines appear to have been either partially or fully followed by the CIEP, but item 3 and item 6 deserve greater attention. Reservations expressed in this review with regard to these two areas could be easily allayed by making internally compiled information more readily accessible for test users. Where information is lacking this could be remedied through a program of ongoing validation research, undertaken both internally and in collaboration with respected members of the international language testing community.
Footnotes
Appendix
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
