Abstract
This article reviews the approaches taken by the courts to the admissibility of voice identification evidence in four jurisdictions: England and Wales, Scotland, Northern Ireland and the Republic of Ireland. Each jurisdiction addresses the question in a different way and each approach is open to criticism. This paper will argue that a contextualised approach to the problem would allow for improvements which would enhance the quality of the evidence and the adjudicative process.
Introduction
The history of the law of evidence is the history of a series of largely isolated responses to particular problems at different times. (Twining, 1985: 1)
In this paper I will argue that an analysis of the approach taken in four similar common law jurisdictions to a particular facet of identification evidence, namely the identification of speakers by recognition of their voice, reveals contradictory and inconsistent legal approaches to its collection, presentation, admissibility and assessment. I suggest that the central cause for this is the development of rules via the expository tradition of evidence scholarship. The problems of this approach were recognised by Twining in his 1983 essay ‘Identification and misidentification in legal processes: Redefining the problem.’ 1 Part of the problem identified by Twining is that the legal system focuses primarily on the decisions of appeal courts in individual cases, and in doing so fails to address the systemic problems which may cause injustice elsewhere. I will argue that a contextual review of voice identification evidence which draws on the insights of other disciplines is called for. This would enable the development and refinement of procedures for the collection and presentation of evidence which maximise the chances of accuracy in the decision making process.
The knowledge that miscarriages of justice have arisen from mistaken identifications being made by honest, but unreliable, witnesses has played a major role in shaping the rules of evidence in common law legal systems (see, for example, Devlin, 1976). Where a witness purports to identify an individual through recognition of their voice rather than their physical appearance, the risk of an error occurring is recognised as being even greater than in cases of eyewitness identification (a type of evidence that has always been regarded as particularly prone to error by the common law). Each criminal justice system develops its own processes to minimise the risk of a wrongful conviction through a combination of testing a witness’s ability to distinguish voices, supporting the identification with expert evidence, assessing the cogency of the identification before admitting it into evidence and giving appropriate directions to the jury. As this paper will show, in the case of earwitness evidence, principles are developed, primarily by the courts, on a case-by-case basis. By examining the approaches of four similar jurisdictions I will demonstrate that this leads to inconsistency of approach in an area where similarity would be expected.
This paper examines the approaches taken in England and Wales, Scotland, Northern Ireland, and the Republic of Ireland, and assesses their strengths and weaknesses when reviewed against the current research on aural recognition and memory. The four jurisdictions chosen have many features in common, both in terms of their structures and procedures and in terms of the populations who are served by them. All four are common law legal systems in which the prosecution must prove their case against an accused beyond a reasonable doubt. All require the most serious cases to be tried before a jury, encouraging judges to exclude evidence which may be unduly prejudicial and requiring them to caution juries about the inherent risk of misidentification. Geographically the four legal systems are adjacent to each other (the first three comprising the United Kingdom). Although there are a large number of native languages and dialects, English is spoken, either as a first language or fluently, by a large proportion of the population (Central Statistics Office, 2011; Office for National Statistics, 2011) and although Welsh speakers in Wales 2 and Irish speakers in the Republic of Ireland 3 have the right to have court proceedings conducted in those languages, English is the primary language of the court system.
Given these similarities it might therefore be expected that there would be a consistency of approach in the collection and presentation of voice recognition evidence. The problem which the courts are trying to address in each jurisdiction is the same: How does the legal process ensure that fact-finders are properly able to assess the complex process by which a witness hears, memorises, compares and then recalls the voice of another? This paper will demonstrate that although the procedures adopted by each legal system seek to achieve the same result, in certain respects—particularly in respect of identification procedures adopted in the investigation phase—there are differences which create a disparity of approach and may produce a disparity of outcome.
The problems which I identify are by no means unique to this cluster of jurisdictions. Courts in Australia (McGorrery and McMahon, 2017: 1), Canada (Sherrin, 2015) and the USA (Tiersma, 2003) have all been criticised for their ad hoc approaches to voice identification, which have evolved out of the eyewitness literature but which pay only cursory attention to research from other disciplines (particularly psychology). The problems I identify are found throughout common law jurisdictions.
Twining identified the need for a new approach to how identification evidence is conceptualised and dealt with as long ago as 1983. Despite this, the difference in practice across these four jurisdictions indicates that little progress has been made and points to the need for a contextualised approach to understanding the challenges in this area. I will discuss the areas where research in disciplines other than law could inform practice and suggest what the impact of such research could be.
The problem with voice identification
The circumstances in which voice identification can provide crucial evidence in a criminal trial are infinitely variable. A witness may claim to recognise someone they have been in the presence of but not seen as a result of the perpetrator wearing a mask, the witness being blindfolded or the perpetrator being otherwise ‘out of sight’. Offences may be committed over the telephone or law enforcement officers may conduct audio surveillance which only captures voices. One of the earliest examples of a voice identification being contested was in the trial of Captain William Hulet, tried and convicted of being the executioner of King Charles I, but whose death sentence was commuted to life as the trial judge believed the jury’s guilty verdict to be incorrect (Cobbett, 1810: 1185).
Identification evidence is an inherently unreliable mechanism upon which to base a finding of fact. Where the sole, or major, evidence in a case is a witness who purports to put a defendant at a particular location which that defendant disputes being at, a conviction or acquittal will often be determined by an assessment of the accuracy of the identification. If a witness places a defendant at the scene of the crime and appears confident of his or her identification then this can be compelling evidence, especially when a defendant is unable to corroborate his or her alibi. There is, however, no correlation between confidence and accuracy (Wells et al., 1979) and a prosecution witness whose evidence is credible and persuasive may, nonetheless, be wholly wrong on the only issue to which his or her testimony is directed. There may well be very little that an advocate challenging such evidence can do to effectively contradict it. The general principles which can render eyewitness evidence unreliable are well documented and recognised in the legal system. The English case of Turnbull 4 is recognised in a number of jurisdictions as providing a framework within which eyewitness evidence should be analysed. Whilst the principles for how eyewitness evidence should be collected and treated in court are well established, earwitness evidence raises some specific issues which are discussed below.
The recording of an initial description
It is recognised that the initial description of a suspect provided by a witness can be of critical importance in any subsequent criminal trial, especially when at the time of making the observation the witness did not know the suspect (Devlin, 1976: para. 5.8). A description which is recorded proximately in time to the observation provides a contemporaneous account against which subsequent claims of identification can be assessed (Devlin, 1976: para. 5.12). In many cases the first complaint will be made to a police officer who will record the entry in a notebook. At a later date, the witness will be asked to provide an evidential statement which will also contain a description. This practice enables the advocates and fact-finders to identify any inconsistencies and protects against revisions to the identification being caused by subsequent exposure to the suspect prior to a trial.
When an eyewitness gives a physical description of a suspect, the features which a witness can be expected to describe are within the normal experience of both the witness and the investigator, as giving physical descriptions of individuals is a routine activity in society. This permits the detailed recording of a preliminary description. Police officers in England and Wales are trained to use the mnemonic ‘ADVOKATE’ when recording an initial physical description from an eyewitness (College of Policing, 2018). The mnemonic ‘ADVOKATE’ derives from the criteria found in Turnbull 5 and officers recording descriptions are required to address each of the headings with the witness. This aims to ensure thorough collection of evidence and to enable those who need to review the evidence to assess it against the framework which will ultimately be used at trial.
In cases where a witness is being asked to describe a voice they have heard, there are fewer points of reference which are within the normal descriptive powers of a lay witness and the vocabulary used to describe the voice may be broader (Yarmey, 1995). This reduces the number of points of comparison against which the original description and the suspect’s voice can be compared and, in doing so, reduces the ability of the fact-finder to discern points of dissimilarity. There is no equivalent to the mnemonic ‘ADVOKATE’ for recording descriptions of voice.
Identification procedures
It is recognised that in cases of eyewitness identification, identity parades are of significant forensic value in demonstrating that a witness can identify a suspect in controlled conditions (Devlin, 1976: para. 8.7). This is the case either when the suspect is unknown to the witness (in which case the parade can provide evidence which can inculpate or exculpate a suspect from a crime) or where the suspect disputes the identification of a witness who purports to recognise him or her (in which case the parade serves to corroborate the witness’s ability to recognise the defendant). Where a witness fails to identify a suspect, this is often a point of significance in the presentation of the defence case. Two issues are of significance in assessing how legal systems regulate identification parades: the first is the manner in which the parades are conducted, the second is the mechanisms by which investigating authorities are compelled to undertake them and the consequences to the prosecution of them not doing so. Whilst the procedures for visual identification are well established within each of the jurisdictions concerned, the procedures for conducting voice parades are not. As a result of this, in cases involving ‘earwitnesses’ the investigation and trial processes are deprived of a tool which can help to inculpate or exculpate suspects. An eyewitness parade can capture the physical appearance of the suspect and quickly identify similar ‘foils’ amongst which the suspect’s image can be hidden. In the case of earwitnesses this is more challenging (Hollien, 1996). A number of factors contribute to this: there is perceived to be a greater risk of plasticity in the human vocal range (making disguise more likely) (Clifford, 1980), the content of the speech itself may, if not carefully selected, provide cues to the identity of the speaker, and finding suitable foils which offer a fair chance to make a selection is difficult (Hollien, 1996).
The level of complexity in producing and conducting a procedure is likely to impact upon the approach a court takes in assessing how failure to conduct such a procedure should be addressed (Robson, 2017). In cases where there is a widely used process for conducting identification procedures, underpinned by a Code of Practice, the failure of the investigating authorities to conduct such a procedure can be a matter for legitimate criticism by the defence and judge (Devlin, 1976: para. 5.88). In cases where there is less clarity as to how and when such procedures should be used, it becomes more difficult for the defence to mount such a challenge.
Methods of expert comparison
In cases where a recording of the voice exists, expert comparisons can be admitted to either prove or disprove the contention that the suspect is the speaker. There are two methods of comparison used by expert witnesses to provide evidence linking a disputed recording of a suspect’s voice with a known sample of a suspect’s voice—auditory analysis and acoustic analysis. Auditory analysis involves an expert (usually in phonetics) listening to the disputed sample and the known sample and recording similarities and dissimilarities. Acoustic analysis involves the analysis of a spectrograph, produced electronically, which provides a graphical representation of the voice which can be compared with another. A full critique of the strengths and weaknesses of these methods can be found in Ormerod (2002). However, in broad terms the safety of auditory analysis is determined by the methodology of the expert practitioner and no major study has assessed error rates. Acoustic analysis proceeds on the assumption that each speaker has an individual speech pattern. However, as Ormerod notes, this assumption may be a fallacy, with factors such as deliberate disguise and the quality of the recording creating results which may implicate the innocent or exculpate the guilty (2002: 780).
Suggestibility
Lay listeners are more likely to be influenced by external factors than eyewitnesses (Yarmey, 1995). Where lay listeners are asked to hear a voice and assess who it belongs to, the presence of an individual in a police station or as the defendant in a criminal trial is likely to influence the listener into assuming that the voice they hear is the voice which is in dispute. In some cases witnesses (especially police officers) have been asked to identify witnesses present in an interview room. 6 In other cases, juries may be asked to assess whether the voice of a defendant who has given evidence matches a contested voice played to them. 7 In each of these examples, a certain identification may come, not from an accurate comparison, but because of a broad similarity coupled with the bias caused by the listener’s knowledge that the suspect has been detained.
The above list is by no means exhaustive but it indicates some of the areas in which the risks of errors in the assessment of voice identification evidence are greater than with eyewitness evidence. This justifies the problem of voice identification being treated as an issue in its own right, rather than as a variant of eyewitness identification. Each of the jurisdictions discussed recognises this and to a greater or lesser extent has modified its procedures accordingly but in different ways.
England and Wales has (relatively speaking) the most extensive set of rules and procedures and so this analysis begins with that jurisdiction.
England and Wales
Voice identification is admissible in England and Wales and a conviction can be based solely upon such evidence. The approach taken to it has developed within the framework of rules for eyewitness identification. England and Wales has a well-established procedure for eyewitness parades under Code of Practice D to the Police and Criminal Evidence Act 1984 (‘PACE’). 8 Officers are required to record a description of the suspect ‘as first given by the eye-witness’. 9 The Code contains considerable guidance to investigating officers on how identification procedures should be conducted to ensure their probity. In their early incarnations, the Codes contained a provision that witnesses observing an identity parade could ask for a person on a line-up to speak. 10 In 2005 (in part as a result of a move away from standard physical line-ups), this provision was removed. Voice parades are now addressed in a paragraph which states: While this Code concentrates on visual identification procedures, it does not preclude the police making use of aural identification procedures such as a ‘voice identification parade’, where they judge that appropriate. 11
Whether or not a failure to conduct a voice parade amounts to a breach of PACE which would potentially make the evidence liable to exclusion under s. 78 of the Act or liable to adverse comment in summing up is currently unclear. In Gummerson (a case decided on the 1995 iteration of Code D) the Court of Appeal ruled that the failure to conduct a voice parade did not amount to such a breach, stating that the fact that Code D made no mention of voice parades meant that it was not intended to cover this situation. 14 Two developments since this case cast doubt on the current validity of this proposition. The first is the House of Lords ruling on identity parades in the case of Forbes, in which it was held that the purpose of Code D was to avoid individual officers exercising their discretion in different ways. 15 The second is the revised wording of Code D, which since 2005 has explicitly incorporated voice parades. Although the language in relation to voice parades in the Code is less robust than it is for eyewitness identification parades, it is arguable that it at least requires investigating officers to be able to conduct the procedure if they consider it necessary. A further attempt was made to address the point in the Court of Appeal in Suleman. However, leave to appeal on this ground was rejected by the single judge, although primarily on the basis that it was not a case of voice identification by a lay witness. 16
The approach of the courts to identification evidence generally is found in the case of Turnbull, which recognises the inherent risks in a conviction based solely on identification evidence. 17 Turnbull requires a trial judge to remove a case from the consideration of the jury if the identification rests on a ‘fleeting glimpse’ or an otherwise impeded observation and no other corroboration exists. This is one of the few remaining examples in English law where a judge is permitted to identify corroborative evidence before allowing a case to proceed to the jury, as the trend in the Law of Evidence has moved away from formal corroboration requirements and towards discretionary judicial directions to the jury. Although Turnbull focuses on eyewitness evidence, in Hersey the Court of Appeal held that the principles within it applied equally to aural identification and in such cases trial judges should give a suitably adapted direction. 18 This direction should warn of the risks of mistaken identification, caution that an honest witness may be a mistaken witness and remind the jury of the strengths and weaknesses of the identification. No further guidance was given in this case as to those factors which might amount to ‘strengths’ and ‘weaknesses’ (Judicial College, 2017: 15.7 para. 2).
The most recent edition of the Bench Book provides more detailed guidance for judges as to the factors which are specific to voice identifications and which are not present in the standard and more familiar Turnbull warning. These include:
Audibility of speech heard
Environmental factors affecting hearing of speech
Duration for which speech heard
Number of voices heard
Whether it was heard directly or by phone
Whether there was an identified attempt to disguise the voice
Hearer’s hearing disability or other impediment (if any)
Variety of speech heard
Degree of familiarity with speaker
Distinctiveness or accent of speaker
Whether speaker spoke in native tongue
Lapse of time between hearing and identification process (Judicial College, 2017: 15.7 para. 4).
The most detailed consideration of how the criminal courts should approach the admissibility of voice identification is found in the case of Flynn and St John. 19 This case concerned police officers, who had arrested and interviewed the two defendants in connection with a conspiracy to commit robbery, subsequently identifying their voices from covert tape recordings of a van used in the robbery. The Court of Appeal allowed the defendants’ appeal, concluding that the evidence should be excluded due to the poor quality of the recordings and the failure to keep any adequate record of the circumstances in which the tapes were listened to. In discussing voice identification generally, Gage LJ addressed the issues surrounding two types of voice identification: expert comparison conducted using acoustic or auditory analysis, and lay listener identification derived from witnesses who had a special knowledge either acquired from a close relationship, frequency of contact or specialist knowledge acquired through study of a known sample of a suspect’s voice. In respect of the latter, the court observed that: in our opinion, the key to admissibility is the degree of familiarity of the witness with the suspect’s voice. Even then the dangers of a mis-identification remain; the more so where the recording of the voice to be identified is poor. 20
Whilst in each of these cases the identification may well have been correct, it appears that in many cases the court at first instance and on appeal has not been focused on using a criteria-based approach with a view to assessing the cogency of the voice identification evidence in isolation, but has instead looked at the remainder of the evidence and, if there is other evidence, allowed the jury to make this decision. This is an unsatisfactory approach. It is accepted that there is no correlation between witness confidence and accuracy and that a confident witness may be a persuasive witness. Voice recognition evidence which lacks forensically objective criteria making it safe is, if presented by a confident witness, more prejudicial than probative and this defect cannot be cured by a direction to the jury. An example of the dangers inherent in this approach can be found in Dwain George. 24 The appellant was convicted of murder in 2001. Part of the case against him came from a witness who heard the murderer say, ‘You’re dead now.’ The witness had briefly attended school with the appellant three years prior to the event, but had never spoken to him. It was argued at trial that the judge should direct the jury to disregard this identification but this was rejected in favour of giving the jury a robust warning about the risk of misidentification. The Court of Appeal approved this approach and upheld the conviction, noting the corroboration which came from the presence of gunshot residue on clothing. 25 Subsequent doubts were cast on the validity of the gunshot residue and the case was referred back to the Court of Appeal by the Criminal Cases Review Commission.
On a second appeal, the Court of Appeal accepted that the gunshot residue evidence was unreliable, and whilst not departing from the approach taken in the previous appeal in respect of the voice identification, they accepted that the admission of doubtful residue evidence had potentially bolstered a weak identification. Undoubtedly, when looked at in isolation, the evidence of the witness who claimed to recognise the appellant lacked any form of cogency and should have been removed from the consideration of the jury.
The authorities suggest that when confronted with a voice identification the courts are unwilling to consider its admissibility in isolation, preferring instead to assess the sufficiency of the evidence against the accused as a whole. This is perhaps understandable. Under s. 78 of PACE, applications to exclude evidence must be made before the evidence is admitted. 26 The points which would support such an application are, however, likely to be exposed only during cross-examination of the observer at trial, leaving the advocate with no choice but to rely on a submission of no case to answer as a means of establishing that the evidence is insufficiently probative. This requires an assessment of the prosecution case as a whole and, where corroboration exists, it means the question of identification becomes one of weight rather than admissibility. 27
In respect of expert comparisons, the Court of Appeal has accepted the admissibility of both acoustic and auditory evidence. In Robb it was agreed by the appellant that voice identification was a suitable field for an expert to give evidence upon. The conviction was challenged upon the basis that an auditory analysis (conducted by an expert in phonetics listening to a recording of the kidnapper and a recording of the defendant) required verifying with an acoustic analysis (requiring an analysis of electronic spectrographs of the tapes). Whilst the court accepted that the approach of the Crown’s expert represented ‘a minority view in his profession’, it accepted that his expertise was such that it was permissible to rely on him. 28 In Flynn and St John, the court accepted that the most reliable means of establishing identity was a combination of acoustic and sophisticated auditory techniques. It rejected the proposition, derived from O’Doherty (discussed later, in relation to Northern Ireland), that auditory analysis on its own was insufficient to found a prosecution.
Of the four jurisdictions under discussion, the English system offers the clearest support for the use of some form of a voice parade which offers an objectively controlled and scientifically tested system of pre-charge testing of the witness. The extent to which a defendant can rely on a police failure to conduct such a procedure to his or her advantage at trial remains open to question. This lack of clarity has enabled some police forces to make policy decisions to discount voice parades as an investigative technique.
Although judges are given guidance on the factors to take into account in directing a jury, the courts have been reluctant to engage in pre-trial assessments of admissibility, preferring a retrospective review in the light of all the evidence. Although Flynn and St John attempted to set a high threshold for the admissibility of such evidence, subsequent decisions in the Court of Appeal have shown an unwillingness to adopt this standard in cases where other evidence appears compelling.
Scotland
The Scottish Law of Evidence is underpinned by a principle of corroboration requiring two pieces of evidence to be put before the court to establish a prima facie case (Alison, 1833: 551). In cases of eyewitness identification the threshold for corroboration is a low one (the position being expressed by General Emslie LJ in Ralston, ‘…where one starts with a positive identification by one witness, then very little else is required’). 29 There has been little discussion of the position in respect of voice recognition. Theoretically, therefore, a defendant cannot be convicted solely on the basis of a witness’s identification of their voice. In Burrows the High Court of Justiciary allowed an appeal where one witness heard, but did not see, an attacker whilst a second witness recognised the attacker but did not hear him speak. 30 The court concluded the aural evidence was too thin to corroborate the eyewitness evidence. Evidence of voice recognition conducted over the telephone is competent evidence although it may be subject to challenge. 31
There are substantial differences between the approaches to identification evidence generally in Scotland and in England and Wales. Scottish law permits (and indeed encourages) witnesses to make dock identifications and there is no requirement to conduct an identification parade (although failure to do so may result in a direction to the jury to treat any subsequent identification with caution). It is permissible to hold a dock identification where no parade has been conducted (Judiciary Studies Committee, 2012: ch. 16). Identification procedures in Scottish law are not governed by PACE but instead by the Lord Advocate’s Guidelines on the Conduct of Visual Identification Procedures (Crown Office and Procurator Fiscal Service, 2007). Notwithstanding the title of these guidelines, provision is made, at Annex G, for conducting voice parades. These are conducted in the manner of live parades, with the suspect being placed amongst volunteers chosen for ‘voice and accent similarity’. No definition is given as to how this is gauged, although the suspect and his legal representative have a right to hear the speakers prior to the parade. Each member of the parade is asked to speak and the witness is asked if they recognise any voice in particular. If they do not, they are asked if any voice sounds ‘similar’ (Crown Office and Procurator Fiscal Service, 2007: Appendix G). Although this procedure is prescribed for identifications made primarily by voice, the provisions for standard parade line-ups based on visual identification explicitly permit the foils to be asked to speak (indeed in choosing the appropriate type of parade the investigating officer should take into account the possibility of a witness requesting this).
This means that line-ups which have been assembled for the purposes of a visual recognition may become, at the request of the witness, a test of the aural recognition without any safeguards to ensure the similarity of the foils having been put in place in the creation of the parade.
Whilst these rules make the tests easier to administer than voice parades of the type conducted in England and Wales and may facilitate their use more frequently, they pose a number of substantial dangers which greatly outweigh any benefit they may produce. Live voice parades have long been recognised as being less reliable than recorded parades and guidelines in other jurisdictions explicitly prohibit their use. 32 It is therefore surprising that not only is their use tolerated, but that it is mandated. ‘Similarity’ is not defined for the purposes of the assessment of foil suitability. Almost by definition, foils will be selected by age and gender to match the physical appearance of the suspect, which may produce misleading points of visual similarity. There is no requirement that any expert plays a role in ascertaining the similarity and no standardised method exists for describing the voice of either the suspect or the foils. Similarity is not an objective concept; a witness whose accent is from a similar region to a suspect may be able to give a more nuanced analysis than a witness identifying a suspect with a foreign accent (see for example Atkinson, 2015). The presence of visual cues to appearance (which may have been absent in the original identification) risks irrelevancies tainting the cognitive process, as does the absence of factors which may have impeded the original identification (such as the voice being heard through a mask or over a telephone).
The Scottish rules permit a holistic identification based on a combination of appearance and voice. In Farmer v HM Advocate, victims of a masked robber were permitted to view a parade where the participants were masked and asked the members of the parade to speak. Both witnesses indicated that they were identifying the suspect primarily on stature and the appearance of his eyes and, in the case of one witness, on corroboration from his voice. The Appeal Court deemed this sufficient evidence to go before the jury and declined to interfere with the verdict, making it clear that if a positive identification was made, its assessment was a matter for the jury alone. 33 This appears to indicate an unwillingness by the Appeal Courts to entertain an argument that identification could be excluded by the trial judge if the quality of it was objectively too poor.
In charging the jury, judges are required to remind jurors of the risks inherent in identification evidence and of the cautionary approach to be taken in assessing identification evidence, and the form of the direction mirrors Turnbull. Although the Jury Manual reminds judges of the competence of voice identification evidence, it gives no additional guidance to those factors which are peculiar to voice identification and no specimen direction (Judiciary Studies Committee, 2012).
The approach taken by the justice system in Scotland to the question of voice identification does little to provide assurance that evidence is properly tested at a pre-trial stage or that juries are made aware of the specific risks of a witness purporting to recognise a speaker’s voice. Mistaken voice identification caused by a flawed identification procedure has played a part in at least one accepted miscarriage of justice—that of Patrick Meehan, who was incorrectly identified as being the masked assailant involved in a murder and robbery in 1969 (Kennedy, 2003: 118). Whilst this misidentification was almost certainly facilitated through deliberate prejudicing of the witnesses, it remains a reminder of the serious consequences to an individual of inadequate pre-charge safeguards (Kennedy, 2003: 118). The requirement of corroboration provides little certainty that evidence will be properly assessed. As Davidson and Ferguson argued (in discussing the mooted abolition of the corroboration rule in respect of eyewitness identification evidence): It may therefore be that legislators should be concentrating on improving the quality of identification evidence through such devices as the guidelines provided for the authorities under Code D of the Police and Criminal Evidence Act 1984 in England, and that corroboration in this context is more of a red herring. (Davidson and Ferguson, 2014: 18)
Northern Ireland
The Bench Book for Northern Ireland draws heavily on the English authorities discussed above and, on the basis of Gummerson, makes the assertion that voice evidence does not need an identification procedure to render it admissible (NI Judiciary, 2015: 107). Judges are directed to provide a suitably adapted Turnbull direction although no guidance is provided as to what adaptation is necessary. The directions refer to the Court of Appeal judgment in Mullan, although this judgment provides no further assistance as to what direction was given by the trial judge other than to observe that he dealt with it ‘admirably’. 34
Identification procedures in Northern Ireland are governed by Code D of the Police and Criminal Evidence (Northern Ireland) Order 1989 with the language of paragraph 1.2 of the 2015 iteration of the code in respect of voice procedures replicating the language of Code D of PACE in England and Wales. The arguments about whether failure to conduct a procedure amounts to a breach of PACE are likely to follow the same reasoning as in England and Wales and the statement that an identification parade is not necessary needs to be treated with caution.
There is only one appeal case in which a substantial discussion regarding voice identification has taken place, but it is a significant one for its discussion of the approach to be taken to expert evidence.
In O’Doherty, the appellant was convicted of an aggravated burglary. 35 The facts of the case are not fully summarised in the judgment, but the main evidence against him was a recording of a call for an ambulance. The prosecution sought to link the defendant to the tape of the call in three ways: first, via the evidence of a police officer who knew the appellant and had heard the tape; second, via an expert witness (Mrs McClelland) who conducted an auditory comparison and stated that the voice on the tape was ‘very likely’ to be that of the appellant; and third, by inviting the jury to conduct their own comparison. Leave to appeal was refused but the case was referred back to the Court of Appeal by the Criminal Cases Review Commission. Fresh evidence was presented from Dr Nolan, who drew attention to the high error rate associated with auditory analysis and the potential impact of bias upon listener judgment. He noted that there were clear and obvious differences between the acoustic analysis he had conducted and the auditory analysis. Dr Nolan observed that auditory analysis was able to tell whether two speakers had the same accent but to go beyond that required a quantitative acoustic analysis. As the sound quality on the tape was degraded, Dr Nolan concluded that many of the key sounds required to conduct such an analysis were missing. Dr Nolan’s conclusion was that the voices were likely to have originated from different speakers.
The Crown relied upon reports from Dr Peter French (who accepted the underlying premise to Dr Nolan’s evidence but disagreed with the interpretation) and Mrs McClelland (who doubted the benefits of acoustic analysis).
The Court concluded the conviction was unsafe. It noted that Mrs McClelland was adopting the same method of analysis which had been deemed to be admissible in Robb. The Court of Appeal concluded: that in the present state of scientific knowledge no prosecution should be brought in Northern Ireland in which one of the planks is voice identification given by an expert which is solely confined to auditory analysis. There should also be expert evidence of acoustic analysis such as is used by Dr Nolan, Dr French and all but a small percentage of experts in the United Kingdom and by all experts in the rest of Europe, which includes formant analysis. 36
The court considered whether it was permissible for jurors to make their own comparisons between the tape and the voice of the defendant giving evidence. By reasoning in line with the authorities on identifying defendants from CCTV tapes, they concluded that it was, but that it required a clear Turnbull direction, modified for the circumstances of the case, and the absence of such a direction rendered the conviction unsafe.
In mirroring the English Codes of Practice, the Northern Irish approach acknowledges the need for a robust procedure but lacks clarity in the mechanisms to enforce this. As noted earlier, the Court of Appeal of England and Wales, in Flynn and St John, rejected the approach of not permitting prosecutions to proceed solely on the basis of auditory evidence. It would appear that Northern Ireland has adopted a more cautious approach to the question of founding convictions based on expert earwitness identification.
Republic of Ireland
The issue of voice identification evidence has only been discussed at length in one Irish appeal case, that of Crowe. 37 Crowe is worthy of further discussion both as an illustration of the problems which can arise in a prima facie compelling case, and to demonstrate how the Court of Criminal Appeal had to devise an approach in the absence of safeguards.
Crowe was charged with sending a menacing message to a member of the Garda, Detective Sergeant Smith. DS Smith had received a call on his work mobile which was recorded as lasting 32 seconds. During this call the speaker asked if he was speaking to Robert Smith, to which Smith responded his first name was Denis. The speaker then made a series of threats. An investigation was launched in which DS Smith did not participate. The telephone used was registered to Mr Crowe and that evening the Gardaí attended his address, where they found the appellant, his partner, his child and a man named Byrne. The appellant was asleep, but on the pillow by his head was the telephone from which, it was agreed, the call was made. Crowe was arrested and interviewed. The interview was video recorded. The appellant would not disclose his PIN to unlock the phone but admitted that only he knew it. He denied making the call. The following day DS Smith was shown the video and after listening for approximately 15 seconds indicated that he recognised the voice as being that of the caller. Smith stated he knew the appellant but they had never had any quarrel. He stated that he recognised the voice from the recording not from his previous knowledge of Crowe. When pressed as to how he knew who it was, he replied: There is nothing specific in his accent that would have made me think otherwise, think other than that it was him.
A number of points were raised on appeal relating to the admissibility of the evidence, both in terms of its probative value and the use of the recorded interview without a caution. In the absence of any guidance in Irish law, the court conducted a thorough analysis of the English case of Flynn and St John which emphasised the need for extreme caution to be exercised when dealing with lay listener identification. There was some discussion with counsel as to whether any form of procedure would have escaped criticism. Counsel argued that whilst there may have been an argument over any type of procedure, those arguments would have been weaker than if there were no safeguards at all.
The Court of Criminal Appeal accepted that there were significant ‘infirmities’ in the case which exposed it to criticism and allowed the appeal. They noted that DS Smith knew at the time of the identification that Crowe was the only suspect. The court agreed that despite the fact the event was relatively fresh in Smith’s memory, the cogency was minimal.
The court also commented that: This is the first such case to come before this court and the absence of safeguards in this case, though a matter to be deprecated, may be partly explained by the novelty of the situation presenting itself to the investigating Gardaí in this case. Be that as it may, the Court agrees with counsel for the appellant that the total absence of safeguards meant that minimum standards of fairness were not met in the circumstances of this particular case, and accordingly the conviction cannot be upheld. 38
The court noted that procedures were available in both England and Scotland and expressed the view that: Undoubtedly, the adoption in a particular case of a voice identification procedure which attempts to address potential biases and infirmities by means of safeguards, is likely to improve the cogency of such evidence. Therefore such measures are strongly to be encouraged on that account alone. Perhaps even more importantly they are also to be strongly encouraged in the interests of procedural fairness. 39
The approach in Crowe is perhaps the clearest example of an appellate court recognising the need for safeguards and being prepared to intervene when such safeguards are not present, notwithstanding the absence of any statutory guidance. Whilst Crowe might encourage innovation in the development of the investigatory process, it gives no assistance on how the reliability of those safeguards might subsequently be assessed by the courts. Crowe recognises that the principle of a fair trial requires more than a lay assessment of the credibility of a witness.
The challenges of the expository tradition
Whilst each of the jurisdictions mentioned above has attempted to address the challenges posed by voice identification and recognised the caution with which it must be treated, each has within it uncertainties and ambiguities which result in divergences from the best practices that research from other disciplines might identify.
Examining the differences between the jurisdictions illustrates these problems. England and Wales has a procedure which has been developed with the input of experts in phonetics. The Code of Practice which governs identification procedures supports their use (as it does in Northern Ireland). However, in practice they are complex to conduct and as a result some police forces refuse to use them. To date there is no judicial authority which treats the failure to conduct a parade as a breach of the Codes of Practice. In Scotland, voice identification procedures are conducted in a very different way, in a manner akin to a ‘line-up’ parade. Whilst this makes them easier to undertake and means they can be deployed with greater frequency, the lack of any scientific evidence to support the accuracy of this method must be a cause for significant concern. The Republic of Ireland has not yet adopted any guidelines for whether procedures should be conducted or not. Notwithstanding this, the decision of the Court of Criminal Appeal in Crowe represents the clearest recognition of the principle that the failure to conduct a procedure akin to a voice parade might impact upon the fairness of the trial.
In respect of expert evidence, whilst there is no authority from either Ireland or Scotland as to what manner of expert evidence is admissible, there appears to be a clear difference between the approaches taken in O’Doherty in Northern Ireland (where auditory evidence must be confirmed with acoustic evidence) and Flynn and St John in England (which has rejected this approach).
The approach contained in the published judicial directions and guidance differs between jurisdictions. Whilst these directions may ultimately not reflect the approach individual judges will adopt in summing a case up to a jury, the greater the detail which is contained within such guidance, the more likely it is that a judge will direct a jury to consider these points. The Bench Book for England and Wales provides some detail on the factors to be taken into account in tailoring a Turnbull direction, whilst in Northern Ireland and Scotland reference is made to the need for caution, but with much less detail.
This lack of cohesion is surprising. The factors which are relevant to the assessment of the evidence are the same, with a number of expert witnesses from the fields of phonetics and acoustics having given evidence in more than one of the jurisdictions under discussion. In a number of cases, the appellate courts have been referred to, and have given careful consideration to, the approaches adopted elsewhere. Despite this, however, no unifying normative framework for the treatment of such evidence has emerged. To consider why this is so, it is helpful to consider the manner in which rules of evidence, and in particular those governing identification evidence, develop.
A helpful starting point is Thayer’s response to the earlier critics of the English rules of evidence: I think it would be juster and more exact to say that our law of evidence is a piece of illogical, but by no means irrational, patchwork, not at all to be admired, nor easily found to be intelligible, except as a product of the jury system, as the outcome of a quantity of rulings by sagacious lawyers, while settling practical questions in presiding over courts where ordinary, untrained citizens are acting as judges of fact. (Thayer, 1898) Its protagonists would readily concede that history, the social sciences, and other disciplines are relevant to an understanding of law, but they tend to treat them as marginal and not really part of the specialised study of law. (Twining, 2006: 167)
Twining’s criticism of the Expository Approach stems from the biases which are inherent within it, namely that its tendencies are: (a) to be rule-centred; (b) to pay disproportionate attention to the decisions of appellate courts; (c) to treat jury trials as the paradigm of all trials; (d) to concentrate on events in the courtroom, to the exclusion of pre-trial and post-trial events; (e) to adopt a rationalistic and aspirational approach to problems of evidence rather than an empirical perspective; and (f) in discussing reform, to take existing rules and devices as the starting-point for response to problems, with a consequent tendency to be rather thin on diagnosis. (Twining, 2006: 168)
Twining’s central criticism of the expository tradition is that it follows the ‘Way of the Baffled Medic’, seeking to provide remedies without ever diagnosing the condition. He cites by way of example the debate around the improvement of identification parades, which is underpinned by an assumption that the only function of such parades is to produce evidence whilst ignoring other questions such as whether other procedures could be used or parades could be used for other purposes (Twining, 2006: 174). The problems with this approach are evident in the differing approaches to the use of voice parades within the jurisdictions discussed. There is a recognition within each of the jurisdictions that a parade of some sort assists in a rational assessment of the evidence, but where guidance on how these tests are to be administered exists it is derived from eyewitness procedures.
A similar criticism can be made of how jury directions are formulated. Where directions need to be given to the jury, the assumption is often made that it requires no more than a modification of the directions for eyewitness evidence, an assumption described by Hollien as ‘folly’ (Hollien, 1996). Roberts and Zuckerman categorise the Turnbull guidelines as: [an] extended illustration of the law’s attempt to facilitate lay fact-finding by developing evidentiary guidance, in the form of forensic reasoning rules, which incorporate factual generalisations distilled from previous experience of legal proceedings. (Roberts and Zuckerman, 2010: 687)
Under the expository approach to evidence scholarship, when a novel problem arises its proponents are left with no option but to draw upon existing practice to devise solutions whether or not that is appropriate. The approach to voice identification assumes that the cognitive processes involved in identifying voices are the same as they are for physical recognition but a considerable body of research suggests otherwise (for example, Yarmey, 1995). Whilst there is emphasis on the fallibility of witnesses in identifying voices they have heard, consideration also needs to be given to how that evidence is recorded by police officers and tested by investigators, presented or challenged by advocates and analysed by fact-finders. Normative assumptions about how this should be done are inadequate unless empirical research confirms them to be correct. To give an example, the assumption that the most accurate means of conducting a parade involves the sequential playing of a number of voices has no basis in empirical research (indeed whether it is the best basis under which to conduct visual line-ups is questionable) (Carlson et al., 2008).
Twining argued that problems of misidentification could be better resolved by the adoption of what he referred to as the ‘Contextual Approach’, in which the rules of evidence, procedure and practice are set against a broader context determined by the purposes of the study. Twining noted that the expository tradition generated a ‘standard case’ of misidentification where: an incident leading to a contested case tried before a jury in which the main role of the witness is to provide admissible evidence of identification. The emphasis is on the objective reliability of evidence presented in court, and on conviction of innocent persons as the sole mischief of misidentification. (Twining, 2006: 175)
Reviews of identification evidence have made positive changes in this area before. In 1974, following a number of high-profile miscarriages of justice, Roy Jenkins, the then Home Secretary, commissioned a review of identification evidence in the criminal justice system. This resulted in the Devlin Report of 1976, which Twining praises as combining features of both the expository and contextual approaches as well as developing a total process model for the evaluation of evidence. The Devlin Report was adopted in the formation of the Turnbull guidelines 42 and paved the way (via the Philips report) for the introduction of Code of Practice D to PACE (Philips, 1981). This Code aims to provide clear guidance to investigating officers as to the steps that should be taken when a suspect disputes identification evidence, and a failure to comply with it can be considered in assessing the quality of the identification evidence. Whilst it is often assumed that the primary purpose of these procedures is to ensure that innocent people are not wrongfully convicted, it is important to remember that they play just as valuable a role as a tool for supporting the credibility of witnesses by demonstrating the integrity of the evidence-gathering process. The absence of such a whole-process review of voice identification evidence perhaps goes some way to explaining the absence of any cohesive system of guidance in any jurisdiction.
One of the other explanations for the relative lack of attention that the law of evidence has paid to earwitness evidence is that, compared to eyewitness evidence, cases of earwitness evidence are comparatively rare. The scarcity of these cases does not make them less worthy of review; the consequences for the parties who are affected by the verdict which will be decided on such evidence are just as great as they are for eyewitness identification. Many of the problems which exist are due to physical, psychological and environmental factors which are common to all jurisdictions. A contextual approach could look at the whole process from a cognitive perspective supported by empirical evidence. This would enable the development of rules of best practice which could be adopted with little variation by individual jurisdictions. This offers a potential additional advantage for procedures that require resourcing. If a common system for voice procedures exists then a police force in England, required to construct a parade featuring speakers with a particular Scottish or Irish accent, would be able to share materials with greater ease.
In revisiting Twining in 2004, Roberts observed that the procedures relating to identification evidence had still not received this kind of attention and analysis (Roberts, 2004). Roberts’ review of eyewitness evidence used developments in psychology to attack the notion that the procedures of English law provided adequate safeguards against miscarriages of justice, with particular reference to the distinction between ‘estimator variables’ and ‘system variables’ first described by Wells (Wells, 1978, 1988). Estimator variables are environmental factors which impact upon the accuracy of the observer. System variables are factors which are caused by the investigative and trial process (factors which cause the witness’s recollection to be distorted or manipulated).
In furthering Twining’s medical analogy Roberts classified factors which led to misidentification in eyewitness cases as falling into two ‘strains’ which needed different forms of treatment. In respect of estimator variables he states that: The most that the criminal justice system can do is to remain vigilant as to its various symptoms and implement an effective screening programme in an attempt to detect possible outbreaks of the disease. (Roberts, 2004: 103)
In relation to system variables, Roberts argues that: While the disease remains difficult to diagnose outbreaks are caused by the practices and procedures followed in the criminal process. It is, therefore, possible to take preventative measures by adopting and adhering to appropriate regimens concerning the treatment of eyewitness identification. Wherever there is interaction between a number of witnesses, or police officers and witnesses, there is a danger that a witness’s memory and recollection of relevant events will be distorted. (Roberts, 2004)
As discussed earlier in this paper, at the first contact that a witness has with an investigator they should be required to give a description of the voice they have heard and this initial description has a potential evidential value in assessing the evidence ultimately given at trial. Whether such a description is consistent or inconsistent with the voice of the suspect is a matter which a fact-finder is likely to take into account when assessing the reliability of the witness. There is currently very little research on whether lay witnesses are able to accurately describe voices beyond broad characteristics such as gender and occasionally accent. We do not know if it is possible to obtain more detailed (and accurate) descriptions through structured questioning by the officers recording the statement. If such questioning is to be undertaken, careful consideration needs to be given to ensure that the questions do not inadvertently lead witnesses to providing answers which match a known suspect. Research in this area can establish whether it is possible to interview witnesses in a manner which maximises the quality of the description they give without distorting the evidence they may ultimately provide in court.
Assuming that a description has been recorded, it may then be appropriate to conduct a voice procedure. Although procedures exist, there has been no testing to see if they represent best possible practice. Current practices vary but reviews should be undertaken to see if it is possible to improve upon the procedures currently in use. The primary focus of such research should be the extent to which procedures generate false positives in the form of incorrect positive identifications and how these can be reduced. Variables which might impact upon this include the sequencing of the line-up, the duration of each clip, the means by which the suspect and the foil voice are captured (i.e. extracts from the police interview recording or specially prepared samples) and the way in which the witness is asked to make an identification (after each voice or at the end of the procedure). These are all matters which can be tested empirically, as they have been in relation to eyewitness procedures (see, for example, Flowe et al., 2015). The outcomes of this research may also facilitate a simplification of the process; if it is possible to simplify the process by which parades are conducted without compromising their accuracy, this would encourage their more frequent use as a safeguard in voice identification cases. It may be perfectly feasible to develop a protocol for procedures which can be constructed from a centralised database of voices in a manner which is no more complex than a video parade.
The circumstances in which a fact-finder may be called upon to assess the testimony of a witness who is identifying a voice are varied and courts will always require flexibility to ensure that the interests of justice are met. However, this assessment must take place in an environment which seeks to maximise the accuracy of the decision being made. Identification evidence is the product of complex cognitive processes and this is especially the case where a witness hears and does not see a suspect. Given the potentially substantial evidential importance which will attach to a positive identification, the legal system needs to be sensitive to the particular challenges involved and ensure that the processes from evidence collection to trial minimise the risks of distortion and maximise the possibility that variables which impact upon accuracy are identified and addressed. These factors are best identified through collaboration with other disciplines and are of relevance to all common law jurisdictions.
Footnotes
Acknowledgements
I am grateful to Dr Harriet Smith, Dr David Wright and Dr Natalie Braber for their advice and assistance with the psychological and linguistic aspects of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
