Tech Talk


Shhh ... Someone Might be Listening ...
 

This article explains the use of audio forensic analysis and the admissibility of audio tape recordings as evidence in court.

 


In 1973 President Nixon’s secretary accidentally (so she said) erased or over-recorded 18.5 minutes of audio from the White House tapes. Within that erased portion was a conversation between Nixon and his Chief of Staff, HR Haldeman, containing possibly incriminating evidence about the Watergate conspiracy.

 

Judge John Sirica assembled a team of scientists and audio engineers to inspect the tapes. They found that the original audio had been erased by five or more separate over-recordings.

 

In 1974 Richard Nixon became the first US President to resign from office.

 

Closer to home, yet another political leader may have tied herself up in a web of audio tape. The hottest ‘pirated copies’ now circulating widely in the Philippines are not the usual sexual shenanigans of the rich and famous. The tape contains a conversation in which a woman who sounds very much like President Gloria Arroyo discusses her victory margin in the 2004 presidential election with someone identified by ex-colleagues as election official Virgilio Garcillano. What made the issue a hot one? The recording was made BEFORE the votes had been counted!

 

Mrs Arroyo admits that she spoke with an election officer but her aides insist the tape was doctored. Her people have since released their own version of a similar conversation, claiming theirs is the original. However, since the tapes have not been submitted to any expert examination, the situation remains a murky one full of counter-accusations. And President Arroyo is fighting for her political life.

 

It is not only the high and mighty who find themselves embroiled in such controversies. Audio tape recordings are sometimes the only evidence available. Here are some real-life scenarios:

 

Obscene or Threatening Phone Calls

The tape is analysed to confirm that there is no evidence of tampering. If the accused does not admit guilt, the expert can initiate voice recognition protocols.

 

Fraud or Breach of Contract

In cases where there is no documentary evidence, some people have resorted to recording conversations on a hidden tape recorder. Key words or phrases indicating times, dates, numerical values etc may prove integral to the plaintiff’s case. The examiner’s emphasis is on ascertaining whether any key words, phrases or sentences have been deleted or inserted.

 

Recording a Crime in Progress

A distress call to emergency services will continue to be recorded if the caller’s line is left open. The attacker might inadvertently say something that offers clues to investigators. Even without specific clues, the assailant may provide enough voice audio to identify him/her (a realistic procedure if the pool of suspects has first been narrowed down by the investigation).

 

So now you have an audio tape. And what is on it can make or break your entire case. But the court may never get to hear the contents if the tape is not admissible as evidence. This is where audio forensics comes in.

 

What is Forensic Audio?

It is the scientific examination of recorded audio evidence for indications of tampering, editing (deletions, omissions or insertions), alterations and inappropriate recorder actions in pertinent portions of the conversation.

 

In order to determine that the tape is an ‘authentic original recording’, the forensic examiner must be satisfied:

1   that the recording device was capable of taping the conversation offered as evidence; 

2   that the operator of the recording device was competent to operate the device; 

3   that the recording is authentic and correct; 

4   that the recording as shown to the court has been preserved in the same condition it was in during forensic analysis; 

5   that the speakers are identified (where there is a question over the identity of the speaker/s, the forensic audio expert can use voice recognition methodology to address this criterion).
 

The Aural/Spectrographic Method

There is a range of specific equipment and procedures which the forensic audio examiner uses to determine authenticity. Conclusions are formed through a combination of two main areas of methodology:

 

Aural analysis or critical listening

Listening to the voice/s on tape to compare single sounds, and series of sounds, for similarities and discrepancies. The examiner notes pronunciation similarities or differences, eg whether ‘the’ is said with a short or long vowel sound, whether the ‘th’ is clear or the speaker blurs it into a ‘der’. The examiner also scrutinises speech habits, inflections, psycholinguistic features, dialect or accents, syllable grouping and even breath patterns. This part of the identification process relies heavily on a trained and talented ‘ear’ (much like the ‘nose’ of a master wine taster!).

 

Spectrographic or visual analysis

A spectrograph is an automatic sound wave analyser. This instrument produces a visual representation of a given set of sounds in the parameters of time, frequency and amplitude. This visual representation is called a spectrogram and is a graphic depiction of the patterns of the acoustical events within the timeframe analysed.
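To make the three parameters concrete, here is a minimal Python sketch of how a digital system might compute a spectrogram: the recording is cut into short overlapping frames, each frame is tapered with a Hann window and passed through a discrete Fourier transform, and the magnitudes are expressed in decibels. It is an illustration only, assuming mono samples stored as floats; the frame length of 256 samples and hop of 128 samples are illustrative choices, not settings taken from any forensic product, and a real system would use a fast Fourier transform rather than this direct (slow) calculation.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n; tapers frame edges to limit leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 (the non-negative frequencies)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=256, hop=128):
    """Return one list of dB magnitudes per frame.

    Row index    -> time (frame starts at row * hop samples)
    Column index -> frequency (bin k is centred on k * sample_rate / frame_len Hz)
    Value        -> amplitude in dB
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        # The tiny offset avoids log10(0) in perfectly silent bins.
        frames.append([20 * math.log10(m + 1e-12) for m in dft_magnitudes(chunk)])
    return frames
```

For a 1 kHz test tone sampled at 8 kHz, every frame peaks in bin 32, since 32 × 8000 / 256 = 1000 Hz.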

 

Some older analog machines are still in use. But their days are numbered with the rise of digital signal analysis. These sophisticated systems provide high fidelity signal acquisition, high speed digital processing circuitry for fast and flexible analysis, and CD-quality playback.

The quantum jump for examiners is the much wider range of comparison and measurement tools. It is possible to display multiple sound spectrograms, to adjust time alignment and frequency ranges, and to take detailed numeric measurements of the displayed sounds. Thus, the examiner widens the scope of the analysis to create a more detailed picture of the voice or sound being analysed.
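Adjusting the time alignment of two recordings amounts to finding the offset at which they match best. A common way to estimate that offset is cross-correlation; the sketch below is an assumption about how such a tool could work, not a description of any particular product. It assumes both recordings share the same sample rate, and `best_lag` is a hypothetical helper name.

```python
def best_lag(reference, other, max_lag):
    """Shift (in samples) that best aligns `other` with `reference`.

    A positive result means the same material appears `lag` samples later
    in `other` than in `reference`.
    """
    def score(lag):
        # Cross-correlation: sum of products over the overlapping samples.
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)
```

For example, with `ref = [0.0, 1.0, 3.0, -2.0, 4.0, 0.0, -1.0, 2.0]`, `best_lag(ref, [0.0] * 5 + ref, max_lag=8)` returns 5. This raw (unnormalised) score is the simplest form; practical tools usually normalise it so that differences in overlap length or loudness do not bias the result.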

 

Specialised audio forensic equipment really comes into its own for speech analysis where the voice has been masked by coherent noise or loud music. Powerful decoding filters can attenuate the music or background noise and uncover the speech. What if the background noise is irregular, and the ambient noise environment is constantly changing? Adaptive filters handle this case: they continuously adjust themselves to remove a modelled signal representing the unwanted time-domain waveform while preserving the target signal.
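The article does not say which adaptive algorithm forensic tools use. One classic example is the least-mean-squares (LMS) noise canceller, sketched below in Python under a loudly stated assumption: a separate ‘reference’ pickup of the noise source (for instance, a clean copy of the music track) is available alongside the evidence recording. The filter keeps re-estimating how that noise reached the microphone and subtracts its estimate, so the remainder converges towards the speech. The tap count, step size and the synthetic test signal (`synthetic_case`) are all hypothetical, chosen for illustration.

```python
import math
import random


def lms_cancel(primary, reference, taps=8, mu=0.01):
    """Least-mean-squares (LMS) adaptive noise canceller.

    primary:   evidence recording = speech + noise that reached the microphone.
    reference: separate pickup of the noise source alone (assumed available).
    Returns the error signal, which converges towards the speech.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Filter output: current estimate of the noise inside `primary`.
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Move each weight a small step in the direction that reduces error**2.
        weights = [w + 2 * mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def synthetic_case(n_samples=5000, fs=8000, seed=1):
    """Hypothetical test material: a 300 Hz tone standing in for speech, plus
    random noise that reached the microphone through an assumed 3-tap path."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    path = [0.6, -0.3, 0.1]
    leaked = [sum(c * noise[n - k] for k, c in enumerate(path) if n >= k)
              for n in range(n_samples)]
    speech = [0.2 * math.sin(2 * math.pi * 300 * n / fs) for n in range(n_samples)]
    primary = [s + v for s, v in zip(speech, leaked)]
    return speech, leaked, primary, noise
```

Because the speech is unrelated to the reference noise, the filter cannot learn to cancel it; it can only learn the noise path, which is why the error output drifts towards the clean speech. A larger `mu` adapts faster to a changing environment but leaves more residual noise.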

 

Particularly useful for audio forensic examination is the spectral filter. It allows the examiner to create a high-resolution frequency contour using up to 32,000 bands of equalisation. With such a high degree of frequency selectivity, the examiner can remove in-band and out-of-band extraneous noises for greater accuracy and specificity.
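A spectral filter of this kind can be approximated by transforming a block of audio into the frequency domain, zeroing the unwanted frequency bins and transforming back. In the minimal sketch below each DFT bin plays the role of one ‘band’, so a block of N samples gives roughly N/2 bands, each fs/N hertz wide. The function names, the all-or-nothing gain and the 2 kHz ‘interference’ in the example are assumptions for illustration; commercial tools shape the gain far more carefully to avoid audible artefacts.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part, since the input audio was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def remove_band(samples, fs, low_hz, high_hz):
    """Zero every DFT bin whose frequency falls within [low_hz, high_hz].

    Each bin acts as one narrow band of width fs / len(samples) Hz, so a
    longer block gives finer frequency selectivity.
    """
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies; treat them alike
        # so that the filtered output stays real.
        freq = min(k, n - k) * fs / n
        if low_hz <= freq <= high_hz:
            spectrum[k] = 0
    return idft(spectrum)
```

Removing a 1900-2100 Hz band from a mix of a 500 Hz tone and a 2 kHz tone (400 samples at 8 kHz) leaves the 500 Hz tone essentially untouched.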

 

However, not all audio tape recordings are suitable for forensic examination. If the level of the ambient noise is so high relative to the recorded speech that the uncovered speech is badly distorted, then the forensic examiner is likely to reject the case due to the poor quality of the unknown voice.
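The ratio the examiner is weighing here is the signal-to-noise ratio (SNR). A rough way to estimate it, assuming the recording contains a pause where only the ambient noise is present and that the noise stays steady, is to compare the power of a speech passage with the power of that pause: SNR (dB) = 10 × log10((P_speech+noise - P_noise) / P_noise). The sketch below implements that formula with hypothetical function names; the article gives no numeric cut-off for rejecting a case, so none is assumed here.

```python
import math


def mean_power(samples):
    """Average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def estimate_snr_db(speech_segment, noise_only_segment):
    """Rough signal-to-noise ratio in dB.

    speech_segment:     samples where the talker is active (speech + noise).
    noise_only_segment: samples from a pause containing ambient noise only.
    Assumes the noise is steady, so its power during the pause equals its
    power underneath the speech.
    """
    noise_power = mean_power(noise_only_segment)
    speech_power = mean_power(speech_segment) - noise_power
    if speech_power <= 0:
        return float("-inf")  # speech not measurably above the noise
    if noise_power == 0:
        return float("inf")   # no measurable noise at all
    return 10 * math.log10(speech_power / noise_power)
```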

 

Voice Analysis and Identification

First let us understand how complex the human voice is. No human voice produces a single pure pitch. Instead, the voice is a fundamental sounding simultaneously with a series of overtones. Some overtones are random; others are whole-number multiples of the fundamental, called harmonics.
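The harmonic structure is easy to see in a synthetic example. The sketch below builds a tone from a 125 Hz fundamental plus its second and third harmonics, then lists the frequencies where its spectrum carries significant energy; they fall at whole-number multiples of the fundamental. Real voices are only approximately periodic, so this idealised tone (with its chosen amplitudes, threshold and hypothetical function names) is purely illustrative.

```python
import cmath
import math


def harmonic_tone(f0, amplitudes, fs, n):
    """n samples of a tone made of harmonics k * f0 (k = 1, 2, ...);
    amplitudes[0] belongs to the fundamental itself."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t / fs)
            for k, a in enumerate(amplitudes))
        for t in range(n)
    ]


def strong_frequencies(samples, fs, threshold=0.1):
    """Frequencies (Hz) of DFT bins whose magnitude exceeds
    `threshold` times the largest magnitude."""
    n = len(samples)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]
    top = max(mags)
    return [k * fs / n for k, m in enumerate(mags) if m > threshold * top]
```

With a 125 Hz fundamental, 320 samples at 8 kHz and amplitudes of 1.0, 0.5 and 0.25, the strong components come out at 125, 250 and 375 Hz.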

 

Of all voice characteristics, the two most important are frequency and intensity. Frequency is the rate at which air particles vibrate, measured in cycles per second, or hertz (Hz). Intensity is the amount of energy in a wave or pulse, which we perceive as loudness. No two sound waves (even those produced by the same individual) will have exactly the same frequencies and intensities. However, intraspeaker variability (the variation between samples from the same speaker) is smaller than interspeaker variability (the variation between different speakers).
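Both measurements can be sketched in a few lines of Python. The example estimates the fundamental frequency of a short voiced frame in hertz by autocorrelation (finding the delay at which the frame best matches a shifted copy of itself), and expresses intensity as an RMS level in decibels relative to digital full scale (dBFS). The autocorrelation method, the 75-400 Hz search range and the dBFS reference are assumptions chosen for illustration; the article does not specify how any particular instrument measures these quantities.

```python
import math


def estimate_f0(frame, fs, min_f0=75.0, max_f0=400.0):
    """Fundamental frequency in Hz via autocorrelation.

    The lag (in samples) at which the frame best matches a delayed copy of
    itself is taken as one period; frequency = fs / period.
    The 75-400 Hz search range is an assumed range for adult speech.
    """
    min_lag = int(fs / max_f0)
    max_lag = int(fs / min_f0)

    def autocorr(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    best = max(range(min_lag, max_lag + 1), key=autocorr)
    return fs / best


def level_dbfs(frame):
    """RMS level relative to full scale (a digital sample value of 1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")
```

A 200 Hz test tone of amplitude 0.5, sampled at 8 kHz for 800 samples, yields an estimate of 200.0 Hz and a level of about -9.03 dBFS.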

 

Uniqueness in voice is a product of both physiology and learning. On the physiological side, the two most important factors are the resonators (nasal and oral cavities, pharyngeal passages) and the articulators (lips, teeth, tongue, soft palate and jaw muscles). We learn to speak through imitation, and through trial and error. All the while, the brain’s speech centre is sending signals to the various organs involved. There is no such thing as spontaneous speech. A person may try to disguise his or her voice, but the way the brain controls speech habits, and the way the resonators and articulators are shaped and used, cannot be changed. This makes each and every individual voice unique.


Even so, voice comparison and identification is not as absolute a process as fingerprint identification. Interpretation of the data by the examiner therefore plays a more important role in audio forensics, and examiners generally insist that comparison recordings meet some basic conditions.

After the aural stage, spectrograms representing the same sound/s are visually compared. The analyst studies bandwidth, mean frequencies, trajectories, striations, stops, plosives and fricatives, noting both differences and similarities. The examiner will then arrive at one of five standard conclusions:

 

Positive Identification: at least 20 similarities, with all differences accounted for.

 

Probable Identification: fewer than 20 similarities, with no unexplained differences.

 

Positive Elimination: 20 or more differences exist that cannot be explained away.

 

Probable Elimination: when the recorded text is limited or of low quality.

 

No Decision: when there is insufficient information or too few common speech sounds.
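The five conclusions can be read as a simple decision rule over the examiner’s tallies. The Python sketch below encodes them as worded in this article; it is not an official standard, and the names `Tally` and `conclusion` are hypothetical. Two points are assumptions: the judgement that there is enough comparable speech to decide at all is passed in as a flag, and the ‘Probable Elimination’ outcome, which the article ties to limited or low-quality material, is mapped here to the case where some unexplained differences exist but fewer than 20.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    similarities: int             # comparable sounds judged similar
    unexplained_differences: int  # differences that cannot be explained away
    enough_material: bool         # examiner's judgement: enough common sounds?


def conclusion(t: Tally) -> str:
    """Map an examiner's tallies to the article's five conclusions."""
    if not t.enough_material:
        return "No Decision"
    if t.unexplained_differences >= 20:
        return "Positive Elimination"
    if t.unexplained_differences > 0:
        # Assumption: some unexplained differences, but fewer than 20.
        return "Probable Elimination"
    if t.similarities >= 20:
        return "Positive Identification"
    return "Probable Identification"
```

The counts themselves still come from the examiner’s aural and spectrographic judgement; the rule only makes the thresholds explicit.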

 

Admissibility

The use and acceptance of audio forensic analysis is most widespread in the United States. Even those courts that have declined to admit such evidence are cognisant of continuing work in this field and have specifically left the door open to future admissibility. In Singapore, audio tape evidence has been successfully introduced and used in a number of cases. With aural and spectrographic analysis producing consistently reliable support to the courts, this forensic tool has enormous potential waiting to be tapped.

 

That is good news to everyone except Presidents with something to hide!

 

Jackie Chan

Audio Visual Forensics

E-mail: prana@singnet.com.sg