Clinician Research Guide

Audio Feedforward for Selective Mutism: What the Research Shows

Video self-modeling and its audio analog can be clinically useful for selective mutism, but historically they have required editing skill, coordination, or equipment that made routine use impractical. This page reviews the research base for audio feedforward, with emphasis on Blum et al. (1998), and then connects that model to current clinical use.


What Is Audio Feedforward?

Audio feedforward is a variant of self-modeling in the Dowrick tradition, in which the child listens to an edited recording that depicts them speaking in situations where they are currently mute. In the Blum et al. (1998) model, the child records answers at home, where speech is available, and those answers are edited together with recordings of the target adult asking the questions.


The result is a composite tape that portrays the child as already communicating in the feared context. This is what makes the procedure feedforward rather than feedback: the child is not reviewing a neutral record of past performance but hearing themselves succeed at something they cannot yet do in that setting.

Clinically, this also distinguishes the procedure from simple reinforcement-based strategies. The listening exercise itself places no reward contingency on live speech. The intervention aims to reduce the anxiety associated with the setting rather than to increase performance pressure around speaking.

The Blum et al. (1998) Study

Blum et al. (1998) describe three children whose speech at school remained severely inhibited despite earlier reinforcement-based treatment. Key features of the study:

Three girls ages 6–9 with 1.5+ year histories of severely inhibited speech at school
All had failed to respond to positive reinforcement for speaking before the feedforward intervention
Parents helped generate 15 open-ended questions; the child answered at home where speech was available
Recordings were edited together with the target adult asking the questions, creating a tape that portrayed successful communication in the feared context
Each child listened to the tape at least twice daily for one week before reassessment

The results were clinically striking. All three children showed rapid verbal responding after the intervention, with generalization to untargeted adults and settings in ways that would be difficult to explain by rote memorization alone.

Case 1 (N.C.): zero verbal responses at baseline; whispering all answers within 9 days; within one month she was speaking in a normal voice and initiating conversation
Case 2 (K.B.): zero responses at baseline; after one week she answered all 15 questions, including five not present on the tape; gains generalized to her classroom teacher without an additional tape
Case 3 (A.R.): four of 15 answers after the first week; after two more days of listening she answered all 15; volume was shaped toward normal over one month and she was raising her hand in class within that same period
Across all three cases, verbal responding generalized to untargeted individuals without new recordings


The paper is also appropriately cautious. Not every child the authors attempted to treat with this approach benefited. Children who refused to make the tape did not respond, and two children who completed tapes also failed to improve. The sample is small, and the study is best read as a demonstration of clinical possibility rather than evidence of broad efficacy.

Why Audio Feedforward May Work

The proposed mechanism is the same one described for video self-modeling: the child repeatedly experiences themselves succeeding in the feared context before that success is required in real life. In feedforward terms, the child is given a new expectation of what will happen in that setting.

Blum et al. (1998) suggest that this reduces the threat response associated with the target setting, lowering anxiety enough for real speech to emerge. The model aligns with the broader self-modeling literature, including the generalization effects described in video self-modeling work by Kehle et al. (1990), Pigott and Gonzales (1987), and Holmbeck and Lavigne (1992). Audio feedforward appears to rely on the same mechanism, delivered through audio alone.

Clinical Advantages Over Video Feedforward

One reason the Blum paper remains useful is that it addresses implementation barriers directly. The authors were solving a practical problem: how to make self-modeling clinically usable without the expense and technical load that older video systems imposed.

Audiotape technology is less expensive and more widely available than traditional video systems.
Editing is technically simpler because there is no need to match background scenery or visual continuity.
Other students do not appear in the recording, so their consent is not required.
The intervention can be assigned as a home-based task without specialized clinic equipment.

The authors also raise an important unresolved question: whether seeing and hearing oneself may ultimately be more effective than audio alone. Their study could not answer that question. It remains a live clinical consideration, not a settled conclusion.

Limitations and Appropriate Use

This is a case study, not a randomized controlled trial. Generalizability is limited.
The sample is small, and the authors explicitly note that the approach was not effective in every case they attempted.
Children who refused to make the tape did not benefit; two children who completed tapes also did not show response.
Listening dose may matter. Blum et al. (1998) suggest longer tapes or more frequent exposure may improve efficacy in partial responders.
Audio feedforward was not used in isolation. Behavioral contingencies for speaking remained in place alongside the intervention.

For that reason, audio feedforward is best understood as one component of a broader plan. It may help lower activation and create early success, but it should not replace hierarchy design, transfer-of-stimulus-control work, or careful exposure pacing.

How BVJ Operationalizes This

BraveVoiceJourney (BVJ) builds on the possibility Blum et al. (1998) demonstrated, but delivers it through video, the modality the authors themselves flagged as potentially more effective than audio alone. The desktop clinician app provides the scenario library and lets the child both see and hear themselves speaking in the target setting, while keeping all recordings local to the device. The family web app extends that model to between-session carryover. The claim is not that technology substitutes for treatment planning; it is that BVJ makes this feedforward workflow more practical to implement with fidelity.

Download the Free Clinician App

Works in a single session. All recordings stay on your device. No account required.

