Processing AVPlayer’s audio with MTAudioProcessingTap

The MTAudioProcessingTap introduced with iOS 6 is incredibly powerful – it sits at a point in the audio chain of AVPlayer (or any of the other AVFoundation classes that work with AVAssets), which greatly reduces the work required to roll your own signal processing or analysis in an app.

By attaching the simple tap object to a track of your asset, you keep all of AVFoundation’s existing functionality: the system handles demuxing your media container, decoding and rendering the audio, responding to audio interruptions and endpoint changes, and so on – leaving the app developer free to focus on what matters to them: working directly with PCM audio.

There are a bunch of applications for the audio processing tap, including:

  • Displaying a graphical spectrum analyser for a playing song/movie
  • Providing some user-controllable graphical EQ
  • Implementing some preset filters (a bass booster, a karaoke mix that removes the vocal track, etc)
  • Dynamic range compression, limiting, etc
  • Pitch shifting, time dilation, etc

The most important feature of all, however, is this: all of the above works on any local-file AVAsset, including tracks from the user’s iPod library. Applications cannot obtain a file handle to those tracks, so we cannot use our own MP4 demuxer or AAC decoder – even Apple’s own AudioFileOpenURL doesn’t work with the iPod library, and using it would still require us to set up our own AudioQueue (not rocket science, but more work than we’d like). A pleasant side-effect is that our tap code can be distributed as an add-on module for any code that already utilises AVFoundation for its media playback. Rather than asking an app developer to rewrite their entire playback infrastructure around lower-level Audio Queues or Audio Units, they can bolt our processing onto their existing app with ease. Update: Ryan McGrath found a way to get the MTAudioProcessingTap to work with remote streams; he has a nice writeup here: Recording live audio streams on iOS.

When coupled with the AVAssetReader/AVAssetWriter group of classes, we don’t have to throw away the audio that was processed – it can be written back to disk if we were inclined to write some sort of DJ’ing app.
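As a taste of what that could look like, here is a minimal sketch (my own, not from the original post) of pulling the mixed audio back out with an AVAssetReaderAudioMixOutput, assuming the asset, audioTrack and audioMix objects created in the snippets below; the processed sample buffers could then be handed to an AVAssetWriterInput.

NSError *error = nil;
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];

// An audio-mix output lets us hand over the same AVAudioMix that carries our tap.
AVAssetReaderAudioMixOutput *output =
    [AVAssetReaderAudioMixOutput assetReaderAudioMixOutputWithAudioTracks:@[audioTrack]
                                                            audioSettings:nil];
output.audioMix = audioMix;

if ([reader canAddOutput:output]) {
    [reader addOutput:output];
}
[reader startReading];

CMSampleBufferRef sampleBuffer = NULL;
while ((sampleBuffer = [output copyNextSampleBuffer])) {
    // Append the PCM buffer to an AVAssetWriterInput here (or analyse it further).
    CFRelease(sampleBuffer);
}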

The code

We will begin with an AVFoundation base which just sets up an AVPlayer to play us a song that we have loaded into our application’s bundle.

NSURL *assetURL = [[NSBundle mainBundle] URLForResource:@"skyfall" withExtension:@"m4a"];
assert(assetURL);

// Create the AVAsset
AVAsset *asset = [AVAsset assetWithURL:assetURL];
assert(asset);

// Create the AVPlayerItem
AVPlayerItem *playerItem = [AVPlayerItem playerItemWithAsset:asset];
assert(playerItem);

assert([asset tracks]);
assert([[asset tracks] count]);

self.player = [AVPlayer playerWithPlayerItem:playerItem];
assert(self.player);

[self.player play];

Next, we pull the audio track out of the asset and create inputParameters for it, which we then attach (via an AVMutableAudioMix) to the player item’s audioMix. The inputParameters carry the audio processing tap that the AVPlayer will use when it is ready to obtain audio for rendering in its internal Audio Queue. To create the tap, we provide a collection of callback functions – they are where all the heavy lifting is done, and they let us prepare our system for processing as well as tear it down when iOS decides we’re no longer needed (e.g. when we’re done with the AVPlayer and it is released).

// Continuing on from where we created the AVAsset...
AVAssetTrack *audioTrack = [[asset tracks] objectAtIndex:0];
AVMutableAudioMixInputParameters *inputParams = [AVMutableAudioMixInputParameters audioMixInputParametersWithTrack:audioTrack];

// Create a processing tap for the input parameters
MTAudioProcessingTapCallbacks callbacks;
callbacks.version = kMTAudioProcessingTapCallbacksVersion_0;
callbacks.clientInfo = (__bridge void *)(self);
callbacks.init = init;
callbacks.prepare = prepare;
callbacks.process = process;
callbacks.unprepare = unprepare;
callbacks.finalize = finalize;

MTAudioProcessingTapRef tap;
// The create function makes a copy of our callbacks struct
OSStatus err = MTAudioProcessingTapCreate(kCFAllocatorDefault, &callbacks,
                                          kMTAudioProcessingTapCreationFlag_PostEffects, &tap);
if (err || !tap) {
    NSLog(@"Unable to create the Audio Processing Tap");
    return;
}
assert(tap);

// Assign the tap to the input parameters
inputParams.audioTapProcessor = tap;

// The input parameters hold their own reference to the tap, so balance the
// Create call above; without this release, finalize() will never be invoked.
CFRelease(tap);

// Create a new AVAudioMix and assign it to our AVPlayerItem
AVMutableAudioMix *audioMix = [AVMutableAudioMix audioMix];
audioMix.inputParameters = @[inputParams];
playerItem.audioMix = audioMix;

// And then we create the AVPlayer with the playerItem, and send it the play message...

None of this makes any sense without the callbacks for the processing tap. Here are the first four:

void init(MTAudioProcessingTapRef tap, void *clientInfo, void **tapStorageOut)
{
    NSLog(@"Initialising the Audio Tap Processor");
    *tapStorageOut = clientInfo;
}

void finalize(MTAudioProcessingTapRef tap)
{
    NSLog(@"Finalizing the Audio Tap Processor");
}

void prepare(MTAudioProcessingTapRef tap, CMItemCount maxFrames, const AudioStreamBasicDescription *processingFormat)
{
    NSLog(@"Preparing the Audio Tap Processor");
}

void unprepare(MTAudioProcessingTapRef tap)
{
    NSLog(@"Unpreparing the Audio Tap Processor");
}

These functions give us a chance to prepare our processing before the real process() function is invoked.

The init() callback hands back the client data that we provided when setting up the collection of callbacks used to create the tap. We can assign this into our tap storage, which lets us retrieve it again in any of the other callbacks with a call to MTAudioProcessingTapGetStorage(), passing in our tap. The storage is handy whenever we need to keep state between blocks – say, the last few frames of the previous block for a simple time-domain FIR filter (as many frames as the order of the filter), vDSP setup data, or any other user-defined parameters. In this example I just pass through a reference to the UIViewController subclass that contains all of this code. That is hardly an ideal design, but we are focusing on the minimum amount of code needed to get processing happening. A better approach would be a processing class or struct that keeps all of this state packed together, passed in as the client data.
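As one possible shape for that (entirely hypothetical – the LAKETapContext name and its fields are made up for illustration), init() could allocate a small context struct as the tap storage and finalize() could free it:

#include <stdlib.h>

// Hypothetical context kept in the tap storage instead of the raw clientInfo.
typedef struct {
    void  *viewController;   // the clientInfo handed to us at creation time
    float  firHistory[4];    // e.g. samples carried over between process() calls
} LAKETapContext;

void init(MTAudioProcessingTapRef tap, void *clientInfo, void **tapStorageOut)
{
    LAKETapContext *context = calloc(1, sizeof(LAKETapContext));
    context->viewController = clientInfo;
    *tapStorageOut = context;   // retrievable later via MTAudioProcessingTapGetStorage(tap)
}

void finalize(MTAudioProcessingTapRef tap)
{
    free(MTAudioProcessingTapGetStorage(tap));
}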

The prepare() callback gives us a hint about the maximum number of frames we will be fed in each process callback. It also tells us the format of the incoming audio (is it interleaved? floating or fixed point? what bit depth is it?). These parameters matter if we are going to perform FFTs on the data, because Apple’s vDSP library achieves good performance on its transforms by requiring the caller to run its memory allocation/setup functions ahead of time (and those need the block size). That kind of setup code is perfect for the prepare callback.
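For instance, here is a minimal sketch (my own, not from the post) of what a prepare()/unprepare() pair might look like if we intended to run FFTs in the process callback; the file-scope statics and the rounding of maxFrames up to a power of two are assumptions made purely for illustration.

#include <Accelerate/Accelerate.h>
#include <math.h>

static FFTSetup fftSetup = NULL;
static vDSP_Length log2MaxFrames = 0;

void prepare(MTAudioProcessingTapRef tap, CMItemCount maxFrames, const AudioStreamBasicDescription *processingFormat)
{
    NSLog(@"Preparing the Audio Tap Processor");
    // Allocate the FFT weights once, sized for the largest block the tap will deliver.
    log2MaxFrames = (vDSP_Length)ceil(log2((double)maxFrames));
    fftSetup = vDSP_create_fftsetup(log2MaxFrames, kFFTRadix2);
}

void unprepare(MTAudioProcessingTapRef tap)
{
    NSLog(@"Unpreparing the Audio Tap Processor");
    if (fftSetup) {
        vDSP_destroy_fftsetup(fftSetup);
        fftSetup = NULL;
    }
}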

The process() callback is where the real work is done. In this case, our “processing” simply scales the samples down by a fraction defined by a slider in our view, giving us a simple volume control. The scaling is linear, which isn’t particularly intuitive or pleasing to the ear – a logarithmic volume slider would be better. We cheat and use the vDSP framework to multiply the signal efficiently.

#define LAKE_LEFT_CHANNEL (0)
#define LAKE_RIGHT_CHANNEL (1)

void process(MTAudioProcessingTapRef tap, CMItemCount numberFrames,
             MTAudioProcessingTapFlags flags, AudioBufferList *bufferListInOut,
             CMItemCount *numberFramesOut, MTAudioProcessingTapFlags *flagsOut)
{
    // Pull the source audio into bufferListInOut so we can process it in place.
    OSStatus err = MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut,
                                                      flagsOut, NULL, numberFramesOut);
    if (err) {
        NSLog(@"Error from GetSourceAudio: %d", (int)err);
        return;
    }

    // Retrieve the view controller we stashed in the tap storage in init().
    LAKEViewController *self = (__bridge LAKEViewController *)MTAudioProcessingTapGetStorage(tap);

    float scalar = self.slider.value;

    // This example assumes the tap delivers non-interleaved 32-bit float audio,
    // one buffer per channel, so each channel is scaled in place.
    vDSP_vsmul(bufferListInOut->mBuffers[LAKE_RIGHT_CHANNEL].mData, 1, &scalar,
               bufferListInOut->mBuffers[LAKE_RIGHT_CHANNEL].mData, 1,
               bufferListInOut->mBuffers[LAKE_RIGHT_CHANNEL].mDataByteSize / sizeof(float));
    vDSP_vsmul(bufferListInOut->mBuffers[LAKE_LEFT_CHANNEL].mData, 1, &scalar,
               bufferListInOut->mBuffers[LAKE_LEFT_CHANNEL].mData, 1,
               bufferListInOut->mBuffers[LAKE_LEFT_CHANNEL].mDataByteSize / sizeof(float));
}

To grab audio to process, the processing tap provides MTAudioProcessingTapGetSourceAudio, which places the source audio into the bufferList that was passed in. We then process that audio in place. vDSP_vsmul takes a “vector” (in this case, the audio provided to us by the tap), steps along it one float at a time (hence the stride of 1 we passed in), and multiplies each sample by the scalar that we provide.
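The stride arguments are also what would let us cope with interleaved audio. This snippet is purely hypothetical (the tap in this post hands us one buffer per channel); it assumes a single interleaved stereo buffer and would replace the two vDSP_vsmul calls inside process():

// Hypothetical: if mBuffers[0] held interleaved stereo (L R L R ...), a stride of 2
// walks a single channel of the shared buffer.
float *samples = bufferListInOut->mBuffers[0].mData;
vDSP_Length framesPerChannel = bufferListInOut->mBuffers[0].mDataByteSize / sizeof(float) / 2;

vDSP_vsmul(samples,     2, &scalar, samples,     2, framesPerChannel); // left channel
vDSP_vsmul(samples + 1, 2, &scalar, samples + 1, 2, framesPerChannel); // right channel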

We utilise the processing tap’s storage to keep a reference to the view controller, from which we pull the slider’s value to use during processing.
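If we wanted the slider to behave more like the logarithmic control mentioned above, one option (purely a suggestion – this helper is not part of the original code) is to map the 0–1 slider value onto a decibel range before using it as the vDSP scalar:

// Hypothetical helper: map a 0–1 slider position onto a -40 dB..0 dB range and
// convert to a linear gain, so equal slider movements sound like similar loudness steps.
static float LAKEGainForSliderValue(float sliderValue)
{
    if (sliderValue <= 0.0f) return 0.0f;        // hard mute at the bottom of the travel
    const float minDb = -40.0f;                  // quietest audible setting
    float db = minDb * (1.0f - sliderValue);     // 0.0 -> -40 dB, 1.0 -> 0 dB
    return powf(10.0f, db / 20.0f);              // convert decibels to a linear amplitude
}

In process(), the scalar would then become LAKEGainForSliderValue(self.slider.value) instead of the raw slider value.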

In another post I will show code implementing some simple filters using the MTAudioProcessingTap.

26 thoughts on “Processing AVPlayer’s audio with MTAudioProcessingTap”

    • small note, for me (as a new user to AV Foundation) it was quite unclear how to get the right tracks for audio. In the example you show, you just get the first track (index 0) to obtain some audio. I changed that (for now, further research is required) to:

      NSArray *audioTracks = [asset tracksWithMediaCharacteristic:AVMediaCharacteristicAudible];
      AVAssetTrack *audioTrack = [audioTracks objectAtIndex:0];

  1. Pingback: MTAudioProcessingTap « Riks Dev Blog

  2. Great article on a subject very hard to find information on! Works great with m4a, but I can’t get it working with mp3. The processing is failing on MTAudioProcessingTapGetSourceAudio. Any ideas what to change?

  3. Pingback: Issue with MTAudioProcessingTap on device | BlogoSfera

  4. Thanks for this very instructive article, very useful. Apple should really get some documentation out there about this…

    I’m trying to get this to work with remote URLs (HLS streaming) and not having any success, the only indication that this is expected is the note at the end of http://developer.apple.com/library/ios/#qa/qa1716/_index.html (“AVAudioMix only supports file-based assets”).

    Can you confirm this? Your statement “All of the above works on any AVAsset”, may need a mention of this limitation. Can you think of any other way of processing audio data (accessing audio queues) while still using AVPlayer?

    • You’re right – thanks for pointing out the limitation. I haven’t tried this but it seems likely that the feature doesn’t work for remote assets, which is a shame. There are other parts of AVFoundation with similar behaviour – like AVAssetReader/Writer and so on – that only work with local files. I’m sorry if I led you down that fruitless path.

      • Hello Chris,

        Thanks for this wonderful article on tapping audio samples.

        I followed Ryan’s blog @ http://venodesigns.net/2014/01/08/recording-live-audio-streams-on-ios/
        to get the tracks from the player’s current item and succeeded with the snippet below, but got stuck at MTAudioProcessingTap’s “init” callback only. The “prepare” callback never gets called for “m3u8” links.

        For me,

        AVPlayerItemTrack *firstTrack = _audioPlayer.currentItem.tracks.firstObject;

        does return the AVAssetTrack only after we add a status KVO observer and periodic time observer like


        - (void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object change:(NSDictionary *)change context:(void *)context
        {
            if (PlayerStatusObserverContext == context) {

                id newValue = change[NSKeyValueChangeNewKey];

                if (newValue && [newValue isKindOfClass:[NSNumber class]]) {
                    self.previousAudioTrackID = 0;

                    if (AVPlayerStatusReadyToPlay == [newValue integerValue]) {
                        __weak typeof (self) weakself = self;

                        timeObserver = [player addPeriodicTimeObserverForInterval:CMTimeMakeWithSeconds(1, 100) queue:nil usingBlock:^(CMTime time)
                        {
                            weakself.lbl_PlaybackDuration.text = stringFromCMTime(time, nil);
                            weakself.seekBar.value = (float)(CMTimeGetSeconds(time) / CMTimeGetSeconds(dur));

                            if (weakself.isM3U8) {

                                @try {

                                    for (AVPlayerItemTrack *track in [weakself.player.currentItem tracks]) {
                                        if ([track.assetTrack.mediaType isEqualToString:AVMediaTypeAudio])
                                            weakself.currentAudioPlayerItemTrack = track;
                                    }

                                    AVAssetTrack *audioAssetTrack = weakself.currentAudioPlayerItemTrack.assetTrack;

                                    weakself.currentAudioTrackID = audioAssetTrack.trackID;

                                    if (weakself.previousAudioTrackID != weakself.currentAudioTrackID) {

                                        NSLog(@":::::::::::::::::::::::::: Audio track changed : %d", weakself.currentAudioTrackID);
                                        weakself.previousAudioTrackID = weakself.currentAudioTrackID;

                                        weakself.audioTapProcessor = nil;
                                        weakself.audioTapProcessor.audioMix = nil;
                                        [weakself initAudioTap];
                                    }
                                }
                                @catch (NSException *exception) {
                                    NSLog(@"Exception Trap ::::: Audio tracks not found!");
                                }
                            }
                        }];
                    }
                }
            }
        }

        And I’m keeping track of changes to AVAssetTrack’s trackID.

        Everything until MTAudioProcessorTap’s “init” callback is fine, but not “prepare” callback.

  5. Hey guys, I’m looking forward to your next post. I’ve already integrated your example, and I really need EQ because AVPlayer can stream music. Please let me know when the new post will be ready, or if you know of any way to process streaming audio I will be really pleased! Help help help! Thanks

  6. At what point do the unprepare and finalize callbacks get called? Do you have to set self.player = nil or explicitly set the processing taps to nil?

  7. Amazing! I’ve been waiting for an example like this. I’ve been handling my samples via AVAssetReader and a RemoteIO unit, and now thanks to this I’ve reduced my CPU usage from 40% to 6%!!
    Thank you

  8. Late to the party, but how do you tear down the Tap when you’re done with it? It looks like it’s a CF object, and in Apple’s example, after initialization its only reference is passed into the audio mix, where it’s no longer accessible via public methods or parameters. So is it just supposed to die when the player item dies? I never see the unprepare or finalize callbacks get called in Apple’s example (I added comments to them to check).

  9. Hi, nice post, thanks. I managed to implement an iPod EQ tap processor working and I can confirm that it’s indeed working with remote URLs (M4A/MP3) under iOS 7. There’s now also a sample project from Apple which helped me a lot (just search for “MYAudioTapProcessor”).

  10. Pingback: AVFoundation audio processing using AVPlayer's MTAudioProcessingTap with remote URLs | Technology & Programming

  11. Great article and very well written. Have just started on a big project so found this very helpful. Would love to read the follow up.

  12. Hi Chris, thanks for such a wonderful and descriptive explanation of MTAudioProcessingTap. I have a question: can we introduce a library that processes the raw audio data and applies some enhancements to it? Also, that library’s inNumberFrames is limited to 1024 frames only.

  13. I am trying to get access to the audio data on a remote HLS stream. Using the methods described will not call the callbacks. Did this work for anyone?

  14. Has anyone gotten this working with HLS? I can use this with a remote or local file, but not on an HLS stream. Would love to hear if so…
