Processing AVPlayer’s audio with MTAudioProcessingTap

The MTAudioProcessingTap introduced with iOS 6 is incredibly powerful – it sits at a point in the audio chain of AVPlayer (or any of the other AVFoundation classes that work with AVAssets), which greatly reduces the work required to roll your own signal processing or analysis in an app.

By attaching the simple tap object to a track of your asset, you keep all of AVFoundation’s existing functionality: the system handles demuxing your media container, decoding and rendering the audio, responding to audio interruptions and endpoint changes, and so on – leaving the app developer free to focus on what matters to them: working directly with PCM audio.

There are a bunch of applications for the audio processing tap, including:

  • Displaying a graphical spectrum analyser for a playing song/movie
  • Providing some user-controllable graphical EQ
  • Implementing some preset filters (a bass booster, a karaoke mix that removes the vocal track, etc)
  • Dynamic range compression, limiting, etc
  • Pitch shifting, time dilation, etc

The most important feature of all, however, is this: all of the above works on any local-file AVAsset, including tracks from the user’s iPod library. Applications cannot obtain a file handle to those tracks, so we cannot use our own MP4 demuxer or AAC decoder – even Apple’s own AudioFileOpenURL doesn’t work with the iPod library, and using it would still require us to set up our own AudioQueue (not rocket science, but more work than we’d like). A pleasant side-effect is that our tap code can be distributed as an add-on module for any code that already utilises AVFoundation for its media playback. Rather than asking an app developer to rewrite their entire playback infrastructure around lower-level Audio Queues or Audio Units, they can bolt our processing onto their existing app with ease. Update: Ryan McGrath found a way to get the MTAudioProcessingTap to work with remote streams; he has a nice writeup here: Recording live audio streams on iOS.

When coupled with the AVAssetReader/AVAssetWriter group of classes, we don’t have to throw away the audio that was processed – it can be written back to disk if we were inclined to write some sort of DJ’ing app.
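As a taste of what that could look like, here is a minimal sketch (my own, not from the original post) of pulling the mixed audio back out with an AVAssetReaderAudioMixOutput, assuming the asset, audioTrack and audioMix objects created in the snippets below; the processed sample buffers could then be handed to an AVAssetWriterInput.

NSError *error = nil;
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];

// An audio-mix output lets us hand over the same AVAudioMix that carries our tap.
AVAssetReaderAudioMixOutput *output =
    [AVAssetReaderAudioMixOutput assetReaderAudioMixOutputWithAudioTracks:@[audioTrack]
                                                            audioSettings:nil];
output.audioMix = audioMix;

if ([reader canAddOutput:output]) {
    [reader addOutput:output];
}
[reader startReading];

CMSampleBufferRef sampleBuffer = NULL;
while ((sampleBuffer = [output copyNextSampleBuffer])) {
    // Append the PCM buffer to an AVAssetWriterInput here (or analyse it further).
    CFRelease(sampleBuffer);
}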

The code

We will begin with an AVFoundation base which just sets up an AVPlayer to play us a song that we have loaded into our application’s bundle.

NSURL *assetURL = [[NSBundle mainBundle] URLForResource:@"skyfall" withExtension:@"m4a"];
assert(assetURL);

// Create the AVAsset
AVAsset *asset = [AVAsset assetWithURL:assetURL];
assert(asset);

// Create the AVPlayerItem
AVPlayerItem *playerItem = [AVPlayerItem playerItemWithAsset:asset];
assert(playerItem);

assert([asset tracks]);
assert([[asset tracks] count]);

self.player = [AVPlayer playerWithPlayerItem:playerItem];
assert(self.player);

[self.player play];

Next, we pull the audio track out of the asset and create inputParameters for it, which we then attach (via an AVMutableAudioMix) to the player item’s audioMix. The inputParameters carry the audio processing tap that the AVPlayer will use when it is ready to obtain audio for rendering in its internal Audio Queue. To create the tap, we provide a collection of callback functions – they are where all the heavy lifting is done, and they let us prepare our system for processing as well as tear it down when iOS decides we’re no longer needed (e.g. when we’re done with the AVPlayer and it is released).

// Continuing on from where we created the AVAsset...
AVAssetTrack *audioTrack = [[asset tracks] objectAtIndex:0];
AVMutableAudioMixInputParameters *inputParams = [AVMutableAudioMixInputParameters audioMixInputParametersWithTrack:audioTrack];

// Create a processing tap for the input parameters
MTAudioProcessingTapCallbacks callbacks;
callbacks.version = kMTAudioProcessingTapCallbacksVersion_0;
callbacks.clientInfo = (__bridge void *)(self);
callbacks.init = init;
callbacks.prepare = prepare;
callbacks.process = process;
callbacks.unprepare = unprepare;
callbacks.finalize = finalize;

MTAudioProcessingTapRef tap;
// The create function makes a copy of our callbacks struct
OSStatus err = MTAudioProcessingTapCreate(kCFAllocatorDefault, &callbacks,
                                          kMTAudioProcessingTapCreationFlag_PostEffects, &tap);
if (err || !tap) {
    NSLog(@"Unable to create the Audio Processing Tap");
    return;
}
assert(tap);

// Assign the tap to the input parameters
inputParams.audioTapProcessor = tap;

// The input parameters hold their own reference to the tap, so balance the
// Create call above; without this release, finalize() will never be invoked.
CFRelease(tap);

// Create a new AVAudioMix and assign it to our AVPlayerItem
AVMutableAudioMix *audioMix = [AVMutableAudioMix audioMix];
audioMix.inputParameters = @[inputParams];
playerItem.audioMix = audioMix;

// And then we create the AVPlayer with the playerItem, and send it the play message...

None of this makes any sense without the callbacks for the processing tap. Here are the first four:

void init(MTAudioProcessingTapRef tap, void *clientInfo, void **tapStorageOut)
{
    NSLog(@"Initialising the Audio Tap Processor");
    *tapStorageOut = clientInfo;
}

void finalize(MTAudioProcessingTapRef tap)
{
    NSLog(@"Finalizing the Audio Tap Processor");
}

void prepare(MTAudioProcessingTapRef tap, CMItemCount maxFrames, const AudioStreamBasicDescription *processingFormat)
{
    NSLog(@"Preparing the Audio Tap Processor");
}

void unprepare(MTAudioProcessingTapRef tap)
{
    NSLog(@"Unpreparing the Audio Tap Processor");
}

These functions give us a chance to prepare our processing before the real process() function is invoked.

The init() callback hands back the client data that we provided when setting up the collection of callbacks used to create the tap. We can assign this into our tap storage, which lets us retrieve it again in any of the other callbacks with a call to MTAudioProcessingTapGetStorage(), passing in our tap. The storage is handy whenever we need to keep state between blocks – say, the last few frames of the previous block for a simple time-domain FIR filter (as many frames as the order of the filter), vDSP setup data, or any other user-defined parameters. In this example I just pass through a reference to the UIViewController subclass that contains all of this code. That is hardly an ideal design, but we are focusing on the minimum amount of code needed to get processing happening. A better approach would be a processing class or struct that keeps all of this state packed together, passed in as the client data.
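As one possible shape for that (entirely hypothetical – the LAKETapContext name and its fields are made up for illustration), init() could allocate a small context struct as the tap storage and finalize() could free it:

#include <stdlib.h>

// Hypothetical context kept in the tap storage instead of the raw clientInfo.
typedef struct {
    void  *viewController;   // the clientInfo handed to us at creation time
    float  firHistory[4];    // e.g. samples carried over between process() calls
} LAKETapContext;

void init(MTAudioProcessingTapRef tap, void *clientInfo, void **tapStorageOut)
{
    LAKETapContext *context = calloc(1, sizeof(LAKETapContext));
    context->viewController = clientInfo;
    *tapStorageOut = context;   // retrievable later via MTAudioProcessingTapGetStorage(tap)
}

void finalize(MTAudioProcessingTapRef tap)
{
    free(MTAudioProcessingTapGetStorage(tap));
}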

The prepare() callback gives us a hint about the maximum number of frames we will be fed in each process callback. It also tells us the format of the incoming audio (is it interleaved? floating or fixed point? what bit depth is it?). These parameters matter if we are going to perform FFTs on the data, because Apple’s vDSP library achieves good performance on its transforms by requiring the caller to run its memory allocation/setup functions ahead of time (and those need the block size). That kind of setup code is perfect for the prepare callback.
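For instance, here is a minimal sketch (my own, not from the post) of what a prepare()/unprepare() pair might look like if we intended to run FFTs in the process callback; the file-scope statics and the rounding of maxFrames up to a power of two are assumptions made purely for illustration.

#include <Accelerate/Accelerate.h>
#include <math.h>

static FFTSetup fftSetup = NULL;
static vDSP_Length log2MaxFrames = 0;

void prepare(MTAudioProcessingTapRef tap, CMItemCount maxFrames, const AudioStreamBasicDescription *processingFormat)
{
    NSLog(@"Preparing the Audio Tap Processor");
    // Allocate the FFT weights once, sized for the largest block the tap will deliver.
    log2MaxFrames = (vDSP_Length)ceil(log2((double)maxFrames));
    fftSetup = vDSP_create_fftsetup(log2MaxFrames, kFFTRadix2);
}

void unprepare(MTAudioProcessingTapRef tap)
{
    NSLog(@"Unpreparing the Audio Tap Processor");
    if (fftSetup) {
        vDSP_destroy_fftsetup(fftSetup);
        fftSetup = NULL;
    }
}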

The process() callback is where the real work is done. In this case, our “processing” simply scales the samples down by a fraction defined by a slider in our view, giving us a simple volume control. The scaling is linear, which isn’t particularly intuitive or pleasing to the ear – a logarithmic volume slider would be better. We cheat and use the vDSP framework to multiply the signal efficiently.

#define LAKE_LEFT_CHANNEL (0)
#define LAKE_RIGHT_CHANNEL (1)

void process(MTAudioProcessingTapRef tap, CMItemCount numberFrames,
             MTAudioProcessingTapFlags flags, AudioBufferList *bufferListInOut,
             CMItemCount *numberFramesOut, MTAudioProcessingTapFlags *flagsOut)
{
    // Pull the source audio into bufferListInOut so we can process it in place.
    OSStatus err = MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut,
                                                      flagsOut, NULL, numberFramesOut);
    if (err) {
        NSLog(@"Error from GetSourceAudio: %d", (int)err);
        return;
    }

    // Retrieve the view controller we stashed in the tap storage in init().
    LAKEViewController *self = (__bridge LAKEViewController *)MTAudioProcessingTapGetStorage(tap);

    float scalar = self.slider.value;

    // This example assumes the tap delivers non-interleaved 32-bit float audio,
    // one buffer per channel, so each channel is scaled in place.
    vDSP_vsmul(bufferListInOut->mBuffers[LAKE_RIGHT_CHANNEL].mData, 1, &scalar,
               bufferListInOut->mBuffers[LAKE_RIGHT_CHANNEL].mData, 1,
               bufferListInOut->mBuffers[LAKE_RIGHT_CHANNEL].mDataByteSize / sizeof(float));
    vDSP_vsmul(bufferListInOut->mBuffers[LAKE_LEFT_CHANNEL].mData, 1, &scalar,
               bufferListInOut->mBuffers[LAKE_LEFT_CHANNEL].mData, 1,
               bufferListInOut->mBuffers[LAKE_LEFT_CHANNEL].mDataByteSize / sizeof(float));
}

To grab audio to process, the processing tap provides MTAudioProcessingTapGetSourceAudio, which places the source audio into the bufferList that was passed in. We then process that audio in place. vDSP_vsmul takes a “vector” (in this case, the audio provided to us by the tap), steps along it one float at a time (hence the stride of 1 we passed in), and multiplies each sample by the scalar that we provide.
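The stride arguments are also what would let us cope with interleaved audio. This snippet is purely hypothetical (the tap in this post hands us one buffer per channel); it assumes a single interleaved stereo buffer and would replace the two vDSP_vsmul calls inside process():

// Hypothetical: if mBuffers[0] held interleaved stereo (L R L R ...), a stride of 2
// walks a single channel of the shared buffer.
float *samples = bufferListInOut->mBuffers[0].mData;
vDSP_Length framesPerChannel = bufferListInOut->mBuffers[0].mDataByteSize / sizeof(float) / 2;

vDSP_vsmul(samples,     2, &scalar, samples,     2, framesPerChannel); // left channel
vDSP_vsmul(samples + 1, 2, &scalar, samples + 1, 2, framesPerChannel); // right channel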

We utilise the processing tap’s storage to keep a reference to the view controller, from which we pull the slider’s value to use during processing.
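If we wanted the slider to behave more like the logarithmic control mentioned above, one option (purely a suggestion – this helper is not part of the original code) is to map the 0–1 slider value onto a decibel range before using it as the vDSP scalar:

// Hypothetical helper: map a 0–1 slider position onto a -40 dB..0 dB range and
// convert to a linear gain, so equal slider movements sound like similar loudness steps.
static float LAKEGainForSliderValue(float sliderValue)
{
    if (sliderValue <= 0.0f) return 0.0f;        // hard mute at the bottom of the travel
    const float minDb = -40.0f;                  // quietest audible setting
    float db = minDb * (1.0f - sliderValue);     // 0.0 -> -40 dB, 1.0 -> 0 dB
    return powf(10.0f, db / 20.0f);              // convert decibels to a linear amplitude
}

In process(), the scalar would then become LAKEGainForSliderValue(self.slider.value) instead of the raw slider value.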

In another post I will show code implementing some simple filters using the MTAudioProcessingTap.

26 thoughts on “Processing AVPlayer’s audio with MTAudioProcessingTap”

    • small note, for me (as a new user to AV Foundation) it was quite unclear how to get the right tracks for audio. In the example you show, you just get the first track (index 0) to obtain some audio. I changed that (for now, further research is required) to:

      NSArray *audioTracks = [asset tracksWithMediaCharacteristic:AVMediaCharacteristicAudible];
      AVAssetTrack *audioTrack = [audioTracks objectAtIndex:0];

  1. Pingback: MTAudioProcessingTap « Riks Dev Blog

  2. Great article on a subject very hard to find information on! Works great with m4a, but I can’t get it working with mp3. The processing is failing on MTAudioProcessingTapGetSourceAudio. Any ideas what to change?

  3. Pingback: Issue with MTAudioProcessingTap on device | BlogoSfera

  4. Thanks for this very instructive article, very useful. Apple should really get some documentation out there about this…

    I’m trying to get this to work with remote URLs (HLS streaming) and not having any success, the only indication that this is expected is the note at the end of http://developer.apple.com/library/ios/#qa/qa1716/_index.html (“AVAudioMix only supports file-based assets”).

    Can you confirm this? Your statement “All of the above works on any AVAsset”, may need a mention of this limitation. Can you think of any other way of processing audio data (accessing audio queues) while still using AVPlayer?

    • You’re right – thanks for pointing out the limitation. I haven’t tried this but it seems likely that the feature doesn’t work for remote assets, which is a shame. There are other parts of AVFoundation with similar behaviour – like AVAssetReader/Writer and so on – that only work with local files. I’m sorry if I led you down that fruitless path.

      • Hello Chris,

        Thanks for this wonderful article on tapping audio samples.

        I followed Ryan’s blog @ http://venodesigns.net/2014/01/08/recording-live-audio-streams-on-ios/
        to get the tracks from the player’s current item and succeeded with the snippet below, but got stuck at MTAudioProcessingTap’s “init” callback only. The “prepare” callback never gets called for “m3u8” links.

        For me,

        AVPlayerItemTrack *firstTrack = _audioPlayer.currentItem.tracks.firstObject;

        does return the AVAssetTrack only after we add a status KVO observer and periodic time observer like


        - (void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object change:(NSDictionary *)change context:(void *)context
        {
            if (PlayerStatusObserverContext == context) {

                id newValue = change[NSKeyValueChangeNewKey];

                if (newValue && [newValue isKindOfClass:[NSNumber class]]) {
                    self.previousAudioTrackID = 0;

                    if (AVPlayerStatusReadyToPlay == [newValue integerValue]) {
                        __weak typeof (self) weakself = self;

                        timeObserver = [player addPeriodicTimeObserverForInterval:CMTimeMakeWithSeconds(1, 100) queue:nil usingBlock:^(CMTime time)
                        {
                            weakself.lbl_PlaybackDuration.text = stringFromCMTime(time, nil);
                            weakself.seekBar.value = (float)(CMTimeGetSeconds(time) / CMTimeGetSeconds(dur));

                            if (weakself.isM3U8) {

                                @try {

                                    for (AVPlayerItemTrack *track in [weakself.player.currentItem tracks]) {
                                        if ([track.assetTrack.mediaType isEqualToString:AVMediaTypeAudio])
                                            weakself.currentAudioPlayerItemTrack = track;
                                    }

                                    AVAssetTrack *audioAssetTrack = weakself.currentAudioPlayerItemTrack.assetTrack;

                                    weakself.currentAudioTrackID = audioAssetTrack.trackID;

                                    if (weakself.previousAudioTrackID != weakself.currentAudioTrackID) {

                                        NSLog(@":::::::::::::::::::::::::: Audio track changed : %d", weakself.currentAudioTrackID);
                                        weakself.previousAudioTrackID = weakself.currentAudioTrackID;

                                        weakself.audioTapProcessor = nil;
                                        weakself.audioTapProcessor.audioMix = nil;
                                        [weakself initAudioTap];
                                    }
                                }
                                @catch (NSException *exception) {
                                    NSLog(@"Exception Trap ::::: Audio tracks not found!");
                                }
                            }
                        }];
                    }
                }
            }
        }

        And I’m keeping track of changes to AVAssetTrack’s trackID.

        Everything until MTAudioProcessorTap’s “init” callback is fine, but not “prepare” callback.

  5. Hey guys, I’m looking forward to your next post. I’ve already integrated your example, and I really need EQ because AVPlayer can stream music. Please let me know when the new post will be ready, or if you know of any way to process streaming audio I will be really pleased! Help help help! Thanks

  6. At what point do the unprepare and finalize callbacks get called? Do you have to set self.player = nil or explicitly set the processing taps to nil?

  7. Amazing! I’ve been waiting for an example like this. I’ve been handling my samples via AVAssetReader and a RemoteIO unit, and now thanks to this I’ve reduced my CPU usage from 40% to 6%!!
    Thank you

  8. Late to the party, but how do you tear down the Tap when you’re done with it? It looks like it’s a CF object, and in Apple’s example, after initialization its only reference is passed into the audio mix, where it’s no longer accessible via public methods or parameters. So is it just supposed to die when the player item dies? I never see the unprepare or finalize callbacks get called in Apple’s example (I added comments to them to check).

  9. Hi, nice post, thanks. I managed to implement an iPod EQ tap processor working and I can confirm that it’s indeed working with remote URLs (M4A/MP3) under iOS 7. There’s now also a sample project from Apple which helped me a lot (just search for “MYAudioTapProcessor”).

  10. Pingback: AVFoundation audio processing using AVPlayer's MTAudioProcessingTap with remote URLs | Technology & Programming

  11. Great article and very well written. Have just started on a big project so found this very helpful. Would love to read the follow up.

  12. Hi Chris, thanks for such a wonderful and descriptive explanation of MTAudioProcessingTap. I have a question: can we introduce a library that processes the raw audio data and applies some enhancements to it? Also, that library’s inNumberFrames is limited to 1024 frames only.

  13. I am trying to get access to the audio data on a remote HLS stream. Using the methods described will not call the callbacks. Did this work for anyone?

  14. Has anyone gotten this working with HLS? I can use this with a remote or local file, but not on an HLS stream. Would love to hear if so…
