Introduction to Voice Over – An Audio Perspective

Voice Over History Infographic

So, you want to get into voice over. You have a great voice, or so you’ve been told. You’re likely an actor and want to delve into the art of using your voice to broaden your creative boundaries, and likely make more money. Great!!!

This article was written to aid the beginning voice artist in understanding the audio side of the business.

I’ve worked in the industry now for several years and often get approached with similar questions. How do I build a studio? Can I really record at home? What microphone should I buy? What’s a sample rate? How do you use a compressor?

People entering into any industry have many questions about technique, hardware, best practices and the obvious questions about how to get ahead. While I won’t spend time discussing the finer points of acting technique or how to book your first job, I will discuss at great length the equipment and practice needed to set up, use and maintain a home studio. At the very least, by reading this article you should have a better understanding of audio and how it’s created, edited and manipulated.

So who am I? My name’s Mike Varela. I’m an audio engineer working in Los Angeles. I work in the world of audio post production, mostly TV. I also coach and engineer at the Screen Actors Guild Foundation’s Don LaFontaine Voice Over Lab (a mouthful, I know). I’ve been working daily with voice artists now for 6 years, and it’s this breadth of experience, coupled with my love for post and audio engineering in general, that has led me to a deeper understanding of how the voice industry and its individual actors work. I’ve put together this article mostly because I get asked similar questions, usually on a daily basis and almost always about audio recording and its related equipment. Hopefully through my experience and this article, you’ll gain a deeper understanding of audio for voice over.


Before we get started though, let’s delve into a history lesson.

Voice over has been around for a while, in fact, since Walt Disney decided to voice Mickey in Steamboat Willie. Back then audio recording was only available in larger studios, and it stayed that way until cheap 2-track and reel to reel machines were released to the general public. Even with this availability, most VO artists still had to visit a professional studio, radio station or their agent’s office to lay down a take. Not until the wider adoption of the personal computer did home recording become a possibility. Now the personal computer, the tablet and really computing in general are available for pennies on the dollar compared to what came before.

So what did come before?


Audio recording itself goes way back. In fact, as far back as 1857 with the introduction of the Phonautograph, a device that traced the sound waves it heard into soot on paper. The problem here was that the user couldn’t play back the audio; it was for study only.



Then, in 1877, Thomas Edison created the Phonograph, which used a needle to change (transduce) air pressure into grooves etched onto a rotating cylinder. At this point the reproduction of recorded sound was possible. It was a major turning point for music and speech.



Later, in 1889, Emile Berliner created a variation of the Phonograph with flat platters. He called his invention the Gramophone.



The use of discs meant cheaper production costs, easier transportation and storage and even a slight boost in volume. It would be this iteration of acoustical recording that would win out in what could be considered early “format wars”. By 1910 disc sales had beaten the cylinder and it was clear that the Gramophone had won – much to Edison’s dismay (and his ears). He went down still crediting the cylinder as the more transparent medium.

During these years acoustic recording worked by capturing sound waves through a funnel, which transmitted mechanical (air pressure) energy to a needle that etched corresponding waves onto a medium. These etchings were an ‘analog’ to the original sound, hence the term. The problems that faced acoustic recording were often distance and power (air pressure). To record sound one needed a clean, but loud, signal at the mouth of the cone. Enough air pressure needed to be present to work the needle into usable grooves in what started as tin and later became wax.

Early acoustic recording

A recordist at this time was often relegated to moving people around a room in a configuration that balanced the output volume of their instruments against their distance to the cone and to the other artists. He would then often crank the handle himself to rotate the disc and record the performance.


Around the same time acoustic recording was getting off the ground, a new technology was being developed. In 1906 the first vacuum tube able to amplify weak electrical signals was created. By 1915 telephone companies began using this technology to boost telegraph signals and later, of course, telephone ones. During this time engineers began developing the first microphones, devices used to transduce (convert / change) air pressure variations into electricity. In 1920 the first “electric recording” was made.

Electric recording

Over the years microphone technology would grow and the designs would evolve. In all cases, however, the job of the microphone is the same: take air pressure (mechanical energy – analog sound) and change (transduce) it into electrical signals (still analog) so it can be recorded to some medium. Until the 1940’s this was still wax platters (records).

One of the first designs, still used today, is the moving coil, or dynamic, design.

Dynamic Mic Diagram

In this design air pressure enters the mic head and pushes on a thin plate, the diaphragm. This diaphragm is attached to a coil of copper wire that sits around a magnetic core. When a vocalist speaks, air pressure pushes against the diaphragm and thus the coil. As the coil moves back and forth over the magnetic core a small electrical voltage is created, which travels down two lead wires, one positive and one negative. This signal is sent to an amplifier (a pre-amplifier, or preamp) where it is amplified (using tubes in earlier times) to a level usable for recording or listening.

In the late 1940’s a company called Ampex began manufacturing magnetic tape recorders and a new era of recording was about to get under way. The tape medium had been around for some time, in fact the Germans had been experimenting with it since the 1920’s, but the war and the abundance of gramophones meant that tape waited till about 1950 to be adopted.

With the adoption of tape the world stepped away from the inherent issues with acoustic recording. While amplification made recordings to disc better, the medium was still evolving, and issues in the signal chain and electronics in general meant that recordings during the early years were fraught with sonic defects like extra top end (hiss), lack of bass and a serious deficit in dynamic range.

Magnetic tape offered a cleaner sound and more expressive dynamic range. It also offered the ability to edit. Now, for the first time, a recordist could cut and splice tape together to create or fix audio from other recordings. This was also the first time that overdubbing became possible. Overdubbing is the process of recording to the same medium while listening to a previously recorded take and even, at times, punching new audio over old. We can thank the guitarist Les Paul for this.

Because most recordings at this time were being made with microphones, tape made perfect sense. An electrical signal was sent from the mic down a wire to the preamp and finally stored on tape as a pattern of magnetization. When tape is played back, that magnetization creates an electrical signal again, which is passed into two lead wires and into a speaker, which is really just a dynamic microphone in reverse.

Speaker Diagram

Magnetic tape also allowed stereo audio to become widespread. The idea of stereo, a sound source recorded with two microphones at the same time to capture movement between and around them, was first demonstrated in Paris in 1881. Although in use, stereo wasn’t made popular until the introduction of tape. In fact, early adopters of tape and even HiFi record systems in the 1950’s startled neighbors with recordings of trains passing from speaker to speaker. At this same time the industry moved largely from amplification via vacuum tubes to the newly created transistor, which was less power hungry and cheaper to produce. This made home recording and listening systems cheaper and easier to maintain.

Tape continued to dominate the market in professional circles all the way through the 50’s, 60’s, 70’s and 80’s. Reel to reel and large track count tape machines enabled full band recording, overdubbing and duplication without much signal loss. Then, in 1982, we got a glimpse of the future with the arrival of the Compact Disc and digital audio for the masses.


Digital recording is a paradigm that breaks the century-long marriage to analog recording. During the acoustic, electrical and tape eras, sound captured to a medium was a direct “analog” of the original signal, either an electrical charge or etched grooves that mirror the mechanical air energy that created them. Digital recording, however, breaks this tradition by recording an approximation of the signal at regular intervals in time.


In digital recording the front end of the signal chain remains the same. A recordist captures sound by placing a microphone in front of the source. That sound pressure hits the capsule of the microphone, gets converted into a weak electrical signal, travels down the mic cable to a pre-amplifier where it’s amplified (now mostly with transistors) and finally into a soundcard which houses an ANALOG to DIGITAL converter (AD). This AD takes a reading at specific intervals (the sampling rate) and converts the voltage level to binary code (e.g. 1001110100001110010110). Each sample then gets stored in series on the computer’s hard drive.

The sampling rate of audio is somewhat analogous to a film camera. When a director shoots a scene, a roll of film is run past the camera’s aperture at a specific rate (24 frames per second). Light enters this aperture and gets recorded onto the moving film at, effectively, 24 pictures per second. Then, when played back at the same speed, a light projects the moving images onto a screen, recreating the scene. Another way to think about digital recording is as a flipbook, those fun books you either bought or made as a kid. When holding the book you’d notice the many pages of single pictures, and as you thumbed through the book with some speed those individual images became a movie of sorts. Digital recording, and specifically the sample rate, works just like this.
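If you like to see ideas as code, here is a minimal Python sketch of what "taking a reading at specific intervals" means. It is only an illustration: the function name sample_sine and the 1,000 Hz test tone are my own assumptions, and a pure sine wave stands in for the voltage coming from a mic.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Measure a sine wave at evenly spaced instants, like frames of film."""
    n_samples = round(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

# One millisecond of a 1,000 Hz tone captured at 48,000 samples per second.
samples = sample_sine(freq_hz=1000, sample_rate_hz=48000, duration_s=0.001)
print(len(samples))   # how many snapshots were taken in that millisecond
print(samples[:4])    # the first few readings
```

Raise the sample rate and you get more snapshots of the same millisecond, just as a faster frame rate gives a camera more pictures of the same scene.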

How a computer measures different sample rates


When audio (as voltage, and analog mind you) enters a soundcard’s analog to digital converter, the voltage is read and a sample (picture) is taken at that moment. That picture (sample) is converted to binary code and stored in a binary word of some user determined length (the bit depth). The common bit depths, or word lengths, we use these days are 16 and 24. This means each sample is stored as a string of 16 ones and zeros, or 24.

16 Bit = 0000000000000000

24 Bit = 000000000000000000000000

A diagram displaying both sampling rate and bit depth – This is a 16 bit word (file)

Let’s discuss the image above. You can see the graph is drawn with 16 horizontal divisions. These stand in for our bit depth, or word length (a real 16 bit word allows far more levels than can be drawn, 65,536 of them). When the sound wave (the red line above) enters the AD converter, the soundcard begins taking snapshots of the voltage level at the sampling rate and rounds each value to the nearest horizontal division. These are the blue dots on the sound wave – the samples.
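The rounding step can be sketched in a few lines of Python. Treat this as a simplified model, not how any particular soundcard does it: the quantize function is hypothetical, the voltage is assumed to run from -1.0 to 1.0, and the levels are numbered from zero upward (real converters normally store signed numbers).

```python
def quantize(voltage, bits):
    """Round a voltage in the range -1.0..1.0 to the nearest of 2**bits
    levels and return (level number, binary word)."""
    levels = 2 ** bits
    # Map -1.0..1.0 onto 0..levels-1, then round to the nearest whole level.
    index = round((voltage + 1.0) / 2.0 * (levels - 1))
    # Anything outside the expected range is held at the top or bottom level.
    index = max(0, min(levels - 1, index))
    return index, format(index, "0{}b".format(bits))

print(quantize(0.3, 16))   # one sample stored as a 16 bit word
print(quantize(0.3, 24))   # the same voltage stored as a 24 bit word
```

The same voltage lands on a much finer grid at 24 bits than at 16, which is the whole point of a longer word.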

This is an important concept. Computers store, work with and calculate things as discrete events and thus not in real time. If you remember from before, acoustic and electrical recording work with real-time analog signals (voltage or air pressure). When we use a computer to record we need to adapt to the non real-time nature of computing. To do this we need to convert our real-time audio signal to a discrete set of numbers. This is why we sample audio.

Bit depth can be a hard concept to understand but the general idea is fairly simple. I usually explain this concept with crayons. If you remember back to the first grade, you likely started school with a box of very large crayons, possibly 10. Now if we put aside your artistic abilities at this age, it’s likely you could have done some pretty great work with 10 colors. A few years later, you grew a bit and finally made it to 3rd grade. This meant a school supply trip to the store where you likely picked up a nice large box of crayons once again, though this time there were 50 or 100 in the box. Now you had silver, fuchsia, gold, a multitude of blues and maybe desert salmon (ok, maybe not). This metaphor relates well to audio bit depth. A lower bit depth is like having only 10 crayons. Your allowable color depth or range is limited by the number of colors, and since you can’t melt the crayons and create new colors, you’re stuck with 10. However, give someone 100 colors and the resulting illustration could be more precise or artistic. It turns out volume works this way too – to a degree. A larger bit depth means that the artist can be more expressive, both in terms of quality of audio and in dynamics (loud and soft volume). The idea here is the higher the bit depth, the better the quality of audio.

Super Mario Bros. – An 8 bit game.

Back in the 80’s, when computer audio was in the hands of nerds, the consumer video game console began showing the public what digital sound could be like. Yes, in 1982 the CD arrived, but it wasn’t until the late 80’s and really the 90’s that it took hold en masse. Back when Super Mario Bros. was released for Nintendo, digital sound often sounded like said game. Bleeps and blops. In fact, Super Mario Bros. was an 8 bit game and so the sound followed suit. In the 8 bit world, volume can only be expressed in 256 values, or levels. This might sound like a lot, but anyone who’s played 80’s video games knows it doesn’t sound like professional music. When game systems began taking advantage of faster processors they started using 16 bit chips for audio, and the number of volume values allowed jumped from 256 to 65,536. Now, for the first time, the sound of a game matched the sound of the instrumentation. Actually, CDs had been on the market for a little while by then, and the standards bodies had agreed that the CD standard would be 16 bit audio at a sample rate of 44,100 Hz.
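Where do 256 and 65,536 come from? Each extra bit doubles the number of values a word can hold, so the count is simply 2 raised to the bit depth. A quick Python check (the function name volume_levels is just mine for illustration):

```python
def volume_levels(bits):
    """Number of distinct values a binary word of this length can hold."""
    return 2 ** bits

for bits in (8, 16, 24):
    print(f"{bits} bit: {volume_levels(bits):,} levels")
```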

Confused still about sampling? Well, let me go further down the hole. When sampling audio it’s important to note that to be able to reconstruct the waveform correctly, a soundcard needs to sample at least twice for each oscillation of the wave, meaning that we need to capture the higher part of the sound wave as well as the lower part. Two samples per wave means the sample rate has to be at least double the highest frequency we’d like to capture. This nice bit of science can be attributed to the Nyquist-Shannon sampling theorem.


Humans have two ears and the ability to hear a range of frequencies and volumes. In fact, humans are able to hear a spectrum of frequencies from 20 cycles per second to 20,000 cycles per second. Heinrich Hertz was a physicist who proved the existence of electromagnetic waves, and so we now call cycles per second Hertz, or Hz for short. So to repeat, the human frequency range is 20 Hz to 20,000 Hz. That means we can hear cycles (waveforms) from 20 waves per second to 20,000 waves per second. Sounds pretty good right? Well, dolphins, dogs, elephants and even bats can hear outside our range. In fact, bats can hear all the way up to 200,000 Hz, so there.

When we decided to begin sampling audio we might have taken our upper limit of hearing, 20,000 Hz, and used that as the rate. But remember Nyquist/Shannon from earlier? Harry Nyquist and Claude Shannon showed that in order to capture sound for playback that is close to what was heard during capture, at least two samples need to be taken of each wave. Ok, so let’s double the rate then – 40,000 Hz. Great. But… when deciding what rate to choose during the introduction of digital audio to the world, people remarked on the ability of some people to hear a little above 20,000 Hz. The standards bodies settled on an upper limit of 22,050 Hz (a little safety room never hurt anyone, right?) when setting the first mass used digital media – the Compact Disc. Doubling 22,050 Hz got us to 44,100 Hz and that’s where we’ve been ever since.
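Here is a small Python sketch of why that "two samples per wave" rule matters. The function names and the 30,000 Hz test tone are my own choices for illustration. It shows that a tone above half the sample rate produces exactly the same samples as a lower tone, so the soundcard has no way of telling them apart.

```python
import math

def highest_capturable_hz(sample_rate_hz):
    """Half the sample rate: the top frequency that still gets two samples per wave."""
    return sample_rate_hz / 2

def sample_tone(freq_hz, sample_rate_hz, count):
    """Take `count` readings of a cosine tone at the given sample rate."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]

cd_rate = 44100
print(highest_capturable_hz(cd_rate))   # 22050.0

# A 30,000 Hz tone is above that limit. Its samples match those of a
# 14,100 Hz tone (44,100 minus 30,000), which is what you would hear back.
too_high = sample_tone(30000, cd_rate, 8)
impostor = sample_tone(cd_rate - 30000, cd_rate, 8)
same = all(abs(a - b) < 1e-9 for a, b in zip(too_high, impostor))
print(same)
```

This false lower tone is called aliasing, and it is the reason converters filter out anything above half the sample rate before sampling.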

Sample Rate and Reconstruction

Sampling, like bit depth, also benefits sound capture and playback if its numbers are increased. When film entered the digital world, the industry settled on 48,000 Hz as the standard, and although 16 bit remained the norm for many, many years, 24 bit audio is now most common for film and TV.

You might ask, why not go even higher? Well, some do. People report they can hear differences at higher sample rates. I call bluff here on one major account: the upper limit of human hearing is around 20,000 Hz. A sample rate of 96,000 Hz captures frequencies up to half that, or 48,000 Hz, and we simply can’t hear that high. It’s just not possible. The argument for larger sample rates comes from downstream processing. If anyone is going to stretch, manipulate or augment the audio with digital processors, it helps to have twice as many samples to work with. The other argument comes when we speak of future proofing our recordings. Popular music is often recorded at 96,000 samples per second and who knows, in the coming decades, maybe auditory implants will allow us to hear above our normal range. But recording at this rate has its drawbacks too. For one, file sizes get really big very fast. Second, you’re going to have to convert the audio to a sampling rate that is consistent with the industry and playable by most computers.
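To put a number on "really big very fast", here is a rough Python estimate of uncompressed audio size. It is a back-of-the-envelope sketch: the function name is mine, and it counts only the raw audio data, ignoring the small file header.

```python
def audio_data_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Raw size of uncompressed audio: samples x bytes per sample x channels."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds

# One minute of mono voice at two different settings.
cd_style = audio_data_bytes(44100, 16, 1, 60)
hi_res = audio_data_bytes(96000, 24, 1, 60)

print(cd_style)   # 5292000 bytes, roughly 5 MB
print(hi_res)     # 17280000 bytes, roughly 17 MB
```

The same minute of voice takes more than three times the space at 96,000 Hz and 24 bit, before you have recorded a second take.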

It’s also possible to record at higher bit depths as well. We spoke of 8 Bit and 16 Bit and even 24 Bit, so what’s next? Well, 32 Bit is already here. The argument for a larger bit depth is similar to the sampling one. More bits in the digital word means we can augment the file more without as many distortions. As was mentioned before, bit depth is related to an expressible range of volume. We discussed a range of hearing in terms of frequency, that 20 Hz to 20,000 Hz spectrum. There’s a range of volume too. Humans with good ears can hear a range of about 130 decibels.


What is a decibel? A decibel is a unit of measurement in sound that often details power or intensity. It’s a way to numerically describe a level of volume, or sound pressure level. The dB (decibel) scale has many variants, but when dealing with acoustic audio we often speak in a scale that puts 0 as the threshold of hearing (quiet) and about 130 decibels as the threshold of pain (loud).

In the digital world, the scale is flipped and we often speak of volume in dBFS, or decibels relative to full scale. In dBFS, the ceiling is measured as 0 and all volume is read out in negative numbers below that. Inaudible sounds float around -120 dBFS or so where normal speech often sits around -70 dBFS. Music is mastered to around -1 dBFS peak with averages (RMS) in the -12 dBFS range, while a film’s dialog usually sits around -27 dBFS or so. In dBFS, the closer to 0 you get the louder the sound is. Hitting 0 dBFS in the audio waveform is said to be going full code. Think about this for a moment. Remember that bit depth is coded as a series of 1’s and 0’s. Any audio signal can be converted into a binary code, in say 16 bit, to some value among the 65,536 that are available. But what if we over-saturate the analog to digital converter with signal that’s louder than it can handle? How do we represent this in binary code?
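For the curious, a dBFS reading can be worked out from a raw sample value with a little math. The Python below is a sketch under a common convention that I am assuming here: samples are signed numbers and full scale for a 16 bit word is taken as 32,768. The function name dbfs is mine.

```python
import math

def dbfs(sample_value, bit_depth=16):
    """Level of a single sample relative to full scale, in decibels."""
    full_scale = 2 ** (bit_depth - 1)   # 32768 for a signed 16 bit word
    if sample_value == 0:
        return float("-inf")            # digital silence has no level at all
    return 20 * math.log10(abs(sample_value) / full_scale)

print(dbfs(32768))   # the ceiling, 0 dBFS
print(dbfs(16384))   # half of full scale, about -6 dBFS
print(dbfs(1))       # the quietest 16 bit value, about -90 dBFS
```

Notice that halving the sample value only drops the level by about 6 dB, which is why the meters in your recording software spend so much time in negative double digits.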

1111111111111111 – that’s how. This represents the uppermost limit of codifying a signal, or going… full code. At this point, all audio above this level isn’t recorded and the resulting waveform has a flat, squared-off top, which sounds like a square wave, which sounds like unpleasant distortion.

In the analog audio world of tape or vinyl, it was often pleasing to push the medium past its limits. In fact, saturation is a tonally pleasing thing. But in the digital world, saturation is the worst kind of sound. Luckily, by recording at 16 or 24 bit word lengths, recordists are given a large dynamic range in which to record their sounds. So it’s fairly safe to record at a lower level.
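You can watch that flat top appear with a few lines of Python. This is a simplified model with names of my own choosing: the converter’s ceiling is treated as 1.0 and a sine wave is pushed 50% past it.

```python
import math

def clip(value, ceiling=1.0):
    """Hold anything beyond the ceiling at the ceiling, as a converter would."""
    return max(-ceiling, min(ceiling, value))

# One cycle of a sine wave that is 50% too loud, measured 16 times.
too_hot = [1.5 * math.sin(2 * math.pi * n / 16) for n in range(16)]
recorded = [clip(v) for v in too_hot]

print([round(v, 2) for v in recorded])
print(recorded.count(1.0))   # how many samples are stuck at the ceiling
```

Several samples in a row sit at exactly the same maximum value. The rounded peak of the wave is gone for good, and no amount of turning it down afterwards brings it back.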

Let’s revisit the human range of hearing and bit depth. We left off wondering why we don’t often record at higher bit depths. First and again, file size gets bigger. But more importantly, bit depth is related to dynamics, or volume expression. A human has a range of about 130 dB. When recording 16 bit audio, we’re given a range of 96 dB, which is a good range of volume and has been the standard for compact discs for 30 years. We also record in 24 bit, giving us a range of about 144 dB, which turns out to be more than the range of our hearing. So we’re now left with the question of why use more disk space and processing power when we’re not able to hear the difference? Well, aside from marketing gimmicks and the valid argument of capturing an insanely wide dynamic range of audio, in the end it’s not really worth it. In fact, we’d need to truncate the bit word down to a usable size anyhow for transmission and delivery. For most voice over, 16 bit is perfectly acceptable, and in the case of audiobooks, it’s mandatory. In the film world, it’s 24 bit all the way.
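Those dynamic range figures fall straight out of the bit depth: every bit adds about 6 dB. A short Python check, using the standard 20 x log10 decibel formula for amplitude (the function name is mine):

```python
import math

def dynamic_range_db(bits):
    """Ratio between the largest and smallest value a word can hold, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits} bit: about {dynamic_range_db(bits):.0f} dB")
```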

Let’s recap some general numbers that you should be familiar with and use when recording.

  • Standard voice over, compact disc audio and audiobook recordings should be at 44,100 Hz Sampling Rate and a word length of 16 bits.
  • Film, and video in general, mandates a sampling rate of 48,000 Hz and a word length of 24 bits.
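If you ever need to set these numbers by hand, this is roughly what they look like in code. The sketch below uses Python’s built-in wave module to write a one second test tone at the standard voice over settings, 44,100 Hz and 16 bit. The function name, the 440 Hz tone and the half-volume level are all my own choices, and the file is written to memory rather than to disk so nothing is left behind.

```python
import io
import math
import struct
import wave

def write_test_tone(target, sample_rate_hz=44100, seconds=1.0, freq_hz=440.0):
    """Write a mono, 16 bit WAV test tone to a file path or file-like object."""
    n_frames = int(sample_rate_hz * seconds)
    peak = int(32767 * 0.5)   # half of full scale, about -6 dBFS
    frames = b"".join(
        struct.pack("<h", int(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)))
        for n in range(n_frames)
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)                # mono, one voice
        wav.setsampwidth(2)                # 2 bytes per sample = 16 bits
        wav.setframerate(sample_rate_hz)   # 44,100 samples per second
        wav.writeframes(frames)

buffer = io.BytesIO()
write_test_tone(buffer)

# Read the header back to confirm the settings that were stored.
buffer.seek(0)
with wave.open(buffer, "rb") as check:
    params = check.getparams()
print(params.framerate, params.sampwidth * 8, params.nchannels)
```

In practice your recording software sets all of this for you in its session or export settings; the point is only that sample rate and word length are two plain numbers stored at the top of the file.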


Now that we’ve got the theory out of the way, let’s begin our discussion of hardware. It’s best to move along the signal chain. We start with you, a VO artist, and your vocal cords. Get a full night’s sleep, it helps.

We then come to the microphone.

A mic is at most a capture device. It can also be iconic, a statement of wealth, just another hardware unit in a long line of cool stuff or the device you share more moments with than the outside world. In any case, choosing a mic is like choosing shoes, or colors for your house, or breakfast, or maybe just what you can afford. Mics come in all flavors and dollar amounts.

While the argument for cost equalling quality can be made, I believe it holds only up to a point. Ideally one should test their voice with many mics, trying a large number of styles and price ranges, and only then decide what’s best based on what is heard. But alas, that’s not always possible. I’m not going to discuss what brand or model is best for voice over, because there isn’t one. In fact, the location where you record has more effect on the sound than the mic does. But it would be best if we begin our discussion with the types of microphones that are available.


We briefly discussed the dynamic, or moving coil, design earlier. I’d suggest you revisit the diagrams if it’s a little fuzzy for you.

The dynamic mic is a favorite of locations where movement is necessary. Places like concerts, stages, music halls, radio stations, churches. In almost all these scenarios, one is holding the mic itself and possibly moving around a stage. The distance to capsule often changes frequently and the vocal training of the user is often questionable. These mics are built to take a beating. They are often very rugged and built to be moisture resistant.

Shure SM58 – The most popular dynamic mic

Dynamic mics don’t need extra power to record audio. You can simply plug the mic into a preamp and begin to speak. However, due to the physical nature of moving a coil around a magnet, some resistance to sound pressure is built in. This translates to a less sensitive mic capsule and one that needs a little more oomph at the preamp stage. It also means that quieter sounds aren’t registered as loud, things like the high frequencies in a voice. But this isn’t always a bad thing. This also means that the noise floor, or quieter room sounds, aren’t registered as loud either – this is a good thing. And, this is usually the mic of choice when needing to record very loud sounds like gun shots or plane bys.

A quick note on that preamp you’re plugging the mic into. In a well designed preamp the maker has paid special attention to the mic input, the transformers and the wiring. With a cheaper soundcard that has a built-in preamp, the manufacturer has to compromise somewhere. Some focus on the converters, some on design, some even on power management. In most cases, however, the gain applied to the mic signal is not linear. Often as a user turns the gain knob, the volume rises gently at first and then climbs steeply near the top. In fact, many soundcards have a point, somewhere around 90-95% of the dial rotation, where they simply boost the signal by a large amount. For most mics this isn’t a problem. But dynamic mics need a little more juice to register, often bringing the preamp gain into this last 10%. If you’re in the dreaded crossover zone you’ll notice an unhealthy jump in volume as you turn the knob. Luckily, better preamps avoid this with great design, but they can be expensive. There is a solution, however, that helps with this issue: the Cloudlifter by Cloud Microphones.



The Cloudlifter is a small inline device that sits in your mic signal path. You plug your mic cable into one end and another XLR cable into the out and then into your soundcard. It runs off the phantom power from your preamp without passing that power on to the mic, reduces small amounts of line noise and boosts the signal by up to 25 dB, which is perfect for low output mics like dynamics, low output preamps and even long cable runs. If you’re looking into a dynamic mic, I’d suggest pairing it with one of these.
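How much is 25 dB, really? Decibels are a ratio, and for voltage the conversion is 10 raised to the power of (dB divided by 20). A quick Python sketch, with a function name of my own:

```python
def db_to_voltage_ratio(gain_db):
    """Convert a gain in decibels to a plain voltage multiplier."""
    return 10 ** (gain_db / 20)

print(round(db_to_voltage_ratio(25), 1))   # 17.8
print(round(db_to_voltage_ratio(6), 1))    # 2.0
```

So a 25 dB lift makes the mic signal almost 18 times larger before it ever reaches your preamp, which means the gain knob can stay well away from the top of its travel.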


The condenser mic (or capacitor mic in the UK) is a studio mic. It’s often the choice of recordists for things that are delicate, like voice, strings, pianos etc. The fundamental design of this type of mic differs greatly from the dynamic one.

Condenser Mic Diagram

If we look at the diagram above you’ll notice the movement of air indicated by the arrows. This is your voice. The first difference between a dynamic mic and a condenser one is power. A condenser mic needs power, 48 Volts actually. This almost always comes from your preamp, or from your soundcard if your preamp is part of it. There is usually a button on the front/back of your preamp called +48, or Phantom, or 48v. To get this mic to work, you’ll need to flip that switch or push that button.

When you supply power to the mic, the juice actually travels up the mic cable itself, via 2 wires. Because this power isn’t supplied by plugging in a wall jack or extra mic plug, it’s often unseen, and thus called phantom power.

When power is supplied to the mic, the “Diaphragm” is electrified, creating a field between it and the “Backplate”. This small electrical field stays at a steady level and only fluctuates as pressure is applied to the diaphragm. When you speak, you put pressure on the diaphragm and the gap between it and the backplate shrinks. This change is registered as a voltage, and it operates for both pressure on the plate and the return of that plate to its “normal” position. This whole concept is called capacitance, hence the Brits naming the device a capacitor microphone.

I usually explain this better with a pitcher of water. Let’s say you fill a pitcher of water up about 2/3rds. Then, you make a fist with your hand and plunge it into the pitcher. As you push down, the water level rises in proportion, and conversely, falls in similar amount as you pull your hand out. This displacement (movement) can be calculated and measured. When you speak into a condenser mic, you’re affecting (displacing) the charged field in a measured amount which gets converted to a voltage and then travels down the mic cable to your preamp.
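For those who prefer numbers to pitchers, the same idea can be sketched with the textbook formula for two flat plates holding a fixed charge. To be clear, this is an idealized model and every figure in it (the diaphragm area, the gap, charging to 48 volts) is a made-up illustration, not the spec of any real mic.

```python
EPSILON_0 = 8.854e-12   # permittivity of free space, in farads per metre

def capsule_voltage(charge_coulombs, area_m2, gap_m):
    """Voltage across an idealized two-plate capsule holding a fixed charge."""
    capacitance = EPSILON_0 * area_m2 / gap_m
    return charge_coulombs / capacitance

# Hypothetical capsule: about a one inch diaphragm with a 25 micron gap.
AREA = 5.0e-4
REST_GAP = 25e-6
charge = 48.0 * EPSILON_0 * AREA / REST_GAP   # the charge that gives 48 V at rest

at_rest = capsule_voltage(charge, AREA, REST_GAP)
pushed_in = capsule_voltage(charge, AREA, REST_GAP * 0.99)   # gap shrinks by 1%

print(round(at_rest, 2), round(pushed_in, 2))
```

In this simple model a 1% change in the gap gives a 1% change in voltage, so the electrical signal follows the movement of the diaphragm, just as the water level follows your fist.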

Neumann u87

The condenser mic is often more sensitive to sound pressure and reproduces higher frequencies with more detail, and it owes this ability to the design of its capsule. Because the diaphragm is floating in this electrical field, it’s much easier to move. This is in contrast to the dynamic principle of moving a diaphragm and coil around a magnet.

Because the mic is more sensitive to sounds, it’s also more apt to be affected by handling movement. While it’s perfectly acceptable to handle and move the mic, more attention should be put into its care.


The ribbon mic has been a staple in music recording now for decades. It’s often found in front of brass instruments and is credited with a smooth, silky sound. Ribbon mics work on a design similar to dynamic mics.

Ribbon Mic Diagram

The diagram above shows the design of a ribbon mic. Here we see a central piece of aluminum floated in a magnetic field between two poles. The ribbon itself has two leads (wires), one protruding from each end. As a user speaks, the very thin piece of metal moves back and forth in the field, and how fast it moves is what gets registered as a voltage. This is in line with the other name given to these mics, the velocity mic. You might notice from studying the diagram that due to the flat and unimpeded nature of a ribbon between two magnetic poles, it might be possible to speak into this mic from the front or back of the unit – and you’d be right. I mentioned the design was close to a dynamic one due to the magnetic nature of the voltage creation. In both, one is moving something around or through a field to generate electricity.

Ribbon mics for the longest time were expensive and had one large, crippling and horribly destructive caveat: phantom power. Any power applied to the mic via the mic cable, like phantom power (that 48 volts), would fry (melt) the ribbon, rendering the unit useless. I mention this because it’s often the case that engineers leave phantom turned on most of the time. The power supplied is obviously necessary for condenser mics, and it doesn’t hurt dynamics. Ribbons, on the other hand, fatally fry when in contact with this power. Over the years mic makers have taken note of this and lately you’ll find a new breed of ribbon called ‘Active’ on the market. These are safe for use with phantom.

AEA 44

Although ribbon mics are found in music studios the world round, you won’t often find them in voice over studios. Personally I’ve found they take more care than I’m prepared to put in for a studio workhorse like a vocal mic.


The USB mic is an invention born at the intersection of analog and digital. During the early 2000’s the market for podcasting and video chat began heating up and mic manufacturers took notice. Although studios have been using analog mics for years, the new crop of home users often didn’t want to invest in or learn how to work peripheral studio gear. Enter the digital mic.

On the face of it, the mic itself is actually analog. The capsule, pressure plates and wires are all from the analog style condenser mic system. However, as the signal leaves the main diaphragm and travels down the wires it enters a purpose-built AD (analog to digital) converter. The mic converts the voltage coming from the capsule into binary code right in the body of the microphone, and that code then travels down the attached USB cable to your computer.

The USB mic has made waves in the podcasting market and has found equal fervor in the voice over market. Voice over artists don’t need a lot of studio gear; in fact, they really only need a single input and headphones. The USB mic provides this at a reduced cost in comparison to the analog route.

Audio Technica AT 2020 USB+. A great USB mic.

USB mics are often cheaper, mostly under $200. They work equally well on Mac and PC systems and usually come with a simple tripod stand for desk use. But, due to the cheaper design and purchase price, some quality is often lost as well.

Studios round the world almost never use USB mics to record talent. This is due mostly to an already embedded analog architecture of wires, mixers and a closet of already purchased mics, but there are two other strong reasons as well. For one, the length of a USB cable is limited to about 16 feet and, when recording audio, should never be extended via a USB hub. Doing so often results in clicks and pops in the recorded audio, caused by data packets that are lost or arrive too late to be used in real time.

The other main reason for choosing analog is a sound one. Analog mics, those of value mind you, are often manufactured to a higher sonic standard and by nature perform only one function: the capture of sound. USB mics, on the other hand, both record and convert, and it’s this cohabitation of signals and microchips that often leads to a loss in quality and, more importantly, elevated self-noise. All electronics exhibit self-noise; it’s a byproduct of electricity and its related components. However, USB mics usually exhibit more of it, due in part to the proximity of analog and digital circuits and sometimes to the choice of cheaper parts. As mentioned above, these mics are often priced at or below $200. For this, one gets not only an analog mic but also a digital converter and, of course, a box with full color printing and marketing. This price usually isn’t enough to cover quality components in all these areas.

I want to make a quick note, however, on the rising quality of USB mics. Manufacturers are taking notice and lately putting more care into design and components. Many models now sound very good. When I discuss studio setup with people, I usually bring up the type of work they plan to do. If a VO actor is interested in quickly recording auditions and doesn’t want to fuss with extra gear, often goes into an agency booth to record, or doesn’t much care to set up a proper studio at home, then the USB mic is a great choice. It’s also a great starter mic and, if and when you decide to upgrade, it becomes a great portable choice for travel.

If an actor is interested in audiobooks, singing or setting up a proper studio to handle real work at home at a broadcast quality level, then the analog route is the way to go. Keep in mind too that these dual paths are not interchangeable. A USB mic won’t work with a soundcard and an analog mic won’t plug into a USB port.

A quick aside on USB mics. The current trend has been to include a headphone port on newly released mics. We’ll cover latency later in this article, but it should be said that these ports let the actor hear their voice in real time and also allow the computer to be located farther from the mic, minimizing unwanted fan noise. These are good things.


Voice over has a lot in common with other creative disciplines. Photography, audio editorial, music and web design all require practice, with an introductory learning curve and in most cases, require hardware purchases. Voice over too requires discipline and practice,


Process of digital audio