Sound Design and Production
In this article, I explore the methods and principles of sound design and production for computer games, and look at the technical considerations involved in actually producing sound.
Audio Limitations of Game Platforms
On earlier game platforms, such as the ZX Spectrum, the sound hardware was only meant to produce simple sound effects for games. It was not that the audio itself was poorly made; the limitation was the hardware of the platform. At the time of the Spectrum, the technology was nowhere near as advanced as it is now, so developers had only a small amount of space for audio in their games, and the result was usually beeps or a few low-key sound effects.
As time went on and technology improved, so did the platform hardware and, along with it, the audio. Greater storage capacity, such as the move from cartridges to CDs, meant that a game was no longer limited to something in the region of 128 to 512 KB and could instead use somewhere between 600 MB and 4 GB. A game could therefore grow massively in every aspect, and full soundtracks could now be included. Fifth-generation consoles such as the PS1 had games such as Tony Hawk’s Pro Skater, which shipped with a complete track list of songs. This step forward saw audio become a major part of games, with developers and publishers alike trying to get the best possible soundtracks for their titles. By the sixth generation, on platforms such as the PS2, games such as the FIFA series had large track lists featuring songs from the current charts. As time has progressed, the limitations on consoles have diminished: from simple beeps to full track lists on current consoles, the advance over the generations has been remarkable.
Music only really became important when the 8-bit systems were released. One leading innovator here was Atari, with its Pot Keyboard Integrated Circuit, also known as POKEY, which was patented in 1982 with Steven T. Mayer and Ronald E. Milner named as inventors. The POKEY was innovative technology and set the foundations for future sound chips.
The POKEY has four semi-independent audio channels, which can be configured as four 8-bit channels, two 16-bit channels, or one 16-bit channel and two 8-bit channels. The volume, frequency and waveform can be varied per channel, and all channels can play simultaneously. This design made it possible for game developers to put polyphonic music and sound effects of up to four channels into a game. Though no longer in production, the POKEY set a benchmark, has been emulated ever since, and set developers off in a race to develop the best audio chips. This technology helped get us to where we are today with audio in consoles.
Current technology, such as CDs, DVDs, Blu-ray discs and downloads, keeps pushing forward with ever greater capacity, making it easier to ship bigger and bigger files. With gigabytes of storage available on these media, elements of games such as the audio and graphics have grown. The capacity of media such as Blu-ray has given audio quality a massive boost. These optical storage formats have given us the space we need for our games, audio files and so on.
Sound File Formats
There are dozens of audio file formats, each with different qualities and limitations. However, I will focus on two of the most common: WAV, which is Microsoft’s ‘Waveform’ file format, and MP3, which is the Moving Picture Experts Group’s ‘Layer III’ format. I’ll start with WAV.
WAV is a Microsoft and IBM audio file format standard for storing an audio bitstream on PCs. It is the main format used on Windows systems for raw, uncompressed audio, which means that CD-quality files can be large, at around 10 MB per minute. WAV files can also contain data encoded with a variety of lossy codecs to reduce the file size.
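The figure of around 10 MB per minute can be checked with some simple arithmetic. The sketch below assumes CD-quality audio means 44.1 kHz, 16-bit, two-channel PCM, and ignores the small WAV header; the function name `pcm_size_bytes` is made up for this illustration.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of raw uncompressed PCM audio data, ignoring the WAV header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# One minute of CD-quality stereo audio (assumed: 44.1 kHz, 16-bit, 2 channels)
cd_minute = pcm_size_bytes(44_100, 16, 2, 60)
print(cd_minute)                         # 10584000
print(round(cd_minute / 1_000_000, 1))   # 10.6 (megabytes)
```

Halving any one of the inputs, for example by storing mono instead of stereo, halves the size, which is one reason older platforms used low sample rates and single-channel sound.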
MP3 is an audio-specific format that was designed by the Moving Picture Experts Group. Its official name is MPEG-1 Audio Layer III, and it was later extended as MPEG-2 Audio Layer III; both are commonly called “MP3”. It is a digital audio encoding format, originally covered by patents, that uses a form of lossy data compression. It is a common format for consumer audio storage, as well as a de facto standard of digital audio compression for the transfer and playback of music on digital audio players. The lossy compression in MP3 is designed to greatly reduce the amount of data required to represent the audio recording while still sounding like a faithful reproduction of the original uncompressed audio to most listeners.
Uncompressed audio
Uncompressed audio files are the most accurate digital representations of a sound wave. However, they can be a resource-intensive way of recording and storing digital audio in terms of storage and management. They are generally the master audio format of choice, because their accuracy makes them suitable for archiving and for delivering audio at high resolution. An example is WAV, which is the most widely used uncompressed file format.
Lossless compression
Lossless data compression is a class of data compression that allows the exact original data to be reconstructed from the compressed data. Lossless audio formats are most often used for archiving or production purposes, whereas smaller lossy files are typically used on portable players and in other cases where storage space is limited and/or exact replication of the audio is unnecessary. Examples of lossless audio formats are Apple Lossless and MPEG-4 SLS.
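The defining property of lossless compression, that the original comes back bit for bit, can be shown in a few lines. This is only a sketch: it uses Python’s general-purpose `zlib` compressor as a stand-in, not a real audio codec such as Apple Lossless, and the test tone (one second of a 440 Hz sine wave) is an assumption chosen for the example.

```python
import array
import math
import zlib

SAMPLE_RATE = 44_100

# One second of a 440 Hz sine wave as signed 16-bit mono samples
samples = array.array(
    "h",
    (int(12000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
     for n in range(SAMPLE_RATE)),
)
raw = samples.tobytes()

packed = zlib.compress(raw, 9)      # compress
restored = zlib.decompress(packed)  # decompress

# Lossless: the restored bytes are identical to the original bytes
assert restored == raw
print(len(raw), len(packed))
```

Real music compresses far less well than a pure tone, which is why dedicated lossless audio codecs exist, but the round-trip guarantee is the same.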
Lossy compression
Lossy compression is a data encoding method which compresses data by discarding some of it. The procedure aims to minimise the amount of data that needs to be held, handled and/or transmitted by a computer. The advantage of lossy methods over lossless methods is that, in some cases, a lossy method can produce a much smaller compressed file than any lossless method while still meeting the requirements of the application. Examples of lossy audio compression are MP3 and WMA, with MP3 being a standard file format for portable music players, which need to make the most of their space by making files as small as possible without noticeably reducing the quality.
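To show what “discarding some of it” means in practice, here is a deliberately crude sketch that halves the size of each sample by throwing away its low 8 bits. This is not how MP3 or WMA work (they use far more sophisticated perceptual models); the helper names are invented for the illustration, and the point is only that the discarded data cannot be recovered.

```python
def to_8_bit(sample_16):
    """Keep only the top 8 bits of a signed 16-bit sample."""
    return sample_16 >> 8

def back_to_16_bit(sample_8):
    """Scale back up; the discarded low bits return as zeros."""
    return sample_8 << 8

original = [12345, -20000, 257, 3]
decoded = [back_to_16_bit(to_8_bit(s)) for s in original]

print(decoded)              # [12288, -20224, 256, 0]
print(decoded == original)  # False: some information is gone for good
```

The decoded values are close to the originals but not equal, which is the trade a lossy format makes: smaller data in exchange for an approximation.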
Audio Environment – Stereophonic and Monaural
Over the years, audio has changed. It started off with monaural sound reproduction, or mono as it is commonly known. Mono is single-channel audio: everything comes through one audio path, regardless of the number of microphones used. Mono is still the standard for radiotelephone communications, telephone networks and audio induction loops for use with hearing aids. Elsewhere, mono has largely been replaced by the more advanced stereophonic sound, which uses a minimum of two audio channels and a configuration of two or more loudspeakers to create the impression of sound heard from various directions, as in natural hearing. Stereo is now the common form of sound in the entertainment industry, such as TV and the cinema.
More recently, there have been further technological developments aimed at creating a realistic audio environment. One is 5.1 surround sound, which has 6 channels, and there are even larger systems with 24 channels of sound. Surround sound draws on a whole range of techniques, including stereophonic sound, to enrich the audio experience with sound reproduced by additional speakers, often placed in discreet locations and positioned strategically to give the listener a forward perspective of the sound field at the location. The effect makes listeners feel they are there and adds an element of realism, because they hear the sound much as they would in a real environment.
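One common way a game places a mono sound somewhere between two stereo speakers is panning. The sketch below uses a constant-power pan law, which is a standard technique but is my choice for illustration rather than anything described above; the function name and the 0.0 to 1.0 position scale are assumptions.

```python
import math

def pan_mono_sample(sample, position):
    """Split one mono sample into (left, right) using a constant-power pan.

    position: 0.0 = fully left, 0.5 = centre, 1.0 = fully right.
    """
    angle = position * math.pi / 2
    return sample * math.cos(angle), sample * math.sin(angle)

left, right = pan_mono_sample(1.0, 0.5)
print(round(left, 3), round(right, 3))  # 0.707 0.707
```

At the centre position both speakers receive the same level, so the sound appears to come from directly ahead; moving the position towards 0.0 or 1.0 shifts it to the left or right speaker.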
One style of audio that is being worked on is 3D. The use of 3D sound is growing, and the techniques used to implement it are similar to those of surround sound but give a three-dimensional effect instead. For example, the listener may hear a sound that feels as if it is coming from behind them, when in fact a speaker has simply been placed in a position that recreates this sensation. For the 3D effect to work, the listener has to be encircled by left-surround, right-surround and back-surround speakers, rather than having only the standard “screen channels” of centre, front left and front right. 3D sound processing can convert standard audio, such as stereo and 5.1, into 8.1 single-zone and multiple-zone 3D sound experiences in real time. This means sound can be manipulated and localised for the listener, so that it appears to come from above, below or behind them.
Audio Sampling
The sample rate, which is measured in hertz (Hz), defines the number of samples taken per second. The relevant theory is the “Nyquist-Shannon Sampling Theorem”, which states that, for example, if a signal has an upper bandwidth limit of 100 Hz, a sampling frequency greater than 200 Hz will avoid aliasing and allow theoretically perfect reconstruction. Essentially, perfect reconstruction is possible when the sampling frequency is greater than twice the maximum frequency being sampled. Human hearing extends to roughly 20 kHz, so the sampling theorem calls for a rate above about 40 kHz; the 44.1 kHz rate used for CDs satisfies this with some margin. 44.1 kHz is also used by recording devices and CD-quality encrypted wireless microphones, amongst others.
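What goes wrong when the sampling frequency is too low can be sketched with a small helper that works out the frequency a pure tone appears to have after sampling. The function name `alias_frequency` is made up for this example, and it only models a single ideal sine tone.

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling (frequency folding)."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# 100 Hz tone sampled above twice its frequency: preserved
print(alias_frequency(100, 250))        # 100

# 150 Hz tone sampled at only 200 Hz: shows up as a false 50 Hz tone
print(alias_frequency(150, 200))        # 50

# 25 kHz tone at the CD rate: folds back into the audible range
print(alias_frequency(25_000, 44_100))  # 19100
```

This is why audio is filtered to remove frequencies above half the sample rate before it is sampled: anything left above that limit does not disappear, it comes back as a wrong, lower frequency.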
Audio bit depth describes the number of bits of information recorded for each sample. Bit depth corresponds directly to the resolution of each sample in a set of digital audio data. Common examples include CD-quality audio, which is recorded at 16 bits, and DVD-Audio, which can support up to 24-bit audio.
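To put numbers on “resolution”, the sketch below works out how many distinct values each sample can take at a given bit depth, along with the theoretical dynamic range that results. The dynamic range formula (20 times the base-10 logarithm of the number of levels, roughly 6 dB per bit) is a widely used rule of thumb that I am adding here; it is not taken from the text above.

```python
import math

def quantisation_levels(bit_depth):
    """Number of distinct values one sample can take."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range in decibels (about 6 dB per bit)."""
    return 20 * math.log10(2 ** bit_depth)

print(quantisation_levels(16))          # 65536
print(round(dynamic_range_db(16), 1))   # 96.3
print(quantisation_levels(24))          # 16777216
print(round(dynamic_range_db(24), 1))   # 144.5
```

Going from 16 to 24 bits multiplies the number of available levels by 256, which is why higher bit depths are favoured for recording and mixing even when the final game audio is delivered at 16 bits.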