Wednesday, July 6, 2016

Chip tunes - Part 1 History and evolution

A sound chip is an integrated circuit (i.e. "chip") designed to produce sound. It might be doing this through digital, analog or mixed-mode electronics. Sound chips normally contain things like oscillators, envelope controllers, samplers, filters and amplifiers. During the late 20th century, sound chips were widely used in arcade game system boards, video game consoles, home computers, and PC sound cards.

Chiptune refers to a collection of related music production and performance practices sharing a history with video game soundtracks. The evolution of early chiptune music tells an alternate narrative about the hardware, software, and social practices of personal computing in the 1980s and 1990s. By digging into the interviews, text files, and dispersed ephemera that have made their way to the Web, we identify some of the common folk-historical threads among the commercial, noncommercial, and ambiguously commercial producers of chiptunes with an eye toward the present-day confusion surrounding the term chiptune.

In its strictest use, the term chiptunes refers to music composed for the microchip-based audio hardware of early home computers and gaming consoles. The best of these chips exposed a sophisticated polyphonic synthesizer to composers who were willing to learn to program them. By experimenting with the chips' oscillating voices and noise generator, chiptunes artists in the 1980s—many of them creating music for video games—developed a rich palette of sounds with which to emulate popular styles like heavy metal, techno, ragtime, and (for lack of a better term) Western classical. Born out of technical limitation, their soaring flutelike melodies, buzzing square wave bass, rapid arpeggios, and noisy gated percussion eventually came to define a style of its own, which is being called forth by today's pop composers as a matter of preference rather than necessity.

The earliest precursors to chip music can be found in the early history of computer music. In 1951, the computers CSIRAC and Ferranti Mark 1 were used to perform real-time synthesized digital music in public. One of the earliest commercial computer music albums came from the First Philadelphia Computer Music Festival, held August 25, 1978, as part of the Personal Computing '78 show. The First Philadelphia Computer Music Festival recordings were published by Creative Computing in 1979. The Global TV program Science International (1976-9) credited a PDP-11/10 for the music.

CSIRAC
CSIRAC Computer

Ferranti Mark 1
Ferranti Mark 1 Computer


Before the appearance of microcomputers at the end of the 1970s, digital arcade games provided the primary computing experience for people outside of financial data centers, university labs, and military research facilities. Installed in loud public spaces like bars and roller-skating rinks, the experience of playing these games was likely accompanied by the sound of a nearby radio, DJ, or jukebox playing the latest disco and progressive rock. In the early 1980s, computer gaming followed computers into the privacy of the home. The sound produced by arcade cabinets might have competed with other environmental noises, but many of the earliest home computer games included only a brief theme, a few sound effects, or no sound at all. The general-purpose home platforms were not as well suited to audio reproduction as the custom-built arcade cabinets. Nevertheless, during the first few years of the 1980s, the number of platforms diversified, and each new design provided a different set of affordances for the growing number of computer music composers to explore.

The Apple II home computer, released in 1977, included a single speaker inside of its case that could be programmed to play simple musical phrases or sound effects (Weyhrich 2008). In-game music was very rare as memory storage for audio data was limited and audio playback was costly in terms of the central processing unit (CPU) cycles (note 1). The Atari VCS game console, released the same year as the Apple II, was designed to be attached to a television. Its television interface adapter (TIA) controlled both the audio and video output signal.


Steve Jobs demonstrating the Apple II 

Atari 2600 VCS


Although the TIA could produce two voices simultaneously, it was notoriously difficult to tune (Slocum 2003). Rather than include multivoice harmonic passages for a machine with unpredictable playback capacity, games such as Atari's Missile Command implemented rhythmic themes using controlled bursts of noise for percussion instruments (Fulop 1981). Programmers charged with interpreting recognizable musical themes from arcade games, films, or pop groups were less free to experiment. Data Age's Journey Escape (1982), billed as "the first rock video game," struggled against the tonal limitations of the TIA in its squeaky interpretation of Journey's hit song "Don't Stop Believin'," while Atari's E.T.: The Extra-Terrestrial (1982) presented a harmonically accurate re-creation of the original theme.



Example of Missile Command gameplay and sounds (Atari 1981)



Theme from Journey Escape (Data Age 1982)


Theme from Atari's E.T.: The Extra-Terrestrial (Atari 1982).


Continuous music was, if not fully introduced, then arguably foreshadowed as one of the prominent features of future video games as early as 1978, when sound was used to keep a regular beat in a few popular games. Space Invaders (Midway, 1978) set an important precedent for continuous music with a descending four-tone loop of marching alien feet that sped up as the game progressed. Arguably, Space Invaders and Asteroids (Atari 1979, with a two-note "melody") represent the first examples of continuous music in games, depending on how one defines music. Music was slow to develop because it was difficult and time-consuming to program on the early machines, as Nintendo composer Hirokazu "Hip" Tanaka explains: "Most music and sound in the arcade era (Donkey Kong and Mario Brothers) was designed little by little, by combining transistors, condensers and resistance. And sometimes music and sound were even created directly into the CPU port by writing 1s and 0s, and outputting the wave that becomes sound at the end. In the era when ROM capacities were only 1k or 2k, you had to create all the tools by yourself. The switches that manifest addresses and data were placed side by side, so you had to write something like '1, 0, 0, 0, 1' literally by hand." A combination of the arcade's environment and the difficulty of producing sounds led to the primacy of sound effects over music in this early stage of game audio's history.



Donkey Kong sounds and gameplay (Nintendo 1981)

By 1980, arcade manufacturers were including dedicated sound chips known as programmable sound generators (PSGs) on their circuit boards, and more tonal background music and elaborate sound effects developed. Some of the earliest examples of repeating musical loops in games were found in Rally-X (Namco/Midway, 1980), which had a six-bar loop (one bar repeated four times, followed by the same melody transposed to a lower pitch), and Carnival (Sega 1980), which used Juventino Rosas' "Over the Waves" (a waltz of ca. 1889). Although Rally-X relied on sampled sound using a digital-to-analog converter, Carnival used the most popular of the early PSG sound chips, the General Instrument AY-3-8910. As with most PSG sound chips, the AY series was capable of playing three simultaneous square-wave tones as well as white noise. Although many early sound chips had this four-channel functionality, the range of notes available varied considerably from chip to chip, set by what was known as a tone register or frequency divider. In this case the register was 12-bit, meaning it allowed for 4,096 possible pitch values. The instrument sound was shaped by an envelope generator manipulating the attack, decay, sustain and release (ADSR) of a sound wave; by adjusting the ADSR, a sound's amplitude and filter cutoff could be set.
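To give a sense of what that 12-bit tone register implies, here is a minimal Python sketch (my own illustration, not code from any actual driver) that converts a desired note frequency into the 12-bit period value an AY-3-8910-style PSG expects, using the commonly documented relationship f = clock / (16 × period). The clock value below is an assumption; different boards used different master clocks.

# Hypothetical helper: turn a note frequency into a 12-bit PSG tone period.
# Assumes the commonly documented AY-3-8910 relationship f = clock / (16 * period).
AY_CLOCK_HZ = 1_789_772   # a typical ~1.79 MHz master clock; actual boards varied

def tone_period(freq_hz, clock_hz=AY_CLOCK_HZ):
    period = round(clock_hz / (16 * freq_hz))
    return max(1, min(period, 0x0FFF))      # clamp to the 12-bit register range

def actual_frequency(period, clock_hz=AY_CLOCK_HZ):
    return clock_hz / (16 * period)

# Example: concert A (440 Hz) and a low E (82.4 Hz) show how the coarse divider
# lands slightly off-pitch, one reason tuning accuracy varied from chip to chip.
for f in (440.0, 82.4):
    p = tone_period(f)
    print(f"target {f:7.1f} Hz -> period {p:4d} -> actual {actual_frequency(p):7.2f} Hz")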

AY-3-8910 Sound Chip


Gameplay and music from Carnival (Sega 1980)

Despite (or perhaps because of) the challenges presented, some developers embraced the limitations of these early home computing platforms. In preparation for the development of Activision's Pressure Cooker in 1983, Garry Kitchen determined a set of pitches that the Atari TIA could reliably reproduce. He then hired a professional jingle writer to compose theme music using only the available pitches. The resulting song is heard playing in two-part harmony on both TIA audio channels during the title screen. Pressure Cooker further challenged the audio conventions of the Atari VCS by including a nonstop soundtrack during game play. One of the TIA's voices repeats a simple, two-bar bass line, while the other is free to produce sound effects in response to in-game events.


Title screen and game play from Pressure Cooker (Activision 1983).


The Atari 8-bit family is a series of 8-bit home computers introduced by Atari, Inc. in 1979 and manufactured until 1992. All are based on the MOS Technology 6502 CPU running at 1.79 MHz, roughly twice that of similar designs, and were the first home computers designed with custom co-processor chips. This architecture allowed the Atari designs to offer graphics and sound capabilities that were more advanced than contemporary machines like the Apple II or Commodore PET, and gaming on the platform was a major draw; Star Raiders is widely considered the platform's killer app. Another computer with custom graphics hardware and similar performance would not appear until the Commodore 64 in 1982.


Atari 800
Atari 800 8-bit Computer

The Pot Keyboard Integrated Circuit (POKEY) is a digital I/O chip found in the Atari 8-bit family of home computers and many arcade games in the 1980s. It was commonly used to sample (ADC) potentiometers (such as game paddles) and scan matrices of switches (such as a computer keyboard). POKEY is also well known for its sound effect and music generation capabilities, producing a distinctive square wave sound popular among chip tune aficionados. The LSI chip has 40 pins and is identified as C012294. POKEY was designed by Atari employee Doug Neubauer, who also programmed the original Star Raiders.

Some of Atari's arcade systems use multi-core versions with 2 or 4 POKEY chips in a single package for more sound voices. The Atari 7800 allows a game cartridge to contain a POKEY, providing better sound than the system's audio chip. Only two games make use of this: the ports of Ballblazer and Commando.


This awesome 8-bit composition shows the power behind the POKEY sound chip.



Compilation of demos showing the Atari 8-bit computers' audio and video capabilities


The USPTO granted U.S. Patent 4,314,236 to Atari on February 2, 1982 for an "Apparatus for producing a plurality of audio sound effects." The patent referred to POKEY's sound generation abilities. The inventors listed were Steven T. Mayer and Ronald E. Milner.

No longer manufactured, POKEY is now emulated in software by classic arcade and Atari 8-bit emulators and with the SAP player.

In 1980, most home computer music remained limited to single-voice melodies and lacked dynamic range. Robert "Bob" Yannes, a self-described "electronic music hobbyist," saw the sound hardware in first-generation microcomputers as "primitive" and suggested that they had been "designed by people who knew nothing about music" (Yannes 1996). In 1981, he began to design a new audio chip for MOS Technology called the SID (Sound Interface Device). In contrast to the kludgy Atari TIA, Yannes intended the SID to be as useful in professional synthesizers as it would be in microcomputers. Later that year, Commodore decided to include MOS Technology's new SID alongside a dedicated graphics chip in its next microcomputer, the Commodore 64. Unlike the Atari architecture, in which a single piece of hardware controlled both audio and video output, the Commodore machine afforded programmers greater flexibility in their implementation of graphics and sound.

SID (Sound Interface Device) 6581
SID (Sound Interface Device) 6581 Chip used in C64 computers

Technically, the SID enables a broad sonic palette at a low cost to the attendant CPU by implementing common synthesizer features in hardware. The chip consists of three oscillators, each capable of producing four different waveforms—square, triangle, sawtooth, and noise (note 3). The output of each oscillator is then passed through an envelope generator to vary the timbre of the sound from short plucks to long, droning notes. A variety of modulation effects may be applied to the sounds by the use of a set of programmable filters to create, for example, the ringing sounds of bells or chimes.
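As a rough illustration of how a program drives this hardware, here is a small Python sketch (an assumption of mine, not Commodore code) that computes the value a player routine would write into the SID's 16-bit frequency register, using the relationship given in the 6581 datasheet, Fout = (Fn × Fclk) / 16777216.

# Minimal sketch (illustrative): compute the SID frequency register value for voice 1.
# The 6581 datasheet gives Fout = (Fn * Fclk) / 2^24, so Fn = Fout * 2^24 / Fclk.
PAL_CLOCK_HZ = 985_248        # C64 PAL system clock; NTSC machines run at ~1,022,727 Hz

def sid_freq_register(freq_hz, clock_hz=PAL_CLOCK_HZ):
    fn = round(freq_hz * (1 << 24) / clock_hz)
    return max(0, min(fn, 0xFFFF))            # 16-bit register

fn = sid_freq_register(440.0)                 # concert A
print(f"Fn = {fn} -> lo byte {fn & 0xFF:#04x}, hi byte {fn >> 8:#04x}")
# A 6502 player routine would then poke these two bytes into the voice 1 frequency
# registers, select a waveform in the control register, and set the ADSR registers.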


Commodore C64 SID Collection

Several peripherals and cartridges were developed to take advantage of the music-making possibilities of the Commodore 64's SID chip, but even the best of these products could not match the flexibility and freedom of working with the chip's features directly by writing programs in 6502 assembly language (Pickens and Clark 2001). Of course, although the SID's implementation of sound synthesis would be familiar to electronic musicians of the time, programming in assembly was a very different experience from turning the knobs and sliding the faders of a comparable commercial synthesizer like the Roland Juno-6. Early Commodore 64 composers had to write not only the music, but also the software to play it back.

In the mid-1980s, chiptunes and computer game music appeared largely indistinguishable. The game music was not distinct from the rest of pop music, however: the songs reflected the musical interests of their composers. Most of the composers discussed here were young men living in Europe and the United States, and the influence of heavy metal, electro, New Wave pop, and progressive rock was prevalent throughout the 1980s. By assigning a distinct timbre to each of the voices, the SID could emulate the conventional instrumentation of a four-piece rock band: drums, guitar, bass, and voice (Collins 2006). For example, Martin Galway's 11-minute title track for Origin Systems' Times of Lore (1988) reflects the influence of classical guitar in heavy metal. Like the opening section of Metallica's "Fade to Black" from 1984, Times of Lore begins with an arpeggiated chord progression played on one voice with a harmonized "guitar solo" layered on top using a second voice.



Theme from Origin Systems' Times of Lore (1988); music composed by Martin Galway.

In 1985, the Nintendo Entertainment System (NES) entered the North American market with a similar polyphonic audio capability to the SID. Nintendo games tended to include more in-game background music than their Commodore 64 counterparts because the cartridges on which its games were stored could hold considerably more data than the media available for the Commodore 64 (Collins 2006). The biggest distinction between the two platforms, however, was that the Commodore 64 was a home computer that happened to be well suited to gaming, whereas the NES was strictly a gaming console. The Commodore 64 shipped with programming tools, a QWERTY keyboard, and rewriteable diskette storage that enabled experimentation. The NES, by contrast, operated more like a VCR and loaded games from read-only cartridges.

The NES was introduced in Europe in 1986 but never achieved the success it found in the United States (Nintendo 2008). As the decade came to a close, European gamers appear to have favored programmable home computers like the Atari ST, Amiga, and IBM PC-compatible machines over closed game consoles like the NES, Game Boy, and Sega Genesis. This divide in platform preferences explains why, in comments made in 2002, composer Rob Hubbard recalled "[missing] out on a lot of [chiptune] developments" by moving to the United States in 1987 (Hubbard 2002).


NES Music Compilation

A major advance for chip music was the introduction of frequency modulation synthesis (FM synthesis), first commercially released by Yamaha for their digital synthesizers and FM sound chips, which began appearing in arcade machines from the early 1980s. Arcade game composers utilizing FM synthesis at the time included Konami's Miki Higashino (Gradius, Yie-Ar Kung Fu, Teenage Mutant Ninja Turtles) and Sega's Hiroshi Kawaguchi (Space Harrier, Hang-On, Out Run).
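For readers unfamiliar with the technique, the core idea of two-operator FM can be shown in a few lines of Python. This is a generic illustration of Chowning-style FM, not the internals of any particular Yamaha chip: a modulator oscillator wobbles the phase of a carrier oscillator, and the modulation index controls how many sidebands (and thus how bright a timbre) appear.

import numpy as np

# Generic two-operator FM sketch (illustrative; real Yamaha chips such as the
# YM2612 use several operators, hardware envelopes and lookup tables).
def fm_tone(duration=1.0, fs=44_100, f_carrier=220.0, ratio=2.0, index=3.0):
    t = np.arange(int(duration * fs)) / fs
    modulator = np.sin(2 * np.pi * f_carrier * ratio * t)        # modulating oscillator
    return np.sin(2 * np.pi * f_carrier * t + index * modulator)  # carrier with modulated phase

tone = fm_tone()
print(tone[:5])   # a brassy/metallic timbre when played back at fs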

Yamaha YM2612
Yamaha YM2612 Chip found in the popular Sega Genesis 16-bit console


By the early 1980s, significant improvements to personal computer game music were made possible with the introduction of digital FM synthesis sound. Yamaha began manufacturing FM synth boards for Japanese computers such as the NEC PC-8801 and PC-9801 in the early 1980s, and by the mid-1980s, the PC-8801 and FM-7 had built-in FM sound. This allowed computer game music to have greater complexity than the simplistic beeps from internal speakers. These FM synth boards produced a "warm and pleasant sound" that musicians such as Yuzo Koshiro and Takeshi Abo utilized to produce music that is still highly regarded within the chiptune community. In the early 1980s, Japanese personal computers such as the NEC PC-88 and PC-98 featured audio programming languages such as Music Macro Language (MML) and MIDI interfaces, which were most often used to produce video game music. Fujitsu also released the FM Sound Editor software for the FM-7 in 1985, providing users with a user-friendly interface to create and edit synthesized music.

The widespread adoption of FM synthesis by consoles would later be one of the major advances of the 16-bit era, by which time 16-bit arcade machines were using multiple FM synthesis chips. A major chiptune composer during this period was Yuzo Koshiro. Despite later advances in audio technology, he would continue to use older PC-8801 hardware to produce chiptune soundtracks for series such as Streets of Rage (1991–1994) and Etrian Odyssey (2007–present). His soundtrack to The Revenge of Shinobi (1989) featured house and progressive techno compositions that fused electronic dance music with traditional Japanese music. The soundtrack for Streets of Rage 2 (1992) is considered "revolutionary" and "ahead of its time" for its "blend of swaggering house synths, dirty electro-funk and trancey electronic textures that would feel as comfortable in a nightclub as a video game." For the soundtrack to Streets of Rage 3 (1994), Koshiro created a new composition method called the "Automated Composing System" to produce "fast-beat techno like jungle," resulting in innovative and experimental sounds generated automatically. Koshiro also composed chiptune soundtracks for series such as Dragon Slayer, Ys, Shinobi, and ActRaiser. Another important FM synth composer was the late Ryu Umemoto, who composed chiptune soundtracks for various visual novel and shoot 'em up games.


Streets of Rage Sound Track (Sega 1991)


The Amiga is a family of personal computers sold by Commodore in the 1980s and 1990s. Based on the Motorola 68000 family of microprocessors, the machine had a custom chipset with graphics and sound capabilities that were unprecedented for the price, and a pre-emptive multitasking operating system called AmigaOS. The Amiga provided a significant upgrade from earlier 8-bit home computers, including Commodore's own C64.

The sound chip, named Paula, supports four PCM-sample-based sound channels (two for the left speaker and two for the right) with 8-bit resolution and a 6-bit volume control per channel. The analog output is connected to a low-pass filter, which filters out high-frequency aliases when the Amiga is using a lower sampling rate (see Nyquist frequency). The brightness of the Amiga's power LED is used to indicate the status of this low-pass filter: the filter is active when the LED is at normal brightness, and deactivated when the LED is dimmed (or off on older A500 Amigas). On the Amiga 1000 (and the first Amiga 500 and Amiga 2000 models), the power LED had no relation to the filter's status, and a wire needed to be manually soldered between pins on the sound chip to disable the filter. Paula can read directly from the system's RAM using direct memory access (DMA), making sound playback possible without CPU intervention.

Paula sound chip
Paula is primarily the Amiga's sound chip, capable of playing four-channel 8-bit stereo sound; it also serves as the floppy drive controller.

Although the hardware is limited to four separate sound channels, software such as OctaMED uses software mixing to allow eight or more virtual channels, and it was possible for software to mix two hardware channels to achieve a single 14-bit resolution channel by playing with the volumes of the channels in such a way that one of the source channels contributes the most significant bits and the other the least.
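The 14-bit trick works roughly like this (a simplified Python sketch of the idea, not code from OctaMED or any actual player): one hardware channel is played at full volume (64) and carries the top 8 bits of each sample, while a second channel at volume 1, which is 64 times quieter, carries the next 6 bits.

def split_14bit(sample_16bit):
    # Split a signed 16-bit sample into the two 8-bit values the "loud" and
    # "quiet" Paula channels would play (illustrative sketch only).
    hi = sample_16bit >> 8               # top 8 bits -> channel at volume 64
    lo = (sample_16bit >> 2) & 0x3F      # next 6 bits -> channel at volume 1
    return hi, lo

def recombine(hi, lo):
    # Because volume 1 is 1/64 of volume 64, the analog sum is hi*64 + lo,
    # i.e. the top 14 bits of the original sample.
    return hi * 64 + lo

hi, lo = split_14bit(12345)
print(hi, lo, recombine(hi, lo), 12345 >> 2)   # the last two values match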

The quality of the Amiga's sound output, and the fact that the hardware is ubiquitous and easily addressed by software, were standout features of Amiga hardware unavailable on PC platforms for years. Third-party sound cards exist that provide DSP functions, multi-track direct-to-disk recording, multiple hardware sound channels and 16-bit and beyond resolutions. A retargetable sound API called AHI was developed allowing these cards to be used transparently by the OS and software.



Compilation of Amiga demos created by computer enthusiasts in Europe, especially in Nordic countries such as Finland and Denmark and the rest of Scandinavia, and to a certain extent Germany. These demos were created to show off the sound and video capabilities of the Amiga computers (I will talk more about the Demo Scene in the second part of this series of posts).


The Super Nintendo Entertainment System (officially abbreviated the Super NES or SNES, and commonly shortened to Super Nintendo) is a 16-bit home video game console developed by Nintendo that was released in 1990 in Japan and South Korea, 1991 in North America, 1992 in Europe and Australasia (Oceania), and 1993 in South America.

The SNES is Nintendo's second home console, following the Nintendo Entertainment System (NES). The console introduced advanced graphics and sound capabilities compared with other consoles at the time. Additionally, development of a variety of enhancement chips (which were integrated on game circuit boards) helped to keep it competitive in the marketplace.


Super NES Music was considered legendary, here's why

Behind the Super NES's impressive sound capabilities is the S-SMP audio processing unit. The sound module used in the SNES consists of an 8-bit Sony SPC700 core, a 16-bit DSP, 64 KB of SRAM shared by the two chips, and a 64-byte boot ROM. The audio subsystem is almost completely independent from the rest of the system: it is clocked at a nominal 24.576 MHz in both NTSC and PAL systems, and can only communicate with the CPU via 4 registers on Bus B. It was designed by Ken Kutaragi and was manufactured by Sony.

Nintendo S-SMP
Nintendo S-SMP Audio processing unit

The Sony SPC700 is the S-SMP's integrated 8-bit processing core, manufactured by Sony, with an instruction set similar to that of the MOS Technology 6502 (as used in the Commodore 1541 diskette drive, the VIC-20, the Apple II, the BBC Micro and, in modified form, in the original NES).

It is located on the left side of the sound module. It shares 64 KB of PSRAM with the S-DSP (which actually generates the sound) and runs at 2.048 MHz, divided by 12 off of the 24.576 MHz crystal. It has six internal registers, and can execute 256 opcodes. The SPC700 instruction set is quite similar to that of the 6502 microprocessor family, but includes additional instructions, including XCN (eXChange Nibble), which swaps the upper and lower 4-bit portions of the 8-bit accumulator, and an 8-by-8-to-16-bit multiply instruction.
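As a quick illustration of those two additions, here is a tiny Python sketch of the nibble swap and the 8-by-8-to-16-bit multiply (these helper functions are my own, shown only to make the instructions concrete):

def xcn(a):
    # Swap the upper and lower 4-bit nibbles of an 8-bit accumulator value (XCN).
    return ((a << 4) | (a >> 4)) & 0xFF

def mul_8x8(ya, x):
    # 8-by-8-to-16-bit multiply: two 8-bit operands yield a 16-bit result.
    return (ya & 0xFF) * (x & 0xFF)

print(hex(xcn(0xA5)))            # 0x5a
print(hex(mul_8x8(0x40, 0x10)))  # 0x400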

Other applications of the SPC700 range from sound chips to the CXP82832/82840/82852/82860 microcontroller series. The Proson A/V receiver 2300 DTS uses a CXP82860 microcontroller that utilizes the SPC700 core.

The S-DSP is capable of producing and mixing 8 simultaneous voices at any relevant pitch and volume in 16-bit stereo at a sample rate of 32 kHz. It has support for voice panning, ADSR envelope control, echo with filtering (via a programmable 8-tap FIR), and using noise as sound source (useful for certain sound effects such as wind). S-DSP sound samples are stored in RAM in compressed (BRR) format. Communications between the S-SMP and the S-DSP are carried out via memory-mapped I/O.
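Conceptually, the echo path behaves like a delay buffer whose output is run through the 8-tap FIR before being mixed back in with feedback. The following Python sketch is only a rough model of that signal flow; the buffer length and coefficients here are made up for illustration, not values from the S-DSP.

import numpy as np

def echo_effect(dry, delay_samples=8000, feedback=0.4,
                fir=(0.30, 0.20, 0.15, 0.10, 0.10, 0.06, 0.05, 0.04)):
    # Rough model of an echo buffer with an 8-tap FIR in its output path.
    buf = np.zeros(delay_samples)            # circular echo buffer
    hist = np.zeros(len(fir))                # last 8 echo samples for the FIR
    out = np.zeros_like(dry, dtype=float)
    pos = 0
    for i, x in enumerate(dry):
        hist = np.roll(hist, 1)
        hist[0] = buf[pos]                   # oldest echo sample leaves the buffer
        echo = float(np.dot(fir, hist))      # 8-tap FIR filtering of the echo
        out[i] = x + echo
        buf[pos] = x + feedback * echo       # write input plus filtered feedback back
        pos = (pos + 1) % delay_samples
    return out

print(echo_effect(np.ones(10), delay_samples=4)[:10])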

The RAM is accessed at 3.072 MHz, with accesses multiplexed between the S-SMP (1⁄3) and the DSP (2⁄3). This RAM is used to store the S-SMP code and stack, the audio samples and pointer table, and the DSP's echo buffer.

The S-SMP operates in a somewhat unconventional manner for a sound chip. A boot ROM runs on the S-SMP upon power-up or reset, and the main SNES CPU uses it to transfer code blocks and sound samples to the RAM. The code is machine code developed specifically for the SPC700 instruction set, in much the same way that programs are written for the main CPU; as such, the S-SMP can be considered a coprocessor dedicated to sound on the SNES.

Since the module is mostly self-contained, the state of the APU can be saved as an .SPC file, and can be emulated in a stand-alone manner to play back all game music (except for a few games that constantly stream their samples from ROM). Custom cartridges or PC interfaces can be used to load .SPC files onto a real SNES SPC700 and DSP. The sound format name .SPC comes from the name of the audio chip core.

Bibliography:

Diaz & Driscoll. "Endless loop: A brief history of chiptunes". Transformative Works and Cultures. Retrieved July 6, 2016.
http://journal.transformativeworks.org/index.php/twc/article/view/96/94

Collins, Karen (2008). Game sound: an introduction to the history, theory, and practice of video game music and sound design. MIT Press. p. 12. ISBN 0-262-03378-X. Retrieved July 6, 2016.

"History | Corporate". Nintendo. Retrieved February 24, 2013.
http://www.nintendo.co.uk/Corporate/Nintendo-History/Nintendo-History-625945.html

"Anomie's S-DSP Doc" (text). Romhacking.net. Retrieved July 6, 2016.

"Anomie's SPC700 Doc" (text). Romhacking.net. Retrieved July 6, 2016.

"CXP82832/82840/82852/82860 CMOS 8-bit Single Chip Microcomputer" (PDF). datasheetcatalog.org. Retrieved July 6, 2016.

"The Nintendo Years: 1990". June 25, 2007. p. 2. Archived from the original on August 20, 2012. Retrieved July 6, 2016.

Appendix O, "6581 Sound Interface Device (SID) Chip Specifications", of the Commodore 64 Programmer's Reference Guide (see the C64 article). Retrieved July 6, 2016.

Bagnall, Brian. On The Edge: The Spectacular Rise and Fall of Commodore, pp. 231–238, 370–371. ISBN 0-9738649-0-7.

Commodore 6581 Sound Interface Device (SID) datasheet. October 1982. Retrieved July 6, 2016.

"Inside the Commodore 64". PCWorld. November 4, 2008. Retrieved July 6, 2016.

"I. Theory of Operation". Atari Home Computer Field Service Manual - 400/800 (PDF). Atari, Inc. pp. 1–11. Retrieved July 6, 2016.

Michael Current, "What are the SALLY, ANTIC, CTIA/GTIA, POKEY, and FREDDIE chips?", Atari 8-Bit Computers: Frequently Asked Questions. Retrieved July 6, 2016.

Hague, James (2002-06-01). "Interview with Doug Neubauer". Halcyon Days. Retrieved July 6, 2016.

The Atari 800 Personal Computer System, by the Atari Museum. Accessed November 13, 2008.
http://www.atarimuseum.com/computers/8BITS/400800/ATARI800/A800.html




Saturday, July 2, 2016

Sample-based synthesis

Sample-based synthesis
Sample-based synthesis is a form of audio synthesis that can be contrasted to either subtractive synthesis or additive synthesis. The principal difference with sample-based synthesis is that the seed waveforms are sampled sounds or instruments instead of fundamental waveforms such as sine and saw waves used in other types of synthesis.

Unlike analogue or FM synthesizers, sample-based synthesizers use samples in place of oscillators. Rather than consisting only of whole instrument sounds, these samples can also capture the various stages of a real instrument alongside the waveforms produced by conventional oscillators. For instance, a typical sample-based synthesizer may contain five different samples of the attack stage of a piano, along with samples of the decay, sustain and release portions of the sound. This means that it is possible to mix the attack of one sound with the release of another to produce a complex timbre.

Commonly, up to four of these individual 'tones' can be mixed together to produce a timbre, and each of these individual tones can have access to numerous modifiers including LFOs, filters and envelopes. This obviously opens up a whole host of possibilities, not only for emulating real instruments but also for creating complex sounds. This method of synthesis has become the de facto standard for any synthesizer producing realistic instruments. By combining samples of real-world sounds with all the editing features and functionality of analogue synthesizers, they can offer huge scope for creating both realistic and synthesized sounds.
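As a toy illustration of mixing the attack of one sample with the body of another, here is a short Python sketch; the sample names and stand-in data are entirely hypothetical and only demonstrate the crossfade idea.

import numpy as np

def splice(attack, sustain, fs=44_100, fade_ms=10):
    # Crossfade the attack portion of one sample into the sustain of another.
    n = int(fs * fade_ms / 1000)
    fade_out = np.linspace(1.0, 0.0, n)
    fade_in = 1.0 - fade_out
    head, tail = attack[:-n], attack[-n:]          # keep most of the attack
    blended = tail * fade_out + sustain[:n] * fade_in
    return np.concatenate([head, blended, sustain[n:]])

# Hypothetical stand-in data: a percussive attack and a steadier sustain.
t = np.arange(44_100) / 44_100
piano_attack = np.sin(2 * np.pi * 440 * t[:4410]) * np.exp(-20 * t[:4410])
string_sustain = np.sin(2 * np.pi * 440 * t) * 0.3
print(splice(piano_attack, string_sustain).shape)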

Granular Synthesis


One final form of synthesizer that has started to make an appearance with the evolution of technology is the granular synthesizer. It is rare to see granular synthesis employed in hardware synthesizers due to its complexity, but software synthesizers that utilize it are being developed for the public market. Essentially, it works by building up sounds from a series of short segments of sound called 'grains'. This is best compared to the way a film projector operates, where a series of still images, each slightly different from the last, is played sequentially at a rate of around 25 pictures per second, fooling the eyes and brain into believing there is smooth, continuous movement.

A granular synthesizer operates in the same manner, with tiny fragments of sound rather than still images. By joining a number of these grains together, an overall tone is produced that develops over a period of time. To do this, each grain must be less than 30 ms in length, as, generally speaking, the human ear cannot perceive sounds as separate events if they are less than 30-50 ms apart. This also means that a certain amount of control has to be offered over each grain. In any one sound there can be anything from 200 to 1,000 grains, which is the main reason why this form of synthesis appears mostly in software. Typically, a granular synthesizer will offer most, but not necessarily all, of the following five parameters.

  • Grain length: This can be used to alter the length of each individual grain. As previously mentioned, the human ear can differentiate between two grains if they are more than 30-50 ms apart, but many granular synthesizers go beyond this range, covering 20-100 ms. By setting this length to a higher value, it's possible to create a pulsing effect.
  • Density: This is the percentage of grains that are created by the synthesizer. Generally, the more grains created, the more complex the sound will be, a factor that also depends on the grain's shape.
  • Grain shape: Commonly, this offers a number between 0 and 200 and represents the curve of the envelope. Grains are normally enveloped so that they start and finish at zero amplitude, helping the individual grains mix together coherently to produce the overall sound. By setting a longer envelope (a higher number), two individual grains will mix together, which can create too many harmonics and often results in the sound exhibiting lots of clicks as it fades from one grain to the other.
  • Grain pan: This is used to specify the location within the stereo image where each grain is created. This is particularly useful for creating timbres that inhabit both speakers.
  • Spacing: This is used to alter the period of time between each grain. If the time is set to a negative value, the preceding grain will continue through the next created grain; however, if this space is less than 30 ms, the gap will be inaudible.

The sound produced with granular synthesizers depends on the synthesizer in question. Usually, the grains consist of single frequencies with specific waveforms, or occasionally they are formed from segments of samples or noise that have been filtered with a bandpass filter. The constant change of grains can therefore produce sounds that are both bright and incredibly complex, resulting in a timbre that's best described as glistening. After creating this sound by combining the grains, the whole sound can be shaped using envelopes, filters and LFOs.
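To make the five parameters above concrete, here is a compact Python sketch of a granular engine. The parameter names, defaults and random choices are my own illustrative choices, not taken from any particular product.

import numpy as np

def granular(source, fs=44_100, grain_ms=40, spacing_ms=20,
             density=0.8, shape=100, pan_spread=0.5, n_grains=400):
    # Overlap-add short enveloped grains taken from 'source' into a stereo buffer.
    grain_len = int(fs * grain_ms / 1000)
    spacing = int(fs * spacing_ms / 1000)
    env = np.hanning(grain_len) ** (shape / 100)       # 'grain shape': envelope curve
    out = np.zeros((n_grains * spacing + grain_len, 2))
    rng = np.random.default_rng(0)
    for i in range(n_grains):
        if rng.random() > density:                     # 'density': skip some grains
            continue
        start = rng.integers(0, len(source) - grain_len)
        grain = source[start:start + grain_len] * env  # 'grain length' applied here
        pan = 0.5 + rng.uniform(-pan_spread, pan_spread) / 2   # 'grain pan'
        pos = i * spacing                              # 'spacing' between grain onsets
        out[pos:pos + grain_len, 0] += grain * (1 - pan)
        out[pos:pos + grain_len, 1] += grain * pan
    return out

src = np.sin(2 * np.pi * 330 * np.arange(44_100) / 44_100)
print(granular(src).shape)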

If you want to know more about this subject, please read Rick Snoman's Dance Music Manual (Second Edition): Tools, Toys and Techniques.


Friday, July 1, 2016

Digital audio Part 4 - Signal Reconstruction & Dithering

Reconstructed signal
Reconstruction of the digitalized signal using simple retention

Let's now look in more detail at the process of reconstructing a signal. The simplest procedure consists of obtaining a voltage proportional to the binary number of each sample using a digital-to-analogue converter and keeping it constant until a new sample arrives, which is usually when a new sample cycle starts. This process is called simple retention (a zero-order hold), and it is the procedure used in the figure above.
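In code, simple retention is just "repeat each sample until the next one arrives". A minimal Python sketch, for illustration only:

import numpy as np

def simple_retention(samples, hold_factor=8):
    # Zero-order hold: repeat each digital sample until the next sample cycle.
    return np.repeat(samples, hold_factor)

print(simple_retention(np.array([0.0, 0.7, 1.0, 0.7, 0.0]), hold_factor=4))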

Once the signal has been reconstructed, we must use a smoothing filter, in this case a low-pass filter that rounds off the sharp edges resulting from the simple retention. Such a filter must have characteristics similar to the antialiasing filter introduced for the digitalization process: a very abrupt slope, eliminating almost entirely the frequencies above 20 kHz while letting those below 20 kHz pass through completely. These kinds of filters are complex and likely to introduce phase distortions. To solve this situation the concept of oversampling was introduced.

Oversampling consists of interleaving, between the samples that are actually obtained or stored, other "samples" calculated by interpolation using complex algorithms; thus 8x oversampling adds 7 interpolated samples for every real sample. The result is equivalent to a sample rate 8 times higher than the original. If fM = 44.1 kHz, the new sample rate would be 352.8 kHz, and the unwanted spectral images can then be eliminated with low-pass filters that are much simpler and have less effect on the phase and transients of the signal. Oversampling is used regularly today in compact disc players, which is possible because electronics are much faster than when this technology first arrived.
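A rough Python sketch of the idea, using simple linear interpolation in place of the complex interpolation algorithms mentioned above:

import numpy as np

def oversample(samples, factor=8):
    # Insert (factor - 1) interpolated values between each pair of real samples.
    n = len(samples)
    coarse = np.arange(n)                           # original sample instants
    fine = np.arange((n - 1) * factor + 1) / factor
    return np.interp(fine, coarse, samples)

x = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
print(oversample(x, factor=4))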

Dithering 


When very low-level signals are digitalized (close to the converter's resolution), digitalization noise turns into a distortion whose effect is more harmful than random noise. For example, if a 100 Hz sine signal whose amplitude is less than one step were digitalized, the signal obtained after reconstruction would resemble a square wave, as shown in the figure below, and because of this it would have harmonics at 300 Hz, 500 Hz, 700 Hz, etc. If instead of one sine wave we applied two or more, we would get an intermodulation distortion that would be really undesirable.

Distortion created by sampling low level signals


One way of solving these inconveniences is to apply a small amount of random noise before sampling and digitalization. This noise, whose effective value is less than one step, is known as dither. Dither has the side effect of slightly worsening the signal-to-noise ratio, but from an auditory point of view the distortion is transformed into a random noise that is much more acceptable, especially at those low levels.

Dither is also usually applied in requantization processes, for example when you want to reduce the resolution of a signal that was recorded at 20 bits down to 16 bits so it can be dumped onto a commercial format like a compact disc. If we just truncated the 20-bit data, eliminating the 4 least significant bits, we would run into inconveniences similar to the ones described above. In this case the noise is generated digitally and added before proceeding to truncate the bits.
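A small Python sketch of that last case, comparing plain truncation with digitally generated (triangular) dither added before the bits are dropped; the signal and scaling are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

def requantize(samples_20bit, dither=True):
    # Reduce 20-bit integer samples to 16 bits, optionally adding triangular (TPDF)
    # dither of about one 16-bit step (= 16 units at 20-bit scale) before truncation.
    x = samples_20bit.astype(float)
    if dither:
        x = x + rng.uniform(-8, 8, size=x.shape) + rng.uniform(-8, 8, size=x.shape)
    return np.floor(x / 16).astype(int)        # drop the 4 least significant bits

quiet = (6 * np.sin(2 * np.pi * 100 * np.arange(1000) / 48_000)).astype(int)
print(requantize(quiet, dither=False)[:10])    # collapses into a crude stepped wave
print(requantize(quiet, dither=True)[:10])     # noisier, but without correlated distortion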



Thursday, June 30, 2016

Digital Audio Part 3 - Digitalization

Audio digitalization
Digitalization is the process of turning a signal or other kind of information into a number, specifically a binary number, so that it can be stored. In the case of audio this function is performed by a device called an analogue/digital converter, or ADC, which converts voltage values into binary numbers.

Once the sample is taken we need to store it, and in order to store it we need to transform it into a number, more specifically a binary number. This function is performed by a device named an analogue/digital converter, or ADC, which converts voltage values into binary numbers.

In the figure below we are using 3-digit binary numbers. Since every binary digit is called a bit (from binary digit), we'd be using 3-bit numbers. It is easy to see that there are 8 (= 2³) 3-bit numbers: 000, 001, 010, 011, 100, 101, 110, 111. To represent the diverse voltage values that our samples could take, we divide the range of variation of the signal into 8 levels, and we approximate every sample to the immediately inferior level.

In the central part of the figure below, we can compare the exact samples (empty dots) with the digitalized samples (filled dots). By comparing them we can see that the maximum error is one division, corresponding to one bit. The reconstructed waveform is considerably different from the original because 3 bits is a very low resolution.

Audio Digitalization
Effect of the sampling and digitalization process applied to a sine wave. The resolution is 3 bits and the sample rate is 14.7 times higher than the wave's frequency. In the central figure the empty dots represent the exact samples and the filled dots represent the digitalized samples. Below is the reconstructed signal.
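The figure's situation can be reproduced numerically. Here is a minimal Python sketch of sampling and 3-bit quantization; the sample-rate ratio of 14.7 comes from the caption above, everything else is illustrative.

import numpy as np

def quantize(signal, bits=3):
    # Map each sample to one of 2**bits levels, approximating to the level below.
    levels = 2 ** bits
    idx = np.clip(np.floor((signal + 1) / 2 * levels), 0, levels - 1)
    return idx / levels * 2 - 1                 # back to the -1..+1 range

f_signal = 1.0                     # one cycle of the sine wave
fs = 14.7 * f_signal               # sample rate 14.7 times the wave's frequency
t = np.arange(0, 1, 1 / fs)
exact = np.sin(2 * np.pi * f_signal * t)        # "empty dots"
digitalized = quantize(exact, bits=3)           # "filled dots"
print(np.max(np.abs(exact - digitalized)))      # error stays within one division (0.25)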

In the previous example we adopted, in an arbitrary way, a 3-bit resolution. The result, as could be observed, was poor, because the reconstructed waveform is very distorted. It would be useful to have a more systematic criterion for selecting the required resolution.

The problem is similar to deciding how many decimal digits are required to represent, with pinpoint accuracy, the length of objects that are less than 1 m long. To express such lengths to the millimetre we would need 3 decimal digits, because these objects measure between 0 and 999 mm. If we required a precision of tenths of a millimetre, we would need 4 digits, because the objects could measure between 0 and 9,999 tenths of a millimetre.

In the world of audio, the criterion used to determine the "precision" is the signal-to-noise ratio. Let's analyze the example of the figure above from that point of view. Leaving aside whatever noise the signal itself might contain, one side effect of digitalization is the appearance of an error that can be treated as a noise. This noise is known as digitalization noise. Under this interpretation, the maximum peak-to-peak value of the signal is proportional to 8, and the maximum peak-to-peak value of the noise is proportional to 1. Thus the signal-to-noise ratio is 8/1 = 8, which expressed in dB is:

S/N = 20 log₁₀ 8 ≈ 18 dB

If we take into account that high-fidelity audio nowadays handles signal-to-noise ratios above 96 dB, we can understand why a 3-bit resolution is totally insufficient.

Let's now suppose that we increase the resolution to 4 bits. Since we now have 16 possible values instead of 8, the signal-to-noise ratio in dB becomes:

S/N = 20 log₁₀ 16 ≈ 24 dB
We can see that we gained 6 dB. This can be interpreted as follows: while the signal's amplitude didn't change, when we doubled the number of levels each level was reduced to half, so the digitalization noise was halved as well. The signal-to-noise ratio is therefore doubled, and a doubling is equivalent to a 6 dB increase. If we now increase the resolution by just 1 bit, taking it to 5 bits, the noise is again halved, so the signal-to-noise ratio gains another 6 dB.

We can obtain a more general expression for the signal-to-noise ratio. If we adopt a resolution of n bits, where n is any integer, we get:

S/N = 20 log₁₀ (2ⁿ) ≈ 6.02 · n dB ≈ 6 · n dB

Applying this formula to the standard 16-bit resolution used in the most popular digital audio storage formats results in a signal-to-noise ratio of 96 dB. This signal-to-noise ratio is, under normal conditions, enough to create impressive dynamic contrasts. Consider that we rarely listen to music at levels above 110 dB (which is quite deafening and not at all advisable). If we subtract 96 dB from this value, we obtain 14 dB, a sound level that probably few people have ever "heard", because even at night, when conditions are very quiet, it is normally difficult to get sound pressure levels below 20 dB in an ordinary isolated room.
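The same arithmetic in a few lines of Python, as a worked check of the formula above for several common resolutions:

import math

# Signal-to-noise ratio of an ideal n-bit quantizer, per S/N = 20·log10(2^n)
for n in (3, 4, 8, 16, 20, 24):
    snr_db = 20 * math.log10(2 ** n)      # roughly 6.02 dB per bit
    print(f"{n:2d} bits -> {snr_db:5.1f} dB")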

It's necessary to point out that even if a system works with 16-bit digital audio formats, its signal-to-noise ratio won't necessarily be 96 dB. Given the variety of analogue components that form part of every device, some noise is generated and added to the digitalization noise. Low-cost equipment uses low-quality electronics, so its circuitry is particularly noisy and the signal-to-noise ratio ends up being much lower than 96 dB.

Bibliography: Federico Miyara (2003) Acústica y sistemas de sonido. UNR Editora
ISBN 950-673-196-9