Windows driver API basics

Before you continue: this text is quite technical. If you are not interested in technical background information, you may want to skip this article.

We get many questions directly related to the different driver interfaces that are available under Windows. This article explains the basics behind soundcard drivers on a technical but still simplified level. Everyone talks about WDM, ASIO, MME, DirectX, etc. these days ... however, we have noticed time and again that these terms are often mixed up with each other or even used where they definitely should not be. We hope that this article clears up possible confusion.

"Nice introduction but what is an API anyway?" - this is what you may think ... API stands for Application Programming Interface. That's the way/method an application can use to access a certain function. A soundcard driver API is used by audio software to access the hardware. Common APIs for sound- or audiocards are: MME (also called 'wave' or 'mmsystem'), DirectSound (sometimes called 'DirectX') or ASIO. The often mentioned WDM is not an API but more about that later.

structure of a driver

[Diagram: basic structure of an audiocard driver]

The diagram shows the basic structure of an audiocard driver for the Windows operating systems. The darker area with the blue font represents what we generally call the "driver". The software accesses the driver section and the driver section accesses the hardware. This means that the driver is the interface between your audio application and the audiocard. That's really simple :-)

The way the application talks to the driver is defined by the API that is used. The four main APIs used with professional audiocards at the moment are MME, DirectSound, ASIO and GSIF. Note that most people still refer to these APIs with the word 'driver', although they are just small parts of the structure offered by the complete audiocard driver. In fact, the MMSYSTEM DLL files as well as the DirectSound interface DLL files are part of the Windows operating system and are used universally for all soundcards; they are not part of the driver supplied by the soundcard manufacturer. Other APIs such as ASIO are also represented by a DLL file and are in reality nothing more than that (DLL = Dynamic Link Library, a module that is loaded and used by the application on request). This explains why the word 'driver' is wrong for the APIs, but nearly everyone uses the term now, so we will do so as well from time to time.
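
To illustrate the DLL principle mentioned above, here is a minimal sketch of how an application loads a module on request and resolves a function from it. Note that it is simplified: real ASIO driver DLLs are COM objects that are located via the Windows registry rather than by file name, and "example.dll" / "SomeFunction" are purely hypothetical names.

    // Sketch of the DLL principle: load a module on request, resolve a
    // function by name, call it, unload. "example.dll" and "SomeFunction"
    // are hypothetical; real ASIO drivers are COM objects found via the registry.
    #include <windows.h>

    typedef int (WINAPI *SomeFunctionPtr)(void);

    int main()
    {
        HMODULE mod = LoadLibraryA("example.dll");   // loaded only when requested
        if (!mod) return 1;
        SomeFunctionPtr fn =
            (SomeFunctionPtr)GetProcAddress(mod, "SomeFunction");
        int result = fn ? fn() : -1;                 // use the module's function
        FreeLibrary(mod);                            // unload when no longer needed
        return result;
    }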

The APIs access the real kernel module of the driver, which is loaded by the operating system on startup. The diagram displays this separately from the four APIs, one level below them.

But why are there so many different APIs anyway (the four listed here are not the only ones ... just the most common)? Explaining that in full would be a very long story that exceeds the scope of such an article. The main reason for inventing new APIs can be found in the interest of the different software vendors in special support for special functions of certain hardware: existing APIs often did not provide the needed functionality.

MME or 'Wave' API

The MME devices (or drivers) were introduced by Microsoft with the not very well known operating system called "Windows with MultiMedia Extensions 1.0", which was based on Windows 3.0 and was the basis for the later released Windows 3.1 and 3.11. The name of the OS ("MultiMedia Extensions") was also used for the API to access soundcards. Windows MME 1.0 was really the first Microsoft operating system that provided a universal API that worked with any soundcard hardware in exactly the same way (provided there was a driver). The same API, with small modifications, is used to this day to play back and record wave audio.

If you go to Control Panel > Multimedia in any modern Windows version (95 and higher), you can select the preferred devices for audio playback and recording. These devices represent the MME API of the soundcard driver. MIDI I/O is also handled via MME devices.
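
As a concrete illustration (a minimal sketch of ours, not an excerpt from the MME documentation), this is roughly how an application plays one buffer of audio through the MME/wave API; WAVE_MAPPER selects the preferred playback device configured in the Control Panel.

    // Minimal MME playback sketch: one second of silence through the
    // preferred wave device (compile as C++, link winmm.lib).
    #include <windows.h>
    #include <mmsystem.h>
    #include <vector>
    #pragma comment(lib, "winmm.lib")

    int main()
    {
        // 16-bit stereo PCM at 44.1 kHz
        WAVEFORMATEX wfx = { WAVE_FORMAT_PCM, 2, 44100, 176400, 4, 16, 0 };

        HWAVEOUT hwo = NULL;
        if (waveOutOpen(&hwo, WAVE_MAPPER, &wfx, 0, 0, CALLBACK_NULL) != MMSYSERR_NOERROR)
            return 1;   // WAVE_MAPPER = preferred device from Control Panel > Multimedia

        std::vector<short> samples(44100 * 2, 0);   // one second of silence
        WAVEHDR hdr = {};
        hdr.lpData = reinterpret_cast<LPSTR>(&samples[0]);
        hdr.dwBufferLength = static_cast<DWORD>(samples.size() * sizeof(short));

        waveOutPrepareHeader(hwo, &hdr, sizeof(hdr));
        waveOutWrite(hwo, &hdr, sizeof(hdr));           // queue the buffer for playback
        while (!(hdr.dwFlags & WHDR_DONE)) Sleep(10);   // poll until playback finished
        waveOutUnprepareHeader(hwo, &hdr, sizeof(hdr));
        waveOutClose(hwo);
        return 0;
    }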

DirectSound API

DirectSound was introduced by Microsoft after the release of Windows 95 to provide game developers with a more flexible way to access soundcards from Windows applications. Together with a number of other similar operating system extensions such as DirectShow (filtering and playback of streaming audio or video signals ... in the audio world often referred to as "DirectX plugins"), DirectDraw (direct and faster access to the graphics card) and later DirectSound 3D (output of 4-channel audio signals), the term DirectX was used as a name for these operating system extensions. With this powerful package of simplified APIs that provided just the most important functions with the best performance, Microsoft was able to convince most makers of PC games to develop for Windows rather than for the older DOS platform.

Soon a number of software vendors noticed that DirectSound has an advantage over MME when it comes to playback because of its different buffer handling: the latency was smaller. Latency (the time it takes until you can hear the signal) is critical for software synthesizers. Most early software synthesizer applications therefore used DirectSound rather than MME (to this day DirectSound is especially popular among makers of shareware software). DirectSound however has a huge limitation: it is not possible to record audio signals.
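
For illustration, here is a minimal sketch of the DirectSound playback setup described above: create the DirectSound object, set a cooperative level and create a secondary buffer that the application then locks, fills and plays. The details (window handle, buffer flags) are our assumptions for a simple example.

    // Minimal DirectSound playback setup sketch (compile as C++, link dsound.lib).
    #include <windows.h>
    #include <dsound.h>
    #pragma comment(lib, "dsound.lib")

    int main()
    {
        LPDIRECTSOUND ds = NULL;
        if (FAILED(DirectSoundCreate(NULL, &ds, NULL)))   // default playback device
            return 1;
        // a window handle is required; the desktop window is a stand-in here
        ds->SetCooperativeLevel(GetDesktopWindow(), DSSCL_PRIORITY);

        // 16-bit stereo PCM at 44.1 kHz
        WAVEFORMATEX wfx = { WAVE_FORMAT_PCM, 2, 44100, 176400, 4, 16, 0 };

        DSBUFFERDESC desc = {};
        desc.dwSize = sizeof(desc);
        desc.dwFlags = DSBCAPS_GLOBALFOCUS;
        desc.dwBufferBytes = wfx.nAvgBytesPerSec;         // one second of audio
        desc.lpwfxFormat = &wfx;

        LPDIRECTSOUNDBUFFER buf = NULL;
        if (SUCCEEDED(ds->CreateSoundBuffer(&desc, &buf, NULL)))
        {
            // Lock() the buffer, copy samples in, Unlock(), then
            // buf->Play(0, 0, DSBPLAY_LOOPING) starts the playback
            buf->Release();
        }
        ds->Release();
        return 0;
    }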

If a driver is not optimized for DirectSound, Windows automatically emulates DirectSound output using the MME devices. If a WDM driver is used (see below), DirectSound support is not implemented by the driver developer but by the operating system: Microsoft defined the WDM structure in a way that gives the DirectSound functions universal access to any WDM driver, so every WDM driver provides full DirectSound compatibility.

ASIO API

ASIO stands for Audio Streaming Input/Output and was invented by Steinberg [1]. It was introduced to the PC world with the presentation of the Cubase VST 3.5 software. The first version of the ASIO API (ASIO 1.0) made it possible to reach latency values similar to or even lower than DirectSound, not only for playback but also for recording. Theoretically this could be done with some MME drivers as well, so why did Steinberg develop their own interface? The main reason was (originally) not the low latency but the multichannel functionality. With ASIO 1.0 it was possible for the first time to access all physical input and output channels of the hardware through one single device. This made work easier for application programmers. At the same time it reduces the CPU load when multiple I/O channels are used and improves the synchronisation between the different physical I/O channels.

The ASIO 2.0 API was released later and added control functions for the monitoring features of the hardware: Cubase (and other applications supporting ASIO 2.0) can control the input monitoring of the hardware remotely. The ASIO 2.0 documentation also emphasizes the importance of proper sync between the different physical channels (although this was already possible with exactly the same results under ASIO 1.0).
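
The following sketch shows what an ASIO host application roughly does with the Steinberg ASIO SDK. It assumes the driver has already been loaded and selected (the SDK's host sample code provides helpers for that); the function names and structures are those declared in the SDK's asio.h, so treat this as an outline rather than a complete host.

    // Rough ASIO host setup sketch against the Steinberg ASIO SDK (asio.h).
    // Assumes the ASIO driver (a COM object) has already been loaded/selected.
    #include <vector>
    #include "asio.h"

    // The driver calls this every time one half of the double buffer is due:
    // read the inputs and fill the outputs for buffer half 'index' here.
    static void bufferSwitch(long index, ASIOBool /*directProcess*/) {}
    static void sampleRateDidChange(ASIOSampleRate /*rate*/) {}
    static long asioMessage(long, long, void*, double*) { return 0; }

    int openAsio()
    {
        ASIODriverInfo info = {};
        if (ASIOInit(&info) != ASE_OK) return 1;

        long numIn = 0, numOut = 0;
        ASIOGetChannels(&numIn, &numOut);   // all physical channels via one device

        long minSize, maxSize, preferred, granularity;
        ASIOGetBufferSize(&minSize, &maxSize, &preferred, &granularity);

        // one ASIOBufferInfo per physical channel, inputs first
        std::vector<ASIOBufferInfo> bufs(numIn + numOut);
        for (long i = 0; i < numIn + numOut; ++i) {
            bufs[i].isInput    = (i < numIn) ? ASIOTrue : ASIOFalse;
            bufs[i].channelNum = (i < numIn) ? i : i - numIn;
        }

        ASIOCallbacks cb = { bufferSwitch, sampleRateDidChange, asioMessage, 0 };
        if (ASIOCreateBuffers(&bufs[0], numIn + numOut, preferred, &cb) != ASE_OK)
            return 1;
        return (ASIOStart() == ASE_OK) ? 0 : 1;   // bufferSwitch() is called from now on
    }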

GSIF

The GigaSampler-InterFace (GSIF) was invented by Nemesys (now Tascam [2]) to allow multichannel output for the GigaSampler (and later GigaStudio) software. Like ASIO, it provides access to the hardware for multichannel playback through one device. Recording is not possible. You may argue that the same functionality is also possible with ASIO and that GSIF would not be needed, and yes, you are probably right about that. Nemesys certainly had reasons not to use the already well-supported ASIO interface invented by a different audio software vendor.

GigaSampler / GigaStudio also works with DirectSound drivers, but then the output of multiple channels at the same time is not possible. Because of that, special GSIF support is needed for multichannel audiocards.

Multiclient GSIF support

As the different driver models (... that word again; correction: the different APIs) compete with each other, simple technical problems arise if you want to use them at the same time from different audio applications. GigaSampler/GigaStudio was originally not developed to be used together with an audio/MIDI sequencer on the same PC, yet many users demanded exactly that functionality from the vendors. The solution: a special multiclient driver that allows the use of ASIO or MME at the same time as GSIF; usually the physical output channels have to be assigned to one or the other driver model.

But what about WDM?

OK ... you might say that all this sounds interesting, but where is the info about WDM? The reason is simple: nothing mentioned so far has any relation to WDM. WDM is not an API; it does not compete with MME, ASIO, DirectSound or any other API.

WDM stands for Windows Driver Model and is nothing more and nothing less than a driver format that is now used for soundcards. WDM drivers can be installed under Windows 98 SE, Windows ME, Windows 2000 and Windows XP. Other Windows versions are not supported (especially Windows 95, Windows 98 and Windows NT 4.0). Microsoft invented this format to allow hardware vendors to create one driver for all current and future Windows versions. WDM drivers also have special features (more on that later) that are not available with other driver models/formats.

The well-known competing format is called VXD, the soundcard driver format for Windows 9x/Me (in the .VXD file format). Also competing is the NT4 Kernel Mode driver model (like WDM, in the .SYS file format), which can be used under Windows NT 4.0, 2000 and XP.

All three driver models/formats (WDM, VXD, NT4 Kernel Mode) can provide the different APIs for audio software, including all the APIs mentioned above: MME, DirectSound, ASIO and GSIF. This ensures software compatibility between the different driver formats. For a normal application that uses the soundcard via one of these APIs, it makes no difference at all whether the driver format is WDM, VXD or NT4 Kernel Mode.

WDM provides a few special functions that are not available with the older driver models. The most important addition is the so-called KMixer (= Kernel Mixer), which provides mixing, effect processing (via operating system plugins) and encoding/decoding (e.g. of mp3 or AC-3 data) in kernel mode, built into the operating system. The problem however is that the KMixer adds about 20~30ms of latency to any processed audio signal. What is nice for consumer applications is not usable for any modern audio sequencer that integrates software synthesizers.

That means WDM is not as good as VXD or NT4 Kernel Mode when it comes to regular audio applications using DirectSound or MME. Of course it makes no difference when using ASIO or GSIF, as these APIs bypass the KMixer.

Another limitation of WDM is the number of available devices under Windows 2000: with WDM drivers installed, Windows 2000 only allows a maximum of 10 MME wave devices in the system. This means that you cannot use all I/O channels from MME applications under Windows 2000 when more than one ST Audio DSP24 series card is installed, as the number of I/O channels then exceeds the Windows 2000 limitation. ASIO and GSIF applications are not affected by this limitation. Microsoft noticed this problem after serious protests from various audiocard vendors; as a result, it has been fixed in Windows XP.

WDM Kernel Streaming

WDM also introduces another way to access the audiocard hardware, called WDM Kernel Streaming (WDM KS). It makes use of the same structure that every real WDM driver provides for the KMixer, but bypasses the KMixer. This is done without using the regular MME, DirectSound or even ASIO APIs: the driver's kernel module is accessed directly by the audio application. This method was first used by Cakewalk [3] in their SONAR software. By accessing the kernel module of the driver directly from the application, without any high-level API, very low latency figures can be achieved (similar to ASIO; depending on the driver structure and hardware, even lower than with ASIO). Some (but not all) other software vendors are now working to support WDM KS in their future audio applications.
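
As an illustration of "accessing the kernel module directly", the sketch below (ours; real WDM KS streaming needs considerably more, e.g. creating pins with KsCreatePin and issuing I/O requests) enumerates the kernel streaming audio filters that WDM drivers expose. Opening one of the listed device paths with CreateFile is the first step of Kernel Streaming.

    // Sketch: enumerate WDM Kernel Streaming audio filters
    // (compile as C++, link setupapi.lib).
    #include <windows.h>
    #include <setupapi.h>
    #include <cstdio>
    #include <vector>
    #pragma comment(lib, "setupapi.lib")

    // KSCATEGORY_AUDIO from ks.h: {6994AD04-93EF-11D0-A3CC-00A0C9223196}
    static const GUID KsCategoryAudio =
        { 0x6994AD04, 0x93EF, 0x11D0, { 0xA3, 0xCC, 0x00, 0xA0, 0xC9, 0x22, 0x31, 0x96 } };

    int main()
    {
        HDEVINFO devs = SetupDiGetClassDevsA(&KsCategoryAudio, NULL, NULL,
                                             DIGCF_DEVICEINTERFACE | DIGCF_PRESENT);
        if (devs == INVALID_HANDLE_VALUE) return 1;

        SP_DEVICE_INTERFACE_DATA ifd = {};
        ifd.cbSize = sizeof(ifd);
        for (DWORD i = 0;
             SetupDiEnumDeviceInterfaces(devs, NULL, &KsCategoryAudio, i, &ifd); ++i)
        {
            DWORD needed = 0;   // first call only asks for the required buffer size
            SetupDiGetDeviceInterfaceDetailA(devs, &ifd, NULL, 0, &needed, NULL);
            std::vector<char> buf(needed);
            PSP_DEVICE_INTERFACE_DETAIL_DATA_A detail =
                (PSP_DEVICE_INTERFACE_DETAIL_DATA_A)&buf[0];
            detail->cbSize = sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA_A);
            // CreateFile() on DevicePath opens the driver's kernel module directly
            if (SetupDiGetDeviceInterfaceDetailA(devs, &ifd, detail, needed, NULL, NULL))
                std::printf("KS audio filter: %s\n", detail->DevicePath);
        }
        SetupDiDestroyDeviceInfoList(devs);
        return 0;
    }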


last updated: 01/31/2003 | author: Claus Riethmüller


References to other documents or external websites
[1] Steinberg website - the website of Steinberg; makers of Cubase VST
[2] Tascam website - the website of Tascam; the vendor behind GigaStudio
[3] Cakewalk website - the website of Cakewalk / Twelve Tone Systems; makers of SONAR

Some of the product or company names mentioned are registered trademarks of their respective manufacturers and are therefore subject to the corresponding legal provisions. Page last updated on 17.04.2005.