Detailed DMC Operation

I have come up with and implemented a model of the DMC which explains all the DMC behavior I've observed so far and properly handles the DMC saw waves hack. The following is a description of the model, which may not match the actual DMC hardware. Following the model are several tests which yield the same results on NES hardware as on an emulator based on the model.

Model

The DMC consists of three units: the sample buffer, the output unit, and the DMA unit.

The sample buffer either holds a single sample byte or is empty. It is filled by the DMA unit and emptied by the output unit. Only the output unit can empty it, so once loaded with a sample it will eventually be output.

The output unit is always cycling. During each cycle it either outputs a sample byte or is silent for equal duration. The type of cycle is determined just before the cycle starts, based on the full/empty status of the sample buffer. Once a cycle has started, its type can't be changed nor can it be interrupted.

A cycle consists of 8 steps, each consisting of a delay followed by a possible action. The delay is determined by the DMC period ($4010) at the time the delay begins, thus the DMC period is accessed 8 times per cycle. For a sample output cycle, at the end of each step the next sample bit is applied to the DAC. For a silence cycle, each step consists of just the delay.

After a cycle is complete, a new cycle is started; its type is determined by the status of the sample buffer. If the sample buffer is empty, a silence cycle is started, otherwise a sample output cycle is started (using the sample from the buffer) and the sample buffer is cleared.
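
As a rough illustration of the output unit described above, the following Python sketch models one cycle at step granularity. The class and function names are hypothetical, and the step delays are not modeled. The way a bit moves the DAC (bit 1 adds 2, bit 0 subtracts 2, least significant bit first, with no change if the result would leave the 0 to 127 range) is an assumption taken from general NES APU documentation rather than from this description.

```python
class OutputUnit:
    """Hypothetical sketch of the output unit; delays are not modeled."""

    def __init__(self, dac=0):
        self.dac = dac          # 7-bit DAC level, 0..127
        self.silent = True      # type of the current cycle
        self.bits = 0           # remaining bits of the byte being played
        self.steps_left = 0     # steps remaining in the current cycle

    def start_cycle(self, buffer):
        """Choose the cycle type from the sample buffer (None = empty).

        Returns the new buffer contents, which is always empty (None).
        """
        self.steps_left = 8
        self.silent = buffer is None
        if not self.silent:
            self.bits = buffer
        return None

    def step(self):
        """Perform the action at the end of one step's delay.

        Returns True once the cycle is complete.
        """
        if not self.silent:
            # Assumption: bit 1 -> +2, bit 0 -> -2, LSB first, no wraparound.
            if self.bits & 1:
                if self.dac <= 125:
                    self.dac += 2
            elif self.dac >= 2:
                self.dac -= 2
            self.bits >>= 1
        self.steps_left -= 1
        return self.steps_left == 0


def run_cycle(unit, buffer):
    """Run one full cycle and return the (now empty) sample buffer."""
    buffer = unit.start_cycle(buffer)
    while not unit.step():
        pass
    return buffer
```

Under these assumptions, starting from a DAC level of 32, a cycle that plays $FF raises the level by 16, while a cycle that plays $55 leaves it where it started.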

The DMA unit constantly watches for an opportunity to fill the sample buffer. If the sample buffer is empty and there are bytes remaining in the current sample, the DMA unit reads the next sample byte from memory and puts it into the sample buffer. After fetching the sample byte, it decrements the number of remaining sample bytes; if this becomes zero, further action is taken: if looping is enabled, the sample is restarted (see below), otherwise an IRQ is generated (if enabled via $4010).
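
The DMA unit's behavior described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: `DmaUnit`, `try_fill`, and the dictionary-based `memory` are hypothetical names, and the address wrap from $FFFF to $8000 comes from general NES APU documentation, not from the model.

```python
class DmaUnit:
    """Hypothetical sketch of the DMA unit that feeds the sample buffer."""

    def __init__(self, memory, start, length, loop=False, irq_enabled=False):
        self.memory = memory            # maps address -> byte
        self.start = start              # sample start address
        self.length = length            # sample length in bytes
        self.loop = loop
        self.irq_enabled = irq_enabled
        self.irq = False                # set when an IRQ is generated
        self.address = start
        self.remaining = 0              # nothing to fetch until restarted

    def restart(self):
        """Reload the current address and the bytes remaining."""
        self.address = self.start
        self.remaining = self.length

    def try_fill(self, buffer):
        """Return the buffer contents after the DMA unit has had its chance.

        `buffer` is None when the sample buffer is empty.
        """
        if buffer is not None or self.remaining == 0:
            return buffer               # buffer full, or no bytes left
        byte = self.memory[self.address]
        # Assumption: the address wraps from $FFFF to $8000.
        self.address = 0x8000 if self.address == 0xFFFF else self.address + 1
        self.remaining -= 1
        if self.remaining == 0:
            if self.loop:
                self.restart()
            elif self.irq_enabled:
                self.irq = True
        return byte
```

Passing the buffer in and out keeps the sketch independent of how the output unit is modeled; a caller would invoke `try_fill` whenever the buffer might have been emptied.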

When a CPU write to $4015 occurs, bit 4 selects one of two actions. If bit 4 is 0, the number of remaining bytes in the current sample is set to 0. If bit 4 is 1 and the number of remaining bytes is zero, the sample is restarted; if bytes still remain, nothing happens.

Restarting the DMC sample involves setting the DMA unit's current address and bytes remaining based on the values in DMC registers $4012 and $4013.
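
The $4015 write and the restart can be sketched together. In this Python sketch the function names and the `state` dictionary are hypothetical. The register formulas (address = $C000 + $4012 * 64, length = $4013 * 16 + 1) are not stated in the model; they are an assumption from general NES APU documentation, and they agree with the values used in the tests ($4013 = 1 giving 17 bytes, $4012 = $80 giving $E000).

```python
def sample_address(reg_4012):
    """Assumed mapping from $4012 to the sample start address."""
    return 0xC000 + reg_4012 * 64


def sample_length(reg_4013):
    """Assumed mapping from $4013 to the sample length in bytes."""
    return reg_4013 * 16 + 1


def restart(state, reg_4012, reg_4013):
    """Reload the DMA unit's current address and bytes remaining."""
    state["address"] = sample_address(reg_4012)
    state["remaining"] = sample_length(reg_4013)


def write_4015(state, value, reg_4012, reg_4013):
    """Apply the effect of a CPU write to $4015 on the DMA unit's state."""
    if (value & 0x10) == 0:
        # Bit 4 clear: no more bytes will be fetched.
        state["remaining"] = 0
    elif state["remaining"] == 0:
        # Bit 4 set and nothing left to fetch: restart the sample.
        restart(state, reg_4012, reg_4013)
    # Bit 4 set while bytes remain: do nothing.
```

Note that clearing bit 4 only zeroes the remaining count; it does not touch the sample buffer, which is why a byte that has already been fetched is still played.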


Notes

If nothing is currently playing, the sample buffer will be empty and the output unit will be in a silence cycle. If the DMC is then enabled, the DMA unit will immediately fill the buffer. If the sample's length is only 1 byte, this will result in almost immediate clearing of the DMC's enable bit and an IRQ (if it's enabled). The sample buffer will then be filled, so when the output unit completes its current silence cycle, it will play the buffered sample byte.

If a sample is currently playing and the DMC is then disabled, the sample buffer will still be full and the output unit will be in the middle of a sample byte, thus the currently playing sample byte will complete, then the remaining sample byte in the buffer will also play, then silence. It might be possible for the CPU to disable the DMC just after the output unit empties the sample buffer and before the DMA unit notices it's empty, which would result in one fewer sample byte than usual before silence (the DMA unit seems to take a few cycles to run through its checking cycle).

The number of clock cycles until the transition of the DMC from enabled to disabled can be calculated as follows (the sample buffer will already be full):
    clocks until the end of the output unit's step +
    steps remaining in current cycle * DMC sample bit period +
    (sample bytes remaining - 1) * 8 * DMC sample bit period
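
The calculation above can be written as a small function. This is a sketch in Python; the function and parameter names are hypothetical, and it assumes that "steps remaining" excludes the step currently in progress (which the first term accounts for) and that the DMC period does not change along the way.

```python
def clocks_until_disabled(clocks_left_in_step, steps_left_in_cycle,
                          bytes_remaining, bit_period):
    """Clocks until the DMC status goes from enabled to disabled.

    clocks_left_in_step -- clocks until the output unit's current step ends
    steps_left_in_cycle -- full steps left in the cycle after the current one
    bytes_remaining     -- sample bytes the DMA unit has yet to fetch (>= 1)
    bit_period          -- DMC sample bit period in clocks
    """
    return (clocks_left_in_step
            + steps_left_in_cycle * bit_period
            + (bytes_remaining - 1) * 8 * bit_period)
```

For example, with 100 clocks left in the current step, 3 further steps in the cycle, 2 bytes remaining, and a bit period of 428 clocks, the result is 100 + 1284 + 3424 = 4808 clocks.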

Tests

I developed and tested the model with several carefully-designed sequences which were run on NES hardware. The model agrees with the results, and an emulator based on the model generates the same results. Each test is titled by the conclusion which it supports. Samples are shown of the output of NES hardware and the sequence which generated it.

NES ROMs for some tests are available. The ROM name is listed at the beginning of assembly sequences, if one is included. I plan on making better test ROMs which are designed to find defects in emulation, rather than test the DMC model as the current tests are designed to do. Feedback on improvements is welcome.

Each sequence starts out with the DMC's DAC stabilized at 32 (1/4 full range), and the DMC sample set to a series of $55 sample values which result in alternating positive and negative transitions. The DMC's frequency is set at the lowest and IRQ is disabled ($4010 = 0). The DMC sample length is set to 17 bytes ($4013 = 1). Many sequences mark a point in time by directly setting the DAC in order to generate a noticeable output transition.

There is an independent 8-sample-bit output section

Start (large positive transition) and stop (large negative transition) DMC at regular intervals

The DMC only responds every 8 sample bit periods, and the sample always plays for a multiple of 8 sample bits, indicating an independent sample output section that can only be configured every 8 sample bits, even when it's currently not playing anything (silent).

The lowest DMC frequency results in an 8-bit sample taking approximately 1.9 msec. By making the delays double this (3.8 msec each), the latency becomes constant:

                        ; latency.nes

      ldy   #4          ; iterations

    loop:
      lda   #36         ; mark output
      sta   $4011
      lda   #$10        ; start DMC
      sta   $4015

      lda   #22         ; delay 2.2 msec
      jsr   ms_delay

      lda   #32         ; mark output
      sta   $4011
      lda   #0          ; stop DMC
      sta   $4015

      lda   #43         ; delay 4.3 msec
      jsr   ms_delay

      dey
      bne   loop
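
The 1.9 msec figure used to choose the delays in this test can be checked with a little arithmetic. The Python sketch below assumes NTSC values from general NES documentation, which are not given in the text: a CPU clock of about 1.789773 MHz and a sample bit period of 428 CPU clocks at the lowest DMC frequency. The names are hypothetical.

```python
CPU_CLOCK_HZ = 1_789_773      # assumed NTSC CPU clock
LOWEST_RATE_PERIOD = 428      # assumed CPU clocks per sample bit at $4010 = 0


def sample_byte_msec(bit_period=LOWEST_RATE_PERIOD, cpu_clock_hz=CPU_CLOCK_HZ):
    """Duration in milliseconds of one 8-bit sample (one output cycle)."""
    return 8 * bit_period * 1000 / cpu_clock_hz
```

Under these assumptions one 8-bit sample takes about 1.91 msec, and double that is about 3.8 msec.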

There is an intermediate buffer in addition to what the 8-bit sample output section uses

In the previous test, the stopping latency was always over 8 sample bits, indicating an extra byte buffer which the 8-bit sample output section draws on. This test sets up a ramp sample in memory and configures the DMC for it. It starts the DMC and marks the output. Then it changes the sample value in memory to a neutral toggling value. The output shows that the DMC sample doesn't start playing until well after it's started (due to the previously demonstrated latency), so if the DMC didn't buffer the sample value, it would play the neutral toggling sample. The output shows that it uses the original value, indicating an additional byte buffer. The development cartridge I use contains RAM in the upper 32K address space and allows the CPU to modify it; with ROM the equivalent could be achieved by switching banks.

      lda   #$80        ; DMC sample at $E000
      sta   $4012
      lda   #0          ; DMC sample length = 1 byte
      sta   $4013

      ldy   #4          ; iterations

    loop:
      lda   #$FF        ; set sample value to ramp
      sta   $E000
      lda   #$10        ; start DMC
      sta   $4015

      lda   #30         ; mark output
      sta   $4011

      lda   #$55        ; set sample value to neutral
      sta   $E000

      lda   #42         ; delay 4.2 msec
      jsr   ms_delay

      dey
      bne   loop

The intermediate buffer can only be emptied by the sample output section

This test enables the DMC, then immediately disables it, but an 8-bit sample is output anyway. This supports the existence of an intermediate buffer; it is filled immediately when the DMC is enabled, and is only emptied when the sample output section needs a new byte.

                        ; buffer_retained.nes

      lda   #$10        ; start
      sta   $4015

      lda   #$00        ; immediately stop
      sta   $4015

The status changes to "not playing" when all sample bytes have been read

Since the 8-bit sample output unit and intermediate buffer form an effective 2-byte buffer, the status can change to "not playing" up to 16 sample bit periods before the last sample bit is added to the DAC.

This test starts a 17-byte sample and polls the status. Once it changes to "not playing", the output is marked with a transition.


The transition occurs immediately after the last bit of the 15th sample byte is applied to the DAC, then the 16th and 17th sample bytes are output.

                        ; status.nes (status_irq.nes for IRQ version)

      lda   #$10        ; start DMC
      sta   $4015

    wait:
      bit   $4015       ; wait for status to change to 0
      bne   wait

      lda   #0          ; mark output
      sta   $4011

If the sample length is changed to 1 byte, the status changes to "not playing" before the sample even starts! This is explained by the intermediate buffer being filled immediately.
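
Both results follow from the model, as a byte-granularity sketch can show. The Python function below is a hypothetical illustration that ignores clock-level timing: it only tracks how many bytes the DMA unit has left to fetch, whether the sample buffer is full, and how many bytes the output unit has begun playing. It assumes the DMC is enabled while the output unit is in a silence cycle.

```python
def simulate(length):
    """Return (bytes begun when the status cleared, total bytes played)."""
    remaining = length      # bytes the DMA unit has yet to fetch
    buffer_full = False
    bytes_started = 0       # sample bytes the output unit has begun playing
    cleared_after = None    # bytes_started at the moment remaining hit zero
    while True:
        # DMA unit: fill an empty buffer while bytes remain.
        if not buffer_full and remaining > 0:
            buffer_full = True
            remaining -= 1
            if remaining == 0:
                cleared_after = bytes_started   # status is now "not playing"
        # Cycle boundary: the output unit picks the next cycle's type.
        if buffer_full:
            buffer_full = False                 # sample output cycle begins
            bytes_started += 1
        elif remaining == 0:
            break                               # silence from here on
    return cleared_after, bytes_started
```

Under this sketch, `simulate(17)` returns `(16, 17)`: the status clears just as the output unit begins the 16th byte, that is, right after the 15th has finished, with the 16th and 17th still to be output. `simulate(1)` returns `(0, 1)`: the status clears before any byte has started playing.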


