These are the main functions of the EMU Ultra sampler as I understand them after many years of studying the schematics and poking around inside.
Note that a lot of this analysis is based on the assumption that inputs and bidirectional pins are on the left side of schematic symbols and outputs are on the right. This is correct for the obvious things and also seems to make sense in that right-hand pins tend to be connected to left-hand pins, but of course there might be exceptions or mistakes that throw me off.
The main CPU. This runs the system software, programs the custom chips, runs the user interface, handles MIDI, loads and saves sounds, etc.
The main CPU stores its operating system in Flash, on a 72-pin SIMM. On the Ultras, this is a 4MB stick, but the socket can take up to 32MB, although I'm not sure such a module actually exists. It is slightly different to the sound Flash modules. There is also a single 72-pin socket for CPU memory (used for presets and sequences, as well as general runtime stuff). It should accept up to 64 megabytes, although it's not much use to have that much.
The Coldfire has two UARTs built in. Number 1 is connected to a six-pin debug header, unpopulated on the production EMUs (and silent if you do populate it), but probably used for some diagnostics. The second UART is the MIDI port on the DWAM board.
The floppy drive is handled by an Intel 82078 floppy controller, connected to the Coldfire. Standard PC stuff.
The IDE port is handled partially by the FPGA, the signalling lines coming from it and the data from the CPU.
The SCSI port is handled by an 85C80, with the signalling (reset etc) also coming from the FPGA. The 85C80 also contains a further two UARTs, one handling the built-in MIDI and the other the keyboard connector on the DWAM port.
The CPU's databus is connected directly to the RAM and ROM. All other devices are connected to only the top 16 bits (which is how the Coldfire connects to 16 bit devices) and only via two 8-bit latches, controlled by a buffered data output enable signal (schematic signal BDOEN) generated by the PAL. The system appears to make use of the DMA in the Coldfire 5206E.
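As a small illustration of the "top 16 bits" wiring, here is a sketch of where a 16-bit device's value sits in a 32-bit bus word. The function names are made up for this example, and it assumes the data lines are numbered D31..D0 with 16-bit devices on D31..D16.

```python
def to_bus_word(value16: int) -> int:
    """Place a 16-bit device value on the upper half (D31..D16) of a 32-bit word."""
    if not 0 <= value16 <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return value16 << 16


def from_bus_word(word32: int) -> int:
    """Recover the 16-bit device value from the upper half of a 32-bit word."""
    return (word32 >> 16) & 0xFFFF
```

The lower 16 bits of the word carry nothing for such a device, which is why the latches only need to cover the top half.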
The CPU gets a 33 MHz clock from a triple PLL which is connected to a 16 MHz oscillator.
The Ultra contains two bits of programmable logic, a fairly large Altera FPGA and a smaller PAL. There's very little other logic onboard (an inverter in the reset circuitry, a few buffers here and there). This is in contrast to the Classic E4s, which are based on lots of discrete logic.
The FPGA is connected to the CPU's data and address bus and handles all generation of chip select signals except for the ones used by the CPU RAM and ROM. Unfortunately, this means it is impossible to work out a memory map from the schematic alone.
The PAL is used to configure the FPGA, presumably with data contained in the CPU ROM. The PAL is connected to the CPU's bottom 5 data bits, the highest-order address bit (A23), all six chip select outputs (which, depending on configuration, are also address bits), and a write pin, but not to any clock source.
The FPGA and PAL are also involved in the sound generation, see below.
The display is an apparently off-the-shelf graphical LCD module (LMBJ6T003E34P), which features the Toshiba T6369 display controller, connected to the Coldfire bus via the FPGA.
The front panel LEDs are directly on two latches on the databus. The buttons are connected to an ASIC called the K-chip in a matrix configuration. The only interesting thing about this is that the network that connects the buttons to the K-chip is labelled BUTTSCAN on the schematic.
The K-chip has a bunch of unconnected pins on the Ultra - on the non-Ultra EMUs, these go to headers for attaching the (piano) keyboard on the E4K version.
Sounds from the EMU come from either sound RAM or sound ROMs (or sound Flash). There are four slots for ROMs and two for RAM. These share a 128MB address space. If a sound ROM is installed, anything over 64MB RAM is ignored.
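The RAM limit described above can be written down as a tiny rule. This is only a sketch of the behaviour as stated here: the function name is invented, and the idea that the full 128MB is usable as RAM when no ROM is fitted is an assumption, not something read off the schematic.

```python
ADDRESS_SPACE_MB = 128       # shared sound memory address space
RAM_LIMIT_WITH_ROM_MB = 64   # RAM above this is ignored once a ROM is fitted


def usable_sound_ram_mb(installed_ram_mb: int, rom_installed: bool) -> int:
    """Return how much of the installed sound RAM would actually be used."""
    limit = RAM_LIMIT_WITH_ROM_MB if rom_installed else ADDRESS_SPACE_MB
    return min(installed_ram_mb, limit)
```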
There are two custom chips used for sound generation: the H-chips and the G-chips. These are programmed by the main processor. In the 64-voice system, there is one G-chip and 2 H-chips; the 128-voice system has double this.
The G-chips are called voice chips. They probably implement the patented EMU repitching algorithm, which interpolates between four adjacent samples using fancy maths. They are connected to both the regular CPU data and address busses, and to the sound memory data and address busses. The schematic layout suggests they output the addresses and read the sound data in. They also handle RAS and CAS for the sound RAM.
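To give a feel for what interpolating between four adjacent samples means, here is a sketch using plain cubic Lagrange interpolation. This is a textbook stand-in, not the patented EMU algorithm, whose actual maths is not known from the schematic. The function names and the clamping at the ends of the sample are choices made for this example.

```python
def interp4(a: float, b: float, c: float, d: float, t: float) -> float:
    """Cubic Lagrange interpolation through four samples at positions -1, 0, 1, 2.

    Returns the value at fractional position t, where 0 <= t < 1 lies
    between samples b and c.
    """
    wa = t * (t - 1) * (t - 2) / -6
    wb = (t + 1) * (t - 1) * (t - 2) / 2
    wc = (t + 1) * t * (t - 2) / -2
    wd = (t + 1) * t * (t - 1) / 6
    return a * wa + b * wb + c * wc + d * wd


def repitch(samples: list, ratio: float) -> list:
    """Read through samples at `ratio` times normal speed (2.0 = octave up)."""
    out = []
    last = len(samples) - 1
    pos = 0.0
    while pos < last:
        i = int(pos)
        t = pos - i
        # Clamp neighbours at the edges instead of reading out of range.
        a = samples[max(i - 1, 0)]
        b = samples[i]
        c = samples[min(i + 1, last)]
        d = samples[min(i + 2, last)]
        out.append(interp4(a, b, c, d, t))
        pos += ratio
    return out
```

With a ratio of 1.0 the read position always lands exactly on a sample, so the output is just the input; other ratios land between samples and the four-point weights fill in the gaps.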
The H-chips implement the filtering. They're on the normal databus and address bus only, i.e. they do not see sound data from the memory directly, only from the G-chips.
Each G-chip is connected to two H-chips. The schematic suggests that these process even and odd voices respectively, and probing with an oscilloscope shows that they alternate for each note played. The G-chip has two audio data outputs, each going to an input on an H-chip. The H-chips have cascade input and output pins and form a chain, such that only the very last one is connected to the FPGA, which would appear to then perform the conversion to the I2S streams that feed the output DACs.
There is a major difference between the Ultra and the Classic E4s here, in that on the Classics, the DACs are connected to the last H-chip directly, so we see it quite clearly turn the stream of voices into a submix for each output. On the Ultra, part of this is done by the FPGA, but I suspect that the H-chip is still responsible for mixing in some way, and that this is why the ADAT and analogue output expansions, which both raise the number of output streams to 16, contain four more H-chips. The reason the FPGA is involved might be that the H-chip doesn't support the data format of the DAC in the Ultra, which is not the same as in the Classic.
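The even/odd split can be sketched as follows. The chip numbering, the idea that voices are numbered contiguously with 64 per G-chip, and the function name are all assumptions made for illustration; the schematic only suggests the even/odd division itself.

```python
VOICES_PER_GCHIP = 64  # from the 64-voice system having one G-chip


def chips_for_voice(voice: int) -> tuple:
    """Return (g_chip, h_chip) indices for a voice number, hypothetical numbering."""
    g_chip = voice // VOICES_PER_GCHIP
    # Even voices go to the first H-chip of the pair, odd voices to the second.
    h_chip = g_chip * 2 + (voice % 2)
    return g_chip, h_chip
```

Under this numbering, consecutive voices bounce between the two H-chips of a pair, which would match the alternation seen on the oscilloscope if notes are allocated to consecutive voices.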
The main DAC gets its I2S data from the outputs of the EMU8000 effects chip. The 8k is actually a sample-based synthesizer in its own right, often found on PC sound cards, but all the sound memory pins are simply left unconnected! It is connected to the regular data and address bus, and receives a single audio stream from the FPGA which is hard-wired to the main DAC. The other DACs are connected directly to the FPGA.
The G-chips and H-chips have a clock input, separate from the main CPU clock. This signal is labeled 512FS on the schematic. It comes from the FPGA, which gets a signal from the triple PLL labeled C22_25M (presumably 22.25MHz) that the schematic claims is for the sound engine.
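If the name 512FS means 512 times the sample rate (the usual reading of such labels), the expected frequency is easy to work out. The sketch below assumes sample rates of 44.1 kHz and 48 kHz purely as examples; neither is taken from the schematic, and the function name is made up.

```python
def master_clock_hz(sample_rate_hz: int, multiplier: int = 512) -> int:
    """Return the clock frequency for a given sample rate and FS multiplier."""
    return sample_rate_hz * multiplier


print(master_clock_hz(44_100))  # 22579200
print(master_clock_hz(48_000))  # 24576000
```

At 44.1 kHz this gives 22.5792 MHz, which is close to, but not the same as, the 22.25MHz reading of the C22_25M label, so the label may be an abbreviation rather than an exact figure.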
The FPGA appears to take the sound stream from the filter chips and convert it to I2S signals that are fed to the DACs. The FPGA also handles the digital audio interface.
The PAL is used to decode the top 3 bits of the sound memory address bus, to generate chip selects for the sound ROM slots.
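Decoding the top 3 bits splits the address space into eight equal blocks. The sketch below treats the 128MB space as a 27-bit byte address, which is a simplification for illustration (the real bus may well be word-addressed), and the function name is invented. Which block values actually select which ROM slot is not known from the schematic.

```python
def decode_block(addr: int, addr_bits: int = 27) -> int:
    """Return the block number (0..7) given by the top 3 bits of an address."""
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address outside the sound memory space")
    return (addr >> (addr_bits - 3)) & 0x7
```

Under these assumptions each block is 16MB, so eight of them fill the 128MB space.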