I2O is an emerging standard for Intelligent
Input/Output. A primary benefit of I2O is the ease
with which multiple peripheral devices and boards can be
integrated into a single system. Such systems are common in DSP
applications, where different types of I/O and processing
boards are required. I2O's implementation requires
placing intelligence on the peripheral board in the form of a
local processor. With this, the host CPU no longer has to act
as the data distributor. The peripherals can themselves carry
out peer-to-peer communications. As a result, CPU response
time is faster and more deterministic, and a DSP can focus
on the signal processing task at hand. This article presents
the benefits of I2O for signal processing and
telecommunications applications.
The Need for Intelligence
Demands on CPUs are increasing steadily. Operating systems
are becoming more and more complex. The number and complexity
of peripherals are increasing, and the volume of data passing
through I/O channels is expanding. The types of I/O with which
CPUs must contend include voice, audio, video, and large data
files. Intelligent I/O devices and co-processor type devices,
such as DSP boards, help the CPU manage this data.
Today's applications require multiple O/Ss, I/O devices, and
co-processors, as shown in Figure 1. Integrating these
systems requires the development of complex device drivers.
Figure 1: A typical multi-board system for DSP
applications with a CPU, coprocessor, DSP board, and multiple
I/O channels.
Consider the dashed line arrows in Figure 1. These
indicate the devices that need to communicate with one another.
Writing a single (non-I2O) device driver can be
challenging. Implementing a multi-platform intercommunicating
driver is far more difficult. Imagine writing the
device drivers to make all these devices talk to one another as
indicated, without intelligence on the boards; it is a daunting
task. Making those drivers actually work is another. All the
interrupts need to be configured, priority structures must be
established, and custom code to support all the
inter-device communication must be written and debugged.
Here, large volumes of data are transferred from
board to board, and control passes between multiple layers of
the architecture. Without intelligence in the I/O and
co-processor boards, the hosts can spend significant time
handling I/O requests and transfers. The throughput of the I/O
and co-processor boards suffers as well, and overall
performance is limited.
I2O
These issues have led to the creation of a new
specification called Intelligent Input/Output, or
I2O. The objective of this specification is to
provide a device driver structure that is independent of both
the individual device being controlled and the host O/S where
the application is running. This independence is accomplished
by adding intelligence to the peripheral platform and by
separating the device driver into two parts. A new "messaging
layer" is then implemented in the system for communication
between the two.
Figure 2: A simplified DSP peripheral platform with
multiple I/O devices and a local processor for
I2O
Figure 2 shows a simplified I2O device
platform, which illustrates the use of multiple "devices" on a
single board. The platform also contains a local processor (the
"intelligence"); it processes I2O messages and
executes the device driver modules, both of which are described
below.
Split-Driver Model
Figure 3: Split driver model for I2O.
Traditional device drivers are written as a
single block of code, interfacing to both the O/S and the
hardware device. I2O splits the driver into two
pieces and defines a standard set of messages to be used for
communication between the two.
Historically, device drivers have been tailored to a
particular O/S and a particular peripheral
device. In I2O, the driver is split into two parts:
the OSM (O/S Services Module), which provides the interface only
to the O/S, and the HDM (Hardware Driver Module), which provides
the interface only to the peripheral device. The two
communicate (Figure 3) via standard message packets
across a layered system composed of a messaging layer that
resides on a transport layer. The messages are passed between
the OSMs and HDMs via two virtual FIFO queues: one outbound,
one inbound.
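The split-driver arrangement described above can be sketched in a few lines of Python. This is only an illustrative model, not the I2O message format or API: the class names `MessagingLayer`, `OSM`, and `HDM`, the `Message` fields, and the function names `"READ"`, `"REPLY_OK"`, and `"REPLY_UNSUPPORTED"` are all hypothetical, and the sketch assumes the inbound queue carries host-to-peripheral requests while the outbound queue carries the replies.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    function: str          # hypothetical request/reply name, not a real I2O code
    payload: bytes = b""


class MessagingLayer:
    """Two FIFO queues standing in for the inbound and outbound queues."""

    def __init__(self):
        self.inbound = deque()   # assumed direction: host (OSM) -> peripheral (HDM)
        self.outbound = deque()  # assumed direction: peripheral (HDM) -> host (OSM)


class HDM:
    """Hardware-facing half: knows the device, knows nothing about the O/S."""

    def __init__(self, layer, stored):
        self.layer = layer
        self.stored = stored     # stand-in for data held by the device

    def service(self):
        # Drain every pending request and queue one reply per request.
        while self.layer.inbound:
            request = self.layer.inbound.popleft()
            if request.function == "READ":
                reply = Message("REPLY_OK", self.stored)
            else:
                reply = Message("REPLY_UNSUPPORTED")
            self.layer.outbound.append(reply)


class OSM:
    """O/S-facing half: knows the request, knows nothing about the hardware."""

    def __init__(self, layer):
        self.layer = layer

    def post(self, function):
        self.layer.inbound.append(Message(function))

    def collect(self):
        # Return all queued replies in arrival order and empty the queue.
        replies = list(self.layer.outbound)
        self.layer.outbound.clear()
        return replies
```

Because the two halves only share the queues and the message names, either half could be swapped for a different O/S or a different device without touching the other, which is the point of the split.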
To accomplish this communication, an I/O processor (IOP) is
required on the peripheral side to run the HDMs. Intel's
i960RP processor is specifically tailored for I2O;
however, the HDM (Figure 3) can be hosted on any
suitable processor. By standardizing the messages, platforms
communicate without knowledge of underlying bus architectures,
O/Ss, device specifics, or I/O hardware. Thus buses such as
VME, PCI, and cPCI are all potential candidates for use
with this specification.
I2O Classes
For standardization, several "classes" of devices have been
defined by the I2O Special Interest Group (SIG),
each utilizing a specified set of standard messages to
communicate with the host. "A message class is a formal
interface describing the messages that can be sent to the
interface and the replies that will return." Currently, the
following message classes have been identified (with
examples):
- Random Block Storage (HDD or CD-ROM)
- Sequential Storage (tape drives)
- LAN (Ethernet or Token Ring)
- WAN (ATM controller)
- Fibre Channel Port
- SCSI Peripheral
- ATE Port (ATE controller)
- ATE Peripheral (an ATE device)
- Floppy Controller
- Floppy Device
- Bus Adapter Port
- Peer-to-Peer.
Additionally, RadiSys (DSPD) is currently designing the
specification of a class for Telecom devices.
Benefits of I2O
Fewer interruptions
Messages in I2O are passed between the OSM(s) and
HDM(s) via two virtual FIFO queues (one outbound, one inbound),
substantially reducing the number of interrupts the other
processors in the system need to handle. In ordinary systems,
interrupts rob the CPU of a significant portion of
its processing power. By lightening that load,
I2O enables the main processor to devote its power
to running code rather than managing I/O traffic.
Multi-Processor, Multi-Peripheral Platforms
The messaging structure of I2O enables it to
operate in systems with any number of hosts, I/O, and DSP
platforms. In Figure 4, host CPUs operating under two
different operating systems are communicating with three
different classes of devices (A, B, C) on two separate I/O
platforms (or boards). Communication is via messages between
host and peripheral, or peripheral to peripheral.
Figure 4: I2O can communicate across
multiple operating systems, and between peripheral platforms on
a peer-to-peer basis
System Control
Without I2O, a system such as the one in Figure 1 or Figure 7
would be brought to a halt if the controlling
processor had to be reset. I2O loosens the coupling
between boards so that resetting any one processor, including the
host(s), does not force any other peripheral to stop its
processing.
I2O and DSP
DSP and I/O boards can take advantage of the I2O
standard. Implementing I2O in a DSP system involves
treating the DSP subsystem as another I/O platform and
adding an IOP (I/O processor) to the DSP board. The
standard actually allows the DSP processor to handle the
I2O HDM traffic and host communication itself;
however, such an implementation would take away many valuable
MIPS from the processing power available on the DSP.
Figure 5: I2O capable peripheral board
with integrated DSP, multiple I/O connections, and an H.100
style switch. The i960 handles all on board I/O transactions as
well as communication with the host(s).
The board shown in Figure 5, RadiSys' SPIRIT-6000,
integrates all of its components on a single cPCI form factor.
The i960 is the IOP. The H.100 (a computer telephony bus
standard) switch selects from several input sources (e.g. T1/E1
framer, ISDN, Frame Relay, POTS). The i960 controls the switch,
as well as the data flow into and out of the DSP, whether
directly through a host port to the TMS320C602 DSP or via the
H.100 through a high speed serial port. The i960 (or IOP) also
handles all communication with the host and with other devices
external to the card.
Since all these tasks are now performed by the IOP (the
i960), the DSP is freed up to do intensive signal processing
with minimal servicing of the host, resulting in greater
throughput and greater determinism. DSP software development
and maintenance are easier, and fewer DSP external memory
accesses are required since host communication code is no
longer in DSP memory. Applications become modular and more
robust.
Distributed Intelligence: System Level Implications
In most applications today, the CPU is involved in all
board-to-board transactions. Host bus bandwidth usage is high, since
the data is read from the peripheral by the CPU, and then
transferred by the CPU to another peripheral device.
Consider again the multi-host, multi-I/O platform
environment shown above in Figure 4. Such systems
present particular I/O complexities that the distributed
intelligence of I2O can greatly simplify. As
Figure 4 indicates, I2O supports multi-host (and
multiple O/S) and multi-IOP systems, due to its structured
nature and standardized classes of devices. The host no longer
needs to manage all of the data, and much of the
complexity of inter-peripheral communication is eliminated.
Implementing I2O forces distributed intelligence,
which is a natural requirement for large I/O and
computationally intense systems.
Distributed, or local, intelligence is not a new concept. As
an example, designers have used a 386 as a local controller.
In "system on a chip" designs, a controller core may accompany a
processor and an I/O core. These local intelligent processors
are advantageous, but they can create system-wide havoc: getting
them all to communicate over a standard bus with a standard
set of APIs has been quite difficult.
The difference with I2O is that regardless of the
IOP selected, the APIs for the peripherals within the same
class are identical.
Distributed intelligence can provide another important
feature. The IOP minimizes DSP interruptions when processing
host commands. For general I/O applications, this is a plus.
For real time applications where a DSP's
determinism is critical, this is a major advantage.
In fact, in communications with the host, studies with other
classes of I/O platforms have shown at least a threefold (3X)
improvement in data throughput (IOP to host), while reducing
the load on the host processor by up to 50%.
DSP Subsystem
DSP applications can take advantage of the peer-to-peer
communication made possible with I2O. Where multiple
DSP boards are included in the same system, each can talk to
the other(s) without assistance from the host, and without the
DSPs themselves even knowing it. No compute cycles are lost,
and the existing processors can devote more compute power to
the data itself, rather than playing traffic cop.
Figure 6: Simple parallel processing DSP
application. An image is broken into four pieces, and each DSP
board is given one piece to process.
Consider a parallel processing application involving four
DSPs where a controlling processor divides an image into four
parts and distributes one piece to each DSP board for
processing. In such systems, three main factors determine the
overall throughput:
- Processor Node Speed (how fast each DSP board can
process)
- Processor to Processor Link Speed (how fast the data can
be transferred among boards)
- Processor Overhead on Data Transfers (how much of the DSP
processor's power is required for data transfers).
Traditionally, where many DSPs worked together with I/O and
application specific boards, custom architecture specific
programs had to be written. The DSPs themselves had to process
data transfers, using up precious DSP MIPS and reducing node
speed. With the CPU also involved in all data transfers between
peripheral boards, the bus bandwidth used is twice what a
peer-to-peer communication system would require.
With a standardized model such as I2O, the DSP's node
speed is increased, since the I/O tasks are now off-loaded. The
i960 acts as the data pump / DMA engine, and the link can take
full advantage of the host bus speed. Data transfers go directly
from peer to peer, rather than through the CPU, effectively
halving the bus bandwidth usage.
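The halving follows from counting bus crossings. The sketch below is a simple back-of-the-envelope model, assuming each payload crosses the shared bus twice when routed through the CPU (peripheral to CPU, then CPU to peripheral) and once when sent peer to peer. The function name `bus_traffic` and the image size used in the example are hypothetical.

```python
def bus_traffic(payload_bytes: int, via_host: bool) -> int:
    """Bytes that cross the shared bus to move one payload between two boards."""
    # Through the host: peripheral -> CPU, then CPU -> peripheral (two crossings).
    # Peer to peer: peripheral -> peripheral (one crossing).
    crossings = 2 if via_host else 1
    return payload_bytes * crossings


def distribute_image(image_bytes: int, boards: int, via_host: bool) -> int:
    """Total bus traffic when an image is split evenly and one piece goes to each board."""
    piece = image_bytes // boards
    return sum(bus_traffic(piece, via_host) for _ in range(boards))
```

For a hypothetical 1 MiB image split across four DSP boards, `distribute_image(1 << 20, 4, via_host=True)` gives 2 MiB of bus traffic, while the peer-to-peer case gives 1 MiB.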
Further, the host processor can still maintain control and
perform load balancing by monitoring the percent utilization of
each DSP and routing incoming data to the least used device.
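A minimal sketch of that routing decision, assuming the host keeps a table of percent utilization per DSP. The function name `pick_least_used` and the device names are hypothetical, and how the utilization figures are gathered is outside the scope of the sketch.

```python
from typing import Dict


def pick_least_used(utilization: Dict[str, float]) -> str:
    """Return the name of the DSP with the lowest percent utilization.

    On a tie, the device that appears first in the table wins.
    """
    if not utilization:
        raise ValueError("no DSPs to choose from")
    return min(utilization, key=utilization.get)
```

With a table such as `{"dsp0": 72.0, "dsp1": 35.5, "dsp2": 90.0}`, the host would route the next block of incoming data to `dsp1`.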
Data flow emulation is also made easier by I2O.
The messages that are passed contain a header and a payload.
The header contains information about the data, which can
include: source, destination, type and size of data,
parameters, coefficients, algorithm and processing
requirements, and so on. The payload contains the data itself.
Thus, the CPU (by writing to the IOP) can control the
destination of the data for optimal load balancing.
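The header and payload split can be pictured with a small data structure. This is a sketch under stated assumptions, not the I2O message frame layout: the field names, the `make_message` and `redirect` helpers, and the example values are all hypothetical, chosen only to mirror the items listed above.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class Header:
    source: str
    destination: str
    data_type: str
    size: int                                   # payload length in bytes
    parameters: Dict[str, float] = field(default_factory=dict)


@dataclass(frozen=True)
class Message:
    header: Header
    payload: bytes


def make_message(source, destination, data_type, payload, **parameters):
    """Build a message whose header describes the payload it carries."""
    header = Header(source, destination, data_type, len(payload), dict(parameters))
    return Message(header, payload)


def redirect(message, new_destination):
    """Host-side rerouting: only the header changes, the payload is untouched."""
    return Message(replace(message.header, destination=new_destination),
                   message.payload)
```

A host that sees `dsp0` is busy could call `redirect(msg, "dsp1")` to steer the same data to another board without copying or inspecting the payload.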
Multiple processors and boards can easily be integrated
together to handle the data for the DSP(s), facilitating
powerful real time processing environments with a minimum of
overhead.
A Telecom (DSP) Example
Figure 7 illustrates the entire architecture of the
system presented in Figure 1. This configuration is
ideal for telecom applications where, for instance, the I/O is
multiple T1/E1, the DSP is performing voice channel processing,
and the CPU is doing call processing, database management, and
system monitoring and control.
The diagram shows a typical cPCI based system for telecom
applications. The host is a Pentium based
controller; the co-processors consist of an x86 board and a DSP
board. The I/O is either a daughter card on the co-processor
boards or a separate I/O board on the cPCI bus. A
secondary bus, the H.100, is also shown. The H.100 is
dedicated to high speed telecom traffic. The host O/S is
Windows NT with real-time extensions and an algorithm
execution environment called TASK (Telecom Application Specific
Kernel).
Figure 7: RadiSys' integrated solution for
multi-platform processing
Here, all the potential complexities noted in the first
section of this article are painfully apparent. This system
requires a multi-platform intercommunicating driver (system)
that must be fully integrated before any of the boards can talk
with one another.
Enter I2O. Utilizing the split driver model
(Figure 3), one hardware driver module (HDM) is written
for each hardware device (the DSP(s) and the I/O devices),
independent of all the other devices and the O/S. For most I/O
devices, this HDM will be provided by the board or device
vendor. Standard O/S side drivers (OSMs) can be implemented
(or purchased from a vendor, such as Wind River)
independent of the specifics of the device to be controlled.
Inter-device communication is handled by the I2O system; the
developer need not even be concerned with it. Product
development cycle time is thus dramatically reduced. The
results are modular in nature and easily reproducible.
In telecom data-logging applications, I2O can
significantly decrease the system integration time. Typically
in these applications, many channels of voice are passing via
the I/O. The channels are processed (compressed) and stored.
Upon a request by the host, the stored channels are read from
the disk, uncompressed, and played back on the host.
With the availability of I2O drivers, the I/O
device can be configured easily to pass the data from the I/O
to its IOP. The IOP sends data directly from the I/O to the DSP
board's IOP, which locally transfers the data to the DSP
processor. After compression, the DSP transfers the data
back to its IOP, which sends the data to disk. For playback, the
same sequence takes place in reverse; but instead of playing
out over the output, the data may be played to the host CPU.
Without I2O, a full duplex application (simultaneous record and
playback) with no loss of data requires six to eight months of
painful driver and application integration. The promise of I2O,
assuming all the boards support it, is that such integration
can be done in less than one fourth the time.
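The record and playback paths described above can be traced in a short sketch. Everything here is a stand-in: `zlib` substitutes for the voice compression that would run on the DSP, a Python list substitutes for the disk, and the function names `record` and `playback` are hypothetical. The hops between IOPs are shown as plain assignments to make the order of the data flow visible.

```python
import zlib


def record(samples: bytes, disk: list) -> None:
    """Record path: I/O -> I/O board IOP -> DSP board IOP -> DSP -> DSP board IOP -> disk."""
    at_io_iop = samples                      # I/O device hands data to its own IOP
    at_dsp_iop = at_io_iop                   # peer-to-peer hop, host not involved
    compressed = zlib.compress(at_dsp_iop)   # stand-in for compression on the DSP
    disk.append(compressed)                  # DSP board IOP sends the result to disk


def playback(disk: list, index: int) -> bytes:
    """Playback path: the same sequence in reverse, ending at the host CPU."""
    compressed = disk[index]                 # read the stored channel from disk
    return zlib.decompress(compressed)       # stand-in for decompression on the DSP
```

Recording a buffer and then playing it back should return the original bytes, which is the "no loss of data" property the application requires.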
Although I2O is new and initial efforts from
board and CPU vendors will be required, once we all cross this
painful threshold, a decrease will be seen in application
development costs, board support costs, development cycle time,
and time to market.