How to ensure the quality and performance of a speech engine design

The general-purpose SoCs currently used in cordless phones and IP phones integrate the access and wireless communication devices, and embed a software voice engine in the system software to fully support the digital signal processing that VoIP requires. The speech engine uses a soft-DSP implementation to meet the system performance requirements of embedded processors. To ensure that VoIP delivers telephone-quality voice, the system software must meet the real-time requirements of the voice engine.

Next-generation soft-DSP products use real-time processing and wideband (high-definition) voice communication technology to achieve greater end-user satisfaction and market potential than current products, setting a new high-definition standard for voice communications. Products developed according to the recommendations in this article can achieve better-than-telephone-quality communication. Conversely, failing to meet these real-time requirements causes many symptoms of voice quality degradation, including dropped calls, significant delay, pops or clicks, fax/modem call failures or garbled fax pages, and unintelligible speech caused by packet loss or excess latency. Failing to meet real-time requirements also results in missed deadlines, a serious system failure that requires a full system reset unless the system supports hardware and software recovery.

Voice communication in a telephone call is two-way: transmission and reception of audio occur simultaneously. Minimizing delay in the speech system is therefore critical to audio quality; however, reducing latency conflicts with meeting the voice processing requirements. In a traditional playback audio system, such as audio (MP3) playback or multimedia streaming, the buffer can be made large to compensate for limited processing power, because latency there does not affect quality. A speech engine cannot do this, because each audio buffer must be fully processed within a fixed time. Such an architecture typically relies on interrupt prioritization and software scheduling to enhance the real-time behavior of the operating system and guarantee that speech processing completes.
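To make the contrast concrete, the short C sketch below (illustrative values only) computes the delay contributed by buffering alone at the 8 kHz telephony sampling rate; a playback-style buffer of a second or more is intolerable in a two-way call, whereas a 10 ms voice frame adds only 10 ms.

```c
#include <stdio.h>

/* Illustrative only: per-buffer latency contributed by buffering alone,
 * at the 8 kHz narrowband sampling rate used in telephony. Larger buffers
 * ease the processing load but add delay that a two-way call cannot hide. */
int main(void)
{
    const double sample_rate_hz = 8000.0;
    const int buffer_sizes[] = { 80, 160, 800, 8000 };  /* 10 ms ... 1 s */

    for (int i = 0; i < 4; i++) {
        double latency_ms = buffer_sizes[i] / sample_rate_hz * 1000.0;
        printf("%5d samples -> %7.1f ms of buffering delay\n",
               buffer_sizes[i], latency_ms);
    }
    return 0;
}
```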

In a speech engine system, a software interrupt service routine exchanges speech samples with the speech hardware codec, which converts between the analog signal and audio samples in both directions at an 8 kHz sampling rate. In telephony applications, the hardware codec connects to a Subscriber Line Interface Circuit (SLIC), the physical interface of the phone, or to the DECT radio-frequency circuit of a cordless phone. In IP phones or mobile phones, the hardware codec connects to an amplifier, which in turn drives the microphone and speaker.

The SoC hardware interface plays a key role in ensuring the real-time performance and accurate scheduling of the speech engine. If the SoC has a TDM or AC97 peripheral, the telephone voice codec can be connected directly to the processor. If the embedded processor lacks these peripherals, the lowest-cost solution is to connect through a CPLD. The CPLD sends and receives samples one at a time to and from the hardware codec; this is the most time-sensitive arrangement and represents the worst-case timing requirement.

Whether the connection is through TDM, AC97, or a CPLD, the voice hardware service must be given the highest priority so its interrupt is always serviced promptly; other system software must not interfere with the critical timing of this interrupt. At an 8 kHz sampling rate, the interrupt occurs every 125 μs. For an SoC running at 200 MHz, a speed-optimized CPLD interrupt service routine completes within 25 μs, which gives a maximum allowable interrupt latency of 90 μs (125 μs – (25 μs + 10 μs interrupt service settling time)). To meet the real-time deadline, the operating system must invoke the interrupt service routine within 90 μs of receiving the codec interrupt and must allow it to run to completion immediately.
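The following minimal C sketch reproduces the latency budget arithmetic above; the 25 μs ISR time and 10 μs settling time are the example figures quoted for the 200 MHz SoC and would need to be measured for a specific design.

```c
#include <stdio.h>

/* Sketch of the interrupt latency budget described above. The figures
 * (8 kHz sampling, 25 us ISR execution, 10 us settling) are the example
 * values for the 200 MHz SoC; adjust them for a specific design. */
int main(void)
{
    const double t_period_us = 1e6 / 8000.0;  /* 125 us between codec interrupts */
    const double t_isr_us    = 25.0;          /* speed-optimized CPLD ISR         */
    const double t_settle_us = 10.0;          /* interrupt service settling time  */

    double max_latency_us = t_period_us - (t_isr_us + t_settle_us);
    printf("Maximum allowed interrupt latency: %.0f us\n", max_latency_us);  /* 90 us */
    return 0;
}
```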

The operating system must also ensure that the interrupt service routine can schedule the speech engine for immediate processing of the audio buffer. The interrupt service routine uses a buffer-ready signal to trigger this scheduling, as shown in Figure 1. As the figure shows, a DMA peripheral collects audio samples into the buffer for processing by the speech engine; this approach is more efficient than the CPLD implementation.
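A hypothetical sketch of this handshake is shown below. The names (dma_isr, speech_engine_iteration, rx_frame) are illustrative rather than taken from a particular vendor SDK; the key point is that the interrupt handler only marks the buffer ready, while the heavy soft-DSP work runs in task context, which is what keeps the ISR within the latency budget discussed above.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch of the buffer-ready handshake between the codec DMA
 * interrupt and the speech engine: the ISR only flags the completed frame,
 * and the soft-DSP processing runs in task context. */

#define FRAME_SAMPLES 80                    /* 10 ms of audio at 8 kHz */

static volatile bool buffer_ready = false;  /* set by the ISR, cleared by the engine */
static int16_t rx_frame[FRAME_SAMPLES];     /* buffer the DMA peripheral fills */

/* Runs at the highest interrupt priority when a complete frame has arrived. */
static void dma_isr(void)
{
    buffer_ready = true;
}

/* One iteration of the speech engine: must complete before the next frame. */
static void speech_engine_iteration(void)
{
    if (buffer_ready) {
        buffer_ready = false;
        printf("processing %d samples (t_voice)\n", FRAME_SAMPLES);
        (void)rx_frame;                     /* soft-DSP processing would go here */
    } else {
        printf("no frame pending (t_idle)\n");
    }
}

int main(void)
{
    dma_isr();                  /* simulate the codec/DMA interrupt firing      */
    speech_engine_iteration();  /* engine consumes the frame within 10 ms       */
    speech_engine_iteration();  /* no frame pending: idle time for other work   */
    return 0;
}
```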

The speech engine must complete processing of the speech samples before the next speech buffer is ready. The time required to process speech in the engine depends on a number of factors, including the processor, cache size, RAM speed, the number of physical voice interfaces (audio channels), the software DSP processing required for each buffer, and the type of speech encoder used.

For a full analysis of speech engine timing requirements, refer to Table 1. The t_idle parameter represents the time left over for all other system processes or applications; from the perspective of the speech engine design, it is idle time. All lower-priority system processing takes place during this idle time, after the speech engine has completed its real-time speech processing. In the worst case, t_idle may be 0 ms, in which case successive speech engine iterations run back-to-back with no idle time between them.

D2 Technologies' vPort software includes performance benchmarks for supported configurations. For example, a vPort release may specify that the voice processing for a three-way G.729AB voice conference call requires the voice engine to have at most 100 MHz of processing power per 10 ms interval, assuming the worst case and continuous buffer turnaround. Running on a 400 MHz RISC processor, t_voice requires 100 MHz (25% of CPU processing power) in the worst case, corresponding to 2.5 ms of processing time in every 10 ms processing interval. If t_switch exceeds 7.5 ms (t_switch = t_buffer – (t_voice + t_idle)), the real-time deadline cannot be met. These figures do not include the extra overhead incurred during speech engine processing by other peripheral interrupts, bottom-half processing, or "tasklet" soft interrupts.
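The arithmetic in this example can be written out as a short sketch; the 100 MHz engine load and 400 MHz CPU figures come from the vPort example above and are not general.

```c
#include <stdio.h>

/* Worked example of the timing budget from the three-way G.729AB conference
 * case above: t_buffer = t_voice + t_switch + t_idle, so with t_idle = 0 in
 * the worst case, t_switch must stay below t_buffer - t_voice. */
int main(void)
{
    const double t_buffer_ms = 10.0;   /* processing interval               */
    const double engine_mhz  = 100.0;  /* worst-case load quoted for vPort  */
    const double cpu_mhz     = 400.0;  /* example RISC processor            */

    double t_voice_ms      = t_buffer_ms * engine_mhz / cpu_mhz;   /* 2.5 ms */
    double t_switch_max_ms = t_buffer_ms - t_voice_ms;             /* 7.5 ms */

    printf("t_voice      = %.1f ms (%.0f%% of CPU)\n",
           t_voice_ms, 100.0 * engine_mhz / cpu_mhz);
    printf("t_switch max = %.1f ms before the real-time deadline is missed\n",
           t_switch_max_ms);
    return 0;
}
```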

Here are the most important design guidelines to consider when integrating a speech engine for soft-DSP processing:

1. To optimize quality, voice communication requires that system delay be minimized;

2. Voice communication is continuous; dropping samples or missing real-time deadlines is the most serious class of error;

3. The voice hardware has strict timing requirements and needs an error recovery mechanism for when timing is lost (see the sketch after this list);

4. The voice engine must complete real-time processing of the voice buffer within the 10 ms software deadline, and its interrupt service routine has strict timing constraints determined by the CPU's peripheral hardware.
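As referenced in guideline 3, the sketch below illustrates one possible deadline check around a speech engine iteration. run_speech_engine_iteration() and recover_speech_path() are hypothetical placeholders; a real recovery path would typically flush and resynchronize the audio buffers rather than just report the overrun.

```c
#include <stdio.h>
#include <time.h>

/* Hedged sketch of a deadline check around one speech engine iteration,
 * per guidelines 3 and 4: if processing overruns the 10 ms buffer period,
 * trigger an error recovery path instead of silently losing samples. */

#define DEADLINE_NS (10 * 1000 * 1000L)       /* 10 ms buffer period */

static void run_speech_engine_iteration(void) { /* soft-DSP work goes here */ }
static void recover_speech_path(void)         { printf("recovering speech path\n"); }

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    run_speech_engine_iteration();
    clock_gettime(CLOCK_MONOTONIC, &end);

    long elapsed_ns = (end.tv_sec - start.tv_sec) * 1000000000L
                    + (end.tv_nsec - start.tv_nsec);

    if (elapsed_ns > DEADLINE_NS)
        recover_speech_path();                /* deadline missed: recover, don't drop samples */
    else
        printf("iteration finished with %ld us to spare\n",
               (DEADLINE_NS - elapsed_ns) / 1000);
    return 0;
}
```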

Figure 1: Timing diagram of the speech engine.

Table 1: D2's speech engine timing requirements.
