How to design a high-speed, high-precision sound source localization system by taking advantage of FPGA hardware parallelism

Part I Design Overview / Design Introduction

1.1 Design Purpose


Frequent, chaotic horn honking not only lowers the quality of life of nearby residents, but also increases driver fatigue, affects driving safety, and makes passengers and pedestrians more irritable when traveling. Signs prohibiting honking are found on most urban roads, but not everyone follows the rules conscientiously, so appropriate penalties for honking are necessary to ensure that the rule is enforced.

We therefore use a microphone array to acquire the sound signals, an FPGA to calculate the location of the sound source, and OPENMV to capture the image and identify the honking vehicle.

1.2 Application Areas

This work has a wide range of practical applications.

In traffic monitoring, it can locate and photograph vehicles that honk illegally, improving monitoring efficiency. In audio-video conference systems, it can capture the voice of the current speaker and process it in real time to determine the speaker's position. In security systems, sound source localization can help traditional cameras adjust their monitoring direction, making up for the shortcomings of ordinary motion recognition in dim light and improving security.

In the military field, it can effectively discover the location of enemy targets while remaining fully concealed itself.

1.3 Main technical features

(1) A microphone array is used to obtain the sound signals. Compared with a traditional single microphone, a microphone array has spatial selectivity and can significantly suppress interference. It can acquire signals from multiple or moving sound sources, can be used in some special settings, and allows the system to work normally for both distant and near sound sources.

(2) Spectrum analysis uses the FFT algorithm and phase calculation uses the CORDIC algorithm. The former is a fast algorithm for the discrete Fourier transform (DFT), which samples the Fourier transform of a finite-length sequence at a finite number of points, discretizing the frequency domain so that it can be computed digitally. The latter is a simplifying algorithm that converts many complex operations into iterations that require only shifts and additions.

(3) This work uses an FPGA as the processor for the sound signals, taking advantage of FPGA hardware parallelism to perform more processing tasks per clock cycle, exceeding the computing power of digital signal processors.

1.4 Key Performance Indicators

(1) Locate a horn source in an indoor environment such as a laboratory and capture it with the camera and pan-tilt head, with a success rate of more than 90% and with the horn deviating from the center of the photo by no more than 50% each time.

(2) Locate the horn source in an indoor environment such as a laboratory and follow it with the camera and pan-tilt head, with a success rate of more than 90% and without losing the horn from the camera image while following.

(3) Locate a fast-moving horn source in an indoor environment such as a laboratory and follow it with the camera and pan-tilt head, with a success rate of more than 80% and with the horn visible in the camera image for more than 80% of the total following time.

(4) Complete the capture in indicator (1) within 0.5 seconds after the horn starts to sound.

1.5 Main Innovation Points

(1) All processing is digital signal processing, all communication is digital, and all processed signals are digital. Compared with an analog signal system, which is susceptible to many kinds of interference, digital signal processing has better interference immunity, and the multiple signal channels are processed in parallel.

(2) By taking advantage of FPGA hardware parallelism, we break the pattern of sequential execution and perform more processing tasks per clock cycle, exceeding the computing power of a digital signal processor (DSP). The positioning accuracy is improved by using as many microphone channels as possible.

(3) The computing performance of the FPGA allows for a real-time positioning system that can track a honking car moving at high speed.

(4) The project expands the positioning space from the original two-dimensional space to three-dimensional space, which improves the flexibility and accuracy of tracking and positioning.

Part II System Construction & Function Description

2.1 Overall Introduction

The system consists of a sound source localization system and an image capturing system. The sound source localization system consists of a microphone array module, a PDM decoding module and a phase calculation module; the latter two modules are implemented on an FPGA board. The image capturing system is implemented with OPENMV.

The sound source generates a sound signal that reaches the microphone array, where it is encoded as a PDM wave. The received PDM wave is buffered and sent to the high-order FIR filter, which decodes it. The result is passed to the phase calculation module, which first analyzes the spectrum with the FFT algorithm and then calculates the phase with the CORDIC algorithm to obtain the coordinates of the sound source. Finally, the sound source location is displayed and captured by the OPENMV-based image capturing system.
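The article does not spell out how a measured phase is turned into a position. As a minimal sketch of one common approach, assuming a far-field source and a single pair of microphones, the phase difference at a known frequency can be converted to a bearing angle. The function name, the microphone spacing and the sign convention are illustrative assumptions, not details of the authors' design.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)

def bearing_from_phase(delta_phi, freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Far-field bearing (radians from broadside) for one microphone pair.

    delta_phi is the phase difference between the two microphones at
    freq_hz. The spacing must be below half a wavelength, otherwise the
    phase difference is ambiguous.
    """
    # Wrap the phase difference into (-pi, pi].
    delta_phi = math.atan2(math.sin(delta_phi), math.cos(delta_phi))
    # Path difference over spacing equals sin(bearing).
    s = c * delta_phi / (2.0 * math.pi * freq_hz * spacing_m)
    s = max(-1.0, min(1.0, s))  # guard against noise pushing |s| above 1
    return math.asin(s)

# Usage: a 1 kHz tone arriving 30 degrees off broadside, microphones 5 cm apart.
true_angle = math.radians(30.0)
phi = 2.0 * math.pi * 1000.0 * 0.05 * math.sin(true_angle) / SPEED_OF_SOUND
estimate_deg = math.degrees(bearing_from_phase(phi, 1000.0, 0.05))
```

Combining the bearings from several microphone pairs with different orientations is one way to extend this to a two- or three-dimensional position.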

2.2 Introduction of each module

2.2.1 Microphone Array Module

We use the silicon microphone model SPW0690LM4H-1, a small, high-performance, low-power, bottom-port silicon digital microphone with a single PDM output. It includes an acoustic transducer, a low-noise input buffer, and a sigma-delta modulator.

It features: low distortion/high AOP, high signal-to-noise ratio, low current consumption in low power mode, flat frequency response, high drive capability, RF shielding, dual multi-channel support, extremely stable performance, omnidirectionality, and more. In terms of acquiring sound, it maintains consistent gain over a wide frequency band, captures speech signals with high fidelity, and is sensitive enough to detect faint sound signals in the environment. Its omnidirectionality can pick up sound in all directions and is equally sensitive to sound from all directions, making it particularly suitable for this project.

2.2.2 Processor

This work uses the Ego1 development board as the processor. Its FPGA is a Xilinx Artix-7 series device, model XC7A35T-1CSG324C.

Xilinx 7 series FPGA chips have two internal 12-bit ADCs with a 1 MSPS sampling rate and up to 17 external analog signal input channels, providing a general-purpose, high-precision analog input interface for user designs.

2.2.3 PDM decoding module – based on a high-order FIR low-pass filter

Although the PDM code has only two levels, 0 and 1, it retains all the frequency components of the original uncoded data while adding high-frequency noise components. The FIR filter is one of the most basic elements of a digital signal processing system. It can realize an arbitrary amplitude-frequency characteristic while keeping a strictly linear phase-frequency characteristic, and because its unit sample response is finite in length, the system is stable. Following a top-down, hierarchical and modular design approach, the filter design is divided into several modules, each module is described in the hardware description language Verilog, and the coefficients of the 98 filter taps (a 97th-order filter) are designed with Matlab.

The Fourier transform is performed on the PDM code, and the frequency response is obtained as follows.

The sound localization system only needs sound that the human ear can distinguish, such as the clear sound of a bicycle horn. The human ear can distinguish frequencies from 20 Hz to 20000 Hz, so sound above 20000 Hz is not needed. The passband of our low-pass filter is therefore set to 0-20000 Hz, the cutoff frequency to 48000 Hz, and the stopband frequency to 100000 Hz. Passing the PDM signal through this filter not only decodes it into a PCM signal, but also filters out the high-frequency components that we do not need.
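The decoding step can be illustrated with a small software model. The sketch below is not the authors' Verilog or their Matlab-designed coefficients: it assumes a first-order sigma-delta modulator as the PDM source, a hypothetical PDM clock of 3.072 MHz, and a Hamming-windowed sinc design for the 98 taps with the 48000 Hz cutoff mentioned above.

```python
import math

def pdm_encode(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> list of 0/1 bits."""
    bits, integrator, feedback = [], 0.0, 0.0
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

def lowpass_taps(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass taps, normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def fir_filter(x, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y

def decode_demo(pdm_rate_hz=3_072_000, tone_hz=1000.0, num_samples=3072):
    """Encode a 1 kHz tone as PDM, decode it, and return the RMS error."""
    tone = [0.5 * math.sin(2.0 * math.pi * tone_hz * n / pdm_rate_hz)
            for n in range(num_samples)]
    levels = [1.0 if b else -1.0 for b in pdm_encode(tone)]  # map bits to +/-1
    taps = lowpass_taps(98, 48_000.0, pdm_rate_hz)
    pcm = fir_filter(levels, taps)
    delay = (len(taps) - 1) / 2.0  # group delay of a linear-phase FIR, in samples
    err2, count = 0.0, 0
    for n in range(len(taps), num_samples):  # skip the start-up transient
        expected = 0.5 * math.sin(2.0 * math.pi * tone_hz * (n - delay) / pdm_rate_hz)
        err2 += (pcm[n] - expected) ** 2
        count += 1
    return math.sqrt(err2 / count)
```

Calling decode_demo() returns the RMS difference between the decoded signal and the delayed original tone, which should be small compared with the 0.5 tone amplitude. A real design would normally also decimate the filtered output to a lower PCM sample rate.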

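The difference equation of this FIR filter is expressed as

y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(97)x(n-97)

where x(n) is the input sample, y(n) is the output sample, and h(0) to h(97) are the filter tap coefficients.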

Figures 9 and 10 compare the original signal with the signal obtained by encoding it and passing it through the 97th-order FIR low-pass filter. The green trace is the decoded signal and the blue trace is the original signal.

As can be seen from the figures, the filter restores the encoded signal to the original signal, and the frequency components contained in the original signal are only slightly affected.

The 97th-order digital filter is implemented in Verilog using Vivado. Executed serially, its large number of floating-point operations would be time-consuming, but in hardware they can be processed in parallel. By our calculation, the 97th-order filter needs 97 multipliers and 98 adders; the code is shown in the Appendix.

2.2.4 Phase calculation module

Spectrum analysis by FFT algorithm

The FFT is a fast algorithm for the discrete Fourier transform (DFT). The DFT samples the Fourier transform of a finite-length sequence at a finite number of points, which discretizes the frequency domain so that the spectrum can be computed digitally.

The FFT is performed with the Fast Fourier Transform IP core built into Xilinx Vivado, configured with the Radix-2 architecture and 8 channels, with each frame containing 512 data points per channel. The input data width is 16 bits, and the output is fixed-point, unscaled, and sequential, as shown in Figure 12.
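To make the role of the FFT stage concrete, here is a minimal floating-point sketch of a radix-2 FFT on a 512-point frame, followed by a helper that reads the phase difference between two channels at the strongest frequency bin. It is a software model under stated assumptions, not the Xilinx IP core; the helper name phase_difference and the choice to use the peak bin of the first channel are illustrative.

```python
import cmath
import math

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def phase_difference(frame_a, frame_b):
    """Return (peak bin of frame_a, phase of a minus phase of b at that bin)."""
    spec_a, spec_b = fft_radix2(frame_a), fft_radix2(frame_b)
    k = max(range(1, len(frame_a) // 2), key=lambda i: abs(spec_a[i]))
    # The phase of a * conj(b) is the wrapped phase difference.
    return k, cmath.phase(spec_a[k] * spec_b[k].conjugate())

# Usage: two 512-point frames of the same tone (bin 20) with different phases.
N = 512
chan_a = [math.cos(2.0 * math.pi * 20 * n / N + 0.7) for n in range(N)]
chan_b = [math.cos(2.0 * math.pi * 20 * n / N + 0.2) for n in range(N)]
peak_bin, dphi = phase_difference(chan_a, chan_b)
```

In this example the tone falls exactly on a bin, so the phase difference is recovered almost exactly; a tone between bins would need windowing or interpolation for comparable accuracy.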

Calculating Phases with CORDIC Algorithm

The CORDIC algorithm is a simplifying algorithm that converts many complex operations into an iteration that requires only shifts and additions.

Suppose there is a point P1 (x1, y1) in the xy coordinate system, and the point P1 is rotated by an angle of θ around the origin to obtain the point P2 (x2, y2).

Taking counterclockwise rotation as positive, the relationship between P1 and P2 is

x2 = x1 cos θ - y1 sin θ
y2 = x1 sin θ + y1 cos θ
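Restricting each rotation step to an angle whose tangent is a power of two turns the multiplications of such a rotation into bit shifts. The sketch below is a minimal integer model of vectoring-mode CORDIC, which rotates a vector onto the x axis and accumulates the applied angles to obtain its phase. The iteration count, the fixed-point formats and the function name are assumptions for illustration, not the configuration used in the project.

```python
import math

ITERATIONS = 16   # number of shift-add micro-rotations (assumed)
ANGLE_BITS = 16   # angles are stored as radians * 2**ANGLE_BITS
GUARD_BITS = 16   # extra low-order bits so right shifts keep precision

# Precomputed atan(2**-i) table, as a hardware implementation would store in ROM.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << ANGLE_BITS))
              for i in range(ITERATIONS)]
HALF_PI = round(math.pi / 2 * (1 << ANGLE_BITS))

def cordic_phase(re, im):
    """Approximate atan2(im, re) in radians for integer re, im."""
    if re == 0 and im == 0:
        return 0.0
    x, y = re << GUARD_BITS, im << GUARD_BITS
    angle = 0
    # Pre-rotate by 90 degrees so the vector lies in the right half plane.
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, HALF_PI
        else:
            x, y, angle = -y, x, -HALF_PI
    for i in range(ITERATIONS):
        if y >= 0:
            # Rotate clockwise by atan(2**-i): only shifts and additions.
            x, y = x + (y >> i), y - (x >> i)
            angle += ATAN_TABLE[i]
        else:
            # Rotate counterclockwise by atan(2**-i).
            x, y = x - (y >> i), y + (x >> i)
            angle -= ATAN_TABLE[i]
    return angle / (1 << ANGLE_BITS)

# Usage: phase of the complex value 3 + 4j, compared with math.atan2(4, 3).
phase = cordic_phase(3, 4)
```

The constant gain that the micro-rotations apply to the vector length does not affect the angle, so no scaling correction is needed when only the phase is used.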

2.2.5 Image capturing system

In this work, the image capture system consists of a camera built around a digital image sensor with a resolution of 640*480, and a servo whose angle can be changed continuously and held.

OPENMV receives the sound source position from the FPGA over a serial port and steers the servo toward the sound source so that the camera is accurately aimed at it. A command is then sent to the host computer (PC) to take pictures or record video, and the pictures are stored on the host computer.
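The steering step can be sketched as a conversion from source coordinates to pan and tilt angles. The coordinate frame (x to the right, y straight ahead of the camera at rest, z up), the servo travel limits and both function names are hypothetical; the article does not specify the message format or the servo geometry.

```python
import math

def pan_tilt_from_xyz(x, y, z):
    """Pan and tilt angles in degrees that point the camera at (x, y, z).

    Assumed frame: x right, y forward along the resting camera axis, z up.
    """
    pan = math.degrees(math.atan2(x, y))                   # rotation about the vertical axis
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))   # elevation above the horizontal
    return pan, tilt

def clamp_to_servo_range(angle_deg, lo=-90.0, hi=90.0):
    """Limit a target angle to the assumed mechanical travel of the servo."""
    return max(lo, min(hi, angle_deg))

# Usage: a source 1 m to the right, 1 m ahead and level with the camera.
pan, tilt = pan_tilt_from_xyz(1.0, 1.0, 0.0)
pan_cmd = clamp_to_servo_range(pan)
tilt_cmd = clamp_to_servo_range(tilt)
```

On the OPENMV side, the clamped angles would then be written to the pan and tilt servos after each position message from the FPGA.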

Part III Completion and Final Design & Performance Parameters

(1) We have completed locating a stationary horn source in the laboratory and capturing it with the camera and the servo pan-tilt head. The resulting photo is shown in the figure below.

(2) The camera and pan-tilt head follow the horn source in real time with a success rate of over 80%, and the horn source appears in the camera view for more than 95% of the total following time. The following effect is shown in the figure below.

(3) The host computer screen displays the camera view in real time and stores the captured pictures of the horn source and the video of continuous honking.

Part IV Conclusions

4.1 Expandability

(1) The 4-channel digital microphone array PCB we used has an additional 28 unsoldered microphone positions, so it can be expanded to 32 channels. This would reduce the bit error rate (BER) of digital microphone reception and further improve the positioning accuracy.

(2) The OPENMV, which is used to control the servo head, has its own camera with image recognition function, so that we can use OPENMV to process the image and cooperate with the sound source positioning system for integrated follow-up and capture in the future, thus improving the success rate and accuracy of the follow-up and capture.

(3) We use a high-performance host computer to display the images of follow-up and capture in real time, and save them to the host computer. In the future, the host computer can carry out secondary analysis on the saved photos, identify the license plate of the captured vehicles, upload the violation records to the cloud, and use big data to supervise and punish the vehicles with a high number of violations.

(4) The FPGA used in this project is only the entry-level XC7A35T from the Xilinx Artix-7 series. If it is replaced by a model with more on-chip resources, the speed of the sound source localization computation will improve further.
