
New Solution for Mobile GPU Ray Tracing: Details on Imagination’s Just-Released DXT Architecture

Imagination Technologies recently released IMG DXT, its next-generation GPU IP architecture, aimed mainly at mobile devices. Readers familiar with Imagination’s GPU IP will know that the IMG A-series architecture released in 2019 was a watershed for the company’s products and technology, as discussed in last year’s retrospective article on the 30th anniversary of the PowerVR architecture.

The core idea of AXT (the A series) is an ultra-wide ALU design, and Imagination emphasized its contribution to PPAB (power, performance, area, bandwidth). The following year, the BXT series moved to a more decentralized multi-core, modular design, enabling what we now describe as GPU designs that scale elastically from cell phones to data centers, and it also began to support chiplet designs. The CXT series then brought the PowerVR Photon architecture, i.e., ray tracing acceleration, formally introducing a hardware-level ray tracing acceleration solution to the mobile market.


The release of the D series is broadly in line with expectations. Imagination’s 2019 GPU roadmap placed the D series in 2022, so a release in January this year is essentially on schedule. Beyond raw performance gains, IMG DXT is a generation that further improves scalability and flexibility overall, especially for ray tracing, and it gains further efficiency through specific features such as FSR. This article focuses on some of the new features and improvements introduced in the D series.


New Generation DXT Overview: A More Flexible Design

We will not dwell here on the standard building blocks of Imagination’s GPU architecture, such as the USC (Unified Shading Cluster), the TPU (Texture Processing Unit), the raster/geometry modules and other fixed-function units, the caches, and the firmware processor.
Overall, compared with the C series, ALU and TPU performance within a single DXT core can increase by up to 50%. More importantly, the optional ray tracing module (RAC, Ray Acceleration Cluster) is now far more flexible in both size and placement, which we cover in detail later.

Building on this flexible scaling scheme, Imagination cites three example configurations, as shown in the figure above, positioned for mainstream, high-end, and flagship devices respectively. Each configuration has a different FP32 compute throughput and texture fill rate, plus optional ray tracing hardware. In a name such as DXT-8-256, 8 is the texture rate in texels per clock and 256 is the FP32 throughput in FLOPs per clock (at 1 GHz these correspond to 8 GTexels/s and 256 GFLOPS). The base DXT-8-256 configuration is said to improve performance density by more than 20%, i.e., it delivers more performance per unit of area.
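To make the naming scheme concrete, here is a minimal Python sketch that decodes configuration names. It assumes the first number is texels per clock and the second is FP32 FLOPs per clock, so the GTexels/s and GFLOPS figures quoted for these names correspond to a 1 GHz clock. `parse_config`, `peak_rates`, and the 1 GHz value are illustrative assumptions, not part of any Imagination tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    family: str
    texels_per_clock: int
    fp32_flops_per_clock: int


def parse_config(name: str) -> GpuConfig:
    """Parse names such as 'DXT-8-256' or 'DXT-48-1536 RT2' (the RT suffix is ignored)."""
    match = re.match(r"([A-Z]+)-(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognised configuration name: {name!r}")
    family, texels, flops = match.groups()
    return GpuConfig(family, int(texels), int(flops))


def peak_rates(cfg: GpuConfig, clock_ghz: float) -> tuple[float, float]:
    """Return (GTexels/s, GFLOPS) at an assumed clock frequency in GHz."""
    return cfg.texels_per_clock * clock_ghz, cfg.fp32_flops_per_clock * clock_ghz


if __name__ == "__main__":
    for name in ("DXT-8-256", "DXT-48-1536 RT2", "DXT-72-2304 RT3"):
        cfg = parse_config(name)
        gtex, gflops = peak_rates(cfg, clock_ghz=1.0)  # hypothetical 1 GHz clock
        print(f"{name}: {gtex:.0f} GTexels/s, {gflops:.0f} GFLOPS")
```

Real devices run at their own clocks, so peak rates scale linearly with whatever frequency a licensee chooses.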
DXT’s improved scalability comes mainly from the SPU (Scalable Processing Unit), the basic unit of scaling, which can now hold more ALU and TPU resources. This generation can therefore use denser SPUs: as shown above, one SPU can contain three USC/TPU modules plus other shared units.

Take the previous-generation CXT-48-1536 as an example. It was built from three SPUs, each paired with one RAC (Ray Acceleration Cluster), which made up the CXT-48-1536 RT3. In this generation, besides the same three-SPU layout (three SPUs of 2x 8-256 each), the design can also use two SPUs, each containing three USC/TPU units (two SPUs of 3x 8-256 each). With one RAC per SPU, this gives the DXT-48-1536 RT2, and using half-size RACs gives the DXT-48-1536-0.5 RT2.
This means the same FP32 and texture performance as before can now be paired with RT1, RT2, or RT3 options, and the largest single core can reach RT4 (a single core holds up to four SPUs). Because each SPU now does more, the area needed to reach a given compute level is smaller than in the previous generation, which further improves performance density. The peak compute of the highest-end configuration also rises.
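The layout options above follow from simple arithmetic, which this hypothetical sketch enumerates. It assumes each USC/TPU block is an 8-256 unit, every SPU in a core has the same size (two or three blocks), a core holds at most four SPUs, and each SPU carries at most one full RAC; the RT number is taken to be the RAC count and the `DXT` prefix is hard-coded. None of these helper names come from Imagination.

```python
from dataclasses import dataclass

UNIT_TEXELS = 8          # one USC/TPU block is an "8-256" unit
UNIT_FLOPS = 256
MAX_SPUS_PER_CORE = 4    # "up to 4 SPUs for a single core"


@dataclass(frozen=True)
class Layout:
    spus: int
    units_per_spu: int
    racs: int

    def name(self) -> str:
        units = self.spus * self.units_per_spu
        return f"DXT-{units * UNIT_TEXELS}-{units * UNIT_FLOPS} RT{self.racs}"


def layouts(texels: int, spu_sizes=(2, 3)) -> list[Layout]:
    """Enumerate uniform SPU layouts that reach a target texel rate.

    Assumptions: all SPUs in a core hold the same number of USC/TPU units,
    and each SPU carries at most one full RAC.
    """
    if texels % UNIT_TEXELS:
        raise ValueError("texel rate must be a multiple of 8")
    units = texels // UNIT_TEXELS
    result = []
    for size in spu_sizes:
        if units % size:
            continue  # this SPU size cannot tile the required units evenly
        spus = units // size
        if spus > MAX_SPUS_PER_CORE:
            continue  # too many SPUs for a single core
        for racs in range(1, spus + 1):
            result.append(Layout(spus, size, racs))
    return result


if __name__ == "__main__":
    for layout in layouts(48):
        print(layout, layout.name())
```

For 48 texels per clock this yields the three-SPU (2x) and two-SPU (3x) layouts discussed above. For 72 and 96 only 3x SPUs fit, giving at most RT3 and RT4 respectively.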


Although Imagination mentioned only the DXT-72-2304 RT3 at the launch, the DXT technical white paper states that the largest single core grows from the previous CXT-64-2048 to DXT-96-3072 in this generation, a 50% performance increase. The available configuration options are as follows.

Ray tracing with FSR

We have previously written several articles on how ray tracing is implemented in the PowerVR Photon architecture, including the ray tracing levels defined by Imagination. Mobile application processor (AP) SoC vendors broadly agree that mobile devices should adopt ray tracing GPUs.
Stephen Barton, senior director of technical product management at Imagination, said in an interview: “We have split the RAC out as a separate IP block, which means it can be used without affecting the performance of the GPU itself, and ray tracing can run independently. That is important for mobile. Mobile is only starting out with ray tracing and will certainly begin in hybrid mode: at first it will be mostly rasterization with a little ray tracing, and as the technology matures it will move toward fuller ray tracing. Our architecture is particularly well suited to this progression, giving customers the ray tracing performance they need at each stage.”
“The idea is that DXT makes hardware ray tracing possible at an overhead that mobile can genuinely accept, so that it can roll out to more devices,” said David Harold, chief marketing officer at Imagination. “Only then will developers be willing to create ray tracing content.”

This is the rationale behind the RAC’s flexibility. We agree that, while the mobile space still lacks compelling ray tracing games, transistors spent on ray tracing acceleration risk becoming so-called dark silicon. Chip design companies now have a wider range of RAC sizes to choose from, including half a RAC (216 MRay/s, 8 GBoxTests/s), which lets ray tracing reach more mainstream models, broadens market coverage, brings more developers on board, and helps make the technology truly mainstream.
Imagination’s DXT white paper notes that in the previous architecture two ALU modules shared one RAC, whereas in this generation more ALU modules can share a single RAC, and the RAC can be placed at different levels of the GPU hierarchy. The flexible SPU design described earlier is itself what shapes the RAC layout.
Stephen added: “Ray tracing levels L1 and L2 are where most of the ray tracing you see on the market today sits. We offer L4.” Beyond the hardware acceleration of the lower levels, L4 ray tracing also has to account for the power sensitivity of mobile platforms. L4 therefore builds on L3’s hardware BVH traversal by sorting and gathering coherent rays (coherency sorting), for example grouping rays reflected off certain materials in the same direction, to increase data reuse and improve utilization of the parallel ALU pipelines.
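Coherency sorting can be illustrated in software. The sketch below is only an analogy, not Imagination’s hardware algorithm: it quantizes each ray direction into a coarse (theta, phi) bin and sorts by that key so that rays heading the same way end up next to each other. The `Ray` type, `direction_key`, and the bin count are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Ray:
    origin: tuple[float, float, float]
    direction: tuple[float, float, float]


def direction_key(d: tuple[float, float, float], bins: int = 4) -> tuple[int, int]:
    """Quantize a direction into a (theta, phi) bin; similar directions share a key."""
    x, y, z = d
    length = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / length)))   # polar angle, 0..pi
    phi = math.atan2(y, x) % (2 * math.pi)                 # azimuth, 0..2pi
    t_bin = min(int(theta / math.pi * bins), bins - 1)
    p_bin = min(int(phi / (2 * math.pi) * bins), bins - 1)
    return t_bin, p_bin


def coherency_sort(rays: list[Ray], bins: int = 4) -> list[Ray]:
    """Reorder rays so that rays travelling in similar directions are adjacent."""
    return sorted(rays, key=lambda r: direction_key(r.direction, bins))
```

After sorting, consecutive rays are likely to visit the same BVH nodes and touch the same data, which is the reuse the quote above refers to. Hardware would do this on the fly with bounded buffers rather than a full sort.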

We have already covered the internals of the RAC, so we will not repeat them here. The key to reaching L4 is the PCG (Packet Coherency Gather) unit (together with the RS, RTS, and others), which gathers coherent rays so they can be processed together, “using the same instructions to perform computation in parallel, which saves a lot of power,” said Ike, technical director of Imagination China.
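Why gathering coherent rays saves power can be shown with a simple SIMD divergence model. In this hypothetical sketch each ray is represented only by the material (shader) it hit, and a packet of `width` rays needs one execution pass per distinct shader it contains. Grouping rays by shader before packing lets each packet run a single instruction stream. PCG’s actual internals are not described in the source; the function names here are illustrative.

```python
from collections import defaultdict


def make_packets(ray_materials: list[str], width: int) -> list[list[str]]:
    """Split a stream of rays (labelled by hit material) into fixed-width packets, in arrival order."""
    return [ray_materials[i:i + width] for i in range(0, len(ray_materials), width)]


def coherency_gather(ray_materials: list[str], width: int) -> list[list[str]]:
    """Group rays by material first, then pack each group into fixed-width packets."""
    groups: dict[str, list[str]] = defaultdict(list)
    for material in ray_materials:
        groups[material].append(material)
    packets: list[list[str]] = []
    for group in groups.values():
        packets.extend(make_packets(group, width))
    return packets


def shader_passes(packets: list[list[str]]) -> int:
    """Count SIMD passes: a packet needs one pass per distinct shader it contains."""
    return sum(len(set(packet)) for packet in packets)
```

With 16 rays alternating between two materials and a packet width of 4, arrival-order packing needs 8 passes while gathered packing needs 4, because every gathered packet executes only one shader.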

Another important addition in DXT is FSR (Fragment Shading Rate). This is not AMD’s FSR; it is similar to VRS (variable rate shading), which readers who follow games and graphics technology will already know. Simply put, unimportant regions of the screen (such as the background), or regions that do not need high-precision rendering, are shaded below native resolution, reducing power consumption and load.


For example, in a racing game, when the car is moving at high speed, the surrounding objects need only be drawn at low quality, because they will later be blurred by the motion blur effect.
DXT supports reusing a single fragment shader execution across pixel blocks of different sizes, and each ratio corresponds to a different level of image quality. Imagination says that reusing one shader execution across a 4×4 block of pixels achieves “about 93% savings in fragment computing power.”
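The quoted saving follows directly from the block size: if one shader invocation covers a w×h block, only 1/(w·h) of the per-pixel invocations remain. This minimal sketch computes that idealized figure; it ignores any fixed overhead and assumes the whole region uses the same rate. The function name is illustrative.

```python
def fragment_savings(rate_w: int, rate_h: int) -> float:
    """Fraction of fragment shader invocations saved when one invocation covers a rate_w x rate_h block."""
    if rate_w < 1 or rate_h < 1:
        raise ValueError("shading rate dimensions must be >= 1")
    return 1.0 - 1.0 / (rate_w * rate_h)


if __name__ == "__main__":
    for w, h in [(1, 1), (1, 2), (2, 2), (2, 4), (4, 4)]:
        print(f"{w}x{h}: {fragment_savings(w, h):.2%} fewer shader invocations")
```

For a 4×4 block this gives 93.75%, consistent with the “about 93%” figure.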

Imagination says FSR pairs well with ray tracing. The general idea is to lower native shading precision and spend the freed resources on ray tracing, which can effectively raise the final frame rate. Another key point is where FSR/VRS sits in the pipeline: much earlier than super-resolution upscaling techniques such as DLSS.

Applying FSR also means fewer shader invocations, fewer rays to process, and larger areas of pixel results that can be reused. In other words, less shading work and fewer rays significantly reduce overall overhead. “Without FSR, a frame might need 6.9 MRays of compute; with FSR deciding which regions are computed once, which twice, and which need full detail, the same frame needs only 3.2 MRays,” Ike said. “With a scalable RAC, a smaller RAC is enough to deliver the ray tracing effects in that game scene.”
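A rough ray budget can be estimated from a shading-rate map. This hypothetical sketch assumes rays are launched once per shading invocation, so coarser regions launch proportionally fewer rays. The resolution and screen split used below are invented, so they do not reproduce Imagination’s 6.9 to 3.2 MRays example, which corresponds to roughly 46% of the original ray count.

```python
def rays_per_frame(width: int, height: int, rays_per_invocation: float,
                   rate_mix: dict[tuple[int, int], float]) -> float:
    """Estimate rays per frame when ray generation runs once per shading invocation.

    rate_mix maps a shading rate (w, h) to the fraction of the screen shaded at that rate.
    """
    if abs(sum(rate_mix.values()) - 1.0) > 1e-9:
        raise ValueError("screen fractions must sum to 1")
    pixels = width * height
    invocations = sum(pixels * frac / (w * h) for (w, h), frac in rate_mix.items())
    return invocations * rays_per_invocation


if __name__ == "__main__":
    native = rays_per_frame(1920, 1080, 1.0, {(1, 1): 1.0})
    # Hypothetical split: 25% full rate, 50% at 2x2, 25% at 4x4.
    mixed = rays_per_frame(1920, 1080, 1.0, {(1, 1): 0.25, (2, 2): 0.5, (4, 4): 0.25})
    print(f"native: {native / 1e6:.2f} MRays, with FSR: {mixed / 1e6:.2f} MRays")
```

Because the ray count drops with the invocation count, a smaller RAC can sustain the same effect, which is the point Ike makes.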

Other key feature updates

Other features introduced with the new IMG DXT architecture include the following.

2D Dual-Rate Texturing targets TPU performance in post-processing effects. Imagination says it has observed many games spending more time on post-processing algorithms that implement effects such as depth of field, bloom, and blur. Much of this workload is bottlenecked on TPU throughput, yet simply brute-forcing more TPU hardware is not a reasonable answer.

Based on typical characteristics of post-processing and image-processing workloads that Imagination identified, the team implemented a new TPU mode that doubles post-processing throughput when these characteristics are detected; details are in the DXT technical white paper. For qualifying workloads, a DXT-48-1536 is said to behave like a 96-1536, processing twice as many bilinear-filtered texture samples per clock and therefore doubling the execution rate. The DXT optimizations described earlier (such as tracing fewer rays) also place higher demands on post-processing, which makes the 2D Dual-Rate TPU a natural fit.
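A back-of-the-envelope estimate shows what doubling the sample rate means for one post-processing pass. This sketch assumes the pass is bound purely by bilinear sample throughput, that the normal rate equals the configuration’s texel rate (48 per clock), and uses a hypothetical 2400×1080 screen, 9-tap filter, and 0.8 GHz clock.

```python
def texture_cycles(width: int, height: int, taps_per_pixel: int,
                   samples_per_clock: int) -> float:
    """Cycles for one full-screen pass, assuming it is bound purely by bilinear sample throughput."""
    return width * height * taps_per_pixel / samples_per_clock


if __name__ == "__main__":
    clock_hz = 0.8e9  # hypothetical GPU clock
    for rate in (48, 96):  # normal vs. 2D dual-rate bilinear samples per clock
        cycles = texture_cycles(2400, 1080, 9, rate)
        print(f"{rate}/clk: {cycles:.0f} cycles, {cycles / clock_hz * 1e3:.3f} ms")
```

Under these assumptions the pass drops from 486,000 to 243,000 cycles; real passes also spend time on ALU work and memory, so the end-to-end gain would be smaller.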

Pipelined Data Masters: as mentioned in our earlier article on the IMG A series, the GPU contains a firmware processor, along with various Data Masters such as the 2D Data Master, 3D Data Master, Compute Data Master, and Geometry Data Master.
Imagination’s white paper explains that previous architectures used single-tasking Data Masters: a Data Master runs one specific job, and switching to another job requires the firmware processor to set it up. That leaves idle time while the firmware processor prepares the next job and programs the registers, and the setup itself also involves data accesses and other synchronization work.


With a large GPU containing many SPUs, this overhead has a bigger performance impact, especially if the firmware processor stays the same size. The new architecture therefore pipelines the Data Masters: the firmware can set up the next job while the previous one is still being processed by the GPU. Moving from serialized firmware setup and rendering to parallel operation improves the GPU’s resource utilization.
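The benefit of pipelining can be estimated with a simple two-stage timing model. This is a hypothetical sketch, not Imagination’s firmware design: it assumes a single staging slot, so the firmware can prepare job i+1 only after job i has started executing, and it treats setup and execution times as fixed inputs in arbitrary units.

```python
def serial_time(setup: list[float], execute: list[float]) -> float:
    """Firmware sets up each job only after the previous job has finished on the GPU."""
    return sum(setup) + sum(execute)


def pipelined_time(setup: list[float], execute: list[float]) -> float:
    """Firmware prepares the next job while the current one executes (one staging slot)."""
    if len(setup) != len(execute):
        raise ValueError("need one setup time per job")
    fw_free = 0.0     # when the firmware finished its previous setup
    gpu_free = 0.0    # when the GPU finished the previous job
    prev_start = 0.0  # when the previous job started, freeing the staging slot
    for s, e in zip(setup, execute):
        setup_end = max(fw_free, prev_start) + s
        fw_free = setup_end
        start = max(setup_end, gpu_free)  # job runs once set up and the GPU is idle
        gpu_free = start + e
        prev_start = start
    return gpu_free
```

With three jobs that each take 2 units to set up and 5 to execute, the serial schedule takes 21 units and the pipelined one 17, since only the first setup remains exposed. When setup dominates, the firmware becomes the bottleneck, which is why a faster firmware processor also matters.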

Figure: with DXT, this part of the work is now pipelined.
In addition, the firmware processor itself is 40% faster. The white paper says DXT moves to a RISC-V-based firmware processor in this generation (there were apparently reports that this part was already RISC-V-based when the A series was released). Imagination is also now well known for its RISC-V CPU IP, part of its heterogeneous processor strategy.

The last item listed in the slide above is ASTC HDR support. The Vulkan API already supported ASTC (Adaptive Scalable Texture Compression) LDR textures, and Imagination’s architectures have supported them for several generations. Imagination expects HDR to develop over the coming years, so supporting HDR compressed textures based on the ASTC algorithm is a natural step, and DXT implements this texture format. HDR textures need little explanation: they allow a very large ratio between the brightest and darkest parts of an image.
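For a sense of scale, ASTC stores every block in 128 bits whatever its footprint, for LDR and HDR content alike, so the bit rate is set by the block size. This sketch compares one ASTC-compressed mip level with uncompressed RGBA16F, a common HDR format; the function names are illustrative and mipmaps are ignored.

```python
import math

ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, regardless of footprint
ASTC_2D_FOOTPRINTS = {(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
                      (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12)}


def astc_bytes(width: int, height: int, block_w: int, block_h: int) -> int:
    """Size of one ASTC-compressed 2D level (partial blocks at the edges still cost a full block)."""
    if (block_w, block_h) not in ASTC_2D_FOOTPRINTS:
        raise ValueError(f"unsupported ASTC footprint {block_w}x{block_h}")
    blocks = math.ceil(width / block_w) * math.ceil(height / block_h)
    return blocks * ASTC_BLOCK_BYTES


def rgba16f_bytes(width: int, height: int) -> int:
    """Uncompressed RGBA half-float texture: 4 channels x 16 bits = 8 bytes per pixel."""
    return width * height * 8
```

A 1024×1024 HDR texture takes 8 MiB as RGBA16F but 1 MiB with 4×4 ASTC blocks (8 bits per pixel), or 256 KiB with 8×8 blocks (2 bits per pixel), at some cost in quality.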
TBDR, PVRIC image compression, the decentralized multi-core design, modular scaling, and so on are all standard Imagination features. Interested readers can refer to our earlier technical articles on Imagination GPU IP.

Finally, a word on the ecosystem. This concerns Imagination’s entire IP portfolio, not just DXT. Imagination did not discuss plans for turning the DXT architecture into specific chips at the launch, but Ike described some ecosystem results: “In 2021 we brought ray tracing to the cell phone space and also contributed it to O3DE (Open 3D Engine), so that open-source community developers can experience how ray tracing is evolving.”
“We also released ray tracing demos with Amazon that show lighting and shadows changing around the clock, with a strong sense of immersion,” Ike said. “We also work with major game developers on new features: we take part in testing when new games are released so that certain features can be adopted in time, and we introduce our products’ new features to them so they can use them during development, helping to build the mobile game development ecosystem.”
On ecosystem tools, beyond the more basic support for the Vulkan API and others, Stephen said that work on building the ray tracing ecosystem started very early: “Ray tracing tools were released alongside CXT in 2021.” David said, “We are working with partners including Perfect World, NetEase, Tencent, Unity, OPPO, Vivo, and others.”
At this stage, Imagination’s main difficulty in promoting its GPU IP, and especially the strong technology of its latest architectures, remains ecosystem expansion. We expect its GPU IP to reach more types of devices in the new year. DXT, for example, should not be limited to cell phones: VR is one likely direction, as are products such as cars. These applications will continue to drive the four major areas of Imagination’s new strategy: Mobile, Consumer, Automotive, and Data Center.

We'd love to hear from you


Contact Us

Exhibition Bay South Square, Fuhai Bao’an Shenzhen China

Sales@ebics.com
+86.755.27389663
