Explaining the Large Memory Pooling Solution Based on CXL Technology


What is CXL?

CXL is an industry-supported cache-coherent interconnect for processors, memory expansion, and accelerators. The technology keeps the CPU's memory space coherent with the memory on attached devices, which allows resource sharing for higher performance, reduces software-stack complexity, and lowers overall system cost. It also frees users from redundant memory-management hardware in accelerators, so more effort can go to the target workloads.

CXL is designed as an open industry-standard interface for high-speed communication, as accelerators are increasingly used alongside CPUs to support emerging applications such as artificial intelligence and machine learning.

The CXL 2.0 specification adds support for switching, which allows more devices to be connected, memory capacity to be provisioned on demand, and resources to be used far more efficiently. CXL 2.0 is fully backward compatible with CXL 1.1 and 1.0, protecting industry users' investments.

For a detailed introduction, see this article by @LaoWolf at zhuanlan.zhihu.com/p/65. Since my own research is on persistent memory, I am most interested in the relationship between CXL and persistent memory.

1. Facebook's efforts in memory pooling

Facebook has been a strong advocate of separating DRAM from the CPUs that use it and creating a pooled memory layer shared by many systems. It has worked for years to disaggregate and pool memory so that memory serves its servers better, trying to control memory costs while improving performance.

Facebook has worked for years with University of Michigan assistant professor Mosharaf Chowdhury on memory-pooling technology, starting with the Infiniswap Linux kernel extension, a memory load balancer across servers that pools memory over InfiniBand or Ethernet via the RDMA protocol and was analyzed in June 2017. Infiniswap appeared alongside several other transport and memory-semantic protocols pursuing similar memory-pooling ideas: IBM's OpenCAPI memory interface protocol, Xilinx's CCIX protocol, Nvidia's NVLink protocol, and Hewlett Packard Enterprise's Gen-Z protocol, which Dell also backed. At this point, at least for memory pooling within the rack, the Intel-initiated CXL protocol has become the dominant standard for disaggregated memory, not just for connecting far memory in accelerators and flash to the CPU, and it will be common in new and future servers.

Facebook researchers, again with Chowdhury, are taking another stab at disaggregated memory, carrying some of Infiniswap's ideas forward in a Linux kernel extension called Transparent Page Placement (TPP). TPP manages memory pages differently from CPU-attached DRAM, taking into account that CXL-attached main memory sits relatively farther from the cores. The researchers describe this work in a paper at arxiv.org/abs/2206.0287. TPP has been open-sourced by Facebook and is used together with the company's Chameleon memory-tracking tool, which runs in Linux user space so people can track how their applications use CXL memory.
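
On Linux, CXL-attached memory typically enumerates as a CPU-less NUMA node, which is what lets a kernel extension like TPP place pages across tiers transparently. As a minimal sketch of the underlying mechanism, and assuming node 1 is the CXL node (an assumption for illustration, not something stated in the paper), an application can also bind an allocation there explicitly with libnuma:

```c
/* Sketch: allocating from a CXL-attached NUMA node with libnuma.
 * Assumption (not from the article): node 0 is CPU-local DRAM and
 * node 1 is the CPU-less node that CXL memory enumerates as.
 * Build with: gcc -o cxl_alloc cxl_alloc.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int cxl_node = 1;                 /* assumed CXL node id */
    if (cxl_node > numa_max_node()) {
        fprintf(stderr, "node %d does not exist\n", cxl_node);
        return 1;
    }

    size_t len = 1UL << 30;           /* 1 GiB */
    /* Pages are bound to the chosen node, so accesses go over CXL
     * instead of the local DDR channels. */
    char *buf = numa_alloc_onnode(len, cxl_node);
    if (!buf) {
        perror("numa_alloc_onnode");
        return 1;
    }
    memset(buf, 0xA5, len);           /* touch pages so they fault in */
    printf("1 GiB bound to NUMA node %d\n", cxl_node);
    numa_free(buf, len);
    return 0;
}
```

TPP automates this placement inside the kernel, promoting hot pages to local DRAM and demoting cold ones to the far node, so applications do not have to manage the binding themselves.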

2. CXL Memory

As CPUs evolved, system architects built downward from main memory, adding one, two, three, and sometimes four levels of cache between the cores and main memory, and extended the hierarchy outward through the system bus: first to tape, then to disk and tape, then to flash, disk, and tape. In recent years, persistent memory such as 3D XPoint has been added to this hierarchy as well.

The chart below shows the challenges Facebook has faced in memory capacity, bandwidth, power, and cost across several generations of machines.

As the chart shows, memory capacity is growing faster than memory bandwidth, which has a significant impact on performance. With higher bandwidth, less memory capacity may be needed to do a given amount of work on a given number of CPUs, just as a CPU clocked at 10 GHz would outperform one at 2.5 GHz. But faster CPU and memory clocks generate disproportionately more heat, so system architects try to do more work while staying within a reasonable power envelope. That has not worked: the demand for higher performance pushes system power and memory power up with each server generation, and memory cost keeps rising as a share of total system cost. At this point the main cost in a system is the memory, not the CPU itself. (This is true not only at Facebook but across the industry.)
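
To make the bandwidth-per-capacity point concrete, here is a minimal sketch with hypothetical generation numbers (not Facebook's actual data) showing how the ratio of bandwidth to capacity falls even as both grow:

```c
/* Illustration of the capacity-vs-bandwidth trend described above.
 * The generation figures below are made up for illustration. */
#include <stdio.h>

int main(void) {
    struct { const char *gen; double cap_gb, bw_gbs; } srv[] = {
        { "Gen 1",  256.0, 120.0 },
        { "Gen 2",  512.0, 160.0 },
        { "Gen 3", 1024.0, 200.0 },
    };
    for (int i = 0; i < 3; i++)
        printf("%s: %6.0f GB, %5.0f GB/s -> %.2f GB/s per GB\n",
               srv[i].gen, srv[i].cap_gb, srv[i].bw_gbs,
               srv[i].bw_gbs / srv[i].cap_gb);
    /* Output shows the ratio falling (0.47 -> 0.31 -> 0.20):
     * each byte of memory gets less bandwidth every generation. */
    return 0;
}
```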

Extending system memory capacity and bandwidth with CXL, at essentially the same latency as NUMA access

Therefore, the essential move is to layer the CXL protocol over the PCI Express bus and attach main memory there, expanding both memory capacity and memory bandwidth without adding more memory controllers to the CPU die. This memory carries some extra latency, but it is on the same order as the NUMA link between two CPUs in a shared-memory system, as shown in the following figure.
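
A common way to check the "CXL latency is about one NUMA hop" claim on real hardware is a dependent pointer chase over a buffer bound to each node. The sketch below assumes node 0 is local DRAM and node 1 is the far (CXL or remote-socket) node; both node ids are assumptions:

```c
/* Sketch: comparing access latency of local vs. far NUMA memory with a
 * dependent pointer chase. Build with: gcc -O2 -o chase chase.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1UL << 26)   /* 64 Mi entries (512 MiB), far larger than caches */

static double chase(int node) {
    size_t *a = numa_alloc_onnode(N * sizeof(size_t), node);
    if (!a) { perror("numa_alloc_onnode"); exit(1); }

    /* Sattolo's algorithm: one random cycle through all entries,
     * so every load depends on the previous one. */
    for (size_t i = 0; i < N; i++) a[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = a[i]; a[i] = a[j]; a[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = a[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    fprintf(stderr, "sink: %zu\n", p);  /* keep the loop from being elided */
    numa_free(a, N * sizeof(size_t));
    return ns / N;                      /* average ns per dependent load */
}

int main(void) {
    if (numa_available() < 0) { fprintf(stderr, "no NUMA\n"); return 1; }
    printf("local node 0: %.1f ns/load\n", chase(0));
    printf("far   node 1: %.1f ns/load\n", chase(1));
    return 0;
}
```

On an ordinary dual-socket machine the same program measures the cross-socket NUMA hop, which is exactly the comparison the figure is making.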

The trick to using CXL memory well, just as with Optane 3D XPoint DIMMs and the various speeds of flash, is to figure out how much of the data in memory is hot, warm, or cold, and then build a mechanism that puts hot data in the fastest memory, warm data in warm memory, and cold data in the slowest memory. You also need to know how much data sits in each temperature tier in order to size the capacities correctly. That is exactly what the Chameleon tool created by Facebook and Chowdhury is for.
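
As a toy illustration of that tiering decision (the access counts and thresholds are invented, and this is not how Chameleon itself is implemented), one might classify pages by access frequency and size each tier like this:

```c
/* Sketch: classify pages as hot/warm/cold from per-page access counts,
 * as a profiler might report them, then size each memory tier. */
#include <stdio.h>

#define NPAGES  8
#define PAGE_KB 4

int main(void) {
    /* accesses per page over a sampling window (hypothetical data) */
    int hits[NPAGES] = { 950, 3, 120, 0, 4100, 77, 1, 260 };
    int hot = 0, warm = 0, cold = 0;

    for (int i = 0; i < NPAGES; i++) {
        if (hits[i] >= 500)      hot++;   /* keep in fast local DRAM   */
        else if (hits[i] >= 50)  warm++;  /* candidate for CXL memory  */
        else                     cold++;  /* demote to flash/storage   */
    }
    printf("hot:  %d pages (%d KB) -> local DRAM\n",  hot,  hot  * PAGE_KB);
    printf("warm: %d pages (%d KB) -> CXL memory\n",  warm, warm * PAGE_KB);
    printf("cold: %d pages (%d KB) -> slower tier\n", cold, cold * PAGE_KB);
    return 0;
}
```

The per-tier byte counts are the point: they tell you how much capacity of each speed grade to provision.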

According to Charles Fan, CEO of MemVerge, the ability to dynamically compose servers and reach 10 TB+ of pooled memory capacity will drive more applications to run in memory and avoid read/write I/O to external storage. Storage-class memory will become the primary tier for hot data, with NAND and HDD for warm data and tape for cold data. Now that the CXL market has had a year of growth, this is the biggest architectural change in the industry in the last decade, and it could create a new market for shared-memory architectures spanning multiple servers. MemVerge's software combines DRAM and Optane DIMM persistent memory into a clustered memory pool that server applications can use without code changes; in other words, it already combines fast and slow memory.
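
MemVerge's software itself is proprietary, but the general fast-plus-slow combination can be sketched with the open-source memkind library, where MEMKIND_DAX_KMEM targets persistent memory that the kernel exposes as a system-RAM NUMA node:

```c
/* Sketch: pooling "fast" DRAM and "slow" persistent memory from one
 * program using the memkind library. This illustrates the fast+slow
 * combination described above; it is not MemVerge's implementation.
 * Build with: gcc -o pool pool.c -lmemkind
 */
#include <memkind.h>
#include <stdio.h>

int main(void) {
    /* Hot structure: ordinary DRAM */
    double *hot = memkind_malloc(MEMKIND_DEFAULT, 1024 * sizeof(double));
    /* Large, colder structure: KMEM DAX (e.g., Optane in system-ram mode) */
    double *bulk = memkind_malloc(MEMKIND_DAX_KMEM,
                                  (1UL << 20) * sizeof(double));
    if (!hot || !bulk) {
        fprintf(stderr, "allocation failed (is a DAX KMEM node online?)\n");
        return 1;
    }
    hot[0] = bulk[0] = 42.0;   /* both pools are plain load/store memory */
    printf("hot[0]=%.1f bulk[0]=%.1f\n", hot[0], bulk[0]);

    memkind_free(MEMKIND_DEFAULT, hot);
    memkind_free(MEMKIND_DAX_KMEM, bulk);
    return 0;
}
```

The same two-kind pattern should apply when the slow tier is CXL-attached memory rather than Optane, since both appear to the kernel as additional memory nodes.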

CXL Brings Us to the Era of Big Memory – From the MemVerge Perspective
