
TMS320C6678 memory access performance (Part 1)

The TMS320C6678 has eight C66x cores, with a typical speed of 1 GHz. Each core has 32 KB of L1D SRAM, 32 KB of L1P SRAM, and 512 KB of LL2 SRAM. All DSP cores share 4 MB of SL2 SRAM. A 64-bit 1333 MT/s DDR3 SDRAM interface supports up to 8 GB of external memory.


    Memory access performance is critical to software running on the DSP. In the C6678, all masters, including the DSP cores and the DMA engines, can access all of the memory.


    Each DSP core can perform up to 128 bits of load or store operations per clock cycle. At a 1 GHz clock frequency, the bandwidth of DSP core accesses to L1D SRAM can reach 16 GB/s.


    The DSP's internal bus switch fabric, TeraNet, interconnects the C66x cores (including their local memory), external memory, the EDMA controllers, and on-chip peripherals. A total of ten EDMA transfer controllers can be configured to transfer data between any memories at the same time.


    This article provides the designer with basic information on memory access performance evaluation, provides performance test data for various operating conditions, and explores some of the factors that affect memory access performance.


1. Introduction to memory systems

                          

The TMS320C6678 has eight C66x cores, each with:


    32KB L1D (Level 1 Data) SRAM, which operates at the same speed as the DSP core and can be used as a normal data memory or data cache.


    32KB L1P (Level 1 Program) SRAM, which operates at the same speed as the DSP core and can be used as normal program memory or as program cache.


    512KB LL2 (Local Level 2) SRAM, which operates at half the speed of the DSP core and can be used as general-purpose memory or as cache, for both data and program code.


    All DSP cores share 4MB of SL2 (Shared Level 2) SRAM, which runs at half the speed of the DSP core and can hold both data and program code.


    The TMS320C6678 integrates a 64-bit 1333 MT/s DDR3 SDRAM interface that supports up to 8GB of external memory, which can hold both data and program code. Its bus width can also be configured as 32 bits or 16 bits.
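The theoretical peak bandwidth implied by these figures is simply the bus width in bytes multiplied by the transfer rate. The following is a minimal sketch of that arithmetic; the function name `ddr3_peak_mb_per_s` is hypothetical, 1 MB is taken as 10^6 bytes, and the result is an upper bound that ignores refresh, bank conflicts, and other overheads.

```c
#include <stdint.h>

/* Theoretical peak DDR3 bandwidth in MB/s (1 MB = 10^6 bytes).
 * bus_bits:             data bus width in bits (64, 32, or 16 here)
 * mega_transfers_per_s: transfer rate in MT/s, e.g. 1333
 * Hypothetical helper for illustration; real throughput is lower. */
static uint64_t ddr3_peak_mb_per_s(uint32_t bus_bits,
                                   uint32_t mega_transfers_per_s)
{
    /* bytes moved per transfer times transfers per microsecond */
    return (uint64_t)(bus_bits / 8u) * mega_transfers_per_s;
}
```

Under these assumptions, a 64-bit bus at 1333 MT/s gives 10664 MB/s, and narrowing the bus to 32 or 16 bits halves or quarters that figure.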


    Memory access performance is critical to the efficiency of software running on the DSP. In the C6678, all masters, including the DSP cores and the DMA engines, can access all of the memory.


    Each DSP core can perform up to 128 bits of load or store operations per clock cycle. At a 1 GHz clock frequency, the bandwidth of DSP core accesses to L1D SRAM can reach 16 GB/s. When the core accesses level 2 (L2) memory or external memory, performance depends primarily on the access pattern and on the cache.


    Each DSP core has an internal DMA (IDMA) engine, which supports transfers of up to 8 GB/s at a 1 GHz clock rate. However, IDMA can access only L1, LL2, and the configuration registers; it cannot access external memory.


    The DSP's internal bus switch fabric, TeraNet, interconnects the C66x cores (including their local memory), external memory, the EDMA controllers, and on-chip peripherals. A total of ten EDMA transfer controllers can be configured to transfer data between any memories at the same time. The chip contains two main TeraNet switch fabrics. One connects each endpoint with a 128-bit bus clocked at one-third of the DSP core frequency; in theory, each port on a 1 GHz device supports 5.333 GB/s of bandwidth. The other connects each endpoint with a 256-bit bus clocked at one-half of the DSP core frequency; in theory, each port on a 1 GHz device supports 16 GB/s of bandwidth.


    Of the ten EDMA transfer controllers, two are connected to the 256-bit TeraNet switch fabric running at 1/2 of the DSP core speed, and eight are connected to the 128-bit TeraNet switch fabric running at 1/3 of the DSP core speed.
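The per-port figures quoted above follow from the bus width and the clock divider. The sketch below reproduces that arithmetic; `bus_peak_mb_per_s` is a hypothetical helper name, results are in MB/s with 1 MB taken as 10^6 bytes, and integer division truncates the fractional part.

```c
#include <stdint.h>

/* Theoretical peak bandwidth in MB/s of a bus that moves bus_bits per
 * bus clock, where the bus clock is core_mhz / divider.
 * Hypothetical helper for illustration only. */
static uint64_t bus_peak_mb_per_s(uint32_t bus_bits,
                                  uint32_t core_mhz,
                                  uint32_t divider)
{
    /* bytes per bus cycle * core MHz, scaled down by the clock divider */
    return ((uint64_t)(bus_bits / 8u) * core_mhz) / divider;
}
```

With a 1000 MHz core clock, `bus_peak_mb_per_s(128, 1000, 3)` gives 5333 MB/s for the 128-bit TeraNet, `bus_peak_mb_per_s(256, 1000, 2)` gives 16000 MB/s for the 256-bit TeraNet, and `bus_peak_mb_per_s(128, 1000, 1)` gives 16000 MB/s for the core's 128-bit load/store path to L1D.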


    Figure 1 shows the TMS320C6678 memory system. The number on each bus represents its width. Most modules operate at 1/n of the DSP core clock speed, and the typical DDR speed is 1333 MT/s (million transfers per second).

Figure 1 TMS320C6678 memory system


This article provides the designer with basic information on memory access performance evaluation, provides performance test data for various operating conditions, and explores some of the factors that affect memory access performance.


This article will help you analyze the following common issues:


1. Should the DSP core or DMA be used to copy data?


2. How many cycles does a function that accesses memory frequently consume?


3. How much does each master's performance degrade when multiple masters share memory?


Most of the data in this article was measured on the C6678 EVM (Evaluation Module) board, which has 64-bit 1333 MT/s DDR memory.


2. Comparison of data copy performance for the DSP core, EDMA3, and IDMA


The bandwidth of a data copy is limited by the lowest of the following three factors:


1. Bus bandwidth


2. Source throughput


3. Destination throughput
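In other words, the achievable copy bandwidth is bounded by the minimum of the three. A minimal sketch of that rule follows; the function names are hypothetical, and the numbers in the usage note are illustrative inputs, not measured results.

```c
#include <stdint.h>

/* Return the smaller of two values. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Upper bound on copy bandwidth: the slowest of the bus, the source,
 * and the destination sets the limit. All arguments must use the same
 * unit (for example MB/s). Hypothetical helper for illustration. */
static uint64_t copy_bandwidth_bound(uint64_t bus,
                                     uint64_t source,
                                     uint64_t destination)
{
    return min_u64(bus, min_u64(source, destination));
}
```

For example, with an illustrative bus limit of 5333 MB/s, a source capable of 10664 MB/s, and a destination capable of 16000 MB/s, `copy_bandwidth_bound(5333, 10664, 16000)` returns 5333: the bus is the bottleneck.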


     Table 1 summarizes the theoretical bandwidth of the C66x core, IDMA, and EDMA on the C6678.

