SPRACT8 September 2020 66AK2H06, 66AK2H12, 66AK2H14

 

  Abstract
  Trademarks
  1 Introduction
    1.1 TI Processor SDK RTOS
    1.2 TI NDK
    1.3 66AK2H Device
    1.4 FTP Offering in TI Processor SDK RTOS
  2 Hardware and Software
  3 Develop the FTP Server on K2H
    3.1 Reference FTP Server Example
    3.2 Create K2H FTP Server Example
    3.3 Test K2H FTP Server Example
  4 Performance Tuning
    4.1 Quick Code Check
      4.1.1 FTP Transmitting Code Check
      4.1.2 FTP Receiving Code Check
      4.1.3 CCS Project Optimization
    4.2 Increase the TCP Buffer Sizes
    4.3 UIA CPU Load Instrumentation
    4.4 What Can We Do on the PC Side?
      4.4.1 TCP Window Scaling Check
      4.4.2 Receive Interrupt Coalescing Check
    4.5 What Else Can We Do on the K2H Side?
      4.5.1 TCP/IP Checksum Offloading Check
      4.5.2 NIMU Driver Efficiency Profiling
      4.5.3 Receive Interrupt Coalescing
    4.6 Final FTP Throughput Results
  5 Summary
  6 References

NIMU Driver Efficiency Profiling

The NIMU driver source code is in NIMU_INSTALL_DIR\src\v2\nimu_eth.c. The transmit function is EmacSend() and the receive function is EmacRxPktISR(). A debug macro, TIMING, can be enabled to profile the CPU cycles spent in the transmit and receive routines; the timestamp function is nimuUtilReadTime32(). This instrumentation can be extended to record cycle counts for individual parts of the code, which shows where driver optimization effort is most likely to pay off.

For example, the transmitted packet counter, the delta since EmacSend() was last entered, and the total time spent in EmacSend() are saved in a circular debug buffer for further analysis, as shown in Figure 4-12. EmacSend() is found to take about 1550 to 2950 cycles (the third column).

Besides actually sending the packet, EmacSend() also processes the transmit return queue: in a loop over the number of descriptors in gTxReturnQHnd, it pops gTxReturnQHnd, returns the PBM packet to the NDK stack, and pushes the transmit descriptor onto the transmit free queue. After some code optimization, the same routine takes about 1600 cycles, a saving of up to 45% relative to the 2950-cycle worst case.

Figure 4-12 Code Cycle Count Analysis Before and After Code Optimization

Note that the NIMU library must be rebuilt and relinked with the application for the changes to take effect.
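
The circular debug buffer used for the EmacSend() measurements can be sketched as follows. This is a minimal illustration and not the code in nimu_eth.c: the names DbgTxRecord, gDbgTxBuf, dbg_tx_record() and the buffer depth are hypothetical. The timestamps are assumed to be 32-bit free-running cycle counts, such as values read with nimuUtilReadTime32() at the entry and exit of the routine being profiled.

```c
#include <stdint.h>

/* Hypothetical debug-buffer layout; these names are illustrative only
 * and do not come from nimu_eth.c. */
#define DBG_BUF_ENTRIES 64u  /* power of two, so the index wraps with a mask */

typedef struct {
    uint32_t pkt_count;    /* running count of calls to the profiled routine */
    uint32_t entry_delta;  /* cycles since the previous entry into the routine */
    uint32_t duration;     /* cycles spent inside the routine */
} DbgTxRecord;

static DbgTxRecord gDbgTxBuf[DBG_BUF_ENTRIES];
static uint32_t gDbgTxCount;    /* total records written so far */
static uint32_t gDbgLastEntry;  /* timestamp of the previous entry */

/* Store one record. t_entry and t_exit are assumed to be 32-bit
 * free-running timestamps taken at the top and bottom of the routine.
 * Unsigned subtraction keeps the deltas correct across one counter wrap. */
static void dbg_tx_record(uint32_t t_entry, uint32_t t_exit)
{
    /* Oldest records are overwritten once the buffer is full. */
    DbgTxRecord *rec = &gDbgTxBuf[gDbgTxCount & (DBG_BUF_ENTRIES - 1u)];

    rec->pkt_count   = gDbgTxCount + 1u;
    rec->entry_delta = (gDbgTxCount == 0u) ? 0u : (t_entry - gDbgLastEntry);
    rec->duration    = t_exit - t_entry;

    gDbgLastEntry = t_entry;
    gDbgTxCount++;
}
```

In a driver, the two timestamps would be captured at the start and end of the profiled function and passed to dbg_tx_record() just before returning. The buffer can then be inspected from the debugger after a transfer. Recording into memory instead of printing keeps the measurement overhead small and roughly constant.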