SPRAD27A July 2022 – August 2022 AM2431, AM2432, AM2434, AM2631, AM2631-Q1, AM2632, AM2632-Q1, AM2634, AM2634-Q1, AM2732, AM2732-Q1, AM6411, AM6412, AM6421, AM6422, AM6441, AM6442

 

  Abstract
  Trademarks
  1 Introduction
  2 Trigonometric Optimizations
    2.1 Lookup Table-Based Approximation
    2.2 Polynomial Approximation
      2.2.1 Optimizing Sine and Cosine
        2.2.1.1 Sine Cosine Polynomials From Sollya
      2.2.2 Optimizing Arctangent and Arctangent2
        2.2.2.1 Arctangent Polynomials
  3 Trig Library Benchmarks
    3.1 C Math.h Library
    3.2 Arm “Fast Math Functions” in CMSIS
    3.3 TI Arm Trig Library
    3.4 Table of Results
  4 Optimizations
    4.1 Branch Prediction
    4.2 Floating-Point Single-Precision Instructions
    4.3 Memory Placement
    4.4 Compiler
  Revision History

Branch Prediction

Branches in a function make its exact cycle count unpredictable: the branch predictor may not predict correctly, and each misprediction costs approximately 8 cycles. Arm provides conditional instructions that can be used in place of branches, so that the function always executes in the same number of cycles. Figure 4-1 shows the condition codes that can be appended to instructions.

Figure 4-1 Condition Code Suffixes and Related Flags

The main reason for creating the .asm versions of the trigonometric functions was to remove the branches inserted by the compiler and replace them with conditional instructions. This reduced the maximum cycle count by eliminating the penalty for incorrect branch predictions. The delta between the maximum and the average cycle counts could not be removed completely, because the range reduction code contains some divide instructions that execute conditionally depending on the input values.
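The difference between a branch and a conditional operation can be sketched in C. The two functions below are hypothetical illustrations and are not taken from the TI Arm Trig Library; the names fold_branch and fold_select, and the range-folding step itself, are assumptions chosen only to show the pattern. The first form invites the compiler to emit a compare followed by a branch, while the second computes both candidates and then selects one, which a compiler can typically lower to a conditional instruction. Which instructions are actually generated depends on the compiler, the target, and the optimization level, so the disassembly should be checked rather than assumed.

```c
/* Hypothetical example: fold an angle in [0, pi] into [0, pi/2].
 * These helpers are illustrative only and are not part of any TI library. */

#define HALF_PI_F 1.57079632679f
#define PI_F      3.14159265359f

/* Branching form: the subtraction is skipped when the test fails,
 * so a compiler may implement the test as a conditional branch. */
static inline float fold_branch(float x)
{
    if (x > HALF_PI_F) {
        x = PI_F - x;
    }
    return x;
}

/* Select form: the reflected value is always computed, then one of the
 * two candidates is chosen. This shape maps naturally onto a compare
 * followed by a conditionally executed move, with no branch to mispredict. */
static inline float fold_select(float x)
{
    const float reflected = PI_F - x;
    return (x > HALF_PI_F) ? reflected : x;
}
```

Both functions return the same result for every input; only the shape of the generated code is expected to differ. The select form does the subtraction unconditionally, trading a small fixed amount of extra work for a cycle count that does not depend on the branch predictor.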

Note: The TI Arm Clang compiler performs these assembly optimizations automatically when compiler optimization is enabled. The assembly versions have therefore been replaced with their C equivalents starting with MCU+ SDK v8.5.