I recently replaced a loop with a call to the "ippsMulC_32f_I" function, and it now runs slower during the first few iterations.
After a few iterations, the performance is good.
IPP Version: IPP 7.0
Linking Type: Static Linking
Is there any way to avoid this initial slowdown and improve performance?
My application code is compiled with the -mtune=core7-avx option.
I believe IPP 7.0 is optimized for SSE3. Could mixing AVX-tuned application code with SSE-based IPP code cause a problem? If so, why would it affect only the first few iterations, with good performance afterwards?