The 703 turns out to be 4x faster (!) than the new 802 function, but the CPU load is extreme and would leave no room for other tasks in complex applications.

Second trial: a sleep(100) call was added inside the loop.

The 703 still shows an extremely high CPU load, even though my calculated CPU use time is only 5%! The 802 works as expected, with little remaining load.

Now what shall I do? I don't want to use 703, because it seems buggy that the CPU load stays constantly high even after the call has finished and the thread is sleeping. However, the 802 performance is far poorer.

Stefan

Attachments:
ipp703.png (16.3 KB)
ipp802.png (12.97 KB)
ipp703_sleep100.png (14.07 KB)
ipp802_sleep100.png (9.59 KB)

↧

Updated from IPP 7.1.1 to 8.2.1, seeing segmentation faults on AVX (e9)

May 5, 2015, 8:41 am

Latest and popular articles on Intel Technologies

We have been using the Intel IPP libraries for many years now (Dialogic was once an Intel company :)). A few years back we updated to version 7.1.1 and all was well until we ran into segmentation faults on certain newer systems. The crashes occurred on systems whose processors supported AVX and AVX2. We found that we were able to work around this by limiting the CPU type to AVX.

We recently updated to IPP 8.2.1, hoping that this limitation would no longer be required. However, we are now seeing more frequent segmentation faults on AVX-capable systems, in the e9 IPP functions.

First, in the crypto libraries. This trace is from when we were still using the deprecated functions. Updating to the newer AES APIs did not resolve the issue.

Apr 30 08:58:46 sut-1330 kernel: [6765] trap invalid opcode ip:7fe0be224e7a sp:7fde82bc8b80 error:0 in

#0 0x00007fe0be224e7a in e9_EncryptCTR_RIJ128pipe_AES_NI () from /usr/dialogic/data/ssp.mlm
#1 0x00007fde82bc8cd0 in ?? ()
#2 0x00007fe0be22425d in e9_ippsRijndael128EncryptCTR () from /usr/dialogic/data/ssp.mlm

Second . . .

#0 0x00007fb554d4cee1 in e9_owniCopyReplicateBorder_8u_C1R ()

Debug output I added, indicating the IPP settings being used . . .

Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: APInit.c.162:DisplayIPPCPUFeatures: 0x46 : 0x46
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: APInit.c.175:DisplayIPPCPUFeatures: Limiting from 0x46 to 0x46
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippCore 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippIP AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippSP AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippVC AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: Processor supports Advanced Vector Extensions instruction set
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: 8 cores on die
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippGetMaxCacheSizeB 20480 k
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: Available 0xfdf Enabled 0xfdf
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: MMX A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE2 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE3 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSSE3 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: MOVBE X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE41 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE42 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX(OS) A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AES A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: CLMUL A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ABR X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: RDRRAND X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: F16C X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX2 X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ADCOX X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: RDSEED X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: PREFETCHW X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SHA X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: KNC X X

We use gcc for building our product which links with the IPP libs.

gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
Copyright (C) 2006 Free Software Foundation, Inc.

redhat-release-5Server-5.4.0.3
redhat-release-notes-5Server-29

↧

64 bit C# wrapper for ipp 6.1

May 11, 2015, 12:52 am

I am still using IPP 6.1 wrapped in the C# library ipp_cs. I don't need to update the version as of now. At present I am using the 32-bit version of the C# wrapper, and I need to migrate to the 64-bit version of this library. I couldn't find it on the products page. Could someone please advise where I can download the 64-bit version of the C# wrapper for IPP?

Also, for experienced developers: would it save time just to upgrade to the latest IPP version and write a C# wrapper myself? I read on the forums that, starting with version 8.0, Intel doesn't provide a C# wrapper and we need to write our own. I suppose this is because the new wrappers are simpler to develop?

Regards,

Alok

↧

uncore performance-monitoring events

May 11, 2015, 9:20 pm

Hello,
I am using a machine with an Intel Xeon(R) X5570 CPU (2.93 GHz) in an IBM System x3650 M2 server.
I have been experimenting based on the manual
"Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2", chapter 19, performance-monitoring events.

I want to read uncore events, for example those in this manual's Table 19-14, "Non-Architectural Performance Events In the Processor Uncore for Intel® Core™ i7 Processor and Intel® Xeon® Processor 5500 Series (Contd.)".

However, I have failed to obtain this information on several machines...

Process:

1. Check the event in the manual:

Event num : 2FH
Umask num : 01H
Event Mask Mnemonic : UNC_QMC_WRITES.FULL.CH0
Description : Counts number of full cache line writes to DRAM

2. Run the experiment on my Linux machine using the Linux perf tool

for example : perf stat -e r12f sleep ( channel 0 )
perf stat -e r22f sleep ( channel 1 )
perf stat -e r42f sleep ( channel 2 )

3. Result : the count is zero (0)
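
For reference, the raw codes used in step 2 follow the usual perf convention of placing the unit mask in bits 8-15 and the event number in bits 0-7. The sketch below only reproduces that arithmetic; the function name perfRawEvent is hypothetical. One caveat worth checking: as far as I understand, `-e rNNNN` programs the core PMU, whereas on these processors the uncore events live in a separate uncore PMU, which may be why the counts stay at zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Build the "rNNNN" string passed to `perf stat -e` from a manual entry.
// Layout assumed here: unit mask in bits 8-15, event number in bits 0-7.
// perfRawEvent is a hypothetical helper name, not part of perf itself.
std::string perfRawEvent(std::uint8_t eventNum, std::uint8_t umask)
{
    const unsigned code = (static_cast<unsigned>(umask) << 8) | eventNum;
    char buf[16];
    const int written = std::snprintf(buf, sizeof buf, "r%x", code);
    assert(written > 0 && written < static_cast<int>(sizeof buf));
    (void)written;  // silence "unused" warnings when NDEBUG is defined
    return std::string(buf);
}
```

With event 2FH and unit masks 01H, 02H and 04H this yields r12f, r22f and r42f, the same strings as in the perf commands above.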

In addition, I carried out a similar experiment on another CPU, a Xeon 5600 series,

but got the same result...

Table 19-16. Non-Architectural Performance Events In the Processor Uncore for Processors Based on Intel® Microarchitecture Code Name Westmere (Contd.)

Event num : 2CH
Umask num : 01H
Event Mask Mnemonic : UNC_QMC_NORMAL_READS.CH0
Description : Counts the number of Quickpath Memory Controller channel 0 medium and low priority read requests. The QMC channel 0 normal read occupancy divided by this count provides the average QMC channel 0 read latency.

In conclusion, most of the uncore data is not output ....

Please help me.

↧

EigenValuesVectors - two matrices/complex values

May 12, 2015, 6:51 pm

Hello,

I really need to implement, in C++ code, the calculation of eigenvalues and eigenvectors using the same algorithm as the MATLAB function:

    [V,D] = eig(A,B) produces a diagonal matrix D of generalized
    eigenvalues and a full matrix V whose columns are the corresponding
    eigenvectors so that A*V = B*V*D.

First of all, when I look at the available functions in the Intel IPP documentation: https://software.intel.com/en-us/node/505270 I can't find any that makes use of complex numbers (I am interested in Ipp64fc).

Furthermore, all of them take only one matrix as an argument. Do you have any idea how I can get an effect similar to MATLAB's eig(A,B) using Intel IPP?

I am using Intel IPP 7.1.
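
To make concrete what eig(A,B) has to solve: the generalized eigenvalues are the roots of det(A - lambda*B) = 0. The sketch below does this for the 2x2 complex case only, where the determinant is a quadratic. It is an illustration of the problem, not the algorithm MATLAB uses and not an IPP call; the names generalizedEig2x2, Cplx and Mat2 are made up here. For general sizes a LAPACK-style generalized eigensolver (for example zggev) would be the usual route.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using Cplx = std::complex<double>;
using Mat2 = std::array<std::array<Cplx, 2>, 2>;

// Generalized eigenvalues of a 2x2 pencil (A, B): the roots of
// det(A - lambda * B) = c2 * lambda^2 + c1 * lambda + c0 = 0.
// Assumes det(B) != 0 so that both eigenvalues are finite.
std::array<Cplx, 2> generalizedEig2x2(const Mat2& A, const Mat2& B)
{
    const Cplx c2 = B[0][0] * B[1][1] - B[0][1] * B[1][0];  // det(B)
    const Cplx c1 = -(A[0][0] * B[1][1] + A[1][1] * B[0][0]
                      - A[0][1] * B[1][0] - A[1][0] * B[0][1]);
    const Cplx c0 = A[0][0] * A[1][1] - A[0][1] * A[1][0];  // det(A)
    assert(std::abs(c2) > 0.0);

    // Quadratic formula with a complex square root.
    const Cplx disc = std::sqrt(c1 * c1 - 4.0 * c2 * c0);
    return { (-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2) };
}
```

For A = diag(2, 6) and B = diag(1, 2) the two roots are 3 and 2, i.e. the ratios of the diagonal entries, as expected.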

↧

Problem while replacing old ippiResizeCenter method with ippiResize method

May 13, 2015, 3:50 am

Hi,

We were using the resizeCenter method in our software, which takes scaleX and scaleY parameters and offsets in the X and Y directions.

Now we want to upgrade the software to 8.2, where this method has been removed entirely.

I saw the resizeCubic method, but it takes neither the scale in the X and Y directions nor an offset; the new method considers only the ROIs of source and destination.

I really don't understand the concept behind the new resizeCubic method, that is, how we can scale the source raster down or up to fit inside the destination raster.

All I need is to perform both scaling and shift together, as the methods ippiResizeCenter and ippiResizeSqrPixel do.

Unfortunately both these methods were deprecated, and ippiResizeCenter has been removed entirely.

We have to perform zoom and pan on the source image, which we used to do with ippiResizeCenter, using the new ippiResizeCubic.

We also have source images with different aspect ratios, which we used to handle by setting different scaleX and scaleY values in the resizeCenter method. I don't understand how to handle these kinds of images and perform zoom and pan using the new method ippiResize<interpolationtype>.

Can you please provide a small code snippet that performs zoom and pan on the source image and displays it in the destination buffer using the ippiResize<interpolationtype> method?
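
One way to think about the ROI-only interface, as far as I understand it: the scale is implied by the ratio of destination size to source ROI size, and the pan is implied by where the source ROI starts. Under that assumption, an old-style scale and shift can be converted into a source rectangle. The sketch below shows only this geometry; it makes no IPP calls, and RectD and sourceRoiForZoomPan are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct RectD { double x, y, width, height; };

// Source rectangle that lands on a dstWidth x dstHeight destination when
// dst = src * scale + shift on each axis. The result is clamped to the
// source image, so it can come out smaller than requested near the edges.
RectD sourceRoiForZoomPan(int srcWidth, int srcHeight,
                          int dstWidth, int dstHeight,
                          double scaleX, double scaleY,
                          double shiftX, double shiftY)
{
    assert(scaleX > 0.0 && scaleY > 0.0);

    // Invert the mapping for the destination corners (0,0) and (dstW,dstH).
    double x0 = (0.0 - shiftX) / scaleX;
    double y0 = (0.0 - shiftY) / scaleY;
    double x1 = (dstWidth - shiftX) / scaleX;
    double y1 = (dstHeight - shiftY) / scaleY;

    // Clamp both corners to the source bounds.
    x0 = std::min(std::max(x0, 0.0), static_cast<double>(srcWidth));
    x1 = std::min(std::max(x1, 0.0), static_cast<double>(srcWidth));
    y0 = std::min(std::max(y0, 0.0), static_cast<double>(srcHeight));
    y1 = std::min(std::max(y1, 0.0), static_cast<double>(srcHeight));

    return { x0, y0, x1 - x0, y1 - y0 };
}
```

Different scaleX and scaleY simply produce a source rectangle whose aspect ratio differs from the destination's. If clamping shrinks the rectangle, the destination ROI would have to shrink by the same scale to keep the zoom factor unchanged.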

Thanks & Regards,

Muralidhar

↧

Examples of IPP Bi-Quad Filter?

May 13, 2015, 1:23 pm

Hi All,

I'm trying to use IPP to run a bandpass filter on some audio data (single channel).

I've written the following class and helper function, but my results seem to be way off the mark. I hope this isn't too much code to dump:

// BiQuad Coefs
// http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt
struct BiQuad
{
	float coefs[6];
};

// Return bandpass coefficients
BiQuad getBandPass( float f0, float Fs, float Q )
{
	float omega = IPP_2PI * f0 / Fs;
	float alpha = sinf( omega ) / ( 2.f*Q );

	float b0 = sinf( omega ) / 2.f;
	float b1 = 0.f;
	float b2 = -b0;
	float a0 = 1.f + alpha;
	float a1 = -2.f*cosf( omega );
	float a2 = 1.f - alpha;

	// Divide all by a0, set a0 to 1.f
	return{ {b0 / a0, b1 / a0, b2 / a0, 1.f, a1 / a0, a2 / a0} };
}

class IppBiquad
{
	int pBufSize{ 0 };
	int nBQ{ 0 };
	IppsIIRState_32f * m_State{ nullptr };
	Ipp8u * m_pBuf{ nullptr };
public:
	// Default constructor, takes # of cascaded filters
	IppBiquad( int N = 2 )
		: nBQ( N )
	{
		if ( nBQ > 0 )
		{
			ippsIIRGetStateSize_BiQuad_32f( 2, &pBufSize );
			m_pBuf = ippsMalloc_8u( pBufSize );
		}
	}
	// Set the filter components
	inline void setFilt( float f0, float Fs, float Q )
	{
		vector<BiQuad> taps( nBQ, getBandPass( f0, Fs, Q ) );
		ippsIIRInit_BiQuad_DF1_32f( &m_State, (Ipp32f *) taps.data( ), taps.size( ), 0, m_pBuf );
	}
	// Free work buf (Do I need to free the state?)
	~IppBiquad( )
	{
		if ( m_pBuf != nullptr )
			ippsFree( m_pBuf );
	}
	// Run the filter
	inline IppStatus operator()( float * input, float * output, int size )
	{
		if ( input && output && size > 0 && m_State )
			return ippsIIR_32f( input, output, size, m_State );
		return IppStatus::ippStsNullPtrErr;
	}
};

I use the BiQuad struct to store my 6 float coefficients (taps, according to the docs), the getBandPass function to return the correct normalized taps for a bandpass filter centered around f0 given the sample rate Fs and Q value, and the class so that I don't have to manage the work buffer by hand.

When I need to run the filter, I invoke the () (parentheses) operator, sort of making my class like a function. To test the class I made an audio sample with several 500Hz sine wave "chirps" to see if I could isolate the chirps. However, I see the chirps most at very low frequencies (f0=100Hz), and it seems like the amplitude of my output has been changed somehow.

Am I interpreting the use of the BiQuad functions wrong? None of the examples in the docs actually use a BiQuad, they all use an arbitrary IIR filter (as far as I can tell).

I apologize in advance if the example is too object oriented; I'm happy to provide some straight C code, I just thought this was a bit clearer. Sorry for the use of std::vector, if anyone is averse to that...
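
A plain C++ reference can help separate coefficient problems from IPP usage problems. The sketch below uses the same cookbook formula as getBandPass (b0 = sin(omega)/2, the "constant skirt gain" variant) and runs it through a hand-written direct form I loop, with no IPP calls. Note that this variant has a peak gain of Q at f0, so a change in output amplitude is expected; the cookbook's other band-pass variant (b0 = alpha) has 0 dB peak gain. All names below (BiquadCoefs, bandPassSkirt, runBiquad, steadyStateGain) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

struct BiquadCoefs { double b0, b1, b2, a1, a2; };  // already divided by a0

// Cookbook band-pass, constant-skirt-gain variant; peak gain at f0 equals Q.
BiquadCoefs bandPassSkirt(double f0, double Fs, double Q)
{
    assert(f0 > 0.0 && f0 < Fs / 2.0 && Q > 0.0);
    const double w0 = 2.0 * kPi * f0 / Fs;
    const double alpha = std::sin(w0) / (2.0 * Q);
    const double a0 = 1.0 + alpha;
    const double b0 = std::sin(w0) / 2.0;
    return { b0 / a0, 0.0, -b0 / a0, -2.0 * std::cos(w0) / a0, (1.0 - alpha) / a0 };
}

// Direct form I:
// y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]
std::vector<double> runBiquad(const BiquadCoefs& c, const std::vector<double>& x)
{
    std::vector<double> y(x.size());
    double xPrev1 = 0.0, xPrev2 = 0.0, yPrev1 = 0.0, yPrev2 = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double out = c.b0 * x[n] + c.b1 * xPrev1 + c.b2 * xPrev2
                         - c.a1 * yPrev1 - c.a2 * yPrev2;
        xPrev2 = xPrev1; xPrev1 = x[n];
        yPrev2 = yPrev1; yPrev1 = out;
        y[n] = out;
    }
    return y;
}

// Peak |y| over the last quarter of the response to a unit-amplitude sine,
// i.e. an estimate of the filter's gain at frequency f.
double steadyStateGain(const BiquadCoefs& c, double f, double Fs, std::size_t n = 8000)
{
    std::vector<double> x(n);
    for (std::size_t k = 0; k < n; ++k)
        x[k] = std::sin(2.0 * kPi * f * static_cast<double>(k) / Fs);
    const std::vector<double> y = runBiquad(c, x);
    double peak = 0.0;
    for (std::size_t k = n - n / 4; k < n; ++k)
        peak = std::max(peak, std::fabs(y[k]));
    return peak;
}
```

Feeding the same input and taps to both this loop and the IPP filter, and comparing outputs sample by sample, should show whether the discrepancy comes from the coefficients or from how the IPP state is set up.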

Thanks for your help,

John

↧

IPP multi-threaded libraries are not installed - static link

May 15, 2015, 4:46 am

Hello,

My error is:

...v110\ImportBefore\Intel.Libs.IPP.v110.targets(92,5): error : IPP multi-threaded libraries are not installed.

I have one computer on which I compiled a project with IPP, and I linked the lib created from this project into another project. On this computer I have Intel Parallel Studio 2015 installed.

My goal is to link the IPP project into the other project without having to install IPP for all the other developers on my team.

The error I'm getting is probably because IPP is not installed on the other computer.

How can I compile an IPP-dependent project into a lib, so that other projects won't need to have IPP installed? I can attach the IPP libs and includes, but I don't want to have all the developers install IPP.

↧

Inverse Fourier Transform

May 26, 2015, 6:17 pm

Hello,

I am struggling a bit to find a function that would allow me to perform an inverse discrete Fourier transform. I am using Intel IPP 7.1.

I performed the FFT with the ippsFFTFwd_CToC_64fc function and the IPP_FFT_NODIV_BY_ANY flag; how can I invert it?
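
The scaling involved can be shown without IPP. With a "no division" convention, applying the forward kernel and then the inverse kernel returns N times the original signal, so one division by N is still needed. The sketch below uses a naive O(N^2) DFT purely to demonstrate that; dftNoDiv and inverseDft are hypothetical names. In IPP itself the inverse counterpart should be ippsFFTInv_CToC_64fc, and I would expect the same rule to hold: either divide the result by N by hand, or initialize the spec with IPP_FFT_DIV_INV_BY_N instead.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Unscaled DFT. sign = -1 is the forward kernel, sign = +1 the inverse
// kernel; neither direction divides by N ("no division" convention).
std::vector<Cplx> dftNoDiv(const std::vector<Cplx>& x, int sign)
{
    assert(sign == 1 || sign == -1);
    const double kPi = 3.14159265358979323846;
    const std::size_t n = x.size();
    std::vector<Cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                sign * 2.0 * kPi * static_cast<double>((k * j) % n) / static_cast<double>(n);
            acc += x[j] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// Inverse of dftNoDiv(x, -1): apply the +1 kernel, then divide by N by hand.
std::vector<Cplx> inverseDft(const std::vector<Cplx>& X)
{
    std::vector<Cplx> x = dftNoDiv(X, +1);
    for (Cplx& v : x)
        v /= static_cast<double>(X.size());
    return x;
}
```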

↧

where is the link for IPP JPEG and IPP-UIC ?

May 28, 2015, 2:33 pm

Hello

I want to download the IPP-based JPEG encoder/decoder sample application. Can you tell me the link (for IPP-UIC, IPP JPEG, etc.) so that I can download it? The old link below seems broken.

thanks
Frank

↧

linear and nearest neighbor interpolation

May 30, 2015, 9:59 am

function output = calculateBlackLevel(blStruct, AG, ET)
    blLut = zeros(length(blStruct), size(blStruct{1}.black_level,2));
    etLut = zeros(length(blStruct), 1);

    for k = 1 : length(blStruct)
        bl = blStruct{k};

        if length(bl.analog_gain) == 1
            blLut(k, :) = bl.black_level;
        elseif AG > max(bl.analog_gain) || AG < min(bl.analog_gain)
            blLut(k, :) = interp1(bl.analog_gain, bl.black_level, AG, 'nearest', 'extrap');
        else
            blLut(k, :) = interp1(bl.analog_gain, bl.black_level, AG);
        end

        etLut(k) = bl.exposure_time;
    end

    if length(etLut) == 1
        output = blLut;
    elseif ET > max(etLut) || ET < min(etLut)
        output = interp1(etLut, blLut, ET, 'nearest', 'extrap');
    else
        output = interp1(etLut, blLut, ET);
    end
end

Here is the MATLAB code I'm trying to convert. My question is: does IPP have any sort of interpolation functions?
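
Independently of what IPP offers, the two interp1 modes used above collapse into one small routine for a scalar table: linear interpolation inside the sample range, and the nearest (end) sample outside it. The sketch below is plain C++ under the assumption that the sample positions are strictly increasing; interpLinearClamped is a hypothetical name. For the row-valued black_level case it would be applied once per column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear interpolation inside [xs.front(), xs.back()]; outside that range
// the value of the nearest end sample is returned. xs must be strictly
// increasing and the same length as ys.
double interpLinearClamped(const std::vector<double>& xs,
                           const std::vector<double>& ys, double x)
{
    assert(!xs.empty() && xs.size() == ys.size());
    if (xs.size() == 1 || x <= xs.front()) return ys.front();
    if (x >= xs.back()) return ys.back();

    // First sample strictly greater than x; its predecessor is <= x.
    const std::size_t hi = static_cast<std::size_t>(
        std::upper_bound(xs.begin(), xs.end(), x) - xs.begin());
    const std::size_t lo = hi - 1;
    const double t = (x - xs[lo]) / (xs[hi] - xs[lo]);
    return ys[lo] + t * (ys[hi] - ys[lo]);
}
```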

↧

Reinstatement of Intel® IPP in-place functions

May 31, 2015, 11:29 pm

Some users are using older Intel® IPP releases and may notice deprecation warnings on the in-place functions.

After reviewing the feedback from users, we decided to keep these in-place functions in the Intel® IPP releases.

The deprecation warnings have been removed as of the Intel® IPP 8.1 release. These functions continue to be supported.

Check here to find the new features in Intel® IPP 8.2, and your feedback is welcome on the product.

↧

Video is getting swapped when the image is decompressed using Intel Media SDK

June 2, 2015, 10:13 pm

Hi,

Context: I am using the Intel Media SDK to decompress images from different camera inputs (1080p, 720p, ...) at the same time. For each camera I am using a separate pipeline. I have configured my system this way so that the inputs from 16 different cameras are being decompressed. During rendering, one camera's view sometimes shows the images of all the other 15 cameras in sequence. Even if we stop the pipeline for that camera (programmatically) and reinitialize it, the issue does not disappear.

What could be the reason for this behaviour? Is the Intel graphics card doing any swapping internally?

↧

intel deflate decompression implementation/library

June 4, 2015, 11:26 am

I came across a good whitepaper for "High Performance DEFLATE Decompression on Intel® Architecture Processors"

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers...

Could anyone tell me where I can find its implementation or library? Thanks

↧

Intel® Integrated Performance Primitives (Intel® IPP) upgrade options

June 5, 2015, 3:20 am

Dear IPP users,

If you are presently using Intel® IPP in your applications and your license is expiring soon, we have exciting news for you regarding your Intel® IPP license extension.

Because our customers were seeing a lot of synergy in using Intel® IPP in combination with the various Intel Development Tools, Intel® IPP is now delivered along with other performance libraries such as the threading libraries (Intel® TBB), the Math Kernel Library (Intel® MKL) and the Intel Compiler in our various suites (Intel® Parallel Studio XE, Intel® System Studio, or Intel® Integrated Native Developer Experience). The majority of our customers are already realizing the value that this change has brought.

As our existing customer you can either

Continue to renew the support maintenance for your existing Intel® IPP license or

Upgrade to one of our Intel Studio products based on your specific needs and enjoy a wider access to Intel performance libraries, threading libraries and Intel compilers.

Pick a suite which best fits your software application requirements.

Product Name : Intel® System Studio
Type of Software Applications : Used in system software and applications for embedded or mobile devices. For example, embedded applications in digital surveillance, test and measurement equipment, medical imaging, telecommunication, multi-function printers.
The product supports Linux*, Android* and Windows* targets.

Product Name : Intel® Parallel Studio XE
Type of Software Applications : Used in enterprise and desktop applications with a focus on parallelization and vectorization optimization. The product supports Windows*, Linux*, and OS X*.

Product Name : Intel® Integrated Native Developer Experience (Intel® INDE)
Type of Software Applications : Used in any C++/Java* applications that have to support cross-OS, cross-architecture development for Windows* on Intel® architecture and Android* on Intel® architecture and ARM*.
Supported host systems: Windows*, OS X*.
Supported target systems: Android*, Windows*, OS X*

For buying and renewal options for Intel® IPP, please contact us at intel.software.sales@intel.com, or visit https://software.intel.com/en-us/intel-ipp/try-buy

↧

h264 developing in network

June 29, 2015, 12:21 am

Hi, I am developing a network video project, but I find that the CPU usage of IPP H.264 decoding and encoding is very high, more than 130 percent. I hope an Intel developer can give me some advice. Thanks!!

↧

How to use "ippicopy_mod" function to merge three channels R,G,B images to a RGB color image?

June 29, 2015, 11:12 am

Hi,

I have three single-channel images:

I don't know how to use the "ippicopy_mod" functions to merge these three single-channel images into one RGB color image.
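
What the merge has to do is simple to state in plain C++: take one byte from each plane in turn and write them side by side. The sketch below does exactly that with explicit row strides; mergePlanesToRgb is a hypothetical name and no IPP call is made. In IPP the planar-to-pixel-order copy is, as far as I know, the P3C3R flavor of the copy function (for 8-bit data, ippiCopy_8u_P3C3R), which takes an array of three plane pointers plus source and destination steps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interleave three width x height 8-bit planes into one packed RGB buffer
// (R0 G0 B0 R1 G1 B1 ...). planeStep and dstStep are row strides in bytes.
void mergePlanesToRgb(const std::uint8_t* r, const std::uint8_t* g,
                      const std::uint8_t* b, std::size_t planeStep,
                      std::uint8_t* dst, std::size_t dstStep,
                      std::size_t width, std::size_t height)
{
    assert(r && g && b && dst);
    assert(planeStep >= width && dstStep >= 3 * width);
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* rRow = r + y * planeStep;
        const std::uint8_t* gRow = g + y * planeStep;
        const std::uint8_t* bRow = b + y * planeStep;
        std::uint8_t* out = dst + y * dstStep;
        for (std::size_t x = 0; x < width; ++x) {
            out[3 * x + 0] = rRow[x];
            out[3 * x + 1] = gRow[x];
            out[3 * x + 2] = bRow[x];
        }
    }
}
```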

↧

IPP 7.0

July 1, 2015, 7:38 am

Hi,

We purchased the IPP SDK from Intel about 3 years ago and built a DirectShow filter that uses the SDK to decode H.264 video. We use the SDK as a static library. Our decoder filter is based on the w_ipp-samples_p_7.0.5.059 sample from Intel.

We ran into a problem recently when we tried to decode 1080p video streams at 30 fps and 4 Mbps or higher. The video stutters whenever the decoder processes an I-frame. For example, in our video of moving cars we see the cars pause and resume every time an I-frame needs to be rendered. The I-frame size is between 170K and 200K bytes. We found that the pauses might come from the GetFrame() call: it takes the function about 100 milliseconds to decode an I-frame.

Is this a known problem? Is there a newer version of IPP that fixes it? Any suggestions and help would be greatly appreciated.

Thanks

↧

CPU feature recognition not always working

July 1, 2015, 8:46 am

Hey there,

I am facing a problem with ippInit()'s automatic recognition of the available and enabled features. This only seems to happen on 32-bit WinXP SP3 on a notebook with an Intel i3-2348M. Sometimes our software crashed with an illegal instruction error, and we were able to identify an AVX instruction that was being executed. Since WinXP cannot handle AVX at all, AVX should not be enabled. In about 90-95% of the cases this holds, and the p8 arch is selected, which is totally fine for WinXP and the given CPU. In the remaining 5-10%, however, IPP selects the g9 arch, which is the AVX-capable one.

Now I am not sure how the initialization of the enabled instruction sets works. I suspect that AVX needs to be enabled by the OS kernel, but I am not quite sure about that.

My suspicion is based on this little code example I used for my tests:

#include "stdafx.h"

#include "immintrin.h"
#include <iostream>

#include "ippi.h"
#include "ippcore.h"
#include <Windows.h>

int _tmain(int argc, _TCHAR* argv[])
{
	__m256* a;
	__m256 b;

	int i = 0;

	Ipp64u features;
	ippGetCpuFeatures(&features, 0);
	IppStatus status = ippInit();

	std::cout << ippGetLibVersion()->Version << ""<< ippiGetLibVersion()->targetCpu << "; hasAVX: "<< (features & ippCPUID_AVX) << "; hasOSAVX: "<< (features & ippAVX_ENABLEDBYOS) << std::endl;

	while (i<100)
	{
		a = new __m256;
		std::cout << "*a = "<< ((float*)a)[0] << ""<< ((float*)a)[1] << ""<< ((float*)a)[2] << ""<< ((float*)a)[3] << std::endl;
		std::cout << "a = "<< a << std::endl;

		b = _mm256_loadu_ps((float*)a);
		std::cout << "b = "<< ((float*)&b)[0] << ((float*)&b)[1] << ((float*)&b)[2] << ((float*)&b)[3] << std::endl;
		++i;
	}

	return 0;
}

This usually gives us the p8 arch on WinXP, but sometimes we get the mentioned g9. This behavior can sometimes be seen more often after a reboot. The latter part of the code (the while loop) is meant to exercise an AVX instruction. This works on some Win8.1 machines but crashes on WinXP, though it sometimes does get through one iteration. I know that this could result from (un)lucky timing with the Windows scheduler. To test which arch is selected, I commented out the loop and executed the program 100 times using a batch for loop.

Still, my colleagues and I have no clue why IPP is selecting the wrong arch. Right now we catch that case and initialize with the p8 arch manually. Does anyone have a clue? Thanks in advance.
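
For comparison, the commonly documented rule for deciding whether AVX may be executed combines three things: the CPU reports AVX (CPUID leaf 1, ECX bit 28), the OS has enabled XSAVE (ECX bit 27, OSXSAVE), and XCR0 shows that the OS saves both XMM and YMM state (bits 1 and 2). The sketch below encodes only that decision on raw register values passed in by the caller; it does not read CPUID or XCR0 itself, it makes no claim about how ippInit() decides, and avxUsable is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Decide whether AVX may be executed, given raw register values.
//   cpuid1Ecx : ECX as returned by CPUID leaf 1
//   xcr0      : XCR0 as read with XGETBV; only meaningful when OSXSAVE is
//               set, so callers pass 0 when it is not
bool avxUsable(std::uint32_t cpuid1Ecx, std::uint64_t xcr0)
{
    const std::uint32_t kOsxsave = 1u << 27;  // OS has enabled XSAVE/XGETBV
    const std::uint32_t kAvx = 1u << 28;      // CPU implements AVX
    const std::uint64_t kXmmYmmState = 0x6;   // XCR0 bits 1 (SSE) and 2 (AVX)

    assert((cpuid1Ecx & kOsxsave) != 0 || xcr0 == 0);

    if ((cpuid1Ecx & kAvx) == 0) return false;
    if ((cpuid1Ecx & kOsxsave) == 0) return false;
    return (xcr0 & kXmmYmmState) == kXmmYmmState;
}
```

On an OS without AVX state support I would expect the OSXSAVE bit to be clear, so this check returns false even though the CPU itself reports AVX. Logging these raw values next to the arch IPP picked might help narrow down the intermittent g9 selection.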

↧
