I wonder how hard it would be to offload some of the video comparison to a GPU. Although GPUs can get expensive, in some large-scale systems adding one would cost a fraction of what a server upgrade does.
Just an idea!

kfir@GentooB0X ~/Programming/sse2/Delta $ ./a.out
Got platform 0: NVIDIA CUDA by NVIDIA Corporation: OpenCL 1.1 CUDA 4.2.1 FULL_PROFILE
Got platform 1: Intel(R) OpenCL by Intel(R) Corporation: OpenCL 1.1 LINUX FULL_PROFILE
Using the first platform
Platform 0 got device 0: GeForce 8800 GT by NVIDIA Corporation of version OpenCL 1.0 CUDA driver version 295.59 profile FULL_PROFILE local memory size 16384
Detected a x86\x86-64 processor with SSSE3
Standard delta:
8192000 delta pixels generated in 31060692 nanoseconds, 263 million pixels/s
8192000 delta pixels generated in 31138061 nanoseconds, 263 million pixels/s
8192000 delta pixels generated in 31115701 nanoseconds, 263 million pixels/s
8192000 delta pixels generated in 30994484 nanoseconds, 264 million pixels/s
8192000 delta pixels generated in 31015300 nanoseconds, 264 million pixels/s
8192000 delta pixels generated in 31138496 nanoseconds, 263 million pixels/s
8192000 delta pixels generated in 30993518 nanoseconds, 264 million pixels/s
8192000 delta pixels generated in 31117034 nanoseconds, 263 million pixels/s
Average: 263 million pixels/s
SSE2 delta:
8192000 delta pixels generated in 15322235 nanoseconds, 534 million pixels/s
8192000 delta pixels generated in 15178973 nanoseconds, 539 million pixels/s
8192000 delta pixels generated in 15296857 nanoseconds, 535 million pixels/s
8192000 delta pixels generated in 15271068 nanoseconds, 536 million pixels/s
8192000 delta pixels generated in 15195582 nanoseconds, 539 million pixels/s
8192000 delta pixels generated in 15192612 nanoseconds, 539 million pixels/s
8192000 delta pixels generated in 15312284 nanoseconds, 534 million pixels/s
8192000 delta pixels generated in 15324501 nanoseconds, 534 million pixels/s
Average: 536 million pixels/s
SSSE3 delta:
8192000 delta pixels generated in 14107486 nanoseconds, 580 million pixels/s
8192000 delta pixels generated in 13960627 nanoseconds, 586 million pixels/s
8192000 delta pixels generated in 14000164 nanoseconds, 585 million pixels/s
8192000 delta pixels generated in 13940673 nanoseconds, 587 million pixels/s
8192000 delta pixels generated in 14010928 nanoseconds, 584 million pixels/s
8192000 delta pixels generated in 13974089 nanoseconds, 586 million pixels/s
8192000 delta pixels generated in 14106036 nanoseconds, 580 million pixels/s
8192000 delta pixels generated in 13996167 nanoseconds, 585 million pixels/s
Average: 584 million pixels/s
OpenCL delta:
8192000 delta pixels generated in 432417251 nanoseconds, 18 million pixels/s | OpenCL kernel execution time: 5563616 nanoseconds
8192000 delta pixels generated in 5816905 nanoseconds, 1408 million pixels/s | OpenCL kernel execution time: 5550560 nanoseconds
8192000 delta pixels generated in 5820545 nanoseconds, 1407 million pixels/s | OpenCL kernel execution time: 5545280 nanoseconds
8192000 delta pixels generated in 5811061 nanoseconds, 1409 million pixels/s | OpenCL kernel execution time: 5538592 nanoseconds
8192000 delta pixels generated in 5815694 nanoseconds, 1408 million pixels/s | OpenCL kernel execution time: 5542624 nanoseconds
8192000 delta pixels generated in 5821011 nanoseconds, 1407 million pixels/s | OpenCL kernel execution time: 5545184 nanoseconds
8192000 delta pixels generated in 5821873 nanoseconds, 1407 million pixels/s | OpenCL kernel execution time: 5543744 nanoseconds
8192000 delta pixels generated in 5822396 nanoseconds, 1406 million pixels/s | OpenCL kernel execution time: 5551040 nanoseconds
Average: 1233 million pixels/s
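
A note on reading the OpenCL average above: the first run takes about 432 ms, far longer than the rest, and it pulls the reported average down to 1233. That looks like one-time setup cost, though the output alone does not say what it is. Below is a rough Python sketch that recomputes the per-run figures from the logged nanosecond values and compares the average with and without that first run. The truncating integer division is an assumption chosen because it reproduces the logged numbers; it is not taken from the benchmark's source.

```python
# Timings (ns) copied from the "OpenCL delta" runs in the log above.
PIXELS = 8_192_000
opencl_ns = [432417251, 5816905, 5820545, 5811061,
             5815694, 5821011, 5821873, 5822396]

def mpix_per_s(pixels, ns):
    # pixels/ns * 1e9 / 1e6 == pixels * 1000 / ns.
    # Truncating division is assumed; it matches the logged values.
    return pixels * 1000 // ns

rates = [mpix_per_s(PIXELS, ns) for ns in opencl_ns]

avg_all = sum(rates) // len(rates)           # includes the slow first run
avg_warm = sum(rates[1:]) // len(rates[1:])  # skips the slow first run

print("per-run:", rates)
print("average, all runs:", avg_all, "million pixels/s")
print("average, first run skipped:", avg_warm, "million pixels/s")
```

With the first run left out, the figure comes to about 1407 million pixels/s, which is closer to what the remaining seven runs show.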
kfir@GentooB0X ~/Programming/sse2/Blend $ ./a.out
Got platform 0: NVIDIA CUDA by NVIDIA Corporation: OpenCL 1.1 CUDA 4.2.1 FULL_PROFILE
Got platform 1: Intel(R) OpenCL by Intel(R) Corporation: OpenCL 1.1 LINUX FULL_PROFILE
Using the first platform
Platform 0 got device 0: GeForce 8800 GT by NVIDIA Corporation of version OpenCL 1.0 CUDA driver version 295.59 profile FULL_PROFILE local memory size 16384
Detected a x86\x86-64 processor with SSSE3
Standard FastBlend:
32768000 colours blended in 32504734 nanoseconds, 1008 million colours/s
32768000 colours blended in 32513275 nanoseconds, 1007 million colours/s
32768000 colours blended in 32280170 nanoseconds, 1015 million colours/s
32768000 colours blended in 32286214 nanoseconds, 1014 million colours/s
32768000 colours blended in 32443637 nanoseconds, 1009 million colours/s
32768000 colours blended in 32435966 nanoseconds, 1010 million colours/s
32768000 colours blended in 32289727 nanoseconds, 1014 million colours/s
32768000 colours blended in 32419658 nanoseconds, 1010 million colours/s
Average: 1010 million colours/s
SSE2 FastBlend:
32768000 colours blended in 16667805 nanoseconds, 1965 million colours/s
32768000 colours blended in 16768181 nanoseconds, 1954 million colours/s
32768000 colours blended in 16658275 nanoseconds, 1967 million colours/s
32768000 colours blended in 16904805 nanoseconds, 1938 million colours/s
32768000 colours blended in 16542495 nanoseconds, 1980 million colours/s
32768000 colours blended in 16701763 nanoseconds, 1961 million colours/s
32768000 colours blended in 16767732 nanoseconds, 1954 million colours/s
32768000 colours blended in 16720840 nanoseconds, 1959 million colours/s
Average: 1959 million colours/s
OpenCL FastBlend:
32768000 colours blended in 572275226 nanoseconds, 57 million colours/s | OpenCL kernel execution time: 2693824 nanoseconds
32768000 colours blended in 2955255 nanoseconds, 11088 million colours/s | OpenCL kernel execution time: 2681312 nanoseconds
32768000 colours blended in 2948971 nanoseconds, 11111 million colours/s | OpenCL kernel execution time: 2679072 nanoseconds
32768000 colours blended in 2964044 nanoseconds, 11055 million colours/s | OpenCL kernel execution time: 2680512 nanoseconds
32768000 colours blended in 2957312 nanoseconds, 11080 million colours/s | OpenCL kernel execution time: 2681280 nanoseconds
32768000 colours blended in 2960775 nanoseconds, 11067 million colours/s | OpenCL kernel execution time: 2678880 nanoseconds
32768000 colours blended in 2958346 nanoseconds, 11076 million colours/s | OpenCL kernel execution time: 2679968 nanoseconds
32768000 colours blended in 2971600 nanoseconds, 11027 million colours/s | OpenCL kernel execution time: 2680576 nanoseconds
Average: 9695 million colours/s
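
The OpenCL lines report two numbers per run: the total time and the kernel execution time. A small sketch, again in Python and using only values copied from the FastBlend log above, shows how much of each run is spent outside the kernel. What that gap consists of (enqueue cost, synchronisation, or something else) is not stated in the output, so treat the label "overhead" as an assumption.

```python
# (total_ns, kernel_ns) pairs copied from the "OpenCL FastBlend" runs above.
runs = [
    (572275226, 2693824),  # first run, much slower than the rest
    (2955255, 2681312),
    (2948971, 2679072),
    (2964044, 2680512),
    (2957312, 2681280),
    (2960775, 2678880),
    (2958346, 2679968),
    (2971600, 2680576),
]

# Time spent outside the kernel for each run, in nanoseconds.
overheads = [total - kernel for total, kernel in runs]

# Look only at the runs after the first one.
warm = overheads[1:]
print("first run, outside kernel:", overheads[0], "ns")
print("later runs, outside kernel: min", min(warm), "ns, max", max(warm), "ns")

# Share of the total time that is not kernel time, for the later runs.
for (total, _), extra in zip(runs[1:], warm):
    print(f"{extra} ns of {total} ns ({100 * extra / total:.1f}%)")
```

For the later runs the gap stays between roughly 270 and 291 microseconds, so nearly all of the measured time is the kernel itself. Whether the timed region also covers copying the buffers to and from the card is not something the log shows.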