
GPU?

Posted: Tue Jan 31, 2012 11:10 pm
by Yamanipanuchi
I know there's a lot of programming involved with an idea like this... But..

I wonder how hard it would be to offload some of the video comparison to a GPU. Although GPUs can get expensive, in some large-scale systems this would be a fraction of the cost of upgrading a server.

Just an idea! :D

Re: GPU?

Posted: Sun Jun 24, 2012 9:53 pm
by sag

Re: GPU?

Posted: Thu Jul 05, 2012 3:41 pm
by frankster
I was just thinking today about VDPAU. I have two colour HD cameras, and my Core 2 Duo can't keep up with decoding and motion detection at full speed (sure, I could turn the fps down, but what if I wanted to add more cameras?). So I was wondering whether VDPAU can be used to speed up the H.264 decompression, and the answer seems to be yes, since you can get the decoded data back out of the VDPAU API. Then I looked some more and saw that ffmpeg already has some support for VDPAU, so this might be something for me to look at.

But then I was thinking that maybe it's actually the motion detection that is the bottleneck, not the H.264 decoding. In that case VDPAU won't be much use!

Re: GPU?

Posted: Mon Aug 13, 2012 1:45 am
by mastertheknife
I was doing a little experiment to see how GPU offloading can speed up the motion detection process, by running a small portion of the motion detection code found in ZM. The results show that there is definitely a speedup.

Code:

    kfir@GentooB0X ~/Programming/sse2/Delta $ ./a.out
    Got platform 0: NVIDIA CUDA by NVIDIA Corporation: OpenCL 1.1 CUDA 4.2.1 FULL_PROFILE
    Got platform 1: Intel(R) OpenCL by Intel(R) Corporation: OpenCL 1.1 LINUX FULL_PROFILE
    Using the first platform
    Platform 0 got device 0: GeForce 8800 GT by NVIDIA Corporation of version OpenCL 1.0 CUDA driver version 295.59 profile FULL_PROFILE local memory size 16384
    Detected a x86\x86-64 processor with SSSE3
    Standard delta:
    8192000 delta pixels generated in 31060692 nanoseconds, 263 million pixels/s
    8192000 delta pixels generated in 31138061 nanoseconds, 263 million pixels/s
    8192000 delta pixels generated in 31115701 nanoseconds, 263 million pixels/s
    8192000 delta pixels generated in 30994484 nanoseconds, 264 million pixels/s
    8192000 delta pixels generated in 31015300 nanoseconds, 264 million pixels/s
    8192000 delta pixels generated in 31138496 nanoseconds, 263 million pixels/s
    8192000 delta pixels generated in 30993518 nanoseconds, 264 million pixels/s
    8192000 delta pixels generated in 31117034 nanoseconds, 263 million pixels/s
    Average: 263 million pixels/s
    SSE2 delta:
    8192000 delta pixels generated in 15322235 nanoseconds, 534 million pixels/s
    8192000 delta pixels generated in 15178973 nanoseconds, 539 million pixels/s
    8192000 delta pixels generated in 15296857 nanoseconds, 535 million pixels/s
    8192000 delta pixels generated in 15271068 nanoseconds, 536 million pixels/s
    8192000 delta pixels generated in 15195582 nanoseconds, 539 million pixels/s
    8192000 delta pixels generated in 15192612 nanoseconds, 539 million pixels/s
    8192000 delta pixels generated in 15312284 nanoseconds, 534 million pixels/s
    8192000 delta pixels generated in 15324501 nanoseconds, 534 million pixels/s
    Average: 536 million pixels/s
    SSSE3 delta:
    8192000 delta pixels generated in 14107486 nanoseconds, 580 million pixels/s
    8192000 delta pixels generated in 13960627 nanoseconds, 586 million pixels/s
    8192000 delta pixels generated in 14000164 nanoseconds, 585 million pixels/s
    8192000 delta pixels generated in 13940673 nanoseconds, 587 million pixels/s
    8192000 delta pixels generated in 14010928 nanoseconds, 584 million pixels/s
    8192000 delta pixels generated in 13974089 nanoseconds, 586 million pixels/s
    8192000 delta pixels generated in 14106036 nanoseconds, 580 million pixels/s
    8192000 delta pixels generated in 13996167 nanoseconds, 585 million pixels/s
    Average: 584 million pixels/s
    OpenCL delta:
    8192000 delta pixels generated in 432417251 nanoseconds, 18 million pixels/s | OpenCL kernel execution time: 5563616 nanoseconds
    8192000 delta pixels generated in 5816905 nanoseconds, 1408 million pixels/s | OpenCL kernel execution time: 5550560 nanoseconds
    8192000 delta pixels generated in 5820545 nanoseconds, 1407 million pixels/s | OpenCL kernel execution time: 5545280 nanoseconds
    8192000 delta pixels generated in 5811061 nanoseconds, 1409 million pixels/s | OpenCL kernel execution time: 5538592 nanoseconds
    8192000 delta pixels generated in 5815694 nanoseconds, 1408 million pixels/s | OpenCL kernel execution time: 5542624 nanoseconds
    8192000 delta pixels generated in 5821011 nanoseconds, 1407 million pixels/s | OpenCL kernel execution time: 5545184 nanoseconds
    8192000 delta pixels generated in 5821873 nanoseconds, 1407 million pixels/s | OpenCL kernel execution time: 5543744 nanoseconds
    8192000 delta pixels generated in 5822396 nanoseconds, 1406 million pixels/s | OpenCL kernel execution time: 5551040 nanoseconds
    Average: 1233 million pixels/s
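
For readers who want to see what kind of operation is being timed above, here is a minimal scalar sketch in C. It is not the code from the experiment: it assumes that "delta" means the per-pixel absolute difference between the current frame and a reference frame, on 8-bit greyscale data. The names `delta8_gray` and `mpixels_per_sec` are made up for illustration. The second function reproduces the "million pixels/s" figures in the log, which appear to be truncated rather than rounded.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar greyscale delta: out[i] = |cur[i] - ref[i]| for every pixel.
 * Assumption (not from the thread): 8-bit single-channel images of
 * equal size. The function name is hypothetical. */
void delta8_gray(const uint8_t *cur, const uint8_t *ref,
                 uint8_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int d = (int)cur[i] - (int)ref[i];
        out[i] = (uint8_t)(d < 0 ? -d : d);
    }
}

/* Throughput in million pixels per second, truncated to an integer.
 * pixels / (ns * 1e-9) / 1e6 simplifies to pixels * 1000 / ns. */
uint64_t mpixels_per_sec(uint64_t pixels, uint64_t ns)
{
    return ns ? pixels * 1000u / ns : 0;
}
```

With the numbers from the log, `mpixels_per_sec(8192000, 31060692)` gives 263 and `mpixels_per_sec(8192000, 5816905)` gives 1408. The first OpenCL run (432417251 ns, 18 million pixels/s) is far slower than the rest even though its kernel execution time is the same, which suggests one-time setup cost outside the kernel, although the log does not say what that cost is.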

Re: GPU?

Posted: Tue Aug 14, 2012 2:15 am
by mastertheknife
Another test, this time for blend performance (blending accounts for 30% of zma's CPU usage):

Code:

    kfir@GentooB0X ~/Programming/sse2/Blend $ ./a.out
    Got platform 0: NVIDIA CUDA by NVIDIA Corporation: OpenCL 1.1 CUDA 4.2.1 FULL_PROFILE
    Got platform 1: Intel(R) OpenCL by Intel(R) Corporation: OpenCL 1.1 LINUX FULL_PROFILE
    Using the first platform
    Platform 0 got device 0: GeForce 8800 GT by NVIDIA Corporation of version OpenCL 1.0 CUDA driver version 295.59 profile FULL_PROFILE local memory size 16384
    Detected a x86\x86-64 processor with SSSE3
    Standard FastBlend:
    32768000 colours blended in 32504734 nanoseconds, 1008 million colours/s
    32768000 colours blended in 32513275 nanoseconds, 1007 million colours/s
    32768000 colours blended in 32280170 nanoseconds, 1015 million colours/s
    32768000 colours blended in 32286214 nanoseconds, 1014 million colours/s
    32768000 colours blended in 32443637 nanoseconds, 1009 million colours/s
    32768000 colours blended in 32435966 nanoseconds, 1010 million colours/s
    32768000 colours blended in 32289727 nanoseconds, 1014 million colours/s
    32768000 colours blended in 32419658 nanoseconds, 1010 million colours/s
    Average: 1010 million colours/s
    SSE2 FastBlend:
    32768000 colours blended in 16667805 nanoseconds, 1965 million colours/s
    32768000 colours blended in 16768181 nanoseconds, 1954 million colours/s
    32768000 colours blended in 16658275 nanoseconds, 1967 million colours/s
    32768000 colours blended in 16904805 nanoseconds, 1938 million colours/s
    32768000 colours blended in 16542495 nanoseconds, 1980 million colours/s
    32768000 colours blended in 16701763 nanoseconds, 1961 million colours/s
    32768000 colours blended in 16767732 nanoseconds, 1954 million colours/s
    32768000 colours blended in 16720840 nanoseconds, 1959 million colours/s
    Average: 1959 million colours/s
    OpenCL FastBlend:
    32768000 colours blended in 572275226 nanoseconds, 57 million colours/s | OpenCL kernel execution time: 2693824 nanoseconds
    32768000 colours blended in 2955255 nanoseconds, 11088 million colours/s | OpenCL kernel execution time: 2681312 nanoseconds
    32768000 colours blended in 2948971 nanoseconds, 11111 million colours/s | OpenCL kernel execution time: 2679072 nanoseconds
    32768000 colours blended in 2964044 nanoseconds, 11055 million colours/s | OpenCL kernel execution time: 2680512 nanoseconds
    32768000 colours blended in 2957312 nanoseconds, 11080 million colours/s | OpenCL kernel execution time: 2681280 nanoseconds
    32768000 colours blended in 2960775 nanoseconds, 11067 million colours/s | OpenCL kernel execution time: 2678880 nanoseconds
    32768000 colours blended in 2958346 nanoseconds, 11076 million colours/s | OpenCL kernel execution time: 2679968 nanoseconds
    32768000 colours blended in 2971600 nanoseconds, 11027 million colours/s | OpenCL kernel execution time: 2680576 nanoseconds
    Average: 9695 million colours/s
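
As a rough illustration of the operation being timed, here is a scalar blend sketch in C. It assumes a shift-style blend in which each reference value moves toward the new frame by 1/2^shift of the difference. That is an assumption about what "FastBlend" does, not something the log confirms, and `blend8` and `mean_rate` are hypothetical names. The second function shows how much the slow first OpenCL run pulls down the reported averages.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer blend: move each reference value toward the new frame by
 * 1/2^shift of the difference. Assumption (not from the thread):
 * 8-bit channel values and a power-of-two blend factor. Division is
 * used instead of >> so negative differences stay well defined. */
void blend8(const uint8_t *img, uint8_t *ref, size_t count, unsigned shift)
{
    const int divisor = 1 << shift;
    for (size_t i = 0; i < count; i++) {
        int diff = (int)img[i] - (int)ref[i];
        ref[i] = (uint8_t)((int)ref[i] + diff / divisor);
    }
}

/* Truncated mean of n rates, ignoring the first `skip` entries
 * (for example, a warm-up run). Returns 0 if nothing is left. */
uint64_t mean_rate(const uint64_t *rates, size_t n, size_t skip)
{
    if (skip >= n)
        return 0;
    uint64_t sum = 0;
    for (size_t i = skip; i < n; i++)
        sum += rates[i];
    return sum / (n - skip);
}
```

Feeding the eight OpenCL FastBlend rates from the log into `mean_rate` with `skip = 0` gives 9695, matching the printed average. With `skip = 1` it gives 11072, which is closer to the steady-state figure. The same calculation on the OpenCL delta rates gives 1233 with the first run and 1407 without it.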