About

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as zed
  & handle of notzed

Thursday, 18 August 2011, 12:07

GEGL/OpenCL

So apparently a lad's been working on getting some OpenCL code into GEGL. What surprises me is just how slow the result is - and how slow GEGL is at doing the super-simple operation of brightness/contrast even with a CPU.

Of course, I'm not sure exactly what is being timed here, so perhaps it's timing a lot more than just the mathematics. Well, obviously it has to be: my ageing Pentium-M laptop can do a 1024x1024 RGBA/FLOAT brightness/contrast in about 70ms with simple single-threaded Java code. So 500ms for the same operation using 'optimised sse2' must include a hell of a lot of extra stuff beyond the maths. Curiously, the screenshot of the profiler shows 840 'tiles' have been processed; if they are 128x64 as suggested then that is over 6MP (840 x 128 x 64 = 6,881,280 pixels), not 1MP as stated in the post. In that case 500ms isn't so bad (it isn't great either, but at least it's in the same order of magnitude).
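For reference, the kind of simple single-threaded Java loop meant here looks something like the sketch below. It is not the code that was actually timed, and the formula (scale around mid-grey by the contrast, then add the brightness, alpha left alone) is an assumption about what the operation computes; the class and method names are made up for illustration.

```java
import java.util.Arrays;

public class BrightnessContrast {
    // Assumed formula: out = (in - 0.5) * contrast + 0.5 + brightness,
    // applied to R, G and B of interleaved RGBA floats. Alpha is not touched.
    static void apply(float[] rgba, float brightness, float contrast) {
        for (int i = 0; i + 3 < rgba.length; i += 4) {
            rgba[i]     = (rgba[i]     - 0.5f) * contrast + 0.5f + brightness;
            rgba[i + 1] = (rgba[i + 1] - 0.5f) * contrast + 0.5f + brightness;
            rgba[i + 2] = (rgba[i + 2] - 0.5f) * contrast + 0.5f + brightness;
        }
    }

    public static void main(String[] args) {
        int width = 1024, height = 1024;
        float[] image = new float[width * height * 4];
        Arrays.fill(image, 0.25f);

        // Run a few times so the JIT has compiled the loop before we time it.
        for (int i = 0; i < 5; i++) {
            apply(image, 0.0f, 1.0f);
        }

        long start = System.nanoTime();
        apply(image, 0.1f, 1.5f);
        long end = System.nanoTime();

        System.out.printf("%d pixels in %.2f ms%n", width * height, (end - start) / 1e6);
    }
}
```

The time it prints depends entirely on the machine and JVM, so treat it as a sanity check on the order of magnitude and nothing more.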

I tried posting this to the forum linked to this Phoronix post but for whatever reason it refused to take the post, so I'll post it here instead.


This result is really slow. Like about 100x off if I have the relative performance of that gpu correct. Even the CPU timings look suspect - is GEGL really that slow?

A list of potential bottlenecks:

A list of things which can't be bottlenecks:

Suggestions

In the NVIDIA profiler, look at the 'gpu time width plot' to see when the gpu is actually doing work. You'll probably see the individual jobs (and memory transfers) take almost no time and it's mostly sitting idle waiting for work from the cpu. That idle time is going to be 99% of the elapsed time, and it is where you will find all the gains at this point.

Don't even bother looking at the graph you posted - memory transfer time will have to be greater than the processing time since the processing is so simple and the gpu memory bandwidth is so much higher than pci speed. All you're doing is confirming that fact. The memory transfer time can mostly be hidden using asynchronous programming techniques anyway, so it is basically irrelevant.
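To show the shape of the asynchronous idea without any GPU code, here is a rough sketch in plain Java. Two single-threaded executors stand in for a copy queue and a compute queue, so the copy of the next tile can run while the current tile is being processed. Everything in it is hypothetical: the names, the tile size, and the use of threads in place of real OpenCL command queues and events. It is not how GEGL does it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OverlapSketch {
    // Stand-in for a host-to-device copy of one tile.
    static float[] upload(float[] host, int offset, int length) {
        float[] device = new float[length];
        System.arraycopy(host, offset, device, 0, length);
        return device;
    }

    // Stand-in for the kernel. For brevity it treats every float alike.
    static float[] kernel(float[] tile, float brightness, float contrast) {
        for (int i = 0; i < tile.length; i++) {
            tile[i] = (tile[i] - 0.5f) * contrast + 0.5f + brightness;
        }
        return tile;
    }

    static float[] process(float[] image, int tileLength, float brightness, float contrast) {
        ExecutorService copyQueue = Executors.newSingleThreadExecutor();
        ExecutorService computeQueue = Executors.newSingleThreadExecutor();
        try {
            float[] out = new float[image.length];
            int tiles = (image.length + tileLength - 1) / tileLength;
            CompletableFuture<?>[] done = new CompletableFuture<?>[tiles];

            // Queue every tile up front. Each stage only waits on the stage
            // before it for the same tile, so copies and compute can overlap.
            for (int t = 0; t < tiles; t++) {
                final int offset = t * tileLength;
                final int length = Math.min(tileLength, image.length - offset);
                done[t] = CompletableFuture
                        .supplyAsync(() -> upload(image, offset, length), copyQueue)
                        .thenApplyAsync(tile -> kernel(tile, brightness, contrast), computeQueue)
                        // The "download" writes a disjoint range of the output.
                        .thenAcceptAsync(tile -> System.arraycopy(tile, 0, out, offset, length), copyQueue);
            }

            CompletableFuture.allOf(done).join(); // wait once, at the very end
            return out;
        } finally {
            copyQueue.shutdown();
            computeQueue.shutdown();
        }
    }
}
```

The point is that the caller queues all the work and blocks only once at the end, instead of waiting after each copy and each kernel. That per-step waiting is the sort of thing that would leave the gpu idle.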

Tagged hacking, opencl.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!