Over the last couple of days I've been playing around with adding extra work queues to the OpenCL application I've been working on for work. Up until this point I was using a single queue referenced off a central 'context' object, which has worked ok, but I had to add some longer-running background processing steps which don't fit into the rest of the application.
I noticed one weird thing - I have two separate processes, one of which (say) takes 5 seconds to run, the other 6. If I run process 1 and 2 at the same time (on separate queues), process 1 takes about 6 seconds to run but process 2 blows out to 25 seconds. If I run two lots of process 1 then they both take about 10 seconds to execute.
I suspect it is because, although each takes about the same amount of time on its own, process 1 is made of fewer steps. If the scheduler is just alternating through the jobs in function-call sequence, then a lot more of process 1 gets run while process 2 is also active. Although it doesn't matter at this point, it could be a significant problem in a 'real' application.
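A toy model of that guess can be sketched in plain Java. It assumes a single device that strictly alternates between queues, running one call from each in turn; that is only my assumption about the scheduler, not anything documented. The call counts and per-call times are made-up numbers chosen to add up to 5 and 6 seconds, and `RoundRobinSketch`, `simulate` and `uniform` are hypothetical names.

```java
import java.util.Arrays;

// Toy model only: one device, strict alternation between queues,
// one function call at a time. Not a description of any real driver.
class RoundRobinSketch {

    // jobs[q] holds the duration in ms of each call in queue q.
    // Returns the time in ms at which each queue's last call completes.
    static long[] simulate(long[][] jobs) {
        int n = jobs.length;
        int[] next = new int[n];       // index of the next call to run per queue
        long[] finish = new long[n];   // completion time per queue
        long clock = 0;
        int remaining = 0;
        for (long[] job : jobs) {
            if (job.length > 0) remaining++;
        }
        while (remaining > 0) {
            for (int q = 0; q < n; q++) {
                if (next[q] >= jobs[q].length) continue;  // queue already drained
                clock += jobs[q][next[q]++];              // run one call from this queue
                if (next[q] == jobs[q].length) {
                    finish[q] = clock;
                    remaining--;
                }
            }
        }
        return finish;
    }

    // A job made of `calls` calls that each take `msPerCall` ms.
    static long[] uniform(int calls, long msPerCall) {
        long[] job = new long[calls];
        Arrays.fill(job, msPerCall);
        return job;
    }

    public static void main(String[] args) {
        long[] p1 = uniform(10, 500);  // 5 s in 10 calls (assumed split)
        long[] p2 = uniform(60, 100);  // 6 s in 60 calls (assumed split)
        System.out.println(Arrays.toString(simulate(new long[][] {p1, p2})));
        System.out.println(Arrays.toString(simulate(new long[][] {p1, p1})));
    }
}
```

With these numbers process 1 finishes at about 5.9 seconds and process 2 at 11 seconds, and two copies of process 1 finish at 9.5 and 10 seconds. So the model reproduces the asymmetry and the two-lots-of-process-1 case, but plain alternation can never exceed the 11 seconds of total work, so it does not explain the 25 seconds on its own.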
They are both run on a separate thread and call queue.finish() a lot as well - mostly so the code can detect user cancellation without queuing up too much work, but also because it seemed to run into resource problems if I queued up thousands of function calls at once (the overall processing time is important, but the interaction is more important at this point). So that might also be affecting the scheduling. This is on NVIDIA.
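The batching pattern described here looks roughly like the sketch below. To keep it runnable without a GPU it uses a hypothetical `WorkQueue` interface as a stand-in for the real command queue; it is not the JOCL API, and `BatchedRunner`, `run` and the batch size of 16 are illustrative assumptions too.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an OpenCL command queue; NOT the JOCL API.
interface WorkQueue {
    void enqueue(int step);  // queue one function call without waiting for it
    void finish();           // block until everything queued so far has run
}

class BatchedRunner {

    // Queues work in small batches, waiting on finish() after each batch so
    // that a cancel request is noticed before much more work piles up.
    // Returns the number of steps that were actually queued.
    static int run(WorkQueue queue, int totalSteps, int batchSize, AtomicBoolean cancelled) {
        int queued = 0;
        while (queued < totalSteps && !cancelled.get()) {
            int end = Math.min(queued + batchSize, totalSteps);
            for (; queued < end; queued++) {
                queue.enqueue(queued);
            }
            queue.finish();  // bounds the outstanding work to one batch
        }
        return queued;
    }

    public static void main(String[] args) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        int[] finishes = {0};
        // Fake queue that simulates the user cancelling during the third batch.
        WorkQueue fake = new WorkQueue() {
            public void enqueue(int step) { }
            public void finish() {
                if (++finishes[0] == 3) cancelled.set(true);
            }
        };
        int done = run(fake, 1000, 16, cancelled);
        System.out.println(done + " steps queued, " + finishes[0] + " finish() calls");
    }
}
```

The fake queue cancels after three batches, so only 48 of the 1000 steps get queued. The trade-off is the one mentioned above: smaller batches respond to cancellation sooner, but every finish() is a point where the thread stalls, which may well interact with how the two queues get scheduled.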
I also found a bug in JOCL and filed another bug on the jogamp bugzilla - the 3rd JOCL bug there, and the 3rd I've filed myself. Hmmm.