I still get occasional GPU hangs, but they seem completely random: they might be bugs in my code, or they might be driver bugs.
It scales fairly well. I tried a 4288x2848 picture and it was still interactively quick (Swing stuffed up the display because the image isn't in a scroll pane, so I couldn't see how long it was taking), and much of that time was spent moving data to and from Java/Swing/X. For a 1546x1178 image the full wavelet processing took about 8ms (kernel times only). It's greyscale only. That is 1-2 orders of magnitude faster than a contemporary high-end CPU, so it is well worth it, and it makes real-time 1080p processing possible.
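As a rough check on the 1080p claim, here is the arithmetic, assuming (not measured) that kernel time scales linearly with pixel count and ignoring the Java/Swing/X transfer overhead:

```latex
\frac{1920 \times 1080}{1546 \times 1178} = \frac{2\,073\,600}{1\,821\,188} \approx 1.14,
\qquad
8\ \text{ms} \times 1.14 \approx 9.1\ \text{ms}
< \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}
< \frac{1000\ \text{ms}}{30} \approx 33.3\ \text{ms}
```

So under that assumption a greyscale 1080p frame would fit inside a 60 fps budget on kernel time alone.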
For an interactive application, however, you only need to perform the forward transform once and then apply the thresholding and inverse transform on each change, so in that case it could be quite a bit faster.
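A minimal CPU sketch of that caching idea follows. The GPU kernels aren't shown here, so everything in it is an assumption for illustration: it uses a 1D orthonormal Haar wavelet and soft thresholding of the detail coefficients, and the names `haar_forward`, `haar_inverse`, `soft_threshold` and `InteractiveDenoiser` are hypothetical. The point is only the structure: the forward transform runs once in the constructor, and each slider change re-runs just the threshold and inverse.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal, levels):
    """Multi-level orthonormal 1D Haar transform (assumed wavelet, for illustration)."""
    if len(signal) % (1 << levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    coeffs = [float(x) for x in signal]
    n = len(coeffs)
    for _ in range(levels):
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:n] = approx + detail  # approximation first, details after
        n = half
    return coeffs


def haar_inverse(coeffs, levels):
    """Undo haar_forward, coarsest level first."""
    out = list(coeffs)
    n = len(out) >> levels
    for _ in range(levels):
        approx, detail = out[:n], out[n:2 * n]
        merged = []
        for a, d in zip(approx, detail):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        out[:2 * n] = merged
        n *= 2
    return out


def soft_threshold(coeffs, levels, t):
    """Shrink detail coefficients toward zero by t; keep the approximation untouched."""
    n_approx = len(coeffs) >> levels
    out = list(coeffs[:n_approx])
    for c in coeffs[n_approx:]:
        out.append(math.copysign(max(abs(c) - t, 0.0), c))
    return out


class InteractiveDenoiser:
    """Hypothetical helper: forward transform cached once, cheap re-thresholding per change."""

    def __init__(self, signal, levels):
        self.levels = levels
        self.coeffs = haar_forward(signal, levels)  # expensive step, done only once

    def apply(self, threshold):
        # Per-change work: threshold plus inverse; the cached coefficients are not mutated.
        return haar_inverse(soft_threshold(self.coeffs, self.levels, threshold), self.levels)


if __name__ == "__main__":
    d = InteractiveDenoiser([1, 3, 5, 7, 2, 2, 4, 8], levels=3)
    for t in (0.0, 1.0, 5.0):
        print(t, [round(v, 3) for v in d.apply(t)])
```

With a threshold of 0 the input comes back unchanged, and with a very large threshold every detail coefficient is zeroed, so the output collapses to the signal mean. On the GPU the same split would presumably mean keeping the forward coefficients resident in device memory between changes.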