So I came up with a couple more ideas to try ... actually they all pretty much amounted to nothing but here is a small update anyway. See the notes on the home page.
Some things I tried and discarded: incrementally chaining the hash table (avoiding the realloc+copy); inlining the hash function; calculating all hashes of source and target at the start; doing the same in parallel. Most of them amounted to no gain although the parallel hashing helped quite a bit in some cases but not often enough to be worth the memory required.
Thus this is mostly a post about the expanded homepage where i added a bit detail and more benchmarks.
I was a bit surprised with the memory use compared to the "competition" because dez amounts to a full exhaustive search, albeit index assisted, whereas the others do not.