Tuesday, 09 October 2018, 10:07

Other JNI bits/ NativeZ, jjmpeg.

Yesterday I spent a good deal of time continuing to experiment and tune NativeZ. I also ported the latest version of jjmpeg to a modularised build and to use NativeZ objects.

Hashing C Pointers

C pointers obtained by malloc are aligned to 16-byte boundaries on 64-bit GNU systems. Thus the lower 4 bits are always zero. Standard malloc also allocates a contiguous virtual address range which is extended using sbrk(2) which means the upper bits rarely change. Thus it is sufficient to generate a hashcode which only takes into account the lower bits (excluding the first 4).

I did some experimenting with hashing the C pointer values using various algorithms, from Knuth's Magic Number to various integer hashing algorithms (e.g. hash-prospector), to Long.hashCode(), to a simple shift (both 64-bit and 32-bit). The performance analysis was based on Chi-squared distance between the hash chain lengths and the ideal, using pointers generated from malloc(N) for different fixed values of N for multiple runs.
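
The measurement itself was nothing fancy - bucket the hashed pointers and compare the chain lengths against a uniform spread. Something along these lines (a simplified sketch, not the actual test harness; it assumes a power-of-two bucket count):

import java.util.function.LongToIntFunction;

class HashQuality {
    /*
     * Chi-squared distance between the observed bucket loads and the ideal
     * uniform load for a given hash function. Smaller is closer to uniform.
     * buckets must be a power of two since the index is taken with a mask.
     */
    static double chiSquared(long[] pointers, LongToIntFunction hash, int buckets) {
        int[] count = new int[buckets];
        for (long p : pointers)
            count[hash.applyAsInt(p) & (buckets - 1)]++;
        double expected = (double) pointers.length / buckets;
        double chi = 0;
        for (int c : count) {
            double d = c - expected;
            chi += d * d / expected;
        }
        return chi;
    }
}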

Although it wasn't the best statistically, the best performing algorithm was a simple 32-bit, 4-bit shift due to its significantly lower cost. And typically it compared quite well statistically regardless.

static int hashCode(long p) {
    return (int)p >>> 4;
}

In the nonsensical event that 28 bits are not sufficient for the hash bucket index, it can be extended to 32 bits:

static int hashCode(long p) {
    return (int)(p >>> 4);
}

And despite all the JNI and reflection overheads, using the two-round function from the hash-prospector project increased raw execution time by approximately 30% over the trivial hashCode() above.

Whilst it might not be ideal for 8-byte aligned allocations it's probably not that bad in practice either. One thing I can say for certain though is NEVER use Long.hashCode() to hash C pointers!
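
The reason is simple enough: Long.hashCode(p) is just (int)(p ^ (p >>> 32)), so for aligned pointers the low bits of the hash come from the upper word of the pointer, which barely changes. A quick demonstration (the base address is made up):

public class AlignedHashDemo {
    public static void main(String[] args) {
        // Simulated 16-byte-aligned pointers from the same heap segment.
        long base = 0x00007f3a_24a0_0000L;
        for (int i = 0; i < 4; i++) {
            long p = base + i * 16;
            // The low 4 bits of Long.hashCode(p) are just bits 32-35 of the
            // pointer, so every pointer lands in the same bucket modulo 16.
            System.out.printf("%016x -> hash %08x bucket %d%n",
                p, Long.hashCode(p), Long.hashCode(p) & 15);
        }
    }
}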

Concurrency

I also tuned the use of synchronisation blocks very slightly to make critical sections as short as possible whilst maintaining correct behaviour. This made enough of a difference to be worth it.

I also tried more complex synchronisation mechanisms - read-write locks, hash bucket row-locks and so on, but at best they were a bit slower than plain synchronized{}.

The benchmark I was using wasn't particularly fantastic - just one thread creating 10^7 `garbage' objects in a tight loop whilst the cleaner thread freed them. No resolution of existing objects, no multiple threads, and so on. But apart from the allocation rate it isn't an entirely unrealistic scenario either, and I was just trying to identify raw overheads.

Reflection

I've only started looking at the reflection used for allocating and releasing objects on the Java side, and in isolation these are the highest costs of the implementation.
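
Concretely the allocation side boils down to the usual reflective-constructor pattern, roughly like the sketch below (class and method names are made up for illustration, not the actual NativeZ API):

import java.lang.reflect.Constructor;
import java.util.concurrent.ConcurrentHashMap;

class WrapperFactory {
    // The Constructor lookup is cached; the newInstance() call is the
    // per-object cost that shows up in the profile.
    private static final ConcurrentHashMap<Class<?>, Constructor<?>> CTORS
        = new ConcurrentHashMap<>();

    static <T> T create(Class<T> klass, long p) {
        try {
            Constructor<?> ctor = CTORS.computeIfAbsent(klass, k -> {
                try {
                    Constructor<?> c = k.getDeclaredConstructor(long.class);
                    c.setAccessible(true);
                    return c;
                } catch (NoSuchMethodException x) {
                    throw new IllegalArgumentException(x);
                }
            });
            return klass.cast(ctor.newInstance(p));
        } catch (ReflectiveOperationException x) {
            throw new RuntimeException(x);
        }
    }
}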

There are ways to reduce these costs but at the expense of extra boilerplate (for instantiation) or memory requirements (for release).

Still ongoing. And whilst the relative cost over C is very high, the absolute cost is still only a few hundred nanoseconds per object.

From a few small tests it looks like the maximum I could achieve is a 30% reduction in object instantiation/finalisation costs, but I don't think it's worth the effort or overheads.

Makefile foo

I'm still experimenting with this. I used some macros and implicit rules to get most things building ok, but I'm not sure it couldn't be better. The basic makefile is working ok for multi-module stuff so I think I'm getting there. Most of the work is just done by the jdk tools as they handle modules and so on quite well and mostly dictate the disk layout.

I've broken jjmpeg into 3 modules - the core, the javafx related classes and the awt related classes.

Tagged java, jjmpeg, nativez.
Sunday, 07 October 2018, 08:51

GC JNI, HashTables, Memory

I had a very busy week at work porting libraries and applications to Java modules - that wasn't really the busy part though; I also looked into making various implementations pluggable using services and then creating several pluggable implementations, often utilising native code. Just having some (much faster) implementations of parts also opened other opportunities and it sort of cascaded from there.

Anyway, along the way I revisited my implementation of Garbage Collection with JNI and started working on a modular version that can be shared between libraries without having to copy the core object code around, and then along the way found bugs and things to improve.

Here are some of the more interesting pieces I found along the way.

JNI call overheads

The way I'm writing JNI these days is typically to just write the method signature as if it were a Java method and mark it native, letting the JNI code handle the Java-to-C mappings directly. This is different to how I first started doing it and flies in the face of the convention I've typically seen amongst JNI implementations, where the Java side passes the pointers as a long and has a wrapper function which resolves these longs as appropriate.

The primary reason is to reduce boilerplate and significantly simplify the Java class writing without having a major impact on performance. I have done some performance testing before but I re-ran some tests and they confirm the design decisions used in zcl, for example.
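
For illustration, the two styles look something like this (toy declarations, not the real jjmpeg/zcl classes):

// Style used here: declare the native method on the object itself and let
// the C side resolve the pointer field.
class CObject {
    final long p;                       // C pointer, read by the JNI code
    CObject(long p) { this.p = p; }

    public native int setOption(String name, int value);
}

// The conventional style: a Java wrapper resolves the pointer and passes it
// as a long, which doubles up every method.
class CObjectConventional {
    final long p;
    CObjectConventional(long p) { this.p = p; }

    public int setOption(String name, int value) {
        return setOption(p, name, value);
    }
    private static native int setOption(long p, String name, int value);
}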

Array Access

First, I tested some mechanisms for accessing arrays. I passed two arrays to a native function and had it perform various tests:

  1. No op;
  2. GetPrimitiveArrayCritical on both arrays;
  3. GetArrayElements for read-only arrays (call Release(ABORT));
  4. GetArrayElements for read-only on one array and read-write on the other (call Release(ABORT, COMMIT));
  5. GetArrayRegion for read-only, to memory allocated using alloca;
  6. GetArrayRegion and SetArrayRegion for one array, to memory using alloca;
  7. GetArrayRegion for read-only, to memory allocated using malloc;
  8. GetArrayRegion and SetArrayRegion for one array, to memory using malloc.

I then ran these tests for different sized float[] arrays, for 1 000 000 iterations; the results in seconds are below, with the first column being the array length. It's some intel laptop.

       NOOP         Critical     Elements                  Region/alloca             Region/malloc
            0            1            2            3            4            5            6            7
    1  0.014585537  0.116005779  0.199563981  0.207630731  0.104293268  0.127865782  0.185149189  0.217530639
    2  0.013524620  0.118654092  0.201340322  0.209417471  0.104695330  0.129843794  0.193392346  0.216096210
    4  0.012828157  0.113974453  0.206195102  0.214937432  0.107255090  0.127068808  0.190165219  0.215024016
    8  0.013321001  0.116550424  0.209304277  0.205794572  0.102955338  0.130785133  0.192472825  0.217064583
   16  0.013228272  0.116148320  0.207285227  0.211022409  0.106344162  0.139751496  0.196179709  0.222189471
   32  0.012778452  0.119130446  0.229446026  0.239275912  0.111609011  0.140076428  0.213169077  0.252453033
   64  0.012838540  0.115225274  0.250278658  0.259230054  0.124799171  0.161163577  0.230502836  0.260111468
  128  0.014115022  0.120103332  0.264680542  0.282062633  0.139830967  0.182051151  0.250609001  0.297405818
  256  0.013412645  0.114502078  0.315914219  0.344503396  0.180337154  0.241485525  0.297850562  0.366212494
  512  0.012669807  0.117750316  0.383725378  0.468324904  0.261062826  0.358558946  0.366857041  0.466997977
 1024  0.013393850  0.120466096  0.550091063  0.707360155  0.413604094  0.576254053  0.518436072  0.711689270
 2048  0.013493996  0.118718871  0.990865614  1.292385065  0.830819392  1.147347700  0.973258653  1.284913436
 4096  0.012639675  0.116153318  1.808592969  2.558903773  1.628814486  2.400586604  1.778098089  2.514406096

Some points of note:

I didn't look at ByteBuffer because it doesn't really fit what I'm doing with these functions.

Anyway - the overheads are unavoidable with JNI but are quite modest. The function in question does nothing with the data and so any meaningful operation will quickly dominate the processing time.

Object Pointer resolution

The next test I did was to compare various mechanisms for transferring the native C pointer from Java to C.

I created a Native object with two long fields holding the native pointer, a final long p, and a (non-final) long q.

  1. No operation;
  2. C invokes getP() method which returns p;
  3. C invokes getQ() method which returns q;
  4. C access to .p field;
  5. C access to .q field;
  6. The native signature takes a pointer directly, call it resolving the .p field in the caller;
  7. The native signature takes a pointer directly, call it resolving the .p field via a wrapper function.

Again invoking it 1 000 000 times.

NOOP         getP()       getQ()       (C).p        (C).q        (J).p        J wrapper
     0            1            2            3            4            5            6
0.016606942  0.293797182  0.294253973  0.020146810  0.020154508  0.015827028  0.016979563

In short, just passing Java objects directly and having the C resolve the pointer via a field lookup is slightly slower but requires much less boilerplate and so is the preferred solution.

Logger

After I sorted out the basic JNI mechanisms I started looking at the reference tracking implementation (I'll call this NativeZ from here on).

For debugging, and to try to be a more re-usable library, I had added logging to various places in the C code using Logger.getLogger(tag).fine(String.format(...));

It turns out this was really not a wise idea and the logging calls alone were taking approximately 50% of the total execution time - versus java to C to java, hashtable lookups and synchronisation blocks.

Simply changing to use the Supplier versions of the logging functions approximately doubled the performance.

  Logger.getLogger(tag).fine(String.format(...));
->
  Logger.getLogger(tag).fine(() -> String.format(...));
    

But I also decided to make the inclusion of any of this code optional by bracketing each call with a test against a static final boolean compile-time constant.
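
That's just the usual compile-time-constant guard; when the constant is false javac drops the guarded statement entirely, Supplier allocation and all. Roughly (the flag name is mine):

import java.util.logging.Logger;

class NativeZLog {
    // Compile-time constant: when false the guarded code is omitted by javac.
    static final boolean DEBUG = false;

    static void release(long p) {
        if (DEBUG)
            Logger.getLogger("nativez").fine(() -> String.format("release %x", p));
        // ... actual release work ...
    }
}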

This checking indirectly confirmed that the reflection invocations aren't particularly onerous, assuming they're doing any work.

HashMap<Long,WeakReference>

Now the other major component of the NativeZ object tracking is using a hash-table to map C pointers to Java objects. This serves two important purposes:

  1. Allows the Java to resolve separate C pointers to the same object;
  2. Maintains a hard reference to the WeakReference, without which they just don't work.

For simplicity I just used a HashMap for this purpose. I knew it wasn't ideal but I did the work to quantify it.

Using jol and perusing the source I got some numbers for a jvm using compressed oops and an 8-byte object alignment.

Object            Size (bytes)  Notes
HashMap.Node      32            Used for short hash chains.
HashMap.TreeNode  56            Used for long hash chains.
Long              24            The node key.
CReference        48            The node value. Subclass of WeakReference.

Thus the incremental overhead for a single C object is either 104 bytes when a linear hash chain is used, or 128 bytes when a tree is used.

Actually it's a bit more than that because the hashtable (by default) uses a 75% load factor, so it also allocates about 1.5 pointers for each object, but that's neither here nor there and also a feature of the algorithm regardless of implementation.

But there are other, bigger problems: the Long.hashCode() method just mixes the low and high words together using xor. If all C pointers are 8 (or worse, 16) byte aligned, essentially only every 8th (or 16th) bucket is ever in use. So apart from the wasted buckets, the HashMap is very likely to end up using Trees to store each chain.

So I wrote another hashtable implementation which addresses this by using the primitive long stored in the CReference directly as the key, and using the CReference itself as the bucket nodes. I also used a much better hash function. This reduced the memory overhead to just the 48 bytes for the CReference plus a (tunable) overhead for the root table - anywhere from 1/4 to 1 entry per node works quite well with the improved hash function.
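
A cut-down sketch of the shape of it (not the actual NativeZ code, using just the alignment shift for the hash rather than the better mixing function, and with removal and resizing elided):

import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Chained hash table mapping C pointers to weak references, where the
// reference object itself is the chain node - no Node or Long boxes.
class CReferenceTable {
    static class CReference extends WeakReference<Object> {
        final long p;           // the C pointer, used directly as the key
        CReference next;        // chain link

        CReference(long p, Object referent, ReferenceQueue<Object> q) {
            super(referent, q);
            this.p = p;
        }
    }

    private final CReference[] table = new CReference[256];    // tunable root table

    private int index(long p) {
        return (int) (p >>> 4) & (table.length - 1);            // 16-byte aligned pointers
    }

    synchronized Object get(long p) {
        for (CReference r = table[index(p)]; r != null; r = r.next)
            if (r.p == p)
                return r.get();
        return null;
    }

    synchronized void put(CReference ref) {
        int i = index(ref.p);
        ref.next = table[i];
        table[i] = ref;
    }
}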

This uses less memory and runs a bit faster - mostly because the gc is run less often.

notzed.nativez

So I'm still working on wrapping this all up in a module, notzed.nativez, which will include the Java base class and a shared library for other JNI libraries to link to; the latter includes the (trivial) interface to the NativeZ object and some helpers for writing small and robust JNI libraries.

And then of course eventually port jjmpeg and zcl to use it.

Tagged java, nativez.
Saturday, 29 September 2018, 09:09

Bye Bye Jaxby

So one of the biggest changes affecting my projects with Java 11 is the removal of java.xml.bind from the openjdk. This is a bit of a pain because the main reason I used it was the convenience, which is a double pain because not only do I have to undo all that convenience, all the time spent using and learning it in the first place has just been confirmed as wasted.

I tried using the last release as modules but they are incompatible with the module system because one or two of the packages are split. I tried just making a module out of them but couldn't get it to work either. And either I'm really shit at google-foo or it's just shit, but I couldn't for the life of me find any other reasonable approach, so after wasting too much time on it I bit the bullet and just wrote some SAXParser and XMLStreamWriter code mandraulically.

Fortunately the xml trees I had made parsing quite simple. First, none of the element names overlapped, so even parsing embedded structures works without having to keep track of the element state. Secondly, almost all the simple fields were encoded as attributes rather than elements. So this means almost all objects can be parsed from the startElement callback, and a single stack is used to track encapsulated fields. Because I use arrays in a few places a couple of ancillary lists are used to build them (or I could just change them to Lists).
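
The shape of it is roughly as below - element and field names invented for illustration, not from the real format - with a Deque standing in for the element stack. It gets driven by SAXParserFactory.newInstance().newSAXParser().parse(file, handler) as usual.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Element names are globally unique and simple fields are attributes, so
// each object can be built in startElement and attached to its parent.
class LoaderHandler extends DefaultHandler {
    static class Scene { String name; List<Layer> layers = new ArrayList<>(); }
    static class Layer { String name; float opacity; }

    final Deque<Object> stack = new ArrayDeque<>();
    Scene root;

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        switch (qname) {
        case "scene":
            root = new Scene();
            root.name = atts.getValue("name");
            stack.push(root);
            break;
        case "layer":
            Layer l = new Layer();
            l.name = atts.getValue("name");
            l.opacity = Float.parseFloat(atts.getValue("opacity"));
            ((Scene) stack.peek()).layers.add(l);
            stack.push(l);
            break;
        }
    }

    @Override
    public void endElement(String uri, String local, String qname) {
        if (qname.equals("scene") || qname.equals("layer"))
            stack.pop();
    }
}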

It's still tedious and error-prone and a pretty shit indictment on the state of Java SE in 2018 vs other languages, but once it's done it's done, and not having a dependency on half a dozen badly over-engineered packages means it's only done once and I'm not wasting my time learning another fucking "framework".

I didn't investigate where javaee is headed - it'll no doubt eventually solve this problem but removing the dependency from desktop and command-line tools isn't such a bad thing - there have to be good reasons it was dropped from JavaSE in the first place.

One might point to json but that's just as bad to use as a DOM based mechanism, which is also just as tedious and error prone. json only really works with fully dynamic languages where you don't have to write any of the field bindings, although there are still plenty of issues with no canonicalised encoding of things like empty arrays or null strings. In any event I need file format compatibility, so the fact that I also think it's an unacceptably shit solution is entirely moot.

Modules

By the end of the week I'd modularised my main library and ported one of the applications that uses it to the new structure. The application itself also needs quite a bit of modularisation but that's a job for next week, as is testing and debugging - it runs but there's a bunch of broken shit.

So using the modules is actually quite nice - IF you're using modules all the way down. I didn't have time to look further to find out if it's just a problem with netbeans, but adding jars to the classpath generally fucks up and it starts adding strange dependencies to the build. So in a couple of cases I took existing jars and added a module-info myself. When it works it's actually really nice - it just works. When it doesn't, well, I'm getting resource path issues in one case.

I also like the fact that the tools are the ones dictating the source and class file structures - it's not left to 3rd party tools to mess up.

Unfortunately I suspect modularisation will be a pretty slow-burn and it will be a while before it benefits the average developer.

Netbeans / CVS

As an update on netbeans I joined the user mailing list and asked about CVS - apparently it's in the netbeans plugin portal. Except it isn't, and after providing screenshots of why I would think that it doesn't exist I simply got ignored.

Yeah ok.

Command line will have to do for me until it decides to show up in my copy.

Java After Next

So with Oracle loosening the reins a bit (?) on parts of the java platform like JavaFX, I'm a little concerned about where things will end up.

Outside of the relatively tight core of SE, the java platform has some pretty shitty "industry standard" pieces. ant - it's just a horrible tool to use. So horrible it looks like they've added javascript to address some of its issues (oh yay). maven has a lot of issues beyond just being slow as fuck. The ease with which it allows one to bloat out dependencies is not a positive feature.

So yeah, if the "industry" starts dictating things a bit more, hopefully they won't have a negative impact.

Tagged java, rants.
Wednesday, 26 September 2018, 10:32

Java Modules

So I might not be giving a shit and doing it for fun but I'm still looking into it at work.

After a couple of days of experiments and quite a bit of hacking I've taken most of the libraries I have and re-combined them into a set of modules. Ostensibly the modules are grouped by functionality but I also moved a few bits and pieces around for dependency reasons.

One handy thing is the module-info (along with netbeans) lets you quickly determine dependencies between modules, so for example when I wanted to remove java.desktop and javafx from a library I could easily find the usages. It has made the library slightly more difficult to use because I've moved some methods to static functions (and these functions are used a lot in my prototype code so there's a lot of no-benefit fixing to be done to port it), but it seems like a reasonable compromise for the first cut. There may be other approaches using interfaces or subclasses too, although I tend to think that falls into over-engineering.

Spi

One of the biggest benefits is the service provider mechanism that enables pluggability just by including modules on the path. It's something I should've looked into earlier rather than the messy ad-hoc stuff I've been doing, but I guess things get done eventually.

I've probably not done a good job with it yet either but it's a start and easy to modify. There should be a couple of other places I can take advantage of it as well.
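
The mechanics are just a uses declaration in the consuming module-info, a provides ... with ... in each implementation module, and java.util.ServiceLoader to find the implementations at runtime. A minimal sketch with made-up names:

// In the consuming module's module-info.java:
//     uses codecs.spi.CodecProvider;
// In each implementation module's module-info.java:
//     provides codecs.spi.CodecProvider with codecs.jni.NativeCodecProvider;

import java.util.ServiceLoader;

public interface CodecProvider {
    String name();

    // Pick the first provider on the module path that answers to the name.
    static CodecProvider find(String name) {
        for (CodecProvider p : ServiceLoader.load(CodecProvider.class))
            if (p.name().equals(name))
                return p;
        throw new IllegalArgumentException("no provider: " + name);
    }
}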

Redesign

I'm also mid-way through cleaning out a lot of stuff - cut and paste, newer-better implementations, or just experiments that take too much code and are rarely used.

I had a lot of stream processing experiments which just ended up being over-engineered. For example I tried experimenting with using streams and a Collector to calculate more than just min/sum/max, instead calculating multi-dimensional statistics (i.e. all at once) on multi-dimensional data (e.g. image channels). So I came up with a set of classes (1 to 4 dimensions), collector factories, and so on - it's hundreds of lines of code (and a lot of bytecode) and I think I only use it in one or two places in non-performance critical code. So it's going in the bin, and if I do decide to replace it I think I can get by with at most a single class and a few factory methods.
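
If I ever do need it again, something like a single class and one factory method would probably cover the min/sum/max-per-channel case (a rough sketch, not the deleted code):

import java.util.Arrays;
import java.util.stream.Collector;

// Multi-channel min/sum/max accumulator usable as a Collector over double[] pixels.
final class ChannelStats {
    final double[] min, max, sum;
    long count;

    ChannelStats(int channels) {
        min = new double[channels];
        max = new double[channels];
        sum = new double[channels];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
    }

    void accept(double[] pixel) {
        for (int c = 0; c < sum.length; c++) {
            min[c] = Math.min(min[c], pixel[c]);
            max[c] = Math.max(max[c], pixel[c]);
            sum[c] += pixel[c];
        }
        count++;
    }

    ChannelStats combine(ChannelStats o) {
        for (int c = 0; c < sum.length; c++) {
            min[c] = Math.min(min[c], o.min[c]);
            max[c] = Math.max(max[c], o.max[c]);
            sum[c] += o.sum[c];
        }
        count += o.count;
        return this;
    }

    static Collector<double[], ?, ChannelStats> collector(int channels) {
        return Collector.of(() -> new ChannelStats(channels),
            ChannelStats::accept, ChannelStats::combine);
    }
}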

The NotAnywhereZone

Whilst looking for some info on netbeans+cvs I tried finding my own posts, and it seems this whole site has effectively vanished from the internet. Well with a specific search you can still find blog posts on google, but not using the date-ranges (maybe the date headers are wrong here). All you can find on duckduckgo is the site root page.

So if you're reading this, congratulations on not being a spider bot!

Tagged java.
Sunday, 23 September 2018, 12:26

Not Netbeans 9.0, Java 11

Well that effort was short-lived, no CVS plugin anymore.

It's not that hard to live without, just use the command line and/or emacs, but today I've already wasted enough time trying to find out if it will ever return (on which question I found no answer, clear or otherwise).

It was also going to be a bit of a pain translating an existing project into a 'modular' one, even though from the perspective of a makefile it's only a couple of small changes.

Tagged java.
Saturday, 22 September 2018, 10:12

Netbeans 9, Java 11

So after months of rewriting license headers netbeans 9 is finally out. So I finally had a bit more of a serious look at migrating to openjdk + openjfx 11 for work.

Eh, it's going to be a bit of a pain but I think overall it should be an improvement worth the effort. When I'm less hungover and better-slept I'll start looking into jmodularising my projects too.

One unfortunate bit is that netbeans doesn't seem to support native libraries in modules so I'll need to use makefiles for those. This is one of the more interesting features of jmods so I'm also looking into utilising that a bit more.

At the moment I'm looking into some deep learning stuff so I've got a lot of time between drinks - pretty much every stage of it is an obnoxiously slow process.

Lots of other little things to look into as well, and now the next yearly contract has finally been done I don't have an easy excuse for putting in fuck-all hours!

Tagged java.
Wednesday, 06 June 2018, 10:03

Free Software JavaFX

I've been keeping a (rather loose) eye out on the availability of a fully free-software JavaFX for a while. Whilst all distributions now ship OpenJDK by default and that works fine for anything not JavaFX, the OpenJFX stuff has still been work in progress and hasn't generally been available.

Anyway I thought I'd try to compile it myself. Of course the first thing I noticed was that a preview build IS now available, but I decided to keep going anyway so I could build a complete integrated JDK/JRE rather than having to manually link it in at runtime.

For the first part of the problem, following the build instructions was quite straightforward, at least on Ubuntu 16.04. I manually installed a couple of the specific build tools required and a couple of extra -dev packages. I didn't bother with building the bundled webkit for now. Even on this gutless under-RAM'd machine it didn't even take terribly long. I used the openjdk 10 for the compiler.

Then I looked into integration with the jdk (later section in the build instructions). I was too lazy to read most of the build instructions so I just went with configure && make, although after a few iterations I settled on:

$ mkdir build
$ cd build
$ ../configure  --with-import-modules=../../rt/build/modular-sdk \
  --with-jtreg=../../jtreg \
  --prefix=/opt \
  --with-native-debug-symbols=none
$ make product-images
  

Some time later it's all done and I ran some simple tests and Bob's your uncle. Stripping the debug symbols just reduces the size of the install (significantly) although it's probably worth keeping them at this early stage.

The slowest part was checking out the mercurial repositories.

Oh, I couldn't get the jdk tests to run :-/ jtreg just complained that it couldn't determine the JVM version. Trying to search the interwebs has so far been fruitless - it's not terribly important for now and I successfully ran the bootcycle-build which self-compiles after bootstrapping which is at least some bit of testing.

Update: Not sure where I got it from but the jtreg I had was out of date so it couldn't handle the new version string. Starting on the jtreg page got me the latest (4.2b12) and now the testing is running.

Torrents

Although this box doesn't have much disk it does have enough for a few big blobs so I thought I'd look into distributing a build (because, well!). To that end I sussed out setting up a private tracker and seeding torrents directly from this machine.

Anyway I've mostly worked it out; it isn't quite ready but I'll get to it eventually.

Java 11

With the short lifetime of Java 9 and 10 I've basically just ignored them completely - I'm still using 8.x everywhere. But Java 11 is going to be an LTS release so it's probably time to start into it.

Having a fully free build of the whole platform is an important part of that for my hobby code. Now I just need the motivation ...

Update 2: And NetBeans, which is still not supporting a Java that was EOL'd 3 months ago. I'm sure they've nearly got the licensing sorted out though!

Update 3

Ok so I guess I'm a bit out of the loop (and search engines failed me again, it took me hours to come across this stuff, mostly by accident).

JavaFX is going to be removed from the JDK. And additionally the JRE will also go away(?) and basically be replaced with standalone platform-specific builds via jlink. Actually I'm not sure how the latter will work, maybe that's just wrt the openjdk; and in any event Oracle are committing to another few years of Java 8 releases.

Given that all the 'action' is server-side I guess it makes some sense. Just not for me!

Huh, I wonder why the current jdk build process allows linking in the JavaFX module then. Unclear. Just doing so breaks the tests so it's probably just the build system lagging slightly. Then again the make_runargs.sh script used to run against the javafx module doesn't seem to work against openjdk11 anyway. Well, unless there's some other mechanism, but neither javafxpackager nor javapackager get built - does jlink do all that now? So far my searching has been pretty futile and a lot of the documentation is either out of date or hard to find. I suppose it's still basically work in progress and/or I'm just not that motivated to dig deep enough.

Well I did waste the day anyway so whatever. At least AMD announced some nice hardware and Intel made an arse of themselves with their 'first' 5Ghz cpu demo which needed to be plugged into a fridge to run for 10 seconds.

Oh, there is also openjfx in the ubuntu repos, but it only goes to 8. I can't remember if I looked at it in the past; probably did, but I think I was looking at java 9 at the time not realising how short-lived it would be. I should've just stayed with 8.

Update 4

Well I did find something about running against the standalone javafx on this blog post. I guess all that javapackager stuff is gone or something since webstart and applets are defunct? Not that I ever used it anyway. Shrug.

Also apparently using jlink and making a partial copy of the jre for every application will be `easier' than just sending out a jar and having a central single installation of a jre. Wot?

Everyone seems to be a bit upset that Oracle are only going to provide commercial support for the oracle jdk11+ too - but given how much they've invested in bringing the openjdk up to feature parity I think they did ok for such an `evil' company. I mean, it's not like they forked some public commons like Linux or something, as Google did.

Tagged java, javafx.
Monday, 23 April 2018, 21:21

Versioning DB

Well I don't have any code ready yet, but between falling asleep today I did a little more work on a versioned data-store I've been working on ... for years, a couple of decades in fact.

In its current iteration it utilises just 3 core (simple) relational(-like) tables and can support both SVN style branches (lightweight renames) and CVS style branches (even lighter weight). Like SVN it uses global revisions and transactions, but like CVS it works on a version tree rather than a path tree; both approaches are possible within the same small library.

Together with dez it allows for compact reverse delta storage.

Originally I started in C but I've been working in Java since - though I'm looking at back-porting the latest design to C again for some performance comparisons. It's always used Berkeley DB (JE for Java) as storage, although I did experiment with using a SQL version in the past.

My renewed interest comes from the goal of eventually running this site with it as the backing storage - for code, documentation, musings; e.g. the ability to branch a document tree for versioning and yet have it served live from common storage. This was essentially the reason I started investigating the project many years ago but never quite got there. I'm pretty sure I've got a solid schema but still need to solidify the API around a few basic use-cases before I move forward.

The last time I touched the code was 2 years ago, and the last time I did any significant work on it was 3 years ago, so it's definitely a slow burner project!

Well, more when I have more to say.

Tagged dez, java, rez.