About

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

Tuesday, 15 May 2018, 19:00

Backend stuff

Winter has hit here and along with insomnia i'm not really feeling like doing much of an evening but i've dabbled a few times and basically ported the Java version of a tree-revision database to C.

At this point i've just got the core done - schema/bindings and most of the client api. I'm pretty sure it's solid but I need to write a lot of testing and validation code to make sure it will be reliable and performant enough, and then write a bunch more to turn it into something interesting.

But i've been at a desk for 10 hours straight and my feet are icy cold so it's not happening tonight.

Tagged hacking, zedzone.
Sunday, 29 April 2018, 12:02

BDB | !BDB?

I mentioned a few posts ago that there don't seem to be many NoSQL databases around anymore - at least last time I looked a year or two ago, all the buzz from a decade ago had gone away. Various libraries became proprietary-commercial or got abandoned.

For some reason I can't remember I went looking for BerkeleyDB alternatives and hit this stackoverflow question which points to some of them.

So I guess I was a little mistaken, there are still a few around, but not all are appropriate for what I want it for:

I guess the best of those is LMDB - i'd come across it whilst using Caffe but never looked into it. Given its roots in replacing BDB it has enough similarities in API and features to be a good match for what I want (and written in a sane language) although a couple of niggles exist such as the lack of sequences and all the fixed-sized structures (and database size). Being a part of a specific project (OpenLDAP) means it's hit maturity without features that might be useful elsewhere.

The multi-version concurrency control and so on is pretty neat anyway. No transaction logs is a good thing. If I ever get time I might play with those ideas a little in Java - not because I necessarily think it's a great idea but just to see if it's possible. I played with an extensible hash thing for indexing in camel many years ago but it was plagued by durability problems.

Back to LMDB - i'll definitely give it a go for my revisioned database thing - at some point.

Tagged hacking, zedzone.
Thursday, 26 April 2018, 09:05

Rabbit Holes All The Way Down

I kept poking around the blog code over the last couple of days. It just keeps leading to more and more questions.

BDB

Tuesday I mostly spent re-brushing up on the C api for Berkeley DB and designing the schema to implement my version database using it. At some point since I last looked foreign key constraints must have been added so I implemented that - unfortunately unlike JE they don't support self-referential keys (where a field references the primary key of the same object) so I will have to code up a couple of cases for that manually. Actually i'm not sure I even need fully indexed key constraints as the database is designed never to have deletions. If I ever get that far i'll do some benchmarking to evaluate the tradeoffs, or decide how to do deletions.

During the journey I also discovered that at some point Berkeley DB JE changed licenses again - it had been AGPL3 last time I looked. Now it's changed to Apache. I wonder if this is another project soon to be abandoned to the ASF? Anyway it doesn't make much difference to my Free Software projects (not that I ever got far enough to publish any) but it'll be handy for work as i've wanted to use it plenty of times. It's about the only decent NoSQL DB left these days.

Uploading JavaScript

I pretty much detest JavaScript but I wanted to look at how to write some sort of web-based editor for writing posts and I don't really feel like writing yet another MIME parser to handle multipart/form-data. Well I probably will have to eventually (or likely dig one of the few i've already written back up) but in the meantime I investigated direct uploads using AJAX.

Most results from searching turn up JQuery snot but I eventually found some raw JavaScript using XMLHttpRequest directly. Given it's only a few lines of code one has to wonder about these 'frameworks'. I digress. I played around a bit, extended my FastCGI library to support streaming stdin and wrote a basic REST-like `uploader' that can handle binary blobs directly without any messy protocol parsing. Yay. And then I fell down another hole ... how the fuck am I going to do security?

I don't really want to buy an SSL cert for this site but using a self-signed certificate isn't really any good. Without that pretty much any auth system is wildly insecure. I started looking into JavaScript libraries for crypto - some are a little over the top but there are a few smaller ones that might serve the purpose. Crypto has a lot of gotchas and one can't be an expert in everything so i'm not sure I want to start down what would be a very long and winding road just to post to a website.

So i'm toying with a few ideas. First just do nothing, stick to ssh and emacs for posting. If I ever bother with comments or feedback they can be anonymous and not require auth. Or instead of using JavaScript write a standalone Java editor / operator console that calls REST services. Or even using an ssh driven backend. This has some appeal personally but I'll see. Another is to use SSL + Digest Auth - this way I let the browsers and server handle all the complexity and get a mostly ok system. If I install my own CA on my local browser(s) and enforce client certificates from the server side, it should be reasonably secure.

Damn windy road already.

I need a real rest

My sleep has been particularly bad of late. The sleep apnoea is quite bad and I regularly (mostly) forget to wear the mouth splint which doesn't-treat-it-particularly-well-but-it's-better-than-nothing. At least I remembered last night.

Today I gotta try and do some hours for work though. At the moment i'm trying to decipher some statistical software written in matlab, which is about my most favouritist thing in the whole world. Fuck matlab.

Oh, I also bought some mice. I've got a couple of small 'travel mouse' mice that I much prefer to the standard fare and although they used to be easy to find they've become quite scarce around here. Whatever happened to BenQ anyway? All the local retailers only have microsoft or logitech or their own badged chinese crap now. Cordless also seem to have taken over (higher margins one suspects). I looked everywhere locally and on the usual suspects online but couldn't find anything decent. Oddly enough the ThinkPad one I already have was one of the cheapest, and from the source, so I ordered a couple to tide me over for the foreseeable future. On a whim I also added a wireless 'laser' one as well, although it's marginally larger.

Tagged hacking, zedzone.
Tuesday, 24 April 2018, 17:38

FastCGI experiments

It's not particularly important - i'm lucky to get more than one non-bot hit in a given day - but I thought i'd have a look into FastCGI. If in the future I do use a database backend or even a Java one it should be an easy way to get some performance while leveraging the simplicity of CGI and leaving the protocol stuff to apache.

After a bit of background reading and looking into some 'simple' implementations I decided to just roll my own. The 'official' fastcgi.com site is no longer live so I didn't think it worth playing with the official sdk. The way it handled stdio just seemed a little odd as well.

With the use of a few GNU libc extensions for stdio (cookie streams) and memory (obstacks) I put together enough of a partial (but robust) implementation to serve output-only pages from the fcgid module in a few hundred lines of code.
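As a rough sketch of why obstacks suit request parameters (this is not the library's actual code, and `struct param`, `param_push` and `param_demo` are names made up for the illustration): every string for a request is copied onto one stack, and the whole lot is released with a single call when the request ends.

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* obstack has to be told how to obtain and release its chunks. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical stand-in for a name/value pair. */
struct param {
        char *name;
        char *value;
};

/* Copy a name/value pair onto the stack; obstack_copy0 adds the nul. */
static struct param param_push(struct obstack *os,
                               const char *name, size_t nlen,
                               const char *value, size_t vlen) {
        struct param p;

        p.name = obstack_copy0(os, name, nlen);
        p.value = obstack_copy0(os, value, vlen);
        return p;
}

/* Store two parameters, check them, then release everything at once. */
static int param_demo(void) {
        struct obstack os;
        struct param a, b;
        int ok;

        obstack_init(&os);
        a = param_push(&os, "REQUEST_METHOD", 14, "GET", 3);
        /* Only the first 5 bytes of the value are copied here. */
        b = param_push(&os, "QUERY_STRING", 12, "id=42&x=y", 5);

        ok = strcmp(a.name, "REQUEST_METHOD") == 0
                && strcmp(a.value, "GET") == 0
                && strcmp(b.name, "QUERY_STRING") == 0
                && strcmp(b.value, "id=42") == 0;

        /* Passing NULL frees every object allocated on the obstack. */
        obstack_free(&os, NULL);
        return ok;
}
```

Because the lengths are passed explicitly, the source strings don't need to be nul-terminated, which fits data taken straight out of a protocol buffer.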

This is the public api for it.

struct fcgi_param {
        char *name;
        char *value;
};

struct fcgi {
        // Active during cgi request
        FILE *stdout;
        FILE *stderr;

        // Current request info
        unsigned char rid1, rid0;
        unsigned char flags;
        unsigned char role;

        // Current request params (environment)
        size_t param_length;
        size_t param_size;
        struct fcgi_param *param;
        struct obstack param_stack;

        // Internal buffer stuff
        int fd;
        size_t pos;
        size_t limit;
        size_t buffer_size;
        unsigned char *buffer;
};

typedef int (*fcgi_callback_t)(struct fcgi *, void *);

struct fcgi *fcgi_alloc(void);
void fcgi_free(struct fcgi *cgi);

int fcgi_accept_all(struct fcgi *cgi, fcgi_callback_t cb, void *data);
char *fcgi_getenv(struct fcgi *cgi, const char *name);

I didn't bother to implement concurrent requests, the various access control roles, or STDIN messages. The first doesn't appear to be used by mod_fcgid (it handles concurrency itself) and I don't need the rest (yet at least). As previously stated I used GNU libc extensions to implement custom stdio streams for stdout and stderr, although I used a custom 'zero-copy' buffer implementation for the protocol handling (wherein the calls can access the internal buffer address rather than having to copy data around).
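To give an idea of what zero-copy protocol handling can look like, here is a minimal sketch of decoding one FastCGI name-value pair (the format used by the PARAMS records) in place. The length encoding follows the FastCGI specification: one byte if the high bit is clear, otherwise four bytes with the high bit masked off. The names `nv_pair`, `nv_length` and `nv_decode` are made up for this example and are not part of the api above.

```c
#include <stddef.h>
#include <string.h>

/* A decoded pair that points into the caller's buffer; nothing is copied. */
struct nv_pair {
        const unsigned char *name;
        size_t name_len;
        const unsigned char *value;
        size_t value_len;
};

/* Decode one length field; returns bytes consumed, or 0 if truncated. */
static size_t nv_length(const unsigned char *p, size_t avail, size_t *len) {
        if (avail < 1)
                return 0;
        if ((p[0] & 0x80) == 0) {
                *len = p[0];
                return 1;
        }
        if (avail < 4)
                return 0;
        *len = ((size_t)(p[0] & 0x7f) << 24) | ((size_t)p[1] << 16)
                | ((size_t)p[2] << 8) | (size_t)p[3];
        return 4;
}

/* Decode the pair at buf; returns bytes consumed, or 0 if incomplete. */
static size_t nv_decode(const unsigned char *buf, size_t avail,
                        struct nv_pair *out) {
        size_t pos = 0, n, nlen, vlen;

        n = nv_length(buf + pos, avail - pos, &nlen);
        if (n == 0)
                return 0;
        pos += n;

        n = nv_length(buf + pos, avail - pos, &vlen);
        if (n == 0)
                return 0;
        pos += n;

        /* Both strings must be entirely inside the buffer. */
        if (nlen > avail - pos || vlen > avail - pos - nlen)
                return 0;

        out->name = buf + pos;
        out->name_len = nlen;
        out->value = buf + pos + nlen;
        out->value_len = vlen;
        return pos + nlen + vlen;
}
```

The strings are not nul-terminated in the wire format, so a caller that wants C strings still has to copy them somewhere at that point.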

Converting a CGI program is a little more involved than using the original SDK because it doesn't hide the i/o behind macros or use global variables to pass information. Instead via a context-specific handle it provides stdio compatible FILE handles and a separate environmental variable lookup function. Of course it is possible to write a handler callback which can implement such a solution.

The main function of the FastCGI program just allocates the context, calls fcgi_accept_all and then fcgi_free. The callback is invoked for each request and can access stdout/stderr from the context using stdio calls as it wishes.

Apache config

Here's the basic apache config snippet I used to hook it into `/blog' on a server (I did this locally rather than live on this site though).

        ScriptAlias /blog /path/fcgi-test.fcgi

        FcgidCmdOptions /path/fcgi-test MaxProcesses 1

        <Directory "/path">
                AllowOverride None
                Options +ExecCGI
                Require all granted
        </Directory>

Custom streams and cookies

Using a GNU extension it is trivial to hook up custom stdio streams - one gets all the benefits of libc's buffering and formatting and one only has to write a couple of simple callbacks.

#define _GNU_SOURCE

#include <sys/types.h>
#include <sys/uio.h>
#include <stdio.h>
#include <unistd.h>

static ssize_t fcgi_write(void *f, const char *buf, size_t size, int type) {
        struct fcgi *cgi = f;
        size_t sent = 0;
        FCGI_Header header = {
                .version = FCGI_VERSION_1,
                .type = type,
                .requestIdB1 = cgi->rid1,
                .requestIdB0 = cgi->rid0
        };

        while (sent < size) {
                size_t left = size - sent;
                ssize_t res;
                struct iovec iov[2];

                if (left > 65535)
                        left = 65535;

                header.contentLengthB1 = left >> 8;
                header.contentLengthB0 = left & 0xff;

                iov[0].iov_base = &header;
                iov[0].iov_len = sizeof(header);
                iov[1].iov_base = (void *)(buf + sent);
                iov[1].iov_len = left;
                
                res = writev(cgi->fd, iov, 2);
                if (res < 0)
                        return -1;

                sent += left;
        }

        return size;
}

static int fcgi_close(void *f, int type) {
        struct fcgi *cgi = f;
        FCGI_Header header = {
                .version = FCGI_VERSION_1,
                .type = type,
                .requestIdB1 = cgi->rid1,
                .requestIdB0 = cgi->rid0
        };
        if (write(cgi->fd, &header, sizeof(header)) < 0)
                return -1;
        return 0;
}
  

Well perhaps the callbacks are more `straightforward' than simple in this case. FastCGI has a payload limit of 65535 bytes per record so any larger writes need to be broken up into parts. I use writev to write the header and content directly from the library buffer in a single system call (a pretty insignificant performance improvement in this case but one nonetheless). This works so far, but I might need to handle partial writes - in which case the writev approach gets too complicated to bother with.
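For reference, here is one way partial writes could be handled while keeping writev: a retry loop that advances the iovec array past whatever was written. It is only a sketch under assumptions - the name `writev_all` is hypothetical, the array is modified in place, and it has not been tried against the fcgid socket.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write every byte described by iov[0..count-1], retrying after short
 * writes.  The iov array is modified.  Returns 0 on success, -1 on error.
 */
static int writev_all(int fd, struct iovec *iov, int count) {
        while (count > 0) {
                ssize_t res = writev(fd, iov, count);
                size_t done;

                if (res < 0) {
                        if (errno == EINTR)
                                continue;
                        return -1;
                }
                done = (size_t)res;

                /* Skip over the vectors that were written completely. */
                while (count > 0 && done >= iov->iov_len) {
                        done -= iov->iov_len;
                        iov++;
                        count--;
                }

                /* Advance into a partially written vector. */
                if (count > 0) {
                        iov->iov_base = (char *)iov->iov_base + done;
                        iov->iov_len -= done;
                }
        }
        return 0;
}
```

Since the header and payload lengths would need to be reset for every record anyway, the caller loses little by letting the helper scribble on the array.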

The actual 'cookie' callbacks just invoke the functions above with the FCGI channel to write to.

  
static ssize_t fcgi_stdout_write(void *f, const char *buf, size_t size) {
        return fcgi_write(f, buf, size, FCGI_STDOUT);
}

static int fcgi_stdout_close(void *f) {
        return fcgi_close(f, FCGI_STDOUT);
}

static const cookie_io_functions_t fcgi_stdout = {
        .read = NULL,
        .write = fcgi_stdout_write,
        .seek = NULL,
        .close = fcgi_stdout_close
};
  

And opening a custom stream is as simple as opening a regular file.

static int fcgi_begin(struct fcgi *cgi) {
        cgi->stdout = fopencookie(cgi, "w", fcgi_stdout);

        ...;

        return 0;
}
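The fragments above depend on the rest of the library, so here is a self-contained sketch of the same fopencookie technique that can be compiled on its own with glibc. Instead of a socket it writes into a fixed-size capture buffer; `struct capture`, `capture_write`, `capture_close` and `capture_demo` are all hypothetical names for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical cookie: a capture buffer standing in for the socket. */
struct capture {
        char data[256];
        size_t length;
        int closed;
};

/* Called by libc whenever its stream buffer is flushed. */
static ssize_t capture_write(void *c, const char *buf, size_t size) {
        struct capture *cap = c;

        /* Cookie write functions report errors by returning 0. */
        if (size > sizeof(cap->data) - cap->length)
                return 0;

        memcpy(cap->data + cap->length, buf, size);
        cap->length += size;
        return (ssize_t)size;
}

/* Called once from fclose, after the final flush. */
static int capture_close(void *c) {
        struct capture *cap = c;

        cap->closed = 1;
        return 0;
}

static const cookie_io_functions_t capture_funcs = {
        .read = NULL,
        .write = capture_write,
        .seek = NULL,
        .close = capture_close
};

/* Format a line through stdio; returns 0 on success, -1 on failure. */
static int capture_demo(struct capture *cap) {
        FILE *fp = fopencookie(cap, "w", capture_funcs);

        if (fp == NULL)
                return -1;

        fprintf(fp, "Request %d\n", 7);

        /* fclose flushes through capture_write, then calls capture_close. */
        return fclose(fp) == 0 ? 0 : -1;
}
```

Note that nothing reaches the write callback until libc flushes its buffer, which is why the close callback is a natural place to send an end-of-stream record.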
  

Example

Here's a basic example that just dumps all the parameters to the client. It also maintains a count to demonstrate that it's persistent.

I went with a callback mechanism rather than the polling mechanism of the original SDK mostly to simplify managing state. Shrug.

#include "fcgi.h"

static int cgi_func(struct fcgi *cgi, void *data) {
        static int count;

        fprintf(cgi->stdout, "Content-Type: text/plain\n\n");
        fprintf(cgi->stdout, "Request %d\n", count++);
        fprintf(cgi->stdout, "Parameters\n");
        for (size_t i=0;i<cgi->param_length;i++)
                fprintf(cgi->stdout, " %s=%s\n", cgi->param[i].name, cgi->param[i].value);

        return 0;
}

int main(int argc, char **argv) {
        struct fcgi * cgi = fcgi_alloc();
        
        fcgi_accept_all(cgi, cgi_func, NULL);
        
        fcgi_free(cgi);
}

Notes

I haven't worked out how to get the CGI script to 'exit' when the MaxRequestsPerProcess limit has been reached without causing service pauses. Whether I do nothing or whether I exit and close the socket at the right time it still pauses the next request for 1-4 seconds.

I haven't converted my blog driver to use it yet - maybe later on tonight if I keep poking at it.

Oh and it is quite fast, even with a trivial C program.

Tagged hacking, zedzone.
Tuesday, 19 December 2017, 20:26

jjmpeg, jni, javafx

So I guess the mood took me, I somehow ended up poking away until the very late morning hours (4am) the last couple of nights hacking on jjmpeg. Just one more small problem to solve ... that never ended. Today I should've been working but i've given up and will write it off, it's nearly xmas break anyway so there's no rush, and i'm ahead of the curve anyway.

JJMediaReader

I got this ported over and playing video fairly easily, and then went through on a cleanup spree. I removed all the BufferedImage, multi-buffering, and scaling stuff and a few other experiments which never worked. Some api changes allowed me to consolidate more code into a base class, and some changes to AVStream necessitated a different approach to initialising the AVCodecContext (using AVCodecParameters). I made a few other little tweaks on the way.

The reason I removed the BufferedImage code is because I didn't want to pollute it with "platform specific" code. i.e. swing, javafx, etc. I've moved that functionality into a separate namespace (module?).

My first cut just took the BufferedImage code and put it into another class which provides the functionality by taking the current AVFrame from the JJMediaReader video stream. This'll probably do but when working on similar functionality for JavaFX I took a completely different approach - implementing a native PixelReader() so that the native code can decide the best way to write to the buffer. This is perhaps a little more work but is a lot cleaner to use.

swscale

jjmpeg1 lets you scale images 'directly' to/from primitive arrays or direct ByteBuffers in addition to AVFrame. Since they have no structure description (size, format), this either has to be passed in to the functions (messy) or stored in the object (also messy). jjmpeg1 used the latter option and for now I simply haven't implemented them.

The PixelReader mentioned above does implement it internally but for code re-use it might make sense to implement them with the structure information as explicit parameters, and use higher level objects such as PixelReader/Writer to track such information. On the other hand the native code has access to more information so it also makes sense to leave it there.

I went a bit further and created a re-usable super-class that does most of the work and toolkit specific routines only have to tweak the invocation. This approach hides libswscale behind another api. The slice conversions don't work properly but they're not necessary.

jni

So far I had public constructors and `finalisers' because otherwise the reflection code failed. That's a bit too ugly (and `dangerous') so I made them private. The reflection code just had to look up the methods and set them Accessible.

    Constructor cc = jtype.getDeclaredConstructor(Long.TYPE);

    cc.setAccessible(true);

    return cc.newInstance(p);

Whilst working on JJMediaReader I hit a snag with the issue of ownership. In most cases objects are either created anew and released (or gc'd) by the Java code, or are simply references to data managed elsewhere. I was addressing the latter problem by simply having an empty release() method for the instance, but that isn't flexible enough because some objects can be either created or referenced, and the context determines which.

So I expanded the Java-side object tracking to include a `refer' method in addition to the `resolve' method. `resolve' either creates a new instance or returns an existing one with a weak-reference object which will invoke the static release method when it gets finalised. `refer' on the other hand does the same thing but uses a different weak-reference object which does nothing.

I then noticed (the rather obvious) that if an object is created, it can't possibly 'go away' from the object tracking if it is still alive; therefore the `resolve' method was doing redundant work. So I created another `create' method which assumes the object is always a new one and simply adds it to the table. It can also do some checking but i'm pretty sure it can't fail ...

If on the other hand the underlying data was reference counted then the `resolve' method would be useful since it would be possible to lookup an existing object despite it being `released'. So i'll keep it in CObject.

As part of this change I also improved CObject in other ways.

I was storing the weak reference to the object itself inside the object so I could implement explicit release and to avoid copying the pointer. I removed that reference and only store the pointer now. The WeakReference is already tracked in a hash table so I just look it up if I need it. This lets me change the jni code to use a field lookup rather than a function call to retrieve it (I doubt it makes much perf difference but I will profile it at some point).

I also had some pretty messy "cross-layer" use of static variables and messy synchronisation code. I moved all map references to outside of the weak reference routine and use a synchronised map for the pointer to object table.

For explicit release I simply call .clear() and .enqueue() on the WeakReference - which seems to do the right thing, and simplifies the release code (at least conceptually) since it always runs on the same thread.

Tagged hacking, java, javafx, jjmpeg.
Sunday, 05 November 2017, 14:51

JNI and garbage collection

I've started on an article about creating garbage collectible JNI objects. This is based on the system used in zcl but simplified further for reuse by using the class object as the type specifier and binding release via static declared methods.

This also supports `safe' explicit release which may be required in some circumstances where the gc is not run often enough.

It should work well with the JVM as it uses reference queues and no finalize methods. It requires minimal "extra" application support - just a class specific release() method.

Read it here.

Tagged code, hacking, java.
Friday, 03 June 2016, 16:34

Using GNU make to build Java software

I finally finished writing an article about Java make i started some time ago, multiple times. I was going through cleaning up a new release of dez (still pending) and decided to fill it out with the junit stuff and then write up what I actually ended up with.

The following few lines are now the complete makefile for dez. This supports `jar' (normal build target), `sources' (ide source jar), `javadoc' (ide javadoc jar), `dist' (complete rebuildable source), and now even `test' or `check' (unit and integration tests via JUnit 4) targets. The stuff included from java.make is reusable and is under 200 lines once you exclude voluminous comments and documentation.

java_PROGRAMS = dez

dez_VERSION=-1
dez_JAVA_SOURCES_DIRS=src
dez_TEST_JAVA_SOURCES_DIRS=test

DIST_NAME=dez
DIST_VERSION=-1.3
DIST_EXTRA=COPYING.AGPL3 README Makefile

include java.make

The article is over on my home page at Using GNU Make for java under my software articles section.

Tagged code, dez, gnu, hacking, java.
Sunday, 29 May 2016, 14:34

Images, Pixels, Java Streams

This morning I wrote and published an article about writing an image container class for Java which supports efficient use of Streams. It is on my local home page under Pixels - Java Images, Streams.

Although there is much said of it, there is still quite a bit unsaid about how many wrong-footed experiments it took to accomplish the seemingly obvious final result. The code itself is now (or will be) part of an unpublished library I apparently started writing just over 12 months ago for reasons I can no longer recall. It doesn't have enough guts to make publishing it worthwhile as yet.

I'm also still playing with fft code and toying with some human-computer-interaction ideas.

Tagged code, hacking, java.
Copyright (C) 2018 Michael Zucchi, All Rights Reserved. Powered by gcc & me!