No nice things
So my experiment with bittorrent here has ended before it even
started. I was running transmission-daemon as the seeder - but
apart from a couple of short tests only had it running with no
torrents being seeded.
Despite that i'm getting 500MB+ of traffic PER DAY just from peer requests (I
guess, as there seems to be no way to find out what's going on).
And the thing is I never publicly released the torrent files it
was seeding, so some port scraper must have found the daemon and added it to
some peer exchange. Despite blocking the port i'm still getting dozens
of incoming packets per minute, but I guess they'll quieten down.
Fortunately I don't pay for traffic.
Winter has hit here and along with insomnia i'm not really feeling
like doing much of an evening but i've dabbled a few times and
basically ported the Java version of a tree-revision database to
At this point i've just got the core done -
schema/bindings and most of the client api. I'm pretty sure it's
solid but I need to write a lot of testing and validation code to
make sure it will be reliable and performant enough, and then
write a bunch more to turn it into something interesting.
But i've been at a desk for 10 hours straight and my feet are icy
cold so it's not happening tonight.
Had a bug in my fastcgi code that broke the blog for some web
clients, depending on their ID string. It just happened to break
on mobile phones more often. Oops.
cdez + other stuff
I started porting dez to C to look
at using it here somewhere. Along the way I found a bug in the
matcher implementation but otherwise got very distracted trying to
gain a negligible few percent out of the delta sizes by manipulating
the address encoding mechanism.
I tried modifying the matcher in various ways - experimenting with
the hash table details. These involved including the hash value
(i.e. to reduce spurious string matching - it just slows it down) or
using a separate index table (no real difference). Probably the
most surprising thing was that the performance was already somewhat better
than covered in the dez benchmarks: both considerably faster
processing and smaller generated deltas. I guess those numbers must have
come from an earlier implementation and I need to update them. For
example the bible compression test only takes 11 seconds and creates
a 1 566 019 byte delta - or 65% of the published runtime at 90% of
the published output size.
This inspired me to play with the
tunable - which sets how deep the hashtable chain gets before it
starts to throw away older values. Using a setting of 5 (32
depth) it just beats the previously published results but in only
0.7s - still somewhat slower than the 0.1s for gzip but at least it's
not out of the range of practicality. This is where I found the
bug in the entry discard indexing, which was an easy fix.
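To make the chain-depth idea concrete, here is a minimal Python sketch of a hash index that drops its oldest entries once a chain is full. It is not the dez code: the class and method names are made up for illustration, and it assumes the setting is a power-of-two exponent, which is how "5 (32 depth)" reads.

```python
from collections import defaultdict, deque


class ChainLimitedIndex:
    """Hash index whose per-key chains discard their oldest entries.

    Hypothetical stand-in for a delta matcher's hash table; the name and
    layout are assumptions, not taken from dez.
    """

    def __init__(self, depth_log2):
        # Assumption: a setting of 5 means chains of at most 2**5 = 32.
        self.limit = 1 << depth_log2
        self.chains = defaultdict(lambda: deque(maxlen=self.limit))

    def add(self, key, position):
        # A deque with maxlen drops the oldest position once the chain
        # is full, which is the "throw away older values" step.
        self.chains[key].append(position)

    def candidates(self, key):
        # Return the remembered positions for key, newest first.
        chain = self.chains.get(key)
        return list(reversed(chain)) if chain else []


index = ChainLimitedIndex(5)
for pos in range(40):
    index.add(b"abcd", pos)
# Only the 32 most recent positions (39 down to 8) are still searchable.
```

A shorter chain means fewer candidate matches to test per lookup, which is where the speed comes from, at the cost of possibly forgetting a better, older match.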
This does mean that the other timings I did are pretty much
pointless though - using a larger block search size than 1 just
produces much worse results and it's still slower. I haven't
tried with a large source input string however, where a chain limit
will truncate the search space prematurely.
Then I spent way too much time and effort trying various address
encoding mechanisms to try to squeeze a little bit more out of the
algorithm. In the end, although I managed to get about a 2.5%
improvement in the best case, I doubt it's really worth worrying
about. However some of the alternative address encoding schemes are
conceptually and mechanically simpler so I might use one of them
(and break the file format).
Because of all that faffing about I never really got very far with
the cdez conversion although I have the substring matcher
basically done which is the more complex part. The
encoding/decoding code is quite involved but otherwise
straightforward bit bashing.
Update: I tried a different test - one where I simulated the
total delta size of encoding 180 revisions of jjmpeg development -
not a particularly active project but still a real one. The
original encoding is easily the best in this case.
For some reason the blog went offline for a few hours. It kept
getting segfaults in libc somewhere. All I did to fix it was
make install (which simply copied the binary into
the cgi directory and didn't rebuild anything) and it started
working again. Unfortunately I didn't think to preserve the binary
that was there to find out why it stopped working.
Something to keep an eye on anyway.
BDB | !BDB?
I mentioned a few posts ago that there don't seem to be many
NoSQL databases around anymore - at least last time I looked a
year or two ago, all the buzz from a decade ago had gone away.
Various libraries became proprietary-commercial or got abandoned.
For some reason I can't remember I went looking for BerkeleyDB
alternatives and found a stackoverflow question which points to some of them.
So I guess I was a little mistaken, there are still a few around,
but not all are appropriate for what I want:
- Unstructured ones are a pain to use;
- Many don't do full ACID;
- Most don't handle multi-process concurrency; or
- Written in exotic languages i'm not interested in having a
I guess the best of those is LMDB - i'd come across it whilst
using Caffe but never looked into it. Given its roots in
replacing BDB it has enough similarities in API and features to be
a good match for what I want (and it's written in a sane language),
although a couple of niggles exist such as the lack of sequences
and all the fixed-sized structures (and database size). Being
part of a specific project (OpenLDAP) means it has hit maturity
without features that might be useful elsewhere.
The multi-version concurrency control and so on is pretty neat
anyway. No transaction logs is a good thing. If I ever get time
I might play with those ideas a little in Java - not because I
necessarily think it's a great idea but just to see if it's
possible. I played with an extensible hash thing for indexing in
camel many years ago but it was plagued by durability problems.
Back to LMDB - i'll definitely give it a go for my revisioned
database thing - at some point.
https, TLS upgrade
Ahah, so it seems things have changed a bit since last I looked
into certificates and certificate authorities - and even then I
was looking into code and email signing certs anyway.
After a short poke around I quickly became aware of the
Let's Encrypt project which
provides automated and free server domain certificates. It can be
automated because you control the server and part of the issuing
process creates temporary server resources that the signer can
cross-check. And all the certs are created locally.
So after a bit of fudging around with
the C-based acme
client and some apache config I got it all turned on and
(compatible) browsers automagically redirecting to the TLS
I didn't want to go with the official CertBot because python isn't
otherwise installed on this server and I didn't want to drag all
that snot in for no other reason.
Because the acme-client is a little out of date I had to pass it a
few extra parameters to make it create certificates (and had to do
some small porting related changes to it using libressl rather than
zedzone.space www.zedzone.space code.zedzone.space
Once created, a daily cron job runs it (without the -vNn options)
which requests new certificates if the old ones are within a month
of their expiry date (since the Let's Encrypt certificates only
last for 90 days).
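The renewal timing is just date arithmetic, sketched below in Python. This is not how acme-client itself is implemented: the function name, the issue date, and taking "a month" as exactly 30 days are all assumptions for illustration.

```python
from datetime import date, timedelta

CERT_LIFETIME = timedelta(days=90)  # Let's Encrypt lifetime, as noted above
RENEW_WINDOW = timedelta(days=30)   # assumption: "a month" taken as 30 days


def needs_renewal(expires, today):
    """Return True when the certificate expires within the renewal window."""
    return expires - today <= RENEW_WINDOW


issued = date(2018, 6, 1)           # hypothetical issue date
expires = issued + CERT_LIFETIME    # 2018-08-30

# A daily job would see False for the first 59 days after issue,
# then True from day 60 (2018-07-31) onwards.
```

So with a daily cron run each certificate ends up being replaced roughly every 60 days, leaving about a month of slack if a renewal fails.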
I then added an https server config:
Header always set Strict-Transport-Security "max-age=31536000"
Header always set Content-Security-Policy upgrade-insecure-requests
And finally another header to the main server which tells
compatible clients to upgrade to use https. This can be a bit odd
on the first access but thereafter it does the right thing. I
Header always set Content-Security-Policy upgrade-insecure-requests
I didn't want to use a rewrite rule because at the moment I want
to keep both URLs active, but I might change that in the future.
It seems like it might be useful - on the other hand any client
anyone is likely to use will support TLS, won't it?
I've left code.zedzone.space unencrypted for now (even
though it's currently the only part of the site that can be logged
into!) because I need to check things work with virtual
servers on https first and more importantly i'm too hungover to
care this fine yet overcast afternoon!
Update: For what it's worth, the server gets an A+ rating on the
SSL Server Test at the time of posting, although getting the
score above B required a few mod_ssl config changes.
Rabbit Holes All The Way Down
I kept poking around the blog code over the last couple of days.
It just keeps leading to more and more questions.
Tuesday I mostly spent brushing up again on the C api for Berkeley DB
and designing the schema to implement my version database using
it. At some point since I last looked foreign key constraints
must have been added so I implemented that - unfortunately unlike
JE they don't support self-referential keys (where a field
references the primary key of the same object) so I will have to
code up a couple of cases for that manually. Actually i'm not
sure I even need fully indexed key constraints as the database is
designed never to have deletions. If I ever get that far i'll do
some benchmarking to evaluate the tradeoffs, or decide how to do
During the journey I also discovered that at some point Berkeley
DB JE changed licenses again - it had been AGPL3 last time I
looked. Now it's changed to Apache. I wonder if this is another
project soon to be abandoned to the ASF? Anyway it doesn't make
much difference to my Free Software projects (not that I ever got
far enough to publish any) but it'll be handy for work as i've
wanted to use it plenty of times. It's about the only decent NoSQL
DB left these days.
write some sort of web-based editor for writing posts and I don't
really feel like writing yet another MIME parser to handle
multipart/form. Well I probably will have to eventually (or
likely dig one of the few i've already written back up) but in the
mean-time I investigated direct uploads using XJAX.
Most results from searching turn up JQuery snot but I eventually
it's only a few lines of code one has to wonder about these
'frameworks'. I digress. I played around a bit, extended my
FastCGI library to support streaming stdin and wrote a basic
REST-like `uploader' that can handle binary blobs directly without
any messy protocol parsing. Yay. And then I fell down another
hole ... how the fuck am I going to do security?
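For a rough idea of what "binary blobs directly" means, here is a small Python sketch of reading a request body straight off a stream in chunks. It is not the FastCGI library mentioned above, which isn't shown here: the function names are made up, and it assumes the body length arrives via the standard CGI CONTENT_LENGTH variable.

```python
import io


def read_body(stream, content_length, chunk_size=4096):
    """Yield the request body in chunks without buffering all of it.

    stream is any binary file-like object (for plain CGI that would be
    sys.stdin.buffer); content_length comes from CONTENT_LENGTH.
    """
    remaining = content_length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("client sent fewer bytes than CONTENT_LENGTH")
        remaining -= len(chunk)
        yield chunk


def save_upload(stream, content_length, out):
    """Copy exactly content_length raw bytes from stream to out."""
    total = 0
    for chunk in read_body(stream, content_length):
        out.write(chunk)
        total += len(chunk)
    return total


# Hypothetical usage with in-memory streams standing in for stdin and a file.
source = io.BytesIO(b"\x00\x01binary" + b"trailing junk")
target = io.BytesIO()
copied = save_upload(source, 8, target)
```

Because the body is the blob itself, there are no multipart boundaries to scan for, so nothing like a MIME parser is needed on the server side.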
I don't really want to buy an SSL cert for this site but using a
self-signed certificate isn't really any good. Without that
pretty much any auth system is wildly insecure. I started looking
top but there are a few smaller ones that might serve the purpose.
Crypto has a lot of gotchas and one can't be an expert in
everything so i'm not sure I want to start down what would be a
very long and winding road just to post to a website.
So i'm toying with a few ideas. First just do nothing, stick to
ssh and emacs for posting. If I ever bother with comments or
feedback they can be anonymous and not require auth. Or instead
console that calls REST services. Or even using an ssh driven
backend. This has some appeal personally but I'll see. Another is
to use SSL + Digest Auth - this way I let the browsers and server
handle all the complexity and get a mostly ok system. If I
install my own CA on my local browser(s) and enforce client
certificates from the server side, it should be reasonably secure.
Damn windy road already.
I need a real rest
My sleep has been particularly bad of late. The sleep apnoea is
quite bad and I regularly (mostly) forget to wear the mouth splint.
At least I remembered last night.
Today I gotta try and do some hours for work though. At the moment
i'm trying to decipher some statistical software written in
matlab, which is about my most favouritist thing in the whole
world. Fuck matlab.
Oh, I also bought some mice. I've got a couple of small 'travel
mouse' mice that I much prefer to the standard fare and although
they used to be easy to find they've become quite scarce around
here. Whatever happened to BenQ anyway? All the local retailers
only have microsoft or logitech or their own badged chinese crap
now. Cordless also seems to have taken over (higher margins one
suspects). I looked everywhere locally and on the usual suspects
online but couldn't find anything decent. Oddly enough the
one I already have was one of the cheapest, and from the source, so I
ordered a couple to tide me over for the foreseeable future. On a
whim I also added a
'laser' one, although it's marginally larger.
Further to the previous post I did end up porting my blog driver to
my fastcgi implementation.
Benchmarking using `ab' from home it doesn't really make any
difference reading the front page of the blog - if anything it's
actually marginally slower.
Running the benchmark locally though things are quite different.
Previous standard cgi:
Concurrency Level: 1
Time taken for tests: 13.651 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 37100000 bytes
HTML transferred: 36946000 bytes
Requests per second: 73.25 [#/sec] (mean)
Time per request: 13.651 [ms] (mean)
Time per request: 13.651 [ms] (mean, across all concurrent requests)
Transfer rate: 2654.05 [Kbytes/sec] received
New fastcgi:
Concurrency Level: 1
Time taken for tests: 0.706 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 37062000 bytes
HTML transferred: 36908000 bytes
Requests per second: 1416.39 [#/sec] (mean)
Time per request: 0.706 [ms] (mean)
Time per request: 0.706 [ms] (mean, across all concurrent requests)
Transfer rate: 51264.00 [Kbytes/sec] received
So yeah, only 20x faster. If I up the concurrency level of the
benchmark it gets better but it's hard to tell much from it since
everything is running on the same machine.
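For the record, the ratio can be checked straight from the two `ab' runs above. A tiny Python sketch, using only the figures already quoted (the function name is just for illustration):

```python
def speedup(before, after):
    """Return how many times larger after is than before."""
    return after / before


# Requests per second: standard cgi vs fastcgi.
by_throughput = speedup(73.25, 1416.39)

# Mean time per request in ms, inverted since lower is better.
by_latency = speedup(0.706, 13.651)

# Both come out at roughly 19.3, so "20x" is a fair rounding.
```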
Regardless, I made it live.
Copyright (C) 2018 Michael Zucchi, All Rights Reserved. Powered by gcc & me!