About

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as zed
  & handle of notzed

Tags

android (44)
beagle (63)
biographical (87)
blogz (7)
business (1)
code (63)
cooking (30)
dez (7)
dusk (30)
ffts (3)
forth (3)
free software (4)
games (32)
gloat (2)
globalisation (1)
gnu (4)
graphics (16)
gsoc (4)
hacking (434)
haiku (2)
horticulture (10)
house (23)
hsa (6)
humour (7)
imagez (28)
java (224)
java ee (3)
javafx (48)
jjmpeg (77)
junk (3)
kobo (15)
libeze (7)
linux (5)
mediaz (27)
ml (15)
nativez (8)
opencl (119)
os (17)
parallella (97)
pdfz (8)
philosophy (26)
picfx (2)
playerz (2)
politics (7)
ps3 (12)
puppybits (17)
rants (137)
readerz (8)
rez (1)
socles (36)
termz (3)
videoz (6)
wanki (3)
workshop (3)
zcl (1)
zedzone (21)
Sunday, 01 June 2014, 22:09

note to self ...

Remember you need to save the status register before you execute an add.

I went about coding up that atomic counter idea from the previous post but had a hard time getting it to work. After giving up for a bit I had another go (couldn't let it rest as it was so close) and I discovered the register-saving preamble i'd copied from the async dma isr was simply broken.

_some_isr:
        sub     sp,sp,#24
        strd    r0,[sp]
        strd    r2,[sp,#1]
        strd    r4,[sp,#2]

        movfs   r5,status

... routine

        movts   status,r5

        ldrd    r4,[sp,#2]
        ldrd    r2,[sp,#1]
        ldrd    r0,[sp],#3
        rti

Well it certainly looked the part, but it clobbers the status register in the first instruction. Oops.

        strd    r4,[sp,#-1]
        movfs   r5,status
        sub     sp,sp,#24
        strd    r0,[sp]
        strd    r2,[sp,#1]

There goes the symmetry. Actually it could probably just leave the stack pointer where it is because the routine isn't allowing other interrupts to run at the same time.

Fixed, the atomic counter worked ok. It shows some ... interesting ... behaviour when things get busy. It's quite good at being fair to all the cores except the core on which the atomic counter resides. Actually it gets so busy that the core has no time left to run any other code. The interrupt routine is around 200 cycles for 16 possible clients.

So ... I might have to try a mutex implementation and do some profiling. Under a lot of contention it may still be better due to the fixed bandwidth requirements but it's probably worth finding out where the crossover point is and how each scales. A mutex implementation wont support host access either ... always trade-offs. Definitely a candidate for FPGA I guess.

Tagged hacking, parallella.
atomic counters | ipc primitive ideas
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!