About

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as zed
  & handle of notzed

Tags

android (44)
beagle (63)
biographical (87)
blogz (7)
business (1)
code (63)
cooking (30)
dez (7)
dusk (30)
ffts (3)
forth (3)
free software (4)
games (32)
gloat (2)
globalisation (1)
gnu (4)
graphics (16)
gsoc (4)
hacking (434)
haiku (2)
horticulture (10)
house (23)
hsa (6)
humour (7)
imagez (28)
java (224)
java ee (3)
javafx (48)
jjmpeg (77)
junk (3)
kobo (15)
libeze (7)
linux (5)
mediaz (27)
ml (15)
nativez (8)
opencl (119)
os (17)
parallella (97)
pdfz (8)
philosophy (26)
picfx (2)
playerz (2)
politics (7)
ps3 (12)
puppybits (17)
rants (137)
readerz (8)
rez (1)
socles (36)
termz (3)
videoz (6)
wanki (3)
workshop (3)
zcl (1)
zedzone (21)
Tuesday, 12 August 2014, 10:17

asm vs c II

I dunno, i'm almost lost for words on this one.

typedef float float4 __attribute__((vector_size(16))) __attribute__((aligned(16)));

void mult4(float *mat, float4 * src, float4 * dst) {
 dst[0] = src[0] + mat[0];
}
notzed@minized:src$ make simd.o
arm-linux-gnueabihf-gcc -c -o simd.o simd.c -O3 -mcpu=cortex-a9 -marm -mfpu=neon
notzed@minized:$ arm-linux-gnueabihf-objdump -dr simd.o

simd.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 :
   0:   f4610aef        vld1.64  {d16-d17}, [r1 :128]
   4:   ee103b90        vmov.32  r3, d16[0]
   8:   edd07a00        vldr     s15, [r0]
   c:   e24dd010        sub      sp, sp, #16
  10:   ee063a10        vmov     s12, r3
  14:   ee303b90        vmov.32  r3, d16[1]
  18:   ee063a90        vmov     s13, r3
  1c:   ee113b90        vmov.32  r3, d17[0]
  20:   ee366a27        vadd.f32 s12, s12, s15
  24:   ee073a10        vmov     s14, r3
  28:   ee313b90        vmov.32  r3, d17[1]
  2c:   ee766aa7        vadd.f32 s13, s13, s15
  30:   ee053a90        vmov     s11, r3
  34:   ee377a27        vadd.f32 s14, s14, s15
  38:   ee757aa7        vadd.f32 s15, s11, s15
  3c:   ed8d6a00        vstr     s12, [sp]
  40:   edcd6a01        vstr     s13, [sp, #4]
  44:   ed8d7a02        vstr     s14, [sp, #8]
  48:   edcd7a03        vstr     s15, [sp, #12]
  4c:   f46d0adf        vld1.64  {d16-d17}, [sp :64]
  50:   f4420aef        vst1.64  {d16-d17}, [r2 :128]
  54:   e28dd010        add      sp, sp, #16
  58:   e12fff1e        bx       lr
notzed@minized:/export/notzed/src/raster/gl/src$ 

I thought that the store/load/store via the stack was a particularly cute bit of work, especially given the results were already in the right order and in adequately aligned registers. r3 also seems a little too popular.

I guess the vector extensions to gcc just aren't finished - or just don't work. Maybe I used the wrong flags or my build is broken. It produces similar junk code for the epiphany mind you. I've never really tried using them but after a bunch of OpenCL in the past I thought it might be worth a shot to access SIMD without machine code.

My NEON is very rusty but I think it could be something like this:

notzed@minized:src$ arm-linux-gnueabihf-objdump -dr neon-mat4.o

neon-mat4.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 :
   0:   f4a02caf        vld1.32  {d2[]-d3[]}, [r0]
   4:   f4210a8f        vld1.32  {d0-d1}, [r1]
   8:   f2000d42        vadd.f32 q0, q0, q1
   c:   f4020a8f        vst1.32  {d0-d1}, [r2]
  10:   e12fff1e        bx       lr

As can be seen from the names I started with a "simple" matrix multiply but whittled it down to something I thought the compiler could manage after seeing what it did to it - this is just a meaningless snippet.

After a pretty long day at work I was just half-heartedly poking at filling out the frontend to the epiphany gpu but just got distracted by whining at the compiler again. I should've just started with NEON, after a little poking I remembered how nice it was.

Tagged hacking, rants.
epiphany gpu and bits | And I thought I hated tooltips ...
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!