Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as zed
  & handle of notzed


Tuesday, 30 March 2010, 12:33

Damn numbers

Hmm, that was frustrating.

Have been trying to write a `kernel' boot header - one that sets the MMU up for the kernel to execute at another address (0xC0000000) and then jumps to it. Been very tired from sleeping poorly and a bit brain-dead after work so I haven't been really switched on, but it's been dragging on so much I was about to give up (well not really, but it felt like I should).

Apart from a few little bugs, I was using the wrong TEXCB/AP flags for the level 2 page entry for devices ... but I don't know why it's wrong. It seems to check out in the manual, but for whatever reason it just crashes the code (FWIW I was using 0xb2 - 'non sharable device, rw everyone' - rather than 0x16 - 'sharable device, rw supervisor only'). Blah. One little number change and now it works. $@%!$#
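To make the two flag values easier to compare, here is a minimal C sketch that splits the low bits of a level 2 `small page' entry into its fields. It assumes the ARMv7-A short-descriptor layout with TEX remap and the access flag both disabled; the struct and function names are invented for this illustration and are not part of the boot code.

```c
#include <assert.h>
#include <stdint.h>

/* Fields in the low 12 bits of an ARMv7-A short-descriptor small page entry.
 * The names here are hypothetical, chosen only for this sketch. */
struct small_page_flags {
    unsigned xn;    /* bit 0: execute never */
    unsigned b;     /* bit 2: bufferable */
    unsigned c;     /* bit 3: cacheable */
    unsigned ap;    /* bit 9 and bits 5:4 combined as AP[2:0] */
    unsigned tex;   /* bits 8:6: type extension */
    unsigned s;     /* bit 10: shareable */
    unsigned ng;    /* bit 11: not global */
};

static struct small_page_flags decode_small_page(uint32_t desc)
{
    struct small_page_flags f;

    assert(desc & 2);   /* bit 1 set marks a small page entry */
    f.xn  = desc & 1;
    f.b   = (desc >> 2) & 1;
    f.c   = (desc >> 3) & 1;
    f.ap  = ((desc >> 4) & 3) | (((desc >> 9) & 1) << 2);
    f.tex = (desc >> 6) & 7;
    f.s   = (desc >> 10) & 1;
    f.ng  = (desc >> 11) & 1;
    return f;
}
```

Decoding 0x16 gives TEX=0, C=0, B=1 (shareable device) with AP=1 (read/write for privileged code only), while 0xb2 gives TEX=2, C=0, B=0 (non-shareable device) with AP=3 (read/write for everyone). That agrees with the descriptions above; it does not explain the crash.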

I plan to use the two translation table mode, which means the system memory will start at 0x80000000 - so it may make sense to just identity map the kernel at that address. But for now the memory map will have the kernel at 0xc0000000, and I'll start shared libraries or something else at 0x80000000.

So here it is ... in hindsight I may have done things in the wrong order, but this way makes things easy. I set aside some memory in the BSS section for the page tables and let the linker manage allocating space for them, also for the I/O devices - although this means a couple of physical pages are lost at present.

There are a few little `tricks' that I use so the code is position independent, although there are possibly better ways to do it. The init code has to be position independent because the linker script is set up so that all the code starts from the same virtual address - it could be done otherwise, but then I would need an ELF loader to relocate the image - which is somewhat more work.

        adr     r12,_start              @ this will be physical load address
        mov     sp,r12
        push    { r0 - r3 }

First I just set up r12 and the stack to point to our load address - which is 0x80008000 as set by the linker script. This gives the code a fixed location from which to calculate physical and virtual addresses. The incoming arguments are saved too - although nothing uses them yet (das u-boot can pass in arguments or information about modules or filesystems it preloaded into memory).

        ldr     r1,bss_offset
        ldr     r2,bss_offset+4
        add     r1,r1,r12
        add     r2,r2,r12
        mov     r0,#0
1:      str     r0,[r1],#4
        cmp     r1,r2
        blo     1b

Clear the BSS - the code reads a relative offset that the linker creates, that indicates where the BSS starts and stops, and then uses r12 to map that to the physical address. The ldr r1,bss_offset is assembled into a pc-relative instruction so will work no matter where it's loaded.

Then there is a loop which uses a table to initialise the page tables. I first need to find the space within the BSS where it is stored, and then iterate through the entries. Each range is defined by a virtual target address, a start offset relative to _start, a virtual end address, and the `small page' flags for the pages.

        ldr     r11,ttb_offset
        add     r11,r12                 @ physical address of kernel_ttb
        add     r10,r11,#16384          @ same for kernel_pages

        adr     r9,ttb_map
        mov     r8,#ttb_size
1:      ldm     r9!, { r4, r5, r6, r7 } @ virtual dest, start offset, virtual end, flags
        add     r5,r12                  @ physical address

2:      mov     r3,r4,lsr #20
        ldr     r2,[r11, r3, lsl #2]
        cmp     r2,#0

If the l2 page isn't set yet, then just allocate one and update the l1 entry.

        moveq   r2,r10
        addeq   r10,#1024
        orreq   r2,#1
        streq   r2,[r11, r3, lsl #2]

Form and store the l2 page table entry.

        bic     r2,#0xff                        @ r2 = physical address of l2 page
        mov     r1,r4,lsr #12
        and     r1,#0xff
        orr     r0,r5,r7
        str     r0,[r2, r1, lsl #2]

And then loop for all the pages and all the entries in the table. Here I compare for equality for the end address - I do this so I could map the last page of memory if I wanted to. But currently I don't use this.

        add     r4,#4096
        add     r5,#4096
        cmp     r4,r6
        bne     2b

        subs    r8,#1
        bne     1b
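For anyone who finds the assembly hard to follow, the inner loop can be modelled on a host machine in C. This is only a sketch under stated assumptions: the tables are plain arrays, the `physical address' of an l2 table is simulated as pool_base plus its offset, and struct pagetables and map_range are names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES 4096     /* one entry per 1MB of virtual address space */
#define L2_ENTRIES 256      /* one entry per 4K page within that 1MB */
#define L2_TABLES  32       /* 32K pool of 1K tables, like kernel_pages */

/* Hypothetical host-side model of the page table memory. */
struct pagetables {
    uint32_t l1[L1_ENTRIES];
    uint32_t l2[L2_TABLES][L2_ENTRIES];
    uint32_t pool_base;     /* pretend physical address of l2[0], 1K aligned */
    uint32_t next;          /* next free l2 table, the role r10 plays */
};

/* Map [virt, virt_end) to phys onwards with the given small page flags. */
static void map_range(struct pagetables *pt, uint32_t virt, uint32_t phys,
                      uint32_t virt_end, uint32_t flags)
{
    assert(((virt | phys | virt_end) & 0xfff) == 0);

    do {
        uint32_t i1 = virt >> 20;
        uint32_t entry = pt->l1[i1];

        if (entry == 0) {
            /* No l2 table yet: take the next 1K table from the pool */
            assert(pt->next < L2_TABLES);
            entry = (pt->pool_base + pt->next * 1024) | 1;
            pt->next++;
            pt->l1[i1] = entry;
        }

        /* The assembly clears only the low byte, which is enough there
         * because bit 0 is the only flag it sets in the l1 entry. */
        uint32_t *l2 = pt->l2[((entry & ~0x3ffu) - pt->pool_base) / 1024];

        l2[(virt >> 12) & 0xff] = phys | flags;
        virt += 4096;
        phys += 4096;
        /* Equality test: virt may wrap to 0 after the last page of memory */
    } while (virt != virt_end);
}
```

Calling map_range(&pt, 0xfffff000, phys, 0, flags) maps exactly the last page of the address space, because the virtual address wraps around to 0 and then matches the end value. That is the case the equality comparison is meant to allow.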

That's really the meat of it - the table has the smarts in it, and uses the linker to create the interesting values required.

Then it just turns on the MMU - this could probably be simplified as I can just enforce the state I want (i.e. don't bother preserving bits). Putting 1 in CP15_TTBCR means that two page tables are used, the TTBR1 table is used for any address with the top bit set (i.e. >= 0x80000000).

        mrc     p15, 0, r0, CP15_SCTLR
        bic     r0,#SCTLR_ICACHE
        mcr     p15, 0, r0, CP15_SCTLR

        mov     r0,#0
        mov     r1,#1

        mcr     p15, 0, r0, CP15_TLBIALL
        mcr     p15, 0, r1, CP15_TTBCR          @ Top 2G uses TTBR1   
        mcr     p15, 0, r11, CP15_TTBR0
        mcr     p15, 0, r11, CP15_TTBR1
        mcr     p15, 0, r0, CP15_TLBIALL
        sub     r0,#1
        mcr     p15, 0, r0, CP15_DACR

        pop     { r0 - r3 }

        mrc     p15, 0, r8, CP15_SCTLR
        orr     r8,#SCTLR_MMUEN
        mcr     p15, 0, r8, CP15_SCTLR

This last instruction turns the MMU on (and will probably eventually turn on the caches/etc). The input arguments are restored before turning on the MMU since the stack memory will no longer be valid or mapped (actually I should probably map the same 32K to the system stack wherever I decide to put that). The CPU now flushes the pipeline and starts executing instructions from the current pc - but with the MMU on. Because of this the code has to ensure this instruction is still mapped to the same address otherwise it's a one-way trip to la-la land.

In this case the ldr pc,=vstart will force the assembler to generate a constant load from the constant pool (via a pc-relative load). The linker will set this constant up to point to the virtual address properly.

        ldr     pc, =vstart

Now come the relative offsets used to locate the BSS range, as well as the page table memory from within BSS.

bss_offset:
        .word   __bss_start__ - _start
        .word   __bss_end__ - _start
ttb_offset:
        .word   kernel_ttb - _start

And then the important stuff - the page table mapping descriptions. Rather than store the 'virtual end' address it could probably store the length of the address range, but so long as they are aligned properly it doesn't really make much difference. Note that even with the relative addresses any range in memory can be accessed using the simple arithmetic that the linker supports.

ttb_map:
        @ this page, so mmu can be enabled
        .word   LOADADDR, 0, LOADADDR + start_sizeof, CODE
        @ kernel text at virt address
        .word   __executable_start, 0, __data_start__, CODE
        @ kernel data
        .word   __data_start__, __data_start__-_start, __bss_end__,DATA
        @ system stack, 32K, 4K from end of memory
        .word   0 - 32768 - 4096, 0x8000000 - LOADADDR, 0-4096, DATA
        @ i/o of gpio, for debug too (LEDs!)
        .word   GPIO5, 0x49056000 - LOADADDR, GPIO5+4096, NDEV
        @ do serial port too, for debug stuff
        .word   UART3, 0x49020000 - LOADADDR, UART3+4096, NDEV

        .set    ttb_size, (. - ttb_map) / 16
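The arithmetic in these entries relies on everything wrapping to 32 bits. The following C sketch checks a few of the values using unsigned arithmetic as a stand-in for what ends up in each .word; LOADADDR is the 0x80008000 load address mentioned earlier, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LOADADDR 0x80008000u    /* load address stated earlier in the post */

/* Round a byte count up to whole 4K pages, the way start_sizeof is formed. */
static uint32_t round_up_page(uint32_t bytes)
{
    return (bytes + 4095u) & 0xfffff000u;
}

/* Physical address the init loop derives from an entry's start offset
 * (the add r5,r12 step); load is the run-time address of _start. */
static uint32_t entry_phys(uint32_t start_offset, uint32_t load)
{
    return start_offset + load;     /* wraps modulo 2^32 */
}

/* Number of 4K pages an entry covers; the subtraction wraps too, so an
 * end address of 0 means "up to the very top of memory". */
static uint32_t entry_pages(uint32_t virt, uint32_t virt_end)
{
    assert(((virt | virt_end) & 0xfff) == 0);
    return (virt_end - virt) >> 12;
}
```

With these, the stack entry starts at 0u - 32768u - 4096u = 0xffff7000 and ends at 0u - 4096u = 0xfffff000, which entry_pages reports as 8 pages (32K). For the GPIO entry, entry_phys(0x49056000u - LOADADDR, LOADADDR) comes back as 0x49056000 even though the stored offset looks like a huge number, which is why any physical range can be described relative to _start.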

The .ltorg ensures the constant pool is stored at this point, so we can guarantee they are within the one page which needs to be identity mapped immediately after turning on the MMU.

        .ltorg
vstart: ldr     sp,=-4096                       @ init stack
@       bl      __libc_init_array               @ static intialisers
        mov     r8,#(0xf<<20)                   @ enable NEON coprocessor access (still off though)
        mcr     p15, 0, r8, c1, c0, 2
        b       main

And this is the 'virtual address' entry point. This could just occur immediately after the setup code, but keeping it apart makes the switch to virtual addresses more obvious. About the only necessary setup is the (system) stack pointer. I was going to place this at the end of the virtual memory but having it one page back protects from stack underflow as well.

And finally there is the size of this code, and the BSS which stores the bare minimum so I can set it up and see it works (i.e. the UART or blink the LEDs).

        .set    start_sizeof, ((. - _start)+4095) & 0xfffff000

        .balign         16384
        .global kernel_ttb, kernel_pages, UART3
kernel_ttb:
        .skip   16384
kernel_pages:
        .skip   1024*32
GPIO5:  .skip   4096
UART3:  .skip   4096

And ... it's done. Phew.

Unfortunately this means all my 'library code' that uses fixed physical addresses won't work any more, including the debug printing stuff. But that's something to worry about later.

One goal I had was that the code isn't just setting up a page table to be thrown away later - this is sufficient to remain the kernel page table forever. Either for supervisor level kernel processes/threads or, in this case, as the `system page table' which is used for any address at or above 0x80000000. It still needs a little tweaking - the page table should be write-through cacheable for instance - but now it works I can worry about the details. Well now hopefully I can move on to more interesting things.

Interpolating arbitrary values

For work I have been playing with a few things of some interest. I thought I needed a function that could interpolate a set of values spread across an arbitrary 2d plane into a grid of values. I came across this interesting implementation of Thin Plate Splines which seemed to do the job. Unfortunately it turned out that I needed to interpolate more values than is practical with this algorithm (it does it, it just takes too long), and I can probably just force the values to be in a grid anyway so I can use much simpler methods. But still, this is an interesting algorithm to have in the toolkit and it produces pleasant looking results. Interestingly I found the C++ 'ludecomposition' code too messy to convert to C (I'm using different data structures) and just used the Java one it references as a starting point instead. It was much more C-like and translated in a very straightforward manner.

So I wrote a basic bicubic interpolator - the code uses bilinear at the moment although in an inconsistent way which doesn't really work since values can be missing. I was hoping bicubic would be a more natural fit for what it is doing, and worry about the missing values later. Unfortunately it doesn't seem to help much - the input data is just too noisy/inconsistent so I guess there is more to fix first (sorry this doesn't make much sense, I can't really say what it's trying to do).

Walls, dirt

I have some photos of the progress on the retaining wall but I'm too lazy to put them up today. I got some ag-pipe on the weekend, so I'm just about ready to back-fill at least some of the wall (I don't think I have enough gravel to do the whole lot, but I'll see), although I'm not sure where to run it - and an outlet mid-way along the wall I've already laid will be a bastard! I was going to have it coming out the ends but now I'm not so sure. I need to decide so I can get the right fittings too (which for some reason are rather expensive for what they are).

Boral are having a sale on bricks and whatnot this week so I went and ordered another pile of retaining wall blocks (40% off makes it worth it, even if I don't need them for a while). I wasn't really sure how many I needed to start with, and I used a lot more than I thought originally (just the main wall uses most of them). I have a better plan on what I want to end up with now, so hopefully I got it right ... I guess I can always put them around trees or something if I have too many, or create a lower wall if I don't have enough.

Since I won't need to use them for a while I'm going to try to get them delivered into the driveway - so I don't have to move them off the verge by hand. So today I also moved the rest of the roadbase off the driveway to a pile out the back. Unfortunately I overloaded my cheap wheelbarrow and it turned over and I bent the handle (well it was only $60), but it's still usable. If I get stuck into finishing off the walls around the paving area it will get used up pretty fast anyway - of the 3 tons I probably have under 1 left. I'll get the bricks before Easter, so it could be a very long long weekend if I get stuck into it ...

Tagged beagle, biographical, hacking.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!