August 20, 2010    Category: ASM/C/C++, Linux    Author: hoverlees

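I recently had a project written in NASM that ran on 32-bit Windows and Linux hosts. Later the requirements grew, and it had to run on 64-bit Windows and Linux as well. Windows has the WoW64 mechanism (Windows 32-bit on Windows 64-bit), so a 32-bit program runs on a 64-bit machine with no porting at all. Linux has no LOL mechanism (that's Linux on Linux, not laugh out loud, haha ~), but you can install the ia32-libs package (IA-32 stands for Intel Architecture, 32-bit) to get the same effect. Still, building ELF64 and Win64 object files was something I was curious about anyway, so I decided to port the program!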

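First, get to know the CPU and its registers. Essentially all the 32-bit registers have been widened: eax became rax, ebx became rbx, and so on. Being wider, they are nicer to use: you handle 8 bytes at a time, so one instruction can do what used to take several. There are also eight new registers, r8 through r15. With that many registers you need far fewer memory temporaries, which helps speed again. The ones that are callee-saved, and so safe for keeping your own values across calls, are r12-r15. Before, you usually had only esi, edi, and ebx for that; now there are r12-r15 plus rbx, five in total (rbp is callee-saved too, but it is usually taken by the frame pointer)!

Why not rsi and rdi? Good question. On Linux, these two registers are used for argument passing in 64-bit code, so they are generally not preserved across calls. They still matter, though: lodsb, stosb, and similar instructions still take their source and destination addresses from rsi and rdi. I think this was a poor decision. Why not pass arguments in the newly added registers instead of taking my beloved rsi and rdi... I can't design CPUs, but I can still complain! Complaints aside, to make porting easier it is best to avoid lodsb-style instructions and access memory directly with base-plus-index addressing.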

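Next, function calls. The System V AMD64 ABI (the Unix 64-bit ABI) passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8, r9. With fewer than six, use as many as you need, in that order. With more than six, the first six go into registers in that order and the rest are pushed onto the stack from last to first. If the function is variadic (printf, for example), al must hold the number of vector registers used for arguments, so setting rax=0 is correct when no floating-point arguments are passed. Then call the function with the call instruction; rsp must be 16-byte aligned at that point. If you passed more than six arguments, fix up the stack after the function returns: move the stack pointer back by 8 bytes times the number of arguments you pushed. Note that the Windows x64 ABI is different again: it passes the first four arguments in rcx, rdx, r8, r9 and requires the caller to reserve 32 bytes of shadow space!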
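The register-then-stack rule above can be modeled in a few lines of C. This is only a sketch: the helper names sysv_int_arg_location and sysv_stack_cleanup_bytes are made up for illustration, and it covers integer and pointer arguments only (floating-point arguments travel in xmm registers and are ignored here).

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only: they model where the
 * System V AMD64 ABI places integer/pointer arguments. */

/* Register order for the first six integer or pointer arguments. */
static const char *const sysv_int_regs[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Returns the register holding argument `index` (0-based),
 * or "stack" if that argument is passed on the stack. */
const char *sysv_int_arg_location(size_t index)
{
    return index < 6 ? sysv_int_regs[index] : "stack";
}

/* Bytes the caller must add back to rsp after the call, assuming it
 * pushed every argument beyond the sixth as one 8-byte slot. */
size_t sysv_stack_cleanup_bytes(size_t nargs)
{
    return nargs > 6 ? (nargs - 6) * 8 : 0;
}
```

For a call with eight integer arguments, sysv_int_arg_location(3) gives "rcx", the last two arguments go on the stack, and sysv_stack_cleanup_bytes(8) gives 16.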

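Also, 64-bit mode cannot push a 32-bit register directly, so sorry, your push eax no longer works; use push rax and pop rax. Adjusting the stack pointer (esp/rsp) directly, however, is an approach that assembles for both 32-bit and 64-bit targets without trouble. And when you need to push several values in a row (for a function call, say), subtracting from esp/rsp once and then storing the arguments with base-plus-offset addressing is often faster than pushing them one by one! GCC makes its calls this way, so hand-written assembly does not necessarily beat gcc: if you are not careful, a C program compiled by GCC will run faster than one written in assembly. I generally use C for real projects, but NASM lets me understand things more deeply, no argument about that!!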

Functions you implement yourself can still use the old C-call style with arguments on the stack, as follows:

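Function:
; after enter: [rbp] = saved rbp, [rbp+8] = return address
%define param1   rbp+16
%define param2   rbp+24
%define param3   rbp+32
; same as: push rbp / mov rbp,rsp / sub rsp,16
enter 16,0
%define local1   rbp-8
%define local2   rbp-16
;.....
; same as: mov rsp,rbp / pop rbp
leave
ret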

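Finally, the problem that troubled me during the port: C function return values. In 64-bit code a C function returns its value in rax, not in edx:eax, but a function declared to return int (32 bits) only sets eax. Most functions cause no trouble; the problem shows up when one returns -1. eax is -1, but rax is not -1: typically the high 32 bits are all 0 and the low 32 bits are all 1. So compare against eax, or sign-extend eax into rax (cdqe, or movsxd rax,eax) before testing the full register.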
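The -1 problem can be reproduced in plain C, without any assembly. The sketch below models the full 64-bit register after a function returning int has written -1 to its low half. The function names are hypothetical, and the zero upper half reflects what writing a 32-bit register does on x86-64, not something the ABI promises to the caller.

```c
#include <stdint.h>

/* Models rax after an instruction such as `mov eax, -1`: writing a
 * 32-bit register clears the upper 32 bits of the 64-bit register. */
uint64_t rax_after_writing_eax(int32_t value)
{
    return (uint64_t)(uint32_t)value;
}

/* Models `movsxd rax, eax` (or `cdqe`): sign-extend the low 32 bits.
 * The uint32_t -> int32_t conversion assumes the usual two's
 * complement wraparound that mainstream compilers provide. */
int64_t sign_extend_eax(uint64_t rax)
{
    return (int64_t)(int32_t)(uint32_t)rax;
}
```

Here rax_after_writing_eax(-1) is 0x00000000FFFFFFFF, which is 4294967295 rather than -1 when read as a 64-bit signed value, while sign_extend_eax applied to that result gives -1 again.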

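I'm short on time right now; I'll write another post to discuss this in detail.

Before finishing, here is an excerpt from a reference document on C calling conventions.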

==========================================

Interfacing HLL code with asm

C calling convention – standard stack frame

Arguments passed to a C function are pushed onto the stack, right to left, before the function is called. The first thing the called function does is push the (E)BP register, then copy (E)SP into it. This creates a data structure called the standard C stack frame.

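Create standard stack frame, allocate 16 bytes for local variables, save registers

32-bit code:
    push ebp
    mov ebp,esp
    sub esp,16
    push edi
    push esi

16-bit code, TINY, SMALL, or COMPACT memory models:
    push bp
    mov bp,sp
    sub sp,16
    push di
    push si

16-bit code, MEDIUM, LARGE, or HUGE memory models:
    push bp
    mov bp,sp
    sub sp,16
    push di
    push si

Restore registers, destroy stack frame, and return

32-bit code:
    pop esi
    pop edi
    mov esp,ebp
    pop ebp
    ret

16-bit code, TINY, SMALL, or COMPACT memory models:
    pop si
    pop di
    mov sp,bp
    pop bp
    ret

16-bit code, MEDIUM, LARGE, or HUGE memory models:
    pop si
    pop di
    mov sp,bp
    pop bp
    retf

Size of ‘slots’ in stack frame, i.e. stack width

32-bit code: 32 bits
16-bit code, TINY, SMALL, or COMPACT memory models: 16 bits
16-bit code, MEDIUM, LARGE, or HUGE memory models: 16 bits

Location of stack frame ‘slots’

32-bit code: [ebp + 8], [ebp + 12], [ebp + 16]…
16-bit code, TINY, SMALL, or COMPACT memory models: [bp + 4], [bp + 6], [bp + 8]…
16-bit code, MEDIUM, LARGE, or HUGE memory models: [bp + 6], [bp + 8], [bp + 10]…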

If an argument passed to a function is wider than the stack, it will occupy more than one ‘slot’ in the stack frame. A 64-bit value passed to a function (long long or double) will occupy 2 stack slots in 32-bit code or 4 stack slots in 16-bit code.

Function arguments are accessed with positive offsets from the BP or EBP registers. Local variables are accessed with negative offsets. The previous value of BP or EBP is stored at [bp + 0] or [ebp + 0]. The return address (IP or EIP) is stored at [bp + 2] or [ebp + 4].
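The slot and offset rules above come down to a little arithmetic. A minimal sketch follows, with hypothetical helper names. It assumes return_address_bytes is 4 for 32-bit code, 2 for 16-bit near calls, and 4 for 16-bit far calls (segment plus offset), which matches the slot locations listed earlier.

```c
#include <stddef.h>

/* Number of stack slots an argument occupies: its size rounded up
 * to a whole number of slots. */
size_t stack_slots(size_t arg_bytes, size_t slot_bytes)
{
    return (arg_bytes + slot_bytes - 1) / slot_bytes;
}

/* Offset from (E)BP of the slot numbered `slot_index` (0-based).
 * The saved (E)BP, one slot wide, sits at offset 0 and is followed
 * by the return address; the argument slots come after that. */
size_t slot_offset(size_t slot_index, size_t slot_bytes,
                   size_t return_address_bytes)
{
    return slot_bytes + return_address_bytes + slot_index * slot_bytes;
}
```

stack_slots(8, 4) is 2 and stack_slots(8, 2) is 4, as stated for a 64-bit argument, and slot_offset(1, 4, 4) is 12, i.e. [ebp + 12].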

C calling convention – return values

A C function usually stores its return value in one or more registers.

Return value size: 32-bit code / 16-bit code, all memory models
8-bit return value: AL / AL
16-bit return value: AX / AX
32-bit return value: EAX / DX:AX
64-bit return value: EDX:EAX / space for the return value is allocated on the stack of the calling function, and a ‘hidden’ pointer to this space is passed to the called function
128-bit return value: hidden pointer / hidden pointer

C calling convention – saving registers

GCC expects functions to preserve the callee-save registers:

EBX, EDI, ESI, EBP, DS, ES, SS

You need not save these registers:

EAX, ECX, EDX, FS, GS, EFLAGS, floating point registers

In some OSes, FS or GS may be used as a pointer to thread local storage (TLS), and must be saved if you modify it.

C calling convention – leading underscores

Some C compilers (those for DOS and Windows, and those with COFF output) prepend an underscore to the names of C functions and global variables. If a C global variable, e.g. conv_mem_size, is accessed by asm code, it should be declared with a leading underscore in the asm code:

EXTERN _conv_mem_size      ; NASM syntax

mov [_conv_mem_size],ax

Linux ELF does NOT use underscores. Watcom C uses trailing underscores for function names, and leading underscores for global variables.

If your GCC supports it, leading underscores can be turned off with the compiler option -fno-leading-underscore

Pascal calling conventions

Function arguments are pushed onto the stack from left to right before the function is called. C-style variable-length argument lists are not possible in Pascal. (Look in file STDARG.H and think about it.)

In C, the calling function must ‘clean up the stack’ (remove function arguments from the stack after the called function returns). In Pascal, the called function must do this, before returning.

Pascal identifiers are case-insensitive. MyKewlProc() will be stored in the object code file as MYKEWLPROC

Other calling conventions

The __stdcall calling convention, used by Windows, is a hybrid of the C and Pascal calling conventions. Like C, function arguments are pushed right-to-left. Like Pascal, the called function must clean up the stack. Exception: the caller must clean up the stack for functions that accept a variable number of arguments, e.g. printf(const char *format, …);

Watcom C uses a register-based calling convention. See sections 7.4, 7.5, 10.4, and 10.5 in cuserguide.pdf in the Watcom documentation. Individual functions can be declared to use the normal, stack-based calling convention.

GCC can be made to use a register calling convention by compiling with gcc -mregparm=NNN …
See the GCC documentation for details.
