CPU Registers


Charred_Phoenix
06-09-2003, 01:00 AM
What is %eax for? Here is my current (rather limited) understanding of registers:

%esp - stack pointer, points to the last piece of data pushed onto the stack.

%ebp - frame pointer, used to reference local variables.

%eax - annoying register that pops up every so often to confuse me.

Strogian
06-09-2003, 09:53 AM
EAX, EBX, ECX, and EDX are all 32-bit general-purpose registers. I'm not really sure what you're asking, though. :) Basically, there are opcodes to put data into these registers, opcodes to take data out of them, opcodes to do arithmetic on them, and you can probably even use them for addressing. I've never actually programmed in assembly, but I've read about it. :D

Basically, they're just there for the programmer to do whatever he needs to do.

You do know that a CPU needs to pull data out of memory (into a register) in order to work on it, right?

Stuka
06-09-2003, 09:55 AM
In a Pentium-class processor, you have several registers:
ESP - stack pointer
EBP - frame pointer (usually)
EAX - general purpose register, usually holds return values for non-void functions
EBX - general purpose register, traditionally used as a base register for memory addressing
ECX - general purpose register, traditionally used as a loop counter
EDX - general purpose register, also holds the high half of MUL/DIV results
EIP - instruction pointer register, not directly readable or writable by the programmer
ESI - general purpose register, used in string instructions as the 'source index'
EDI - general purpose register, used in string instructions as the 'destination index'
CS - code segment register
DS - data segment register
ES - 'extra' segment register
SS - stack segment register

There are, IIRC, a couple of other segment registers (FS and GS, added with the 386). In 32-bit flat-model code you rarely touch any of the segment registers yourself; they matter mostly in 16-bit real-mode programming, and the OS often uses FS or GS to point at per-thread data. There are also MMX registers and FP registers, but I've never dealt with them.

infidel
06-09-2003, 10:09 AM
Greetings,

Intel x86 has 4 main general-purpose registers: EAX, EBX, ECX and EDX. EAX is used very often because many OSes expect the system call number to be passed in EAX (Linux's int 0x80 interface, for example). Also, in most C/C++ calling conventions on x86, when you declare a function like 'int function(...)', the int it returns comes back in the EAX register.
If you are interfacing assembly with C code, most compilers require that EBX, ESI, EDI and EBP be preserved across function calls: a function that uses them must restore their original values before it returns.
ECX is a general-purpose register like the others, except that instructions like 'loop' and 'rep movsb' take their iteration count from ECX.
EDX is mostly general purpose too, but it does have special cases: MUL and DIV use the EDX:EAX pair as a single 64-bit value, and IN/OUT can take the I/O port number in DX.
The other registers:
EBP - base/frame pointer;
EDI - Destination Index in the string operations (movsb, movsw, movsd, etc.), but it can be used as a general-purpose register most of the time;
ESI - Source index same as EDI, but it points to the source in the string operations;
ESP - Stack pointer;
EIP - Instruction pointer. Points to the next instruction the CPU will execute. It cannot be changed directly (actually it can, with jmp, call, int, ret, etc., but not with something like movl <some_val>, %eip).

Segment registers (16-bit):
CS - Code Segment. It holds a selector that picks a descriptor out of the GDT or LDT; the descriptor describes the segment of memory where your program is running, its privileges, etc. You can't load it with a plain mov; it changes through far jumps, far calls, returns, interrupts and the like.
DS - Data Segment;
ES - Extra segment; also the destination segment for the string instructions (ES:EDI);
FS - Another extra segment register (386 and later);
GS - Same as FS;
SS - Stack segment;

In 32bit flat protected mode you generally don't need or sometimes can't manipulate the segment registers.
This is just off the top of my head; I used to do a lot of assembly programming. For further and more accurate information, check: On-line Intel Documentation (http://x86.ddj.com/intel.doc/inteldocs.htm).
Hope this helps.

Charred_Phoenix
06-10-2003, 02:38 AM
I kind of thought that it might be a general-purpose type thing, but I wondered why the stack wasn't used instead. Thanks a lot :)

bwkaz
06-10-2003, 09:35 PM
Because your stack resides in memory, and the 8 registers (E[A-D]X, E[SB]P, and E[SD]I) are right on the CPU. It's MUCH, MUCH faster to access a register than it is to go out even to the L1 cache, let alone all the way past L1, L2, and even (if you have it) L3, into system memory. Sure, memory's quicker than disk, but it's quite a bit slower than cache, which is slower than registers.

If your program has a very tiny working set that will fit inside 6 registers (E[A-D]X and E[SD]I), then it will run quite a bit faster with the data in there than it would with only a couple of pieces in registers at a time and the rest in memory.

dchidelf
06-10-2003, 10:59 PM
bwkaz's explanation is very good.
Even aside from the speed difference between registers and memory, you need general purpose registers because you cannot do anything with data in memory. You need to move it to a register before you can add, subtract, compare or do anything useful.

retoon
06-10-2003, 11:30 PM
Hey, I read somewhere that the opteron is supposed to introduce more registers. A total of 15 registers I believe. Any word on what those extra registers are geared towards?

Charred_Phoenix
06-11-2003, 03:47 AM
Thanks again.

The only thing I wonder about though, is with thousands of functions constantly returning values, isn't there some conflict regarding %eax?

infidel
06-11-2003, 06:10 AM
Greetings,

There are no conflicts regarding EAX or any other register. The compiler takes care of that. If you're doing the assembly programming then it's up to you to save or use the value returned (or any other register) according to your needs before calling another function.
BTW it is perfectly possible and sometimes necessary to perform the calculations directly in memory, you can use instructions like 'add dword [<some_address>], <some_value>', 'sub' (subtract), 'inc' (increase), 'dec' (decrease), 'cmp' (compare) directly on memory operands.

Strogian
06-11-2003, 11:53 AM
Originally posted by infidel
Greetings,

There are no conflicts regarding EAX or any other register. The compiler takes care of that. If you're doing the assembly programming then it's up to you to save or use the value returned (or any other register) according to your needs before calling another function.
BTW it is perfectly possible and sometimes necessary to perform the calculations directly in memory, you can use instructions like 'add dword [<some_address>], <some_value>', 'sub' (subtract), 'inc' (increase), 'dec' (decrease), 'cmp' (compare) directly on memory operands.

Ah yes, because the x86 is a CISC processor, right? From what I understand though, it actually translates those kinds of instructions into several RISC-type instructions, and ends up performing the calculation in a register anyway. I'm not too sure about the specifics -- that's just what I heard. :)

dchidelf
06-11-2003, 04:15 PM
It would require very fancy RAM for in-memory arithmetic to take place. Memory provides store and retrieve functions (sometimes checksum or parity functions), but no arithmetic functions as far as I know. Performing an add, sub, inc, etc. on a memory location would need to move the value into the CPU (perhaps into an internal register not accessible to the user), perform the operation, and move the result back. Regardless of the inner workings, we still need to move the data into the CPU to process it.

If anyone knows of a way in-memory arithmetic is possible, please post.

dchidelf
06-11-2003, 08:15 PM
As an experiment I decided to try to determine what the difference between "in-memory" arithmetic operations and register based operations was.

This is the initial code:

#include <stdio.h>

int main(void)
{
    int a;
    a = 1;
    fprintf(stderr, "%d\n", a);  /* prints 1 */
    ++a;                         /* the statement under study */
    fprintf(stderr, "%d\n", a);  /* prints 2 */
    return 0;                    /* main returns an int */
}


The preincrement (++a) results in the following assembly once compiled:

incl -4(%ebp)

An "in-memory" operation!

Assembly won't be low-level enough for this experiment, so I'll look at machine code...
On my FreeBSD system the machine code generated from this assembly statement: 0xFF,0x45,0xFC

Instead of using "in-memory" operations, I change the assembly to perform the operation in a register:

movl -4(%ebp), %eax
incl %eax


The resulting machine code for just the inc %eax command is: 0x40
(the machine code for the mov is: 0x8B,0x45,0xFC)

I can't read machine code, but maybe 0xFF,0x45,0xFC is shorthand for move to %eax and increment...?

Let's try one more thing...
Going back to the original compiler-generated assembly, let's check the value in %eax after the "in-memory" operation:
Original assembly

...
movl $1,-4(%ebp)
addl $-4,%esp
movl -4(%ebp),%eax
pushl %eax
pushl $.LC0
pushl $__sF+176
call fprintf # ...
addl $16,%esp # clean up after last fprintf
incl -4(%ebp) # increment a
addl $-4,%esp # skip 4 bytes on stack???
movl -4(%ebp),%eax # mov a to %eax
pushl %eax # push 3rd arg
pushl $.LC0 # push 2nd arg
pushl $__sF+176 # push 1st arg
call fprintf # call fprintf
...

We will make these changes

...
movl $1,-4(%ebp)
addl $-4,%esp
movl -4(%ebp),%eax
pushl %eax
pushl $.LC0
pushl $__sF+176
call fprintf # ...
addl $16,%esp # clean up after last fprintf
xorl %eax,%eax ##### Make sure %eax is 0
incl -4(%ebp) # increment a
addl $-4,%esp # skip 4 bytes on stack???
#movl -4(%ebp),%eax ##### SKIP moving a to %eax after inc
pushl %eax # push 3rd arg
pushl $.LC0 # push 2nd arg
pushl $__sF+176 # push 1st arg
call fprintf # call fprintf
...

Now we'll compile and run it...
Hmm... that's a 'no go'. %eax is still 0 after the inc.
In fact, using gdb's 'info reg' none of the regular registers change after the inc command.

This is probably good.
If you expect your registers to stay intact, the "in-memory" operations shouldn't touch any of the user registers.
It possibly uses some non-user-visible registers... or perhaps magic math gnomes in the CPU.

If anyone has a definitive answer, please let us know.

Strogian
06-11-2003, 09:47 PM
Well, here's a page I saw a while ago when reading about this stuff:
http://infocom.cqu.edu.au/Units/win2000/85349/Assessment/Past_Assignments/jonatha1/

It's not really technical, but it has pictures. :) What is the MBR there? That could probably be used, if it actually represents a register on an x86.

Hmm.. Are there any operations that work on two memory addresses? Maybe you should try something with that.

bwkaz
06-11-2003, 10:24 PM
Originally posted by retoon
Hey, I read somewhere that the opteron is supposed to introduce more registers. A total of 15 registers I believe. Any word on what those extra registers are geared towards?

Close. I wish I remembered where I saw the pretty pictures of the Opteron, but in 64-bit mode the registers are twice as wide (64 bits), and there are twice as many of them: 16 general-purpose registers (the old eight, widened and renamed RAX, RBX, and so on, plus the new R8 through R15). All the extras are general purpose. Whether code runs in 64-bit mode or in compatibility mode is decided by its code segment, not by a flag on the page. In 64-bit mode RBP is free for general use (just as EBP always was if you didn't need a frame pointer), but RSP is still the stack pointer. For the full details, look in AMD's AMD64 programmer's manual.

Originally posted by Strogian
What is the MBR there? That could probably be used, if it actually represents a register on an x86.

I don't believe it does. This register is very likely internal (it's probably one of the pipeline registers, actually, though I don't know for sure).

Hmm.. Are there any operations that work on two memory addresses? Maybe you should try something with that.

I believe there are, but I'd have to look at my IA32 assembly reference... hang on a minute.

Well... sort of. The rep movsd instruction (actually, rep is a prefix, but whatever) copies the block of memory pointed to by DS:%ESI to the block of memory pointed to by ES:%EDI, a doubleword at a time (doublewords are 32 bits long). The count of doublewords is stored in %ECX. This is a single x86 instruction, it gets compiled into two bytes (the rep prefix is one byte -- 0xF3, and the movsd is another byte -- 0xA5). A single movsd, without the prefix, will copy a single doubleword.

Originally posted by dchidelf
As an experiment I decided to try to determine what the difference between "in-memory" arithmetic operations and register based operations was.

This would be better done with an IA32 instruction set reference, like the one that shipped with Borland Turbo Assembler 2 (but I think Intel still provides a PDF version on their developer site somewhere). But let's see what you did. :)

The preincrement (++a) results in the following assembly once compiled:
incl -4(%ebp)
An "in-memory" operation!

Yeah, it should do that. The IA32 instruction set supports it. Duh... :p

Assembly won't be low-level enough for this experiment, so I'll look at machine code...

Huh? Assembly and machine code are EXACTLY the same thing. There is a one-to-one conversion between the AT&T-syntax assembly opcodes that gcc produces, and the final resulting machine code. It's just that assembly is easier for humans to understand than pure numbers.

On my FreeBSD system the machine code generated from this assembly statement: 0xFF,0x45,0xFC

According to my Turbo Assembler book, those are indeed the machine code bytes for an "inc" instruction that references memory. The memory form of "inc" has an initial opcode byte of 0xFF. The %ebp-relative operand turned into the 0x45, which is the ModR/M byte. Its bit-wise breakup is as follows: 01 for the Mod field, which selects register-plus-8-bit-displacement mode, then 000 for the /0 opcode extension (0xFF is shared by several instructions, and /0 is the one that means INC; /6 would be PUSH), then 101 for the %EBP+displacement form. So the byte is 01000101, which is exactly 0x45. The operand size isn't in this byte at all: a 16-bit inc on a short would add a 0x66 prefix, and an 8-bit inc on a char uses opcode 0xFE instead of 0xFF.

The displacement is the next byte, 0xFC, which is (surprise surprise) negative 4 when you interpret it as a signed 8-bit integer.

Instead of using "in-memory" operations, I change the assembly to perform the operation in a register:

movl -4(%ebp), %eax
incl %eax


The resulting machine code for just the inc %eax command is: 0x40
(the machine code for the mov is: 0x8B,0x45,0xFC)

Yes. The "inc %eax" turns into a single-byte opcode, 0x40 plus 3 bits for the register to increment (eax is 0).

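The movl is encoded a lot like the incl from before, except with a different opcode (0x8B instead of 0xFF). Here the middle three bits of the 0x45 name the destination register (000 is %eax) rather than an opcode extension, which is why that byte comes out the same.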

I can't read machine code, but maybe 0xFF,0x45,0xFC is shorthand for move to %eax and increment...?

No, it's not. It's machine code for "increment the memory pointed to by %ebp-4".

Hmm... that's a 'no go'. %eax is still 0 after the inc.
In fact, using gdb's 'info reg' none of the regular registers change after the inc command.

This is probably good.
If you expect your registers to stay intact, the "in-memory" operations shouldn't touch any of the user registers.
It possibly uses some non-user-visible registers... or perhaps magic math gnomes in the CPU.

It's exactly that. The RISC-like core in modern x86 processors (Pentium Pro and later) uses registers that you as a programmer don't have access to. In the original 386, which had no RISC core, the increment actually did happen in memory. No registers were touched.

Strogian
06-11-2003, 11:18 PM
Originally posted by bwkaz
In the original 386, which had no RISC core, the increment actually did happen in memory. No registers were touched.

Hmm. Does that mean there were more wires going from the CPU to the memory in the 386? :) Or were they the same wires, just connected to more parts in the CPU? Or should I just shut up, since I obviously have no idea what I'm talking about? :D

dchidelf
06-12-2003, 06:27 PM
Originally posted by dchidelf
As an experiment I decided to try to determine what the difference between "in-memory" arithmetic operations and register based operations was.

This would be better done with an IA32 instruction set reference, like the one that shipped with Borland Turbo Assembler 2 (but I think Intel still provides a PDF version on their developer site somewhere).

Where's the fun in that?


The preincrement (++a) results in the following assembly once compiled:

incl -4(%ebp)

An "in-memory" operation!

Yeah, it should do that. The IA32 instruction set supports it. Duh...

This was a "Yeah! we don't have to make an in-memory operation ourselves!" not a "That's not possible!?"


Assembly won't be low-level enough for this experiment, so I'll look at machine code...

Huh? Assembly and machine code are EXACTLY the same thing. There is a one-to-one conversion between the AT&T-syntax assembly opcodes that gcc produces, and the final resulting machine code. It's just that assembly is easier for humans to understand than pure numbers.

I'm not sure where my confusion came from here. Probably from discussions of MMX instructions performing multiple operations.

Thanks for your post.

bwkaz
06-12-2003, 10:31 PM
Originally posted by Strogian
Hmm. Does that mean there were more wires going from the CPU to the memory in the 386? :) Or were they the same wires, just connected to more parts in the CPU? Or should I just shut up, since I obviously have no idea what I'm talking about? :D

Nope, there were the same number of wires (actually, there were fewer -- the 386, 486, and Pentium had 32 address lines, while the PPro and higher have 36), but this operation took up two memory cycles. It was implemented just like "load to register, increment register, store back to memory" is implemented in a RISC processor, but it was less for the programmer to do (and the code to make it happen was 4 bytes smaller).

Of course, at that time, memory ran at pretty much the same speed as the CPU (slow ;)), so it took 2 CPU clocks too. I think there's one more clock cycle used for decoding the instruction, or something like that, too.

Originally posted by dchidelf
Where's the fun in that? :p

This was a "Yeah! we don't have to make an in-memory operation ourselves!" not a "That's not possible!?"

Oh, sorry. Misunderstood you there then. :)

I'm not sure where my confusion came from here. Probably from discussions of MMX instructions performing multiple operations.

Thanks for your post.

Ah, OK. Happy to help, and to finally use some of that crusty 386 assembly knowledge. ;)