Assembly Language Primer
Before exploring the world of malware and Trojans, one has to learn what assembly language is. In this tutorial, I will try to explain what assembly language is, where it comes from and how it is used.
According to Wikipedia,
"An assembly language is a low-level language used in the writing of computer programs. Assembly language uses mnemonics, abbreviations or words that make it easier to remember a complex instruction and make programming in assembler an easier task."
Therefore, in order to understand Assembly Language we will first have to learn more about these mnemonics or "Processor Instructions".
Processor Instructions
All computer processors operate on binary codes, which the manufacturer defines internally in the processor chip. These codes (instructions) specify what functions the processor should perform and how the data is to be used.
Different processors have different types of instruction codes, but they generally handle these instruction codes in a similar fashion.

- A processor reads instruction codes that are stored in memory. Each instruction code can contain one or more bytes of information telling the processor what to do. Along with each instruction code, the processor also reads or writes the data that the instruction requires in memory.
- To differentiate between instructions and data when accessing memory, the processor uses “pointers”:
  - Instruction Pointer – IP (EIP on IA-32) keeps track of the instruction code being run and the next instruction code to run.
  - Data Pointer – the stack pointer (ESP on IA-32) helps the processor keep track of where the data area in memory starts. This area is generally referred to as the stack. When data is pushed onto the stack, the pointer moves down to a lower address, and when data is popped off the stack, it moves back up.
- After the processor completes an instruction code, it reads the next one from memory as pointed to by IP.
Note: The stack is a region at the end of the memory range that the processor reserves for the application. The processor then uses the “Stack Pointer” to point to this memory location when pushing data onto it or popping data off it.
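The steps above can be sketched as a small simulation. This is only an illustration: the three opcodes (`HALT`, `PUSH`, `ADD`), their byte values and the 32-byte memory are invented for this example and do not correspond to any real processor.

```python
# Hypothetical toy instruction set (not IA-32): byte values are made up.
HALT = 0x00   # stop execution
PUSH = 0x01   # followed by one data byte that is pushed onto the stack
ADD = 0x02    # pop two values, push their sum

def run(program: bytes, mem_size: int = 32):
    """Run a toy program and return (top of stack, final IP, final SP)."""
    memory = bytearray(mem_size)
    memory[:len(program)] = program   # instruction codes live at the start of memory
    ip = 0                            # instruction pointer: next instruction to read
    sp = mem_size                     # stack pointer: stack sits at the end of memory

    while True:
        opcode = memory[ip]           # fetch the instruction code pointed to by IP
        if opcode == HALT:
            break
        elif opcode == PUSH:
            value = memory[ip + 1]    # the data byte is part of the instruction
            sp -= 1                   # pushing moves the stack pointer down
            memory[sp] = value
            ip += 2                   # this instruction was two bytes long
        elif opcode == ADD:
            a = memory[sp]
            sp += 1                   # popping moves the stack pointer up
            b = memory[sp]
            sp += 1
            sp -= 1
            memory[sp] = (a + b) & 0xFF
            ip += 1
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {ip}")

    return memory[sp], ip, sp

result = run(bytes([PUSH, 2, PUSH, 3, ADD, HALT]))
print(result)  # (5, 5, 31)
```

The program pushes 2 and 3, adds them, and halts with the sum on top of the stack, while IP has advanced past each instruction in turn.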
Instruction Code
Every instruction contains an Opcode (OP), which defines the function the processor has to perform. Each processor family may have a different set of Opcodes available. An instruction code in the IA-32 family consists of four main parts:

- Optional Instruction Prefix (0-4 bytes) – This contains prefixes that can modify the Opcode behavior. These prefixes are:
  - Lock and Repeat – The lock prefix indicates that any shared memory areas will be used exclusively by the instruction; the repeat prefix indicates that a string instruction is to be repeated.
  - Segment Override and Branch Hint – The segment override prefix overrides the segment register that the instruction would use by default, and the branch hint prefix gives the processor a hint as to the most likely path the program will take at a conditional jump.
  - Operand Size Override – Informs the processor that the instruction switches between 16-bit and 32-bit operand sizes.
  - Address Size Override – Informs the processor that the instruction switches between 16-bit and 32-bit memory addresses.
- Opcode (1-3 bytes) – The Opcode uniquely defines the function that the processor has to perform.
- Optional Modifier (0-6 bytes) – Additional modifiers that define which registers and memory locations are involved.
  - ModR/M – Defines the register or addressing mode used in the instruction.
  - SIB (Scale-Index-Base) – Specifies the scale factor applied to the index register, the index register for the memory access and the register that is used as the base register for the memory access.
  - Displacement – Indicates the offset to the memory location defined by the ModR/M and SIB bytes.
- Optional Data Element (0-4 bytes) – The data element that is used by the function. Some instruction codes include the data themselves rather than reading data values from memory.
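To make the layout above more concrete, here is a minimal sketch that peels the prefix bytes off an instruction and splits a ModR/M byte into its fields. The helper names (`split_prefixes`, `decode_modrm`) are made up for this example, the prefix table lists only a few of the IA-32 prefix bytes, and the sketch does not attempt to decode SIB, displacement or data bytes.

```python
# A few IA-32 prefix bytes (not a complete list).
PREFIXES = {
    0xF0: "lock",
    0xF2: "repne",
    0xF3: "rep",
    0x2E: "cs override / branch not taken",
    0x3E: "ds override / branch taken",
    0x66: "operand-size override",
    0x67: "address-size override",
}

# 32-bit general-purpose registers in their encoding order (0 through 7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_prefixes(code: bytes):
    """Return (list of prefix names, remaining bytes) for one instruction."""
    names = []
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        names.append(PREFIXES[code[i]])
        i += 1
    return names, code[i:]

def decode_modrm(byte: int):
    """Split a ModR/M byte into mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0)."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

# F3 A4 is "rep movsb": one repeat prefix followed by the one-byte opcode A4.
rep_prefixes, rep_rest = split_prefixes(bytes([0xF3, 0xA4]))

# 89 E5 is "mov %esp, %ebp": opcode 89 followed by the ModR/M byte E5.
mov_prefixes, mov_rest = split_prefixes(bytes([0x89, 0xE5]))
mod, reg, rm = decode_modrm(mov_rest[1])

print(rep_prefixes, rep_rest.hex())
print(mod, REG32[reg], REG32[rm])
```

In the second example, a mod value of 3 means that both operands are registers, so reg and r/m simply name the source and destination registers.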
The core of assembly language is understanding the Opcode mnemonics and how they are used to create programs. However, we don’t have to understand and remember the byte codes used for each Opcode. Assembly language is generally written in a format such as "PUSH %ebp", and an assembler converts these mnemonics to byte code such as "55".
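As a rough illustration of that conversion, the sketch below handles only the one-byte `PUSH` and `POP` forms for 32-bit registers, where the byte code is a base Opcode plus the register number. The `assemble` function is a hypothetical helper written for this example, not part of any real assembler.

```python
# Register numbers used in IA-32 instruction encodings.
REGISTERS = {
    "eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
    "esp": 4, "ebp": 5, "esi": 6, "edi": 7,
}

# Base Opcodes for the one-byte register forms: the register number is added.
BASE_OPCODES = {"push": 0x50, "pop": 0x58}

def assemble(line: str) -> bytes:
    """Convert a line such as 'PUSH %ebp' into its one-byte encoding."""
    mnemonic, operand = line.split()
    register = operand.lstrip("%").lower()
    return bytes([BASE_OPCODES[mnemonic.lower()] + REGISTERS[register]])

print(assemble("PUSH %ebp").hex())  # 55
print(assemble("POP %ebp").hex())   # 5d
```

A real assembler has to cover many more mnemonics, operand types and addressing modes, but the principle is the same: a readable mnemonic goes in and byte code comes out.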
Some Opcodes may be valid only on certain processors; more details about the Opcodes for your chipset can be found here (if you are running an Intel processor).