Instruction Decoding by Sample of the x86 Architecture


Users and programmers are under the illusion that they communicate with the computer through high-level languages or assembly languages. In either case, the communication is through symbols. The target system does not understand these symbols; it interprets only bits, the binary symbols 0 and 1. It can, however, interpret strings of bits of various lengths, and these strings result from mapping (encoding) the symbols.


This section discusses instructions, their representation, encoding and decoding, and the length of instructions and data.

Synopsis

·        Motivation

·        Definitions

·        Instruction Encoding and Decoding

·        Sample Assembler Source Program

·        Listing of Sample Assembler Program

·        Exercises

·        Literature References

Motivation

·        Target understands (interprets) solely bits

·        Humans express themselves in symbols, not bits

·        Communicating through bit strings is tedious, time-consuming, even sickening

·        Abstract symbols must be encoded as bit strings

·        Bit strings must be decoded by target hardware

·        After decoding bit strings, target can execute



Definitions

Assembler:

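Software tool that maps abstract assembler text into binary code. Some addresses cannot be resolved by the assembler alone, for example references to other modules in a multi-module program, or segment addresses on a relocatable target; the assembler marks these as relocatable and leaves them to the linker and loader.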

Binary Code:

String of bits which, when interpreted according to conventions, results in executable programs and/or corresponding data.

CISC:

Complex Instruction Set Computer: an architecture whose instructions vary in length, depending on the number and type of operands. Generally, such an architecture must decode the string of bits constituting an instruction in discrete steps; the result of one partial decode defines the next decoding (and interpretation) step. This generally lengthens the perceived instruction execution time.

Decoding:

Process of breaking strings of bits into substrings according to position and value. These substrings can then be interpreted.

Encoding:

Process of mapping symbols into strings of bits according to rules and conventions.

Linker:

Software tool that combines one or more assembler outputs (object modules) into one binary code object. Typically, a linker resolves external names, includes the library modules the program assumed to exist, and passes to the loader the remaining information about address locations that were not resolvable at link time.

Opcode:

Bits in an instruction that identify its meaning and purpose, and often also the kind or number of its operands.

RISC:

Reduced Instruction Set Computer: an architecture in which all instructions have the same length, typically 32 bits. The time to execute any one instruction is generally a single unit (cycle). Most operations are implemented directly in hardware, not in microcode. Instruction length does not need to be decoded; it is known a priori.
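To make the Encoding and Opcode definitions concrete, the following minimal C sketch encodes two 8086 instruction forms that also appear in the sample listing later in this section: mov r16, imm16, whose opcode byte is B8 plus the 3-bit register number, followed by a 16-bit immediate; and mov r8, imm8, whose opcode byte is B0 plus the register number, followed by one byte. The enum and function names are hypothetical illustrations, not part of any assembler. Note that the register operand is folded into the opcode byte itself, and that the immediate is stored low byte first: the listing prints it as the word 0064, but memory holds the bytes 64 00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8086 register numbers as used in opcode and ModRM fields. */
enum reg16 { AX = 0, CX, DX, BX, SP, BP, SI, DI };
enum reg8  { AL = 0, CL, DL, BL, AH, CH, DH, BH };

/* Encode "mov r16, imm16": opcode B8+reg, then imm16 little-endian.
   Writes 3 bytes to out and returns the instruction length. */
size_t encode_mov_r16_imm16(enum reg16 r, uint16_t imm, uint8_t out[3])
{
    assert((unsigned)r < 8);
    out[0] = (uint8_t)(0xB8 + r);     /* register folded into opcode */
    out[1] = (uint8_t)(imm & 0xFF);   /* low byte at lower address */
    out[2] = (uint8_t)(imm >> 8);     /* high byte next */
    return 3;
}

/* Encode "mov r8, imm8": opcode B0+reg, then the 8-bit immediate. */
size_t encode_mov_r8_imm8(enum reg8 r, uint8_t imm, uint8_t out[2])
{
    assert((unsigned)r < 8);
    out[0] = (uint8_t)(0xB0 + r);
    out[1] = imm;
    return 2;
}
```

For example, encode_mov_r16_imm16(CX, 100, buf) yields B9 64 00, and encode_mov_r8_imm8(AH, 0x4C, buf) yields B4 4C, matching the listing lines for mov cx, 100 and mov ah, 4ch.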


Instruction Encoding and Decoding


·        Assembler and compiler encode instructions and (some) data

·        Hardware (processor decoding unit) and disassembler decode binary code

·        After the decode, the hardware can execute instructions

·        Detail of decoding:

·        Find the first instruction of the complete program (its entry point)

·        Determine the instruction's length: easy on a RISC architecture; requires interpretation on a CISC architecture

·        Find the next instruction; equivalent to computing the next value of the instruction pointer (for sequential flow: current address plus length)

·        At each step, fetch sufficient bytes (in one or multiple steps) to decode the complete instruction with all of its operands

·        Fixed-field encoding reserves a predetermined number and location of bits for the opcode

·        For example, a CISC architecture could always dedicate the first (leftmost, lowest-addressed) byte of an instruction's 1 or more bytes to the opcode

·        Huffman encoding reserves an increasing number of bits for instructions with decreasing frequency of execution

·        Thus, Huffman encoding allows the most frequently executed (at run time) instruction to be identified by a single bit, say 1; all other opcodes may then not start with that bit value, i.e., they must start with 0

·        The other opcodes in Huffman encoding are distinguished by further bits, the rarest ones using the most bits

·        Length decoding is simple on a RISC architecture; the length is known a priori

·        Opcode identification on RISC is still possible either way, with fixed-field or with Huffman encoding
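The contrast between fixed-field and Huffman opcode encoding can be sketched in C. The four opcodes and their codes below are hypothetical, chosen only for illustration and not taken from any real instruction set: the most frequent, LOAD, gets the single bit 1, so every other code starts with 0; ADD gets 01, STORE 001, and the rarest, BRANCH, 000. The Huffman-style decoder walks the code one bit at a time from the most significant end of a 32-bit instruction word and reports how many bits the opcode used; the fixed-field decoder always takes the top two bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, listed from most to least frequent. */
enum op { OP_LOAD, OP_ADD, OP_STORE, OP_BRANCH };

/* Huffman-style prefix code: LOAD = 1, ADD = 01, STORE = 001,
   BRANCH = 000. No code is a prefix of another, so decoding is unique.
   Reads from the most significant bit of word; stores the opcode
   length in bits in *len. */
enum op decode_huffman(uint32_t word, unsigned *len)
{
    static const enum op first_one_at[3] = { OP_LOAD, OP_ADD, OP_STORE };
    assert(len != NULL);
    for (unsigned i = 0; i < 3; i++) {
        /* A 1 bit after i leading 0 bits ends the code. */
        if (word & (UINT32_C(0x80000000) >> i)) {
            *len = i + 1;
            return first_one_at[i];
        }
    }
    *len = 3;            /* three 0 bits: the rarest opcode */
    return OP_BRANCH;
}

/* Fixed-field alternative: the opcode always occupies the top 2 bits
   (00 LOAD, 01 ADD, 10 STORE, 11 BRANCH). */
enum op decode_fixed(uint32_t word, unsigned *len)
{
    assert(len != NULL);
    *len = 2;
    return (enum op)(word >> 30);
}
```

With the Huffman code, a LOAD leaves 31 bits for operands while a BRANCH leaves only 29; with the fixed field, every instruction leaves 30. Whether the variable scheme pays off depends on the actual run-time frequencies.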


Sample Assembler Source Program


·        The assembler source below is run through masm (the Microsoft Macro Assembler)

·        Masm generates an optional listing

·        The listing shows how data are encoded, with initial values, if any

·        The listing shows the addresses of data

·        It also shows how instructions are encoded

·        And it shows the addresses of instructions

·        The addresses of two adjacent instructions allow computation of the length of the prior instruction

·        The program below defines several simple integer data objects

·        Some integers (dw) are initialized

·        Others are intentionally left uninitialized (via ?)

·        Uninitialized scalars are still set to 0 by masm; since any value is allowed, 0 is permitted too

·        Arrays are defined using the dup pseudo-op

·        Arrays defined to be left uninitialized are not preset by masm; the listing shows their contents as ????

·        Note: if run, the program would not reach Done; w1 holds 0, so AX is 0 when div ax executes, and the division raises a divide error
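The size of each data definition follows directly from these rules: every dw item occupies 2 bytes, ? still reserves its word, and n dup( x ) repeats x n times, so nested dup multiplies the counts. The following minimal C sketch, assuming a sequential location counter with no padding (every item here is a word), computes the offsets of the data definitions in the program below. The table is transcribed by hand, and the names struct def and layout are hypothetical. The resulting offsets can be checked against the listing that follows the program.

```c
#include <assert.h>
#include <stddef.h>

/* One data definition: its name and how many 16-bit words it reserves.
   n dup( x ) reserves n times the words in x; nested dup multiplies. */
struct def { const char *name; unsigned words; };

/* Hand-transcribed from the .data segment of arith.asm. */
static const struct def data_defs[] = {
    { "d_first", 1 },          /* dw +0 */
    { "w0",      1 },          /* dw +109 */
    { "w1",      1 },          /* dw ?  (still reserves a word) */
    { "a1",      1 * 1 },      /* dw 1 dup( ? ) */
    { "w2",      1 },          /* dw -109 */
    { "a2",      5 * 1 },      /* dw 5 dup( 0ffffh ) */
    { "a3",      5 * 3 },      /* dw 5 dup( 3 dup( 0deadh ) ) */
    { "a4",      5 * 3 },      /* dw 5 dup( 0beefh, 0deedh, 0babeh ) */
};

#define NDEFS (sizeof data_defs / sizeof data_defs[0])

/* Mimic a location counter: each definition starts where the previous
   one ended. Fills offsets[] and returns the segment length in bytes. */
unsigned layout(unsigned offsets[NDEFS])
{
    unsigned loc = 0;
    assert(offsets != NULL);
    for (size_t i = 0; i < NDEFS; i++) {
        offsets[i] = loc;
        loc += 2u * data_defs[i].words;   /* dw: 2 bytes per word */
    }
    return loc;
}
```

For instance, a3 reserves 5 times 3 = 15 words, i.e. 30 bytes (1Eh), so a4 starts at 14h + 1Eh = 32h, and the segment ends at 50h, the _DATA length masm reports.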


; Source file:  arith.asm

; Author:       Herb Mayer

; Date:         2-8-1997

; Purpose:      arithmetic operations and listing

start     macro              ; no parameters

          mov  ax, @data     ; @data predefined macro

          mov  ds, ax        ; now data segment reg set

          endm               ; end macro start

    

Done      macro     ret_code  ; if all o.k.: ret_code will be 0

          mov  ah, 4ch       ; DOS routine to terminate

          mov  al, ret_code  ; we wanna terminate, ah + al

          int  21h           ; terminate finally

          endm               ; and macro Done  

 

          .model    small     ; assumes stack data code

          .stack    100h      ; assumes name: stack

          .data              ; assumes name: data

 

d_first   dw     +0

w0        dw   +109

w1        dw      ?

a1        dw      1 dup( ? )

w2        dw   -109

a2        dw      5 dup( 0ffffh )

a3        dw      5 dup( 3 dup( 0deadh ) )

a4        dw      5 dup( 0beefh, 0deedh, 0babeh )

 

          .code              ; assumes name: code

 

arith     proc               ; include a few arithmetic ops

          mov  ax, 100       ; literal into ax

          mov  cx, 100       ; literal into non ax

          mov  ax, w1        ; reloc into ax

          mov  cx, w1        ; reloc into non ax

          mul  cx

          imul cx

          neg  cx

          div  ax

          idiv ax

          ret                ; back to caller

arith     endp

    

main      proc

          start

          call arith

          Done 0

main      endp

 

          end  main          ; start here!
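The data definitions above mix signed decimal and hexadecimal initializers. Here is a minimal C sketch of how a word initializer such as +109 or -109 can be encoded: the value is reduced to 16-bit two's complement and stored little-endian, low byte first, as x86 expects. The function name encode_dw is a hypothetical illustration. The accepted range, -32768 through 65535, is an assumption: it lets a word hold either a signed or an unsigned 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one dw initializer as x86 stores it: 16-bit two's complement,
   little-endian (low byte at the lower address). */
void encode_dw(int32_t value, uint8_t out[2])
{
    /* Assumed accepted range: signed or unsigned 16-bit. */
    assert(value >= -32768 && value <= 65535);
    uint16_t w = (uint16_t)value;      /* conversion wraps modulo 2^16 */
    out[0] = (uint8_t)(w & 0xFFu);     /* low byte first */
    out[1] = (uint8_t)(w >> 8);        /* then high byte */
}
```

So w2, defined as -109, becomes the word FF93h, stored as the bytes 93 FF; similarly, 0deadh is stored as AD DE. A listing prints the word value, not the byte order in memory.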


Listing of Sample Assembler Program

 

     ; Source file:  arith.asm

     ; Author:       Herb Mayer

     ; Date:         2-8-1997

     ; Purpose:      arithmetic operations and listing

    

start     macro              ; no parameters

          mov  ax, @data     ; @data predefined macro

          mov  ds, ax        ; now data segment reg set

          endm               ; end macro start

         

Done      macro ret_code     ; if all o.k.: ret_code will be 0

          mov  ah, 4ch       ; DOS routine to terminate

          mov  al, ret_code  ; we wanna terminate, ah + al

          int  21h           ; terminate finally

          endm               ; and macro Done  

    

          .model    small     ; assumes stack data code

          .stack    100h      ; assumes name: stack

    

          .data              ; assumes name: data

 

 0000  0000   d_first   dw     +0

 0002  006D   w0        dw   +109

 0004  0000   w1        dw      ?

 0006  0001[  a1        dw      1 dup( ? )

        ???? ]

    

 0008  FF93   w2        dw   -109

 000A  0005[  a2        dw      5 dup( 0ffffh )

        FFFF ]

 

    

 0014  0005[  a3        dw      5 dup( 3 dup( 0deadh ) )

          0003[

               DEAD ] ] 

 

    

 0032  0005[  a4        dw      5 dup( 0beefh, 0deedh, 0babeh )

        BEEF           

        DEED           

        BABE ]

 


                        .code                   ; assumes name: code

    

 0000     arith     proc          ; include a few arithmetic ops

 0000  B8 0064     mov  ax, 100   ; literal into ax

 0003  B9 0064     mov  cx, 100   ; literal into non ax

 0006  A1 0004 R   mov  ax, w1    ; reloc into ax

 0009  8B 0E 0004 R mov  cx, w1    ; reloc into non ax

 000D  F7 E1       mul  cx

 000F  F7 E9       imul cx

 0011  F7 D9       neg  cx

 0013  F7 F0       div  ax

 0015  F7 F8       idiv ax

 0017  C3          ret           ; back to caller

 0018              arith endp

         

 0018         main proc

          start

 0018  B8 ---- R 1 mov  ax, @data ; @data predefined macro

 001B  8E D8     1 mov  ds, ax    ; now data segment reg set

    

 001D  E8 0000 R   call arith

         

          Done 0

 0020  B4 4C     1 mov  ah, 4ch   ; DOS routine to terminate

 0022  B0 00     1 mov  al, 0     ; we wanna terminate, ah + al

 0024  CD 21     1 int  21h       ; terminate finally

 0026         main endp

    

               end  main          ; start here!

 


 

Macros:

N a m e                           Lines

DONE . . . . . . . . . . . . . .         3

START  . . . . . . . . . . . . .         2

 

Segments and Groups:

 

N a m e                     Length    Align    Combine Class

 

DGROUP . . . . . . . . . . . . .      GROUP

  _DATA  . . . . . . . . . . . .      0050      WORD PUBLIC    'DATA'

  STACK  . . . . . . . . . . . .      0100      PARA STACK     'STACK'

_TEXT  . . . . . . . . . . . . .      0026      WORD PUBLIC    'CODE'

 

Symbols:           

 

N a m e                      Type     Value     Attr

 

A1 . . . . . . . . . . . . . . .      L WORD    0006 _DATA

A2 . . . . . . . . . . . . . . .      L WORD    000A _DATA Length = 0005

A3 . . . . . . . . . . . . . . .      L WORD    0014 _DATA Length = 0005

A4 . . . . . . . . . . . . . . .      L WORD    0032 _DATA Length = 0005

ARITH  . . . . . . . . . . . . .      N PROC    0000 _TEXT Length = 0018

 

D_FIRST  . . . . . . . . . . . .      L WORD    0000 _DATA

 

MAIN . . . . . . . . . . . . . .      N PROC    0018 _TEXT Length = 000E

 

W0 . . . . . . . . . . . . . . .      L WORD    0002 _DATA

W1 . . . . . . . . . . . . . . .      L WORD    0004 _DATA

W2 . . . . . . . . . . . . . . .      L WORD    0008 _DATA

 

@CODE  . . . . . . . . . . . . .      TEXT  _TEXT       

@CODESIZE  . . . . . . . . . . .      TEXT  0      

@CPU . . . . . . . . . . . . . .      TEXT  0101h       

@DATASIZE  . . . . . . . . . . .      TEXT  0      

@FILENAME  . . . . . . . . . . .      TEXT  arith       

@VERSION . . . . . . . . . . . .      TEXT  510    

 

 

     53 Source  Lines

     58 Total   Lines

     29 Symbols

 

  47700 + 404728 Bytes symbol space free

 

      0 Warning Errors

      0 Severe  Errors
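The listing makes the CISC decoding steps visible: each instruction address minus the previous one gives that instruction's length, and arith ends at 0018, the Length masm reports for ARITH. The following C sketch is a hypothetical, deliberately tiny length decoder. It covers only the 8086 opcodes in this listing and assumes 16-bit mode with no prefixes. The names insn_length, walk, and f7_mnemonic are illustrative and do not come from any library. It decodes in two steps, as a CISC must: the opcode decides whether a ModRM byte follows, and the ModRM byte decides whether a displacement follows. For the F7 group, the reg field of ModRM selects the operation. For example, F7 E1 has ModRM 11 100 001: register mode, operation /4 (mul), operand register 1 (cx).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of procedure arith as printed in the listing (0000-0017).
   The relocatable offset of w1 is kept at its listed value, 0004. */
static const uint8_t arith_code[] = {
    0xB8, 0x64, 0x00,         /* mov  ax, 100 */
    0xB9, 0x64, 0x00,         /* mov  cx, 100 */
    0xA1, 0x04, 0x00,         /* mov  ax, w1  */
    0x8B, 0x0E, 0x04, 0x00,   /* mov  cx, w1  */
    0xF7, 0xE1,               /* mul  cx      */
    0xF7, 0xE9,               /* imul cx      */
    0xF7, 0xD9,               /* neg  cx      */
    0xF7, 0xF0,               /* div  ax      */
    0xF7, 0xF8,               /* idiv ax      */
    0xC3,                     /* ret          */
};

/* Displacement bytes implied by a ModRM byte in 16-bit addressing. */
static size_t modrm_disp_len(uint8_t modrm)
{
    unsigned mod = modrm >> 6, rm = modrm & 7u;
    if (mod == 0) return rm == 6 ? 2 : 0;   /* rm 110: direct [disp16] */
    if (mod == 1) return 1;                 /* [base + disp8] */
    if (mod == 2) return 2;                 /* [base + disp16] */
    return 0;                               /* mod 11: register operand */
}

/* Length of the instruction at code[0], or 0 if it is outside the few
   opcodes used in the listing or runs past avail bytes. */
size_t insn_length(const uint8_t *code, size_t avail)
{
    assert(code != NULL && avail >= 1);
    uint8_t op = code[0];
    size_t len;

    /* Step 1: the opcode byte alone decides these lengths. */
    if (op >= 0xB0 && op <= 0xB7)      len = 2;  /* mov r8, imm8 */
    else if (op >= 0xB8 && op <= 0xBF) len = 3;  /* mov r16, imm16 */
    else if (op == 0xA1 || op == 0xE8) len = 3;  /* mov ax,[m16]; call rel16 */
    else if (op == 0xCD)               len = 2;  /* int imm8 */
    else if (op == 0xC3)               len = 1;  /* ret */
    else if (op == 0x8B || op == 0x8E || op == 0xF7) {
        /* Step 2: fetch the ModRM byte; it decides the rest. */
        if (avail < 2) return 0;
        uint8_t modrm = code[1];
        len = 2 + modrm_disp_len(modrm);
        if (op == 0xF7) {
            unsigned ext = (modrm >> 3) & 7u;   /* opcode extension /n */
            if (ext == 1) return 0;             /* not handled here */
            if (ext == 0) len += 2;             /* test r/m16, imm16 */
        }
    } else {
        return 0;
    }
    return len <= avail ? len : 0;
}

/* Mnemonic of an F7-group instruction, from the ModRM reg field. */
const char *f7_mnemonic(uint8_t modrm)
{
    static const char *const names[8] =
        { "test", NULL, "not", "neg", "mul", "imul", "div", "idiv" };
    return names[(modrm >> 3) & 7u];
}

/* Record the start offset of each instruction: next IP = IP + length.
   Returns the instruction count, or 0 on an undecodable byte. */
size_t walk(const uint8_t *code, size_t size, size_t starts[], size_t max)
{
    size_t ip = 0, n = 0;
    while (ip < size && n < max) {
        size_t len = insn_length(code + ip, size - ip);
        if (len == 0) return 0;
        starts[n++] = ip;
        ip += len;
    }
    return n;
}
```

Walking arith_code this way yields the start offsets 0000, 0003, 0006, 0009, 000D, 000F, 0011, 0013, 0015, and 0017, the same addresses masm printed. It also explains the 4-byte mov cx, w1 versus the 3-byte mov ax, w1: the 8086 has a short form, A1, for loading AX from a direct address, while other registers need the general opcode 8B plus a ModRM byte (0E: mod 00, rm 110 selects a direct 16-bit address). Loading a literal costs 3 bytes for either register, since B8+reg covers all of them.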