Windows Malware Development Part 1: PE File Format

Objective

Hey guys! Welcome to the first post in a series of posts aimed at learning Windows malware development! In this post, we will be looking at the PE file format, which is an important step in learning how to make a Windows executable.

What is a PE?

A Portable Executable (PE) is a file format for executable files that can be understood and run by the Windows OS. The PE format is based on the Common Object File Format (COFF), which is why one of its headers is still known as the COFF header.

This format applies not only to .exe files, but also to Dynamic Link Libraries (.dll), kernel-mode drivers (.sys), screen saver files (.scr), and more.

The PE file format is a very important part of the Windows OS, since it contains the information the OS needs to load an executable into memory and execute it.

The PE Format

This diagram is a short overview of what the PE file structure looks like. Let’s go over each part in detail in order to understand this better.

PE Diagram

The DOS Header

The first 64 bytes of a PE file are occupied by the DOS header. This header is used to identify whether the file is a valid MS-DOS executable. The structure of the DOS header is as follows:

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

Out of all these values, two are of interest to us:

  • e_magic: This WORD is also called the magic number. In Windows executables it always consists of the bytes 0x4D 0x5A (the ASCII characters MZ), which reads as 0x5A4D when interpreted as a little-endian WORD. This is the signature used to identify an MS-DOS executable.
  • e_lfanew: This member holds the file offset of the beginning of the NT headers.
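
To make this concrete, here is a minimal sketch of how those two fields could be read from a raw file buffer. It is written in portable C rather than against winnt.h, so the helper names (read_u16, read_u32, parse_dos_header) and the constants are made up for illustration and are not part of any Windows API. It assumes the file has already been read into memory, and it treats e_lfanew as unsigned for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC       0x5A4Du  /* bytes 'M','Z' read as a little-endian WORD */
#define DOS_HEADER_SIZE 64u
#define E_LFANEW_OFFSET 0x3Cu    /* e_lfanew is the last member of the 64-byte header */

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and stores e_lfanew in *nt_offset if buf starts with a DOS
   header; returns 0 if the buffer is too short or the magic is wrong. */
static int parse_dos_header(const uint8_t *buf, size_t len, uint32_t *nt_offset) {
    assert(buf != NULL && nt_offset != NULL);
    if (len < DOS_HEADER_SIZE) return 0;
    if (read_u16(buf) != DOS_MAGIC) return 0;
    *nt_offset = read_u32(buf + E_LFANEW_OFFSET);
    return 1;
}
```

Reading the fields byte by byte, instead of casting the buffer to a struct, keeps the sketch independent of the host's endianness and alignment rules.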

The DOS Stub

This is a small MS-DOS program that prints the error “This program cannot be run in DOS mode.” if the executable is run under MS-DOS. Note that the stub, and therefore the message, can be replaced by the developer at link time (for example with the MSVC linker's /STUB option).

0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00

DOS Stub

NT Headers

The NT headers structure is defined in winnt.h as IMAGE_NT_HEADERS and contains important information about the PE file. Note that the structure is defined differently for 32-bit and 64-bit images.

32-bit:

typedef struct _IMAGE_NT_HEADERS {
    DWORD                     Signature;
    IMAGE_FILE_HEADER         FileHeader;
    IMAGE_OPTIONAL_HEADER32   OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

64-bit:

typedef struct _IMAGE_NT_HEADERS64 {
    DWORD                     Signature;
    IMAGE_FILE_HEADER         FileHeader;
    IMAGE_OPTIONAL_HEADER64   OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

Signature

The DWORD member Signature is used to identify a PE file. It always consists of the bytes 0x50 0x45 0x00 0x00 (PE\0\0), which reads as 0x00004550 when interpreted as a little-endian DWORD.
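
As a rough sketch, checking the signature comes down to reading four bytes at the offset stored in e_lfanew. On disk the bytes are 50 45 00 00, so a little-endian read yields 0x00004550. The names has_pe_signature, read_u32 and NT_SIGNATURE are illustrative only and assume the file is already in a memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NT_SIGNATURE 0x00004550u  /* bytes 'P','E',0,0 read as a little-endian DWORD */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the four bytes at nt_offset are "PE\0\0", 0 otherwise.
   The bounds check rejects offsets that would read past the buffer. */
static int has_pe_signature(const uint8_t *buf, size_t len, uint32_t nt_offset) {
    assert(buf != NULL);
    if (len < 4 || nt_offset > len - 4) return 0;
    return read_u32(buf + nt_offset) == NT_SIGNATURE;
}
```

The bounds check matters in practice, because e_lfanew comes from the file itself and cannot be trusted to point inside it.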

Signature

File Header

The FileHeader member is a structure of type IMAGE_FILE_HEADER, and is also known as the COFF header. The structure is defined as follows:

typedef struct _IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Out of all the members, three are of interest to us:

  • NumberOfSections: Field that holds the number of sections in the PE file (discussed later in this post).
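  • Characteristics: Holds flags that describe the file, for example whether it is an executable image or a .dll. (Whether an executable is a console or GUI application is recorded in the Subsystem member of the Optional Header instead.)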
  • SizeOfOptionalHeader: Holds the size of the Optional Header (discussed below). This member exists because the size of the Optional Header is not fixed.
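
The sketch below shows one way these members could be pulled out of the 20-byte file header, which starts right after the 4-byte signature. The struct file_header_info and the functions parse_file_header and is_dll are hypothetical names for this example. The flag values 0x0002 and 0x2000 correspond to IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_DLL in winnt.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILE_HEADER_SIZE      20u
#define FILE_EXECUTABLE_IMAGE 0x0002u  /* IMAGE_FILE_EXECUTABLE_IMAGE */
#define FILE_DLL              0x2000u  /* IMAGE_FILE_DLL */

/* Only the members discussed above, plus Machine. */
typedef struct {
    uint16_t machine;
    uint16_t number_of_sections;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
} file_header_info;

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the file header located at fh_offset (that is, e_lfanew + 4).
   Returns 1 on success, 0 if the header does not fit in the buffer. */
static int parse_file_header(const uint8_t *buf, size_t len, size_t fh_offset,
                             file_header_info *out) {
    assert(buf != NULL && out != NULL);
    if (len < FILE_HEADER_SIZE || fh_offset > len - FILE_HEADER_SIZE) return 0;
    const uint8_t *p = buf + fh_offset;
    out->machine                 = read_u16(p + 0);
    out->number_of_sections      = read_u16(p + 2);
    out->size_of_optional_header = read_u16(p + 16);
    out->characteristics         = read_u16(p + 18);
    return 1;
}

/* Returns 1 if the DLL flag is set in Characteristics. */
static int is_dll(const file_header_info *fh) {
    return (fh->characteristics & FILE_DLL) != 0;
}
```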

Optional Header

Don’t let the name fool you: out of all the headers in the NT headers, this is the most important one. It is essential for the execution of image files. Some file types (for example object files) don’t have this header, which is why it’s called optional.

This header is defined differently for 32-bit and 64-bit, with a few key differences:

  • The size of some members varies between the versions.
  • The 64-bit version uses ULONGLONG instead of DWORD for ImageBase and for the stack and heap size members.
  • The 32-bit version has one additional member, BaseOfData.

32-bit:

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

64-bit:

typedef struct _IMAGE_OPTIONAL_HEADER64 {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  ULONGLONG            ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  ULONGLONG            SizeOfStackReserve;
  ULONGLONG            SizeOfStackCommit;
  ULONGLONG            SizeOfHeapReserve;
  ULONGLONG            SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
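
Because the two layouts differ, a parser has to decide which one it is looking at before reading anything past BaseOfCode. The Magic member makes that possible: it is 0x10B for the 32-bit layout (PE32), 0x20B for the 64-bit layout (PE32+) and 0x107 for a ROM image. The sketch below uses it to read ImageBase from the right offset. The enum pe_kind and the function names are invented for this example, and the offsets (28 and 24) are derived from the two struct definitions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PE_KIND_UNKNOWN, PE_KIND_PE32, PE_KIND_PE32_PLUS, PE_KIND_ROM } pe_kind;

static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Map the Magic value to the kind of optional header that follows. */
static pe_kind classify_optional_magic(uint16_t magic) {
    switch (magic) {
    case 0x10B: return PE_KIND_PE32;
    case 0x20B: return PE_KIND_PE32_PLUS;
    case 0x107: return PE_KIND_ROM;
    default:    return PE_KIND_UNKNOWN;
    }
}

/* Read ImageBase from an optional header that starts at opt.
   PE32 stores a DWORD at offset 28 (after BaseOfData); PE32+ has no
   BaseOfData and stores a ULONGLONG at offset 24. Returns 1 on success. */
static int read_image_base(const uint8_t *opt, size_t len, uint64_t *image_base) {
    assert(opt != NULL && image_base != NULL);
    if (len < 32) return 0;
    switch (classify_optional_magic(read_u16(opt))) {
    case PE_KIND_PE32:
        *image_base = read_u32(opt + 28);
        return 1;
    case PE_KIND_PE32_PLUS:
        *image_base = read_u32(opt + 24) | ((uint64_t)read_u32(opt + 28) << 32);
        return 1;
    default:
        return 0;
    }
}
```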

Members that are of interest to us are:

  • Magic: Identifies the type of the image: 0x10B for a 32-bit image (PE32), 0x20B for a 64-bit image (PE32+), or 0x107 for a ROM image.
  • MajorOperatingSystemVersion: The major version number of the required operating system (for example 6 or 10).
  • MinorOperatingSystemVersion: The minor version number of the required operating system (for example the 1 in version 6.1, which is Windows 7). These are the internal Windows version numbers rather than release names such as 1511 or 1607; Windows 10 and Windows 11 both report 10.0.
  • SizeOfCode: The size of the code (.text) section, or the sum of all code sections if there are several.
  • AddressOfEntryPoint: Relative virtual address (RVA) of the entry point, that is, its offset from the image base once the file is loaded. In a typical C program this is the runtime startup routine that eventually calls main(), not main() itself.
  • BaseOfCode: RVA of the start of the code (.text) section.
  • SizeOfImage: Size of the image in bytes once it is loaded into memory, including all headers.
  • ImageBase: The preferred address of the first byte of the image when it is loaded in memory.
    • However, note that you will rarely see a modern executable mapped at the ImageBase defined here, since the load address is usually randomized. This is due to Windows protection mechanisms such as Address Space Layout Randomization (ASLR), which have been implemented in order to make developing exploits (and, as a side effect, reverse engineering binaries) more challenging.
    • The Windows loader has its own PE relocation process, driven by the base relocation table, in order to make sure that the execution of the PE is not hindered by these mechanisms.
  • DataDirectory: Important member of the optional header. An array that contains information about the various directories in a PE (discussed below).
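
The relationship between ImageBase, RVAs and relocation is just arithmetic, which the following sketch tries to illustrate. It is a simplification: the real loader walks the blocks of the base relocation table and patches each listed location, whereas this only shows the address math for a single 64-bit value. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address of an RVA, given the base where the image was actually mapped. */
static uint64_t rva_to_va(uint64_t actual_base, uint32_t rva) {
    assert(rva <= UINT64_MAX - actual_base);  /* guard against wrap-around */
    return actual_base + rva;
}

/* Amount by which every absolute address baked into the image is off
   when the image is not loaded at its preferred ImageBase. Unsigned
   wrap-around makes this work even when actual_base is lower. */
static uint64_t relocation_delta(uint64_t preferred_base, uint64_t actual_base) {
    return actual_base - preferred_base;
}

/* Fix up one stored absolute address by adding the delta. */
static uint64_t apply_delta(uint64_t stored_address, uint64_t delta) {
    return stored_address + delta;
}
```

For example, with a preferred base of 0x140000000 and an actual base of 0x7FF6A0000000, an entry point RVA of 0x1000 ends up at 0x7FF6A0001000, and a pointer stored as 0x140002040 has to be rewritten to 0x7FF6A0002040.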

Data Directories

The Data Directory array can be found at the end of the Optional Header. Each element of the array is of type IMAGE_DATA_DIRECTORY, which is defined as follows:

typedef struct _IMAGE_DATA_DIRECTORY {
  DWORD VirtualAddress;
  DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

The size of the array is defined as IMAGE_NUMBEROF_DIRECTORY_ENTRIES, and has a fixed value of 16, which means that the array can have up to 16 entries. The number of entries actually present is stored in NumberOfRvaAndSizes.

#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16

The indexes for the array are as follows (the sixteenth entry, index 15, is reserved):

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

Out of these directory entries, two are extremely important:

  • Export Directory: Points to the table of variables and functions that are exported from the executable. You will usually see this in .dll files, since they export functions.
  • Import Directory: Points to the table describing the functions that are imported from other executables (typically .dll files).
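
Here is a hedged sketch of how a directory entry could be looked up by index from a raw optional header. It relies on the struct layouts shown earlier: NumberOfRvaAndSizes sits at offset 92 in the 32-bit header and at offset 108 in the 64-bit one, with the DataDirectory array immediately after it. The names data_directory, get_data_directory and DIR_ENTRY_IMPORT are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIR_ENTRY_EXPORT 0u   /* IMAGE_DIRECTORY_ENTRY_EXPORT */
#define DIR_ENTRY_IMPORT 1u   /* IMAGE_DIRECTORY_ENTRY_IMPORT */
#define MAX_DIR_ENTRIES  16u  /* IMAGE_NUMBEROF_DIRECTORY_ENTRIES */

typedef struct {
    uint32_t virtual_address;  /* RVA of the table */
    uint32_t size;             /* size of the table in bytes */
} data_directory;

static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fetch directory entry `index` from an optional header starting at opt.
   Returns 1 on success, 0 if the header is unknown, truncated, or the
   index is not covered by NumberOfRvaAndSizes. */
static int get_data_directory(const uint8_t *opt, size_t len, unsigned index,
                              data_directory *out) {
    assert(opt != NULL && out != NULL);
    if (len < 2) return 0;

    size_t count_offset;
    uint16_t magic = read_u16(opt);
    if (magic == 0x10B)      count_offset = 92;   /* PE32  */
    else if (magic == 0x20B) count_offset = 108;  /* PE32+ */
    else return 0;

    if (len < count_offset + 4) return 0;
    uint32_t count = read_u32(opt + count_offset);
    if (index >= count || index >= MAX_DIR_ENTRIES) return 0;

    /* Each entry is two DWORDs: VirtualAddress then Size. */
    size_t entry_offset = count_offset + 4 + (size_t)index * 8;
    if (len < entry_offset + 8) return 0;
    out->virtual_address = read_u32(opt + entry_offset);
    out->size            = read_u32(opt + entry_offset + 4);
    return 1;
}
```

Note that VirtualAddress is an RVA, so it still has to be translated to a file offset through the section headers before the table can be read from a file on disk.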

PE Sections

Every PE contains sections, which consist of the code and data used by the executable file. Each section is assigned a name, depending on what it contains. The number of sections in a PE file is not constant, because compilers, linkers and packers may add more sections to the PE. However, there are sections that are present in almost every PE:

  • .text: Contains code that is to be executed by the PE file.
  • .data: Contains initialized global and static variables.
  • .rdata: Contains read-only data (const variables).
  • .idata: Contains import tables. These tables describe which .dlls to load and which functions are imported from each of them.
  • .reloc: Contains information needed by the Windows loader in order to perform relocation.
  • .rsrc: Contains resources used by the PE file (bitmaps, icons, etc).

Each PE section has a header called IMAGE_SECTION_HEADER, which is a data structure that contains addresses and pointers related to that section. It is defined as follows:

typedef struct _IMAGE_SECTION_HEADER {
   BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
   union {
     DWORD PhysicalAddress;
     DWORD VirtualSize;
   } Misc;
   DWORD VirtualAddress;
   DWORD SizeOfRawData;
   DWORD PointerToRawData;
   DWORD PointerToRelocations;
   DWORD PointerToLinenumbers;
   WORD  NumberOfRelocations;
   WORD  NumberOfLinenumbers;
   DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
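
One common use of these headers is translating an RVA into an offset in the file on disk. The sketch below assumes the section table starts immediately after the optional header and that each header is 40 bytes, as in the struct above. It matches an RVA against the raw data range of each section, which is a simplification (it ignores RVAs that fall inside the headers or in the zero-filled tail of a section). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_HEADER_SIZE 40u

static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* File offset of the first section header: signature (4 bytes), file
   header (20 bytes), then the optional header. */
static uint32_t section_table_offset(uint32_t nt_offset, uint16_t size_of_optional_header) {
    return nt_offset + 4u + 20u + size_of_optional_header;
}

/* Translate an RVA to a file offset by scanning `count` section headers
   starting at `table`. Returns 1 and sets *file_offset on success. */
static int rva_to_file_offset(const uint8_t *table, size_t count, uint32_t rva,
                              uint32_t *file_offset) {
    assert(table != NULL && file_offset != NULL);
    for (size_t i = 0; i < count; i++) {
        const uint8_t *sh = table + i * SECTION_HEADER_SIZE;
        uint32_t virtual_address = read_u32(sh + 12);  /* VirtualAddress   */
        uint32_t size_of_raw     = read_u32(sh + 16);  /* SizeOfRawData    */
        uint32_t pointer_to_raw  = read_u32(sh + 20);  /* PointerToRawData */
        if (rva >= virtual_address && rva - virtual_address < size_of_raw) {
            *file_offset = pointer_to_raw + (rva - virtual_address);
            return 1;
        }
    }
    return 0;
}
```

For a section with VirtualAddress 0x1000 and PointerToRawData 0x400, the RVA 0x1010 maps to file offset 0x410.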

Conclusion

This is by no means a complete breakdown of the PE file structure. However, the content covered in this post will come in handy later when we are developing Windows malware and want to employ anti-analysis, anti-VM and AV bypassing techniques.

In the next post, we will be looking at the basics of the Windows API, and how to incorporate it in our code.