Windows Malware Development Part 1: PE File Format
Objective
Hey guys! Welcome to the first post in a series of posts aimed at learning Windows malware development! In this post, we will be looking at the PE file format, which is an important step in learning how to make a Windows executable.
What is a PE?
A Portable Executable (PE) is a file format for executable files that can be understood and run by the Windows OS. The PE format is an extension of the Common Object File Format (COFF).
This format does not only apply to .exe files; it is also used for Dynamic Link Libraries (.dll), kernel-mode drivers (.sys), screen saver files (.scr), and more.
The PE file format is a very important part of the Windows OS, since it contains the information the OS needs in order to load an executable into memory and execute it.
The PE Format
This diagram is a short overview of what the PE file structure looks like. Let’s go over each part in detail in order to understand this better.
The DOS Header
The first 64 bytes of a PE file are occupied by the DOS header. This header is used to identify whether the file is a valid MS-DOS executable. The structure of the DOS header is as follows:
typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    WORD e_magic;  // Magic number
    WORD e_cblp;  // Bytes on last page of file
    WORD e_cp;  // Pages in file
    WORD e_crlc;  // Relocations
    WORD e_cparhdr;  // Size of header in paragraphs
    WORD e_minalloc;  // Minimum extra paragraphs needed
    WORD e_maxalloc;  // Maximum extra paragraphs needed
    WORD e_ss;  // Initial (relative) SS value
    WORD e_sp;  // Initial SP value
    WORD e_csum;  // Checksum
    WORD e_ip;  // Initial IP value
    WORD e_cs;  // Initial (relative) CS value
    WORD e_lfarlc;  // File address of relocation table
    WORD e_ovno;  // Overlay number
    WORD e_res[4];  // Reserved words
    WORD e_oemid;  // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;  // OEM information; e_oemid specific
    WORD e_res2[10];  // Reserved words
    LONG e_lfanew;  // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
Out of all these values, two are of interest to us:
- e_magic: This WORD is also called the magic number. In Windows executables its two bytes are always 0x4D 0x5A (the ASCII characters MZ), which read as the little-endian WORD 0x5A4D. This signature identifies an MS-DOS executable.
- e_lfanew: This member holds the file offset of the beginning of the NT headers. It is the last member of the DOS header, at offset 0x3C.
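To make these two fields concrete, here is a minimal sketch in portable C that checks e_magic and reads e_lfanew from a file that has already been read into a byte buffer. The helper names (read_u16_le, read_u32_le, find_nt_headers) are made up for this post and are not part of any Windows API. The sketch also reads raw bytes instead of casting to IMAGE_DOS_HEADER, so it does not need winnt.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 1 and stores e_lfanew in *nt_offset if the
 * buffer starts with a plausible DOS header, otherwise returns 0.
 */
static int find_nt_headers(const uint8_t *buf, size_t len, uint32_t *nt_offset) {
    if (len < 64) return 0;                       /* DOS header is 64 bytes */
    if (read_u16_le(buf) != 0x5A4D) return 0;     /* e_magic must be "MZ" */
    uint32_t e_lfanew = read_u32_le(buf + 0x3C);  /* last member of the header */
    if (e_lfanew > len - 4) return 0;             /* no room for the signature */
    *nt_offset = e_lfanew;
    return 1;
}
```

On Windows you would more likely cast the buffer to PIMAGE_DOS_HEADER and read the members directly; the byte-level version just makes the offsets visible.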
The DOS Stub
This is a small MS-DOS program that prints the error “This program cannot be run in DOS mode.” if the executable is run under MS-DOS. Note that the stub can be replaced by the developer at link time (for example, with the MSVC linker's /STUB option), so this exact message is not guaranteed to be present.
0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00
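The first 14 bytes above are 16-bit code (push cs; pop ds; mov dx, 0x000E; mov ah, 9; int 21h; mov ax, 0x4C01; int 21h), and the rest is the message, terminated by a '$' as the DOS print-string call expects. As a rough sketch, the message can be located like this. It assumes the stub starts with exactly this default instruction sequence, and dos_stub_message is a made-up name for this post.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper. Assumes the stub begins with the default sequence
 * push cs; pop ds; mov dx, imm16 (bytes 0E 1F BA lo hi), where imm16 is the
 * offset of the message from the start of the stub. On success returns 1 and
 * stores the message offset and its length (excluding the '$' terminator).
 */
static int dos_stub_message(const uint8_t *stub, size_t len,
                            size_t *msg_off, size_t *msg_len) {
    if (len < 5 || stub[0] != 0x0E || stub[1] != 0x1F || stub[2] != 0xBA)
        return 0;                        /* not the default stub prologue */
    size_t off = (size_t)stub[3] | ((size_t)stub[4] << 8);
    for (size_t i = off; i < len; i++) {
        if (stub[i] == '$') {            /* DOS strings end with '$' */
            *msg_off = off;
            *msg_len = i - off;
            return 1;
        }
    }
    return 0;                            /* no terminator found */
}
```

For the bytes shown above, the message starts at offset 0x0E and is 42 bytes long: the 39 characters of the sentence followed by 0D 0D 0A.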
NT Headers
The NT headers structure is defined in winnt.h as IMAGE_NT_HEADERS, and contains important information about the PE file. Note that the structure is defined differently for 32-bit and 64-bit executables.
32-bit:
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
64-bit:
typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;
Signature
The DWORD member Signature is used to identify a PE file. Its four bytes are always 0x50 0x45 0x00 0x00 (PE\0\0), which read as the little-endian DWORD 0x00004550.
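Since the byte order is easy to get wrong here, a simple way to check the signature is to compare the four raw bytes instead of a DWORD constant. The sketch below assumes the file is already in a buffer and that nt_offset came from e_lfanew; has_pe_signature is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the four bytes at nt_offset are "PE\0\0". */
static int has_pe_signature(const uint8_t *buf, size_t len, size_t nt_offset) {
    static const uint8_t sig[4] = { 'P', 'E', 0, 0 };
    if (len < 4 || nt_offset > len - 4) return 0;   /* stay inside the buffer */
    return memcmp(buf + nt_offset, sig, sizeof sig) == 0;
}
```

Comparing bytes sidesteps the endianness question entirely, which is why it is used here instead of comparing against 0x00004550.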
File Header
The FileHeader member is a structure of type IMAGE_FILE_HEADER, and is also known as the COFF header. The structure is defined as follows:
typedef struct _IMAGE_FILE_HEADER {
    WORD Machine;
    WORD NumberOfSections;
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD SizeOfOptionalHeader;
    WORD Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
Out of all the members, three are of interest to us:
- NumberOfSections: Holds the number of sections in the PE file (discussed later in this post).
- Characteristics: Holds flags that describe the file, for example whether it is an executable image or a .dll. (Whether an executable is a console or a GUI application is recorded separately, in the Subsystem member of the Optional Header.)
- SizeOfOptionalHeader: Holds the size of the Optional Header (discussed below). This member exists because the size of the Optional Header is not fixed.
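Here is a small sketch of how these three members might be used. It reads them from the raw 20 bytes of the file header, tests the DLL flag, and uses SizeOfOptionalHeader to work out where the section headers begin. The struct and function names are made up for this post; the flag values are those of IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_DLL from winnt.h.

```c
#include <stddef.h>
#include <stdint.h>

#define FILE_HEADER_SIZE          20  /* sizeof(IMAGE_FILE_HEADER) */
#define FLAG_EXECUTABLE_IMAGE 0x0002  /* IMAGE_FILE_EXECUTABLE_IMAGE */
#define FLAG_DLL              0x2000  /* IMAGE_FILE_DLL */

/* Hypothetical container for the three members discussed above. */
struct file_header_info {
    uint16_t number_of_sections;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
};

static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* fh must point at FILE_HEADER_SIZE readable bytes (the file header). */
static struct file_header_info parse_file_header(const uint8_t *fh) {
    struct file_header_info info;
    info.number_of_sections      = read_u16_le(fh + 2);
    info.size_of_optional_header = read_u16_le(fh + 16);
    info.characteristics         = read_u16_le(fh + 18);
    return info;
}

static int is_executable_image(const struct file_header_info *info) {
    return (info->characteristics & FLAG_EXECUTABLE_IMAGE) != 0;
}

static int is_dll(const struct file_header_info *info) {
    return (info->characteristics & FLAG_DLL) != 0;
}

/*
 * The section headers follow the signature (4 bytes), the file header,
 * and the variable-size optional header.
 */
static size_t section_table_offset(size_t nt_offset,
                                   const struct file_header_info *info) {
    return nt_offset + 4 + FILE_HEADER_SIZE + info->size_of_optional_header;
}
```

Here nt_offset is the value of e_lfanew, so the file header itself starts at nt_offset + 4.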
Optional Header
Don’t let the name fool you, because out of all the headers in the NT headers, this is the most important. This header is essential for the execution of image files. Some file types (for example object files) don’t have this header, which is why it’s called optional.
This header is defined differently for 32-bit and 64-bit, with a few key differences:
- The size of some members varies between the versions.
- The 64-bit version uses ULONGLONG instead of DWORD for ImageBase and for the stack and heap size members.
- The 32-bit version has one additional member, BaseOfData.
32-bit:
typedef struct _IMAGE_OPTIONAL_HEADER {
    WORD Magic;
    BYTE MajorLinkerVersion;
    BYTE MinorLinkerVersion;
    DWORD SizeOfCode;
    DWORD SizeOfInitializedData;
    DWORD SizeOfUninitializedData;
    DWORD AddressOfEntryPoint;
    DWORD BaseOfCode;
    DWORD BaseOfData;
    DWORD ImageBase;
    DWORD SectionAlignment;
    DWORD FileAlignment;
    WORD MajorOperatingSystemVersion;
    WORD MinorOperatingSystemVersion;
    WORD MajorImageVersion;
    WORD MinorImageVersion;
    WORD MajorSubsystemVersion;
    WORD MinorSubsystemVersion;
    DWORD Win32VersionValue;
    DWORD SizeOfImage;
    DWORD SizeOfHeaders;
    DWORD CheckSum;
    WORD Subsystem;
    WORD DllCharacteristics;
    DWORD SizeOfStackReserve;
    DWORD SizeOfStackCommit;
    DWORD SizeOfHeapReserve;
    DWORD SizeOfHeapCommit;
    DWORD LoaderFlags;
    DWORD NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
64-bit:
typedef struct _IMAGE_OPTIONAL_HEADER64 {
    WORD Magic;
    BYTE MajorLinkerVersion;
    BYTE MinorLinkerVersion;
    DWORD SizeOfCode;
    DWORD SizeOfInitializedData;
    DWORD SizeOfUninitializedData;
    DWORD AddressOfEntryPoint;
    DWORD BaseOfCode;
    ULONGLONG ImageBase;
    DWORD SectionAlignment;
    DWORD FileAlignment;
    WORD MajorOperatingSystemVersion;
    WORD MinorOperatingSystemVersion;
    WORD MajorImageVersion;
    WORD MinorImageVersion;
    WORD MajorSubsystemVersion;
    WORD MinorSubsystemVersion;
    DWORD Win32VersionValue;
    DWORD SizeOfImage;
    DWORD SizeOfHeaders;
    DWORD CheckSum;
    WORD Subsystem;
    WORD DllCharacteristics;
    ULONGLONG SizeOfStackReserve;
    ULONGLONG SizeOfStackCommit;
    ULONGLONG SizeOfHeapReserve;
    ULONGLONG SizeOfHeapCommit;
    DWORD LoaderFlags;
    DWORD NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
Members that are of interest to us are:
- Magic: Identifies the type of the image: 0x10B for a 32-bit (PE32) image, 0x20B for a 64-bit (PE32+) image, and 0x107 for a ROM image.
- MajorOperatingSystemVersion: The major version number of the required operating system (for example, 10 for Windows 10).
- MinorOperatingSystemVersion: The minor version number of the required operating system (for example, 0 for Windows 10, whose version is 10.0).
- SizeOfCode: The size of the code (.text) section, or the sum of all code sections if there are several.
- AddressOfEntryPoint: The address of the entry point, relative to the image base (a Relative Virtual Address, or RVA). In a typical C program this is the runtime startup code that eventually calls main(), rather than main() itself.
- BaseOfCode: The RVA of the start of the code (.text) section.
- SizeOfImage: The size of the image, including all headers, once it is loaded in memory.
- ImageBase: The preferred address of the first byte of the image when it is loaded in memory.
  - However, note that you will rarely see a modern executable mapped at the ImageBase defined here, since the load address can change between runs. This is due to Windows protection mechanisms such as Address Space Layout Randomization (ASLR), which were introduced to make developing reliable exploits more challenging.
  - The Windows loader performs base relocation to make sure that the PE still runs correctly when it is loaded at a different address.
- DataDirectory: An important member of the Optional Header. It is an array that contains information about the various directories in a PE (discussed below).
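As a rough illustration of how Magic changes the layout, the sketch below reads Magic, AddressOfEntryPoint, and ImageBase from the raw bytes of an Optional Header. The offsets follow from the two struct definitions above: AddressOfEntryPoint is at offset 16 in both, while ImageBase is a DWORD at offset 28 in the 32-bit version and a ULONGLONG at offset 24 in the 64-bit version. The names optional_header_info and parse_optional_header are made up for this post.

```c
#include <stddef.h>
#include <stdint.h>

#define MAGIC_PE32      0x010B  /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define MAGIC_PE32_PLUS 0x020B  /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* Hypothetical container for the members read below. */
struct optional_header_info {
    int      is_64bit;
    uint32_t entry_point_rva;
    uint64_t image_base;
};

static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the buffer is too short or Magic is unknown. */
static int parse_optional_header(const uint8_t *oh, size_t len,
                                 struct optional_header_info *out) {
    if (len < 32) return 0;            /* need everything up to ImageBase */
    uint16_t magic = read_u16_le(oh);
    if (magic != MAGIC_PE32 && magic != MAGIC_PE32_PLUS) return 0;

    out->is_64bit = (magic == MAGIC_PE32_PLUS);
    out->entry_point_rva = read_u32_le(oh + 16);
    if (out->is_64bit) {
        /* 64-bit: no BaseOfData, ImageBase is 8 bytes at offset 24 */
        out->image_base = (uint64_t)read_u32_le(oh + 24) |
                          ((uint64_t)read_u32_le(oh + 28) << 32);
    } else {
        /* 32-bit: BaseOfData at offset 24, ImageBase is 4 bytes at offset 28 */
        out->image_base = read_u32_le(oh + 28);
    }
    return 1;
}
```

Adding entry_point_rva to the address the image was actually loaded at (which, because of ASLR, may not be image_base) gives the entry point's address in memory.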
Data Directories
The Data Directory array can be found at the end of the Optional Header. Each element of the array is of type IMAGE_DATA_DIRECTORY, which is defined as follows:
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
The size of the array is defined as IMAGE_NUMBEROF_DIRECTORY_ENTRIES, which has a fixed value of 16, meaning that the array can have up to 16 members.
#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16
The indexes for the array are as follows:
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0 // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1 // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2 // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3 // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY 4 // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5 // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG 6 // Debug Directory
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7 // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS 9 // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10 // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11 // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT 12 // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13 // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14 // COM Runtime descriptor
Out of these directory entries, two are extremely important:
- Export Directory: Contains the addresses of any variables or functions that are exported from the executable. You will usually see this in .dll files, since they export functions.
- Import Directory: Contains information about the functions that are imported from other executables, whose addresses are resolved when the PE is loaded.
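To show how one of these entries is looked up, here is a minimal sketch that fetches a directory entry by index from the raw bytes of an Optional Header. It relies on the layouts shown earlier: NumberOfRvaAndSizes sits at offset 92 in the 32-bit header and 108 in the 64-bit header, with the DataDirectory array right after it. The names data_directory and get_data_directory are made up for this post.

```c
#include <stddef.h>
#include <stdint.h>

#define DIRECTORY_ENTRY_EXPORT 0   /* IMAGE_DIRECTORY_ENTRY_EXPORT */
#define DIRECTORY_ENTRY_IMPORT 1   /* IMAGE_DIRECTORY_ENTRY_IMPORT */
#define MAX_DIRECTORY_ENTRIES  16  /* IMAGE_NUMBEROF_DIRECTORY_ENTRIES */

/* Hypothetical mirror of IMAGE_DATA_DIRECTORY. */
struct data_directory {
    uint32_t virtual_address;  /* RVA of the directory's data */
    uint32_t size;             /* size of that data in bytes */
};

static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if entry `index` exists, otherwise returns 0. */
static int get_data_directory(const uint8_t *oh, size_t len, uint32_t index,
                              struct data_directory *out) {
    if (len < 2) return 0;
    uint16_t magic = read_u16_le(oh);
    size_t count_off;
    if (magic == 0x010B)      count_off = 92;   /* 32-bit layout */
    else if (magic == 0x020B) count_off = 108;  /* 64-bit layout */
    else return 0;

    size_t dir_off = count_off + 4;             /* array follows the count */
    if (len < dir_off) return 0;
    uint32_t count = read_u32_le(oh + count_off);
    if (index >= count || index >= MAX_DIRECTORY_ENTRIES) return 0;
    if (len - dir_off < ((size_t)index + 1) * 8) return 0;

    size_t entry_off = dir_off + (size_t)index * 8;
    out->virtual_address = read_u32_le(oh + entry_off);
    out->size            = read_u32_le(oh + entry_off + 4);
    return 1;
}
```

An entry whose virtual_address and size are both zero means the PE does not have that directory, which is common for the Export Directory in .exe files.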
PE Sections
Every PE contains sections, which hold the code and data used by the executable. Each section is assigned a name, depending on what it contains. The number of sections in a PE file is not constant, because compilers, linkers, and tools such as packers can add more sections to the PE. However, there are sections that are present in almost every PE:
- .text: Contains the code that is executed by the PE file.
- .data: Contains initialized global and static variables.
- .rdata: Contains read-only data (const variables).
- .idata: Contains the import tables. These tables describe which .dll files to load and which functions are imported from each of them.
- .reloc: Contains the information needed by the Windows loader in order to perform relocation.
- .rsrc: Contains resources used by the PE file (bitmaps, icons, etc.).
Each PE section has a header called IMAGE_SECTION_HEADER, which is a data structure that contains addresses and pointers related to that section. It is defined as follows:
typedef struct _IMAGE_SECTION_HEADER {
    BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD NumberOfRelocations;
    WORD NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
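One common use of these headers is translating an RVA (such as the VirtualAddress of a data directory) into an offset in the file on disk. Below is a minimal sketch of that translation, assuming the section headers are available as a contiguous byte buffer of 40-byte entries. The function name rva_to_file_offset is made up for this post, and it skips checks a robust parser would need (for example, validating offsets against the file size).

```c
#include <stddef.h>
#include <stdint.h>

#define SECTION_HEADER_SIZE 40  /* sizeof(IMAGE_SECTION_HEADER) */

static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: finds the section containing `rva` and converts it to
 * a file offset. Returns 1 on success, 0 if no section contains the RVA or
 * if the RVA falls in a part of the section that has no data on disk.
 */
static int rva_to_file_offset(const uint8_t *table, size_t count,
                              uint32_t rva, uint32_t *file_offset) {
    for (size_t i = 0; i < count; i++) {
        const uint8_t *sh = table + i * SECTION_HEADER_SIZE;
        uint32_t virtual_size = read_u32_le(sh + 8);   /* Misc.VirtualSize */
        uint32_t virtual_addr = read_u32_le(sh + 12);  /* VirtualAddress */
        uint32_t raw_size     = read_u32_le(sh + 16);  /* SizeOfRawData */
        uint32_t raw_ptr      = read_u32_le(sh + 20);  /* PointerToRawData */

        /* Fall back to the raw size if VirtualSize was left at zero. */
        uint32_t span = virtual_size ? virtual_size : raw_size;
        if (rva < virtual_addr || rva - virtual_addr >= span)
            continue;                                  /* not in this section */

        uint32_t delta = rva - virtual_addr;
        if (delta >= raw_size)
            return 0;            /* in memory only (zero-filled), not on disk */
        *file_offset = raw_ptr + delta;
        return 1;
    }
    return 0;
}
```

For example, with a .text section whose VirtualAddress is 0x1000 and PointerToRawData is 0x400, the RVA 0x1010 maps to the file offset 0x410.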
Conclusion
This is by no means a complete breakdown of the PE file structure. However, the content covered in this post will come in handy later, when we are developing Windows malware and want to employ anti-analysis, anti-VM, and AV bypassing techniques.
In the next post, we will be looking at the basics of the Windows API, and how to incorporate it in our code.