Regular expressions are great for searching for patterns in text. This should come as no surprise, because that is what they were designed to do. However, what if you want to apply regular expressions to binary data? For example, what if you need to find a particular sequence of bytes in a file, and those bytes don't map to printable characters? Doing this in Windows PowerShell poses two unique challenges:
Regular expressions in Windows PowerShell and .NET only work on strings, not on byte arrays.
Depending on the character encoding used, not every byte survives a round trip: decoding a byte to a character and then encoding that character again does not always give back the original byte value.
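To see the second challenge in action, here is a small sketch that decodes four non-printable bytes with the ASCII encoding and then encodes the result back. The byte values are arbitrary examples, chosen to match the hotpatch bytes discussed later. ASCII has no characters above 0x7F, so .NET substitutes '?' (0x3F) for each of these bytes and the original values are lost.

```powershell
# Example bytes that are not printable ASCII:
# INT3 (0xCC), NOP (0x90), and the two bytes of MOV EDI, EDI (0x8B 0xFF)
[Byte[]] $Original = 0xCC, 0x90, 0x8B, 0xFF

# Decode the bytes to a string with ASCII, then encode that string back to bytes
$Ascii = [System.Text.Encoding]::ASCII
$RoundTrip = $Ascii.GetBytes($Ascii.GetString($Original))

[System.BitConverter]::ToString($Original)   # CC-90-8B-FF
[System.BitConverter]::ToString($RoundTrip)  # 3F-3F-3F-3F
```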
Hexadecimal representation with regular expressions
Before tackling these challenges, it would be useful to cover how to search for bytes in their hexadecimal representation using regular expressions.
For example, the word 'Hello' consists of five characters, each with a corresponding numeric value. Let's use Windows PowerShell to see the hexadecimal values that make up the string 'Hello':
$StringBytes = [System.Text.Encoding]::ASCII.GetBytes('Hello')
[System.BitConverter]::ToString($StringBytes)
The first line converts 'Hello' to a byte array. Next, the ToString method of the BitConverter class returns the hexadecimal digits of the byte array as a string: '48-65-6C-6C-6F'. Armed with this knowledge, let's search for the word 'Hello' by using only hexadecimal:
if ('Hello, cruel world!' -match '\x48\x65\x6C{2}\x6F')
{
    $Matches[0]
}
To match a byte by its hexadecimal value in a regular expression, prefix its two hexadecimal digits with '\x'. The regular expression here searches for an 'H', an 'e', exactly two 'l's, and an 'o'.
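Typing '\x' escapes by hand gets tedious for longer byte sequences. The following sketch generates the pattern from a byte array instead. ConvertTo-ByteRegex is a hypothetical helper name, not part of the original script, and it does not collapse repeated bytes into quantifiers such as {2}; it simply emits one escape per byte.

```powershell
# Hypothetical helper (not part of the original post): turns a byte array
# into a regex pattern made of '\xNN' escapes, one per byte.
function ConvertTo-ByteRegex
{
    Param (
        [Parameter(Mandatory = $True)]
        [Byte[]]
        $Bytes
    )

    # Format each byte as two uppercase hex digits, prefix '\x', and join them
    ($Bytes | ForEach-Object { '\x{0:X2}' -f $_ }) -join ''
}

$Pattern = ConvertTo-ByteRegex -Bytes ([System.Text.Encoding]::ASCII.GetBytes('Hello'))
$Pattern                                   # \x48\x65\x6C\x6C\x6F
'Hello, cruel world!' -match $Pattern      # True
```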
Scanning for other bytes
Now that we have a foundation for using regular expressions to scan for a sequence of bytes, let's begin analyzing bytes that don't normally translate to printable characters. As an example, I recently found myself wanting to scan 32-bit executables for hotpatchable functions.
Many 32-bit Windows functions begin with the following two-byte assembly language instruction: MOV EDI, EDI (bytes 0x8B 0xFF). This instruction serves no purpose other than acting as a placeholder for a Microsoft hotpatch.
Hotpatching allows patches to be installed while a process is running. This functionality is essential when applying patches to servers that cannot be rebooted immediately. This blog post by Raymond Chen explains this concept in more detail: Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?
In short, I'm interested in finding a sequence of five '\xCC' (INT3) or '\x90' (NOP) bytes followed by '\x8B\xFF'. This regular expression looks like this: '[\xCC\x90]{5}\x8B\xFF'.
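Before pointing this expression at a real DLL, it can help to test it against a small buffer. This sketch assumes code page 28591 (ISO-8859-1) for turning bytes into a string, for reasons covered in the next section; the buffer contents are invented for illustration and do not come from any real binary.

```powershell
# Synthetic buffer (made up for illustration): RET (0xC3), five INT3 padding
# bytes, MOV EDI, EDI (0x8B 0xFF), then PUSH EBP (0x55)
[Byte[]] $Buffer = 0xC3, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0x8B, 0xFF, 0x55

# Code page 28591 maps every byte to the character with the same numeric value
$Latin1 = [System.Text.Encoding]::GetEncoding(28591)
$BufferString = $Latin1.GetString($Buffer)

$HotpatchableRegex = [Regex] '[\xCC\x90]{5}\x8B\xFF'
$HotpatchMatch = $HotpatchableRegex.Match($BufferString)

$HotpatchMatch.Success   # True
$HotpatchMatch.Index     # 1 (position of the first padding byte)
```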
Using 28591 encoding
If we're going to scan for any arbitrary sequence of bytes with values between 0 and 255, we have to be sure that each character of a string that represents the binary data maps back to its respective byte value. Typically, in Windows PowerShell, you would read a file as a raw string with Get-Content -Raw and use the -Encoding parameter to specify the character encoding to be used (for example, Unicode, ASCII, UTF7, or UTF8).
Unfortunately, none of the encoding schemes that Get-Content accepts provide a one-to-one mapping of characters back to their respective byte values. There is a magic encoding scheme that does, however: ISO-8859-1 (code page 28591).
To create an encoder object for an arbitrary code page, you can use the GetEncoding static method of the .NET Encoding class. An example of this is shown here:
$Encoder = [System.Text.Encoding]::GetEncoding(28591)
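A quick way to convince yourself that this encoding really is one-to-one is to push all 256 byte values through it and back. The sketch below does that and compares the results with BitConverter.ToString, the same method used earlier to display bytes in hexadecimal.

```powershell
# Every possible byte value, 0x00 through 0xFF
[Byte[]] $AllBytes = 0..255

$Encoder = [System.Text.Encoding]::GetEncoding(28591)
$AsString = $Encoder.GetString($AllBytes)
$BackToBytes = $Encoder.GetBytes($AsString)

# One character per byte, and each character's value equals its byte
$AsString.Length           # 256
[int] $AsString[0xCC]      # 204 (0xCC)

# Encoding the string again reproduces the original bytes exactly
[System.BitConverter]::ToString($BackToBytes) -eq [System.BitConverter]::ToString($AllBytes)   # True
```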
We cannot provide this encoder to the Get-Content cmdlet, so we have to use some .NET to read a file as a string using our special encoding. I wrote a filter to perform the needed operation. This filter is shown here, and I uploaded the full script to the Script Center Repository: ConvertTo-String.
ConvertTo-String filter:
filter ConvertTo-String
{
<#
.SYNOPSIS
Converts the bytes of a file to a string.
.DESCRIPTION
ConvertTo-String converts the bytes of a file to a string that has a
1-to-1 mapping back to the file's original bytes. ConvertTo-String is
useful for performing binary regular expressions.
.PARAMETER Path
Specifies the path to a file.
.EXAMPLE
PS C:\>$BinaryString = ConvertTo-String C:\Windows\SysWow64\kernel32.dll
PS C:\>$HotpatchableRegex = [Regex] '[\xCC\x90]{5}\x8B\xFF'
PS C:\>$HotpatchableRegex.Matches($BinaryString)
Description
-----------
Converts kernel32.dll into a string. A binary regular expression is
then performed on the string searching for a hotpatchable code
sequence - i.e. 5 nop/int3 followed by a mov edi, edi instruction.
#>
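    [OutputType([String])]
    Param (
        [Parameter( Mandatory = $True,
                    Position = 0,
                    ValueFromPipeline = $True )]
        [ValidateScript( { -not (Test-Path $_ -PathType Container) } )]
        [String]
        $Path
    )

    # ProviderPath yields a plain file-system path that .NET constructors accept
    $FullPath = (Resolve-Path $Path).ProviderPath
    $Stream = New-Object IO.FileStream -ArgumentList $FullPath, 'Open', 'Read'

    try
    {
        # Note: Code page 28591 returns a 1-to-1 char to byte mapping
        $Encoding = [Text.Encoding]::GetEncoding(28591)

        # $False turns off byte-order-mark detection; otherwise a file that starts
        # with a BOM would be decoded with the encoding the BOM indicates instead
        $StreamReader = New-Object IO.StreamReader -ArgumentList $Stream, $Encoding, $False
        $BinaryText = $StreamReader.ReadToEnd()
    }
    finally
    {
        # Release the file handle even if reading fails
        $Stream.Close()
    }

    Write-Output $BinaryText
}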
Understanding the ConvertTo-String filter
ConvertTo-String opens the file with a FileStream object and wraps it in a StreamReader object, which accepts any encoding. The StreamReader's ReadToEnd method reads the stream to the end and returns a string decoded with the specified encoding scheme.
Let's wrap things up and use ConvertTo-String to find the file offset of each hotpatchable function's MOV EDI, EDI instruction in the 32-bit version of kernel32.dll:
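If you prefer not to manage streams yourself, a shorter route is to read the file into a byte array and decode it in one call. This is a swapped-in technique, not the author's ConvertTo-String, and ConvertTo-BinaryString is a hypothetical name. Unlike a StreamReader created with its default settings, Encoding.GetString never inspects byte-order marks, so a file that happens to begin with EF BB BF still maps byte for byte.

```powershell
# Hypothetical alternative (not the author's script): read the raw bytes,
# then decode them with code page 28591 in a single call.
function ConvertTo-BinaryString
{
    Param (
        [Parameter(Mandatory = $True)]
        [String]
        $Path
    )

    # ProviderPath gives a plain file-system path that .NET methods accept
    $FullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $Bytes = [System.IO.File]::ReadAllBytes($FullPath)

    # Code page 28591 (ISO-8859-1) maps each byte to the char with the same value
    [System.Text.Encoding]::GetEncoding(28591).GetString($Bytes)
}

# Usage: a small temporary file that starts with a UTF-8 BOM (EF BB BF)
$TempFile = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($TempFile, [Byte[]] (0xEF, 0xBB, 0xBF, 0xCC, 0x8B, 0xFF))

$BinaryString = ConvertTo-BinaryString -Path $TempFile
$BinaryString.Length   # 6: the BOM bytes are kept as ordinary characters
Remove-Item -LiteralPath $TempFile
```

Both approaches hold the entire file in memory as a string (two bytes per character in .NET), and this one briefly holds the byte array as well, so very large files may call for a chunked approach instead.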
$BinaryString = ConvertTo-String C:\Windows\SysWow64\kernel32.dll
$HotpatchableRegex = [Regex] '[\xCC\x90]{5}\x8B\xFF'
$HotpatchMatches = $HotpatchableRegex.Matches($BinaryString)
$MatchCount = $HotpatchMatches.Count
Write-Host "Total number of matches: $MatchCount"
# Print the file offset (in hexadecimal) of each MOV EDI, EDI instruction.
# Each match starts at the first of the five padding bytes, so add 5.
$HotpatchMatches |
    ForEach-Object { "0x$(($_.Index + 5).ToString('X8'))" }
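Because each character in the string corresponds to exactly one byte, a match index doubles as a file offset. The sketch below checks this on an invented buffer containing two hotpatchable sequences: it adds 5 to each match index, as the script above does, and reads the two bytes at the resulting offset from the original byte array.

```powershell
# Synthetic buffer standing in for file contents (made up for illustration):
# two "functions", each preceded by five padding bytes (INT3, then NOP)
[Byte[]] $FileBytes = 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0x8B, 0xFF, 0x55,
                      0x90, 0x90, 0x90, 0x90, 0x90, 0x8B, 0xFF, 0x55

$Latin1 = [System.Text.Encoding]::GetEncoding(28591)
$BinaryString = $Latin1.GetString($FileBytes)
$HotpatchableRegex = [Regex] '[\xCC\x90]{5}\x8B\xFF'

$Report = foreach ($HotpatchMatch in $HotpatchableRegex.Matches($BinaryString))
{
    # Skip the five padding bytes to land on MOV EDI, EDI
    $Offset = $HotpatchMatch.Index + 5
    # The string index is also the byte offset, so read the bytes directly
    '0x{0:X8}: {1:X2} {2:X2}' -f $Offset, $FileBytes[$Offset], $FileBytes[$Offset + 1]
}

$Report
# 0x00000005: 8B FF
# 0x0000000D: 8B FF
```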