In the previous article https://8ksec.io/dissecting-windows-malware-series-risc-vs-cisc-architectures-part-4/, we took a little detour and learned more about CPU architectures in order to understand the underlying mechanisms that assembly code analysis is built upon.
We mostly talked about:
The differences between CISC and RISC architectures.
Where we are heading in the future.
How it relates to malware analysis.
Without further ado, let’s start learning how to create malware-focused network signatures.
What's In It For Me❓
In this article we’ll:
Learn what OPSEC is and how to safely investigate a malware sample.
Explore various methods malware uses to disguise its objectives, focusing on manipulation of network-related components.
Discover how to create malware network signatures based on real malware samples.
Already familiar with creating malware network signatures?
Feel free to scroll down to the ‘Creating Malware-Focused Network Signature Checklist‘.
How To Safely Investigate Malware Network Components ⚠️
When discussing the network functionality of malware, we’ll mainly focus on the following attributes:
IP addresses
Network Protocols: TCP, UDP, HTTP(S), etc.
Ports
Domain Names
Traffic Content
Given the objective of understanding the network component of a malware sample we’ve been handed, we might be tempted to run the malware and observe its behavior.
This would be a mistake.
Instead, we should review the data we already have, including:
Logs
Alerts
Packet captures
Any other data already generated by the malware.
By running the malware as our first step, we risk leaking our analysis actions to the attacker, leading us to the next important concept.
What Is OPSEC and Why is It Important❓
When using the Internet for research, understanding operational security (OPSEC) is crucial.
While performing investigations, certain actions we take can alert the malware author that we’ve identified the malware or even reveal personal information to the attacker.
For example:
Analyzing the malware from home, if it was initially sent to your corporate network via email, can lead to a DNS request being made from an IP address space outside the one normally used by your corporation.
If the malware was sent to a specific individual as a spear-phishing email containing a link, any access attempts to that link from outside the geographical area of the IP address space can alert the attacker.
Monitoring for attempts to resolve an unused domain included in the malware can indicate investigation activities.
Designing an exploit with an encoded link on an Internet-accessible, editable site (e.g., blog comments) can create a private, publicly accessible infection audit trail.
Awareness of an ongoing investigation may prompt attackers to change tactics and vanish.
So How Can We Safely Investigate a Malware Sample🚦
Indirect Tactics
The best-known one is to use services or mechanisms that provide anonymity, such as Tor, an open proxy, etc.
Another method is using a dedicated machine for research and hiding its location:
Using only cellular connection
Tunneling the connection through a remote infrastructure using SSH or VPN.
Using an ephemeral remote machine on a Cloud service, like Amazon EC2.
Direct Tactics
These core tactics are crucial for every malware analyst investigating a malware sample. Basic indicators, such as IP addresses and domain names, are mostly valuable for defending against a specific version of malware.
Malware authors are adept at quickly changing addresses or domains, so these basic indicators go stale fast.
So what’s the solution?
Defining Behavior Based Countermeasures.
Defining Content Based Countermeasures.
When talking about Behavior based countermeasures – EDRs, XDRs, FWs, WAFs, NACs and similar security solutions are the way to go.
This realm of solutions deserves a separate article.
When talking about Content based countermeasures – IDSs and IPSs are the first things that come to mind – This is the aspect we’ll focus on in this article.
Content-Based Network Countermeasures📃
Signature-based IDSs & IPSs are the oldest deployed systems for detecting malicious activity via network traffic.
Signature based detection depends on knowing the characteristics of the malicious activity and what it looks like on the wire.
A good signature will raise an alert every time the malicious activity happens (a true positive), but will not raise one for traffic that merely looks like malware and is actually legitimate (a false positive).
Good Old Snort IDS
One of the most popular IDSs out there is Snort. It is used to create a signature or a rule that links together a series of elements (called rule options).
The rule fires only if all of its rule options are true.
We’ll want to create a signature that is as generic as possible, preferably one that detects a malware family as a whole and not just a specific version of a certain malware type.
Malware will try to blend in with legitimate network traffic (HTTP, DNS, etc.) as much as possible – This in turn makes it more challenging to detect.
1. So what mechanisms will malware try to leverage for that purpose?
2. How will they try to impose challenges on the analyst’s efforts to create a precise signature?
Next, we’ll see 2 malware samples that showcase and answer those questions.
A Little Bit of Snort Basics📚
But before we dive deep, let’s understand the attributes of Snort signatures:
alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"SQL Injection Attempt Detected"; flow:to_server,established; content:"SELECT"; nocase; content:"FROM"; distance:0; within:40;content:"WHERE"; distance:0; within:40; pcre:"/(\%27)|(\')|(\-\-)|(\%23)|(#)/i"; classtype:web-application-attack; sid:1000001; rev:1;)
alert tcp – Specifies that the rule generates an alert and applies to TCP traffic.
$EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS – Defines the traffic flow direction – from any port on the external network to the HTTP servers on the defined HTTP ports.
msg:”SQL Injection Attempt Detected” – The message that will be logged or shown when the rule triggers.
flow:to_server,established – Specifies that this rule applies to established connections heading towards the HTTP server.
content – This specifies the specific content to look for in the packet payload.
In this case, it looks for common SQL keywords like “SELECT”, “FROM”, and “WHERE”, indicating a potential SQL injection attempt.
nocase – Makes the content match case insensitive.
distance and within – These keywords are used to specify content location constraints relative to the previous match.
pcre – Perl Compatible Regular Expressions for more complex pattern matching. Here, it’s used to detect common SQL injection characters and patterns.
classtype:web-application-attack – This categorizes the type of attack the rule is designed to detect.
sid – The Snort ID for the rule. This should be unique.
rev – The revision number of the rule, useful for keeping track of updates to your signatures.
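To make the split between the rule header and the rule options more tangible, here is a minimal Python sketch that breaks a rule into those parts. It is not how Snort itself parses rules; the function name parse_snort_rule is our own, and the rule is a shortened copy of the one above (the pcre option is left out to keep the example compact).

```python
def parse_snort_rule(rule: str):
    """Split a Snort rule into header fields and an ordered list of options.

    Illustrative only: handles quoted values and backslash escapes,
    which is enough for simple rules like the one in this article.
    """
    header_text, _, options_text = rule.partition("(")
    options_text = options_text.rstrip().rstrip(")")

    action, proto, src, src_port, direction, dst, dst_port = header_text.split()
    header = {
        "action": action, "proto": proto,
        "src": src, "src_port": src_port,
        "direction": direction,
        "dst": dst, "dst_port": dst_port,
    }

    options, current = [], []
    in_quotes = escaped = False
    for ch in options_text:
        if escaped:                      # character after a backslash is literal
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:  # an unquoted semicolon ends an option
            option = "".join(current).strip()
            if option:
                name, _, value = option.partition(":")
                options.append((name.strip(), value.strip().strip('"')))
            current = []
        else:
            current.append(ch)
    return header, options


RULE = (
    r'alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS '
    r'(msg:"SQL Injection Attempt Detected"; flow:to_server,established; '
    r'content:"SELECT"; nocase; content:"FROM"; distance:0; within:40; '
    r'classtype:web-application-attack; sid:1000001; rev:1;)'
)

header, options = parse_snort_rule(RULE)
print(header)
for name, value in options:
    print(f"{name:10} {value}")
```

Options without a value, such as nocase, come back with an empty string, and the order of the list is preserved because modifiers like distance and within apply to the content option that precedes them.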
An Important Side Note
Before deploying, it’s crucial to test your signatures thoroughly to ensure they do not generate an excessive number of false positives or negatives.
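One lightweight way to get a feel for false positives and false negatives before touching a real sensor is to replay a signature's pattern against a small labeled sample set. The sketch below is only an illustration: the request lines are made up, the deliberately naive pattern is ours, and a real test should use packet captures of actual traffic.

```python
import re


def evaluate_signature(pattern, samples):
    """Count true/false positives and negatives for (text, is_malicious) pairs."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for text, is_malicious in samples:
        fired = pattern.search(text) is not None
        if fired and is_malicious:
            counts["tp"] += 1
        elif fired and not is_malicious:
            counts["fp"] += 1
        elif not fired and is_malicious:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# A deliberately naive signature: SELECT followed somewhere by FROM.
naive = re.compile(r"select.+from", re.IGNORECASE)

# Hypothetical request lines, labeled by hand.
samples = [
    ("GET /item?id=1'+UNION+SELECT+password+FROM+users--", True),
    ("GET /help/how-to-select-a-plan-from-the-list", False),
    ("GET /index.html", False),
    ("GET /item?id=1'+OR+'1'='1", True),
]

counts = evaluate_signature(naive, samples)
print(counts)
```

With these four samples the naive pattern produces one hit of each kind, which is exactly the sort of result that tells us the signature needs more constraints before deployment.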
Let's Start Reversing😎
1. Network Content Signature Based on Endpoint's Unique Data
Given the malware sample, we see there is no associated packet capture. This forces us to start with Basic Dynamic Analysis to help us understand how the malware operates.
Running the malware with FakeNet running in the background, we observe the following beacon:
It appears the HTTP GET request contains encoded data and downloads an image file from a certain domain (obfuscated in the picture).
Our goal, as stated, is to find hard-coded data or ephemeral coded data we could use in creating our Snort signature.
Running the malware a couple more times produces the same beacon.
Let’s then continue to analyze the assembly code.
Usually, we would scroll through the import address table (if there is one, and the malware is not packed) and look for network-related API calls.
But, as the main function opens in IDA Pro (after automatically determining the entry point), we cannot help but see and analyze the following assembly code snippets:
It appears the data passed to the function sub_4010BB is ephemeral data gathered from the endpoint profile data (this includes username, MAC address, etc.).
Analyzing the function sub_4010BB, we observe the following assembly code:
It appears the information queried from the endpoint is used as the data to be sent in the HTTP GET request using the API call: URLDownloadToCacheFileA.
Since we observed that the generated network content is based on the host information, our next step is to run the malware on a different host.
This time, the beacon the malware sends has different GET request data (also Base64 encoded) and uses a different User-Agent:
Following the data we gathered, we can conclude what the hard-coded data is and create the network signature.
The key static elements to target when creating the network signature are the colons and the dash that separate the hardware profile bytes and the username.
Targeting these elements is challenging because the malware applies a layer of Base64 encoding before sending this content onto the network.
However, inspecting the Base64 strings we gathered, we can infer the following:
Each colon in the original string is the third character of a three-character block (a triple).
In Base64, all the bits in the fourth character of each four-character output block (a quad) come from the third character of the corresponding triple.
And that’s why:
The fourth character of every quad that encodes a colon-terminated triple is a 6.
Because of the dash, the sixth quad will always end with a t.
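We can verify this reasoning with a few lines of Python. The sketch assumes standard Base64 and uses a made-up profile string (six hex pairs separated by colons, a dash, then a hypothetical username user01); it is not data taken from the sample.

```python
import base64
import string

# Standard Base64 alphabet: index 0..63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# The fourth character of a quad is built from the low 6 bits of the
# third input byte, so ':' and '-' always map to the same output character.
colon_char = ALPHABET[ord(":") & 0x3F]
dash_char = ALPHABET[ord("-") & 0x3F]
print(colon_char, dash_char)  # 6 t

# Hypothetical beacon plaintext: hardware-profile bytes, a dash, a username.
profile = b"80:6e:6f:6e:69:63-user01"
encoded = base64.b64encode(profile).decode("ascii")

# Split into quads and look at the last character of the first six.
quads = [encoded[i:i + 4] for i in range(0, len(encoded), 4)]
fourth_chars = [quad[3] for quad in quads[:6]]
print(fourth_chars)  # ['6', '6', '6', '6', '6', 't']
```

The colon is 0x3A and the dash is 0x2D; masking each with 0x3F gives 58 and 45, which are the alphabet positions of 6 and t. Whatever the hex digits or username are, those positions in the encoded string stay fixed.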
This leads us to the following final conclusions:
1. The URI that will be used will always be at least 24 characters long, with specific locations for the five 6 characters and the ‘t’.
2. We know the character set that may be used to represent the rest of the URI.
3. We also know the download name is a single character that is the same as the end of the path.
All this information allows us to create two Snort signatures.
The first is based on a pattern of the form:
/\/XXX6XXX6XXX6XXX6XXX6XXXt(XXXX){1,}\//
Where:
X represents the character set: [A-Z0-9a-z+\/]
It captures blocks of four characters ending in 6 and t.
It targets the first segment of the URI with static characters.
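To sanity-check the first pattern, we can expand the X placeholder and run it against a URI built the same way the beacon appears to be built. This is a sketch under assumptions: the helper beacon_uri, the profile string, and the username are hypothetical, and standard Base64 is assumed (the username length is chosen so no = padding appears).

```python
import base64
import re

X = r"[A-Z0-9a-z+/]"  # the Base64 character set from the article

# Five quads ending in 6, one quad ending in t, then one or more full quads.
FIRST_SIGNATURE = re.compile(
    "/" + (X + "{3}6") * 5 + X + "{3}t" + "(?:" + X + "{4})+/"
)


def beacon_uri(profile: bytes) -> str:
    """Build a URI shaped like the observed beacon (hypothetical helper)."""
    encoded = base64.b64encode(profile).decode("ascii")
    # The download name is a single character equal to the end of the path.
    return "/" + encoded + "/" + encoded[-1] + ".png"


malicious = beacon_uri(b"80:6e:6f:6e:69:63-user01")
benign = "/images/2024/company-logo-large-version.png"

print(bool(FIRST_SIGNATURE.search(malicious)))  # True
print(bool(FIRST_SIGNATURE.search(benign)))     # False
```

Changing the hex pairs or the username in the profile still yields a match, since only the positions of the 6 and t characters are pinned down.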
The second Snort rule will be based on the following pattern:
/\/[A-Z0-9a-z+\/]{24,}([A-Z0-9a-z+\/])\/\1\.png/
And can be created in a similar manner.
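The second pattern relies on a back-reference: the last character of the path is captured and must reappear as the file name. Here is a small sketch of that idea in Python's re syntax, where the capture group uses unescaped parentheses and the dot before png is escaped. The sample profile and the helper name are hypothetical, and standard Base64 is assumed.

```python
import base64
import re

# At least 24 Base64 characters, capture the last one, then "/<same char>.png".
SECOND_SIGNATURE = re.compile(r"/[A-Z0-9a-z+/]{24,}([A-Z0-9a-z+/])/\1\.png")


def beacon_uri(profile: bytes, name=None) -> str:
    """Build a beacon-like URI; `name` overrides the file name for testing."""
    encoded = base64.b64encode(profile).decode("ascii")
    return "/" + encoded + "/" + (name or encoded[-1]) + ".png"


profile = b"80:6e:6f:6e:69:63-user01"  # hypothetical endpoint data
last_char = base64.b64encode(profile).decode("ascii")[-1]
other_char = "A" if last_char != "A" else "B"

matching = beacon_uri(profile)
mismatched = beacon_uri(profile, name=other_char)

print(bool(SECOND_SIGNATURE.search(matching)))    # True
print(bool(SECOND_SIGNATURE.search(mismatched)))  # False
```

The mismatched case shows why this signature is fairly tight: a long Base64-looking path alone is not enough, the file name has to echo its final character.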
2. Network Content Signature - Leveraging HTML Attributes
Given the malware sample, and since we don’t have a packet capture, we’ll start with Basic Dynamic Analysis.
Running the malware with FakeNet in the background, we observe the following request:
We notice the attacker has mistakenly hard-coded the keyword ‘User-Agent’ inside the User-Agent value itself, so the header name appears twice in the request. This can later be used in our Snort signature.
We also observe the malware requests an HTML file named ‘start.htm’.
Inspecting the relevant Windows API calls, we see there is heavy use of the WinINet DLL library and the Windows COM object model.
Cross-referencing these functions in the assembly code, we get to the following code snippet:
First Goal - Identify the Beacon Content🅰️
Doing some backtracking, we find out this function is called from WinMain with two arguments.
The one used before the call to InternetOpenUrlA is the URL and defines the beacon destination.
This URL is set in another function called from WinMain, which contains the following code:
If the file autobat.exe doesn’t exist, a call will be made to the static URL, requesting the start.htm we saw earlier.
Further analysis reveals that the ReadFile function takes a buffer as an argument, which is eventually passed all the way back to the InternetOpenUrlA function.
Thus, we can conclude that autobat.exe is a configuration file that stores the URL in plaintext.
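The destination-selection logic we just reconstructed can be summarized in a few lines. This is a benign sketch of the decision only, not the malware's code: the function name, the placeholder default URL, and the stripping of whitespace are all our assumptions, and the real static URL is intentionally not reproduced.

```python
from pathlib import Path

# Hypothetical placeholder for the hard-coded fallback URL.
DEFAULT_URL = "http://example.invalid/start.htm"


def beacon_url(config_path: Path) -> str:
    """Return the URL from the config file if it exists, else the default."""
    if config_path.exists():
        # The config file stores the URL in plaintext.
        return config_path.read_text(encoding="ascii").strip()
    return DEFAULT_URL
```

For signature writing, the takeaway is that the beacon destination can change at any time by rewriting one file, so the URL itself is a weak indicator.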
Second Goal - Identify How The Malware Responds🅱️
Following the InternetReadFile call, we notice the following code snippet:
The strstr function (returns a pointer to the first occurrence of a search string in a string) is placed within two loops—the outer one containing the call to InternetReadFile, and the inner one containing strstr and a call to another function, sub_401000.
sub_401000 is called when the string ‘<no’ is found, and it performs a comparison to validate whether we found the correct content.
Moving forward we find out that:
The attacker tried to mix up the comparisons to the keyword <noscript> to avoid producing an obvious pattern.
The contents the malware expects for a valid comparison are:
The file read from the Internet.
The URL that originally came from the configuration file.
After that, there is a jump table based on the value of the register: ‘d’, ‘s’, ‘r’, ‘n’.
Analyzing the ‘d’ case, we find calls to CreateFile and WriteFile that alter the configuration file.
The malware then:
Creates a process.
Overwrites the configuration file in order to redirect the malware to beacon to a different site.
And, generally, acts as a client of a C2 server, which allows for lots of other functionality.
Generating The Snort Rules🔣
Since the malware has a beacon component and a response component – we need to create multiple Snort rules to achieve full coverage.
The Beacon➡️
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"PM14.3.1 Specific User-Agent with duplicate header"; content:"User-Agent|3a20|User-Agent|3a20|Mozilla/4.0|20|(compatible\;|20|MSIE|20|7.0\;|20|Windows|20|NT|20|5.1\;|20|.NET|20|CLR|20|3.0.4506.2152\;|20|.NET|20|CLR|20|3.5.30729)"; http_header; sid:20001431; rev:1;)
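The content option in this rule mixes plain text with hex bytes between pipe characters (3a20 is a colon followed by a space) and escapes semicolons with a backslash. As a rough aid for reading such strings, the following sketch decodes a content value into raw bytes and checks it against a request. The decoder is our own simplified helper, not part of Snort, and the sample requests are made up.

```python
def snort_content_to_bytes(content: str) -> bytes:
    """Decode a Snort content string: |hex| sections and backslash escapes."""
    out = bytearray()
    hex_digits = []
    in_hex = False
    i = 0
    while i < len(content):
        ch = content[i]
        if in_hex:
            if ch == "|":                       # closing pipe: flush hex bytes
                out += bytes.fromhex("".join(hex_digits))
                hex_digits, in_hex = [], False
            elif not ch.isspace():
                hex_digits.append(ch)
        elif ch == "|":                         # opening pipe
            in_hex = True
        elif ch == "\\" and i + 1 < len(content):
            i += 1                              # escaped character is literal
            out += content[i].encode("latin-1")
        else:
            out += ch.encode("latin-1")
        i += 1
    return bytes(out)


BEACON_CONTENT = (
    r"User-Agent|3a20|User-Agent|3a20|Mozilla/4.0|20|(compatible\;|20|MSIE|20|"
    r"7.0\;|20|Windows|20|NT|20|5.1\;|20|.NET|20|CLR|20|3.0.4506.2152\;|20|"
    r".NET|20|CLR|20|3.5.30729)"
)
needle = snort_content_to_bytes(BEACON_CONTENT)

# Hypothetical requests: one with the duplicated header name, one normal.
UA = b"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
beacon = b"GET /start.htm HTTP/1.1\r\nUser-Agent: User-Agent: " + UA + b"\r\n\r\n"
normal = b"GET /start.htm HTTP/1.1\r\nUser-Agent: " + UA + b"\r\n\r\n"

print(needle in beacon)  # True
print(needle in normal)  # False
```

The second check is the important one: a legitimate browser sending the very same User-Agent value does not trip the signature, because only the duplicated header name makes the byte sequence unique.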
The marked elements (below) are static and come from two different strings in the code. The rest are ephemeral, since they are defined by the URL.
As we mentioned, the attacker’s mistake in hard-coding the User-Agent is a strong indicator, and the rule should include it.
The Response⬅️
The responses will be in the following format:
<noscript>... truncated_url/cmd_char.../arg96'
The malware searches for several static elements in the web page, including the noscript tag, the first characters of the URL (http://), and the trailing 96'.
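To picture how such a page would be consumed, here is a hedged Python sketch of a parser for the format above. It is our approximation of the described behavior, not a decompilation: the function name, the example domain, and the mapping of ‘s’ and ‘n’ to sleep and quit are assumptions (the text only ties ‘d’ and ‘r’ to download and redirect explicitly).

```python
import re

# Assumed mapping; only 'd' and 'r' are named explicitly in the article.
COMMANDS = {"d": "download", "r": "redirect", "s": "sleep", "n": "quit"}


def parse_response(page: str, base_url: str):
    """Extract (command, argument) from a page, or None if nothing valid is found."""
    pattern = re.compile(
        r"<noscript>.*?"           # the tag, then arbitrary filler
        + re.escape(base_url)      # the truncated URL from the config file
        + r"/([^/])[^/]*/"         # first character of the command word
        + r"(.*?)96'",             # argument, terminated by the static 96'
        re.DOTALL,
    )
    match = pattern.search(page)
    if match is None or match.group(1) not in COMMANDS:
        return None
    return COMMANDS[match.group(1)], match.group(2)


# Hypothetical page served by a C2 server.
page = "<html><noscript>x http://example.invalid/download/0820201637000096'</noscript></html>"
print(parse_response(page, "http://example.invalid"))
# ('download', '08202016370000')
```

Only the first character of the command word matters here, which is why a signature should not depend on the rest of that word.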
Since the parsing function that reads the cmd_char structure is in a different area of the code and may be changed independently, it should be targeted separately. Thus, the following is the signature for targeting just the static elements expected by the malware:
alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any (msg:"PM14.3.2 Noscript tag with ending"; content:"<noscript>"; content:"http\://"; distance:0; within:512; content:"96'"; distance:0; within:512; sid:20001432; rev:1;)
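The distance:0 and within:512 modifiers chain the three content matches together: each one must be found after the previous match and fit entirely inside the next 512 bytes. The sketch below imitates that chaining in Python for intuition only. It is a simplified model, not Snort's matching engine, and the payloads are invented.

```python
def chained_match(payload: bytes, contents, within=512, start=0, first=True) -> bool:
    """Match `contents` in order; every match after the first must lie
    completely inside `within` bytes following the previous match."""
    if not contents:
        return True
    needle = contents[0]
    limit = len(payload) if first else min(len(payload), start + within)
    idx = payload.find(needle, start, limit)
    while idx != -1:
        # Try the remaining contents from the end of this match.
        if chained_match(payload, contents[1:], within, idx + len(needle), False):
            return True
        idx = payload.find(needle, idx + 1, limit)  # otherwise try a later hit
    return False


CONTENTS = [b"<noscript>", b"http://", b"96'"]

# Hypothetical response bodies.
close = b"<html><noscript>text http://example.invalid/d/arg96'</noscript></html>"
too_far = b"<noscript>http://" + b"A" * 600 + b"96'"

print(chained_match(close, CONTENTS))    # True
print(chained_match(too_far, CONTENTS))  # False
```

The second payload contains all three strings, but the trailing 96' sits more than 512 bytes after http://, so the chain fails. That window is what keeps the rule from firing on arbitrary pages that merely contain a noscript tag.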
The download and redirect functions both share the same routine to decode the URL, so we will target these two commands together:
alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any (msg:"PM14.3.3 Download or Redirect Command"; content:"/08202016370000"; pcre:"/\/[dr][^\/]*\/08202016370000/"; sid:20001433; rev:1;)
This signature uses the string 08202016370000, which is the encoded representation of http://.
The PCRE rule option includes this string and forward slashes, and the ‘d’ and ‘r’ that indicate the download and redirect commands.
The \/ is an escaped forward slash, the [dr] represents either the character ‘d’ or ‘r’, the [^\/]* matches zero or more characters that are not a forward slash, and the \/ is another escaped slash.
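Two things in this rule are easy to check by hand: that 08202016370000 really can stand for http://, and that the PCRE only accepts the ‘d’ and ‘r’ commands. The sketch below does both. The lookup alphabet is an assumption on our side: it is one two-digit index scheme consistent with that encoded string, and the article does not show the actual table used by the sample. The command paths are invented.

```python
import re

# ASSUMPTION: each character is replaced by its two-digit index in this
# alphabet. It is consistent with "http://" -> 08202016370000.
ALPHABET = "/abcdefghijklmnopqrstuvwxyz0123456789:."


def encode(text: str) -> str:
    return "".join(f"{ALPHABET.index(ch):02d}" for ch in text)


def decode(digits: str) -> str:
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return "".join(ALPHABET[int(pair)] for pair in pairs)


print(encode("http://"))         # 08202016370000
print(decode("08202016370000"))  # http://

# The PCRE from the rule, written without the Snort delimiters.
COMMAND_SIGNATURE = re.compile(r"/[dr][^/]*/08202016370000")

arg = encode("http://example.invalid/next.htm")  # hypothetical argument
for word in ("download", "redirect", "sleep"):
    path = f"/{word}/{arg}"
    print(word, bool(COMMAND_SIGNATURE.search(path)))
```

Because only the first letter of the command word is constrained, the signature keeps working if the attacker pads or renames the rest of the word.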
In a similar way, we would create the signatures for the quit and sleep commands.
So What Have We Seen👀
We talked about:
What OPSEC is and how an analyst should safely investigate malware online (indirect & direct tactics).
Content based countermeasures – What they are, and how they differ from behavior based countermeasures.
We analyzed two malware samples – Each one having a different network characteristic.
As expected, our ability to create a precise content based signature will be determined by our ability to dissect the malware and leverage our knowledge of network internals.
Tips for Creating Accurate Snort Signatures📌
As a rule of thumb, we’ll want to create multiple signatures — each one targeting a different mechanism of the malicious code.
This approach makes detection more resilient to attacker modifications.
Attackers may try to slightly change their software to avoid detection by a specific signature.
By creating multiple signatures that key off different aspects of the communication, you can still successfully detect the malware, even if the attacker has updated a portion of the code.
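The benefit of layering signatures can be shown with a toy example. In this sketch, two independent checks look at different parts of a session; the session dictionaries, signature names, and byte strings are hypothetical stand-ins for real traffic and real Snort rules.

```python
import re

# Each signature keys off a different mechanism of the malware.
SIGNATURES = {
    "beacon: duplicated User-Agent header name":
        lambda s: b"User-Agent: User-Agent: " in s["request"],
    "response: download or redirect command":
        lambda s: re.search(rb"/[dr][^/]*/08202016370000", s["response"]) is not None,
}


def fired(session):
    """Return the names of all signatures that match this session."""
    return [name for name, check in SIGNATURES.items() if check(session)]


# Hypothetical original version of the malware.
v1 = {
    "request": b"GET /start.htm HTTP/1.1\r\nUser-Agent: User-Agent: Mozilla/4.0\r\n\r\n",
    "response": b"<noscript>http://example.invalid/download/0820201637000096'",
}
# Hypothetical updated version: the attacker fixed the header mistake only.
v2 = dict(v1, request=b"GET /start.htm HTTP/1.1\r\nUser-Agent: Mozilla/4.0\r\n\r\n")

print(fired(v1))  # both signatures
print(fired(v2))  # only the response signature
```

The updated variant slips past the beacon check but is still caught by the response check, which is the whole point of targeting several mechanisms at once.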
Here are three additional key notes to remember when creating the signatures:
Focus on elements of the protocol that apply to both the client and server sides.
From the attacker’s perspective, changing code related to both the client and the server is much harder.
Look for elements of the protocol that use code on both the client and the server sides, and create a signature based on these elements.
The attacker will need to do a lot of extra work to render such a signature obsolete.
Focus on elements of the protocol known to be part of a key.
Often, some hard-coded components of a protocol are used as a key.
For example, an attacker may use a specific User-Agent string as an authentication key so that illegitimate probing can be detected (and possibly rerouted).
To bypass such a signature, an attacker would need to change code at both endpoints.
Identify elements of the protocol that are not immediately apparent in traffic.
Sometimes, the actions of multiple defenders can impede the detection of malware.
If another defender creates a signature that achieves sufficient success against an attacker, the attacker may be compelled to adjust his malware to avoid the signature.
If you rely on the same signature, or a signature that targets the same aspects of the attacker’s communication protocol, the attacker’s adjustment will affect your signature as well.
To avoid being rendered obsolete by the attacker’s response to another defender, try to identify aspects of malicious operations that other defenders might not have focused on.
Knowledge gained from carefully observing the malware will help you develop a more robust signature.
As a general conclusion, signatures based on malware analysis are more precise, reducing the trial and error needed to produce low false positive signatures.
Additionally, they have a higher likelihood of identifying new strains of the same malware.
So, What's Next❓
So far, we’ve acquired an impressive skill set for analyzing and reverse engineering malware samples. We’ve discussed the major objectives malware will try to achieve (Persistence, Evasion and Stealth) and saw various use cases where each one is implemented.
Now we’re in the major league.
The next articles will focus on more advanced topics, such as:
Analyzing Rootkits
Unpacking Malware Samples
Identifying and Analyzing Shellcode
And much more😈
References
The following resources are taken from an amazing book called:
Practical Malware Analysis By Michael Sikorski and Andrew Honig
The two malware samples
The network attributes image
The Base64 image comparing decoded and encoded data
Tips and tricks regarding creating snort signatures
All other resources were generated through the analysis process