Newsletter Issue #175: YARA Deep Dive

Tools Series

Danny

Sep 04, 2025

This issue revisits the previous posts about YARA and its capabilities.

Previous post: Tools Deep Dive: YARA Part II (Danny, October 17, 2024), a follow-up to the earlier post about YARA, the versatile pattern matching tool.

If you haven’t read Parts I through III, a good mental model for what YARA can do is the following:

What regex is for text, YARA is for files
What Sigma is for logs, YARA is for files

It really is a versatile tool, and you can do much more with it than scan for malware.

You can use YARA for any of the following:

detecting a specific sample of malware
detecting a specific class of malware
scanning email attachments
testing for certain strings in memory
verifying file formats inside other utilities

Even when it’s not about malware, YARA can be useful for most file-related tasks.

When it comes to doing any meaningful amount of testing or running YARA at scale, you’re only as good as your data.

As mentioned previously, here are some sources for building a malware corpus:

VirusTotal
ANY.RUN
vx-underground

This last option has some large datasets if you go to the Yearly Archives.

For building “goodware”, AKA clean samples:

Microsoft or Ubuntu ISOs
Empty files

These are just some ideas to form your datasets.

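Here’s an example rule. Note that this one is written in YARA-L, the log-focused variant covered under New Beginnings below, so it matches on events rather than on file contents.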

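rule UserCreationThenDeletion {
  meta:
    description = "User account created and then deleted within 4 hours"

  events:
    // Tie both events to the same user through the $user placeholder
    $create.target.user.userid = $user
    $create.metadata.event_type = "USER_CREATION"

    $delete.target.user.userid = $user
    $delete.metadata.event_type = "USER_DELETION"

    // The creation must not come after the deletion
    $create.metadata.event_timestamp.seconds <=
       $delete.metadata.event_timestamp.seconds

  match:
    // Group the matching events per user over a 4-hour window
    $user over 4h

  condition:
    $create and $delete
}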

This rule looks for users that have been created and then deleted within a 4-hour window.

Deep Dive on a Rule

Let’s take a closer look at a sample rule.

This rule matches malware that masquerades as a legitimate Windows system binary. That maps to the MITRE ATT&CK technique Masquerading (T1036).

In this case, the rule is written to match a specific sample attributed to the threat actor Ferocious Kitten.

The rule does the following:

defines strings for bitsadmin commands
checks for a signature count of 0 (unsigned)
matches on any of the defined bitsadmin strings

In the strings section, you can see it defines two bitsadmin commands to look for in the contents of the file; a match on either one is enough.

The condition section then requires the file to be unsigned.

Now, there could be signed malware of course, but this is a simple example rule to demonstrate the concept.
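For readers following along in text, a minimal sketch of a rule with the shape described above might look like this. The rule name and the two bitsadmin strings are hypothetical placeholders, not the strings from the actual Ferocious Kitten rule, and the PE header check is an added assumption to keep the pe module lookups meaningful.

```yara
import "pe"

rule Sketch_Unsigned_Bitsadmin_Usage
{
    meta:
        description = "Illustrative sketch: unsigned PE containing bitsadmin command strings"
        note = "String values are hypothetical placeholders, not from the original rule"

    strings:
        // Hypothetical bitsadmin command fragments.
        // ascii wide covers both ASCII and UTF-16LE encodings; nocase ignores case.
        $bits1 = "bitsadmin /transfer" ascii wide nocase
        $bits2 = "bitsadmin /create" ascii wide nocase

    condition:
        uint16(0) == 0x5A4D                 // MZ signature: only consider PE files
        and pe.number_of_signatures == 0    // no Authenticode signatures present
        and any of ($bits*)                 // either bitsadmin string is enough
}
```

The three condition clauses line up with the three bullet points: strings defined, signature count of 0, and any of the defined strings present.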

Let’s take a look at a different rule, this time one that detects behavior rather than matching a specific sample.

This one has a bit more content to it, but in a nutshell it does the following:

defines strings found in legitimately Microsoft-signed files
uses a regex pattern to match on a given file path
checks for the PE header
prints the file name to stdout
excludes files containing the previously defined Microsoft strings

Is this rule perfect? No, but it will catch many more instances of masquerading malware than the sample-specific rule shown previously.
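As a rough sketch of those five pieces working together, a behavior-oriented rule could be laid out as follows. Everything specific here is an assumption for illustration: the rule name, the two Microsoft marker strings, the list of binary names in the regex, and the use of an external variable called filepath, which plain YARA does not define on its own.

```yara
import "console"

rule Sketch_Masquerading_System_Binary
{
    meta:
        description = "Illustrative sketch: PE named like a system binary but lacking Microsoft markers"
        note = "Strings, regex, and the filepath external variable are assumptions"

    strings:
        // Hypothetical markers expected inside genuine Microsoft binaries
        $ms1 = "Microsoft Corporation" ascii wide
        $ms2 = "Microsoft Windows" ascii wide

    condition:
        uint16(0) == 0x5A4D                                     // PE header check
        and filepath matches /\\(svchost|lsass|csrss)\.exe$/i   // path ends in a well-known system binary name
        and not any of ($ms*)                                   // exclude files carrying the Microsoft markers
        and console.log("possible masquerading: ", filepath)    // print the path to stdout; console.log evaluates to true
}
```

Because filepath is an external variable, it has to be supplied at scan time, for example with the yara command line's -d option (-d filepath=...), or by a scanner that defines it for you. Without a definition the rule will not compile.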

Best Practices

Some good practices to keep in mind.

A good way to limit the scope of your rule to a specific file type is to check the file header.

Take this condition from the above example rule:

condition:
    uint16(0) == 0x5a4d

This checks for the MZ signature at the start of Windows PE files.

Or take this one, which checks for the ELF header in Linux binaries.

condition:
    uint32(0) == 0x464C457F

For more on this, see this resource

https://www.optiv.com/insights/source-zero/blog/selective-yara-scanning-whats-your-type

This accomplishes two things: you narrow the scope of your rule for higher fidelity, and your rule doesn’t alert on itself (Yes, that’s a thing).

If you’re testing across a large dataset, you don’t want to be matching a string on filetypes that you are not even looking for.
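To make the two header conditions above concrete, here is a small sketch that pairs a header check with a string. The rule name and the string are hypothetical. Note that uint16 and uint32 read little-endian integers, which is why the bytes 4D 5A ("MZ") are written as 0x5A4D and the bytes 7F 45 4C 46 ("\x7fELF") are written as 0x464C457F.

```yara
rule Sketch_Scoped_To_ELF
{
    meta:
        description = "Illustrative sketch: string match limited to ELF files"

    strings:
        // Hypothetical string of interest
        $s = "/etc/shadow" ascii

    condition:
        // The first four bytes must be 7F 45 4C 46 before the string is considered
        uint32(0) == 0x464C457F and $s
}
```

The rule file itself contains the text /etc/shadow, but it does not start with the ELF magic bytes, so scanning a directory that includes your rules will not make this rule match on itself.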

Another best practice is to use modules when possible. For instance, in the example above,

import "pe"

would be the first line of the rule.

Then, inside the rule, pe.version_info gives you access to the version information for that PE (Portable Executable) file.

Using modules is a clean and effective way to get more out of YARA, and it makes rule writing easier.
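Here is a minimal sketch of what that looks like in practice. The rule name and the choice of the CompanyName key are illustrative assumptions; pe.version_info is a dictionary keyed by the version resource field names, and pe.number_of_signatures counts Authenticode signatures.

```yara
import "pe"

rule Sketch_Claims_Microsoft_But_Unsigned
{
    meta:
        description = "Illustrative sketch: version info claims Microsoft, but the file is unsigned"

    condition:
        uint16(0) == 0x5A4D                                       // only look at PE files
        and pe.version_info["CompanyName"] contains "Microsoft"   // version resource claims Microsoft
        and pe.number_of_signatures == 0                          // yet there is no signature
}
```

Doing the same thing with raw strings and offsets would mean parsing the version resource by hand, which is the kind of work the module saves you.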

New Beginnings

There are always new developments being made in security, and YARA is no exception.

Here are two big adaptations of YARA.

YARA-L

This is a detection and threat hunting language used for Google Security Operations (formerly Chronicle).

It is used for analyzing large volumes of log data (rather than file data) through pattern matching.

The findings in that log data are then used to create detections.

It has a lot of rich features, but it is tied to Google Security Operations.

YARA-X

This is a rewrite of YARA in Rust, developed by the original YARA team at VirusTotal. It has been tested across millions of files and is ready for production use.

They’re encouraging users to switch to YARA-X, as no new features will be added to YARA.

Improvements in YARA-X include better error reporting and better performance, which come in handy when scanning large files or recursing through whole filesystems.

What I Read This Week

Finding vulnerabilities in modern web apps using Claude Code and OpenAI Codex
- Findings on how good these AI agents are at finding vulnerabilities, and their false positive rates, across 11 open source projects
- “Running the exact same prompt on the exact same codebase multiple times often yielded vastly different results”
- Sounds about right. I’ve found that most people who praise AI tools as replacing humans haven’t actually gotten into the weeds with them

Malvertising Campaign on Meta Expands to Android, Pushing Advanced Crypto-Stealing Malware
- 75 localized ads that are part of a mobile malware campaign
- One of the ads was even paired with an image of a Labubu
- The links redirect to a malicious cloned version of the app

Supply Chain Security Alert: Popular Nx Build System Package Compromised with Data-Stealing Malware
- More of the same, with a new twist. Supply chain attacks, but this time with LLM agents
- “The first known case where malware harnessed developer-facing AI CLI tools”

Threat Intelligence Case Study: Dissecting a Multi-Stage Phishing Campaign Against YouTube Creators
- It’s always good to read how someone dealt with a scam attempt and get that perspective
- A note from the walkthrough: sometimes you can’t replace intuition. You can have the tools at your disposal, but when something feels off you have to recognize it, even if the phishing lure is convincing

The Ongoing Fallout from a Breach at AI Chatbot Maker Salesloft
- More on last week’s story and its continued aftermath
- The massive data haul is reported to include credentials such as AWS keys, VPN credentials, and Salesforce and Snowflake creds

Wrapping Up

In this post, we revisited YARA’s capabilities and use cases, looked at its new developments, and dove into the details behind a rule and what it would detect.

See you in the next one.
