## Archive for the ‘Malware’ Category

### Why Learn OpenAI? In a word, Malware!

Tuesday, August 1st, 2017

Spadafora reports on Endgame’s malware-generating software, Malware Env for OpenAI Gym.

From the Github page:

This is a malware manipulation environment for OpenAI’s gym. OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This makes it possible to write agents that learn to manipulate PE files (e.g., malware) to achieve some objective (e.g., bypass AV) based on a reward provided by taking specific manipulation actions.
… (highlight in original)

The value of the OpenAI philosophy:

We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as possible. The outcome of this venture is uncertain and the work is difficult, but we believe the goal and the structure are right. We hope this is what matters most to the best in the field.

will vary depending upon your objectives.

From my perspective, it’s better for my AI to decide to reach out or stay its hand, as opposed to relying upon ethical behavior of another AI.

You?

### Concealed Vulnerability Survives Reboots – Consumers Left in Dark

Monday, June 19th, 2017

From the post:

Until now, all malware targeting IoT devices survived only until the user rebooted his equipment, which cleared the device’s memory and erased the malware from the user’s equipment.

Intense Internet scans for vulnerable targets meant that devices survived only minutes until they were reinfected again, which meant that users needed to secure devices with unique passwords or place behind firewalls to prevent exploitation.

New vulnerability allows for permanent Mirai infections

While researching the security of over 30 DVR brands, researchers from Pen Test Partners have discovered a new vulnerability that could allow the Mirai IoT worm and other IoT malware to survive between device reboots, permitting for the creation of a permanent IoT botnet.

“We’ve […] found a route to remotely fix Mirai vulnerable devices,” said Pen Test Partners researcher Ken Munro. “Problem is that this method can also be used to make Mirai persistent beyond a power off reboot.”

Understandably, Munro and his colleagues decided to refrain from publishing any details about this flaw, fearing that miscreants might weaponize it and create non-removable versions of Mirai, a malware known for launching some of the biggest DDoS attacks known today.

Do security researchers realize concealing vulnerabilities prevents market forces from deciding the fate of insecure systems?

Should security researchers marketing vulnerabilities to manufacturers be more important than the operation of market forces on their products?

More important than your right to choose products based on the best and latest information?

Market forces are at work here, but they aren’t ones that will benefit consumers.

### E-Cigarette Can Hack Your Computer (Is Nothing Sacred?)

Monday, June 19th, 2017

Kavita Iyer has the details on how an e-cigarette can be used to hack your computer at: Know How E-Cigarette Can Be Used By Hackers To Target Your Computer.

I’m guessing you aren’t so certain that expensive e-cigarette you “found” is harmless after all?

Malware in e-cigarettes seems like a stretch given the number of successful phishing emails every year.

But a recent non-smoker may be the security lapse you need.

### Personal Malware Analysis Lab – Summer Project

Wednesday, June 7th, 2017

Set up your own malware analysis lab with VirtualBox, INetSim and Burp by Christophe Tafani-Dereeper.

Whether you are setting this up for yourself and/or a restless child, what a great summer project!

You can play as well, so long as you don’t mind losing to nimble-minded tweens and teens. 😉

It’s never too early to teach cybersecurity and penetration skills or to practice your own.

With a little imagination as far as prizes, this could be a great family activity.

It’s a long way from playing Yahtzee with your girlfriend, her little brother and her mother, but we have all come a long way since then.

### Which Malware Lures Work Best?

Monday, June 1st, 2015

Which Malware Lures Work Best? Measurements from a Large Instant Messaging Worm by Tyler Moore and Richard Clayton.

Abstract:

Users are inveigled into visiting a malicious website in a phishing or malware-distribution scam through the use of a ‘lure’ – a superficially valid reason for their interest. We examine real world data from some ‘worms’ that spread over the social graph of Instant Messenger users. We find that over 14 million distinct users clicked on these lures over a two year period from Spring 2010. Furthermore, we present evidence that 95% of users who clicked on the lures became infected with malware. In one four week period spanning May–June 2010, near the worm’s peak, we estimate that at least 1.67 million users were infected. We measure the extent to which small variations in lure URLs and the short pieces of text that accompany these URLs affects the likelihood of users clicking on the malicious URL. We show that the hostnames containing recognizable brand names were more effective than the terse random strings employed by URL shortening systems; and that brief Portuguese phrases were more effective in luring in Brazilians than more generic ‘language independent’ text.

Slides

How better to learn what to teach users to avoid than by watching users choose malware?

Although since the highly trained professionals at the TSA miss 95% of test explosives and guns, I’m not sure that user training is the answer to malware URLs.

Perhaps detection and automated following of all links in messages/emails, from a computer setup to detect malware? Not sure how you would get a warning to users.

Still, I like the idea of seeing what users do rather than speculating about what they might do. The latter technique being a favorite of our national security apparatus. Mostly because it drives budgets.

### Open-Source projects: Computer Security Group at the University of Göttingen, Germany.

Monday, January 12th, 2015

I mentioned Joern in March 2014, but these other projects may be of interest as well:

Joern: A Robust Tool for Static Code Analysis

Joern is a platform for robust analysis of C/C++ code. It generates code property graphs, a novel graph representation of code that exposes the code’s syntax, control-flow, data-flow and type information. Code property graphs are stored in a Neo4J graph database. This allows code to be mined using search queries formulated in the graph traversal language Gremlin. (Paper1, Paper2, Paper3)

Harry: A Tool for Measuring String Similarity

Harry is a tool for comparing strings and measuring their similarity. The tool supports several common distance and kernel functions for strings as well as some exotic similarity measures. The focus lies on implicit similarity measures, that is, comparison functions that do not give rise to an explicit vector space. Examples of such similarity measures are the Levenshtein and Jaro-Winkler distance.
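For readers who have not met the Levenshtein distance mentioned above, here is a minimal Python sketch of the textbook dynamic-programming version. It is not Harry's code or interface; the function name `levenshtein` is my own, and the only assumption is the usual unit cost for each insertion, deletion and substitution.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insert, delete and substitute."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Note that nothing here builds a vector for either string, which is what the description means by an "implicit" similarity measure.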

Adagio: Structural Analysis and Detection of Android Malware

Adagio is a collection of Python modules for analyzing and detecting
Android malware. These modules allow to extract labeled call graphs from Android APKs or DEX files and apply an explicit feature map that captures their structural relationships. Additional modules provide classes for designing binary or multiclass classification experiments and applying machine learning for detection of malicious structure. (Paper1, Paper2)

Salad: A Content Anomaly Detector based on n-Grams

Salad is an implementation of the anomaly detection method Anagram. The method uses n-grams (substrings of length n) maintained in a Bloom filter for efficiently detecting anomalies in large sets of string data. Salad extends the original method by supporting n-grams of bytes as well as n-grams of words and tokens. (Paper)
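The n-grams-in-a-Bloom-filter idea is easy to try at toy scale. The sketch below is not Salad's code or command line: the class `BloomFilter`, the functions `train` and `anomaly_score`, the filter size and the choice of SHA-256 as the hash are all assumptions of mine. It scores a string by the fraction of its byte n-grams that were never seen in training.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash count are illustrative choices."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(data: bytes, n: int):
    """All contiguous byte substrings of length n."""
    return (data[i:i + n] for i in range(len(data) - n + 1))


def train(samples, n: int = 3) -> BloomFilter:
    """Record every n-gram of the (assumed normal) training samples."""
    bloom = BloomFilter()
    for sample in samples:
        for gram in ngrams(sample, n):
            bloom.add(gram)
    return bloom


def anomaly_score(bloom: BloomFilter, data: bytes, n: int = 3) -> float:
    """Fraction of n-grams in `data` that the filter has not seen."""
    grams = list(ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in bloom)
    return unseen / len(grams)


if __name__ == "__main__":
    normal = [b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"]
    model = train(normal)
    print(anomaly_score(model, b"GET /index.html HTTP/1.1"))
    print(anomaly_score(model, b"\x90\x90\x90\x90\xcc\xcc"))
```

A Bloom filter never forgets an item it was given, so training data always scores zero; the price is an occasional false "seen" for an n-gram it was never shown.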

Sally: A Tool for Embedding Strings in Vector Spaces

Sally is a small tool for mapping a set of strings to a set of vectors. This mapping is referred to as embedding and allows for applying techniques of machine learning and data mining for analysis of string data. Sally can be applied to several types of string data, such as text documents, DNA sequences or log files, where it can handle common formats such as directories, archives and text files. (Paper)
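To make "embedding strings in a vector space" concrete, here is a rough Python sketch. It does not reproduce Sally's options or output formats; the names `embed` and `cosine`, the use of character trigrams, the 64-dimension vector and the SHA-256 bucket hash are all illustrative assumptions. Each string becomes a fixed-length, unit-normalized vector of hashed n-gram counts, after which ordinary vector tools apply.

```python
import hashlib
import math


def embed(text: str, n: int = 3, dim: int = 64) -> list:
    """Hash each character n-gram into one of `dim` buckets and count it."""
    vector = [0.0] * dim
    for i in range(len(text) - n + 1):
        gram = text[i:i + n].encode("utf-8")
        bucket = int.from_bytes(hashlib.sha256(gram).digest()[:4], "big") % dim
        vector[bucket] += 1.0
    # Normalize to unit length so strings of different sizes are comparable.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine(u, v) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    a = embed("GET /index.html")
    b = embed("GET /index.htm")
    c = embed("zzzzqqqqxxxx")
    print(cosine(a, b), cosine(a, c))
```

Contrast this with Harry above: here the vector is explicit, so clustering or classification code can consume it directly.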

Malheur: Automatic Analysis of Malware Behavior

Malheur is a tool for the automatic analysis of program behavior
recorded from malware. It has been designed to support the regular
analysis of malware and the development of detection and defense
measures. Malheur allows for identifying novel classes of malware
with similar behavior and assigning unknown malware to discovered
classes using machine learning. (Paper)

Prisma: Protocol Inspection and State Machine Analysis

Prisma is an R package for processing and analyzing huge text
corpora. In combination with the tool Sally the package provides
testing-based token selection and replicate-aware, highly tuned
non-negative matrix factorization and principal component analysis. Prisma allows for analyzing very big data sets even on desktop machines.
(Paper)

Derrick: A Simple Network Stream Recorder

Derrick is a simple tool for recording data streams of TCP and UDP
traffic. It shares similarities with other network recorders, such as
tcpflow and wireshark, where it is more advanced than the first and
clearly inferior to the latter. Derrick has been specifically designed to monitor application-layer communication. In contrast to other tools the application data is logged in a line-based ASCII format. Common UNIX tools, such as grep, sed & awk, can be directly applied.

There are days when malware is a relief from thinking about present and proposed government policies.

I first saw this in a tweet by Kirk Borne.

### Cynomix Automatic Analysis, Clustering, and Indexing of Malware

Saturday, November 29th, 2014

From the description:

Malware analysts in the public and private sectors need to make sense of an ever-growing stream of malware on an ongoing basis yet the common modus operandi is to analyze each file individually, if at all.

In the current paradigm, it is difficult to quickly understand the attributes of a particular set of malware binaries and how they differ from or are similar to others in a large database, to re-use previous analyses performed on similar samples, and to collaborate with other analysts. Thus, work is carried out inefficiently and a valuable intelligence signal may be squandered.

In this webinar, you will learn about Cynomix, a web-based community malware triage tool that:

• Creates a paradigm shift in scalable malware analysis by providing capabilities for automatic analysis, clustering, and indexing of malware
• Uses novel machine learning and scalable search technologies
• Provides several interactive views for exploring large data sets of malware binaries.

Visualization/analysis tool for malware. Creating a global database of malware data.

No anonymous submission of malware at present but “not keeping a lot of data” on submissions. No one asked what “not keeping a lot of data” meant exactly. There may be a gap between what is meant and what is heard by “a lot.” Currently, 35,000 instances of malware in the system. There have been as many as a million samples in the system.

Very good visualization techniques. Changes to data requests produced changes in the display of “similar” malware.

Take special note that networks/clusters change based on selection of facets. Imagine a topic map that could do the same with merging.

If you are interested in public (as opposed to secret) collecting of malware, this is an effort to support.

I first saw this in a tweet by Rui SFDA.

PS: You do realize that contemporary governments, like other franchises, are responsible for your cyber-insecurity. Yes?

Saturday, July 19th, 2014

Government-Grade Stealth Malware In Hands Of Criminals by Sara Peters.

From the post:

Malware originally developed for government espionage is now in use by criminals, who are bolting it onto their rootkits and ransomware.

The malware, dubbed Gyges, was first discovered in March by Sentinel Labs, which just released an intelligence report outlining their findings. From the report: “Gyges is an early example of how advanced techniques and code developed by governments for espionage are effectively being repurposed, modularized and coupled with other malware to commit cybercrime.”

Sentinel was able to detect Gyges with on-device heuristic sensors, but many intrusion prevention systems would miss it. The report states that Gyges’ evasion techniques are “significantly more sophisticated” than the payloads attached. It includes anti-detection, anti-tampering, anti-debugging, and anti-reverse-engineering capabilities.

The figure I keep hearing quoted is that cybersecurity attackers are ten years ahead of cybersecurity defenders.

Is that what you hear?

Whatever the actual gap, what makes me curious is why the gap exists at all. I assume the attackers and defenders are on par as far as intelligence, programming skills, financial support, etc., so what is the difference that accounts for the gap?

I don’t have the answer or even a suspicion of a suggestion but suspect someone else does.

Pointers anyone?

### BinaryPig: Scalable Static Binary Analysis Over Hadoop

Friday, November 22nd, 2013

BinaryPig: Scalable Static Binary Analysis Over Hadoop (Guest post at Cloudera: Telvis Calhoun, Zach Hanif, and Jason Trost of Endgame)

From the post:

Over the past three years, Endgame received 40 million samples of malware equating to roughly 19TB of binary data. In this, we’re not alone. McAfee reports that it currently receives roughly 100,000 malware samples per day and received roughly 10 million samples in the last quarter of 2012. Its total corpus is estimated to be about 100 million samples. VirusTotal receives between 300,000 and 600,000 unique files per day, and of those roughly one-third to half are positively identified as malware (as of April 9, 2013).

This huge volume of malware offers both challenges and opportunities for security research, especially applied machine learning. Endgame performs static analysis on malware in order to extract feature sets used for performing large-scale machine learning. Since malware research has traditionally been the domain of reverse engineers, most existing malware analysis tools were designed to process single binaries or multiple binaries on a single computer and are unprepared to confront terabytes of malware simultaneously. There is no easy way for security researchers to apply static analysis techniques at scale; companies and individuals that want to pursue this path are forced to create their own solutions.

Our early attempts to process this data did not scale well with the increasing flood of samples. As the size of our malware collection increased, the system became unwieldy and hard to manage, especially in the face of hardware failures. Over the past two years we refined this system into a dedicated framework based on Hadoop so that our large-scale studies are easier to perform and are more repeatable over an expanding dataset.

To address this problem, we created an open source framework, BinaryPig, built on Hadoop and Apache Pig (utilizing CDH, Cloudera’s distribution of Hadoop and related projects) and Python. It addresses many issues of scalable malware processing, including dealing with increasingly large data sizes, improving workflow development speed, and enabling parallel processing of binary files with most pre-existing tools. It is also modular and extensible, in the hope that it will aid security researchers and academics in handling ever-larger amounts of malware.

For more details about BinaryPig’s architecture and design, read our paper from Black Hat USA 2013 or check out our presentation slides. BinaryPig is an open source project under the Apache 2.0 License, and all code is available on Github.

You may have heard the rumor that storing more than seven (7) days of food marks you as a terrorist in the United States.

Be forewarned: Doing Massive Malware Analysis May Make You A Terrorist Suspect.

The “storing more than seven (7) days of food” rumor originated with Rand Paul (R-Kentucky).

The Community Against Terrorism FBI flyer, assuming the pointers I found are accurate, says nothing about how many days of food you have on hand.

Rather it says:

Make bulk purchases of items to include:

That’s an example of using small data analysis to disprove a rumor.

Unless you are an anthropologist, I would not rely on data from C-SPAN2.

### Ads 182 Times More Dangerous Than Porn

Thursday, February 7th, 2013

Cisco Annual Security Report: Threats Step Out of the Shadows

From the post:

Despite popular assumptions that security risks increase as a person’s online activity becomes shadier, findings from Cisco’s 2013 Annual Security Report (ASR) reveal that the highest concentration of online security threats do not target pornography, pharmaceutical or gambling sites as much as they do legitimate destinations visited by mass audiences, such as major search engines, retail sites and social media outlets. In fact, Cisco found that online shopping sites are 21 times as likely, and search engines are 27 times as likely, to deliver malicious content than a counterfeit software site. Viewing online advertisements? Advertisements are 182 times as likely to deliver malicious content than pornography. (emphasis added)

Numbers like this make me wonder: Is anyone indexing ads?

Or better yet, creating a topic map that maps back to the creators/origins of ad content?

That has the potential to be a useful service, unlike porn blocking ones.

Legitimate brands would have an incentive to stop malware in their ads, origins of malware ads would be exposed (blocked?).

I first saw this at Quick Links by Greg Linden.

### Mapping and Monitoring Cyber Threats

Thursday, June 21st, 2012

Mapping and Monitoring Cyber Threats

From the post:

Threats to information security are part of everyday life for government agencies and companies both big and small. Monitoring network activity, setting up firewalls, and establishing various forms of authentication are irreplaceable components of IT security infrastructure, yet strategic defensive work increasingly requires the added context of real world events. The web and its multitude of channels covering emerging threat vectors and hacker news can help provide warning signs of potentially disruptive information security events.

However, the challenge that analysts typically face is an overwhelming volume of intelligence that requires brute force aggregation, organization, and assessment. What if significant portions of the first two tasks could be accomplished more efficiently allowing for greater resources allocated to the all important third step of analysis?

We’ll outline how Recorded Future can help security teams harness the open source intelligence available on various threat vectors and attacks, activity of known cyber organizations during particular periods of time, and explicit warnings as well as implicit risks for the future.

Interesting but I would add to the “threat” map known instances where recordable media can be used, email or web traffic traceable to hacker lists/websites, offices or departments with prior security issues and the like.

Security can become too narrowly focused on technological issues, ignoring that a large number of security breaches are the result of human lapses or social engineering. A bit broader mapping of security concerns can help keep the relative importance of threats in perspective.

### Adobe Releases Malware Classifier Tool

Wednesday, April 4th, 2012

Adobe Releases Malware Classifier Tool by Dennis Fisher.

From the post:

Adobe has published a free tool that can help administrators and security researchers classify suspicious files as malicious or benign, using specific machine-learning algorithms. The tool is a command-line utility that Adobe officials hope will make binary classification a little easier.

Adobe researcher Karthik Raman developed the new Malware Classifier tool to help with the company’s internal needs and then decided that it might be useful for external users, as well.

“To make life easier, I wrote a Python tool for quick malware triage for our team. I’ve since decided to make this tool, called “Adobe Malware Classifier,” available to other first responders (malware analysts, IT admins and security researchers of any stripe) as an open-source tool, since you might find it equally helpful,” Raman wrote in a blog post.

“Malware Classifier uses machine learning algorithms to classify Win32 binaries – EXEs and DLLs – into three classes: 0 for “clean,” 1 for “malicious,” or “UNKNOWN.” The tool extracts seven key features from a binary, feeds them to one or all of the four classifiers, and presents its classification results.”

Old hat that malware scanners have been using machine learning but new that you can now see it from the inside.

Lessons to be learned about machine learning algorithms for malware and other uses with software.

### Improved Call Graph Comparison Using Simulated Annealing

Friday, November 18th, 2011

Improved Call Graph Comparison Using Simulated Annealing by Orestis Kostakis, Joris Kinable, Hamed Mahmoudi, Kimmo Mustonen.

Abstract:

The amount of suspicious binary executables submitted to Anti-Virus (AV) companies are in the order of tens of thousands per day. Current hash-based signature methods are easy to deceive and are inefficient for identifying known malware that have undergone minor changes. Examining malware executables using their call graphs view is a suitable approach for overcoming the weaknesses of hash-based signatures. Unfortunately, many operations on graphs are of high computational complexity. One of these is the Graph Edit Distance (GED) between pairs of graphs, which seems a natural choice for static comparison of malware. We demonstrate how Simulated Annealing can be used to approximate the graph edit distance of call graphs, while outperforming previous approaches both in execution time and solution quality. Additionally, we experiment with opcode mnemonic vectors to reduce the problem size and examine how Simulated Annealing is affected.

From the introduction:

To facilitate the recognition of highly similar executables or commonalities among multiple executables which have been subject to modification, a high-level structure, i.e. an abstraction, of the samples is required. One such abstraction is the call graph which is a graphical representation of a binary executable, where functions are modelled as vertices and calls between those functions as directed edges. Minor changes in the body of the code are not reflected in the structure of the graph.
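A toy version of the idea helps show what is being approximated. The Python sketch below is not the authors' algorithm or cost function: it assumes both call graphs have the same number of vertices, ignores function labels, counts only edge insertions and deletions, and uses a cooling schedule I picked arbitrarily. The names `edit_cost` and `anneal` are mine. It searches over vertex mappings by swapping two assignments at a time and keeps the cheapest mapping seen.

```python
import math
import random


def edit_cost(edges_a, edges_b, mapping):
    """Edge insertions plus deletions needed once A is relabeled by `mapping`."""
    mapped = {(mapping[u], mapping[v]) for u, v in edges_a}
    return len(mapped ^ edges_b)  # symmetric difference of the edge sets


def anneal(edges_a, edges_b, n, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over bijections between two n-vertex digraphs."""
    rng = random.Random(seed)
    mapping = list(range(n))  # start from the identity mapping
    cost = edit_cost(edges_a, edges_b, mapping)
    best_cost, best = cost, mapping[:]
    for step in range(steps):
        if best_cost == 0:
            break  # the graphs match exactly under `best`
        # Geometric cooling from t_start down towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        mapping[i], mapping[j] = mapping[j], mapping[i]
        new_cost = edit_cost(edges_a, edges_b, mapping)
        accept = new_cost <= cost or rng.random() < math.exp(
            (cost - new_cost) / temperature)
        if accept:
            cost = new_cost
            if cost < best_cost:
                best_cost, best = cost, mapping[:]
        else:
            mapping[i], mapping[j] = mapping[j], mapping[i]  # undo the swap
    return best_cost, best


if __name__ == "__main__":
    # Vertices are functions, directed edges are calls (hypothetical graphs).
    graph_a = {(0, 1), (0, 2), (2, 3), (3, 4)}
    graph_b = {(4, 3), (4, 2), (2, 1), (1, 0), (0, 3)}
    print(anneal(graph_a, graph_b, n=5))
```

The returned cost is only an upper bound on the true distance under this simplified cost model; annealing gives no guarantee of reaching the optimum.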

Can you say subject identity? 😉

How you judge subject identity depends on the circumstances and requirements of any given situation.

Very recent and I suspect important work on the detection of malware.

### Using Machine Learning to Detect Malware Similarity

Wednesday, September 21st, 2011

Using Machine Learning to Detect Malware Similarity by Sagar Chaki.

From the post:

Malware, which is short for “malicious software,” consists of programming aimed at disrupting or denying operation, gathering private information without consent, gaining unauthorized access to system resources, and other inappropriate behavior. Malware infestation is of increasing concern to government and commercial organizations. For example, according to the Global Threat Report from Cisco Security Intelligence Operations, there were 287,298 “unique malware encounters” in June 2011, double the number of incidents that occurred in March. To help mitigate the threat of malware, researchers at the SEI are investigating the origin of executable software binaries that often take the form of malware. This posting augments a previous posting describing our research on using classification (a form of machine learning) to detect “provenance similarities” in binaries, which means that they have been compiled from similar source code (e.g., differing by only minor revisions) and with similar compilers (e.g., different versions of Microsoft Visual C++ or different levels of optimization).

Interesting study in the development of ways to identify a subject that is trying to hide. Not to mention some hard core disassembly and other techniques.