Open-Source projects: Computer Security Group at the University of Göttingen, Germany.
I mentioned Joern in March 2014, but these other projects may be of interest as well:
Joern: A Robust Tool for Static Code Analysis
Joern is a platform for robust analysis of C/C++ code. It generates code property graphs, a novel graph representation of code that exposes the code’s syntax, control-flow, data-flow and type information. Code property graphs are stored in a Neo4J graph database. This allows code to be mined using search queries formulated in the graph traversal language Gremlin. (Paper1, Paper2, Paper3)
Harry: A Tool for Measuring String Similarity
Harry is a tool for comparing strings and measuring their
similarity. The tool supports several common distance and kernel
functions for strings as well as some exotic similarity measures. The
focus lies on implicit similarity measures, that is, comparison
functions that do not give rise to an explicit vector space. Examples of such similarity measures are the Levenshtein and Jaro-Winkler distances.
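To make "implicit similarity measure" concrete, here is a minimal Python sketch of the Levenshtein distance. It is not Harry's code or interface; the function name `levenshtein` is my own, and the point is only that two strings are compared directly, without first turning them into vectors.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The two-row formulation keeps memory proportional to the shorter string, which matters when many long strings are compared pairwise.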
Adagio: Structural Analysis and Detection of Android Malware
Adagio is a collection of Python modules for analyzing and detecting
Android malware. These modules make it possible to extract labeled call graphs from Android APKs or DEX files and to apply an explicit feature map that captures their structural relationships. Additional modules provide classes for designing binary or multiclass classification experiments and applying machine learning to detect malicious structure. (Paper1, Paper2)
Salad: A Content Anomaly Detector based on n-Grams
Letter Salad, or Salad for short, is an efficient and flexible
implementation of the anomaly detection method Anagram. The method
uses n-grams (substrings of length n) maintained in a Bloom filter
for efficiently detecting anomalies in large sets of string data.
Salad extends the original method by supporting n-grams of bytes as
well as n-grams of words and tokens. (Paper)
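The general idea of keeping n-grams in a Bloom filter can be sketched in a few lines of Python. This is not Salad's implementation: the class `BloomFilter`, the helpers `byte_ngrams`, `train` and `anomaly_score`, the filter size, the hash construction and the scoring rule (fraction of n-grams never seen in training) are all assumptions I made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over byte strings (illustrative sizes only)."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def byte_ngrams(data: bytes, n: int):
    """Yield every substring of length n."""
    return (data[i:i + n] for i in range(len(data) - n + 1))


def train(samples, n: int = 3) -> BloomFilter:
    """Store the n-grams of all (assumed benign) samples in one filter."""
    bloom = BloomFilter()
    for sample in samples:
        for gram in byte_ngrams(sample, n):
            bloom.add(gram)
    return bloom


def anomaly_score(bloom: BloomFilter, data: bytes, n: int = 3) -> float:
    """Fraction of n-grams in data that were not seen during training."""
    grams = list(byte_ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in bloom)
    return unseen / len(grams)


bloom = train([b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"])
print(anomaly_score(bloom, b"GET /index.html HTTP/1.1"))
print(anomaly_score(bloom, b"\x90\x90\x90\x90\xeb\xfe"))
```

A Bloom filter never forgets an item it has stored, so data seen in training always scores zero. The price is occasional false positives, where an unseen n-gram looks familiar, which can only lower the score.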
Sally: A Tool for Embedding Strings in Vector Spaces
Sally is a small tool for mapping a set of strings to a set of
vectors. This mapping is referred to as embedding and allows for
applying techniques of machine learning and data mining for
analysis of string data. Sally can be applied to several types of
string data, such as text documents, DNA sequences or log files,
where it can handle common formats such as directories, archives
and text files. (Paper)
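As a rough illustration of what embedding strings in a vector space means, the Python sketch below maps each string to a sparse vector of character n-gram counts and compares two vectors with cosine similarity. It does not reproduce Sally's formats or options; the function names `embed` and `cosine`, the choice of character bigrams and the use of raw counts are my assumptions.

```python
import math
from collections import Counter


def embed(text: str, n: int = 2) -> Counter:
    """Map a string to a sparse vector: one dimension per n-gram,
    with the n-gram's number of occurrences as the value."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # A Counter returns 0 for missing keys, so absent n-grams add nothing.
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vec = embed("abab")
print(vec)  # Counter({'ab': 2, 'ba': 1})
print(cosine(embed("GET /index.html"), embed("GET /index.htm")))
```

Once strings are vectors like these, standard learning methods that expect numeric features can be applied to them.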
Malheur: Automatic Analysis of Malware Behavior
Malheur is a tool for the automatic analysis of program behavior
recorded from malware. It has been designed to support the regular
analysis of malware and the development of detection and defense
measures. Malheur allows for identifying novel classes of malware
with similar behavior and assigning unknown malware to discovered
classes using machine learning. (Paper)
Prisma: Protocol Inspection and State Machine Analysis
Prisma is an R package for processing and analyzing huge text
corpora. In combination with the tool Sally the package provides
testing-based token selection and replicate-aware, highly tuned
non-negative matrix factorization and principal component analysis. Prisma allows for analyzing very big data sets even on desktop machines.
(Paper)
Derrick: A Simple Network Stream Recorder
Derrick is a simple tool for recording data streams of TCP and UDP
traffic. It shares similarities with other network recorders, such as
tcpflow and Wireshark, where it is more advanced than the former and
clearly inferior to the latter. Derrick has been specifically designed to monitor application-layer communication. In contrast to other tools, the application data is logged in a line-based ASCII format, so common UNIX tools, such as grep, sed & awk, can be applied directly.
There are days when malware is a relief from thinking about present and proposed government policies.
I first saw this in a tweet by Kirk Borne.