Fuzzing To Find Subjects

Guido Vranken‘s post: The OpenVPN post-audit bug bonanza is an important review of bugs discovered in OpenVPN.

Jump to “How I fuzzed OpenVPN” for the details on Vranken fuzzing OpenVPN.

Not for the novice but an inspiration to devote time to the art of fuzzing.

The Open Web Application Security Project (OWASP) defines fuzzing this way:

Fuzz testing or Fuzzing is a Black Box software testing technique, which basically consists in finding implementation bugs using malformed/semi-malformed data injection in an automated fashion.

OWASP’s fuzzing mentions a number of resources and software, but omits the Basic Fuzzing Framework by CERT. That’s odd don’t you think?

The CERT Basic Fuzzing Framework (BFF), is current through 2016. Allen Householder has a description of version 2.8 at: Announcing CERT Basic Fuzzing Framework Version 2.8. Details on BFF, see: CERT BFF – Basic Fuzzing Framework.

Caution: One resource in the top ten (#9) for “fuzzing software” is: Fuzzing: Brute Force Vulnerability Discovery, by Michael Sutton, Adam Greene, and Pedram Amini. Great historical reference but it was published in 2007, some ten years ago. Look for more recent literature and software.

Fuzzing is obviously an important topic in finding subjects (read vulnerabilities) in software. Whether your intent is to fix those vulnerabilities or use them for your own purposes.

While reading Vranken‘s post, it occurred to me that “fuzzing” is also useful in discovering subjects in unmapped data sets.

Not all nine-digit numbers are Social Security Numbers but if you find a column of such numbers, along with what you think are street addresses and zip codes, it would not be a bad guess. Of course, if it is a 16-digit number, a criminal opportunity may be knocking at your door. (credit card)

While TMDM topic maps emphasized the use of URIs for subject identifiers, we all know that subject identifications outside of topic maps are more complex than string matching and far messier.

How would you create “fuzzy” searches to detect subjects across different data sets? Are there general principles for classes of subjects?

While your results might be presented as a curated topic map, the grist for that map would originate in the messy details of diverse information.

This sounds like an empirical question to me, especially since most search engines offer API access.

Thoughts?

Leave a Reply

You must be logged in to post a comment.