Data mining for network security and intrusion detection by Dzidorius Martinaitis.
One of my favourite stories about network security/intrusion came from a Netware class. The instructor related that a security “audit” of a not-so-small firm discovered that the Novell servers were sitting in a room to which everyone, including the cleaning crew, had access.
Guess they never heard of physical security or Linux boot disks.
Assuming you have taken care of the obvious security risks, topic maps might be useful in managing the results of data mining.
From the post:
In preparation for the “Haxogreen” hackers' summer camp, which takes place in Luxembourg, I was exploring the network security world. My motivation was to find out how data mining is applicable to network security and intrusion detection.
The Flame virus, Stuxnet, and Duqu proved that static, signature-based security systems are not able to detect very advanced, government-sponsored threats. Nevertheless, signature-based defense systems are mainstream today: think of antivirus and intrusion detection systems. What do you do when the unknown is unknown? Data mining comes to mind as the answer.
Data mining is or can be employed in the following areas: misuse/signature detection, anomaly detection, scan detection, etc.
Misuse/signature detection systems are based on supervised learning. During the learning phase, labeled examples of network packets or system calls are provided, from which the algorithm can learn about the threats. This is a very efficient and fast way to find known threats. Nevertheless, there are some important drawbacks, namely false positives, novel attacks, and the difficulty of obtaining initial data for training the system.
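To make the supervised-learning idea concrete, here is a minimal sketch in Python. It is not the method from the quoted post: it assumes a nearest-centroid classifier as a stand-in for whatever learner a real system would use, and the two features (failed logins and distinct ports contacted per time window) and all training values are made up for illustration.

```python
from collections import defaultdict
from math import dist

# Hypothetical labeled examples: (failed_logins, distinct_ports_contacted)
# per time window, with a label supplied by a human analyst.
TRAINING = [
    ((0, 2), "normal"),
    ((1, 3), "normal"),
    ((0, 1), "normal"),
    ((9, 40), "attack"),
    ((12, 55), "attack"),
    ((8, 35), "attack"),
]

def train(examples):
    """Learning phase: compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(centroids, features):
    """Detection phase: pick the label whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train(TRAINING)
print(classify(centroids, (10, 45)))
print(classify(centroids, (1, 2)))
```

The point of the sketch is only the two phases: labels go in during training, and anything that resembles the labeled attacks is flagged afterwards. Anything that resembles neither class well is still forced into one of them, which is where the drawbacks come from.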
False positives happen when normal network flows or system calls are marked as a threat. For example, a user can fail to provide the correct password three times in a row, or start using a service in a way that deviates from the standard profile. A novel attack can be defined as an attack not previously seen by the system, meaning that the signature or pattern of such an attack has not been learned and the system will be penetrated without the knowledge of the administrator. The last obstacle (the training dataset) can be overcome by collecting data over time or by relying on public data, such as the DARPA Intrusion Detection Data Set.
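Both failure modes can be shown with a toy signature. The sketch below assumes a single hand-written rule, "three failed logins in a row", and a handful of invented sessions with a known ground truth; the session names and data are hypothetical.

```python
# Hypothetical sessions: (name, login outcomes in order, is_actually_attack)
SESSIONS = [
    ("alice", ["fail", "fail", "fail", "ok"], False),  # forgot her password
    ("bob", ["ok"], False),
    ("brute_force", ["fail"] * 6, True),
    ("novel_exploit", ["ok"], True),  # attack with no learned signature
]

def three_failures_in_a_row(outcomes):
    """Toy signature: flag a session with three consecutive failed logins."""
    streak = 0
    for outcome in outcomes:
        streak = streak + 1 if outcome == "fail" else 0
        if streak >= 3:
            return True
    return False

def evaluate(sessions):
    """Return the names of false positives and of attacks that were missed."""
    false_positives, missed = [], []
    for name, outcomes, is_attack in sessions:
        flagged = three_failures_in_a_row(outcomes)
        if flagged and not is_attack:
            false_positives.append(name)
        elif is_attack and not flagged:
            missed.append(name)
    return false_positives, missed

false_positives, missed = evaluate(SESSIONS)
print(false_positives, missed)  # ['alice'] ['novel_exploit']
```

The legitimate user who mistyped a password is flagged, while the attack that matches no known pattern passes unnoticed.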
Although misuse detection can be built on your own data mining techniques, I would suggest a well-known product like Snort, which relies on crowd-sourcing.
Taking Snort as an example, what other system data would you want to merge with data from Snort?
Or for that matter, how would you share such information (Snort+) with others?
PS: Be aware that cyber-attack/security/warfare are hot topics and therefore marketing opportunities.