Archive for the ‘Genetic Algorithms’ Category

Astrophysical data mining with GPU…

Tuesday, April 9th, 2013

Astrophysical data mining with GPU. A case study: genetic classification of globular clusters by Stefano Cavuoti, Mauro Garofalo, Massimo Brescia, Maurizio Paolillo, Antonio Pescape’, Giuseppe Longo, Giorgio Ventre.

Abstract:

We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from our CPU serial implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single band HST images. The GPU version of GAME will be made available to the community by integrating it into the web application DAMEWARE (DAta Mining Web Application REsource), a public data mining service specialized on massive astrophysical data. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm leads to a speedup of a factor of 200x in the training phase with respect to the CPU based version.

BTW, DAMEWARE (DAta Mining Web Application REsource) is at http://dame.dsf.unina.it/beta_info.html.

In case you are curious about the application of genetic algorithms in a low signal/noise situation with really “big” data, this is a good starting point.

Makes me curious about the “noise” in other communications.

The “signal” is fairly easy to identify in astronomy, but what about in text or speech?

I suppose “background noise, music, automobiles” would count as “noise” on a tape recording of a conversation, but is there “noise” in a written text?

Or noise in a conversation that is clearly audible?

If we have 100% signal, how do we explain failing to understand a message in speech or writing?

If it is not “noise,” then what is the problem?

Machine Learning: Genetic Algorithms in Javascript Part 2

Sunday, September 16th, 2012

Machine Learning: Genetic Algorithms in Javascript Part 2 by Burak Kanber.

From the post:

Today we’re going to revisit the genetic algorithm. If you haven’t read Genetic Algorithms Part 1 yet, I strongly recommend reading that now. This article will skip over the fundamental concepts covered in part 1 — so if you’re new to genetic algorithms you’ll definitely want to start there.


The Problem

You’re a scientist that has recently been framed for murder by an evil company. Before you flee the lab you have an opportunity to steal 1,000 pounds (or kilograms!) of pure elements from the chemical warehouse; your plan is to later sell them and survive off of the earnings.

Given the weight and value of each element, which combination should you take to maximize the total value without exceeding the weight limit?

This is called the knapsack problem. The one above is a one-dimensional problem, meaning the only constraint is weight. We could complicate matters by also considering volume, but we need to start somewhere. Note that in our version of the problem only one piece of each element is available, and each piece has a fixed weight. There are some knapsack problems where you can take unlimited platinum or up to 3 pieces of gold or something like that, but here we only have one of each available to us.

Why is this problem tough to solve? We’ll be using 118 elements. The brute-force approach would require that we test $2^{118}$, or about $3.3 \times 10^{35}$, different combinations of elements.
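The post builds its solution in Javascript. For anyone who wants something to run right away, here is a minimal Python sketch of the same idea. It is not Kanber’s code, and it assumes made-up weights and values for 118 items (not real element data), a capacity of 1,000, bit-string genomes where 1 means “take this item”, a fitness that scores any overweight selection as 0, truncation selection, single-point crossover, bit-flip mutation and a small elite.

```python
import random

# Hypothetical data: 118 items with made-up weights and values (NOT real
# element data). Each item is available exactly once (0/1 knapsack).
random.seed(42)
N_ITEMS = 118
CAPACITY = 1000
WEIGHTS = [random.randint(1, 60) for _ in range(N_ITEMS)]
VALUES = [random.randint(1, 100) for _ in range(N_ITEMS)]


def fitness(genome):
    """Total value of the selected items, or 0 if over the weight limit."""
    weight = sum(w for w, g in zip(WEIGHTS, genome) if g)
    if weight > CAPACITY:
        return 0  # infeasible selections get the worst possible score
    return sum(v for v, g in zip(VALUES, genome) if g)


def random_genome():
    # Start sparse (about 10% of items) so most initial genomes are feasible.
    return [1 if random.random() < 0.1 else 0 for _ in range(N_ITEMS)]


def crossover(a, b):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = random.randrange(1, N_ITEMS)
    return a[:cut] + b[cut:]


def mutate(genome, rate=0.01):
    """Flip each bit with a small probability."""
    return [1 - g if random.random() < rate else g for g in genome]


def evolve(generations=200, pop_size=100, elite=10):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - elite)
        ]
        population = population[:elite] + children  # elitism
    return max(population, key=fitness)


best = evolve()
print(fitness(best), sum(w for w, g in zip(WEIGHTS, best) if g))
```

The GA evaluates on the order of 20,000 candidate selections rather than $2^{118}$, so it offers no guarantee of finding the true optimum; a good answer in reasonable time is the trade being made.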

What if you have subject identity criteria of varying reliability? What is the best combination for the highest reliability?

To sharpen the problem: Your commanding officer has requested declaration of sufficient identity for a drone strike target.

Machine Learning: Genetic Algorithms Part 1 (Javascript)

Sunday, September 16th, 2012

Machine Learning: Genetic Algorithms Part 1 (Javascript) by Burak Kanber.

From the post:

I like starting my machine learning classes with genetic algorithms (which we’ll abbreviate “GA” sometimes). Genetic algorithms are probably the least practical of the ML algorithms I cover, but I love starting with them because they’re fascinating and they do a good job of introducing the “cost function” or “error function”, and the idea of local and global optima — concepts both important and common to most other ML algorithms.

Genetic algorithms are inspired by nature and evolution, which is seriously cool to me. It’s no surprise, either, that artificial neural networks (“NN”) are also modeled from biology: evolution is the best general-purpose learning algorithm we’ve experienced, and the brain is the best general-purpose problem solver we know. These are two very important pieces of our biological existence, and also two rapidly growing fields of artificial intelligence and machine learning study. While I’m tempted to talk more about the distinction I make between the GA’s “learning algorithm” and the NN’s “problem solver” terminology, we’ll drop the topic of NNs altogether and concentrate on GAs… for now.

One phrase I used above is profoundly important: “general-purpose”. For almost any specific computational problem, you can probably find an algorithm that solves it more efficiently than a GA. But that’s not the point of this exercise, and it’s also not the point of GAs. You use the GA not when you have a complex problem, but when you have a complex problem of problems. Or you may use it when you have a complicated set of disparate parameters.

Off to a great start!
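The “cost function” and the local versus global optima Kanber mentions are easy to see in a toy example. The sketch below is my own illustration, not from his post: a made-up cost function over the integers 0 to 99 with a shallow valley at 20 and a deeper one at 70, and a greedy hill climber (deliberately not a GA) that only ever steps to a better neighbor.

```python
def cost(x):
    # Hypothetical cost function with two valleys: a local minimum at
    # x = 20 (cost 50) and the global minimum at x = 70 (cost 0).
    return min((x - 20) ** 2 + 50, (x - 70) ** 2)


def hill_climb(x, lo=0, hi=99):
    """Move to the cheaper neighbor until no neighbor improves the cost."""
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = min(neighbors, key=cost)
        if cost(best) >= cost(x):
            return x  # an optimum, but possibly only a local one
        x = best


print(hill_climb(10))                 # stops in the shallow valley: 20
print(hill_climb(60))                 # reaches the deep valley: 70
print(min(range(100), key=cost))      # exhaustive check of the global optimum: 70
```

Where the climber ends up depends entirely on where it starts. A GA evaluates many points at once, which is one argument for population-based methods being less prone than a single greedy climber to settling for the first valley they find.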

Genetic algorithms: a simple R example

Saturday, August 4th, 2012

Genetic algorithms: a simple R example by Bart Smeets.

From the post:

Genetic algorithm is a search heuristic. GAs can generate a vast number of possible model solutions and use these to evolve towards an approximation of the best solution of the model. Hereby it mimics evolution in nature.

GA generates a population, the individuals in this population (often called chromosomes) have a given state. Once the population is generated, the state of these individuals is evaluated and graded on their value. The best individuals are then taken and crossed-over – in order to hopefully generate ‘better’ offspring – to form the new population. In some cases the best individuals in the population are preserved in order to guarantee ‘good individuals’ in the new generation (this is called elitism).

The GA site by Marek Obitko has a great tutorial for people with no previous knowledge on the subject.
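The loop Smeets describes (generate a population, evaluate and grade it, cross over the best, optionally keep an elite) fits in a few lines of Python. This is an illustrative sketch, not the R code from his post; it assumes the toy “OneMax” problem (fitness is simply the number of 1 bits), tournament selection of size 3, single-point crossover and a mutation rate of about one bit per chromosome.

```python
import random

GENOME_LEN = 30  # OneMax: the ideal chromosome is all 1s


def fitness(chromosome):
    return sum(chromosome)


def tournament(population, k=3):
    """Selection: the fittest of k randomly chosen individuals."""
    return max(random.sample(population, k), key=fitness)


def crossover(a, b):
    """Single-point crossover producing one child."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rate=1 / GENOME_LEN):
    return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]


def run_ga(pop_size=50, generations=100, n_elite=2, seed=0):
    random.seed(seed)
    # 1. Generate a random population of chromosomes.
    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # 2. Evaluate and grade every individual.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        if history[-1] == GENOME_LEN:
            break  # perfect chromosome found
        # 3. Elitism: carry the best individuals over unchanged.
        next_gen = ranked[:n_elite]
        # 4. Fill the rest with mutated offspring of selected parents.
        while len(next_gen) < pop_size:
            child = crossover(tournament(ranked), tournament(ranked))
            next_gen.append(mutate(child))
        population = next_gen
    return max(population, key=fitness), history


best, history = run_ga()
print(fitness(best), history[:5])
```

Because the best chromosomes are copied unchanged into each new generation, the best fitness recorded in `history` can never go down; that guarantee is what elitism buys.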

As the size of data stores increases, so will the cost of personal judgement on each subject identity test. Genetic algorithms may be one way of creating subject identity tests in such situations.

In any event, it won’t harm anyone to be aware of the basic contours of the technique.

I first saw this at R-Bloggers.

Experiments in genetic programming

Monday, March 19th, 2012

Experiments in genetic programming

Lars Marius Garshol writes:

I made an engine called Duke that can automatically match records to see if they represent the same thing. For more background, see a previous post about it. The biggest problem people seem to have with using it is coming up with a sensible configuration. I stumbled across a paper that described using so-called genetic programming to configure a record linkage engine, and decided to basically steal the idea.

You need to read about the experiments in the post, but I can almost hear Lars stating the conclusion:

The result is pretty clear: the genetic configurations are much the best. The computer can configure Duke better than I can. That’s almost shocking, but there you are. I guess I need to turn the script into an official feature.

;-)

Excellent post and approach by the way!
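Duke’s genetic configuration is Lars’ own work and uses genetic programming; nothing below reproduces it or Duke’s API. As a much simpler illustration of the general idea (letting evolution tune a record matcher’s settings), here is a Python sketch that evolves three numbers, a weight for a name field, a weight for a city field and a match threshold, against a handful of made-up labeled record pairs. The records, the hypothetical hand-tuned `MANUAL_CONFIG`, and the use of `difflib.SequenceMatcher` as the string comparator are all assumptions for the example, and the search uses selection plus mutation only, without crossover.

```python
import random
from difflib import SequenceMatcher

# Made-up training data: ((name, city), (name, city), is_same_entity).
PAIRS = [
    (("Acme Corp", "Oslo"), ("ACME Corporation", "Oslo"), True),
    (("Acme Corp", "Oslo"), ("Apex Corp", "Bergen"), False),
    (("J. Smith", "London"), ("John Smith", "London"), True),
    (("J. Smith", "London"), ("Jane Smythe", "Leeds"), False),
    (("Nordic Data AS", "Bergen"), ("Nordic Data", "Bergen"), True),
    (("Nordic Data AS", "Bergen"), ("Nordic Data AS", "Tromsø"), False),
]
MANUAL_CONFIG = (1.0, 1.0, 0.8)  # hypothetical hand-written (name_w, city_w, threshold)


def ratio(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(config, rec_a, rec_b):
    """Weighted average of per-field string similarity."""
    name_w, city_w, _ = config
    total = name_w + city_w
    if total == 0:
        return 0.0
    return (name_w * ratio(rec_a[0], rec_b[0])
            + city_w * ratio(rec_a[1], rec_b[1])) / total


def accuracy(config):
    """Fraction of labeled pairs the configuration classifies correctly."""
    threshold = config[2]
    correct = sum((score(config, a, b) >= threshold) == is_match
                  for a, b, is_match in PAIRS)
    return correct / len(PAIRS)


def mutate(config, sigma=0.1):
    # Gaussian nudge to every parameter, clipped to [0, 1].
    return tuple(min(1.0, max(0.0, g + random.gauss(0, sigma))) for g in config)


def evolve(generations=50, pop_size=30, seed=1):
    random.seed(seed)
    population = [MANUAL_CONFIG] + [tuple(random.random() for _ in range(3))
                                     for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=accuracy, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=accuracy)


best = evolve()
print(best, accuracy(best), accuracy(MANUAL_CONFIG))
```

Seeding the population with the hand-written configuration and always keeping the top half means the evolved configuration can never score worse than the manual one on the training pairs, though that says nothing about how either does on unseen records.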

Lars also posted a link to Reddit about his experiments. Several links appear in comments that I have turned into short posts to draw more attention to them.

Another tool for your topic mapping toolbox.

Question: What would it look like to use the intermediate results for mapping, replacing them as “better” mappings become available? The process would have a terminating condition, but new content could trigger additional cycles, limited to what is relevant to that content.

Or would queries count as new content, if they expressed synonymy or other relations?

Data Mining: Professor Pier Luca Lanzi, Politecnico di Milano

Monday, August 8th, 2011

This post started with my finding the data mining slides at Slideshare (about 4 years old). After organizing those, I decided to check Professor Pier Luca Lanzi’s homepage for more recent material. I think you will find it useful.

Pier Luca Lanzi – homepage

The professor is obviously interested in video games, a rapidly growing area of development and research.

Combining video games with data mining, that would be a real coup.

Data Mining Course page

Data Mining

Includes prior exams, video (2009 course), transparencies from all lectures.

Lecture slides on Data Mining and Machine Learning at Slideshare.

Not being a lemming, I don’t find “most viewed” a helpful sorting criterion.

I organized the data mining slides in course order (as nearly as I could determine, there are two #6 presentations and no #7 or #17 presentations):

00 Course Introduction

01 Data Mining

02 Machine Learning

03 The representation of data

04 Association rule mining

05 Association rules: advanced topics

06 Clustering: Introduction

06 Clustering: Partitioning Methods

08 Clustering: Hierarchical

09 Density-based, Grid-based, and Model-based Clustering

10 Introduction to Classification

11 Decision Trees

12 Classification Rules

13 Nearest Neighbor and Bayesian Classifiers

14 Evaluation

15 Data Exploration and Preparation

16 Classifiers Ensembles

18 Mining Data Streams

19 Text and Web Mining

Genetic Algorithms

Genetic Algorithms Course Notes

Genetic Algorithm Examples – Post

Sunday, March 6th, 2011

Genetic Algorithm Examples

From the post:

There’s been a lot of buzz recently on reddit and HN about genetic algorithms. Some impressive new demos have surfaced and I’d like to take this opportunity to review some of the cool things people have done with genetic algorithms, a fascinating subfield of evolutionary computing / machine learning (which is itself a part of the broader study of artificial intelligence (ah how academics love to classify things (and nest parentheses (especially computer scientists)))).

Interesting collection of examples of uses of genetic algorithms.

Posted here to provoke thinking about the use of genetic algorithms in topic maps.

See also the author’s tutorial: Genetic Algorithm For Hello World.
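For a flavor of what a “Hello World” genetic algorithm involves, here is a short Python sketch. It is not the code from the linked tutorial; the character set, population size, mutation rate, uniform crossover and the “count the matching characters” fitness are my own assumptions.

```python
import random
import string

TARGET = "Hello, World!"
CHARSET = string.ascii_letters + string.digits + string.punctuation + " "


def fitness(candidate):
    """Number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def random_string():
    return "".join(random.choice(CHARSET) for _ in TARGET)


def crossover(a, b):
    # Uniform crossover: each character comes from one parent or the other.
    return "".join(random.choice(pair) for pair in zip(a, b))


def mutate(s, rate=0.05):
    return "".join(random.choice(CHARSET) if random.random() < rate else c
                   for c in s)


def evolve(pop_size=200, max_generations=2000, seed=7):
    random.seed(seed)
    population = [random_string() for _ in range(pop_size)]
    for generation in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return population[0], generation
        parents = population[: pop_size // 5]  # keep the top 20% as parents
        population = parents[:2] + [  # the two best survive unchanged
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - 2)
        ]
    population.sort(key=fitness, reverse=True)
    return population[0], max_generations


best, generations = evolve()
print(best, generations)
```

`evolve()` returns the best string and the generation at which it stopped. With these settings it should usually reach the target well before the 2,000-generation cap; the cap only keeps it from running forever if it does not.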

Have you used genetic algorithms with a topic map?

Appreciate a note if you have.