Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

June 3, 2011

brain – JavaScript for supervised machine learning

Filed under: Javascript,Machine Learning — Patrick Durusau @ 2:31 pm

brain – JavaScript for supervised machine learning

From the website:

brain is a JavaScript library for neural networks and Bayesian classifiers.

The documentation reports that by default it stores its data in memory, but it also has a Redis backend.

Reported to be used for spam filtering. Filtering is predicated on recognizing some basis for filtering, dare I say subject recognition? Some subjects are classes, such as spam, which are composed of included subjects, such as individual spam senders or messages. How fine-grained subject recognition needs to be depends on the purpose of the recognition.
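
Since the heart of the library is a Bayesian classifier, the technique is easy to picture. A minimal naive Bayes spam classifier sketch in Python (not brain's JavaScript API, just the technique it implements):

    # Minimal naive Bayes sketch in Python -- not brain's JavaScript API,
    # just the Bayesian classification technique it implements.
    import math
    from collections import Counter

    class NaiveBayes:
        def __init__(self, labels=("spam", "ham")):
            self.words = {label: Counter() for label in labels}
            self.docs = {label: 0 for label in labels}

        def train(self, label, text):
            self.docs[label] += 1
            self.words[label].update(text.lower().split())

        def classify(self, text):
            total_docs = sum(self.docs.values())
            vocab = len(set().union(*self.words.values()))
            scores = {}
            for label in self.docs:
                # log prior + add-one smoothed log likelihoods
                score = math.log(self.docs[label] / total_docs)
                n = sum(self.words[label].values())
                for word in text.lower().split():
                    score += math.log((self.words[label][word] + 1) / (n + vocab))
                scores[label] = score
            return max(scores, key=scores.get)

    nb = NaiveBayes()
    nb.train("spam", "cheap pills buy now")
    nb.train("ham", "meeting notes for tuesday")
    print(nb.classify("buy cheap pills now"))   # -> spam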

What subjects are you filtering for?

May 30, 2011

Databases For Machine Learning Experiments

Filed under: Algorithms,Machine Learning — Patrick Durusau @ 6:57 pm

Databases For Machine Learning Experiments

From the website:

An experiment database is a database designed to store learning experiments in full detail, aimed at providing a convenient platform for the study of learning algorithms.

By submitting all details about the learning algorithms, datasets, experimental setup and results, experiments can be easily reproduced and reused in further studies.

By querying and mining the database, it allows easy, thorough analysis of learning algorithms while providing all information to correctly interpret the results.

To get a first idea, watch the video tutorial (updated!) of our explorer tool. Or start querying online by looking at some examples!

Video tutorials:

Experiment Database for Machine Learning Tutorial – SQL Querying

Experiment Database for Machine Learning Tutorial – Video Querying

Very interesting site. Wondering how something similar could be done to illustrate the use of topic maps?

Has anyone used this in connection with a class on machine learning?

May 18, 2011

Practical Machine Learning

Filed under: Algorithms,Machine Learning,Statistical Learning,Statistics — Patrick Durusau @ 6:45 pm

Practical Machine Learning, by Michael Jordan (UC Berkeley).

From the course webpage:

This course introduces core statistical machine learning algorithms in a (relatively) non-mathematical way, emphasizing applied problem-solving. The prerequisites are light; some prior exposure to basic probability and to linear algebra will suffice.

This is the Michael Jordan who gave a Posner Lecture at the 24th Annual Conference on Neural Information Processing Systems 2010.

May 17, 2011

TunedIT

Filed under: Algorithms,Data Mining,Machine Learning — Patrick Durusau @ 2:52 pm

TunedIT – Machine Learning & Data Mining Algorithms: Automated Tests, Repeatable Experiments, Meaningful Results

There are two parts to the TunedIT site:

TunedIT Research

TunedIT Research is an open platform for reproducible evaluation of machine learning and data mining algorithms. Everyone may use TunedIT tools to launch reproducible experiments and share results with others. Reproducibility is achieved through automation. Datasets and algorithms, as well as experimental results, are collected in central databases: Repository and Knowledge Base, to enable comparison of a wide range of algorithms and to facilitate dissemination of research findings and cooperation between researchers. Everyone may access the contents of TunedIT and contribute new resources and results.

TunedIT Challenge

The TunedIT project was established in 2008 as a free and open experimentation platform for data mining scientists, specialists and programmers. It was extended in 2009 with a framework for online data mining competitions, used initially for laboratory classes at universities. Today, we provide a diverse range of competition types – for didactic, scientific and business purposes.

  • Student Challenge — For closed members groups. Perfectly suited to organize assignments for students attending laboratory classes. Restricted access and visibility, only for members of the group. FREE of charge
  • Scientific Challenge — Open contest for non-commercial purpose. Typically associated with a conference, journal or scientific organization. Concludes with public dissemination of results and winning algorithms. May feature prizes. Fee: FREE or 20%
  • Industrial Challenge — Open contest with commercial purpose. Intellectual Property can be transferred at the end. No requirement for dissemination of solutions. Fee: 30%

This looks like a possible way to generate some publicity about and interest in topic maps.

Suggestions of existing public data sets that would be of interest to a fairly broad audience?

Thinking we are likely to model some common things the same and other common things differently.

Would be interesting to compare results.

May 14, 2011

Cheat Sheet: Algorithms for Supervised and Unsupervised Learning

Filed under: Machine Learning — Patrick Durusau @ 6:25 pm

Cheat Sheet: Algorithms for Supervised and Unsupervised Learning

A nice “cheat sheet,” more of a summary of key information on algorithms for supervised and unsupervised learning.

May 12, 2011

Intuition & Data-Driven Machine Learning

Filed under: Clustering,Machine Learning — Patrick Durusau @ 7:59 am

Intuition & Data-Driven Machine Learning

Ilya Grigorik includes his Intelligent Ruby: Getting Started with Machine Learning presentation and asks the following question:

… to perform a clustering, we need to define a function to measure the “pairwise distance” between all pairs of objects. Can you think of a generic way to do so?

Think about it and then go to his blog post to see the answer.
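
One generic possibility (a sketch of my own, not necessarily his answer): reduce every object to a set of features and use Jaccard distance, so the same function covers documents, users, or anything else you can featurize.

    # Generic pairwise distance sketch: any object that can be reduced to
    # a set of features gets a distance, with no domain-specific model.
    def jaccard_distance(a, b):
        a, b = set(a), set(b)
        union = a | b
        return 0.0 if not union else 1.0 - len(a & b) / len(union)

    def pairwise_distances(objects, featurize):
        return {(i, j): jaccard_distance(featurize(objects[i]),
                                         featurize(objects[j]))
                for i in range(len(objects))
                for j in range(i + 1, len(objects))}

    docs = ["machine learning with python",
            "python machine learning",
            "topic maps and subject identity"]
    print(pairwise_distances(docs, featurize=str.split))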

Machine Learning and Probabilistic Graphical Models

Filed under: Machine Learning,Probabilistic Models — Patrick Durusau @ 7:59 am

Machine Learning and Probabilistic Graphical Models

From the website:

Instructor: Sargur Srihari Department of Computer Science and Engineering, University at Buffalo

Machine learning is an exciting topic about designing machines that can learn from examples. The course covers the necessary theory, principles and algorithms for machine learning. The methods are based on statistics and probability – which have now become essential to designing systems exhibiting artificial intelligence. The course emphasizes Bayesian techniques and probabilistic graphical models (PGMs). The material is complementary to a course on Data Mining where statistical concepts are used to analyze data for human, rather than machine, use.

The textbooks for different parts of the course are “Pattern Recognition and Machine Learning” by Chris Bishop (Springer 2006) and “Probabilistic Graphical Models” by Daphne Koller and Nir Friedman (MIT Press 2009).

Lecture slides and some videos of lectures.

April 22, 2011

Intuition = …because I said so!

Filed under: Data Analysis,Machine Learning — Patrick Durusau @ 1:05 pm

Intuition & Data-Driven Machine Learning

From the post:

Clever algorithms and pages of mathematical formulas filled with probability and optimization theory are usually the associations that get invoked when you ask someone to describe the fields of AI and Machine Learning. Granted, there is definitely an abundance of both, but this mental picture also tends to obscure some of the more interesting and recent developments in these fields: data driven learning, and the fact that you are often better off developing simple intuitive insights instead of complicated domain models which are meant to represent every attribute of the problem.

I wonder about the closing observation:

you are often better off developing simple intuitive insights instead of complicated domain models which are meant to represent every attribute of the problem.

Does that apply to identifications of subjects as well?

Might we not be better off capturing the conclusion of an analyst that “X” is a fact, drawn from some large body of data, rather than finding a clever way in the data to map their conclusion to that of other analysts?

Both said “X,” so what more do we need? True enough, we need to identify “X” in some way, but that is simpler than trying to justify the conclusion in the data.

I suppose I am arguing there should be room in subject identification for human intuition, that is, “…because I said so!” 😉
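
As a sketch of what I mean (the field names are mine, purely illustrative, not any topic map syntax): record the analyst's conclusion together with a subject identifier and its provenance, and merge on the identifier rather than on a justification recovered from the data.

    # Hypothetical sketch: capturing "...because I said so!" as data.
    assertion_a = {
        "subject": "http://example.com/subject/X",   # how "X" is identified
        "claim": "X is a fact",
        "asserted_by": "analyst-1",
        "basis": "professional judgment over corpus C",
    }
    assertion_b = {
        "subject": "http://example.com/subject/X",
        "claim": "X is a fact",
        "asserted_by": "analyst-2",
        "basis": "manual review of corpus D",
    }

    def same_subject(a, b):
        # Merge on the declared identifier, not on re-derived evidence.
        return a["subject"] == b["subject"]

    print(same_subject(assertion_a, assertion_b))   # -> True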

April 18, 2011

Biomedical Machine Learning Classifiers

A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources by Raúl Ramos-Pollán, Miguel Ángel Guevara López and Eugénio da Costa Oliveira.

Abstract:

This paper describes the BiomedTK software framework, created to perform massive explorations of machine learning classifiers configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables explorations of large parameter sweeps for training third party classifiers such as artificial neural networks and support vector machines, offering the capability to harness the vast amount of computing power serviced by Grid infrastructures. In addition, it includes classifiers modified by the authors for ROC optimization and functionality to build ensemble classifiers and manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations for representative biomedical UCI datasets reaching in little time classification levels comparable to those reported in existing literature. The comprehensive method herewith presented represents an improvement to biomedical data analysis in both methodology and potential reach of machine learning based experimentation.

I recommend a close reading of the article but the concluding lines caught my eye:

…tuning classifier parameters is mostly a heuristic task, with no existing rules providing knowledge about what parameters to choose when training a classifier. Through BiomedTK we are gathering data about the performance of many classifiers, each one trained with different parameters, ANNs, SVMs, etc. This by itself constitutes a dataset that can be data mined to understand what set of parameters yields better classifiers for given situations or even generally. Therefore, we intend to use BiomedTK on this bulk of classifier data to gain insight on classifier parameter tuning.

The dataset about training classifiers may be as important as, if not more important than, the use of the framework in harnessing Grid computing resources for biomedical analysis. I am looking forward to reports on that dataset.
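
For a sense of what such a classifier-performance dataset looks like, here is a small sketch with scikit-learn (BiomedTK itself is a different framework; only the idea carries over): every parameter configuration from a sweep becomes a row that can itself be mined.

    # Sketch: the results of a parameter sweep, treated as a dataset.
    # scikit-learn stands in for BiomedTK here; only the idea carries over.
    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
                        cv=3)
    grid.fit(X, y)

    # One row per configuration -- data about training classifiers.
    results = pd.DataFrame(grid.cv_results_)[["params", "mean_test_score"]]
    print(results.sort_values("mean_test_score", ascending=False))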

April 11, 2011

The Wekinator

Filed under: Classifier,Machine Learning,Music Retrieval — Patrick Durusau @ 5:42 am

The Wekinator: Software for using machine learning to build real-time interactive systems

This looks very cool!

I can imagine topic maps of sounds/gestures in a number of contexts that would be very interesting.

From the website:

The Wekinator is a free software package to facilitate rapid development of and experimentation with machine learning in live music performance and other real-time domains. The Wekinator allows users to build interactive systems by demonstrating human actions and computer responses, rather than by programming.

Example applications:

  • Creation of new musical instruments
    • Create mappings between gesture and computer sounds. Control a drum machine using your webcam! Play Ableton using a Kinect!

  • Creation of gesturally-controlled animations and games
    • Control interactive visual environments like Processing or Quartz Composer, or game engines like Unity, using gestures sensed from webcam, Kinect, Arduino, etc.

  • Creation of systems for gesture analysis and feedback
    • Build classifiers to detect which gesture a user is performing. Use the identified gesture to control the computer or to inform the user how he’s doing.

  • Creation of real-time music information retrieval and audio analysis systems
    • Detect instrument, genre, pitch, rhythm, etc. of audio coming into the mic, and use this to control computer audio, visuals, etc.

  • Creation of other interactive systems in which the computer responds in real-time to some action performed by a human user (or users)
    • Anything that can output OSC can be used as a controller
    • Anything that can be controlled by OSC can be controlled by Wekinator
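
Those last two bullets are the hook. A sketch using the python-osc package; the port (6448) and the "/wek/inputs" address pattern are my assumptions about Wekinator's defaults and may need adjusting:

    # Sketch: anything that can send OSC can act as a controller.
    # Port 6448 and the "/wek/inputs" address are assumed defaults.
    from pythonosc.udp_client import SimpleUDPClient

    client = SimpleUDPClient("127.0.0.1", 6448)       # host running Wekinator
    client.send_message("/wek/inputs", [0.42, 0.7])   # two float features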

April 10, 2011

SIGMA: Large Scale Machine Learning Toolkit

Filed under: Machine Learning — Patrick Durusau @ 2:50 pm

SIGMA: Large Scale Machine Learning Toolkit

From the website:

The goal of this project is to provide a group of parallel machine learning functionalities which can meet the requirements of research work and applications, typically with large scale data/features. The toolkit includes, but is not limited to: classification, clustering, ranking, statistical analysis, etc., and makes them run on hundreds of machines and thousands of CPU cores in parallel. We also provide an SDK for researchers/developers to invent their own algorithms and accumulate them into the toolkit.

Algorithms in the toolkit:

  • Parallel Classification
    • Logistic Regression
    • Boosting
    • SVM
      • PSVM
      • PPegasos
    • Neural Network
  • Parallel Ranking
    • LambdaRank
    • RankBoost
  • Parallel Clustering
    • Kmeans
    • Random Walk
  • Parallel Regression
    • Linear Regression
    • Regression Tree
  • Others
    • Parallel-Regularized-SVD
    • Parallel-LDA
  • Optimization Library
    • OWL-QN

Parallelizing Machine Learning – Functionally

Filed under: Graphs,Machine Learning,Scala — Patrick Durusau @ 2:49 pm

Parallelizing Machine Learning – Functionally

A Framework and Abstractions for Parallel Graph Processing

Abstract:

Implementing machine learning algorithms for large data, such as the Web graph and social networks, is challenging. Even though much research has focused on making sequential algorithms more scalable, their running times continue to be prohibitively long. Meanwhile, parallelization remains a formidable challenge for this class of problems, despite frameworks like MapReduce which hide much of the associated complexity. We present a framework for implementing parallel and distributed machine learning algorithms on large graphs, flexibly, through the use of functional programming abstractions. Our aim is a system that allows researchers and practitioners to quickly and easily implement (and experiment with) their algorithms in a parallel or distributed setting. We introduce functional combinators for the flexible composition of parallel, aggregation, and sequential steps. To the best of our knowledge, our system is the first to avoid inversion of control in a (bulk) synchronous parallel model.

I am particularly interested in the authors’ claim that:

While also based on graphs, Pregel is a closed system that was designed to solve large-scale “graph processing” problems, which are usually simpler in nature than typical real-world ML problems. In an effort to capitalize on Pregel’s strengths while focusing on a framework more aptly-suited to ML problems, we introduce a more flexible programming model, based on high-level functional abstractions.

Mostly because it is important to distinguish the areas we research because our algorithms work there from the areas where algorithms await discovery.

But also, in part, so that we know where it is appropriate to apply our usual algorithms and where they are likely to break down.

April 3, 2011

Shogun – Google Summer of Code 2011

Filed under: Hidden Markov Model,Kernel Methods,Machine Learning,Vectors — Patrick Durusau @ 6:38 pm

Shogun – Google Summer of Code 2011

Students! Here is your chance to work on a cutting edge software library for machine learning!

Pick from the posted ideas, or submit your own.

From the website:

SHOGUN is a machine learning toolbox, which is designed for unified large-scale learning for a broad range of feature types and learning settings. It offers a considerable number of machine learning models such as support vector machines for classification and regression, hidden Markov models, multiple kernel learning, linear discriminant analysis, linear programming machines, and perceptrons. Most of the specific algorithms are able to deal with several different data classes, including dense and sparse vectors and sequences using floating point or discrete data types. We have used this toolbox in several applications from computational biology, some of them coming with no less than 10 million training examples and others with 7 billion test examples. With more than a thousand installations worldwide, SHOGUN is already widely adopted in the machine learning community and beyond.

SHOGUN is implemented in C++ and interfaces to MATLAB, R, Octave, Python, and has a stand-alone command line interface. The source code is freely available under the GNU General Public License, Version 3 at http://www.shogun-toolbox.org.

This summer we are looking to extend the library in four different ways: improving interfaces to other machine learning libraries or integrating them when appropriate, improved i/o support, framework improvements, and new machine learning algorithms. A set of suggested projects is listed here.

A prior post on Shogun.

March 30, 2011

Machine Learning

Filed under: Classification,Clustering,Machine Learning,Regression — Patrick Durusau @ 12:35 pm

Machine Learning

From the site:

This page documents all the machine learning algorithms present in the library. In particular, there are algorithms for performing classification, regression, clustering, anomaly detection, and feature ranking, as well as algorithms for doing more specialized computations.

A good tutorial and introduction to the general concepts used by most of the objects in this part of the library can be found in the svm example program. After reading this example, another good one to consult would be the model selection example program. Finally, if you came here looking for a binary classification or regression tool, then I would try the krr_trainer first, as it is generally the easiest method to use.

The major design goal of this portion of the library is to provide a highly modular and simple architecture for dealing with kernel algorithms….

Update: Dlib – machine learning. Why I left out the library name I cannot say. Sorry!
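
For the curious, krr_trainer is dlib's kernel ridge regression trainer, which is C++. As a rough Python analogue (not dlib's API), scikit-learn's KernelRidge gives a feel for the method:

    # Kernel ridge regression sketch -- a rough analogue of dlib's
    # krr_trainer, using scikit-learn rather than dlib's C++ API.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    X = np.linspace(0, 6, 50).reshape(-1, 1)
    y = np.sin(X).ravel()

    krr = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0)
    krr.fit(X, y)
    print(krr.predict([[1.5]]))   # close to sin(1.5) ~ 0.997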

March 22, 2011

Disease Named Entity Recognition

Filed under: Entity Extraction,Machine Learning — Patrick Durusau @ 7:02 pm

Disease named entity recognition using semisupervised learning and conditional random fields.

Suakkaphong, N., Zhang, Z., & Chen, H. (2011). Disease named entity recognition using semisupervised learning and conditional random fields. Journal of the American Society for Information Science & Technology, 62(4), 727-737.

Abstract:

Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition.
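
The bootstrapping idea is simple enough to sketch. Below, a plain scikit-learn classifier stands in for the paper's CRF (the self-training loop, not the model, is the point): train on labeled data, promote high-confidence predictions on unlabeled data to pseudo-labels, retrain.

    # Bootstrapping (self-training) sketch; logistic regression stands in
    # for the paper's CRF -- the loop is what is being illustrated.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def bootstrap(X_lab, y_lab, X_unlab, rounds=3, threshold=0.95):
        X, y = X_lab, y_lab
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        for _ in range(rounds):
            if len(X_unlab) == 0:
                break
            probs = clf.predict_proba(X_unlab)
            confident = probs.max(axis=1) >= threshold
            if not confident.any():
                break
            # Promote high-confidence predictions to pseudo-labels.
            X = np.vstack([X, X_unlab[confident]])
            y = np.concatenate([y, clf.classes_[probs.argmax(axis=1)[confident]]])
            X_unlab = X_unlab[~confident]
            clf = LogisticRegression(max_iter=1000).fit(X, y)
        return clf

    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(20, 2)) + np.array([[2, 2]] * 10 + [[-2, -2]] * 10)
    y_lab = np.array([1] * 10 + [0] * 10)
    X_unlab = rng.normal(size=(200, 2)) + np.where(rng.random((200, 1)) > 0.5, 2, -2)
    model = bootstrap(X_lab, y_lab, X_unlab)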

Not to take anything away from this sort of technique, which would stand topic map construction in good stead, but I am left feeling like it stops short of the mark.

In other words, say that I am happy with the result of its recognition, how do I share that with someone else, who has another set of identified subjects, perhaps from the same data?

Or for that matter, how do I combine it with data that I myself have extracted from the same data?

Can’t very well ask the software why it “recognized” one name or another, can I?

Thinking I would have to add what seemed to me to be useful information to the name, in order to re-use it with other data.

Starting to sound like a topic map isn’t it?

March 20, 2011

The Ideal Large Scale Learning Class

Filed under: Machine Learning — Patrick Durusau @ 1:24 pm

The Ideal Large Scale Learning Class

Interesting collection of topics with pointers to resources on different types of scaling.

March 18, 2011

MADlib

Filed under: Analytics,Machine Learning — Patrick Durusau @ 6:50 pm

MADlib

From the website:

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

The MADlib mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development.

Targeted at PostgreSQL and Greenplum.

March 14, 2011

Sixth International Conference on Knowledge Capture – K-Cap 2011

Sixth International Conference on Knowledge Capture – K-Cap 2011

From the website:

In today’s knowledge-driven world, effective access to and use of information is a key enabler for progress. Modern technologies not only are themselves knowledge-intensive technologies, but also produce enormous amounts of new information that we must process and aggregate. These technologies require knowledge capture, which involves the extraction of useful knowledge from vast and diverse sources of information as well as its acquisition directly from users. Driven by the demands for knowledge-based applications and the unprecedented availability of information on the Web, the study of knowledge capture has a renewed importance.

Researchers that work in the area of knowledge capture traditionally belong to several distinct research communities, including knowledge engineering, machine learning, natural language processing, human-computer interaction, artificial intelligence, social networks and the Semantic Web. K-CAP 2011 will provide a forum that brings together members of disparate research communities that are interested in efficiently capturing knowledge from a variety of sources and in creating representations that can be useful for reasoning, analysis, and other forms of machine processing. We solicit high-quality research papers for publication and presentation at our conference. Our aim is to promote multidisciplinary research that could lead to a new generation of tools and methodologies for knowledge capture.

Conference:

25 – 29 June 2011
Banff Conference Centre
Banff, Alberta, Canada

Call for papers has closed. Will try to post a note about the conference earlier next year.

Proceedings from previous conferences available through the ACM Digital Library – Knowledge Capture.

Let me know if you have trouble with the ACM link. I sometimes don’t get all the tracking cruft stripped off of URLs correctly. There really should be a “clean” URL option for sites like the ACM.

March 10, 2011

evo*2011

Filed under: Data Mining,Evolutionary,Machine Learning — Patrick Durusau @ 12:32 pm

evo*2011

From the website:

evo* comprises the premier co-located conferences in the field of Evolutionary Computing: eurogp, evocop, evobio and evoapplications.

Featuring the latest in theoretical and applied research, evo* topics include recent genetic programming challenges, evolutionary and other meta-heuristic approaches for combinatorial optimization, evolutionary algorithms, machine learning and data mining techniques in the biosciences, in numerical optimization, in music and art domains, in image analysis and signal processing, in hardware optimization and in a wide range of applications to scientific, industrial, financial and other real-world problems.

Conference is 27-29 April 2011 in Torino, Italy.

Even if you are not in the neighborhood, the paper abstracts make an interesting read!

March 6, 2011

Gaussian Processes for Machine Learning

Filed under: Algorithms,Gaussian Processes,Machine Learning — Patrick Durusau @ 3:31 pm

Gaussian Processes for Machine Learning

Complete text of:

Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, MIT Press, 2006. ISBN-10 0-262-18253-X, ISBN-13 978-0-262-18253-9.

I like the quote from James Clerk Maxwell that goes:

The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind.

Interesting. Is our identification of subjects probabilistic or is our identification of what we thought others meant probabilistic?

Or both? Neither?

From the preface:

Over the last decade there has been an explosion of work in the “kernel machines” area of machine learning. Probably the best known example of this is work on support vector machines, but during this period there has also been much activity concerning the application of Gaussian process models to machine learning tasks. The goal of this book is to provide a systematic and unified treatment of this area. Gaussian processes provide a principled, practical, probabilistic approach to learning in kernel machines. This gives advantages with respect to the interpretation of model predictions and provides a well founded framework for learning and model selection. Theoretical and practical developments of the last decade have made Gaussian processes a serious competitor for real supervised learning applications.
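
As a taste of the book's subject, a small Gaussian process regression sketch with scikit-learn: the probabilistic framing means every prediction comes with an uncertainty estimate.

    # Gaussian process regression sketch (scikit-learn): predictions come
    # with posterior uncertainty, not just point estimates.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.array([[1.0], [3.0], [5.0], [6.0]])
    y = np.sin(X).ravel()

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
    gp.fit(X, y)
    mean, std = gp.predict([[4.0]], return_std=True)
    print(mean, std)   # posterior mean and standard deviation at x = 4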

I am downloading the PDF version but have just ordered a copy from Amazon.

If you want to encourage MIT Press and other publishers to put materials online as well as in print, order a copy of this and other online materials.

Saying online copies don’t hurt print sales isn’t as convincing as hearing the cash register go “cha-ching!”

(I would also drop a note to the press saying you bought a copy of the online book as well.)

Genetic Algorithm Examples – Post

Filed under: Artificial Intelligence,Genetic Algorithms,Machine Learning — Patrick Durusau @ 3:31 pm

Genetic Algorithm Examples

From the post:

There’s been a lot of buzz recently on reddit and HN about genetic algorithms. Some impressive new demos have surfaced and I’d like to take this opportunity to review some of the cool things people have done with genetic algorithms, a fascinating subfield of evolutionary computing / machine learning (which is itself a part of the broader study of artificial intelligence (ah how academics love to classify things (and nest parentheses (especially computer scientists)))).

Interesting collection of examples of uses of genetic algorithms.

Posted here to provoke thinking about the use of genetic algorithms in topic maps.

See also the author’s tutorial: Genetic Algorithm For Hello World.
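
In the spirit of that tutorial, a minimal sketch: evolve random strings toward a target by keeping the fittest and mutating survivors.

    # Minimal genetic algorithm sketch: evolve strings toward a target.
    import random
    import string

    TARGET = "Hello World"
    CHARS = string.ascii_letters + " "

    def fitness(s):
        return sum(a == b for a, b in zip(s, TARGET))

    def mutate(s, rate=0.1):
        return "".join(random.choice(CHARS) if random.random() < rate else c
                       for c in s)

    population = ["".join(random.choice(CHARS) for _ in TARGET)
                  for _ in range(100)]
    for generation in range(2000):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            break
        # Keep the fittest half; refill with mutated copies of survivors.
        survivors = population[:50]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(50)]
    print(generation, population[0])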

Have you used genetic algorithms with a topic map?

Appreciate a note if you have.

March 5, 2011

GraphLab

Filed under: GraphLab,Graphs,Machine Learning,Visualization — Patrick Durusau @ 2:51 pm

GraphLab

Progress on graph processing continues.

From the website:

A New Parallel Framework for Machine Learning

Designing and implementing efficient and provably correct parallel machine learning (ML) algorithms can be very challenging. Existing high-level parallel abstractions like MapReduce are often insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance.

The popular MapReduce abstraction is defined in two parts: a Map stage, which performs computation on independent problems that can be solved in isolation, and a Reduce stage, which combines the results.

GraphLab provides a similar analog to the Map in the form of an Update Function. The Update Function, however, is able to read and modify overlapping sets of data (program state) in a controlled fashion, as defined by the user-provided data graph. The user-provided data graph represents the program state, with arbitrary blocks of memory associated with each vertex and edge. In addition, update functions can be recursively triggered, with one update function spawning the application of update functions to other vertices in the graph, enabling dynamic iterative computation. GraphLab uses powerful scheduling primitives to control the order in which update functions are executed.

The GraphLab analog to Reduce is the Sync Operation. The Sync Operation also provides the ability to perform reductions in the background while other computation is running. Like the update function, sync operations can look at multiple records simultaneously, providing the ability to operate on larger dependent contexts.
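
A toy Python illustration of the update-function model (not GraphLab's actual C++ API): each update recomputes one vertex from its in-neighbors and dynamically schedules its dependents when the value changes.

    # Toy sketch of the update-function model (not GraphLab's C++ API),
    # using PageRank: an update reads neighbors, writes one vertex, and
    # schedules dependent vertices when the value changes enough.
    from collections import deque

    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}        # out-edges
    in_edges = {v: [u for u in graph if v in graph[u]] for v in graph}
    rank = {v: 1.0 / len(graph) for v in graph}

    def update(v):
        new = 0.15 / len(graph) + 0.85 * sum(
            rank[u] / len(graph[u]) for u in in_edges[v])
        changed = abs(new - rank[v]) > 1e-9
        rank[v] = new
        return changed

    schedule = deque(graph)                 # start with every vertex
    while schedule:
        v = schedule.popleft()
        if update(v):
            schedule.extend(graph[v])       # dynamic, iterative computation
    print(rank)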

See also: GraphLab: A New Framework For Parallel Machine Learning (The original paper.)

This is a project that bears close watching.

March 4, 2011

Learning to classify text using support vector machines

Filed under: Classifier,Machine Learning,Vectors — Patrick Durusau @ 3:58 pm

I saw a tweet recently that pointed to: Learning to classify text using support vector machines, which is Thorsten Joachims’ dissertation, The Maximum-Margin Approach to Learning Text Classifiers, as published by Kluwer (not Springer).

Of possibly greater interest would be Joachims’ more recent work found at his homepage, which includes software from his dissertation as well as more recent projects.

I am sure his dissertation will repay close study, but at > $150 U.S., I am going to have to wait for a library ILL to find its way to me.

Metaoptimize Q+A

Metaoptimize Q+A is one of the Q/A sites I just stumbled across.

From the website:

A community of scientists interested in machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization, as well as adjacent topics.

Looks like an interesting place to hang out.

Shark: Machine Learning Library

Filed under: Machine Learning — Patrick Durusau @ 5:51 am

Shark: Machine Learning Library

From the website:

SHARK is a modular C++ library for the design and optimization of adaptive systems. It provides methods for linear and nonlinear optimization, in particular evolutionary and gradient-based algorithms, kernel-based learning algorithms and neural networks, and various other machine learning techniques.

SHARK serves as a toolbox to support real world applications as well as research in different domains of computational intelligence and machine learning. The sources are compatible with the following platforms: Linux, Windows, Solaris and MacOS X.

Benchmark: Python Machine Learning – Post

Filed under: Dataset,Machine Learning — Patrick Durusau @ 5:49 am

Benchmark for several Python machine learning packages

From the website:

We compare computation time for a few algorithms implemented in the major machine learning toolkits accessible in Python. We use the Madelon data set [Guyon2004], with 4400 instances and 500 attributes, that can be used in supervised and unsupervised settings and is quite large, but small enough for most algorithms to run.
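
The benchmark pattern itself is easy to reproduce. A sketch with scikit-learn (the digits data set stands in for Madelon so the snippet is self-contained):

    # Benchmark sketch: time fitting a few classifiers on one data set.
    # The digits data stands in for Madelon to keep this self-contained.
    import time
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    for clf in (LogisticRegression(max_iter=2000),
                KNeighborsClassifier(),
                SVC()):
        start = time.perf_counter()
        clf.fit(X, y)
        print(type(clf).__name__, f"{time.perf_counter() - start:.3f}s")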

Useful site for a couple of reasons:

1) A cross-check to make sure I have some of the major Python machine learning packages listed.

2) Another reminder that we don’t have similar test sets of data for topic maps.

The first one I can check and remedy fairly quickly.

The second one is going to take more thought, planning and mostly effort. 😉

Suggestions/comments?

February 28, 2011

Machine Learning – Andrew Ng – YouTube

Filed under: Machine Learning — Patrick Durusau @ 8:36 am

The lectures by Andrew Ng that I pointed to in Machine Learning Lectures (Video) on iTunes are also available on YouTube, in no particular order. I have created an ordered listing of the lectures on YouTube below.

What would be even more useful would be a very short summary/topic listing for each lecture, so that additional information could be linked in (dare I say topic mapped?) to create a more useful resource.

No promises but as time permits or as readers contribute, something like that is definitely within the range of possibilities.

Machine Learning – Andrew Ng – Stanford

February 25, 2011

scikits.learn machine learning in Python

Filed under: Machine Learning,Python — Patrick Durusau @ 5:36 pm

scikits.learn machine learning in Python

From the website:

Easy-to-use and general-purpose machine learning in Python

scikits.learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib).

It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine-learning as a versatile tool for science and engineering.
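
The “easy-to-use” claim holds up. A minimal example (recent releases import as sklearn rather than scikits.learn):

    # Minimal scikit-learn usage: fit a classifier in a few lines.
    # (Modern releases import as sklearn rather than scikits.learn.)
    from sklearn import datasets, svm

    iris = datasets.load_iris()
    clf = svm.SVC()
    clf.fit(iris.data, iris.target)
    print(clf.predict(iris.data[:3]))   # class predictions for 3 samples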

This could be a good model for a “learning topic maps” site for people interested in the technical side of topic maps.

There may not be a real call for training people who aren’t interested in learning the technical side of topic maps.

By analogy with indexing, lots of folks can use indexes (sorta, ok, I am being generous) but not that many folks can create good indexes.

I will be posting some examples of wannabe indexes next week.

February 24, 2011

Machine Learning for .Net

Filed under: .Net,Machine Learning — Patrick Durusau @ 8:06 pm

Machine Learning for .Net

From the webpage:

This library is designed to assist in the use of common Machine Learning Algorithms in conjunction with the .NET platform. It is designed to include the most popular supervised and unsupervised learning algorithms while minimizing the friction involved with creating the predictive models.

Supervised Learning

Supervised learning is an approach in machine learning where the system is provided with labeled examples of a problem and the computer creates a model to predict future unlabeled examples. These classifiers are further divided into the following sets:

  • Binary Classification – Predicting a Yes/No type value
  • Multi-Class Classification – Predicting a value from a finite set (i.e. {A, B, C, D } or {1, 2, 3, 4})
  • Regression – Predicting a continuous value (i.e. a number)

Unsupervised Learning

Unsupervised learning is an approach which involves learning about the shape of unlabeled data. This library currently contains:

  1. KMeans – Performs automatic grouping of data into K groups (specified a priori)

    Labeling data is the same as for the supervised learning algorithms with the exception that these algorithms ignore the [Label] attribute:

    var kmeans = new KMeans();
    var grouping = kmeans.Generate(ListOfStudents, 2);

    Here the KMeans algorithm is grouping the ListOfStudents into two groups, returning an array corresponding to the appropriate group for each student (in this case, group 0 or group 1).

  2. Hierarchical Clustering – In progress!
  3. Planning

    Currently planning/hoping to do the following:

    1. Boosting/Bagging
    2. Hierarchical Clustering
    3. Naïve Bayes Classifier
    4. Collaborative filtering algorithms (suggest a product, friend etc.)
    5. Latent Semantic Analysis (for better searching of text etc.)
    6. Support Vector Machines (more powerful classifier)
    7. Principal Component Analysis – Aids in dimensionality reduction which should allow/facilitate learning from images
    8. *Maybe* – Common AI algorithms such as A*, Beam Search, Minimax etc.

So, if you are working in a .Net context, here is a chance to get in on the ground floor of a machine learning project.

February 19, 2011

Group Theoretical Methods and Machine Learning

Filed under: Group Theory,Machine Learning — Patrick Durusau @ 4:28 pm

Group Theory and Machine Learning

Description:

The use of algebraic methods—specifically group theory, representation theory, and even some concepts from algebraic geometry—is an emerging new direction in machine learning. The purpose of this tutorial is to give an entertaining but informative introduction to the background to these developments and sketch some of the many possible applications, including multi-object tracking, learning rankings, and constructing translation and rotation invariant features for image recognition. The tutorial is intended to be palatable by a non-specialist audience with no prior background in abstract algebra.

Be forewarned: this is tough sledding if you are not already a machine learning sort of person.

But, since I don’t post what I haven’t watched, I did watch the entire video.

It suddenly got interesting just past 93:08, when Risi Kondor started talking about blobs on radar screens and associating information with them…. Wait, run that by once again: …blobs on radar screens and associating information with them.

Oh, that is what I thought he said.

I suppose for fire control systems and the like as well as civilian applications.

I am so much of a text and information navigation person that I don’t often think about other applications for “pattern recognition” and the like.

With all the international traveling I used to do, being a blob on a radar screen got my attention!

Has applications in tracking animals in the wild and other tracking with sensor data.

Another illustration of why topic maps need an open-ended and extensible notion of subject identification.

What we think of as methods of subject identification may not be what others think of as methods of subject identification.
