Another Word For It
Patrick Durusau on Topic Maps and Semantic Diversity

September 2, 2011

Jfokus 14-16 February 2012 – Call for Papers

Filed under: Conferences,Java — Patrick Durusau @ 7:53 pm

Jfokus 14-16 February 2012 – Call for Papers

Judging from prior years, there will be more than a few presentations of interest to topic mappers at this conference.

If you submit your proposal by October 1, 2011, your presentation could be one of them.

Mining Associations and Patterns from Semantic Data

Filed under: Conferences,Data Mining,Pattern Matching,Pattern Recognition,Semantic Web — Patrick Durusau @ 7:52 pm

The editors of a special issue of the International Journal on Semantic Web and Information Systems on Mining Associations and Patterns from Semantic Data have issued the following call for papers:

Guest editors: Kemafor Anyanwu, Ying Ding, Jie Tang, and Philip Yu

Large amounts of Semantic Data are being generated through semantic extraction from, and annotation of, traditional Web, social, and sensor data. Linked Open Data has provided an excellent vehicle for representing and sharing such data. The primary way to make those semantics useful for better integration, search, and decision making is to find interesting relationships or associations, expressed as meaningful paths, subgraphs, and patterns. This special issue seeks theories, algorithms, and applications for extracting such semantic relationships from large amounts of semantic data. Example topics include:

  • Theories to ground associations and patterns with social, socioeconomic, biological semantics
  • Representation (e.g. language extensions) to express meaningful relationships and patterns
  • Algorithms to efficiently compute and mine semantic associations and patterns
  • Techniques for filtering, ranking and/or visualization of semantic associations and patterns
  • Application of semantic associations and patterns in a domain with significant social or society impact

IJSWIS is included in most major indices, including CSI, with a Thomson Scientific impact factor of 2.345. We seek high-quality manuscripts suitable for an archival journal, based on original research. If the manuscript is based on a prior workshop or conference submission, it should reflect a significant novel contribution/extension in conceptual terms and/or in the scale of implementation and evaluation (authors are highly encouraged to clarify the new contributions in a cover letter or within the submission).

Important Dates:
Submission of full papers: Feb 29, 2012
Notification of paper acceptance: May 30, 2012
Publication target: 3Q 2012

Details of the journal, manuscript preparation, and recent articles are available on the website:
http://www.igi-global.com/bookstore/titledetails.aspx?titleid=1092 or http://ijswis.org

Guest Editors: Prof. Kemafor Anyanwu, North Carolina State University
Prof. Ying Ding, Indiana University
Prof. Jie Tang, Tsinghua University
Prof. Philip Yu, University of Illinois, Chicago
Contact Guest Editor: Ying Ding <dingying@indiana.edu>

What is a good explanation of Latent Dirichlet Allocation? (Quora)

Filed under: Latent Dirichlet Allocation (LDA),Topic Models (LDA) — Patrick Durusau @ 7:51 pm

What is a good explanation of Latent Dirichlet Allocation? (Quora)

If you need to explain topic modeling to your boss, department chair or funder, you would be hard pressed to find a better source of inspiration.

The explanations here range from technical to layman-friendly to an actual worked example (Sarah Palin’s emails, so you may want to check your audience’s political persuasion first). Actually, it would not hurt to have LDA examples on hand that run the gamut of political persuasions. (Or national perspectives, if you are in the international market.)
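If a hands-on demonstration helps alongside those explanations, the sketch below shows the bare-bones LDA workflow: build a vocabulary, turn documents into bags of words, fit a model, and read off the top words per topic. (I am using the gensim library purely for illustration; the Quora thread does not assume it, and MALLET or any other implementation would do.)

# A minimal LDA sketch. gensim is my choice for illustration only;
# any LDA implementation (MALLET, etc.) would serve as well.
from gensim import corpora, models

# Toy "documents" around two rough themes; real work needs far more text.
documents = [
    "the budget deficit and federal spending debate",
    "oil drilling and energy policy in alaska",
    "state budget cuts and tax policy",
    "wildlife protection and energy exploration in alaska",
]

# Naive tokenization; stop-word removal and stemming are left out for brevity.
texts = [doc.split() for doc in documents]

# Map tokens to integer ids and convert each document to a bag-of-words vector.
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Fit a two-topic model and print the top words for each topic.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics(num_topics=2, num_words=5):
    print(topic_id, words)

On a toy corpus like this the topics will be noisy; the point is only how little ceremony the workflow involves.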

BTW, if you’re not familiar with Quora, give it a look.

This link was forwarded to my attention by Jack Park.

Parallel WaveCluster:…

Filed under: Clustering — Patrick Durusau @ 7:50 pm

Parallel WaveCluster: A linear scaling parallel clustering algorithm implementation with application to very large datasets by Ahmet Artu Yıldırım and Cem Özdoğan.

Abstract:

A linear scaling parallel clustering algorithm implementation and its application to very large datasets for cluster analysis is reported. WaveCluster is a novel clustering approach based on wavelet transforms. Despite this approach has an ability to detect clusters of arbitrary shapes in an efficient way, it requires considerable amount of time to collect results for large sizes of multi-dimensional datasets. We propose the parallel implementation of the WaveCluster algorithm based on the message passing model for a distributed-memory multiprocessor system. In the proposed method, communication among processors and memory requirements are kept at minimum to achieve high efficiency. We have conducted the experiments on a dense dataset and a sparse dataset to measure the algorithm behavior appropriately. Our results obtained from performed experiments demonstrate that developed parallel WaveCluster algorithm exposes high speedup and scales linearly with the increasing number of processors.

This paper mentions, although it does not treat, an important issue in the use of clustering algorithms with semantic data. In their description of the WaveCluster algorithm, the authors say:

WaveCluster algorithm contains three phases. In the first phase, algorithm quantizes feature space and then assigns the objects to the units.

(This is important work and deserved better editing from its publisher.)

The problem lies in quantizing the feature space when the feature space is a semantic one.
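To make the quantization step concrete, here is a toy sketch of my own (not the authors’ code) that bins two-dimensional points into grid units, the kind of operation WaveCluster’s first phase performs. With spatial coordinates the bin boundaries are unproblematic; the question is what the analogous units would even mean along a semantic dimension.

# Illustration only (not from the paper): quantize a 2-D feature space
# by assigning each point to a cell of a fixed grid, as in WaveCluster's
# first phase. Assumes the coordinate ranges are well defined.
import numpy as np

def quantize(points, grid_size):
    """Return the per-dimension grid cell indices for each point."""
    mins = points.min(axis=0)
    maxs = points.max(axis=0)
    # Scale each coordinate into [0, grid_size) and truncate to a cell index.
    cells = ((points - mins) / (maxs - mins + 1e-12) * grid_size).astype(int)
    return np.clip(cells, 0, grid_size - 1)

points = np.random.rand(1000, 2)        # spatial data: the ranges are not in doubt
units = quantize(points, grid_size=32)  # each row is the cell a point falls into
print(units[:5])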

Measuring a semantic dimension is not an easy task, nor are the results free from doubt. The Wikipedia article on psychological testing is woefully below par for Wikipedia. I did manage to locate course materials for a psychology “research methods” course at UC Davis: PSC 41 Research Methods. In particular, take a look at the Scaling module to get an idea of what it means to construct a scale for a semantic dimension.

I don’t have a handle on current social science research methods but the links in Free Resources for Program Evaluation and Social Research Methods should give you a place to start.

BTW, the original WaveCluster paper, WaveCluster: A multi-resolution clustering approach for very large spatial databases, has the following abstract:

Many applications require the management of spatial data. Clustering large spatial databases is an important problem which tries to find the densely populated regions in the feature space to be used in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shape. It must be insensitive to the outliers (noise) and the order of input data. We propose WaveCluster, a novel clustering approach based on wavelet transforms, which satisfies all the above requirements. Using multi-resolution property of wavelet transforms, we can effectively identify arbitrary shape clusters at different degrees of accuracy. We also demonstrate that WaveCluster is highly efficient in terms of time complexity. Experimental results on very large data sets are presented which show the efficiency and effectiveness of the proposed approach compared to the other recent clustering methods. (Emphasis added.)

The ranges of spatial data dimensions are not in doubt, so quantization makes sense. The “ranges” of semantic dimensions, on the other hand, are not nearly so certain.

As I pointed out at the beginning, this is very important research, and used with appropriate data sets it can make a real difference. Used with inappropriate data sets, it can cost your employer, and possibly your customers, a good deal of time and effort.

September 1, 2011

TeX line breaking algorithm in JavaScript

Filed under: Interface Research/Design,Typography — Patrick Durusau @ 6:07 pm

TeX line breaking algorithm in JavaScript by Bram Stein.

From the post:

This is an implementation of the Knuth and Plass line breaking algorithm using JavaScript and the HTML5 canvas element. The goal of this implementation is to optimally set justified text in the new HTML5 canvas element, and ultimately provide a library for various line breaking algorithms in JavaScript.

This is very impressive and will reassure your topic map clients that you pay attention to detail. Work remains to be done, here and elsewhere, on browser displays.
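For readers who have not met the Knuth and Plass approach, the key difference from greedy wrapping is that break points are chosen globally, minimizing a total “badness” over the whole paragraph rather than filling one line at a time. A stripped-down dynamic programming sketch of that idea (my own toy Python version, not Bram Stein’s JavaScript, and ignoring glue, penalties and hyphenation) looks like this:

# Toy sketch of optimal line breaking: cost of a line is the square of its
# unused space, the last line is free, and total cost is minimized by
# dynamic programming. Assumes no single word exceeds line_width.
def break_paragraph(words, line_width):
    n = len(words)
    lengths = [len(w) for w in words]

    def line_cost(i, j):
        """Badness of putting words[i:j] on one line, or None if they overflow."""
        width = sum(lengths[i:j]) + (j - i - 1)   # word lengths plus single spaces
        if width > line_width:
            return None
        if j == n:                                # the last line carries no slack penalty
            return 0
        return (line_width - width) ** 2

    # best[i] = minimal total badness for words[i:]; choice[i] = end of the first line
    best = [0.0] * (n + 1)
    choice = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        for j in range(i + 1, n + 1):
            cost = line_cost(i, j)
            if cost is None:                      # words[i:j] overflow; longer lines will too
                break
            if cost + best[j] < best[i]:
                best[i], choice[i] = cost + best[j], j

    lines, i = [], 0                              # walk the choices to rebuild the lines
    while i < n:
        lines.append(" ".join(words[i:choice[i]]))
        i = choice[i]
    return lines

for line in break_paragraph("this is a small example of optimal line breaking".split(), 16):
    print(line)

The real algorithm adds stretchable and shrinkable glue, penalty items and hyphenation; the sketch only captures the global optimization that distinguishes it from greedy wrapping.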

This was forwarded to me by Sam Hunting.

Spatio Temporal data Integration and Retrieval

Filed under: Conferences,Data Integration,Information Retrieval,Spatial Index — Patrick Durusau @ 6:06 pm

STIR 2012 : ICDE 2012 Workshop on Spatio Temporal data Integration and Retrieval

Dates:

When: Apr 1, 2012
Where: Washington, DC, USA
Submission deadline: Oct 21, 2011

From the notice:

International Workshop on Spatio Temporal data Integration and Retrieval (STIR2012) in conjunction with ICDE 2012

April 1, 2012, Washington DC, USA

http://research.ihost.com/stir12/index.html

As the world’s population increases and it puts increasing demands on the planet’s limited resources due to shifting life-styles, we not only need to monitor how we consume resources but also optimize resource usage. Some examples of the planet’s limited resources are water, energy, land, food and air. Today, significant challenges exist for reducing usage of these resources, while maintaining quality of life. The challenges range from understanding regionally varied impacts of global environmental change, through tracking diffusion of avian flu and responding to natural disasters, to adapting business practice to dynamically changing resources, markets and geopolitical situations. For these and many other challenges reference to location – and time – is the glue that connects disparate data sources. Furthermore, most of the systems and solutions that will be built to solve the above challenges are going to be heavily dependent on structured data (generated by sensors and sensor based applications) which will be streaming in real-time, come in large volumes and will have spatial and temporal aspects to them.

This workshop is focused on making the research in information integration and retrieval more relevant to the challenges in systems with significant spatial and temporal components.

Sounds like they are playing our song!

Everything is Subjective

Filed under: Marketing,Topic Maps — Patrick Durusau @ 6:05 pm

Everything is Subjective

I ran across Peter Brown’s keynote presentation at TMRA 2008 this morning while looking for something else.

You will see why I paused as soon as you load the slides. 😉

Having the advantage of three years to think about the issues Peter raises, I think part of his conclusion:

It’s my world – I want to organize it around what is important for me, not you.

Is spot on.

I mention it because the tenth anniversary of 9/11 approaches and the U.S. intelligence agencies are still organizing their data around what is important to them, the number one goal being to increase the importance and budget allocation of their own agency over the others.

Still, if you can see it, you can map it and having mapped it, merge data with your own. I won’t ask how you got it.

What every computer science major should know

Filed under: CS Lectures — Patrick Durusau @ 6:02 pm

What every computer science major should know by Matthew Might.

Matthew is an assistant professor in the CS department at the University of Utah.

I have seen similar lists but this one struck me as particularly well-organized.

You may enjoy scanning the index of his blog posts.

Getting Started with MALLET and Topic Modeling

Filed under: MALLET,Topic Models (LDA) — Patrick Durusau @ 6:01 pm

Getting Started with MALLET and Topic Modeling

If you don’t remember MALLET, take a look at: MALLET: MAchine Learning for LanguagE Toolkit Topic Map Competition (TMC) Contender?

Shawn, the author of the post, is very interested in applying topic modeling to a variety of historical texts.

His blog, Electric Archaeology: Digital Media for Learning and Research, looks very interesting. It covers: “Agent based modeling, games, virtual worlds, and online education for archaeology and history.”

This is the sort of person who might be interested in topic maps and related technologies.

As far as I know, there is still a real lack of example-driven texts that would introduce most humanists to modern software.

An Introduction to Clojure and Its Capabilities for Data Manipulation

Filed under: Clojure,Data Structures — Patrick Durusau @ 6:01 pm

An Introduction to Clojure and Its Capabilities for Data Manipulation by Jean-François “Jeff” Héon.

From the post:

I mainly use Java at work in an enterprise setting, but I’ve been using Clojure at work for small tasks like extracting data from log files or generating or transforming Java code. What I do could be done with more traditional tools like Perl, but I like the readability of Clojure combined with its Java interoperability. I particularly like the different ways functions can be used in Clojure to manipulate data.

I will only be skimming the surface of Clojure in this short article and so will present a simplified view of the concepts. My goal is for the reader to get to know enough about Clojure to decide if it is worth pursuing further using longer and more complete introduction material already available.

I will start with a mini introduction to Clojure, followed by an overview of sequences and functions combination, and finish off with a real-world example.

You will encounter immutable data structures, so be forewarned.

I wonder to what degree mutable data structures arose originally due to lack of storage space and processor limitations. I will have to make a note to check that out.

Greenplum Community

Filed under: Algorithms,Analytics,Machine Learning,SQL — Patrick Durusau @ 6:00 pm

A post by Alex Popescu, Data Scientist Summit Videos, led me to discover the Greenplum Community.

Hosted by Greenplum:

Greenplum is the pioneer of Enterprise Data Cloud™ solutions for large-scale data warehousing and analytics, providing customers with flexible access to all their data for business intelligence and advanced analytics. Greenplum offers industry-leading performance at a low cost for companies managing terabytes to petabytes of data. Data-driven businesses around the world, including NASDAQ, NYSE Euronext, Silver Spring Networks and Zions Bancorporation, have adopted Greenplum Database-based products to support their mission-critical business functions.

Registration (free) brings access to the videos from the Data Scientist Summit.

The “community” is focused on Greenplum software (there is a “community” edition). Do be aware that Greenplum Database CE is a 1.7 GB download.

Common Lisp HyperSpec

Filed under: Lisp — Patrick Durusau @ 5:59 pm

Common Lisp HyperSpec

A hypertext version of the Common Lisp standard, along with issues resolved in its making.

Since functional programming discussions make reference to Lisp fairly often, you might want to bookmark this site.
