Sixth Annual Machine Learning Symposium

January 26th, 2012

Sixth Annual Machine Learning Symposium sponsored by the New York Academy of Sciences.

There were eighteen (18) presentations and any attempt on my part to summarize them would do an injustice to one or more of them.

Post your comments and suggestions for which ones I should watch first. Thanks!

Employee productivity: 21 critical minutes (no-line-item (nli) in the budget?)

January 26th, 2012

Employee productivity: 21 critical minutes by Gilles ANDRE.

From the post:

Twenty-one minutes a day. That’s how long employees spend each day searching for information they know exists but is hard to find. These 21 minutes cost their company the equivalent of €1,500 per year per employee. That’s an average of two whole working weeks. This particular Mindjet study is, of course, somewhat anecdotal and some research firms such as IDC put the figure as high as €10,000 per year. These findings signal a new challenge facing businesses: employees know that the information is there, but they cannot find it. This stalemate can become extremely costly and, in some cases, can even kill off a business. Are companies really aware of this problem?

(paragraph and graphic omitted)

So far, companies have responded to this rising tide of data by spending money. They have invested large, even enormous sums in solutions to store, secure and access their information – one of the key assets of their business. They have also invested heavily in a range of different applications to meet their operational needs. Yet these same applications have created vast information silos spanning their entire organisation. Interdepartmental communication is stifled and information travels like vehicles on the M25 during rush hour.

The link to Mindjet goes to their corporate website, not to the study. Ironically, I did search the Mindjet site (search being the solution Polyspot suggests) and came up empty for “21 minutes.” You would think that string would appear somewhere in the report.

I suspect 21 minutes would be on the low side of lost employee productivity on a daily basis.
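The “two whole working weeks” figure in the quote is easy to sanity-check. A minimal sketch, assuming roughly 220 working days a year and a 40-hour week (both numbers are my assumptions, not figures from the Mindjet study):

```python
# Back-of-the-envelope check of the "21 minutes a day" claim.
# ASSUMPTIONS (not from the study): 220 working days/year, 40-hour week.
MINUTES_PER_DAY = 21
WORKING_DAYS_PER_YEAR = 220   # assumed
HOURS_PER_WEEK = 40           # assumed
COST_PER_YEAR_EUR = 1500      # figure quoted in the post

hours_lost = MINUTES_PER_DAY * WORKING_DAYS_PER_YEAR / 60
weeks_lost = hours_lost / HOURS_PER_WEEK
implied_hourly_cost = COST_PER_YEAR_EUR / hours_lost

print(f"hours lost per year:   {hours_lost:.1f}")
print(f"working weeks lost:    {weeks_lost:.2f}")
print(f"implied cost per hour: EUR {implied_hourly_cost:.2f}")
```

Under those assumptions the loss comes to about 77 hours a year, which is close to the two weeks quoted. Change the assumed constants to match your own business.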

But it isn’t hard to discover why businesses have failed to address that loss in employee productivity.

Take out the latest annual report for your business with a line item budget in it. Examine it carefully and then answer the following question:

At what line item is lost employee productivity reported?

Now imagine that your CIO proposes to make information once found, found for all employees. A mixture of a search engine, indexing, topic map, with a process to keep it updated.

You don’t know the exact figures but do you think there would be a line item in the budget for such a project?

And, would there be metrics to determine if the project succeeded or failed?

Ah, so, if the business continues to lose employee productivity there is no metric for success or failure and it never shows up as a line item in the budget.

That is the safe position.

At least until the business collapses and/or is overtaken by other companies.

If you are interested in overtaking no-line-item (nli) companies, consider evolving search applications that incorporate topic maps.

Topic maps: Information once found, stays found.

Spring onto Heroku

January 26th, 2012

Spring onto Heroku by Andreas Kollegger.

From the post:

Deploying your application into the cloud is a great way to scale from “wouldn’t it be cool if..” to giving interviews to Forbes, Fast Company, and Jimmy Fallon. Heroku makes it super easy to provision everything you need, including a Neo4j Add-on. With a few simple adjustments, your Spring Data Neo4j application is ready to take that first step into the cloud.

Let’s walk through the process, assuming this scenario:

Ready? OK, first let’s look at your application.

As one commenter noted, just in time for the Neo4j Challenge!

Measuring User Retention with Hadoop and Hive

January 26th, 2012

Measuring User Retention with Hadoop and Hive by Daniel Russo.

From the post:

The Hadoop ecosystem is comprised of numerous technologies that can work together to provide a powerful and scalable mechanism for analyzing and deriving insight from large quantities of data.

In an effort to showcase the flexibility and raw power of queries that can be performed over large datasets stored in Hadoop, this post is written to demonstrate an example use case. The specific goal is to produce data related to user retention, an important metric for all product companies to analyze and understand.

Compelling demonstration of the power of Hadoop and Hive to measure raw user retention, in an “app” situation.
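To make “raw user retention” concrete, here is a minimal sketch of the calculation on a toy scale. It is plain Python, not the Hive queries from the post, and the event log, the `retention` function and the exact-day definition of retention are all my own assumptions for illustration:

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity log: (user_id, day the user was active).
events = [
    ("u1", date(2012, 1, 1)), ("u1", date(2012, 1, 2)), ("u1", date(2012, 1, 8)),
    ("u2", date(2012, 1, 1)), ("u2", date(2012, 1, 2)),
    ("u3", date(2012, 1, 2)), ("u3", date(2012, 1, 9)),
]

def retention(events, offset_days):
    """Fraction of each cohort (users grouped by first active day)
    that was active again exactly `offset_days` after that first day."""
    active_days = defaultdict(set)
    for user, day in events:
        active_days[user].add(day)

    cohort_size = defaultdict(int)
    cohort_retained = defaultdict(int)
    for user, days in active_days.items():
        first = min(days)
        cohort_size[first] += 1
        if any((d - first).days == offset_days for d in days):
            cohort_retained[first] += 1

    return {c: cohort_retained[c] / cohort_size[c] for c in cohort_size}

day1 = retention(events, 1)   # active the day after first use
day7 = retention(events, 7)   # active a week after first use
```

Here the January 1 cohort has day-1 retention of 1.0 and day-7 retention of 0.5. The grouping by first-seen day is the part that becomes expensive at scale, which is where Hadoop and Hive earn their keep.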

Question:

User retention isn’t a new issue. Does anyone know what strategies were used to measure it before Hadoop and Hive?

The reason I ask is that prior analysis of user retention may point the way towards data or relationships it wasn’t possible to capture before.

For example, when an app falls into non-use or is uninstalled, what impact (if any) does that have on known “friends” and their use of the app?

Are there any patterns to non-use/uninstalls over short or long periods of time in identifiable groups? (A social behavior type question.)

Neo4j Internals

January 26th, 2012

Neo4j Internals

From the description:

At the Neo4j London user group, we’ve seen many talks on how to use Neo4j for exploiting connected data. But how does Neo4j make working with connected data so effective? In this presentation we’ll find out how as Neo4j hacker Tobias Lindaaker takes us on a guided tour through the Neo4j’s internals. We’ll discover how the internal data structures are leveraged to provide fast traversals, how live backups work, and how multiple servers synchronize in an HA cluster. As a Neo4j user you’ll find a working knowledge of the database will give you enough “mechanical sympathy” to make your data really fly. And after this talk you’ll feel confident contributing code that scratches your connected data itches.

Posting the slides for this presentation would be very helpful. Camera work is good but this is the sort of material that needs to be studied in detail.

Interesting comparison between Gremlin and Cypher. Gremlin as a DSL in Groovy has a full programming language available.

I can’t promise that this presentation will make you a better Neo4j user/developer, but it won’t hurt. ;-)

New Opportunities for Connected Data (logic, contagion relationships and merging)

January 26th, 2012

New Opportunities for Connected Data by Ian Robinson, Neo Technology, Inc.

An in-depth discussion of relational, NoSQL and graph database views of the world.

I must admit to being surprised when James Frazer’s Golden Bough came up in the presentation. It was used quite effectively as an illustration but I have learned to not expect humanities references or examples in CS presentations. This was a happy exception.

I agree with Ian that the relational world view remains extremely useful but also that it limits the data that can be represented and queried.

Complex relationships between entities simply don’t come up with relational databases because they aren’t easy (if possible) to represent.

I would take Ian’s point a step further and point out that logic, as in RDF and the Semantic Web, is a similar constraint.

Logic can be very useful in any number of areas, just like relational databases, but it only represents a very small slice of the world. A slice of the world that can be represented quite artificially without contradictions, omissions, inconsistencies, or any of the other issues that make logic systems fall over clutching their livers.

BTW, topic mappers need to take a look at timemark 34.26. The representation of the companies who employ workers and the “contagion” relationships. (You will have to watch the video to find out why I say “contagion.” It is worth the time.) Does that suggest to you that I could point topics to a common node based on their possession of some property, say a subject identifier? Such that when I traverse any of those topics I can go to the common node and produce a “merged” result if desired?

I say that because any topic could point to more than one common node, depending upon the world view of an author. That could be very interesting in terms of comparing how authors would merge topics.
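The common-node idea can be sketched in a few lines. This is a toy illustration in plain Python, not Neo4j code and not anything shown in the presentation; the topics, the `sid` property and both function names are hypothetical:

```python
from collections import defaultdict

# Hypothetical topics; "sid" stands in for a subject identifier.
topics = [
    {"id": "t1", "sid": "http://example.org/neo4j", "names": {"Neo4j"}},
    {"id": "t2", "sid": "http://example.org/neo4j", "names": {"neo4j graph db"}},
    {"id": "t3", "sid": "http://example.org/hive", "names": {"Hive"}},
]

def build_common_nodes(topics, key):
    """Point every topic at a common node keyed by the value of `key`."""
    common = defaultdict(list)
    for topic in topics:
        common[topic[key]].append(topic["id"])
    return common

def merged_view(topic_id, topics, common, key):
    """Traverse topic -> common node -> sibling topics and union their names."""
    by_id = {t["id"]: t for t in topics}
    siblings = common[by_id[topic_id][key]]
    names = set()
    for sibling in siblings:
        names |= by_id[sibling]["names"]
    return {"members": sorted(siblings), "names": names}

common = build_common_nodes(topics, "sid")
view = merged_view("t1", topics, common, "sid")
```

The topics themselves are never altered; the “merged” result is computed on demand by going through the common node. Building a second set of common nodes on a different property would give a second author's view of what merges, and the two could then be compared.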

AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB

January 26th, 2012

AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB by Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.

From the post:

Apache Hadoop and NoSQL databases are complementary technologies that together provide a powerful toolbox for managing, analyzing, and monetizing Big Data. That’s why we were so excited to provide out-of-the-box Amazon Elastic MapReduce (Amazon EMR) integration with Amazon DynamoDB, providing customers an integrated solution that eliminates the often prohibitive costs of administration, maintenance, and upfront hardware. Customers can now move vast amounts of data into and out of DynamoDB, as well as perform sophisticated analytics on that data, using EMR’s highly parallelized environment to distribute the work across the number of servers of their choice. Further, as EMR uses a SQL-based engine for Hadoop called Hive, you need only know basic SQL while we handle distributed application complexities such as estimating ideal data splits based on hash keys, pushing appropriate filters down to DynamoDB, and distributing tasks across all the instances in your EMR cluster.

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

Time to get that AWS account!

Persisting relationship entities in Neo4j

January 26th, 2012

Persisting relationship entities in Neo4j by Sunil Prakash Inteti

From the post:

Neo4j is a high-performance, NOSQL graph database with all the features of a mature and robust database. In Neo4j data gets stored in nodes connected to each other by relationship entities that carry its own properties. These relationships are very important in graphs and helps to traverse the graph and make decisions. This blog discusses the two ways to persist a relationship between nodes and also the scenario’s which suits their respective usage. Spring-data-neo4j by springsource gives us the flexibility of using the spring programming model when working with neo4j database. The code examples in this blog will be using spring-data-neo4j.

Excellent post that illustrates that how a relationship is persisted can make a big difference in performance. Very much design guideline material.

Tenzing: A SQL Implementation On The MapReduce Framework

January 26th, 2012

Tenzing: A SQL Implementation On The MapReduce Framework by Biswapesh Chattopadhyay, Liang Lin, Weiran Liu, Sagar Mittal, Prathyusha Aragonda, Vera Lychagina, Younghee Kwon and Michael Wong.

Abstract:

Tenzing is a query engine built on top of MapReduce for ad hoc analysis of Google data. Tenzing supports a mostly complete SQL implementation (with several extensions) combined with several key characteristics such as heterogeneity, high performance, scalability, reliability, metadata awareness, low latency, support for columnar storage and structured data, and easy extensibility. Tenzing is currently used internally at Google by 1000+ employees and serves 10000+ queries per day over 1.5 petabytes of compressed data. In this paper, we describe the architecture and implementation of Tenzing, and present benchmarks of typical analytical queries.

Of the conclusions of the authors:

  • It is possible to create a fully functional SQL engine on top of the MapReduce framework, with extensions that go beyond SQL into deep analytics.
  • With relatively minor enhancements to the MapReduce framework, it is possible to implement a large number of optimizations currently available in commercial database systems, and create a system which can compete with commercial MPP DBMS in terms of throughput and latency.
  • The MapReduce framework provides a combination of high performance, high reliability and high scalability on cheap unreliable hardware, which makes it an excellent platform to build distributed applications that involve doing simple to medium complexity operations on large data volumes.
  • By designing the engine and the optimizer to be aware of the characteristics of heterogeneous data sources, it is possible to create a smart system which can fully utilize the characteristics of the underlying data sources.
The last one is of the most interest to me. Which one interests you the most?

    BTW, the authors mention:

    We are working on various other enhancements and believe we can cut this time down to less than 5 seconds end-to-end, which is fairly acceptable to the analyst community.

    I think the analyst community needs to use 2400 baud modems for a month or two. ;-)

    Sub-5 second performance is sometimes useful, even necessary. But as a general requirement?
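For anyone wondering how SQL sits on top of MapReduce at all, here is a minimal sketch of a single GROUP BY query expressed as map, shuffle and reduce steps. It is an in-memory toy in plain Python and says nothing about how Tenzing itself is implemented; the table, column names and function names are hypothetical:

```python
from collections import defaultdict

# Hypothetical rows of a table "queries(user, latency_ms)".
rows = [
    {"user": "ann", "latency_ms": 120},
    {"user": "bob", "latency_ms": 300},
    {"user": "ann", "latency_ms": 80},
]

def map_phase(row):
    """Emit (group key, value): the GROUP BY column and the aggregated column."""
    yield row["user"], row["latency_ms"]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Apply the aggregates: COUNT(*) and SUM(latency_ms) per group."""
    return key, len(values), sum(values)

# SELECT user, COUNT(*), SUM(latency_ms) FROM queries GROUP BY user
pairs = (pair for row in rows for pair in map_phase(row))
result = sorted(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The result is one row per user with a count and a sum. The hard parts the paper addresses (optimization, heterogeneous sources, latency) are exactly what this toy leaves out.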

    Google: MoreSQL is Real

    January 26th, 2012

    Google: MoreSQL is Real by William Edwards.

    One comment on the post summarized it:

    Super rant that really crystallized my discomfort with the whole NoSQL business .. At the end of the day, it’s a ‘war’ between various APIs to access B/B+ trees !

    Well, but it is an enjoyable rant, so read it and see for yourself.

    I do think one of the advantages of all the hype has been an increase in at least considering different options and data structures. Some of them will be less useful than the ones that are common now, but it only takes one substantial improvement to make it all worthwhile.

    Introduction to Graph Databases

    January 26th, 2012

    Introduction to Graph Databases

    Thursday, January 26 at 10:00 PST

    From the registration page:

    Join this webinar for a fast paced introduction to graph databases, taught by Emil Eifrem, CEO of Neo Technology.

    This webinar is for Java developers, but no previous knowledge of graph databases is required.

    Learn:

    • use cases for graph databases
    • specific coding techniques for working with a graph database

    I hate to post anything early in the day and so break “form” as it were but thought you might need time to register, etc. ;-)

    Introduction data structure for GraphDB

    January 25th, 2012

    Introduction data structure for GraphDB by Shunya Kimura.

    Detailed examination of the data structures that manage nodes and relationships between nodes. Highly recommended.

    RuleML 2012

    January 25th, 2012

    RuleML 2012: The 6th International Symposium on Rules: Research Based and Industry Focused

    Important dates:

    Abstract submission: March 25, 2012
    Paper submission: April 1, 2012
    Notification of acceptance/rejection: May 20, 2012
    Camera-ready copy due: June 10, 2012
    RuleML-2012 dates: August 27-29, 2012

    The International Symposium on Rules, RuleML, has evolved from an annual series of international workshops since 2002, international conferences in 2005 and 2006, and international symposia since 2007. This year the RuleML Symposium will be held in conjunction with ECAI 2012, the 20th biennial European Conference on Artificial Intelligence, in Montpellier, France, August 27-29, 2012.

    RuleML-2012@ECAI is a research-based, industry-focused symposium: its main goal is to build a bridge between academia and industry in the field of rules and semantic technology, and so to stimulate the cooperation and interoperability between business and research, by bringing together rule system providers, participants in rule standardization efforts, open source communities, practitioners, and researchers. The concept of the symposium has also advanced continuously in the face of extremely rapid progress in practical rule and event processing technologies. As a result, RuleML-2012 will feature hands-on demonstrations and challenges alongside a wide range of thematic tracks. It will thus be an exciting venue to exchange new ideas and experiences on all issues related to the engineering, management, integration, interoperation, and interchange of rules in distributed enterprise intranets and open distributed environments.

    We invite you to share your ideas, results, and experiences: as an industry practitioner, rule system provider, technical expert and developer, rule user or researcher, exploring foundations, developing systems and applications, or using rule-based systems.

    We invite high-quality submissions related to (but not limited to) one or more of the following topics:

    • Rules and Automated Reasoning
    • Logic Programming and Non-monotonic Reasoning
    • Int. Conference track on Pragmatic Web (see track description below)
    • Rule-Based Policies, Reputation and Trust
    • Rule-based Event Processing and Reaction Rules
    • Fuzzy Rules and Uncertainty
    • Rule Transformation, Extraction and Learning
    • Vocabularies, Ontologies, and Business rules
    • Rules in online-market research and online marketing
    • Rule Markup Languages and Rule Interchange
    • General Rule Topics

    Late summer, Montpellier, France, an interesting meeting, what more would you want?

    Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine

    January 25th, 2012

    Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine by Aidan Hogan, Andreas Harth, Jürgen Umbrich, Sheila Kinsella, Axel Polleres and Stefan Decker.

    Abstract:

    In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data – loosely also known as Linked Data – which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web – in terms of scale, unreliability, inconsistency and noise – are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.

    This is the paper that Ivan Herman mentions at Nice reading on Semantic Search.

    It covers a lot of ground in fifty-five (55) pages but it doesn’t take long to hit an issue I wanted to ask you about.

    At page 2, Google is described as follows:

    In the general case, Google is not suitable for complex information gathering tasks requiring aggregation from multiple indexed documents: for such tasks, users must manually aggregate tidbits of pertinent information from various recommended heterogeneous sites, each such site presenting information in its own formatting and using its own navigation system. In effect, Google’s limitations are predicated on the lack of structure in HTML documents, whose machine interpretability is limited to the use of generic markup-tags mainly concerned with document rendering and linking. Although Google arguably makes the best of the limited structure available in such documents, most of the real content is contained in prose text which is inherently difficult for machines to interpret. Addressing this inherent problem with HTML Web data, the Semantic Web movement provides a stack of technologies for publishing machine-readable data on the Web, the core of the stack being the Resource Description Framework (RDF).

    A couple of observations:

    Although Google needs no defense from me, I would argue that Google never set itself the task of aggregating information from indexed documents. Historically speaking, IR has always been concerned with returning relevant documents and not returning irrelevant documents.

    Second, the lack of structure in HTML documents (although the article mixes in sites with different formatting) is no deterrent to a human reader aggregating “tidbits of pertinent information….” I rather doubt that writing all the documents in valid Springer LaTeX would make that much difference on the “tidbits of pertinent information” score.

    This is my first pass through the article and I suspect it will take three or more to become comfortable with it.

    Do you agree/disagree that the task of IR is to retrieve documents, not “tidbits of pertinent information?”

    Do you agree/disagree that HTML structure (or lack thereof) is that much of an issue for interpretation of documents?

    Thanks!

    Mule Studio – Getting Started

    January 25th, 2012

    Very rough notes on the basic introduction to Mule Studio.

    Getting Started with Mule Studio

    Under What’s Next, the first line reads:

    Cloud computing is closer than ever before. Why not start by checking out our fifteen-minute Basic Tutorial?

    What does “Cloud computing is closer than ever before.” have to do with Mule Studio? The implication is that the “fifteen-minute Basic Tutorial” is going to teach me about “cloud computing.”

    Basic Studio Tutorial

    Suggestion: Either call the software MuleStudio, or Studio or Mule, or start with one and say: “we use herein ….” and then use it. Consistent naming isn’t that hard.

    BTW, documentation should be updated to reflect current directory structure used with examples.

    Documentation says: \MuleStudio\Examples\SpellChecker\InXML.

    MuleStudio\examples\SpellChecker\spellcheck.xml is the actual path.

    OK, so the directory is missing the InXML and OutXML directories.

    Ah, it is expecting an empty InXML directory so it fails if you point to the directory structure as delivered.

    So:

    MuleStudio\examples\SpellChecker mkdir InXML, and

    MuleStudio\examples\SpellChecker mkdir OutXML

    I left spellcheck.xml in the SpellChecker directory (remember to copy, not move so you don’t lose the file, or you could create another one).

    Works.

    For a confidence building example I would not switch back and forth between Windows and Unix file/path syntax. Pick one and stay with it.

    I would not suggest other things to explore during the first example. Could have a staged follow up after the first example, but only afterwards.

    Oh, and fix the paths or say they have to be built in order for the first example to run. If possible configure the default window so the message box isn’t so small. Users need to see something happening.

    One more thing, the example is vague about whether the OS moves the file or if the file can be moved in MuleStudio. I used the OS just out of habit. I did look and there was no obvious way to copy/move the file in the application.

    The narrative could be smoother, and with the technical errors fixed it would be an adequate introduction to the software. It would be a better introduction if some motive were given for the example application, a “why would I care?” sort of thing. Leveraging the power of Google or something like that.

    Watch for notes on the intermediate tutorial in a day or so.


    Summary:

    In the SpellChecker directory (under MuleStudio/examples) add InXML and OutXML directories, thus:

    mkdir InXML

    mkdir OutXML

    Leave spellcheck.xml in the SpellChecker directory until told to copy it into InXML
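If you would rather script the fix than type it, the summary above can be sketched as follows. The function names are mine, and the demonstration runs against a throwaway directory standing in for the SpellChecker directory, since the real location depends on where you unpacked Mule Studio:

```python
import shutil
import tempfile
from pathlib import Path

def prepare_spellchecker(example_dir):
    """Create the InXML and OutXML directories the tutorial expects.

    Returns the two directory paths. spellcheck.xml is left where it is.
    """
    example_dir = Path(example_dir)
    in_dir = example_dir / "InXML"
    out_dir = example_dir / "OutXML"
    in_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    return in_dir, out_dir

def drop_into_inbox(example_dir):
    """Copy (not move) spellcheck.xml into InXML so the original survives."""
    example_dir = Path(example_dir)
    return Path(shutil.copy(example_dir / "spellcheck.xml", example_dir / "InXML"))

# Demonstration against a temporary stand-in for the SpellChecker directory.
demo = Path(tempfile.mkdtemp()) / "SpellChecker"
demo.mkdir()
(demo / "spellcheck.xml").write_text("<placeholder/>")
in_dir, out_dir = prepare_spellchecker(demo)
copied = drop_into_inbox(demo)
```

Call `prepare_spellchecker` before starting the example and `drop_into_inbox` only when the tutorial tells you to copy the file.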

    Blogging Prize – Mule Studio

    January 25th, 2012

    From my inbox:

    Blog About Mule Studio, Get a T-Shirt

    Are you as excited about Mule Studio as we are? If so, blog about it and send a link to your blog and your postal address to reply@mulesoft.com and we’ll send you a Mule T-Shirt.

    Even if you don’t need the T-Shirt download Mule Studio and take the tour.

    Interfaces are always slightly different, range/ease of operations offered vary, documentation varies wildly, so if nothing else, you will learn something in the process.

    And, if you blog about it, etc., you will get a new T-Shirt.

    Something to look forward to in the mailbox!


    A couple of notes on getting started, not that you need them but someone else may:

    Step 1 reads:

    Before you unzip the muleStudio package, ensure that it has the permissions required for installation. To set these permissions, open a console and execute the following command:
    chmod u+x muleStudio

    The muleStudio folder or directory appears when the unzip operation completes.

    Err? Permission to install?

    Permission to install is a user privilege question, not setting the file to be executable.

    On Linux (Ubuntu 10.10) I just tossed it into my /home/patrick/working directory where I keep all manner of software. It’s just me on the box so I don’t have to worry about making apps available to others.

    But, after you unzip the file you do have to:

    chmod u+x muleStudio*

    BTW, the folder I got was: MuleStudio, so my path is /home/patrick/working/MuleStudio.

    Step 2 Execute reads:

    Unzip the muleStudio package, which is located in the following path:
    /MuleStudio
    Enter the following command in the console to launch muleStudio:
    ./muleStudio
    Alternatively, double click the muleStudio file in the Linux graphic interface, as shown above

    Err, but we just unzipped it, yes?

    Let’s re-write steps 1 and 2:

    Step 1:

    Unzip the Mule Studio package for your system into a convenient location.

    The folder or directory name will be MuleStudio.

    Step 2:

    Change to the MuleStudio directory.

    Make the muleStudio* file executable with the command:

    chmod u+x muleStudio*

    Start the program by:

    Double-clicking on muleStudio* in the graphic interface, or

    entering the command:

    ./muleStudio*

    That is trivial in terms of improving the use of MuleStudio but when clear writing becomes a habit, more difficult topics become easier for users.
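For completeness, the `chmod u+x` in the rewritten Step 2 only adds the owner-execute bit to a file that already exists, which is why it cannot come before the unzip. A minimal sketch of the same operation on a POSIX system, using a temporary file as a stand-in for the muleStudio launcher (the function name is mine):

```python
import os
import stat
import tempfile

def make_user_executable(path):
    """Equivalent of `chmod u+x path`: add the owner-execute bit, keep the rest."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)

# Demonstration on a throwaway file standing in for the muleStudio launcher.
fd, launcher = tempfile.mkstemp()
os.close(fd)
os.chmod(launcher, 0o600)   # start without any execute bit
make_user_executable(launcher)
is_executable = bool(os.stat(launcher).st_mode & stat.S_IXUSR)
```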

    Documents as geometric objects: how to rank documents for full-text search

    January 25th, 2012

    Documents as geometric objects: how to rank documents for full-text search by Michael Nielsen, July 7, 2011.

    From the post:

    When we type a query into a search engine – say “Einstein on relativity” – how does the search engine decide which documents to return? When the document is on the web, part of the answer to that question is provided by the PageRank algorithm, which analyses the link structure of the web to determine the importance of different webpages. But what should we do when the documents aren’t on the web, and there is no link structure? How should we determine which documents most closely match the intent of the query?

    In this post I explain the basic ideas of how to rank different documents according to their relevance. The ideas used are very beautiful. They are based on the fearsome-sounding vector space model for documents. Although it sounds fearsome, the vector space model is actually very simple. The key idea is to transform search from a linguistic problem into a geometric problem. Instead of thinking of documents and queries as strings of letters, we adopt a point of view in which both documents and queries are represented as vectors in a vector space. In this point of view, the problem of determining how relevant a document is to a query is just a question of determining how parallel the query vector and the document vector are. The more parallel the vectors, the more relevant the document is.

    This geometric way of treating documents turns out to be very powerful. It’s used by most modern web search engines, including (most likely) web search engines such as Google and Bing, as well as search libraries such as Lucene. The ideas can also be used well beyond search, for problems such as document classification, and for finding clusters of related documents. What makes this approach powerful is that it enables us to bring the tools of geometry to bear on the superficially very non-geometric problem of understanding text.

    Very much looking forward to future posts in this series. There is no denying the power of the “vector space model,” but that leaves a question unasked: what is lost in the transition from linguistic to geometric space?
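The “how parallel are the vectors” idea fits in a few lines. A minimal sketch, assuming raw term counts as vector components and whitespace tokenization (real engines use weighting such as tf-idf, and the post may develop things differently); the function names and sample documents are mine:

```python
import math
from collections import Counter

def to_vector(text):
    """Represent text as a term-frequency vector (word -> count)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 1.0 means parallel."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, documents):
    """Return documents sorted from most to least parallel to the query."""
    q = to_vector(query)
    return sorted(documents, key=lambda d: cosine(q, to_vector(d)), reverse=True)

docs = [
    "einstein on relativity and gravity",
    "a history of the bicycle",
    "relativity for beginners",
]
ranked = rank("einstein on relativity", docs)
```

The sketch also shows one thing that is lost: word order. “einstein on relativity” and “relativity on einstein” produce the same vector, so the geometry cannot tell them apart.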

    3rd Globals Challenge

    January 25th, 2012

    3rd Globals Challenge

    Contest starts: 10 Feb 12 18:00 EST
    Contest ends: 17 Feb 12 18:00 EST

    Topic mappers take note:

    All applications must be built using Globals. However, you are also allowed to use additional technologies to supplement Globals (emphasis added, additional technologies, unlike some linked data competitions)

    The email I got reports:

    • A cash prize of USD $3,500 for the winning entry
    • A press release announcing the winning participant and solution
    • A chance to win a free registration for the InterSystems Global Summit

    You might want to drop by Globals to grab a copy of the software and read up on the documentation.

    You can also see the prior challenges. These are non-trivial events but that also means you will learn a lot in the process.

    Nice reading on Semantic Search

    January 25th, 2012

    Nice reading on Semantic Search by Ivan Herman.

    From the post:

    I had a great time reading a paper on Semantic Search[1]. Although the paper is on the details of a specific Semantic Web search engine (DERI’s SWSE), I was reading it as somebody not really familiar with all the intricate details of such a search engine setup and operation (i.e., I would not dare to give an opinion on whether the choice taken by this group is better or worse than the ones taken by the developers of other engines) and wanting to gain a good image of what is happening in general. And, for that purpose, this paper was really interesting and instructive. It is long (cca. 50 pages), i.e., I did not even try to understand everything at my first reading, but it did give a great overall impression of what is going on.

    Interested to hear your take on Ivan’s comments on owl:sameAs.

    The semantics of words, terms, ontology classes are not stable over time and/or users. If you doubt that statement, leaf through the Oxford English Dictionary for ten (10) minutes.

    Moreover, the only semantics we “see” in words, terms or ontology classes are those we assign them. We can discuss the semantics of Hebrew words in the Dead Sea Scrolls but those are our semantics, not those of the original users of those words. May be close to what they meant, may not. Can’t say for sure because we can’t ask and would lack the context to understand the answer if we could.

    Adding more terms to use as supplements to owl:sameAs just increases the chances for variation. And error if anyone is going to enforce their vision of broadMatch on usages of that term by others.

    Berlin Buzzwords 2012

    January 25th, 2012

    Berlin Buzzwords 2012

    Important Dates (all dates in GMT +2)

    Submission deadline: March 11th 2012, 23:59 MEZ
    Notification of accepted speakers: April 6th, 2012, MEZ
    Publication of final schedule: April 13th, 2012
    Conference: June 4/5, 2012

    The call:

    Call for Submission Berlin Buzzwords 2012 – Search, Store, Scale — June 4 / 5. 2012

    The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

    • IR / Search – Lucene, Solr, katta, ElasticSearch or comparable solutions
    • NoSQL – like CouchDB, MongoDB, Jackrabbit, HBase and others
    • Large Data Processing – Hadoop itself, MapReduce, Cascading or Pig and relatives

    Related topics not explicitly listed above are more than welcome. We are looking for presentations on the implementation of the systems themselves, technical talks, real world applications and case studies.

    …(moved dates to top)…

    High quality, technical submissions are called for, ranging from principles to practice. We are looking for real world use cases, background on the architecture of specific projects and a deep dive into architectures built on top of e.g. Hadoop clusters.

    Here is your chance to experience summer in Berlin (Berlin Buzzwords 2012) and in Montreal (Balisage).

    Seriously, both conferences are very strong and worth your attention.