Educating the Planet with Pearson by Marko A. Rodriguez.
From the post:
Pearson is striving to accomplish the ambitious goal of providing an education to anyone, anywhere on the planet. New data processing technologies and theories in education are moving much of the learning experience into the digital space — into massive open online courses (MOOCs). Two years ago Pearson contacted Aurelius about applying graph theory and network science to this burgeoning space. A prototype proved promising in that it added novel, automated intelligence to the online education experience. However, at the time, there did not exist scalable, open-source graph database technology in the market. It was then that Titan was forged in order to meet the requirement of representing all universities, students, their resources, courses, etc. within a single, unified graph. Moreover, beyond representation, the graph needed to be able to support sub-second, complex graph traversals (i.e. queries) while sustaining at least 1 billion transactions a day. Pearson asked Aurelius a simple question: “Can Titan be used to educate the planet?” This post is Aurelius’ answer.
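To give a sense of what “sub-second, complex graph traversals” means in practice, here is a minimal sketch of the kind of Gremlin query Titan is built to answer. The student/course/concept schema and the Gremlin Server endpoint are my own assumptions for illustration, nothing about Pearson’s actual data model appears in the post, and I am using the modern TinkerPop Python client rather than the Blueprints API Titan shipped with at the time.

```python
# Sketch only: assumes a Gremlin Server at localhost:8182 and a hypothetical
# schema with 'student' and 'course' vertices linked by 'enrolled_in' edges,
# and 'course' -> 'concept' edges labeled 'covers'.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = traversal().withRemote(
    DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

# "Courses that cover the same concepts as the courses this student is taking"
# -- the sort of recommendation traversal the post is alluding to.
related_courses = (
    g.V().has("student", "studentId", "s123")
        .out("enrolled_in")        # the student's courses
        .out("covers")             # concepts those courses cover
        .in_("covers")             # courses covering the same concepts
        .dedup()
        .values("name")
        .limit(10)
        .toList()
)
print(related_courses)
```

Whether traversals like this stay sub-second while sustaining a billion transactions a day is precisely the claim the post is making, not something a sketch can settle.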
As much as I like the graph approach in general and Titan in particular, some aspects of this post still leave me uncomfortable.
You don’t need to spin up a very large Cassandra database on Amazon to see the problems.
Consider the number of concepts said to be needed to educate the world: some 9,000, if the chart in the post is to be credited.
By contrast, the Suggested Upper Merged Ontology (SUMO) has “~25,000 terms and ~80,000 axioms when all domain ontologies are combined.”
And those SUMO totals are before you get into the weeds of any particular subject, discipline, or course material.
Or consider the subset of the world’s concepts and facts represented in DBpedia:
The English version of the DBpedia knowledge base currently describes 3.77 million things, out of which 2.35 million are classified in a consistent Ontology, including 764,000 persons, 573,000 places (including 387,000 populated places), 333,000 creative works (including 112,000 music albums, 72,000 films and 18,000 video games), 192,000 organizations (including 45,000 companies and 42,000 educational institutions), 202,000 species and 5,500 diseases.
In addition, we provide localized versions of DBpedia in 111 languages. All these versions together describe 20.8 million things, out of which 10.5 million overlap (are interlinked) with concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 10.3 million unique things in up to 111 different languages; 8.0 million links to images and 24.4 million HTML links to external web pages; 27.2 million data links into external RDF data sets, 55.8 million links to Wikipedia categories, and 8.2 million YAGO categories. The dataset consists of 1.89 billion pieces of information (RDF triples) out of which 400 million were extracted from the English edition of Wikipedia, 1.46 billion were extracted from other language editions, and about 27 million are data links to external RDF data sets. The Datasets page provides more information about the overall structure of the dataset. Dataset Statistics provides detailed statistics about 22 of the 111 localized versions.
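Those DBpedia numbers are easy enough to sanity-check against the public SPARQL endpoint. A rough sketch follows, assuming the SPARQLWrapper Python library; counts drift between DBpedia releases, so expect figures in the same ballpark rather than an exact match.

```python
# Count how many things DBpedia classifies as persons -- the quote above
# reports roughly 764,000 for the English edition it describes.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT (COUNT(DISTINCT ?s) AS ?persons)
    WHERE { ?s a dbo:Person }
""")

bindings = sparql.query().convert()["results"]["bindings"]
print(bindings[0]["persons"]["value"])
```

Even a single class like dbo:Person dwarfs the 9,000 concepts in the Pearson chart, which is rather the point.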
I don’t know if the 9,000 concepts cited in the post would be sufficient for a worldwide Head Start program in multiple languages.
Moreover, why would any sane person want a single unified graph to represent course delivery from Zaire to the United States?
How is a single unified graph going to deal with the diversity of educational institutions around the world? That diversity, I would argue, is a good thing.
It sounds like Pearson is offering a unified view of education.
My suggestion: consider the value of your own diversity before taking Pearson up on that offer.