Archive for the ‘Memory’ Category

bAbI – Facebook Datasets For Automatic Text Understanding And Reasoning

Sunday, February 21st, 2016

The bAbI project

Four papers and datasets on text understanding and reasoning from Facebook.

Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin and Tomas Mikolov. Towards AI Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv:1502.05698.

Felix Hill, Antoine Bordes, Sumit Chopra and Jason Weston. The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations. arXiv:1511.02301.

Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems. arXiv:1511.06931.

Antoine Bordes, Nicolas Usunier, Sumit Chopra and Jason Weston. Simple Question answering with Memory Networks. arXiv:1506.02075.


Math whizzes of ancient Babylon figured out forerunner of calculus

Thursday, January 28th, 2016

The video is very cool and goes along with:

Math whizzes of ancient Babylon figured out forerunner of calculus by Ron Cowen.


What could have happened if a forerunner to calculus wasn’t forgotten for 1400 years?

A sharper question would be:

What if you didn’t lose corporate memory with every promotion, retirement or person leaving the company?

We have all seen it happen and all of us have suffered from it.

What if the investment in expertise and knowledge wasn’t flushed away with promotion, retirement, departure?

That would have to be one helluva ontology to capture everyone’s expertise and knowledge.

What if it wasn’t a single, unified or even “logical” ontology? What if it only represented the knowledge that was important to capture for you and yours? Not every potential user for all time.

Just as we don’t all wear the same uniforms to work every day, we should not waste time looking for a universal business language for corporate memory.

Unless you are in the business of filling seats for such quixotic quests.

I prefer to deliver a measurable ROI, if it’s all the same to you.

Are you ready to stop hemorrhaging corporate knowledge?

Extended Artificial Memory:…

Monday, October 27th, 2014

Extended Artificial Memory: Toward an Integral Cognitive Theory of Memory and Technology by Lars Ludwig. (PDF) (Or you can contribute to the cause by purchasing a printed or Kindle copy of: Information Technology Rethought as Memory Extension: Toward an integral cognitive theory of memory and technology.)

Conventional book-selling wisdom is that a title should provoke people to pick up the book, the first step towards a sale. That must be the thinking behind this title. It just screams “Read ME!”


Seriously, I have read some of the PDF version and this is going on my holiday wish list as a hard copy request.


This thesis introduces extended artificial memory, an integral cognitive theory of memory and technology. It combines cross-scientific analysis and synthesis for the design of a general system of essential knowledge-technological processes on a sound theoretical basis. The elaboration of this theory was accompanied by a long-term experiment for understanding [Erkenntnisexperiment]. This experiment included the agile development of a software prototype (Artificial Memory) for personal knowledge management.

In the introductory chapter 1.1 (Scientific Challenges of Memory Research), the negative effects of terminological ambiguity and isolated theorizing to memory research are discussed.

Chapter 2 focuses on technology. The traditional idea of technology is questioned. Technology is reinterpreted as a cognitive actuation process structured in correspondence with a substitution process. The origin of technological capacities is found in the evolution of eusociality. In chapter 2.2, a cognitive-technological model is sketched. In this thesis, the focus is on content technology rather than functional technology. Chapter 2.3 deals with different types of media. Chapter 2.4 introduces the technological role of language-artifacts from different perspectives, combining numerous philosophical and historical considerations. The ideas of chapter 2.5 go beyond traditional linguistics and knowledge management, stressing individual constraints of language and limits of artificial intelligence. Chapter 2.6 develops an improved semantic network model, considering closely associated theories.

Chapter 3 gives a detailed description of the universal memory process enabling all cognitive technological processes. The memory theory of Richard Semon is revitalized, elaborated and revised, taking into account important newer results of memory research.

Chapter 4 combines the insights on the technology process and the memory process into a coherent theoretical framework. Chapter 4.3.5 describes four fundamental computer-assisted memory technologies for personally and socially extended artificial memory. They all tackle basic problems of the memory-process (4.3.3). In chapter 4.3.7, the findings are summarized and, in chapter 4.4, extended into a philosophical consideration of knowledge.

Chapter 5 provides insight into the relevant system landscape (5.1) and the software prototype (5.2). After an introduction into basic system functionality, three exemplary, closely interrelated technological innovations are introduced: virtual synsets, semantic tagging, and Linear Unit tagging.

The imagery of common memory capture (of two or more speakers) is quite powerful. It highlights a critical aspect of topic maps.

Be forewarned this is European style scholarship, where the reader is assumed to be comfortable with philosophy, linguistics, etc., in addition to the more narrow aspects of computer science.

To see these ideas in practice:

Slides on What is Artificial Memory.

I first saw this in a note from Jack Park, the source of many interesting and useful links, papers and projects.

atomic<> Weapons

Saturday, February 16th, 2013

atomic<> Weapons by Herb Sutter.

C++ and Beyond 2012: Herb Sutter – atomic<> Weapons, 1 of 2

C++ and Beyond 2012: Herb Sutter – atomic<> Weapons, 2 of 2


This session in one word: Deep.

It’s a session that includes topics I’ve publicly said for years is Stuff You Shouldn’t Need To Know and I Just Won’t Teach, but it’s becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers just are so addicted to full performance that they’ll reach for the big red levers with the flashing warning lights. Since we can’t keep people from pulling the big red levers, we’d better document the A to Z of what the levers actually do, so that people don’t SCRAM unless they really, really, really meant to.

With all the recent posts about simplicity and user interaction, some readers may be getting bored.

Never fear, something a bit more challenging for you.

Multicore memory models along with comments that cite even more research.

Plus I liked the line: “…reach for the big red levers with the flashing warning lights.”


Fast Set Intersection in Memory [Foul! They Peeked!]

Monday, August 20th, 2012

Fast Set Intersection in Memory by Bolin Ding and Arnd Christian König.


Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. In general, given k (preprocessed) sets, with totally n elements, we will show how to compute their intersection in expected time O(n / sqrt(w) + kr), where r is the intersection size and w is the number of bits in a machine-word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform the state of the art techniques for both synthetic and real data sets and workloads.

Important not only for the algorithm but how they arrived at it.

They peeked at the data.

Imagine that.

Not trying to solve the set intersection problem in the abstract but looking at data you are likely to encounter.

I am all for the pure theory side of things but there is something to be said for less airy (dare I say windy?) solutions. 😉
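For contrast with the paper's preprocessed structures, here is a minimal sketch of the textbook baseline: intersect k sorted lists, starting from the shortest and probing the others by binary search. To be clear, this is not Ding and König's algorithm, and the function name `intersect_sorted` is my own, used only for illustration.

```python
from bisect import bisect_left


def intersect_sorted(lists):
    """Intersect k sorted lists of unique, comparable items.

    Baseline only (NOT the algorithm from the paper): begin with the
    shortest list and probe each of the other lists by binary search.
    """
    if not lists:
        return []
    ordered = sorted(lists, key=len)   # shortest list first
    result = ordered[0]
    for other in ordered[1:]:
        kept = []
        lo = 0                         # never search behind the last probe
        for x in result:
            lo = bisect_left(other, x, lo)
            if lo == len(other):
                break                  # nothing left in `other` to match
            if other[lo] == x:
                kept.append(x)
        result = kept
        if not result:
            break                      # an empty intersection stays empty
    return result


print(intersect_sorted([[1, 3, 5, 7, 9], [3, 4, 5, 9], [0, 3, 9, 10]]))
```

Starting from the shortest list is the same kind of "peek at the data" move on a small scale: it pays off when set sizes are skewed, which is common in retrieval workloads.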

I first saw this at Theoretical Computer Science: Most efficient algorithm to compute set difference?

Introducing Galaxy, a novel in-memory data grid by Parallel Universe

Wednesday, July 11th, 2012

Introducing Galaxy, a novel in-memory data grid by Parallel Universe

Let me jump to the cool part:

Galaxy is a distributed RAM. It is not a key-value store. Rather, it is meant to be used as a infrastructure for building distributed data-structures. In fact, there is no way to query objects stored on Galaxy at all. Instead, Galaxy generates an ID for each item, that you can store in other items just like you’d store a normal reference in a plain object graph.

The application runs on all Galaxy nodes alongside with the portion of the data that is kept (in RAM) at each of the nodes, and when it wishes to read or write a data item, it requests the Galaxy API to fetch it.

At any given time an item is owned by exactly one node, but can be shared by many. Sharers store the item locally, but they can only read it. However, they remember who the owner is, and the owner maintains a list of all sharers. If a sharer (or any node) wants to update the item (a “write”) it requests the current owner for a transfer of ownership, and then receives the item and the list of sharers. Before modifying the item, it invalidates all sharers to ensure consistency. Even when the sharers are invalidated, they remember who the new owner is, so if they’d like to share or own the item again, they can request it from the new owner. If the application requests an item the local node has never seen (or it’s been migrated again after it had been validated), the node multicasts the entire cluster in search of it.

The idea is that when data access is predictable, expensive operations like item migration and a clueless lookup are rare, and more than offset by the common zero-I/O case. In addition, Galaxy uses some nifty hacks to eschew many of the I/O delays even in worst-case scenarios.

In the coming weeks I will post here the exact details of Galaxy’s inner-workings. What messages are transferred, how Galaxy deals with failures, and what tricks it employs to reduce latencies. In the meantime, I encourage you to read Galaxy’s documentation and take it for a spin.

May not fit your use case but like the man says, “take it for a spin.”
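If the owner/sharer description reads better as code, here is a toy, single-process model of it. Everything in it is hypothetical (the `Cluster` class and its `put`, `read` and `write` methods are my names, not Galaxy's API), and it makes simplifying assumptions: no networking, no failures, no multicast lookup, and the previous owner simply drops its copy when ownership moves.

```python
class Cluster:
    """Toy model of the ownership scheme quoted above (not Galaxy's API)."""

    def __init__(self, node_names):
        self.owner = {}     # item id -> name of the owning node
        self.sharers = {}   # item id -> set of nodes holding read-only copies
        self.value = {}     # item id -> current value, as held by the owner
        self.cache = {name: {} for name in node_names}  # per-node local copies
        self.next_id = 0

    def put(self, node, value):
        """Create an item owned by `node` and return its generated id."""
        item = self.next_id
        self.next_id += 1
        self.owner[item] = node
        self.sharers[item] = set()
        self.value[item] = value
        self.cache[node][item] = value
        return item

    def read(self, node, item):
        """Read locally when possible; otherwise become a sharer."""
        if item in self.cache[node]:
            return self.cache[node][item]   # the common zero-I/O case
        self.sharers[item].add(node)        # the owner records the new sharer
        self.cache[node][item] = self.value[item]
        return self.cache[node][item]

    def write(self, node, item, value):
        """Take ownership if needed, invalidate sharers, then modify."""
        if self.owner[item] != node:
            # Assumption: the old owner gives up its copy on transfer.
            self.cache[self.owner[item]].pop(item, None)
            self.owner[item] = node
        for sharer in self.sharers[item]:
            self.cache[sharer].pop(item, None)  # invalidate read-only copies
        self.sharers[item] = set()
        self.value[item] = value
        self.cache[node][item] = value


cluster = Cluster(["a", "b", "c"])
item = cluster.put("a", "v1")
cluster.read("b", item)          # "b" becomes a sharer
cluster.write("c", item, "v2")   # ownership moves to "c"; "b" is invalidated
print(cluster.owner[item], cluster.read("b", item))
```

Repeated reads on a node that already holds the item never leave that node, which is the point of the design: the expensive steps only happen when ownership moves.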

Jack Park brought this to my attention.

nessDB v1.8 with LSM-Tree

Saturday, March 3rd, 2012

nessDB v1.8 with LSM-Tree

From the webpage:

nessDB is a fast Key-Value database(embedded), supports Redis-Protocol(PING,SET,MSET,GET,MGET,DEL,EXISTS,INFO,SHUTDOWN).

Which is written in ANSI C with BSD LICENSE and works in most POSIX systems without external dependencies.

nessDB is very efficient on disk-based random access, since it’s using log-structured-merge (LSM) trees.

a. Better performances on Random-Read/Random-Write
b. Log recovery
c. Using LSM-Tree as storage engine
d. Background detached-thread merging
e. Level LRU
f. Support billion data
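For readers who have not met log-structured-merge trees before, the general idea can be sketched in a few lines. This is not nessDB's code or API: `TinyLSM` and its methods are hypothetical names, the "disk" runs are plain in-memory lists, and deletes (tombstones), the log and background merging are all left out.

```python
import bisect


class TinyLSM:
    """Minimal LSM-tree sketch: a memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}   # recent writes, unsorted
        self.runs = []       # sorted (key, value) lists, newest first
        self.memtable_limit = memtable_limit

    def set(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted run (sequential I/O on disk)."""
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable, then each run from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            # (key,) sorts just before (key, value), so this finds the entry.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one; newer values overwrite older ones."""
        merged = {}
        for run in reversed(self.runs):   # oldest first, so newest wins
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.set(key, value)
print(db.get("a"), len(db.runs))
db.compact()
print(db.get("a"), len(db.runs))
```

Random writes become appends of sorted runs, and the cost is paid later in merging, which is why a real engine pushes the merge to a background thread.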

This came in over the nosql mailing list.

Pointers to literature on how “disk-based random access” has shaped our thinking/technology for processing? Or how going “off cache” for random access is going to shape the next mind-set about processing?

Translation Memory

Tuesday, December 6th, 2011

Translation Memory

As we mentioned in Teaching Etsy to Speak a Second Language, developers need to tag English content so it can be extracted and then translated. Since we are a company with a continuous deployment development process, we do this on a daily basis and as an result get a significant number of new messages to be translated along with changes or deletions of existing ones that have already been translated. Therefore we needed some kind of recollection system to easily reuse or follow the style of existing translations.

A translation memory is an organized collection of text extracted from a source language with one or more matching translations. A translation memory system stores this data and makes it easily accessible to human translators in order to assist with their tasks. There’s a variety of translation memory systems and related standards in the language industry. Yet, the nature of our extracted messages (containing relevant PHP, Smarty, and JavaScript placeholders) and our desire to maintain a translation style curated by a human language manager made us develop an in-house solution.
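To make the definition concrete, here is a minimal sketch of a translation memory with fuzzy lookup. It is not Etsy's in-house system: the `TranslationMemory` class, its method names and the 0.75 similarity threshold are all assumptions of mine, and the matching uses Python's standard `difflib` rather than anything placeholder-aware.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Source strings mapped to their translations, with fuzzy lookup."""

    def __init__(self):
        self.entries = {}   # source text -> {language code: translation}

    def add(self, source, lang, translation):
        self.entries.setdefault(source, {})[lang] = translation

    def lookup(self, source, lang, threshold=0.75):
        """Return (score, stored_source, translation) for the best match.

        Returns None when no stored source is at least `threshold` similar.
        """
        best = None
        for stored, translations in self.entries.items():
            if lang not in translations:
                continue
            score = SequenceMatcher(None, source, stored).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, stored, translations[lang])
        return best


tm = TranslationMemory()
tm.add("Add to cart", "fr", "Ajouter au panier")
print(tm.lookup("Add to your cart", "fr"))   # close enough to reuse as a hint
print(tm.lookup("Checkout", "fr"))           # nothing similar stored
```

A near match is only a suggestion for the human translator, which fits the post's point about keeping a human language manager in the loop.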

Go ahead, read the rest of the post, I’ll wait.

Interesting yes?

What if the title of my post were identification memory?

There is not really that much difference between translating from language to language and mapping from identification to identification, when we are talking about the same subject.

Hardly any difference at all when you think about it.

I am sure your current vendors will assure you their methods of identification are the best and they may be right. But on the other hand, they may also be wrong.

And there is always the issue of other data sources that have chosen to identify the same subjects differently. Like your company down the road, say five years from now. Preparing now for that “translation” project in the not too distant future may save you from losing critical information down the road.

Preserving access to critical data is a form of translation memory. Yes?