Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 14, 2014

P2P to Freedom?

Filed under: Cybersecurity,P2P,Security — Patrick Durusau @ 2:22 pm

Anti-censorship app Lantern wants to become the SETI@home of free speech by Janko Roettgers.

From the post:

Facebook? Blocked. YouTube? Blocked. Twitter? Definitely blocked. Countries like China and Iran have been trying to control the flow of online information for years. Now, there’s an app that wants to take a page from the playbook of crowdsourced computing projects like SETI and poke holes in China’s so-called great firewall and other censorship efforts.

Lantern, as the project is called, is offering users in countries with internet censorship a proxy that unblocks social networks, news sites and political blogs. People in China and elsewhere have been using commercial proxies for years, resulting in a game of whack-a-mole, where censors would simply block access to the IP address of a proxy, forcing users to move on to the next available service.

Lantern wants to solve this issue through a P2P architecture: Users in censored countries don’t connect to a central server with an easily recognizable IP address, but instead route their website requests through a computer run by a volunteer in the U.S. or elsewhere. Lantern is a simple app available for Windows, OS X and Linux. Upon running it for the first time, users indicate whether they want to give access or get access to censored sites. And once it runs, the main UI is actually a data visualization that shows usage of the app around the world in real-time.

Janko has a great summary of the current status of Lantern and its bid to help users avoid censorship.

Lantern is focused on the problem of censorship, which is an important issue in many parts of the world.

But the same principles, that of a P2P network, should be applicable to a no-censorship but track everything network such as in the United States and elsewhere.

If you don’t think a record of your web traffic, so-called “metadata,” might provide information about you, exchange browser histories with another user.

A robust P2P solution would not prevent tracking entirely, but it could make it too burdensome to be practical in most cases. Like all security, it is a question of secure enough for some purpose and length of time.

Lantern is a project to track and contribute to if you find censorship and/or tracking problematic.

July 28, 2014

Synchronizer Based on Operational Transformation…

Filed under: Consistency,Operations,P2P,Synchronization,Version Vectors,Versioning — Patrick Durusau @ 7:53 pm

Synchronizer Based on Operational Transformation for P2P Environments by Michelle Cart and Jean Ferrié

Abstract:

Reconciling divergent copies is a common problem encountered in distributed or mobile systems, asynchronous collaborative groupware, concurrent engineering, software configuration management, version control systems and personal work involving several mobile computing devices. Synchronizers provide a solution by enabling two divergent copies of the same object to be reconciled. Unfortunately, a master copy is generally required before they can be used for reconciling n copies, otherwise copy convergence will not be achieved. This paper presents the principles and algorithm of a Synchronizer which provides the means to reconcile n copies, without discriminating in favour of any particular copy. Copies can be modified (concurrently or not) on different sites and the Synchronizer we propose enables them to be reconciled pairwise, at any time, regardless of the pair, while achieving convergence of all copies. For this purpose, it uses the history of operations executed on each copy and Operational Transformations. It does not require a centralised or ordering (timestamp, state vector, etc.) mechanism. Its main advantage is thus to enable free and lazy propagation of copy updates while ensuring their convergence – it is particularly suitable for P2P environments in which no copy should be favoured.

Not the oldest work on operational transformations, 2007, nor the most recent.

Certainly of interest for distributed topic maps as well as other change tracking applications.

I first saw this in a tweet by onepaperperday.

November 24, 2012

NuoDB [Everything in NuoDB is an Atom]

Filed under: NoSQL,NuoDB,P2P — Patrick Durusau @ 7:09 pm

NuoDB

I last wrote about NuoDB in February of 2012, it was still in private beta release.

You can now download a free community edition with a limit of two nodes for the usual platforms.

The “under the hood” page reads (in part):

Everything in NuoDB is an Atom

Under the hood, NuoDB is an asynchronous, decentralized, peer-to-peer database. The NuoDB system is also object-oriented. Objects in NuoDB know how to perform various actions that create specific behaviors in the overall database. And at the heart of every object in NuoDB is the Atom. An Atom in NuoDB is like a single bird in a flock.

Atoms are self-describing objects (data and metadata) that together comprise the database. Everything in the NuoDB database is an Atom, including the schema, the indexes, and even the underlying data. For example, each table is an Atom that describes the metadata for the table and can reference other Atoms; such as Atoms that describe ranges of records in the table and their versions.

Atoms are Powerful

Atoms are intelligent, powerful, self-describing objects that together form the NuoDB database. Atoms know how to perform many actions, like these:

  • Atoms know how to make copies of themselves.
  • Atoms keep all copies of themselves up to date.
  • Atoms can broadcast messages. Atoms listen for events and changes from other Atoms.
  • Atoms can request data from other Atoms.
  • Atoms can serialize themselves to persistent storage.
  • Atoms can retrieve data from storage.

The Atoms are the Database

Everything in the database is an Atom, and the Atoms are the database. The Atoms work in concert to form both the Transaction (or Compute) Tier, and the Storage Tier.

A NuoDB Transaction Engine is a process that executes the SQL layer and is comprised completely of Atoms. The Transaction Engine operates on Atoms, listens for changes, and communicates changes with other Transaction Engines in the database.

A NuoDB Storage Manager is simply a special kind of Transaction Engine that allows Atoms to serialize themselves to permanent storage (such as a local disk or Amazon S3, for example).

A NuoDB database can be as simple as a single Transaction Engine and a single Storage Manager, or can be as complex as tens of Transaction Engines and Storage Managers distributed across dozens of computer hosts.

Some wag in a report that reminded me to look at NuoDB again was whining about how NuoDB would perform in query intensive environments? I guess downloading a free copy to see was too much effort.

Of course, you would have to define “query intensive” environment and that would be no easy task. Lots of users with simple queries? (Define “lots” and “simple.”)

Just a suspicion as I wait for my download URL to arrive, “query” in a atom based system may not have the same internal processes as a traditional relational database. Or perhaps not entirely.

For example, what if the notion of “retrieval” from a location in memory is no longer operative? That is a query is composed of atoms that begin messaging as they are composed and so receiving information before the user reaches the end of a query string?

And more than that, query atoms that occur frequently could be persisted so the creation cost is not incurred in subsequent queries.

Hard to say without knowing more about it but it definitely should be on your short list of products to watch.

August 17, 2012

Forwarding Without Repeating: Efficient Rumor Spreading in Bounded-Degree Graphs

Filed under: Graphs,Messaging,P2P — Patrick Durusau @ 7:34 pm

Forwarding Without Repeating: Efficient Rumor Spreading in Bounded-Degree Graphs by Vincent Gripon, Vitaly Skachek, and Michael Rabbat.

Abstract:

We study a gossip protocol called forwarding without repeating (FWR). The objective is to spread multiple rumors over a graph as efficiently as possible. FWR accomplishes this by having nodes record which messages they have forwarded to each neighbor, so that each message is forwarded at most once to each neighbor. We prove that FWR spreads a rumor over a strongly connected digraph, with high probability, in time which is within a constant factor of optimal for digraphs with bounded out-degree. Moreover, on digraphs with bounded out-degree and bounded number of rumors, the number of transmissions required by FWR is arbitrarily better than that of existing approaches. Specifically, FWR requires O(n) messages on bounded-degree graphs with n nodes, whereas classical forwarding and an approach based on network coding both require {\omega}(n) messages. Our results are obtained using combinatorial and probabilistic arguments. Notably, they do not depend on expansion properties of the underlying graph, and consequently the message complexity of FWR is arbitrarily better than classical forwarding even on constant-degree expander graphs, as n \rightarrow \infty. In resource-constrained applications, where each transmission consumes battery power and bandwidth, our results suggest that using a small amount of memory at each node leads to a significant savings.

Interesting work that may lead to topic maps in resource-constrained environments.

August 6, 2012

What’s the Difference? Efficient Set Reconciliation without Prior Context

Filed under: Distributed Systems,P2P,Set Reconciliation,Sets,Topic Map Software — Patrick Durusau @ 4:56 pm

What’s the Difference? Efficient Set Reconciliation without Prior Context by David Eppstein, Michael T. Goodrich, Frank Uyeda, and George Varghese.

Abstract:

We describe a synopsis structure, the Difference Digest, that allows two nodes to compute the elements belonging to the set difference in a single round with communication overhead proportional to the size of the difference times the logarithm of the keyspace. While set reconciliation can be done efficiently using logs, logs require overhead for every update and scale poorly when multiple users are to be reconciled. By contrast, our abstraction assumes no prior context and is useful in networking and distributed systems applications such as trading blocks in a peer-to-peer network, and synchronizing link-state databases after a partition.

Our basic set-reconciliation method has a similarity with the peeling algorithm used in Tornado codes [6], which is not surprising, as there is an intimate connection between set difference and coding. Beyond set reconciliation, an essential component in our Difference Digest is a new estimator for the size of the set difference that outperforms min-wise sketches [3] for small set differences.

Our experiments show that the Difference Digest is more efficient than prior approaches such as Approximate Reconciliation Trees [5] and Characteristic Polynomial Interpolation [17]. We use Difference Digests to implement a generic KeyDiff service in Linux that runs over TCP and returns the sets of keys that differ between machines.

Distributed topic maps anyone?

October 9, 2011

Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web

Filed under: Artificial Intelligence,P2P,Semantic Web — Patrick Durusau @ 6:43 pm

Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web by P. Adjiman, P. Chatalic, F. Goasdoue, M. C. Rousset, and L. Simon.

Abstract:

In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The main contribution of this paper is to provide the first consequence finding algorithm in a peer-to-peer setting: DeCA. It is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Another important contribution is to apply this general distributed reasoning setting to the setting of the Semantic Web through the Somewhere semantic peer-to-peer data management system. The last contribution of this paper is to provide an experimental analysis of the scalability of the peer-to-peer infrastructure that we propose, on large networks of 1000 peers.

Interesting research on its own but I was struck by the phrase: “but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary.

Can we say that our “peers,” share our “mappings”?

That is mappings between terms and our expectation of others with regard to those terms.

Not the mapping between the label and the subject for which it is a label.

Or is the second mapping encompassed in the first? Or merely a partial expression of the first? (That seems more likely.)

Not immediately applicable to anything but may be important in terms of the mappings we are seeking to capture.

September 13, 2011

Query processing in distributed, taxonomy-based information sources

Filed under: P2P,Query Expansion,Taxonomy — Patrick Durusau @ 7:16 pm

Query processing in distributed, taxonomy-based information sources by Carlo Meghini, Yannis Tzitzikas, Veronica Coltella, and Anastasia Analyti.

Abstract:

We address the problem of answering queries over a distributed information system, storing objects indexed by terms organized in a taxonomy. The taxonomy consists of subsumption relationships between negation-free DNF formulas on terms and negation-free conjunctions of terms. In the first part of the paper, we consider the centralized case, deriving a hypergraph-based algorithm that is efficient in data complexity. In the second part of the paper, we consider the distributed case, presenting alternative ways implementing the centralized algorithm. These ways descend from two basic criteria: direct vs. query re-writing evaluation, and centralized vs. distributed data or taxonomy allocation. Combinations of these criteria allow to cover a wide spectrum of architectures, ranging from client-server to peer-to-peer. We evaluate the performance of the various architectures by simulation on a network with O(10^4) nodes, and derive final results. An extensive review of the relevant literature is finally included.

Two quick comments:

While simulations are informative, I am curious how the five architectures would fare against actual taxonomies? Thinking that the complexity at any particular level varies greatly from taxonomy to taxonomy, assuming they are taxonomies that record natural phenomena.

Second, I think there is a growing recognition that while some data can be successfully gathered to a single location for processing, there is an increasing amount of data that may be partially accessible but that cannot be transfered for privacy, security or other concerns. And such diverse systems are likely to have their own means of identifying subjects.

September 11, 2011

Efficient P2P Ensemble Learning with Linear Models on Fully Distributed Data

Filed under: Ensemble Methods,Machine Learning,P2P — Patrick Durusau @ 7:02 pm

Efficient P2P Ensemble Learning with Linear Models on Fully Distributed Data by Róbert Ormándi, István Hegedűs, and Márk Jelasity.

Abstract:

Machine learning over fully distributed data poses an important problem in peer-to-peer (P2P) applications. In this model we have one data record at each network node, but without the possibility to move raw data due to privacy considerations. For example, user profiles, ratings, history, or sensor readings can represent this case. This problem is difficult, because there is no possibility to learn local models, yet the communication cost needs to be kept low. Here we propose gossip learning, a generic approach that is based on multiple models taking random walks over the network in parallel, while applying an online learning algorithm to improve themselves, and getting combined via ensemble learning methods. We present an instantiation of this approach for the case of classification with linear models. Our main contribution is an ensemble learning method which-through the continuous combination of the models in the network-implements a virtual weighted voting mechanism over an exponential number of models at practically no extra cost as compared to independent random walks. Our experimental analysis demonstrates the performance and robustness of the proposed approach.

Interesting. In a topic map context, I wonder about creating associations based on information that is not revealed to the peer making the association? Or the peer suggesting the association?

Powered by WordPress