Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 10, 2014

Parallel Data Generation Framework

Filed under: Benchmarks,Data — Patrick Durusau @ 11:06 am

From the webpage:

The Parallel Data Generation Framework (PDGF) is a generic data generator for database benchmarking. Its development started at the University of Passau at the group of Prof. Dr. Harald Kosch.

PDGF was designed to take advantage of today’s multi-core processors and large clusters of computers to generate large amounts of synthetic benchmark data very fast. PDGF uses a fully computational approach and is a pure Java implementation which makes it very portable.
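The webpage does not spell out what a "fully computational approach" means. One common reading is that every value is computed from its coordinates (table, column, row) and a master seed, so any worker can produce any slice of the data without coordinating with the others. The sketch below illustrates that reading in Python. It is not PDGF's code or API, and `MASTER_SEED`, `cell_value`, `generate_rows`, `generate_parallel`, and the `orders`/`amount` names are all hypothetical.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor

MASTER_SEED = 42  # hypothetical; any fixed integer works


def cell_value(table: str, column: str, row: int) -> int:
    """Compute one cell from its coordinates alone, with no shared state."""
    key = f"{MASTER_SEED}:{table}:{column}:{row}".encode()
    # Derive a per-cell seed from a hash of the coordinates.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed).randrange(1_000_000)


def generate_rows(start: int, stop: int) -> list[tuple[int, int]]:
    """Generate a contiguous slice of rows; slices do not depend on each other."""
    return [(row, cell_value("orders", "amount", row)) for row in range(start, stop)]


def generate_parallel(total_rows: int, workers: int) -> list[tuple[int, int]]:
    """Split the row range into chunks and generate each chunk in its own task."""
    chunk = -(-total_rows // workers)  # ceiling division
    ranges = [(i, min(i + chunk, total_rows)) for i in range(0, total_rows, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: generate_rows(*r), ranges)
    return [row for part in parts for row in part]
```

Because each cell depends only on its coordinates, `generate_parallel(100, 4)` and `generate_parallel(100, 7)` return the same rows. Threads are used here only to show that the slices are independent; a real generator would spread them over processes or machines.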

I mention this to ask: are you aware of methods for generating unstructured text with known characteristics, such as the number of entities and their representations in the dataset?

A “natural” dataset, say blog posts or emails, can be probed to determine its semantic characteristics, but I am interested in generating a dataset whose semantic characteristics are known in advance.
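To make the question concrete, here is a minimal sketch of one naive approach: fill sentence templates with surface forms drawn from a fixed entity inventory and record every mention as ground truth. The entities, templates, and function names are all made up for illustration, and template text is a long way from natural prose, so treat this as a starting point and not an answer.

```python
import random
from collections import Counter

# Hypothetical entity inventory: each ID maps to its surface forms.
ENTITIES = {
    "E1": ["Jane Doe", "J. Doe", "Ms. Doe"],
    "E2": ["Acme Corporation", "Acme Corp.", "ACME"],
    "E3": ["City of Springfield", "Springfield"],
}

# Hypothetical sentence templates with two entity slots each.
TEMPLATES = [
    "Yesterday {a} was mentioned together with {b}.",
    "A note about {a} also refers to {b}.",
    "{a} and {b} came up in the same thread.",
]


def generate_corpus(n_sentences: int, seed: int = 0):
    """Return (sentences, truth); truth records every entity mention made."""
    rng = random.Random(seed)
    sentences, truth = [], []
    for i in range(n_sentences):
        # Pick two distinct entities, then one surface form for each.
        id_a, id_b = rng.sample(sorted(ENTITIES), 2)
        form_a = rng.choice(ENTITIES[id_a])
        form_b = rng.choice(ENTITIES[id_b])
        sentences.append(rng.choice(TEMPLATES).format(a=form_a, b=form_b))
        truth.append({"sentence": i, "mentions": [(id_a, form_a), (id_b, form_b)]})
    return sentences, truth


def mention_counts(truth) -> Counter:
    """Count how often each (entity ID, surface form) pair was used."""
    return Counter(m for record in truth for m in record["mentions"])
```

Calling `generate_corpus(1000, seed=1)` yields the text plus an exact record of which entity appears where and under which representation, which is the kind of known characteristic a "natural" dataset cannot offer.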

Thoughts?

I first saw this in a tweet by Stefano Bertolo.
