A Tool to Generate Customizable Test Data with Python by Alec Noller.
From the post:
Sometimes you need a dataset to run some tests – just a bunch of data, anything – and it can be unexpectedly difficult to find something that works. There are some useful and readily-available options out there; for example, Matthew Dubins has worked with the Enron email dataset and a complete list of 9/11 victims.
However, if you have more specific needs, particularly when it comes to format and fitting within the structure of a database, and you want to customize your dataset to test one thing or another in particular, take a look at this Python package called python-testdata used to generate customizable test data. It can be set up to generate names in various forms, companies, addresses, emails, and more. The GitHub repository also includes some help to get started, as well as examples for use cases.
I hesitated when I first saw this given the overabundance of free data.
But with “free” data, if the dataset is large enough, you will have to rely on sampling to gauge the performance of software.
A sample of real data can also contain strange, unexpected records, and introducing those hazards may not be acceptable in all cases.
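To make the contrast with sampled data concrete, here is a minimal sketch of what generated test data looks like. It uses only the Python standard library and is not the python-testdata API; the function name `generate_records`, the field names, and the small word lists are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical seed lists; a real generator would draw on far larger ones.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Garcia"]
COMPANIES = ["Acme", "Globex", "Initech"]


def generate_records(count, seed=0):
    """Return `count` fake user records, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    records = []
    for record_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        company = rng.choice(COMPANIES)
        records.append({
            "id": record_id,
            "name": f"{first} {last}",
            "company": company,
            # .example is a reserved top-level domain, safe for fake addresses
            "email": f"{first}.{last}@{company}.example".lower(),
        })
    return records


for record in generate_records(3, seed=42):
    print(record)
```

Because the generator is seeded, the same call always yields the same records, so nothing strange enters a test run unless you add it on purpose.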