Installation and Configuration of the Fake2DB Tool to Auto-Generate Fake but Valid Data
Data is more valuable than ever in the modern digital era. With data in hand, you can run experiments to explore server performance, prepare for future improvements, and visualize results as needed. For example, Facebook uses user data to help business owners target their customers and improve their marketing strategy and sales performance. Before applying any technique to production, scientists have to test the proposal on a dataset. Sometimes they can use production data for this, and sometimes they have to mock the data themselves. In the latter case, a tool that generates mock data automatically makes the experiment much easier to run.
There are many such tools available. In this post, we introduce a tool called "fake2db", which works well with various databases such as SQLite, PostgreSQL, MongoDB, MySQL, Redis, and CouchDB. Fake2DB generates fake but valid data for testing purposes using the most popular patterns (as far as I know). You will learn about its installation and configuration, its usage, and some further notes.
INSTALLATION
Prerequisites: Before installing and configuring fake2db, install PostgreSQL as described in the post Installing PostgreSQL 13 from Source in Ubuntu 20.04, then follow the instructions below.
First, we need pip, the package installer for Python. Ubuntu 20.04 ships with Python 3 pre-installed, which you can confirm in the terminal with:
python3 --version
Next, use the command below to install pip:
sudo apt install python3-pip
pip3 --version   # To check the version of pip
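If you script this setup, a small check can confirm that the prerequisites are on your PATH before going further. This is a minimal Bash sketch; the function name `check_commands` is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named command is available on PATH.
# Returns 0 only if every command was found, 1 otherwise.
check_commands() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check the tools this guide relies on.
check_commands python3 pip3 || echo "Install the missing tools before continuing."
```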
Next, install fake2db with pip:
pip3 install fake2db
Collecting fake2db
Downloading fake2db-0.5.4.tar.gz (10 kB)
Collecting Faker==0.7.11
Downloading Faker-0.7.11-py2.py3-none-any.whl (579 kB)
Collecting python-dateutil>=2.4
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from Faker==0.7.11->fake2db) (1.14.0)
Building wheels for collected packages: fake2db
Building wheel for fake2db (setup.py) ... done
Created wheel for fake2db: filename=fake2db-0.5.4-py3-none-any.whl size=17157 sha256=f73072fb3a803440c68ae03145d7a4c50e1cb052b6c7d619bde773481dcf0c52
Stored in directory: /home/ubuntu/.cache/pip/wheels/0b/ca/cc/b016e22cc271ee0f5ce223522a1ee033191ab00d57fe1383b7
Successfully built fake2db
Installing collected packages: python-dateutil, Faker, fake2db
WARNING: The script faker is installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script fake2db is installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed Faker-0.7.11 fake2db-0.5.4 python-dateutil-2.8.2
We can remove the warnings above by adding ~/.local/bin to the PATH in ~/.bashrc:
echo "export PATH=\"/home/ubuntu/.local/bin:\$PATH\"" >> ~/.bashrc && source ~/.bashrc
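Running the `echo ... >> ~/.bashrc` command more than once appends a duplicate line each time. The Bash sketch below is one way to make the step idempotent; the function name `add_local_bin_to_path` is hypothetical, and it only detects this exact `$HOME`-based line, not other spellings of the same export.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the ~/.local/bin PATH export to a shell profile
# only if that exact line is not already present, so re-running is harmless.
add_local_bin_to_path() {
  local rc_file="${1:-$HOME/.bashrc}"
  # Single quotes keep $HOME and $PATH unexpanded in the written line.
  local line='export PATH="$HOME/.local/bin:$PATH"'
  touch "$rc_file"
  # -x matches whole lines, -F treats the line as a fixed string (no regex).
  if ! grep -qxF -- "$line" "$rc_file"; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}

# Usage: update ~/.bashrc, then reload it in the current shell.
# add_local_bin_to_path ~/.bashrc && source ~/.bashrc
```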
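Next, install the PostgreSQL client development files if they are not already present (the package is libpq-dev on Ubuntu; postgresql-devel is its name on Red Hat-based distributions):
sudo apt install libpq-dev
Next, install psycopg2, the PostgreSQL adapter for Python, with pip:
pip3 install psycopg2-binary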
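Because this guide installs psycopg2 so that fake2db can write to PostgreSQL, it can help to confirm that Python actually imports it before running fake2db. A minimal sketch; `check_python_module` is a hypothetical helper name, and it prints "version unknown" for modules that do not define `__version__`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if python3 can import the named module,
# printing its __version__ when the module defines one.
# On failure, python3 prints the ImportError traceback and exits non-zero.
check_python_module() {
  local module="$1"
  python3 -c "import importlib, sys
m = importlib.import_module(sys.argv[1])
print(sys.argv[1], getattr(m, '__version__', 'version unknown'))" "$module"
}

# Usage: psycopg2 should import cleanly before fake2db talks to PostgreSQL.
check_python_module psycopg2 || echo "psycopg2 is not importable; reinstall it."
```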
Finally, check that fake2db is properly installed on the server:
fake2db --help
usage: fake2db [-h] [--rows ROWS] [--db DB] [--name NAME] [--host HOST] [--port PORT] [--username USERNAME] [--password PASSWORD] [--custom CUSTOM [CUSTOM ...]] [--locale LOCALE] [--seed SEED]
optional arguments:
-h, --help show this help message and exit
--rows ROWS Amount of rows desired per table
--db DB Db type for creation: sqlite, mysql, postgresql, mongodb, redis, couchdb, to be expanded
--name NAME The name to the db to be generated
--host HOST Hostname of db
--port PORT Port of db
--username USERNAME Username
--password PASSWORD Password
--custom CUSTOM [CUSTOM ...]
Custom schema for db generation, supports functions that fake-factory provides, see fake2db github repository for options https://github.com/emirozer/fake2db
--locale LOCALE The locale of the data to be generated: {bg_BG,cs_CZ,...,zh_CN,zh_TW}. 'en_US' as default
--seed SEED Seed value for the random generator
USAGE
Example 1: Generate 2500 records
$ fake2db --db postgresql --rows 2500 --host localhost --password pwd --user postgres
2021-08-13 03:26:44,845 ubuntu Rows argument : 2500
2021-08-13 03:26:45,412 ubuntu Database created and opened succesfully: postgresql_coyygykv
2021-08-13 03:26:46,846 ubuntu simple_registration Commits are successful after write job!
2021-08-13 03:26:50,792 ubuntu detailed_registration Commits are successful after write job!
2021-08-13 03:26:56,838 ubuntu companies Commits are successful after write job!
2021-08-13 03:26:57,215 ubuntu user_agent Commits are successful after write job!
2021-08-13 03:27:01,936 ubuntu customer Commits are successful after write job!
In the above command, fake2db creates a new database with the random name "postgresql_coyygykv" and five new tables (simple_registration, detailed_registration, companies, user_agent, and customer), with 2500 rows in each table.
Example 2: Generate 250 rows in a database named postgresql123456, with a table named custom that has 3 columns (name, date, country)
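To confirm the row counts, you can query each table with psql. The Bash sketch below assumes psql is installed, that the password is supplied through the standard PGPASSWORD environment variable (or ~/.pgpass), and that host and user default to the values used in Example 1; `count_rows` is a hypothetical helper name, and the table names are interpolated into SQL, so pass only trusted names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the row count of each given table in a database.
# Host and user come from PGHOST/PGUSER, defaulting to localhost/postgres.
count_rows() {
  local db="$1"; shift
  local table
  for table in "$@"; do
    printf '%s: ' "$table"
    # -X skips ~/.psqlrc, -t prints tuples only, -A disables alignment padding.
    psql -X -h "${PGHOST:-localhost}" -U "${PGUSER:-postgres}" -d "$db" -t -A \
      -c "SELECT count(*) FROM \"${table}\";"
  done
}

# Usage with the random database name from the log above:
# PGPASSWORD=pwd count_rows postgresql_coyygykv simple_registration \
#   detailed_registration companies user_agent customer
```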
fake2db --rows 250 --db postgresql --user postgres --password pwd --host localhost --name postgresql123456 --custom name date country
2021-08-13 05:16:20,056 ubuntu Rows argument : 250
2021-08-13 05:16:20,586 ubuntu Database created and opened succesfully: postgresql123456
2021-08-13 05:16:20,587 ubuntu fake2db found valid custom key provided: name
2021-08-13 05:16:20,587 ubuntu fake2db found valid custom key provided: date
2021-08-13 05:16:20,587 ubuntu fake2db found valid custom key provided: country
2021-08-13 05:16:20,817 ubuntu custom Commits are successful after write job!
Example 3: Define more columns for Fake2DB to generate data for us
fake2db --rows 250 --db postgresql --user postgres --password pwd --host localhost --name postgresql123457 --custom name date country currency_code credit_card_full credit_card_provider
2021-08-13 06:26:24,018 ubuntu Rows argument : 250
2021-08-13 06:26:24,174 ubuntu Database created and opened succesfully: postgresql123457
2021-08-13 06:26:24,175 ubuntu fake2db found valid custom key provided: name
2021-08-13 06:26:24,175 ubuntu fake2db found valid custom key provided: date
2021-08-13 06:26:24,175 ubuntu fake2db found valid custom key provided: country
2021-08-13 06:26:24,176 ubuntu fake2db found valid custom key provided: currency_code
2021-08-13 06:26:24,176 ubuntu fake2db found valid custom key provided: credit_card_full
2021-08-13 06:26:24,176 ubuntu fake2db found valid custom key provided: credit_card_provider
2021-08-13 06:26:24,553 ubuntu custom Commits are successful after write job!
Example 4: Define some more columns :)
fake2db --rows 250 --db postgresql --user postgres --password pwd --host localhost --name postgresql123458 --custom name date country currency_code credit_card_full credit_card_provider postalcode date_time_ad day_of_week
2021-08-13 06:28:22,386 ubuntu Rows argument : 250
2021-08-13 06:28:22,556 ubuntu Database created and opened succesfully: postgresql123458
2021-08-13 06:28:22,556 ubuntu fake2db found valid custom key provided: name
2021-08-13 06:28:22,556 ubuntu fake2db found valid custom key provided: date
2021-08-13 06:28:22,556 ubuntu fake2db found valid custom key provided: country
2021-08-13 06:28:22,558 ubuntu fake2db found valid custom key provided: currency_code
2021-08-13 06:28:22,559 ubuntu fake2db found valid custom key provided: credit_card_full
2021-08-13 06:28:22,560 ubuntu fake2db found valid custom key provided: credit_card_provider
2021-08-13 06:28:22,560 ubuntu fake2db found valid custom key provided: postalcode
2021-08-13 06:28:22,561 ubuntu fake2db found valid custom key provided: date_time_ad
2021-08-13 06:28:22,561 ubuntu fake2db found valid custom key provided: day_of_week
2021-08-13 06:28:22,959 ubuntu custom Commits are successful after write job!
As the examples above show, we can add more columns to the table we want to fill with fake but valid data. The accepted column names correspond to generator functions provided by Faker (formerly fake-factory), as the --custom help text notes, and the full list is in the fake2db GitHub repository. If you are a Python developer, you can also extend fake2db with more custom columns as needed. The repository also explains how to integrate with other databases and gives more examples of customized data generation.
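Examples 2 to 4 repeat a long command line that differs only in the database name and the column list. One way to keep that list in a single place is a small Bash wrapper like the sketch below. It is hypothetical (the names `run_fake2db_custom`, `FAKE2DB_PASSWORD`, and `DRY_RUN` are illustrative, not part of fake2db) and uses only flags from the `fake2db --help` output, including the `--username` spelling listed there.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build a fake2db --custom command from a column list.
# The password is read from FAKE2DB_PASSWORD; set DRY_RUN=1 to print the
# command instead of running it.
run_fake2db_custom() {
  local db_name="$1" rows="$2"; shift 2
  local -a cmd=(fake2db --db postgresql --rows "$rows" --host localhost
    --username postgres --password "${FAKE2DB_PASSWORD:?set FAKE2DB_PASSWORD}"
    --name "$db_name" --custom "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# The same columns as Example 4, defined once.
columns=(name date country currency_code credit_card_full credit_card_provider
  postalcode date_time_ad day_of_week)

# Usage:
# FAKE2DB_PASSWORD=pwd run_fake2db_custom postgresql123458 250 "${columns[@]}"
```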
NOTES
- pip is the standard package management system used to install and manage software written in Python. Most Python distributions come with pip pre-installed, but on Ubuntu it is packaged separately as python3-pip.
- The postgresql-devel package contains the header files and libraries needed to compile C or C++ applications that interact directly with a PostgreSQL database server, as well as the ecpg Embedded C preprocessor for PostgreSQL. On Debian and Ubuntu, the libpq headers and libraries are provided by libpq-dev. You need these files if you want to develop applications that interact with a PostgreSQL server, for example when building psycopg2 from source.
- psycopg2 is the most popular PostgreSQL database adapter for Python. It is mostly implemented in C as a libpq wrapper, which makes it both efficient and secure.
THANK YOU!!!