Faker

Description of the problem

Phone numbers, credit card numbers, and social security numbers--some data is sensitive and difficult to come by due to privacy issues. Applications of using fake data have been contributed to improving data analysis, logistics, and product development. For example, for software development, where continuous testing is expected to happen for a short time, you will need numerous data to test with, yet quality in data, whether the data is true or not, may not be so important. In such a case, you can quickly generate data by using the faker function. This faker project shows you how to generate basic data from scratch.

What you will build

Fake data

What you will learn

Generate race specific data
Generate data that contains names and zip code along with sensitive data such as credit card

OrderedDict
Faker

Import two packages above:

OrderedDict: a dictionary subclass that remembers the order that keys were first inserted. In the case above, languages of "en-US", "en-PH", and "ja-JP" are in order.
Faker: It generates fake data based on the attributes.

Fake.locales gets you the list of the languages. fake['en_GB'] does not work because ‘en_GB' has not been selected. Other errors in the notebook also show that you need to be specific on the data you want to generate. The code below returns an error because "luzon" is an island in Philippines, not Japan.

In this step, you can generate data including names, zip code, locations (based on the weights). For example, if you want to generate names for the Americans, Philipinos, and Japanese for a specific ratio, you can set up as follows. Attributes are yielded at random.

fake.seed can randomly generate unique objects and get the same results no matter how many times you run the code.

Here is the data generated for the Americans.

You have learned how to use faker to generate data. At the end of the project, you have created a small database with certain information on American people's name, zip code, and credit card number. It is up to you to create data with whatever variables, but with some creativity, you can make use of this technique in any business case in a situation where there is no data available or have some privacy issues on using real data.

Data Responsibly.com. Data Synthesizer
https://faker.readthedocs.io/en/master/fakerclass.html#examples