Phone numbers, credit card numbers, and social security numbers are examples of sensitive data that is hard to come by because of privacy concerns. Fake data fills that gap and has contributed to data analysis, logistics, and product development. In software development, for example, continuous testing runs on short cycles, so you need large amounts of test data, yet whether the data is real may not matter much. In such cases, you can quickly generate data with the Faker library. This Faker project shows you how to generate basic data from scratch.
Start by importing the two packages the project uses.
fake.locales gets you the list of locales you selected. fake['en_GB'] does not work because 'en_GB' has not been selected. Other errors in the notebook also show that you need to be specific about the data you want to generate. For example, asking the Japanese locale for a Luzon-specific value returns an error because Luzon is an island in the Philippines, not Japan.
In this step, you can generate data such as names, zip codes, and locations, with the locale of each value chosen according to weights. For example, to generate names for Americans, Filipinos, and Japanese in a specific ratio, you can assign a weight to each locale. Each generated value is then drawn at random from one of the weighted locales.
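Seeding the generator makes the random output reproducible: with the same seed, you get the same results no matter how many times you run the code. In current Faker releases the seed is set on the class with Faker.seed rather than on the fake instance. Seeding controls reproducibility; it does not by itself make the generated values unique.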
Here is the data generated for the Americans.
You have learned how to use Faker to generate data. By the end of the project, you have created a small database containing American people's names, zip codes, and credit card numbers. Which variables you generate is up to you. With some creativity, you can apply this technique to any business case where no data is available or where privacy concerns rule out using real data.