Python + Google Colab Tutorial for Data Analysis
Introduction
Data analysis is essential for creating more efficient public policies and improving existing ones.
Here, we will talk about how numbers and data can be allies of public policy-making.
Public policies are basically government plans to make society better. They can be about health, education, finance, or even culture. Sometimes citizens help shape them, too!
The idea is that these policies follow the rules written in the 1988 Constitution, which is like the manual of laws in Brazil. But how do we know what to do and where to invest public money? That's where data comes in.
Data works like clues that help us understand what is happening in society:
- How much people earn,
- Whether they have access to services like health and education,
- And if opportunities are fairly distributed.
For example, the Brazilian Institute of Geography and Statistics (IBGE) collects information on everything, from how many people live in a city to how long it takes them to get to work.
Transparency is crucial here. Everyone should be able to access and understand data, as this ensures fairness. There are even laws, such as the Access to Information Act and the General Data Protection Law (LGPD), that guarantee access to information and protect personal data.
Tutorial: Simple Data Analysis with Python + Google Colab
We’ll perform a simple Data Analysis using Python, Pandas, Matplotlib, and Google Colab 🚀
1. Access Google Colab
- Open: Google Colab
- You need a Google account.
- It will open a new page with a blank notebook.
2. Rename the notebook
- At the top, rename the file to lesson1.ipynb.
- You can save it on Google Drive, GitHub, or locally.
3. Project setup
- On the left sidebar, click the folder icon to view project files.
- You can upload datasets directly, or mount Google Drive.
To mount Google Drive, create a new code cell and run:
from google.colab import drive
drive.mount('/content/drive')
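Authorize access when prompted. Recent versions of Colab open a pop-up asking you to choose a Google account and grant permission; older versions ask you to follow a link and paste the generated token.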
Once complete, you’ll see:
Mounted at /content/drive
Now you can upload datasets to your Google Drive and access them in Colab.
Example:
import pandas as pd
df = pd.read_csv('/content/drive/My Drive/Datasets/imdb-reviews-pt-br.csv')
df.head()
4. Download open data (INEP/ENEM)
We’ll use open data from INEP (the National Institute for Educational Studies and Research Anísio Teixeira):
- Download the .zip file and extract it.
- Identify the datasets we want to analyze (we’ll start with microdata).
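The extraction step can also be done from a notebook cell. Below is a minimal sketch using Python's standard zipfile module. The helper name extract_zip and the paths in the usage comment are hypothetical placeholders, so substitute the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, target_dir):
    """Extract every file in zip_path into target_dir and return the CSV paths found."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Look for CSV files anywhere under the extraction folder.
    return sorted(target.rglob("*.csv"))


# Hypothetical usage (file and folder names are placeholders, not INEP's actual names):
# csv_files = extract_zip("/content/microdata.zip", "/content/enem")
# print(csv_files)
```

Returning the list of CSV paths makes it easier to spot which extracted file holds the microdata before passing it to Pandas.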
5. Import data with Pandas
import pandas as pd
# Example with CSV file
microdata = pd.read_csv("path-to-file.csv", sep=";", encoding="ISO-8859-1")
microdata.head()
You’ll see the first rows of the dataset displayed as a Pandas DataFrame (a table with rows and columns).
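Microdata files can be large, so loading only the columns and rows you need may help if the notebook runs short on memory. The sketch below uses the usecols and nrows parameters of pd.read_csv. It reads from a tiny in-memory sample with made-up rows (including a hypothetical OTHER_COLUMN) so that it runs without the real file.

```python
import io

import pandas as pd

# Tiny made-up sample that mimics the file layout: ";" separator, ISO-8859-1 encoding.
# OTHER_COLUMN is a hypothetical column used only to show that it gets skipped.
sample_bytes = (
    "NO_MUNICIPIO_PROVA;TP_FAIXA_ETARIA;TP_SEXO;OTHER_COLUMN\n"
    "São Paulo;3;F;x\n"
    "Belém;5;M;y\n"
    "São Paulo;2;F;z\n"
).encode("ISO-8859-1")

wanted = ["NO_MUNICIPIO_PROVA", "TP_FAIXA_ETARIA", "TP_SEXO"]

subset = pd.read_csv(
    io.BytesIO(sample_bytes),  # replace with the path to the real CSV file
    sep=";",
    encoding="ISO-8859-1",
    usecols=wanted,  # load only these columns
    nrows=2,         # load only the first 2 data rows
)
print(subset)
```

With the real file, dropping nrows and keeping usecols loads every row but only the selected columns.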
6. Exploring the dataset
Check the column names:
microdata.columns.values
Select a few relevant columns to analyze:
selected = microdata.filter(items=["NO_MUNICIPIO_PROVA", "TP_FAIXA_ETARIA", "TP_SEXO"])
selected.head()
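Before counting anything, it can help to check the size of the table, the column types, and how many values are missing. Here is a minimal sketch on a made-up DataFrame named sample (a stand-in for selected, with invented rows):

```python
import pandas as pd

# Made-up rows for illustration; with real data, use the `selected` DataFrame instead.
sample = pd.DataFrame(
    {
        "NO_MUNICIPIO_PROVA": ["São Paulo", "Belém", None],
        "TP_FAIXA_ETARIA": [3, 5, 2],
        "TP_SEXO": ["F", "M", "F"],
    }
)

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # data type of each column

# Number of missing values in each column.
missing = sample.isna().sum()
print(missing)
```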
7. Simple analysis examples
Count participants per test municipality:
selected["NO_MUNICIPIO_PROVA"].value_counts()
Count by age group:
selected["TP_FAIXA_ETARIA"].value_counts()
Count by gender:
selected["TP_SEXO"].value_counts()
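The counts above can be extended in two small ways: as proportions, and as a cross-tabulation of two columns. The sketch below uses a made-up DataFrame named sample in place of selected, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up rows for illustration; with real data, use the `selected` DataFrame instead.
sample = pd.DataFrame(
    {
        "TP_FAIXA_ETARIA": [2, 3, 3, 5],
        "TP_SEXO": ["F", "F", "M", "F"],
    }
)

# Share of each gender value instead of raw counts.
gender_share = sample["TP_SEXO"].value_counts(normalize=True)
print(gender_share)

# Cross-tabulation: rows are age-group codes, columns are gender values.
age_by_gender = pd.crosstab(sample["TP_FAIXA_ETARIA"], sample["TP_SEXO"])
print(age_by_gender)
```

Proportions are often easier to compare across municipalities or years than raw counts, since the number of participants varies.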
8. Data visualization with Matplotlib
import matplotlib.pyplot as plt
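# Age-group distribution (TP_FAIXA_ETARIA holds age-group codes, not ages;
# see INEP's data dictionary for what each code means)
selected["TP_FAIXA_ETARIA"].value_counts().sort_index().plot(kind="bar")
plt.title("Age Group Distribution of ENEM Students")
plt.xlabel("Age Group (code)")
plt.ylabel("Count")
plt.show()
# Gender distribution (TP_SEXO is categorical, so a bar chart fits better than a histogram)
selected["TP_SEXO"].value_counts().plot(kind="bar")
plt.title("Gender Distribution of ENEM Students")
plt.xlabel("Gender")
plt.ylabel("Count")
plt.show()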
⚠️ Note: ENEM only records binary gender (M/F). This limitation highlights the importance of public policies to include broader gender options.
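The municipality counts from step 7 can be charted as well. Since there are many municipalities, one option is to plot only the most frequent ones. This is a minimal sketch on a made-up DataFrame named sample; the value of top_n is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration; with real data, use the `selected` DataFrame instead.
sample = pd.DataFrame(
    {
        "NO_MUNICIPIO_PROVA": [
            "São Paulo", "Belém", "São Paulo", "Recife", "Belém", "São Paulo",
        ]
    }
)

top_n = 2  # arbitrary; something like 10 may suit the real dataset better

# value_counts() sorts from most to least frequent, so head() keeps the top entries.
top_municipalities = sample["NO_MUNICIPIO_PROVA"].value_counts().head(top_n)

# Horizontal bars keep long municipality names readable.
ax = top_municipalities.sort_values().plot(kind="barh")
ax.set_title("Most Frequent Test Municipalities")
ax.set_xlabel("Count")
ax.set_ylabel("Municipality")
plt.tight_layout()
plt.show()
```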
Conclusion
This was a basic tutorial showing how to use open public data with Python and Google Colab.
We explored:
- How to load data from Google Drive,
- How to explore datasets with Pandas,
- And how to visualize results with Matplotlib.
The use of data analysis is essential for effective public policies. It allows governments, NGOs, and communities to make informed decisions aligned with real needs, while also evaluating results after implementation.
References
- Pandas Documentation
- Matplotlib Documentation
- INEP Open Data
- Free Data Science courses (Gov.br)
- ENAP Research on Data Science in Education
✨ Keep exploring, ask new questions, and share your insights. Data has the power to transform society!