Marathon Data

Numbers That Look Like Text

We already explored the marathon registrations file in lesson 1. It looked clean, but pandas stored the registered_date column as plain text. As text, the dates can't be reliably sorted in chronological order or subtracted from one another until you fix the type. This lesson covers three tools for that: pd.to_datetime(), astype(), and pd.to_numeric().


Compared to the results file, the registrations file looked clean. But 'looking clean' is not the same as having the right types. Let's check what pandas actually stored.

Let's check the registered_date column in our marathon registrations. The values look like dates, but check the dtype:
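
Here is a minimal sketch of that check. The three sample rows are made up for illustration (the real registrations file is not reproduced here), and only the column names runner_id and registered_date come from the lesson.

```python
import pandas as pd

# Hypothetical sample rows standing in for the registrations file.
registrations = pd.DataFrame({
    "runner_id": [101, 102, 103],
    "registered_date": ["2024-01-15", "2024-02-03", "2024-03-22"],
})

# A text column reports a string-like dtype (object in many pandas versions),
# not a datetime64 dtype.
print(registrations["registered_date"].dtype)

# A version-independent way to ask "is this column a real datetime?"
is_datetime = pd.api.types.is_datetime64_any_dtype(registrations["registered_date"])
print(is_datetime)
```

The second print shows False: the values look like dates, but pandas is treating them as text.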

pd.to_datetime() converts a string column to a proper datetime type:
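
As a rough illustration, the sketch below converts a small, made-up registered_date column and then subtracts two dates, which only works after the conversion. The sample values are assumptions, not rows from the actual file.

```python
import pandas as pd

# Hypothetical sample data; dates are stored as text to begin with.
registrations = pd.DataFrame({
    "registered_date": ["2024-01-15", "2024-02-03", "2024-03-22"],
})

# Replace the text column with a real datetime column.
registrations["registered_date"] = pd.to_datetime(registrations["registered_date"])
print(registrations["registered_date"].dtype)

# Date arithmetic now works: days between the first and last registration.
span = registrations["registered_date"].max() - registrations["registered_date"].min()
print(span.days)  # 67
```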

Once a column is datetime, the .dt accessor unlocks date and time properties like month, year, day, and weekday.

Now that registered_date is a real datetime, we can pull out parts of it. Let's extract the month and year:
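
One way this could look, again on made-up sample dates. The new column names reg_month and reg_year are hypothetical, chosen only for this sketch.

```python
import pandas as pd

# Hypothetical sample data, converted to datetime up front.
registrations = pd.DataFrame({
    "registered_date": pd.to_datetime(["2024-01-15", "2024-02-03", "2024-03-22"]),
})

# The .dt accessor exposes the parts of each date as integer columns.
registrations["reg_month"] = registrations["registered_date"].dt.month
registrations["reg_year"] = registrations["registered_date"].dt.year

print(registrations["reg_month"].tolist())  # [1, 2, 3]
print(registrations["reg_year"].tolist())   # [2024, 2024, 2024]
```

Calling .dt on a column that is still text raises an error, so the conversion has to come first.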

astype() converts a column to any compatible type:
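
A small sketch of the idea, using a hypothetical age column that was read in as floats even though every value is a whole number. The column and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical column: whole-number ages that were stored as floats.
runners = pd.DataFrame({"age": [34.0, 27.0, 41.0]})

# astype() returns a converted copy, so assign it back to the column.
runners["age"] = runners["age"].astype("int64")

print(runners["age"].dtype)     # int64
print(runners["age"].tolist())  # [34, 27, 41]
```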

In our marathon data, runner_id is stored as a number. But it's really an identifier: you'd never add two runner IDs together. Let's convert it:
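
The sketch below assumes the target type is a string, which is the usual choice for identifiers. The sample IDs are made up.

```python
import pandas as pd

# Hypothetical runner IDs, stored as integers.
registrations = pd.DataFrame({"runner_id": [101, 102, 103]})

# Treat the IDs as labels rather than quantities.
registrations["runner_id"] = registrations["runner_id"].astype(str)

print(registrations["runner_id"].tolist())  # ['101', '102', '103']
```

With the IDs stored as text, an accidental sum or mean on the column no longer produces a meaningless number.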

Sometimes numeric columns contain invalid entries like 'N/A'. astype(int) would raise an error. Use pd.to_numeric(errors='coerce') instead.

errors='coerce' turns any value that cannot be converted into NaN:
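
A minimal sketch, assuming a hypothetical age column in which one entry was typed as 'N/A'. The values are invented for illustration.

```python
import pandas as pd

# Hypothetical text column with one invalid entry.
ages = pd.Series(["34", "27", "N/A", "41"])

# Valid entries become numbers; the invalid one becomes NaN.
ages_numeric = pd.to_numeric(ages, errors="coerce")

print(ages_numeric.isna().sum())            # 1
print(ages_numeric.dropna().tolist())       # [34.0, 27.0, 41.0]
```

The result is a float column, because NaN is a floating-point value. Counting the NaN entries afterwards is a quick way to see how many values failed to convert.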
