Numbers That Look Like Text
We already explored the marathon registrations file in lesson 1. It looked clean, but pandas stored the registered_date column as plain text. As text, the column sorts character by character rather than as dates, and you can't compute differences between dates until you fix the type. pd.to_datetime(), astype(), and pd.to_numeric() handle these conversions.
Compared to the results, the registrations file looked clean. But looking clean is not the same as having the right types, so let's check what pandas actually stored.
Start with the registered_date column in our marathon registrations. The values look like dates, but look at the dtype:
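A minimal sketch of that check. The lesson's actual file is not reproduced here, so the three rows below are made-up stand-ins and only the column names runner_id and registered_date come from the lesson. The exact dtype name printed for text depends on the pandas version, so the sketch also asks pandas directly whether the column is a datetime.

```python
import pandas as pd

# Hypothetical stand-in for the marathon registrations file (values are invented).
registrations = pd.DataFrame({
    "runner_id": [101, 102, 103],
    "registered_date": ["2024-01-15", "2024-02-03", "2024-03-20"],
})

# The values look like dates, but the dtype shows how pandas stored them.
print(registrations["registered_date"].dtype)

# Ask pandas explicitly whether this column is a datetime column.
is_datetime = pd.api.types.is_datetime64_any_dtype(registrations["registered_date"])
print(is_datetime)

# Each individual value is still an ordinary Python string.
first_value = registrations["registered_date"].iloc[0]
print(type(first_value))
```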
pd.to_datetime() converts a string column to a proper datetime type:
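As a sketch, assuming the dates are stored in year-month-day form (the sample values below are invented, not taken from the lesson's file):

```python
import pandas as pd

# Hypothetical sample of the registered_date column, stored as text.
registrations = pd.DataFrame({
    "registered_date": ["2024-01-15", "2024-02-03", "2024-03-20"],
})

# Replace the text column with a real datetime column.
registrations["registered_date"] = pd.to_datetime(registrations["registered_date"])

print(registrations["registered_date"].dtype)

# With a datetime column, date arithmetic works: time between first and last registration.
span = registrations["registered_date"].max() - registrations["registered_date"].min()
print(span)
```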
What will be the output?
Once a column is datetime, the .dt accessor unlocks date and time properties like month, year, day, and weekday.
Now that registered_date is a real datetime, we can pull out parts of it. Let's extract the month and year:
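One way this could look, using invented sample dates. The new column names reg_month and reg_year are placeholders chosen for this sketch, not names from the lesson.

```python
import pandas as pd

# Hypothetical sample, converted to datetime first so that .dt is available.
registrations = pd.DataFrame({
    "registered_date": pd.to_datetime(["2024-01-15", "2024-02-03", "2024-03-20"]),
})

# .dt exposes the parts of each date as ordinary integer columns.
registrations["reg_month"] = registrations["registered_date"].dt.month
registrations["reg_year"] = registrations["registered_date"].dt.year

print(registrations)
```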
What will be the output?
astype() converts a column to any compatible type:
In our marathon data, runner_id is stored as a number. But it's really an identifier: you'd never add two runner IDs together. Let's convert it to text:
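A short sketch of the conversion, assuming integer IDs like the invented ones below:

```python
import pandas as pd

# Hypothetical runner IDs, read in as integers.
registrations = pd.DataFrame({"runner_id": [101, 102, 103]})
print(registrations["runner_id"].dtype)

# Treat the IDs as labels rather than quantities by converting them to text.
registrations["runner_id"] = registrations["runner_id"].astype(str)

print(registrations["runner_id"].dtype)
print(registrations["runner_id"].tolist())
```

After the conversion, operations such as sum() or mean() no longer treat the IDs as quantities, which is the point of the change.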
What will be the output?
Sometimes numeric columns contain invalid entries like 'N/A'. Calling astype(int) on such a column would raise an error. Use pd.to_numeric() with errors='coerce' instead.
errors='coerce' turns any value that cannot be converted into NaN:
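A minimal sketch of the idea. The results frame and its finish_minutes column are hypothetical, invented here only to give pd.to_numeric() a column with one bad entry.

```python
import pandas as pd

# Hypothetical numeric column that arrived as text, with one invalid entry.
results = pd.DataFrame({"finish_minutes": ["245", "198", "N/A", "312"]})

# Values that cannot be parsed as numbers become NaN instead of raising an error.
results["finish_minutes"] = pd.to_numeric(results["finish_minutes"], errors="coerce")

print(results["finish_minutes"])

# Count how many entries were coerced to missing values.
missing_count = int(results["finish_minutes"].isna().sum())
print(missing_count)
```

Counting the missing values afterwards is a quick way to see how many entries were invalid before deciding whether to drop or fill them.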
What will be the output?