Marathon Data

Two Files, One Picture

We've spent the last four lessons getting both marathon files into shape: renaming columns, removing duplicates, fixing text, converting types. Both DataFrames share a runner_id column. Now we can use that to combine them into one table.

pd.concat() takes a list of DataFrames that have the same columns and stacks them on top of each other.
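
Here is a minimal sketch with two small hypothetical DataFrames (the column name finish_minutes and all values are made up for illustration). It also shows a detail worth knowing: by default pd.concat() keeps each input's original index labels, so the stacked result has repeated labels unless you pass ignore_index=True.

```python
import pandas as pd

# Two small hypothetical DataFrames with identical columns
first = pd.DataFrame({"runner_id": [101, 102], "finish_minutes": [215, 248]})
second = pd.DataFrame({"runner_id": [103, 104], "finish_minutes": [199, 263]})

# Stack the rows; by default each input keeps its own index labels
stacked = pd.concat([first, second])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the combined rows as 0..n-1
stacked = pd.concat([first, second], ignore_index=True)
print(stacked)
```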

Imagine the marathon results got split across two files during export. pd.concat() puts them back together into a single table.
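
A sketch of that repair, assuming the export produced two hypothetical chunks, results_part1 and results_part2, with identical columns (the finish_time values are invented):

```python
import pandas as pd

# Hypothetical: the results export was cut into two chunks with the same columns
results_part1 = pd.DataFrame({
    "runner_id": [1, 2, 3],
    "finish_time": ["03:41:10", "04:05:32", "03:58:47"],
})
results_part2 = pd.DataFrame({
    "runner_id": [4, 5],
    "finish_time": ["04:22:05", "03:35:59"],
})

# Put the pieces back together as one table with a fresh 0..n-1 index
results = pd.concat([results_part1, results_part2], ignore_index=True)

# Sanity check: no rows lost or gained in the round trip
assert len(results) == len(results_part1) + len(results_part2)
print(results)
```

The length check is a cheap way to confirm that the stack neither dropped nor duplicated rows.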

pd.concat() stacks rows. merge() joins two DataFrames side by side using a shared column as the key.

merge() takes the shared key column through on= and the type of join through how=. If how= is left out, it defaults to 'inner'.
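
A sketch of the call forms, using hypothetical left and right tables that share a runner_id column (finish_minutes and age_group are illustrative names). pandas offers both the function pd.merge(left, right, ...) and the method left.merge(right, ...); they produce the same result.

```python
import pandas as pd

# Hypothetical left and right tables sharing a runner_id key column
left = pd.DataFrame({"runner_id": [1, 2, 3], "finish_minutes": [221, 245, 238]})
right = pd.DataFrame({"runner_id": [2, 3, 4], "age_group": ["M30", "F40", "M50"]})

# Function form: pd.merge(left, right, on=..., how=...)
merged = pd.merge(left, right, on="runner_id", how="inner")

# Method form: left.merge(right, ...) gives the same result
merged_method = left.merge(right, on="runner_id", how="inner")

# how="inner" is the default, so it can be omitted
merged_default = left.merge(right, on="runner_id")

print(merged.equals(merged_method), merged.equals(merged_default))  # True True
```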

An inner merge keeps only the rows whose key appears in both DataFrames; keys found on just one side are dropped.
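
For example, with the hypothetical tables below, runner 1 exists only on the left and runner 4 only on the right, so an inner merge keeps just runners 2 and 3:

```python
import pandas as pd

# Hypothetical tables: runner 1 is only on the left, runner 4 only on the right
left = pd.DataFrame({"runner_id": [1, 2, 3], "finish_minutes": [221, 245, 238]})
right = pd.DataFrame({"runner_id": [2, 3, 4], "age_group": ["M30", "F40", "M50"]})

# Inner merge: only keys present in both tables survive
inner = left.merge(right, on="runner_id", how="inner")
print(inner["runner_id"].tolist())  # [2, 3]
```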

Let's bring our two marathon files together. After deduplicating the results, merge them with the registrations on runner_id.
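
A sketch of that step on hypothetical stand-ins for the cleaned files (the values and the name column are invented; the real DataFrames from earlier lessons will have more columns). Deduplicating first matters: a duplicated results row would match its registration twice and show up twice in the merged table.

```python
import pandas as pd

# Hypothetical cleaned results; runner 2 appears twice as a duplicate row
results = pd.DataFrame({
    "runner_id": [1, 2, 2, 3],
    "finish_minutes": [221, 245, 245, 238],
})
# Hypothetical registrations; runner 3 has no registration row, runner 4 has no result
registrations = pd.DataFrame({
    "runner_id": [1, 2, 4],
    "name": ["Ana", "Ben", "Cai"],
})

# Drop exact duplicate rows first so each runner matches once
results = results.drop_duplicates()

# Inner merge: only runners present in both files remain
combined = results.merge(registrations, on="runner_id", how="inner")
print(combined)
```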

how='left' keeps every row from the left DataFrame, even when its key has no match in the right one. For those rows, the columns that come from the right DataFrame are filled with NaN.
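
A small sketch with hypothetical tables shows the NaN filling. One side effect to watch for: NaN is a float, so an integer column from the right DataFrame (here a made-up bib column) comes back as float64 once any row is missing a match.

```python
import pandas as pd

# Hypothetical tables: runner 1 has no match in the right table
left = pd.DataFrame({"runner_id": [1, 2, 3], "finish_minutes": [221, 245, 238]})
right = pd.DataFrame({"runner_id": [2, 3, 4], "bib": [502, 503, 504]})

# Left merge: all three left rows are kept, runner 4 is dropped
left_merged = left.merge(right, on="runner_id", how="left")
print(left_merged)

# Runner 1's bib is NaN, so the integer bib column is upcast to float
print(left_merged["bib"].dtype)  # float64
```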

With the results on the left and the registrations on the right, a left merge keeps every result row, including runners with no matching registration.
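
A sketch on hypothetical stand-ins for the two files (values and the name column are invented). After a left merge, rows with NaN in a registration column are the runners that had no match, which makes them easy to count:

```python
import pandas as pd

# Hypothetical deduplicated results and registrations
results = pd.DataFrame({"runner_id": [1, 2, 3], "finish_minutes": [221, 245, 238]})
registrations = pd.DataFrame({"runner_id": [1, 2, 4], "name": ["Ana", "Ben", "Cai"]})

# Left merge: every result row survives, matched or not
full = results.merge(registrations, on="runner_id", how="left")

# Runners whose registration columns are NaN had no match
unmatched = full[full["name"].isna()]
print(len(full), len(unmatched))  # 3 1
print(unmatched["runner_id"].tolist())  # [3]
```

As an alternative to checking for NaN, passing indicator=True to merge() adds a _merge column that labels each row as 'left_only', 'right_only', or 'both'.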
