Two Files, One Picture
We've spent the last four lessons getting both marathon files into shape: renaming columns, removing duplicates, fixing text, and converting types. Both DataFrames share a runner_id column, and now we can use it to combine them into one table. Two tools do the combining: pd.concat() stacks DataFrames on top of each other, and merge() joins them side by side on a shared key.
pd.concat() stacks DataFrames with the same columns on top of each other:
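Here is a minimal sketch of that stacking. The two small DataFrames and their finish_time values are made up for illustration; only runner_id comes from our marathon files.

```python
import pandas as pd

# Hypothetical sample data: two DataFrames with identical columns.
first_half = pd.DataFrame({"runner_id": [101, 102], "finish_time": [3.52, 4.10]})
second_half = pd.DataFrame({"runner_id": [103, 104], "finish_time": [3.75, 5.02]})

# Stack the rows. ignore_index=True renumbers the index from 0
# instead of keeping each input's original 0, 1 labels.
stacked = pd.concat([first_half, second_half], ignore_index=True)
print(stacked)
```

Without ignore_index=True, the stacked table would keep the index labels 0, 1, 0, 1 from the two inputs.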
What will be the output?
Imagine the marathon results got split across two files during export. pd.concat() puts them back together:
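One way this could look, as a sketch: the two io.StringIO objects below stand in for the exported files, and their contents are invented for the example. With real files you would pass file paths to pd.read_csv() instead.

```python
import io
import pandas as pd

# Stand-ins for two exported CSV files (hypothetical contents).
part_1 = io.StringIO("runner_id,finish_time\n101,3:52:10\n102,4:10:45\n")
part_2 = io.StringIO("runner_id,finish_time\n103,3:45:02\n")

# Read each part, then stack them into one results table.
parts = [pd.read_csv(part_1), pd.read_csv(part_2)]
results = pd.concat(parts, ignore_index=True)

print(len(results))  # 3 rows: 2 from the first part + 1 from the second
```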
pd.concat() stacks rows. merge() joins two DataFrames side by side using a shared column as the key.
merge() takes an on= parameter that names the key column and a how= parameter that chooses the join type:
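A short sketch of the call shape. The table contents and the name column are assumptions made for the example.

```python
import pandas as pd

# Hypothetical tables that share a runner_id column.
results = pd.DataFrame({"runner_id": [101, 102], "finish_time": [3.52, 4.10]})
registrations = pd.DataFrame({"runner_id": [101, 102], "name": ["Ana", "Ben"]})

# General shape: left.merge(right, on=<key column>, how=<join type>)
# on=  names the shared key column
# how= is commonly 'inner' (the default), 'left', 'right', or 'outer'
merged = results.merge(registrations, on="runner_id", how="inner")
print(merged.columns.tolist())  # ['runner_id', 'finish_time', 'name']
```

The key column appears once in the result, followed by the remaining columns from the left table and then those from the right.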
An inner merge keeps only rows where the key appears in both DataFrames:
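To make that concrete, here is a small sketch with invented data in which the two tables only partly overlap.

```python
import pandas as pd

# Hypothetical data: keys 102 and 103 appear in both tables,
# while 101 and 104 each appear in only one.
results = pd.DataFrame(
    {"runner_id": [101, 102, 103], "finish_time": [3.52, 4.10, 3.75]}
)
registrations = pd.DataFrame(
    {"runner_id": [102, 103, 104], "name": ["Ben", "Cal", "Dee"]}
)

# Inner merge: a row survives only if its runner_id is in both tables.
inner = results.merge(registrations, on="runner_id", how="inner")
print(inner)
```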
What will be the output?
Let's bring our two marathon files together. After deduplicating the results, merge them with the registrations on runner_id:
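The two steps might look like the sketch below. The real marathon files are not reproduced here, so the rows and the name column are placeholders; only runner_id is taken from the lesson.

```python
import pandas as pd

# Placeholder results with one fully duplicated row (runner 102).
results = pd.DataFrame(
    {"runner_id": [101, 102, 102, 103], "finish_time": [3.52, 4.10, 4.10, 3.75]}
)
registrations = pd.DataFrame(
    {"runner_id": [101, 102, 103], "name": ["Ana", "Ben", "Cal"]}
)

# Step 1: drop rows that are exact duplicates of an earlier row.
results_clean = results.drop_duplicates()

# Step 2: join the cleaned results to registrations on the shared key.
marathon = results_clean.merge(registrations, on="runner_id")
print(marathon.shape)  # (3, 3)
```

Deduplicating first matters: a duplicated runner_id in the results would otherwise produce a duplicated row in the merged table.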
how='left' keeps all rows from the left DataFrame, even when no match exists in the right. For those unmatched rows, the columns that come from the right DataFrame are filled with NaN.
A left merge keeps every result row, even runners without a registration match:
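A small sketch of that behavior, using invented data in which runner 105 has a result but no registration.

```python
import pandas as pd

# Hypothetical data: runner 105 finished the race but is not registered.
results = pd.DataFrame(
    {"runner_id": [101, 102, 105], "finish_time": [3.52, 4.10, 4.48]}
)
registrations = pd.DataFrame({"runner_id": [101, 102], "name": ["Ana", "Ben"]})

# Left merge: every row of results is kept, matched or not.
left = results.merge(registrations, on="runner_id", how="left")
print(left)
```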
What will be the output?