Data Analysis Fundamentals

Visualizing Analysis Results

The Matplotlib series covered plotting as a standalone skill. Here, the data comes from real analysis: value_counts() results, groupby() results, and DataFrame columns feed directly into plots. A value_counts() result becomes a bar chart, and a groupby() mean becomes a comparison across groups.

Every pandas Series already has an index (labels) and values (numbers). That is exactly what matplotlib needs:

import pandas as pd
 
sales = pd.Series(
    [120, 85, 200],
    index=['Mon', 'Wed', 'Fri']
)
print(sales.index.tolist())
print(sales.values.tolist())

Output:

['Mon', 'Wed', 'Fri']
[120, 85, 200]

That index and values pair plugs straight into plt.bar():

import pandas as pd
import matplotlib.pyplot as plt
 
sales = pd.Series(
    [120, 85, 200],
    index=['Mon', 'Wed', 'Fri']
)
plt.bar(sales.index, sales.values)
plt.title('Sales by Day')
plt.show()

Long category names overlap on the x-axis. plt.barh() draws horizontal bars instead, so each label gets its own row on the y-axis.

It takes the same two arguments as plt.bar(); only the orientation changes:

import pandas as pd
import matplotlib.pyplot as plt
 
sales = pd.Series(
    [120, 85, 200],
    index=['Mon', 'Wed', 'Fri']
)
plt.barh(
    sales.index, sales.values
)
plt.title('Sales by Day')
plt.show()
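
The day names above are short, so the benefit is easier to see with longer labels. A minimal sketch, assuming a made-up Series of survey counts (the responses name and its values are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical survey counts with long category labels
responses = pd.Series(
    [42, 31, 18],
    index=['Very satisfied with service',
           'Neither satisfied nor dissatisfied',
           'Very dissatisfied with service']
)
# With barh, the labels run down the y-axis, one per row
plt.barh(responses.index, responses.values)
plt.title('Survey Responses')
plt.tight_layout()  # leave room for the long labels
plt.show()
```

Swapping plt.barh() back to plt.bar() in this sketch should show the labels crowding each other along the x-axis.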

Large numbers like 3500000 make matplotlib switch the axis to scientific notation (tick labels such as 3.5 with a 1e6 multiplier at the end of the axis), which is hard to read. Dividing by 1_000_000 before plotting and labeling the axis "Million USD" fixes this.
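
Here is one way the rescaling could look. The revenue figures and region names are made up for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical revenue figures in USD (made up for illustration)
revenue = pd.Series(
    [3500000, 1200000, 2800000],
    index=['North', 'South', 'West']
)
# Rescale to millions so the tick labels are small, readable numbers
revenue_m = revenue / 1_000_000

plt.bar(revenue_m.index, revenue_m.values)
plt.ylabel('Million USD')
plt.title('Revenue by Region')
plt.show()
```

The division happens on the whole Series at once, so the index labels stay attached and the plotting call does not change.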

What will be the output?

import pandas as pd
s = pd.Series(
    [3, 1, 2],
    index=['A', 'B', 'C']
)
print(s.index.tolist())

Remember value_counts() from the earlier lesson? Its result is already a Series, with categories as the index and counts as the values:

import pandas as pd
 
data = {'Color': [
    'Red','Blue','Red',
    'Green','Blue','Red'
]}
df = pd.DataFrame(data)
vc = df['Color'].value_counts()
print(vc)

Output (pandas also prints name and dtype lines, which vary by version):

Red      3
Blue     2
Green    1

That value_counts() Series feeds directly into plt.bar(). No reshaping needed:

import pandas as pd
import matplotlib.pyplot as plt
 
data = {'Color': [
    'Red','Blue','Red',
    'Green','Blue','Red'
]}
df = pd.DataFrame(data)
vc = df['Color'].value_counts()
 
plt.bar(vc.index, vc.values)
plt.title('Color Frequency')
plt.show()

The same pattern works for groupby() results. Aggregated data is already a plottable Series:

import pandas as pd
import matplotlib.pyplot as plt
 
data = {'Team': ['A','B','A','B'],
        'Score': [80, 60, 90, 70]}
df = pd.DataFrame(data)
avg = df.groupby('Team')['Score'].mean()
 
plt.bar(avg.index, avg.values)
plt.title('Avg Score by Team')
plt.show()

What will be the output?

import pandas as pd
s = pd.Series(
    [10, 20, 30, 40, 50]
)
print(s.max() - s.min())

To check whether two numeric columns are related, pass them to plt.scatter():

import pandas as pd
import matplotlib.pyplot as plt
 
data = {'Hours': [2, 4, 6, 8],
        'Score': [50, 60, 75, 90]}
df = pd.DataFrame(data)
 
plt.scatter(df['Hours'],
            df['Score'])
plt.xlabel('Hours')
plt.ylabel('Score')
plt.show()

Choosing the right chart: bar chart → compare categories, histogram → see how values are spread, scatter plot → spot relationships between two numbers.
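
Bar and scatter charts appear above. For the histogram, a minimal sketch using plt.hist() might look like this, assuming a made-up Series of exam scores (the scores name and its values are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical exam scores (made up for illustration)
scores = pd.Series(
    [52, 58, 61, 64, 67, 70, 71, 73, 78, 85, 91]
)
# bins=5 splits the range from min to max into 5 equal-width intervals
counts, edges, _ = plt.hist(scores, bins=5)
plt.xlabel('Score')
plt.ylabel('Count')
plt.title('Score Distribution')
plt.show()
```

Unlike plt.bar(), plt.hist() takes the raw values and does the counting itself. It returns the count per bin and the bin edges alongside the drawn bars.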

What will be the output?

import pandas as pd
df = pd.DataFrame({
    'Team': ['A','B','A','B'],
    'Score': [80, 60, 90, 70]
})
g = df.groupby('Team')
r = g['Score'].mean()
print(r['B'])

What will be the output?

import pandas as pd
s = pd.Series(
    [50, 30],
    index=['Team A', 'Team B']
)
print(s['Team A'])

What will be the output?

import pandas as pd
s = pd.Series(
    ['a', 'b', 'a', 'c']
)
vc = s.value_counts()
print(vc.index[0], vc.values[0])
