Counting and Categorizing Values

The previous lesson revealed the dataset’s shape and types. Next question: how are the values distributed? If there are 13 industries, are they evenly spread, or does one dominate?

.value_counts() counts how often each value appears in a column:

import pandas as pd
 
df = pd.DataFrame({
    'Color': ['red', 'blue', 'red',
              'green', 'blue', 'red']
})
print(df['Color'].value_counts())


The result is a Series: unique values as the index, counts as the values, sorted from most to least frequent. Missing values (NaN) are left out unless you pass dropna=False.
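Because the result is an ordinary Series, its two halves can be pulled apart and inspected. The sketch below reuses the Color data from above; the variable names (labels, values, shares) are illustrative, and normalize=True is an optional .value_counts() parameter that returns proportions instead of raw counts.

```python
import pandas as pd

df = pd.DataFrame({
    'Color': ['red', 'blue', 'red',
              'green', 'blue', 'red']
})
counts = df['Color'].value_counts()

# The index holds the unique values, the data holds their counts
labels = counts.index.tolist()
values = counts.tolist()
print(labels)  # ['red', 'blue', 'green']
print(values)  # [3, 2, 1]

# normalize=True returns each value's share of the total
shares = df['Color'].value_counts(normalize=True)
print(shares['red'])  # 0.5
```

Proportions answer the "does one dominate?" question directly: here red accounts for half of all rows.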

What will be the output?

import pandas as pd
s = pd.Series(
    ['a', 'b', 'a', 'a']
)
vc = s.value_counts()
print(vc['a'])


Only care about the top entries? Chain .head(n) to keep the first n rows and drop the rest:

import pandas as pd
 
df = pd.DataFrame({
    'Color': ['red', 'blue', 'red',
              'green', 'blue', 'red']
})
counts = df['Color'].value_counts()
print(counts.head(2))
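One possible use of .head(n), sketched here under the assumption that you want to know how much of the data the top entries cover: sum the sliced counts and divide by the total. The names top2 and coverage are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Color': ['red', 'blue', 'red',
              'green', 'blue', 'red']
})
counts = df['Color'].value_counts()

# The two most frequent colors: red (3) and blue (2)
top2 = counts.head(2)

# Fraction of all rows that belong to those two colors
coverage = top2.sum() / counts.sum()
print(round(coverage, 2))  # 0.83
```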


What will be the output?

import pandas as pd
s = pd.Series([
    'x','y','z','x','y','x'
])
vc = s.value_counts().head(2)
print(len(vc))


To get just the name of the top value, not the count, use .index[0]:

import pandas as pd
 
df = pd.DataFrame({
    'Color': ['red', 'blue', 'red',
              'green', 'blue', 'red']
})
vc = df['Color'].value_counts()
print(vc.index[0])


Since .value_counts() sorts by frequency, .index[0] is the most common value and .index[-1] the rarest (or one of them, if several values tie).
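When counts tie, a single index position cannot name every winner. A minimal sketch of one way to collect all values that share the highest count, by comparing each count with the maximum; the sample data and the name top_values are made up for illustration.

```python
import pandas as pd

s = pd.Series(['cat', 'dog', 'cat', 'dog', 'bird'])
vc = s.value_counts()

# Keep every value whose count equals the highest count,
# then sort the labels so the result is predictable
top_values = sorted(vc[vc == vc.max()].index)
print(top_values)  # ['cat', 'dog']
```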

What will be the output?

import pandas as pd
s = pd.Series([
    'cat','dog','cat','bird'
])
vc = s.value_counts()
print(vc.index[0])


Because the results are sorted high-to-low, .tail(n) shows the n least frequent values, which is useful for spotting underrepresented categories.

For example, the rarest color in this dataset:

import pandas as pd
 
df = pd.DataFrame({
    'Color': ['red', 'blue', 'red',
              'green', 'blue', 'red']
})
vc = df['Color'].value_counts()
print(vc.tail(1))
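A related sketch, not part of the original example: to list every category below some cutoff rather than a fixed number of rows, filter the counts with a boolean mask. The cutoff of 2 and the name rare are arbitrary choices for illustration; ascending=True is an optional .value_counts() parameter that reverses the sort order.

```python
import pandas as pd

df = pd.DataFrame({
    'Color': ['red', 'blue', 'red',
              'green', 'blue', 'red']
})
vc = df['Color'].value_counts()

# Keep only the categories that appear fewer than 2 times
rare = vc[vc < 2]
print(rare.index.tolist())  # ['green']

# ascending=True flips the sort, so the rarest value comes first
rarest_first = df['Color'].value_counts(ascending=True)
print(rarest_first.index[0])  # green
```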


What will be the output?

import pandas as pd
s = pd.Series(
    ['b', 'a', 'a', 'b', 'b']
)
vc = s.value_counts()
print(vc.index.tolist())
