Data Analysis Fundamentals

Counting and Categorizing Values

The dataset has 13 different industries. Are they evenly spread, or does one dominate? .value_counts() answers that in a single line.


The previous lesson revealed the dataset’s shape and types. Next question: how are the values distributed? With 13 industries in the data, are they evenly spread, or does one dominate?

.value_counts() counts how often each value appears in a column.


The result is a Series: unique values as the index, counts as the values, sorted from most to least frequent.
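Here is a minimal sketch of that behavior, assuming pandas is installed. The small DataFrame and its industry column are made up for illustration; they are not the lesson's dataset.

```python
import pandas as pd

# Hypothetical sample data; the lesson's real dataset is not shown here.
df = pd.DataFrame({
    "industry": ["Tech", "Retail", "Tech", "Finance", "Tech", "Retail"]
})

# Count how often each industry appears in the column.
counts = df["industry"].value_counts()

print(type(counts).__name__)   # Series
print(counts.index.tolist())   # ['Tech', 'Retail', 'Finance']
print(counts.tolist())         # [3, 2, 1]
```

The unique values end up in the index and the counts in the values, with the most frequent entry first.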



Only care about the top entries? Chain .head(n) to keep just the first n rows and slice off the rest.
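A short sketch of chaining .head(n) onto .value_counts(), again on made-up data (the industries Series below is an assumption, not the lesson's dataset):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
industries = pd.Series(
    ["Tech", "Tech", "Tech", "Tech",
     "Retail", "Retail", "Retail",
     "Finance", "Finance",
     "Energy"],
    name="industry",
)

# Count every value, then keep only the two most frequent.
top_two = industries.value_counts().head(2)

print(top_two.index.tolist())  # ['Tech', 'Retail']
print(top_two.tolist())        # [4, 3]
```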



To get just the name of the top value, not the count, use .index[0].


Since .value_counts() sorts by frequency in descending order by default, .index[0] is the most common value and .index[-1] the rarest. If several values tie for a position, you get just one of the tied values.
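To make the index lookups concrete, here is a small sketch on made-up data (the industries Series is hypothetical). It also reads the matching count with .iloc[0], which is standard pandas positional indexing rather than something this lesson introduces.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
industries = pd.Series(
    ["Tech", "Tech", "Tech", "Retail", "Retail", "Finance"]
)

counts = industries.value_counts()

most_common = counts.index[0]   # label of the most frequent value
rarest = counts.index[-1]       # label of the least frequent value
top_count = counts.iloc[0]      # the count that goes with most_common

print(most_common)  # Tech
print(rarest)       # Finance
print(top_count)    # 3
```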



Because the results are sorted high-to-low, .tail(n) shows the least frequent values. Useful for spotting underrepresented categories.

For example, this can surface the rarest color in this dataset.
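A sketch of the idea, assuming a color column like the one mentioned above. The values below are invented, so the actual rarest color in the lesson's dataset may differ.

```python
import pandas as pd

# Hypothetical color data, invented for illustration.
colors = pd.Series(
    ["red", "red", "red", "red",
     "blue", "blue", "blue",
     "green", "green",
     "teal"],
    name="color",
)

counts = colors.value_counts()

# The last rows of the sorted counts are the least frequent values.
least_frequent = counts.tail(2)
print(least_frequent.index.tolist())  # ['green', 'teal']
print(least_frequent.tolist())        # [2, 1]

# The single rarest color, as a label.
rarest_color = counts.tail(1).index[0]
print(rarest_color)                   # teal
```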

