Counting and Categorizing Values
The previous lesson revealed the dataset’s shape and types. The next question is how the values are distributed: the dataset has 13 different industries, so are they evenly spread, or does one dominate? .value_counts() answers that in a single line.
The .value_counts() method counts how often each value appears in a column.
The result is a Series: unique values as the index, counts as the values, sorted from most to least frequent.
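A minimal sketch of the call, assuming a DataFrame named df with an "industry" column. The ten rows are made up for illustration and are not the lesson's dataset (which has 13 industries).

```python
import pandas as pd

# Hypothetical stand-in data; the lesson's real dataset is not reproduced here
df = pd.DataFrame({
    "industry": ["Tech", "Finance", "Tech", "Retail", "Tech",
                 "Finance", "Health", "Retail", "Tech", "Finance"],
})

# Count each unique value; the result is sorted from most to least frequent
counts = df["industry"].value_counts()
print(counts)  # Tech 4, Finance 3, Retail 2, Health 1
```

The industry labels become the index and the counts become the values. In pandas 2.x the returned Series is named "count".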
Only care about the top entries? Chain .head(n) to keep the first n rows and drop the rest.
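A short sketch of the chain, again on a small hypothetical df whose "industry" column and values are assumed for illustration only.

```python
import pandas as pd

# Hypothetical data; the "industry" column name and values are assumptions
df = pd.DataFrame({
    "industry": ["Tech", "Finance", "Tech", "Retail", "Tech",
                 "Finance", "Health", "Retail", "Tech", "Finance"],
})

# value_counts() is already sorted high-to-low, so head(2) keeps the two most common
top_two = df["industry"].value_counts().head(2)
print(top_two)  # Tech 4, Finance 3
```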
To get just the top value itself rather than its count, use .index[0].
Since .value_counts() sorts by frequency, .index[0] is the most common value and .index[-1] the rarest. If several values tie for first or last place, you get only one of them.
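A minimal sketch of reading labels out of the index, using a made-up "industry" column (hypothetical data, chosen so no two counts tie).

```python
import pandas as pd

# Hypothetical data; counts are 4, 3, 2, 1 so there are no ties
df = pd.DataFrame({
    "industry": ["Tech", "Finance", "Tech", "Retail", "Tech",
                 "Finance", "Health", "Retail", "Tech", "Finance"],
})

counts = df["industry"].value_counts()

# Labels live in the index, so position 0 is the most frequent value
most_common = counts.index[0]
# ...and the last position is the least frequent
rarest = counts.index[-1]
print(most_common, rarest)  # Tech Health
```

For the top label alone, counts.idxmax() returns the same result.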
Because the results are sorted from most to least frequent, .tail(n) shows the least frequent values, which is useful for spotting underrepresented categories.
For example, it can reveal the rarest color in this dataset.
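Here is a sketch of that idea on a hypothetical "color" column; the column name and values are assumptions, not the lesson's data.

```python
import pandas as pd

# Hypothetical data: red 4, blue 3, green 2, yellow 1
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "red",
              "blue", "red", "blue", "yellow", "green"],
})

counts = df["color"].value_counts()

# The Series is sorted high-to-low, so tail(2) keeps the two least frequent colors
bottom_two = counts.tail(2)
print(bottom_two)  # green 2, yellow 1

# Just the name of the rarest color
rarest_color = counts.index[-1]
print(rarest_color)  # yellow
```

Alternatively, value_counts(ascending=True).head(2) returns the same two colors in the opposite order.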