The definition of class width is:

**“The class width is the difference between the upper or lower class limits of consecutive classes in a bin frequency table”.**

In this topic, we will discuss the class width from the following aspects:

- What is the class width?
- How to find the class width?
- Class width formula.
- Role of class width.
- Practice questions.
- Answers.


## What is the class width?

The class width is the difference between the upper or lower class limits of consecutive classes in a bin frequency table.

A bin frequency table groups values into equal-sized bins, or classes, and each class covers a range of values.

The frequency of each class is the number of data points it contains.

The boundaries of each class are called the lower class limit and the upper class limit, and the class width is the difference between the lower (or upper) limits of successive classes.

All classes should have the same width.

## How to find the class width?

We will go through an example for illustration.

Consider the ages (in years) of 50 participants from a certain survey.

What is the proper class width for a bin frequency table of this data?

There are no hard rules about how many bins to pick, but there are some general guidelines:

- Pick between 5 and 20 classes.
- Make sure you have a few items in each bin. For example, if you have 40 data points, you can choose 5 bins (8 data points per bin), but not 20 bins (which would give you only 2 data points per bin).
- Use the mathematical formula to choose the number of classes.

The formula is log(number of observations) / log(2), rounded up to the next integer.

For this data, log(50)/log(2) = 5.64, which rounds up to 6, so the number of classes should be 6.
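The rule for the number of classes can be written as a short Python sketch. It uses `math.log2`, which computes the same base-2 logarithm as log(n)/log(2); the function name `number_of_classes` is our own and not part of any library.

```python
import math


def number_of_classes(n_observations: int) -> int:
    """Suggested number of classes: log(n)/log(2), rounded up to the next integer.

    math.log2(n) computes log(n)/log(2) directly; the Python docs describe it
    as usually more accurate than dividing two logarithms.
    """
    if n_observations < 2:
        raise ValueError("need at least two observations")
    return math.ceil(math.log2(n_observations))


print(number_of_classes(50))  # 6
```

The same function gives 15 classes for 21407 observations and 12 for 2206, matching the examples later in this topic.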

- Sort the data and subtract the minimum data value from the maximum data value to get the data range.

35 35 37 38 40 42 42 43 43 46 47 48 48 48 48 50 52 53 54 54 54 56 56 56 56 57 58 58 60 62 62 63 64 65 66 67 68 68 69 70 70 70 70 70 70 70 71 72 73 74.

In our age list, the minimum value is 35 and the maximum value is 74, so the data range = 74 – 35 = 39.

- Divide the data range by the number of classes and round the result up to a whole number to get the class width.

Class width = 39/6 = 6.5, rounded up to 7.
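The range and class-width steps can be checked with a few lines of Python, using the sorted ages listed above and the 6 classes chosen earlier. This is a minimal sketch; the variable names are illustrative.

```python
import math

# Sorted ages (in years) of the 50 survey participants listed above.
ages = [35, 35, 37, 38, 40, 42, 42, 43, 43, 46, 47, 48, 48, 48, 48, 50, 52, 53,
        54, 54, 54, 56, 56, 56, 56, 57, 58, 58, 60, 62, 62, 63, 64, 65, 66, 67,
        68, 68, 69, 70, 70, 70, 70, 70, 70, 70, 71, 72, 73, 74]

n_classes = 6                                     # log(50)/log(2) = 5.64, rounded up
data_range = max(ages) - min(ages)                # 74 - 35 = 39
class_width = math.ceil(data_range / n_classes)   # 39 / 6 = 6.5, rounded up to 7

print(data_range, class_width)  # 39 7
```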

- Add the class width, 7, to the minimum value repeatedly (6 times, because we have 6 bins) to create the 6 classes.

35 + 7 = 42, so the first class is 35-42.

42 + 7 = 49, so the next bin is 42-49.

49 + 7 = 56, so the next bin is 49-56.

56 + 7 = 63, so the next bin is 56-63.

63 + 7 = 70, so the next bin is 63-70.

70 + 7 = 77, so the next bin is 70-77.
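Building the classes is a simple loop in which each class starts where the previous one ends. A minimal sketch; `class_limits` is a hypothetical helper name.

```python
def class_limits(minimum, width, n_classes):
    """Return (lower, upper) pairs; each class starts at the previous upper limit."""
    limits = []
    lower = minimum
    for _ in range(n_classes):
        upper = lower + width
        limits.append((lower, upper))
        lower = upper
    return limits


print(class_limits(35, 7, 6))
# [(35, 42), (42, 49), (49, 56), (56, 63), (63, 70), (70, 77)]
```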

- Draw a table with two columns. The first column lists the classes created in the previous step.

The second column contains the frequency of age values in each class.

- The age bin “35-42” contains the ages from 35 to 42, inclusive.
- The next age bin “42-49” contains the ages greater than 42 and up to 49, and so on.
- The class width is 7 for any two consecutive classes.
- For example, the first class is 35-42 with 35 as the lower limit and 42 as the upper limit. The next class is 42-49 with 42 as the lower limit and 49 as the upper limit. The class width = 42-35 = 49-42 = 7.
- If you sum these frequencies, you get 50, the total number of data points: 7 + 8 + 10 + 7 + 14 + 4 = 50.
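Counting the frequencies needs a rule for values that fall exactly on a shared limit, such as 42. The sketch below assumes the convention described above: the first class includes both of its limits, and every later class includes values greater than its lower limit up to and including its upper limit. The helper name `frequency_table` is our own.

```python
# Sorted ages (in years) of the 50 survey participants.
ages = [35, 35, 37, 38, 40, 42, 42, 43, 43, 46, 47, 48, 48, 48, 48, 50, 52, 53,
        54, 54, 54, 56, 56, 56, 56, 57, 58, 58, 60, 62, 62, 63, 64, 65, 66, 67,
        68, 68, 69, 70, 70, 70, 70, 70, 70, 70, 71, 72, 73, 74]

# Classes built by adding the class width 7 to the minimum value 35.
limits = [(35, 42), (42, 49), (49, 56), (56, 63), (63, 70), (70, 77)]


def frequency_table(data, limits):
    """Count how many values fall in each class.

    The first class is closed on both ends, [lower, upper]; each later class
    is (lower, upper], so a value on a shared limit goes to the lower class.
    """
    counts = [0] * len(limits)
    for x in data:
        for i, (lower, upper) in enumerate(limits):
            in_class = (lower <= x <= upper) if i == 0 else (lower < x <= upper)
            if in_class:
                counts[i] += 1
                break
        else:
            raise ValueError(f"{x} lies outside every class")
    return counts


counts = frequency_table(ages, limits)
for (lower, upper), count in zip(limits, counts):
    print(f"{lower}-{upper}\t{count}")
print("total:", sum(counts))  # total: 50
```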

We can then use this bin frequency table to plot a histogram of the data, with the bins on one axis and their frequencies on the other.

We see that the most frequent bin is the 63-70 bin with 14 occurrences.

We see also that the data is somewhat left-skewed.
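Any plotting library can draw the histogram. As a dependency-free stand-in, the sketch below prints a plain-text histogram from the age frequency table (one `#` per participant) and picks out the most frequent bin. The helper `text_histogram` is hypothetical.

```python
# Classes and frequencies from the age frequency table.
table = [("35-42", 7), ("42-49", 8), ("49-56", 10),
         ("56-63", 7), ("63-70", 14), ("70-77", 4)]


def text_histogram(rows, symbol="#"):
    """Return one line per class: the label, then one symbol per observation."""
    return "\n".join(f"{label:>6} | {symbol * count}" for label, count in rows)


print(text_histogram(table))

# The most frequent bin is the row with the largest count.
label, count = max(table, key=lambda row: row[1])
print(f"most frequent bin: {label} ({count})")  # most frequent bin: 63-70 (14)
```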

## Class width formula

From the above example, the class width formula is:

class width = data range / number of classes = (maximum – minimum) / number of classes, with the result rounded up.
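The formula translates directly into code. This sketch uses Python's `decimal` module so that differences such as 5.01 - 0.2 are computed exactly, and adds a `step` parameter (our own assumption, not part of the formula) that controls how far the result is rounded up: to a whole number by default, or to two decimal places with `step="0.01"`.

```python
from decimal import Decimal, ROUND_CEILING


def class_width(minimum, maximum, n_classes, step="1"):
    """class width = (maximum - minimum) / number of classes, rounded up.

    The result is rounded up to the next multiple of `step`, e.g. "1" for a
    whole number or "0.01" for two decimal places.
    """
    data_range = Decimal(str(maximum)) - Decimal(str(minimum))
    raw_width = data_range / Decimal(n_classes)
    step = Decimal(step)
    return (raw_width / step).to_integral_value(rounding=ROUND_CEILING) * step


print(class_width(35, 74, 6))  # 7
```

The same call reproduces the worked answers at the end of this topic: `class_width(326, 18823, 16)` gives 1157 and `class_width(0.2, 5.01, 16, step="0.01")` gives 0.31.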

## Role of class width

By selecting a suitable class width according to the above guidelines, we can see the shape of the data distribution.

A class width that is too narrow or too wide gives a poor picture of the data distribution.

The following bin frequency table is for the age (in years) of 21407 participants from a certain survey.

The suitable number of classes = log(21407)/log(2) = 14.39, rounded up to 15.

With a data range of 71, the class width = 71/15 = 4.73, rounded up to 5.

We then build the bin frequency table with this class width and plot it as a histogram.

We see that the most frequent bin is the 38-43 bin with 2154 occurrences.

We see also that the data is somewhat right-skewed.

If we use a class width that is too narrow, such as 2, we get the following frequency table.

The frequency table becomes too long, with more than 20 bins, and the data distribution is hard to grasp.

We then plot this bin frequency table as a histogram.

There are too many bins, or classes, and the data distribution is hard to see.

If we use a class width that is too wide, such as 36, we get the following frequency table.

The frequency table has only 2 bins, so the data distribution is hard to grasp.

We then plot this bin frequency table as a histogram.

With only two bins, we have no idea about the data distribution.

The following bin frequency table is for the physical activity (in kcal/week) of 2206 participants from a certain survey.

The suitable number of classes = log(2206)/log(2) = 11.1, rounded up to 12.

Data range = 5083.2 - 0 = 5083.2.

Class width = 5083.2/12 = 423.6, rounded up to 424.

We then build the bin frequency table with this class width and plot it as a histogram.

We see that the most frequent bin is the 0-424 bin with 1442 occurrences.

We see also that the data is somewhat right-skewed.

If we use a class width that is too narrow, such as 100, we get the following frequency table.

The frequency table becomes too long, with more than 20 bins, and the data distribution is hard to interpret.

We then plot this bin frequency table as a histogram.

There are too many bins, or classes, and the data distribution is hard to see.

If we use a class width that is too wide, such as 2600, we get the following frequency table.

The frequency table has only 2 bins, so the data distribution is hard to grasp.

We then plot this bin frequency table as a histogram.


With only two bins, we have no idea about the data distribution.
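The effect of the class width on the number of bins can be checked directly: starting from the minimum, the number of classes needed to cover the data is the data range divided by the class width, rounded up. The sketch below applies this to both examples, assuming a data range of 71 years for the ages (the value used in the class-width calculation) and 5083.2 kcal/week for physical activity; `bins_needed` is a hypothetical helper.

```python
import math


def bins_needed(data_range, width):
    """Number of classes of the given width needed to cover the data range."""
    return math.ceil(data_range / width)


# Age example (data range 71 years): too narrow, suitable, too wide.
for width in (2, 5, 36):
    print(f"age, width {width}: {bins_needed(71, width)} bins")

# Physical activity example (data range 5083.2 kcal/week).
for width in (100, 424, 2600):
    print(f"activity, width {width}: {bins_needed(5083.2, width)} bins")
```

Widths of 2 and 100 give 36 and 51 bins, well beyond the 20-class guideline, while widths of 36 and 2600 leave only 2 bins.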

**Practice questions**

The number of observations = 53940.

What is the suitable class width for this data?

The number of observations = 53940.

What is the suitable class width for this data?

What is the most frequent bin?

Is the class width suitable for this data?

Can you determine a more suitable number of classes for this data?

What is wrong with this table?

Can you determine a more appropriate class width if you know that the data range is 327?

**Answers**

The number of classes = log(53940)/log(2) = 15.72, rounded up to 16.

The data range = 18823-326 = 18497.

The class width = 18497/16 = 1156.0625, rounded up to 1157.

The data range = 5.01-0.2 = 4.81.

The class width = 4.81/16 = 0.300625, rounded up to 0.31.

This data is right-skewed because it is clustered at small values and large values have a much lower frequency.

The suitable number of classes = log(number of observations)/log(2) = log(83+28+5)/log(2) = 6.86 rounded up to 7.

- The bin frequency table has many empty bins at its end. These can be deleted to avoid confusing the reader, and the table should be:

The recommended number of bins or classes = log(34+37+66+9)/log(2) = 7.19 rounded up to 8.

The suitable class width = 327/8 = 40.875, rounded up to 41.