
    Class width: explanation and examples

    About the author
    Alejandra Rangel
    @alejandrarangel


    The definition of class width is:

    “The class width is the difference between the upper or lower class limits of consecutive classes in a bin frequency table”.

    In this topic, we will discuss the class width from the following aspects:

    • What is the class width?
    • How to find the class width?
    • Class width formula.
    • Role of class width.
    • Practical questions.
    • Answers.

     

    What is the class width?

    The class width is the difference between the upper or lower class limits of consecutive classes in a bin frequency table.



    The bin frequency table groups values into equal-sized bins or classes and each class includes a range of values.

    The frequency of each class is the number of data points it has.

    The boundaries of each class are called the lower class limit and the upper class limit, and the class width is the difference between the lower (or upper) limits of successive classes.

    All classes should have the same width.

    How to find the class width?

    We will go through an example for illustration.

    Example 1

    The following are the ages (in years) of 50 participants from a certain survey.

    What is the proper class width for a bin frequency table of this data?

    1. Determine the number of bins or classes you need.

    There are no hard rules about how many bins to pick, but there are some general guidelines:

    • Pick between 5 and 20 classes.
    • Make sure you have a few items in each bin. For example, if you have 40 data points, you can choose 5 bins (8 data points per category), but not 20 bins (which would give you only 2 data points per bin).
    • Use the mathematical formula to choose the number of classes.

    The formula is: number of classes = log(number of observations) / log(2), rounded up to the next integer.


    For this data, log(50)/log(2) ≈ 5.64, which is rounded up to 6, so the number of classes should be 6.
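
    As a minimal sketch of this rule in Python (the function name `number_of_classes` is our own, not part of any library), the calculation could look like this. It uses `math.log2`, which equals log(n)/log(2).

```python
import math

def number_of_classes(n_observations: int) -> int:
    """Suggested number of classes: log(n) / log(2), rounded up."""
    if n_observations < 2:
        raise ValueError("need at least two observations")
    # math.log2(n) is the same quantity as log(n) / log(2)
    return math.ceil(math.log2(n_observations))

print(number_of_classes(50))  # 6
```

    The result is only a guideline; it should still be checked against the "5 to 20 classes" rule of thumb above.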

    2. Sort the data and subtract the minimum data value from the maximum data value to get the data range.

    35 35 37 38 40 42 42 43 43 46 47 48 48 48 48 50 52 53 54 54 54 56 56 56 56 57 58 58 60 62 62 63 64 65 66 67 68 68 69 70 70 70 70 70 70 70 71 72 73 74.

    In our age list, the minimum value is 35 and the maximum value is 74, so the data range = 74 – 35 = 39.

    3. Divide the data range from Step 2 by the number of classes from Step 1.

    Round the number you get up to a whole number to get the class width.

    Class width = 39 / 6 = 6.5, rounded up to 7.
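
    Steps 2 and 3 can be sketched in Python as follows. The variable names are our own, and the number of classes (6) is taken from Step 1.

```python
import math

# The 50 ages from the example, already sorted
ages = [
    35, 35, 37, 38, 40, 42, 42, 43, 43, 46,
    47, 48, 48, 48, 48, 50, 52, 53, 54, 54,
    54, 56, 56, 56, 56, 57, 58, 58, 60, 62,
    62, 63, 64, 65, 66, 67, 68, 68, 69, 70,
    70, 70, 70, 70, 70, 70, 71, 72, 73, 74,
]

n_classes = 6                                    # from Step 1
data_range = max(ages) - min(ages)               # 74 - 35
class_width = math.ceil(data_range / n_classes)  # 6.5 rounded up

print(data_range, class_width)  # 39 7
```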

    4. Add the class width, 7, sequentially (6 times because we have 6 bins) to the minimum value to create the 6 classes.

    35 + 7 = 42, so the first class is 35-42.

    42 + 7 = 49, so the next bin is 42-49.

    49 + 7 = 56, so the next bin is 49-56.

    56 + 7 = 63, so the next bin is 56-63.

    63 + 7 = 70, so the next bin is 63-70.

    70 + 7 = 77, so the next bin is 70-77.
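
    The repeated addition above can be written as a short loop. This is only an illustrative sketch; `edges` and `classes` are names we chose.

```python
minimum, class_width, n_classes = 35, 7, 6

# n_classes + 1 edges: the minimum plus 0, 1, ..., n_classes class widths
edges = [minimum + i * class_width for i in range(n_classes + 1)]

# Pair each edge with the next one to get (lower limit, upper limit)
classes = list(zip(edges[:-1], edges[1:]))

for lower, upper in classes:
    print(f"{lower}-{upper}")
```

    This prints the six classes from 35-42 to 70-77, one per line.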

    5. We draw a table with 2 columns. The first column lists the classes that we created in Step 4.

    The second column contains the frequency of age values in each class.

    Age class    Frequency
    35-42        7
    42-49        8
    49-56        10
    56-63        7
    63-70        14
    70-77        4


    We see that:

    • The age bin “35-42” contains the ages from 35 to 42.
    • The next age bin “42-49” contains the ages greater than 42 and up to 49, and so on.
    • The class width is 7 for any two consecutive classes.
    • For example, the first class is 35-42 with 35 as the lower limit and 42 as the upper limit. The next class is 42-49 with 42 as the lower limit and 49 as the upper limit. The class width = 42-35 = 49-42 = 7.
    • If you sum these frequencies, you will get 50, which is the total number of data points. 7+8+10+7+14+4 = 50.
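
    The frequencies can be checked with a few lines of Python. This sketch assumes the boundary convention described in the bullets above: the first class includes both of its limits, and every later class excludes its lower limit and includes its upper limit.

```python
ages = [
    35, 35, 37, 38, 40, 42, 42, 43, 43, 46,
    47, 48, 48, 48, 48, 50, 52, 53, 54, 54,
    54, 56, 56, 56, 56, 57, 58, 58, 60, 62,
    62, 63, 64, 65, 66, 67, 68, 68, 69, 70,
    70, 70, 70, 70, 70, 70, 71, 72, 73, 74,
]
edges = [35, 42, 49, 56, 63, 70, 77]

frequencies = []
for i, (lower, upper) in enumerate(zip(edges[:-1], edges[1:])):
    if i == 0:
        # First class: closed on both sides, so the minimum is counted
        count = sum(lower <= age <= upper for age in ages)
    else:
        # Later classes: greater than the lower limit, up to the upper limit
        count = sum(lower < age <= upper for age in ages)
    frequencies.append(count)

print(frequencies)       # [7, 8, 10, 7, 14, 4]
print(sum(frequencies))  # 50
```

    Library histogram functions often use a different convention (for example, including the lower limit instead of the upper one), so their counts can differ for values that fall exactly on a class limit.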

    We can then use this bin frequency table to plot a histogram of the data, with the bins on one axis and their frequencies on the other.

    We see that the most frequent bin is the 63-70 bin with 14 occurrences.

    We also see that the data is somewhat left-skewed.

    Class width formula

    From the above example, we see that the class width formula is:

    class width = data range/number of classes = (maximum – minimum)/number of classes
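
    Written with the rounding made explicit, the formula can be stated as below. The symbols are our own notation: w is the class width, k the number of classes, n the number of observations, and x_max and x_min the largest and smallest data values.

```latex
k = \left\lceil \frac{\log n}{\log 2} \right\rceil,
\qquad
w = \left\lceil \frac{x_{\max} - x_{\min}}{k} \right\rceil

% Age example: n = 50, x_max = 74, x_min = 35
k = \left\lceil 5.64 \right\rceil = 6,
\qquad
w = \left\lceil \frac{74 - 35}{6} \right\rceil = \left\lceil 6.5 \right\rceil = 7
```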

    Role of class width

    By selecting a suitable class width according to the above guidelines, we can observe the data distribution.

    Selecting a class width that is too narrow or too wide can result in a poor representation of the data distribution.

    Example 1

    The following bin frequency table is for the age (in years) of 21407 participants from a certain survey.


    The suitable number of classes = log(21407)/log(2) ≈ 14.39, rounded up to 15.

    Data range = 89 – 18 = 71.

    Class width = 71/15 ≈ 4.73, rounded up to 5.

    We use this class width to build the bin frequency table and plot it as a histogram.

    We see that the most frequent bin is the 38-43 bin with 2154 occurrences.

    We also see that the data is somewhat right-skewed.

    If we use a class width that is too narrow, such as 2, we get the following frequency table.

    We see that the frequency table becomes too long, with more than 20 bins, which makes the data distribution hard to grasp.

    If we plot this bin frequency table as a histogram, there are too many bins or classes and the data distribution is hard to see.

    If we use a class width that is too wide, such as 36, we get the following frequency table.

    We see that the frequency table has only 2 bins, which makes the data distribution hard to grasp.

    If we plot this bin frequency table as a histogram, we have only two bins and no idea about the data distribution.
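
    To see why these widths are poor choices, we can count how many bins each one produces for the age data (minimum 18, maximum 89). The helper `bins_needed` below is a hypothetical name and only a rough sketch: it divides the data range by the class width and rounds up.

```python
import math

def bins_needed(minimum: float, maximum: float, class_width: float) -> int:
    """Number of equal-width bins needed to cover the data range."""
    return math.ceil((maximum - minimum) / class_width)

for width in (2, 5, 36):
    print(width, bins_needed(18, 89, width))
# 2 36   (too narrow: more than 20 bins)
# 5 15   (the suggested width)
# 36 2   (too wide: only 2 bins)
```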

    Example 2

    The following bin frequency table is for the physical activity (in Kcal/week) of 2206 participants from a certain survey.

    The suitable number of classes = log(2206)/log(2) ≈ 11.1, rounded up to 12.

    Data range = 5083.2 – 0 = 5083.2.

    Class width = 5083.2/12 = 423.6, rounded up to 424.

    We use this class width to build the bin frequency table and plot it as a histogram.

    We see that the most frequent bin is the 0-424 bin with 1442 occurrences.

    We also see that the data is somewhat right-skewed.

    If we use a class width that is too narrow, such as 100, we get the following frequency table.

    We see that the frequency table becomes too long, with more than 20 bins, which makes the data distribution hard to interpret.

    If we plot this bin frequency table as a histogram, there are too many bins or classes and the data distribution is hard to see.

    If we use a class width that is too wide, such as 2600, we get the following frequency table.

    We see that the frequency table has only 2 bins, which makes the data distribution hard to grasp.

    If we plot this bin frequency table as a histogram, we have only two bins and no idea about the data distribution.

    Practice questions

    1. The following information is related to some price data.

    The number of observations = 53940.

    Minimum = $326.

    Maximum = $18823.

    What is the suitable class width for this data?

    2. The following information is related to some diamond weights.

    The number of observations = 53940.

    Minimum = 0.2 grams.

    Maximum = 5.01 grams.

    What is the suitable class width for this data?

    3. The following bin frequency table is for the wind speed of some storms (in knots).

    What is the most frequent bin?

    Is this data skewed?

    4. The following is the bin frequency table for some ozone measurements.

    Is the class width suitable for this data?

    Can you determine a more suitable number of classes for this data?

    5. The following is the bin frequency table for some solar radiation measurements.

    What is wrong with this table?

    Can you determine a more appropriate class width if you know that the data range is 327?

    Answers

    1. The recommended number of bins or classes = log(53940)/log(2) = 15.7 rounded up to 16.

    The data range = 18823-326 = 18497.

    The class width = 18497/16 = 1156.0625, rounded up to 1157.

    2. The recommended number of bins or classes = log(53940)/log(2) = 15.7 rounded up to 16.

    The data range = 5.01-0.2 = 4.81.

    The class width = 4.81/16 = 0.300625, rounded up to 0.31.

    3. The most frequent bin is “21-32” with 2258 occurrences.

    This data is right-skewed because it is clustered at small values and large values have a much lower frequency.

    4. There are only 3 classes, while there should be 5-20 classes.

    The suitable number of classes = log(number of observations)/log(2) = log(83+28+5)/log(2) = 6.86 rounded up to 7.

    5. The bin frequency table has many empty bins at its end. These can be deleted so as not to confuse the reader, and the table should be:

    The recommended number of bins or classes = log(34+37+66+9)/log(2) = 7.19 rounded up to 8.

    The data range = 327.

    The suitable class width = 327/8 = 40.875, rounded up to 41.
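
    The answers that only need summary statistics can be checked with a small helper. This is a sketch under our own assumptions: the name `suggest_class_width` and its `decimals` parameter (how many decimal places to round the width up to) are hypothetical, not taken from any library.

```python
import math

def suggest_class_width(n, minimum, maximum, decimals=0):
    """Return (number of classes, class width rounded up to `decimals` places)."""
    n_classes = math.ceil(math.log2(n))  # log(n) / log(2), rounded up
    raw_width = (maximum - minimum) / n_classes
    scale = 10 ** decimals
    # round(..., 9) guards against tiny floating-point noise before rounding up
    width = math.ceil(round(raw_width * scale, 9)) / scale
    return n_classes, width

print(suggest_class_width(53940, 326, 18823))               # (16, 1157.0)
print(suggest_class_width(53940, 0.2, 5.01, decimals=2))    # (16, 0.31)
print(suggest_class_width(34 + 37 + 66 + 9, 0, 327))        # (8, 41.0)
```

    In the last call only the data range (327) is known, so 0 and 327 are used as stand-in minimum and maximum values.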


