Wilson Mar bio photo

Wilson Mar

Hello!

Calendar YouTube Github

LinkedIn

Visualize your data by splitting them into several bins

US (English)   Norsk (Norwegian)   Español (Spanish)   Français (French)   Deutsch (German)   Italiano   Português   Estonian   اَلْعَرَبِيَّةُ (Egypt Arabic)   Napali   中文 (简体) Chinese (Simplified)   日本語 Japanese   한국어 Korean

Overview

A histogram is a bar graph (visualization) that shows the occurrence of values in each of several bin ranges. Histograms provide a visualization of numerical data.
Frequency distributions visualize categorical (text) data.

Later on this page are steps to create a Histogram manually in macOS and Windows Excel 2016 and prior versions.

Throughout this page are “PROTIP” flags that highlight advice from experience not available elsewhere.

A dynamic Histogram

My favorite approach is to change a pre-defined spreadsheet which includes coding to provide a dynamic slider to control how many bins are shown:

Histogram-dynamic

  1. Click to download the Histogram-Dynamic.xls Excel file at:

    https://res.cloudinary.com/dcajqrroq/raw/upload/v1589759018/Histogram-Dynamic_re4yrn.xlsx

    It contains a pre-made histogram which you don’t have to construct from scratch. The Excel sheet is from/described here, based on Jon Peltier’s techniques:

  2. Click Save, OK, then open the file.

    histogram-freq-formula-1035x458.jpg

    Colored cells are where values and formulas are changed for alternate data.

    (This spreasheet contains only functions and no VBA macro code which trigger scary messages when loaded.)

  3. Drag the slider all the way right to see the most number of columns (10 columns), which is a good segmeentation for our base-10 numbering system.

    Data in the sample spreadsheet are ages in a human population, which has a known range of values. Cells C4 and C5 (0 and 120) display those minimum and maximum values of bars in the Histogram. BTW, this is why someone around 60 years old is called “Middle Aged”.

  4. Drag the slider all the way left to see the least number of columns (just 2 columns), which basically divides the population by half according to the range between 0 and 120, which is 60.

    Notice that when the slider is moved, the Bin Count and Bin Size (B5 and B6) changes, as well as data with the “Chart Data” box. As the Bin Count changes, different rows are used in the Chart Data box.

  5. Click on cell F5. Notice it displays the maximum values within the data range named. Cell F4 displays the minimum values in the same range.

    The “BinsArray” column contains the upper limit for each step in the Histogram.

    To display a Histogram of your own data, you would need to change the range name specified in those formulas and focumulas in the Frequency table.

    Substitute your own data

    The spreadsheet comes with an alternate set of data, modified from the Histogram-Dynamic.xls Excel file. To use it, click here to skip instructions below to insert your own data:

  6. In File, open your Excel file.
  7. Click the column heading of the entire column you want to copy.
  8. Press command+C to copy the data highlighted to your invisible Clipboard.

  9. Press command+~ to switch to the sample spreadsheet.

  10. Click the “Data” tab at the bottom.
  11. Click on cell heading “D”.

    We want to end up selecting a section of data spanning two columns (C and D).

  12. Press command+V to Paste from Clipboard. Wait until the spinner icon goes away.

    The first row contains column names.

  13. Click on cell D2 (under the heading) and press shift+command with down arrow to select all the data in that column. If there are blanks in the data, you would need to continue pressing to select the next group. Continue holding down shift+command with left arrow to select the (empty) C column as well.
  14. Manually write down somewhere the row number of the last cell.
  15. Press command+C to copy into Clipboard.

  16. Click on the Range Name in the Excel bar, and (if it doesn’t already exist) change it to

    Repo_mb

  17. Press Tab key to set it.

  18. Right-click within the selected cells to select “Define Name…” for a pop-up dialog.
  19. Click the “+” icon at the lower-left corner to create a name Excel obtained from the first row.

    histogram-names-445x203

  20. Double-click on the name and press command+C to copy the name to your Clipboard:

    Repo_Size_mb

  21. Click OK to dismiss the dialog.

  22. Press command+shift+S to save the file, changing the file name. Click Save.

    Change Histogram to alternate data

  23. Click the “Histogram” tab at the bottom.
  24. Click on cell F4. In the formula, double-click “tblData” and press command+V to paste in its place the “Repo_mb” Range Name.
  25. Click on cell F5. In the formula, double-click “tblData” and press command+V to paste in its place the “Repo_mb” Range Name.

    histogram-delete-row-207x206

  26. Click on cell C5 to change the maximum value included in the Histogram. PROTIP: Make the maximum value in the Histogram larger than the largest value in the population which is also easily divisible. For example, the maximum population value of 40,034 would have a Histogram max. value like 50,000.

  27. Click on cell C9 to change the Chart Title.
  28. Triple-click the text to select it all to type over your text.

    Select array for delete

  29. Click on cell E13 the first cell immediately under the Frequency heading. Notice its current formula:

    {=FREQUENCY(tblData[Age],C13:C22)}

    “tblData” is the Range Name.
    “[Age]” is the Column Name.
    “C13:C22” is the Bin Array.

    ”{ }” designates an array saved by pressing shift+control+return.

    On MacOS, to enable the key to actually control functions in Excel, go to System preferences > Mission Control, and disable shortcuts for Mission control.

  30. On macOS, press control+U to reveal the bin_array (“C13:C22”).

    PROTIP: Among Microsoft’s Rules is that if you try to change or delete cells in an array formula, you’ll see a “You cannot change part of an array” error.

  31. To delete the entire formula: click a cell within the array. In the Home tab, if there is “Editing”, click that. Click the icon to the right of “Find & Select”, “Go To Special”.

    histogram-gotospecial

  32. Select “Current Array”, then OK. Now you can press Delete</strong> on the keyboard.

    Rebuid the array

  33. Click the top cell of the Frequency array (E13).
  34. Hold the shift key and click the last cell of the Frequency array, which is one more row than the bins for the spillover row.

    NOTE: The spillover row’s value is not displayed in the Histogram (which is the reason why it exists).

  35. Click on the formula bar (to the right of the “fx” icon).
  36. Copy the formula below and paste it in the formula bar:

    =FREQUENCY(Repo_Size_mb,C13:C22)

    You know you have it right when the bins array is highlighted in blue:

    histogram-freq-formula-1035x458.jpg

  37. Press shift+command+return to save the array.

  38. Change “Age Group” to the X-axis name, such as “MB in repo”.

    Adjust ranges

    PROTIP: The special feature of the sample spreadsheet is that if there are a few extremely large numbers (outliers) at the leftmost or rightmost column that would distort the analysis about the remainder of the population, exclude them by changing the value in cells C4 and C5, which define the minimum and maximum values of data presented in the Histogram.

  39. If you prefer to remove extremely large (outlier) values permanently, right-click on F5 to select Copy of the value. Switch to the “Data” sheet and command+F to find that value. Right-click on the value to select “Delete…”. Select

    Extrangeous Range Names

  40. Type in a new name in the box under heading “Enter a name for the data range:”.
  41. Click OK to dismiss the dialog, which creates the name range.

    For rngCount, the range of cells contains:

    =OFFSET(‘C:/Users/Jon/Dropbox/Excel Campus/Posts/Dynamic Histogram/[Copy of Dynamic Histogram.xlsx]Histogram’!$K$13,1,0,’C:/Users/Jon/Dropbox/Excel Campus/Posts/Dynamic Histogram/[Copy of Dynamic Histogram.xlsx]Histogram’!$G$5,1)

    For rngGroups, the range of cells contains:

    =OFFSET(‘C:/Users/Jon/Dropbox/Excel Campus/Posts/Dynamic Histogram/[Copy of Dynamic Histogram.xlsx]Histogram’!$J$13,1,0,’C:/Users/Jon/Dropbox/Excel Campus/Posts/Dynamic Histogram/[Copy of Dynamic Histogram.xlsx]Histogram’!$G$5,1)

For more about dynamic named ranges using both the OFFSET and INDEX functions, see Mynda Treacy’s article.


Different versions of Excel have different procedures. Previous versions of Excel had a “Data Analysis Tookpak” which must be enabled in order for the FREQUENCY and other functions to be available.

Create a Histogram in Microsoft Excel 2016

Unlike previous versions, Excel 2016 has a easier way to create histograms with its “histogram maker” as a built-in chart.

  1. To begin, have the data you want to use in your histogram into a worksheet.

    If you opened a .csv file, save the file As a “Excel Worksheet (.xlsx)”.

  2. Click to highlight the entire dataset.
  3. Click the Insert tab.

  4. Click the Histogram icon or select Recommended Charts and scroll down to select the Histogram chart.

    histogram-insert

  5. Mouse over a number in the vertical axis until “Vertical Axis” appears, then right-click to select “Format Axis…” to open the Format Axis pane.
  6. Select Categories to display text categories.
  7. Select Bin Width to customize the size of each bin.
  8. Select Number of Bins to define a specific number of bins displayed.
  9. Choose Overflow Bin or Underflow Bin to group above or below a specific number.
  10. Close the Format Axis pane when you are done customizing the histogram.*

    histogram-cum

  11. Save the file.

Category Frequency Distribution in Microsoft Excel 2013


This video is part of a series on statistics using Excel. Begin from 1:38

Get a unique list of values

In the Raw Data sheet:

  1. Press Ctrl+Home to get to the upper left of the Raw Data sheet.
  2. Click the heading to the column you want to analyze.
  3. Press Ctrl+Shift+down to highlight all the cells of the column.

  4. PROTIP: Specify a range name (such as “Priority”, etc.) so you can refer to the same range in several functions.

  5. Click Data ribbon, Advanced, Copy to another location, Unique records only, Copy to icon.
  6. Scroll right beyond the last column in the sheet and click a cell there.
  7. Click the Copy icon again, then OK.

    excel filtered unique 207x207

  8. Scroll back to the right where you specified.
  9. You may need to clean up values in entries.

    • A trailing space counts as a separate value.

    • Make sure there are no blanks in the data.

  10. Press Ctrl+H to do a Replace All on the errant values to fix them.
  11. Delete the generated cells and
  12. Repeat the above until there are no duplicates.

  13. Sort or manually rearrange the order of items (if you have categories that don’t sort, such as “Very High”, “High”, “Medium”, “Low”).

    Make the Frequency Distribution

  14. Create Frequency, and Interval columns to the right of the unique list created.

  15. In the first data cell under the Frequency heading, type a formula =COUNTIFS()
  16. Click the first data cell of the category data being analyzed.
  17. Press Ctrl+Shift+down to select all rows.
  18. Scroll back to the distribution being built.
  19. Press command to specify another parameter.
  20. Click on the first cell of the unique items (the Criteria).
  21. Press ) to close the formula.
  22. Press Ctrl+Enter to save the formula.
  23. Double-click on the lower-right corner to populate the rest

    Calculate total and percentages

  24. Add a total in a blank row under data rows in the Frequency column.
  25. Format the total cells with borders top and bottom.
  26. Click the cell holding the total and press Alt+=.

  27. PROTIP: Verify that the total matches the number of rows in the data being analyzed.

  28. Create a percentage formula. Put a $ in front of the row number.
  29. Double-click on the lower-right corner of the first formula to populate the rest of the rows.
  30. Highlight the Percentage cells and format it as Percents.

    Bar Chart

  31. Highlight the category data and percent, including the headings.
  32. Click Insert tab, Stacked bar.
  33. Right-click the categories to select Format Axis.
  34. Check Categories in reverse order and close the pane.
  35. Click on the title and change it to “Priority distribution” or whatever.

    Add totals

  36. A total at the bottom of counts makes it easier to verify whether you

    Another Category Distribution

  37. Repeat the above for another distribution, if desired.

Create a Histogram in Microsoft Excel 2013

  • https://www.youtube.com/watch?v=Giewd9yH4q0

  • https://www.youtube.com/watch?v=YfVu7xGHgnA

  • https://www.youtube.com/watch?v=SDUgEuFrJ3o

  1. Drag the lower-right corner to populate the other rows under Bin.

    • Leave the last row empty so Excel uses it for anything larger than the last number.
  2. Click in the first data cell under Frequency heading, and drag the lower-right corner to populate the other rows until the last Interval row.

  3. Press Ctrl+Shift+Enter to auto-populate (as an array).

  4. In the first data cell under the Frequency heading, type =frequency(.

  5. Click the first data cell under the value column (C) as the first parameter.

  6. Press comma.

  7. Click the first data cell under the Bin heading (K), then click on the last cell in the column.

    • To allow for easier…
  8. Press Ctrl+Shift+Enter.

    Adjust default gaps

  9. Double-click on one of the columns in the generated chart for the Series Options UI.

  10. Drag Gap Width to zero.

References

  • https://www.youtube.com/watch?v=53DOu_vstvI
  • https://www.youtube.com/watch?v=IVhQTAF1guc

  • https://www.exceltip.com/tips/histograms-in-excel.html

  • http://www.agentjim.com/MVP/Excel/2016HistogramMac.html

  • https://www.exceltip.com/advanced-data-visualization-in-excel/4-creative-target-vs-achievement-charts-in-excel.html

https://www.youtube.com/watch?v=UASCe-3Y1to How To… Plot a Normal Frequency Distribution Histogram in Excel 2010 by Eugene O’Loughlin