Descriptive Analytics – Methods, Tools and Examples
Descriptive Analytics
Definition:
Descriptive analytics focuses on describing or summarizing raw data to make it interpretable. This type of analytics provides insight into what has happened in the past. It involves analyzing historical data to identify patterns and trends. Descriptive analytics often uses visualization tools to represent the data in a way that is easy to interpret.
Descriptive Analytics in Research
Descriptive analytics plays a crucial role in research, helping investigators understand and describe the data collected in their studies. Here’s how descriptive analytics is typically used in a research setting:
- Descriptive Statistics: In research, descriptive analytics often takes the form of descriptive statistics. This includes calculating measures of central tendency (like mean, median, and mode), measures of dispersion (like range, variance, and standard deviation), and measures of frequency (like counts, percentages, and frequency distributions). These calculations help researchers summarize and understand their data.
- Visualizing Data: Descriptive analytics also involves creating visual representations of data to better understand and communicate research findings. This might involve creating bar graphs, line graphs, pie charts, scatter plots, box plots, and other visualizations.
- Exploratory Data Analysis: Before conducting any formal statistical tests, researchers often conduct an exploratory data analysis, which is a form of descriptive analytics. This might involve looking at distributions of variables, checking for outliers, and exploring relationships between variables.
- Initial Findings: Descriptive analytics are often reported in the results section of a research study to provide readers with an overview of the data. For example, a researcher might report average scores, demographic breakdowns, or the percentage of participants who endorsed each response on a survey.
- Establishing Patterns and Relationships: Descriptive analytics helps in identifying patterns, trends, or relationships in the data, which can guide subsequent analysis or future research. For instance, researchers might look at the correlation between variables as a part of descriptive analytics.
Descriptive Analytics Techniques
Descriptive analytics involves a variety of techniques to summarize, interpret, and visualize historical data. Some commonly used techniques include:
Statistical Analysis
This includes basic statistical methods like mean, median, mode (central tendency), standard deviation, variance (dispersion), correlation, and regression (relationships between variables).
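The relationship measures mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `ad_spend` and `units_sold` values and the helper names are hypothetical and chosen purely for demonstration.

```python
from math import sqrt

# Hypothetical paired observations (illustrative values only).
ad_spend = [10, 12, 15, 15, 18, 20]
units_sold = [25, 28, 33, 35, 40, 43]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


r = pearson_r(ad_spend, units_sold)
slope, intercept = least_squares_line(ad_spend, units_sold)
print(f"r = {r:.3f}, fitted line: y = {slope:.3f} * x + {intercept:.3f}")
```

A coefficient near 1 or -1 describes a strong linear association in the observed data; on its own it says nothing about cause.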
Data Aggregation
It is the process of compiling and summarizing data to obtain a general perspective. It can involve methods like sum, count, average, min, max, etc., often applied to a group of data.
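As a rough sketch of aggregation, the snippet below groups records by month and reduces each group to a sum, count, average, minimum, and maximum. The `sales` records are hypothetical, and the standard library is used for portability; data-frame libraries such as pandas provide equivalent group-by operations.

```python
from collections import defaultdict

# Hypothetical sales records: (month, amount). Values are illustrative only.
sales = [
    ("2023-11", 120.0), ("2023-11", 80.0), ("2023-12", 200.0),
    ("2023-12", 150.0), ("2023-12", 100.0), ("2024-01", 60.0),
]

# Collect the amounts that belong to each month.
by_month = defaultdict(list)
for month, amount in sales:
    by_month[month].append(amount)

# Reduce each group to a handful of summary figures.
aggregates = {
    month: {
        "sum": sum(amounts),
        "count": len(amounts),
        "average": sum(amounts) / len(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }
    for month, amounts in by_month.items()
}

for month in sorted(aggregates):
    print(month, aggregates[month])
```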
Data Mining
This involves analyzing large volumes of data to discover patterns, trends, and insights. Techniques used in data mining can include clustering (grouping similar data), classification (assigning data into categories), association rules (finding relationships between variables), and anomaly detection (identifying outliers).
Data Visualization
This involves presenting data in a graphical or pictorial format to provide clear and easy understanding of the data patterns, trends, and insights. Common data visualization methods include bar charts, line graphs, pie charts, scatter plots, histograms, and more complex forms like heat maps and interactive dashboards.
Reporting
This involves organizing data into informational summaries to monitor how different areas of a business are performing. Reports can be generated manually or automatically and can be presented in tables, graphs, or dashboards.
Cross-tabulation (or Pivot Tables)
It involves displaying the relationship between two or more variables in a tabular form. It can provide a deeper understanding of the data by allowing comparisons and revealing patterns and correlations that may not be readily apparent in raw data.
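A cross-tabulation of two categorical variables can be built by counting each pair of values. The sketch below assumes hypothetical survey responses (`age_group`, `preferred_channel`) and uses only the standard library; spreadsheet pivot tables and pandas cross-tabulations produce the same kind of table.

```python
from collections import Counter

# Hypothetical survey rows: (age_group, preferred_channel).
responses = [
    ("18-34", "online"), ("18-34", "online"), ("18-34", "in-store"),
    ("35-54", "online"), ("35-54", "in-store"), ("35-54", "in-store"),
    ("55+", "in-store"), ("55+", "in-store"),
]

counts = Counter(responses)
rows = sorted({age for age, _ in responses})
cols = sorted({channel for _, channel in responses})

# Nested dicts: table[row][column] -> number of responses in that cell.
# Counter returns 0 for combinations that never occur.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print the table with row totals.
print(f"{'':8}" + "".join(f"{c:>10}" for c in cols) + f"{'total':>10}")
for r in rows:
    cells = "".join(f"{table[r][c]:>10}" for c in cols)
    print(f"{r:8}{cells}{sum(table[r].values()):>10}")
```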
Descriptive Modeling
Some techniques use complex algorithms to interpret data. Examples include decision tree analysis, which provides a graphical representation of decision-making situations, and neural networks, which are used to identify correlations and patterns in large data sets.
Descriptive Analytics Tools
Some common Descriptive Analytics Tools are as follows:
Excel: Microsoft Excel is a widely used tool that can be used for simple descriptive analytics. It has powerful statistical and data visualization capabilities. Pivot tables are a particularly useful feature for summarizing and analyzing large data sets.
Tableau: Tableau is a data visualization tool that is used to represent data in a graphical or pictorial format. It can handle large data sets and allows for real-time data analysis.
Power BI: Power BI, another product from Microsoft, is a business analytics tool that provides interactive visualizations with self-service business intelligence capabilities.
QlikView: QlikView is a data visualization and discovery tool. It allows users to analyze data and use this data to support decision-making.
SAS: SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it.
SPSS: SPSS (Statistical Package for the Social Sciences) is a software package used for statistical analysis. It’s widely used in social sciences research but also in other industries.
Google Analytics: For web data, Google Analytics is a popular tool. It allows businesses to analyze in-depth detail about the visitors on their website, providing valuable insights that can help shape the success strategy of a business.
R and Python: Both are programming languages that have robust capabilities for statistical analysis and data visualization. With packages like pandas, matplotlib, seaborn in Python and ggplot2, dplyr in R, these languages are powerful tools for descriptive analytics.
Looker: Looker is a modern data platform that can take data from any database and let you start exploring and visualizing.
When to use Descriptive Analytics
Descriptive analytics forms the base of the data analysis workflow and is typically the first step in understanding your business or organization’s data. Here are some situations when you might use descriptive analytics:
Understanding Past Behavior: Descriptive analytics is essential for understanding what has happened in the past. If you need to understand past sales trends, customer behavior, or operational performance, descriptive analytics is the tool you’d use.
Reporting Key Metrics: Descriptive analytics is used to establish and report key performance indicators (KPIs). It can help in tracking and presenting these KPIs in dashboards or regular reports.
Identifying Patterns and Trends: If you need to identify patterns or trends in your data, descriptive analytics can provide these insights. This might include identifying seasonality in sales data, understanding peak operational times, or spotting trends in customer behavior.
Informing Business Decisions: The insights provided by descriptive analytics can inform business strategy and decision-making. By understanding what has happened in the past, you can make more informed decisions about what steps to take in the future.
Benchmarking Performance: Descriptive analytics can be used to compare current performance against historical data. This can be used for benchmarking and setting performance goals.
Auditing and Regulatory Compliance: In sectors where compliance and auditing are essential, descriptive analytics can provide the necessary data and trends over specific periods.
Initial Data Exploration: When you first acquire a dataset, descriptive analytics is useful to understand the structure of the data, the relationships between variables, and any apparent anomalies or outliers.
Examples of Descriptive Analytics
Examples of Descriptive Analytics are as follows:
Retail Industry: A retail company might use descriptive analytics to analyze sales data from the past year. They could break down sales by month to identify any seasonality trends. For example, they might find that sales increase in November and December due to holiday shopping. They could also break down sales by product to identify which items are the most popular. This analysis could inform their purchasing and stocking decisions for the next year. Additionally, data on customer demographics could be analyzed to understand who their primary customers are, guiding their marketing strategies.
Healthcare Industry: In healthcare, descriptive analytics could be used to analyze patient data over time. For instance, a hospital might analyze data on patient admissions to identify trends in admission rates. They might find that admissions for certain conditions are higher at certain times of the year. This could help them allocate resources more effectively. Also, analyzing patient outcomes data can help identify the most effective treatments or highlight areas where improvement is needed.
Finance Industry: A financial firm might use descriptive analytics to analyze historical market data. They could look at trends in stock prices, trading volume, or economic indicators to inform their investment decisions. For example, analyzing the price-earnings ratios of stocks in a certain sector over time could reveal patterns that suggest whether the sector is currently overvalued or undervalued. Similarly, credit card companies can analyze transaction data to detect any unusual patterns, which could be signs of fraud.
Advantages of Descriptive Analytics
Descriptive analytics plays a vital role in the world of data analysis, providing numerous advantages:
- Understanding the Past: Descriptive analytics provides an understanding of what has happened in the past, offering valuable context for future decision-making.
- Data Summarization: Descriptive analytics is used to simplify and summarize complex datasets, which can make the information more understandable and accessible.
- Identifying Patterns and Trends: With descriptive analytics, organizations can identify patterns, trends, and correlations in their data, which can provide valuable insights.
- Inform Decision-Making: The insights generated through descriptive analytics can inform strategic decisions and help organizations to react more quickly to events or changes in behavior.
- Basis for Further Analysis: Descriptive analytics lays the groundwork for further analytical activities. It’s the first necessary step before moving on to more advanced forms of analytics like predictive analytics (forecasting future events) or prescriptive analytics (advising on possible outcomes).
- Performance Evaluation: It allows organizations to evaluate their performance by comparing current results with past results, enabling them to see where improvements have been made and where further improvements can be targeted.
- Enhanced Reporting and Dashboards: Through the use of visualization techniques, descriptive analytics can improve the quality of reports and dashboards, making the data more understandable and easier to interpret for stakeholders at all levels of the organization.
- Immediate Value: Unlike some other types of analytics, descriptive analytics can provide immediate insights, as it doesn’t require complex models or deep analytical capabilities to provide value.
Disadvantages of Descriptive Analytics
While descriptive analytics offers numerous benefits, it also has certain limitations or disadvantages. Here are a few to consider:
- Limited to Past Data: Descriptive analytics primarily deals with historical data and provides insights about past events. It does not predict future events or trends and can’t help you understand possible future outcomes on its own.
- Lack of Deep Insights: While descriptive analytics helps in identifying what happened, it does not answer why it happened. For deeper insights, you would need to use diagnostic analytics, which analyzes data to understand the root cause of a particular outcome.
- Can Be Misleading: If not properly executed, descriptive analytics can sometimes lead to incorrect conclusions. For example, correlation does not imply causation, but descriptive analytics might tempt one to make such an inference.
- Data Quality Issues: The accuracy and usefulness of descriptive analytics are heavily reliant on the quality of the underlying data. If the data is incomplete, incorrect, or biased, the results of the descriptive analytics will be too.
- Over-reliance on Descriptive Analytics: Businesses may rely too much on descriptive analytics and not enough on predictive and prescriptive analytics. While understanding past and present data is important, it’s equally vital to forecast future trends and make data-driven decisions based on those predictions.
- Doesn’t Provide Actionable Insights: Descriptive analytics is used to interpret historical data and identify patterns and trends, but it doesn’t provide recommendations or courses of action. For that, prescriptive analytics is needed.
About the author
Muhammad Hassan
Researcher, Academic Writer, Web developer
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.
Exploratory data analysis: frequencies, descriptive statistics, histograms, and boxplots.
Jacob Shreffler; Martin R. Huecker.
Last Update: November 3, 2023.
- Definition/Introduction
Researchers must utilize exploratory data techniques to present findings to a target audience and create appropriate graphs and figures. Researchers can determine if outliers exist, data are missing, and statistical assumptions will be upheld by understanding data. Additionally, it is essential to comprehend these data when describing them in conclusions of a paper, in a meeting with colleagues invested in the findings, or while reading others’ work.
- Issues of Concern
This comprehension begins with exploring these data through the outputs discussed in this article. Individuals who do not conduct research must still comprehend new studies, and knowledge of fundamentals in analyzing data and interpretation of histograms and boxplots facilitates the ability to appraise recent publications accurately. Without this familiarity, decisions could be implemented based on inaccurate delivery or interpretation of medical studies.
Frequencies and Descriptive Statistics
Effective presentation of study results, in presentation or manuscript form, typically starts with frequencies and descriptive statistics (eg, means, medians, standard deviations). Examining these data gives a better sense of the variables and helps determine whether the research design is balanced and sufficient. Frequencies also reveal missing data and give a sense of outliers (discussed below).
Luckily, software programs are available to conduct exploratory data analysis. For this chapter, we will be examining the following research question.
RQ: Are there differences in drug life (length of effect) for Drug 23 based on the administration site?
A more precise hypothesis could be: Is drug 23 longer-lasting when administered via site A compared to site B?
To address this research question, exploratory data analysis is conducted. First, it is essential to start with the frequencies of the variables. To keep things simple, only the variables of minutes (drug life effect) and administration site (A vs B) are included. See Figure 1 for the frequency outputs.
Figure 1 shows that the administration site appears to be a balanced design with 50 individuals in each group. The excerpt for minutes frequencies is the bottom portion of Figure 1 and shows how many cases fell into each time frame, with the cumulative percent on the right-hand side. In examining Figure 1, one suspiciously low measurement (135) was observed relative to the other time values. If a data point seems inaccurate, a researcher should find this case and confirm whether it was an entry error. For the sake of this review, the authors state that this was an entry error and should have been entered as 535, not 135. Had the analysis occurred without checking this, the data analysis, results, and conclusions would have been invalid. While checking for entry errors and group balance, researchers should also look for missing data. If not responsibly evaluated, missing values can nullify results.
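The frequency check described here can be sketched as follows. The `minutes` values are hypothetical stand-ins, not the chapter's data, and the correction step assumes the entry error has already been confirmed against the source records.

```python
from collections import Counter

# Hypothetical minutes values; 135 stands in for the suspicious entry.
minutes = [510, 520, 520, 135, 540, 550, 550, 550, 560, 570]


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent).

    Missing values (None) are skipped here and counted separately below.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(present)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append(
            (value, counts[value], 100 * counts[value] / total, 100 * cumulative / total)
        )
    return rows


n_missing = sum(m is None for m in minutes)
before = frequency_table(minutes)

# Applied only after confirming that 135 was a typo for 535.
cleaned = [535 if m == 135 else m for m in minutes]
after = frequency_table(cleaned)

for value, count, pct, cum_pct in after:
    print(f"{value:>5} {count:>3} {pct:>6.1f} {cum_pct:>6.1f}")
```

Sorting the table by value places an implausibly low or high entry in the first or last row, which makes it easy to spot.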
After replacing the incorrect 135 with 535, descriptive statistics, including the mean, median, mode, minimum/maximum scores, and standard deviation, were examined. Output for the research example for the variable of minutes can be seen in Figure 2. Observe each variable to ensure that the mean seems reasonable and that the minimum and maximum are within an appropriate range based on medical competence or an available codebook. One assumption common in statistical analyses is a normal distribution. Figure 2 shows that the mode differs from the mean and the median. Visualization tools such as histograms can be used to examine these scores for normality and outliers before making decisions.
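A minimal sketch of this step, using Python's `statistics` module, is shown below. The `minutes` values and the `CODEBOOK_RANGE` limits are assumptions made for illustration; a real analysis would take the plausible range from the study's codebook.

```python
import statistics

# Hypothetical cleaned minutes values and codebook limits (both assumptions).
minutes = [510, 520, 520, 535, 540, 550, 550, 550, 560, 570]
CODEBOOK_RANGE = (400, 700)

summary = {
    "n": len(minutes),
    "mean": statistics.fmean(minutes),
    "median": statistics.median(minutes),
    "modes": statistics.multimode(minutes),  # list, in case of ties
    "min": min(minutes),
    "max": max(minutes),
    "std_dev": statistics.stdev(minutes),  # sample standard deviation (n - 1)
}

# Flag any value that falls outside the plausible range.
low, high = CODEBOOK_RANGE
out_of_range = [m for m in minutes if not low <= m <= high]

for name, value in summary.items():
    print(f"{name:>8}: {value}")
print("out of range:", out_of_range)
```

In this made-up sample the mode (550) differs from the mean (540.5) and the median (545), the kind of discrepancy that prompts a closer look at the distribution.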
Histograms are useful in assessing normality, as many statistical tests (eg, ANOVA and regression) assume the data have a normal distribution. Deviations from a normal distribution are quantified using skewness and kurtosis. [1] Skewness occurs when one tail of the curve is longer than the other. If the tail is longer on the left side of the curve (more cases at the higher values), the distribution is negatively skewed, whereas if the tail is longer on the right side, it is positively skewed. Kurtosis is another facet of normality and describes how heavy the tails are relative to a normal distribution. Positive kurtosis indicates a sharper peak with heavier tails, whereas negative kurtosis indicates a flatter distribution with lighter tails. [2]
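Skewness and kurtosis can be computed directly from central moments. The sketch below uses the common moment-based (population) definitions, with excess kurtosis scaled so that a normal distribution scores 0; statistical packages may apply small-sample corrections and report slightly different values. The sample lists are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: negative = longer left tail, positive = longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical samples chosen to show the sign of each statistic.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-v for v in right_tailed]

for name, data in [("symmetric", symmetric),
                   ("right_tailed", right_tailed),
                   ("left_tailed", left_tailed)]:
    print(f"{name:>12}: skew={skewness(data):+.3f} kurt={excess_kurtosis(data):+.3f}")
```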
Additionally, histograms reveal outliers: data points either entered incorrectly or truly very different from the rest of the sample. When there are outliers, one must determine whether they reflect random chance or an error in the experiment and provide strong justification if the decision is to exclude them. [3] Outliers require attention to ensure the data analysis accurately reflects the majority of the data and is not influenced by extreme values; cleaning these outliers can result in better quality decision-making in clinical practice. [4] A common approach to screening a variable for outliers and approximate normality is to convert values to z scores and check whether any scores are less than -3 or greater than 3. For a normal distribution, about 99.7% of scores should lie within 3 standard deviations of the mean. [5] Importantly, one should not automatically discard values outside this range but should weigh them together with the other factors mentioned above. Outliers are relatively common, so when these are prevalent, one must assess the risks and benefits of exclusion. [6]
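The z-score screen can be sketched as follows. The `minutes` sample is hypothetical, and the threshold of 3 follows the rule of thumb in the text; flagged values are candidates for review, not automatic exclusions.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of 20 measurements containing one suspicious value (135).
minutes = [500, 510, 515, 520, 525, 530, 535, 535, 540, 540,
           545, 545, 550, 555, 560, 565, 570, 575, 580, 135]

mean = statistics.fmean(minutes)
std_dev = statistics.stdev(minutes)  # sample standard deviation

# z score: how many standard deviations a value lies from the mean.
z_scores = [(m - mean) / std_dev for m in minutes]

# Flag values beyond +/- 3 standard deviations for review.
flagged = [m for m, z in zip(minutes, z_scores) if abs(z) > 3]
print("flagged for review:", flagged)

# Share of a normal distribution that lies within 3 standard deviations.
within_3sd = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(f"within 3 SD under normality: {within_3sd:.4f}")
```

One caveat with small samples: an extreme value inflates the standard deviation it is measured against, so with roughly 10 or fewer observations no sample z score can exceed 3 and this screen cannot flag anything.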
Figure 3 provides examples of histograms. In Figure 3A, 2 possible outliers causing kurtosis are observed. If only values within 3 standard deviations are used, the result in Figure 3B is observed. This histogram appears much closer to an approximately normal distribution, with the kurtosis addressed. Remember, all evidence should be considered before eliminating outliers. When reporting results in scientific papers, state the number of outliers excluded and justify why they were excluded.
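To show how a histogram exposes isolated values, the sketch below bins a hypothetical sample into fixed-width intervals and prints a text histogram. The data, the bin width of 50, and the `text_histogram` helper are all assumptions for illustration; a plotting library would normally draw the figure.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Count integer values per fixed-width bin and render one text line per bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Include empty bins so that gaps in the distribution stay visible.
    for start in range(min(bins), max(bins) + bin_width, bin_width):
        label = f"{start:>4}-{start + bin_width - 1:<4}"
        lines.append(f"{label} {'#' * bins[start]}")
    return bins, lines


# Hypothetical minutes values with one very low and one very high entry.
minutes = [135, 510, 520, 520, 535, 540, 550, 550, 550, 560, 570, 905]

bins, lines = text_histogram(minutes, bin_width=50)
print("\n".join(lines))
```

The long runs of empty bins on either side of the main cluster are what mark the two extreme values as possible outliers.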
Boxplots can reveal outliers, show the range of data, and show differences among groups. Boxplots provide a visual representation of ranges and medians, illustrating differences among groups, and are useful in various settings, including evidence-based medicine. [7] Boxplots provide a picture of data distribution when there are numerous values and not all values can be displayed individually (eg, in a scatterplot). [8] Figure 4 illustrates the differences between drug site administration and the length of drug life from the above example.
Figure 4 shows differences with potential clinical impact. Had any outliers existed (data from the histogram were cleaned), they would appear beyond the line endpoints. The red boxes represent the middle 50% of scores, so the bottom and top edges of each box mark the 25th and 75th percentiles. The line within each red box represents the median number of minutes for that administration site. The lines extending from each box, ending in short horizontal lines, cover the lowest and highest 25% of scores. In examining the boxplots, an overlap in minutes between the 2 administration sites was observed: approximately the top 25% of scores from site B fell in the same range as the bottom 25% at site A. Site B had a median under 525 minutes, whereas site A had a median greater than 550 minutes. If there were no differences in adverse reactions at site A, this figure provides evidence that healthcare providers should administer the drug via site A. Researchers could follow up by testing a third administration site, site C. Figure 5 shows what would happen if site C led to a longer drug life compared to site A.
Figure 5 displays the same site A data as Figure 4, but something looks different. The large variance at site C makes site A’s variance appear smaller. In other words, patients who were administered the drug via site C had a larger range of scores. Thus, some patients experience a longer drug life when the drug is administered via site C than the median of site A; however, the broad range (lack of precision) and lower median should be the focus. The minutes are much more tightly clustered at site A, which has both a higher median and a narrower range. One may conclude that this makes site A a more desirable site.
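The numbers behind a boxplot comparison like this one can be sketched with `statistics.quantiles`. The `site_a` and `site_c` values are hypothetical, and the 1.5 x IQR rule for whisker limits is an assumed convention (a common default in plotting software), not something the figures specify.

```python
import statistics


def box_summary(values):
    """Quartiles, IQR, whisker ends, and outliers under the 1.5 x IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical minutes per administration site (illustrative values only).
site_a = [545, 550, 552, 555, 558, 560, 562, 565, 570]
site_c = [450, 480, 510, 530, 545, 560, 590, 620, 660]

a, c = box_summary(site_a), box_summary(site_c)
print("site A:", a)
print("site C:", c)

# Some site C patients exceed site A's median even though site C's median is lower.
above_a_median = [v for v in site_c if v > a["median"]]
print("site C values above site A median:", above_a_median)
```

Comparing the medians and the IQRs side by side captures the two properties the boxplots are read for: where each group is centred and how tightly its scores cluster.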
- Clinical Significance
Ultimately, by understanding basic exploratory data methods, medical researchers and consumers of research can make quality and data-informed decisions. These data-informed decisions will result in the ability to appraise the clinical significance of research outputs. By overlooking these fundamentals in statistics, critical errors in judgment can occur.
- Nursing, Allied Health, and Interprofessional Team Interventions
All interprofessional healthcare team members need to be at least familiar with, if not well-versed in, these statistical analyses so they can read and interpret study data and apply the data implications in their everyday practice. This approach allows all practitioners to remain abreast of the latest developments and provides valuable data for evidence-based medicine, ultimately leading to improved patient outcomes.
Exploratory Data Analysis Figure 1 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD
Exploratory Data Analysis Figure 2 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD
Exploratory Data Analysis Figure 3 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD
Exploratory Data Analysis Figure 4 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD
Exploratory Data Analysis Figure 5 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD
Disclosure: Jacob Shreffler declares no relevant financial relationships with ineligible companies.
Disclosure: Martin Huecker declares no relevant financial relationships with ineligible companies.
This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.
- Cite this Page Shreffler J, Huecker MR. Exploratory Data Analysis: Frequencies, Descriptive Statistics, Histograms, and Boxplots. [Updated 2023 Nov 3]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.