Research Questions & Hypotheses

Steps in Quantitative Research video. Step 12 should say "Dissemination" (sharing the results).

A simple hypothesis predicts the relationship between only one independent variable and one dependent variable.

Example: “Applying sunscreen every day slows skin aging.”

6. Complex hypothesis:

A complex hypothesis states the relationship or difference between two or more independent and dependent variables.   

Example: “Applying sunscreen every day slows skin aging, reduces sunburn, and reduces the chances of skin cancer.” (Here, the three dependent variables are slowing skin aging, reducing sunburn, and reducing the chances of skin cancer.)

7. Associative hypothesis:  

An associative hypothesis states that a change in one variable results in the change of the other variable. The associative hypothesis defines interdependency between variables.  

Example: “There is a positive association between physical activity levels and overall health.”

8. Causal hypothesis:

A causal hypothesis proposes a cause-and-effect interaction between variables.

Example: “Long-term alcohol use causes liver damage.”

Note that some of the types of research hypothesis mentioned above might overlap. The types of hypothesis chosen will depend on the research question and the objective of the study.  

all quantitative research must be hypothesis driven

Research hypothesis examples  

Here are some good research hypothesis examples:

“The use of a specific type of therapy will lead to a reduction in symptoms of depression in individuals with a history of major depressive disorder.”  

“Providing educational interventions on healthy eating habits will result in weight loss in overweight individuals.”  

“Plants that are exposed to certain types of music will grow taller than those that are not exposed to music.”  

“The use of the plant growth regulator X will lead to an increase in the number of flowers produced by plants.”  

Characteristics that make a research hypothesis weak are unclear variables, unoriginality, being too general or too vague, and being untestable. A weak hypothesis leads to weak research and improper methods.   

Some bad research hypothesis examples (and the reasons why they are “bad”) are as follows:  

“This study will show that treatment X is better than any other treatment.” (This statement is not testable, too broad, and does not consider other treatments that may be effective.)

“This study will prove that this type of therapy is effective for all mental disorders.” (This statement is too broad and not testable as mental disorders are complex and different disorders may respond differently to different types of therapy.)

“Plants can communicate with each other through telepathy.” (This statement is not testable and lacks a scientific basis.)

Importance of testable hypothesis  

If a research hypothesis is not testable, the results will not prove or disprove anything meaningful. The conclusions will be vague at best. A testable hypothesis helps a researcher focus on the study outcome and understand the implication of the question and the different variables involved. A testable hypothesis helps a researcher make precise predictions based on prior research.  

To be considered testable, there must be a way to prove that the hypothesis is true or false; further, the results of the hypothesis must be reproducible.  

Research hypothesis: What it is, how to write it, types, and examples

Frequently Asked Questions (FAQs) on research hypothesis  

1. What is the difference between a research question and a research hypothesis?

A research question defines the problem and helps outline the study objective(s). It is an open-ended statement that is exploratory or probing in nature. Therefore, it does not make predictions or assumptions. It helps a researcher identify what information to collect. A research hypothesis, however, is a specific, testable prediction about the relationship between variables. Accordingly, it guides the study design and data analysis approach.

2. When should the null hypothesis be rejected?

A null hypothesis should be rejected when the evidence from a statistical test shows that the observed data would be unlikely if it were true. This happens when the p-value from the test is less than the defined significance level (e.g., 0.05). Rejecting the null hypothesis does not necessarily mean that the alternative hypothesis is true; it simply means that the evidence found is not compatible with the null hypothesis.
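The decision rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the example p-values are hypothetical and do not come from any study mentioned here.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # A non-significant result does not show that H0 is true.
    return "fail to reject H0"


# Hypothetical p-values, checked against two significance levels.
print(decide(0.03))              # significant at the 0.05 level
print(decide(0.03, alpha=0.01))  # not significant at the stricter 0.01 level
```

Note that the second outcome is worded as "fail to reject" rather than "accept", in line with the caution in the answer above.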

3. How can I be sure my hypothesis is testable?  

A testable hypothesis should be specific and measurable, and it should state a clear relationship between variables that can be tested with data. To ensure that your hypothesis is testable, consider the following:  

  • Clearly define the key variables in your hypothesis. You should be able to measure and manipulate these variables in a way that allows you to test the hypothesis.  
  • The hypothesis should predict a specific outcome or relationship between variables that can be measured or quantified.   
  • You should be able to collect the necessary data within the constraints of your study.  
  • It should be possible for other researchers to replicate your study, using the same methods and variables.   
  • Your hypothesis should be testable by using appropriate statistical analysis techniques, so you can draw conclusions, and make inferences about the population from the sample data.  
  • The hypothesis should be able to be disproven or rejected through the collection of data.  

4. How do I revise my research hypothesis if my data does not support it?  

If your data does not support your research hypothesis, you will need to revise it or develop a new one. You should examine your data carefully and identify any patterns or anomalies, re-examine your research question, and/or revisit your theory to look for any alternative explanations for your results. Based on your review of the data, literature, and theories, modify your research hypothesis to better align it with the results you obtained. Use your revised hypothesis to guide your research design and data collection. It is important to remain objective throughout the process.

5. I am performing exploratory research. Do I need to formulate a research hypothesis?  

As opposed to “confirmatory” research, where a researcher has some idea about the relationship between the variables under investigation, exploratory research (or hypothesis-generating research) looks into a completely new topic about which limited information is available. Therefore, the researcher will not have any prior hypotheses. In such cases, a researcher will need to develop a post-hoc hypothesis, that is, a research hypothesis generated after the results of the study are known.

6. How is a research hypothesis different from a research question?

A research question is an inquiry about a specific topic or phenomenon, typically expressed as a question. It seeks to explore and understand a particular aspect of the research subject. In contrast, a research hypothesis is a specific statement or prediction that suggests an expected relationship between variables. It is formulated based on existing knowledge or theories and guides the research design and data analysis.

7. Can a research hypothesis change during the research process?

Yes, research hypotheses can change during the research process. As researchers collect and analyze data, new insights and information may emerge that require modification or refinement of the initial hypotheses. This can be due to unexpected findings, limitations in the original hypotheses, or the need to explore additional dimensions of the research topic. Flexibility is crucial in research, allowing for adaptation and adjustment of hypotheses to align with the evolving understanding of the subject matter.

8. How many hypotheses should be included in a research study?

The number of research hypotheses in a research study varies depending on the nature and scope of the research. It is not necessary to have multiple hypotheses in every study. Some studies may have only one primary hypothesis, while others may have several related hypotheses. The number of hypotheses should be determined based on the research objectives, research questions, and the complexity of the research topic. It is important to ensure that the hypotheses are focused, testable, and directly related to the research aims.

9. Can research hypotheses be used in qualitative research?

Yes, research hypotheses can be used in qualitative research, although they are more commonly associated with quantitative research. In qualitative research, hypotheses may be formulated as tentative or exploratory statements that guide the investigation. Instead of testing hypotheses through statistical analysis, qualitative researchers may use the hypotheses to guide data collection and analysis, seeking to uncover patterns, themes, or relationships within the qualitative data. The emphasis in qualitative research is often on generating insights and understanding rather than confirming or rejecting specific research hypotheses through statistical testing.


Testing Hypotheses

  • What is a hypothesis?
  • Significance testing
  • One-tailed or two-tailed?
  • Degrees of freedom

A hypothesis is a statement that we are trying to prove or disprove. It is used to express the relationship between variables and whether this relationship is significant. It is specific and offers a prediction on the results of your research question.

Your research question will lead you to developing a hypothesis, which is why your research question needs to be specific and clear.

The hypothesis will then guide you to the most appropriate techniques you should use to answer the question. Hypotheses reflect the literature and theories on which you are basing them. They need to be testable (i.e. measurable and practical).

Null hypothesis (H0) is the proposition that there will not be a relationship between the variables you are looking at (i.e. any differences are due to chance). Null hypotheses always refer to the population. (Usually we don't believe this to be true.)

e.g. There is no difference in instances of illegal drug use by teenagers who are members of a gang and those who are not.

Alternative hypothesis (HA or H1): this is sometimes called the research hypothesis or experimental hypothesis. It is the proposition that there will be a relationship. It is a statement of inequality between the variables you are interested in. Alternative hypotheses always refer to the sample. An alternative hypothesis is usually a declaration rather than a question and is clear, to the point and specific.

e.g. The instances of illegal drug use of teenagers who are members of a gang are different from the instances of illegal drug use of teenagers who are not gang members.

A non-directional research hypothesis - reflects an expected difference between groups but does not specify the direction of this difference (see two-tailed test).

A directional research hypothesis - reflects an expected difference between groups but does specify the direction of this difference. (see one-tailed test)

e.g. The instances of illegal drug use by teenagers who are members of a gang will be higher than the instances of illegal drug use of teenagers who are not gang members.

Then the process of testing is to ascertain which hypothesis to believe. 

It is usually easier to prove something as untrue rather than true, so looking at the null hypothesis is the usual starting point.

The process of examining the null hypothesis in light of evidence from the sample is called significance testing. It is a way of establishing a range of values against which we can decide whether or not to reject the null hypothesis.

The debate over hypothesis testing

There has been discussion over whether the scientific method employed in traditional hypothesis testing is appropriate.  

See below for some articles that discuss this:

  • Gill, J. (1999) 'The insignificance of null hypothesis significance testing', Political Research Quarterly, 52(3), pp. 647-674.
  • Wainer, H. and Robinson, D.H. (2003) 'Shaping up the practice of null hypothesis significance testing', Educational Researcher, 32(7), pp. 22-30.
  • Ferguson, C.J. and Heene, M. (2012) 'A vast graveyard of undead theories: publication bias and psychological science's aversion to the null', Perspectives on Psychological Science, 7(6), pp. 555-561.

Taken from: Salkind, N.J. (2017)  Statistics for people who (think they) hate statistics. 6th edn. London: SAGE pp. 144-145.

  • Null hypothesis - a simple introduction (SPSS)

A significance level defines the point at which your sample evidence contradicts your null hypothesis strongly enough that you can reject it. It is the probability of rejecting the null hypothesis when it is really true.

e.g. a significance level of 0.05 indicates that there is a 5% (or 1 in 20) risk of deciding that there is an effect when in fact there is none.

The lower the significance level that you set, the stronger the evidence from the sample has to be to be able to reject the null hypothesis.

N.B.  - it is important that you set the significance level before you carry out your study and analysis.

Using Confidence Intervals

It is possible to test the significance of your null hypothesis using a confidence interval (see under samples and populations tab).

- if the predicted null hypothesis value lies outside the confidence interval, we can reject the null hypothesis and accept the alternative hypothesis
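As a rough illustration of the confidence-interval approach, the sketch below builds an interval for a sample mean and checks whether the null value falls outside it. It assumes a large sample, so that a normal approximation with the sample standard deviation is acceptable; the function name and the numbers (mean 103, standard deviation 15, n = 100, null value 100) are hypothetical.

```python
import math
from statistics import NormalDist


def ci_excludes_null(sample_mean, sample_sd, n, null_value, confidence=0.95):
    """Return the confidence interval and whether it excludes the null value.

    Assumes a normal approximation (large n); small samples would
    normally call for a t-distribution instead.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / math.sqrt(n)  # z times the standard error
    lower, upper = sample_mean - margin, sample_mean + margin
    excludes = not (lower <= null_value <= upper)
    return (lower, upper), excludes


# Hypothetical numbers chosen only for illustration.
interval, reject = ci_excludes_null(103.0, 15.0, 100, null_value=100.0)
print(interval, reject)
```

With these illustrative numbers the interval lies just above 100, so the null value is excluded and the null hypothesis would be rejected at the 5% level.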

The test statistic

Calculating a test statistic is another commonly used approach:

  • Write down your null and alternative hypotheses
  • Find the sample statistic (e.g. the mean of your sample)
  • Calculate the test statistic Z score (see under Measures of spread or dispersion and Statistical tests - parametric). In this case the sample mean is compared to the population mean (assumed from the null hypothesis) and the standard error (see under Samples and population) is used rather than the standard deviation.
  • Compare the test statistic with the critical values (e.g. plus or minus 1.96 for 5% significance)
  • Draw a conclusion about the hypotheses - does the calculated z value lie in the critical region, i.e. above 1.96 or below -1.96? If it does, we can reject the null hypothesis. This would indicate that the results are significant (or an effect has been detected): if there were no difference in the population, getting the result that you have observed would be highly unlikely, so you can reject the null hypothesis.
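The steps above can be sketched as a short one-sample z-test in Python. This is a minimal illustration, assuming the population standard deviation is known; the function name and the example figures (sample mean 103, population mean 100, standard deviation 15, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed one-sample z-test, assuming a known population SD."""
    standard_error = pop_sd / math.sqrt(n)         # SE, not the raw SD
    z = (sample_mean - pop_mean) / standard_error  # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2) # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
    reject_null = abs(z) > critical
    return z, critical, p_value, reject_null


# Hypothetical example: H0 says the population mean is 100.
z, critical, p_value, reject_null = one_sample_z_test(103.0, 100.0, 15.0, 100)
print(f"z = {z:.2f}, critical = ±{critical:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Here the standard error is 15 / 10 = 1.5, so z = 3 / 1.5 = 2.0, which lies just beyond the critical value of about 1.96.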


Type I error - this is the chance of wrongly rejecting the null hypothesis even though it is actually true, e.g. by using a 5% p level you would expect the null hypothesis to be rejected about 5% of the time when the null hypothesis is true. You could set a more stringent p level such as 1% (or 1 in 100) to be more certain of not seeing a Type I error. This, however, makes another type of error (Type II) more likely to occur.

Type II error  - this is where there is an effect, but the  p  value you obtain is non-significant hence you don’t detect this effect.
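The trade-off between the two error types can be seen in a small simulation. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation of 1, a two-tailed z-test, and a hypothetical true effect of 0.3 for the Type II case; the function name, sample size and number of trials are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rates(true_mean, alphas=(0.05, 0.01), n=30, trials=2000, seed=0):
    """Share of simulated studies that reject H0: mean = 0, per alpha level."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    criticals = {a: NormalDist().inv_cdf(1 - a / 2) for a in alphas}
    rejections = {a: 0 for a in alphas}
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / math.sqrt(n))  # known SD of 1
        for a, critical in criticals.items():
            if abs(z) > critical:
                rejections[a] += 1
    return {a: count / trials for a, count in rejections.items()}


# H0 is true: every rejection is a Type I error.
type_1 = rejection_rates(true_mean=0.0)
# H0 is false (hypothetical effect of 0.3): every non-rejection is a Type II error.
type_2 = {a: 1 - rate for a, rate in rejection_rates(true_mean=0.3).items()}
print("Type I error rate by alpha:", type_1)
print("Type II error rate by alpha:", type_2)
```

Under these assumptions the Type I rate should sit close to the chosen significance level, while moving from 0.05 to 0.01 can only increase the share of missed effects, because the same simulated samples are judged against a stricter cut-off.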

  • Statistical significance - what does it really mean?
  • Statistical tables

One-tailed tests - where we know in which direction (e.g. larger or smaller) the difference between sample and population will be. It is a directional hypothesis.

Two-tailed tests - where we are looking at whether there is a difference between sample and population. This difference could be larger or smaller. This is a non-directional hypothesis.

If the difference is in the direction you have predicted (i.e. a one-tailed test) it is easier to get a significant result, though there are arguments against using a one-tailed test (Wright and London, 2009, pp. 98-99).*

*Wright, D. B. & London, K. (2009)  First (and second) steps in statistics . 2nd edn. London: SAGE.

N.B. - think of the ‘tails’ as the regions at the far ends of a normal distribution. For a two-tailed test with a significance level of 0.05 (5%), 0.025 (2.5%) of the values would be at one end of the distribution and the other 0.025 (2.5%) would be at the other end. It is the values in these ‘critical’ extreme regions where we can think about rejecting the null hypothesis and claim that there has been an effect.
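To make the idea of tails concrete, the following sketch computes the critical z values for one-tailed and two-tailed tests from the standard normal distribution. The function name is hypothetical, and the one-tailed case is written for a predicted increase (upper tail) only.

```python
from statistics import NormalDist


def critical_values(alpha=0.05, tails=2):
    """Critical z values for a standard normal test statistic."""
    std_normal = NormalDist()
    if tails == 2:
        # Split alpha equally between the two ends of the distribution.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tails == 1:
        # All of alpha sits in the single (upper) tail.
        return (std_normal.inv_cdf(1 - alpha),)
    raise ValueError("tails must be 1 or 2")


print(critical_values(0.05, tails=2))  # roughly (-1.96, 1.96)
print(critical_values(0.05, tails=1))  # roughly (1.645,)
```

The one-tailed cut-off is lower than the two-tailed one, which is why a difference in the predicted direction reaches significance more easily.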

Degrees of freedom (df) is a rather difficult mathematical concept, but is needed to calculate the significance of certain statistical tests, such as the t-test, ANOVA and Chi-squared test.

It is broadly defined as the number of "observations" (pieces of information) in the data that are free to vary when estimating statistical parameters. (Taken from Minitab Blog.)

The higher the degrees of freedom, the more powerful and precise your estimates of the (population) parameter will be.

Typically, for a 1-sample t-test it is considered as the number of values in your sample minus 1.

For chi-squared tests with a table of rows and columns the rule is:

(Number of rows minus 1) times (number of columns minus 1)
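The two rules above are simple enough to write as small helper functions. The function names and example sizes below are hypothetical and serve only to illustrate the arithmetic.

```python
def df_one_sample_t(n: int) -> int:
    """Degrees of freedom for a 1-sample t-test: sample size minus 1."""
    if n < 2:
        raise ValueError("a t-test needs at least two observations")
    return n - 1


def df_chi_squared(rows: int, columns: int) -> int:
    """Degrees of freedom for a chi-squared test on a rows x columns table."""
    if rows < 2 or columns < 2:
        raise ValueError("the table needs at least two rows and two columns")
    return (rows - 1) * (columns - 1)


print(df_one_sample_t(7))    # 7 values in the sample -> 6
print(df_chi_squared(3, 4))  # 3 rows and 4 columns -> 2 * 3 = 6
```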

An accessible example to illustrate the principle of degrees of freedom using chocolates.

  • You have seven chocolates in a box, each being a different type, e.g. truffle, coffee cream, caramel cluster, fudge, strawberry dream, hazelnut whirl, toffee. 
  • You are being good and intend to eat only one chocolate each day of the week.
  • On the first day, you can choose to eat any one of the 7 chocolate types  - you have a choice from all 7.
  • On the second day, you can choose from the 6 remaining chocolates, on day 3 you can choose from 5 chocolates, and so on.
  • On the sixth day you have a choice of the remaining 2 chocolates you haven't eaten that week.
  • However on the seventh day - you haven't really got any choice of chocolate - it has got to be the one you have left in your box.
  • You had 7-1 = 6 days of “chocolate” freedom—in which the chocolate you ate could vary!

Eric C. Wait, Michael A. Reiche, Teng-Leong Chew; Hypothesis-driven quantitative fluorescence microscopy – the importance of reverse-thinking in experimental design. J Cell Sci 1 November 2020; 133 (21): jcs250027. doi: https://doi.org/10.1242/jcs.250027


One of the challenges in modern fluorescence microscopy is to reconcile the conventional utilization of microscopes as exploratory instruments with their emerging and rapidly expanding role as quantitative tools. The contribution of microscopy to observational biology will remain enormous owing to the improvements in acquisition speed, imaging depth, resolution and biocompatibility of modern imaging instruments. However, the use of fluorescence microscopy to facilitate the quantitative measurements necessary to challenge hypotheses is a relatively recent concept, made possible by advanced optics, functional imaging probes and rapidly increasing computational power. We argue here that to fully leverage the rapidly evolving application of microscopes in hypothesis-driven biology, we not only need to ensure that images are acquired quantitatively but must also re-evaluate how microscopy-based experiments are designed. In this Opinion, we present a reverse logic that guides the design of quantitative fluorescence microscopy experiments. This unique approach starts from identifying the results that would quantitatively inform the hypothesis and maps the process backward to microscope selection. This ensures that the quantitative aspects of testing the hypothesis remain the central focus of the entire experimental design.

Advancements in optical engineering, labeling technologies, and computational capacity have turned fluorescence microscopy into an indispensable tool in the life sciences. Its unique capacity to probe biological questions across a large range of biological length scales has made it a popular tool in cell biology, neurobiology and developmental biology, as well as many other fields of research. Modern microscopy can reveal valuable information on molecular ultrastructure, dynamic biological processes and biological functions. Yet, the appeal of seemingly limitless promises, the myriad of technical details and the rapid development of computational capabilities have also created confusion for many seeking the right combination of imaging tools. As has been previously pointed out by Jonkman and colleagues (Jonkman et al., 2020), biologists can spend considerable time and resources acquiring huge amounts of data without proper planning, only to realize later that the data cannot appropriately address a particular biological question. This usually occurs when the design of a microscopy experiment is not guided by a suitable hypothesis, the experimenter gets side-tracked by new observations or the experiment starts without a design at all. The method proposed here aims to assist the gathering of appropriate data that directly addresses a quantitative hypothesis. The intent is to give the reader a better understanding of the process and potential issues that arise in quantitative experiments.

The importance of fluorescence microscopy lies in its ability to serve both as an exploratory and a quantitative tool. In other words, microscopy has a combined capacity that enables a biologist to both formulate hypotheses based on observation and to perform quantitative measurements to test those hypotheses. For example, one might easily observe the localization of a target protein within a mitochondrial compartment. However, it takes a shift in mindset to design an appropriate experiment capable of quantifying this localization change in response to an oxidative stress. Quantitative measurements, however, can only produce results that directly address a proposed hypothesis when the experiment is designed appropriately. In fact, even an accurate, quantitative set of data that has been generated with the best practices will not necessarily yield biologically meaningful results. An image acquired with a digital detector is inherently a data map – an array of values. While any digital image can be quantified, these measurements are only biologically meaningful when they are pertinent to the hypothesis. Take for example a study that investigates the rates of filopodia extension during cell migration. Data revealing the super-resolved, 3D actin filaments are not sufficient for determining the rate of filopodia extension. However, an experiment that captures the change in location of the filopodial tip will provide the necessary data. In other words, when testing a quantitative hypothesis, informative data are quantitative, but not all quantitative data are informative.
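The filopodia example can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: it treats filopodium length as the straight-line distance from a fixed base point to the tracked tip, and every coordinate, time stamp and function name is hypothetical rather than taken from the article.

```python
import math


def extension_rates(times_s, base_um, tip_positions_um):
    """Signed extension rate (um/s) between consecutive frames.

    Length is taken as the base-to-tip distance, an assumption made
    for this sketch. Positive values mean extension, negative values
    mean retraction.
    """
    if len(times_s) != len(tip_positions_um):
        raise ValueError("need one tip position per time point")
    lengths = [math.dist(base_um, tip) for tip in tip_positions_um]
    rates = []
    for i in range(1, len(lengths)):
        dt = times_s[i] - times_s[i - 1]
        rates.append((lengths[i] - lengths[i - 1]) / dt)
    return rates


# Hypothetical tip positions (in micrometres) tracked every 5 seconds.
times = [0.0, 5.0, 10.0]
tips = [(0.0, 1.0), (0.0, 1.5), (0.0, 2.5)]
print(extension_rates(times, (0.0, 0.0), tips))
```

Only the tip location over time is needed for this measurement, which is the point of the example: a super-resolved 3D image of the actin filaments would not add anything to the calculation.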

Reliable and informative results require high-quality image data and relevant analyses. Fortunately, there is no shortage of excellent reviews in the literature that offer step-by-step guidance to perform microscopy experiments, from image acquisition to quantitative image analysis (Berg et al., 2019; Jonkman et al., 2014; McQuin et al., 2018; North, 2006; Rueden et al., 2017; Swedlow, 2013; Van Den Berge et al., 2019; Waters, 2009; Weigert et al., 2018). The task now lies in ensuring that data acquisition and analyses can be translated into biologically meaningful information, capable of challenging a hypothesis. We argue that this must be achieved through rational experimental design.

Designing a hypothesis-driven experiment is a vital step in the overall experimental scheme, but it is often over-simplified and represented by a single step. The conventional workflow of an imaging experiment, as astutely observed by Lee and Kitaoka (2018), is adapted in Fig. 1A. In this generalized diagram, the execution of the experiment begins with sample preparation after experimental design. The images are acquired, and the data will then be processed and analyzed – usually followed by several iterations of optimization – before the final results are presented. What is important to note is that experimental design is appropriately singled out as the key first step (Fig. 1A). Yet, in stark contrast to the wealth of technical guides, there is a paucity of discussion in the literature on the logic of rational experimental design and how it can be harnessed to successfully perform a hypothesis-driven, quantitative experiment. This is an unfortunate omission, partly due to the difficulty in summarizing a logical scheme that is sufficiently general to be applicable to most biological questions. In this Opinion article, we aim to fill this important gap and focus on rational, hypothesis-driven experimental design. This guide is aimed toward biologists interested in learning how to design quantitative experiments that are geared toward testing their hypotheses. It embodies our experience in steering imaging projects from hypotheses to quantitative, informative results at the Advanced Imaging Center at HHMI Janelia Research Campus (Chew et al., 2017). We include in Box 1 a case study of how we have successfully steered the development of such a quantitative microscopy project.

Conducting and designing quantitative fluorescence microscopy experiments. (A) Typical workflow in microscopy experiment. This workflow is forward-facing, progressing from the formulation of a hypothesis to the eventual presentation of the data as results. Adapted with permission of American Society for Cell Biology from Lee and Kitaoka (2018); permission conveyed through Copyright Clearance Center, Inc. (B) A focused view of the experimental planning phase. We propose that experimental design would be more efficient and effective by adopting a reverse-facing workflow. Here, the hypothesis should determine what the necessary results should be. From there, the experimenter can plan backward from the required data to the point where the experiment can be executed. The processes outlined in A and B are iterative, and the experimenter should re-evaluate whether the best decision has been made at each step. (C) A flow diagram to determine whether the experimental output generated from the microscope will lead to informative results. Answering the questions outlined here will identify the corresponding step in the design that needs re-evaluation. Reaching the ‘Informative results’ box would indicate that the data acquired were most likely collected in a manner that would directly test the hypothesis. Alternatively, the bulleted lists provide insight into which step in the design process requires re-evaluation to be improved in subsequent design iterations.


This case study partially summarizes one of the quantitative experiments performed by McArthur and colleagues (McArthur et al., 2018). Preliminary observations indicated that the mitochondrial network of cells deficient in induced myeloid leukemia cell differentiation protein (MCL-1), a Bcl-2 family member, broke down during apoptosis (A in the box figure), followed by the presence of mitochondrial DNA (mtDNA) in the cytoplasm (B in the box figure). This observation led to the conceptualization of the working model – ‘during apoptosis, mitochondrial morphology changes prior to the release of mtDNA into the cytoplasm’.

To properly plan a quantitative experiment to test this model, we used our reverse-logic to steer the following experimental design:

1. A more-defined hypothesis was formulated – ‘during apoptosis, the mitochondrial sphericity increases prior to an increase in the number of externalized mtDNA’. Note how the initial descriptive semantics have been translated into quantitative semantics that will guide subsequent measurements.

2. Two sets of informative results were essential to test this hypothesis: (i) mitochondrial sphericity, and (ii) mtDNA externalization, both measured as a function of time.

3. To achieve these informative results, the required data must include time-lapsed, volumetric images of labeled mitochondria and mtDNA.

4. To produce these data, the following experimental imaging parameters had to be met:

• high-speed volumetric imaging to accurately track 3D mitochondrial network reorganization

• high signal-to-noise ratio and resolution in order to accurately measure the 3D structures of the mitochondria

• near-isotropic resolution to precisely characterize the sphericity of mitochondria and mtDNA extrusion

• two-color acquisition to provide information on both the mitochondria and the mtDNA.

5. While both lattice lightsheet microscopy (LLSM) and 3D structured illumination microscopy (SIM) met these benchmarks, it was also important to meet the biological requirements. Pilot studies established that two-channel volumes of 50 slices each, acquired approximately every 10 s for a total of 50 min would be necessary to capture and follow this rare process in its entirety. Phototoxicity could affect the mitochondrial biology, introducing artifacts. To mitigate phototoxicity, the gentle illumination of LLSM established it as the clear choice. To further reduce light exposure, brighter fluorescent labels, such as mNeonGreen (Shaner et al., 2013) and HaloTag™ (Promega, USA) with Janelia Fluor® 646 (Grimm et al., 2017) (instead of EGFP and mCherry), were used. Note that the experimental design process was iterative and benefited from pilot studies used to identify the necessary imaging parameters, suitable fluorophores, and the optimal microscope.
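The acquisition parameters in step 5 imply a sizeable number of image planes, which is worth estimating at the design stage. The sketch below does that arithmetic; it assumes an interval of exactly 10 s (the text says "approximately") and the function name is hypothetical.

```python
def acquisition_plan(duration_min, interval_s, channels, slices_per_volume):
    """Count time points, volumes and image planes for a time-lapse plan."""
    time_points = int(duration_min * 60 // interval_s)
    volumes = time_points * channels       # one volume per channel per time point
    planes = volumes * slices_per_volume   # individual 2D image planes
    return time_points, volumes, planes


# Parameters from the case study: 50 min, ~10 s interval, 2 channels, 50 slices.
time_points, volumes, planes = acquisition_plan(50, 10, 2, 50)
print(time_points, volumes, planes)  # 300 600 30000
```

Roughly 30,000 exposures of the same cells helps explain why phototoxicity, and therefore gentle illumination and bright labels, weighed so heavily in the choice of microscope.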

C to E in the box figure illustrate the successful completion of this quantitative experiment. The LLSM micrograph (C) shows mtDNA extrusion from mitochondria. These images were used to create 3D segmentations (D) and were quantified. The mitochondrial sphericity and mtDNA externalization were measured over time, and plotted in E. This graph shows that an increase in mitochondrial sphericity (thin red line) preceded the onset of mtDNA extrusion (thin green line) – providing the informative result that ultimately supported the hypothesis.
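The hypothesis in this case study is phrased in terms of mitochondrial sphericity. The text here does not state which definition was used for the measurement, so the sketch below assumes one common definition: the surface area of a sphere with the same volume as the object, divided by the object's actual surface area. The function name and test shapes are hypothetical.

```python
import math


def sphericity(volume: float, surface_area: float) -> float:
    """Sphericity = pi^(1/3) * (6V)^(2/3) / A.

    Equals 1 for a perfect sphere and falls below 1 for any other shape.
    This definition is an assumption for illustration only.
    """
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface area must be positive")
    return (math.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface_area


# Sanity checks on simple shapes (arbitrary units).
r = 1.0
print(sphericity(4 / 3 * math.pi * r**3, 4 * math.pi * r**2))  # sphere: ~1.0
print(sphericity(1.0, 6.0))                                    # unit cube: ~0.806
```

Applied to segmented mitochondria at each time point, a measure like this turns the descriptive statement "morphology changes" into a number that can be plotted against time and compared with the onset of mtDNA externalization.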

The box figure shows morphological changes of mitochondria and mitochondrial DNA release during apoptosis; images were previously published in McArthur et al. (2018) and are reused here with permission. Scale bars: 5 µm.

The success of a microscopy-based quantitative experiment hinges on the appreciation and understanding of (i) how the underlying biological query and defined hypothesis direct the experimental design, and (ii) how experimental design and instrument choice are related to the way in which image data will eventually be analyzed. For this reason, we outline a logic that exemplifies these themes ( Fig. 1 B). We propose, in this Opinion article, that the very first step of experimental design, following the formulation of a hypothesis, is to determine the informative results that can quantitatively test that hypothesis. In other words, informative results are the ultimate goal of the designed experiment. Therefore, an experiment that has been developed to specifically generate data pertinent to the biological query will produce informative results. As such, the production of the required data will necessitate that a certain set of experimental parameters be met, which would in turn prescribe the features of the instrument needed to make such measurements. Overall, such a systematic workflow ensures that the hypothesis remains central to the experiment and that the experiment yields information capable of challenging the hypothesis. This will help chart the roadmap of how microscopy-based experiments should be designed for quantitative analyses. We will not replicate the many superbly written reviews and guides in the literature here, but rather aim to help readers better utilize these guides, as we embark on our journey of experimental design.

The capacity of modern optical microscopy to support both visual exploration and content-rich measurement has made it a versatile biological research technique. Unfortunately, it is also one that is commonly misunderstood. Biologists are keen observers, exceptional in recognizing patterns, finding anomalies and identifying new phenotypes. In fact, when it comes to studying structures and processes, visualization by itself is often sufficient to prompt biologists to formulate working models of the observed systems, and these working models provide abstract representations of the observation. The descriptive semantics used in these working models have served as powerful tools in life sciences and enable biologists to organize and communicate information about the complexity of the living systems ( Courtot et al., 2011 ). Indeed, specific follow-up questions can often already be framed by experienced biologists as soon as the initial images appear on their monitor; and this is the inception point of many biological queries. This is the essence of observation-driven, empirical inferences – ‘I know it when I see it’, and this is where the power of microscopy has historically been leveraged. Observational biology will continue to play an important role, and it is certainly true that not all biological hypotheses must be quantitatively tested. However, there is no denying that with the advent of modern experimental methods, hypotheses in general have become, and are increasingly expected to be, formulated in more quantitative terms. Addressing these increasingly focused hypotheses is where the quantitative capacity of microscopy has the most impact and is the core of this Opinion article.

If one were to accept the idea that ‘seeing is believing’ with microscopy as an exploratory instrument, then surely one must also accept the notion that ‘measuring is knowing’ when using microscopy as an analytical technique. The challenge here is to reconcile observation and quantification using the same instrument. Quantitative measurement is intrinsically analysis-rich and semantics-agnostic ( Shasha, 2003 ). However, this is where the disparity between observation and quantification often arises. It is common to see proposed microscopy studies with phrases such as ‘to analyze the spatial-temporal dynamics of an organelle’. There is unfortunately no specific analytical metric for the ‘dynamics’ of an organelle or any other biological structure. Dynamics is an ambiguous term that is often used to encapsulate several different metrics that together describe a particular observation. To transform vague biological queries such as this into quantifiable goals for microscopic analysis, we need to consider how intuitive biological semantics can be reformulated. With this in mind, we will begin by exploring how hypotheses shape the rationale of microscopy-based experiments.

Testable hypothesis

The cornerstone of the classical scientific method is to determine whether evidence supports or negates a postulated idea. Hypotheses, at the experimental level, must therefore be negatable by observation or measurements ( Popper, 2005 ). A clearly stated, verifiable hypothesis will guide every step of an experiment and will provide invaluable checkpoints. More importantly, the negatable hypothesis will impart the necessary restraint to mitigate being side-tracked from the initial question. This disciplined approach does not preclude future exploration of other observations, but it serves to balance both the exploratory and the analytical priorities of an experiment ( Fig. 1 C). This is why a hypothesis such as ‘condition X will increase the rate of mitochondrial fission’ has stronger semantic specificity than ‘condition X will affect the spatial-temporal dynamics of mitochondria’. The latter hypothesis cannot be tested because the experimental variables (i.e. fission events) that either support or negate it are not defined.

Interestingly, such cautionary advice is rarely needed for biochemical and molecular biology assays. These are assays that are uniquely quantitative and do not usually serve as observational tools, and biologists learn these techniques extensively during their training. As a result, biologists formulate testable hypotheses and perform quantitative analyses with ease using assays such as immunoblots, PCRs, ELISAs or enzyme kinetic assays. What differentiates these assays from microscopy is that they are explicitly linked to well-defined sets of output. For example, an immunoblot yields specific information on molecular mass and abundance. In contrast, a vast plethora of information can be derived from microscopy data, including molecular abundance, spatial location, movement behavior, morphological changes, structural features, molecular association, enzymatic activity, and the list goes on. Microscopy is therefore not a single assay; instead, it is a collection of assays that vary depending on how the experiment is designed. Without a defined boundary, the scope of an experiment can quickly become too ambitious and unnecessarily complex. This underscores the importance of identifying the appropriate experimental output that addresses the hypothesis early in the design process.

Compared to biochemical and molecular biology assays, the complexity of microscopy is further compounded by the variability in the nature of the sample. In comparison to molecular biology assays that use defined samples for input, such as nucleic acids or proteins, microscopy can accommodate a wide variety of complex samples (from purified molecules to a multitude of model organisms at various stages of development, for example) that in turn change the requirements and implementation of the experiment. Thus, it does not come as a surprise that the experimental scheme and sample choice often have to be considered in parallel due to their interdependencies ( Galas et al., 2018 ). Sample compatibility is a complex issue that comprises both the specimen and fluorescent labels. Likewise, the labeling strategy and sample viability are critically important factors to the success of an experiment, and these topics have been extensively discussed in the literature ( Albrecht and Oliver, 2018 ; Dean and Palmer, 2014 ; Frigault et al., 2009 ; Heppert et al., 2016 ; Icha et al., 2017 ; Kiepas et al., 2020 ; Lambert, 2019 ; Schneider and Hackenberger, 2017 ; Specht et al., 2017 ; Thorn, 2017 ). Overall, the compatibility of a sample will be determined by all aspects of the experiment and demands careful consideration. As a result, the hypothesis and the associated experiment will be heavily influenced by what can be realistically achieved given the nature of the sample. Once the hypothesis has been appropriately defined, rather than proceeding directly to performing microscopy experiments, the most critical step at this point is to evaluate what it means to challenge the hypothesis.

Informative results

Not all results can adequately test a hypothesis. It is important to differentiate between a ‘desired outcome’ and an ‘informative result’. The desired outcome would naturally be for the evidence to support the hypothesis. Continuing with the example of mitochondrial fission stated above, the informative result in this case would be the number of mitochondrial fission events as a function of time, both in the presence and absence of condition X. This is in contrast to the ‘desired outcome’ of finding an increased rate of mitochondrial fission given condition X. In addition, to be informative, the required data should encompass appropriate controls and sufficient replicates to support statistical analyses. The informative result is not designed to affirm one's intuition; it is required to support or negate the hypothesis.
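The distinction can be made concrete with a small sketch. Here the informative result is the count of fission events per time bin, reported for both conditions; whether condition X shows more events is left to the data. All event times below are made-up placeholder values, and the function names are hypothetical.

```python
import math


def events_per_bin(event_times_s, duration_s, bin_s):
    """Count events in consecutive time bins of width bin_s."""
    counts = [0] * math.ceil(duration_s / bin_s)
    for t in event_times_s:
        if 0 <= t < duration_s:
            counts[int(t // bin_s)] += 1
    return counts


def events_per_minute(event_times_s, duration_s):
    """Average event rate over the whole observation window."""
    return len(event_times_s) / (duration_s / 60)


# PLACEHOLDER data: times (s) of scored fission events in one cell each.
control = [30, 200, 410]
condition_x = [20, 75, 130, 260, 300, 420, 550]

control_bins = events_per_bin(control, duration_s=600, bin_s=120)
x_bins = events_per_bin(condition_x, duration_s=600, bin_s=120)
control_rate = events_per_minute(control, 600)
x_rate = events_per_minute(condition_x, 600)
print(control_bins, x_bins, control_rate, x_rate)
```

A single pair of cells says nothing about significance; in practice the same table would be filled in across replicates and controls before any statistical comparison.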

Required data

As depicted in Fig. 1 B, experimental design involves a reverse-thinking workflow that begins with informative results and concludes with microscope choice. This reverse-flow provides the necessary logic for designing a quantitative experiment. The essence of efficient experimental design is to home in on the appropriate assay from the multitude of possibilities offered by fluorescence microscopy. It is therefore imperative that the experimenter identifies what the necessary data are, as this will ultimately define the appropriate assay. This underscores the importance of thinking in reverse, as the necessary data can only be defined by informative results. While results and data are sometimes used interchangeably elsewhere, they are distinctly different in this context. Results refer to the final analytical metrics compiled from a set of related experiments. In contrast, a set of data generated from the microscope, by itself, is insufficient to speak to the validity of a hypothesis.

The transition from data to results requires certain translational steps. A good example of such translation is the process of connecting coordinates of a moving object, be it a cell or particle, between time points into a defined track. Without further analyses, the tracked data of a moving object is only minimally informative; it merely indicates that the object has moved. If one were to hypothesize that the object would change its migratory behavior under certain conditions, then one would need to consider which measurements could describe that behavior. These informative measurements, when performed on the data, are referred to as the analytical metrics. In this example of characterizing migration patterns, the analytical metrics may include directionality, velocity and motion persistence ( Aaron et al., 2019 ). Informative results are produced when these analytical metrics are applied to the appropriate data .
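As a minimal sketch of this translation from data to analytical metrics, the code below turns one track (a list of positions sampled at a fixed interval) into three numbers. The definitions are common conventions assumed here for illustration, not taken from the cited reference: mean speed is path length over elapsed time, directionality is net displacement divided by path length, and persistence is the mean cosine of the angle between consecutive steps.

```python
import math


def track_metrics(points, dt):
    """Return (mean speed, directionality ratio, persistence) for a 2D track.

    points: list of (x, y) positions sampled every dt time units.
    ASSUMPTION: the three definitions are common conventions, chosen
    for illustration only.
    """
    steps = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(points, points[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in steps]
    path = sum(lengths)
    net = math.hypot(points[-1][0] - points[0][0],
                     points[-1][1] - points[0][1])

    mean_speed = path / (dt * len(steps))
    directionality = net / path if path > 0 else 0.0

    # Cosine of the turning angle between each pair of consecutive steps.
    cosines = [
        (a[0] * b[0] + a[1] * b[1]) / (la * lb)
        for a, b, la, lb in zip(steps, steps[1:], lengths, lengths[1:])
        if la > 0 and lb > 0
    ]
    persistence = sum(cosines) / len(cosines) if cosines else float("nan")
    return mean_speed, directionality, persistence


straight = track_metrics([(0, 0), (1, 0), (2, 0), (3, 0)], dt=2.0)
zigzag = track_metrics([(0, 0), (1, 0), (1, 1), (2, 1)], dt=2.0)
print(straight, zigzag)
```

The two example tracks have the same path length and therefore the same mean speed, yet they differ in directionality and persistence, which is exactly the kind of behavioral difference that the raw coordinates alone do not reveal.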

Adhering to our reverse-design approach, the types of analytical metrics that will lead to the informative results are the next factor an experimenter must consider. Table 1 shows how common biological objectives dictate the relevant analytical metrics, which in turn prescribe the necessary experimental tools. Analytical metrics are a form of semantics. What sets them apart from the semantics used in working models is that, in analytical metrics, the semantics are quantitative and specific rather than descriptive. What should be clear from Table 1 is that careful consideration is required to choose the appropriate analytical metrics. In fact, as reflected in the mitochondrial fission example, the analytical metric (mitochondrial fission rate) should be central to the hypothesis, so that the hypothesis can be tested. An additional example where the choice of the correct analytical metric would affect the results is in colocalization studies. One must first determine whether measuring the degree of overlap (co-occurrence) of the two signals is more appropriate than measuring the extent of their correlation. This decision will dictate the analytical metric that should be used ( Aaron et al., 2018 ). Likewise, if a certain treatment is postulated to increase the dissemination of cancer cells from a cell cluster, it is important, from a mechanistic standpoint, to properly frame the testable hypothesis. This can be accomplished by avoiding vague descriptions such as ‘dissemination’ and instead framing the descriptor in quantitative terms, such as velocity, directionality and persistence of the cellular movement ( Aaron et al., 2019 ). This is how descriptive semantics should be translated into quantitative semantics, thereby enabling the underlying biology to be measured.
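The difference between co-occurrence and correlation in colocalization studies can be illustrated with a toy example. The sketch below computes a Pearson correlation coefficient and a Manders-style co-occurrence fraction on two short lists of pixel intensities. The intensity values are made up, the function names are hypothetical, and the threshold of zero is an assumption; real images would need background handling that is omitted here.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def manders_m1(a, b, b_threshold=0):
    """Fraction of channel-A intensity found where channel B is present."""
    return sum(x for x, y in zip(a, b) if y > b_threshold) / sum(a)


# PLACEHOLDER pixel intensities: both signals are present in every pixel,
# but where one is bright the other is dim.
channel_a = [10, 20, 30, 40]
channel_b = [40, 30, 20, 10]

r = pearson(channel_a, channel_b)
m1 = manders_m1(channel_a, channel_b)
print(r, m1)
```

In this contrived case the two signals fully co-occur while being perfectly anti-correlated, so the two metrics would support opposite conclusions. The choice between them therefore has to follow from the hypothesis being tested.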

Selecting analytical metrics based on biological questions

Interestingly, and perhaps ironically, many of the analytical metrics listed in Table 1 , such as velocity, directionality, or curvature, collectively describe ‘spatial-temporal dynamics’. Yet, owing to various limitations of individual microscope design, it is impossible to capture them all in one experiment (see the section on microscope selection below). Similarly, it is often counter-productive to acquire more data than one needs, as this complicates data analysis and also compounds the problem of data storage ( Andreev and Koo, 2020 ). Added complexity can lead to the experimenter being side-tracked from the original goal and makes data interpretation more difficult. Fig. 1 C shows how the iterative evaluation of the experimental output will ensure that these readouts stay pertinent to the hypothesis and allow room for observational biology to take place. Parsimonious selection of analytical metrics will focus the scope of the experiment, generating data that can test the hypothesis. However, the well-considered selection of analytical metrics only fulfills half of the data requirement. One also needs to consider the validity of the data; in other words, how to ensure that the data set is accurate and reproducible.

Accuracy and reproducibility together describe the rigor of the experiment. While highly related, it is possible that accurate data are not reproducible, and reproducibility does not ensure accuracy ( Payne-Tobin Jost and Waters, 2019 ). Too often, the accuracy and reproducibility of microscopy data are only an afterthought, which can potentially jeopardize an entire experiment. There are two places in which rigor can be compromised: during data generation and in the experimental design. Great care should be taken to ensure unbiased sampling, appropriate use of standards and controls, uniform instrument performance and consistent data processing pipelines. In this light, preserving accuracy and reproducibility during image acquisition has been extensively covered ( Jonkman, 2020 ; McQuin et al., 2018 ; Payne-Tobin Jost and Waters, 2019 ), and is beyond the scope of our discussion. Nevertheless, this is extremely important advice and should be followed closely.

However, identifying the appropriate constraints for a rigorous experimental design can be equally challenging. How experimental controls and baselines are chosen can alter the data and the results, and therefore cannot be taken lightly as it can skew data interpretation. In stark contrast to physics, in which absolute numbers of various universal constants can be mathematically derived, biology is a comparative science. In biology, it is the change of experimental readouts in response to a modification of the experimental variables that is the important factor. As previously mentioned, modern microscopes will always generate quantifiable data because a digital image is intrinsically a data map. However, not all quantifiable digital images are meaningful. An absolute number derived from a colocalization experiment (for example, a calculated Pearson's correlation coefficient of 0.75) between two proteins is quantitative, but utterly meaningless as a stand-alone piece of data. It has to be compared to controls to become biologically informative – has the Pearson's coefficient changed in response to a variation in the experimental condition? The importance of establishing an experimental baseline for comparison cannot be overstated. Owing to our inherent tendency to look for the desired outcome, experimental bias occurs in the absence of a rigorous baseline. Validation of an experimental pipeline will ensure the measurements accurately represent the biological truth. This can be achieved by the effective use of controls and standards ( Payne-Tobin Jost and Waters, 2019 ). While this sounds cliché, we found that comparative baselines are often forgotten. By articulating the necessary controls for a given hypothesis, the underlying nature of the experiment can become more apparent. This in turn can be used to refine the hypothesis and home in on what the biologist seeks to test. Stringent controls will indeed make for better experiments.
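To illustrate why a coefficient only becomes informative against a baseline, the sketch below compares per-cell Pearson coefficients from a control group and a treated group using an exact permutation test on the difference of means. The coefficients are made-up placeholder values, the function name is hypothetical, and the permutation test is simply one reasonable choice for a small sample, not a method prescribed by the text.

```python
from itertools import combinations


def permutation_test(control, treated):
    """Exact two-sided permutation test on the difference of group means.

    Returns (observed absolute difference, p-value). Every way of
    relabeling the pooled values into two groups of the original sizes
    is enumerated, so this is only practical for small samples.
    """
    pooled = control + treated
    n_treated = len(treated)
    n_control = len(pooled) - n_treated
    total = sum(pooled)
    observed = abs(sum(treated) / n_treated - sum(control) / n_control)

    hits = count = 0
    for idx in combinations(range(len(pooled)), n_treated):
        group = sum(pooled[i] for i in idx)
        diff = abs(group / n_treated - (total - group) / n_control)
        count += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point noise
            hits += 1
    return observed, hits / count


# PLACEHOLDER per-cell Pearson coefficients.
control = [0.70, 0.72, 0.74, 0.76]
treated = [0.80, 0.82, 0.84, 0.86]

observed, p_value = permutation_test(control, treated)
print(round(observed, 3), round(p_value, 4))
```

A treated-group mean taken alone would look much like the stand-alone value discussed above; it is the shift relative to the control distribution that carries the biological information.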

When an experiment is driven by a hypothesis, the hypothesis itself will define the requirements of the experiment. These, in turn, will define the parameters that subsequently circumscribe the rest of the microscopy assay. The key parameters in any microscopy experiment will include one or more of the following: (i) lateral and axial spatial resolution, (ii) temporal resolution, (iii) tolerance to phototoxicity and photobleaching, (iv) field of view, (v) imaging depth, (vi) multiplexing capacity to acquire a combination of colors, and (vii) spectroscopic imaging capabilities. In a perfect world, a microscope would excel at all these parameters. Unfortunately, in reality, such a microscope does not exist as every microscope requires trade-offs ( Combs, 2010 ; Lemon and McDole, 2020 ; Scherf and Huisken, 2015 ; Schermelleh et al., 2010 ). Occasionally, the trade-off can come at an exorbitant price, and this is especially the case with super-resolution microscopy. To gain the extra resolution, these modalities either completely sacrifice the capacity to image live phenomena or incur unacceptable doses of illumination light that rapidly induce phototoxicity ( Schermelleh et al., 2019 ). Thus, the trade-off of an otherwise suitable microscope may render it incapable of producing the required data.

In order to avoid such situations, it is best to understand what needs to be captured by the microscope before selecting an instrument. This can be achieved by changing the ambiguous, descriptive semantics (e.g. ‘membrane 3D dynamics’) to those that are framed in the semantics of analytical metrics (e.g. ‘filopodial angular deflection’, ‘membrane surface curvature’) (see Table 1 ). By identifying the necessary metrics, the required imaging parameters can be prioritized. For example, the analytical metrics required to sufficiently measure the 3D membrane ruffles of a cell ( Fritz-Laylin et al., 2017 ) include angular deflection, surface curvature, volumetric changes and the turnover rate of these membranous structures. These metrics will mandate the following imaging parameters: (i) high volumetric imaging speed (multiple volumes per min); (ii) improved axial resolution producing near or true isotropic resolution in all three axes, so that the ruffling structures can be resolved and segmented accurately; (iii) gentle illumination to minimize phototoxicity; (iv) live-cell-compatible imaging; and (v) labeling of the cell membrane that is capable of facilitating the high number of image acquisitions. Box 1 also provides a case study of how analytical metrics influence microscope choice. Specific analytical metrics do not preclude the experimenter from observing (and even exploring) the biology; instead, they help winnow the imaging parameters down to the bare essentials. Together, quantitative metrics and experimental parameters will guide the user to the optimal microscope(s).

Microscope selection

The task of microscope selection can be bewildering to novices, and at times is confusing to even experienced microscopists. Biologists often face multiple hurdles in identifying suitable microscopes for an experiment through no fault of their own. These include (i) the lack of access to the desired instrument, (ii) ill-informed demand from reviewers to use the latest technology in the name of innovation, (iii) over-promise of instrument capabilities from the manufacturers, (iv) under-reporting of the instrument limitations, and (v) insufficient or erroneous reporting of published results that render experimental conditions irreproducible. Table 2 summarizes the features of various commonly used microscope modalities, as well as their relative advantages and shortcomings in our experience. Biologists have access to a wide range of modalities beyond standard widefield epifluorescence microscopes: total internal reflection fluorescence microscopy ( Mattheyses et al., 2010 ), lightsheet microscopy ( Chatterjee et al., 2018 ; Chen et al., 2014 ; Power and Huisken, 2017 ), confocal microscopy ( Claxton et al., 2011 ; Conchello and Lichtman, 2005 ; Jonkman et al., 2020 ; Oreopoulos et al., 2014 ), two-photon excitation fluorescence microscopy ( Benninger and Piston, 2013 ; So et al., 2000 ) and image scanning microscopy ( Gregor and Enderlein, 2019 ), as well as super-resolution techniques ( Demmerle et al., 2017 ; Sahl et al., 2017 ; Schermelleh et al., 2019 ; Sydor et al., 2015 ; Vicidomini et al., 2018 ). What should be immediately obvious from their comparison is that there is no ‘winner’ or ‘loser’ ( Table 2 ). No microscope scores equally well or poorly across the various parameters, reinforcing the notion that every microscope compromises a combination of parameters in order to excel at others. As a result, the process of microscope selection is rarely linear. 
Many instruments have overlapping capabilities that obscure the selection process and will require that more than one instrument be considered at a time. By defining the required parameters beforehand, they can be used to filter the selection down to the most appropriate instrument(s), as exemplified in the case study presented in Box 1 . The process of microscope selection is aided by a good understanding of the necessary imaging parameters. Ultimately, the justification for an instrument lies solely on the ability of that microscope to provide the necessary analytical metrics and the data informative of the biology.
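The filtering step can be sketched as a simple lookup: list the required parameters first, then keep only the instruments that satisfy all of them. The requirement names below paraphrase the Box 1 case study. The capability table is a set of placeholder flags for illustration only, not measured specifications, and 'modality_C' is a made-up instrument.

```python
# PLACEHOLDER capability flags, for illustration only. The entries for
# LLSM and 3D-SIM follow the Box 1 narrative (both met the imaging
# benchmarks; LLSM was chosen for its gentler illumination).
# "modality_C" is a hypothetical instrument.
CANDIDATES = {
    "LLSM": {"fast_volumetric": True, "near_isotropic": True,
             "two_color": True, "gentle_illumination": True},
    "3D-SIM": {"fast_volumetric": True, "near_isotropic": True,
               "two_color": True, "gentle_illumination": False},
    "modality_C": {"fast_volumetric": False, "near_isotropic": False,
                   "two_color": True, "gentle_illumination": True},
}


def shortlist(candidates, required):
    """Keep instruments whose flags satisfy every required parameter."""
    return [name for name, caps in candidates.items()
            if all(caps.get(req, False) for req in required)]


imaging_only = shortlist(
    CANDIDATES, ["fast_volumetric", "near_isotropic", "two_color"])
with_biology = shortlist(
    CANDIDATES,
    ["fast_volumetric", "near_isotropic", "two_color", "gentle_illumination"])
print(imaging_only, with_biology)
```

The first call can return more than one instrument, reflecting overlapping capabilities; adding the biological requirement narrows the list. Real comparisons are rarely this binary, so a shortlist of this kind is a starting point for discussion with facility staff rather than a final answer.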

Performance comparison of various microscope modalities

It is impractical to expect biologists to understand the myriad of technical nuances of these rapidly evolving technologies. Likewise, most advanced imaging systems are usually concentrated in shared microscopy facilities, managed by experienced microscopists. This makes it all the more important for biologists to communicate, precisely and concisely, the desired analytical metrics and the corresponding parameters required for a successful experiment. It is sometimes difficult to appreciate that the latest imaging technology is not always the most appropriate. A super-resolution microscope or advanced lightsheet microscope may not necessarily be more suitable than a widefield epifluorescence microscope for a particular experiment. A microscope can only enhance certain parameters, and it is only beneficial if the enhanced parameters are utilized wisely. Even though structured illumination microscopy (SIM) offers improved resolution (see Table 2 ), it does not enhance the data of cell tracking studies over what can be achieved with a standard epifluorescence widefield microscope. It is also important to note that sometimes no single existing imaging technology may be able to produce the required data, necessitating the use of multiple instruments, or even the modification of the testable hypothesis. However, the availability of a new technology can open up the possibility of previously unfeasible analytical metrics that make it possible to address different biological queries.

The microscopy literature has no shortage of excellent reviews on the technical aspects of various imaging modalities, as well as tutorials on how to generate quantitative and reproducible data. However, topical discussion of best practices and optics does not necessarily engender a coherent framework of how these sets of information can be integrated to facilitate a hypothesis-driven, quantitative experimental design. Here, we not only present a roadmap of how to use these guides in the literature, but also break with convention and argue that microscopy-based quantitative experiments should be designed in reverse, starting with determining the informative results needed to challenge a hypothesis.

Despite the promises of the latest technologies, no microscope is perfect. Usually, a feature gained in a technique comes at the cost of other key parameters. The essence of experimental design is never about the inclusion of every parameter the experimenter wants; rather it is about the careful exclusion of unnecessary parameters. This will allow accurate measurements to be performed and will ensure that the parameters relevant to the information the experimenter needs are maintained. This is the core concept of our approach. The essential parameters must be determined by what is required to test a hypothesis. These parameters will, in turn, naturally shape the rest of the elements of an experimental pipeline ( Fig. 1 A). A hypothesis-driven experimental design must be just that – driven by the hypothesis. It should be based on the biological question at hand, and not by the lure of the latest technologies. Fortunately, this process is an iterative feedback loop. The key questions left unanswered due to lack of technology inspire the development of novel microscopes. New technologies then reciprocally inform biology so that new hypotheses can be formulated. This cyclical process, however, does not negate the fact that experiments should be framed within the confines of existing technologies.

This Opinion article does not, by any means, diminish the exploratory power of microscopes and the well-honed acumen of biologists to observe and deduce. On the contrary, most hypotheses are synthesized following keen observation. The scope of this discussion is to focus on the process of quantitatively verifying a hypothesis. We have not addressed here how the power of modern microscopy has been harnessed in big-data scientific exploration. Such experiments are usually hypothesis-free; instead machine-learning algorithms are employed to search for patterns beyond what human perception can efficiently discern ( Chessel and Carazo Salas, 2019 ; Piccinini et al., 2017 ).

Quantitative microscopy experiments are not easy to design, as they require knowledge at the confluence of optics, imaging probes, data analysis and how the biological samples interact with the microscope. It is therefore of paramount importance for biologists to seek and heed the advice of expert microscopists and data scientists, especially those in core facilities, who are experienced in the application of microscopy. The conventional practice of generating a lot of data first, with data analysis as a secondary consideration, should be avoided. Microscopy-related experiments demand careful planning and continued, iterative evaluation before the optimal approaches can be implemented. This message is echoed in every review and guide cited here because it is important and, unfortunately, commonly overlooked. The perils of ignoring it cannot be overstated.

We thank Dr Christopher Obara as well as the members of the Advanced Imaging Center for their thoughtful discussion and insightful contributions.

The Advanced Imaging Center at Janelia Research Campus is generously supported by the Howard Hughes Medical Institute and the Gordon and Betty Moore Foundation.

3.4 Sampling Techniques in Quantitative Research

Target population.

The target population includes the people the researcher is interested in studying and to whom the findings will be generalised. 40 For example, if researchers are interested in vaccine-preventable diseases in children five years and younger in Australia, the target population will be all children aged 0–5 years residing in Australia. The actual population is a subset of the target population from which the sample is drawn, e.g. children aged 0–5 years living in the capital cities in Australia. The sample is the people chosen for the study from the actual population (Figure 3.9). The sampling process involves choosing people, and it is distinct from the sample. 40 In quantitative research, the sample must accurately reflect the target population, be free from bias in terms of selection, and be large enough to validate or reject the study hypothesis with statistical confidence and minimise random error. 2

Sampling techniques

Sampling in quantitative research is a critical component that involves selecting a representative subset of individuals or cases from a larger population and often employs sampling techniques based on probability theory. 41 The goal of sampling is to obtain a sample that is large enough and representative of the target population. Examples of probability sampling techniques include simple random sampling, stratified random sampling, systematic random sampling and cluster sampling ( shown below ). 2 The key feature of probability techniques is that they involve randomization. Probability sampling has two main characteristics: all individuals of a population are (theoretically) accessible to the researcher, and each person in the population has an equal chance of being chosen to be part of the study sample. 41 While quantitative research often uses sampling techniques based on probability theory, some non-probability techniques may occasionally be utilised in healthcare research. 42 Non-probability sampling methods are commonly used in qualitative research. These include purposive, convenience, theoretical and snowball sampling, and are discussed in detail in chapter 4.

Sample size calculation

In order to enable comparisons with some level of established statistical confidence, quantitative research needs an acceptable sample size. 2 The sample size is the most crucial factor for reliability (reproducibility) in quantitative research. It is important for a study to be adequately powered; power is the likelihood of identifying a difference if one exists in reality. 2 Small studies are more likely to be underpowered, and their results are more prone to random error. 2 The formula for sample size calculation varies with the study design and the research hypothesis. 2 There are numerous formulae for sample size calculations, but such details are beyond the scope of this book. For further reading, please consult the biostatistics textbook by Hirsch RP, 2021. 43 However, we will introduce a simple formula for calculating sample size for cross-sectional studies with prevalence as the outcome. 2

$$ n = \frac{z^{2} \, p \, (1 - p)}{d^{2}} $$

where n is the required sample size and:

z = the z-score for the chosen level of statistical confidence; z = 1.96 corresponds to 95% confidence and z = 1.645 corresponds to 90% confidence

p = Expected prevalence (of health condition of interest)

d = Intended precision (margin of error); d = 0.1 means that the estimate falls within +/-10 percentage points of the true prevalence at the chosen confidence level (e.g. for a prevalence of 40% (0.4), if d = 0.1, then the estimate will fall between 30% and 50% (0.3 to 0.5)).
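The z values used for a given confidence level come from the standard normal distribution. The sketch below shows one way to recover them with Python's standard library; the helper name `z_for_confidence` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z-score for a confidence level such as 0.95."""
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    # A two-sided interval leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
```

For 90%, 95% and 99% confidence this gives values of roughly 1.645, 1.960 and 2.576.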

Example: A district medical officer seeks to estimate the proportion of children in the district receiving appropriate childhood vaccinations. Assuming a simple random sample of a community is to be selected, how many children must be studied if the resulting estimate is to fall within 10% of the true proportion with 95% confidence? It is expected that approximately 50% of the children receive vaccinations.


z = 1.96 (95% confidence)

d = 10% = 10/100 = 0.1 (estimate to fall within 10%)

p = 50% = 50/100 = 0.5

Now we can enter the values into the formula

$$n = \frac{z^{2}\,p\,(1 - p)}{d^{2}} = \frac{1.96^{2} \times 0.5 \times (1 - 0.5)}{0.1^{2}} = \frac{0.9604}{0.01} = 96.04$$

Because people cannot be counted in fractions, the result is rounded up to the next whole number; with the values above, the required sample is 97 children.
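The worked example can be checked with a short Python sketch. It assumes the standard prevalence formula n = z² p (1 − p) / d² for a simple random sample; the function names `sample_size_prevalence` and `margin_of_error` are hypothetical and introduced only for this illustration.

```python
import math

def sample_size_prevalence(p, d, z=1.96):
    """Sample size for estimating a prevalence p to within +/- d."""
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    # Round up to a whole person; the tiny tolerance stops floating-point
    # noise from pushing an exact whole number up by one.
    return math.ceil(n - 1e-9)

def margin_of_error(p, n, z=1.96):
    """Precision actually achieved with n participants (inverse of the formula)."""
    return z * math.sqrt(p * (1 - p) / n)

# Vaccination example: p = 0.5, d = 0.1, 95% confidence (z = 1.96).
n = sample_size_prevalence(p=0.5, d=0.1)
print(n)                                   # 97
print(round(margin_of_error(0.5, n), 4))
print(round(margin_of_error(0.5, n - 1), 4))
```

Comparing the last two printed values shows why the result is rounded up rather than to the nearest whole number: with 96 children the margin of error is still marginally wider than the intended 0.1, whereas 97 children bring it just inside.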

An Introduction to Research Methods for Undergraduate Health Profession Students Copyright © 2023 by Faith Alele and Bunmi Malau-Aduli is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.

Scientists have been debating for centuries the nature of proper scientific methods. Currently, criticisms being thrown at data-intensive science are reinvigorating these debates. However, many of these criticisms represent long-standing conflicts over the role of hypothesis testing in science and not just a dispute about the amount of data used. Here, we show that an iterative account of scientific methods developed by historians and philosophers of science can help make sense of data-intensive scientific practices and suggest more effective ways to evaluate this research. We use case studies of Darwin's research on evolution by natural selection and modern-day research on macrosystems ecology to illustrate this account of scientific methods and the innovative approaches to scientific evaluation that it encourages. We point out recent changes in the spheres of science funding, publishing, and education that reflect this richer account of scientific practice, and we propose additional reforms.

Scientists have been debating for centuries the nature of proper scientific methods, especially the role of hypothesis testing in scientific practice (Laudan 1981 ). These debates are being reinvigorated as many fields of science, including high-energy physics, astronomy, public health, climate science, environmental science, and genomics, are increasingly using data-intensive approaches (Bell et al. 2009 , Baraniuk 2011 , Winsberg 2010 , King 2011 , Porter et al. 2012, Mattman 2013 , Khoury and Ioannidis 2014 , Katzav and Parker 2015 ). Data-intensive science has been described as research in which the capture, curation, and analysis of (usually) large volumes of data are central to the scientific question; it has also been defined as research that uses data sets so large or complex that they are hard to process and analyze using traditional approaches and methods (Hey et al. 2009 , Critchlow and van Dam 2013).

Although the term data intensive is relatively new, historians of science point out that scientists have been capturing, curating, and analyzing large volumes of data for centuries in ways that have challenged existing techniques (Muller-Wille and Charmantier 2012 ). For example, the disciplines of natural history and taxonomy provide important historical examples of data-intensive research; as Strasser (2012) put it, “Renaissance naturalists were no less inundated with new information than our contemporaries” (p. 85). However, contemporary data-intensive science is also characterized by new computational methods and technologies for creating, storing, processing, and analyzing data and also by the use of interdisciplinary teams for designing and implementing research to address complex societal challenges (Strasser 2012, Leonelli 2014 ). Consequently, in some areas of science (e.g., astronomy), there can be particularly sharp distinctions between historical and current data-intensive approaches, whereas in other areas of science (e.g., natural history), there are fewer differences (Evans and Rzhetsky 2010 , Haufe et al. 2010 , Pietsch 2016 ).

Contemporary examples of data-intensive science include collecting evidence for the existence of the Higgs boson, sequencing the human genome, developing computer models of climate change and carbon sequestration, and identifying relationships between social networks and human behaviors. Despite these high-profile examples and the increasing availability of large data sets for many science disciplines, there are concerns that contemporary data-intensive research is bad for science or that it will lead to poor methodology and unsubstantiated inferences. For example, data-intensive research has been criticized for being atheoretical, being nothing more than a “fishing expedition,” having a high probability of leading to nonsense results or spurious correlations, being reliant on scientists who do not have adequate expertise in data analysis, and yielding data biased by the mode of collection (Boyd and Crawford 2012 , Fan et al. 2014, Lazer et al. 2014 ).

Such concerns actually reflect deeper and more widespread debates about the centrality of hypothesis-driven research that have challenged the scientific community for centuries. Most contemporary scientific disciplines share a commitment to a hypothesis-driven methodology (see Peters R 1991, Weinberg 2010, Keating and Cambrosio 2012, Fudge 2014). Definitions for hypotheses vary across disciplines (ranging from specific to general and quantitative to qualitative; Donovan et al. 2015), but we define hypothesis-driven methodology in terms of the linear process canonized in many textbooks and represented in figure 1.

Figure 1. Linear account employed in many descriptions of the scientific method.

Although this linear scientific process continues to be held up as an exemplar in many textbooks and grant proposal guidelines (Harwood 2004 , O'Malley et al. 2009 , Haufe 2013 ), recent commentaries from scientists and historians and philosophers of science have argued that historical and contemporary scientific practices incorporate a much more complex, iterative mixture of different methods (e.g., Kell and Oliver 2004 , Glass and Hall 2008 , Gannon 2009 , O'Malley et al. 2010 , Forber 2011 , Elliott 2012 , Glass 2014 , Peters DPC et al. 2014, Pietsch 2016 ). These scholars argue that focusing primarily on a linear, hypothesis-driven account of science impoverishes the scientific enterprise by encouraging scientists to focus on narrowly defined questions that can be posed as testable hypotheses. For example, hypothesis-driven approaches are particularly helpful for choosing between alternative mechanisms that could explain an observed phenomenon (e.g., through a controlled experiment), but they are much less helpful for mapping out new areas of inquiry (e.g., the sequence of the human genome), identifying important relationships among many different variables, or studying complex systems. According to those who accept an iterative account of scientific methods, attempting to draw a sharp distinction between hypothesis-driven and data-intensive science is misleading; these modes of research are not in fact orthogonal and often intertwine in actual scientific practice (e.g., O'Malley et al. 2009 , Elliott 2012 , Peters DPC et al. 2014).

Unfortunately, the historical and philosophical literature on iterative scientific methods has not been well integrated into recent accounts of data-intensive research, nor have the implications for evaluating research quality been fully explored. We address both of these gaps by showing how data-intensive research can be conceptualized more effectively using iterative accounts of scientific methods and by showing how these accounts encourage innovative approaches for evaluation. We argue that the key to assessing the appropriateness of data-intensive research—and, indeed, any scientific practice—is to evaluate how it is situated within broader research practices. Scientific practices should be evaluated on the basis of the significance of the knowledge gap that they address and the alignment between the nature of the gap and the approach or combination of approaches used to address it. In order to better reflect scientific practices and to accommodate all scientific approaches, including data-intensive ones, we point out recent changes and propose additional reforms in the spheres of funding, publishing, and education.

Debates over scientific methods

Contemporary debates over data-intensive methods are merely the latest episode in a long-standing conflict over the proper roles of hypotheses in scientific research. In the seventeenth century, figures such as Robert Boyle and Robert Hooke espoused the use of hypotheses, whereas Francis Bacon and Isaac Newton argued that investigators could easily be led astray if they proposed bold conjectures rather than working inductively from the available evidence (Laudan 1981, Glass 2014). These examples illustrate the long history during which hypothesis-driven science has waxed and waned in popularity (figure 2; Laudan 1981). Most scientists did not favor the use of hypotheses during the eighteenth century, but this perspective changed dramatically over the next 100 years (Laudan 1981). By the late nineteenth century, largely descriptive disciplines such as natural history were beginning to be dismissed as a form of “stamp collecting” (Johnson 2007). Popper's (1963) emphasis on the hypothetico-deductive (H-D) method proved hugely influential during the twentieth century, and most textbooks continue to focus on hypothesis testing as the core of the scientific method (see figure 1; Harwood 2004). Although some scientists, publishers, and funders have remained loyal to a Popper-informed account of the scientific method that privileges hypothesis-driven research, many today are questioning this focus and mirroring the methodological debates embodied in previous time periods (Hilborn and Mangel 1997, Kell and Oliver 2004, Glass and Hall 2008, Peters DPC et al. 2014).

Figure 2. A depiction of the waxing and waning of hypothesis-driven approaches.

In particular, despite the huge potential for new data-intensive methodologies to generate knowledge (King 2011), the advent of these techniques has raised questions about the appropriate relationships between hypothesis-driven and observationally driven modes of investigation (Kell and Oliver 2004, Beard and Kushmerick 2009). Again, historians of science have shown that this debate is not a new one and that scientists have struggled for centuries with storing, analyzing, and standardizing large quantities of data (Muller-Wille and Charmantier 2012). Nevertheless, contemporary data-intensive science raises additional issues because of its extensive use of statistical and computer science methodologies and interdisciplinary teams (Strasser 2012), thereby adding further dimensions to debates about appropriate scientific methods.

A richer account of scientific practice

Many concerns about data-intensive research can be addressed by defining scientific practice more broadly (figure 3), as has been argued in recent historical and philosophical studies of scientific methods. Taking this view, the fundamental goal of science is to address gaps or challenges facing our current state of knowledge. Hypothesis testing is one approach for filling these knowledge gaps, but science proceeds in other ways as well (Chang 2004, Franklin 2005, O'Malley et al. 2009, Elliott 2012, O'Malley and Soyer 2012). Scientists attempt to answer research questions with observations, field studies, or integrated databases (Leonelli 2014); they engage in exploratory inquiry or modeling exercises to detect patterns in available data (Steinle 1997, Burian 2007, Elliott 2007, Winsberg 2010, Katzav and Parker 2015); or they create new tools, techniques, and methods (Baird 2004, O'Malley et al. 2010). All of these in turn enable them to test hypotheses, answer questions, or gather additional data more effectively.

Figure 3. A representation of scientific practice as an iterative process, with many approaches and links (as depicted by two-way arrows). The evaluation or assessment of scientific practices is based on the importance of the knowledge generated, the importance of the gap or challenge addressed, and the alignment of the approaches and methods used to conduct the science.

This multiplicity of different research approaches is not new, but it has become even more prominent in contemporary data-intensive research. Historically, it was often most efficient for scientists to work from hypotheses that guided their inquiry in the most promising directions. But with the advent of high-throughput technologies and data-mining techniques that make data less expensive to generate and analyze, other approaches that are more inductive also play a fruitful role in scientific research (Franklin 2005, Servick 2015 ). Broad hypotheses or background assumptions may still provide guidance about what sorts of questions or exploratory inquiries are likely to be most fruitful, but these are not the sorts of specific hypotheses envisioned by most hypothesis-driven accounts of scientific method (Franklin 2005, Leonelli 2012 , Ratti 2015 ). Because it is difficult (often impossible) for an individual scientist to become an expert in all of these contemporary approaches and methods, good science also incorporates the most appropriate disciplines and collaborators, thus making the development of effective—and often interdisciplinary—scientific teams more essential than in the past, and the resulting research reflects a combination of methods originating from multiple disciplines (Cheruvelil et al. 2014, NRC 2015).

An important feature of the scientific methods illustrated in figure 3 is that they are often employed in an iterative fashion in order to address complex research challenges (Chang 2004, O'Malley et al. 2010, Elliott 2012, Leonelli 2012). Although some contemporary data-intensive research focuses primarily on the repeated use of inductive methods and machine-learning algorithms (Evans and Rzhetsky 2010, Lazer et al. 2014, Pietsch 2016), much of it involves a combination of different approaches. O'Malley and colleagues (2010) argued that not only data-intensive research but also scientific practice as a whole should be characterized as an iterative interplay between at least four different modes of research: hypothesis-driven, question-driven, exploratory, and tool- and method-oriented. As inquiry proceeds, initial questions are specified, whereas others are revised or give rise to new lines of research. In an effort to address these questions, new equipment and techniques are often developed and tested, frequently generating new questions and altering old ones. In the course of investigating questions and developing new techniques, exploratory approaches are often central (O'Malley et al. 2010). These exploratory efforts, which can include experimentation, data mining, and simulation modeling, often involve the systematic variation of experimental parameters or analysis of datasets in search of important regularities and patterns (Elliott 2007, Winsberg 2010). In many cases, this web of activities generates the sorts of tightly constrained contexts in which specific hypotheses can be fruitfully tested, but this may be just one component of a much broader scientific context. In fact, the methodological iteration between different approaches results in a process of epistemic iteration by which our knowledge is gradually altered and improved (Elliott 2012), as is depicted by the two-way arrows in figure 3 that highlight the links among knowledge, motivation, and the multiple approaches employed by scientists.

One of the primary lessons to be learned from the iterative model of scientific methods is that contemporary research, and especially data-intensive research, incorporates a wide variety of different approaches, which gain their significance primarily from their roles in broader research programs and lines of inquiry. Therefore, evaluating the quality of this work requires much more than looking to confirm that it incorporates a well-formulated hypothesis (Kell and Oliver 2004, Beard and Kushmerick 2009). Instead, it should be evaluated on the basis of the alignment between the nature of the knowledge gap or challenge addressed and the combination of approaches or methods used to address the gap. Research should be evaluated favorably if it incorporates approaches and methods that are well-suited for addressing an important gap in current knowledge, even if they do not focus solely or primarily on hypothesis testing (figure 3).

An iterative model of scientific practice alleviates many common concerns about data-intensive research. The potential for generating spurious correlations becomes less serious when data-generated patterns are identified and evaluated as part of larger research projects that incorporate broader research questions, hypotheses, or objectives and when appropriate techniques and inferences are used to deal with spurious correlations (Hand 1998 ). These projects are also frequently embedded within conceptual frameworks or theories that facilitate the investigation of underlying causal mechanisms. Some proponents of data-intensive science argue that it can largely replace hypothesis testing, focusing on generating correlations rather than seeking causal understanding (Prensky 2009 , Steadman 2013 ). In contrast, we contend that data-intensive science will typically be most fruitful when it is part of broader inquiries that guide the collection and interpretation of data and that provide additional investigations of the correlations that are generated (Leonelli 2012 , Kitchin 2014 ). Finally, the worry that individual researchers do not have the skill sets to perform data-intensive work can be alleviated by the development of interdisciplinary research teams that can accomplish the iterative tasks required for many contemporary scientific research projects. Admittedly, data-intensive methods can still be used inappropriately, such as when data are collected without standard approaches or quality metadata or when data are simply mined for correlative relationships without attention to spurious correlations (Hand 1998 ). However, we argue that this is a matter of improper technique or a poorly designed research program, which can occur in any form of scientific practice; it is not a problem inherent in data-intensive methods themselves.
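The concern about spurious correlations can be made concrete with a small simulation. The sketch below is an illustration only and is not drawn from any of the studies cited here: it generates variables that are pure, independent noise and counts how many pairs would nonetheless be flagged as correlated. The number of variables, the number of observations, the seed, and the threshold of 0.444 (approximately the two-sided 5% critical value of Pearson's r for 20 observations) are assumptions chosen for the example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_chance_correlations(n_variables=50, n_observations=20,
                              threshold=0.444, seed=1):
    """Count pairs of unrelated noise variables whose |r| exceeds threshold."""
    rng = random.Random(seed)
    # Every variable is independent random noise, so any correlation
    # found between two of them is spurious by construction.
    data = [[rng.gauss(0, 1) for _ in range(n_observations)]
            for _ in range(n_variables)]
    n_pairs = 0
    n_flagged = 0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            n_pairs += 1
            if abs(pearson_r(data[i], data[j])) > threshold:
                n_flagged += 1
    return n_pairs, n_flagged

n_pairs, n_flagged = count_chance_correlations()
print(f"{n_flagged} of {n_pairs} pairs exceed |r| > 0.444 by chance alone")
```

With 50 variables there are 1,225 pairs to compare, so roughly 5% of them would be expected to cross a 5% threshold even though no real relationship exists. This is why mined patterns need to be evaluated within a broader research program rather than taken at face value.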

Examples of iterative data-intensive research practices

The interplay between multiple research approaches can be observed across many scientific subdisciplines and time periods. To illustrate, we present two examples drawn from the natural sciences. The first example highlights the historical nature of these debates concerning scientific methods (the study of evolution by natural selection; figure 4a). It shows that even though contemporary data-intensive approaches have unique characteristics, historical research also incorporated iterative and data-intensive components. The second example highlights how methods from contemporary data-intensive ecology are being used to better understand broad-scale ecological research questions and environmental problems (the study of macrosystems ecology; figure 4b). It also illustrates how contemporary data-intensive research incorporates greater use of computational approaches and interdisciplinary teams than did historical data-intensive research.

Figure 4. Two examples of iterative scientific efforts using multiple approaches.

The historical study of evolution by natural selection

Darwin's development of the theory of natural selection provides a classic example of research that incorporates multiple approaches. Despite the efforts of some commentators to reconstruct Darwin's research as primarily hypothesis-driven (Ayala 2009), he spent more than two decades performing exploratory work in an effort to identify the patterns that he later explained in The Origin of Species . Driven by curiosity and a naturalist's love for nature, as well as a structured observational agenda that he learned from scholars like Humboldt, Cuvier, and Lyell, Darwin's observations during his famous voyage aboard the Beagle generated questions that guided his inductive data collection over subsequent decades. During that time, he drew upon a wide range of methods and sources (Hodge 1983), including data produced by fellow members of the traditional scientific elite and countless women and other so-called amateurs practicing science outside of the scientific societies and journals of the nineteenth century. In the Origin , for instance, Darwin cites animal breeders as an important source of data, and in Expression of Emotions , mothers provided observations of their own children to supplement those made by Darwin of his own family (Harvey 2009, Montgomery 2012 ).

Darwin's use of natural history methods led Frank Gannon to write a tongue-in-cheek editorial pointing out that in today's funding structure Darwin's work would be dismissed as “an open-ended ‘fishing expedition’” (Gannon 2009 ). However, Darwin also engaged in experiments that showed how his theory of evolution could explain the details of sexual form in plant species (Bellon 2013 ). His combination of methods and compilation of data from a variety of sources proved to be extremely fruitful, and works such as Origin (1859), The Variation of Animals and Plants under Domestication (1868), The Descent of Man (1871), and Expression of Emotions (1872) all embody a blend of what are now often held up as distinct approaches: inductive and deductive methods, observation and experiment.

Even in Darwin's own time, he was forced to consciously navigate scientific norms when considering how to present his multi-modal research. For example, following nineteenth-century philosophers of science such as William Whewell and John F. W. Herschel, Darwin organized the Origin to conform to the scientific values of the day—namely, demonstrating the strength of a theory by the breadth of facts it explained (Ruse 1975 ). Arguing from analogy, as Whewell recommended, Darwin began by recognizing an uncontested phenomenon—that artificial selection quickly resulted in drastic structural changes in domestic breeding of animals such as pigeons—and used this accepted truth to compel the reader to accept his inference that natural selection accounted for species changes.

Darwin's use of both inductive and deductive methods also followed Whewell's methodological recommendations. In contrast with more recent accounts of hypothesis-driven science, Whewell insisted that scientists should move through a very gradual inductive process to arrive at successively more general causal laws (Snyder 1999 ). Only after performing this inductive process did he think that scientists could legitimately move on to test these hypotheses. Thus, Whewell himself encouraged the use of a combination of research modes, and this is reflected in Darwin's works. Philosophers of science have since debated the extent to which Darwin was influenced by different methodologists (including Francis Bacon and John Stuart Mill, as well as Whewell and Herschel) and precisely when Darwin switched from an inductive to a deductive approach during the 20-plus years of gestation of the Origin (Ruse 1975 , Hodge 1991 ). Regardless of the exact year when this switch occurred, it is clear that scientists today—like Darwin—often move back and forth between the best aspects of both inductive and deductive logic when formulating and testing a theory. Similarly—and again like Darwin—scientists also often blend laboratory- and field-work, observation and experiment, and data from multiple sources rather than conforming to artificially distinct modes of scientific practice that are sometimes held up as “traditional” to a particular field of science, despite the long history of a multimodal reality.

The contemporary study of macrosystems ecology

A contemporary example of data-intensive research that involves multiple and iterative approaches comes from the emerging subdiscipline of macrosystems ecology (Heffernan et al. 2014). Most traditional ecological research is conducted by studying organisms and their environments at relatively small scales—such as individual species, communities, or ecosystems—using methods such as lab or field experiments, modeling, field surveys, or long-term studies (Carmel et al. 2013). However, environmental changes such as the spread of invasive species, climate change, and land-use intensification are occurring globally, are the result of relationships and interactions between human and natural systems, and may result in widespread but complicated effects. For example, across regions and continents (at the scales of hundreds of kilometers), there are differences in the direction and magnitude of environmental changes, the underlying geophysical and ecological contexts, and social structures. These differences mean that results from fine-scaled studies in some regions are not likely to apply to other regions and that the study of ecological systems at larger scales—such as regions to continents—is required. Macrosystems ecology fills this gap by explicitly studying fine-scaled ecological patterns and processes nested within regions and continents and employing a variety of methods to do so.

Such multiscaled understanding of ecological systems cannot be achieved through an individual hypothesis test or a field experiment, nor can it be achieved by using only one approach (Heffernan et al. 2014, Levy et al. 2014 ). For example, to understand the complex relationships among tree growth, human disturbance, and regional and global climate, scientists need to study forests as a whole using multiple methods within a region rather than at the scale of individual trees or stands (Chapin et al. 2008). One approach that ecologists have used to study ecological systems at regional scales is by quantitatively delineating ecological regions that represent a measured combination of geophysical features thought to influence fine-scaled ecological processes (Cheruvelil et al. 2013). However, existing ecological regions have limitations in that they were created for a variety of purposes, using different underlying geophysical and human data and using a variety of methods.

For example, lake water quality is related to both climate and land use. Therefore, scientists have speculated that lake water quality is likely to strongly respond to changes in both climate and land uses. However, the response of lake water quality to such environmental changes is likely to vary among regions and continents. In fact, Cheruvelil and colleagues (2008, 2013) had observed that lake water chemistry varied regionally but that the variation depended on how the boundaries of “regions” were defined. Therefore, they had the overarching goals of developing new ways to define regional boundaries that were based on the geophysical features that are likely important for predicting regional water quality and its response to climate and land-use change (figure 4b). Meeting these goals required the iterative use of multiple research methodologies, data collected by various individuals and groups, and contributions from multiple disciplines.

An interdisciplinary team was created (sensu Cheruvelil et al. 2014) that included ecologists, computer scientists, and experts in geospatial analysis and ecoinformatics to create a large, multiscaled database by integrating multiple lake data sources (including field surveys of water quality conducted by state agency scientists, citizen scientists, and university researchers) with geospatial data quantified at the national scale (Soranno et al. 2015). The team used three data-intensive approaches to meet their goal of developing new ecological regions for water quality (figure 4b): First, they developed and tested a clustering algorithm to define regional boundaries (Yuan et al. 2015); second, they used an exploratory data-mining analysis to determine which geophysical features were correlated with the regional boundaries and might lend insight into the underlying mechanisms driving regional variation in lake water quality (Cheruvelil, Lyman Briggs College and Department of Fisheries and Wildlife, Michigan State University, East Lansing, personal communication, 9 November 2015); and third, they used statistical models to quantify how well the regional boundaries captured variation in lake water quality for thousands of lakes in approximately 100 regions (Cheruvelil, Lyman Briggs College and Department of Fisheries and Wildlife, Michigan State University, East Lansing, personal communication, 9 November 2015). Ecological regions were created with a variety of geophysical features that are related to lake water quality, many of which are expected to be strongly affected by changes in climate and land use. Employing multiple scientific practices, rather than solely a hypothesis-driven approach, improved their ability to use the regional scale for understanding, explaining, and predicting ecological phenomena across spatial scales.

Lessons learned from examples of iterative data-intensive research

Together, these two examples illustrate the major points that we have made in this article. First, they show that although scientists have been working with challenging quantities of data for centuries, contemporary data-intensive science incorporates additional features. For example, whereas Darwin received data from numerous sources, he worked primarily on his own (with input from colleagues) to analyze the data. In contrast, the environmental scientists in the second example worked with computer scientists and experts in ecoinformatics in order to make optimal use of contemporary computational tools for integrating, creating, and analyzing data.

Second, these examples illustrate the power of moving iteratively among multiple research methods. What made both of these research efforts successful is not the fact that they used a particular approach but rather that the approaches they chose were well designed for addressing important knowledge gaps. In Darwin's case, his research was important because he was addressing one of the most fundamental issues in biology—namely, the processes by which species have changed over time. Similarly, the scientists in our second example were addressing the important societal issue of the response of water quality to environmental changes at macroscales. Encouraging scientists to emulate the iterative approaches embodied in these two examples requires the development of richer conceptions of scientific practice.

Recommendations for promoting good science in our data-rich world

A number of reforms should be made to promote not only iterative data-intensive science but also the scientific enterprise more broadly (table 1). First, funding agencies (and reviewers) should evaluate the quality of proposed research not based on a uniform requirement that it state a specific hypothesis but based on the importance of the knowledge gaps that it identifies and the appropriateness of the methods proposed for addressing those gaps (O'Malley et al. 2009). For example, some recent funding initiatives are placing emphasis on grand challenges (e.g., the human genome project, brain research, personalized medicine, smart cities) that do not lend themselves to solely hypothesis-based approaches. Therefore, rather than expecting researchers to shoehorn proposals into a misleading, linear research format, reviewers should be open to proposals that describe a more realistic, iterative research trajectory. This reform will require developing appropriate grant guidelines and review mechanisms that encourage mixed modes of scientific practice, such as those recently being used by the US National Institutes of Health to fund investigators rather than individual projects (table 1).

Table 1. Recommendations for promoting iterative data-intensive science.

| Components of science | Current norms | Proposed reforms | Recent exemplar of reform |
| --- | --- | --- | --- |
| Funding | Proposals are expected to have an organizing hypothesis. | Proposals should be expected to have alignment between knowledge gaps and approaches. | Several institutes of the NIH have introduced long-term funding opportunities that allow investigators to pursue more creative, innovative research projects (e.g., and ) |
| | Proposals are expected to describe a linear, non-iterative approach. | Proposals should be expected to describe appropriate iterative use of multiple approaches. | The Biotechnology and Biological Sciences Research Council of the UK describes multiple methods that are integrated into the systems-biology research it funds ( ). |
| Publishing | Articles are expected to be structured to embody a hypothesis-testing approach. | Articles should be structured to convey the alignment between the identified knowledge gaps and the approaches used. | A new journal, Limnology and Oceanography Letters, requires an explicit statement by the authors of the knowledge gaps filled by the study ( ). |
| | The components of iterative research are difficult to publish on their own (e.g., exploratory analysis, data, methods, code). | Articles focused on any aspect of iterative research should be publishable based on contribution to knowledge, data, or methods development. | Recent advent of outlets for a broad range of research products, such as data journals (e.g., Earth System Science Data, Scientific Data, GigaScience, Biodiversity Data Journal), online code repositories (e.g., GitHub, BitBucket), and online data repositories (e.g., FigShare, Dryad, TreeBASE) |
| Education (K–12, undergraduate, and graduate) | Students are taught mainly about hypothesis testing. | Students should be taught multiple scientific methods and to choose approaches that best align with knowledge gaps. | Reformed teaching approaches, such as authentic science labs (e.g., Luckie et al. , Harwood ) and teaching with case studies (e.g., , , White et al. ). |
| | Students are taught linear, non-iterative scientific methods. | Students should be taught an iterative account of scientific methods. | Dissemination of nonlinear accounts of scientific methods (e.g., ) |

Second, rather than expecting articles to be structured to embody a linear hypothesis-testing approach, journal editors and reviewers should be open to publications that are organized around the full range of methods used to address knowledge gaps. Allowing journal articles and other research products to take a greater variety of forms will help alleviate the discrepancies that a number of authors have identified between the structure of scientific articles and the actual practice of research (e.g., Medawar 1996, Schickore 2008). Some journals and online repositories are providing guidelines and mechanisms for scientists to disseminate data and computer code, and the science community as a whole is discussing ways to give scientists credit for a variety of research products that will help advance a broader view of scientific practices (e.g., Goring et al. 2014; see also table 1).

Third, whereas K–12 through graduate science education currently emphasizes a linear, hypothesis-driven approach to science, it should be reformed to incorporate more complex models of the scientific method. For example, students should be taught that hypothesis testing is just one important component of a much broader landscape of scientific activities that need to be combined in creative and interdisciplinary ways to move science forward (Harwood 2004). Including the history, philosophy, and sociology of science in science curricula; teaching science in interdisciplinary ways; and using reformed teaching methods in science courses (e.g., inquiry-based labs, case studies) can introduce students to the multiple methods scientists have historically used, and continue to use, to address significant knowledge gaps (table 1).


The recognition that data-intensive research methods—and indeed, research practices in all areas of science—need to be evaluated as part of broader research programs does much to alleviate common concerns about these and other non-hypothesis-driven methods. Although data-intensive and exploratory efforts to identify patterns in large datasets have the potential to generate spurious results, all methods have their potential problems when used poorly; when used properly, such data-intensive approaches can play a very fruitful role in broader research programs that also test hypothesized processes and mechanisms. The iterative research methods that we have described in this article allow researchers to address more complex questions than they could with hypothesis testing alone. To make these efforts successful, changes are needed in the norms for research funding, publication, and education. In all these areas, more emphasis should be placed on aligning research methods with the knowledge gaps that need to be addressed rather than focusing primarily on hypothesis testing. In addition, scientific practice should be more explicitly recognized as an iterative path through multiple approaches rather than as a linear process of moving through pre-defined steps. Of course, this does not mean that “anything goes”; rather, it facilitates more careful thought about how to fund, publish, and teach the right combinations of methods that will enable the scientific community to tackle the big issues confronting society today.


Funding for this work was provided by the Science + Society @ State program at Michigan State University to all authors; the US National Science Foundation's Macrosystems Biology Program (no. EF-1065786) to PAS and KSC; and the USDA National Institute of Food and Agriculture, Hatch Project no. 176820 to PAS.

