What is Secondary Research? | Definition, Types, & Examples

Published on January 20, 2023 by Tegan George. Revised on January 12, 2024.

Secondary research is a research method that uses data that was collected by someone else. In other words, whenever you conduct research using data that already exists, you are conducting secondary research. On the other hand, any type of research that you undertake yourself is called primary research.

Secondary research can be qualitative or quantitative in nature. It often uses data gathered from published peer-reviewed papers, meta-analyses, or government or private sector databases and datasets.

Table of contents

  • When to use secondary research
  • Types of secondary research
  • Examples of secondary research
  • Advantages and disadvantages of secondary research
  • Other interesting articles
  • Frequently asked questions

Secondary research is a very common research method, used in lieu of collecting your own primary data. It is often used in research designs or as a way to start your research process if you plan to conduct primary research later on.

Since it is often inexpensive or free to access, secondary research is a low-stakes way to determine if further primary research is needed, as gaps in secondary research are a strong indication that primary research is necessary. For this reason, while secondary research can theoretically be exploratory or explanatory in nature, it is usually explanatory: aiming to explain the causes and consequences of a well-defined problem.

Secondary research can take many forms, but the most common types are:

  • Statistical analysis
  • Literature reviews
  • Case studies
  • Content analysis

There is ample data available online from a variety of sources, often in the form of datasets. These datasets are often open-source or downloadable at a low cost, and are ideal for conducting statistical analyses such as hypothesis testing or regression analysis.
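As a rough illustration of the kind of regression analysis a downloaded dataset allows, the sketch below fits a straight line by ordinary least squares. The two columns and their values are hypothetical placeholders, not drawn from any real dataset, and the helper name `fit_line` is an assumption made for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical values standing in for two columns of a secondary dataset.
predictor = [1, 2, 3, 4, 5]
outcome = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(predictor, outcome)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In practice you would load the columns from the dataset file and check its documentation for how each variable was measured before interpreting the slope.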

Credible sources for existing data include:

  • The government
  • Government agencies
  • Non-governmental organizations
  • Educational institutions
  • Businesses or consultancies
  • Libraries or archives
  • Newspapers, academic journals, or magazines

A literature review is a survey of preexisting scholarly sources on your topic. It provides an overview of current knowledge, allowing you to identify relevant themes, debates, and gaps in the research you analyze. You can later apply these to your own work, or use them as a jumping-off point to conduct primary research of your own.

Structured much like a regular academic paper (with a clear introduction, body, and conclusion), a literature review is a great way to evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic.

A case study is a detailed study of a specific subject. It is usually qualitative in nature and can focus on a person, group, place, event, organization, or phenomenon. A case study is a great way to utilize existing research to gain concrete, contextual, and in-depth knowledge about your real-world subject.

You can choose to focus on just one complex case, exploring a single subject in great detail, or examine multiple cases if you’d prefer to compare different aspects of your topic. Preexisting interviews, observational studies, or other sources of primary data make for great case studies.

Content analysis is a research method that studies patterns in recorded communication by utilizing existing texts. It can be either quantitative or qualitative in nature, depending on whether you choose to analyze countable or measurable patterns, or more interpretive ones. Content analysis is popular in communication studies, but it is also widely used in historical analysis, anthropology, and psychology to make more semantic qualitative inferences.

Primary Research and Secondary Research

Secondary research is a broad research approach that can be pursued any way you’d like. Here are a few examples of different ways you can use secondary research to explore your research topic.

Secondary research is a very common research approach, but has distinct advantages and disadvantages.

Advantages of secondary research

Advantages include:

  • Secondary data is very easy to source and readily available.
  • It is also often free or accessible through your educational institution’s library or network, making it much cheaper to conduct than primary research.
  • As you are relying on research that already exists, conducting secondary research is much less time consuming than primary research. Since your timeline is so much shorter, your research can be ready to publish sooner.
  • Using data from others allows you to show reproducibility and replicability, bolstering prior research and situating your own work within your field.

Disadvantages of secondary research

Disadvantages include:

  • Ease of access does not signify credibility. It’s important to be aware that secondary research is not always reliable, and can often be out of date. It’s critical to analyze any data you’re thinking of using prior to getting started, using a method like the CRAAP test.
  • Secondary research often relies on primary research already conducted. If this original research is biased in any way, those research biases could creep into the secondary results.

Many researchers using the same secondary research to form similar conclusions can also take away from the uniqueness and reliability of your research. Many datasets become “kitchen-sink” models, where too many variables are added in an attempt to draw increasingly niche conclusions from overused data. Data cleansing may be necessary to test the quality of the research.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts and meanings, use qualitative methods.
  • If you want to analyze a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

Sources in this article

George, T. (2024, January 12). What is Secondary Research? | Definition, Types, & Examples. Scribbr. Retrieved June 11, 2024, from https://www.scribbr.com/methodology/secondary-research/
Largan, C., & Morris, T. M. (2019). Qualitative Secondary Research: A Step-By-Step Guide (1st ed.). SAGE Publications Ltd.
Peloquin, D., DiMaio, M., Bierer, B., & Barnes, M. (2020). Disruptive and avoidable: GDPR challenges to secondary research uses of data. European Journal of Human Genetics, 28(6), 697–705. https://doi.org/10.1038/s41431-020-0596-x

  • Research article
  • Open access
  • Published: 27 September 2018

Primary versus secondary source of data in observational studies and heterogeneity in meta-analyses of drug effects: a survey of major medical journals

  • Guillermo Prada-Ramallal 1, 2
  • Fatima Roque 3, 4
  • Maria Teresa Herdeiro 5, 6
  • Bahi Takkouche 1, 2, 7
  • Adolfo Figueiras (ORCID: orcid.org/0000-0002-5766-8672) 1, 2, 7

BMC Medical Research Methodology, volume 18, Article number: 97 (2018)

The data from individual observational studies included in meta-analyses of drug effects are collected either from ad hoc methods (i.e. “primary data”) or databases that were established for non-research purposes (i.e. “secondary data”). The use of secondary sources may be prone to measurement bias and confounding due to over-the-counter and out-of-pocket drug consumption, or non-adherence to treatment. In fact, it has been noted that failing to consider the origin of the data as a potential cause of heterogeneity may change the conclusions of a meta-analysis. We aimed to assess to what extent the origin of data is explored as a source of heterogeneity in meta-analyses of observational studies.

We searched for meta-analyses of drug effects published between 2012 and 2018 in general and internal medicine journals with an impact factor > 15. We evaluated, when reported, the type of data source (primary vs secondary) used in the individual observational studies included in each meta-analysis, and the exposure- and outcome-related variables included in sensitivity, subgroup or meta-regression analyses.

We found 217 articles, 23 of which fulfilled our eligibility criteria. Eight meta-analyses (8/23, 34.8%) reported the source of data. Three meta-analyses (3/23, 13.0%) included the method of outcome assessment as a variable in the analysis of heterogeneity, and only one compared and discussed the results considering the different sources of data (primary vs secondary).

Conclusions

In meta-analyses of drug effects published in seven high impact general medicine journals, the origin of the data, either primary or secondary, is underexplored as a source of heterogeneity.

Specific research questions are ideally answered through tailor-made studies. Although these ad hoc studies provide more accurate and updated data, designing a completely new project may not represent a feasible strategy [ 1 , 2 ]. On the other hand, clinical and administrative databases used for billing and other fiscal purposes (i.e. “secondary data”) are a valuable resource as an alternative to ad hoc methods (i.e. “primary data”) since it is easier and less costly to reuse the information than to collect it anew [ 3 ]. The potential of secondary automated databases for observational epidemiological studies is widely acknowledged; however, their use is not without challenges, and many quality requirements and methodological pitfalls must be considered [ 4 ].

Meta-analysis represents one of the most valuable tools for assessing drug effects as it may lead to the best evidence possible in epidemiology [ 5 ]. Consequently, its use for making relevant clinical and regulatory decisions on the safety and efficacy of drugs is dramatically increasing [ 6 ]. Existence of heterogeneity in a given meta-analysis is a feature that needs to be carefully described by analyzing the possible factors responsible for generating it [ 7 ]. In this regard, the results of a recent study [ 8 ] show that whether the origin of the data (primary vs secondary) is explored as a potential cause of heterogeneity may change the conclusions of a meta-analysis due to an effect modification [ 9 ]. Thus, considering the source of data as a variable in sensitivity and subgroup analyses, or meta-regression analyses, seems crucial to avoid misleading conclusions in meta-analyses of drug effects.

Given the evidence noted [ 8 , 9 ], we surveyed published meta-analyses in a selection of high-impact journals over a 6-year period, to assess to what extent the origin of the data, either primary or secondary, is explored as a source of heterogeneity in meta-analyses of observational studies.

Meta-analysis selection and data collection process

General and internal medicine journals with an impact factor > 15 according to the Web of Science were included in the survey [ 10 ]. This method has been widely used to assess quality as well as publication trends in medical journals [ 11 , 12 , 13 ]. The rationale is that meta-analyses published in high impact journals: (1) are likely to be rigorously performed and reported due to the exhaustive editorial process [ 12 , 14 ]; and, (2) in general, exert a higher influence on medical practice due to the major role played by these journals in the dissemination of the new medical evidence [ 14 , 15 ]. We searched MEDLINE in May 2018 using the search terms “meta-analysis” as publication type and “drug” in any field, restricted to articles published between January 1, 2012 and May 7, 2018 in the New England Journal of Medicine (NEJM), Lancet, Journal of the American Medical Association (JAMA), British Medical Journal (BMJ), JAMA Internal Medicine (JAMA Intern Med), Annals of Internal Medicine (Ann Intern Med), and Nature Reviews Disease Primers (Nat Rev Dis Primers).
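The search described above could be expressed as a single query string along the following lines. This is only a sketch: the paper does not give the exact query, so the combination of clauses is an assumption. The field tags `[pt]` (publication type), `[ta]` (journal title abbreviation) and `[dp]` (date of publication) are standard PubMed search tags, and the function name `build_query` is hypothetical.

```python
# Journal title abbreviations for the seven journals named in the text.
JOURNALS = [
    "N Engl J Med",
    "Lancet",
    "JAMA",
    "BMJ",
    "JAMA Intern Med",
    "Ann Intern Med",
    "Nat Rev Dis Primers",
]


def build_query(journals, start, end):
    """Assemble a PubMed-style query string; no network request is made."""
    journal_clause = " OR ".join(f'"{name}"[ta]' for name in journals)
    return (
        "meta-analysis[pt] AND drug[All Fields] "
        f"AND ({journal_clause}) "
        f"AND {start}:{end}[dp]"
    )


query = build_query(JOURNALS, "2012/01/01", "2018/05/07")
print(query)
```

Keeping the journal list and date range as parameters makes it easy to rerun the same strategy for a different set of journals or a longer period.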

Two investigators (GP-R, FR) independently assessed publications for eligibility. Abstracts were screened and if deemed potentially relevant, full text articles were retrieved. Articles were excluded if they met any of the following conditions: (1) were not a meta-analysis of published studies, (2) no drug effects were evaluated, (3) only randomized clinical trials were included in the meta-analysis (in order to consider observational studies), (4) fewer than two observational studies were included in the meta-analysis (since with a single study it would not have been possible to calculate a pooled measure). When a meta-analysis included both observational studies and clinical trials, only observational studies were considered.
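The four exclusion conditions can be read as an ordered screening rule. The sketch below is one possible encoding, not the authors' actual procedure; the `Article` fields and the function name `exclusion_reason` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    is_meta_analysis_of_published_studies: bool
    evaluates_drug_effects: bool
    n_observational_studies: int
    n_randomized_trials: int = 0


def exclusion_reason(article: Article) -> Optional[str]:
    """Return the first exclusion condition met, or None if eligible."""
    if not article.is_meta_analysis_of_published_studies:
        return "not a meta-analysis of published studies"
    if not article.evaluates_drug_effects:
        return "no drug effects evaluated"
    if article.n_observational_studies == 0 and article.n_randomized_trials > 0:
        return "only randomized clinical trials included"
    if article.n_observational_studies < 2:
        return "fewer than two observational studies"
    return None


# A meta-analysis mixing trials and observational studies stays eligible;
# only its observational studies would then be considered.
mixed_design = Article(True, True, n_observational_studies=5, n_randomized_trials=3)
print(exclusion_reason(mixed_design))
```

Returning the reason rather than a plain yes or no mirrors the paper's practice of listing excluded articles together with the reason for exclusion.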

A data extraction form was developed previously to extract information from articles. Two investigators (GP-R, FR) independently extracted and recorded the information and resolved discrepancies by referring to the original report. If necessary, a third author (AF) was asked to resolve disagreements between the investigators.

When available we extracted the following data from each eligible meta-analysis: first author, publication year, journal, drug(s) exposure and outcome(s); number of individual studies included in the meta-analysis based on each type of data source used (primary vs secondary), for both exposure and outcome assessment; and exposure- and outcome-related variables included in sensitivity, subgroup or meta-regression analyses. We extracted data directly from the tables, figures, text, and supplementary material of the meta-analyses, not from the individual studies.

Assessment of exposure and outcome

We considered “primary data” the information on drug exposure collected directly by the researchers using interviews (in person or by telephone) or self-administered questionnaires. The origin of the data was also considered primary when objective diagnostic methods were used for the determination of drug exposure (e.g. blood test). “Secondary data” are data that were formerly collected for purposes other than that of the study at hand and that were included in databases on drug prescription (e.g. prescription registers, medical records/charts) and dispensing (e.g. computerized pharmacy records, insurance claims databases). Regarding the outcome assessment, we considered the data primary when objective confirmation was available to support them (e.g. confirmed by individual medical ad hoc diagnosis, lab test or imaging results). These criteria are based on those commonly used in the risk assessment of bias for observational studies [ 16 , 17 , 18 , 19 ].
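A minimal sketch of how these definitions might be applied when coding exposure data sources follows. The method labels are shortened paraphrases of the examples in the paragraph above, and the function names are hypothetical; a real extraction form would need a fuller vocabulary.

```python
# Exposure-assessment methods grouped by origin, following the definitions above.
PRIMARY_METHODS = {
    "interview",
    "self-administered questionnaire",
    "objective diagnostic test",
}
SECONDARY_METHODS = {
    "prescription register",
    "medical record",
    "pharmacy dispensing record",
    "insurance claims database",
}


def classify_method(method):
    """Map one data-collection method to 'primary' or 'secondary'."""
    if method in PRIMARY_METHODS:
        return "primary"
    if method in SECONDARY_METHODS:
        return "secondary"
    raise ValueError(f"unrecognized method: {method}")


def summarize_sources(methods):
    """Summarize the sources across the studies of one meta-analysis."""
    if not methods:
        return "not reported"
    origins = {classify_method(m) for m in methods}
    return "mixed" if len(origins) > 1 else origins.pop()


print(summarize_sources(["interview", "insurance claims database"]))  # mixed
```

The summary labels ("primary", "secondary", "mixed", "not reported") correspond to the categories used when describing each meta-analysis in the results.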

MEDLINE search results yielded 217 articles from the major general medical journals (3 from NEJM, 46 from Lancet, 26 from JAMA, 85 from BMJ, 19 from JAMA Intern Med, 38 from Ann Intern Med, and 0 from Nat Rev Dis Primers) (see Fig. 1 ). A total of 194 articles were excluded (see list of excluded articles with reasons for exclusion in Additional file 1 ) leaving 23 articles to be examined [ 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 ]. General characteristics of the 23 included meta-analyses are outlined in Table 1 .

Fig. 1. Flow diagram of literature search results
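The reported counts and percentages are internally consistent, which a few lines of arithmetic confirm. All numbers below are taken from the text; only the variable names are introduced for this check.

```python
# Articles retrieved per journal, as reported in the search results.
articles_per_journal = {
    "NEJM": 3,
    "Lancet": 46,
    "JAMA": 26,
    "BMJ": 85,
    "JAMA Intern Med": 19,
    "Ann Intern Med": 38,
    "Nat Rev Dis Primers": 0,
}

total_articles = sum(articles_per_journal.values())
excluded = 194
included = total_articles - excluded

reported_data_source = 8        # meta-analyses that reported the source of data
outcome_method_as_variable = 3  # used outcome assessment method in heterogeneity analysis

print(total_articles)                                # 217
print(included)                                      # 23
print(f"{reported_data_source / included:.1%}")      # 34.8%
print(f"{outcome_method_as_variable / included:.1%}")  # 13.0%
```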

Source of exposure and outcome data

Table 2 summarizes the evidence regarding the type of data source included in each meta-analysis, according to the information presented in the data extraction tables of the article. The information was evaluated taking the study design into account. Only eight meta-analyses [ 21 , 24 , 26 , 31 , 32 , 34 , 38 , 41 ] reported the source of data, three of them [ 31 , 34 , 38 ] reporting mixed sources for both the exposure and outcome assessment. Five meta-analyses [ 21 , 24 , 26 , 32 , 41 ] reported only secondary sources for the exposure assessment, three of them [ 21 , 24 , 41 ] reporting as well only secondary sources for the outcome assessment, while in the other two [ 26 , 32 ] only primary and mixed sources for the outcome assessment were reported respectively.

Source of data in the analysis of heterogeneity

All but two [ 20 , 42 ] of the meta-analyses performed subgroup and/or sensitivity analyses. Although three of them [ 23 , 34 , 36 ] considered the methods of outcome assessment (type of diagnostic assay used for Clostridium difficile infection, method of venous thrombosis diagnosis confirmation, and type of scale for psychosis symptoms assessment, respectively) as stratification variables, only the second referred to the origin of the data. Only five meta-analyses [ 22 , 28 , 33 , 35 , 39 ] included meta-regression analyses to describe heterogeneity, none of which considered the source of data as an explanatory variable. Other findings for the inclusion of the data source as a variable in the analysis of heterogeneity are presented in Table 3 .

We finally assessed if the influence of the data origin on the conclusions of the meta-analyses was discussed by their respective authors. We found that only four meta-analyses [ 21 , 31 , 32 , 34 ] noted limitations derived from the type of data source used.

The findings of this research suggest that the origin of the data, either primary or secondary, is underexplored as a source of heterogeneity and an effect modifier in meta-analyses of drug effects published in general medicine journals with high impact. Few meta-analyses reported the source of data and only one [ 34 ] of the articles included in our survey compared and discussed the meta-analysis results considering the different sources of data.

Although it is usual to consider the design of the individual studies (i.e. case-control, cohort or experimental studies) in the analysis of the heterogeneity of a meta-analysis [ 43 , 44 ], the type of data source (primary vs secondary) is still rarely used for this purpose [ 9 , 45 ]. In fact, the current reporting guidelines for meta-analyses, such as MOOSE (Meta-analysis Of Observational Studies in Epidemiology) [ 18 ] or PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) [ 46 , 47 ], do not recommend that authors specifically report the origin of the data. This is probably due to the close relationship that exists between the study design and the type of data source used, despite the fact that each criterion has its own basis. Performing this additional analysis is a simple task that involves no additional cost. Failure to do so may lead to diverging conclusions [ 8 ].
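To illustrate why such an analysis is inexpensive, the sketch below pools study estimates separately by data source and computes a between-subgroup heterogeneity statistic. It assumes a fixed-effect inverse-variance model, which the paper does not prescribe, and the four study estimates are hypothetical values invented for the example.

```python
from math import exp, log, sqrt

# (data source, log risk ratio, standard error); hypothetical studies.
studies = [
    ("primary", log(2.0), 0.20),
    ("primary", log(2.4), 0.25),
    ("secondary", log(1.3), 0.15),
    ("secondary", log(1.5), 0.30),
]


def pool(estimates):
    """Fixed-effect inverse-variance pooling of (log effect, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    return pooled, sqrt(1 / sum(weights))


def pool_by_source(rows):
    """Pool the studies within each data-source subgroup."""
    groups = {}
    for source, y, se in rows:
        groups.setdefault(source, []).append((y, se))
    return {source: pool(ests) for source, ests in groups.items()}


def q_between(subgroups):
    """Cochran's Q between subgroups; degrees of freedom = subgroups - 1."""
    overall, _ = pool(list(subgroups.values()))
    return sum(((y - overall) / se) ** 2 for y, se in subgroups.values())


subgroups = pool_by_source(studies)
for source, (y, se) in subgroups.items():
    print(source, round(exp(y), 2))
print("Q between:", round(q_between(subgroups), 2))
```

A large Q between subgroups relative to a chi-squared distribution with one degree of freedom would suggest that the pooled effect differs by data source. A random-effects model or meta-regression would be reasonable alternatives.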

Conclusions about the effects of a drug that are derived from studies based exclusively on data from secondary sources may be unreliable because, among other reasons, no information is collected on consumption of over-the-counter drugs (i.e. drugs that individuals can buy without a prescription) [ 48 ] and/or out-of-pocket expenses for prescription drugs (i.e. costs that individuals pay out of their own cash reserves) [ 49 ]. In the health care and insurance context, out-of-pocket expenses usually refer to deductibles, co-payments or co-insurance. Figure 2 shows the model that we propose to describe the relationship between the different data records according to their origin, including the possible loss of information (which can be captured only through primary research).

Fig. 2. Conceptual model of individual data recording. * Never dispensed. † Absence of dispensing of successive prescriptions (or self-medication) among patients with primary adherence, or inadequate secondary adherence

Failure to take these situations into account may lead to exposure measurement bias [ 48 , 49 ]. Consumption of a drug may be underestimated when only prescription data is used as a secondary source without additionally considering unregistered consumption, such as over-the-counter consumption (e.g. oral contraceptives [ 34 , 50 ]), that may only be available from a primary database. Alternatively, this may occur when dispensing data for billing purposes (reimbursement) are used for clinical research, if out-of-pocket expenses are not considered (see Fig. 2 ). The portion of the medical bill that the insurance company does not cover, and that the individual must pay themselves, is unlikely to be recorded. Data on the sale of over-the-counter drugs will also not be available in this scenario.

The reverse situation may also occur and consumption may be overestimated when only prescription data is used, if the prescribed drug is not dispensed by the pharmacist; or when dispensing data is used, if the drug is not really consumed by the patient. While primary non-adherence occurs when the patient does not pick up the medication after the first prescription, secondary non-adherence refers to the absence of dispensing of successive prescriptions among patients with primary adherence, or to inadequate secondary adherence (i.e. ≥20% of time without adequate medication) [ 51 ] (see Fig. 2 ). In some diseases the medication adherence is very low [ 52 , 53 , 54 , 55 ], with percentages of primary non-adherence (never dispensed) that exceed 30% [ 56 ]. It should be noted that the impact of non-adherence varies from medication to medication. Therefore, it must be defined and measured in the context of a particular therapy [ 57 ].
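The adherence categories above can be approximated from dispensing records, as in the sketch below. It is a simplification under stated assumptions: each dispensing is taken to cover consecutive days from the dispensing date, overlapping supply is not carried forward, and the record format and function names are hypothetical. As the text notes, a dispensed drug is not necessarily consumed, so this measures coverage rather than actual intake.

```python
from datetime import date, timedelta


def uncovered_fraction(dispensings, start, end):
    """Share of days in [start, end] with no medication supply on hand.

    dispensings: list of (dispensing_date, days_supply) tuples.
    """
    covered = set()
    for dispensed_on, days_supply in dispensings:
        for offset in range(days_supply):
            day = dispensed_on + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    window_days = (end - start).days + 1
    return 1 - len(covered) / window_days


def adherence_category(dispensings, start, end, threshold=0.20):
    """Classify a patient with a prescription using the 20% threshold."""
    if not dispensings:
        return "primary non-adherence"  # prescription never dispensed
    if uncovered_fraction(dispensings, start, end) >= threshold:
        return "inadequate secondary adherence"
    return "adequate adherence"


# Hypothetical patient: two 30-day dispensings within a 90-day window.
records = [(date(2018, 1, 1), 30), (date(2018, 2, 15), 30)]
window = (date(2018, 1, 1), date(2018, 3, 31))
print(adherence_category(records, *window))
```

Here 60 of the 90 days are covered, so one third of the window is without supply and the patient falls into the inadequate secondary adherence category.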

Moreover, failing to take into consideration the portion of consumption due to over-the-counter and/or out-of-pocket expenses may lead to confounding, as that variable may be related to the socio-economic level and/or to the potential of access to the health system [ 58 ], which are independent risk factors of adverse outcomes of some medications (e.g. myocardial infarction [ 21 , 28 , 30 , 41 ]). Given the presence of high-deductible health plans and the high co-insurance rate for some drugs, cost-sharing may deter clinically vulnerable patients from initiating essential medications, thus negatively affecting patient adherence [ 59 , 60 ].

Outcome misclassification may also give rise to measurement bias and heterogeneity [ 61 ]. This occurs, for example, in the meta-analysis that evaluates the relationship between combined oral contraceptives and the risk of venous thrombosis [ 34 ]. In the studies without objective confirmation of the outcome, the women were classified erroneously regardless of the use of contraceptives. This led to a non-differential misclassification that may have underestimated the drug–outcome relationship, especially when the third generation of progestogen is analysed: Risk ratio (RR) primary data = 6.2 (95% confidence interval (CI) 5.2–7.4), RR secondary data = 3.0 (95% CI 1.7–5.4) [ 34 ].
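One common way to judge whether two such subgroup estimates differ by more than chance is a z-test on the difference of the log risk ratios. The sketch below applies it to the two estimates quoted above. The paper does not report this test, so it is offered only as an illustration; it assumes the two subgroups are independent and that the reported intervals are symmetric on the log scale.

```python
from math import erf, exp, log, sqrt


def log_rr_and_se(rr, lower, upper, z_crit=1.96):
    """Return ln(RR) and its standard error recovered from a 95% CI."""
    return log(rr), (log(upper) - log(lower)) / (2 * z_crit)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def compare_estimates(first, second):
    """Ratio of two RRs with a two-sided z-test for their difference."""
    log_a, se_a = log_rr_and_se(*first)
    log_b, se_b = log_rr_and_se(*second)
    diff = log_a - log_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return exp(diff), z, p_value


# (RR, lower 95% limit, upper 95% limit) as quoted in the text.
primary = (6.2, 5.2, 7.4)
secondary = (3.0, 1.7, 5.4)

ratio_of_rrs, z, p_value = compare_estimates(primary, secondary)
print(f"ratio of RRs={ratio_of_rrs:.2f}, z={z:.2f}, p={p_value:.3f}")
```

The wide interval around the secondary-data estimate dominates the standard error of the difference, which is worth keeping in mind when interpreting the result.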

On the one hand, medical records are often considered as being the best information source for outcome variables. However, they present important limitations in the recording of medications taken by patients [ 62 ]. On the other hand, dispensing records show more detailed data on the measurement of drug exposure. However, they do not record the over-the-counter or out-of-pocket drug consumption at an individual level [ 48 , 49 ], apart from offering unreliable data on outcome variables [ 62 , 63 ].

Limitations

The first limitation of this research is that its findings may not be applicable to journals not included in our survey such as journals with low impact factor. Despite the widespread use of the impact factor metric [ 64 ], this method has inherent weaknesses [ 65 , 66 ]. However, meta-analyses published in high impact general medicine journals are likely to be most rigorously performed and reported due to their greater availability of resources and procedures [ 12 , 14 ]. It is then expected that the overall reporting quality of articles published in other lesser-known journals will be similar. Another limitation would be related to the limited search period. In this sense, and given that the general tendency is the improvement of the methodology of published meta-analyses [ 67 , 68 ], we find no reason to suspect that the adverse conclusions could be different before the period from 2012 to 2018. Although it exceeds the objective of this research, one last limitation may be the inability to reanalyse the included meta-analyses stratifying by the type of data source since our study design restricts the conclusions to the published data of the meta-analyses, which were insufficiently reported, or the number of individual studies in each stratum was insufficient to calculate a pooled measure (see Table 2 ).

Owing to automated capture of data on drug prescription and dispensing that are used for billing and other administration purposes, as well as to the implementation of electronic medical records, secondary databases have generated enormous possibilities. However, neither their limitations, nor the risk of bias that they pose should be overlooked [ 69 ]. Thus, researchers should consider the link between administrative databases and medical records, as well as the advisability of combining secondary and primary data in order to minimize the occurrence of biases due to the use of any of these databases.

No source of heterogeneity in a meta-analysis should ever be considered alone but always as part of an interconnected set of potential questions to be addressed. In particular, the origin of the data, either primary or secondary, is insufficiently explored as a source of heterogeneity in meta-analyses of drug effects, even in those published in high impact general medicine journals. Thus, we believe that authors should systematically include the source of data as an additional variable in subgroup and sensitivity analyses, or meta-regression analyses, and discuss its influence on the meta-analysis results. Likewise, reviewers, editors and future guidelines should also consider the origin of the data as a potential cause of heterogeneity in meta-analyses of observational studies that include both primary and secondary data. Failure to do this may lead to misleading conclusions, with negative effects on clinical and regulatory decisions.

Abbreviations

Ann Intern Med: Annals of Internal Medicine

BMJ: British Medical Journal

CI: Confidence Interval

JAMA Intern Med: JAMA Internal Medicine

JAMA: Journal of the American Medical Association

MOOSE: Meta-analysis Of Observational Studies in Epidemiology

Nat Rev Dis Primers: Nature Reviews Disease Primers

NEJM: New England Journal of Medicine

PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses

References

1. Terris DD, Litaker DG, Koroukian SM. Health state information derived from secondary databases is affected by multiple sources of bias. J Clin Epidemiol. 2007;60:734–41.

2. Schneeweiss S. Understanding secondary databases: a commentary on “sources of bias for health state characteristics in secondary databases”. J Clin Epidemiol. 2007;60:648–50.

3. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58:323–37.

4. Knottnerus JA, Tugwell P. Requirements for utilizing health care-based data sources for research. J Clin Epidemiol. 2011;64:1051–3.

5. Berlin JA, Golub RM. Meta-analysis as evidence: building a better pyramid. JAMA. 2014;312:603–6.

6. Blettner M, Schlattmann P. Meta-analysis in epidemiology. In: Ahrens W, Pigeot I, editors. Handbook of epidemiology. Berlin: Springer; 2005. p. 829–59.

7. Higgins JPT. Heterogeneity in meta-analysis should be expected and appropriately quantified. Int J Epidemiol. 2008;37:1158–60.

8. Prada-Ramallal G, Takkouche B, Figueiras A. Diverging conclusions from the same meta-analysis in drug safety: source of data (primary versus secondary) takes a toll. Drug Saf. 2017;40:351–8.

9. Madigan D, Ryan PB, Schuemie M, Stang PE, Overhage JM, Hartzema AG, et al. Evaluating the impact of database heterogeneity on observational study results. Am J Epidemiol. 2013;178:645–51.

10. InCites Journal Citation Reports. Science citation index expanded - medicine. General & Internal Thomson Reuters. https://jcr.incites.thomsonreuters.com. Accessed 10 Sept 2018.

11. Faggion CM Jr, Bakas NP, Wasiak J. A survey of prevalence of narrative and systematic reviews in five major medical journals. BMC Med Res Methodol. 2017;17:176.

12. Hopewell S, Ravaud P, Baron G, Boutron I. Effect of editors' implementation of CONSORT guidelines on the reporting of abstracts in high impact medical journals: interrupted time series analysis. BMJ. 2012;344:e4178.

13. Blanc X, Collet TH, Auer R, Fischer R, Locatelli I, Iriarte P, et al. Publication trends of shared decision making in 15 high impact medical journals: a full-text review with bibliometric analysis. BMC Med Inform Decis Mak. 2014;14:71.

14. Rehal S, Morris TP, Fielding K, Carpenter JR, Phillips PP. Non-inferiority trials: are they inferior? A systematic review of reporting in major medical journals. BMJ Open. 2016;6:e012594.

15. Callaham M, Wears RL, Weber E. Journal prestige, publication bias, and other characteristics associated with citation of published studies in peer-reviewed journals. JAMA. 2002;287:2847–50.

16. Kim SY, Park JE, Lee YJ, Seo HJ, Sheen SS, Hahn S, et al. Testing a tool for assessing the risk of bias for nonrandomized studies showed moderate reliability and promising validity. J Clin Epidemiol. 2013;66:408–14.

17. Wells GA, Shea B, O’Connell D, Peterson J, Welch V, Losos M, et al. The Newcastle-Ottawa scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Ottawa: Univ of Ottawa; 2009. www.ohri.ca/programs/clinical_epidemiology/oxford.asp. Accessed 10 Sept 2018.

18. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis of observational studies in epidemiology (MOOSE) group. JAMA. 2000;283:2008–12.

19. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Epidemiology. 2007;18:800–4.

20. Weiss J, Freeman M, Low A, Fu R, Kerfoot A, Paynter R, et al. Benefits and harms of intensive blood pressure treatment in adults aged 60 years or older: a systematic review and meta-analysis. Ann Intern Med. 2017;166:419–29.

21. Bally M, Dendukuri N, Rich B, Nadeau L, Helin-Salmivaara A, Garbe E, et al. Risk of acute myocardial infarction with NSAIDs in real world use: bayesian meta-analysis of individual patient data. BMJ. 2017;357:j1909.

22. Sordo L, Barrio G, Bravo MJ, Indave BI, Degenhardt L, Wiessing L, et al. Mortality risk during and after opioid substitution treatment: systematic review and meta-analysis of cohort studies. BMJ. 2017;357:j1550.

23. Tariq R, Singh S, Gupta A, Pardi DS, Khanna S. Association of Gastric Acid Suppression with Recurrent Clostridium difficile infection: a systematic review and meta-analysis. JAMA Intern Med. 2017;177:784–91.

24. Maruthur NM, Tseng E, Hutfless S, Wilson LM, Suarez-Cuervo C, Berger Z, et al. Diabetes medications as monotherapy or metformin-based combination therapy for type 2 diabetes: a systematic review and meta-analysis. Ann Intern Med. 2016;164:740–51.

25. Paul S, Saxena A, Terrin N, Viveiros K, Balk EM, Wong JB. Hepatitis B virus reactivation and prophylaxis during solid tumor chemotherapy: a systematic review and meta-analysis. Ann Intern Med. 2016;164:30–40.

26. Li L, Li S, Deng K, Liu J, Vandvik PO, Zhao P, et al. Dipeptidyl peptidase-4 inhibitors and risk of heart failure in type 2 diabetes: systematic review and meta-analysis of randomised and observational studies. BMJ. 2016;352:i610.

27. Molnar AO, Fergusson D, Tsampalieros AK, Bennett A, Fergusson N, Ramsay T, et al. Generic immunosuppression in solid organ transplantation: systematic review and meta-analysis. BMJ. 2015;350:h3163.

28. Ziff OJ, Lane DA, Samra M, Griffith M, Kirchhof P, Lip GY, et al. Safety and efficacy of digoxin: systematic review and meta-analysis of observational and controlled trial data. BMJ. 2015;351:h4451. Erratum in: BMJ 2015;351:h4937.

29. Collaborative Group On Epidemiological Studies Of Ovarian Cancer, Beral V, Gaitskell K, Hermon C, Moser K, Reeves G, et al. Menopausal hormone use and ovarian cancer risk: individual participant meta-analysis of 52 epidemiological studies. Lancet. 2015;385:1835–42.

Article   PubMed Central   Google Scholar  

Bellemain-Appaix A, Kerneis M, O'Connor SA, Silvain J, Cucherat M, Beygui F, et al. Reappraisal of thienopyridine pretreatment in patients with non-ST elevation acute coronary syndrome: a systematic review and meta-analysis. BMJ. 2014;349:g6269.

Grigoriadis S, Vonderporten EH, Mamisashvili L, Tomlinson G, Dennis CL, Koren G, et al. Prenatal exposure to antidepressants and persistent pulmonary hypertension of the newborn: systematic review and meta-analysis. BMJ. 2014;348:f6932.

Li L, Shen J, Bala MM, Busse JW, Ebrahim S, Vandvik PO, et al. Incretin treatment and risk of pancreatitis in patients with type 2 diabetes mellitus: systematic review and meta-analysis of randomised and non-randomised studies. BMJ. 2014;348:g2366.

Kalil AC, Van Schooneveld TC, Fey PD, Rupp ME. Association between vancomycin minimum inhibitory concentration and mortality among patients with Staphylococcus aureus bloodstream infections: a systematic review and meta-analysis. JAMA. 2014;312:1552–64.

Stegeman BH, de Bastos M, Rosendaal FR, van Hylckama Vlieg A, Helmerhorst FM, Stijnen T, et al. Different combined oral contraceptives and the risk of venous thrombosis: systematic review and network meta-analysis. BMJ. 2013;347:f5298.

Maneiro JR, Salgado E, Gomez-Reino JJ. Immunogenicity of monoclonal antibodies against tumor necrosis factor used in chronic immune-mediated inflammatory conditions: systematic review and meta-analysis. JAMA Intern Med. 2013;173:1416–28.

Hartling L, Abou-Setta AM, Dursun S, Mousavi SS, Pasichnyk D, Newton AS. Antipsychotics in adults with schizophrenia: comparative effectiveness of first-generation versus second-generation medications: a systematic review and meta-analysis. Ann Intern Med. 2012;157:498–511.

Hsu J, Santesso N, Mustafa R, Brozek J, Chen YL, Hopkins JP, et al. Antivirals for treatment of influenza: a systematic review and meta-analysis of observational studies. Ann Intern Med. 2012;156:512–24.

Article   PubMed   PubMed Central   Google Scholar  

Caldeira D, Alarcão J, Vaz-Carneiro A, Costa J. Risk of pneumonia associated with use of angiotensin converting enzyme inhibitors and angiotensin receptor blockers: systematic review and meta-analysis. BMJ. 2012;345:e4260.

MacArthur GJ, Minozzi S, Martin N, Vickerman P, Deren S, Bruneau J, et al. Opiate substitution treatment and HIV transmission in people who inject drugs: systematic review and meta-analysis. BMJ. 2012;345:e5945.

Mantha S, Karp R, Raghavan V, Terrin N, Bauer KA, Zwicker JI. Assessing the risk of venous thromboembolic events in women taking progestin-only contraception: a meta-analysis. BMJ. 2012;345:e4944.

Article   CAS   PubMed Central   PubMed   Google Scholar  

Silvain J, Beygui F, Barthélémy O, Pollack C Jr, Cohen M, Zeymer U, et al. Efficacy and safety of enoxaparin versus unfractionated heparin during percutaneous coronary intervention: systematic review and meta-analysis. BMJ. 2012;344:e553.

McKnight RF, Adida M, Budge K, Stockton S, Goodwin GM, Geddes JR. Lithium toxicity profile: a systematic review and meta-analysis. Lancet. 2012;379:721–8.

Article   CAS   Google Scholar  

Egger M, Davey Smith G, Schneider M. Systematic reviews of observational studies. In: Egger M, Davey Smith G, Altman DG, editors. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Publishing Group; 2001. p. 211–27.

Glasziou PP, Sanders SL. Investigating causes of heterogeneity in systematic reviews. Stat Med. 2002;21:1503–11.

Seeger J, Daniel GW. Commercial Insurance Databases. In: Strom BL, Kimmel SE, Hennessy S, editors. Pharmacoepidemiology. 5th ed. Chichester, John Wiley & Sons; 2012. p. 189–208.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700.

Zorzela L, Loke YK, Ioannidis JP, Golder S, Santaguida P, Altman DG, et al. PRISMA harms checklist: improving harms reporting in systematic reviews. BMJ. 2016;352:i157.

Cohen JM, Wood ME, Hernandez-Diaz S, Nordeng H. Agreement between paternal self-reported medication use and records from a national prescription database. Pharmacoepidemiol Drug Saf. 2018;27:413–21.

Gamble JM, McAlister FA, Johnson JA, Eurich DT. Quantifying the impact of drug exposure misclassification due to restrictive drug coverage in administrative databases: a simulation cohort study. Value Health. 2012;15:191–7.

Upadhya KK, Santelli JS, Raine-Bennett TR, Kottke MJ, Grossman D. Over-the-counter access to oral contraceptives for adolescents. J Adolesc Health. 2017;60:634–40.

Raebel MA, Schmittdiel J, Karter AJ, Konieczny JL, Steiner JF. Standardizing terminology and definitions of medication adherence and persistence in research employing electronic databases. Med Care. 2013;51(Suppl 3):S11–21.

Wu AC, Butler MG, Li L, Fung V, Kharbanda EO, Larkin EK, et al. Primary adherence to controller medications for asthma is poor. Ann Am Thorac Soc. 2015;12:161–6.

Fallis BA, Dhalla IA, Klemensberg J, Bell CM. Primary medication non-adherence after discharge from a general internal medicine service. PLoS One. 2013;8:e61735.

Anderson KL, Dothard EH, Huang KE, Feldman SR. Frequency of primary nonadherence to acne treatment. JAMA Dermatol. 2015;151:623–6.

Fischer MA, Stedman MR, Lii J, Vogeli C, Shrank WH, Brookhart MA, et al. Primary medication non-adherence: analysis of 195,930 electronic prescriptions. J Gen Intern Med. 2010;25:284–90.

Tamblyn R, Eguale T, Huang A, Winslade N, Doran P. The incidence and determinants of primary nonadherence with prescribed medication in primary care: a cohort study. Ann Intern Med. 2014;160:441–50.

Kolandaivelu K, Leiden BB, O'Gara PT, Bhatt DL. Non-adherence to cardiovascular medications. Eur Heart J. 2014;35:3267–76.

Kirkeby MJ, Hansen CD, Andersen JH. Socio-economic differences in use of prescribed and over-the-counter medicine for pain and psychological problems among Danish adolescents--a longitudinal study. Eur J Pediatr. 2014;173:1147–55.

Mukherjee K, Kamal KM. Sociodemographic determinants of out-of-pocket expenditures for patients using prescription drugs for rheumatoid arthritis. Am Health Drug Benefits. 2017;10:7–15.

PubMed   PubMed Central   Google Scholar  

Karter AJ, Parker MM, Solomon MD, Lyles CR, Adams AS, Moffet HH, et al. Effect of out-of-pocket cost on medication initiation, adherence, and persistence among patients with type 2 diabetes: The Diabetes Study of Northern California (DISTANCE). Health Serv Res. 2018;53:1227–47.

Leong A, Dasgupta K, Bernatsky S, Lacaille D, Avina-Zubieta A, Rahme E. Systematic review and meta-analysis of validation studies on a diabetes case definition from health administrative records. PLoS One. 2013;8:e75256.

Takahashi Y, Nishida Y, Asai S. Utilization of health care databases for pharmacoepidemiology. Eur J Clin Pharmacol. 2012;68:123–9.

Prada-Ramallal G, Takkouche B, Figueiras A. Summarising the evidence for drug safety: a methodological discussion of different meta-analysis approaches. Drug Saf. 2017;40:547–58.

Garfield E. The history and meaning of the journal impact factor. JAMA. 2006;295:90–3.

Seglen PO. Why the impact factor of journals should not be used for evaluating research. BMJ. 1997;314:498–502.

Brown H. How impact factors changed medical publishing–and science. BMJ. 2007;334:561–4.

Gerber S, Tallon D, Trelle S, Schneider M, Jüni P, Egger M. Bibliographic study showed improving methodology of meta-analyses published in leading journals 1993-2002. J Clin Epidemiol. 2007;60:773–80.

Petropoulou M, Nikolakopoulou A, Veroniki AA, Rios P, Vafaei A, Zarin W, et al. Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015. J Clin Epidemiol. 2017;82:20–8.

Ray WA. Improving automated database studies. Epidemiology. 2011;22:302–4.

Download references

This study received no funding from the public, commercial or not-for-profit sectors.

Availability of data and materials

All data generated or analysed during this study are included in this published article.

Author information

Authors and affiliations.

Department of Preventive Medicine and Public Health, University of Santiago de Compostela, c/ San Francisco s/n, 15786, Santiago de Compostela, A Coruña, Spain

Guillermo Prada-Ramallal, Bahi Takkouche & Adolfo Figueiras

Health Research Institute of Santiago de Compostela (Instituto de Investigación Sanitaria de Santiago de Compostela - IDIS), Clinical University Hospital of Santiago de Compostela, 15706, Santiago de Compostela, Spain

Research Unit for Inland Development, Polytechnic of Guarda (Unidade de Investigação para o Desenvolvimento do Interior - UDI/IPG), 6300-559, Guarda, Portugal

Fatima Roque

Health Sciences Research Centre, University of Beira Interior (Centro de Investigação em Ciências da Saúde - CICS/UBI), 6200-506, Covilhã, Portugal

Department of Medical Sciences & Institute for Biomedicine – iBiMED, University of Aveiro, 3810-193, Aveiro, Portugal

Maria Teresa Herdeiro

Higher Polytechnic & University Education Co-operative (Cooperativa de Ensino Superior Politécnico e Universitário - CESPU), Institute for Advanced Research & Training in Health Sciences & Technologies, 4585-116, Gandra, Portugal

Consortium for Biomedical Research in Epidemiology & Public Health (CIBER en Epidemiología y Salud Pública – CIBERESP), Santiago de Compostela, Spain

Bahi Takkouche & Adolfo Figueiras


Contributions

AF and GP-R contributed to study conception and design. GP-R, FR and AF contributed to searching, screening, data collection and analyses. GP-R was responsible for drafting the manuscript. FR, MTH, BT and AF provided comments and made several revisions of the manuscript. All authors read and approved the final version.

Corresponding author

Correspondence to Adolfo Figueiras.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1: Excluded articles. List of articles excluded with reasons for exclusion. (PDF 247 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Prada-Ramallal, G., Roque, F., Herdeiro, M.T. et al. Primary versus secondary source of data in observational studies and heterogeneity in meta-analyses of drug effects: a survey of major medical journals. BMC Med Res Methodol 18, 97 (2018). https://doi.org/10.1186/s12874-018-0561-3

Received: 01 March 2018

Accepted: 18 September 2018

Published: 27 September 2018

DOI: https://doi.org/10.1186/s12874-018-0561-3


  • Observational studies
  • Meta-analysis
  • Source of data
  • Heterogeneity
  • Over-the-counter
  • Out-of-pocket

BMC Medical Research Methodology

ISSN: 1471-2288


Secondary research: definition, methods, & examples

This ultimate guide to secondary research helps you understand changes in market trends, customers’ buying patterns, and your competition using existing data sources.

In situations where you’re not involved in the data gathering process ( primary research ), you have to rely on existing information and data to arrive at specific research conclusions or outcomes. This approach is known as secondary research.

In this article, we’re going to explain what secondary research is, how it works, and share some examples of it in practice.


What is secondary research?

Secondary research, also known as desk research, is a research method that involves compiling existing data sourced from a variety of channels. This includes internal sources (e.g. in-house research) or, more commonly, external sources (such as government statistics, organizational bodies, and the internet).

Secondary research comes in several formats, such as published datasets, reports, and survey responses, and can also be sourced from websites, libraries, and museums.

The information is usually free, or available at a limited access cost, and was originally gathered using surveys, telephone interviews, observation, face-to-face interviews, and more.

When using secondary research, researchers collect, verify, analyze, and incorporate existing data to help them confirm their research goals for the research period.

As well as the above, it can be used to review previous research into an area of interest. Researchers can look for patterns across data spanning several years and identify trends — or use it to verify early hypothesis statements and establish whether it’s worth continuing research into a prospective area.

How to conduct secondary research

There are five key steps to conducting secondary research effectively and efficiently:

1.    Identify and define the research topic

First, understand what you will be researching and define the topic by thinking about the research questions you want to be answered.

Ask yourself: What is the point of conducting this research? Then, ask: What do we want to achieve?

Your answers may point to an exploratory aim (finding out why something happened) or to confirming a hypothesis. They may also surface ideas that need primary or secondary research (or a combination) to investigate them.

2.    Find research and existing data sources

If secondary research is needed, think about where you might find the information. This helps you narrow down your secondary sources to those that help you answer your questions. What keywords do you need to use?

Which organizations are closely working on this topic already? Are there any competitors that you need to be aware of?

Create a list of the data sources, information, and people that could help you with your work.

3.    Begin searching and collecting the existing data

Now that you have the list of data sources, start accessing the data and collect the information into an organized system. This may mean you start setting up research journal accounts or making telephone calls to book meetings with third-party research teams to verify the details around data results.

As you search and access information, remember to check the data’s date, the credibility of the source, the relevance of the material to your research topic, and the methodology used by the third-party researchers. Start small and as you gain results, investigate further in the areas that help your research’s aims.
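The four checks above (date, credibility, relevance, and methodology) can be sketched as a simple screening step. This is a minimal illustration only: the `Source` fields, the example entries, and the five-year age limit are all assumptions made for the sketch, not rules from this guide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Source:
    """One candidate secondary source (all fields are illustrative)."""
    name: str
    published: date
    credible: bool            # your own judgement of the publisher
    relevant: bool            # does it address the research question?
    method_documented: bool   # is the methodology described?


def screen(sources, today, max_age_years=5):
    """Return only the sources that pass all four checks."""
    kept = []
    for s in sources:
        age_years = (today - s.published).days / 365.25
        if (age_years <= max_age_years and s.credible
                and s.relevant and s.method_documented):
            kept.append(s)
    return kept


# Hypothetical candidates, invented for this example.
candidates = [
    Source("Census extract", date(2022, 6, 1), True, True, True),
    Source("Anonymous blog post", date(2023, 1, 15), False, True, False),
    Source("Trade body report", date(2012, 3, 1), True, True, True),
]

shortlist = screen(candidates, today=date(2024, 1, 1))
print([s.name for s in shortlist])
```

In practice the credibility and relevance flags are judgement calls you record by hand; the code only keeps the bookkeeping consistent.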

4.    Combine the data and compare the results

When you have your data in one place, you need to understand, filter, order, and combine it intelligently. Data may arrive in different formats; some of it may be unusable, and some may need to be deleted.

After this, you can start to look at different data sets to see what they tell you. You may find that you need to compare the same datasets over different periods for changes over time or compare different datasets to notice overlaps or trends. Ask yourself: What does this data mean to my research? Does it help or hinder my research?
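As a rough sketch of this comparison step, the snippet below lines up two datasets on the years they share and reports how far apart their figures are. The two sources and all of their values are made up for illustration, and the "relative gap" measure is just one reasonable choice.

```python
# Hypothetical yearly figures from two made-up sources.
source_a = {2019: 120, 2020: 135, 2021: 150}
source_b = {2020: 140, 2021: 149, 2022: 160}


def combine(a, b):
    """Merge two {year: value} datasets, keeping only the years both cover."""
    shared_years = sorted(set(a) & set(b))
    return {year: (a[year], b[year]) for year in shared_years}


def relative_gap(x, y):
    """Absolute difference as a fraction of the mean of the two values."""
    return abs(x - y) / ((x + y) / 2)


combined = combine(source_a, source_b)
for year, (a_val, b_val) in combined.items():
    gap = relative_gap(a_val, b_val)
    print(f"{year}: A={a_val} B={b_val} gap={gap:.1%}")
```

A small gap suggests the sources broadly agree for that year; a large one is a prompt to check definitions and methodology before relying on either figure.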

5.    Analyze your data and explore further

In this last stage of the process, look at the information you have and ask yourself if this answers your original questions for your research. Are there any gaps? Do you understand the information you’ve found? If you feel there is more to cover, repeat the steps and delve deeper into the topic so that you can get all the information you need.

If secondary research can’t provide these answers, consider supplementing your results with data gained from primary research. As you explore further, add to your knowledge and update your findings. This will help you present clear, credible information.

Primary vs secondary research

Unlike secondary research, primary research involves creating data first-hand by directly working with interviewees, target users, or a target market. Primary research focuses on the method for carrying out research, asking questions, and collecting data using approaches such as:

  • Interviews (panel, face-to-face or over the phone)
  • Questionnaires or surveys
  • Focus groups

Using these methods, researchers can get in-depth, targeted responses to questions, making results more accurate and specific to their research goals. However, it does take time to do and administer.

Unlike primary research, secondary research uses existing data, which also includes published results from primary research. Researchers summarize the existing research and use the results to support their research goals.

Both primary and secondary research have their places. Primary research can support the findings found through secondary research (and fill knowledge gaps), while secondary research can be a starting point for further primary research. Because of this, these research methods are often combined for optimal research results that are accurate at both the micro and macro level.

Primary research | Secondary research
First-hand research to collect data. May require a lot of time | The research collects existing, published data. May require a little time
Creates raw data that the researcher owns | The researcher has no control over data method or ownership
Relevant to the goals of the research | May not be relevant to the goals of the research
The researcher conducts research. May be subject to researcher bias | The researcher collects results. No information on what researcher bias exists
Can be expensive to carry out | More affordable due to access to free data

Sources of secondary research

There are two types of secondary research sources: internal and external. Internal data refers to in-house data that can be gathered from the researcher’s organization. External data refers to data published outside of and not owned by the researcher’s organization.

Internal data

Internal data is a good first port of call for insights and knowledge, as you may already have relevant information stored in your systems. Because you own this information, and it won’t be available to other researchers, it can give you a competitive edge. Examples of internal data include:

  • Database information on sales history and business goal conversions
  • Information from website applications and mobile site data
  • Customer-generated data on product and service efficiency and use
  • Previous research results or supplemental research areas
  • Previous campaign results

External data

External data is useful when you: 1) need information on a new topic, 2) want to fill in gaps in your knowledge, or 3) want data that breaks down a population or market for trend and pattern analysis. Examples of external data include:

  • Government, non-government agencies, and trade body statistics
  • Company reports and research
  • Competitor research
  • Public library collections
  • Textbooks and research journals
  • Media stories in newspapers
  • Online journals and research sites

Three examples of secondary research methods in action

How and why might you conduct secondary research? Let’s look at a few examples:

1.    Collecting factual information from the internet on a specific topic or market

There are plenty of sites that hold data for people to view and use in their research. For example, Google Scholar, ResearchGate, or Wiley Online Library all provide previous research on a particular topic. Researchers can create free accounts and use the search facilities to look into a topic by keyword, before following the instructions to download or export results for further analysis.

This can be useful for exploring a new market that your organization wants to consider entering. For instance, by viewing the U.S. Census Bureau demographic data for that area, you can see what the demographics of your target audience are, and create compelling marketing campaigns accordingly.

2.    Finding out the views of your target audience on a particular topic

If you’re interested in seeing the historical views on a particular topic, for example, attitudes to women’s rights in the US, you can turn to secondary sources.

Textbooks, news articles, reviews, and journal entries can all provide qualitative reports and interviews covering how people discussed women’s rights. There may be multimedia elements like video or documented posters of propaganda showing biased language usage.

By gathering this information, synthesizing it, and evaluating the language, who created it and when it was shared, you can create a timeline of how a topic was discussed over time.

3.    When you want to know the latest thinking on a topic

Educational institutions, such as schools and colleges, create a lot of research-based reports on younger audiences or their academic specialisms. Students can also submit dissertations to research journals, making these journals useful places to see the latest insights from a new generation of academics.

Information can be requested — and sometimes academic institutions may want to collaborate and conduct research on your behalf. This can provide key primary data in areas that you want to research, as well as secondary data sources for your research.

Advantages of secondary research

There are several benefits of using secondary research, which we’ve outlined below:

  • Easily and readily available data – There is an abundance of readily accessible data sources that have been pre-collected for use, in person at local libraries and online using the internet. This data is usually sorted by filters or can be exported into spreadsheet format, meaning that little technical expertise is needed to access and use the data.
  • Faster research speeds – Since the data is already published and in the public arena, you don’t need to collect this information through primary research. This can make the research easier to do and faster, as you can get started with the data quickly.
  • Low financial and time costs – Most secondary data sources can be accessed for free or at a small cost to the researcher, so the overall research costs are kept low. In addition, by saving on preliminary research, the time costs for the researcher are kept down as well.
  • Secondary data can drive additional research actions – The insights gained can support future research activities (like conducting a follow-up survey or specifying future detailed research topics) or help add value to these activities.
  • Secondary data can be useful pre-research insights – Secondary source data can provide pre-research insights and information on effects that can help resolve whether research should be conducted. It can also help highlight knowledge gaps, so subsequent research can consider this.
  • Ability to scale up results – Secondary sources can include large datasets (like Census data results across several states) so research results can be scaled up quickly using large secondary data sources.

Disadvantages of secondary research

The disadvantages of secondary research are worth considering in advance of conducting research:

  • Secondary research data can be out of date – Secondary sources can be updated regularly, but if you’re exploring the data between two updates, the data can be out of date. Researchers will need to consider whether the data available provides the right research coverage dates, so that insights are accurate and timely, or if the data needs to be updated. Also, fast-moving markets may find secondary data expires very quickly.
  • Secondary research needs to be verified and interpreted – Where there’s a lot of data from one source, a researcher needs to review and analyze it. The data may need to be verified against other data sets or your hypotheses for accuracy and to ensure you’re using the right data for your research.
  • The researcher has had no control over the secondary research – As the researcher was not involved in collecting the secondary data, invalid data can affect the results. It’s therefore vital to review the original methodology and controls closely, to confirm that the data was collected in a systematic and error-free way.
  • Secondary research data is not exclusive – As data sets are commonly available, there is no exclusivity and many researchers can use the same data. This can be problematic where researchers want to have exclusive rights over the research results and risk duplication of research in the future.

When do we conduct secondary research?

Now that you know the basics of secondary research, when do researchers normally conduct secondary research?

It’s often used at the beginning of research, when the researcher is trying to understand the current landscape. In addition, if the research area is new to the researcher, it can form crucial background context to help them understand what information exists already. This can plug knowledge gaps, supplement the researcher’s own learning or add to the research.

Secondary research can also be used in conjunction with primary research. Secondary research can become the formative research that helps pinpoint where further primary research is needed to find out specific information. It can also support or verify the findings from primary research.

You can use secondary research where high levels of control aren’t needed by the researcher, but a lot of knowledge on a topic is required from different angles.

Secondary research should not be used in place of primary research, as the two are very different and suit different circumstances.

Questions to ask before conducting secondary research

Before you start your secondary research, ask yourself these questions:

  • Is there similar internal data that we have created for a similar area in the past?

If your organization has past research, it’s best to review this work before starting a new project. The older work may provide you with the answers, and give you a starting dataset and context of how your organization approached the research before. However, be mindful that the work is probably out of date and view it with that note in mind. Read through and look for where this helps your research goals or where more work is needed.

  • What am I trying to achieve with this research?

When you have clear goals, and understand what you need to achieve, you can look for the type of secondary or primary research that best supports those aims. Different secondary research data will provide you with different information. For example, if you want a breakdown of your market’s buying patterns, news stories won’t be as useful as internal or external e-commerce and sales data sources.

  • How credible will my research be?

If you are looking for credibility, you want to consider how accurate the research results will need to be, and if you can sacrifice credibility for speed by using secondary sources to get you started. Bear in mind which sources you choose — low-credibility data sites, like political party websites that are highly biased to favor their own party, would skew your results.

  • What is the date of the secondary research?

When you’re looking to conduct research, you want the results to be as useful as possible, so data that is 10 years old won’t be as accurate as data that was created a year ago. Since a lot can change in a few years, note the date of your research and look for more recent data sets that give you an up-to-date picture. One caveat is data collected over a long-term period for comparison with earlier periods, which can tell you about the rate and direction of change.

  • Can the data sources be verified? Does the information you have check out?

If you can’t verify the data by looking at the research methodology, speaking to the original team or cross-checking the facts with other research, it could be hard to be sure that the data is accurate. Think about whether you can use another source, or if it’s worth doing some supplementary primary research to replicate and verify results to help with this issue.

Secondary Research Advantages, Limitations, and Sources

Summary: Secondary research should be a prerequisite to the collection of primary data, but it rarely provides all the answers you need. A thorough evaluation of the secondary data is needed to assess its relevance and accuracy.

5 minutes to read. By author Michaela Mora on January 25, 2022 Topics: Relevant Methods & Tips , Business Strategy , Market Research

Secondary Research

Secondary research is based on data already collected for purposes other than the specific problem you have. Secondary research is usually part of exploratory market research designs.

The connection between the specific purpose that originates the research is what differentiates secondary research from primary research. Primary research is designed to address specific problems. However, analysis of available secondary data should be a prerequisite to the collection of primary data.

Advantages of Secondary Research

Secondary data can be faster and cheaper to obtain, depending on the sources you use.

Secondary research can help to:

  • Answer certain research questions and test some hypotheses.
  • Formulate an appropriate research design (e.g., identify key variables).
  • Interpret data from primary research as it can provide some insights into general trends in an industry or product category.
  • Understand the competitive landscape.

Limitations of Secondary Research

The usefulness of secondary research is often limited for two main reasons:

Lack of Relevance

Secondary research rarely provides all the answers you need. The objectives and methodology used to collect the secondary data may not be appropriate for the problem at hand.

Given that it was designed to find answers to a different problem than yours, you will likely find gaps in answers to your problem. Furthermore, the data collection methods used may not provide the data type needed to support the business decisions you have to make (e.g., qualitative research methods are not appropriate for go/no-go decisions).

Lack of Accuracy

Secondary data may be incomplete and lack accuracy depending on:

  • The research design (exploratory, descriptive, causal, primary vs. repackaged secondary data, the analytical plan, etc.)
  • Sampling design and sources (target audiences, recruitment methods)
  • Data collection method (qualitative and quantitative techniques)
  • Analysis point of view (focus and omissions)
  • Reporting stages (preliminary, final, peer-reviewed)
  • Rate of change in the studied topic (slowly vs. rapidly evolving phenomenon, e.g., adoption of specific technologies).
  • Lack of agreement between data sources.

Criteria for Evaluating Secondary Research Data

Before taking the information at face value, you should conduct a thorough evaluation of the secondary data you find using the following criteria:

  • Purpose : Understanding why the data was collected and what questions it was trying to answer will tell us how relevant and useful it is since it may or may not be appropriate for your objectives.
  • Methodology used to collect the data : Important to understand sources of bias.
  • Accuracy of data: Sources of errors may include research design, sampling, data collection, analysis, and reporting.
  • When the data was collected : Secondary data may not be current or updated frequently enough for the purpose that you need.
  • Content of the data : Understanding the key variables, units of measurement, categories used and analyzed relationships may reveal how useful and relevant it is for your purposes.
  • Source reputation : In the era of purposeful misinformation on the Internet, it is important to check the expertise, credibility, reputation, and trustworthiness of the data source.

Secondary Research Data Sources

Compared to primary research, the collection of secondary data can be faster and cheaper to obtain, depending on the sources you use.

Secondary data can come from internal or external sources.

Internal sources of secondary data include ready-to-use data or data that requires further processing available in internal management support systems your company may be using (e.g., invoices, sales transactions, Google Analytics for your website, etc.).

Prior primary qualitative and quantitative research conducted by the company are also common sources of secondary data. They often generate more questions and help formulate new primary research needed.

However, if there are no internal data collection systems yet or prior research, you probably won’t have much usable secondary data at your disposal.

External sources of secondary data include:

  • Published materials
  • External databases
  • Syndicated services.

Published Materials

Published materials can be classified as:

  • General business sources: Guides, directories, indexes, and statistical data.
  • Government sources: Census data and other government publications.

External Databases

In many industries and across a variety of topics, there are private and public databases that can be accessed online or downloaded for free, for a fixed fee, or by subscription.

These databases can include bibliographic, numeric, full-text, directory, and special-purpose databases. Some public institutions make data collected through various methods, including surveys, available for others to analyze.

Syndicated Services

These services are offered by companies that collect and sell pools of data that have a commercial value and meet shared needs by a number of clients, even if the data is not collected for specific purposes those clients may have.

Syndicated services can be classified based on specific units of measurements (e.g., consumers, households, organizations, etc.).

The data collection methods for these data may include:

  • Surveys (Psychographic and Lifestyle, advertising evaluations, general topics)
  • Household panels (Purchase and media use)
  • Electronic scanner services (volume tracking data, scanner panels, scanner panels with Cable TV)
  • Audits (retailers, wholesalers)
  • Direct inquiries to institutions
  • Clipping services tracking PR for institutions
  • Corporate reports

You can spend hours doing research on Google in search of external sources, but this is likely to yield limited insights. Books, articles, journals, reports, blog posts, and videos you may find online are usually analyses and summaries of data from a particular perspective. They may be useful and give you an indication of the type of data used, but they are not the actual data. Whenever possible, you should look at the actual raw data to draw your own conclusion on its value for your research objectives. You should also check professionally gathered secondary research.

Here are some external secondary data sources often used in market research that you may find useful as starting points in your research. Some are free, while others require payment.

  • Pew Research Center : Reports about the issues, attitudes, and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis, and other empirical social science research.
  • Data.Census.gov : Data dissemination platform to access demographic and economic data from the U.S. Census Bureau.
  • Data.gov : The U.S. government’s open data source, with almost 200,000 datasets ranging in topic from health, agriculture, climate, ecosystems, public safety, finance, energy, and manufacturing to education and business.
  • Google Scholar : A web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.
  • Google Public Data Explorer : Makes large, public-interest datasets easy to explore, visualize and communicate.
  • Google News Archive : Allows users to search historical newspapers and retrieve scanned images of their pages.
  • McKinsey & Company : Articles based on analyses of various industries.
  • Statista : Business data platform with data across 170+ industries and 150+ countries.
  • Claritas : Syndicated reports on various market segments.
  • Mintel : Consumer reports combining exclusive consumer research with other market data and expert analysis.
  • MarketResearch.com : Data aggregator with over 350 publishers covering every sector of the economy as well as emerging industries.
  • Packaged Facts : Reports based on market research on consumer goods and services industries.
  • Dun & Bradstreet : Company directory with business information.

Tutorial: Evaluating Information: Primary vs. Secondary Articles

Primary vs. Secondary Research Articles

In the sciences,  primary (or empirical) research articles :

  • are original scientific reports of new research findings (Please note that an original scientific article does not include review articles, which summarize the research literature on a particular subject, or articles using meta-analyses, which analyze pre-published data.)
  • usually include the following sections: Introduction , Methods , Results , Discussion, References
  • are usually  peer-reviewed (examined by expert(s) in the field before publication). Please note that a peer-reviewed article is not the same as a review article, which summarizes the research literature on a particular subject

You may also choose to use some secondary sources (summaries or interpretations of original research) such as books (find these through the library catalog) or review articles (articles which organize and critically analyze the research of others on a topic). These secondary sources, particularly review articles, are often useful and easier-to-read summaries of research in an area. Additionally, you can use the listed references to find useful primary research articles.

Anatomy of a Scholarly Article

[Figure: anatomy of a scholarly article, from NCSU Libraries' Anatomy of a Scholarly Article]

Types of health studies

In the sciences, particularly the health sciences, there are a number of types of primary articles (the gold standard being randomized controlled trials ) and secondary articles (the gold standard being systematic reviews and meta-analysis ). The chart below summarizes their differences and the linked article gives more information.

[Figure: chart of health study types]

Searching for Primary vs. Secondary Articles

[Figure: searching for primary or secondary articles in a database]

Some scholarly databases will allow you to specify what kind of scholarly literature you're looking for. However, be careful! Depending on the database, the Review article type may mean book review instead of, or as well as, review article. You may also have to look under more or custom options to find these choices.

  • Last Updated: Oct 20, 2021 11:11 AM
  • URL: https://guides.library.cornell.edu/evaluate

Peer-Reviewed Literature: Peer-Reviewed Research: Primary vs. Secondary

Peer Reviewed Research

Published literature can be either peer-reviewed or non-peer-reviewed. Official research reports are almost always peer reviewed while a journal's other content is usually not. In the health sciences, official research can be primary, secondary, or even tertiary. It can be an original experiment or investigation (primary), an analysis or evaluation of primary research (secondary), or findings that compile secondary research (tertiary). If you are doing research yourself, then primary or secondary sources can reveal more in-depth information.

Primary Research

Primary research is information presented in its original form without interpretation by other researchers. While it may acknowledge previous studies or sources, it always presents original thinking, reports on discoveries, or new information about a topic.

Health sciences research that is primary includes both experimental trials and observational studies where subjects may be tested for outcomes or investigated to gain relevant insight.  Randomized Controlled Trials are the most prominent experimental design because randomized subjects offer the most compelling evidence for the effectiveness of an intervention. See the below graphic and below powerpoint for further information on primary research studies.

  • Research Design

Secondary Research

Secondary research is an account of original events or facts. It is secondary to and retrospective of the actual findings from an experiment or trial. These studies may be appraised summaries, reviews, or interpretations of primary sources and often exclude the original researcher(s). In the health sciences, meta-analysis and systematic reviews are the most frequent types of secondary research. 

  • A meta-analysis is a quantitative method of combining the results of primary research. In analyzing the relevant data and statistical findings from experimental trials or observational studies, it can more accurately calculate effective resolutions regarding certain health topics.
  • A systematic review is a summary of research that addresses a focused clinical question in a systematic, reproducible manner. In order to provide the single best estimate of effect in clinical decision making, primary research studies are pooled together and then filtered through an inclusion/exclusion process. The relevant data and findings are then compiled and synthesized to arrive at a more accurate conclusion about a specific health topic. Only peer-reviewed publications are used and analyzed in a methodology which may or may not include a meta-analysis.
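As a rough illustration of how a meta-analysis can combine results quantitatively, the sketch below pools effect estimates with inverse-variance (fixed-effect) weighting, one common approach among several. The function name `fixed_effect_pool` and the three study values are hypothetical and chosen only for illustration; a real meta-analysis would also assess heterogeneity and may call for a random-effects model.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool study effects with inverse-variance (fixed-effect) weights.

    Returns the pooled effect and its standard error.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    # Each study is weighted by the inverse of its variance (1 / SE^2),
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three primary studies.
effects = [0.30, 0.10, 0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = fixed_effect_pool(effects, std_errors)
# Approximate 95% confidence interval under a normal approximation.
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining primary studies can yield a more precise estimate than each study alone.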

  • Last Updated: Sep 29, 2023 10:05 AM
  • URL: https://ttuhsc.libguides.com/PeerReview

Conserved intronic secondary structures with concealed branch sites regulate alternative splicing of poison exons

The first three authors should be regarded as Joint First Authors.

Hao Li, Zhan Ding, Zhuo-Ya Fang, Ni Long, Hao-Yang Ang, Yu Zhang, Yu-Jie Fan, Yong-Zhen Xu, Conserved intronic secondary structures with concealed branch sites regulate alternative splicing of poison exons, Nucleic Acids Research , Volume 52, Issue 10, 10 June 2024, Pages 6002–6016, https://doi.org/10.1093/nar/gkae185

Alternative splicing (AS) generates multiple RNA isoforms and increases the complexities of transcriptomes and proteomes. However, it remains unclear how RNA structures contribute to AS regulation. Here, we systematically search transcriptomes for secondary structures with concealed branch sites (BSs) in the alternatively spliced introns and predict thousands of them from six organisms, of which many are evolutionarily conserved. Intriguingly, a highly conserved stem–loop structure with concealed BSs is found in animal SF3B3 genes and colocalizes with a downstream poison exon (PE). Destabilization of this structure allows increased usage of the BSs and results in enhanced PE inclusion in human and Drosophila cells, leading to decreased expression of SF3B3. This structure is experimentally validated using an in-cell SHAPE-MaP assay. Through RNA interference screens of 28 RNA-binding proteins, we find that this stem–loop structure is sensitive to U2 factors. Furthermore, we find that SF3B3 also facilitates DNA repair and protects genome stability by enhancing interaction between ERCC6/CSB and arrested RNA polymerase II. Importantly, both Drosophila and human cells with the secondary structure mutated by genome editing exhibit altered DNA repair in vivo . This study provides a novel and common mechanism for AS regulation of PEs and reveals a physiological function of SF3B3 in DNA repair.

Alternative splicing (AS) occurs extensively in higher eukaryotes, generating multiple RNA isoforms from one gene through the selection of different splice sites, which dramatically increases the complexities of transcriptomes and proteomes ( 1 ). More than 95% of human genes are alternatively spliced ( 2 ), and on average each gene has >10 RNA isoforms ( 3 ).

RNA splicing is catalyzed by the spliceosome, a large and dynamic RNA–protein complex, consisting of five small nuclear RNAs (snRNAs; U1, U2, U4, U5 and U6) and >100 proteins ( 4 ). Four conserved intronic regions are required for spliceosomal recognition, assembly and catalysis: the 5′ and 3′ splice sites (SSs), the branch site (BS) and a polypyrimidine tract (PPyT) ( 5 ). During spliceosome assembly, the 5′SS is recognized by U1 small nuclear ribonucleoprotein (snRNP) through base pairing with the U1 snRNA, the PPyT and 3′SS are bound by U2AFs and the BS is stably bound by the 17S U2 snRNP through base pairing with the U2 snRNA ( 6 , 7 ). After the U4/U6·U5 tri-snRNP joins in and conformational changes, the BS-adenosine initiates a two-step transesterification reaction that connects two exons and releases a lariat intron ( 8 ).

The two SSs of an intron can be identified through sequence comparison between complementary DNA (cDNA) and DNA, but identification of a BS is difficult in metazoans. BSs are highly conserved with a UACUAAC sequence in the yeast Saccharomyces cerevisiae, while they are more divergent in metazoans, showing a less conserved yUnAy sequence in human ( 9 ). Since the structure of the lariat intron juxtaposes the 5′SS and BS, a split-and-inverted alignment was developed for mapping lariat reads, allowing transcriptome-wide identification of BSs ( 10 ). Later on, techniques for enriching lariats were developed to find BSs on large scales ( 11–13 ).

The BS is recognized by the 17S U2 snRNP, in which there are two protein subcomplexes, SF3a and SF3b ( 14 ). SF3b has seven subunits, SF3B1/SF3b155, SF3B2/SF3b145, SF3B3/SF3b130, SF3B4/SF3b49, SF3B5/SF3b10, SF3B6/SF3b14a and PHF5A/SF3b14b ( 15–17 ). Previous cross-linking assays and recent cryo-electron microscopy structures reveal that SF3B1 interacts with pre-messenger RNA (pre-mRNA) flanking the BS, and SF3B6 interacts with the BS-adenosine ( 18–20 ). Mutations in SF3B1 have been frequently identified in patients with blood cancer ( 21 ) and mechanistically alter spliceosomal recognition of the BS and 3′SS ( 22 ). Although functions of the other five SF3b subunits have not been well studied, it has been reported that SF3B3 interacts with the STAGA (SPT3–TAFII31–GCN5L acetylase) complex ( 23 , 24 ) and the TFTC (TATA-binding-protein-free TAFII-containing) complex ( 25 ), suggesting that SF3B3 might be involved in the regulation of transcription. Interestingly, SF3B3 has an unusual domain composition. Unlike other SFs that typically have an RNA binding or recognition motif, SF3B3 has both an MMS1_N domain and a CPSF_A domain, which usually appear in factors involved in DNA damage/repair and 3′-end cleavage/polyadenylation, respectively ( 26 , 27 ). Several U2 components and related factors, including SF3B3, are linked with the maintenance of protein levels of DNA repair factors, such as BRCA1 and RAD51, which contribute to ionizing radiation- and hydroxyurea-triggered DNA repair through homologous recombination ( 28 ). However, it remains unclear whether SF3B3 is involved in these two non-splicing processes.

There are five basic AS types, retained intron, skipped exon, alternative 5′SS and 3′SS, and mutually exclusive exons, in which the most frequent is the skipping of cassette exons ( 29 ). The inclusion or skipping of a cassette exon in mRNAs would encode different proteins or introduce a premature termination codon (PTC) that usually triggers nonsense-mediated mRNA decay (NMD) ( 30 ). A cassette exon is named a poison exon (PE) when its inclusion introduces a PTC ( 31 , 32 ), while it is named an essential exon (EE) when its skipping introduces a PTC ( 33 ). AS events have been found to be conserved, and many homologous exons (homo-exons, exons from homologous genes with conserved borders and coding sequences) are spliced by the same types of AS ( 34 , 35 ). Disordered AS has been frequently connected to differential and developmental defects and diseases ( 36 ).

Regulation of AS is determined by exonic and intronic cis-elements that are recognized by trans-factors during the assembly of the pre-spliceosome ( 2 ). RNA secondary structures also contribute to AS regulation; the structure either blocks nearby splice sites and inhibits splicing or brings long-distance exons closer and promotes splicing ( 2 , 37 ). The most well-known example is the mutually exclusive splicing of the Drosophila Dscam gene ( 38 ). G-quadruplex structures can enhance the skipping of nearby exons in mouse and human neurons ( 39 ). However, BS-related RNA secondary structures and their regulatory roles in AS have not been systematically investigated.

To find transcriptome-wide BS-containing secondary structures, we performed a search in six model organisms. Thousands of secondary structures with concealed BSs were identified. The most conserved structure is the one in animal SF3B3 introns, which is colocalized with a downstream PE. We found that the structure’s stability modulates AS of the PE in human and fly cells and that it is sensitive to U2 and U2-associated proteins. Furthermore, SF3B3 facilitates DNA repair and genome stability in response to UV irradiation. Importantly, when the SF3B3 structure was mutated in fruit fly and human cells by genome editing, both AS of the PE and DNA repair activity were significantly changed, providing in vivo evidence that hiding of BS in a stem–loop structure is a novel and common mechanism that regulates AS of the PE and contributes to the gene’s physiological function.

Searching the concealed BSs in AS-related introns

Sequences and annotations of retained introns and skipped-exon neighboring introns were extracted from the UCSC Genome Browser for six species ( Homo sapiens /hg38, Mus musculus /mm10, Gallus gallus /galGal6, Xenopus tropicalis /xenTro10, Drosophila melanogaster /dm6 and Caenorhabditis elegans /ce11). BSs in the human introns were determined according to the BS data ( 40 ) and were predicted by LaBranchoR ( 41 ) in the other introns. Secondary structures containing BSs were constructed by RNAfold ( 42 ) for the last 120-nt intronic sequences. The energy contribution of different base pairs to the secondary structure at different positions was calculated, and finally, the secondary structure with the lowest free energy was chosen. For convenience, introns with the same 3′SS but different 5′SSs were merged. Concealed BS must fit three criteria: (i) the BS or at least one of its two flanking nucleotides should be paired; (ii) the stem structure should have at least seven continuous pairings with ≤1 mismatch, and the maximum mismatches can be increased by 1 for every additional five pairings; and (iii) the BS region is not overlapped with exons from other genes or other transcripts from the same gene. A summary of this pipeline is presented in Supplementary Figure S1A .
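Criteria (i) and (ii) above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' pipeline: it takes a dot-bracket structure string (the notation RNAfold emits) and a 0-based BS index, treats a "mismatch" as a 1x1 internal loop between stacked pairs, and accepts any qualifying stretch within the stem that contains the BS or one of its flanking nucleotides. Criterion (iii) requires gene annotations and is omitted. All function names are hypothetical.

```python
def pair_table(structure):
    """Map each paired index to its partner for a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def stem_tokens(partner, i):
    """Describe the stem around paired base i, outermost element first.

    'P' is a base pair; 'M' is a 1x1 mismatch (assumed interpretation).
    """
    j = partner[i]
    if i > j:
        i, j = j, i

    def walk(step):
        tokens, a, b = [], i, j
        while True:
            na, nb = a + step, b - step
            if na < nb and partner.get(na) == nb:
                tokens.append("P")  # directly stacked pair
                a, b = na, nb
            elif (na + step < nb - step
                  and na not in partner and nb not in partner
                  and partner.get(na + step) == nb - step):
                tokens.extend("MP")  # one mismatch, then the next pair
                a, b = na + step, nb - step
            else:
                return tokens

    outward, inward = walk(-1), walk(1)
    return "".join(reversed(outward)) + "P" + "".join(inward)


def stem_qualifies(tokens, min_pairs=7, extra_per=5):
    """Criterion (ii): >= 7 pairings with <= 1 mismatch, plus one more
    allowed mismatch for every additional five pairings."""
    for start, tok in enumerate(tokens):
        if tok != "P":
            continue
        pairs = mismatches = 0
        for t in tokens[start:]:
            if t == "M":
                mismatches += 1
                continue
            pairs += 1
            allowed = 1 + (pairs - min_pairs) // extra_per
            if pairs >= min_pairs and mismatches <= allowed:
                return True
    return False


def is_concealed(structure, bs_index):
    """Criteria (i) and (ii) for a BS at 0-based position bs_index."""
    partner = pair_table(structure)
    # Criterion (i): the BS or one of its two flanking nucleotides is paired.
    paired = [k for k in (bs_index - 1, bs_index, bs_index + 1) if k in partner]
    return any(stem_qualifies(stem_tokens(partner, k)) for k in paired)


# Hypothetical structure: an 8-bp stem closing a 4-nt loop, BS at index 3.
print(is_concealed("(" * 8 + "...." + ")" * 8, 3))
```

Scanning every stretch that starts at a pair, rather than extending greedily, avoids rejecting a stem whose core meets the threshold only because extra mismatches accumulate further out.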

Culture of cells and Drosophila strains

Human HEK293T and HaCaT cells were obtained from CCTCC. These cell lines were cultured in Dulbecco’s modified Eagle medium (Gibco) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin and streptomycin at 37°C. Drosophila S2 cells were obtained from Gibco and cultured in Schneider’s Drosophila medium with 10% FBS at 28°C. All the D. melanogaster strains used in this study were maintained and cultured on a standard cornmeal agar medium.

RNAi and overexpression in human and Drosophila cells

In human cells, small hairpin RNAs (shRNAs) were designed by online software from Sigma and loaded into a lentiviral vector pLKO.1 (Addgene) to knock down targets. SF3B3 mini-genes containing a 4.7-kb fragment (exons 10–12) of wild-type (WT) or mutated sequences were cloned into a pcDNA3.0 vector (Addgene) and transfected into 293T; coding sequences of the full-length and truncated proteins were cloned into a pCDH vector (Addgene) and transfected into 293T and HaCaT cells to express SF3B3 and other proteins. An antisense RNA oligo with 2′-O-methylation was synthesized and transfected into 293T cells with RNATransMate (Sangon Biotech) for a competition assay of the endogenous human SF3B3 structure. All primers used are listed in Supplementary Table S1 .

In Drosophila S2 cells, double-stranded RNAs (dsRNAs) were transcribed using the T7 RiboMAX Express RNAi System (Promega) and transfected to knock down targets that were precultured in an FBS-free medium for 40 min ( 43 ); sf3b3 mini-genes containing a 2.7-kb fragment (exons 3–7) of WT or mutated sequences were cloned into a pIZT-V5 vector (Invitrogen) and transfected.

SHAPE-MaP assay

The selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) assay was performed as described ( 44 ). Briefly, 1 × 10⁶ 293T cells were collected and treated in 1 ml phosphate-buffered saline (PBS) containing 10% dimethyl sulfoxide (DMSO) or 10% DMSO with 100 mM NAI-N3 (MCE) at 37°C for 10 min. The total RNAs were then isolated, and an SF3B3 primer to the downstream intron of the PE (11a) was used for reverse transcription by SuperScript II (Thermo). Using specific primers and adapter primers, amplification of the SF3B3 cDNAs was performed by two rounds of polymerase chain reactions (PCRs). After sequencing on an Illumina NovaSeq Xplus, the modification reactivities caused by NAI-N3 were analyzed and the structure was modeled as described ( 45 ).

Construction of Drosophila and human cell mutants by CRISPR/Cas9 systems

The WT is a w1118 isogenic strain (BDSC 5905). Two mutant strains, the sf3b3-mutL and sf3b3-5c , were constructed using a CRISPR/Cas9-mediated knock-in technique ( 22 ). In brief, target sequences of single-guide RNAs (sgRNAs) were selected ( Supplementary Table S1 ), and donor plasmid (pMD18-T) with the deletion/mutation regions and the adjacent 2-kb sequences as homologous arms were constructed. The guide RNA (gRNA) and donor plasmids were co-injected into embryos of the transgenic line nanos-Cas9 (BDSC 78781) by UniHuaii Technology Company. Specific primers were used for genomic PCRs to screen for the desired alleles, which were further validated by sequencing. The flies obtained were then crossed for at least five generations with the WT strain to eliminate potential off-target events.

The ΔL cell was constructed using a CRISPR/Cas9-mediated knock-in technique ( 46 ). Briefly, target sequences of two sgRNAs were designed and inserted into the px458 plasmid, and single-stranded oligodeoxyribonucleotides (ssODNs) with mutations were synthesized by Tsingke. The gRNA plasmids and ssODNs were co-transfected into 293T cells, and the monoclonal cells were sorted through flow cytometry. The ΔL mutant cell line was then confirmed by genomic PCRs and Sanger sequencing after a 2-week culture.

UV irradiation

Using the cross-linker CL-1000 (UVP), 293T and HaCaT cells were UV irradiated under 254 nm light at 10 J/m² for detection of γ-H2AX signals ( 47 ) and at 30 J/m² for the comet assay. For Drosophila samples, 100 third instar larvae were placed in a 6-cm dish and UV irradiated at 200 J/m² under 254 nm light, and the test was performed in triplicate. The irradiated larvae were then carefully put back into vials and the eclosed flies from each group were counted 6 days later. All statistics were analyzed by GraphPad Prism 7 (San Diego, CA), and the significant differences were determined according to a t -test.

Western blotting and immunofluorescence

For western blotting, the 293T and HaCaT cells with or without 10 J/m² UV radiation and the third instar larvae with or without 200 J/m² UV radiation were collected and lysed by RIPA buffer [50 mM Tris–HCl (pH 7.4), 150 mM NaCl, 1% Triton X-100, 1% sodium deoxycholate, 0.1% sodium dodecyl phosphate] for preparation of proteins, whose concentrations were determined by Enhanced BCA Protein Assay Kit (Promega). Signals of western blotting were detected using rabbit polyclonal antibodies anti-SF3B3 from Sigma (HPA041134), anti-DDB1 (A3827), anti-Phospho-H2AX-S139 (AP0687), anti-ERCC8/CSA (A6884) and anti-tubulin (AC030) from Abclonal, and anti-ERCC6/CSB (24291-1-AP) from Proteintech. Drosophila γ-h2AvD was detected using rabbit polyclonal antibody anti-Histone h2AvD phosphoS137 (Rockland, 600401914).

For immunofluorescence, the 293T and HaCaT cells were fixed after 10 J/m² UV radiation and incubated with the primary antibody anti-Phospho-H2AX-S139 or anti-SF3B3 (1:100), followed by incubation with the secondary antibody goat anti-rabbit Alexa Fluor 594 or anti-mouse Alexa Fluor 488 (1:500, Sigma). DAPI (1:2000, Sigma) was used for staining the nuclei. Images were acquired using a Carl Zeiss LSM880 confocal microscope.

About 4 × 10^3 cells under different treatments were digested and seeded into a 96-well plate. After 24 h, cells were exposed to 254 nm UV light, and cell viability was then measured at 12-h intervals using the Cell Counting Kit-8 (CCK8; Beyotime). Data were processed and analyzed by GraphPad Prism 7 (San Diego, CA).

Comet assay

The comet assay was performed as described ( 48 ). Briefly, the 293T and HaCaT cells were cultured in a 12-well plate until 60% density and then exposed to 254 nm UV using a CL-1000 Ultraviolet Crosslinker (UVP). Cells were then collected, resuspended as single cells in 1× PBS with an additional 1:10 ratio of 1% low-melting-point agarose and applied to a 1% agarose precoated slide for electrophoresis using the alkaline electrophoresis solution (200 mM NaOH, 1 mM Na2EDTA) under 1 V/cm at 4°C. Finally, fluorescent nucleic acid staining solution (Yeasen) was dropped onto the slide for visualization of the DNA breaks under epifluorescence microscopy (Olympus), and the quantifications were processed by the software OpenComet.

Immunoprecipitation and co-immunoprecipitation

Cell lysates were prepared from a six-well plate culture using 1.8 ml of immunoprecipitation (IP) buffer [20 mM Tris–HCl (pH 8.0), 150 mM NaCl, 0.5% NP-40, 1% Triton X-100 and 1 mM phenylmethylsulfonyl fluoride with 1× proteinase inhibitor cocktail (Yeasen)]. After centrifugation to remove the cell debris, lysates were precleared with 25 μl of Protein A magnetic beads (Thermo), followed by overnight incubation with 1.5 μg of primary antibodies. IPed and co-IPed proteins were then purified by Protein A or FLAG magnetic beads (Thermo), separated by sodium dodecyl sulfate–polyacrylamide gel electrophoresis and visualized by western blotting.

RT-PCR, RNA-seq and bioinformatics

Total RNAs from the human and Drosophila cells and tissues were isolated by TRIzol (Sigma) and treated with RNase-free DNase I (Invitrogen). For reverse transcription PCR (RT-PCR), reverse transcription was performed using PrimeScript RT Reagent Kit with gDNA Eraser (TaKaRa), and the obtained cDNA was amplified by 2× Hieff PCR Master Mix (Yeasen). RNAs from the Drosophila adults (5 days post-eclosion) and 293T cells were isolated for mRNA sequencing (mRNA-seq), while the construction of cDNA libraries and sequencing were performed using Illumina HiseqXten-PE150 by Novogene.

Raw reads from RNA-seq were quality filtered and trimmed, mapped to the human genome (hg38) and D. melanogaster genome (dm6) by STAR, and counted by featureCounts ( 49 ). Analyses of differentially expressed genes (DEGs) were performed by DESeq2 v1.22.2 ( 50 ), and genes with fold changes >2 and false discovery rate (FDR) <0.05 were screened as significant. Differentially spliced (DS) events were analyzed by rMATS v4.1.2 ( 51 ), in which the significant DS events were screened by conditions of |ΔPSI| > 0.05 and FDR < 0.05.
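The two significance screens described above can be sketched as simple filters. This is a minimal illustration, not the authors' pipeline: the record types `GeneResult` and `SplicingEvent` and both function names are hypothetical, and the sketch assumes that "fold changes >2" applies in both directions (that is, |log2 fold change| > 1), which the text does not state explicitly.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene record (e.g. parsed from a differential expression table)."""
    gene: str
    log2_fold_change: float
    fdr: float


@dataclass
class SplicingEvent:
    """Hypothetical per-event record (e.g. parsed from a differential splicing table)."""
    event_id: str
    delta_psi: float  # PSI difference between conditions, on a 0-1 scale
    fdr: float


def significant_degs(results, min_fold_change=2.0, max_fdr=0.05):
    """Keep genes with fold change > 2 (assumed: in either direction) and FDR < 0.05."""
    cutoff = math.log2(min_fold_change)
    return [r for r in results
            if abs(r.log2_fold_change) > cutoff and r.fdr < max_fdr]


def significant_ds_events(events, min_abs_dpsi=0.05, max_fdr=0.05):
    """Keep splicing events with |delta PSI| > 0.05 and FDR < 0.05."""
    return [e for e in events
            if abs(e.delta_psi) > min_abs_dpsi and e.fdr < max_fdr]
```

Both filters use strict inequalities, matching the thresholds as written; a gene with a fold change of exactly 2 is therefore not retained.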

Drosophila developmental assays

The number of eggs laid per female fly was measured for fecundity. Briefly, 10 individual mated female adults (24–28 h post-eclosion) from each strain were passed to new plates, and their laid eggs per plate were counted every day for 10 days. For hatching, 100 eggs from each mated strain were collected and counted under standard conditions. One hundred wandering-stage larvae and third-day pupae from each strain were collected into new vials with standard food for detection of the pupation and eclosion rates, respectively. All tests were performed in triplicate and counted at regular intervals. The statistical analyses were performed using GraphPad Prism 7 (San Diego, CA), and the statistical differences were determined by t -tests.

Identification of transcriptome-wide concealed BSs in RNA secondary structures

A BS concealed in a secondary structure would result in intron retention or skipping of its neighboring exons (Figure 1A ). To find transcriptome-wide concealed BSs from six model species, including H. sapiens , M. musculus , G. gallus , X. tropicalis , D. melanogaster and C. elegans , we searched their AS events from the UCSC Genome Browser and retrieved sequences of the retained introns and flanking introns of the skipped exons, in which locations of BSs in human were determined according to the BS data ( 40 ) and those in the other five species were predicted by LaBranchoR ( 41 ). The secondary structures of the last 120-nt intronic sequences including BSs were predicted by RNAfold ( 42 ), and the structures with concealed BSs were finally selected in consideration of the stabilities and the BS positions in the duplexes. The pipeline is summarized in Figure 1B and Supplementary Figure S1A , and for more details please see the ‘Materials and methods’ section. In total, we obtained 2521 human genes that have 4237 concealed BSs in 3074 AS-related introns. The numbers gradually decreased in genomes from mouse to chicken, frog, fruit fly and worm (Figure 1C , Supplementary Figure S1B and Supplementary Table S2 ).
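One step of this search, deciding whether a BS position lies inside a predicted duplex, can be illustrated with a short sketch. It assumes the predicted structure is available as a dot-bracket string (the notation RNAfold reports) and that the BS is given as a 0-based index into the same 120-nt window. The function names are hypothetical, and the sketch covers only the "BS is base-paired" criterion; the stability criterion mentioned above is not modeled here.

```python
def pair_table(structure):
    """Map each paired position to its partner for a dot-bracket string.

    '(' and ')' mark paired bases and '.' marks unpaired bases.
    Raises ValueError on unbalanced brackets or unexpected characters.
    """
    stack = []
    pairs = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_bs_concealed(structure, bs_index):
    """Return True if the branch site at bs_index (0-based) is base-paired."""
    if not 0 <= bs_index < len(structure):
        raise IndexError("bs_index outside the structure")
    return bs_index in pair_table(structure)


# Toy example: a 4-bp stem with a 4-nt loop.
toy = "..((((....)))).."
stem_hit = is_bs_concealed(toy, 11)   # position on the 3' side of the stem
loop_hit = is_bs_concealed(toy, 7)    # position inside the loop
```

In the toy structure, position 11 sits on the 3′ side of the stem and is reported as concealed, whereas position 7 is in the loop and is not.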

Identification of transcriptome-wide RNA secondary structures with concealed BSs in the alternatively spliced introns in six species. ( A ) A BS within a stable secondary structure would result in intron retention or skipping of its flanking exons. ( B ) Strategy for searching concealed BSs in secondary structures in six species. ( C ) The numbers of genes with identified concealed BSs in six species. Genome sizes are indicated. ( D ) Counting of homologous genes with concealed BSs across six species. A cluster is defined as a group of homologous genes (including the orthologs and paralogs). ( E ) Sankey diagram of homologous genes with concealed BSs in homologous introns (homo-introns). ( F ) Two examples of the conserved secondary structures between human and mouse genes. More examples are shown in Supplementary Figure S1B . ( G ) The conserved secondary structures with concealed BSs of BCOR genes in human, mouse and chicken. ( H ) The conserved secondary structures with concealed BSs of SF3B3 genes in five species. Asterisks represent experimentally determined BSs in human and predicted BSs in other species.

To identify the conservation of these structures, we analyzed the homology of the genes and introns. Evolutionarily, 37 orthologous clusters/genes in each of the four vertebrates have concealed BSs, which gradually decreased to 11 in all six species (Figure 1D and Supplementary Table S3 ). Furthermore, 41 homologous genes in human and mouse have concealed BSs in their homo-introns (whose flanking exons have conserved borders and coding sequences near the 5′SSs and 3′SSs), showing strictly conserved locations of the secondary structures (Figure 1E and  F , Supplementary Figure S1C and Supplementary Table S4 ). However, this location conservation is dramatically decreased, down to 2 when including the chicken, and to 1 when including the frog (Figure 1E ). The two conserved structures between human, mouse and chicken are located in homo-introns of BCOR (Figure 1G ) and SF3B3 . Interestingly, the SF3B3 structure is also conserved down to the fruit fly (Figure 1H ).

The secondary structure in SF3B3 is evolutionarily colocalized with a downstream PE

We further found that the SF3B3 secondary structure exists in nearly all vertebrates and insects (Figure 2A and  B , and Supplementary Figure S2A and B ). To fully investigate this, we analyzed the overall gene structures of all available eukaryotic SF3B3 s. First, there are two homo-introns in nearly all eukaryotes, except in plants and the yeast S. cerevisiae (Figure 2C and  D ). The flanking exons of the homo-intron IIs are homo-exons ( Supplementary Figure S3A ). Second, there is a PE in most animal SF3B3 s, except in nematodes, fishes and spiders (Figure 2C – E ). Third, the secondary structure with concealed BS is always colocalized upstream to the PE (Figure 2C ).

An evolutionarily conserved intronic stem–loop structure is colocalized with a downstream PE in animal SF3B3 genes. Representative secondary structures in SF3B3 genes from vertebrates ( A ) and insects ( B ). More structures from other animals are shown in Supplementary Figure S2 . ( C ) The conserved secondary structure is colocalized with a downstream PE in the homo-intron II of animal SF3B3 genes. Exons in the Drosophila and human Sf3b3 genes are numbered. Arrows, positions of the conserved structures; boxes downstream of the arrow, PEs. ( D ) Polyphyletic analyses of the SF3B3 homo-intron IIs, PEs and the stem–loop structures. ( E ) Sequences of the Sf3b3 -PEs from representative species; each has a stop codon when it is spliced into mRNAs.

The SF3B3 secondary structures consist of a stem with 14–17 base pairs that are sparsely interrupted by 1–2 nt and a loop usually 5–8 nt in length (Figures 1H and  2A and  B , and Supplementary Figure S2B ). As to primary sequences, both sides of the stem are highly conserved in 80 vertebrates showing sequences of GGCUGGUACUUGGUG and UACCAAUACCAAGCC. The primary sequences are less conserved in 43 available insects, showing CUGGUGnnUUG and CnRACAYCAG, respectively (Figure 2B , Supplementary Figures S2A and B and S3B , and Supplementary Table S5 ). However, the stabilities of their stem structures are similar, indicating an evolutionary conservation of the secondary structures.

The secondary structures with concealed BSs regulate AS of SF3B3- PEs

Online data indicate that all the Sf3b3- PEs are alternatively spliced, producing either full-length or PTC-included mRNAs ( Supplementary Figure S3C and D ). We identified BS-adenosines for splicing of the Drosophila and human Sf3b3 -PEs according to Drosophila circular RNA-seq ( 43 ) and human BS data ( 40 ) ( Supplementary Figure S3E and F ), which are base-paired and located on the right sides of the stems. To address whether the BS-adenosines were limited in splicing, both Drosophila and human Sf3b3 - WT and mutant mini-genes were constructed and transfected into Drosophila S2 and human 293T cells, respectively. Two kinds of mutants disrupt the stem structure, including mutL with a 16-nt mutation on the left side and ΔL with a 23-nt deletion on the left side. By RT-PCR, only 29% of the Drosophila PE 5a (E5a) was spliced and included from the WT mini-gene, whereas 57% and 48% of E5a were spliced-in from the mutL and ΔL mini-genes, respectively (Figure 3A ). Similarly, only 60% of the human PE 11a (E11a) was spliced-in from the human WT mini-gene, whereas 87% and 94% of E11a were spliced-in from the human mutL and ΔL mini-genes, respectively (Figure 3B ). Splicing of E5a from two more mutants ( ΔupL and ΔlowL ) was also significantly increased, in which the left side was deleted in either the upper half or lower half to partially decrease the structural stability (Figure 3C , lanes 1–3). The half left-side deletion mutants showed more increased splicing of E5a than the whole left-side deletion mutant, implying that the whole deletion allows exposure of all the alternative BSs and results in BS competition, while half deletion only allows partial exposure and less BS competition and higher splicing activity. Splicing of the endogenous PE (11a) was also enhanced in 293T cells by a 2′- O -methylated RNA antisense oligo targeting the left side of the stem structure ( Supplementary Figure S3G ). 
Taken together, the conserved SF3B3 intronic structure inhibits splicing of its downstream PE in both Drosophila and human cells, and the structural stability is critical for AS of the PE. To verify that the right side of the stem has BS-adenosines, we further constructed several A-to-C mutations in the ΔlowL background. Splicing of E5a from two single-point ( A1c , A2c ) and one double-point ( 2c ) mutant mini-genes was not significantly changed. However, splicing of E5a was nearly totally abolished when all five As were mutated to Cs ( 5c ) (Figure 3C and Supplementary Figure S3H ), indicating that the right side of the stem contains multiple BS-adenosines for splicing of the Sf3b3 -PE. To further confirm this, compensatory mutations were made on the right side of the stem. Due to the existence of multiple BSs, new left-side mutations ( mutL9 ) were constructed for both the fly and human mini-genes. We found that compensatory changes on the right side of the stem significantly restore the splicing inhibition of the PEs 5a and 11a (Figure 3D and  E ).
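The inclusion percentages reported for the mini-gene assays (for example, 29% for E5a from the WT mini-gene) correspond to a percent-spliced-in value. The text does not give the quantification formula, so the following is only a sketch under the common assumption that inclusion is the intensity of the PE-included RT-PCR band divided by the sum of the included and skipped bands; the function name and the intensity inputs are hypothetical.

```python
def percent_spliced_in(inclusion_signal, skipping_signal):
    """Percent of transcripts that include the exon, from two band intensities.

    Assumes PSI = included / (included + skipped) * 100, with both
    intensities measured on the same gel lane in arbitrary units.
    """
    if inclusion_signal < 0 or skipping_signal < 0:
        raise ValueError("band intensities must be non-negative")
    total = inclusion_signal + skipping_signal
    if total == 0:
        raise ValueError("no signal in either band")
    return 100.0 * inclusion_signal / total


# Hypothetical intensities chosen to reproduce a 29% inclusion level.
wt_psi = percent_spliced_in(29.0, 71.0)
```

Because the two isoforms differ in length, a real quantification might also correct band intensity for amplicon size; that correction is omitted here.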

Destabilization of the secondary structure with concealed BSs enhances the AS of the Sf3b3 -PEs in both Drosophila and human cells. Disruption of the structure by mutation or deletion at the left side of the stem in a Drosophila sf3b3 mini-gene ( A ) and a human SF3B3 mini-gene ( B ) results in enhanced splicing of the PE 5a in S2 cells and PE 11a in 293T cells, respectively. ( C ) Mutations of the potential BS-adenosines at the right side of the stem in the Drosophila sf3b3 mini-gene result in the skipping of the PE (5a) in S2 cells. Compensatory mutation assays for both the human and Drosophila mini-genes. Splicing of the PEs (E5a and E11a) was nearly abolished in a Drosophila sf3b3 mini-gene ( D ) and a human SF3B3 mini-gene ( E ) when compensatory mutations were made to the mutated 9-nt in mutL9 . ( F , G ) In-cell validation of the SF3B3 intronic secondary structure by a SHAPE-MaP assay. Analyses of the modification reactivities caused by NAI-N3 reveal that the previously predicted intron region has low reactivity ( F ), and a model of the SF3B3 structure was constructed using the SHAPE-MaP reactivities as a constraint ( G ). The predicted structure is listed.

We also performed a SHAPE-MaP assay to validate this SF3B3 secondary structure in cells. Analyses of the modification reactivities caused by NAI-N3 ( 45 ) showed that the previously predicted intron region has low reactivity, especially on the right side of the stem (Figure 3F ), allowing a well-supported structural model to be constructed for SF3B3 intron 11 (Figure 3G ). These data experimentally validate the in-cell existence of the intronic SF3B3 secondary structure.

The stability of the Sf3b3 structure is sensitive to U2 and U2-associated proteins

To investigate which RNA-binding proteins (RBPs) are involved in the recognition of the SF3B3 structure and AS regulation of the PE, we searched two online resources: (i) human eCLIP data on ENCODE ( 52 ), from which two U2 snRNP components, SF3A3 and SF3B4, have binding sites exactly on the structure, and three splicing factors (SFs), AQR, U2AF2 and BUD13, on the PE and its 3′SS ( Supplementary Figure S4A ); and (ii) NCBI-deposited RNA-seq data of 56 RBP-RNAi in Drosophila S2 cells ( 53 ), from which RNA interference (RNAi) of 17 RBPs has effects on splicing of the Drosophila PE ( Supplementary Figure S4B ). Combining this information, we performed a wide screen of dsRNA-induced RNAi in S2 cells for 28 RBPs, which covers nearly all U2 proteins, representative U1 proteins and other RNA processing factors ( Supplementary Figure S5A and B ). We also tested the effects of the Sf3b3-KD on other U2 factors; many of them were either up- or downregulated ( Supplementary Figure S5C ), suggesting a cross-regulation between those RBPs. However, we did not observe splicing change of the Sf3b3 -PE with overexpression of other RBPs ( Supplementary Figure S5D ). RNAi of the early-stage SFs, including U1-70K (U1 component), U2A′, subunits of Sf3a and Sf3b (U2 components), U2af38, Caper and Tat-SF1 (U2-associated proteins), and an SR protein SREK1, significantly inhibited splicing of the endogenous E5a, whereas RNAi of the later-stage SFs or other RNA processing factors did not inhibit or only slightly improved it (Figure 4A and  B ). These results suggest that the PE splicing is sensitive to factors that are involved in early intron recognition and spliceosome assembly.

RNAi screening of RBPs that have effects on AS of the Sf3b3 -PE. ( A ) Knockdown of 28 RBPs using dsRNA in S2 cells to screen RBPs that have effects on AS of the endogenous Sf3b3 -PE (5a). ( B ) Quantitation of the endogenous Sf3b3 isoforms with spliced-in PE in RBP knocked down cells. ( C ) AS of the Sf3b3 -PE from WT and mutant mini-genes in RBP-KD S2 cells. ( D ) Quantitation of the Sf3b3 isoforms from mini-genes with the spliced-in PE. Statistical data are shown as mean ± standard error of the mean (SEM): * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001, ns: no significance.

To understand the contribution of stability of this structure, we tested the splicing of previous mini-genes under RNAi conditions. Similar to the endogenous Sf3b3 -PE, RNAi of U1-70K , U2af38 , Caper, Sf3a3 and Sf3b4 significantly inhibited splicing of E5a from the WT mini-gene, whereas RNAi of other SFs ( U4/U6-60K and BUD13 ) did not. However, the splicing was not inhibited by RNAi of U1-70K for pre-mRNAs from the mutL and ΔL mini-genes (Figure 4C and  D , and Supplementary Figure S5E ). Importantly, the inhibition was still present by RNAi of U2 or U2-related proteins, especially Sf3a3 and Sf3b4 (Figure 4C and  D ), indicating that the stability of the secondary structure is more sensitive to the levels of U2 and U2-associated proteins.

SF3B3 facilitates DNA repair and protects genome stability in response to UV irradiation

The SF3B3 protein has a unique domain composition. Unlike other SFs that typically have RNA binding motif(s), SF3B3 proteins have an MMS1_N domain and a CPSF_A domain (Figure 5A ) that usually appear in factors involved in DNA repair and 3′-end cleavage/polyadenylation, respectively ( 26 , 27 ). To investigate whether SF3B3 is involved in DNA repair, we altered its levels through overexpression (OE) or shRNA-induced knockdown (KD) in both human 293T and HaCaT cells ( Supplementary Figure S6C ). After UV irradiation, we performed a time course of cell growth and detected signals of the DNA damage indicator γ-H2AX. By using a CCK8 assay, we found that SF3B3-OE improved and SF3B3-KD impaired the survival of both cell lines ( Supplementary Figure S6A ). The total cellular γ-H2AX signal (shown by western blotting) was significantly decreased in the two SF3B3-OE cell lines, whereas it was significantly increased in the SF3B3-KD cells (Figure 5B and  C , and Supplementary Figure S6B ). Further immunostaining assays revealed that the number of γ-H2AX foci, which corresponds quantitatively to the number of double-strand breaks ( 54 ), was significantly decreased in the SF3B3-OE cells, but significantly increased in the SF3B3-KD cells (Figure 5D and  E , and Supplementary Figure S6D and E ). Immunostaining signals of the endogenous SF3B3 and γ-H2AX in the 293T cells indicated that they are partly colocalized ( Supplementary Figure S6F ).

SF3B3 facilitates DNA repair and protects genome stability in human 293T cells by enhancing the ERCC6/CSB–RNAPII (RNA polymerase II) interaction. ( A ) Schematics of the conserved domains in human and Drosophila SF3B3 proteins. Time courses of the cellular γ-H2AX shown by western blotting in 293T cells post-UV irradiation, in which SF3B3 was either overexpressed ( B ) or knocked down by shRNA ( C ). Immunostaining of γ-H2AX in the SF3B3-OE ( D ) or SF3B3-KD ( E ) 293T cells. Photos were taken at 3 h post-UV irradiation. Comet assays of genome stabilities in the SF3B3-OE ( F ) or shRNA knockdown ( G ) 293T cells with and without UV irradiation. ( H ) Comet assays in 293T cells with expression of truncated SF3B3s. Truncations of SF3B3 are indicated, and overexpressions of SF3B4 and DDB1 were used as controls. ( I ) Co-IP assays using the antibody against ERCC6/CSB in SF3B3-OE or SF3B3-KD 293T cells with or without UV irradiation. UV irradiation stimulated ERCC6/CSB to co-IP more SF3B3 and RNAPII, and the ERCC6/CSB–RNAPII interaction is enhanced by SF3B3-OE and inhibited by SF3B3-KD. ( J ) Overlapping of DEGs and differentially alternatively spliced events (DASs) between the SF3B3-OE, SF3B3-MMS1-OE and SF3B4-OE cells. Statistical data are shown as mean ± SEM: * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001, ns: no significance.

We then investigated the genome stability by a comet assay, which quantifies both single- and double-strand DNA breaks through in vitro single-cell gel electrophoresis ( 48 ). Before UV irradiation, all cells showed good genome stability, indicated by an intact nucleoid body concentrated in the comet head without detectable tail moments that represent the migrated fragments of damaged DNAs. Post-UV irradiation, the SF3B3-OE cells maintained better genome stabilities than the vector-only controls, showing fewer tail moments, while the SF3B3-KD cells maintained worse genome stabilities, showing significantly more tail moments (Figure 5F and  G , and Supplementary Figure S6G and H ). Together with the detection of the γ-H2AX signals, these data indicate that cells with different levels of SF3B3 have altered abilities in response to UV irradiation, with more SF3B3 resulting in stronger/faster DNA repair and genome stability. We further found that expression of the MMS1_N domain alone was sufficient to protect genome stability, similar to the full-length SF3B3-OE, showing fewer comet tail moments in the UV-irradiated cells, whereas expression of the CPSF_A domain did not (Figure 5H and Supplementary Figure S6I ). As controls, the overexpression of DDB1, which also has an MMS1_N and a CPSF_A domain, exhibited strong protection, whereas the overexpression of SF3B4, another subunit of the SF3b complex, exhibited no detectable protection.

SF3B3 facilitates ERCC6/CSB–RNAPII interaction in transcription-coupled DNA repair

To address how SF3B3 facilitates DNA repair, we performed co-IPs in human 293T cells using an antibody against ERCC6/CSB, which is a DNA-dependent ATPase and plays a critical role in transcription-coupled nucleotide excision repair (TC-NER). The results showed that ERCC6 co-IPed both endogenous and overexpressed SF3B3, and the interaction was enhanced by UV irradiation (Figure 5I , cf. odd and even lanes). Without UV, ERCC6/CSB co-IPed more RNAPII from the SF3B3-OE cells than its control cells (Figure 5I , cf. lanes 3 to 1) and less RNAPII from the SF3B3-KD cells than its control cells (cf. lanes 7 to 5). Post-UV irradiation, the ERCC6/CSB–RNAPII interaction was significantly enhanced, resulting in much more co-IPed RNAPII by the ERCC6/CSB antibody (cf. even to odd lanes), consistent with previous reports that ERCC6/CSB interacts loosely with the elongating RNAPII and becomes more tightly bound when transcription is arrested on DNA lesions ( 55 ). The enhanced ERCC6/CSB–RNAPII interaction was further increased by SF3B3-OE (cf. lanes 4 to 2) but decreased by SF3B3-KD (cf. lanes 8 to 6). We conclude that SF3B3 facilitates the interaction between ERCC6/CSB and RNAPII, which would enhance the recruitment of other TC-NER factors to the DNA lesions and thereby increase DNA repair activity.

To address whether SF3B3’s function in DNA repair is direct or indirect, we performed RNA-seq of the 293T cells with SF3B3-OE as well as controls of SF3B3-MMS1-OE and SF3B4-OE, and analyzed DEGs and DASs in those cells ( Supplementary Figure S7A and B ). First, none of the DEGs in the SF3B3-OE cells was from the 219 annotated genes that are involved in DNA repair pathways, whereas 2 and 7 such DEGs were found in the MMS1-OE and SF3B4-OE cells, respectively (Figure 5J , upper). Second, 19, 29 and 26 genes with DASs are DNA repair factors in the three overexpressed cells, respectively (Figure 5J , lower). Since our previous assays indicated that the SF3B4-OE has no effect on DNA repair and the MMS1-OE is sufficient to enhance DNA repair as the SF3B3-OE did, we focused on the genes that overlap only between the SF3B3-OE and MMS1-OE, which include NEIL1, POLL, RMI1 and XRCC3; none of them is in the TC-NER subpathway. In addition, AS changes of these four genes are totally different between the SF3B3-OE and MMS1-OE cells ( Supplementary Figure S7C ). Taken together, we concluded that the impacts on DNA repair by SF3B3-OE are not due to expression or splicing changes of DNA repair factors.

DNA repair is altered in the SF3B3 structure-mutated Drosophila and human cells

To investigate the in vivo function of the conserved SF3B3 secondary structure, we constructed two structure-mutated Drosophila strains using a CRISPR/Cas9-mediated system ( 22 ). As described for the previously studied mini-genes, the sf3b3-mutL Drosophila has mutations to disrupt the structure stability, and the sf3b3-5c has five A-to-C mutations to eliminate BS-adenosines (Figure 6A and Supplementary Figure S8A ). As expected, RT-PCR analyses of adults, heads, gonads and the third instar larva from the Drosophila females and males revealed that the PE inclusion was significantly increased in sf3b3 - mutL , but abolished in sf3b3 - 5c (Figure 6B and  C , and Supplementary Figure S8B ). We observed that Sf3b3 levels were significantly higher in female adults and gonads than in those of males from all WT and mutant strains, but levels were similar between the female and male larvae and heads, consistent with analyses of online RNA-seq data ( Supplementary Figure S8C ). Using a commercial SF3B3 antibody, western blotting confirmed that levels of Sf3b3 protein in the adults were significantly decreased in sf3b3 - mutL but increased in sf3b3 - 5c (Figure 6B ), indicating that in vivo mutations in the Sf3b3 structure alter AS of its downstream PE and thus protein levels. Unfortunately, this antibody and several polyclonal antibodies we made could not give signals from samples of the third instar larva and adult head.

Under regular conditions, the fly mutants did not exhibit obvious developmental phenotypes in hatching, pupation, eclosion or fecundity (Supplementary Figure S8D–G). To address whether in vivo DNA repair activity was altered in these mutants, we established a UV-resistance assay using third instar larvae. Compared to the WT, the mutL animals showed ∼28% decreased survival post-UV irradiation; in contrast, the 5c animals showed ∼32% increased survival (Figure 6D). These opposite changes in UV resistance are consistent with the Sf3b3 levels of the two mutants in vivo. Further analysis revealed that, post-UV irradiation, sf3b3-mutL exhibited stronger and longer-lasting signals of γ-h2AvD, the Drosophila homolog of γ-H2AX, than the WT, whereas sf3b3-5c exhibited weaker and shorter-lived signals (Figure 6E).

We also constructed a ΔL cell line from human 293T cells (Figure 6F and Supplementary Figure S8H ) to disrupt the human SF3B3 structure using a CRISPR/Cas9-mediated system ( 46 ). Similar to the mutL fly, the ΔL cells exhibited more PE inclusion and less SF3B3 protein (Figure 6G and Supplementary Figure S8H ), and stronger and longer γ-H2AX signals post-UV irradiation (Figure 6G and  H ). Comet assays showed that ΔL had more tail moments post-UV irradiation, indicating worse genome stability (Figure 6I ). These data indicate that in vivo DNA repair activity is regulated by alteration of the SF3B3 level through mutations in the conserved secondary structures with concealed BSs.

AS plays a critical role in differentiation, development and diseases, and it is of great importance to understand its regulatory mechanisms. AS is usually determined by the existence and accessibility of cis-elements on the transcript and the existence of active trans-factors. Besides the primary sequences, RNA secondary structures can regulate AS either by bringing distant SSs together to promote splicing or by blocking the accessibility of cis-elements to suppress their interaction with trans-factors. Regulatory RNA structures near the BS have been found sporadically, including structures between the BS and 3′SS in S. cerevisiae genes (56–58), and a stem–loop structure that encompasses the BS and 3′SS in the human growth hormone transcript (59). In this study, we perform the first large-scale survey in transcriptomes and find >4000 concealed BSs in human, and hundreds of them in five other species.

The most exciting example of these RNA structures is found in animal SF3B3 genes, where the structure is colocalized with a downstream PE. Conservation of PEs in homologous genes in vertebrates has been shown to be important for the regulation of gene functions (60). Here, we experimentally validate the structure in cells and demonstrate that the SF3B3 structure contains BS-adenosines on the right side of the stem. The stable secondary structure blocks the usage of BS-adenosines for splicing of the PE; however, the destabilized structure activates the BS-adenosines and allows for more efficient splicing (Figure 7 left). Inclusion of the PE introduces a PTC into the transcript, triggering the NMD pathway, whereas skipping of the PE results in a full-length mRNA isoform for translation. Thus, AS regulation of the PE is critical for balancing the two RNA isoforms and the eventual level of the functional protein.
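The isoform logic described above can be illustrated with a small sketch. It is not the authors' analysis: the sequences are made-up toy exons rather than SF3B3 sequences, the function names are hypothetical, and the check relies on the commonly cited heuristic that a stop codon lying roughly 50 nt or more upstream of the last exon–exon junction is treated as premature, which is an assumption brought in here for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(transcript):
    """Return the 0-based index of the first in-frame stop codon, or None.

    The reading frame is assumed to start at position 0 of the transcript.
    """
    for i in range(0, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(exons, min_distance=50):
    """Heuristic PTC check (assumed '50-nt rule', not taken from the paper).

    A stop codon counts as premature when its end lies at least
    `min_distance` nucleotides upstream of the last exon-exon junction.
    """
    transcript = "".join(exons)
    stop = first_stop_codon(transcript)
    if stop is None:
        return False
    last_junction = len(transcript) - len(exons[-1])
    return last_junction - (stop + 3) >= min_distance


# Toy exons (hypothetical sequences, each a multiple of 3 nt long).
EXON_UP = "ATG" + "GCT" * 20            # upstream coding exon, no stop codon
POISON = "GCTGCT" + "TAA" + "GCT" * 20  # poison exon carrying an in-frame stop
EXON_DOWN = "GCT" * 20 + "TGA"          # last exon with the normal stop codon

skipped = [EXON_UP, EXON_DOWN]            # PE-skipped isoform
included = [EXON_UP, POISON, EXON_DOWN]   # PE-included isoform

print("PE skipped, premature stop:", has_premature_stop(skipped))
print("PE included, premature stop:", has_premature_stop(included))
```

In this toy case only the PE-included isoform is flagged, mirroring the description that PE inclusion yields an NMD target while PE skipping yields the translatable full-length mRNA.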

Conserved intronic RNA secondary structure with concealed BS regulates AS of the SF3B3-PE and modulates the transcription-coupled DNA repair. Left: Conserved intronic RNA secondary structure with concealed BS regulates AS of its downstream PE. The stable secondary structure blocks the usage of the BS-adenosines, while the destabilized structure allows for efficient splicing of the downstream PE. The PE-skipped isoform will be translated into a functional protein, while the PE-included isoform will be degraded through the NMD pathway. The AS balance of the two RNA isoforms determines the level of protein. Right: Regulated by the most conserved secondary structure with concealed BSs in animal introns, SF3B3 facilitates DNA repair and protects genome stability by enhancing the interaction between ERCC6/CSB and the arrested RNAPII on the UV irradiation-damaged DNA, as well as functioning in the pre-mRNA splicing. TC-NER, transcription-coupled nucleotide excision repair; CSA and CSB are key factors in the TC-NER subpathway. The MMS1 domain in SF3B3 is sufficient for facilitating DNA repair.

Interestingly, of the 41 structures with concealed BSs that are conserved in human and mouse genes (Figure 1E), 7 are colocalized with PEs, 8 with EEs and 26 with in-frame exons, untranslated regions (UTRs) or retained introns (Supplementary Figure S9A–D). As with the SF3B3-PE, those PEs and EEs lie immediately downstream or upstream of the concealed BSs, implying a common regulatory mechanism for splicing of cassette exons whose inclusion or skipping will disrupt the coding sequences. We believe that future BS-related sequencing, especially sequencing of enriched lariats in various tissues from multiple organisms, will identify more BSs on a large scale with accurate locations and enable us to further investigate how cassette exons are alternatively spliced.

As one of the highly conserved subunits in the 17S U2 snRNP, SF3B3 functions during the early stages of intron recognition and spliceosome assembly (Figure 7 upper right). It has been reported that SF3B3 is involved in the regulation of transcription, interacting with the STAGA and TFTC complexes ( 24 , 25 ), and the knockdown of SF3B3 is linked with the ionizing radiation- and hydroxyurea-triggered DNA repair through homologous recombination ( 28 ). Here, we find that, in addition to splicing, SF3B3 facilitates DNA repair and protects genome stability in response to UV irradiation, and its MMS1 domain is sufficient to fulfill this function (Figure 7 lower right). Importantly, the enhanced PE-skipping fly mutant ( 5c ) exhibits stronger UV resistance and faster γ-h2AvD disappearance, whereas the enhanced PE-inclusion fly mutant ( mutL ) exhibits the opposite phenotypes. The enhanced PE-inclusion human mutant cell shows a decreased genome stability and slower γ-H2AX disappearance. Therefore, AS of SF3B3 -PE is regulated by the intronic structure with concealed BSs, modulating DNA repair in vivo .

RNA splicing is coupled with transcription, and the two processes function together as a delicate network to tune gene expression (61, 62). Two recognition subpathways have been identified for nucleotide excision repair: global genomic repair and transcription-coupled repair. Here, we reveal that SF3B3 facilitates TC-NER by enhancing the interaction between ERCC6/CSB and the arrested RNAPII on DNA lesions caused by UV irradiation (Figure 7 lower right). RNA-seq data from the SF3B3-OE, SF3B3-MMS1-OE and SF3B4-OE human cells (Figure 5J, Supplementary Figure S7 and Supplementary Table S6) indicate that the SF3B3-facilitated DNA repair occurs through the SF3B3 protein directly.

The next interesting question is how the PE-associated secondary structures are recognized and regulated. RNA structures themselves may act as targets for regulatory proteins or small molecules that mediate AS (37). Our data suggest that the stability of the SF3B3 structure is sensitive to U2 and U2-associated proteins, implying that fluctuation of U2-related proteins during differentiation and development would affect the recognition or stability of the SF3B3 structure and consequently influence AS. The levels of U2 proteins do differ among cell types (The Human Protein Atlas, https://proteinatlas.org/), as do the levels of U2 snRNA variant isoforms (63).

Several flexible secondary structures in UTRs have been found to regulate mRNA translation in response to environmental stimuli or stress (64). Our preliminary data suggest that AS of the SF3B3-PE is enhanced when fruit flies are cultured at high temperatures, especially in males (Supplementary Figure S9E). In future studies, it will be interesting to determine which environmental stimuli or stresses can destabilize the SF3B3 secondary structure.

Next-generation sequencing data have been submitted to the Gene Expression Omnibus (accession number GSE223117).

Supplementary Data are available at NAR Online.

The authors are grateful to Charles Query for his insightful comments, to Wei Shao for reagents, to Sophie Xu for her English editing and to other members of the Xu lab for discussions and critical reading of the manuscript.

Author contributions : H.L., Z.D., Y.-J.F. and Y.-Z.X. conceived the project and designed the experiments. H.L., Z.D., Z.-Y.F., N.L. and H.-Y.A. performed the experiments and data analyses. Z.D. performed the bioinformatic analyses and Y.Z. performed the SHAPE-MaP analysis. H.L., Z.D., Y.-J.F. and Y.-Z.X. wrote the manuscript.

National Key Research and Development Program of China [2021YFA1100500 and 2021YFC2700700]; National Natural Science Foundation of China [31971225, 32261133522, 31570821 and 91440109]. Funding for open access charge: National Key Research and Development Program of China; National Natural Science Foundation of China.

Conflict of interest statement . All authors declare that they have no conflicts of interest.

Black   D.L.   Mechanisms of alternative pre-messenger RNA splicing . Annu. Rev. Biochem.   2003 ; 72 : 291 – 336 .

Ule   J. , Blencowe   B.J.   Alternative splicing regulatory networks: functions, mechanisms, and evolution . Mol. Cell . 2019 ; 76 : 329 – 345 .

Djebali   S. , Davis   C.A. , Merkel   A. , Dobin   A. , Lassmann   T. , Mortazavi   A. , Tanzer   A. , Lagarde   J. , Lin   W. , Schlesinger   F.  et al. .   Landscape of transcription in human cells . Nature . 2012 ; 489 : 101 – 108 .

Kastner   B. , Will   C.L. , Stark   H. , Luhrmann   R.   Structural insights into nuclear pre-mRNA splicing in higher eukaryotes . Cold Spring Harb. Perspect. Biol.   2019 ; 11 : a032417 .

Wan   R. , Bai   R. , Zhan   X. , Shi   Y.   How is precursor messenger RNA spliced by the spliceosome? . Annu. Rev. Biochem.   2020 ; 89 : 333 – 358 .

Yang   F. , Wang   X.Y. , Zhang   Z.M. , Pu   J. , Fan   Y.J. , Zhou   J. , Query   C.C. , Xu   Y.Z.   Splicing proofreading at 5′ splice sites by ATPase Prp28p . Nucleic Acids Res.   2013 ; 41 : 4660 – 4670 .

Zhang   Z. , Will   C.L. , Bertram   K. , Dybkov   O. , Hartmuth   K. , Agafonov   D.E. , Hofele   R. , Urlaub   H. , Kastner   B. , Luhrmann   R.  et al. .   Molecular architecture of the human 17S U2 snRNP . Nature . 2020 ; 583 : 310 – 313 .

Smith   D.J. , Query   C.C. , Konarska   M.M.   “Nought may endure but mutability”: spliceosome dynamics and the regulation of splicing . Mol. Cell . 2008 ; 30 : 657 – 666 .

Gao   K. , Masuda   A. , Matsuura   T. , Ohno   K.   Human branch point consensus sequence is yUnAy . Nucleic Acids Res.   2008 ; 36 : 2257 – 2267 .

Taggart   A.J. , DeSimone   A.M. , Shih   J.S. , Filloux   M.E. , Fairbrother   W.G.   Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo . Nat. Struct. Mol. Biol.   2012 ; 19 : 719 – 721 .

Awan   A.R. , Manfredo   A. , Pleiss   J.A.   Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans . Proc. Natl Acad. Sci. U.S.A.   2013 ; 110 : 12762 – 12767 .

Mercer   T.R. , Clark   M.B. , Andersen   S.B. , Brunck   M.E. , Haerty   W. , Crawford   J. , Taft   R.J. , Nielsen   L.K. , Dinger   M.E. , Mattick   J.S.   Genome-wide discovery of human splicing branchpoints . Genome Res.   2015 ; 25 : 290 – 303 .

Zeng   Y. , Fair   B.J. , Zeng   H. , Krishnamohan   A. , Hou   Y. , Hall   J.M. , Ruthenburg   A.J. , Li   Y.I. , Staley   J.P.   Profiling lariat intermediates reveals genetic determinants of early and late co-transcriptional splicing . Mol. Cell . 2022 ; 82 : 4681 – 4699 .

Bertram   K. , Agafonov   D.E. , Dybkov   O. , Haselbach   D. , Leelaram   M.N. , Will   C.L. , Urlaub   H. , Kastner   B. , Luhrmann   R. , Stark   H.   Cryo-EM structure of a pre-catalytic human spliceosome primed for activation . Cell . 2017 ; 170 : 701 – 713 .

Will   C.L. , Urlaub   H. , Achsel   T. , Gentzel   M. , Wilm   M. , Luhrmann   R.   Characterization of novel SF3b and 17S U2 snRNP proteins, including a human Prp5p homologue and an SF3b DEAD-box protein . EMBO J.   2002 ; 21 : 4978 – 4988 .

Zhan   X. , Yan   C. , Zhang   X. , Lei   J. , Shi   Y.   Structures of the human pre-catalytic spliceosome and its precursor spliceosome . Cell Res.   2018 ; 28 : 1129 – 1140 .

Plaschka   C. , Lin   P.C. , Nagai   K.   Structure of a pre-catalytic spliceosome . Nature . 2017 ; 546 : 617 – 621 .

Gozani   O. , Potashkin   J. , Reed   R.   A potential role for U2AF–SAP 155 interactions in recruiting U2 snRNP to the branch site . Mol. Cell. Biol.   1998 ; 18 : 4752 – 4760 .

Will   C.L. , Schneider   C. , MacMillan   A.M. , Katopodis   N.F. , Neubauer   G. , Wilm   M. , Luhrmann   R. , Query   C.C.   A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site . EMBO J.   2001 ; 20 : 4536 – 4546 .

Tholen   J. , Razew   M. , Weis   F. , Galej   W.P.   Structural basis of branch site recognition by the human spliceosome . Science . 2022 ; 375 : 50 – 57 .

Papaemmanuil   E. , Cazzola   M. , Boultwood   J. , Malcovati   L. , Vyas   P. , Bowen   D. , Pellagatti   A. , Wainscoat   J.S. , Hellstrom-Lindberg   E. , Gambacorti-Passerini   C.  et al. .   Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts . N. Engl. J. Med.   2011 ; 365 : 1384 – 1395 .

Zhang   B. , Ding   Z. , Li   L. , Xie   L.K. , Fan   Y.J. , Xu   Y.Z.   Two oppositely-charged sf3b1 mutations cause defective development, impaired immune response, and aberrant selection of intronic branch sites in Drosophila . PLoS Genet.   2021 ; 17 : e1009861 .

Martinez   E. , Palhan   V.B. , Tjernberg   A. , Lymar   E.S. , Gamper   A.M. , Kundu   T.K. , Chait   B.T. , Roeder   R.G.   Human STAGA complex is a chromatin-acetylating transcription coactivator that interacts with pre-mRNA splicing and DNA damage-binding factors in vivo . Mol. Cell. Biol.   2001 ; 21 : 6782 – 6795 .

Herbst   D.A. , Esbin   M.N. , Louder   R.K. , Dugast-Darzacq   C. , Dailey   G.M. , Fang   Q. , Darzacq   X. , Tjian   R. , Nogales   E.   Structure of the human SAGA coactivator complex . Nat. Struct. Mol. Biol.   2021 ; 28 : 989 – 996 .

Brand   M. , Moggs   J.G. , Oulad-Abdelghani   M. , Lejeune   F. , Dilworth   F.J. , Stevenin   J. , Almouzni   G. , Tora   L.   UV-damaged DNA-binding protein in the TFTC complex links DNA damage recognition to nucleosome acetylation . EMBO J.   2001 ; 20 : 3187 – 3196 .

Clerici   M. , Faini   M. , Muckenfuss   L.M. , Aebersold   R. , Jinek   M.   Structural basis of AAUAAA polyadenylation signal recognition by the human CPSF complex . Nat. Struct. Mol. Biol.   2018 ; 25 : 135 – 138 .

Scrima   A. , Konickova   R. , Czyzewski   B.K. , Kawasaki   Y. , Jeffrey   P.D. , Groisman   R. , Nakatani   Y. , Iwai   S. , Pavletich   N.P. , Thoma   N.H.   Structural basis of UV DNA-damage recognition by the DDB1–DDB2 complex . Cell . 2008 ; 135 : 1213 – 1223 .

Tanikawa   M. , Sanjiv   K. , Helleday   T. , Herr   P. , Mortusewicz   O.   The spliceosome U2 snRNP factors promote genome stability through distinct mechanisms; transcription of repair factors and R-loop processing . Oncogenesis . 2016 ; 5 : e280 .

Kim   P. , Yang   M. , Yiya   K. , Zhao   W. , Zhou   X.   ExonSkipDB: functional annotation of exon skipping event in human . Nucleic Acids Res.   2020 ; 48 : D896 – D907 .

Pervouchine   D. , Popov   Y. , Berry   A. , Borsari   B. , Frankish   A. , Guigo   R.   Integrative transcriptomic analysis suggests new autoregulatory splicing events coupled with nonsense-mediated mRNA decay . Nucleic Acids Res.   2019 ; 47 : 5293 – 5306 .

Voskobiynyk   Y. , Battu   G. , Felker   S.A. , Cochran   J.N. , Newton   M.P. , Lambert   L.J. , Kesterson   R.A. , Myers   R.M. , Cooper   G.M. , Roberson   E.D.  et al. .   Aberrant regulation of a poison exon caused by a non-coding variant in a mouse model of Scn1a-associated epileptic encephalopathy . PLoS Genet.   2021 ; 17 : e1009195 .

Leclair   N.K. , Brugiolo   M. , Urbanski   L. , Lawson   S.C. , Thakar   K. , Yurieva   M. , George   J. , Hinson   J.T. , Cheng   A. , Graveley   B.R.  et al. .   Poison exon splicing regulates a coordinated network of SR protein expression during differentiation and tumorigenesis . Mol. Cell . 2020 ; 80 : 648 – 665 .

Lorson   C.L. , Androphy   E.J.   An exonic enhancer is required for inclusion of an essential exon in the SMA-determining gene SMN . Hum. Mol. Genet.   2000 ; 9 : 259 – 265 .

Marquez   Y. , Mantica   F. , Cozzuto   L. , Burguera   D. , Hermoso-Pulido   A. , Ponomarenko   J. , Roy   S.W. , Irimia   M.   ExOrthist: a tool to infer exon orthologies at any evolutionary distance . Genome Biol.   2021 ; 22 : 239 .

Pillmann   H. , Hatje   K. , Odronitz   F. , Hammesfahr   B. , Kollmar   M.   Predicting mutually exclusive spliced exons based on exon length, splice site and reading frame conservation, and exon sequence homology . BMC Bioinformatics . 2011 ; 12 : 270 .

Urbanski   L.M. , Leclair   N. , Anczukow   O.   Alternative-splicing defects in cancer: splicing regulators and their downstream targets, guiding the way to novel cancer therapeutics . Wiley Interdiscip. Rev. RNA . 2018 ; 9 : e1476 .

Xu   B. , Meng   Y. , Jin   Y.   RNA structures in alternative splicing and back-splicing . Wiley Interdiscip. Rev. RNA . 2021 ; 12 : e1626 .

Yang   Y. , Zhan   L. , Zhang   W. , Sun   F. , Wang   W. , Tian   N. , Bi   J. , Wang   H. , Shi   D. , Jiang   Y.  et al. .   RNA secondary structure in mutually exclusive splicing . Nat. Struct. Mol. Biol.   2011 ; 18 : 159 – 168 .

Georgakopoulos-Soares   I. , Parada   G.E. , Wong   H.Y. , Medhi   R. , Furlan   G. , Munita   R. , Miska   E.A. , Kwok   C.K. , Hemberg   M.   Alternative splicing modulation by G-quadruplexes . Nat. Commun.   2022 ; 13 : 2404 .

Pineda   J.M.B. , Bradley   R.K.   Most human introns are recognized via multiple and tissue-specific branchpoints . Genes Dev.   2018 ; 32 : 577 – 591 .

Paggi   J.M. , Bejerano   G.   A sequence-based, deep learning model accurately predicts RNA splicing branchpoints . RNA . 2018 ; 24 : 1647 – 1658 .

Lorenz   R. , Bernhart   S.H. , Honer Zu Siederdissen   C. , Tafer   H. , Flamm   C. , Stadler   P.F. , Hofacker   I.L.   ViennaRNA package 2.0 . Algorithms Mol. Biol.   2011 ; 6 : 26 .

Fan   Y.J. , Ding   Z. , Zhang   Y. , Su   R. , Yue   J.L. , Liang   A.M. , Huang   Q.W. , Meng   Y.R. , Li   M. , Xue   Y.  et al. .   Sex-lethal regulates back-splicing and generation of the sex-differentially expressed circular RNAs . Nucleic Acids Res.   2023 ; 51 : 5228 – 5241 .

Smola   M.J. , Weeks   K.M.   In-cell RNA structure probing with SHAPE-MaP . Nat. Protoc.   2018 ; 13 : 1181 – 1195 .

Wang   J.X. , Zhang   Y. , Zhang   T. , Tan   W.T. , Lambert   F. , Darmawan   J. , Huber   R. , Wan   Y.   RNA structure profiling at single-cell resolution reveals new determinants of cell identity . Nat. Methods . 2024 ; 21 : 411 – 422 .

Paquet   D. , Kwart   D. , Chen   A. , Sproul   A. , Jacob   S. , Teo   S. , Olsen   K.M. , Gregg   A. , Noggle   S. , Tessier-Lavigne   M.   Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9 . Nature . 2016 ; 533 : 125 – 129 .

Oh   K.S. , Bustin   M. , Mazur   S.J. , Appella   E. , Kraemer   K.H.   UV-induced histone H2AX phosphorylation and DNA damage related proteins accumulate and persist in nucleotide excision repair-deficient XP-B cells . DNA Repair (Amst.) . 2011 ; 10 : 5 – 15 .

Lu   Y. , Liu   Y. , Yang   C.   Evaluating in vitro DNA damage using comet assay . J. Vis. Exp.   2017 ; 2017 : 56450 .

Liao   Y. , Smyth   G.K. , Shi   W.   featureCounts: an efficient general purpose program for assigning sequence reads to genomic features . Bioinformatics . 2014 ; 30 : 923 – 930 .

Love   M.I. , Huber   W. , Anders   S.   Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 . Genome Biol.   2014 ; 15 : 550 .

Shen   S. , Park   J.W. , Lu   Z.X. , Lin   L. , Henry   M.D. , Wu   Y.N. , Zhou   Q. , Xing   Y.   rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data . Proc. Natl Acad. Sci. U.S.A.   2014 ; 111 : E5593 – E5601 .

Van Nostrand   E.L. , Freese   P. , Pratt   G.A. , Wang   X. , Wei   X. , Xiao   R. , Blue   S.M. , Chen   J.Y. , Cody   N.A.L. , Dominguez   D.  et al. .   A large-scale binding and functional map of human RNA-binding proteins . Nature . 2020 ; 583 : 711 – 719 .

Brooks   A.N. , Duff   M.O. , May   G. , Yang   L. , Bolisetty   M. , Landolin   J. , Wan   K. , Sandler   J. , Booth   B.W. , Celniker   S.E.  et al. .   Regulation of alternative splicing in Drosophila by 56 RNA binding proteins . Genome Res.   2015 ; 25 : 1771 – 1780 .

Rothkamm   K. , Lobrich   M.   Evidence for a lack of DNA double-strand break repair in human cells exposed to very low X-ray doses . Proc. Natl Acad. Sci. U.S.A.   2003 ; 100 : 5057 – 5062 .

van Gool   A.J. , Citterio   E. , Rademakers   S. , van Os   R. , Vermeulen   W. , Constantinou   A. , Egly   J.M. , Bootsma   D. , Hoeijmakers   J.H.   The Cockayne syndrome B protein, involved in transcription-coupled DNA repair, resides in an RNA polymerase II-containing complex . EMBO J.   1997 ; 16 : 5955 – 5965 .

Gahura   O. , Hammann   C. , Valentova   A. , Puta   F. , Folk   P.   Secondary structure is required for 3′ splice site recognition in yeast . Nucleic Acids Res.   2011 ; 39 : 9759 – 9767 .

Halfter   H. , Gallwitz   D.   Impairment of yeast pre-mRNA splicing by potential secondary structure-forming sequences near the conserved branchpoint sequence . Nucleic Acids Res.   1988 ; 16 : 10413 – 10423 .

Meyer   M. , Plass   M. , Perez-Valle   J. , Eyras   E. , Vilardell   J.   Deciphering 3′ss selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing . Mol. Cell . 2011 ; 43 : 1033 – 1039 .

Estes   P.A. , Cooke   N.E. , Liebhaber   S.A.   A native RNA secondary structure controls alternative splice-site selection and generates two human growth hormone isoforms . J. Biol. Chem.   1992 ; 267 : 14902 – 14908 .

Thomas   J.D. , Polaski   J.T. , Feng   Q. , De Neef   E.J. , Hoppe   E.R. , McSharry   M.V. , Pangallo   J. , Gabel   A.M. , Belleville   A.E. , Watson   J.  et al. .   RNA isoform screens uncover the essentiality and tumor-suppressor activity of ultraconserved poison exons . Nat. Genet.   2020 ; 52 : 84 – 94 .

Alpert   T. , Herzel   L. , Neugebauer   K.M.   Perfect timing: splicing and transcription rates in living cells . Wiley Interdiscip. Rev. RNA . 2017 ; 8 : e1401 .

Shao   W. , Ding   Z. , Zheng   Z.Z. , Shen   J.J. , Shen   Y.X. , Pu   J. , Fan   Y.J. , Query   C.C. , Xu   Y.Z.   Prp5–Spt8/Spt3 interaction mediates a reciprocal coupling between splicing and transcription . Nucleic Acids Res.   2020 ; 48 : 5799 – 5813 .

Kosmyna   B. , Gupta   V. , Query   C.   Transcriptional analysis supports the expression of human snRNA variants and reveals U2 snRNA homeostasis by an abundant U2 variant . 2020 ; bioRxiv doi: 25 January 2020, preprint: not peer reviewed https://doi.org/10.1101/2020.01.24.917260 .

Reis   R.S. , Deforges   J. , Schmidt   R.R. , Schippers   J.H.M. , Poirier   Y.   An antisense noncoding RNA enhances translation via localized structural rearrangements of its cognate mRNA . Plant Cell . 2021 ; 33 : 1381 – 1397 .


  • Online ISSN 1362-4962
  • Print ISSN 0305-1048
  • Copyright © 2024 Oxford University Press

Protecting against researcher bias in secondary data analysis: challenges and potential solutions

  • Open access
  • Published: 13 January 2022
  • Volume 37 , pages 1–10, ( 2022 )


  • Jessie R. Baldwin   ORCID: orcid.org/0000-0002-5703-5058 1 , 2 ,
  • Jean-Baptiste Pingault 1 , 2 ,
  • Tabea Schoeler 1 ,
  • Hannah M. Sallis 3 , 4 , 5 &
  • Marcus R. Munafò 3 , 4 , 6  


Analysis of secondary data sources (such as cohort studies, survey data, and administrative records) has the potential to provide answers to science and society’s most pressing questions. However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases, it presents challenges for secondary data analysis. In this article, we describe these challenges and propose novel solutions and alternative approaches. Proposed solutions include approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) help ensure that pre-registered analyses will be appropriate for the data, and (4) address difficulties arising from reduced analytic flexibility in pre-registration. For each solution, we provide guidance on implementation for researchers and data guardians. The adoption of these practices can help to protect against researcher bias in secondary data analysis, to improve the robustness of research based on existing data.


Introduction

Secondary data analysis has the potential to provide answers to science and society’s most pressing questions. An abundance of secondary data exists, including cohort studies, surveys, administrative data (e.g., health records, crime records, census data), financial data, and environmental data, that can be analysed by researchers in academia, industry, third-sector organisations, and the government. However, secondary data analysis is vulnerable to questionable research practices (QRPs) which can distort the evidence base. These QRPs include p-hacking (i.e., exploiting analytic flexibility to obtain statistically significant results), selective reporting of statistically significant, novel, or “clean” results, and hypothesising after the results are known (HARK-ing, i.e., presenting unexpected results as if they were predicted) [1]. Indeed, findings obtained from secondary data analysis are not always replicable [2, 3], reproducible [4], or robust to analytic choices [5, 6]. Preventing QRPs in research based on secondary data is therefore critical for scientific and societal progress.

A primary cause of QRPs is common cognitive biases that affect the analysis, reporting, and interpretation of data [ 7 – 10 ]. For example, apophenia (the tendency to see patterns in random data) and confirmation bias (the tendency to focus on evidence that is consistent with one’s beliefs) can lead to particular analytical choices and selective reporting of “publishable” results [ 11 – 13 ]. In addition, hindsight bias (the tendency to view past events as predictable) can lead to HARK-ing, so that observed results appear more compelling.

The scope for these biases to distort research outputs from secondary data analysis is perhaps particularly acute, for two reasons. First, researchers now have increasing access to high-dimensional datasets that offer a multitude of ways to analyse the same data [ 6 ]. Such analytic flexibility can lead to different conclusions depending on the analytical choices made [ 5 , 14 , 15 ]. Second, current incentive structures in science reward researchers for publishing statistically significant, novel, and/or surprising findings [ 16 ]. This combination of opportunity and incentive may lead researchers—consciously or unconsciously—to run multiple analyses and only report the most “publishable” findings.
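The cost of running many analyses and reporting only the significant ones can be made concrete with a short sketch. It is an illustration rather than anything taken from the article, and it assumes that all null hypotheses are true and that the tests are independent, so that each p-value is uniformly distributed between 0 and 1; analyses of the same dataset are usually correlated, so real inflation will differ. The function names are hypothetical.

```python
import random


def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one p < alpha across n_tests independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def simulated_error_rate(n_tests, alpha=0.05, n_studies=20000, seed=0):
    """Monte Carlo check: the share of simulated 'studies' in which the
    smallest of n_tests null p-values falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under the null, each p-value is a uniform draw on [0, 1).
        smallest_p = min(rng.random() for _ in range(n_tests))
        if smallest_p < alpha:
            hits += 1
    return hits / n_studies


for k in (1, 5, 10, 20):
    print(f"{k:2d} analyses: analytic {familywise_error_rate(k):.3f}, "
          f"simulated {simulated_error_rate(k):.3f}")
```

Under these assumptions, ten analyses already give about a 40% chance of at least one nominally significant result, which is why reporting only the "publishable" analysis can be so misleading.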

One way to help protect against the effects of researcher bias is to pre-register research plans [ 17 , 18 ]. This can be achieved by pre-specifying the rationale, hypotheses, methods, and analysis plans, and submitting these to either a third-party registry (e.g., the Open Science Framework [OSF]; https://osf.io/ ), or a journal in the form of a Registered Report [ 19 ]. Because research plans and hypotheses are specified before the results are known, pre-registration reduces the potential for cognitive biases to lead to p-hacking, selective reporting, and HARK-ing [ 20 ]. While pre-registration is not necessarily a panacea for preventing QRPs (Table 1 ), meta-scientific evidence has found that pre-registered studies and Registered Reports are more likely to report null results [ 21 – 23 ], smaller effect sizes [ 24 ], and be replicated [ 25 ]. Pre-registration is increasingly being adopted in epidemiological research [ 26 , 27 ], and is even required for access to data from certain cohorts (e.g., the Twins Early Development Study [ 28 ]). However, pre-registration (and other open science practices; Table 2 ) can pose particular challenges to researchers conducting secondary data analysis [ 29 ], motivating the need for alternative approaches and solutions. Here we describe such challenges, before proposing potential solutions to protect against researcher bias in secondary data analysis (summarised in Fig.  1 ).

Fig. 1. Challenges in pre-registering secondary data analysis and potential solutions (according to researcher motivations). Note: In the “Potential solution” column, blue boxes indicate solutions that are researcher-led; green boxes indicate solutions that should be facilitated by data guardians.

Challenges of pre-registration for secondary data analysis

Prior knowledge of the data

Researchers conducting secondary data analysis commonly analyse data from the same dataset multiple times throughout their careers. However, prior knowledge of the data increases the risk of bias, as prior expectations about findings could motivate researchers to pursue certain analyses or questions. In the worst-case scenario, a researcher might perform multiple preliminary analyses, and only pursue those which lead to notable results (perhaps posting a pre-registration for these analyses, even though it is effectively post hoc). Even if the researcher has not conducted specific analyses previously, they may still be biased (either consciously or subconsciously) to pursue certain analyses after testing related questions with the same variables, or even after reading past studies on the dataset. As such, pre-registration cannot fully protect against researcher bias when researchers have previously accessed the data.

Research may not be hypothesis-driven

Pre-registration and Registered Reports are tailored towards hypothesis-driven, confirmatory research. For example, the OSF pre-registration template requires researchers to state “specific, concise, and testable hypotheses”, while Registered Reports do not permit purely exploratory research [ 30 ], although a new Exploratory Reports format now exists [ 31 ]. However, much research involving secondary data is not focused on hypothesis testing, but is exploratory, descriptive, or focused on estimation—in other words, examining the magnitude and robustness of an association as precisely as possible, rather than simply testing a point null. Furthermore, without a strong theoretical background, hypotheses will be arbitrary and could lead to unhelpful inferences [ 32 , 33 ], and so should be avoided in novel areas of research.

Pre-registered analyses are not appropriate for the data

With pre-registration, there is always a risk that the data will violate the assumptions of the pre-registered analyses [ 17 ]. For example, a researcher might pre-register a parametric test, only for the data to be non-normally distributed. However, in secondary data analysis, the extent to which the data shape the appropriate analysis can be considerable. First, longitudinal cohort studies are often subject to missing data and attrition. Approaches to deal with missing data (e.g., listwise deletion; multiple imputation) depend on the characteristics of missing data (e.g., the extent and patterns of missingness [ 34 ]), and so pre-specifying approaches to dealing with missingness may be difficult, or extremely complex. Second, certain analytical decisions depend on the nature of the observed data (e.g., the choice of covariates to include in a multiple regression might depend on the collinearity between the measures, or the degree of missingness of different measures that capture the same construct). Third, much secondary data (e.g., electronic health records and other administrative data) were never collected for research purposes, so can present several challenges that are impossible to predict in advance [ 35 ]. These issues can limit a researcher’s ability to pre-register a precise analytic plan prior to accessing secondary data.

Lack of flexibility in data analysis

Concerns have been raised that pre-registration limits flexibility in data analysis, including justifiable exploration [ 36 – 38 ]. For example, by requiring researchers to commit to a pre-registered analysis plan, pre-registration could prevent researchers from exploring novel questions (with a hypothesis-free approach), conducting follow-up analyses to investigate notable findings [ 39 ], or employing newly published methods with advantages over those pre-registered. While this concern is also likely to apply to primary data analysis, it is particularly relevant to certain fields involving secondary data analysis, such as genetic epidemiology, where new methods are rapidly being developed [ 40 ], and follow-up analyses are often required (e.g., in a genome-wide association study to further investigate the role of a genetic variant associated with a phenotype). However, this concern is perhaps over-stated – pre-registration does not preclude unplanned analyses; it simply makes it more transparent that these analyses are post hoc. Nevertheless, another understandable concern is that reduced analytic flexibility could lead to difficulties in publishing papers and accruing citations. For example, pre-registered studies are more likely to report null results [ 22 , 23 ], likely due to reduced analytic flexibility and selective reporting. While this is a positive outcome for research integrity, null results are less likely to be published [ 13 , 41 , 42 ] and cited [ 11 ], which could disadvantage researchers’ careers.

In this section, we describe potential solutions to address the challenges involved in pre-registering secondary data analysis, including approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) ensure that pre-planned analyses will be appropriate for the data, and (4) address potential difficulties arising from reduced analytic flexibility.

Challenge: Prior knowledge of the data

Declare prior access to data

To increase transparency about potential biases arising from knowledge of the data, researchers could routinely report all prior data access in a pre-registration [ 29 ]. This would ideally include evidence from an independent gatekeeper (e.g., a data guardian of the study) stating whether data and relevant variables were accessed by each co-author. To facilitate this process, data guardians could set up a central “electronic checkout” system that records which researchers have accessed data, what data were accessed, and when [ 43 ]. The researcher or data guardian could then provide links to the checkout histories for all co-authors in the pre-registration, to verify their prior data access. If it is not feasible to provide such objective evidence, authors could self-certify their prior access to the dataset and, where possible, relevant variables, preferably listing any publications and in-preparation studies based on the dataset [ 29 ]. Of course, self-certification relies on trust that researchers will accurately report prior data access, which could be challenging if the study involves a large number of authors, or authors who have been involved in many studies on the dataset. However, it is likely to be the most feasible option at present, as many datasets do not have available electronic records of data access. For further guidance on self-certifying prior data access when pre-registering secondary data analysis studies on a third-party registry (e.g., the OSF), we recommend referring to the template by Van den Akker, Weston, et al. [ 29 ].
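As a rough illustration of what such a checkout record might contain, the sketch below stores who accessed which variables and when, and summarises prior access for each co-author. The `AccessRecord` class, the `prior_access_report` function, and the researcher and variable names are hypothetical and are not taken from any existing checkout system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessRecord:
    """One entry in a hypothetical data checkout log."""
    researcher: str
    variables: tuple
    accessed_on: date


def prior_access_report(log, co_authors, relevant_variables):
    """Summarise, per co-author, which relevant variables were accessed and when first."""
    relevant = set(relevant_variables)
    report = {}
    for author in co_authors:
        # Keep only this author's records that touch at least one relevant variable.
        hits = [r for r in log
                if r.researcher == author and relevant & set(r.variables)]
        accessed = {v for r in hits for v in r.variables if v in relevant}
        report[author] = {
            "variables": sorted(accessed),
            "first_access": min((r.accessed_on for r in hits), default=None),
        }
    return report


# Illustrative log entries (invented for this example).
log = [
    AccessRecord("Researcher A", ("bmi", "depression_score"), date(2019, 3, 1)),
    AccessRecord("Researcher A", ("smoking",), date(2020, 6, 15)),
    AccessRecord("Researcher B", ("income",), date(2021, 1, 10)),
]
report = prior_access_report(
    log, ["Researcher A", "Researcher B"], ["depression_score", "smoking"]
)
for author, summary in report.items():
    print(author, summary)
```

A summary of this kind could be linked from the pre-registration in place of, or alongside, self-certification.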

The extent to which prior access to data renders pre-registration invalid is debatable. On the one hand, even if data have been accessed previously, pre-registration is likely to reduce QRPs by encouraging researchers to commit to a pre-specified analytic strategy. On the other hand, pre-registration does not fully protect against researcher bias where data have already been accessed, and can lend added credibility to study claims, which may be unfounded. Reporting prior data access in a pre-registration is therefore important to make these potential biases transparent, so that readers and reviewers can judge the credibility of the findings accordingly. However, for a more rigorous solution which protects against researcher bias in the context of prior data access, researchers should consider adopting a multiverse approach.

Conduct a multiverse analysis

A multiverse analysis involves identifying all potential analytic choices that could justifiably be made to address a given research question (e.g., different ways to code a variable, combinations of covariates, and types of analytic model), implementing them all, and reporting the results [ 44 ]. Notably, this method differs from the traditional approach in which findings from only one analytic method are reported. It is conceptually similar to a sensitivity analysis, but it is far more comprehensive, as often hundreds or thousands of analytic choices are reported, rather than a handful. By showing the results from all defensible analytic approaches, multiverse analysis reduces scope for selective reporting and provides insight into the robustness of findings against analytical choices (for example, if there is a clear convergence of estimates, irrespective of most analytical choices). For causal questions in observational research, Directed Acyclic Graphs (DAGs) could be used to inform selection of covariates in multiverse approaches [ 45 ] (i.e., to ensure that confounders, rather than mediators or colliders, are controlled for).

Specification curve analysis [ 46 ] is a form of multiverse analysis that has been applied to examine the robustness of epidemiological findings to analytic choices [ 6 , 47 ]. Specification curve analysis involves three steps: (1) identifying all analytic choices – termed “specifications”, (2) displaying the results graphically with magnitude of effect size plotted against analytic choice, and (3) conducting joint inference across all results. When applied to the association between digital technology use and adolescent well-being [ 6 ], specification curve analysis showed that the (small, negative) association diminished after accounting for adequate control variables and recall bias – demonstrating the sensitivity of results to analytic choices.
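The first two steps can be sketched in a few lines of Python. The example below is a minimal illustration on simulated data, not the procedure used in the cited studies: it treats every subset of two hypothetical covariates (`c1`, `c2`) as a specification, estimates the exposure coefficient by ordinary least squares, and sorts the estimates as they would be ordered in a specification curve plot. The third step (joint inference) is not implemented, and in practice the set of specifications should be limited to defensible choices (e.g., informed by a DAG).

```python
import itertools
import random
import statistics


def ols_coefficients(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    # and solved by Gauss-Jordan elimination with partial pivoting.
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [v - factor * p for v, p in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def run_multiverse(exposure, outcome, covariates):
    """Fit one model per subset of covariates; return (specification, estimate) pairs."""
    names = sorted(covariates)
    results = []
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            cols = [exposure] + [covariates[name] for name in subset]
            beta = ols_coefficients(cols, outcome)
            results.append((subset, beta[1]))  # beta[1] is the exposure coefficient
    # Sorting by estimate gives the ordering used in a specification curve plot.
    return sorted(results, key=lambda item: item[1])


# Simulated data: c1 confounds the x-y association; c2 is unrelated noise.
rng = random.Random(1)
n = 500
c1 = [rng.gauss(0, 1) for _ in range(n)]
c2 = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * a + rng.gauss(0, 1) for a in c1]
y = [0.2 * xi + 0.5 * a + rng.gauss(0, 1) for xi, a in zip(x, c1)]

specs = run_multiverse(x, y, {"c1": c1, "c2": c2})
for subset, estimate in specs:
    print(subset or ("no covariates",), round(estimate, 3))
print("median estimate:", round(statistics.median(e for _, e in specs), 3))
```

In this simulation the estimates that omit the confounder are inflated relative to those that adjust for it, which is the kind of sensitivity to analytic choices that a specification curve is meant to expose.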

Despite the benefits of the multiverse approach in addressing analytic flexibility, it is not without limitations. First, because each analytic choice is treated as equally valid, including less justifiable models could bias the results away from the truth. Second, the choice of specifications can be biased by prior knowledge (e.g., a researcher may choose to omit a covariate to obtain a particular result). Third, multiverse analysis may not entirely prevent selective reporting (e.g., if the full range of results are not reported), although pre-registering multiverse approaches (and specifying analytic choices) could mitigate this. Last, and perhaps most importantly, multiverse analysis is technically challenging (e.g., when there are hundreds or thousands of analytic choices) and can be impractical for complex analyses, very large datasets, or when computational resources are limited. However, this burden can be somewhat reduced by tutorials and packages which are being developed to standardise the procedure and reduce computational time [see 48 , 49 ].

Challenge: Research may not be hypothesis-driven

Pre-register research questions and conditions for interpreting findings

Observational research arguably does not need to have a hypothesis to benefit from pre-registration. For studies that are descriptive or focused on estimation, we recommend pre-registering research questions, analysis plans, and criteria for interpretation. Analytic flexibility will be limited by pre-registering specific research questions and detailed analysis plans, while post hoc interpretation will be limited by pre-specifying criteria for interpretation [ 50 ]. The potential for HARK-ing will also be minimised because readers can compare the published study to the original pre-registration, where a-priori hypotheses were not specified.

Detailed guidance on how to pre-register research questions and analysis plans for secondary data is provided in the tutorial by Van den Akker et al. [ 29 ]. To pre-specify conditions for interpretation, it is important to anticipate – as much as possible – all potential findings, and state how each would be interpreted. For example, suppose that a researcher aims to test a causal relationship between X and Y using a multivariate regression model with longitudinal data. Assuming that all potential confounders have been fully measured and controlled for (albeit a strong assumption) and statistical power is high, three broad sets of results and interpretations could be pre-specified. First, an adjusted association between X and Y that is similar in magnitude to the unadjusted association would be consistent with a causal relationship. Second, an association between X and Y that is attenuated after controlling for confounders would suggest that the relationship is partly causal and partly confounded. Third, a minimal, non-statistically significant adjusted association would suggest a lack of evidence for a causal effect of X on Y. Depending on the context of the study, criteria could also be provided on the threshold (or range of thresholds) at which the effect size would justify different interpretations [ 51 ], be considered practically meaningful, or the smallest effect size of interest for equivalence tests [ 52 ]. While researcher biases might still affect the pre-registered criteria for interpreting findings (e.g., toward over-interpreting a small effect size as meaningful), this bias will at least be transparent in the pre-registration.
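One way to make such criteria unambiguous is to write them down as an explicit rule before seeing the results. The sketch below encodes the three scenarios above as a function; the thresholds (`smallest_effect_of_interest`, `similar_ratio`) are hypothetical placeholders that a researcher would need to justify for their own study, and results falling outside the three scenarios are flagged rather than forced into one of them.

```python
def interpret(unadjusted, adjusted, ci_low, ci_high,
              smallest_effect_of_interest=0.05, similar_ratio=0.8):
    """Map an adjusted estimate (with its confidence interval) to a pre-specified interpretation.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    excludes_zero = ci_low > 0 or ci_high < 0
    # Scenario 3: minimal and not statistically significant.
    if abs(adjusted) < smallest_effect_of_interest and not excludes_zero:
        return "lack of evidence for a causal effect"
    # Anything non-significant, or with a sign change after adjustment,
    # is not covered by the three pre-specified scenarios.
    if not excludes_zero or adjusted * unadjusted <= 0:
        return "inconclusive: outside the pre-specified criteria"
    # Scenario 1: adjusted estimate retains most of the unadjusted magnitude.
    if abs(adjusted) >= similar_ratio * abs(unadjusted):
        return "consistent with a causal relationship"
    # Scenario 2: clearly attenuated but still distinguishable from zero.
    return "partly causal and partly confounded"


print(interpret(0.30, 0.28, 0.20, 0.36))
print(interpret(0.30, 0.12, 0.04, 0.20))
print(interpret(0.30, 0.01, -0.07, 0.09))
```

Writing the rule as code (or as an equivalent table in the pre-registration) leaves little room for post hoc reinterpretation, although the choice of thresholds remains a judgement call.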

Use a holdout sample to delineate exploratory and confirmatory research

Where researchers wish to integrate exploratory research into a pre-registered, confirmatory study, a holdout sample approach can be used [ 18 ]. Creating a holdout sample refers to the process of randomly splitting the dataset into two parts, often referred to as ‘training’ and ‘holdout’ datasets. To delineate exploratory and confirmatory research, researchers can first conduct exploratory data analysis on the training dataset (which should comprise a moderate fraction of the data, e.g., 35% [ 53 ]). Based on the results of the discovery process, researchers can pre-register hypotheses and analysis plans to formally test on the holdout dataset. This process has parallels with cross-validation in machine learning, in which the dataset is split and the model is developed on the training dataset, before being tested on the test dataset. The approach enables a flexible discovery process, before formally testing discoveries in a non-biased way.

When considering whether to use the holdout sample approach, three points should be noted. First, because the training dataset is not reusable, there will be a reduced sample size and loss of power relative to analysing the whole dataset. As such, the holdout sample approach will only be appropriate when the original dataset is large enough to provide sufficient power in the holdout dataset. Second, when the training dataset is used for exploration, subsequent confirmatory analyses on the holdout dataset may be overfitted (due to both datasets being drawn from the same sample), so replication in independent samples is recommended. Third, the holdout dataset should be created by an independent data manager or guardian, to ensure that the researcher does not have knowledge of the full dataset. However, it is straightforward to randomly split a dataset into a holdout and training sample and we provide example R code at: https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Holdout_script.md .
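For readers working outside R, the same idea can be sketched in Python. This is an illustrative analogue under simple assumptions (one row per participant, identified by an ID), not a translation of the linked script; the function name `split_holdout` and its defaults are hypothetical. As noted above, the split would ideally be run by an independent data manager, who releases only the training IDs until the pre-registration is submitted.

```python
import random


def split_holdout(participant_ids, training_fraction=0.35, seed=2024):
    """Randomly split IDs into an exploratory 'training' set and a confirmatory 'holdout' set."""
    if not 0 < training_fraction < 1:
        raise ValueError("training_fraction must be between 0 and 1")
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # a fixed seed makes the split reproducible
    n_training = round(len(ids) * training_fraction)
    return ids[:n_training], ids[n_training:]


training_ids, holdout_ids = split_holdout(range(1, 1001))
print(len(training_ids), len(holdout_ids))
```

Recording the seed alongside the pre-registration allows the split to be reproduced and audited later.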

Challenge: Pre-registered analyses are not appropriate for the data

Use blinding to test proposed analyses

One method to help ensure that pre-registered analyses will be appropriate for the data is to trial the analyses on a blinded dataset [ 54 ], before pre-registering. Data blinding involves obscuring the data values or labels prior to data analysis, so that the proposed analyses can be trialled on the data without observing the actual findings. Various types of blinding strategies exist [ 54 ], but one method that is appropriate for epidemiological data is “data scrambling” [ 55 ]. This involves randomly shuffling the data points so that any associations between variables are obscured, whilst the variable distributions (and amounts of missing data) remain the same. We provide a tutorial for how to implement this in R (see https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Data_scrambling_tutorial.md ). Ideally the data scrambling would be done by a data guardian who is independent of the research, to ensure that the main researcher does not access the data prior to pre-registering the analyses. Once the researcher is confident with the analyses, the study can be pre-registered, and the analyses conducted on the unscrambled dataset.

Blinded analysis offers several advantages for ensuring that pre-registered analyses are appropriate, with some limitations. First, blinded analysis allows researchers to directly check the distribution of variables and amounts of missingness, without having to make assumptions about the data that may not be met, or spend time planning contingencies for every possible scenario. Second, blinded analysis prevents researchers from gaining insight into the potential findings prior to pre-registration, because associations between variables are masked. However, because of this, blinded analysis does not enable researchers to check for collinearity, predictors of missing data, or other covariances that may be necessary for model specification. As such, blinded analysis will be most appropriate for researchers who wish to check the data distribution and amounts of missingness before pre-registering.
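The core of data scrambling can be illustrated with a short Python sketch. It assumes a dataset held as a dictionary of columns with `None` marking missing values; the function name `scramble` and the toy data are hypothetical, and this is not the linked R tutorial. Each column is shuffled independently, so the values in each column (including the number of missing values) are unchanged while the links between columns are broken.

```python
import random
from collections import Counter


def scramble(dataset, seed=None):
    """Return a copy of the dataset with every column shuffled independently."""
    rng = random.Random(seed)
    scrambled = {}
    for name, values in dataset.items():
        shuffled = list(values)  # copy, so the original data are not modified
        rng.shuffle(shuffled)
        scrambled[name] = shuffled
    return scrambled


# Toy data (invented for this example); None marks a missing value.
data = {
    "exposure": [0, 1, 1, 0, 1, None],
    "outcome": [2.1, 3.4, None, 1.9, 3.8, 2.5],
}
blinded = scramble(data, seed=7)

for name in data:
    # Marginal distributions and amounts of missing data are preserved.
    print(name, Counter(blinded[name]) == Counter(data[name]))
```

Because rows no longer correspond to real participants after scrambling, any association estimated on the blinded data is uninformative about the true findings.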

Trial analyses on a dataset excluding the outcome

Another method to help ensure that pre-registered analyses will be appropriate for the data is to trial analyses on a dataset excluding outcome data. For example, data managers could provide researchers with part of the dataset containing the exposure variable(s) plus any covariates and/or auxiliary variables. The researcher can then trial and refine the analyses ahead of pre-registering, without gaining insight into the main findings (which require the outcome data). This approach is used to mitigate bias in propensity score matching studies [ 26 , 56 ], as researchers use data on the exposure and covariates to create matched groups, prior to accessing any outcome data. Once the exposed and non-exposed groups have been matched effectively, researchers pre-register the protocol ahead of viewing the outcome data. Notably though, this approach could help researchers to identify and address other analytical challenges involving secondary data. For example, it could be used to check multivariable distributional characteristics, test for collinearity between multiple predictor variables, or identify predictors of missing data for multiple imputation.

This approach offers certain benefits for researchers keen to ensure that pre-registered analyses are appropriate for the observed data, with some limitations. Regarding benefits, researchers will be able to examine associations between variables (excluding the outcome), unlike the data scrambling approach described above. This would be helpful for checking certain assumptions (e.g., collinearity or characteristics of missing data such as whether it is missing at random). In addition, the approach is easy to implement, as the dataset can be initially created without the outcome variable, which can then be added after pre-registration, minimising burden on data guardians. Regarding limitations, it is possible that accessing variables in advance could provide some insight into the findings. For example, if a covariate is known to be highly correlated with the outcome, testing the association between the covariate and the exposure could give some indication of the relationship between the exposure and the outcome. To make this potential bias transparent, researchers should report the variables that they already accessed in the pre-registration. Another limitation is that researchers will not be able to identify analytical issues relating to the outcome data in advance of pre-registration. Therefore, this approach will be most appropriate where researchers wish to check various characteristics of the exposure variable(s) and covariates, rather than the outcome. However, a “mixed” approach could be applied in which outcome data is provided in scrambled format, to enable researchers to also assess distributional characteristics of the outcome. This would substantially reduce the number of potential challenges to be considered in pre-registered analytical pipelines.
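A minimal sketch of this workflow is shown below, assuming a dataset held as a dictionary of columns. The function names and the toy values are hypothetical. The data guardian releases a copy without the outcome, and the researcher can then check, for example, collinearity between two covariates before pre-registering.

```python
import math


def without_outcome(dataset, outcome_names):
    """Return a copy of the dataset with the outcome column(s) removed."""
    return {name: list(values) for name, values in dataset.items()
            if name not in outcome_names}


def pearson(a, b):
    """Pearson correlation between two equal-length numeric lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data (invented for this example).
full = {
    "exposure": [1, 2, 3, 4, 5, 6],
    "age": [30, 34, 29, 41, 38, 45],
    "income": [20, 25, 19, 33, 30, 36],
    "outcome": [3, 4, 4, 6, 7, 9],
}

released = without_outcome(full, {"outcome"})  # what the researcher receives
print(sorted(released))
print("age-income correlation:", round(pearson(released["age"], released["income"]), 2))
```

In this toy example the two covariates are very highly correlated, which might prompt the researcher to pre-register only one of them, or a composite, before the outcome data are released.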

Pre-register a decision tree

If it is not possible to access any of the data prior to pre-registering (e.g., to enable analyses to be trialled on a dataset that is blinded or missing outcome data), researchers could pre-register a decision tree. This defines the sequence of analyses and rules based on characteristics of the observed data [ 17 ]. For example, the decision tree could specify testing a normality assumption, and based on the results, whether to use a parametric or non-parametric test. Ideally, the decision tree should provide a contingency plan for each of the planned analyses, if assumptions are not fulfilled. Of course, it can be challenging and time consuming to anticipate every potential issue with the data and plan contingencies. However, investing time into pre-specifying a decision tree (or a set of contingency plans) could save time should issues arise during data analysis, and can reduce the likelihood of deviating from the pre-registration.
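A decision tree of this kind can be written as an explicit rule that is fixed before data access. The sketch below is a deliberately simple illustration: it uses sample skewness with a hypothetical cut-off (`max_abs_skew`) as a stand-in for a formal normality test, and returns the name of the test that would then be run. A real pre-registration would specify the actual normality check and thresholds to be used.

```python
import statistics


def sample_skewness(values):
    """Population skewness: the mean cubed z-score of the values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def choose_test(group_a, group_b, max_abs_skew=1.0):
    """Apply the pre-registered rule and return the name of the planned test."""
    worst_skew = max(abs(sample_skewness(g)) for g in (group_a, group_b))
    if worst_skew > max_abs_skew:
        return "Mann-Whitney U test"  # marked skew: use the non-parametric branch
    return "Welch's t-test"  # otherwise: use the parametric branch


symmetric = list(range(1, 41))
skewed = [1] * 35 + [50] * 5

print(choose_test(symmetric, symmetric))
print(choose_test(symmetric, skewed))
```

Because the rule is deterministic, a reader can verify from the pre-registration and the data which branch should have been followed.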

Challenge: Lack of flexibility in data analysis

Transparently report unplanned analyses

Unplanned analyses (such as applying new methods or conducting follow-up tests to investigate an interesting or unexpected finding) are a natural and often important part of the scientific process. Despite common misconceptions, pre-registration does not prevent such unplanned analyses from being included, as long as they are transparently reported as post-hoc. If there are methodological deviations, we recommend that researchers should (1) clearly state the reasons for using the new method, and (2) if possible, report results from both methods, to ideally show that the change in methods was not due to the results [ 57 ]. This information can either be provided in the manuscript or in an update to the original pre-registration (e.g., on the third-party registry such as the OSF), which can be useful when journal word limits are tight. Similarly, if researchers wish to include additional follow-up analyses to investigate an interesting or unexpected finding, this should be reported but labelled as “exploratory” or “post-hoc” in the manuscript.

Ensure a paper’s value does not depend on statistically significant results

Researchers may be concerned that reduced analytic flexibility from pre-registration could increase the likelihood of reporting null results [ 22 , 23 ], which are harder to publish [ 13 , 42 ]. To address this, we recommend taking steps to ensure that the value and success of a study do not depend on a significant p-value. First, methodologically strong research (e.g., with high statistical power, valid and reliable measures, robustness checks, and replication samples) will advance the field, whatever the findings. Second, methods can be applied to allow for the interpretation of statistically non-significant findings (e.g., Bayesian methods [ 58 ] or equivalence tests, which determine whether an observed effect is surprisingly small [ 52 , 59 , 60 ]). This means that the results will be informative whatever they show, in contrast to approaches relying solely on null hypothesis significance testing, where statistically non-significant findings cannot be interpreted as meaningful. Third, researchers can submit the proposed study as a Registered Report, where it will be evaluated before the results are available. This is arguably the strongest way to protect against publication bias, as in-principle study acceptance is granted without any knowledge of the results. In addition, Registered Reports can improve the methodology, as suggestions from expert reviewers can be incorporated into the pre-registered protocol.
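To illustrate how an equivalence test makes a non-significant result interpretable, the sketch below implements the two one-sided tests (TOST) procedure using a normal approximation. It assumes that an effect estimate and its standard error are already available and that equivalence bounds (the smallest effect size of interest) were pre-specified; the function name and the numbers in the example are hypothetical.

```python
from statistics import NormalDist


def tost_equivalence(estimate, standard_error, lower_bound, upper_bound, alpha=0.05):
    """Two one-sided z-tests of whether the effect lies inside (lower_bound, upper_bound)."""
    z = NormalDist()
    # H0: effect <= lower_bound; a small p-value suggests the effect is above the lower bound.
    p_lower = 1 - z.cdf((estimate - lower_bound) / standard_error)
    # H0: effect >= upper_bound; a small p-value suggests the effect is below the upper bound.
    p_upper = z.cdf((estimate - upper_bound) / standard_error)
    p_value = max(p_lower, p_upper)  # both one-sided tests must reject
    return p_value, p_value < alpha


# Illustrative numbers: a small estimate with bounds of +/- 0.10.
p_value, equivalent = tost_equivalence(0.02, 0.03, -0.10, 0.10)
print(round(p_value, 4), equivalent)
```

If both one-sided tests reject, the effect can be described as statistically smaller than the smallest effect size of interest, which is a more informative conclusion than a non-significant p-value alone.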

Under a system that rewards novel and statistically significant findings, it is easy for subconscious human biases to lead to QRPs. However, researchers, along with data guardians, journals, funders, and institutions, have a responsibility to ensure that findings are reproducible and robust. While pre-registration can help to limit analytic flexibility and selective reporting, it involves several challenges for epidemiologists conducting secondary data analysis. The approaches described here aim to address these challenges (Fig.  1 ), to either improve the efficacy of pre-registration or provide an alternative approach to address analytic flexibility (e.g., a multiverse analysis). The responsibility in adopting these approaches should not only fall on researchers’ shoulders; data guardians also have an important role to play in recording and reporting access to data, providing blinded datasets and hold-out samples, and encouraging researchers to pre-register and adopt these solutions as part of their data request. Furthermore, wider stakeholders could incentivise these practices; for example, journals could provide a designated space for researchers to report deviations from the pre-registration, and funders could provide grants to establish best practice at the cohort level (e.g., data checkout systems, blinded datasets). Ease of adoption is key to ensure wide uptake, and we therefore encourage efforts to evaluate, simplify and improve these practices. Steps that could be taken to evaluate these practices are presented in Box 1.

More broadly, it is important to emphasise that researcher biases do not operate in isolation, but rather in the context of wider publication bias and a “publish or perish” culture. These incentive structures not only promote QRPs [ 61 ], but also discourage researchers from pre-registering and adopting other time-consuming reproducible methods. Therefore, in addition to targeting bias at the individual researcher level, wider initiatives from journals, funders, and institutions are required to address these institutional biases [ 7 ]. Systemic changes that reward rigorous and reproducible research will help researchers to provide unbiased answers to science and society’s most important questions.

Box 1. Evaluation of approaches

To evaluate, simplify and improve approaches to protect against researcher bias in secondary data analysis, the following steps could be taken.

Co-creation workshops to refine approaches

To obtain feedback on the approaches (including on any practical concerns or feasibility issues) co-creation workshops could be held with researchers, data managers, and wider stakeholders (e.g., journals, funders, and institutions).

Empirical research to evaluate efficacy of approaches

To evaluate the effectiveness of the approaches in preventing researcher bias and/or improving pre-registration, empirical research is needed. For example, to test the extent to which the multiverse analysis can reduce selective reporting, comparisons could be made between effect sizes from multiverse analyses versus effect sizes from meta-analyses (of non-pre-registered studies) addressing the same research question. If smaller effect sizes were found in multiverse analyses, it would suggest that the multiverse approach can reduce selective reporting. In addition, to test whether providing a blinded dataset or dataset missing outcome variables could help researchers develop an appropriate analytical protocol, researchers could be randomly assigned to receive such a dataset (or no dataset), prior to pre-registration. If researchers who received such a dataset had fewer eventual deviations from the pre-registered protocol (in the final study), it would suggest that this approach can help ensure that proposed analyses are appropriate for the data.

Pilot implementation of the measures

To assess the practical feasibility of the approaches, data managers could pilot measures for users of the dataset (e.g., required pre-registration for access to data, provision of datasets that are blinded or missing outcome variables). Feedback could then be collected from researchers and data managers about the experience and ease of use.

Kerr NL. HARKing: Hypothesizing after the results are known. Pers Soc Psychol Rev. 1998;2(3):196–217.

Border R, Johnson EC, Evans LM, et al. No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression across multiple large samples. Am J Psychiatry. 2019;176(5):376–87.

Duncan LE, Keller MC. A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. Am J Psychiatry. 2011;168(10):1041–9.

Seibold H, Czerny S, Decke S, et al. A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses. PLoS ONE. 2021;16(6):e0251194. https://doi.org/10.1371/journal.pone.0251194 .

Botvinik-Nezer R, Holzmeister F, Camerer CF, et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature. 2020;582:84–8.

Orben A, Przybylski AK. The association between adolescent well-being and digital technology use. Nat Hum Behav. 2019;3(2):173.

Munafò MR, Nosek BA, Bishop DV, et al. A manifesto for reproducible science. Nat Hum Behav. 2017;1(1):0021.

Nuzzo R. How scientists fool themselves–and how they can stop. Nature News. 2015;526(7572):182.

Bishop DV. The psychology of experimental psychologists: Overcoming cognitive constraints to improve research: The 47th Sir Frederic Bartlett lecture. Q J Exp Psychol. 2020;73(1):1–19.

Greenland S. Invited commentary: The need for cognitive science in methodology. Am J Epidemiol. 2017;186(6):639–45.

De Vries Y, Roest A, de Jonge P, Cuijpers P, Munafò M, Bastiaansen J. The cumulative effect of reporting and citation biases on the apparent efficacy of treatments: The case of depression. Psychol Med. 2018;48(15):2453–5.

Nickerson RS. Confirmation bias: A ubiquitous phenomenon in many guises. Rev Gen Psychol. 1998;2(2):175–220.

Franco A, Malhotra N, Simonovits G. Publication bias in the social sciences: Unlocking the file drawer. Science. 2014;345(6203):1502–5.

Silberzahn R, Uhlmann EL, Martin DP, et al. Many analysts, one data set: Making transparent how variations in analytic choices affect results. Adv Methods Pract Psychol Sci. 2018;1(3):337–56.

Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22(11):1359–66.

Metcalfe J, Wheat K, Munafò M, Parry J. Research integrity: A landscape study. UK Research and Innovation; 2020.

Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci. 2018;115(11):2600–6.

Wagenmakers E-J, Wetzels R, Borsboom D, van der Maas HL, Kievit RA. An agenda for purely confirmatory research. Perspect Psychol Sci. 2012;7(6):632–8.

Chambers CD. Registered reports: A new publishing initiative at Cortex. Cortex. 2013;49(3):609–10.

Nosek BA, Beck ED, Campbell L, et al. Preregistration is hard, and worthwhile. Trends Cogn Sci. 2019;23(10):815–8.

Kaplan RM, Irvin VL. Likelihood of null effects of large NHLBI clinical trials has increased over time. PLoS One. 2015;10(8):e0132382.

Allen C, Mehler DM. Open science challenges, benefits and tips in early career and beyond. PLoS Biol. 2019;17(5):e3000246.

Scheel AM, Schijen MR, Lakens D. An excess of positive results: Comparing the standard psychology literature with registered reports. Adv Methods Pract Psychol Sci. 2021;4(2):25152459211007468.

Schäfer T, Schwarz MA. The meaningfulness of effect sizes in psychological research: differences between sub-disciplines and the impact of potential biases. Front Psychol. 2019;10:813.

Protzko J, Krosnick J, Nelson LD, et al. High replicability of newly-discovered social-behavioral findings is achievable. PsyArXiv. 2020. doi: https://doi.org/10.31234/osf.io/n2a9x

Small DS, Firth D, Keele L, et al. Protocol for a study of the effect of surface mining in central appalachia on adverse birth outcomes. arXiv.org. 2020

Deshpande SK, Hasegawa RB, Weiss J, Small DS. Protocol for an observational study on the effects of playing football in adolescence on mental health in early adulthood. arXiv preprint 2018

Twins Early Development Study. TEDS Data Access Policy: 6. Pre-registration of analysis. https://www.teds.ac.uk/researchers/teds-data-access-policy#preregistration . Accessed 18 March 2021

Van den Akker O, Weston SJ, Campbell L, et al. Preregistration of secondary data analysis: a template and tutorial. PsyArXiv. 2019. doi: https://doi.org/10.31234/osf.io/hvfmr

Chambers C, Tzavella L. Registered reports: past, present and future. MetaArXiv. 2020. doi: https://doi.org/10.31222/osf.io/43298

McIntosh RD. Exploratory reports: A new article type for cortex. Cortex. 2017;96:A1–4.

Scheel AM, Tiokhin L, Isager PM, Lakens D. Why hypothesis testers should spend less time testing hypotheses. Perspect Psychol Sci. 2020;16(4):744–55.

Colhoun HM, McKeigue PM, Smith GD. Problems of reporting genetic associations with complex outcomes. Lancet. 2003;361(9360):865–72.

Hughes RA, Heron J, Sterne JAC, Tilling K. Accounting for missing data in statistical analyses: Multiple imputation is not always the answer. Int J Epidemiol. 2019;48(4):1294–304. https://doi.org/10.1093/ije/dyz032 .

Article   PubMed   PubMed Central   Google Scholar  

Goldstein BA. Five analytic challenges in working with electronic health records data to support clinical trials with some solutions. Clin Trials. 2020;17(4):370–6.

Goldin-Meadow S. Why preregistration makes me nervous. APS Observer. 2016;29(7).

Lash TL. Preregistration of study protocols is unlikely to improve the yield from our science, but other strategies might. Epidemiology. 2010;21(5):612–3. https://doi.org/10.1097/EDE.0b013e3181e9bba6 .

Article   PubMed   Google Scholar  

Lawlor DA. Quality in epidemiological research: should we be submitting papers before we have the results and submitting more hypothesis-generating research? Int J Epidemiol. 2007;36(5):940–3.

Vandenbroucke JP. Preregistration of epidemiologic studies: An ill-founded mix of ideas. Epidemiology. 2010;21(5):619–20.

Pingault J-B, O’reilly PF, Schoeler T, Ploubidis GB, Rijsdijk F, Dudbridge F. Using genetic data to strengthen causal inference in observational research. Nat Rev Genet. 2018;19(9):566.

Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2012;90(3):891–904.

Greenwald AG. Consequences of prejudice against the null hypothesis. Psychol Bull. 1975;82(1):1.

Scott KM, Kline M. Enabling confirmatory secondary data analysis by logging data checkout. Adv Methods Pract Psychol Sci. 2019;2(1):45–54. https://doi.org/10.1177/2515245918815849 .

Article   Google Scholar  

Steegen S, Tuerlinckx F, Gelman A, Vanpaemel W. Increasing transparency through a multiverse analysis. Perspect Psychol Sci. 2016;11(5):702–12.

Del Giudice M, Gangestad SW. A traveler’s guide to the multiverse: Promises, pitfalls, and a framework for the evaluation of analytic decisions. Adv Methods Pract Psychol Sci. 2021;4(1):2515245920954925.

Simonsohn U, Simmons JP, Nelson LD. Specification curve: descriptive and inferential statistics on all reasonable specifications. SSRN. 2015. https://doi.org/10.2139/ssrn.2694998 .

Rohrer JM, Egloff B, Schmukle SC. Probing birth-order effects on narrow traits using specification-curve analysis. Psychol Sci. 2017;28(12):1821–32.

Masur P. How to do specification curve analyses in R: Introducing ‘specr’. 2020. https://philippmasur.de/2020/01/02/how-to-do-specification-curve-analyses-in-r-introducing-specr/ . Accessed 23rd July 2020.

Masur PK, Scharkow M. specr: Conducting and visualizing specification curve analyses: R package. (2020).

Kiyonaga A, Scimeca JM. Practical considerations for navigating registered reports. Trends Neurosci. 2019;42(9):568–72.

McPhetres J. What should a preregistration contain? PsyArXiv. (2020).

Lakens D. Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Soc Psychol Personal Sci. 2017;8(4):355–62.

Anderson ML, Magruder J. Split-sample strategies for avoiding false discoveries: National Bureau of Economic Research2017. Report No.: 0898-2937.

MacCoun R, Perlmutter S. Blind analysis: Hide results to seek the truth. Nature. 2015;526(7572):187–9.

MacCoun R, Perlmutter S. Blind analysis as a correction for confirmatory bias in physics and in psychology. Psychological science under scrutiny 2017. p. 295-322.

Rubin DB. The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. Stat Med. 2007;26(1):20–36.

Claesen A, Gomes SLBT, Tuerlinckx F, Vanpaemel W. Preregistration: Comparing dream to reality. 2019.

Schönbrodt FD, Wagenmakers E-J. Bayes factor design analysis: Planning for compelling evidence. Psychon Bull Rev. 2018;25(1):128–42.

Lakens D, Scheel AM, Isager PM. Equivalence testing for psychological research: A tutorial. Adv Methods Pract Psychol Sci. 2018;1(2):259–69.

Lakens D, McLatchie N, Isager PM, Scheel AM, Dienes Z. Improving inferences about null effects with Bayes factors and equivalence tests. J Gerontol Ser B. 2020;75(1):45–57.

Gopalakrishna G, ter Riet G, Vink G, Stoop I, Wicherts J, Bouter L. Prevalence of questionable research practices, research misconduct and their potential explanatory factors: a survey among academic researchers in The Netherlands. 2021.

Goldacre B, Drysdale, H., Powell-Smith, A., Dale, A., Milosevic, I., Slade, E., Hartley, H., Marston, C., Mahtani, K., Heneghan, C. The compare trials project. 2021. https://compare-trials.org . Accessed 23rd July 2020.

Mathieu S, Boutron I, Moher D, Altman DG, Ravaud P. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA. 2009;302(9):977–84.

Rubin M. Does preregistration improve the credibility of research findings? arXiv preprint 2020.

Szollosi A, Kellen D, Navarro D, et al. Is preregistration worthwhile? Cell. 2019.

Quintana DS. A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation. Elife. 2020;9:e53275.

Weston SJ, Ritchie SJ, Rohrer JM, Przybylski AK. Recommendations for increasing the transparency of analysis of preexisting data sets. Adv Methods Pract Psychol Sci. 2019;2(3):214–27.

Thompson WH, Wright J, Bissett PG, Poldrack RA. Meta-research: dataset decay and the problem of sequential analyses on open datasets. Elife. 2020;9:e53498.

Download references

Acknowledgements

The authors are grateful to Professor George Davey for his helpful comments on this article.

J.R.B is funded by a Wellcome Trust Sir Henry Wellcome fellowship (grant 215917/Z/19/Z). J.B.P is supported by the Medical Research Foundation 2018 Emerging Leaders 1st Prize in Adolescent Mental Health (MRF-160–0002-ELP-PINGA). M.R.M and H.M.S work in a unit that receives funding from the University of Bristol and the UK Medical Research Council (MC_UU_00011/5, MC_UU_00011/7), and M.R.M is also supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at the University Hospitals Bristol National Health Service Foundation Trust and the University of Bristol.

Author information

Authors and affiliations.

Department of Clinical, Educational and Health Psychology, Division of Psychology and Language Sciences, University College London, London, WC1H 0AP, UK

Jessie R. Baldwin, Jean-Baptiste Pingault & Tabea Schoeler

Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK

Jessie R. Baldwin & Jean-Baptiste Pingault

MRC Integrative Epidemiology Unit at the University of Bristol, Bristol Medical School, University of Bristol, Bristol, UK

Hannah M. Sallis & Marcus R. Munafò

School of Psychological Science, University of Bristol, Bristol, UK

Centre for Academic Mental Health, Population Health Sciences, University of Bristol, Bristol, UK

Hannah M. Sallis

NIHR Biomedical Research Centre, University Hospitals Bristol NHS Foundation Trust and University of Bristol, Bristol, UK

Marcus R. Munafò


Contributions

JRB and MRM developed the idea for the article. The first draft of the manuscript was written by JRB, with support from MRM and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jessie R. Baldwin.

Ethics declarations

Conflict of interest.

The authors declare that they have no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Baldwin, J.R., Pingault, JB., Schoeler, T. et al. Protecting against researcher bias in secondary data analysis: challenges and potential solutions. Eur J Epidemiol 37, 1–10 (2022). https://doi.org/10.1007/s10654-021-00839-0


Received: 19 October 2021

Accepted: 28 December 2021

Published: 13 January 2022

Issue Date: January 2022

DOI: https://doi.org/10.1007/s10654-021-00839-0


  • Secondary data analysis
  • Pre-registration
  • Open science
  • Researcher bias


July/August 2024 - Volume 30 - Issue : Journal of Public Health Management and Practice


Funding State and Local Health Departments and Tribal Organizations to Implement and Evaluate Cardiovascular Disease Public Health Strategies: A Collaborative Approach

Journal of Public Health Management and Practice. 30:S1-S5, July/August 2024.

  • Practice Full Report

Increasing the Writing Capacity and Dissemination of Evaluation Findings Among US Public Health Practitioners Funded to Improve Cardiovascular Health

Journal of Public Health Management and Practice. 30:S6-S14, July/August 2024.


Sharing Results From the Field – Tracking and Monitoring Clinical Measures: Examples of Practice-Based Evidence from Recipients of CDC’s Heart Disease and Stroke Prevention Funding

Journal of Public Health Management and Practice. 30:S15-S17, July/August 2024.

  • Research and Practice Reports

Evaluation of a Cardiovascular Disease/Diabetes Mellitus Expansion Program for Community Health Workers Employed by Rhode Island Community Health Teams

Journal of Public Health Management and Practice. 30:S18-S26, July/August 2024.

The Power of Tailoring a Government-Funded Heart Health Program for Marginalized Women: Lessons from WISEWOMAN and Mujer Poderosa/Powerful Woman

Journal of Public Health Management and Practice. 30:S27-S31, July/August 2024.

Stroke Prevention and Management in Rural Georgia: Evaluating the Effectiveness of a Community Paramedicine Program

Journal of Public Health Management and Practice. 30:S32-S38, July/August 2024.

Preparing to Address Social Determinants of Health (SDOH): Approaches to Clinic Transformation

Journal of Public Health Management and Practice. 30:S39-S45, July/August 2024.

Testing an Innovative Approach to Improve Hypertension Management at a Federally Qualified Health Center

Journal of Public Health Management and Practice. 30:S46-S51, July/August 2024.

Using a Cohort-Based Quality Improvement Coaching Model to Optimize Chronic Disease Management for Federally Qualified Health Center Patients

Journal of Public Health Management and Practice. 30:S52-S61, July/August 2024.

Outcome Evaluation of a Public Health Intervention on Cardiovascular Disease (CVD) Prevention Among Women Aged 40-64 with Low Incomes in Nebraska

Journal of Public Health Management and Practice. 30:S62-S70, July/August 2024.

A Case Series Study Assessing an Equity-Focused Implementation of Self-Monitoring Blood Pressure Programs Using Telehealth

Journal of Public Health Management and Practice. 30:S71-S79, July/August 2024.

Strengthening Linkages Between Health Care and Public Health: A Novel Framework for Public Health Action in Health Care Settings to Improve Cardiovascular Disease Risk

Journal of Public Health Management and Practice. 30:S80-S88, July/August 2024.

Virtual Expansion of the YMCA's Blood Pressure Self-Monitoring (BPSM) Program: Using Telehealth to Adapt an Evidence-Based Program to Reach Rural Communities in South Carolina

Journal of Public Health Management and Practice. 30:S89-S95, July/August 2024.

Practice Facilitation and Clinical Performance Feedback Using the Electronic Health Record Improved Blood Pressure and Cholesterol Management in Small Primary Care Practices in New York City

Journal of Public Health Management and Practice. 30:S96-S99, July/August 2024.


What Is Effective Communication? Skills for Work, School, and Life

Discover how improving your communication skills can benefit your career, education, and personal life.

[Featured image] A group of professionals in business suits sit in front of microphones at an international press conference.

Communication is a part of everyday life, whether we communicate in person or on the countless digital platforms available to us. But how much of our communication actually reaches the intended audience or person the way we hoped? Effective communication requires us to be clear and complete in what we are trying to express.

Being an effective communicator in our professional and personal lives involves learning the skills to exchange information with clarity, empathy, and understanding. In this article, we’ll define what effective communication looks like, discuss its benefits, and offer ways to improve your communication skills.

What is effective communication?

Effective communication is the process of exchanging ideas, thoughts, opinions, knowledge, and data so that the message is received and understood with clarity and purpose. When we communicate effectively, both the sender and receiver feel satisfied.

Communication occurs in many forms, including verbal and non-verbal, written, visual, and listening. It can occur in person, on the internet (on forums, social media, and websites), over the phone (through apps, calls, and video), or by mail.

For communication to be effective, it must be clear, correct, complete, concise, and compassionate. We consider these to be the 5 Cs of communication, though they may vary depending on who you’re asking.

While the effectiveness of communication can be difficult to measure, its impact is hard to deny. According to one study, surveyed companies in the United States and United Kingdom with at least 100,000 employees lost $62.4 million per year on average due to poor communication. On the flip side, companies led by effective communicators had nearly 50 percent higher total returns to shareholders over companies with less effective communicators at the helm [ 1 ].


Benefits of effective communication

The benefits of communication effectiveness can be witnessed in the workplace, in an educational setting, and in your personal life. Learning how to communicate well can be a boon in each of these areas.

In the workplace, effective communication can help you: 

Manage employees and build teams

Grow your organization more rapidly and retain employees

Benefit from enhanced creativity and innovation

Become a better public speaker

Build strong relationships and attract more opportunities for you or your organization


In your personal life, effective communication can lead to:

Improved social, emotional, and mental health

Deeper connections with people you care about

New bonds based on trust and transparency

Better problem-solving and conflict resolution skills

Say it with your body

In face-to-face conversation, body language plays an important role. Communication is 55 percent non-verbal, 38 percent vocal (tone and inflection), and 7 percent words, according to Albert Mehrabian, a researcher who pioneered studies on body language [ 2 ]. Up to 93 percent of communication, then, does not involve what you are actually saying. 

Positive body language is open—your posture is upright and receptive, your palms are open, you lean in when speaking or listening, and nod encouragingly. Negative body language can include biting your lip nervously, looking bored, crossing your arms, putting your hands on your hips, or tapping your foot impatiently.

How to improve your communication skills

Communication, like any other skill, is one you can improve upon with practice. Here are a few ways to start improving your communication skills, whether at home or on the job.

1. Consider your audience.

Who are you communicating with? Make sure you are aware of your audience—those you intend to communicate with may differ from those who actually receive your messages. Knowing your audience can be key to delivering the right messages effectively. Their age, race, ethnicity, gender, marital status, income, education level, subject knowledge, and professional experience can all affect how they’ll receive your message. 

If you’re advertising a fast food restaurant, for example, you might want to deliver your message to an audience that’s likely to be hungry. This could be a billboard on the side of a busy highway that shows a giant cheeseburger and informs drivers that the closest location is just two miles away. 

Or suppose you’re announcing your engagement to your family. You might host a gathering afterwards to celebrate, send them photos of the engagement in a group chat, surprise them in conversation over dinner, or tag your family members in your announcement on social media. Your chosen form of communication will depend on your family dynamics.

2. Practice active listening.

Active listening is the practice of giving your full attention in a communication exchange. 

Some techniques include paying attention to body language, giving encouraging verbal cues, asking questions, and practicing non-judgment. Before executing your communication, be sure to consider your audience and practice active listening to get to the heart of their needs and desires. This way, you can improve your communication as a counselor, social worker, marketer, professor, colleague, or friend. 

Here are some examples of active listening in practice:

If you work in marketing, you might engage in social listening to gather consumer data on social media platforms like Instagram and TikTok. 

If you are a professor, you might take advantage of end-of-semester feedback forms and act on your students' needs by hosting one-on-one meetings during office hours. Likewise, your students might choose to participate in discussions after your lecture or at least sit attentively and ask questions.

If you are a team leader, you might read Slack messages from your teammates, gauge that they are frustrated with the workload, and respond by resetting priorities for the next few weeks. This communicates to the team that their voices are heard.

If you are a parent, you might have a disagreement with your child about finishing their homework, but if you probe deeper with open communication, they may confess that their teacher made a discouraging comment that left them unmotivated.


3. Make your message as clear as possible.

Once you have successfully identified your audience and listened to their intentions, needs, and desires, you may have something to communicate. To do this effectively, turn to the 5 Cs of communication to ensure your message is:

Clear

Correct

Complete

Concise

Compassionate

Prepare to communicate in a way that achieves most of these characteristics.

4. Use the right medium or platform.

Using the right medium or platform to communicate matters. Effective communication requires you to consider whether you need to meet in person or if Zoom would suffice. Is your message casual enough to use WhatsApp, or would a formal email be more efficient and thorough? If you are catching up with a friend, do you two prefer to talk on the phone or via old-fashioned letters? Whatever you choose should be intuitive and appropriate for you and your current situation.

You might assess the priority level and the type of communication needed. In a marketing campaign, is there a visual component on Instagram or is it a spoken podcast ad? Will the platform be a Facebook post, product placement in a film, or a printed poster hung in cafes? For a university lecture, do students prefer to be online or meet in person? Will there be a discussion afterward, and would it be fruitful to conduct it in a pub, cafe, or in a field outdoors? 

By considering your audience, practicing active listening, clarifying your communication, and choosing the right medium or environment, you are well on your way to exercising communication effectiveness.

Effective communication starts here

Start building better communication with Improving your Communication Skills from the University of Pennsylvania, Successful Negotiation: Essential Strategies and Skills from the University of Michigan, or Effective Communication: Writing, Design, and Presentation from the University of Colorado Boulder. 

Article sources

PRovoke Media. "The Cost Of Poor Communications," https://www.provokemedia.com/latest/article/the-cost-of-poor-communications. Accessed January 17, 2024.

The University of Texas Permian Basin. "How Much of Communication Is Nonverbal?," https://online.utpb.edu/about-us/articles/communication/how-much-of-communication-is-nonverbal/. Accessed January 17, 2024.


Coursera staff.

Editorial Team

Coursera’s editorial team is comprised of highly experienced professional editors, writers, and fact...

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.


Program Innovations: Promoting Success in Student Research

To enhance the student experience and increase access to experiential learning, colleges and universities have gotten creative with undergraduate research experiences.

By  Ashley Mowreader


A group of students and a professor meet in a research laboratory. All are wearing lab coats.

Undergraduate research can provide students with hands-on experience in a lab environment, as well as help them determine career opportunities they might not otherwise consider.

sanjeri/E+/Getty Images Plus

Undergraduate research opportunities are one way to provide experiential learning in many disciplines, introducing learners to research methods under the supervision of a faculty member and providing experience for a résumé.

A 2021 study from the University of Central Florida found that student researchers are more likely to have higher grade point averages, graduation rates and rates of matriculation into graduate school. They also develop life skills such as critically analyzing literature, observing and collecting data, and communicating.

However, not every student has equal access to undergraduate research opportunities. The study found non-STEM students, transfer students and part-time students are less likely to participate in research compared to their peers.

To increase student participation in undergraduate research and boost skill development among student researchers, institutions have created innovative models of work. Here are seven examples.

Survey Says

Around three in 10 students say they are required to participate in undergraduate research, according to a winter 2023 Student Voice survey from Inside Higher Ed , and a slightly smaller number (27 percent) believe that undergraduate research should be required in their program.

Four-year students are more likely to say undergraduate research is required in their program (32 percent) compared to their two-year peers (15 percent).

First-year research opportunities—University of Missouri

Career exposure in the first year can help students feel confident about their paths throughout college and provide a head start in building their résumés.

Mizzou is known for its Interdisciplinary Plant Group, which hosts research scholars exploring innovations in plant biology, and for a first-year program that gives young researchers a leg up through mentorship and research time with more experienced researchers.

FRIPS, short for Freshman Research in Plants , supports 10 to 15 students annually, who work alongside a faculty member and their research group on plant biology. Students also meet regularly with their FRIPS scholars cohort and gain professional development training.

Each student’s work is funded by grant dollars from the National Science Foundation (NSF).

Graduates of FRIPS often go on to become Goldwater Scholars and NSF graduate research fellows. The program also creates a place of belonging and community for new students to the university.

Underrepresented minority students—Davidson College

Some students face systemic disadvantages in participating in co-curricular experiences because they may lack the social capital, or the familiarity with the norms of higher ed, needed to identify and participate in a faculty-led research experience.

At Davidson College in North Carolina, rising sophomores can participate in a four- to six-week summer intensive research fellowship program called RISE (Research in Science Experience) . This program is designed for students from historically marginalized groups including low-income and first-generation students.

The goal of RISE is to equip students to take on larger, more intensive academic-year and summer experiences for later in their college career. Each student receives $2,500 in scholarships and funds to cover on-campus housing, which the college arranges for all participants.

Jacquelline Nyakunu , a rising junior at Davidson, spent the summer with chemistry professor Cindy Hauser researching hookah, studying the smoking of shisha and the chemical composition of the tobacco. Nyakunu wrote in a blog post that the experience taught her about her passion for chemistry, built her research skills and solidified her career path as a pre-medicine student.


Transfer students—University of California, Los Angeles

Transferring into a new institution can be a challenge for many students, and finding ways to get plugged in and connected to one’s field of interest can be just as hard.

UCLA offers an initiative exclusively for transfer students to both promote their academic success at the university and expose them to undergraduate research opportunities, the Transfer Research Entry Program (TREP).

To participate, each student must be an incoming transfer student with at least two years remaining at UCLA and be considering a career in research. Participants attend a one-week virtual bootcamp about research, which covers careers in research and how to write a cover letter and curriculum vitae. The program also provides networking opportunities for transfer students and academic survival skills for the transition to UCLA.

There’s no obligation to take a research role after the bootcamp, but students are encouraged to do so and are given guidance on finding their areas of interest and on the application and interview process.

Financial support—University of Texas at El Paso

Financial need can be a barrier to participation for some students. The University of Texas at El Paso is a Hispanic-Serving Institution (84 percent Hispanic) and commuter school with a large population of Pell-eligible students (60 percent) and first-generation learners. Many students are working to support themselves and their families, explains Lourdes E. Echegoyen, director of the Campus Office of Undergraduate Research Initiatives.

As a result, UTEP staff realized a need to provide financial assistance through employment to give students high-impact activities.

University staff have identified grant funding from federal agencies, including the National Science Foundation, the National Institutes of Health, the Department of Education, and private foundations.

Students can receive financial support through stipends or tuition scholarships. The university’s student employment program also provides employment positions for undergraduate researchers across disciplines.

“Generally, full time students are supported to conduct research during the academic year from 10 to 19 hours per week—depending on the program—thus allowing students to remain on campus and be mentored as research trainees,” Echegoyen says.

UTEP leaders have seen the benefits of undergraduate research on retention and persistence among students, with one program focused on biomedical research having a 98 percent retention rate among students across four years, compared to a 37 percent retention rate among their peers who did not participate.

Community partnerships—Roosevelt University

During the COVID-19 pandemic, Roosevelt University in Chicago partnered with The Field Museum to digitize and analyze data collected at the museum. Visitors had measured specimens of liverworts, but the data needed to be sorted and inaccurate measurements eliminated from the set to be most useful to scientists.

Students wrote code to screen and clean the data, helping set the researchers up for success and teaching students firsthand about research processes in a remote setting.

Career development—Elon University

At Elon in North Carolina, returning students can participate in undergraduate research over the summer in between academic terms, funded by the university. While career readiness is a natural component of research experiences, leaders at the university wanted to bolster student skills beyond the laboratory, says Eric Hall, professor of exercise and director of undergraduate research at Elon.

Now, student researchers attend regular professional-development workshops that inform and establish career competencies. The workshops are co-led by other campus partners, including the writing center, career services, the fellowships office and librarians, Hall says.

For the 2024 session, workshops cover LinkedIn, how the fellowships office can support student goals, professional writing for graduate school and industry, and navigating academic publishing.

The new initiative is still being evaluated, with formal data collection underway, but anecdotal evidence from post-assessments shows students enjoy and learn from the experiences.

Research in the classroom—California State Polytechnic University, Pomona

Cal Poly Pomona leaders wanted to expose more learners to undergraduate research, understanding that first-generation, Pell-eligible or historically underserved students have lower access to research opportunities, says Winny Dong, faculty director for the office of undergraduate research.

Rather than asking students to squeeze an additional responsibility into their schedules, faculty members brought research to the classroom, embedding experiences into required general education courses.

The initiative ensures that all students are exposed to research and required to participate in some capacity, helping build their skills and piquing the interest of those who may consider a career in research.

BMC Med Res Methodol

Primary versus secondary source of data in observational studies and heterogeneity in meta-analyses of drug effects: a survey of major medical journals

Guillermo Prada-Ramallal

1 Department of Preventive Medicine and Public Health, University of Santiago de Compostela, c/ San Francisco s/n, 15786 Santiago de Compostela, A Coruña, Spain

2 Health Research Institute of Santiago de Compostela (Instituto de Investigación Sanitaria de Santiago de Compostela - IDIS), Clinical University Hospital of Santiago de Compostela, 15706 Santiago de Compostela, Spain

Fatima Roque

3 Research Unit for Inland Development, Polytechnic of Guarda (Unidade de Investigação para o Desenvolvimento do Interior - UDI/IPG), 6300-559 Guarda, Portugal

4 Health Sciences Research Centre, University of Beira Interior (Centro de Investigação em Ciências da Saúde - CICS/UBI), 6200-506 Covilhã, Portugal

Maria Teresa Herdeiro

5 Department of Medical Sciences & Institute for Biomedicine – iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal

6 Higher Polytechnic & University Education Co-operative (Cooperativa de Ensino Superior Politécnico e Universitário - CESPU), Institute for Advanced Research & Training in Health Sciences & Technologies, 4585-116 Gandra, Portugal

Bahi Takkouche

7 Consortium for Biomedical Research in Epidemiology & Public Health (CIBER en Epidemiología y Salud Pública – CIBERESP), Santiago de Compostela, Spain

Adolfo Figueiras

Associated data

All data generated or analysed during this study are included in this published article.

The data from individual observational studies included in meta-analyses of drug effects are collected either from ad hoc methods (i.e. “primary data”) or databases that were established for non-research purposes (i.e. “secondary data”). The use of secondary sources may be prone to measurement bias and confounding due to over-the-counter and out-of-pocket drug consumption, or non-adherence to treatment. In fact, it has been noted that failing to consider the origin of the data as a potential cause of heterogeneity may change the conclusions of a meta-analysis. We aimed to assess to what extent the origin of data is explored as a source of heterogeneity in meta-analyses of observational studies.

We searched for meta-analyses of drug effects published between 2012 and 2018 in general and internal medicine journals with an impact factor > 15. We evaluated, when reported, the type of data source (primary vs secondary) used in the individual observational studies included in each meta-analysis, and the exposure- and outcome-related variables included in sensitivity, subgroup or meta-regression analyses.

We found 217 articles, 23 of which fulfilled our eligibility criteria. Eight meta-analyses (8/23, 34.8%) reported the source of data. Three meta-analyses (3/23, 13.0%) included the method of outcome assessment as a variable in the analysis of heterogeneity, and only one compared and discussed the results considering the different sources of data (primary vs secondary).

Conclusions

In meta-analyses of drug effects published in seven high impact general medicine journals, the origin of the data, either primary or secondary, is underexplored as a source of heterogeneity.

Electronic supplementary material

The online version of this article (10.1186/s12874-018-0561-3) contains supplementary material, which is available to authorized users.

Specific research questions are ideally answered through tailor-made studies. Although these ad hoc studies provide more accurate and updated data, designing a completely new project may not represent a feasible strategy [ 1 , 2 ]. On the other hand, clinical and administrative databases used for billing and other fiscal purposes (i.e. “secondary data”) are a valuable resource as an alternative to ad hoc methods (i.e. “primary data”) since it is easier and less costly to reuse the information than collecting it anew [ 3 ]. The potential of secondary automated databases for observational epidemiological studies is widely acknowledged; however, their use is not without challenges, and many quality requirements and methodological pitfalls must be considered [ 4 ].

Meta-analysis represents one of the most valuable tools for assessing drug effects as it may lead to the best evidence possible in epidemiology [ 5 ]. Consequently, its use for making relevant clinical and regulatory decisions on the safety and efficacy of drugs is dramatically increasing [ 6 ]. Existence of heterogeneity in a given meta-analysis is a feature that needs to be carefully described by analyzing the possible factors responsible for generating it [ 7 ]. In this regard, the results of a recent study [ 8 ] show that whether the origin of the data (primary vs secondary) is explored as a potential cause of heterogeneity may change the conclusions of a meta-analysis due to an effect modification [ 9 ]. Thus, considering the source of data as a variable in sensitivity and subgroup analyses, or meta-regression analyses, seems crucial to avoid misleading conclusions in meta-analyses of drug effects.

Given the evidence noted [ 8 , 9 ], we surveyed published meta-analyses in a selection of high-impact journals over a 6-year period, to assess to what extent the origin of the data, either primary or secondary, is explored as a source of heterogeneity in meta-analyses of observational studies.

Meta-analysis selection and data collection process

General and internal medicine journals with an impact factor > 15 according to the Web of Science were included in the survey [ 10 ]. This method has been widely used to assess quality as well as publication trends in medical journals [ 11 – 13 ]. The rationale is that meta-analyses published in high impact journals: (1) are likely to be rigorously performed and reported due to the exhaustive editorial process [ 12 , 14 ]; and, (2) in general, exert a higher influence on medical practice due to the major role played by these journals in the dissemination of new medical evidence [ 14 , 15 ]. We searched MEDLINE in May 2018 using the search terms “meta-analysis” as publication type and “drug” in any field, restricted to articles published between January 1, 2012 and May 7, 2018 in the New England Journal of Medicine (NEJM), Lancet, Journal of the American Medical Association (JAMA), British Medical Journal (BMJ), JAMA Internal Medicine (JAMA Intern Med), Annals of Internal Medicine (Ann Intern Med), and Nature Reviews Disease Primers (Nat Rev Dis Primers).

Two investigators (GP-R, FR) independently assessed publications for eligibility. Abstracts were screened and, if deemed potentially relevant, full text articles were retrieved. Articles were excluded if they met any of the following conditions: (1) they were not a meta-analysis of published studies, (2) no drug effects were evaluated, (3) only randomized clinical trials were included in the meta-analysis (since our focus was on observational studies), (4) fewer than two observational studies were included in the meta-analysis (since with a single study it would not have been possible to calculate a pooled measure). When a meta-analysis included both observational studies and clinical trials, only observational studies were considered.
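The four exclusion criteria above can be read as a short sequence of checks. The following is a minimal sketch, assuming a hypothetical `Article` record whose field names are illustrative only; it is not the authors' screening tool, and in the study the assessment was done by two investigators reading the abstracts and full texts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical record for one retrieved article (field names are illustrative)."""
    is_meta_analysis: bool        # meta-analysis of published studies?
    evaluates_drug_effects: bool  # were drug effects evaluated?
    n_observational: int          # number of observational studies included
    n_randomized: int             # number of randomized clinical trials included


def exclusion_reason(article: Article) -> Optional[str]:
    """Return the first exclusion criterion met, or None if the article is eligible."""
    if not article.is_meta_analysis:
        return "not a meta-analysis of published studies"
    if not article.evaluates_drug_effects:
        return "no drug effects evaluated"
    if article.n_observational == 0 and article.n_randomized > 0:
        return "only randomized clinical trials included"
    if article.n_observational < 2:
        return "fewer than two observational studies"
    return None


# A meta-analysis mixing trials and observational studies stays eligible;
# only its observational studies would then be considered.
print(exclusion_reason(Article(True, True, n_observational=4, n_randomized=6)))
```

Ordering the checks this way means each excluded article receives a single reason, which is an assumption of this sketch rather than something stated in the text.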

A data extraction form was developed in advance to extract information from the articles. Two investigators (GP-R, FR) independently extracted and recorded the information and resolved discrepancies by referring to the original report. If necessary, a third author (AF) was asked to resolve disagreements between the investigators.

When available we extracted the following data from each eligible meta-analysis: first author, publication year, journal, drug(s) exposure and outcome(s); number of individual studies included in the meta-analysis based on each type of data source used (primary vs secondary), for both exposure and outcome assessment; and exposure- and outcome-related variables included in sensitivity, subgroup or meta-regression analyses. We extracted data directly from the tables, figures, text, and supplementary material of the meta-analyses, not from the individual studies.

Assessment of exposure and outcome

We considered as “primary data” the information on drug exposure collected directly by the researchers through interviews (in person or by telephone) or self-administered questionnaires. The origin of the data was also considered primary when objective diagnostic methods were used to determine drug exposure (e.g. blood test). “Secondary data” are data that were originally collected for purposes other than that of the study at hand and that were included in databases on drug prescription (e.g. prescription registers, medical records/charts) and dispensing (e.g. computerized pharmacy records, insurance claims databases). Regarding outcome assessment, we considered data to be primary when objective confirmation was available to support them (e.g. confirmation by an individual ad hoc medical diagnosis, lab test or imaging results). These criteria are based on those commonly used in the risk assessment of bias for observational studies [ 16 – 19 ].
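As an illustration of how these definitions translate into the 1ry / 2ry / NR counts used later in the tables, here is a minimal sketch for exposure assessment. The method labels, the `classify_source` and `tally_sources` names, and the choice to treat a missing label as "NR" are all assumptions made for this example; the study itself classified sources by reading each meta-analysis.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Illustrative labels taken from the examples in the text above.
PRIMARY = {"interview", "self-administered questionnaire", "blood test"}
SECONDARY = {"prescription register", "medical records", "pharmacy records", "insurance claims"}


def classify_source(method: Optional[str]) -> str:
    """Map an exposure-assessment method to '1ry', '2ry' or 'NR' (not reported)."""
    if method is None or not method.strip():
        return "NR"
    key = method.strip().lower()
    if key in PRIMARY:
        return "1ry"
    if key in SECONDARY:
        return "2ry"
    # Unknown labels are rejected rather than guessed.
    raise ValueError(f"unrecognized method: {method!r}")


def tally_sources(methods: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count the individual studies of one meta-analysis by type of data source."""
    counts = Counter(classify_source(m) for m in methods)
    return {label: counts.get(label, 0) for label in ("1ry", "2ry", "NR")}


print(tally_sources(["interview", "Insurance claims", None, "blood test"]))
```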

MEDLINE search results yielded 217 articles from the major general medical journals (3 from NEJM, 46 from Lancet, 26 from JAMA, 85 from BMJ, 19 from JAMA Intern Med, 38 from Ann Intern Med, and 0 from Nat Rev Dis Primers) (see Fig. 1). A total of 194 articles were excluded (see list of excluded articles with reasons for exclusion in Additional file 1 ) leaving 23 articles to be examined [ 20 – 42 ]. General characteristics of the 23 included meta-analyses are outlined in Table 1.

Fig. 1 Flow diagram of literature search results
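The counts in the search flow can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (per-journal retrievals, the 194 exclusions, and the 8/23 and 3/23 proportions from the abstract); the variable names and the `pct` helper are illustrative.

```python
# Articles retrieved per journal, as reported in the text.
retrieved = {
    "NEJM": 3,
    "Lancet": 46,
    "JAMA": 26,
    "BMJ": 85,
    "JAMA Intern Med": 19,
    "Ann Intern Med": 38,
    "Nat Rev Dis Primers": 0,
}

total_retrieved = sum(retrieved.values())
excluded = 194
included = total_retrieved - excluded


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * count / total, 1)


print(total_retrieved, included)
print(pct(8, included), pct(3, included))
```

The totals reproduce the 217 retrieved and 23 included articles, and the two proportions match the 34.8% and 13.0% given in the abstract.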

Table 1 Characteristics of the 23 included meta-analyses

| First author | Year | Journal | Drug exposure | Outcome |
| --- | --- | --- | --- | --- |
| Weiss J | 2017 | Ann Intern Med | Antihypertensive drugs | Harms outcomes: Cognitive impairment, quality of life, falls, fractures, syncope, functional status, hypotension, acute kidney injury, medication burden, withdrawal due to adverse events |
| Bally M | 2017 | BMJ | NSAIDs | Myocardial infarction |
| Sordo L | 2017 | BMJ | Opioid substitution treatment (methadone, buprenorphine) | All cause and overdose mortality |
| Tariq R | 2017 | JAMA Intern Med | Gastric acid suppressants | Recurrent infection |
| Maruthur NM | 2016 | Ann Intern Med | Diabetes monotherapy (thiazolidinediones, metformin, sulfonylureas, DPP-4 inhibitors, SGLT-2 inhibitors, GLP-1 receptor agonists) or metformin-based combinations | All-cause mortality, macrovascular and microvascular outcomes, intermediate outcomes (hemoglobin A1c, body weight, systolic blood pressure, heart rate), hypoglycemia, gastrointestinal side effects, genital mycotic infections, congestive heart failure |
| Paul S | 2016 | Ann Intern Med | Antiviral prophylaxis | Primary outcome: Hepatitis B Virus (HBV) reactivation. Secondary outcomes: HBV-related hepatitis, interrupted chemotherapy, acute liver failure, mortality |
| Li L | 2016 | BMJ | Dipeptidyl peptidase-4 inhibitors | Heart failure; hospital admissions for heart failure |
| Molnar AO | 2015 | BMJ | Generic immunosuppressive drugs | Patient survival, allograft survival, acute rejection, adverse events, bioequivalence |
| Ziff OJ | 2015 | BMJ | Digoxin | Primary outcome: All-cause mortality. Secondary outcomes: Cardiovascular mortality; admission to hospital for any cause, cardiovascular causes and heart failure; incident stroke, incident myocardial infarction |
| CGESOC | 2015 | Lancet | Hormone therapy (oestrogen, progestagen) | Ovarian cancer |
| Bellemain-Appaix A | 2014 | BMJ | Thienopyridines (clopidogrel) | Primary outcome: All-cause mortality, major bleeding. Secondary outcomes: Major cardiovascular events and myocardial infarction, stroke, urgent revascularization, stent thrombosis |
| Grigoriadis S | 2014 | BMJ | Antidepressants (SSRIs) | Persistent pulmonary hypertension of the newborn |
| Li L | 2014 | BMJ | Incretin-based treatments | Pancreatitis |
| Kalil AC | 2014 | JAMA | Vancomycin MIC | All-cause mortality |
| Stegeman BH | 2013 | BMJ | Combined oral contraceptives | Venous thrombosis |
| Maneiro JR | 2013 | JAMA Intern Med | Biologic agents (abatacept, adalimumab, etanercept, golimumab, infliximab, rituximab) | Influence of AABs: on efficacy in immune-mediated inflammatory diseases (rheumatoid arthritis, juvenile idiopathic arthritis, inflammatory bowel disease, ankylosing spondylitis, psoriasis, psoriatic arthritis, or other spondyloarthropathies), in hypersensitivity reactions, and on the concentration of biological drugs; effect of concomitant treatment in development of AAB |
| Hartling L | 2012 | Ann Intern Med | Antipsychotics | Primary outcomes: Improved core symptoms of illness (positive and negative symptoms and general psychopathology), adverse events: diabetes mellitus, death, tardive dyskinesia, major metabolic syndrome. Secondary outcomes: Functional outcomes, health care system use; response, remission and relapse rates; medication adherence, health-related quality of life, other patient-oriented outcomes (e.g. patient satisfaction), other adverse events: extrapyramidal symptoms, weight gain |
| Hsu J | 2012 | Ann Intern Med | Antivirals (oseltamivir, zanamivir, amantadine, rimantadine) | Mortality, hospitalization, intensive care unit admission, mechanical ventilation and respiratory failure, duration of hospitalization, duration of signs and symptoms, time to return to normal activity, complications, critical adverse events: major psychotic disorders, encephalitis, stroke, seizure; important adverse events: pain in extremities, clonic twitching, body weakness, dermatologic changes (urticaria or rash); influenza viral shedding, emergence of antiviral resistance |
| Caldeira D | 2012 | BMJ | ACEIs and ARBs | Incidence of pneumonia; pneumonia related mortality |
| MacArthur GJ | 2012 | BMJ | Opiate substitution, methadone detoxification | HIV infection among people who inject drugs |
| Mantha S | 2012 | BMJ | Progestin-only contraception | Venous thromboembolic events |
| Silvain J | 2012 | BMJ | Enoxaparin, unfractionated heparin | Primary outcome: Mortality, major bleeding. Secondary outcomes: Composite ischaemic end point (death or myocardial infarction), complications of myocardial infarction, minor bleeding |
| McKnight RF | 2012 | Lancet | Lithium | Renal function, thyroid function, parathyroid function, hair disorders, skin disorders, bodyweight, teratogenicity |

Abbreviations: AABs, antibodies against biologic agents; ACEIs, angiotensin converting enzyme inhibitors; Ann Intern Med, Annals of Internal Medicine; ARBs, angiotensin receptor blockers; BMJ, British Medical Journal; DPP-4, dipeptidyl peptidase-4; GLP-1, glucagon like peptide-1; JAMA, Journal of the American Medical Association; MIC, minimum inhibitory concentration; NSAIDs, non-steroidal anti-inflammatory drugs; SGLT-2, sodium–glucose cotransporter 2; SSRIs, selective serotonin reuptake inhibitors

Source of exposure and outcome data

Table 2 summarizes the evidence regarding the type of data source included in each meta-analysis, according to the information presented in the data extraction tables of the article. The information was evaluated taking the study design into account. Only eight meta-analyses [ 21 , 24 , 26 , 31 , 32 , 34 , 38 , 41 ] reported the source of data, three of them [ 31 , 34 , 38 ] reporting mixed sources for both the exposure and outcome assessment. Five meta-analyses [ 21 , 24 , 26 , 32 , 41 ] reported only secondary sources for the exposure assessment, three of them [ 21 , 24 , 41 ] reporting as well only secondary sources for the outcome assessment, while in the other two [ 26 , 32 ] only primary and mixed sources for the outcome assessment were reported respectively.

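Table 2 Reporting of the data source in the data extraction tables of the included meta-analyses

Columns 2 to 8 refer to exposure assessment and columns 9 to 15 to outcome assessment. A dot (.) marks a cell with no entry.

| Meta-analysis (MA) | Exposure: data source presented in MA | Exposure, cohort studies (n): 1ry | 2ry | NR | Exposure, case-control studies (n): 1ry | 2ry | NR | Outcome: data source presented in MA | Outcome, cohort studies (n): 1ry | 2ry | NR | Outcome, case-control studies (n): 1ry | 2ry | NR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Weiss J (harms outcomes) | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Bally M | Yes | 0 | 3 | 0 | 0 | 1 | 0 | Yes | 0 | 3 | 0 | 0 | 1 | 0 |
| Sordo L | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Tariq R | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Maruthur NM | Yes | 0 | 3 | 0 | . | . | . | Yes | 0 | 3 | 0 | . | . | . |
| Paul S | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Li L, 2016 (heart failure) | Yes | 0 | 1 | 2 | 0 | 0 | 1 | Yes | 1 | 0 | 2 | 0 | 0 | 1 |
| Li L, 2016 (hospital admissions for heart failure) | Yes | 0 | 0 | 6 | 0 | 0 | 2 | Yes | 3 | 0 | 3 | 0 | 0 | 2 |
| Molnar AO | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Ziff OJ | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| CGESOC | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Bellemain-Appaix A | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Grigoriadis S | Yes | 2 | 3 | 0 | 1 | 1 | 0 | Yes | 4 | 1 | 0 | 2 | 0 | 0 |
| Li L, 2014 | Yes | 0 | 1 | 2 | 0 | 1 | 1 | Yes | 1 | 2 | 0 | 0 | 0 | 2 |
| Kalil AC | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Stegeman BH | Yes | 0 | 9 | 0 | 8 | 8 | 1 | Yes | 4 | 5 | 0 | 5 | 12 | 0 |
| Maneiro JR | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Hartling L | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Hsu J | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Caldeira D | Yes | 2 | 2 | 7 | 0 | 7 | 1 | Yes | 0 | 11 | 0 | 3 | 1 | 4 |
| MacArthur GJ | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Mantha S | No | . | . | . | . | . | . | No | . | . | . | . | . | . |
| Silvain J | Yes | 0 | 7 | 0 | . | . | . | Yes | 0 | 7 | 0 | . | . | . |
| McKnight RF | No | . | . | . | . | . | . | No | . | . | . | . | . | . |

Abbreviations: 1ry, number of individual studies in each MA based on primary data sources; 2ry, number of individual studies in each MA based on secondary data sources; NR, number of individual studies in each MA with not reported data source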

a Although the meta-analysis shows the results of methodological quality assessment based on a standardized scale, it does not indicate the type of data source used for each individual observational study included in the meta-analysis

b Cohort with nested case-control analysis

c The meta-analysis reports that most of the included observational studies assessed medication exposure through a review of medical records

d The meta-analysis reports only data from high-quality observational studies

Source of data in the analysis of heterogeneity

All but two [ 20 , 42 ] of the meta-analyses performed subgroup and/or sensitivity analyses. Although three of them [ 23 , 34 , 36 ] considered the methods of outcome assessment (type of diagnostic assay used for Clostridium difficile infection, method of venous thrombosis diagnosis confirmation, and type of scale for psychosis symptoms assessment, respectively) as stratification variables, only the second referred to the origin of the data. Only five meta-analyses [ 22 , 28 , 33 , 35 , 39 ] included meta-regression analyses to describe heterogeneity, none of which considered the source of data as an explanatory variable. Other findings for the inclusion of the data source as a variable in the analysis of heterogeneity are presented in Table 3.

Table 3 Inclusion of the data source as a variable in the analysis of heterogeneity of the included meta-analyses

Columns 2 to 5 refer to subgroup/sensitivity analysis (S) and columns 6 to 9 to meta-regression analysis (MR). A dot (.) marks a cell with no entry.

| Meta-analysis | S: exposure-related variables | S: outcome-related variables | S: other variables | S: type of data source included | MR: exposure-related variables | MR: outcome-related variables | MR: other variables | MR: type of data source included |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Weiss J (harms outcomes) | . | . | . | No | . | . | . | No |
| Bally M | Timing of exposure to NSAIDs, dosage and duration of treatment, concomitant drug treatment | Comorbidities | Alternative statistical model, reason for exclusion | No | . | . | . | No |
| Sordo L | Time interval in and out of opioid substitution treatment | . | Alternative statistical model | No | Treatment provider, prevalence of opioid injection, average methadone dose | . | Mean age, percentage of men, location, percentage of inpatient induction, percentage loss to follow-up, midpoint follow-up period | No |
| Tariq R | Type of gastric acid suppressant (PPI and H2B reported together, PPI alone, or H2B alone) | Case definition (time interval of recurrence: within 60 days vs within 90 days), type of diagnostic assay used for infection | Study design, study setting (inpatients vs outpatients), data adjustment | No | . | . | . | No |
| Maruthur NM | Mode of therapy | . | . | No | . | . | . | No |
| Paul S (primary outcome) | . | Chronic or resolved hepatitis B virus infection | Tumor and chemotherapy subtype, alternative statistical model, quality of design | No | . | . | . | No |
| Paul S (secondary outcomes) | . | . | Alternative statistical model, quality of design | No | . | . | . | No |
| Li L, 2016 | Type of control, mode of therapy, individual drugs | . | Length of follow-up, type of design | No | . | . | . | No |
| Molnar AO | . | . | Type of design | No | . | . | . | No |
| Ziff OJ (primary outcome) | . | . | Data adjustment, population type | No | Difference between digoxin and control arms at baseline: diabetes, hypertension, diuretics, anti-arrhythmic drugs | . | Summary bias score, baseline study level variable: year of publication, age, sex, previous myocardial infarction | No |
| Ziff OJ (secondary outcomes) | . | . | . | No | . | . | . | No |
| CGESOC | Duration of use in current and past users of hormone therapy, types of hormone therapy | Tumour histology and malignant potential of the tumour | Study design, geographical region, age at first use of hormone therapy, age at menarche, parity, oral contraceptive use, height, body mass index, alcohol use, tobacco use, mother or sister with ovarian/breast cancer, hysterectomy | No | . | . | . | No |
| Bellemain-Appaix A | Clopidogrel dose | Types of percutaneous coronary intervention | Type of design | No | . | . | . | No |
| Grigoriadis S | Timing of exposure to SSRIs | . | Study design, congenital malformations, control, meconium aspiration | No | . | . | . | No |
| Li L, 2014 | Type of incretin agents, type of control, mode of therapy, individual incretin agents | . | Length of follow-up, alternative effect measure, alternative statistical model | No | . | . | . | No |
| Kalil AC | Different MIC cutoffs, assay type | Hospital or 30-d mortality | Publication year, quality of design | No | Vancomycin MIC cut-off, vancomycin exposure in the previous 6 months, vancomycin trough levels, proportion of patients who received vancomycin treatment | Control mortality, APACHE II score, Charlson score, duration of bacteremia, proportion of patients with endocarditis, proportion of patients located in the intensive care unit | Age | No |
| Stegeman BH | Generation of progestogen used in combined oral contraceptives, combined oral contraceptive pill | Method of diagnosis confirmation | Funding source, study design | Yes (outcome) | . | . | . | No |
| Maneiro JR | Type of biologic agent, concomitant treatment (monotherapy vs combined therapy), prior use of TNF inhibitors | Type of disease | Length of follow-up, data quality, study design, level of evidence of studies | No | Type of biologic agent, prior use of TNF inhibitors, method of measurement of antibodies, type of the anti-TNF monoclonal antibody | Type of disease, time of disease duration, time to assess response | Age and sex of patients, number of participants, length of follow-up, data quality, study design, level of evidence of studies | No |
| Hartling L (primary outcomes) | Type of drug-comparison | Type of scale for the assessment of symptoms and quality of life | . | No | . | . | . | No |
| Hartling L (secondary outcomes) | . | . | . | No | . | . | . | No |
| Hsu J | Individual drugs, dosage of antiviral, timing of treatment | . | Data adjustment, confirmed influenza, type of influenza A vs B, pandemic versus seasonal influenza, severity of influenza, age, pregnancy, baseline risk (e.g. immune-compromised), setting, funding conflict | No | . | . | . | No |
| Caldeira D (incidence of pneumonia) | . | . | Study design, previous stroke, heart failure, chronic kidney disease, non-Asian patients | No | . | . | . | No |
| Caldeira D (pneumonia related mortality) | . | . | Study design | No | . | . | . | No |
| MacArthur GJ | Duration of exposure to opiate substitution treatment | . | Data adjustment, geographical region, site of recruitment, monetary incentives, percentage of female participants, percentage of individuals from ethnic minorities | No | Exposure to methadone maintenance treatment at baseline only | . | Inclusion only of studies at lower risk of bias, inclusion only of studies that measured an incidence rate ratio, exclusion of studies that did not adjust for confounders | No |
| Mantha S | Route of administration | . | Data adjustment | No | . | . | . | No |
| Silvain J | Route of administration | . | Types of percutaneous coronary intervention, study publication, study size, quality of design | No | . | . | . | No |
| McKnight RF | . | . | . | No | . | . | . | No |

Abbreviations: APACHE, acute physiology and chronic health evaluation; MIC, minimum inhibitory concentration; SSRIs, selective serotonin reuptake inhibitors; TNF, tumor necrosis factor

We finally assessed if the influence of the data origin on the conclusions of the meta-analyses was discussed by their respective authors. We found that only four meta-analyses [ 21 , 31 , 32 , 34 ] noted limitations derived from the type of data source used.

The findings of this research suggest that the origin of the data, either primary or secondary, is underexplored as a source of heterogeneity and an effect modifier in meta-analyses of drug effects published in general medicine journals with high impact. Few meta-analyses reported the source of data and only one [ 34 ] of the articles included in our survey compared and discussed the meta-analysis results considering the different sources of data.

Although it is usual to consider the design of the individual studies (i.e. case-control, cohort or experimental studies) in the analysis of the heterogeneity of a meta-analysis [ 43 , 44 ], the type of data source (primary vs secondary) is still rarely used for this purpose [ 9 , 45 ]. In fact, the current reporting guidelines for meta-analyses, such as MOOSE (Meta-analysis Of Observational Studies in Epidemiology) [ 18 ] or PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) [ 46 , 47 ], do not recommend that authors specifically report the origin of the data. This is probably due to the close relationship that exists between the study design and the type of data source used, despite the fact that each criterion has its own basis. Performing this additional analysis is a simple task that involves no additional cost. Failure to do so may lead to diverging conclusions [ 8 ].
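To make the "simple task" concrete, the sketch below shows one common way a subgroup analysis by data source could be run: inverse-variance fixed-effect pooling within each subgroup, followed by a between-subgroup Q statistic. This is an assumption chosen for illustration, not the method of the authors or of any cited meta-analysis, and the study values are hypothetical. The function names `pool_fixed` and `subgroup_by_source` are likewise made up for this example.

```python
import math
from collections import defaultdict


def pool_fixed(effects, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def subgroup_by_source(studies):
    """Pool (log_effect, se, source) tuples per data source and compute Q_between."""
    groups = defaultdict(list)
    for log_effect, se, source in studies:
        groups[source].append((log_effect, se))
    pooled = {
        source: pool_fixed([y for y, _ in rows], [se for _, se in rows])
        for source, rows in groups.items()
    }
    overall, _ = pool_fixed([y for y, _, _ in studies], [se for _, se, _ in studies])
    # Weighted squared deviations of subgroup estimates from the overall estimate.
    q_between = sum((est - overall) ** 2 / se ** 2 for est, se in pooled.values())
    return pooled, q_between


# Hypothetical log odds ratios and standard errors (not data from any cited study).
studies = [
    (0.10, 0.20, "primary"),
    (0.05, 0.25, "primary"),
    (0.40, 0.15, "secondary"),
    (0.35, 0.10, "secondary"),
]
pooled, q_between = subgroup_by_source(studies)
# With exactly two subgroups, Q_between follows a chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(q_between / 2.0))
for source, (estimate, se) in pooled.items():
    print(f"{source}: pooled log OR = {estimate:.3f} (SE {se:.3f})")
print(f"Q_between = {q_between:.2f}, p = {p_value:.3f}")
```

A small p-value here would suggest that the pooled effect differs between studies based on primary and secondary data. The `erfc` shortcut for the p-value holds only for two subgroups. A random-effects model or a meta-regression with the data source as a covariate would be reasonable alternatives.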

Conclusions about the effects of a drug that are derived from studies based exclusively on data from secondary sources may be unreliable, among other reasons, because no information is collected on consumption of over-the-counter drugs (i.e. drugs that individuals can buy without a prescription) [ 48 ] and/or out-of-pocket expenses for prescription drugs (i.e. costs that individuals pay out of their own cash reserves) [ 49 ]. In the health care and insurance context, out-of-pocket expenses usually refer to deductibles, co-payments or co-insurance. Figure 2 shows the model that we propose to describe the relationship between the different data records according to their origin, including the possible loss of information (which can be captured only through primary research).

Fig. 2. Conceptual model of individual data recording. * Never dispensed. † Absence of dispensing of successive prescriptions (or self-medication) among patients with primary adherence, or inadequate secondary adherence

Failure to take these situations into account may lead to exposure measurement bias [ 48 , 49 ]. Consumption of a drug may be underestimated when only prescription data is used as secondary source without additionally considering unregistered consumption, such as over-the-counter consumption (e.g. oral contraceptives [ 34 , 50 ]), that may only be available from a primary database. Alternatively, this may occur when dispensing data for billing purposes (reimbursement) are used for clinical research, if out-of-pocket expenses are not considered (see Fig. 2). The portion of the medical bill that the insurance company does not cover, and that the individual must pay on his own, is unlikely to be recorded. Data on the sale of over-the-counter drugs will also not be available in this scenario.

The reverse situation may also occur and consumption may be overestimated when only prescription data is used, if the prescribed drug is not dispensed by the pharmacist; or when dispensing data is used, if the drug is not really consumed by the patient. While primary non-adherence occurs when the patient does not pick up the medication after the first prescription, secondary non-adherence refers to the absence of dispensing of successive prescriptions among patients with primary adherence, or to inadequate secondary adherence (i.e. ≥20% of time without adequate medication) [ 51 ] (see Fig. 2). In some diseases the medication adherence is very low [ 52 – 55 ], with percentages of primary non-adherence (never dispensed) that exceed 30% [ 56 ]. It should be noted that the impact of non-adherence varies from medication to medication. Therefore, it must be defined and measured in the context of a particular therapy [ 57 ].
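The adherence categories described above can be sketched as a small classification rule. This is only an illustration: the `DispensingHistory` record, its field names and the `classify_adherence` function are hypothetical, and the 20% cut-off is taken from the definition quoted in the text rather than from any particular therapy.

```python
from dataclasses import dataclass


@dataclass
class DispensingHistory:
    """Hypothetical per-patient summary; field names are illustrative only."""
    prescriptions_issued: int
    prescriptions_dispensed: int
    days_observed: int
    days_covered: int  # days on which the patient had medication on hand


def classify_adherence(h: DispensingHistory, max_gap_percent: int = 20) -> str:
    """Classify a patient using the definitions quoted in the text."""
    if h.prescriptions_issued < 1 or h.days_observed < 1:
        raise ValueError("need at least one prescription and one observed day")
    # Primary non-adherence: the first prescription was never dispensed.
    if h.prescriptions_dispensed == 0:
        return "primary non-adherence"
    # Secondary non-adherence: successive prescriptions were not dispensed.
    if h.prescriptions_issued > 1 and h.prescriptions_dispensed == 1:
        return "secondary non-adherence"
    # Inadequate secondary adherence: at least 20% of time without medication.
    # Integer arithmetic avoids floating-point surprises at the threshold.
    uncovered_days = h.days_observed - h.days_covered
    if uncovered_days * 100 >= max_gap_percent * h.days_observed:
        return "secondary non-adherence"
    return "adherent"
```

As the text notes, the threshold should be set per therapy, which is why it is left as a parameter here.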

Moreover, failing to take into consideration the portion of consumption due to over-the-counter and/or out-of-pocket expenses may lead to confounding, as that variable may be related to the socio-economic level and/or to the potential of access to the health system [ 58 ], which are independent risk factors of adverse outcomes of some medications (e.g. myocardial infarction [ 21 , 28 , 30 , 41 ]). Given the presence of high-deductible health plans and the high co-insurance rate for some drugs, cost-sharing may deter clinically vulnerable patients from initiating essential medications, thus negatively affecting patient adherence [ 59 , 60 ].

Outcome misclassification may also give rise to measurement bias and heterogeneity [ 61 ]. This occurs, for example, in the meta-analysis that evaluates the relationship between combined oral contraceptives and the risk of venous thrombosis [ 34 ]. In the studies without objective confirmation of the outcome, the women were classified erroneously regardless of the use of contraceptives. This led to a non-differential misclassification that may have underestimated the drug–outcome relationship, especially when the third generation of progestogen is analysed: Risk ratio (RR) primary data = 6.2 (95% confidence interval (CI) 5.2–7.4), RR secondary data = 3.0 (95% CI 1.7–5.4) [ 34 ].
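One common way to compare two subgroup estimates such as these is a test of interaction on the log scale. The sketch below is not the authors' analysis: it assumes the two estimates are independent, that the reported intervals are symmetric on the log scale, and that a normal approximation holds. The function names are hypothetical.

```python
import math


def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate the standard error of log(RR) from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def compare_ratios(rr_a, ci_a, rr_b, ci_b, z_crit: float = 1.96) -> dict:
    """Ratio of two risk ratios with a normal-approximation test."""
    diff = math.log(rr_a) - math.log(rr_b)
    se = math.sqrt(se_from_ci(*ci_a, z_crit) ** 2 + se_from_ci(*ci_b, z_crit) ** 2)
    z_stat = diff / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return {
        "ratio": math.exp(diff),
        "ci": (math.exp(diff - z_crit * se), math.exp(diff + z_crit * se)),
        "z": z_stat,
        "p": p_value,
    }


# Values reported in the text for third-generation progestogens [ 34 ].
result = compare_ratios(6.2, (5.2, 7.4), 3.0, (1.7, 5.4))
print(result)
```

The output is a ratio of risk ratios with its own interval, which gives a rough sense of whether the primary and secondary data estimates differ by more than chance.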

On the one hand, medical records are often considered as being the best information source for outcome variables. However, they present important limitations in the recording of medications taken by patients [ 62 ]. On the other hand, dispensing records show more detailed data on the measurement of drug exposure. However, they do not record the over-the-counter or out-of-pocket drug consumption at an individual level [ 48 , 49 ], apart from offering unreliable data on outcome variables [ 62 , 63 ].

Limitations

The first limitation of this research is that its findings may not be applicable to journals not included in our survey, such as journals with a low impact factor. Despite the widespread use of the impact factor metric [ 64 ], this method has inherent weaknesses [ 65 , 66 ]. However, meta-analyses published in high impact general medicine journals are likely to be most rigorously performed and reported due to their greater availability of resources and procedures [ 12 , 14 ]. It is then expected that the overall reporting quality of articles published in other lesser-known journals will be similar. Another limitation is the limited search period. Given that the general tendency is towards improvement in the methodology of published meta-analyses [ 67 , 68 ], we find no reason to suspect that the adverse conclusions would be different before the period from 2012 to 2018. Although it exceeds the objective of this research, one last limitation may be the inability to reanalyse the included meta-analyses stratifying by the type of data source, since our study design restricts the conclusions to the published data of the meta-analyses, which were insufficiently reported, or in which the number of individual studies in each stratum was insufficient to calculate a pooled measure (see Table 2).

Owing to automated capture of data on drug prescription and dispensing that are used for billing and other administration purposes, as well as to the implementation of electronic medical records, secondary databases have generated enormous possibilities. However, neither their limitations, nor the risk of bias that they pose should be overlooked [ 69 ]. Thus, researchers should consider the link between administrative databases and medical records, as well as the advisability of combining secondary and primary data in order to minimize the occurrence of biases due to the use of any of these databases.

No source of heterogeneity in a meta-analysis should ever be considered alone but always as part of an interconnected set of potential questions to be addressed. In particular, the origin of the data, either primary or secondary, is insufficiently explored as a source of heterogeneity in meta-analyses of drug effects, even in those published in high impact general medicine journals. Thus, we believe that authors should systematically include the source of data as an additional variable in subgroup and sensitivity analyses, or meta-regression analyses, and discuss its influence on the meta-analysis results. Likewise, reviewers, editors and future guidelines should also consider the origin of the data as a potential cause of heterogeneity in meta-analyses of observational studies that include both primary and secondary data. Failure to do this may lead to misleading conclusions, with negative effects on clinical and regulatory decisions.
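As a rough illustration of the subgroup analysis recommended above, the sketch below pools study estimates separately by data source. All study names and numbers are hypothetical. Fixed-effect inverse-variance pooling is used only to keep the example short, and a random-effects model or meta-regression may be more appropriate in practice.

```python
import math
from collections import defaultdict

# Hypothetical inputs: (study, data source, RR, lower 95% CI, upper 95% CI).
studies = [
    ("Study A", "primary", 2.4, 1.6, 3.6),
    ("Study B", "primary", 3.1, 2.0, 4.8),
    ("Study C", "secondary", 1.5, 1.1, 2.0),
    ("Study D", "secondary", 1.8, 1.2, 2.7),
]


def pool_fixed(rows, z_crit: float = 1.96):
    """Fixed-effect inverse-variance pooling of risk ratios on the log scale."""
    total_weight = 0.0
    weighted_sum = 0.0
    for _, _, rr, lower, upper in rows:
        se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * math.log(rr)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z_crit * pooled_se),
        math.exp(pooled_log + z_crit * pooled_se),
    )


def pool_by_source(rows):
    """Group studies by data source and pool each subgroup separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return {source: pool_fixed(group) for source, group in groups.items()}


for source, (rr, lower, upper) in pool_by_source(studies).items():
    print(f"{source}: RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Reporting the two pooled estimates side by side makes any divergence between primary and secondary data visible to readers and reviewers.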

Additional file

Excluded articles. List of articles excluded with reasons for exclusion. (PDF 247 kb)

This study received no funding from the public, commercial or not-for-profit sectors.

Availability of data and materials

Abbreviations.

Ann Intern Med: Annals of Internal Medicine
BMJ: British Medical Journal
CI: Confidence Interval
JAMA Intern Med: JAMA Internal Medicine
JAMA: Journal of the American Medical Association
MOOSE: Meta-analysis Of Observational Studies in Epidemiology
Nat Rev Dis Primers: Nature Reviews Disease Primers
NEJM: New England Journal of Medicine
PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses
RR: Risk ratio
VS: Versus

Authors’ contributions

AF and GP-R contributed to study conception and design. GP-R, FR and AF contributed to searching, screening, data collection and analyses. GP-R was responsible for drafting the manuscript. FR, MTH, BT and AF provided comments and made several revisions of the manuscript. All authors read and approved the final version.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Guillermo Prada-Ramallal, Email: [email protected] .

Fatima Roque, Email: tp.gpi@euqorf .

Maria Teresa Herdeiro, Email: tp.au@oriedrehaseret .

Bahi Takkouche, Email: [email protected] .

Adolfo Figueiras, Phone: (+34) 981 95 11 92, Email: [email protected] .

CU Boulder scholar wins support for research on political polarization

Carnegie Corporation of New York commits $18 million over three years to help 28 scholars find solutions to a national problem

Seema Sohi, associate professor of ethnic studies at the University of Colorado Boulder, is one of 28 Andrew Carnegie Fellows who will receive stipends of $200,000 each for research that seeks to understand how and why our society has become so polarized and how we can strengthen the forces of cohesion to fortify our democracy, Carnegie Corporation of New York announced today.

With this focus, the Andrew Carnegie Fellows Program marks the start of an effort to develop a body of research around today’s growing political polarization. Under the direction of Dame Louise Richardson, the Corporation will commit up to $6 million annually to the program for at least the next three years.

Sohi’s winning project is titled “We Are Each Other’s Magnitude and Bond: A History of Climate Justice from Warren County to the Sunrise Movement.” She will investigate the intersection of the climate crisis, democracy and political polarization.

Sohi will undertake the first comprehensive history of the climate justice movement in the United States, centering the work of Black, Indigenous, Latina and Asian American women who have been unrecognized in environmental history and yet who have played a leading role in the struggle to advance climate justice and, with it, the struggle to realize the promises of a multiracial and sustainable American democracy.

The Andrew Carnegie Fellows Program is supporting scholars who will develop a body of research around today’s growing political polarization.

“In doing so, I tell the story of the climate crisis not as one of impending disaster or resignation, but one of transformative possibility,” Sohi said. “At a time when so many of us feel hopelessly divided and bitterly polarized, these climate activists and leaders do much more than reproduce grim scientific preconditions and fatalistic narratives. Instead, they show us that we are capable of collective action and of coming together to build a more just, equitable, and sustainable world.”

Sohi said she was “thrilled and honored” to have won a Carnegie Fellowship, adding: “What a gift to be able to spend the next two years working on a research project that means so much to me.”

Sohi is the author of Echoes of Mutiny: Race, Surveillance, and Indian Anticolonialism in North America, which examines the anticolonial politics of South Asian intellectuals and migrant workers in North America during the early 20th century. She has published essays and articles in the Journal of American History, Sikh Formations, Amerasia and the Journal of Modern European History, as well as in the anthologies The Sun Never Sets: South Asian Migrants in an Age of U.S. Power and Asian American Literature in Transition.

“The foundation’s support of these fascinating projects is a considered effort to mine scholarship for insights into the underlying causes of the political polarization that is damaging our democracy,” said Richardson. “We also hope to gain insights into the means by which collectively we can mitigate the negative effects of this polarization on our society.”

The focus on political polarization attracted more than 360 applications, a record high for the program. Selection criteria prioritized the originality and promise of the research, its potential impact on the field and the applicant’s plans for communicating the findings to a broad audience. A panel of jurors composed of current and former leaders from some of the nation’s preeminent institutions made the final selections.

“This year marks the first time the jury was asked to assess proposals addressing a single topic: the pervasive issue of political polarization as characterized by threats to free speech, the decline of civil discourse, disagreement over basic facts, and a lack of mutual understanding and collaboration,” said John J. DeGioia, chair of the jury and president of Georgetown University.

He noted with gratitude the contributions of long-standing juror Jared L. Cohon, president emeritus of Carnegie Mellon University, who died unexpectedly in March. The 2024 selections reflected his highly regarded evaluations. “We were especially gratified,” DeGioia added, “by the rigor of the submissions, the wide range of perspectives, and the potential for lasting impact.”

Of the 28 fellows selected, 12 are junior scholars, 15 are senior scholars, 11 are employed by state universities, 16 are employed by private universities and one is a journalist.

Among the research topics:

  • Challenging the assumption that politicians are becoming more extreme, while voters are becoming more moderate
  • Investigating the impact of polarization on the public’s trust in government and medicine while finding ways to improve health care overall
  • Understanding how and why diverging conceptions of womanhood have become a factor in the polarization of white women, especially in the South
  • Exploring algorithms that would expose individuals to diverse political opinions and finding low-cost ways to limit the monetization of misinformation
  • Evaluating the effectiveness of redistricting reforms to increase electoral competition and decrease geographic partisanship ahead of the 2031 redistricting cycle
  • Understanding how election denialism is affecting the work of state and local election workers and how to rebuild trust in the voting process
  • Exploring “party misfits,” the 50 percent of Americans who do not sort easily into Republican or Democratic camps, and the growing gap between voters and political elites
  • Examining how attitudes toward the credibility of science shape polarized responses to policies that affect the environment

As part of a competitive nomination process, more than 650 individuals—including the heads of universities, independent research institutes, professional societies, think tanks, major university presses and leading publications—were invited to recommend a junior and a senior scholar for consideration. All applications underwent a preliminary anonymous evaluation by leading authorities in the relevant fields of study. The highest scoring proposals were then forwarded to the jury.

Founded in 2015, the Andrew Carnegie Fellows Program provides one of the most generous stipends of its kind for research in the humanities and social sciences. To date, the Corporation has named more than 270 fellows, representing a philanthropic investment of more than $54 million.

The award is for a period of up to two years and the anticipated result is generally a book or major study. Congressional testimony by past fellows has addressed topics such as social media and privacy protections, transnational crime, governmental responses to pandemics and college affordability. Fellows have received honors including a Nobel Prize and a National Book Award.

The Andrew Carnegie Fellows Program is a continuation of the mission of Carnegie Corporation of New York, as founded by Andrew Carnegie in 1911, to promote the advancement and diffusion of knowledge and understanding. Read more about the Andrew Carnegie Fellows Program, the work of past honorees, the criteria for proposals and a historical timeline of scholarly research supported by the corporation.

COMMENTS

  1. Secondary Analysis Research

    Secondary analysis of data collected by another researcher for a different purpose, or SDA, is increasing in the medical and social sciences. This is not surprising, given the immense body of health care-related research performed worldwide and the potential beneficial clinical implications of the timely expansion of primary research (Johnston, 2014; Tripathy, 2013).

  2. Conducting secondary analysis of qualitative data: Should we, can we

    This critical interpretive synthesis examined research articles (n = 71) published between 2006 and 2016 that involved qualitative secondary data analysis and assessed the context, purpose, and methodologies that were reported. ... scholarly journals between the years 1996 and 2016. They also had to meet the following inclusion criteria: (a ...

  3. Protecting against researcher bias in secondary data analysis

    However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases, it presents challenges for secondary data analysis. ... a survey among academic researchers in The Netherlands. 2021. [PMC free article] 62.

  4. Conducting secondary analysis of qualitative data ...

    This critical interpretive synthesis examined research articles (n = 71) published between 2006 and 2016 that involved qualitative secondary data analysis and assessed the context, purpose, and methodologies that were reported. ... scholarly journals between the years 1996 and 2016. They also had to meet the following inclusion criteria: (a ...

  5. Secondary Data Analysis: Using existing data to answer new questions

    Secondary data analysis is a valuable research approach that can be used to advance knowledge across many disciplines through the use of quantitative, qualitative, or mixed methods data to answer new research questions ( Polit & Beck, 2021 ). This research method dates to the 1960s and involves the utilization of existing or primary data ...

  6. Secondary Data in Research

    In simple terms, secondary data is every dataset not obtained by the author, or "the analysis of data gathered by someone else" (Boslaugh, 2007:IX) to be more specific. Secondary data may ...

  7. Secondary Qualitative Research Methodology Using Online ...

    In addition to the challenges of secondary research as mentioned in subsection Secondary Data and Analysis, in current research realm of secondary analysis, there is a lack of rigor in the analysis and overall methodology (Ruggiano & Perry, 2019). This has the pitfall of possibly exaggerating the effects of researcher bias (Thorne, 1994, 1998 ...

  8. Secondary Data Analysis in Nursing Research: A Contemporary Discussion

    Abstract. This editorial provides an overview of secondary data analysis in nursing science and its application in a range of contemporary research. The practice of undertaking secondary analysis of qualitative and quantitative data is also discussed, along with the benefits, risks and limitations of this analytical method.

  9. Seminars article Understanding the impact and challenges of secondary

    The use of secondary data in medical research has grown tremendously in recent years. Secondary data analysis is commonly defined as the use of datasets, which were not collected for the purpose of the scientific hypothesis being tested. ... Google Scholar [3] M. Sun, S. Lipsitz. Comparative effectiveness research methodology using secondary ...

  10. What is Secondary Research?

    Secondary research is a research method that uses data that was collected by someone else. In other words, whenever you conduct research using data that already exists, you are conducting secondary research. On the other hand, any type of research that you undertake yourself is called primary research. Example: Secondary research.

  11. Use of secondary data analyses in research: Pros and Cons

    Given the nature of the academic enquiry, the qualitative secondary data analysis was the most appropriate for this study because of the dynamic and complex nature of the topic (Johnston 2014).

  12. Secondary Data: sources, advantages and disadvantages.

    Secondary data is usually defined in opposition to primary data. The latter is directly obtained from first-hand sources ...

  13. Primary versus secondary source of data in observational studies and

    The findings of this research suggest that the origin of the data, either primary or secondary, is underexplored as a source of heterogeneity and an effect modifier in meta-analyses of drug effects published in general medicine journals with high impact. Few meta-analyses reported the source of data and only one [] of the articles included in our survey compared and discussed the meta-analysis ...

  14. Secondary Research: Definition, Methods & Examples

    This includes internal sources (e.g. in-house research) or, more commonly, external sources (such as government statistics, organizational bodies, and the internet). Secondary research comes in several formats, such as published datasets, reports, and survey responses, and can also be sourced from websites, libraries, and museums.

  15. Secondary analysis of existing data: opportunities and implementation

    The secondary analysis of existing data has become an increasingly popular method of enhancing the overall efficiency of the health research enterprise. But this effort depends on governments, funding agencies, and researchers making the data collected in primary research studies and in health-related registry systems available to qualified ...

  16. Secondary Research Advantages, Limitations, and Sources

    Compared to primary research, the collection of secondary data can be faster and cheaper to obtain, depending on the sources you use. Secondary data can come from internal or external sources. Internal sources of secondary data include ready-to-use data or data that requires further processing available in internal management support systems ...

  17. Tutorial: Evaluating Information: Primary vs. Secondary Articles

    Primary vs. Secondary Research Articles. In the sciences, primary (or empirical) research articles: are original scientific reports of new research findings (Please note that an original scientific article does not include review articles, which summarize the research literature on a particular subject, or articles using meta-analyses, which ...

  18. Peer-Reviewed Research: Primary vs. Secondary

    Peer Review within Scholarly Publications. A meta-analysis is a quantitative method of combining the results of primary research. In analyzing the relevant data and statistical findings from experimental trials or observational studies, it can more accurately calculate effective resolutions regarding certain health topics.

  19. Examples of Primary and Secondary Sources in Research

    Scholarly Articles. Peer-reviewed academic journals that summarize, critique, or build upon existing research are secondary sources. Researchers use these to stay up-to-date with current scholarship. A scholarly article, also known as a research or academic article, is a publication written by experts in a particular field.

  20. Secondary Data

    Types of secondary data are as follows: Published data: Published data refers to data that has been published in books, magazines, newspapers, and other print media. Examples include statistical reports, market research reports, and scholarly articles. Government data: Government data refers to data collected by government agencies and departments.

  21. A Review of the Relationship between Parental Involvement and Secondary

    Google Scholar is a web site providing peer-reviewed papers, books, abstracts, and articles from academic publishers, professional societies, universities, and other scholarly organizations. The Brigham Library at Educational Testing Service and the Strozier Library at Florida State University both house comprehensive collections of educational ...

  22. Conserved intronic secondary structures with ...

    Identification of transcriptome-wide RNA secondary structures with concealed BSs in the alternatively spliced introns in six species. (A) A BS within a stable secondary structure would result in intron retention or skipping of its flanking exons. (B) Strategy for searching concealed BSs in secondary structures in six species. (C) The numbers of genes with identified concealed BSs in six species.

  23. Secondary Data Analysis: Ethical Issues and Challenges

    Secondary data analysis. Secondary analysis refers to the use of existing research data to find answer to a question that was different from the original work ( 2 ). Secondary data can be large scale surveys or data collected as part of personal research. Although there is general agreement about sharing the results of large scale surveys, but ...

  24. Protecting against researcher bias in secondary data analysis

    Analysis of secondary data sources (such as cohort studies, survey data, and administrative records) has the potential to provide answers to science and society's most pressing questions. However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases ...

  25. July/August 2024

    Journal of Public Health Management and Practice publishes articles which focus on evidence based public health practice and research. The journal is a bi-monthly peer-reviewed publication guided by a multidisciplinary editorial board of administrators, practitioners and scientists. Journal of Public Health Management and Practice publishes in a wide range of population health topics including ...

  26. What Is Effective Communication? Skills for Work, School, and Life

    Effective communication is the process of exchanging ideas, thoughts, opinions, knowledge, and data so that the message is received and understood with clarity and purpose. When we communicate effectively, both the sender and receiver feel satisfied. Communication occurs in many forms, including verbal and non-verbal, written, visual, and ...

  27. Qualitative Secondary Analysis: A Case Exemplar

    Qualitative secondary analysis (QSA) is the use of qualitative data collected by someone else or to answer a different research question. Secondary analysis of qualitative data provides an opportunity to maximize data utility particularly with difficult to reach patient populations. However, QSA methods require careful consideration and ...

  28. Seven models of undergraduate research for student success

    This program is designed for students from historically marginalized groups including low-income and first-generation students. The goal of RISE is to equip students to take on larger, more intensive academic-year and summer experiences for later in their college career. Each student receives $2,500 in scholarships and funds to cover on-campus ...

  29. Primary versus secondary source of data in observational studies and

    Background. Specific research questions are ideally answered through tailor-made studies. Although these ad hoc studies provide more accurate and updated data, designing a completely new project may not represent a feasible strategy [1, 2]. On the other hand, clinical and administrative databases used for billing and other fiscal purposes (i.e. "secondary data") are a valuable resource as ...

  30. CU Boulder scholar wins support for research on political polarization

    As part of a competitive nomination process, more than 650 individuals—including the heads of universities, independent research institutes, professional societies, think tanks, major university presses and leading publications—were invited to recommend a junior and a senior scholar for consideration.