Abstract
Studies have identified a strong presence of sampling bias in ethnobiological research, which may seriously compromise study results. However, these studies were restricted to Brazilian research, and global sampling evaluations are still needed. The present study adopted a global scale and was based on ethnobotanical surveys of medicinal plants in open fairs and markets. We aimed to assess sample quality and to identify the factors that interfere with it. We investigated how (a) year of publication, (b) CiteScore, (c) presence of a clear research question, (d) presentation of hypotheses, and (e) use of ethnobotanical indices influence the presence of sampling bias. The main source of bias verified in the studies was the absence of information about the sample and the population. None of the variables tested interfered with the level of bias of the studies. Efforts are needed to correct quantitative studies regarding sampling procedures, and the peer-review exercise in scientific journals should be attentive to sampling bias.
Introduction
The community of ethnobiologists has been increasingly concerned with using more rigorous research methods (Vandebroek et al. 2020). The selection of appropriate sampling techniques is among these concerns, as sampling is an essential tool in ethnobiological research. Sampling is not always about approaching many people. In some situations, it may lead to the selection of a restricted group of respondents, such as people possessing knowledge about some specific cultural domain. For example, if a given study aims to record the use of plants during childbirth, researchers may opt for a purposive sampling directed to local birth attendants.
When studies use samples to provide conclusions about a population, a sampling error will always be associated; that is, the difference between a sample result and the actual population result. Insufficient sample size can give rise to Type I error (i.e., rejecting a null hypothesis that is true in the population) or to Type II error (i.e., failing to reject a null hypothesis that is false in the population) (Bartlett et al. 2001). The sampling error cannot be avoided, but it can be reduced by choosing an appropriate sample type and size. Therefore, probability sampling methods, those in which individuals are selected for the study based on the principle of randomization, need to be properly employed in ethnobotanical surveys of medicinal plants, particularly when the data are used to make statistical inferences (Espinosa et al. 2012). However, some authors argue that even studies that do not include statistical analyses should establish an ideal sample size (Bartlett et al. 2001). This is especially true if results are extrapolated for the entire population. For example, if a descriptive study aims to indicate the most used medicinal species in a region, results obtained from a sample of 10 out of 1000 individuals will generate a list that probably does not correspond to the most used species.
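The gap between a sample of 10 and an adequate sample for a population of 1000 can be made concrete with a sample size calculation. The sketch below uses Cochran's formula for a proportion with a finite population correction, which is one common choice and is not a method stated in this study. The function name, the 95% confidence level (z = 1.96), the 5% margin of error, and the expected proportion p = 0.5 are all illustrative assumptions.

```python
import math


def cochran_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion (illustrative assumption).

    n0 = z^2 * p * (1 - p) / margin^2 is the size for a very large population;
    dividing by 1 + (n0 - 1) / population applies the finite population correction.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Population of 1000 individuals, as in the example in the text.
print(cochran_sample_size(1000))  # 278
```

Under these assumptions, a population of 1000 calls for roughly 278 respondents, far more than the 10 in the example.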
Simple random, stratified, cluster, and systematic sampling are the most used probability sampling methods (Bolfarine and Bussab 2005), and they can all generate representative samples. In simple random sampling, for example, each elementary unit has the same probability of being selected for the study (Bolfarine and Bussab 2005). Although it is the easiest to design, it can be difficult and costly to execute. The other probability sampling methods can help to reduce this cost.
In non-probability sampling, individuals are selected without following probabilistic procedures. Snowball sampling and convenience sampling are widely used in ethnobiology, saving both time and money, but without the possibility of generating statistical inferences about the whole population (Espinosa et al. 2012). However, not being generalizable to the entire population does not mean that the sampling procedure is wrong. Snowball sampling may be adequate if, for example, it was designed to capture only the most knowledgeable individuals and study conclusions were drawn only for this group.
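As a rough illustration of how three of these probability designs differ in practice, the sketch below draws a simple random, a systematic, and a proportionally allocated stratified sample from a hypothetical list of 100 sellers. The seller identifiers, the two market strata, and the function names are invented for the example; cluster sampling is omitted for brevity.

```python
import random


def simple_random_sample(units, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Random start, then every step-th unit along the ordered list."""
    step = len(units) // n
    start = rng.randrange(step)
    return [units[start + i * step] for i in range(n)]


def stratified_sample(strata, n, rng):
    """Proportional allocation: each stratum contributes according to its size.

    Rounding can make the total differ slightly from n for uneven strata.
    """
    total = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        k = round(n * len(units) / total)
        sample.extend(rng.sample(units, k))
    return sample


rng = random.Random(42)  # fixed seed so the draw is reproducible
sellers = [f"seller_{i}" for i in range(1, 101)]  # hypothetical universe
strata = {"market_A": sellers[:60], "market_B": sellers[60:]}  # hypothetical strata

srs = simple_random_sample(sellers, 10, rng)
sys_sample = systematic_sample(sellers, 10, rng)
strat = stratified_sample(strata, 10, rng)
print(len(srs), len(sys_sample), len(strat))
```

The systematic and stratified designs need only an ordered list or known stratum sizes, which is often easier to obtain in a market than a full randomized frame.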
Reviews addressing sampling in ethnobiology, such as the assessment of quantitative ethnobotanical studies on medicinal plants by Medeiros et al. (2014), found that most studies are not concerned with the representativeness of their samples or do not clarify their criteria for selecting informants. In nearly half of the quantitative studies (48.39%), there was a high risk of the sample causing bias in the results obtained. Another important conclusion was that most of the studies without sampling bias achieved this because they approached the entire or almost the entire sampling universe, or because they correctly applied the snowball technique, i.e., without generalizing results to the entire population. This highlights the sampling flaws in Brazilian research, as several studies classified with a low risk of bias did not need to use complex sampling techniques.
In another systematic review related to sampling bias (Lyra-Neves et al. 2015), most Brazilian ethnozoological studies (66.98%) were also classified as having a high risk of bias. Furthermore, the problems concentrate especially on the presentation of methodological procedures, as the sampling strategy and the sample size are not disclosed.
The two studies mentioned above (Lyra-Neves et al. 2015; Medeiros et al. 2014) focused on the Brazilian research experience. However, there are still no studies that evaluate, on a global scale, how ethnobiological studies are dealing with sampling issues.
Therefore, the present study investigated quantitative studies on medicinal plants sold in open fairs and public markets, which featured plant lists, to identify possible sampling bias and the factors that influence its presence. Choosing markets allowed us to investigate sampling bias within a topic that is not too large to require a geographic limitation, allowing a global approach. Additionally, there is an urgent need to evaluate the quality of study designs in the context of markets, considering that urban ethnobotanical studies are becoming more popular and their specificities are not being properly discussed in literature. We present below our questions and hypotheses, accompanied by the theoretical background behind them.
Are recent studies more adequate in terms of sampling? Hypothesis: Recent studies have more adequate samples due to the increased number of manuals and courses on methodological improvement in the field of Ethnobiology (Oliveira et al. 2009; Stepp 2005).
Is journal impact factor related to sampling quality? Hypothesis: Studies published in high-impact journals have more adequate samples. It is expected that high-impact journals should rely on a more rigorous review process, through which possible methodology inconsistencies could be corrected or, if irreparable, lead to the rejection of the manuscript. Furthermore, studies that do not present their methods in detail are rejected by highly ranked journals (McClatchey 2006).
Do the studies that present a clear and objective research question have more adequate samples? Hypothesis: Studies with well-established research questions have more adequate samples. This is expected since the formulation of a clear question is part of the design of a well-structured study. It helps to bring clarity to the objectives of the study and to make more appropriate methodological choices.
Do the studies that test hypotheses have more adequate samples? Hypothesis: Studies that present hypotheses have more adequate samples. It is expected that the formulation of hypotheses requires a greater concern by the researchers regarding sample representativeness.
Do the studies that use ethnobotanical indices have more adequate samples? Hypothesis: Studies that present ethnobotanical indices have more adequate samples. It is expected that studies with indices present fewer sampling biases.
Methodology
Search Strategy
A bibliographic search was conducted between January 2019 and January 2020 (Figure 1). We chose three databases to perform the article search: Google Scholar, Scopus, and Web of Science. We used the following combinations of keywords: (a) ethnobotany + market; (b) ‘medicinal plants’ + ‘local market’; (c) ‘medicinal plants’ + ‘traditional market’; (d) ethnobotany + fair; (e) ‘medicinal plants’ + ‘local fair’; (f) ‘medicinal plants’ + ‘traditional fair’; (g) ‘medicinal plants’ + ‘open fair’; (h) ‘medicinal plants’ + ‘market survey’ + ethnobotany; and (i) ‘medicinal plants’ + ‘market survey’ + ethnopharmacology. We also carried out a “snowball” search to identify additional studies by searching the reference lists of publications eligible for full-text review. This approach aimed at identifying studies that could not be captured by our search engines and/or keywords.
Refinement of the Search Results
We evaluated the abstracts of the articles found in our bibliographic search, aiming to verify whether the articles met our inclusion criteria: research with new results, text in the English language, and focused on medicinal plant sellers. In some cases, the abstract of the article was not sufficient to perform the exclusion (for example, articles with the abstract in English, but with the full text in another language). Therefore, these articles were excluded in the subsequent stage.
Exclusion by Initial Impediment and by the Nature of the Study
At this stage, the articles were read in full to identify possible initial impediments, namely: (1) not addressing the theme; (2) written in a language other than English; (3) not presenting a list of species; (4) when interviews were not conducted with sellers or when information provided by sellers could not be filtered (e.g., in studies that included sellers and consumers). Subsequently, the studies were classified as quantitative or qualitative, after which only the quantitative articles underwent a bias risk assessment. We chose this procedure because, although qualitative studies also require quality samples, the procedures assessing sampling quality are more complex and context-dependent than for quantitative studies. The studies were considered quantitative when they featured some statistical analysis or ethnobiological index.

Figure 1. Flowchart with the number of articles at each stage of the systematic review.
Risk of Bias
The articles underwent a bias risk assessment through the criteria adapted from Medeiros et al. (2014), shown in Table 1. Each article was assessed regarding the quality of its sample as having a low, moderate, or high risk of bias. The sampling of each study was classified into at least one of the four cases presented in Table 1, and then a risk of bias level was assigned according to the option that best described the study sample. Some criteria found in Medeiros et al. (2014) were removed from the table as they are not applicable to the context of local markets. The universe was considered the total number of sellers in the market. In cases where sellers were distributed in more than one place of sale, the universe corresponded to the total number of sellers in each place. In studies that used rarefaction or species accumulation curves to define the ideal sample size, we checked whether information on sample and universe was presented and whether the curve stabilized, a criterion used to stop including new elementary units in a sample.
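To illustrate what a species accumulation curve and its stabilization look like, the sketch below averages the cumulative number of distinct species over random orderings of sellers. The seller data, the function names, and in particular the stabilization rule (average gain over the last few sellers below a tolerance) are hypothetical choices made for this example, not the criterion applied in the reviewed studies.

```python
import random


def species_accumulation(seller_species, permutations=200, seed=0):
    """Mean number of distinct species after 1..n sellers, over random orders."""
    rng = random.Random(seed)
    n = len(seller_species)
    totals = [0] * n
    for _ in range(permutations):
        order = rng.sample(seller_species, n)  # one random ordering of sellers
        seen = set()
        for i, species in enumerate(order):
            seen |= species
            totals[i] += len(seen)
    return [t / permutations for t in totals]


def has_stabilized(curve, last_k=3, tolerance=0.5):
    """Hypothetical rule: mean gain per seller over the last_k sellers < tolerance."""
    if len(curve) <= last_k:
        return False
    gain = (curve[-1] - curve[-1 - last_k]) / last_k
    return gain < tolerance


# Hypothetical species sets reported by eight sellers.
sellers = [
    {"sp1", "sp2", "sp3"}, {"sp1", "sp2"}, {"sp2", "sp3", "sp4"},
    {"sp1", "sp4"}, {"sp2", "sp5"}, {"sp1", "sp3"},
    {"sp2", "sp4"}, {"sp1", "sp5"},
]
curve = species_accumulation(sellers)
print([round(v, 2) for v in curve], has_stabilized(curve))
```

Even when such a curve flattens, reporting n and U alongside it lets readers judge how much of the universe the curve covers.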
Data Analysis
Considering that the number of studies with a low risk of bias was low, the statistical analyses for the tests of hypotheses were performed by comparing (a) studies with a low or moderate risk of bias with (b) studies with a high risk of bias. To test hypothesis (H) 1 (recent studies have more adequate samples than previous studies), we performed a simple logistic regression using the year of publication of the study as the explanatory variable and the risk of bias as the response variable (low or moderate vs. high).
To test H2 (studies published in high-impact journals have more adequate samples), we accessed the CiteScore (Scopus impact factor) of each journal in which the articles were published according to the values available in December 2021. The journals that were not indexed in the Scopus database received a score of zero. Subsequently, we performed a simple logistic regression using the CiteScore value as the explanatory variable and the risk of bias as the response variable.
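The simple logistic regressions used for H1 and H2 can be sketched as follows. The original analyses were run in R; this is an independent standard-library Python illustration that fits the model by Newton-Raphson and reports a two-sided Wald p-value for the slope. The function name and the ten data points are hypothetical and do not reproduce the study's data.

```python
import math


def fit_simple_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (intercept, slope, slope standard error, two-sided Wald p-value).
    """
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centre the predictor for numerical stability
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in xc]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, xc))
        # Entries of the 2x2 information matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xc))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)  # from the inverse information matrix
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return b0 - b1 * mean_x, b1, se_b1, p_value


# Hypothetical data: publication year and risk of bias (1 = high, 0 = low or moderate).
years = [2005, 2008, 2010, 2012, 2014, 2015, 2017, 2018, 2019, 2020]
high_risk = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
intercept, slope, se, p_value = fit_simple_logistic(years, high_risk)
print(slope, se, p_value)
```

For H2 the same function would take the CiteScore values as `x`; a slope that is not distinguishable from zero corresponds to the absence of a relationship with the risk of bias.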
Table 1. Criteria used to establish the risk of sampling bias in studies on medicinal plants conducted in markets, adapted from Medeiros et al. (2014).
We tested H3 (studies with a well-established research question have more adequate samples) through Fisher's exact test in a 2×2 contingency table using the risk of bias (low or moderate vs. high) and the presence of a research question (yes vs. no) as categorical variables. The same was done to test H4 (studies that present hypotheses have more adequate samples) and H5 (studies that present ethnobotanical indices have more adequate samples), replacing the presence of questions with the presence of hypotheses and indices, respectively. All analyses were performed in the software R v. 4.0.1 in the interface RStudio v. 1.1.456.
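For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 table can be sketched as below. The p-value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is the usual two-sided definition. The function name and the counts are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(low, high + 1)
                if prob(k) <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical table: rows = research question (yes, no),
# columns = risk of bias (low or moderate, high).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

The exact test suits this design because several cells are small, where a chi-square approximation would be unreliable.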
Results
General Aspects
We identified 442 studies through database searching, which were reduced to 389 after the duplicates were removed. Only 253 studies were screened for abstract assessment and 161 were excluded by impediments, of which 94 were not conducted with sellers only, 39 were not about the topic, ten were written in a language other than English, nine presented no species list, and nine were excluded for other reasons.

Figure 2. Geographic distribution of ethnobiological studies without impediments conducted in local markets and that were considered for this review.
We retrieved 92 studies without initial impediments, distributed in 41 countries. However, when filtering the quantitative studies, only 38 studies remained (Supplementary Material), distributed in 28 countries (Figure 2), highlighting Brazil (four studies) and South Africa (three studies).
The number of ethnobotanical studies conducted in open fairs, markets, and similar spaces has been growing gradually, with the last decade featuring the greatest expression in numerical terms (Figure 3). Of the 38 quantitative studies, 22 (57.9%) had a high risk of bias, ten (26.3%) had a moderate risk, and only six (15.8%) had a low risk of bias. The main reason for the high risk of bias found in the studies conducted in markets is the lack of information about the sample size (n) or the universe (U) when the study sample is taken from the totality of sellers in the region (Table 2).
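The counts reported in this section can be cross-checked with a short tally. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Screening stage: exclusions by impediment, as reported in the text.
screened = 253
exclusions = {
    "not conducted with sellers only": 94,
    "not about the topic": 39,
    "language other than English": 10,
    "no species list": 9,
    "other reasons": 9,
}
excluded = sum(exclusions.values())
without_impediments = screened - excluded

# Risk of bias among the quantitative studies.
risk_counts = {"high": 22, "moderate": 10, "low": 6}
quantitative = sum(risk_counts.values())
risk_shares = {
    level: round(100 * count / quantitative, 1)
    for level, count in risk_counts.items()
}

print(excluded, without_impediments, quantitative)  # 161 92 38
print(risk_shares)  # {'high': 57.9, 'moderate': 26.3, 'low': 15.8}
```

The exclusion reasons sum to the 161 excluded studies, leaving the 92 studies without impediments, and the three risk categories account for all 38 quantitative studies.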

Figure 3. Number of ethnobotanical studies conducted in local markets throughout the world according to the year of publication.
Most studies that fit the moderate risk category used rarefaction curves that stabilized or approached stabilization. However, these studies did not indicate the sizes of n and/or U. In the case of studies with a low risk of bias, the main reason for this classification was the selection of experts based on clear and well-established criteria.
Recent Studies Still Reproduce Sampling Bias
The year of publication did not interfere with the risk of bias attributed to the study (Table 3). Thus, the hypothesis that recent studies would have more adequate samples was not supported.
Studies from High-Impact Journals Are Not Free from Sampling Bias
We did not find evidence supporting this hypothesis since no relationships were found between the risk of bias of the article and its journal's CiteScore (Table 4). It would be expected that high-impact journals would have a more rigorous review process through which such sampling bias would not reach the published version of the article.
The Presence of a Research Question, Hypotheses, and Ethnobiological Indices Were Not Associated with Sampling Bias
H3, H4, and H5 could not be supported. Studies with a research question and without a research question were not different regarding the presence of a high risk of bias (p = 0.293). The same happened when comparing the studies with and without hypotheses (p = 0.727). The studies with and without ethnobiological indices were also not different regarding the risk of bias (p = 0.21).
Discussion
Why Do Studies regarding Sale of Medicinal Plants Show High Risk of Bias?
The prevalence of studies with a high risk of bias among surveys of medicinal plants conducted in markets raises concerns as the results may be biased. This high-risk classification also prevailed among the systematic reviews conducted in Brazil outside the market context (Lyra-Neves et al. 2015; Medeiros et al. 2014).
Table 2. Most frequent reasons for risk of bias in studies on medicinal plants sold in local markets.
Regarding the reasons for categorizing studies as having a high risk of bias, the absence of information about the sample or the universe was also highlighted in the Brazilian study on medicinal plants outside the market context (Medeiros et al. 2014), suggesting that merely presenting the data referring to the sample and the universe already provides quantitative studies with greater reliability. For the moderate-risk category, the principal reason for the classification was the use of species accumulation curves, which have been considered a viable and practical solution for ethnobiological studies (Williams et al. 2006). However, researchers who use rarefaction curves need to provide the sample and universe size. This was a major limitation in the studies analyzed here.
Table 3. Logistic regression results between the year of publication and the risk of bias for studies on medicinal plants sold in public markets.
Table 4. Logistic regression results between the CiteScore (as of 2021) of the journal where the article was published and the risk of bias for studies on medicinal plants sold in public markets.
That the presence of well-established intentional samples was the most common reason for the low-risk classification indicates that ethnobiologists tend to be more skilled at selecting respondents with techniques such as the snowball than at effectively using probability sampling. This may be related to the fact that such techniques usually require a smaller “n” than probability samples, facilitating their appropriate use. However, such approaches do not represent the broader population of sellers. In the case of intentional samples, it is therefore imperative to state the specific group that the small sample represents, and results can never be generalized to sellers who were not eligible to participate in the sample. Special attention must be given to the conclusion section of studies that employ snowball sampling, considering that it is not uncommon to find improper generalizations to the target population when the study is summarized.
Why Do Recent Studies Not Have Fewer Sampling Biases?
The fact that more recent studies are not free from sampling bias demonstrates that, despite the growing production of manuals on the technical and methodological aspects of ethnobiology (Albuquerque et al. 2008; Alexiades 1996; Cotton 1996; Cunningham 2001; Martin 1995) and the increase in the number of ethnobotanical programs and courses in universities (Hamilton et al. 2003), methodological rigor in sampling is still not routine in the design of most studies. This result also aligns with the observations of Brazilian studies outside the market context, signaling that recent studies are not free from sampling bias (Lyra-Neves et al. 2015; Medeiros et al. 2014). Furthermore, it is possible that the reproduction of sampling bias found in most of the studies reviewed here still reflects deficient training of ethnobiologists. Some researchers indicate the need for increasing investment in the continuous training of these professionals for the development of scientific writing skills, the proper application of scientific methods in their experimental projects, and the understanding of the philosophy of science (Albuquerque and Hanazaki 2009). The combination of these elements should increase the quality of science and, consequently, the quality of manuscripts before they go through reviewers' filters during the peer review process.
Why Does Sampling Bias Persist Even in High-Impact Journals?
Several factors pointed out by scientists may explain the persistence of sampling bias even in articles published in journals with a high impact factor. One is the heavy workload of reviewers, which arises when the demand for article evaluation exceeds the number of reviewers available in a given area while the number of scientific journals keeps growing (Albuquerque 2014). In this case, the quality of the reviewer's feedback may be compromised.
Furthermore, the reason for citing an ethnobiological study may often not be its theoretical contribution or research design. For example, it is common for articles with long lists of medicinal plants to be widely cited by studies that only mention them to support or exemplify the popular use of a species. Therefore, journals that usually publish studies with such lists may receive many citations regardless of research quality or design. In a health journal editorial, Hallberg (2012) argues that the calculation of the impact factor as it is proposed (the number of times that articles are cited during the year divided by the total number of citable articles in the same period) does not reveal the quality of research of published articles. The author also highlights that there are reasons that lead less refined papers to receive more citations.
Elaboration of Questions or Hypotheses and Use of Indices Were Not Associated with Reduction of Sampling Bias
The absence of a relationship between the risk of bias and the presence of questions and hypotheses reveals that sampling bias is widespread, regardless of the higher qualification of some research groups in designing studies with well-established questions or hypotheses. These results also come close to what was identified for the Brazilian study with medicinal plants (Medeiros et al. 2014). Therefore, studies with medicinal plants, either within or outside the market context, might be answering their questions or testing their hypotheses in a biased way based on weak samples. Furthermore, the absence of a relationship between the risk of bias and the presence of ethnobiological indices indicates that the importance of medicinal plants in the markets may be over- or underestimated due to sampling biases.
Final Considerations
The prevalence of studies with a high risk of bias in ethnobiological surveys of medicinal plants in the market context is an indication that ethnobiologists need to observe more effectively the design of their studies. Furthermore, scientific journals—including those with a higher impact factor—need to be prepared to select reviewers with skills that allow identifying sampling bias.
We highlight our study limitations as our findings may not be generalized to other domains in ethnobiology. Although our findings show a prevalence of biased samples in studies in local markets, our sample size is relatively small compared to ethnobiological literature on broader topics.
Overall, several studies present only isolated sources of sampling bias that could be solved with simple text changes, such as those that did not present the universe size or the criteria for selecting respondents. In these cases, the flaw may be more related to the suppression of important information during the writing process than to the design of the research. Nevertheless, authors and journal reviewers need to be more attentive and demand that information regarding the sampling strategy of the research is shown in the manuscript.
Acknowledgments
The present study was performed with the support of the Coordination for the Improvement of Higher Education Personnel—Brazil (CAPES)—Financing Code 001. We also acknowledge the National Council for Scientific and Technological Development (CNPq) for the productivity grant provided to P. Medeiros (302786/20163).
