Scientific consensus: how research works
Research is a slow and lengthy process. It is far from the popular picture in which a new discovery instantly becomes the new scientific truth. When a discovery is first made, it takes a while before it becomes part of the scientific consensus. The reason lies in how research works.
Research works differently depending on the field. The first case we will explore is fundamental science, which is rather easy to grasp. Here, we can distinguish two cases:
You take a conjecture from the literature, a statement believed to be true but not yet proven, and you construct a proof leading from the hypotheses to the conclusion. This is how some fields of mathematics work, for example.
You build a theory from a hypothesis and construct the proof. This is how theoretical physics works, for example. However, if the theory is not in a new field and overlaps with existing theories, it must account for everything the previous theory explained and perform better on other problems.
A famous example of this second case is Einstein’s theory of general relativity, which outperformed Newton’s theory of gravitation, which had itself outperformed Galileo’s description of the acceleration of falling bodies.
In fundamental science, the proof then only needs to be reviewed by other scientists, called peers, and if it isn’t flawed, a new theorem or formula is born.
Experimental & social science
In experimental science, things are quite different. As the name suggests, you need to run experiments.
To test your theory, you first need to establish a protocol. Then, you conduct your experiment according to that protocol to collect data. You often repeat the experiment several times to confirm the results: the more data you collect, the more reliable your results are.
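Why repetition helps can be shown with a small simulation. This is a minimal sketch under made-up assumptions: a hypothetical quantity with a true value of 10.0, measured with Gaussian noise of standard deviation 2.0. It repeats an "experiment" of n measurements many times and reports how widely the resulting averages scatter around the true value.

```python
import random
import statistics

TRUE_VALUE = 10.0  # hypothetical quantity being measured (assumption)
NOISE_SD = 2.0     # hypothetical measurement noise, as a standard deviation (assumption)

def measure(rng: random.Random) -> float:
    """One noisy measurement: the true value plus Gaussian noise."""
    return TRUE_VALUE + rng.gauss(0.0, NOISE_SD)

def spread_of_average(n: int, repeats: int = 2000, seed: int = 1) -> float:
    """Run `repeats` experiments of n measurements each and return the
    standard deviation of their averages (how much the result wobbles)."""
    rng = random.Random(seed)
    averages = [statistics.fmean(measure(rng) for _ in range(n))
                for _ in range(repeats)]
    return statistics.stdev(averages)

for n in (1, 4, 16, 64):
    print(n, round(spread_of_average(n), 2))
```

The spread shrinks roughly like NOISE_SD / sqrt(n). Every fourfold increase in the number of measurements halves it, so the average lands closer and closer to the true value.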
Social science is very similar to experimental science in its methodology. You build a model or a protocol for your study, then collect your data and analyse it.
In both, you need to study the existing literature to situate your work within it and see whether your results improve on earlier ones. The work can then be accepted by the scientific community through peer review.
The importance of the p-value
To interpret data, scientists use statistical models chosen according to the kind of data they have. Some models are suitable for certain kinds of data but not for others.
The last statistical tool generally applied is a metric called the p-value, which is used to judge whether the results could plausibly be explained by chance alone.
The p-value is often misunderstood, so we’ll take a little time to explain it.
The “p” in p-value stands for probability. The p-value is the probability of obtaining results at least as extreme as the ones observed, assuming that the null hypothesis (the hypothesis that there is no real effect and that the results are due to chance alone) is true. The smaller the p-value, the less compatible the data are with the null hypothesis. Below a threshold chosen in advance, the null hypothesis is rejected.
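A permutation test makes this definition concrete. It is one simple way, among many, to compute a p-value. In the sketch below, the two groups of measurements are invented numbers and the test statistic is the difference between the group means. If the null hypothesis is true, the group labels carry no information. The test therefore shuffles the labels many times and counts how often a shuffled difference is at least as extreme as the observed one.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Estimates P(a difference at least this extreme | null hypothesis true)
    by randomly reassigning the group labels.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel at random, as if there were no real effect
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
    # The +1 counts the observed labelling itself and avoids a p-value of exactly 0.
    return (extreme + 1) / (n_shuffles + 1)

# Invented data for a hypothetical control group and treated group.
control = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]
treated = [5.6, 5.8, 5.4, 5.9, 5.5, 5.7]
print(permutation_p_value(control, treated))
```

With these made-up values, every treated measurement is larger than every control one. Only about 2 of the 924 possible ways of splitting the twelve values reach the observed difference, so the estimated p-value comes out at roughly 0.002.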
You might wonder why we need such a thing. Why is chance involved in science at all?
In fact, any data collection contains noise, random variation that blurs the data. It comes from the measuring instruments, the experimenter, the subjects of the study, the conditions in which the experiment takes place and a myriad of other uncontrollable factors. These sources can be limited but never entirely eliminated. That is why the p-value exists: to assess whether this randomness alone could plausibly produce the observed results.
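Noise alone regularly produces results that look meaningful. This minimal sketch runs many simulated "experiments" in which there is no real effect and counts how many still reach p < 0.05. Its assumptions are that both groups come from the same normal distribution with a known standard deviation of 1 and that a two-sided z-test is used.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_test_p_value(a, b, sd=1.0):
    """Two-sided z-test for a difference of means, assuming the noise
    standard deviation `sd` is known (an assumption of this sketch)."""
    se = sd * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_alarm_rate(n_experiments=2000, n_per_group=20, alpha=0.05, seed=42):
    """Simulate experiments where the null hypothesis is true and return
    the share that nevertheless reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]  # same distribution: no effect
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / n_experiments

print(false_alarm_rate())
```

The printed share hovers around 0.05. Even when nothing is going on, about one experiment in twenty crosses the 0.05 threshold purely because of noise.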
However, be very careful. A p-value of 0.05 does not mean that there is a 5% chance that the results are due to chance alone (that is, a 5% chance of a false positive). It means that, if the null hypothesis were true, there would be a 5% chance of obtaining these results or more extreme ones. As a consequence, the probability that such a result is a false positive is generally higher than 5%, and it can be much higher.
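A back-of-the-envelope calculation shows how the share of false positives among "significant" results can exceed 5%. The inputs below are assumptions chosen for illustration, not figures from the literature. Suppose only 10% of the hypotheses tested in a field are actually true, and studies have 80% power, that is, an 80% chance of detecting a real effect. Results are declared significant when p < 0.05.

```python
def false_positive_risk(prior_true: float, power: float, alpha: float = 0.05) -> float:
    """Share of 'significant' results (p < alpha) that are false positives.

    prior_true: fraction of tested hypotheses that are really true (assumed)
    power:      probability that a study detects a real effect (assumed)
    alpha:      significance threshold
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha  # null is true, yet p < alpha by chance
    return false_positives / (true_positives + false_positives)

# Illustrative assumptions only.
print(round(false_positive_risk(prior_true=0.10, power=0.80), 2))  # 0.36
```

Under these assumptions, about a third of the significant results are false positives, far more than 5%. This calculation counts every result below the threshold. For a result sitting right at p = 0.05, the risk is higher still.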
The p-value threshold researchers aim for depends on their field and on how strong the evidence needs to be. For instance, in biology, medicine or psychology, papers typically require a p-value below 0.05, 0.01 or 0.005. In particle physics, the thresholds are far stricter. The p-value must be below about 0.0013 (“3 sigma”) to report evidence for a particle, and below about 0.0000003 (“5 sigma”) to report its discovery.
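Particle physicists usually express these thresholds in standard deviations ("sigma") of a normal distribution. The sketch below converts sigma levels to p-values with Python's standard library. It prints both the one-sided tail, which is the usual particle-physics convention, and the two-sided value, since both figures circulate.

```python
import math

def one_sided_p(z: float) -> float:
    """P(Z > z) for a standard normal Z, computed as erfc(z / sqrt(2)) / 2."""
    return math.erfc(z / math.sqrt(2)) / 2

def two_sided_p(z: float) -> float:
    """P(|Z| > z): both tails of the standard normal distribution."""
    return 2 * one_sided_p(z)

for sigma in (3, 5):
    print(f"{sigma} sigma: one-sided p = {one_sided_p(sigma):.2g}, "
          f"two-sided p = {two_sided_p(sigma):.2g}")
```

For 3 sigma, this prints about 0.0013 (one-sided) and 0.0027 (two-sided). For 5 sigma, it prints about 2.9e-07 and 5.7e-07. The one-sided 5 sigma value is where the 0.0000003 discovery threshold comes from. The 0.003 figure sometimes quoted for 3 sigma is the rounded two-sided value.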
The protocol determines the strength of a study
Moreover, there is usually more than one way to address a given scientific question: several protocols or experiments are possible.
Some are faster or easier to conduct, depending on the techniques and technology used. But this comes at a price: they are generally less robust.
For instance, in medical research, the randomised double-blind study is considered one of the most robust study designs. In such a study, participants are randomly assigned to groups, and neither they nor the experimenters know who receives which treatment.
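Randomisation and blinding can be illustrated with a small allocation sketch. Everything in it is hypothetical: the participant IDs, the two arms ("treatment" and "placebo") and the opaque kit codes. The idea is that the list linking codes to arms is generated and kept by a third party. The experimenters and participants only ever see the codes.

```python
import random

def blinded_allocation(participant_ids, seed=None):
    """Randomly assign participants to a 'treatment' or 'placebo' arm.

    Returns (staff_view, key):
      staff_view maps each participant to an opaque kit code only;
      key maps each kit code to its arm and is meant to stay with a third
      party until the analysis, so that neither participants nor
      experimenters know who receives what (double blinding).
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    # Balanced arms; with an odd number of participants, the extra arm is random.
    arms = ["treatment", "placebo"] * (len(ids) // 2)
    if len(ids) % 2:
        arms.append(rng.choice(["treatment", "placebo"]))
    rng.shuffle(arms)  # random assignment
    staff_view, key = {}, {}
    for index, (pid, arm) in enumerate(zip(ids, arms)):
        code = f"KIT-{index:03d}"  # the code itself carries no information about the arm
        staff_view[pid] = code
        key[code] = arm
    return staff_view, key

participants = [f"P{n:02d}" for n in range(1, 11)]  # ten hypothetical participants
staff_view, key = blinded_allocation(participants, seed=7)
print(staff_view["P01"])  # only a kit code, never the arm
```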
Likewise, a protocol with more participants or more repetitions is more robust, because it reduces the influence of the noise described earlier. On the other hand, a study with few participants struggles to distinguish noise from a real effect, so its results should be treated with caution even when the p-value is very low.
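The effect of sample size can be quantified with statistical power, the probability that a study detects an effect that really exists. This sketch uses a normal approximation for comparing two group means. It assumes a true effect of half a standard deviation, a purely illustrative value, and a two-sided threshold of 0.05.

```python
import math
from statistics import NormalDist

def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the true difference in means divided by the noise
    standard deviation (assumed known).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    shift = effect_size * math.sqrt(n_per_group / 2)  # expected z-statistic given the effect
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (10, 30, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, power climbs from about 0.2 with 10 participants per group to about 0.8 with 64 and above 0.9 with 100. A small study will therefore miss most real effects of this size.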
How do we get a scientific consensus?
To summarise, scientific papers vary in robustness, and even the most robust ones can report false positives. So how do we know what should count as current scientific knowledge?
Well, a single paper is almost never conclusive on its own. It only makes sense when considered together with the whole literature on the subject.
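One formal way of weighing a paper against the rest of the literature is a meta-analysis. The sketch below is a fixed-effect, inverse-variance meta-analysis, one standard technique among several. Random-effects models are often preferred when studies differ a lot. The three studies are invented, and each is summarised by an estimated effect and its standard error. More precise studies receive more weight.

```python
import math
from statistics import NormalDist

def fixed_effect_meta(estimates, std_errors):
    """Pool study estimates with inverse-variance weights.

    Returns (pooled_estimate, pooled_standard_error, two_sided_p_value).
    """
    weights = [1 / se ** 2 for se in std_errors]  # precise studies weigh more
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, p

# Three invented studies of the same effect: a small, noisy one with a large
# estimate and two larger, more precise ones with smaller estimates.
estimates = [0.80, 0.20, 0.25]
std_errors = [0.40, 0.10, 0.12]
print(fixed_effect_meta(estimates, std_errors))
```

On its own, the small study looks striking (z = 2, so p < 0.05). The pooled estimate, about 0.24, sits close to the two precise studies instead. This is the sense in which a single paper only makes sense in combination with the others.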
Some papers reach similar conclusions, while others point in a different direction. Only when enough robust studies point in the same direction does a scientific consensus finally emerge. This is why, on a new subject, a theory or result is sometimes accepted at first and then refuted as research continues and other results come in.
So remember this the next time you come across a paper announcing a groundbreaking discovery: other research projects will need to back it up by corroborating its results.
After all, science is a collective effort, and no single scientist or team of scientists holds the whole truth.
Çaparlar CÖ, Dönmez A. “What is Scientific Research and How Can it be Done?” Turk J Anaesthesiol Reanim. 2016;44(4):212-218. doi:10.5152/TJAR.2016.34711
Washington University in St. Louis. “Researching in the Social Sciences: Conducting Research.” Research Guides. https://libguides.wustl.edu/c.php?g=47166&p=2848790
Wasserstein RL, Lazar NA. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician. 2016;70(2):129-133. doi:10.1080/00031305.2016.1154108