THE NULL HYPOTHESIS IS ALWAYS REJECTED WITH STATISTICAL TRICKS: WHY DO YOU NEED IT?

Ferguson (2015) observed that the proportion of studies supporting the experimental hypothesis and rejecting the null hypothesis is very high. This paper argues that the reason for this scenario is that researchers in the behavioral sciences have learned that the null hypothesis can always be rejected if one knows the statistical tricks to reject it (e.g., the probability of rejecting the null hypothesis is higher with alpha = 0.05 than with alpha = 0.01). Examples of the advancement of science without the need to formulate the null hypothesis are also discussed, as well as alternatives to null hypothesis significance testing (NHST), such as effect sizes, and the importance of distinguishing the statistical significance from the practical significance of results.


BRIEF HISTORY OF THE NULL HYPOTHESIS CONTROVERSY
Cohen (1994) observed that "the ritualization of null hypothesis significance testing [NHST] to the point of meaninglessness and beyond…has not only failed to support the advance of psychology as a science but also has seriously impeded it" (p. 997). In his article "Things I Have Learned (So Far)," Cohen (1990) argued, "the null hypothesis is always false in the real world" (p. 1308, italics in original text). Cohen's arguments have been voiced for over 50 years (see Nix & Barnette, 1998, p. 4). For example, Rozeboom (1960) was among the first scholars to discuss the fallacy of the null hypothesis significance test. Four years later, Wilson and Miller (1964) discussed the inconclusiveness of accepting the null hypothesis. In 1997, the journal Psychological Science devoted an entire issue to the controversy surrounding significance tests, including a discussion on banning versus not banning the formulation of the null hypothesis and an emphasis on P values (Abelson, 1997a, 1997b; Harris, 1997; Hunter, 1997; Shrout, 1997; Scarr, 1997). Harlow, Mulaik, and Steiger (1997) edited a book summarizing the controversy regarding the question "What if there were no significance tests?" Among the contributors, Schmidt and Hunter (1997) discussed eight false objections to the discontinuation of significance testing. For example, Schmidt and Hunter argued that it is not true that "significance tests are essential because without them we would not know whether a finding is real or just due to chance" (p. 3). In the same edited book by Harlow et al. (1997), Mulaik, Raju, and Harshman (1997) disagreed with Schmidt and Hunter (1997) and entitled their chapter "There is a time and place for significance testing" (p. 65).
Further discussions regarding criticism of statistical tests from 1940 to the present and proposals to ban significance testing from 1990 to the present can be found in Chavalarias, Wallach, Li, and Ioannidis (2016), Chow (1988), Gliner, Leech, and Morgan (2002), Goodman (2008), Goodman and Royall (1988), Kline (2013), Kyriacou (2016), Nix and Barnette (1998), Spiegelhalter et al. (2000), and Stang et al. (2010).

The Prevalence of the Null Hypothesis in Scientific Research
Despite the above controversy over the role of hypothesis testing in scientific research, researchers still sense the demand to formulate hypotheses during the planning of a study. In this context, Cohen (1990) made the following point: "if the null hypothesis is always false, what's the big deal about rejecting it?" (p. 1308, italics added). The answer is that researchers are aware of the publication bias for significance (see Kline, 2013, p. 11), in the sense that the majority of editors and reviewers in peer-reviewed journals only agree to publish articles rejecting the null hypothesis (e.g., p < .05); negative results are rarely published in such journals. For example, Ferguson (2015) observed, "the proportion of studies published in psychological science that support the authors' [a] priori hypotheses appear to be unusually high" (p. 529). The reason for Ferguson's observation is that researchers know that null findings (e.g., p > .05) are not published in peer-reviewed journals that enforce the hypothesis testing approach (see Nix & Barnette, 1998, p. 4). In addition, when researchers apply for grant monies they know that they must formulate hypotheses and describe the methodology leading to statistical significance. To deal with this research bias for significance and to ensure that a given grant application has a good chance of being approved (e.g., by the National Institutes of Health, NIH), researchers must play what Kline (2013) termed "the significance game, which goes like this: Write application. Promise significance [e.g., p < .05, or better p < .001]. Get money, collect data until significance is found, which is virtually guaranteed because any effect that is not zero needs only a large enough sample in order to be significant" (p. 24).
Another factor that explains the prevalence of the null hypothesis in scientific research is that researchers who need peer-reviewed publications for promotion across academic ranks (e.g., assistant to associate professor) and for tenure feel the pressure "to convert no significant findings into statistically significant ones" (Ferguson, 2015, p. 530) because they know that if they do not respond appropriately to the publication bias for significance, the study will not be published. Fulfilling this type of conversion, however, is not a major issue for researchers because they also know about statistical tricks they can use to ensure the null hypothesis is rejected. This is a point missed in Ferguson's (2015) paper. This paper suggests that the best approach to dealing with publication and research bias for significance, and with the demand to convert nonsignificant findings into significant findings leading to the rejection of the null hypothesis, is to be sure that statistical tricks are used. These tricks increase the chance that the article is published in journals that only review and accept articles in which the assumptions of Type I and Type II errors are met (see Kline, 2013, p. 11; Nix & Barnette, 1998, pp. 5-6; Rodriguez Arias, 2005). In the "fight" between the null hypothesis and the experimental hypothesis, the task of researchers who believe that science cannot advance without hypotheses is to be sure that they do not incorrectly reject a null hypothesis that is true (Type I error) and that they do not fail to reject the null hypothesis when it is actually false (Type II error). Researchers, however, should not be afraid of such errors because, as noted above, Cohen (1990) alerted researchers that the null hypothesis could always be falsified.

The Distinction Between Hypotheses in General and the Null Hypothesis
The next section regarding statistical tricks deals with the null hypothesis and not with the formulation of a general hypothesis. A general hypothesis does not need to deal with the debate surrounding significance testing (p values) and, consequently, researchers in this context do not need to be worried about using statistical tricks to ensure that they meet the assumptions of Type I and Type II errors. For example, in the applied behavior analysis approach (Paniagua, 2001), the researcher would hypothesize that an emotionally disturbed child would show significant problem behaviors during the baseline (A) condition but would behaviorally improve when the treatment (e.g., a token economy program) is introduced during the B phase in a reversal experimental design (Hersen & Barlow, 1976; Kazdin, 1980; Kratochwill & Levin, 2014). If during a return to the A phase (second baseline) the child again shows problem behaviors but again improves when phase B is re-introduced (in an A-B-A-B reversal design), the researcher would argue that his or her prediction (hypothesis) regarding the effectiveness of the B phase was confirmed. In this example, the researcher does not use the significance testing approach to show that findings support the main prediction (hypothesis). The recording of the frequency of problem behaviors over several baseline sessions and the absence of such problems during the B phase are the two conditions in the reversal experimental design that allow a conclusion about the effectiveness of the B phase and the confirmation of the general hypothesis (Kazdin, 1980). When the hypothesis is formulated in a "null" condition, the researcher would test this hypothesis against the experimental hypothesis. In this case, the task of the researcher is to prove that the null hypothesis is wrong (i.e., Type I error = 0).
In the above example, the researcher would include a control group (no B phase) and an experimental group (B phase), and would hypothesize that experimental children would show more improvement in decreasing problem behaviors relative to children in the control group. In this example, the null hypothesis significance testing (NHST) approach would be used with the specific goal of showing that the null hypothesis is false with the help of the statistical tricks described below.

Examples of Statistical Tricks
Selection of Alpha.
If t = 1.920 and df = 11, the null hypothesis would not be rejected with alpha = 0.01 in a one-tailed test (critical value = 2.718). The trick here is to use alpha = 0.05 with df = 11, which gives a critical value of 1.796 in a one-tailed test. In this example, with alpha = 0.01 in a two-tailed test one needs at least t = 3.106 to reject the null hypothesis. If the researcher fails to use this trick, the chance of publishing the research findings is near 0%, particularly in peer-reviewed journals that only accept positive findings (i.e., Type I error = 0%; see Cohen, 1994, p. 1000). So, if the goal is to "convert no significant findings into significant ones" (Ferguson, 2015, p. 530), this goal may be achieved by changing alpha until statistically significant results are found.
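The dependence of the critical value on alpha can be checked numerically. The following is a minimal sketch in Python that uses only the standard library and integrates the t density numerically instead of calling a statistics package; the helper names (t_pdf, t_upper_tail, t_critical) are illustrative assumptions and are not part of the original analysis. The printed values should agree with standard t tables to about three decimals.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the interval [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df, tails=1):
    """Critical t whose upper-tail area equals alpha / tails, found by bisection."""
    target = alpha / tails
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid   # tail area still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2

df = 11
for alpha in (0.05, 0.01):
    for tails in (1, 2):
        print(f"alpha = {alpha}, tails = {tails}: critical t = {t_critical(alpha, df, tails):.3f}")
```

For a fixed observed t, a larger alpha lowers the bar the statistic has to clear, which is the whole of the "trick" described above.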

One-Tailed versus Two-Tailed Test.
It is also more difficult to reject the null hypothesis in a two-tailed test. Therefore, the trick here is to avoid using a two-tailed test to reject the null hypothesis (see Kline, 2013, p. 71). For example, if the t value is 1.920, alpha = 0.05, df = 11, and a two-tailed test is used, the t value would have to reach a critical value of 2.201 to reject the null hypothesis, so the result is not significant. In contrast, with the same t value, alpha, and df in a one-tailed test, the critical value is only 1.796 and the null hypothesis is rejected. Therefore, if you conducted multiple experiments and feel the "pressure" to report only statistically significant results (see Ferguson, 2015, p. 530), you would convert your statistical analyses into a one-tailed test and adjust alpha until the null hypothesis is rejected. Researchers would have no problem implementing this trick if they know how "statistics can be potentially manipulated to produce statistically significant but absurd results" (Ferguson, 2015, p. 530; see also Simmons, Nelson, & Simonsohn, 2011).
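The same comparison can be expressed with p values instead of critical values. The sketch below is a standard-library Python illustration under the values used in the text (t = 1.920, df = 11); the helper names t_pdf and t_upper_tail are assumptions introduced for this example, and the tail area is obtained by numerical integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the interval [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_obs, df, alpha = 1.920, 11, 0.05
p_one = t_upper_tail(t_obs, df)   # area in one tail only
p_two = 2 * p_one                 # both tails, by symmetry of the t distribution

print(f"one-tailed p = {p_one:.3f}, reject at alpha = {alpha}: {p_one < alpha}")
print(f"two-tailed p = {p_two:.3f}, reject at alpha = {alpha}: {p_two < alpha}")
```

Because the two-tailed p value is exactly twice the one-tailed value, any result with a one-tailed p between alpha/2 and alpha changes its verdict when the test is switched.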

Sample Size, Effect Size, and Power.
If the original experiment did not reject the null hypothesis, another trick is to repeat the same experiment with a larger sample. The assumption is that increasing the size of the sample increases the probability of rejecting the null hypothesis. As noted earlier, the rejection of the null hypothesis with statistically significant results "is virtually guaranteed because any effect that is not zero needs only a large enough sample in order to be significant" (Kline, 2013, p. 24). This trick works if reviewers in peer-reviewed journals agree that the difference between group means is substantial. If reviewers determine, however, that the difference between group means is trivial or very small, the study may be rejected because it claimed Type I error = 0% with such trivial findings, regardless of the size of the sample. Under this critique from peer reviewers, the next trick is to show that the statistical test used to reject the null hypothesis had "power" (Cohen, 1988, 1990, 1994; Lipsey, 1990; Sullivan & Feinn, 2012). For example, to use the "power" trick with a t-test conducted on two independent group means, the researcher would calculate the effect size index (Cohen's d; Cohen, 1988) and then check power tables (e.g., Cohen, 1988) to find the power of the test, that is, the probability that the test correctly rejects a false null hypothesis. In the case of t = 1.920 derived from two independent group means with alpha = 0.05, Cohen (1988) suggests .20, .50, and .80 as benchmarks for small, medium, and large d, respectively. Therefore, reviewers in peer-reviewed journals would be happy that, despite the fact that the study claimed Type I error = 0% with trivial differences between group means, the study also demonstrated the "power" of the statistical test used to reject the null hypothesis.
Therefore, studies with a large sample size and a power calculation have a better chance of being accepted in peer-reviewed journals enforcing the hypothesis testing approach.
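The relation between sample size, effect size, and power can be sketched as follows. This is a rough illustration in standard-library Python, not Cohen's (1988) tabled procedure: power is approximated with the normal distribution rather than the noncentral t distribution, the function names cohens_d and approx_power are assumptions made for this example, and the effect size d = 0.05 is a hypothetical "trivial" difference.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)   # expected size of the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# A trivial effect becomes almost certain to reach significance as n grows.
for n in (20, 200, 2000, 20000):
    print(f"d = 0.05, n per group = {n}: approximate power = {approx_power(0.05, n):.2f}")
```

With the effect held fixed, the only quantity that changes across the rows is the sample size, which is why a nonzero but trivial difference can always be made statistically significant.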

Rejecting the Null Hypothesis is a Temporal Event
It is important to observe that the rejection of the null hypothesis is most likely a temporal event until independent researchers replicate the same study. In this context, Domenech (2018) observes that an uncertainty in the hypothesis testing approach is "the low probability to reproduce a P value after an exact replication of the [original] experiment" (p. 1184). For example, the Open Science Collaboration includes researchers from many academic settings and countries. In 2011, the Open Science Collaboration (2015) conducted a review of 100 replications of previously published studies. These studies were published in Psychological Science, Journal of Personality and Social Psychology, and Journal of Experimental Psychology: Learning, Memory, and Cognition. Among other results, 97% of original studies reported statistically significant results (i.e., rejecting a null hypothesis that is actually false), but only 36% of the replications reported statistically significant results. Because only about 1% of all published studies are replicated and published in peer-reviewed scientific journals (see Kline, 2013, p. 269), this means that 99% of original studies published in a given year report temporal statistically significant findings until such studies are replicated and published to show the stability of such findings over time.
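The instability of p values across exact replications can be illustrated with a small simulation. The sketch below is a hypothetical example in standard-library Python and does not use data from the Open Science Collaboration: the true effect (d = 0.4), the sample size (50 per group), the known unit variance, and the function name replicate_once are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def replicate_once(rng, true_d, n_per_group):
    """Simulate one two-group study with unit-variance scores; return its two-tailed p value."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    diff = sum(treated) / n_per_group - sum(control) / n_per_group
    z = diff / math.sqrt(2 / n_per_group)      # z test with known variance of 1
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(2015)                      # fixed seed so the run is repeatable
p_values = [replicate_once(rng, true_d=0.4, n_per_group=50) for _ in range(1000)]
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)

print(f"smallest p = {min(p_values):.4f}, largest p = {max(p_values):.4f}")
print(f"share of exact replications with p < .05: {share_significant:.2f}")
```

Under these assumptions the effect is real in every simulated study, yet only a portion of the exact replications reach p < .05, so a single significant result says little about what the next replication will show.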

Statistical Significance versus Practical Finding
Another important critique of the emphasis on null hypothesis significance testing is that statistically significant results (i.e., rejecting a null hypothesis that is actually false) do not necessarily mean that such results have practical value in society (Gliner et al., 2002; Kirk, 1996; see also Kline, 2013, p. 10). For example, in a very well-planned study investigating the effect of Method A to teach English to Latino/a children versus the standard lecturing of this language, researchers find a statistically significant difference (p < .05) between both conditions and then suggest to school districts that Method A should be implemented in all schools to appropriately teach English. The costs of enforcing Method A in all schools, however, may prevent such districts from following the researchers' recommendation. The study, however, is published in a peer-reviewed journal because it rejected the null hypothesis and not because Method A is a practical strategy for teaching English.
The observation that statistically significant results (e.g., p < 0.05, p < 0.001) do not necessarily imply practical significance (Goodman, 2008) can also be applied in the case of effect size results. A given treatment for a health problem may result in a large effect size (e.g., .80) in terms of Cohen's (1988) recommendations, but without practical significance. For example, the treatment is too expensive to be implemented, and although it was very effective with a sample selected from the population, it does not produce the expected results in the community or its effect cannot be generalized to the population of individuals diagnosed with that health problem. On the other hand, the effect size in a second experiment may be small (e.g., .20, in terms of Cohen's d; Cohen, 1988), but very well received by the community because its implementation is within the budget of the family dealing with that health problem or the clinic serving individuals with the same health problem. For example, Gliner et al. (2002) reported a study investigating the effects of aspirin on heart attacks. Subjects who took aspirin were less prone to have a heart attack, in comparison with subjects who took a placebo. The effect size, however, was small (0.34). Gliner et al. (2002) concluded that "although this effect size is considered to be small, the practical importance was high, because of both the low cost of taking aspirin and the importance of reducing myocardial infarction" (p. 87).

Alternative to Null Hypothesis Testing
Some researchers suggest that an emphasis on p values should be replaced with an emphasis on effect sizes, confidence intervals, and Bayesian inductive reasoning (Abelson, 1997a, 1997b; Berry, Coustere-Yakir, & Grover, 1998; Burton, Gurrin, & Campbell, 1998; Chavalarias, Wallach, Li, & Ioannidis, 2016; Erceg-Hurn & Mirosevich, 2008; Kline, 2013; Kyriacou, 2016; Spiegelhalter, Myles, Jones, & Abrams, 2000; Stang, Poole, & Kuss, 2010; Sullivan & Feinn, 2012). The emphasis on effect sizes is supported by the APA Publication Manual when it states, "for the reader to appreciate the magnitude or importance of a study finding, it is almost always necessary to include some measure of effect size" (APA, 2010, p. 34). In 1999, the American Psychological Association Task Force on Statistical Inference considered a ban on the use of null hypothesis significance testing (NHST; Wilkinson & the APA Task Force on Statistical Inference, 1999), but strong opposition from researchers prevented the enforcement of such a ban. This task force, however, recommended that researchers should "always provide some effect size estimate when reporting a p value" (p. 399). Although researchers can calculate effect sizes in most studies, Kline (2013) observed that it might be very difficult to calculate effect sizes in some research activities "such as when the scores are ranks or are presented in complex hierarchically structured designs" (p. 14). Gliner et al. (2002) observe that some researchers propose to replace NHST with an emphasis on confidence intervals because "confidence intervals provide more information than a significance test and still include information necessary to determine statistical significance" (p. 84; see also APA, 2010, p. 34). Other researchers suggest, "both significance testing and confidence interval estimation can serve and have served very useful functions for the analysis of public health and biomedical data" (Woolson & Kleinman, 1989, p. 423).
Abelson (1997a), however, suggested that confidence intervals are "a good but not perfect alternative" (p. 119). In another article, Abelson (1997b) suggests that confidence intervals are "a good idea, but not a cure-all" (p. 13), and then observed, "despite the benefits of confidence limits [intervals], we will not solve all [NHST] problems by this one stroke. In seeing whether the confidence limits [intervals] include the zero point, some troublemaker will proceed to fatten his list of systematic results by using 93% confidence limits [intervals] instead of 95% limits. This is equivalent to using the .07 level instead of .05. Indeed, under the Law of Diffusion of Idiocy, every foolish application of significance testing will beget a corresponding foolish practice for confidence limits" (p. 13). Kyriacou (2016) observes that "Bayesian inductive reasoning is the ability to quantify the amount of certainty in terms of known or estimated conditional probabilities based on information obtained and included in Bayesian calculations" (p. 114). The major limitation of this approach "is that prior information is often unknown or not precisely quantified, making the calculation of posterior probabilities potentially inaccurate" (Kyriacou, 2016, p. 114).
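Abelson's point about narrowing the confidence level can be made concrete with a short sketch. The example below is standard-library Python under stated assumptions: the estimate (a mean difference of 1.9) and its standard error (1.0) are hypothetical numbers, the interval is a normal-theory interval, and the function name z_interval is introduced only for this illustration.

```python
from statistics import NormalDist

def z_interval(estimate, std_error, level):
    """Two-sided normal-theory confidence interval at the given confidence level."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z_crit * std_error, estimate + z_crit * std_error

estimate, std_error = 1.9, 1.0   # hypothetical mean difference and its standard error

for level in (0.95, 0.93):
    low, high = z_interval(estimate, std_error, level)
    excludes_zero = low > 0 or high < 0
    print(f"{level:.0%} interval: ({low:.2f}, {high:.2f}), excludes zero: {excludes_zero}")
```

Checking whether a 93% interval excludes zero is the same decision rule as testing at alpha = 1 - .93 = .07, so the interval inherits the manipulation that was possible with the significance level.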

The Advancement of Science Without the Null Hypothesis and Significance Testing
As noted earlier, Schmidt and Hunter (1997) discussed the false argument that if we do not use the null hypothesis significance testing (NHST) approach "we would no longer have a science" (p. 3). Schmidt and Hunter (1997) observe, "most researchers in the physical sciences [e.g., physics, astronomy, chemistry] regard reliance on significance test as unscientific" (p. 7, italics added). In the physical sciences, researchers do formulate general hypotheses, but they do not emphasize significance tests and P values and are not worried about Type I and Type II errors (see the above discussion regarding hypotheses in the general sense versus the null hypothesis). In such sciences, hypotheses are tested via direct observations of the event under study and the variables that influence that particular event. Schmidt and Hunter (1997) illustrated this point with Einstein's general theory of relativity, which predicted (hypothesized) that if "light passes a massive body [like the sun], it would bend" (p. 7). In 1919, Sir Arthur Eddington photographed a total eclipse of the sun and "measured the amount of bending in light produced by its passing the sun…the measured amount of bending corresponded to the figure predicted by Einstein's general theory, and so the hypothesis was confirmed… [and] no significance tests were used" (Schmidt & Hunter, 1997, p. 7). Because a null hypothesis was not formulated in this example, researchers were not worried about rejecting it with the statistical tricks described above.
In the context of the behavioral sciences, perhaps the best example of the advancement of science without the need to emphasize null hypothesis significance testing (NHST) is the special branch of experimental psychology that the late Harvard University professor B. F. Skinner termed the Experimental Analysis of Behavior (Catania, 1984). This experimental approach is also termed operant conditioning because Skinner's interest was the study of behavior "defined by its consequences" (Skinner, 1969, p. 127) rather than an emphasis on responses termed "reflexes" in the classical conditioning paradigm (Kuhn, 1962), also known as Pavlovian conditioning (Catania, 1984; Paniagua, 2001). Skinner used pigeons and white rats as experimental subjects, and demonstrated that organisms could learn and maintain complex behaviors over time with the experimental manipulation of antecedents and consequences. Skinner and his students (e.g., Nathan Azrin, Charles Catania, and Charles Ferster, among others) developed this experimental approach without both the formulation of the null hypothesis and the NHST approach (Ferster & Skinner, 1957; Skinner, 1938, 1961). These researchers also created their own peer-reviewed journal, the Journal of the Experimental Analysis of Behavior (JEAB), because they could not find journals at that time interested in publishing articles without the formulation of hypotheses. A summary of Skinner's contributions to experimental psychology can be found in Paniagua (2001, pp. 33-38).
In JEAB, the emphasis was on basic research or experiments leading to the discovery of new principles, techniques, and methods to explain the development of new behaviors and how to maintain and generalize them over time. The application of Skinner's basic research findings "in the functional analysis and assessment of adaptive and maladaptive behavior among people resulted in a new field called Applied Behavior Analysis [ABA] or Behavior Modification" (Paniagua, 2001, p. 37, italics added; see also Paniagua, 2018; Cooper, Heron, & Heward, 2007). Similar to the Experimental Analysis of Behavior approach, researchers in the field of Applied Behavior Analysis also conducted applied research without an emphasis on the null hypothesis significance testing (NHST) approach. Some early applications of Skinner's basic research findings were published in JEAB (e.g., Ayllon & Michael, 1959), but this journal was exclusively devoted to the publication of basic research and not applied research with emphasis on the ABA approach. During early applications of Skinner's basic research findings, some behavior analysts were lucky enough to publish their applied research findings in non-Skinnerian journals. For example, Fuller (1949) published a paper in the American Journal of Psychology entitled "Operant conditioning of a vegetative human organism." Williams (1959) used the extinction technique (developed in operant basic research) to eliminate tantrum behavior, and the study was published in the Journal of Abnormal and Social Psychology. Brady and Lind (1961) published an article in the Archives of General Psychiatry demonstrating the role of operant conditioning techniques in the management of hysterical blindness.
Over time, however, applied behavior analysts encountered significant problems publishing their applied research findings with emphasis on Skinner's methodology because they did not formulate hypotheses, did not consider the NHST approach in the analysis of results, and did not emphasize between-group experimental designs (i.e., control versus experimental subjects; see Paniagua, 2001, p. 37). As in Skinnerian basic research, applied behavior analysts investigate the effectiveness of a particular applied behavior analysis treatment or intervention (e.g., token economy program, extinction technique, differential reinforcement of incompatible behavior, overcorrection technique; see Paniagua, 2018, pp. 83-95) with a single subject, and the results are analyzed with the so-called single-case research designs or intrasubject-replication designs, including, for example, reversal designs (A = baseline, B = intervention or treatment, A = return to baseline), multiple-baseline designs across subjects, behaviors, or settings (Hersen & Barlow, 1976; Kazdin, 1980; Kratochwill & Levin, 2014; Paniagua, 2018, pp. 95-100), and multiple-baseline designs across exemplars (Paniagua, 1990a). Therefore, in 1968 applied behavior analysts also created their own journal to be able to publish articles without the null hypothesis and significance testing: the Journal of Applied Behavior Analysis (JABA). A review of articles published in JABA shows the significant scientific contributions of such articles to psychology, without the need to worry about Type I and Type II errors in the null hypothesis significance testing approach (e.g., Chapman, Fisher, Piazza, & Kurtz, 1993; Derby, Hagopian, Fisher, Richman, Augustine, Fahs, & Thompson, 2000; Ellingson, Miltenberger, Stricker, Garlinghouse, Roberts, Galensky, & Rapp, 2000; Hanley, Iwata, & McCord, 2003; Iwata, Dorsey, Slifer, Bauman, & Richman, 1994).
Examples of the author's scientific contributions with emphasis on the applied behavior analysis approach, without the formulation of hypotheses but with an emphasis on the Skinnerian paradigm (Kuhn, 1962) and single-case research designs, can be found in Paniagua (1987, 1990b, 2001). Additional examples of scientific contributions in psychology outside the Skinnerian experimental approach and without an emphasis on the null hypothesis significance testing approach can be found in Dale, Pierre-Louis, Bogart, O'Cleirigh, and Safren (2018), Paniagua, Black, and Gallaway (2016), Vartanian, Kernan, and Wansink (2016), and Widman, Choukas-Bradley, Noar, et al. (2016).

Conclusion
Despite the fact that researchers know in advance that they do not need the null hypothesis because they know they are going to reject it with statistical tricks, reviewers in most peer-reviewed journals want researchers to reject it anyway if the study is going to be published. Studies that report the "power" trick have a better chance of being accepted in peer-reviewed journals enforcing the hypothesis testing approach. Researchers, however, should not feel "guilty" about rejecting the null hypothesis because they are aware of Cohen's (1990) observation that the "null hypothesis is always false" (p. 1308; see also Cohen, 1994, p. 1000), but only if one knows the tricks to reject it. The controversy surrounding null hypothesis significance testing continues to be a major topic, particularly in the behavioral sciences. This topic, however, is not generally of importance in the physical sciences (Schmidt & Hunter, 1997). In the behavioral sciences (e.g., anthropology, economics, political science, psychology, social work, sociology), students in undergraduate and graduate programs are told about the need to consider the hypothesis testing approach, particularly in their theses and dissertations, but they are not generally informed about that controversy or that they could make significant contributions to the science of psychology without formulating hypotheses (e.g., the applied behavior analysis approach). For example, Gliner et al. (2002) reviewed six general graduate-level textbooks and six graduate-level textbooks in statistics. A major finding was "the failure of most of all these [textbooks] to acknowledge that there is a controversy surrounding [null hypothesis significance testing]" (p. 90).
The good news for researchers in the behavioral sciences is that we already have evidence that editors of some peer-reviewed journals are accepting articles in which the null hypothesis is not rejected (Kyriacou, 2016; Spiegelhalter et al., 2000). For example, in a total of 796 abstracts and 99 full-text articles reporting empirical data, Chavalarias et al. (2016) found that P values were reported in only 15.7% and 55%, respectively (see Kyriacou, 2016, p. 113). These findings mean that in many of these publications the hypothesis testing approach was not emphasized. In addition, The Journal of Articles in Support of the Null Hypothesis was created in response to journals and reviewers with a bias against articles that do not reject the null hypothesis. This journal is an outlet for researchers to be able to publish their empirical data without reaching traditional significance levels (e.g., p < .05). The website to submit articles to this journal is http://www.jasnh.com.
For students in psychology and other behavioral sciences, the present discussion should help them to encourage their professors of statistics and experimental design to include in their courses the historical and contemporary debates surrounding the formulation of hypotheses and the emphasis on the null hypothesis significance testing approach (see Kline, 2013, pp. 20-25; Nix & Barnette, 1998, pp. 4-5). This article should also help students in behavioral sciences courses to ask their professors two important questions: "Can we advance our science without the need to formulate the null hypothesis against the experimental hypothesis?" and "Why do we need to formulate the null hypothesis if it can always be falsified with statistical tricks?"