Reproducibility Criteria

Reviewers will be asked to assess the reproducibility of the work as part of their reviews. The following are the criteria that reviewers will take into consideration.

For all reported experimental results:

  • A clear description of the mathematical setting, algorithm, and/or model.
  • Submission of a zip file containing source code, with specification of all dependencies, including external libraries, or a link to such resources (while still anonymized)
  • Description of computing infrastructure used
  • The average runtime for each model or algorithm (e.g., training, inference, etc.), or estimated energy cost
  • Number of parameters in each model (see the sketch after this list)
  • Corresponding validation performance for each reported test result
  • Explanation of evaluation metrics used, with links to code
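
For the parameter-count and runtime items above, a minimal Python sketch is shown below. It assumes a PyTorch model; the toy architecture, batch size, and timing loop are illustrative placeholders, not a prescribed procedure.

    import time
    import torch
    import torch.nn as nn

    # Placeholder model for illustration; substitute the model actually reported.
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

    # Total and trainable parameter counts, as reported in the paper.
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"total parameters: {total:,}")
    print(f"trainable parameters: {trainable:,}")

    # Average inference runtime over repeated forward passes on a dummy batch.
    batch = torch.randn(32, 128)
    runs = 100
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(runs):
            model(batch)
    print(f"average runtime per batch: {(time.perf_counter() - start) / runs:.5f} s")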

For all experiments with hyperparameter search:

  • The exact number of training and evaluation runs
  • Bounds for each hyperparameter
  • Hyperparameter configurations for best-performing models
  • Number of hyperparameter search trials
  • The method of choosing hyperparameter values (e.g., uniform sampling, manual tuning, etc.) and the criterion used to select among them (e.g., accuracy)
  • Summary statistics of the results (e.g., mean, variance, error bars, etc.), as illustrated in the sketch after this list
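
As one illustration of how the search bounds, sampling method, best configuration, and summary statistics might be reported together, here is a small Python sketch. The bounds, number of trials, and evaluate function are hypothetical stand-ins for a real training and validation run.

    import random
    import statistics

    # Hypothetical hyperparameter bounds; report the ones actually searched.
    BOUNDS = {"learning_rate": (1e-5, 1e-2), "dropout": (0.0, 0.5)}

    def sample_config(rng):
        # Uniform sampling within the bounds (log-uniform is also common for learning rates).
        return {name: rng.uniform(low, high) for name, (low, high) in BOUNDS.items()}

    def evaluate(config):
        # Placeholder for a full training run followed by validation scoring.
        return 0.8 - abs(config["dropout"] - 0.2) - 10 * abs(config["learning_rate"] - 1e-3)

    rng = random.Random(0)
    trials = []
    for _ in range(20):  # number of search trials to report
        config = sample_config(rng)
        trials.append((config, evaluate(config)))

    scores = [score for _, score in trials]
    best_config, best_score = max(trials, key=lambda t: t[1])

    # Best configuration plus summary statistics across all trials.
    print(f"trials: {len(trials)}")
    print(f"best configuration: {best_config} (validation score {best_score:.3f})")
    print(f"mean {statistics.mean(scores):.3f}, stdev {statistics.stdev(scores):.3f}")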

For all datasets used:

  • Relevant details such as languages, number of examples, and label distributions (see the sketch after this list)
  • Details of train/validation/test splits
  • Explanation of any data that were excluded, and all pre-processing steps
  • A zip file containing the data, or a link to a downloadable version of the data
  • For new data collected, a complete description of the data collection process, such as instructions to annotators and methods for quality control.
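
For the split sizes and label distributions above, the following Python sketch shows one way to produce those numbers; the toy splits and labels are placeholders for the dataset loading code used in the actual experiments.

    from collections import Counter

    # Placeholder labelled examples; replace with the real dataset loading code.
    splits = {
        "train": [("text a", "pos"), ("text b", "neg"), ("text c", "pos")],
        "validation": [("text d", "neg")],
        "test": [("text e", "pos"), ("text f", "neg")],
    }

    # Number of examples and label distribution per split, as reported in the paper.
    for name, examples in splits.items():
        labels = Counter(label for _, label in examples)
        print(f"{name}: {len(examples)} examples, labels: {dict(labels)}")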

This list is based on Dodge et al., 2019 and Joelle Pineau’s reproducibility checklist.