A Review Form FAQ

To help you navigate and understand the new review form, Amanda and Heng (the PC chairs) have each reviewed an already-published paper. In this post, we provide our reviews. At the bottom, there is an FAQ.

Recall that our main goals with this new form are to get better-quality reviews and to put together a diverse program of papers.

If you read nothing else in this post, think about those two goals, and consider specifically that we want reviewers to think about submissions in terms of:

  • If this submission is publishable in its present form, in what ways do I want to advocate for it to my fellow reviewers, the area chairs and program chairs?
  • What constructive feedback can I give to the authors so that any final published paper (here or in another venue) is the best it can be?

If you have already written your review without looking at the review form, we suggest you do the following:

  • Enter your review text in the text area towards the bottom of the new review form, under “Additional Comments”
  • Go back to the top, and answer the non-free-text questions
  • (Optional, but preferred) For each type of contribution for which you have chosen more than a “1” (no contribution), provide a short bulleted summary of strengths and weaknesses
  • Provide a short list of contributions under “Contribution Summary” – we will provide indices to the published papers using this response

Our Reviews

Amanda’s paper is one of her favorites, “A Symbolic Approach to Near-Deterministic Surface Realisation using Tree Adjoining Grammar”, by Claire Gardent and Eric Kow, published in ACL 2007. It’s a more theoretical/formal paper.

Heng’s paper is one she published in ACL 2016, “Liberal Event Extraction and Event Schema Induction”, by Lifu Huang, Taylor Cassidy, Xiaocheng Feng, Heng Ji, Clare R. Voss, Jiawei Han and Avirup Sil. It’s a more empirical paper.

Below, each review question is followed by Amanda’s answer and then Heng’s.
Consent to Use Reviews for Research
In coordination with the ACL 2018 program co-chairs, we plan to do some analytics on anonymized reviews and rebuttal statements, with the consent of the reviewers and authors. Our purpose is to improve the quality of the review process. The data will be compiled into a unique corpus for NLP, and will be made available to the research community after appropriate anonymization checks. We hope to provide data on “how to review” to younger researchers, and improve the transparency of the reviewing process in general.
By default, you agree that your anonymized review can be freely used for research purposes and published under an appropriate open-source license.
Check “No” if you would like to opt out of the data collection.
Amanda: Yes!
Heng: Yes!

Appropriateness (1-3)
Is this submission appropriate to the NAACL HLT 2018 XXX paper research track? NAACL HLT 2018 has a goal of a broad technical program. Relevant and appropriate submissions describe substantial, original, and unpublished research in any area of computational linguistics. Both empirical and theoretical results are welcome; see the Call for Papers.
Amanda: 3
Heng: 3

Contribution Type 1: NLP Tasks / Applications
Do the contributions of the submission include NLP tasks or applications?
A new NLP task might be a computational approach to a previously unstudied linguistic phenomenon. A new NLP application is a previously undescribed way to use NLP.
If the contributions of the work include NLP tasks or applications, provide a brief description and analysis of strengths and weaknesses:
Amanda: no text
Heng: no text

Impact of NLP Tasks / Applications (1-4)
How interesting and impactful might these tasks/applications be to the NAACL HLT community?
  • 4 = Very
  • 3 = Somewhat
  • 2 = Not at all
  • 1 = No contribution in tasks/applications
Amanda: 1
Heng: 1

Contribution Type 2: Methods
Do the contributions of the submission include original methods, e.g. new algorithms, for an existing or new task or application?
If the contributions of the work include methods, provide a brief description and analysis of strengths and weaknesses. Assess evaluation under “empirical results”:
Amanda:
Description of the method: a new algorithm for surface realization from a feature-based TAG decorated with a mostly lexicalized semantics.
Strengths: (1) the algorithm is outlined in enough detail for replication; (2) a modification to the TAG to make the surface realizer deterministic is described.
Weaknesses: (1) there is no complexity analysis of the algorithm; (2) the part of the method that describes how to make the realizer deterministic is insufficiently motivated – why would I not want to overgenerate and rank (using, e.g., contextual or discourse information)? (3) it would be nice if the grammar were supplied as supplemental materials; (4) the semantic representation is under-described in this paper.

Heng:
This paper proposes a new event extraction framework that can extract events and discover event schemas simultaneously, and that can be applied to any domain.
Strengths: (1) it proposes a brand-new IE paradigm that performs extraction via bottom-up discovery rather than top-down classification (which requires a substantial amount of training data); (2) it can also generate a schema customized to the input corpus.
Weaknesses: the data set used for the new domain (biomedical) is quite small, and only precision is reported.

Impact of Methods (1-4)
How useful/impactful might the methods be to the NAACL HLT community?
Amanda: 4 (because I know that this method was impactful in 2007)
Heng: 4

Contribution Type 3a: Theoretical or Algorithmic Results
Do the contributions of the submission include theoretical or algorithmic results?
A theoretical result may be fundamentally linguistic (e.g. a description or critique of an approach to syntax), or fundamentally computational (e.g. bounds on the performance of a method). An algorithmic result may be a formalization of a machine learning or other algorithm pertinent to NLP.
If the work includes theoretical/algorithmic results, provide a brief description of these results and analysis of strengths and weaknesses:
Amanda: no text
Heng: no text

Impact of Theoretical / Algorithmic Results (1-4)
How interesting and impactful might these theoretical / algorithmic results be to the NAACL HLT community?
Amanda: 1
Heng: 1

Contribution Type 3b: Empirical Results
Do the contributions of the submission include empirical results?
An empirical result may include a corpus study, an evaluation or a controlled experiment done to test a hypothesis. For submissions presenting empirical results, the authors should describe the hypothesis being tested and adequately account for confounds.
If the work includes empirical results (such as evaluations, experiments or corpus analyses; this includes interesting negative results), provide a brief description of the hypothesis/es tested, and their novelty and substance:
Amanda:
Hypothesis: no stated hypothesis, but I think the authors are testing the coverage and paraphrase spread in the test data.
Novelty and substance: the focus on paraphrase richness is quite interesting.

Heng:
The authors proposed and tested two hypotheses:
Hypothesis 1: event triggers that occur in similar contexts and share the same sense tend to have similar types.
Hypothesis 2: beyond the lexical semantics of a particular event trigger, its type also depends on its arguments and their roles, as well as on other words contextually connected to the trigger.
Their experiments supported these two hypotheses well.

If the work includes empirical results, provide a brief description of the method for testing the hypothesis/es, and an analysis of strengths and weaknesses (e.g. confounds, lack of error analysis):
Amanda:
Method: the authors parse input sentences to semantic representations, choose the best semantic representation from the parses, and then generate from that representation. The test sentences were constructed to cover a range of grammatical variation.
Strengths: I like that, for this task, the test set is chosen to cover a range of phenomena rather than as a “naturally occurring” corpus.
Weaknesses: (1) the authors don’t say how many of the input sentences were present (in whole or in part) in the output, or do any analyses of this kind; (2) it would be nice to see some examples of the paraphrases.

Heng:
Method: the authors used the existing ACE data set, which has ground-truth event annotations for 33 types, to compare their method with state-of-the-art supervised classification approaches. In addition, to demonstrate the portability of the framework, they created a new data set for the biomedical domain.
Strengths: the experiments are thorough and cover a wide range of diverse event types.
Weaknesses: the new biomedical event extraction data set is small, and the evaluation is based on manual assessment of precision.

Impact of Empirical Results (1-4)
How interesting and impactful might these empirical results be to the NAACL HLT community?
Amanda: 3
Heng: 3

Contribution Type 4: Data / Resources
Do the contributions of the submission include data sets or resources?
A data set or resource may include a new corpus, new annotations on an existing corpus, a new knowledge base, a new language resource, etc. The data set or resource need not necessarily be one provided to the research community – for example, if it is proprietary or contains private data – although to the extent possible, researchers are encouraged to share data and resources in the interests of reproducible science.
If the contributions of the work include data/resources, provide a brief description and analysis of strengths and weaknesses:
Amanda: no text
Heng: no text

Impact of Data / Resources (1-4)
How useful/impactful might the data/resources be to the NAACL HLT community?
Amanda: 1
Heng: 1

Contribution Type 5: Software / Systems
Do the contributions of this work include software / systems?
The software or system need not necessarily be provided with the submission – for example, if it is proprietary – although to the extent possible, researchers are encouraged to share software and systems in the interests of reproducible science.
If the contributions of the work include software / systems, provide a brief description and analysis of strengths and weaknesses. Assess evaluation under “empirical results”:
Amanda: no text
Heng: Yes, although the software was not released at submission time.

Impact of Software / System (1-4)
How useful/impactful might the software/systems be to the NAACL HLT community?
Amanda: 1
Heng: 3 – The resulting software will be very useful for event extraction in a new domain; most existing tools can only be used for a small set of event types.

Contribution Type 6: Evaluation Methods / Metrics
Do the contributions of this work include evaluation methods / metrics?
If this is a contribution of the submission, it should be thoroughly motivated and described in the submission, and if possible a reference implementation should be provided.
If the contributions of the work include evaluation methods/metrics, provide a brief description and analysis of strengths and weaknesses:
Amanda: no text
Heng: no text

Impact of Evaluation Method / Metric (1-4)
How useful/impactful might the evaluation methods/metrics be to the NAACL HLT community?
Amanda: 1
Heng: 1

Contribution Type 7: Other
If the work includes another type of contribution (an exceptional literature survey, a well-argued editorial, etc.), provide a brief description and analysis of strengths and weaknesses:
Amanda: no text
Heng: no text

Impact of Other Contribution (1-4)
How impactful might this contribution be to the NAACL HLT community?
Amanda: 1
Heng: 1

Contributions: Summary
List up to three main contributions for the work in this submission. For example, “a new data set for multilingual part of speech tagging”, “an algorithm for inference in sequence tagging”, “a cross-corpus analysis of discourse cues”.
Amanda:
– a new algorithm for surface realization from a TAG and a flat semantic representation
– a discussion of over-generation vs. one-best realization
– an evaluation in terms of coverage of grammatical phenomena and paraphrasing “power”

Heng:
– a new event extraction framework that can extract events and discover event schemas simultaneously, and that can be applied to any domain
– a systematic way to combine distributional semantics and symbolic semantics for IE

Originality (1-5)
Considering your responses to the questions above, rate the originality of the work described in the submission.
  • 5 = Innovative: Highly original and significant new research topic, technique, methodology, or insight.
  • 4 = Creative: An intriguing problem, technique, or approach that is substantially different from previous research.
  • 3 = Respectable: A nice research contribution that represents a notable extension of prior approaches or methodologies.
  • 2 = Uninspiring: Obvious, or a minor improvement on familiar techniques.
  • 1 = Significant portions have actually been done before or done better.
Amanda: 4
Heng: 4

Soundness/Correctness (1-5)
Considering your responses to the questions above, rate the soundness of the work described in the submission.
  • 5 = The approach is sound, and the claims are convincingly supported.
  • 4 = Generally solid, but there are some aspects of the approach or evaluation I am not sure about.
  • 3 = Fairly reasonable, but the main claims cannot be accepted based on the material provided.
  • 2 = Troublesome. Some interesting ideas, but the work needs better justification or evaluation.
  • 1 = Fatally flawed.
Amanda: 4
Heng: 4

Substance (1-5)
Considering your responses to the questions above, rate the completeness and substance of the work described in the submission.
  • 5 = Contains more ideas or results than most publications of this length at NAACL HLT.
  • 4 = Represents an appropriate amount of content for a NAACL HLT paper of this length (most submissions).
  • 3 = Leaves open one or two natural questions that could have been pursued within the paper.
  • 2 = Work in progress. There are enough good ideas, but perhaps not enough results yet.
  • 1 = Seems thin. Not enough ideas here.
Amanda: 4
Heng: 4

Replicability (1-5)
Considering your responses to the questions above, rate the reproducibility of the work described in the submission. Members of the ACL community…
  • 5 = could easily reproduce the results and verify the correctness of the results described here. Useful supporting dataset and/or software was provided.
  • 4 = could mostly reproduce the results described here, maybe by substituting public data for proprietary data.
  • 3 = could possibly reproduce the results described here with some difficulty. The settings of parameters are underspecified or very subjectively determined.
  • 2 = could not reproduce the results described here no matter how hard they tried.
  • 1 = not applicable (please use this very sparingly, such as for opinion pieces or applications).
Amanda: 3
Heng: 3

Handling of Data / Resources
Does the submission document the appropriate handling of data, software and other resources, including licensing and citation, data protection and research review, as appropriate?
Amanda: n/a
Heng: yes

Handling of Human Participants
If the work involves (data from) human participants, are methods for documenting informed consent and protecting participant anonymity described in the submission?
Amanda: n/a
Heng: n/a

Provide any comments on handling of data here. If your response to either of the above questions is “No”, be sure to provide a brief justification:
Amanda: no text
Heng: All of the data sets used for the experiments are publicly released, so the handling of data is appropriate.

Meaningful Comparison (1-5)
Does the discussion of related and prior work motivate and support the main claims of the submission in an appropriate scholarly manner? Is it complete?
If you feel references are incomplete, be sure to include the relevant references in your comments.
  • 5 = Comparison to prior work is superbly carried out given the space constraints.
  • 4 = Comparisons are mostly solid, but there are some missing references.
  • 3 = Comparisons are weak, very hard to determine how it compares to previous work.
  • 2 = Only partial awareness or understanding of related work, or a flawed empirical comparison.
  • 1 = Little awareness of related work, or lacks necessary empirical comparison.
Amanda: 5
Heng: 5

Related Work: ACL Guidelines
Does the discussion of related and prior work adhere to the ACL author guidelines on citation (https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines)?
Amanda: Yes
Heng: Yes
Provide any comments regarding the discussion of related and prior work:
Amanda: A really good and comprehensive discussion of related work.
Heng: Additional suggestion: some unsupervised script learning methods share a similar idea of schema discovery, e.g. [Chambers et al., 2009]. It would be good if the authors could add some comparison with, and discussion of, these methods.

Readability (1-5)
For a reasonably well-prepared reader, is it clear what was done and why? Is the paper well-written and well-structured?
  • 5 = Very clear.
  • 4 = Understandable by most readers.
  • 3 = Mostly understandable with some effort.
  • 2 = Important questions were hard to resolve even with effort.
  • 1 = Much of the paper is confusing.
Amanda: 4
Heng: 4

NAACL Guidelines
Does the submission adhere to the NAACL HLT 2018 format and style guidelines?
Amanda: No (but it wasn’t actually submitted this year)
Heng: Yes
Provide any comments to the authors on readability, style and format of the submission:
Amanda: no text
Heng: no text

ACL Guidelines
Does the submission adhere to the ACL author guidelines on preserving double blind review (https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines)?
Amanda: Yes
Heng: Yes

Identify Authors? (1-3)
Do you think you could identify authors of this submission? If so, how? How did it affect your reviewing?
  • 3 = I have no idea who the authors are.
  • 2 = I could guess, but the submission itself doesn’t communicate who the authors are.
  • 1 = Sure, I know who the authors are because I’ve seen this work as a preprint or the paper or supplementary material reveals it.
Amanda: 3
Heng: 1 (thanks to area chair Jon May for reminding me that I am a co-author :-))

If you think you could identify the authors, how were you able to do so? How did it affect your review?
If you could not identify the authors of this submission, write “N/A” in the text box.
Amanda: no text
Heng: no text

Overall Score (1-6)
Based on your review of this submission, should it be accepted to the NAACL HLT 2018 research track? In deciding on your ultimate recommendation, please think over all your responses above. We want a conference full of creative, original, sound and timely work. Prefer work that is inventive and will stimulate new approaches over work that is solid but incremental. Remember also that the author has about a month to address reviewer comments before the camera-ready deadline.
  • 6 = Transformative: This paper is likely to change our field.
  • 5 = Exciting: The work presented in this submission includes original, creative contributions, the methods are solid, and the paper is well written.
  • 4 = Interesting: The work described in this submission is original and basically sound, but there are a few problems with the method or paper.
  • 3 = Uninspiring: The work in this submission lacks creativity or originality. I’m ambivalent about this one.
  • 2 = Borderline: This submission has some merits but there are significant issues with respect to originality, soundness, replicability or substance, readability, etc.
  • 1 = Poor: I cannot find any reason for this submission to be accepted.
Amanda: 5
Heng: 5

Reviewer Confidence (1-5)
How confident are you about your review?
  • 5 = Positive that my evaluation is correct. I read the paper very carefully and I am very familiar with related work.
  • 4 = Quite sure. I tried to check the important points carefully. It’s unlikely, though conceivable, that I missed something that should affect my ratings.
  • 3 = Pretty sure, but there’s a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper’s details, e.g., the math, experimental design, or novelty.
  • 2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn’t understand some central points, or can’t be sure about the novelty of the work.
  • 1 = Not my area, or paper was hard for me to understand. My evaluation is just an educated guess.
Amanda: 4
Heng: 5

Presentation Format
Papers at NAACL HLT can be presented either as a poster or as an oral presentation, depending on which is most likely to convey their ideas to the audience. If this paper were selected for presentation, which form of presentation would you find more appropriate?

Note that the decisions as to which papers will be presented orally and which as posters will be based on the nature rather than on the quality of the work. There will be no distinction in the proceedings between papers presented orally and those presented as posters.

Amanda: Oral
Heng: Oral

Recommendation for “Best of” Paper Consideration
Choose ‘Yes’ to indicate that this paper is likely to be a “top 5%” paper at NAACL HLT 2018.
Amanda: No (but it wasn’t submitted this year!)
Heng: Yes (but it wasn’t submitted this year!)

Reviewer Guidelines
By checking “Yes”, you certify that you have followed the ACL reviewer guidelines (https://www.aclweb.org/adminwiki/index.php?title=ACL_Reviewer_Guidelines).
Amanda: Yes
Heng: Yes

Additional Comments
The box below can be used to ask specific questions of the authors, to augment your review comments above.
Amanda:
Can you provide more information about the semantic representation?
Can you give some indication of the computational complexity of the algorithm in the overgenerate case and in the don’t-overgenerate case?

Heng:
Section 2.2: it would be useful to report the upper-bound recall of this relation selection approach for argument identification.
Section 2.6: an alternative approach to naming event types and argument roles could be to ground the representation of each event mention in an existing event ontology that includes a wide range of event types (e.g., FrameNet). In this way you could also take advantage of the detailed event structures.
Section 3.6: the biomedical domain event extraction experiment is very interesting, but the authors only report precision. Moreover, there are many human-constructed biomedical ontologies available; it would be interesting to leverage them to improve both the event mention representations and the schema induction.

Author Response
I have taken the author response into consideration in my review. (This does not apply to your initial review, of course, so the initial setting should be “No”.) You need not change your review after the author response, but choose “Yes” if you have read the author response and considered whether to change your review.
Amanda: No
Heng: No

FAQ

Why do you have all these types of contribution that I have to consider?

Fundamentally, a paper is publishable if it contributes to the scientific body of knowledge in some way or another. Computational linguistics is a field that employs a variety of methods, so we need some way to organize papers that encourages a diversity of contribution types. COLING 2018 is using different submission forms for different contribution types, as was done in (Amanda thinks) ACL 2011. We note that many submissions may not fall neatly into one contribution type.

Also, we want reviewers to think about submissions in terms of:

  • If this submission is publishable in its present form, in what ways do I want to advocate for it to my fellow reviewers, the area chairs and program chairs?
  • What constructive feedback can I give to the authors so that any final published paper (here or in another venue) is the best it can be?

Will every paper have many contribution types?

No. A typical paper will have one to three. For the contribution types that do not apply, choose 1 for the Impact question and skip the text field(s).

What will you do with all the different contribution types?

Authors will see the contributions you outlined during the author response period (for long paper submissions). They can compare these with what they think the contributions of their paper are, and use your feedback to improve their paper.

Reviewers can advocate for a paper to be accepted based on likely impact of contribution of one or more types, even if it has other weaknesses.

Area chairs and program chairs can sort the submissions by contribution type and impact, and use this information to lobby for a paper to be accepted based on likely impact of contributions, even if it has other weaknesses.

We plan to provide indices to the contributions by contribution type and contribution text (provided by authors and reviewers), so that conference attendees have additional information for finding papers they want to see/read.
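
For intuition only, here is a minimal sketch, in Python, of what such an index might look like as a data structure. Everything in it (the records, the field names, the `by_type` mapping) is hypothetical, not the actual conference tooling.

    # A minimal sketch (hypothetical, not the real conference tooling):
    # index papers by the contribution types claimed in reviews.
    from collections import defaultdict

    # Hypothetical contribution records; real ones would come from the review system.
    contributions = [
        {"paper": "Paper A", "type": "Methods", "summary": "a new surface realization algorithm"},
        {"paper": "Paper A", "type": "Empirical Results", "summary": "an evaluation of grammatical coverage"},
        {"paper": "Paper B", "type": "Methods", "summary": "a bottom-up event extraction paradigm"},
    ]

    by_type = defaultdict(list)
    for c in contributions:
        by_type[c["type"]].append((c["paper"], c["summary"]))

    # An attendee could then list, say, every paper contributing new methods.
    for paper, summary in by_type["Methods"]:
        print(f"{paper}: {summary}")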

We encourage the authors and reviewers of accepted papers to volunteer to donate their structured inputs for the paper (e.g., contribution summaries) and their reviews. This corpus will itself be a great resource for CL research on scientific literature.

What is the difference between methods and theoretical/algorithmic results?

Methods: Many CL papers describe novel methods – a new algorithm, a new set of heuristics, etc.

Results: far fewer CL papers attempt any analysis of the complexity, coverage, etc. of the algorithms they present, and relatively few CL papers at present (more in the past) report results on linguistic phenomena or formalisms.

What if I already wrote my review?

Put it in the additional comments section at the bottom, then go to the top and look at the other questions.

Note that your review cannot be submitted until you have provided an answer to each radio button / drop down list question.

Do I have to delete text from free text fields I don’t want to answer?

No. If you don’t want to answer a free text field, you don’t have to do anything to it.

Why are there extra questions for empirical results?

If we are reporting numbers that involve a comparison against a baseline, or a comparison of two methods, then we are (implicitly or explicitly) testing a hypothesis. Bonnie Webber suggested that we become more rigorous in our assessment of these scientific claims.
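
To make that concrete, here is a minimal sketch of one common way to test such a claim – a paired bootstrap test of whether a system’s average gain over a baseline on per-example scores survives resampling. Everything in it (the scores, the function name) is a hypothetical illustration, not something the review form requires.

    # A minimal sketch of a paired bootstrap significance test (hypothetical
    # numbers): does the system's average gain over the baseline hold up when
    # the test examples are resampled?
    import random

    def paired_bootstrap_p(system, baseline, n_resamples=10000, seed=0):
        """Estimate a one-sided p-value for 'system beats baseline' on paired scores."""
        assert len(system) == len(baseline)
        rng = random.Random(seed)
        diffs = [s - b for s, b in zip(system, baseline)]
        n_no_gain = sum(
            1 for _ in range(n_resamples)
            if sum(rng.choice(diffs) for _ in diffs) <= 0  # gain vanishes in this resample
        )
        return n_no_gain / n_resamples

    system_scores = [0.81, 0.90, 0.74, 0.86, 0.93, 0.78]
    baseline_scores = [0.75, 0.88, 0.70, 0.80, 0.90, 0.77]
    print("estimated p-value:", paired_bootstrap_p(system_scores, baseline_scores))

A low p-value here is evidence that the reported improvement is not an artifact of the particular test sample.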

Why do you ask about data handling?

Increasingly, reviewers value work that is reproducible – that means using or providing data (and software) that is available to the whole research community. If a researcher provides data or resources to the research community, then the research community needs to know that the contributors have the right to release the data and have considered questions of privacy and consent.

Why do we need a new review form?

NLP is growing very fast. People complain about poor review quality and reviewer overload. We can only fix reviewer overload by having a more diverse reviewer pool – including “not the usual suspects”. That, however, risks further reducing review quality – unless we give “not the usual suspects” a more structured review form. One could argue about what that structure should look like, and in fact this year there will be three variants: ours, which asks for comments by contribution type; the ACL one, which asks for comments by strengths, weaknesses and questions; and the COLING one, which has different questions for different types of paper.

Also, the new review form is designed to help the area chairs be more editorial in their selection of papers – in particular, with the vast middle of papers, where it has traditionally been very unclear what to do.

For more on this, see here and here.
