Analysis of Long Paper Reviews

Long paper review scores across areas:

 

| Area | Avg. before response | Avg. after response | Min | Max |
|---|---|---|---|---|
| Speech | 3.9 | 3.75 | 2.5 | 5 |
| Vision, Robotics and Other Grounding | 3.83 | 3.65 | 2 | 5 |
| Tagging, Chunking, Syntax and Parsing | 3.55 | 3.56 | 1.67 | 5 |
| Machine Translation | 3.37 | 3.34 | 1.33 | 4.67 |
| Theory and Formalisms | 3.37 | 3.38 | 2 | 4.67 |
| Dialogue and Interactive Systems | 3.36 | 3.41 | 2 | 5 |
| NLP Applications | 3.36 | 3.33 | 1.61 | 4.96 |
| Information Extraction | 3.35 | 3.54 | 1.67 | 5 |
| Sentiment Analysis | 3.33 | 3.27 | 1.5 | 5 |
| Social Media and Computational Social Science | 3.33 | 3.32 | 1.33 | 4.67 |
| Summarization | 3.33 | 3.40 | 2 | 4.67 |
| Machine Learning for NLP | 3.31 | 3.41 | 1.33 | 5 |
| Generation | 3.3 | 3.41 | 1.67 | 4.67 |
| Discourse and Pragmatics | 3.27 | 3.60 | 1.33 | 4.67 |
| Phonology, Morphology and Word Segmentation | 3.26 | 3.24 | 1.33 | 5 |
| Semantics | 3.25 | 3.27 | 1.44 | 5.58 |
| Cognitive Modeling and Psycholinguistics | 3.24 | 3.86 | 1 | 4.67 |
| Question Answering | 3.16 | 3.00 | 1.67 | 4.33 |
| Text Mining | 3.05 | 3.08 | 1.38 | 4.88 |

Heng has mainly been working in the IE area and has always thought that IE reviewers are harsh; for example, they normally don't nominate papers from the IE area for awards. The table above changed her impression for the better.
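For readers curious how a per-area summary like the one in the table above might be produced, here is a minimal pandas sketch. It assumes a hypothetical CSV export of reviews with columns `area`, `paper_id`, `score_before`, and `score_after`; these names are illustrative, not the actual conference system's schema.

```python
import pandas as pd

# Hypothetical export of long paper reviews; the column names below
# (area, paper_id, score_before, score_after) are assumptions for illustration.
reviews = pd.read_csv("long_paper_reviews.csv")

# Average the individual review scores within each paper first.
per_paper = (
    reviews.groupby(["area", "paper_id"], as_index=False)
           .agg(before=("score_before", "mean"),
                after=("score_after", "mean"))
)

# Then summarize each area: mean of the per-paper averages, plus min/max.
summary = (
    per_paper.groupby("area")
             .agg(ave_before=("before", "mean"),
                  ave_after=("after", "mean"),
                  min_score=("before", "min"),
                  max_score=("before", "max"))
             .round(2)
             .sort_values("ave_before", ascending=False)
)
print(summary)
```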

Comparison of long paper review scores across years:

| Score | NAACL-HLT 2013 (Daumé, 2013) | NAACL-HLT 2018 |
|---|---|---|
| 1 | 1% | 4.1% |
| 2 | 17% | 20.4% |
| 3 | 30% | 25.1% |
| 4 | 44% | 35.4% |
| 5 | 7% | 15.0% |
| 6 | n/a | 0% |

From the scores, it looks like the reviews are harsher than those from five years ago. However, we have a much larger and younger reviewer pool this year.

Did Author Response Help?

| Score | Before Response | After Response |
|---|---|---|
| 1 | 4.3% | 4.1% |
| 2 | 22.9% | 20.4% |
| 3 | 21.6% | 25.1% |
| 4 | 36.5% | 35.4% |
| 5 | 14.7% | 15.0% |
| 6 | 0.11% | 0% |

From the changes in the score distributions, we can see that more reviews converged toward the medium score of 3. 38 reviews increased their scores, and 30 reviews decreased them.
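As a rough illustration of how those counts and distributions could be obtained, here is a minimal sketch using the same hypothetical per-review columns `score_before` and `score_after` (assumed names, not the real export format):

```python
import pandas as pd

# Hypothetical per-review export; score_before / score_after are assumed column names.
reviews = pd.read_csv("long_paper_reviews.csv")

# Count reviews whose overall score went up, down, or stayed the same.
delta = reviews["score_after"] - reviews["score_before"]
print("increased:", int((delta > 0).sum()))
print("decreased:", int((delta < 0).sum()))
print("unchanged:", int((delta == 0).sum()))

# Score distribution (as percentages) before vs. after the response period.
for col in ["score_before", "score_after"]:
    dist = reviews[col].value_counts(normalize=True).sort_index() * 100
    print(col, dist.round(1).to_dict())
```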

Generally speaking, reviews were harsh.

Very few papers received Best Paper nominations from reviewers, while area chairs identified some excellent submissions for nomination.

Some reviews were too generic, e.g., “the method is more complicated than previous methods” (without referring to a concrete list of methods) or “I really like the paper” (without explaining its merits). The PC chairs and area chairs urged these reviewers to refine their comments to make them more informative and constructive.

We could all be nicer. Authors really don't have to criticize all previous papers in order to make their own ideas stand out, and reviewers really don't have to give harsh comments just because the authors did not cite the reviewers' own (sometimes quite irrelevant) papers. :-)

12 thoughts on “Analysis of Long Paper Reviews”

  1. The Semantics track has a max score of 5.58, but 0% of reviews gave a 6 after author response. Just wondering if it's due to a rounding error?

  2. The first table shows scores before author response/review discussion. I tried to make it clearer. Would you rather see a table that includes scores after response instead? Thanks.

    1. If you are talking about a paper you are reviewing, then that means the authors of that paper did not submit a response. If you are talking about a paper you submitted as an author, you submit the author response yourself (rather than receiving one).

  3. The review form was a pain. I dreaded filling it out, and I had a very difficult time discerning the intent of the reviewers during the rebuttal period. I hope this form is never used again.
