Long paper review scores across areas:
|Area||Average before response||Average after response||Min||Max|
|Vision, Robotics and Other Grounding||3.83||3.65||2||5|
|Tagging, Chunking, Syntax and Parsing||3.55||3.56||1.67||5|
|Theory and Formalisms||3.37||3.38||2||4.67|
|Dialogue and Interactive Systems||3.36||3.41||2||5|
|Social Media and Computational Social Science||3.33||3.32||1.33||4.67|
|Machine Learning for NLP||3.31||3.41||1.33||5|
|Discourse and Pragmatics||3.27||3.60||1.33||4.67|
|Phonology, Morphology and Word Segmentation||3.26||3.24||1.33||5|
|Cognitive Modeling and Psycholinguistics||3.24||3.86||1||4.67|
Heng has mainly worked in the IE area and has always thought that IE reviewers are harsh; for example, they normally do not nominate papers from the IE area for awards. The above table changed her impression for the better.
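To see how much the author response moved each area, the change can be computed directly from the two average columns. The following is a minimal Python sketch using only the rows shown in the table above; the names `AREA_AVERAGES` and `score_changes` are ours, chosen purely for illustration, and are not part of any conference tooling.

```python
# (area, average before response, average after response),
# copied from the rows of the table above.
AREA_AVERAGES = [
    ("Vision, Robotics and Other Grounding", 3.83, 3.65),
    ("Tagging, Chunking, Syntax and Parsing", 3.55, 3.56),
    ("Theory and Formalisms", 3.37, 3.38),
    ("Dialogue and Interactive Systems", 3.36, 3.41),
    ("Social Media and Computational Social Science", 3.33, 3.32),
    ("Machine Learning for NLP", 3.31, 3.41),
    ("Discourse and Pragmatics", 3.27, 3.60),
    ("Phonology, Morphology and Word Segmentation", 3.26, 3.24),
    ("Cognitive Modeling and Psycholinguistics", 3.24, 3.86),
]


def score_changes(rows):
    """Return (area, after - before) pairs, largest increase first."""
    changes = [(area, round(after - before, 2)) for area, before, after in rows]
    return sorted(changes, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for area, delta in score_changes(AREA_AVERAGES):
        # Show an explicit sign so increases and decreases are easy to scan.
        print(f"{delta:+.2f}  {area}")
```

On these nine rows, Cognitive Modeling and Psycholinguistics shows the largest increase (+0.62) and Vision, Robotics and Other Grounding the largest drop (-0.18); most other areas moved by 0.10 or less.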
Long paper review scores comparison across years:
|Score||NAACL-HLT 2013 (Daumé, 2013)||NAACL-HLT 2018|
Judging by the scores, the reviews look harsher than those from five years ago. However, we have a much larger and younger reviewer pool this year.
Did Author Response Help?
|Score||Before Response||After Response|
The change in score distributions shows that more reviews moved to the middle score of 3. In total, 38 reviews raised their scores and 30 reviews lowered them.
Generally speaking, reviews were harsh
Very few papers got Best Paper nominations from reviewers, while area chairs identified some excellent submissions for nominations.
Some reviews are too generic, e.g., “the method is more complicated than previous methods [without a concrete list of the methods referred to]”, “i really like the paper [without explaining its merits]”. The PC chairs and area chairs urged these reviewers to refine their comments to make them more informative and constructive.
We could all be nicer. Authors really don’t have to criticize all previous papers in order to make their ideas stand out; reviewers really don’t have to give harsh comments just because the authors did not cite the reviewers’ own (sometimes very irrelevant) papers :-).
12 thoughts on “Analysis of Long Paper Reviews”
The Semantics track has a max score of 5.58 but 0% gave a 6 after author response. Just wondering if it’s due to rounding error?
The first table shows scores before author response/review discussion. I tried to make it clearer. Would you rather see a table that includes scores after response instead? Thanks.
Could we also have average scores after author’s response?
Added. Please suggest any other type of analysis you would like to see. :) Thanks.
Would it be possible to see a table that includes scores (max) after response?
I have not got any “author response” for my paper! Why?
If you are talking about a paper you are reviewing, that means the authors of the paper did not submit any response. If you are talking about a paper you submitted as an author, you submit the author response yourself rather than receiving one.
When will we get the notifications?
In a couple of hours today :)
The review form was a pain. I dreaded filling it out, and I had a very difficult time discerning the intent of the reviewers during the rebuttal period. I hope this form is never used again.
The feedback has been summarized in the general chair’s blog: https://naacl2018.wordpress.com/2018/02/03/new-review-form-draws-widely-varying-opinions/ No further discussion is needed here. Once the review process for short papers is done, we will conduct a comprehensive survey to formally assess the new review form.