NLP-based approaches and tools have been proposed to improve the efficiency of software engineers, processes, and products by automatically processing natural language artifacts (issues, emails, commits, etc.).
We believe that accurate tools are becoming increasingly necessary to improve Software Engineering (SE) processes. Two important tasks are (i) code comment classification, where developers have to understand, classify, prioritize, and assign incoming issues and code comments reported by end-users and developers, and (ii) skill classification, where developers build and assess multi-label classifiers that predict, for each issue, the set of domains and sub-domains representing the skills required to solve it.
We are pleased to announce the fifth edition of the NLBSE'26 tool competition on code comment classification and skill classification, two important tasks in issue and code comment management and prioritization.
You are invited to participate in one or both tool competitions.
The code comment classification competition consists of building and testing three multi-label classification models, one for class comments in each of the target programming languages (Java, Python, and Pharo).
We provide a dataset of 9,361 code comment sentences belonging to 18 categories (7 for Java, 5 for Python, and 6 for Pharo), and three baseline classifiers based on Sentence Transformers (SetFit).
You must train, tune, and evaluate your models on the provided data. We look forward to solutions that outperform our baseline models.
Detailed instructions about the competition (data, rules, baseline, results, etc.) can be found in our GitHub repository and a Google Colab notebook (see the links below).
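As a concrete starting point, the following minimal sketch trains one multi-label SetFit model in the spirit of the provided baselines. It assumes setfit >= 1.0; the category names, example sentences, and label layout are illustrative stand-ins, not the actual competition schema (see the Colab notebook for that).

from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical subset of categories; the real label sets ship with the data.
CATEGORIES = ["summary", "usage", "expand"]

# Toy training rows: one multi-hot label vector per comment sentence.
train_ds = Dataset.from_dict({
    "text": [
        "Returns the index of the first matching element.",
        "Use this class to parse configuration files.",
    ],
    "label": [[1, 0, 0], [0, 1, 0]],
})

# "one-vs-rest" fits one binary head per category on top of sentence embeddings.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    multi_target_strategy="one-vs-rest",
)

trainer = Trainer(model=model, args=TrainingArguments(batch_size=16, num_epochs=1),
                  train_dataset=train_ds)
trainer.train()

# Predict multi-hot label vectors for unseen sentences.
print(model.predict(["Call close() when the stream is no longer needed."]))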
The code comment classification competition is organized by: Moritz Mock (momock@unibz.it) and Pooja Rani (rani@ifi.uzh.ch).
To participate in the competition, you must train, tune and evaluate your models using the provided training and test sets.
Additionally, you must write a paper (2-4 pages) describing your approach and results.
Submit the paper by the deadline using our submission form. All submissions must conform to the ICSE'26 formatting and submission instructions and do not need to be double-blind.
Submissions will be evaluated and accepted based on correctness and reproducibility.
We will use a formula to rank the competition submissions and determine a winner; see the Google Colab notebook for details.
The accepted submissions will be published in the workshop proceedings.
The 2026 competition consists of building and assessing multi‑label classifiers that predict, for each issue, the set of domains and sub‑domains representing the skills required to solve it.
We release a dataset mined from 7,245 merged pull requests across 11 popular Java repositories (57,206 source files; 59,644 methods; 13,097 classes), annotated with 217 skill labels composed of domains and sub-domains. The full corpus (inputs and gold labels) is shipped as an SQLite database (skillscope_data.db) for easy reproducibility.
Ready-made table: Inside the database you will find a table named nlbse_tool_competition_data_by_issue that joins each pull request's textual and code-context features with its canonical domain/sub-domain labels per issue. There is also a view vw_nlbse_tool_competition_data_by_file that labels each filename/function associated with each issue. The nlbse_tool_competition_data_by_issue table contains a column for each domain and sub-domain with an integer count of the matching APIs found for that issue; a value greater than zero indicates that the domain/sub-domain is present in the issue. The vw_nlbse_tool_competition_data_by_file view is provided for convenience only and will not be used in the evaluation of the model. The dataset can be found in our GitHub repository.
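For orientation, a sketch like the following loads the issue-level table with Python's standard sqlite3 module and pandas, assuming skillscope_data.db is in the working directory; the heuristic used to pick out label columns is illustrative only, so consult the actual schema.

import sqlite3
import pandas as pd

con = sqlite3.connect("skillscope_data.db")

# One row per issue: textual/code-context features plus one integer count
# column per domain/sub-domain.
issues = pd.read_sql_query("SELECT * FROM nlbse_tool_competition_data_by_issue", con)
con.close()

# A count > 0 means the domain/sub-domain is present, so binarize those columns.
# Treating every integer column as a label is a stand-in heuristic.
label_cols = [c for c in issues.columns if str(issues[c].dtype).startswith("int")]
labels = (issues[label_cols] > 0).astype(int)
print(labels.sum().sort_values(ascending=False).head())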
Your model may use input data beyond what is provided in the database. You may use additional GitHub APIs to fetch more metadata, or download files relevant to an issue for further analysis. However, you must not use any third-party classification engine or the outputs present in skillscope_data.db as direct inputs to your model.
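For example, extra issue metadata could be pulled from the GitHub REST API, as in this hypothetical sketch; the repository coordinates are placeholders, and an authentication token would be needed beyond the unauthenticated rate limit.

import requests

def fetch_issue(owner: str, repo: str, number: int) -> dict:
    # GitHub REST API endpoint for a single issue.
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"
    resp = requests.get(url, headers={"Accept": "application/vnd.github+json"})
    resp.raise_for_status()
    return resp.json()

issue = fetch_issue("octocat", "Hello-World", 1)  # placeholder coordinates
print(issue["title"], [label["name"] for label in issue["labels"]])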
Your models will be compared against the SkillScope Random-Forest + TF-IDF baselines reported in the paper by evaluating the overall prediction metrics against the issue classifications recorded in the nlbse_tool_competition_data_by_issue table. Issue classifications are measured as a binary output (true/false) for the best domain and sub-domain describing the issue.
Train, tune and evaluate your models on the provided splits and improve at least one of precision, recall, or micro‑F1 while not decreasing the remaining metrics relative to the best baseline.
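For reference, the three core metrics can be computed with scikit-learn as in the sketch below; the indicator matrices are purely illustrative.

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Rows are issues, columns are domain/sub-domain labels (binary indicators).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0]])

print("precision:", precision_score(y_true, y_pred, average="micro"))
print("recall:   ", recall_score(y_true, y_pred, average="micro"))
print("micro-F1: ", f1_score(y_true, y_pred, average="micro"))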
The skill classification competition is organized by: Fabio Santos (fabio.deabreusantos), Benjamin Carter (BCarter44@my.gcu.edu), and Jacob Penney (jmp458@nau.edu).
Participants submit a single multi-label classifier, and rankings proceed in two stages. Submissions will first be filtered to ensure they do not lower any of the three core metrics (precision, recall, micro-F1) compared with the baseline. Among qualifying submissions, ranking is determined by the largest positive improvement in micro-F1; ties will be broken by (1) precision, then (2) runtime. Submissions will also be judged on correctness and reproducibility. Accepted papers will appear in the NLBSE'26 proceedings.
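A sketch of this two-stage procedure, with purely illustrative metric values:

# Illustrative baseline and submission summaries.
baseline = {"precision": 0.70, "recall": 0.65, "micro_f1": 0.67}
submissions = [
    {"name": "A", "precision": 0.72, "recall": 0.66, "micro_f1": 0.70, "runtime": 120},
    {"name": "B", "precision": 0.71, "recall": 0.60, "micro_f1": 0.72, "runtime": 90},
]

# Stage 1: drop any submission that lowers a core metric.
qualifying = [s for s in submissions
              if all(s[m] >= baseline[m] for m in baseline)]

# Stage 2: rank by micro-F1 improvement; break ties by precision, then runtime.
ranked = sorted(qualifying,
                key=lambda s: (-(s["micro_f1"] - baseline["micro_f1"]),
                               -s["precision"], s["runtime"]))
print([s["name"] for s in ranked])  # "B" fails stage 1 (recall drops), leaving "A"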
Please cite if participating:
@inproceedings{nlbse2026,
author={Mock, Moritz and Rani, Pooja and Santos, Fabio and Carter, Benjamin and Penney, Jacob},
title={The NLBSE'26 Tool Competition},
booktitle={Proceedings of The 5th International Workshop on Natural Language-based Software Engineering (NLBSE'26)},
year={2026}
}
Please cite if participating in the code comment classification competition:
@article{rani2021,
title={How to identify class comment types? A multi-language approach for class comment classification},
author={Rani, Pooja and Panichella, Sebastiano and Leuenberger, Manuel and Di Sorbo, Andrea and Nierstrasz, Oscar},
journal={Journal of Systems and Software},
volume={181},
pages={111047},
year={2021},
publisher={Elsevier}
}
@inproceedings{AlKaswan2023,
author={Al-Kaswan, Ali and Izadi, Maliheh and van Deursen, Arie},
booktitle={2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE)},
title={STACC: Code Comment Classification using SentenceTransformers},
year={2023},
pages={28-31}
}
@inproceedings{pascarella2017,
title={Classifying code comments in Java open-source software systems},
author={Pascarella, Luca and Bacchelli, Alberto},
booktitle={2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)},
year={2017},
organization={IEEE}
}
Please cite if participating in the skill competition:
@inproceedings{carter2025skillscope,
title = {SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub},
author = {Carter, Benjamin C. and Contreras, Jonathan Rivas and Llanes Villegas, Carlos A. and Acharya, Pawan and Utzerath, Jack and Farner, Adonijah O. and Jenkins, Hunter and Johnson, Dylan and Penney, Jacob and Steinmacher, Igor and Gerosa, Marco A. and Santos, Fabio},
year = {2025},
booktitle = {2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)},
pages = {9--12},
doi = {10.1109/NLBSE66842.2025.00007},
}
@article{santos2023tag,
title = {Tag that issue: applying API-domain labels in issue tracking systems},
author = {Santos, Fabio and Vargovich, Joseph and Trinkenreich, Bianca and Santos, Italo and Penney, Jacob and Britto, Ricardo and Pimentel, Jo{\~a}o Felipe and Wiese, Igor and Steinmacher, Igor and Sarma, Anita and others},
year = {2023},
journal = {Empirical Software Engineering},
publisher = {Springer},
volume = {28},
number = {5},
pages = {116}
}
@inproceedings{vargovich2023givemelabeledissues,
title = {GiveMeLabeledIssues: An open source issue recommendation system},
author = {Vargovich, Joseph and Santos, Fabio and Penney, Jacob and Gerosa, Marco A and Steinmacher, Igor},
year = {2023},
booktitle = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)},
pages = {402--406},
organization = {IEEE}
}
December 19, 2025
January 12, 2026
January 26, 2026
All dates are Anywhere on Earth (AoE).
GitHub Repository: nlbse2026/code-comment-classification
Google Colab Instructions: Challenge Instructions & Setup
GitHub Repository: nlbse2026/skill-classification
Dataset: Dataset at Hugging Face