The content of this dataset is licensed to SIGMathLing members for research and tool development purposes.
Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure-Agreement because, for most arXiv articles, the right of distribution was only given (or assumed) to arXiv itself.
This dataset provides ground-truth formula grounding annotations for 15 scientific papers. More specifically, a total of 12,352 math identifiers were annotated with the mathematical concepts they refer to, explicitly indicating coreference relations within each article. In addition, a total of 938 text spans, called grounding sources, were labeled; these are the spans that served as the basis for human grounding.
The annotation was performed with our open-source annotation tool MioGatto, which is also suitable for viewing the data. Please refer to its documentation for details.
Download link (SIGMathLing members only)