Dataset for Grounding of Formulae

Basic Information

Accessibility and License

The content of this dataset is licensed to SIGMathLing members for research and tool development purposes.

Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure Agreement because, for most arXiv articles, the right of distribution was only given (or assumed) to arXiv itself.

Description

This dataset provides ground-truth formula grounding annotations for 15 scientific papers. More specifically, a total of 12,352 math identifiers were annotated with the mathematical concepts they refer to, which explicitly indicates the coreference relations within each article. In addition, 938 text spans, called grounding sources, were labeled; these are the spans the human annotators used as the basis for grounding.
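Annotating each identifier occurrence with a concept also encodes coreference: occurrences that share a concept form one coreference chain. The minimal Python sketch below illustrates this idea. The record types, field names, and concept IDs are hypothetical and do not reflect MioGatto's actual data format.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record types for illustration only; NOT MioGatto's data format.
@dataclass(frozen=True)
class IdentifierOccurrence:
    occurrence_id: str  # hypothetical ID of one identifier occurrence in the text
    symbol: str         # the identifier as written, e.g. "x"
    concept_id: str     # hypothetical ID of the concept it refers to


@dataclass(frozen=True)
class GroundingSource:
    concept_id: str  # concept this text span supports
    text: str        # the text span used as the basis for grounding


def coreference_chains(occurrences):
    """Group occurrence IDs that refer to the same concept (one chain per concept)."""
    chains = defaultdict(list)
    for occ in occurrences:
        chains[occ.concept_id].append(occ.occurrence_id)
    return dict(chains)


def sources_by_concept(sources):
    """Collect the grounding-source text spans labeled for each concept."""
    grouped = defaultdict(list)
    for src in sources:
        grouped[src.concept_id].append(src.text)
    return dict(grouped)


# Toy example: the same symbol "x" can denote two different concepts.
occurrences = [
    IdentifierOccurrence("id-1", "x", "x.position"),
    IdentifierOccurrence("id-2", "x", "x.position"),
    IdentifierOccurrence("id-3", "x", "x.input"),
    IdentifierOccurrence("id-4", "t", "t.time"),
]
sources = [GroundingSource("x.position", "where x denotes the position")]

print(coreference_chains(occurrences))
# {'x.position': ['id-1', 'id-2'], 'x.input': ['id-3'], 't.time': ['id-4']}
```

Keying the chains on the concept rather than on the symbol keeps two occurrences of "x" apart when they denote different things.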

The annotation was performed with our open-source annotation tool MioGatto, which can also be used to view the data. Please refer to its documentation for details.

Download

Download link (SIGMathLing members only)