Dataset for Grounding of Formulae

Basic Information

Accessibility and License

The content of this dataset is licensed to SIGMathLing members for research and tool development purposes.

Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure Agreement because, for most arXiv articles, the right of distribution was only given (or assumed) to arXiv itself.

Description

This project aims to create a dataset for the grounding of formulae.

As a trial, this dataset consists of a single annotated long paper (20 pages in PDF):

The original XHTML file of the paper was taken from the arXMLiv:08.2018 dataset, and we manually annotated all 937 identifiers (i.e., <mi> elements) in the document with their corresponding mathematical objects (meanings).
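
To illustrate what "identifiers" means here, the following is a minimal sketch of how the <mi> elements of an XHTML document could be listed with Python's standard library. It is not the tooling used to build this dataset: the function name `list_identifiers` and the `SAMPLE` snippet are hypothetical, and the sketch assumes the MathML elements are in the standard MathML namespace and that the input is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Standard MathML namespace; assumed to be the one used in the XHTML file.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def list_identifiers(xhtml_text):
    """Return the text of every MathML <mi> element, in document order."""
    root = ET.fromstring(xhtml_text)
    return ["".join(mi.itertext()) for mi in root.iter(f"{{{MATHML_NS}}}mi")]


# Hypothetical sample input, not taken from the annotated paper.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Let
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow><mi>y</mi><mo>=</mo><mi>a</mi><mi>x</mi><mo>+</mo><mi>b</mi></mrow>
      </math>.
    </p>
  </body>
</html>"""

identifiers = list_identifiers(SAMPLE)
print(len(identifiers), identifiers)
```

Each occurrence is listed separately, so a symbol that appears several times in a document contributes one entry per occurrence. Real XHTML files that use named entities without a DTD may need a more lenient parser than `xml.etree.ElementTree`.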

Download

Download link (SIGMathLing members only)