Dataset for Grounding of Formulae

Basic Information

Accessibility and License

The content of this dataset is licensed to SIGMathLing members for research and tool development purposes.

Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure Agreement, because for most arXiv articles the right of distribution was only given (or assumed) to arXiv itself.


This project creates a dataset for the grounding of formulae.

As a first trial, this dataset consists of a single annotated long paper (20 pages in PDF):

The original XHTML file of the paper was taken from the arXMLiv:08.2018 dataset, and we manually annotated each of the 937 identifiers (i.e., <mi> elements) in the document with the mathematical object (meaning) it refers to.
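The annotated unit is the MathML `<mi>` element. As a rough sketch of how these identifiers can be located in an arXMLiv XHTML file, the Python snippet below collects the text of every `<mi>` element in document order and tallies repeated surface forms. It assumes the file is well-formed XML (no undeclared entities) and that MathML uses the standard namespace `http://www.w3.org/1998/Math/MathML`; the helper name `collect_identifiers` is hypothetical, and the snippet does not read MioGatto's annotation files, whose format is not described here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI from the MathML specification; assumed (not confirmed here)
# to be the namespace of the <mi> elements in the arXMLiv XHTML file.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def collect_identifiers(xhtml: str) -> list[str]:
    """Hypothetical helper: return the text of every MathML <mi> element.

    Not part of MioGatto; it only locates identifiers, it does not read
    any annotation data.
    """
    root = ET.fromstring(xhtml)
    # root.iter(tag) walks the root and all of its descendants,
    # yielding only elements whose qualified tag is {MathML}mi.
    return [(mi.text or "").strip() for mi in root.iter(f"{{{MATHML_NS}}}mi")]


if __name__ == "__main__":
    sample = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><p>Let
<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></math>
</p></body></html>"""
    ids = collect_identifiers(sample)
    print(len(ids))      # 3
    print(Counter(ids))  # Counter({'x': 2, 'f': 1})
```

A tally like this only groups identical surface forms. Because the same symbol may refer to different objects in different places, the annotation in this dataset is attached to each identifier occurrence rather than to each distinct form.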

The annotation was performed with our open-source annotation tool MioGatto, which can also be used to view the data. Please refer to its documentation for details.


Download link (SIGMathLing members only)