The content of this dataset is licensed to SIGMathLing members for research and tool development purposes.
Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure Agreement because, for most arXiv articles, the right of distribution was only given (or assumed) to arXiv itself.
This project aims to create a dataset for the grounding of formulae.
As a trial work, this dataset consists of an annotated long paper (20 pages in PDF):
The original XHTML file of the paper was taken from the arXMLiv:08.2018 dataset, and we manually annotated all 937 identifiers (i.e., <mi> tags) in the document with their corresponding mathematical objects (meanings).
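To make the notion of an identifier occurrence concrete, the following minimal sketch counts the MathML <mi> elements in an XHTML string using only the Python standard library. It is an illustration, not part of the dataset or of MioGatto: the function name collect_identifiers and the SAMPLE document are hypothetical, and the sketch assumes the input is well-formed XML that declares the standard MathML namespace. A real XHTML file that uses named HTML entities may need a more lenient parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard MathML namespace URI; assumed to be the one declared in the input.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def collect_identifiers(xhtml_text: str) -> Counter:
    """Count occurrences of each MathML <mi> element by its text content.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xhtml_text)
    counts = Counter()
    # iter() walks the whole tree; the tag is given in {namespace}local form.
    for mi in root.iter(f"{{{MATHML_NS}}}mi"):
        counts[(mi.text or "").strip()] += 1
    return counts


# Tiny made-up document containing the formula x = y + x.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow><mi>x</mi><mo>=</mo><mi>y</mi><mo>+</mo><mi>x</mi></mrow>
      </math>
    </p>
  </body>
</html>"""

if __name__ == "__main__":
    counts = collect_identifiers(SAMPLE)
    # Total number of identifier occurrences, then the per-symbol counts.
    print(sum(counts.values()), dict(counts))
```

In the sample, x occurs twice and y once, giving three identifier occurrences. Each such occurrence is the kind of unit that an annotation links to a meaning.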
The annotation was performed with our open-source annotation tool MioGatto, which is also suitable for viewing the data. Please refer to its documentation for details.
Download link (SIGMathLing members only)