Files follow the yymm_num naming scheme.
521 MB, packaged as .tar.gz archives.
This is the first public release of the ArGoT dataset generated by the Formal Abstracts research group. ArGoT is a dataset of term-definition pairs automatically extracted from the arXiv mathematical papers.
Two independently extracted versions of the dataset are provided:
Both datasets have the same file structure:
SGD.v3/
├── math00
│ ├── 0001_001.xml.gz
│ ├── 0002_001.xml.gz
│ ├── 0003_001.xml.gz
.
.
.
├── math01
│ ├── 0101_001.xml.gz
│ ├── 0102_001.xml.gz
│ ├── 0103_001.xml.gz
.
.
.
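The file names in the tree above can be split into their parts with a few lines of Python. This is only a sketch: it assumes that `yymm` is a two-digit year followed by a two-digit month and that `num` is a running archive number, which is what the listing suggests but is not stated explicitly. The names `ArchiveName` and `parse_name` are hypothetical helpers, not part of the dataset.

```python
import re
from typing import NamedTuple


class ArchiveName(NamedTuple):
    """Parts of a file name such as 0103_001.xml.gz (assumed meaning)."""
    year: int   # two-digit year, assumed from the yymm prefix
    month: int  # two-digit month, assumed from the yymm prefix
    num: int    # running number after the underscore


# Four digits (yymm), an underscore, a number, then the .xml.gz suffix.
_NAME_RE = re.compile(r"^(\d{2})(\d{2})_(\d+)\.xml\.gz$")


def parse_name(filename: str) -> ArchiveName:
    """Split a yymm_num file name into its numeric parts."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a yymm_num file name: {filename!r}")
    year, month, num = (int(group) for group in match.groups())
    return ArchiveName(year, month, num)


parsed = parse_name("0103_001.xml.gz")
```

Under these assumptions, `parsed.year`, `parsed.month` and `parsed.num` give the pieces needed to sort or filter files by submission period.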
Each dataset consists of XML files with the following tags and attributes:
@MISC{SML:argot:2021,
author = {Luis Berlioz},
title = {ArGoT:2021 dataset, arXiv Glossary of Terms},
howpublished = {hosted at \url{https://sigmathling.kwarc.info/resources/argot-dataset-2021/}},
note = {SIGMathLing -- Special Interest Group on Math Linguistics},
year = {2021}
}
@online{SML:argot:2021,
author = {Luis Berlioz},
title = {argot:2021 dataset, an automatically extracted glossary of mathematical terms from the arXiv},
url = {https://sigmathling.kwarc.info/resources/argot-dataset-2021/},
note = {SIGMathLing -- Special Interest Group on Math Linguistics},
year = {2021}
}
%0 Generic
%T argot:2021 dataset, an automatically extracted glossary of mathematical terms from the arXiv
%A Berlioz, Luis
%D 2021
%I hosted at https://sigmathling.kwarc.info/resources/argot-dataset-2021/
%F SML:argot:2021b
%O SIGMathLing – Special Interest Group on Math Linguistics
The content of this dataset is licensed to SIGMathLing members for research and tool development purposes.
Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure-Agreement because, for most arXiv articles, the right of distribution was only given (or assumed) to arXiv itself.
Part of the Formal Abstracts research group. Author: Luis Berlioz
Example of an entry in the database:
<article name="1407_005/1407.2218/1407.2218.xml" num="89">
<definition index="51">
<stmnt> Assume _inline_math_. We define the following space-time
norm if _inline_math_ is a time interval _display_math_ </stmnt>
<dfndum>space-time norm</dfndum>
</definition>
</article>
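An entry like the one above can be read with the Python standard library. The following is a minimal sketch, not the authors' tooling: it assumes that every `.xml.gz` file contains `article` elements shaped like the example (possibly nested under a root element that is not shown here), and the helper names `extract_pairs` and `read_pairs` are made up for illustration.

```python
import gzip
import xml.etree.ElementTree as ET


def extract_pairs(root):
    """Return (article name, term, statement) tuples found under root."""
    pairs = []
    # iter() also matches root itself, so this works whether <article>
    # is the document root or nested deeper (the real root is an assumption).
    for article in root.iter("article"):
        for definition in article.iter("definition"):
            raw = definition.findtext("stmnt", default="")
            statement = " ".join(raw.split())  # collapse line breaks
            for dfndum in definition.findall("dfndum"):
                term = (dfndum.text or "").strip()
                pairs.append((article.get("name"), term, statement))
    return pairs


def read_pairs(path):
    """Read one gzipped XML file from the dataset and extract its pairs."""
    with gzip.open(path, "rb") as handle:
        return extract_pairs(ET.parse(handle).getroot())


# The example entry from this page, used as in-memory test input.
SAMPLE = """<article name="1407_005/1407.2218/1407.2218.xml" num="89">
<definition index="51">
<stmnt> Assume _inline_math_. We define the following space-time
norm if _inline_math_ is a time interval _display_math_ </stmnt>
<dfndum>space-time norm</dfndum>
</definition>
</article>"""

pairs = extract_pairs(ET.fromstring(SAMPLE))
```

For a file on disk, `read_pairs("SGD.v3/math00/0001_001.xml.gz")` would return the same kind of list, provided the archive has been unpacked to that path.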