ArGoT 2021 - arXiv Glossary of Terms

Release

Contents

Download

Description

This is the first public release of the ArGoT dataset, generated by the Formal Abstracts research group. ArGoT is a dataset of term-definition pairs automatically extracted from mathematical papers on arXiv.

Two independently extracted versions of the dataset are provided:

Both datasets have the same file structure:

SGD.v3/
├── math00
│   ├── 0001_001.xml.gz
│   ├── 0002_001.xml.gz
│   ├── 0003_001.xml.gz
│   ...
├── math01
│   ├── 0101_001.xml.gz
│   ├── 0102_001.xml.gz
│   ├── 0103_001.xml.gz
│   ...
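
The layout above can be traversed with a few lines of Python. This is a minimal sketch, not part of the release: the function names `iter_archive_paths` and `read_archive_text` are hypothetical, the root directory is whatever path the archive was unpacked to, and UTF-8 is an assumed encoding that the description does not confirm.

```python
import gzip
from pathlib import Path
from typing import Iterator


def iter_archive_paths(root: Path) -> Iterator[Path]:
    """Yield every compressed XML file one level below `root`, in sorted order."""
    # Layout assumed from the tree above: <root>/<subdirectory>/<name>.xml.gz
    yield from sorted(root.glob("*/*.xml.gz"))


def read_archive_text(path: Path) -> str:
    """Decompress one .xml.gz file and return its contents as text."""
    # UTF-8 is an assumption; the dataset description does not state the encoding.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return handle.read()
```

For example, `for path in iter_archive_paths(Path("SGD.v3")): text = read_archive_text(path)` would visit the files under `math00`, `math01`, and so on, in order.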

Each dataset consists of XML files with the following tags and attributes:

Citing this Resource

pure bibTeX

@MISC{SML:argot:2021,
  author = {Luis Berlioz},
  title = {ArGoT:2021 dataset, arXiv Glossary of Terms},
  howpublished = {hosted at \url{https://sigmathling.kwarc.info/resources/argot-dataset-2021/}},
  note = {SIGMathLing -- Special Interest Group on Math Linguistics},
  year = {2021}
}

bibTeX for the bibLaTeX package (preferred)

@online{SML:argot:2021,
  author = {Luis Berlioz},
  title = {argot:2021 dataset, an automatically extracted glossary of mathematical terms from the arXiv},
  url = {https://sigmathling.kwarc.info/resources/argot-dataset-2021/},
  note = {SIGMathLing -- Special Interest Group on Math Linguistics},
  year = {2021}
}

EndNote

%0 Generic
%T argot:2021 dataset, an automatically extracted glossary of mathematical terms from the arXiv
%A Berlioz, Luis
%D 2021
%I hosted at https://sigmathling.kwarc.info/resources/argot-dataset-2021/
%F SML:argot:2021b
%O SIGMathLing – Special Interest Group on Math Linguistics

Accessibility and License

The content of this dataset is licensed to SIGMathLing members for research and tool development purposes.

Access is restricted to SIGMathLing members under the SIGMathLing Non-Disclosure Agreement because, for most arXiv articles, the right of distribution was given (or assumed) only to arXiv itself.

Generated via

About

Part of the Formal Abstracts research group. Author: Luis Berlioz

Appendix

Example of an entry in the database:

    <article name="1407_005/1407.2218/1407.2218.xml" num="89">
        <definition index="51">
            <stmnt> Assume _inline_math_. We define the following space-time
            norm if _inline_math_ is a time interval _display_math_ </stmnt>
            <dfndum>space-time norm</dfndum>
        </definition>
    </article>
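
An entry like the one above can be turned into term-definition pairs with Python's standard `xml.etree.ElementTree` module. The following is a sketch under assumptions: the names `TermDefinition` and `extract_pairs` are hypothetical, and since the full schema is not reproduced here, the code only relies on the tags visible in the example (`article`, `definition`, `stmnt`, `dfndum`) and tolerates either a single `<article>` root or a container element holding several articles.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, NamedTuple


class TermDefinition(NamedTuple):
    article: str    # value of the article's "name" attribute
    term: str       # text of a <dfndum> element
    statement: str  # text of the <stmnt> element in the same <definition>


def extract_pairs(xml_text: str) -> Iterator[TermDefinition]:
    """Yield one (article, term, statement) record per <dfndum> element."""
    root = ET.fromstring(xml_text)
    # Element.iter() also visits the root itself, so a lone <article> works too.
    for article in root.iter("article"):
        for definition in article.iter("definition"):
            # Collapse line breaks and repeated spaces inside the statement.
            statement = " ".join((definition.findtext("stmnt") or "").split())
            for dfndum in definition.findall("dfndum"):
                term = " ".join((dfndum.text or "").split())
                yield TermDefinition(article.get("name", ""), term, statement)


SAMPLE = """
<article name="1407_005/1407.2218/1407.2218.xml" num="89">
<definition index="51">
    <stmnt> Assume _inline_math_. We define the following space-time
    norm if _inline_math_ is a time interval _display_math_ </stmnt>
    <dfndum>space-time norm</dfndum>
</definition>
</article>
"""

for pair in extract_pairs(SAMPLE):
    print(pair.term)  # prints: space-time norm
```

Keeping the statement next to each term, rather than building a dictionary keyed by term, avoids silently discarding entries when the same term is defined in more than one article.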