arXMLiv 2020 Dataset Released

The 2020 release of the arXMLiv data set has been published, including 1.58 million HTML5+MathML document conversions from arXiv.org.

SIGMathLing has 9 Datasets and 22 Members

The SIGMathLing Data Cooperative is growing nicely: we currently have nine data sets and 22 members who have signed the NDA to get access to the data.

arXiv 2019 Data Set and Embeddings Released

The 2019 release of the arXMLiv data set has been published.

Statement Classification Data Set

A new data set with annotations for 10.5 million scientific statements has been uploaded to SIGMathLing. The content of this data set is licensed to SIGMathLing members for research and tool development purposes subject to the SIGMathLing Non-Disclosure-Agreement.

Quantity Expressions Data Set

A new data set with annotations for quantity expressions has been uploaded to SIGMathLing. The content of this data set is licensed to SIGMathLing members for research and tool development purposes subject to the SIGMathLing Non-Disclosure-Agreement.

First Data Sets (1.1 Million scientific HTML5 documents from arXiv and token models)

SIGMathLing has published its first data sets, which also serve as templates for future data sets. The content of these data sets is licensed to SIGMathLing members for research and tool development purposes subject to the SIGMathLing Non-Disclosure-Agreement.

SIGMathLing Web Site online

The SIGMathLing web site is now online at http://sigmathling.kwarc.info. Even though we do not have any linguistic resources yet, we can start looking for members who can contribute some.

SIGMathLing initialized

We are starting with SIGMathLing.
