SIGMathLing - Technical Concerns

Recall that SIGMathLing maintains a bouquet of services; here we air some technical concerns and ideas.

Resource Repositories

We have a SIGMathLing group on the GitLab server gl.kwarc.info, where we host a range of data repositories. This lets us use Git permissions for access control and the GitLab permission UI for managing them. We estimate that for the first two years (2017-2019) SIGMathLing will have fewer than 25 members, which keeps traffic low, and less than 5 TB of data sets. gl.kwarc.info should be able to handle this, given that most data sets will be served via Git LFS. Should space or traffic become too much for the KWARC servers to handle, we will try to raise money for a more scalable solution.
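To illustrate how large data files end up in Git LFS rather than in the regular Git history, here is a minimal Python sketch. It assumes a hypothetical 50 MiB size cut-off (not a documented SIGMathLing or gl.kwarc.info policy) and derives the `.gitattributes` lines that `git lfs track "<pattern>"` would write for the oversized file types; the repository paths are made up for the example.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical cut-off: files larger than this go to Git LFS (an assumption,
# not a documented SIGMathLing or gl.kwarc.info policy).
LFS_THRESHOLD_BYTES = 50 * 1024 * 1024  # 50 MiB


def lfs_patterns(files: Dict[str, int], threshold: int = LFS_THRESHOLD_BYTES) -> List[str]:
    """Return sorted .gitattributes patterns covering every file above `threshold` bytes."""
    patterns = set()
    for path, size in files.items():
        if size > threshold:
            suffix = PurePosixPath(path).suffix
            # Track by extension when there is one, otherwise by exact path.
            patterns.add(f"*{suffix}" if suffix else path)
    return sorted(patterns)


def gitattributes(patterns: List[str]) -> str:
    """Render the lines that `git lfs track "<pattern>"` appends to .gitattributes."""
    return "".join(f"{p} filter=lfs diff=lfs merge=lfs -text\n" for p in patterns)


if __name__ == "__main__":
    # Hypothetical repository contents (path -> size in bytes).
    repo = {
        "corpus/documents.tar.gz": 3 * 10**9,
        "models/embeddings.bin": 10**9,
        "README.md": 2_048,
    }
    print(gitattributes(lfs_patterns(repo)), end="")
```

In practice one would simply run `git lfs track` for each pattern; the sketch only makes explicit that LFS routing is pattern-based, so a whole file type moves to LFS once one of its files is large.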

Zenodo has officially declined to host the SIGMathLing resources because of their large volume, but we are open to exploring alternative providers; feel free to reach out!

Standardizing Datasets and Resources

We will need to develop standards for representing, classifying, describing, and citing data sets and resources.

  1. Representation: file formats, repository layout, data models
  2. Classification/description: is the dataset
    • a corpus (raw, processed, …),
    • a set of annotations to a corpus,
    • manually or automatically created, and by which process/system?
    • an evaluation data set (gold standard)?
    • what is its quality (e.g. f-measure)?
    • what is its license?
  3. Identification: we are looking into obtaining a DOI (Digital Object Identifier) for each resource.
  4. Citation: The idea is to have a “landing page” per resource that addresses all the points in 1. and 2. and lists the authors to be cited. The landing page should also offer pre-made BibTeX (and possibly EndNote) entries to make citation easier.
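As a sketch of how the classification points could feed a landing page, the following Python data model records the description fields from the list (kind, manual/automatic creation and process, quality, license) and renders a pre-made BibTeX `@misc` entry. All field names, the citation key, and the sample values are hypothetical, not a SIGMathLing standard; note that the `doi` and `url` fields are honored by biblatex and many, but not all, BibTeX styles.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    """Dataset categories from the classification list above."""
    RAW_CORPUS = "corpus (raw)"
    PROCESSED_CORPUS = "corpus (processed)"
    ANNOTATIONS = "annotations to a corpus"
    GOLD_STANDARD = "evaluation data set (gold standard)"


@dataclass
class ResourceDescription:
    key: str                           # BibTeX citation key (hypothetical naming scheme)
    title: str
    authors: List[str]
    year: int
    kind: Kind
    creation: str                      # "manual" or "automatic"
    license: str                       # e.g. an SPDX identifier such as "CC-BY-4.0"
    process: Optional[str] = None      # producing system, required if creation == "automatic"
    f_measure: Optional[float] = None  # quality, if the resource has been evaluated
    doi: Optional[str] = None
    url: Optional[str] = None

    def __post_init__(self) -> None:
        if self.creation not in ("manual", "automatic"):
            raise ValueError("creation must be 'manual' or 'automatic'")
        if self.creation == "automatic" and not self.process:
            raise ValueError("automatically created resources must name their process/system")
        if self.f_measure is not None and not 0.0 <= self.f_measure <= 1.0:
            raise ValueError("f_measure must lie in [0, 1]")

    def to_bibtex(self) -> str:
        """Render a pre-made @misc entry for the resource's landing page."""
        fields = [
            ("author", " and ".join(self.authors)),
            ("title", "{" + self.title + "}"),  # inner braces keep the capitalization
            ("year", str(self.year)),
            ("note", f"{self.kind.value}; license: {self.license}"),
        ]
        if self.doi:
            fields.append(("doi", self.doi))
        if self.url:
            fields.append(("url", self.url))
        body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
        return f"@misc{{{self.key},\n{body}\n}}"
```

Validating the description in `__post_init__` means a resource cannot be published with, say, an automatic origin but no named process, which keeps landing pages complete.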

Resource Reference Page

Currently, this is just a manually curated page on the SIGMathLing web site; eventually we will generate it statically from an internal database of resources and/or from information harvested from the repositories. Licensing should be made transparent.
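A static generator for such a page can be quite small. The Python sketch below assumes the resource records are plain dictionaries with hypothetical keys `name`, `url`, `kind`, and `license` (whatever the internal database or repository harvesting would eventually provide) and renders an HTML table with a license column.

```python
import html
from typing import Dict, List


def render_reference_page(resources: List[Dict[str, str]]) -> str:
    """Render a static HTML table of resources, sorted case-insensitively by name."""
    rows = []
    for r in sorted(resources, key=lambda r: r["name"].lower()):
        # Surface missing licenses explicitly instead of leaving the cell empty.
        license_ = r.get("license") or "not specified"
        rows.append(
            '<tr><td><a href="{}">{}</a></td><td>{}</td><td>{}</td></tr>'.format(
                html.escape(r["url"]),
                html.escape(r["name"]),
                html.escape(r["kind"]),
                html.escape(license_),
            )
        )
    header = "<tr><th>Resource</th><th>Type</th><th>License</th></tr>"
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>\n"
```

Printing "not specified" for a missing license is one simple way to make licensing transparent: gaps become visible on the page rather than silently omitted.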

Suite of Systems and Libraries

Currently, this is just a manually curated page on the SIGMathLing web site; eventually we will generate it statically from an internal database of resources and/or from information harvested from the repositories. Licensing should be made transparent.

Math Analysis Blackboard

MK would like to develop and publish an annotation schema (using the KAT schema as a starting point) and establish a math result triple store that manages the resulting annotations. The technical details of how best to do this are still open, but Deyan is quite skeptical.
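Since the design is still open, the following is only an illustration of what "a triple store that manages annotations" means as a data structure: a minimal in-memory Python store where each annotation is a (subject, predicate, object) triple and `None` in a query acts as a wildcard. The `ex:` predicates and the document/formula identifiers are hypothetical and not taken from the KAT schema; a real deployment would more likely build on an RDF library such as rdflib, whose `Graph.triples()` also treats `None` as a wildcard, and query it with SPARQL.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Minimal in-memory triple store; None in a query pattern matches anything."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()  # a set, so duplicate annotations collapse

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples matching the pattern, in sorted order."""
        pattern = (subject, predicate, obj)
        for triple in sorted(self._triples):
            if all(q is None or q == v for q, v in zip(pattern, triple)):
                yield triple


if __name__ == "__main__":
    store = TripleStore()
    # Hypothetical annotations of formulae in a hypothetical document "doc1".
    store.add("doc1#formula1", "ex:denotes", "ex:Integral")
    store.add("doc1#formula1", "ex:annotatedBy", "ex:HypotheticalTagger")
    store.add("doc1#formula2", "ex:denotes", "ex:Integral")
    for triple in store.match(predicate="ex:denotes", obj="ex:Integral"):
        print(triple)
```

Keeping provenance (such as which system produced an annotation) as ordinary triples lets results from different annotation systems coexist in one store and be filtered by the same pattern queries.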