SchemaDB: A Dataset for Structures in Relational Data

Cody Christopher*, Kristen Moore, David Liebowitz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › peer-review

Abstract

In this paper we introduce the SchemaDB dataset, a collection of relational database schemas in both SQL and graph formats. Databases are not commonly shared publicly for reasons of privacy and security, so the corresponding schemas for these databases are often not available for study. Consequently, an understanding of database structures in the wild is lacking, and most schemas that can easily be found publicly belong to common development frameworks or are derived from textbooks or engine benchmarks. SchemaDB contains 2,500 samples of relational schemas found in public code repositories and standardised to MySQL syntax. We provide our gathering and transformation methodology, summary statistics and structural analysis, and discuss potential downstream research tasks in several domains.
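The abstract says each schema is offered in both SQL and graph formats but does not spell out the graph encoding. As a minimal sketch, assuming tables become nodes and `FOREIGN KEY ... REFERENCES` clauses become directed edges (an assumption, not necessarily SchemaDB's actual representation), a MySQL-syntax schema could be mapped to a graph as below. The example schema and the helper `schema_to_graph` are hypothetical.

```python
import re

# Hypothetical MySQL-syntax schema for illustration only (not taken from SchemaDB).
SCHEMA_SQL = """
CREATE TABLE `customer` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100),
  PRIMARY KEY (`id`)
);
CREATE TABLE `orders` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `customer_id` INT NOT NULL,
  PRIMARY KEY (`id`),
  FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
);
"""

# Captures the table name and the body between the opening "(" and the closing ");".
TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+`?(\w+)`?\s*\((.*?)\);", re.IGNORECASE | re.DOTALL
)
# Captures single-column foreign keys: (local column, referenced table).
FK_RE = re.compile(
    r"FOREIGN\s+KEY\s*\(`?(\w+)`?\)\s*REFERENCES\s+`?(\w+)`?", re.IGNORECASE
)


def schema_to_graph(sql: str) -> tuple[list[str], list[tuple[str, str, str]]]:
    """Return (nodes, edges): one node per table, one edge per foreign key.

    Each edge is (referencing table, referenced table, local column).
    Simplification: multi-column keys, inline REFERENCES and
    ALTER TABLE constraints are not handled.
    """
    nodes: list[str] = []
    edges: list[tuple[str, str, str]] = []
    for table, body in TABLE_RE.findall(sql):
        nodes.append(table)
        for column, target in FK_RE.findall(body):
            edges.append((table, target, column))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = schema_to_graph(SCHEMA_SQL)
    print(nodes)  # ['customer', 'orders']
    print(edges)  # [('orders', 'customer', 'customer_id')]
```

A regex is used only to keep the sketch dependency-free; a real pipeline over thousands of heterogeneous schemas would more plausibly use a proper SQL parser.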

Original language: English
Title of host publication: Data Mining
Subtitle of host publication: 20th Australasian Conference, AusDM 2022, Western Sydney, Australia, December 12–15, 2022, Proceedings
Editors: Laurence A.F. Park, Simeon Simoff, Heitor Murilo Gomes, Maryam Doborjeh, Yee Ling Boo, Yun Sing Koh, Yanchang Zhao, Graham Williams
Place of Publication: Singapore
Publisher: Springer Science+Business Media B.V.
Pages: 233-243
Number of pages: 11
ISBN (Electronic): 978-981-19-8746-5
ISBN (Print): 978-981-19-8745-8
Publication status: Published - 2022
Externally published: Yes
Event: 20th Australasian Data Mining Conference, AusDM 2022 - Western Sydney, Australia
Duration: 12 Dec 2022 to 15 Dec 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1741 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 20th Australasian Data Mining Conference, AusDM 2022
Country/Territory: Australia
City: Western Sydney
Period: 12/12/22 to 15/12/22