TY - GEN
T1 - Stratified space learning
T2 - 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making: The Role of Modelling and Simulation, MODSIM 2019
AU - Bokor, Y.
AU - Grixti-Cheng, D.
AU - Hegland, M.
AU - Roberts, S.
AU - Turner, K.
N1 - Publisher Copyright:
Copyright © 2019 The Modelling and Simulation Society of Australia and New Zealand Inc. All rights reserved.
PY - 2019
Y1 - 2019
N2 - Many data-rich industries are interested in the efficient discovery and modelling of structures under-lying large data sets, as it allows for the fast triage and dimension reduction of large volumes of data embedded in high dimensional spaces. The modelling of these underlying structures is also beneficial for the creation of simulated data that better represents real data. In particular, for systems testing in cases where the use of real data streams might prove impractical or otherwise undesirable. We seek to discover and model the structure by combining methods from topological data analysis with numerical modelling. As a first step in combining these two areas, we examine the recovery a linearly embedded graph jGj given only a noisy point cloud sample X of jGj. An abstract graph G consists of two sets: A set of vertices V and a set of edges E. An embedded graph jGj in n dimensions is a geometric realisation of an abstract graph obtained by assigning to each vertex a unique coordinate vector in Rn and then each edge is the line segment between the coordinate vectors of the corresponding vertices. Given an embedded graph jGj-Rn, a point cloud sample of jGj consists of a finite collection of points in Rn sampled from jGj, potentially with noisy perturbations. We will suppose that this sample has bounded noise of-and is sufficiently sampled so every point in jGj is within-of some sample. Such a sample is called an-sample. We can view this as a semiparametric model. Once the abstract graph is fixed we have a parametric model where the parameters are the locations of the vertices. In order to guarantee correctness of our algorithm, we will need to make some reasonable geometric assumptions on the embedded graph. To learn the embedded graph jGj, we first learn the structure of the abstract graph G. We do this by assigning a dimension of either 0 or 1 to each x 2 X (depending on local topological structure) and then cluster points into groups representing embedded vertices or edges. Using local topological structure, we then assign to each abstract edge cluster a pair of abstract vertex clusters, to obtain the incidence relations of the abstract graph. Finally, we use nonlinear least squares regression to model the embedded graph jGj. The approach presented in this paper relies on topological concepts, such as stratification and local homology, which will be used in future research that will generalise this approach to embeddings on more general structures, in particular stratified spaces. A stratified space X is a topological space with a partition into topological manifolds fXigi2I , called strata, such that for all i and j, Xi \ Xj = ;, and if Xi \ Xj 6= ;, then Xi-Xj . If we are interested in a specific topological structure, we can require that the strata satisfy aditional criteria, so that this structure is constant across each strata. In the case presented in this paper, the vertices of a graph (abstract or embedded) are the 0-strata, and the (open) edges are 1-strata. The authors are unaware of any algorithms which recover both the abstract graph, and model its embedding.
AB - Many data-rich industries are interested in the efficient discovery and modelling of structures under-lying large data sets, as it allows for the fast triage and dimension reduction of large volumes of data embedded in high dimensional spaces. The modelling of these underlying structures is also beneficial for the creation of simulated data that better represents real data. In particular, for systems testing in cases where the use of real data streams might prove impractical or otherwise undesirable. We seek to discover and model the structure by combining methods from topological data analysis with numerical modelling. As a first step in combining these two areas, we examine the recovery a linearly embedded graph jGj given only a noisy point cloud sample X of jGj. An abstract graph G consists of two sets: A set of vertices V and a set of edges E. An embedded graph jGj in n dimensions is a geometric realisation of an abstract graph obtained by assigning to each vertex a unique coordinate vector in Rn and then each edge is the line segment between the coordinate vectors of the corresponding vertices. Given an embedded graph jGj-Rn, a point cloud sample of jGj consists of a finite collection of points in Rn sampled from jGj, potentially with noisy perturbations. We will suppose that this sample has bounded noise of-and is sufficiently sampled so every point in jGj is within-of some sample. Such a sample is called an-sample. We can view this as a semiparametric model. Once the abstract graph is fixed we have a parametric model where the parameters are the locations of the vertices. In order to guarantee correctness of our algorithm, we will need to make some reasonable geometric assumptions on the embedded graph. To learn the embedded graph jGj, we first learn the structure of the abstract graph G. We do this by assigning a dimension of either 0 or 1 to each x 2 X (depending on local topological structure) and then cluster points into groups representing embedded vertices or edges. Using local topological structure, we then assign to each abstract edge cluster a pair of abstract vertex clusters, to obtain the incidence relations of the abstract graph. Finally, we use nonlinear least squares regression to model the embedded graph jGj. The approach presented in this paper relies on topological concepts, such as stratification and local homology, which will be used in future research that will generalise this approach to embeddings on more general structures, in particular stratified spaces. A stratified space X is a topological space with a partition into topological manifolds fXigi2I , called strata, such that for all i and j, Xi \ Xj = ;, and if Xi \ Xj 6= ;, then Xi-Xj . If we are interested in a specific topological structure, we can require that the strata satisfy aditional criteria, so that this structure is constant across each strata. In the case presented in this paper, the vertices of a graph (abstract or embedded) are the 0-strata, and the (open) edges are 1-strata. The authors are unaware of any algorithms which recover both the abstract graph, and model its embedding.
KW - Point clouds
KW - Stratified space learning
KW - Structure recovery
UR - http://www.scopus.com/inward/record.url?scp=85086451938&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making: The Role of Modelling and Simulation, MODSIM 2019
SP - 69
EP - 75
BT - 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making
A2 - Elsawah, S.
PB - Modelling and Simulation Society of Australia and New Zealand Inc (MSSANZ)
Y2 - 1 December 2019 through 6 December 2019
ER -