New Papers: Topology of Model Spaces
To continue the train of thought about topological machine learning, I’d like to advertise a couple of papers that my senior PhD students are presenting this summer: Ben Yackley’s paper at ICML (this week!) and Diane Oyen’s paper at AAAI next month.
Ben Yackley is presenting “Smoothness and Structure Learning by Proxy” at ICML in Edinburgh. This paper examines the structure of model spaces for Bayesian networks in the context of model structure search. The core point is a demonstration of a relationship between the topology of the model space and the model scoring function (specifically the BDe score, which is essentially a Bayesian marginal likelihood of the data under a given structure). What Ben showed is that if you pin down a specific metric on Bayes net model space (the hypercube topology, where each possible edge is one coordinate and the distance between two networks is the number of edges on which they differ), then the score function is smooth (Lipschitz continuous) with respect to that metric. Conveniently, the hypercube topology arises from single-edge model edits, which are common operations in incremental model search methods.
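To make the hypercube picture concrete, here is a minimal sketch of my own, not code from the paper. It assumes a structure is represented as a set of directed edges, and it uses a made-up `toy_score` as a stand-in for BDe (the names `hamming`, `single_edge_neighbors`, `max_single_edit_jump`, and `REFERENCE` are all hypothetical). It shows that single-edge edits are exactly the distance-1 moves, and how one might probe the largest score jump across such a move, which is a local stand-in for a Lipschitz constant.

```python
from itertools import permutations


def hamming(g1, g2):
    """Hypercube distance: number of directed edges on which two structures differ."""
    return len(frozenset(g1) ^ frozenset(g2))


def single_edge_neighbors(graph, nodes):
    """Yield every structure reachable by toggling one directed edge.

    Acyclicity is deliberately NOT checked in this sketch.
    """
    graph = frozenset(graph)
    for edge in permutations(nodes, 2):
        yield graph ^ {edge}  # add the edge if absent, remove it if present


def max_single_edit_jump(score, graph, nodes):
    """Largest absolute score change over all single-edge edits of `graph`."""
    base = score(graph)
    return max(abs(score(n) - base) for n in single_edge_neighbors(graph, nodes))


# Hypothetical toy score (NOT the BDe score): disagreement with a reference structure.
REFERENCE = frozenset({("A", "B"), ("B", "C")})


def toy_score(graph):
    return -float(hamming(graph, REFERENCE))


nodes = ["A", "B", "C"]
start = frozenset({("A", "B")})
neighbors = list(single_edge_neighbors(start, nodes))
jump = max_single_edit_jump(toy_score, start, nodes)
print(len(neighbors), jump)
```

If the score never moves by more than some constant L across a single-edge edit, then two structures at hypercube distance d can differ in score by at most L times d, at least as long as the edits in between stay inside the model space.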
All that sounds rather abstract, but it has a very practical implication: if the score function is smooth, then you can approximate it with a standard function approximator, such as a Gaussian process. That means that you can replace exact score evaluation per model (slow) with point evaluation of your approximator (fast). That turns into a fast search algorithm for Bayes net structures. Ben does a nice job of showing both the speedup and the ability to recover high-scoring networks. If you’re at ICML this week, I recommend stopping by to see his poster (Thursday, June 28)!
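As a rough illustration of the proxy idea (again a sketch under my own assumptions, not the algorithm from the paper), the following fits a tiny Gaussian process to a handful of exactly scored structures and then predicts scores for unseen ones. The kernel, an exponential of the hypercube distance, is an assumed choice; `toy_score` is a hypothetical stand-in for the expensive BDe evaluation; and the hand-rolled `solve` is only there to keep the example dependency-free. In practice you would reach for a GP library.

```python
import math
from itertools import permutations


def hamming(g1, g2):
    """Hypercube distance between two edge sets."""
    return len(g1 ^ g2)


def kernel(g1, g2, length_scale=2.0):
    """Assumed kernel: exp(-distance / length_scale) on the hypercube."""
    return math.exp(-hamming(g1, g2) / length_scale)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_surrogate(graphs, scores, noise=1e-6):
    """Return a function giving the GP posterior mean score for any structure."""
    mean = sum(scores) / len(scores)
    gram = [[kernel(gi, gj) + (noise if i == j else 0.0)
             for j, gj in enumerate(graphs)]
            for i, gi in enumerate(graphs)]
    alpha = solve(gram, [s - mean for s in scores])

    def predict(graph):
        return mean + sum(a * kernel(graph, g) for a, g in zip(alpha, graphs))

    return predict


# Hypothetical expensive score (NOT BDe): closeness to a hidden target structure.
TARGET = frozenset({("A", "B"), ("B", "C")})


def toy_score(graph):
    return -float(hamming(graph, TARGET))


# Score a few structures exactly: the empty graph and every single-edge graph.
all_edges = list(permutations("ABC", 2))
train_graphs = [frozenset()] + [frozenset({e}) for e in all_edges]
train_scores = [toy_score(g) for g in train_graphs]

predict = fit_surrogate(train_graphs, train_scores)

# Cheap predictions for two structures that were never scored exactly.
good = predict(TARGET)
bad = predict(frozenset({("A", "C"), ("C", "A")}))
print(good, bad)
```

A search loop would then rank candidate edits by `predict` and fall back to the exact score only occasionally, for instance to refresh the training set.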
Diane Oyen is presenting her work, “Leveraging Domain Knowledge in Multitask Bayesian Network Structure Learning”, at AAAI in Toronto. This paper looks at the problem of learning multiple, related statistical models (such as Bayesian networks) from related data sets. The specific example she examines is neuroimaging data: imaging data on subjects who, say, suffer from schizophrenia are related to, but distinct from, data from healthy controls. The similarities and differences should be reflected in the models derived from the corresponding data sets.
In principle, one can recover exact models from each population independently, and then examine those models for similar or different model parameters (e.g., subnetwork structures). In practice, that can take enormous amounts of data. Typically, for complex model spaces (Bayes nets, factor graphs, and the like) you find large plateaus of very similarly-scored models, with little to really differentiate them (aside from a data-independent prior). If you pick models independently from two such spaces, there is little chance that they’ll be closely related — that they will reflect the similarity that we a priori expect.
What Diane showed is that if you have such a priori similarity expectations, you can encode them in a “meta-graph”: a metric that describes the prior expectations and that can be used to bias the model selection search. This pushes models to be selected from “similar” parts of the high-scoring plateaus in the different spaces, so that you get more closely related models than you would by searching independently. The result is a more stable search, and the models it produces seem to reflect the necessary similarities and differences more accurately than those found by searching independently or by standard multi-task BN learning algorithms, which essentially assume a uniform metric over model space.
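One way to picture the bias is as a penalty on dissimilar structures between tasks that the meta-graph says should be alike. The sketch below is my own assumed formulation, not necessarily the objective used in the paper: `joint_objective`, the `strength` parameter, and the meta-graph encoding as weighted task pairs are all hypothetical, and the flat score simply mimics a plateau where the data cannot tell structures apart.

```python
def hamming(g1, g2):
    """Number of directed edges on which two structures differ."""
    return len(frozenset(g1) ^ frozenset(g2))


def joint_objective(graphs, score_fns, meta_edges, strength=1.0):
    """Sum of per-task scores minus a meta-graph-weighted dissimilarity penalty.

    graphs:     dict mapping task name -> set of directed edges
    score_fns:  dict mapping task name -> scoring function for that task's data
    meta_edges: dict mapping (task_a, task_b) -> expected-similarity weight
    """
    fit = sum(score_fns[task](g) for task, g in graphs.items())
    penalty = sum(w * hamming(graphs[a], graphs[b])
                  for (a, b), w in meta_edges.items())
    return fit - strength * penalty


def flat_score(_graph):
    """Toy plateau: every structure scores the same on the data alone."""
    return 0.0


score_fns = {"patients": flat_score, "controls": flat_score}
meta_edges = {("patients", "controls"): 1.0}  # we expect these tasks to be similar

similar = {
    "patients": frozenset({("A", "B"), ("B", "C")}),
    "controls": frozenset({("A", "B")}),
}
unrelated = {
    "patients": frozenset({("A", "B"), ("B", "C")}),
    "controls": frozenset({("C", "A")}),
}

similar_value = joint_objective(similar, score_fns, meta_edges)
unrelated_value = joint_objective(unrelated, score_fns, meta_edges)
print(similar_value, unrelated_value)
```

On a plateau the data term is a tie, so the penalty alone decides, and the search drifts toward the pair of structures that agree on more edges. A uniform weight on every task pair would correspond to the uniform-metric assumption mentioned above.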
Putting these results together, we’re beginning to see a picture of the rich geometric structure of model spaces (as opposed to the geometry of data spaces, which we’re more used to thinking about). In some sense, it’s an extension of the traditional search bias story: use prior knowledge (via kernels or probabilistic priors) to bias the class of results that you get toward the ones that you want. On the other hand, it’s also a knowledge-richer approach than you’ll typically see in the literature. It says that there is geometry to model spaces that goes beyond radial basis functions, and that we can exploit it to our benefit if we think carefully about our model spaces.
from Ars Experientia: http://cs.unm.edu/~terran/academic_blog/?p=106