Given an unlabeled data set (n data points in d dimensions), we address the task of performing near-optimal machine learning while labeling as few data points as possible; the number of labels required is known as the label complexity of the problem. Our focus is on linear models and on tight approximation guarantees. Existing theory identifies the data to label from the bottom up, an approach that does not yield tight approximation guarantees. We give algorithms that achieve tight approximation by choosing the data to label from the top down, via rejection. Using these algorithms, one can reduce label complexity in linear regression by O(n) while provably achieving tight approximation.
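As a rough illustration of the setting (not the speaker's top-down rejection algorithm, which the abstract does not specify), the sketch below labels only a small, carefully chosen subset of points and still fits an accurate linear model. It uses statistical leverage scores as the selection criterion; the function names and the choice of leverage scores are assumptions for illustration only.

```python
import numpy as np

def leverage_scores(X):
    # Leverage score of row i is ||U_i||^2, where X = U S V^T (thin SVD).
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U**2, axis=1)

def fit_on_subset(X, y_oracle, k):
    # Request labels only for the k highest-leverage points,
    # then fit ordinary least squares on that labeled subset.
    scores = leverage_scores(X)
    idx = np.argsort(scores)[-k:]
    y = y_oracle(idx)  # "labeling" = querying the oracle for these points
    w, *_ = np.linalg.lstsq(X[idx], y, rcond=None)
    return w

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y_full = X @ w_true  # noiseless labels, for illustration

# Label 25 of 500 points, yet recover the model exactly (noiseless case).
w_hat = fit_on_subset(X, lambda idx: y_full[idx], k=25)
print(np.allclose(w_hat, w_true, atol=1e-6))
```

In the noiseless setting any k >= d well-conditioned points suffice, so the subset fit recovers the true weights; the hard part, which the talk addresses, is guaranteeing tight approximation with few labels when the data are noisy.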
About the speaker: Dr. Malik Magdon-Ismail is a Professor of Computer Science at Rensselaer Polytechnic Institute, where he has been a faculty member since 2000. His research is in machine learning, complex systems, and decision-making from data. Specific topics include randomized algorithms for community detection in graphs, matrix reconstruction, and large-scale data analytics. He is a co-author of the textbooks "Learning From Data" and "Discrete Mathematics and Computing". Dr. Magdon-Ismail earned his BS from Yale University and his PhD from the California Institute of Technology (Caltech). When he isn't decoding data, he is an avid player of poker, bridge, badminton, and squash.