Thursday, 27 December 2018
Deep generative modeling has led to new, state-of-the-art approaches for enforcing structural priors in a variety of inverse problems. In contrast to priors given by sparsity, deep models can provide direct low-dimensional parameterizations of the manifold of images or signals belonging to a particular natural class, allowing recovery algorithms to be posed in a low-dimensional space. This dimensionality may even be lower than the sparsity level of the same signals when viewed in a fixed basis. In this talk, we will show rigorous recovery guarantees for solving inverse problems under a learned generative prior. First, we will discuss convergence guarantees for compressive sensing under random neural network priors. Then, we will show that generative priors allow a significant advance to be made in the problem of compressive phase retrieval. To date, no known computationally efficient algorithm exists for solving phase retrieval under a sparsity prior at sample complexity proportional to the signal complexity. With generative priors, we establish a new approach for compressive phase retrieval and prove rigorous guarantees with sample complexity proportional to the signal complexity.
We present algorithms with guarantees for learning and privacy preservation in the context of the problem of class ratio estimation. More interestingly, we derive learning bounds for the estimation with privacy constraints, which lead to important insights for the data publisher. Such results motivate the need to look at the privacy vs. learning trade-off in other ML applications.
E-commerce websites such as Amazon, Alibaba, and Walmart typically process billions of orders every year. Semantic representation and understanding of these orders is extremely critical for an e-commerce company. Each order can be represented as a tuple of <customer, product, price, date>. Exploring the space of all plausible orders could help us better understand the relationships between the various entities in an e-commerce ecosystem, namely the customers and the products they purchase. In this work, we propose a Generative Adversarial Network (GAN) for orders made on e-commerce websites. Once trained, the generator in the GAN could generate any number of plausible orders. Our contributions include: (a) creating a dense and low-dimensional representation of e-commerce orders, (b) training an ecommerceGAN (ecGAN) with real orders to show the feasibility of the proposed paradigm, and (c) training an ecommerce-conditional-GAN (ec^2GAN) to generate plausible orders involving a particular product. We propose several qualitative methods to evaluate ecGAN and demonstrate its effectiveness.
Bio: Arijit Biswas is currently a Senior Machine Learning Scientist at the India machine learning team in Amazon, Bangalore. His research interests are mainly in deep learning, machine learning, Natural Language Processing and Computer Vision. Earlier he was a research scientist at Xerox Research Centre India (XRCI) from June, 2014 to July, 2016. He received his PhD in Computer Science from University of Maryland, College Park in April 2014. His PhD thesis was on Semi-supervised and Active Learning Methods for Image Clustering. His thesis advisor was David Jacobs and he closely collaborated with Devi Parikh and Peter Belhumeur during his stay at UMD. While doing his PhD, Arijit also did internships at Xerox PARC and Toyota Technological Institute at Chicago (TTIC). He has published papers in CVPR, ECCV, ACM-MM, BMVC, IJCV, ECML-PKDD and CVIU. Arijit is also a recipient of the MIT Technology Review Innovators under 35 award from India in 2016.
Several critical applications require ML inference on resource-constrained devices. In this talk, we will discuss two new methods, FastGRNN and EMI-RNN, that can enable time-series inference on devices as small as an Arduino Uno, which has 2KB of RAM. Our methods can provide as much as 70x speed-up and compression over state-of-the-art methods such as LSTM and GRU, while also providing strong theoretical guarantees.
Bio: Prateek Jain is a member of the Machine Learning and Optimization and the Algorithms and Data Sciences Group at Microsoft Research, Bangalore, India. He is also an adjunct faculty member at the Computer Science department at IIT Kanpur. His research interests are in machine learning, non-convex optimization, high-dimensional statistics, and optimization algorithms in general. He is also interested in applications of machine learning to privacy, computer vision, text mining and natural language processing. He completed his PhD at the University of Texas at Austin under Prof. Inderjit S. Dhillon.
Most machine learning systems rely on some implicit regularity assumptions about the data. For example, many classifiers assume that all classes have an equal number of representatives, that all the sub-concepts within the classes are characterized by an equal number of representatives, and that all classes have similar class-conditional distributions. Further, both classifiers and clustering methods assume that all features are defined and observed for all data instances. However, many real-world datasets violate one or more of these assumptions, giving rise to data irregularities which can induce undue bias in the learning systems or even render the systems inapplicable to the data. Starting with a taxonomy of the various data irregularities, in this talk, we examine some key practical difficulties that learning systems face in handling one or a combination of such data irregularities, given that these cannot all be remedied through pre-processing. We will also highlight some major theoretical challenges in analyzing the behavior of the learning systems (e.g. in terms of test error bounds for classifiers on imbalanced datasets) in the face of irregular datasets.
Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with molecular graphs due to their unique characteristics: their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the node labels, and they come with a different number of nodes and edges. In this paper, we propose NeVAE, a novel variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. In addition, by using masking, the decoder is able to guarantee a set of valid properties in the generated molecules. Experiments reveal that our model can discover plausible, diverse and novel molecules more effectively than several state-of-the-art methods. Moreover, by utilizing Bayesian optimization over the continuous latent representation of molecules our model finds, we can also find molecules that maximize certain desirable properties more effectively than alternatives.
Estimating causal impact in terms of change in spending patterns for various customer events is fundamental to a large e-commerce company like Amazon. Questions like "Do (treatment) customers who sign up for Prime membership or make a first purchase in a category, say books, spend 'x' dollars more in the subsequent year compared to (control) customers who do not perform these events?" are invaluable in terms of boosting revenues, streamlining business operations, inventory planning, marketing and recommendations. Computing such causal inferences of customer events from observational data requires one to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes (one-year spends) among units that received the treatment versus those that did not. One approach to reducing this bias is to identify weighted prototypical examples from the control population that match the treatment distribution. To this end, we will describe a fast algorithm, ProtoDash, for selecting prototypical examples. We associate non-negative weights with the selected prototypes, which aid in interpreting the importance of each prototype in matching the treatment distribution. Though the non-negativity requirement sacrifices (strong) submodularity, we show that the problem is weakly submodular and derive approximation guarantees for our ProtoDash algorithm. We demonstrate the efficacy of our method on diverse domains such as digit recognition (MNIST), 40 publicly available health questionnaires obtained from the Centers for Disease Control and Prevention (CDC) website, and retail.
Bio: Karthik Gurumoorthy graduated with a dual master's degree in Mathematics and Computer Science in 2009 and 2010 respectively and earned a doctorate degree in Computer Science in 2011 from the University of Florida, Gainesville. He continued at the same institution for a year in the capacity of a post-doctoral researcher and later joined GE Global Research, Bangalore as a Research Scientist in 2012 pursuing research in the field of medical image analysis. After completing a year and 3 months at GE, he accepted an AIRBUS post-doctoral fellowship position at the International Center for Theoretical Sciences, Tata Institute of Fundamental Research (ICTS-TIFR), Bangalore where he conducted research in data assimilation and filtering theory for over a year and 6 months. He currently works at Amazon Development Center, Bangalore as a Machine Learning Scientist. He has worked on a wide gamut of problems covering domains like signal processing, machine learning, density estimation, filtering theory, computer vision and image compression and is motivated by problems which are mathematical in nature.
In recent years, machine learning has been increasingly used to predict, enhance, and even replace human decision making in a wide variety of offline and online applications. Often the design goal is to maximize some performance metric of the system (for example, overall prediction accuracy). However, there is a growing concern that these automated decisions can lead, even in the absence of intent, to a lack of fairness, i.e., their outcomes can disproportionately impact particular social groups (e.g., Blacks, Women). In this talk, I'll cover a set of techniques designed in recent years to ensure that machine learning methods fueling algorithmic decisions are fair to all while satisfying the performance requirements.
Many complex systems can be analyzed by examining the spectra of matrices central to them. The graph Laplacian of large social networks and the Hessian of the empirical loss function arising from fitting models to data in machine learning are prime examples. Naive estimation of the eigenvalues is computationally infeasible even if the matrices are explicitly given; most often, the only access to these matrices is through matrix-vector products. In this talk, I will describe a scalable and accurate algorithm for estimating the eigenvalue density of large, symmetric matrices, with a particular focus on the Hessian of the loss function in deep neural networks. We use our algorithm to study how the loss landscape changes throughout the training process, and how re-parameterizations like Batch Normalization affect the conditioning of the surface.
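The abstract above does not spell out the algorithm, so the following is only a minimal sketch of one common technique of this kind, stochastic Lanczos quadrature, which touches the matrix solely through matrix-vector products. The function name `lanczos_quadrature`, the diagonal stand-in matrix and the step count are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np

def lanczos_quadrature(matvec, dim, num_steps, rng):
    """Return Ritz values and weights approximating the eigenvalue density.

    Only matrix-vector products are used; the matrix itself is never formed.
    """
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    basis = [v]
    alphas, betas = [], []
    for j in range(num_steps):
        w = matvec(basis[j])
        alphas.append(basis[j] @ w)
        # Full reorthogonalisation against all previous Lanczos vectors.
        for q in basis:
            w = w - (q @ w) * q
        beta = np.linalg.norm(w)
        if j == num_steps - 1 or beta < 1e-10:
            break
        betas.append(beta)
        basis.append(w / beta)
    k = len(alphas)
    off_diag = betas[:k - 1]
    tridiag = np.diag(alphas) + np.diag(off_diag, 1) + np.diag(off_diag, -1)
    ritz_values, vectors = np.linalg.eigh(tridiag)
    # Squared first components act as quadrature weights and sum to 1.
    weights = vectors[0, :] ** 2
    return ritz_values, weights

# Hypothetical stand-in for a Hessian: a symmetric matrix with known spectrum,
# accessed only through the matvec closure.
rng = np.random.default_rng(0)
stand_in = np.diag(np.linspace(1.0, 10.0, 200))
ritz, weights = lanczos_quadrature(lambda x: stand_in @ x, 200, 30, rng)
```

In this sketch the density is approximated by point masses `weights[i]` placed at `ritz[i]`; in practice one would typically average over several random start vectors and smooth the result with a narrow kernel.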
In this talk, I will first provide an overview of key problem areas where we are applying Machine Learning (ML) techniques within Amazon such as product demand forecasting, product search, and information extraction from reviews, and associated technical challenges. I will then talk about two specific applications where we use a variety of methods to learn semantically rich representations of data: question answering where we use deep learning techniques and product size recommendations where we use probabilistic models.
Bio: Rajeev Rastogi is a Director of Machine Learning at Amazon where he is developing ML platforms and applications for the e-commerce domain. Previously, he was Vice President of Yahoo! Labs Bangalore and the founding Director of the Bell Labs Research Center in Bangalore, India. Rajeev is an ACM Fellow and a Bell Labs Fellow. He is active in the fields of databases, data mining, and networking, and has served on the program committees of several conferences in these areas. He currently serves on the editorial board of the CACM, and has been an Associate editor for IEEE Transactions on Knowledge and Data Engineering in the past. He has published over 125 papers, and holds over 50 patents. Rajeev received his B. Tech degree from IIT Bombay, and a PhD degree in Computer Science from the University of Texas, Austin.
Friday, 28 December 2018
Our physical world is intrinsically spatiotemporal, demonstrating rich structures such as spatial smoothness, temporal periodicity, and multi-resolution spatiotemporal dependency. While deep learning has made major breakthroughs in the domain of images and texts, it fails to represent the intrinsic structures of spatiotemporal data commonly seen in trajectory tracking, automated sensing, and the Internet of Things.
In this talk, I will show how to design deep learning models to learn from large-scale spatiotemporal data, especially for dealing with non-Euclidean geometry and long-term dependencies and for incorporating logical and physical constraints. I will showcase the application of these models to a variety of problems, including long-term forecasting (transportation), long-range trajectory synthesis (sports), and combating ground effect in quadcopter landing (aerospace control).
In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such an asynchronous setting? In this talk, we address the above problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent’s policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent’s actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions on the functional form of the intensity and mark distribution of the feedback, and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications in personalized teaching and viral marketing and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions to help learners and marketers achieve their goals more effectively than alternatives.
Bio: Abir De has been a postdoctoral researcher at the Max Planck Institute for Software Systems in Kaiserslautern, Germany, since January 2018, where he is hosted by Manuel Gomez Rodriguez. He received his PhD from the Department of Computer Science and Engineering, IIT Kharagpur in July 2018. During that time, he was a part of the Complex Network Research Group (CNeRG) at IIT Kharagpur. His PhD work was supported by the Google India PhD Fellowship 2013. Prior to that, he did his BTech in Electrical Engineering and MTech in Control Systems Engineering, both at IIT Kharagpur. His main research interests broadly lie in modeling, learning and control of networked dynamical processes. Very recently, he started working on deep learning on graphs, for example, deep generative random graph models and deep reinforcement learning of networked processes.
Word embedding, a technique that purportedly captures the semantic relationships between words, has been studied in various forms in IR at least as far back as 1990. Interest in the use of word embeddings has been rekindled thanks to relatively recent research in deep neural network-based approaches, which have opened up a plethora of opportunities for researchers in different fields of Computer Science. While traditional IR models use relatively coarse (co-)occurrence statistics to implicitly quantify semantic relatedness between terms, embeddings seem to constitute a more direct method for capturing semantic relations. Thus, they may provide a natural approach to addressing the well-known problems of synonymy, polysemy and vocabulary mismatch in general. In this talk, we will look at some modifications of retrieval functions that use word embeddings and achieve significant improvements in retrieval performance. Additionally, we will talk about an embedding-based retrieval performance estimator that predicts the outcome of a search a priori, without relevance judgements.
We give a solution to the following question from manifold learning.
Suppose data belonging to a high-dimensional Euclidean space is drawn independently and identically distributed from a measure supported on a low-dimensional, twice-differentiable, embedded compact manifold M, and is corrupted by a small amount of i.i.d. Gaussian noise.
How can we produce a manifold M' whose Hausdorff distance to M is small and whose reach (normal injectivity radius) is not much smaller than the reach of M?
This is joint work with Charles Fefferman, Sergei Ivanov, Yaroslav Kurylev, and Matti Lassas.
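As a small illustration of the quality measure named in the question above, the Hausdorff distance can be computed directly for two finite point samples. This is only a sketch on finite sets standing in for M and M'; the function name and the toy points are illustrative assumptions, and it says nothing about how the reconstruction itself is carried out.

```python
import numpy as np

def hausdorff_distance(points_a, points_b):
    """Hausdorff distance between two finite point sets (one point per row)."""
    diffs = points_a[:, None, :] - points_b[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)   # shape (len(a), len(b))
    a_to_b = dists.min(axis=1).max()         # farthest point of A from B
    b_to_a = dists.min(axis=0).max()         # farthest point of B from A
    return max(a_to_b, b_to_a)

# Toy samples (hypothetical): two pairs of points on a line in the plane.
sample_m = np.array([[0.0, 0.0], [1.0, 0.0]])
sample_m_prime = np.array([[0.0, 0.0], [3.0, 0.0]])
distance = hausdorff_distance(sample_m, sample_m_prime)
```

The measure is symmetric by construction: it takes the larger of the two one-sided distances, so a single stray point in either set is enough to make it large.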
E-commerce search engines often receive queries for which they cannot find relevant products. A standard fallback in such scenarios is to recommend reformulated queries using simple heuristics. However, such heuristics may yield results that seem unnatural to the human user. In this talk, I'll present our ongoing work on solving the problem of query reformulation using an embedding method. While there are embedding methods for text-only data or for graph-only data, I'll discuss why they cannot be used in this context and show how our embedding method incorporates both the query text and query-product/query-query relations (represented as a graph). This is a general approach to obtaining text embeddings such that the embeddings' geometry mimics a graph-based similarity measure. I'll also discuss the difficulty in devising a metric to understand the performance of query reformulation models.
We look at detecting anomalies in e-commerce data: detecting sellers who may have solicited fake reviews from customers to artificially increase the rating of their product(s). Since most of the connections between the seller’s product(s) and the customers who buy their products are inherently random and spread across time, the problem boils down to detecting non-random connections. This manifests as detecting dense bipartite cores between the sellers and customers that are formed in a relatively short period of time. We apply a scalable tensor decomposition technique to detect such dense bipartite cores. While tensor decomposition is mostly unsupervised, we formulate Bayesian semi-supervised tensor decomposition to take advantage of sparse labeled data. In addition, we use Polya-Gamma data augmentation for the semi-supervised Bayesian tensor decomposition and show that this formulation simplifies calculation of the Fisher information matrix for partial natural gradient learning. Our experimental results show that our semi-supervised approach outperforms state-of-the-art unsupervised baselines, and that partial natural gradient learning outperforms stochastic gradient learning and Online-EM with sufficient statistics.
Bio: Anil R. Yelundur has been working as an Applied Scientist with Amazon for over 2.5 years. He has over ten years of experience in the field of Machine Learning and has worked on problems ranging from topic modeling to anomaly detection in high-dimensional data and pattern recognition. His current focus includes applying state-of-the-art machine learning techniques towards solving problems related to detecting review abuse and plagiarism.
Natural language processing (NLP) technology has been at the forefront of the digital and the AI revolution. Language is the most natural medium of communication of humans with machines. While there have been great strides in NLP in the past few decades based on linguistic studies and machine learning, the focus has been only on a few of the world’s languages. We live in a multilingual world with about 7000 living languages. We will present algorithms and methods developed in recent years that can bootstrap the effort to bring NLP technology to less-resourced languages. The use of neural networks and representation learning methods in NLP has opened up opportunities for transfer learning to benefit low-resource languages. Available resources such as unannotated corpora, dictionaries, and limited labeled data can be utilized for this. Lately, completely unsupervised methods have also been developed for this.
Transfer learning methods that work by transferring learned representations and models can be leveraged for cross-lingual and cross-domain transfer. This will be illustrated with techniques in cross-lingual parsing.
Joint multilingual and multitask learning has been successfully used to build models that work well for both high- and low-resource languages. Zero-shot and one-shot learning can leverage this, as will be illustrated by multilingual machine translation methods.
We will also discuss the strides made in machine translation that enable unsupervised machine translation, which works with monolingual corpora only.
Bio: Sudeshna Sarkar is a Professor of the Department of Computer Science and Engineering at IIT Kharagpur. She is currently the Head of the Department of Computer Science and Engineering and is also the Head of the newly formed Centre of Excellence in Artificial Intelligence.
She did her B.Tech. in 1989 from IIT Kharagpur, MS from University of California Berkeley in 1991, and PhD from IIT Kharagpur in 1995. She served in the faculty of IIT Guwahati and at IIT Kanpur before joining IIT Kharagpur in 1998.
Her research interests are in Artificial Intelligence, Machine Learning and Natural Language Processing. She has been working on Text Mining and Information Retrieval systems, and is very involved in developing natural language processing resources and tools for Indian languages. She is currently working on applications of Machine Learning to a variety of domains.
Bayesian Nonparametrics has been established as a key discipline in machine learning over the last decade. The stick-breaking process is a relatively unexplored tool in Bayesian Nonparametrics that helps in defining models with the flexibility to learn the complexity of a model in terms of its number of parameters. The talk will explore how we can utilize the stick-breaking process to address two very different prevalent issues in modern data science. First, sequential processing of data in minibatches is a standard technique for learning from large-scale datasets. However, the number of parameters is a critical issue: we can neither fix it a priori, nor can we allow it to grow linearly. Second, large collections of data points that form big clusters often dominate the learning of distributions for the entire dataset, and small, rare patterns get subdued. This increases the gap between the true distribution and the learnt distribution, leading to incomplete and sometimes incorrect conclusions from datasets. The stick-breaking process can provide efficient solutions to both problems, with certain provable guarantees and significant empirical results.
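For readers unfamiliar with the construction, the standard stick-breaking process can be sketched in a few lines: a unit-length stick is broken repeatedly, and each piece becomes a mixture weight. This is the textbook construction, truncated at a fixed number of pieces, and not the specific models of the talk; the concentration value and truncation level are illustrative assumptions.

```python
import numpy as np

def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking weights with concentration parameter alpha."""
    # Fraction broken off the remaining stick at each step: Beta(1, alpha).
    betas = rng.beta(1.0, alpha, size=num_sticks)
    # Length of stick remaining before each break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, num_sticks=50, rng=rng)
leftover = 1.0 - weights.sum()  # mass not yet assigned by the truncation
```

A smaller `alpha` concentrates most of the mass on the first few pieces, while a larger `alpha` spreads it over many, which is how the construction lets the effective number of components adapt to the data.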
Saturday, 29 December 2018
Deep neural networks (DNNs) have been successful in many application domains. However, DNNs are vulnerable to (adversarial) input perturbations that cause dramatic model output errors. Such problematic perturbations arise naturally or can be intentionally crafted ("adversarial examples"). This severely hampers the reliability of DNN models when deployed "in the wild". In this talk, I will present two fast and effective methods to improve the robustness of neural networks.
Firstly, I will present a general stability training method to stabilize deep networks against small input distortions that result from various types of common image processing, such as compression, rescaling, and cropping. We validate our method by stabilizing the state-of-the-art Inception architecture against these types of distortions. In addition, we demonstrate that our stabilized model gives robust state-of-the-art performance on large-scale near-duplicate detection, similar-image ranking, and classification on noisy datasets.
Secondly, I will demonstrate NeuralFingerprinting (NFP): a simple, fast and yet effective method to detect adversarial examples. NFP verifies whether model behavior is consistent with a set of "fingerprints", inspired by the use of biometric and cryptographic signatures. NFP detects the strongest known adversarial attacks with 95-100% AUC-ROC scores on the MNIST, CIFAR-10 and MiniImagenet (20 classes) datasets. This holds even in the most conservative security setting in which the attacker has full knowledge of the defender's strategy. In particular, the detection accuracy of NeuralFingerprinting generalizes well to unseen test-data and is robust over a wide range of hyperparameters.
With the advent of deep learning, there are many gaps between theory and practice. Understanding the optimization landscape of non-convex deep-learning loss functions is challenging. We present a few success stories that combine theoretical analyses and empirical results. We analyze signSGD: a gradient compression algorithm that only transmits the sign of the stochastic gradients during distributed training. We show that signSGD has nearly no loss in accuracy while yielding significant speedups. In another work, we analyze generalization bounds for domain adaptation under shifts in label distribution. We derive a generalization bound for a regularized importance weighting algorithm. Experiments show that regularization significantly improves accuracy, especially in low (target) sample and large-shift regimes. Finally, I will present a work that applies deep learning to a control-theoretic problem and guarantees stable landing of a drone. This approach couples a nominal dynamics model with a neural network that learns the unknown ground effect model. We show that spectral normalization of the neural network guarantees stability and improves performance in practice.
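The idea behind signSGD described above can be sketched on a toy problem: each worker sends only the sign of its stochastic gradient, and the server combines them. The majority-vote aggregation, the quadratic objective, the noise level, the worker count and the learning rate are all assumptions made for illustration; the abstract itself states only that signs are transmitted.

```python
import numpy as np

def sign_sgd_step(params, worker_grads, lr):
    """One update: workers send sign(grad); the server takes a majority vote."""
    signs = [np.sign(g) for g in worker_grads]   # one sign per coordinate per worker
    vote = np.sign(np.sum(signs, axis=0))        # element-wise majority vote
    return params - lr * vote

# Toy problem (hypothetical): minimise 0.5 * ||x - target||^2 with noisy gradients.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 3.0])
x = np.zeros(3)
for step in range(500):
    grads = [(x - target) + 0.1 * rng.standard_normal(3) for _ in range(5)]
    x = sign_sgd_step(x, grads, lr=0.01)
final_error = np.linalg.norm(x - target)
```

Because every coordinate moves by exactly `lr` per step regardless of gradient magnitude, the iterate ends up oscillating in a small neighbourhood of the optimum; in practice the step size is usually decayed over time.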
Bio: Anima Anandkumar is a Bren professor at Caltech CMS department and a director of machine learning research at NVIDIA. Her research spans both theoretical and practical aspects of large-scale machine learning. In particular, she has spearheaded research in tensor-algebraic methods, non-convex optimization, probabilistic models and deep learning.
Anima is the recipient of several awards and honors such as the Bren named chair professorship at Caltech, the Alfred P. Sloan Fellowship, Young Investigator awards from the Air Force and Army research offices, faculty fellowships from Microsoft, Google and Adobe, and several best paper awards. She was recently nominated to the World Economic Forum's Expert Network consisting of leading experts from academia, business, government, and the media. She has been featured in documentaries by PBS, KPCC, Wired magazine, and in articles by MIT Technology Review, Forbes, Yourstory, O’Reilly Media, and so on. Anima received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, a visiting researcher at Microsoft Research New England in 2012 and 2014, an assistant professor at U.C. Irvine between 2010 and 2016, an associate professor at U.C. Irvine between 2016 and 2017 and a principal scientist at Amazon Web Services between 2016 and 2018.
Deep neural networks have become highly effective tools for compression and image recovery tasks. This success can be attributed in part to their ability to represent and generate natural images well. In contrast to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters and need to be trained on large datasets. We will discuss an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with fewer weight parameters than the output dimensionality. This under-parameterization enables the deep decoder to compress images into a concise set of network weights. Further, under-parameterization provides a barrier to overfitting, allowing the deep decoder to have state-of-the-art performance for denoising. The deep decoder's simplicity makes the network amenable to theoretical analysis, and it sheds light on the aspects of neural networks that enable them to form effective signal representations.
This talk presents the research activities and experiences at a lab in the Indian Statistical Institute, Kolkata, where researchers have been using deep learning systems for a couple of years to solve NLP and Vision problems. Recurrent neural nets with attention mechanisms, Capsule Nets and different versions of CNNs are used for textbook question answering (TQA), visual question answering (VQA), and segmentation of histopathological images for medical image analysis. Successes from using these deep structures will be highlighted, along with some limitations of such algorithms in achieving certain goals. The lack of reasoning ability of apparently successful architectures and their inability to deal with ordinal classification problems will be highlighted in the context of future research efforts. The talk ends with a discussion of adversarial attacks, showing some of our recent results that further illustrate the lack of understanding ability of the otherwise successful CNNs.
Most of the work in Entity Recognition (ER) is segregated across several niche areas, for example, recognition of product names in reviews on e-commerce websites or recognition of organization names in news articles. This has led to the development of several independent ER systems, each being a rigid expert in its area. This talk focuses on building ER systems that generalize better on the ER task as a whole, rather than on a specific dataset. Such systems can simultaneously recognize thousands of fine entity types across various textual domains. In the talk, we will start by briefly summarizing the existing single-domain ER works, highlighting the major challenges in scaling up ER. Then we'll present recent works that tackle those challenges and conclude with several open problems along with future research directions.
Agent-based modeling is a computational modeling methodology that has been quite successful in modeling complex multi-agent systems where analytical modeling approaches prove to be too restrictive. In recent times, practitioners of agent-based modeling have incorporated machine learning and deep learning techniques for building models that learn from real data. However, one aspect where purely data-driven modeling approaches fall short is the modeling of causality and the answering of counterfactual queries. This issue also comes in tandem with a similar concern, the explainability of the learnt models. In this talk, we shall discuss Bayesian Decision Flow Diagrams (BDFD) as an agent-based modeling technique that can combine both domain expert knowledge and learning from data. Incorporating domain expert knowledge allows causal assumptions to be embedded in the BDFD model, which in turn allows it to answer causal effect size queries and counterfactual queries. Further, the use of machine learning and deep neural networks allows the model to be data-driven, so that individual decision parameters can be learned from real data.