All our dreams can come true if we have the courage to pursue them

Walt Disney

About Me









Eeshan Gunesh Dhekane

Mila; Université de Montréal
Département d'Informatique
et de Recherche Opérationnelle



I am Eeshan, a graduate student pursuing an M.Sc. (Informatique) at Mila, Université de Montréal. I graduated with a B.Tech. in Electrical Engineering and a Second Major in Computer Science and Engineering from the Indian Institute of Technology Kanpur.

I am interested in Machine Learning and its applications for Learning Better Representations, especially for Computer Vision. I am also interested in Reinforcement Learning and Probabilistic Modeling.

Apart from this, I enjoy Mathematics, Astronomy, and Philosophy. Also, I am a self-taught Indian Bamboo Flute player.

The whole secret of a successful life is to find out what is one’s destiny to do, and then do it

Henry Ford

Publications



Hierarchical Importance Weighted Autoencoders   Paper

Chin-Wei Huang, Kris Sankaran, Eeshan Gunesh Dhekane, Alexandre Lacoste, Aaron Courville

International Conference on Machine Learning (ICML), 2019




Importance weighted variational inference (Burda et al., 2015) uses multiple iid samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases.
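As background for this abstract: the importance weighted bound of Burda et al. (2015) averages K iid importance weights inside a logarithm. A minimal numpy sketch on a toy conjugate-Gaussian model (the model, proposal, and function names are illustrative; this is the standard bound, not the paper's hierarchical proposal):

```python
import numpy as np

def log_gauss(x, mu, sigma):
    # Log-density of a univariate Gaussian N(mu, sigma^2).
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def iwae_bound(x, k, rng):
    """Importance weighted lower bound with k iid proposal samples.

    Toy model: prior z ~ N(0, 1), likelihood x | z ~ N(z, 1).
    Proposal q(z | x) = N(x/2, 1/2), which is the exact posterior here.
    """
    z = rng.normal(x / 2, np.sqrt(0.5), size=k)
    log_w = (log_gauss(z, 0.0, 1.0)                 # prior
             + log_gauss(x, z, 1.0)                 # likelihood
             - log_gauss(z, x / 2, np.sqrt(0.5)))   # proposal
    # log (1/k) sum_i exp(log_w_i), computed stably.
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))
```

Because the toy proposal equals the exact posterior, every importance weight equals the marginal likelihood, so the bound is tight for any k; with a mismatched proposal, increasing k tightens the bound monotonically.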





Transfer Learning by Modeling a Distribution over Policies   Paper

Disha Shrivastava*, Eeshan Gunesh Dhekane*, Riashat Islam

Workshop on Multi-Task and Lifelong Reinforcement Learning (MTLRL) at International Conference on Machine Learning (ICML), 2019



Exploration and adaptation to new tasks in a transfer learning setup is a central challenge in reinforcement learning. In this work, we build on the idea of modeling a distribution over policies in a Bayesian deep reinforcement learning setup to propose a transfer strategy. Recent works (Bachman et al., 2018; Garnelo et al., 2018) have shown that maximizing the entropy of a distribution over policies induces diversity in the learned policies; we thus postulate that our proposed approach leads to faster exploration, resulting in improved transfer learning. We support our hypothesis by demonstrating favorable experimental results on a variety of settings on fully observable GridWorld and partially observable MiniGrid (Chevalier-Boisvert et al., 2018) environments.





Learning Affective Correspondence between Music and Image   Paper

Gaurav Verma, Eeshan Gunesh Dhekane, Tanaya Guha

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019






We introduce the problem of learning affective correspondence between audio (music) and visual data (images). For this task, a music clip and an image are considered similar (having true correspondence) if they have similar emotion content. In order to estimate this crossmodal, emotion-centric similarity, we propose a deep neural network architecture that learns to project the data from the two modalities to a common representation space, and performs a binary classification task of predicting the affective correspondence (true or false). To facilitate the current study, we construct a large scale database containing more than 3,500 music clips and 85,000 images with three emotion classes (positive, neutral, negative). The proposed approach achieves 61.67% accuracy for the affective correspondence prediction task on this database, outperforming two relevant and competitive baselines. We also demonstrate that our network learns modality-specific representations of emotion (without explicitly being trained with emotion labels), which are useful for emotion recognition in individual modalities.
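The architecture described in the abstract (two modality-specific branches projected to a common space, followed by binary correspondence classification) can be caricatured in a few lines; the dimensions, fusion scheme, and activations below are illustrative, not the paper's exact network:

```python
import numpy as np

def two_tower_forward(audio_feat, image_feat, Wa, Wi, w, b):
    """Forward pass of a minimal two-branch correspondence model.

    Each modality is projected to a shared space, the embeddings are
    fused, and a logistic head scores true/false correspondence.
    All weights and dimensions are illustrative.
    """
    za = np.tanh(Wa @ audio_feat)        # audio branch -> shared space
    zi = np.tanh(Wi @ image_feat)        # image branch -> shared space
    fused = np.concatenate([za, zi])     # simple fusion by concatenation
    logit = w @ fused + b
    return 1.0 / (1.0 + np.exp(-logit))  # P(true correspondence)
```

In practice the branches would be deep CNNs trained end-to-end with a cross-entropy loss over true/false pairs; the sketch only shows the data flow.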





Convolutional Neural Network Based Sensors for Mobile Robot Relocalization   Paper

Harsh Sinha*, Jay Patrikar, Eeshan Gunesh Dhekane*, Gaurav Pandey, Mangal Kothari

International Conference on Methods and Models in Automation and Robotics (MMAR), 2018



Recently, many deep Convolutional Neural Network (CNN) based architectures have been used for predicting camera pose, though most of these are deep and require substantial computing capability for accurate prediction. For this reason, their incorporation in mobile robotics, where power and computational capabilities are limited, has been slow. With this in mind, we propose a real-time CNN based architecture that combines low-cost sensors of a mobile robot with information from images of a single monocular camera using an Extended Kalman Filter to perform accurate robot relocalization. The proposed method first trains a CNN that takes RGB images from a monocular camera as input and performs regression for robot pose. It then incorporates the relocalization output of the trained CNN in an Extended Kalman Filter (EKF) for robot localization. The proposed algorithm is demonstrated using mobile robots in GPS-denied indoor and outdoor environments.
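To illustrate the fusion step described above: a single EKF measurement update that folds a CNN pose estimate into the filter state might look as follows (a minimal sketch with an identity measurement model; the paper's actual state, motion model, and covariances are not reproduced here):

```python
import numpy as np

def ekf_update(x, P, z, R):
    """One EKF measurement update fusing a pose estimate z (e.g. from a
    CNN relocalizer) into state x with covariance P.

    The measurement model is the identity (the CNN directly observes the
    pose), so H = I; all matrices here are illustrative.
    """
    H = np.eye(len(x))
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x + K @ (z - H @ x)     # correct the state with the residual
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

With equal prior and measurement covariances, the update lands halfway between the predicted state and the CNN estimate, and the posterior covariance halves, which is the intuition behind fusing a noisy relocalizer with odometry.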





High Accuracy Optical Flow Based Future Image Frame Predictor Model   Paper

Nishchal K. Verma, Eeshan Gunesh Dhekane, G. S. S. Srinivas Rao, Aakansha Mishra

IEEE Applied Imagery Pattern Recognition Workshop (IEEE AIPR), 2015






In this paper, a High Accuracy Optical Flow (HAOF) based future image frame generator model is proposed. The aim of this work is to develop a framework capable of predicting the future image frames of any given sequence of images. The requirement is to predict a large number of image frames with better clarity and better accuracy. In the first step, the vertical and horizontal components of the flow velocities of the intensities at each pixel position are estimated using the High Accuracy Optical Flow (HAOF) algorithm. The estimated flow velocities in all the image frames at all pixel positions are then modeled using separate Artificial Neural Networks (ANN). The trained models are used to predict the flow velocities of intensities at all pixel positions in the future image frames. The intensities at all pixel positions are mapped to new positions using the velocities predicted by the model. Bilinear Interpolation is used to obtain predicted images from the new positions of the intensities. The quality of the predicted image frames is evaluated using the Canny Edge Detection based Image Comparison Metric (CIM) and the Mean Structural Similarity Index Measure (MSSIM). The predictor model is simulated on two image sequences: one of a fighter jet landing on a navy deck, and another of a train moving on a bridge. The proposed framework is found to give promising results with better clarity and better accuracy.
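The warping step in the abstract (mapping intensities by predicted flow, then interpolating) can be sketched as follows. Note the sketch uses backward warping for simplicity, whereas the paper forward-maps intensities to new positions; all names are illustrative:

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Predict the next frame by backward-warping `frame` with a dense
    flow field using bilinear interpolation (a simplified stand-in for
    the paper's forward mapping of intensities).

    frame: (H, W) array; flow: (H, W, 2) array of (dy, dx) per pixel.
    """
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # Each output pixel samples the source at (y - dy, x - dx).
    sy = np.clip(ys - flow[..., 0], 0, H - 1)
    sx = np.clip(xs - flow[..., 1], 0, W - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = sy - y0, sx - x0
    # Bilinear interpolation of the four neighbouring intensities.
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

A uniform flow of one pixel to the right simply shifts the image right, with border pixels replicated by the clipping.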



The first step is to establish that something is possible; then probability will occur.

Elon Musk

Research Experience



Attribute Based Video Captioning   Report   Demo

Course Project with Prof. Harish Karnick, G. Verma, C. Goswami, G. S. S. S. Rao, L. Taneja, T. C. Mandan




Generating captions and summaries of visual content in an automatic manner is a prominent research problem with immense academic and industrial applications. Humans possess the remarkable capacity to understand perceived visual content and describe it in natural language. Natural language description of visual data can improve its semantic understanding and enable learning from it to tackle numerous problems of practical significance; for example, efficient and accurate retrieval of images and videos, labeled dataset synthesis for specific multimedia problems, affective multimedia analysis, and obscenity detection. The sheer magnitude of online visual content implies that captioning and summarizing it must be done automatically, so the problem has immense practical importance in addition to its academic significance. The extensively studied task of Image Captioning tries to achieve this, but the related problem of Video Captioning and Summarization is relatively new and less studied, which motivated us to investigate it as the course project.





Zero Shot Learning and Image Synthesis in Generative Setting   Report

Course Project with Prof. Piyush Rai, Vinayak Tantia and Independent Extension






One of the major directions of research in Artificial Intelligence (AI) is to learn from available data so that tasks can be performed on totally new, unseen data. This generalizability and transfer of learned representations is one of the most attractive features of human intelligence, and thus it is natural to inquire about models that can explain this peculiar ability. One of the problems in AI that addresses this aspect of learning is Zero-Shot Learning (ZSL). There is an immense need for learning techniques that mirror the transfer of knowledge to new categories observed in human intelligence, which motivates us to investigate the problem of Zero-Shot Learning for the project.





Online Multiclass Classification under Partial Feedback   Report

Course Project with Prof. Purushottam Kar, Hardik Parwana, Abhinav Jain





Online learning enables a system to learn and adapt on the fly, without completely demarcated phases of training and prediction; in contrast, offline learning forces a system to undergo a separate training phase before it is used for predictions. Online multiclass classification, in which an input must be labeled as a member of one of several available classes, is frequently encountered in practical applications such as automatic email categorization into user-defined folders, and is of immense theoretical and practical importance. We aim at designing online learning algorithms for different types of partial feedback settings and proving guarantees regarding the efficiency of such algorithms.
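As an example of the partial feedback setting described above: the Banditron of Kakade et al. (2008) learns a linear multiclass classifier when only correct/incorrect feedback on the predicted label is observed. A minimal sketch (hyperparameters and class layout illustrative):

```python
import numpy as np

class Banditron:
    """Minimal Banditron-style learner: online multiclass prediction
    where only correct/incorrect feedback on the sampled label is seen."""

    def __init__(self, n_classes, n_features, gamma=0.2, seed=0):
        self.W = np.zeros((n_classes, n_features))
        self.gamma = gamma                     # exploration rate
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        y_hat = int(np.argmax(self.W @ x))     # greedy prediction
        probs = np.full(len(self.W), self.gamma / len(self.W))
        probs[y_hat] += 1 - self.gamma         # explore with prob. gamma
        y_tilde = int(self.rng.choice(len(self.W), p=probs))
        return y_hat, y_tilde, probs

    def update(self, x, y_hat, y_tilde, probs, correct):
        # Importance-weighted, unbiased estimate of the full-information
        # perceptron update: reward the sampled label if it was correct,
        # penalize the greedy one.
        U = np.zeros_like(self.W)
        if correct:
            U[y_tilde] += x / probs[y_tilde]
        U[y_hat] -= x
        self.W += U
```

Dividing by the exploration probability of the sampled label is what makes the update an unbiased estimate of the full-information update despite the bandit feedback.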





Cross-Modal Deep Learning for Affective Multimedia Retrieval   Report   Demo

Project with Prof. Tanaya Guha, Gaurav Verma and Extension





We target the problem of affective multimedia retrieval and address it using techniques of cross-modal deep learning. Deep learning techniques can be applied to the ever-growing online multimedia data in order to learn meaningful information and data representations. Specifically, we aim at learning representations for emotion-centric distance-metric learning and retrieval of images and audio clips by performing appropriate semantic tasks. First, we consider the task of Cross-Modal Correspondence Prediction in order to jointly learn the audio and image representations in an unsupervised setting. Then, we consider the task of Adversarial Cross-Modal Domain Adaptation in order to jointly learn the audio and image representations in a supervised setting.





Modeling Physics Underlying Visual Input   Report

Course Project with Prof. Vinay Namboodiri, G. S. S. Srinivas Rao and Extension


Understanding the motion of objects in order to predict and control their movements is one of the crucial problems in Artificial Intelligence (AI). It is evident that humans and a large number of animals possess the extraordinary ability to easily manipulate object motion using visual inputs. This seemingly simple, but stark, ability raises the natural question: how is visual input used to infer and understand the motion of surrounding objects? It is extremely interesting and intellectually challenging to study how representations of the physics underlying visual inputs can be modeled. Moreover, a model of the physical constraints on the visual inputs can be used to perform simulations, forecast object interactions, and generate video sequences of moving objects. In view of this, we aim to study the problem of modeling abstractions of the physics that underlies given visual inputs, and propose a contextual RNN-GAN based approach to learning these models.






Cross Modality Supervision Transfer based Depth Estimation   Report

Course Project with Prof. Gaurav Sharma, G. S. S. Srinivas Rao




We propose to develop a model for depth map estimation for an RGB sensor using supervision transfer across multiple modalities, namely RGB and Depth. The problem can be described as follows: we want to build a depth estimation model for a given RGB sensor. For this task, we consider another RGBD sensor and capture image frames that have some overlap in their fields of view. We aim to learn CNN based depth representations for the RGB sensor using information from the two sensors via supervision transfer across the different image modalities. We next aim to learn “invertible” CNNs to obtain a partial network that can generate depth maps.





Automatic Speech Recognition Systems for Indian Accent English   Poster   Demo

Internship at Xerox Research Center India, 2016 with Om Deshmukh, Harish Arsikere, Sonal Patil


We aim to develop an Automatic Speech Recognition system for obtaining the Transcripts of Education-Domain Indian-Accent English Speech. Several challenges associated with this task are Large Speaker Variability, Lack of Correctly Labelled Transcripts, Large Variations in Accents, a Large Number of Domain-Specific Out-of-Vocabulary Words, and Semi-Spontaneous Speech. These issues are tackled with Maximum A Posteriori Model Based Adaptation of GMM/HMM Parameters, Grapheme to Phoneme Training, Phonetic Confusion Matrix Based Pronunciation Models, and Language Model Interpolation. In effect, we achieve a 15% improvement in Word Error Rate on the NPTEL dataset under consideration.





Conditions for Signal Constellation Optimization   Report   Code

Project with Prof. Ajit K. Chaturvedi



(Representative Images. Credits: 4-Point Constellation, 8-Point Constellation)




In this report, we state an exact condition satisfied by Optimal Signal Constellations for Additive White Gaussian Noise channels. The condition is based on geometric modeling of the Signal Constellation Optimization Problem and on Euclidean geometric properties. We also aim to use the condition to obtain sub-optimal constellations.



Believe you can and you’re halfway there.

Theodore Roosevelt

Selected Projects



Unity-Based Basic First Person Shooter Game   Video Available Upon Request

Unity/C-Sharp Based Individual Project




A Very Basic Unity/C-Sharp Based First Person Shooter Game Skeleton. The Currently Built Model Consists of Unity-Assets (FPSController) for the First-Person View. Freely Available .obj Weapon Models are Incorporated. Weapon Firing, Muzzle-Flashes, Bullet Decals, and Ejected Cartridges are Implemented. Hit-Registration is Implemented. Weapon Firing, Player Movement, and Player Jump Audios are Incorporated. A Weapon Recoil Animation is Incorporated.





Google Cardboard Virtual and Augmented Reality Encyclopedia   Video

Microsoft Code.Fun.Do Project with G. S. S. Srinivas Rao, Neeraj Thakur, Gontu Harish




A one-day development project for the Microsoft Code.Fun.Do competition at the Indian Institute of Technology, Kanpur. In the project, I contributed the Astronomy-based content of a Virtual and Augmented Reality Encyclopedia covering the areas of Astronomy and Chemistry. The original data was taken from NASA's databases and re-interpreted for the VR component of the application.





Robots and Artificial Intelligence for Factories of Future   Demo

Hindustan Unilever Competition Project with Harsh Sinha, Animesh Shastry







Proposed AI and Robotics based solutions for challenges posed by HUL, with a focus on Tracking and Navigation in Factory Environments. I contributed TensorFlow-based models for Multiple Object Detection and its applications to Surveillance, Human Detection, and Security. Our group won the IIT Kanpur Institute-Level Round and secured All-India Rank 2, with Seed-Funding of INR 50,000.





Adversarial Attacks on Neural Networks & Robust Defense   Report

Course Project with Prof. Purushottam Kar, Afroz Alam, Nayan Deshmukh, Vinayak Tantia

The tremendous growth in contemporary machine learning has come from the advent of Deep Neural Networks and their successful applications. Their immense ability to learn feature representations from input data has enabled a wide range of applications across a variety of areas. However, the huge success of Neural Networks is qualified by the fact that they currently work in “black-box” fashion, and hence problems arise when the output of a Neural Architecture goes wrong. Very recently, some works pointed out that targeted noise of very small magnitude can be mixed with input images so that the classification of the net changes drastically. Moreover, the net predicts the wrong class with extremely high confidence. This shows that despite the astonishing power of Deep Architectures, they are highly vulnerable to targeted attacks. This observation raises a very crucial point: Deep Architectures are extremely good on average, but highly unreliable in individual predictions. This vulnerability of Deep Nets points in the direction of studying Adversarial Attacks against them and subsequently making them robust against such attacks.





Semantic Image Completion   Report   Demo

Survey Based Course Project with Prof. Tanaya Guha, G. S. S. Srinivas Rao, and Abhinav Jain





In this project, we explore the problem of semantic image completion. We propose to study two different approaches to address this problem. The first is the exemplar-based approach, for which we consider PatchMatch-based inpainting. The other is the data-driven approach, for which inpainting using a million images is considered; we generate a huge dataset of images to study this latter approach. We study the performances of these approaches to compare and contrast them and to conclude about their applicability to semantic image completion.





Improving Recommender Systems by Reducing Hubness   Report   Poster

Survey based Course Project with Prof. Piyush Rai, Vamsi Krishna


The problem of hubs occurs when dealing with high-dimensional data: hubs tend to appear in the nearest neighbors of most points, which reduces the performance of nearest neighbor methods. The nearest neighbor relations thus become asymmetric, and the corresponding anti-hubs appear in the nearest neighbors of almost no other points. In this project, we analyze the problem of hubs and reduce the hubness in the data by making the nearest neighbor relations close to symmetric.
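One simple way to push nearest-neighbor relations towards symmetry is to keep only the mutual k-nearest-neighbor edges; a sketch below (illustrative, not the exact method of the report):

```python
import numpy as np

def symmetrized_knn(D, k):
    """Given a pairwise distance matrix D, keep only *mutual*
    k-nearest-neighbour relations, yielding a symmetric neighbourhood
    graph in which hubs can no longer dominate everyone's neighbour
    lists (a simple illustrative scheme).
    """
    n = len(D)
    order = np.argsort(D, axis=1)
    # nn[i] holds the k nearest neighbours of i, excluding i itself.
    nn = [set(order[i][order[i] != i][:k]) for i in range(n)]
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in nn[i]:
            if i in nn[j]:            # keep the edge only if it is mutual
                A[i, j] = A[j, i] = True
    return A
```

One-sided edges pointing into a hub are dropped, so the resulting adjacency matrix is symmetric by construction.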





When Artificial Intelligence Meets Ethics...   Report   Demo

Philosophy Project with Prof. Vineet Sahu

Artificial Intelligence (AI), with its current wave of Machine Learning (ML), is unarguably one of the most rapidly expanding fields of human enquiry. Though initially confined to purely academic interests, the contemporary developments in this area have opened immense possibilities for its application to personal as well as social aspects of life. When a technology starts to affect the human condition, it gains an ethical dimension. The aim of this paper is to take an overall view of the various ethical aspects associated with AI, and to create a platform for a philosophical dialogue about this relatively new but highly relevant, rapidly growing, and extremely volatile technology of the human future.





Argument Against Sloppy Bartender Problem   Short Report

Philosophy Project with Prof. Philose Koshy



(Representative Image. Credits: Photo by Alejandro Sotillet on Unsplash)



The Principle of Indifference addresses the issue of assigning probability measures to events in Keynes's framework of partial logical entailment. The tricky part of this seemingly conspicuous argument lies in how the totality of possibilities is decomposed into atomic events. It is well known that the Principle of Indifference leads to several problems and limitations, and its meaning is not entirely clear either. One such problem is the so-called Sloppy Bartender Problem. We claim that this type of inconsistency is caused by considering an ill-defined experimental scenario and then using the ambiguity of the scenario to demonstrate the shortcomings of the Principle of Indifference. Thus, we claim that this counter-example does not refute the principle and that the approach is somewhat fallacious.