Vinay Bettadapura
I am a Software Engineer at Google, working with the Machine Perception group (under Research and Machine Intelligence) on activity and event understanding from videos, images and sensor data.

Ph.D., Computer Science
Advisor: Prof. Irfan Essa
Computational Perception Lab (CPL)
College of Computing (CoC), Georgia Tech

Research Interests
My research interests are in the areas of Computer Vision, Machine Learning and Ubiquitous Computing. In particular, I am interested in the role of "context" in scene understanding.


Research Projects
Leveraging Contextual Cues for Generating Basketball Highlights
The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cues derived from the environment that the game is being played in. The contextual cues provide information about the excitement levels in the game, which can be used to rank and select plays to automatically produce high-quality basketball highlights. We introduce a new dataset of 25 NCAA games along with their play-by-play stats and the ground-truth excitement data for each basket. We explore the informativeness of five different cues derived from the video and from the environment through user studies. Our experiments show that for our study participants, the highlights produced by our system are comparable to the ones produced by ESPN for the same games.
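As a rough, illustrative sketch of how contextual cues could drive highlight selection (the field names, weights and scoring function below are illustrative simplifications, not the method from the paper), each basket can be scored from play-by-play context and the top-ranked baskets kept:

def excitement_score(basket, w_close=1.0, w_late=1.0, w_points=0.5):
    """Toy excitement score from play-by-play context: closer games, baskets
    late in the game, and higher-value baskets rank higher."""
    closeness = 1.0 / (1.0 + abs(basket["score_diff"]))       # tight game
    lateness = basket["elapsed_secs"] / basket["game_secs"]   # late in the game
    return w_close * closeness + w_late * lateness + w_points * basket["points"]

def top_highlights(baskets, k=5):
    """Rank baskets by excitement and keep the top k for the highlight reel."""
    return sorted(baskets, key=excitement_score, reverse=True)[:k]

# Toy play-by-play entries (the schema is invented for illustration).
baskets = [
    {"points": 3, "score_diff": 1, "elapsed_secs": 2350, "game_secs": 2400},
    {"points": 2, "score_diff": 15, "elapsed_secs": 600, "game_secs": 2400},
]
print(top_highlights(baskets, k=1))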

Here is the ACM MM 2016 Project Webpage (PDF and Video Demo)

Accepted for oral presentation

Automated Video-Based Assessment of Surgical Skills for Training and Evaluation in Medical Schools
Routine evaluation of basic surgical skills in medical schools requires considerable time and effort from supervising faculty. For each surgical trainee, a supervisor has to observe the trainee in person. Alternatively, supervisors may use training videos, which reduces some of the logistical overhead. All of these approaches, however, are still very time consuming and involve human bias. In this paper, we present an automated system for surgical skills assessment by analyzing video data of surgical activities. Method: We compare different techniques for video-based surgical skill evaluation. We use techniques that capture the motion information at a coarser granularity using symbols or words, extract motion dynamics using textural patterns in a frame kernel matrix, and analyze fine-grained motion information using frequency analysis. We were able to classify surgeons into different skill levels with high accuracy. Our results indicate that fine-grained analysis of motion dynamics via frequency analysis is most effective in capturing the skill-relevant information in surgical videos. Conclusion: Our evaluations show that frequency features perform better than motion texture features, which in turn perform better than symbol/word-based features. Put succinctly, skill classification accuracy is positively correlated with motion granularity, as demonstrated by our results on two challenging video datasets.

Here is the IJCARS 2016 journal paper [IJCARS 16]

Discovering Picturesque Highlights from Egocentric Vacation Videos
We present an approach for identifying picturesque highlights from large amounts of egocentric video data. Given a set of egocentric videos captured over the course of a vacation, our method analyzes the videos and looks for images that have good picturesque and artistic properties. We introduce novel techniques to automatically determine aesthetic features such as composition, symmetry and color vibrancy in egocentric videos and rank the video frames based on their photographic qualities to generate highlights. Our approach also uses contextual information such as GPS, when available, to assess the relative importance of each geographic location where the vacation videos were shot. Furthermore, we specifically leverage the properties of egocentric videos to improve our highlight detection. We demonstrate results on a new egocentric vacation dataset which includes 26.5 hours of videos taken over a 14-day vacation that spans many famous tourist destinations, and also provide results from a user study to assess our results.
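To give a flavor of how one such aesthetic cue can be computed, the sketch below scores frames with the Hasler-Suesstrunk colorfulness measure as a stand-in for the color-vibrancy cue and keeps the most vivid frames; the full system combines several cues (composition, symmetry, GPS-based location importance), so this is only an illustrative fragment:

import numpy as np

def colorfulness(frame_rgb):
    """Hasler-Suesstrunk colorfulness measure: higher values mean more vivid colors."""
    r, g, b = (frame_rgb[..., i].astype(float) for i in range(3))
    rg, yb = r - g, 0.5 * (r + g) - b
    return np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())

def rank_frames(frames, top_k=5):
    """Rank candidate frames by the aesthetic score and keep the best ones."""
    return sorted(frames, key=colorfulness, reverse=True)[:top_k]

# Toy usage on random H x W x 3 uint8 "frames".
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (120, 160, 3), dtype=np.uint8) for _ in range(20)]
print(len(rank_frames(frames, top_k=3)))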

Here is the WACV 2016 paper [WACV 16]

Automated Assessment of Surgical Skills Using Frequency Analysis
We present an automated framework for visual assessment of the expertise level of surgeons using the OSATS (Objective Structured Assessment of Technical Skills) criteria. Video analysis techniques for extracting motion quality via frequency coefficients are introduced. The framework is tested on videos of medical students with different expertise levels performing basic surgical tasks in a surgical training lab setting. We demonstrate that transforming the sequential time data into frequency components effectively extracts the useful information differentiating between different skill levels of the surgeons. The results show significant performance improvements using DFT and DCT coefficients over known state-of-the-art techniques.
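As a minimal sketch of the frequency-analysis idea, assume the surgical motion in a video has already been summarized as a one-dimensional time series (for example, per-frame mean optical-flow magnitude); the normalization and the number of coefficients kept below are illustrative choices, and the resulting feature vector would feed any standard classifier:

import numpy as np
from scipy.fft import dct

def frequency_features(motion_series, num_coeffs=32):
    """Summarize a 1-D motion time series by its low-order DFT magnitudes
    and DCT coefficients, which capture how smooth or jerky the motion is."""
    x = np.asarray(motion_series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-8)          # remove offset and scale
    dft_mag = np.abs(np.fft.rfft(x))[:num_coeffs]  # low-frequency DFT magnitudes
    dct_coeffs = np.abs(dct(x, norm="ortho"))[:num_coeffs]
    return np.concatenate([dft_mag, dct_coeffs])

# Smooth (expert-like) vs. jerky (novice-like) synthetic motion traces.
rng = np.random.default_rng(0)
smooth = np.sin(np.linspace(0, 10 * np.pi, 600)) + 0.1 * rng.standard_normal(600)
jerky = rng.standard_normal(600)
print(frequency_features(smooth).shape, frequency_features(jerky).shape)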

Here is the MICCAI 2015 paper [MICCAI 15]

Predicting Daily Activities From Egocentric Images Using Deep Learning
We present a method to analyze images taken from a passive egocentric wearable camera along with the contextual information, such as time and day of week, to learn and predict everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6-month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person's activity across the 19 activity classes. We also demonstrate some promising results from two additional users by fine-tuning the classifier with one day of training data.
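A toy sketch of the late-fusion idea, assuming per-image CNN class probabilities have already been computed; the one-hot time features and the logistic-regression fuser below are illustrative stand-ins rather than the exact ensemble used in the paper:

import numpy as np
from sklearn.linear_model import LogisticRegression

NUM_CLASSES = 19

def context_features(hour, weekday):
    """One-hot encode hour-of-day and day-of-week as contextual features."""
    h = np.zeros(24); h[hour] = 1.0
    d = np.zeros(7);  d[weekday] = 1.0
    return np.concatenate([h, d])

def late_fuse(cnn_probs, hours, weekdays):
    """Concatenate CNN class probabilities with contextual features."""
    ctx = np.stack([context_features(h, d) for h, d in zip(hours, weekdays)])
    return np.hstack([cnn_probs, ctx])

# Random stand-ins for CNN softmax outputs, timestamps and labels.
rng = np.random.default_rng(0)
n = 200
cnn_probs = rng.dirichlet(np.ones(NUM_CLASSES), size=n)
hours, weekdays = rng.integers(0, 24, n), rng.integers(0, 7, n)
labels = rng.integers(0, NUM_CLASSES, n)

fused = late_fuse(cnn_probs, hours, weekdays)
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print("fused feature dimension:", fused.shape[1])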

Here is the ISWC 2015 Project Webpage (with PDF)

Egocentric Field-of-View Localization Using First-Person Point-of-View Devices
We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization. We define egocentric FOV localization as capturing the visual information from a person’s field-of-view in a given environment and transferring this information onto a reference corpus of images and videos of the same space, hence determining what a person is attending to. Our method matches images and video taken from the first-person perspective with the reference corpus and refines the results using the first-person’s head orientation information obtained using the device sensors. We demonstrate single and multi-user egocentric FOV localization in different indoor and outdoor environments with applications in augmented reality, event understanding and studying social interactions.

Here is the WACV 2015 Project Webpage (PDF, Poster and Video Demo)

We won the best paper award at WACV 2015

Leveraging Context to Support Automated Food Recognition in Restaurants
The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, pictures that reflect what people eat. In this paper, we study how taking pictures of what we eat in restaurants can be used for the purpose of automating food journaling. We propose to leverage the context of where the picture was taken, along with additional information about the restaurant available online, coupled with state-of-the-art computer vision techniques, to recognize the food being consumed. To this end, we demonstrate image-based recognition of foods eaten in restaurants by training a classifier with images from restaurants' online menu databases. We evaluate the performance of our system in unconstrained, real-world settings with food images taken in 10 restaurants across 5 different types of food (American, Indian, Italian, Mexican and Thai).
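A minimal sketch of the contextual idea, under the assumption that the photo's location identifies the restaurant and its online menu restricts the label space before the classifier's scores are renormalized; the menus and scores below are made up for illustration:

# Hypothetical menus keyed by restaurant (in practice built from online menu databases).
MENUS = {
    "thai_place": ["pad_thai", "green_curry", "tom_yum"],
    "indian_place": ["tikka_masala", "saag_paneer", "naan"],
}

def recognize_with_context(classifier_scores, restaurant):
    """Keep only the dishes on the menu of the restaurant implied by the photo's
    location, renormalize the remaining scores, and return the best label."""
    menu = set(MENUS[restaurant])
    scores = {label: s for label, s in classifier_scores.items() if label in menu}
    total = sum(scores.values()) or 1.0
    scores = {label: s / total for label, s in scores.items()}
    return max(scores, key=scores.get), scores

# Toy classifier output over all known dishes; context resolves the ambiguity.
scores = {"pad_thai": 0.30, "tikka_masala": 0.35, "green_curry": 0.25, "naan": 0.10}
print(recognize_with_context(scores, "thai_place"))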

Here is the WACV 2015 Project Webpage (PDF and Poster).

Video Based Assessment of OSATS Using Sequential Motion Textures
We present a fully automated framework for video-based surgical skill assessment that incorporates the sequential and qualitative aspects of surgical motion in a data-driven manner. We replicate Objective Structured Assessment of Technical Skills (OSATS) assessments, which provide both an overall and an in-detail evaluation of the basic suturing skills required of surgeons. Video analysis techniques are introduced that incorporate sequential motion aspects into motion textures. We also demonstrate significant performance improvements over standard bag-of-words and motion analysis approaches. We evaluate our framework in a case study that involved medical students with varying levels of expertise performing basic surgical tasks in a surgical training lab setting.

Here is the M2CAI 2014 paper [M2CAI 14]

We received an honorable mention (2nd place) at M2CAI 2014

Activity Recognition From Videos Using Augmented Bag-of-Words
We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.
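One simple way to see how temporal information can augment a BoW histogram, assuming the activity has already been quantized into a sequence of discrete words, is to append n-gram (here, bigram) counts; this is only a small illustrative fragment of the data-driven augmentation described in the paper:

from collections import Counter

def augmented_bow(word_sequence, vocab_size, n=2):
    """Standard BoW histogram augmented with temporal n-gram counts, which
    retain local ordering information that an unordered histogram discards."""
    bow = Counter(word_sequence)
    ngrams = Counter(zip(*[word_sequence[i:] for i in range(n)]))
    unigram_feat = [bow[w] for w in range(vocab_size)]
    bigram_feat = [ngrams[(a, b)] for a in range(vocab_size) for b in range(vocab_size)]
    return unigram_feat + bigram_feat

# Two sequences with identical BoW histograms but different temporal structure
# now map to different feature vectors.
seq_a = [0, 1, 2, 0, 1, 2]
seq_b = [2, 1, 0, 2, 1, 0]
print(augmented_bow(seq_a, 3) == augmented_bow(seq_b, 3))  # False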

Here is the CVPR 2013 Project Webpage (PDF and Code).

Detecting Insider Threats in a Real Corporate Database of Computer Usage Activity
This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations’ information systems. Our system combines structural and semantic information from a real corporate database of monitored activity on their users’ computers to detect independently developed red team inserts of malicious insider activities. We have developed and applied multiple algorithms for anomaly detection based on suspected scenarios of malicious insider behavior, indicators of unusual activities, high-dimensional statistical patterns, temporal sequences, and normal graph evolution. Algorithms and representations for dynamic graph processing provide the ability to scale as needed for enterprise-level deployments on real-time data streams. We have also developed a visual language for specifying combinations of features, baselines, peer groups, time periods, and algorithms to detect anomalies suggestive of instances of insider threat behavior. We defined over 100 data features in seven categories based on approximately 5.5 million actions per day from approximately 5,500 users. We have achieved area under the ROC curve values of up to 0.979 and lift values of 65 on the top 50 user-days identified on two months of real data.

Here is the KDD 2013 paper [KDD 13]

Recognizing Water-Based Activities in the Home Through Infrastructure-Mediated Sensing
Activity recognition in the home has long been recognized as the foundation for many desirable applications in fields such as home automation, sustainability, and healthcare. However, building a practical home activity monitoring system remains a challenge. Striking a balance between cost, privacy, ease of installation and scalability continues to be an elusive goal. In this paper, we explore infrastructure-mediated sensing combined with a vector space model learning approach as the basis of an activity recognition system for the home. We examine the performance of our single-sensor water-based system in recognizing eleven high-level activities in the kitchen and bathroom, such as cooking and shaving. Results from two studies show that our system can estimate activities with an overall accuracy of 82.69% for one individual and 70.11% for a group of 23 participants. As far as we know, our work is the first to employ infrastructure-mediated sensing for inferring high-level human activities in a home setting.
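A toy sketch of the vector-space-model idea, assuming water-fixture events have already been discretized into "words" per activity window; the event vocabulary is invented, and TF-IDF plus nearest-centroid matching is only an illustrative stand-in for the learning approach used in the paper:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestCentroid

# Invented sensor-event "documents": each string lists discretized water events
# (fixture + duration bucket) observed during one activity window.
train_docs = [
    "kitchen_long kitchen_long kitchen_short",
    "bath_short bath_short bath_short",
    "kitchen_long kitchen_short kitchen_long",
    "bath_short bath_long",
]
train_labels = ["cooking", "shaving", "cooking", "shaving"]

vectorizer = TfidfVectorizer()                      # vector space model over event words
X = vectorizer.fit_transform(train_docs).toarray()
classifier = NearestCentroid().fit(X, train_labels)

test = vectorizer.transform(["kitchen_short kitchen_long"]).toarray()
print(classifier.predict(test))                     # -> ['cooking']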

Here is the UbiComp 2012 paper [UbiComp 12]

Activity Recognition from Wide Area Motion Imagery
This project aims at recognizing anomalous activities from aerial videos. My work is a part of the Persistent Stare Exploitation and Analysis System (PerSEAS) research program which aims to develop software systems that can automatically and interactively discover actionable intelligence from airborne, wide area motion imagery (WAMI) in complex urban environments.

A glimpse of this project can be seen here.

Leafsnap: An Electronic Field Guide

This project aims to simplify the process of plant species identification using visual recognition software on mobile devices such as the iPhone. This work is part of an ongoing collaboration with researchers at Columbia University, University of Maryland and the Smithsonian Institution. My major contribution to this project was the server's database integration and management. I also worked on stress-testing the backend server to improve its performance and scalability.

The free iPhone app can be downloaded from the app-store. Here is the project webpage and here is a video explaining the app's usage. Finally, Leafsnap in the news!


Visual Attributes for Face Verification

The project involves face verification in uncontrolled settings with non-cooperative subjects. The method is based on attribute (binary) classifiers that are trained to recognize the degrees of various visual attributes like gender, race, age, etc. Here is the project page.

I was a part of this research at Columbia University from December 2009 to May 2010. I mainly worked on Boosting to improve the classifiers' performance.
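A small sketch of the boosting step in spirit, assuming attribute-classifier outputs (scores for traits such as gender, age or race) have already been computed for each face pair; boosting over absolute attribute differences is an illustrative stand-in, not the project's actual pipeline:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Placeholder features for face pairs: each row holds |attribute(face A) - attribute(face B)|
# for a dozen hypothetical visual attributes.
n_pairs, n_attributes = 500, 12
attr_diffs = rng.random((n_pairs, n_attributes))
same_person = (attr_diffs.mean(axis=1) < 0.45).astype(int)  # synthetic labels

# Boosted ensemble of weak learners over the attribute-difference features.
verifier = AdaBoostClassifier(n_estimators=100).fit(attr_diffs, same_person)
print("training accuracy:", verifier.score(attr_diffs, same_person))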

Face Recognition Using Gabor Wavelets

The choice of object representation is crucial for the effective performance of cognitive tasks such as object recognition and fixation. Face recognition is a particularly challenging instance of advanced object recognition, influenced by factors such as shape, reflectance, pose, occlusion and illumination. In this project, we demonstrate the use of Gabor wavelets, which model aspects of human perception of objects and faces, for efficient face representation and recognition. Such a system can aid in searching and classifying face databases and, at a higher level, help identify possible threats to security. The purpose of this study is to demonstrate that it is technically feasible to scan pictures of human faces and compare them against ID photos hosted in a centralized database using Gabor wavelets.
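A minimal sketch of extracting Gabor features from a grayscale face image with a bank of kernels at several wavelengths and orientations; the kernel parameters and the simple mean/std pooling below are illustrative choices:

import cv2
import numpy as np

def gabor_features(face_img, wavelengths=(4, 8, 16), orientations=8):
    """Convolve a grayscale face image with a bank of Gabor kernels and pool
    the response magnitudes into a compact feature vector."""
    feats = []
    for wavelength in wavelengths:
        for k in range(orientations):
            theta = k * np.pi / orientations
            kernel = cv2.getGaborKernel((31, 31), sigma=4.0, theta=theta,
                                        lambd=wavelength, gamma=0.5)
            response = cv2.filter2D(face_img.astype(np.float32), cv2.CV_32F, kernel)
            feats.extend([response.mean(), response.std()])  # simple pooling
    return np.array(feats)

# Example on a synthetic 64 x 64 "face"; feature vectors of gallery ID photos and
# a probe image can then be compared with, e.g., cosine similarity.
img = np.random.default_rng(0).random((64, 64)).astype(np.float32)
print(gabor_features(img).shape)  # 3 wavelengths * 8 orientations * 2 = (48,)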

This was my undergraduate thesis supervised by Dr. C. N. S. Ganesh Murthy, Principal Scientist at Mercedes-Benz Research and Development, Bangalore, India. Here is the project report [FACE REC]


Ph.D. Dissertation
Leveraging Contextual Cues for Dynamic Scene Understanding
Environments with people are complex, with many activities and events that need to be represented and explained. The goal of scene understanding is to either determine what objects and people are doing in such complex and dynamic environments, or to know the overall happenings, such as the highlights of the scene. The context within which the activities and events unfold provides key insights that cannot be derived by studying the activities and events alone. In this thesis, we show that this rich contextual information can be successfully leveraged, along with the video data, to support dynamic scene understanding.

We categorize and study four different types of contextual cues: (1) spatiotemporal context, (2) egocentric context, (3) geographic context, and (4) environmental context, and show that they improve dynamic scene understanding tasks across several different application domains.

We start by presenting data-driven techniques to enrich spatio-temporal context by augmenting Bag-of-Words models with temporal, local and global causality information and show that this improves activity recognition, anomaly detection and scene assessment from videos. Next, we leverage the egocentric context derived from sensor data captured from first-person point-of-view devices to perform field-of-view localization in order to understand the user’s focus of attention. We demonstrate single and multi-user field-of-view localization in both indoor and outdoor environments with applications in augmented reality, event understanding and studying social interactions. Next, we look at how geographic context can be leveraged to make challenging “in-the-wild” object recognition tasks more tractable using the problem of food recognition in restaurants as a case-study. Finally, we study the environmental context obtained from dynamic scenes such as sporting events, which take place in responsive environments such as stadiums and gymnasiums, and show that it can be successfully used to address the challenging task of automatically generating basketball highlights. We perform comprehensive user-studies on 25 full-length NCAA games and demonstrate the effectiveness of environmental context in producing highlights that are comparable to the highlights produced by ESPN.

Here is a PDF of my dissertation [DISSERTATION]


Publications
Link to my Google Scholar page.
  1. V. Bettadapura, C. Pantofaru, I. Essa, "Leveraging Contextual Cues for Generating Basketball Highlights", ACM Multimedia Conference (ACM-MM 2016), Amsterdam, Netherlands, October 2016. [Oral] [Acceptance Rate: 20% (52/650)] [ACM-MM 16] [Project Webpage] [arXiv]
  2. A. Zia, Y. Sharma, V. Bettadapura, E. Sarin, T. Ploetz, M. Clements, I. Essa, "Automated Video-Based Assessment of Surgical Skills for Training and Evaluation in Medical Schools", International Journal of Computer Assisted Radiology and Surgery (IJCARS), 11(9), pp. 1623-1636, 2016. [IJCARS 16]
  3. V. Bettadapura, D. Castro, I. Essa, "Discovering Picturesque Highlights From Egocentric Vacation Videos", IEEE Winter Conference on Applications of Computer Vision (WACV 2016), Lake Placid, USA, March 2016. [Acceptance Rate: 34% (71/207)] [WACV 16] [Project Webpage] [arXiv]
  4. A. Zia, Y. Sharma, V. Bettadapura, E. Sarin, I. Essa, "Automated Assessment of Surgical Skills Using Frequency Analysis", 18th International Conference on Medical Image Computing and Computer Assisted Interventions (MICCAI 2015), Munich, Germany, October 2015. [Acceptance Rate < 30.0%] [MICCAI 15]
  5. D. Castro, S. Hickson, V. Bettadapura, E. Thomaz, G. Abowd, H. Christensen, I. Essa, "Predicting Daily Activities From Egocentric Images Using Deep Learning", 19th International Symposium on Wearable Computing (ISWC 2015), Osaka, Japan, September 2015. [Acceptance Rate (for full papers): 10.7% (13/121)] [ISWC 15] [Project Webpage] [arXiv]
  6. V. Bettadapura, I. Essa, C. Pantofaru, "Egocentric Field-of-View Localization Using First-Person Point-of-View Devices", IEEE Winter Conference on Applications of Computer Vision (WACV 2015), Hawaii, USA, January 2015. [Acceptance Rate: 36.7% (156/425)] [WACV 15] [Project Webpage] [arXiv] (Won the best paper award)
  7. V. Bettadapura, E. Thomaz, A. Parnami, G. Abowd, I. Essa, "Leveraging Context to Support Automated Food Recognition in Restaurants", IEEE Winter Conference on Applications of Computer Vision (WACV 2015), Hawaii, USA, January 2015. [Acceptance Rate: 36.7% (156/425)] [WACV 15] [Project Webpage] [arXiv]
  8. Y. Sharma, V. Bettadapura, et al., "Video Based Assessment of OSATS Using Sequential Motion Textures", 5th MICCAI Workshop on Modeling and Monitoring of Computer Assisted Interventions (M2CAI 2014), Boston, USA, September 2014. [M2CAI 14] (Received an honorable mention - 2nd place)
  9. T. E. Senator, et al., "Detecting Insider Threats in a Real Corporate Database of Computer Usage Activity", 19th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2013), Chicago, USA, August 2013. [Acceptance Rate: 17.4% (126/726)] [KDD 13]
  10. V. Bettadapura, G. Schindler, T. Ploetz, I. Essa, "Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition", 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), Portland, USA, June 2013. [Acceptance Rate: 25.2% (472/1870)] [CVPR 13] [Project Webpage] [arXiv]
  11. E. Thomaz, V. Bettadapura, G. Reyes, M. Sandesh, G. Schindler, T. Ploetz, G. Abowd, I. Essa, "Recognizing Water-Based Activities in the Home Through Infrastructure-Mediated Sensing", 14th ACM Conference on Ubiquitous Computing (UbiComp 2012), pp. 85-94, Pittsburgh, USA, September 2012. [Acceptance Rate: 19% (58/301)] [UbiComp 12]
  12. V. Bettadapura, "Face Expression Recognition and Analysis: The State of the Art", Tech Report, arXiv:1203.6722, April 2012. [FACE EXP REC] [arXiv]
  13. V. Bettadapura, D. R. Sai Sharan, "Pattern Recognition with Localized Gabor Wavelet Grids", IEEE Conference on Computational Intelligence and Multimedia Applications, vol. 2, pp. 517-521, Sivakasi, India, December 2007. [ICCIMA 07]
  14. V. Bettadapura, B. S. Shreyas, C. N. S Ganesh Murthy, "A Back Propagation Based Face Recognition Model Using 2D Symmetric Gabor Features", IEEE Conference on Signal Processing, Communications and Networking, pp. 433-437, Chennai, India, February 2007. [ICSCN 07]
  15. V. Bettadapura, B. S. Shreyas, "Face Recognition Using Gabor Wavelets", 40th IEEE Asilomar Conference on Signals, Systems and Computers, pp. 593-597, Pacific Grove (Monterey Bay), California, USA, October 2006. [ASILOMAR 06]


Work Experience
  1. Google: Software Engineer (January 2016 - Present): Working on event and video understanding, and other related Computer Vision and Machine Learning technologies.
  2. Google: Software Engineering Intern (August 2013 - December 2015): Worked on event and video understanding using multi-modal data (videos, images and sensor data).
  3. Google Geo: Software Engineering Intern (May 2013 - August 2013): Worked with the Google Earth and Maps team on improving the quality of the satellite imagery.
  4. Google Research: Software Engineering Intern (May 2012 - August 2012): Worked with the Video Content Analysis team in developing algorithms and building systems for object detection and categorization in YouTube videos.
  5. Subex: Software Engineer (June 2006 - December 2008): Designed and developed telecommunication fraud-protection and anomaly-detection systems, including mathematical modeling of user behavior, data mining to detect anomalies in the signals, and the design and development of the back-end server, database and web interfaces.


Courses
Spring 2015
  Introduction to Enterprise Computing (CS 6365) - Prof. Calton Pu
Fall 2011
  Knowledge-Based AI (CS 7637) - Prof. Ashok Goel
  Numerical Linear Algebra (MATH 6643) - Prof. Silas Alben
  Special Problems (CS 8903) - Prof. Irfan Essa
Spring 2011
  Machine Learning (CS 7641) - Prof. Charles Isbell
  Special Problems (CS 8903) - Prof. Irfan Essa
Fall 2010
  Computer Vision (CS 7495) - Prof. Jim Rehg
  Grad Studies (CS 7001) - Prof. Gregory Abowd and Prof. Nick Feamster
  Special Problems (CS 8903) - Prof. Irfan Essa
Spring 2010
  Operating Systems (COMS W4118) - Prof. Junfeng Yang
  Projects in Computer Science (COMS E6901) - Prof. Peter Belhumeur
  Research Assistantship (COMS E9910) - Prof. Peter Belhumeur
Fall 2009
  Analysis of Algorithms (COMS W4231) - Prof. Clifford Stein
  Biometrics (COMS W4737) - Prof. Peter Belhumeur
  Projects in Computer Science (COMS E6901) - Prof. Peter Belhumeur
Spring 2009
  Programming Languages and Translators (COMS W4115) - Prof. Alfred Aho
  Computational Aspects of Robotics (COMS W4733) - Prof. Peter Allen
  Visual Interfaces to Computers (COMS W4735) - Prof. John Kender
  Machine Learning (COMS 4771) - Prof. Tony Jebara


(Selected) Course Projects
Automatic Geo-Tagging of Photos Using Google Street View Images

The goal of this project was to develop a system that automatically geo-tags an image by comparing it with a large collection of geo-tagged images (Google Street View images, in our case). SIFT descriptors are computed for the images and the matching is done using a KD-Tree. This project is an implementation based on the work of Schindler et al. (CVPR 2007) and Zamir et al. (ECCV 2010). This project was done as a part of the 'Computer Vision' course at Georgia Tech (instructor: Prof. Jim M. Rehg).
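A minimal sketch of the matching step with OpenCV, assuming SIFT descriptors and a FLANN KD-tree index; the file paths are placeholders, and associating the best-matching Street View image with its geo-tag is left out:

import cv2

def best_streetview_match(query_path, streetview_paths, ratio=0.75):
    """Count SIFT matches (KD-tree search plus Lowe's ratio test) between a query
    photo and each geo-tagged Street View image; the location of the best match
    serves as the query's estimated geo-tag."""
    sift = cv2.SIFT_create()
    flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})  # 1 = KD-tree
    _, q_desc = sift.detectAndCompute(cv2.imread(query_path, cv2.IMREAD_GRAYSCALE), None)

    best_path, best_score = None, -1
    for path in streetview_paths:
        _, desc = sift.detectAndCompute(cv2.imread(path, cv2.IMREAD_GRAYSCALE), None)
        matches = flann.knnMatch(q_desc, desc, k=2)
        good = sum(1 for pair in matches
                   if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)
        if good > best_score:
            best_path, best_score = path, good
    return best_path, best_score

# Usage (paths are placeholders):
# print(best_streetview_match("query.jpg", ["sv_001.jpg", "sv_002.jpg"]))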

Here is the project presentation [GEO-TAG]

Solving Raven's Matrices Using Visual and Propositional Reasoning

The goal of this project is to learn about the close relationship between learning and problem solving. In this project, we explore this relationship by considering several problems from the Raven's test of intelligence (Raven's matrices). We develop techniques to solve the Raven's matrices using both propositional and visual reasoning. This project was done as a part of the 'Knowledge Based AI' course at Georgia Tech (instructor: Prof. Ashok K. Goel).

Here are the project reports: Solving the Raven's matrices using Propositional Reasoning [GEO-TAG], using Visual Reasoning [GEO-TAG], and using a combination of Visual and Propositional Reasoning [GEO-TAG]

The SN*W Programming Language

The SN*W Programming Language is a special purpose declarative language designed for Genetic Programming by allowing programmers to easily harness the power of Genetic Algorithms (GA). A SN*W program is a simple description of an organism structure along with simple methods for construction, mutation, selection and recombination. The SN*W compiler translates these events into a full environmental simulation. The language was developed by five of us as a part of the Programming Languages and Translators course at Columbia under the guidance of Prof. Alfred V. Aho.

Here is the complete SN*W Report (includes the Reference Manual and Tutorial) [CV]


Awards

  1. Best Paper Award at the IEEE Winter Conference on Applications of Computer Vision (WACV 2015).
  2. Honorable Mention (2nd place) at the MICCAI Workshop on Modeling and Monitoring of Computer Assisted Interventions (M2CAI 2014).

Contact
Email:
vinay [at] gatech.edu
Also on:
Instagram, LinkedIn, Google+, Facebook


