Omar Emara
-
Working Project Title:
Multimodal Learning for Egocentric Computer Vision
-
Academic Background:
MSc Computational Finance, King's College London (2020-2021)
General Profile:
I am an AI enthusiast with two MSc degrees and three years of work experience in machine learning. I have worked on several research and applied machine learning projects, which have given me extensive hands-on experience with every stage of the machine learning development lifecycle. I decided to pursue a PhD in AI to follow my passion for research, driven by a natural curiosity that leads me to keep asking why algorithms work the way they do, why certain model architectures come with particular limitations and suit particular applications, and how the state of the art in AI could be improved. Furthermore, I continuously read published research papers to stay up to date with the field, and finding that reading papers was the most enjoyable part of developing my knowledge further motivated me to pursue a PhD in AI.
Research Project Summary:
Enabling AI systems to attain human-level intelligence is a highly desirable goal pursued by researchers in both academia and industry. Learning from a first-person perspective has been shown to aid learning in humans; it therefore holds the potential to make AI systems smarter. Moreover, humans use their different sensory inputs as cues on a daily basis for purposes such as navigating their environment and learning new skills. As a result, a plethora of multimodal egocentric datasets have been published in recent years, such as EPIC-KITCHENS, Ego4D and Ego-Exo4D. These datasets provide videos recorded from a first-person perspective, accompanied by modalities such as text (in the form of narrations and expert commentary) and both mono and stereo audio.
This project will focus on training AI systems on multimodal computer vision datasets to address open challenges in related domains, such as sound source localisation within the audio-visual correspondence setting. Furthermore, the project will explore the potential of additional modalities, such as text, to help solve these problems, and will conduct ablation studies to determine how much each modality contributes to solving the challenges of interest (a minimal sketch of such an ablation is given below).
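To make the modality-ablation idea concrete, the following is a minimal, self-contained PyTorch sketch that evaluates a toy late-fusion classifier over video, audio and text features while zeroing out one modality at a time. All model, feature and dataset details are hypothetical placeholders for illustration, not the project's actual methods or data.

import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    # Toy late-fusion classifier over pre-extracted modality embeddings.
    # Dimensions and architecture are illustrative placeholders only.
    def __init__(self, dims, n_classes=10):
        super().__init__()
        self.proj = nn.ModuleDict({m: nn.Linear(d, 64) for m, d in dims.items()})
        self.head = nn.Linear(64 * len(dims), n_classes)

    def forward(self, feats, drop=()):
        parts = []
        for m in self.proj:            # fixed modality order
            h = self.proj[m](feats[m])
            if m in drop:              # ablate a modality by zeroing its embedding
                h = torch.zeros_like(h)
            parts.append(h)
        return self.head(torch.cat(parts, dim=-1))

# Fake batch of pre-extracted features and labels (placeholder sizes).
dims = {"video": 512, "audio": 128, "text": 256}
feats = {m: torch.randn(32, d) for m, d in dims.items()}
labels = torch.randint(0, 10, (32,))

model = LateFusionModel(dims).eval()
with torch.no_grad():
    for drop in [(), ("audio",), ("text",), ("audio", "text")]:
        preds = model(feats, drop=drop).argmax(dim=-1)
        acc = (preds == labels).float().mean().item()
        print(f"dropped={list(drop) or 'none'}: accuracy={acc:.3f}")

In a real study the same comparison would be run with a trained model on held-out data; the drop in performance when a modality is removed indicates how much that modality contributes.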
Supervisors:
- Professor Dima Damen, School of Computer Science
- Dr Michael Wray, School of Computer Science
Website: