Robert Pless
Professor of Engineering and Applied Sciences
George Washington University
Area of Expertise: Trustworthy Computer Vision
Robert Pless is a professor in the School of Engineering and Applied Sciences at George Washington University. He conducts research in computer vision with applications to environmental science, medical imaging, robotics, and virtual reality. He applies a data-driven approach to understanding motion and change in video, with a current focus on long-term time-lapse imagery.
-
Xuan, H., Stylianou, A., Liu, X. and Pless, R., 2020. Hard Negative Examples are Hard, but Useful. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020.
Abstract: Triplet loss is an extremely common approach to distance metric learning. Representations of images from the same class are optimized to be mapped closer together in an embedding space than representations of images from different classes. Much work on triplet losses focuses on selecting the most useful triplets of images to consider, with strategies that select dissimilar examples from the same class or similar examples from different classes. The consensus of previous research is that optimizing with the hardest negative examples leads to bad training behavior. That's a problem: these hardest negatives are literally the cases where the distance metric fails to capture semantic similarity. In this paper, we characterize the space of triplets and derive why hard negatives make triplet loss training fail. We offer a simple fix to the loss function and show that, with this fix, optimizing with hard negative examples becomes feasible. This leads to more generalizable features and image retrieval results that outperform the state of the art on datasets with high intra-class variance. Code is available here.
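To make the triplet-selection ideas above concrete, here is a minimal PyTorch sketch of a triplet loss with in-batch "hardest" positive and negative mining. This is an illustration of the general technique the abstract discusses, not the paper's proposed fix; the function name, margin value, and mining scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Illustrative triplet loss with in-batch hard example mining.

    embeddings: (N, D) tensor of image embeddings.
    labels:     (N,) tensor of class ids.
    """
    # Pairwise Euclidean distances between all embeddings in the batch.
    dists = torch.cdist(embeddings, embeddings)           # (N, N)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # (N, N) same-class mask

    # Hardest positive: the most distant example of the same class.
    pos_dists = dists.clone()
    pos_dists[~same] = 0.0
    hardest_pos = pos_dists.max(dim=1).values

    # Hardest negative: the closest example of a different class.
    neg_dists = dists.clone()
    neg_dists[same] = float("inf")
    hardest_neg = neg_dists.min(dim=1).values

    # Standard margin-based triplet loss over the mined triplets.
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```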
-
Zhang, Z., Pope, M., Shakoor, N., Pless, R., Mockler, T.C. and Stylianou, A. Comparing Deep Learning Approaches for Understanding Genotype × Phenotype Interactions in Biomass Sorghum. Frontiers in AI.
Abstract: We explore the use of deep convolutional neural networks (CNNs) trained on overhead imagery of biomass sorghum to ascertain the relationship between single nucleotide polymorphisms (SNPs), or groups of related SNPs, and the phenotypes they control. We consider both CNNs trained explicitly on the classification task of predicting whether an image shows a plant with a reference or alternate version of various SNPs, and CNNs trained to create data-driven features by learning an embedding in which images from the same plot are more similar than images from different plots, with the learned features then used for genetic marker classification. We characterize how effective both approaches are at predicting the presence or absence of genetic markers, and visualize which parts of the images are most important for those predictions. We find that the data-driven approaches give somewhat higher prediction performance but produce visualizations that are harder to interpret. We offer suggestions for future machine learning research and discuss the possibility of using this approach to uncover unknown genotype × phenotype relationships.
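As a rough sketch of the first (explicit classification) approach, the hypothetical model below attaches a two-way classification head to a standard CNN backbone so that an overhead image is labeled as showing the reference or alternate version of a single marker. The class name, backbone choice, and head are assumptions for illustration only.

```python
import torch.nn as nn
from torchvision import models

class SNPClassifier(nn.Module):
    """Hypothetical sketch: classify an overhead sorghum image as carrying
    the reference or alternate version of one genetic marker."""

    def __init__(self, num_classes=2):
        super().__init__()
        backbone = models.resnet50()                                  # any ImageNet-style CNN backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes) # replace classifier head
        self.backbone = backbone

    def forward(self, images):           # images: (N, 3, H, W)
        return self.backbone(images)     # logits over {reference, alternate}
```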
-
Stylianou, A., Xuan, H., Shende, M., Brandt, J., Souvenir, R. and Pless, R., 2019, July. Hotels-50k: A Global Hotel Recognition Dataset. In Proceedings of the AAAI Conference on Artificial Intelligence.
Abstract: Recognizing a hotel from an image of a hotel room is important for human trafficking investigations. Images directly link victims to places and can help verify where victims have been trafficked, and where their traffickers might move them or others in the future. Recognizing the hotel from images is challenging because of low image quality, uncommon camera perspectives, large occlusions (often the victim), and the similarity of objects (e.g., furniture, art, bedding) across different hotel rooms. To support efforts towards this hotel recognition task, we have curated a dataset of over 1 million annotated hotel room images from 50,000 hotels. These images include professionally captured photographs from travel websites and crowd-sourced images from a mobile application, which are more similar to the types of images analyzed in real-world investigations. We present a baseline approach based on a standard network architecture and a collection of data-augmentation approaches tuned to this problem domain.
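One ingredient mentioned above is data augmentation tuned to this domain, where real query images often contain large foreground occlusions. The sketch below shows a generic occlusion-style augmentation using a simple rectangular mask as a stand-in; it is not the paper's specific augmentation pipeline, and the function name and parameters are illustrative assumptions.

```python
import random
import torch

def random_occlusion(image, max_frac=0.5):
    """Illustrative augmentation: black out a random rectangle to mimic
    large foreground occlusions in real investigative images.

    image: (3, H, W) float tensor.
    """
    _, h, w = image.shape
    occ_h = random.randint(1, int(h * max_frac))
    occ_w = random.randint(1, int(w * max_frac))
    top = random.randint(0, h - occ_h)
    left = random.randint(0, w - occ_w)
    out = image.clone()
    out[:, top:top + occ_h, left:left + occ_w] = 0.0
    return out
```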
-
Upchurch, P., Gardner, J., Pleiss, G., Pless, R., Snavely, N., Bala, K. and Weinberger, K., 2017. Deep Feature Interpolation For Image Content Changes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Abstract: We propose Deep Feature Interpolation (DFI), a new data-driven baseline for automatic high-resolution image transformation. As the name suggests, DFI relies only on simple linear interpolation of deep convolutional features from pre-trained convnets. We show that despite its simplicity, DFI can perform high-level semantic transformations like "make older/younger", "make bespectacled", and "add smile" surprisingly well, sometimes even matching or outperforming the state of the art. This is particularly unexpected as DFI requires no specialized network architecture, and not even a deep network trained specifically for these tasks. DFI can therefore be used as a new baseline to evaluate more complex algorithms, and provides a practical answer to the question of which image transformation tasks are still challenging after the advent of deep learning.
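The core interpolation step described above can be sketched in a few lines: the attribute direction is the difference between mean deep features of images with and without the attribute, and the source image's features are shifted along that direction. This is a minimal sketch under that reading of the abstract; the neighbor selection, feature extraction details, and the reverse mapping back to pixels are omitted, and the function names are assumptions.

```python
import torch

def attribute_vector(feats_with, feats_without):
    """Illustrative core step: the attribute direction is the difference of
    mean deep features between images that have the attribute and those that don't.

    feats_with, feats_without: (N, D) tensors of convnet features.
    """
    w = feats_with.mean(dim=0) - feats_without.mean(dim=0)
    return w / w.norm()                      # unit-length attribute direction

def interpolate(source_feat, w, alpha=1.0):
    # Shift the source image's features along the attribute direction;
    # an image matching the shifted features is then reconstructed separately.
    return source_feat + alpha * w
```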
-
Black, S., Stylianou, A., Pless, R. and Souvenir, R., 2022. Visualizing Paired Image Similarity in Transformer Networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.
Abstract: Transformer architectures have shown promise for a wide range of computer vision tasks, including image embedding. As was the case with convolutional neural networks and other models, explainability of the predictions is a key concern, but visualization approaches tend to be architecture-specific. In this paper, we introduce a new method for producing interpretable visualizations that, given a pair of images encoded with a Transformer, show which regions contributed to their similarity. Additionally, for the task of image retrieval, we compare the performance of Transformer and ResNet models of similar capacity and show that while they have similar performance in aggregate, the retrieved results and the visual explanations for those results are quite different. Code is available here.
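As a generic illustration of pairwise similarity visualization (not the paper's specific method), the sketch below turns patch-level embeddings from two encoded images into per-region similarity scores that can be displayed as heatmaps over each image. The function name and the max-over-matches scoring are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def similarity_heatmaps(tokens_a, tokens_b):
    """Illustrative sketch: per-patch similarity scores for a pair of images
    from their patch-level embeddings (e.g., Transformer patch tokens).

    tokens_a: (Na, D) patch embeddings for image A.
    tokens_b: (Nb, D) patch embeddings for image B.
    """
    a = F.normalize(tokens_a, dim=1)
    b = F.normalize(tokens_b, dim=1)
    sim = a @ b.T                        # (Na, Nb) cosine similarity of patch pairs
    heat_a = sim.max(dim=1).values       # for each patch in A, its best match in B
    heat_b = sim.max(dim=0).values       # for each patch in B, its best match in A
    return heat_a, heat_b                # reshape to the patch grid for display
```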