Building Vision and Language Models with Implicit Supervision and Increased Efficiency

Description
An important objective of AI is to understand real-world observations and communicate interactively with people. Interpreting and reacting to what is perceived requires a system that works across both the Vision (V) and Language (L) modalities. Although there has been massive effort on various VL tasks, e.g., Image/Video Captioning, Visual Question Answering, and Textual Grounding, very little of it focuses on building VL models with increased efficiency under real-world scenarios. The main focus of this dissertation is to comprehensively investigate the largely uncharted area of efficient VL learning, aiming to build lightweight, data-efficient, and real-world applicable VL models. The proposed studies take three primary aspects of efficient VL into account. 1) Data Efficiency: collecting task-specific annotations is prohibitively expensive, so manual labeling is not always attainable. Techniques are developed to assist VL learning from implicit supervision, i.e., in a weakly-supervised fashion. 2) Continuing from that, efficient representation learning is further explored with increased scalability, leveraging a large image-text corpus without task-specific annotations. In particular, knowledge distillation is studied for generic representation learning and is shown to bring a substantial performance gain over the regular representation learning schema. 3) Architectural Efficiency: deploying VL models on edge devices is notoriously challenging due to their cumbersome architectures. To further extend these advancements to the real world, a novel efficient VL architecture is designed to tackle the inference bottleneck and the inconvenient two-stage training. Several critical aspects that prominently influence the performance of compact VL models are discussed extensively.
Date Created
2022

Simultaneous Two-Color Lasing in a Single CdSSe Heterostructure Nanosheet

Description

The ability of a single monolithic semiconductor structure to emit or lase over a broad spectral range is of great importance for many applications, such as solid-state lighting and multi-spectrum detection. However, the spectral range of a laser or light-emitting diode made of a given semiconductor is typically limited by its emission or gain bandwidth. Because of lattice mismatch, it is typically difficult to grow thin-film or bulk materials with very different bandgaps in a monolithic fashion. Nanomaterials such as nanowires, nanobelts, and nanosheets provide a unique opportunity to overcome this limitation. Here we report experimental results demonstrating simultaneous lasing in two visible colors, at 526 and 623 nm, from a single CdSSe heterostructure nanosheet at room temperature. The 97 nm wavelength separation of the two colors is significantly larger than the gain bandwidth of a typical single II-VI semiconductor material. Such lasing and light emission over a wide spectral range from a single monolithic structure could be important for the applications mentioned above.

Date Created
2013-10-28