Xinwang Liu, National University of Defense Technology
Liang Bai, Shanxi University
Aim and Scope
The past several years have witnessed an explosion of interest in, and remarkably fast development of, machine learning, a subfield of artificial intelligence. Foremost among machine learning approaches are Deep Neural Networks (DNNs), which can learn powerful feature representations with multiple levels of abstraction directly from data when large amounts of labeled data are available. Object classification, one of the core areas of computer vision, achieved a significant breakthrough with a deep convolutional neural network and the large-scale ImageNet dataset, which is arguably what reignited the field of artificial neural networks and triggered the recent revolution in Artificial Intelligence (AI). Nowadays, AI has spread to almost all fields of science and technology. Yet computer vision remains at the heart of these advances when it comes to visual data analysis, offering the biggest big data and enabling advanced AI solutions to be developed.
Undoubtedly, DNNs have shown remarkable success in many computer vision tasks, such as recognizing, localizing, and segmenting faces, persons, objects, scenes, actions, and gestures, and recognizing human expressions and emotions as well as object relations and interactions in images or videos. Despite a wide range of impressive results, current DNN-based methods typically depend on massive amounts of accurately annotated training data to achieve high performance, and they are brittle in that their performance can degrade severely with small changes in their operating environment. Collecting large-scale training datasets is generally time-consuming and costly, and in many applications even infeasible, because in certain fields (such as visual inspection or the medical domain) only very limited examples, or none at all, can be gathered. For some computer vision tasks, large amounts of unlabeled data may be relatively easy to collect, e.g., from the web or via synthesis. Even then, labeling and vetting massive amounts of real-world training data is difficult, expensive, and time-consuming, as it requires the painstaking efforts of experienced human annotators or experts, and in many cases it is prohibitively costly or impossible for reasons such as privacy, safety, or ethical concerns (e.g., endangered species, drug discovery, medical diagnostics, and industrial inspection).
DNNs lack the ability to learn from limited exemplars and to generalize quickly to new tasks. Real-world computer vision applications, however, often require models that are able to (a) learn with few annotated samples, and (b) continually adapt to new data without forgetting prior knowledge. By contrast, humans can learn from just one or a handful of examples (i.e., few-shot learning), can do very long-term learning, and can form abstract models of a situation and manipulate these models to achieve extreme generalization. As a result, one of the next big challenges in computer vision is to develop learning approaches that address these shortcomings of existing methods. Therefore, to address the current inefficiency of machine learning, there is a pressing need to research methods (1) to drastically reduce requirements for labeled training data, (2) to significantly reduce the amount of data necessary to adapt models to new environments, and (3) to even use as little labeled training data as people need.
This special track focuses on learning with fewer labels for computer vision tasks such as image classification, object detection, semantic segmentation, instance segmentation, and many others. Topics of interest include (but are not limited to) the following areas:
1) Self-supervised learning methods
2) New methods for few-/zero-shot learning
4) Life-long/continual/incremental learning methods
5) Novel domain adaptation methods
6) Semi-supervised learning methods
7) Weakly-supervised learning methods
This special track will be held at the 2022 Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2022). All papers should be prepared according to the PRCV 2022 policy and submitted electronically via the conference website (https://cmt3.research.microsoft.com/PRCV2022).
To submit your paper to this special track, please choose "Special Track on Learning with Fewer Labels in Computer Vision" to create your submission in CMT.