3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
Although accurate and fast point cloud classification is a fundamental task in 3D applications, it is difficult to achieve this purpose due to the irregularity and disorder of point clouds that make it challenging to achieve effective and efficient global discriminative feature learning. Lately, 3D Transformers have been adopted to improve point cloud processing. Nevertheless, massive Transformer layers tend to incur huge computational and memory costs. This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification, named 3D Convolution-Transformer Network (3DCTN), to combine the strong and efficient local feature learning ability of convolution with the remarkable global context modeling capability of Transformer. Our method has two main modules operating on the downsampling point sets, and each module consists of a multi-scale local feature aggregating (LFA) block and a global feature learning (GFL) block, which are implemented by using Graph Convolution and Transformer respectively. We also conduct a detailed investigation on a series of Transformer variants to explore better performance for our network. Various experiments on ModelNet40 demonstrate that our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
READ FULL TEXT