The entire network structure of Crossmodal Transformer

04/29/2021
by   Meng Li, et al.

Because the mapping between a definite intra-interventional 2D X-ray and an undefined pre-interventional 3D computed tomography (CT) volume is uncertain, auxiliary positioning devices or body markers, such as medical implants, are commonly used to determine this relationship. However, such approaches cannot be widely used in clinical practice because of the complex realities. To determine the mapping and obtain an initial pose estimate of the human body without auxiliary equipment or markers, a cross-modal matching transformer network is proposed that matches 2D X-ray and 3D CT images directly. The proposed approach first uses deep learning to extract skeletal features from the 2D X-ray and 3D CT images. The features are then converted into 1D X-ray and CT representation vectors, which are combined using a multi-modal transformer. As a result, the well-trained network can directly predict the spatial correspondence between an arbitrary 2D X-ray and a 3D CT. The experimental results show that, when our approach is combined with the conventional approach, the achieved accuracy and speed can meet basic clinical intervention needs, and the approach provides a new direction for intra-interventional registration.
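
The fusion step described above, in which 1D X-ray and CT representation vectors are combined by a transformer, can be illustrated with a minimal sketch. The abstract does not specify the network, so everything here is an assumption for illustration: a single attention head, no learned projections, mean pooling, and hypothetical names (`cross_attention`, `fuse`, the toy token values). It shows only the generic mechanism by which tokens from one modality attend to tokens from the other, not the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned weights).

    Each query vector is replaced by a weighted average of the value
    vectors, weighted by its similarity to the corresponding keys.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(xray_tokens, ct_tokens):
    """Let X-ray tokens attend to CT tokens, then mean-pool to one vector."""
    attended = cross_attention(xray_tokens, ct_tokens, ct_tokens)
    n = len(attended)
    return [sum(row[j] for row in attended) / n
            for j in range(len(attended[0]))]


# Hypothetical toy feature tokens (real ones would come from learned encoders).
xray_tokens = [[1.0, 0.0], [0.0, 1.0]]
ct_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse(xray_tokens, ct_tokens)
```

In a trained model, the fused vector would typically feed a regression head that outputs pose parameters; that head, like the learned query, key, and value projections, is omitted here.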
