The invention discloses a multi-modal three-dimensional point cloud segmentation system and method. The invention achieves effective fusion of the multi-modal data and introduces a prior mask, which improves the robustness of the scene segmentation result and yields high segmentation accuracy. Good prediction results are obtained for different scenes, such as toilets, meeting rooms and offices, showing that the model generalizes well. For different backbone networks used to extract point cloud features, the feature and decision fusion module can be applied to improve accuracy. If computing resources allow, more points and a larger area can be used; for example, the number of points and the size of the scene area can be increased by the same multiple, which enlarges the receptive field of the whole model and improves its ability to perceive the whole scene.
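The idea of increasing the point count and the scene area by the same multiple can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names `scale_block_config` and `sample_block`, the square block shape, and the random sampling strategy are all assumptions made for illustration only.

```python
import math
import random


def scale_block_config(num_points, block_area, factor):
    """Scale the point budget and the block area by the same factor.

    Because both grow by the same multiple, the number of sampled
    points per unit area stays the same while the block covers more
    of the scene. (Hypothetical helper, not from the source.)
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return int(round(num_points * factor)), block_area * factor


def sample_block(points, center_xy, block_area, num_points, rng):
    """Sample a fixed number of points from a square block in the x-y plane.

    points: list of (x, y, z) tuples.
    The block is assumed to be a square of the given area centred at
    center_xy. (Hypothetical helper, not from the source.)
    """
    half = math.sqrt(block_area) / 2.0  # half of the square's side length
    cx, cy = center_xy
    inside = [
        p for p in points
        if abs(p[0] - cx) <= half and abs(p[1] - cy) <= half
    ]
    if not inside:
        raise ValueError("no points fall inside the block")
    if len(inside) >= num_points:
        # Enough points: sample without replacement.
        return rng.sample(inside, num_points)
    # Too few points: sample with replacement to reach the fixed size.
    return [rng.choice(inside) for _ in range(num_points)]
```

For instance, starting from an illustrative budget of 4096 points over a block of area 1.0, `scale_block_config(4096, 1.0, 4)` gives 16384 points over an area of 4.0, so each block passed to the network spans four times the area at the same sampling density.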