The invention relates to the technical field of
artificial intelligence, and discloses a PDF document table extraction method, device, and equipment, and a computer-readable storage medium. The method comprises the steps of obtaining a to-be-identified PDF document, and
processing the to-be-identified PDF document; preprocessing the processed PDF document, inputting the preprocessed PDF document into a
convolutional neural network, outputting a feature map, inputting the feature map into a region proposal network (RPN), and determining a table region; carrying out preprocessing and
feature extraction on the table region based on optical character recognition (OCR)
technology, obtaining a feature picture, carrying out character detection on the feature picture, determining a text area, carrying out
character recognition on the text area, and determining text information, wherein the text information comprises text position information and text content information; and determining structure information of the table according to the text position information, dividing each
cell of the table based on the structure information, and filling each corresponding
cell of the table with the text corresponding to the text content information. According to the method and the device, the accuracy of PDF document table extraction is improved.
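
The final step, deriving the table structure from text positions and filling cells with the recognized content, can be illustrated with a minimal sketch. The abstract does not state how the structure information is computed, so the approach below (grouping the centers of text bounding boxes into rows and columns by a gap tolerance) is an assumption made for illustration only. The names TextBox, cluster_centers, nearest_index, and build_table, and the tolerance values, are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """One recognized text area: position (bounding box) plus content."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def cluster_centers(values: List[float], tol: float) -> List[float]:
    """Group 1-D coordinates; a new group starts when the gap to the
    previous sorted value exceeds tol. Returns the mean of each group."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]


def nearest_index(centers: List[float], v: float) -> int:
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))


def build_table(boxes: List[TextBox],
                row_tol: float = 5.0,
                col_tol: float = 20.0) -> List[List[str]]:
    """Assumed heuristic: rows and columns are inferred from box centers,
    then each cell is filled with the text of the boxes that fall in it."""
    if not boxes:
        return []
    row_centers = cluster_centers([(b.y0 + b.y1) / 2 for b in boxes], row_tol)
    col_centers = cluster_centers([(b.x0 + b.x1) / 2 for b in boxes], col_tol)
    grid = [["" for _ in col_centers] for _ in row_centers]
    # Visit boxes in reading order so multi-box cells concatenate sensibly.
    for b in sorted(boxes, key=lambda b: (b.y0, b.x0)):
        r = nearest_index(row_centers, (b.y0 + b.y1) / 2)
        c = nearest_index(col_centers, (b.x0 + b.x1) / 2)
        grid[r][c] = (grid[r][c] + " " + b.text).strip()
    return grid


if __name__ == "__main__":
    # Hypothetical OCR output for a 2x2 table.
    sample = [
        TextBox(10, 10, 60, 20, "Name"),
        TextBox(110, 10, 150, 20, "Qty"),
        TextBox(10, 30, 70, 40, "Widget"),
        TextBox(112, 31, 130, 41, "3"),
    ]
    for row in build_table(sample):
        print(row)
```

A center-based grouping of this kind only handles simple grids; merged cells, or tables whose structure is defined by ruling lines, would need additional structure information that this sketch does not model.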