A system for the visual recognition of user interface objects recognizes and localizes objects on a computer screen, such as input fields, buttons, icons, check boxes, text, and other basic elements. The system captures the screen as an image, analyzes the image, and creates a layout of new virtual objects that represents the screen. The screen is captured as a bitmap at regular time intervals, much as a movie camera captures frames. From the bitmap, the system generates a list of the lines found on the screen, each line having properties such as length, color, starting point, and angle. From these lines, the system identifies the rectangles on the screen. The system also searches the bitmap for each text element on the screen and converts each text element to Unicode text. From the bitmap, the lines, the rectangles, and the text, the system creates virtual objects in one-to-one correspondence with the objects found on the screen.
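The line-finding and rectangle-finding steps can be illustrated with a minimal sketch. It assumes the captured bitmap is a 2D list of RGB tuples indexed as `bitmap[y][x]`, that the background color is known, and that only horizontal and vertical single-color runs count as lines (so `angle` is limited to 0 or 90 degrees). The names `find_lines`, `find_rectangles`, `Line`, and `Rect` are hypothetical, and the sketch omits anti-aliasing, color tolerance, text recognition, and virtual-object creation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[int, int, int]
Bitmap = List[List[Color]]  # assumption: bitmap[y][x] holds an RGB tuple


@dataclass(frozen=True)
class Line:
    x: int          # starting point (column)
    y: int          # starting point (row)
    length: int     # number of pixels in the run
    color: Color
    angle: int      # assumption: 0 = horizontal, 90 = vertical only


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def find_lines(bitmap: Bitmap, background: Color, min_len: int = 3) -> List[Line]:
    """Collect horizontal and vertical runs of one non-background color."""
    h, w = len(bitmap), len(bitmap[0])
    lines: List[Line] = []
    for y in range(h):  # horizontal runs, row by row
        x = 0
        while x < w:
            start, color = x, bitmap[y][x]
            while x < w and bitmap[y][x] == color:
                x += 1
            if color != background and x - start >= min_len:
                lines.append(Line(start, y, x - start, color, 0))
    for x in range(w):  # vertical runs, column by column
        y = 0
        while y < h:
            start, color = y, bitmap[y][x]
            while y < h and bitmap[y][x] == color:
                y += 1
            if color != background and y - start >= min_len:
                lines.append(Line(x, start, y - start, color, 90))
    return lines


def find_rectangles(lines: List[Line]) -> List[Rect]:
    """Pair matching top/bottom edges and confirm both side edges exist."""
    horizontal = [l for l in lines if l.angle == 0]
    vertical = {(l.x, l.y, l.length, l.color) for l in lines if l.angle == 90}
    rects: List[Rect] = []
    for top in horizontal:
        for bottom in horizontal:
            if (bottom.y <= top.y or bottom.x != top.x
                    or bottom.length != top.length or bottom.color != top.color):
                continue
            height = bottom.y - top.y + 1  # side edges include both corners
            left = (top.x, top.y, height, top.color)
            right = (top.x + top.length - 1, top.y, height, top.color)
            if left in vertical and right in vertical:
                rects.append(Rect(top.x, top.y, top.length, height))
    return rects


if __name__ == "__main__":
    WHITE, BLACK = (255, 255, 255), (0, 0, 0)
    bmp = [[WHITE] * 8 for _ in range(6)]
    for x in range(1, 6):      # top and bottom edges of an outlined box
        bmp[1][x] = BLACK
        bmp[4][x] = BLACK
    for y in range(1, 5):      # left and right edges
        bmp[y][1] = BLACK
        bmp[y][5] = BLACK
    print(find_rectangles(find_lines(bmp, background=WHITE)))
    # prints [Rect(x=1, y=1, width=5, height=4)]
```

Requiring exact matches between edges keeps the sketch short; a practical recognizer would likely allow small gaps and color differences so that rounded or shaded widgets are still detected.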