Patent research is usually difficult and time-consuming. Comparing individual patents is often laborious, whether long texts have to be read or images have to be compared meticulously. In addition, patent searchers and researchers repeatedly face language and terminology barriers when checking the patentability of their own research results. Existing patent retrieval methods are primarily based on text searches in patent specifications and largely ignore illustrations and the references between text and image. Often, however, the innovation and exploitation potential of a patent can only be identified with the help of an illustration, and a set of patents with similar or related innovations can be analyzed quickly by comparing their illustrations side by side. Access via illustrations offers an alternative or supplement to text search that works independently of language and terminology and makes it easier to identify relevant results in patent specifications.
This is exactly where the ExpResViP project (Exploitation of Research Results through Visual Patent Retrieval) comes in. Together with Fraunhofer IAIS, the Institute for Information Science and Language Technology at the University of Hildesheim (IWIST) and the Leibniz Association’s office, TIB is developing a novel visual search for patent retrieval. It uses machine learning methods to automatically recognize image similarities and text-image relationships in patent specifications. The goal is to develop new methods for searching and analyzing visual elements in patents and to integrate them into a patent retrieval tool.
The development is closely aligned with the needs of patent searchers. A needs assessment was conducted through interviews with experts from patent offices, industry and research institutions. Based on the requirements gathered, a priority list was created that maps the desired functions of the retrieval tool. The requirements are being implemented in several iterations in close consultation with the patent experts. At the top of the list are, among other things, the ability to search not only for images but also with one's own images, the ability to view multiple images side by side, even from different patents, and the highlighting and linking of reference signs in the images.
State of development
The Solr-based search engine with text indexing is already available. This makes it possible to search for both text and images using text queries. Image indexing is currently being optimized. For this purpose, the images are processed with optical character recognition (OCR) and automatically classified into different image classes. The text recognition also makes it possible to map reference signs in the images to their descriptions in the text. This mapping is in turn used in the GUI to give patent searchers a quick overview of images and descriptions.
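To illustrate the idea of mapping reference signs to their descriptions, here is a minimal sketch in Python. It is not the project's implementation, which is not described here. It assumes that OCR has already produced a list of text tokens for an image, and that a reference sign in the description follows a short noun phrase introduced by an article (for example "a drive shaft 12"). The names `extract_sign_terms` and `link_reference_signs` and the regular expressions are hypothetical and chosen only for illustration.

```python
import re

# Assumption: a reference sign is a number with an optional letter suffix, e.g. "12" or "14a".
SIGN = re.compile(r"\d+[a-z]?", re.IGNORECASE)

# Assumption: in the description, a sign follows a noun phrase of one to three
# words that is introduced by an article, e.g. "the gear wheel (14a)".
TERM_WITH_SIGN = re.compile(
    r"\b(?:the|a|an|said)\s+((?:[a-z-]+\s+){0,2}[a-z-]+)\s+\(?(\d+[a-z]?)\)?",
    re.IGNORECASE,
)


def extract_sign_terms(description_text):
    """Return a dict that maps each reference sign to the first term it follows."""
    terms = {}
    for match in TERM_WITH_SIGN.finditer(description_text):
        term, sign = match.group(1), match.group(2)
        # Keep the first mention only; later mentions are often abbreviated.
        terms.setdefault(sign, term.lower())
    return terms


def link_reference_signs(ocr_tokens, description_text):
    """Link OCR tokens that look like reference signs to terms from the description.

    Returns (linked, unresolved): a dict sign -> term, and a list of signs
    that were recognized in the image but not found in the description.
    """
    terms = extract_sign_terms(description_text)
    linked, unresolved = {}, []
    for token in ocr_tokens:
        sign = token.strip("().,;:")
        if not SIGN.fullmatch(sign):
            continue  # not a reference sign, e.g. a figure label such as "Fig."
        if sign in terms:
            linked[sign] = terms[sign]
        else:
            unresolved.append(sign)
    return linked, unresolved
```

A heuristic like this would also pick up phrases such as "the figure 2", so a real system would need additional filtering. The resulting mapping is the kind of data a GUI could use to show the description next to each sign in an image.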
Based on these developments, a first prototype was built in recent months and presented to the cooperating patent experts for evaluation. Further development will be based on their feedback. For the next version of the prototype, we will work on improving the communication between the individual components of the search in order to speed up indexing and searching, on further developing image indexing, and on improving the display of search results.