Manufactory Parts Finder: Focus on work, deep learning will search for you

7 min read · Aug 23, 2021

Meet Dave. Dave works as an engineer for a manufacturing company. His daily duties include introducing new technologies into production and fixing breakdowns. Dave works with a lot of small parts, and when a part wears out, he has to order a replacement from the catalog. Often the serial number on the part has worn off or, conversely, Dave only has the label from the part. In either case he has to search the catalog for the part.

One of my customers, a manufacturer, faced exactly this problem.

To avoid searching the catalog by hand, the customer wanted to be able to search for parts using images, whether the photo shows the part itself or its packaging.

Approach

While analyzing the data the customer shared, I found that it contained photos of the parts themselves as well as photos of packages and labels. In some photos a serial number or mark was clearly visible; in others it was not. I therefore needed a universal solution that works for any type of photo and meets the following requirements:

the ability to quickly and easily locate / identify parts on site simply by using a mobile phone
search for the part regardless of the type of photo (label or the part itself)
allow users to search by keywords (serial number, color, material, name, category) for instant results

To solve these problems, I combined two technologies: visual search and optical character recognition (OCR).

Next, I will talk in more detail about each of them.

Powering the solution with Visual Search and OCR

A typical visual search system consists of a CNN model that represents each product as an encoded feature vector. The query image is encoded by the same model into a point in a multi-dimensional vector space, and the feature vectors nearest to this point, measured by Euclidean or cosine distance, are returned as the search result (the k-NN algorithm).
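
The nearest-neighbor lookup can be sketched in plain Python. This is a minimal illustration, assuming the feature vectors have already been extracted; the part ids, the toy 3-dimensional vectors, and the helper names are made up for the example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbors(query, catalog, k=3):
    """Return the ids of the k catalog vectors closest to the query."""
    ranked = sorted(catalog, key=lambda part_id: cosine_distance(query, catalog[part_id]))
    return ranked[:k]

# Hypothetical catalog: real CNN embeddings have far more than 3 dimensions.
catalog = {
    "bolt-m6": [0.9, 0.1, 0.0],
    "bolt-m8": [0.8, 0.2, 0.1],
    "chip-a1": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]  # embedding of the photo being searched for
top_two = nearest_neighbors(query, catalog, k=2)
print(top_two)
```

A brute-force scan like this is fine for a toy catalog; for a large catalog an approximate k-NN index is the usual choice.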

Sometimes pretrained networks can give good results even without additional training.

For this experiment I used a CNN model based on the ResNeXt architecture (ResNeXt-50 (32×4d)), pre-trained on the ImageNet dataset. In this particular case, however, the products of manufacturing companies differ greatly from the images in that dataset, which is why the model was not able to cluster images of the same part close to each other.

Recognizing manufacturing parts can be very challenging. Microcircuits and chips, bolts and screws can differ in minor details that are not visible to the naked eye, creating hundreds of patterns with multiple products for each pattern. To solve this problem I tried metric learning, which is the task of learning a distance function over objects.

The most popular techniques for such tasks are Triplet Loss and ArcFace. In this case I decided to use ArcFace.

This technique introduces the Additive Angular Margin (ArcFace) loss.

Unlike softmax, the most widely used classification loss, the ArcFace loss adds an angular margin penalty m to the angle between the feature and the weight of its target class, which enhances intra-class compactness and inter-class discrepancy.

This technique improves the discriminative power of the model and stabilizes the training process.
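
The effect of the margin can be sketched for a single sample in plain Python. This is only an illustration of the idea, not the training code used in the project; the function names and the values s=30.0 and m=0.5 are assumptions chosen for the example.

```python
import math

def arcface_logits(cosines, target, s=30.0, m=0.5):
    """Scale cosine logits by s and add the angular margin m to the target class only.

    cosines[j] is the cosine between the normalized feature and the normalized
    weight of class j. s and m are illustrative values (assumptions).
    """
    logits = []
    for j, cos_theta in enumerate(cosines):
        if j == target:
            # Clamp before acos to guard against floating-point drift outside [-1, 1].
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            logits.append(s * math.cos(theta + m))
        else:
            logits.append(s * cos_theta)
    return logits

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

cosines = [0.8, 0.3, -0.1]  # toy cosines for three classes; class 0 is the true class
plain_loss = cross_entropy([30.0 * c for c in cosines], target=0)
margin_loss = cross_entropy(arcface_logits(cosines, target=0), target=0)
print(plain_loss, margin_loss)
```

With the margin, the same sample produces a larger loss, so the model has to pull features of the same class closer together before the loss becomes small.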

The training process consists of fine-tuning the ResNeXt-50 (32×4d) model pre-trained on the ImageNet dataset, with its layers frozen during the first epoch and unfrozen afterwards. This makes it possible to “fine-tune” the higher-order feature representations in the base model and make them more relevant to the specific task.

Dealing with Labels and Packaging

The visual model performed well when searching for images with similar content. But what if an engineer wants to find a product and only has its packaging or label? In that case the visual model can no longer cope alone.

To handle packaging and labels I decided to use an Optical Character Recognition (OCR) model to identify keywords such as the serial number, batch number, and manufacturer’s brand, and then search for these words in the catalog.

In this case I used a cloud-based OCR solution with light image preprocessing.

Some images may be rotated incorrectly. To deal with this, I rotated each image four times by 90 degrees and ran character recognition on each orientation. I then calculated the average score for each orientation and took the one with the highest score.
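
The orientation check can be sketched as follows. This is a minimal sketch under stated assumptions: `run_ocr` stands in for whatever OCR service is used and is assumed to return (word, confidence) pairs, the image is modeled as a small 2D grid, and `fake_ocr` is a stub written only so the example runs.

```python
def rotate_90(image):
    """Rotate a 2D grid (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def best_orientation(image, run_ocr):
    """Try 0, 90, 180 and 270 degrees and keep the rotation with the highest mean confidence.

    run_ocr is a hypothetical callable returning a list of (word, confidence) pairs.
    Returns a tuple (mean_confidence, angle, words).
    """
    best = (-1.0, 0, [])
    current = image
    for angle in (0, 90, 180, 270):
        words = run_ocr(current)
        score = sum(conf for _, conf in words) / len(words) if words else 0.0
        if score > best[0]:
            best = (score, angle, words)
        current = rotate_90(current)
    return best

def fake_ocr(grid):
    """Stub OCR for the demo: confident only when the 'A' corner is at the top left."""
    upright = grid[0][0] == "A"
    return [("SN-1234", 0.95 if upright else 0.30)]

# The upright image would be [["A", "B"], ["C", "D"]]; this photo was taken rotated.
photo = [["C", "A"], ["D", "B"]]
score, angle, words = best_orientation(photo, fake_ocr)
print(score, angle, words)
```

Running OCR four times per image multiplies the cost of each request by four, which is the price paid here for not needing a separate orientation detector.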

As you can see from the images above, I was able to find matching keywords on packages or labels and identify the correct product in the catalog.
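
The keyword matching step can be sketched like this. It is a simplified illustration, assuming the OCR result is available as plain text and each catalog entry carries a list of keywords; the catalog entries, the label text, and the function names are invented for the example.

```python
import re

def extract_tokens(ocr_text):
    """Split OCR output into upper-cased alphanumeric tokens (inner hyphens are kept)."""
    return set(re.findall(r"[A-Z0-9][A-Z0-9-]*", ocr_text.upper()))

def match_catalog(ocr_text, catalog):
    """Rank catalog entries by how many of their keywords appear in the OCR text."""
    tokens = extract_tokens(ocr_text)
    hits = []
    for part_id, keywords in catalog.items():
        overlap = tokens & {word.upper() for word in keywords}
        if overlap:
            hits.append((part_id, sorted(overlap)))
    return sorted(hits, key=lambda hit: len(hit[1]), reverse=True)

# Hypothetical catalog and label text.
catalog = {
    "part-001": ["SN-48213", "ACME", "BEARING"],
    "part-002": ["SN-77120", "ACME", "GASKET"],
}
label_text = "Acme Industrial\nBearing 6204\nS/N: SN-48213  Batch 17"
results = match_catalog(label_text, catalog)
print(results)
```

Exact token matching is brittle when OCR misreads a character, so a production system would more likely rely on the fuzzy matching of a search engine.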

Now I’m able to deal with both cases:

Similar content search
Search by labels/packaging.

The next step was to combine these two methods into one solution.

Final solution overview

The visual search system and character recognition each performed well on their own, but that may not be enough to find the exact match.

My solution was to combine the two technologies to make parts search as accurate as possible. Depending on the type of photo (content-based or text-only), the system performs the search with the visual system or the OCR system.

In addition, the user can provide keywords to filter the query and get more accurate results.

By providing keywords such as product model, year of manufacture, or a specific color, the engineer can pre-filter the query to products that contain this key information, or filter after the search to exclude irrelevant results.

The system performs a combined search using the CNN feature extractor and the OCR model together with the additional keywords, and any candidate result that contains a keyword is weighted with an additional boost coefficient.

In combination with the visual model, this solution provides an even more accurate search result.

To do that I used the Elasticsearch engine together with the Open Distro plugin. This allowed me to perform a k-NN search using a bool query and boost coefficients for the relevant keywords. The bool query takes a more-matches-is-better approach, so the score from each matching “must” or “should” clause is added together to provide the final score for each document.

The query consists of two boolean clauses: a “must” clause with the feature vector and a “should” clause with a list of keyword dictionaries, where each keyword is given its own boost coefficient to get more relevant search results.

In this setup the search is done first by image content, and if any keywords are found on the products, those images are boosted in the search results.

In the end, I built a system that supports three types of search, depending on the type of photo:
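
A query of this shape can be sketched as a Python dictionary. This is a sketch under assumptions rather than the exact query used in the project: the field names `feature_vector` and `description`, the example keywords, and their boost values are all hypothetical.

```python
def build_query(feature_vector, keywords, k=10,
                vector_field="feature_vector", text_field="description"):
    """Build a bool query: k-NN on the image vector (must) plus boosted keywords (should).

    keywords maps each keyword to its boost coefficient.
    vector_field and text_field are assumed index field names.
    """
    should = [
        {"match": {text_field: {"query": word, "boost": boost}}}
        for word, boost in keywords.items()
    ]
    return {
        "size": k,
        "query": {
            "bool": {
                # The image content must match: nearest neighbors of the query vector.
                "must": [{"knn": {vector_field: {"vector": feature_vector, "k": k}}}],
                # Keyword matches are optional but add to the document's score.
                "should": should,
            }
        },
    }

query = build_query([0.12, 0.54, 0.33], {"SN-48213": 3.0, "bearing": 1.5}, k=5)
print(query)
```

Because the keyword clauses sit under "should", a document without any keyword match can still be returned; it simply receives no extra score.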

Content Search: If the photo is content-based, the system executes a visual search powered by CNN feature extraction and the k-NN algorithm.
OCR Search: If there is only a package or label in the photo, the system identifies the image as text-based and executes a search powered by the OCR system.
Combined Search powered with filter keywords: This method suits the case where the product photo also shows a label. The engineer can then perform a combined search and is shown the results of the visual model filtered by the specified keywords.
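
The choice between the three modes can be sketched as a small dispatcher. The article does not describe how the photo type is detected, so the two boolean inputs below are assumptions standing in for that detection step, and the function name is hypothetical.

```python
def choose_search_mode(has_part, has_text, keywords=None):
    """Pick a search mode from two assumed signals about the photo.

    has_part: the photo shows the part itself (assumed to come from a detector).
    has_text: readable text such as a label was found (assumed to come from OCR).
    keywords: optional keywords typed in by the engineer.
    """
    if has_part and (has_text or keywords):
        return "combined"  # visual search plus keyword boosting or filtering
    if has_part:
        return "content"   # visual search only
    if has_text:
        return "ocr"       # search by recognized text only
    raise ValueError("photo contains neither a part nor readable text")

print(choose_search_mode(True, False))
print(choose_search_mode(False, True))
print(choose_search_mode(True, True))
```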

Conclusion

In this blog post, I described how to build a manufacturing parts finder system from scratch. It is easy to imagine that, like my customer, many manufacturing companies face the same problem. With the power of a CNN and an Optical Character Recognition model, it is possible to build high-quality visual search systems that improve the speed and quality of search regardless of whether the photo shows the product or its label.
