The business challenge

The beneficiary of our services, a global leader listed in the top 100 Forbes companies with over 100 billion EUR in revenues, operates across multiple high-tech industries, including Consumer Electronics, Home Appliances, Home Entertainment, Mobile Communications, Solar, and Display Technologies.
The challenge presented to us was both unique and complex: to develop an image recommendation system tailored for the visual arts and creative industries. This system needed to match the efficiency and accuracy of Google’s image search while catering to the nuanced demands of creative professionals and art enthusiasts.
One of the main goals was to seamlessly integrate a sophisticated visual search capability that could understand and interpret artistic content, thereby revolutionizing how users discover and interact with visual arts online.
Our solution
To tackle this challenge, we designed a solution based on Convolutional Neural Networks (CNNs), a well-established approach to image recognition. The system works in two distinct phases:
- an offline phase focused on model training and tuning, alongside feature extraction
- an online phase dedicated to image query and retrieval

This dual-phase process lays a solid foundation for the system to accurately understand and categorize images, ensuring efficient matching of user queries with relevant visual content.
By incorporating state-of-the-art technologies, our solution addresses the specific demands of the creative industries and enhances the capabilities of image recommendation systems.
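The split between the two phases can be illustrated with a minimal sketch. It is not the production code: `toy_extractor` is a hypothetical stand-in for the CNN feature extractor, images are represented as plain lists of pixel intensities, and cosine similarity is assumed as the ranking measure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Extractor = Callable[[Sequence[float]], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def toy_extractor(pixels: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the CNN: mean intensity and intensity range."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def build_index(catalog: Dict[str, Sequence[float]], extract: Extractor) -> Dict[str, Vector]:
    """Offline phase: extract features once per catalog image and store them."""
    return {image_id: extract(pixels) for image_id, pixels in catalog.items()}


def query(index: Dict[str, Vector], pixels: Sequence[float],
          extract: Extractor, top_k: int = 3) -> List[Tuple[str, float]]:
    """Online phase: extract query features and rank stored images by similarity."""
    q = extract(pixels)
    scored = [(image_id, cosine(q, vec)) for image_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Example usage with three tiny made-up "images".
catalog = {"a": [0.1, 0.2, 0.3], "b": [0.9, 0.9, 0.8], "c": [0.1, 0.9, 0.5]}
index = build_index(catalog, toy_extractor)            # offline
results = query(index, [0.1, 0.2, 0.3], toy_extractor, top_k=2)  # online
```

The expensive work (feature extraction over the whole catalog) happens once in the offline phase, so the online phase only has to extract features for the query image and compare vectors.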
Team setup
We embrace Agile methodologies in most of our projects. Often our customers and partners ask us to use Agile methodologies (Scrum, Kanban) from the inception of the project. When the choice is left to our engineers, we carefully analyze the project specifics and propose a project management methodology, based on Scrum or Kanban, that best fits the project's needs and context.
The development team consisted of five R&D developers based at our headquarters, plus a development team on the client's premises. We worked closely with the Client Service Team and the Infrastructure Team to offer tech support and maintenance when needed.
Features are learned by three multi-layer CNNs (art-style, content, style) and then combined into one large feature vector (meta-descriptor), which is then used to compute image similarities.
- The ‘content’ model is the pre-trained GoogLeNet model described by Szegedy et al. in ILSVRC 2014 and learns 1000 features describing the content of the images
- The ‘art-style’ model is a tuned version of the GoogLeNet model, trained on 85K paintings annotated with 25 style/genre labels
- The ‘style’ model is the pre-trained ‘Flickr Style’ model described by Karayev et al. and outputs 20 features
- A fourth model learns the type of each image (art, content or style), and its output is used as a dynamic feature-weighting mechanism
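How the per-model features and the type weights might come together can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the block sizes (1000, 25, 20) come from the description above, while the per-block L2 normalization, the weighted concatenation, and the cosine similarity are assumptions, and all function names are hypothetical.

```python
import math
from typing import Dict, List

# Descriptor sizes from the text: 1000 content, 25 art-style, 20 style features.
DIMS = {"content": 1000, "art_style": 25, "style": 20}


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def meta_descriptor(features: Dict[str, List[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Concatenate the three feature blocks, each normalized and scaled by its weight.

    `weights` plays the role of the fourth model's output (assumed to be one
    non-negative weight per block).
    """
    combined: List[float] = []
    for name, size in DIMS.items():
        if len(features[name]) != size:
            raise ValueError(f"{name} block must have {size} features")
        block = l2_normalize(features[name])
        combined.extend(weights[name] * x for x in block)
    return combined


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two meta-descriptors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


# Example usage with made-up feature values and weights.
features = {"content": [1.0] * 1000, "art_style": [0.5] * 25, "style": [2.0] * 20}
weights = {"content": 0.2, "art_style": 0.7, "style": 0.1}
descriptor = meta_descriptor(features, weights)
```

Normalizing each block before weighting keeps the 1000-dimensional content block from dominating the two much smaller blocks purely because of its size, so the weights alone decide how much each model contributes to the similarity.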
For performance and scalability, feature learning and extraction run on the GPU, using NVIDIA's CUDA platform. Training and feature extraction are done with the Caffe framework, chosen for its speed and modularity.
Architecture & Technologies
Architecture
The system is designed with extensibility in mind. It has a core that aggregates multiple pluggable modules: style, art-style, content, color. To make the system scalable, we chose to run multiple instances of each pluggable module. The frontend has an adaptive/responsive layout built with Bootstrap in order to support all kinds of devices (including mobile).
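A core that aggregates pluggable modules, with several instances per module, could look roughly like the sketch below. Everything in it is hypothetical: the `Core` class, its method names, and the round-robin choice between instances are assumptions made for illustration, and in a deployed system the instances would more likely be separate processes or services than in-process callables.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Module = Callable[[Sequence[float]], Vector]


class Core:
    """Aggregates pluggable feature modules; each module may have several instances."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Module]] = {}
        self._cursor: Dict[str, int] = {}

    def register(self, name: str, instance: Module) -> None:
        """Plug in one more instance of the module called `name`."""
        self._instances.setdefault(name, []).append(instance)
        self._cursor.setdefault(name, 0)

    def _pick(self, name: str) -> Module:
        """Choose the next instance of a module in round-robin order."""
        pool = self._instances[name]
        position = self._cursor[name]
        self._cursor[name] = (position + 1) % len(pool)
        return pool[position]

    def describe(self, image: Sequence[float]) -> Dict[str, Vector]:
        """Run the image through one instance of every registered module."""
        return {name: self._pick(name)(image) for name in self._instances}


# Example usage with trivial stand-in modules.
core = Core()
core.register("color", lambda image: [1.0])   # first "color" instance
core.register("color", lambda image: [2.0])   # second "color" instance
core.register("style", lambda image: [float(len(image))])
first = core.describe([0.0, 0.0])
second = core.describe([0.0, 0.0])
```

Adding a new kind of feature then only requires registering another module under a new name; the core itself does not change.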
Technologies
- C/C++, Python, Java
- OpenCV, OpenCL, Caffe
- MATLAB, Octave
- Machine Learning, Deep Learning