Despite the multitude of tools available, data can be quite a beast to tame. Yet the world we live in is one where ‘data has become the new oil’, especially in business. Businesses have evolved to the point where they treat data as a competitive advantage; from Amazon to Google, Spotify, and Tesco, the examples are numerous.

The Problem

However, large volumes of data can make it extremely hard to glean useful information. This was precisely the problem recently faced by one of Calcey’s European clients, a provider of cloud-based Point of Sale (POS) solutions to independent restaurants in Northern Europe.

As it set about scaling its operations by signing up new restaurants, the company realized that the sheer volume and complexity of its data rendered analysis (in the traditional sense) a wasteful affair. To understand the problem, consider how a standalone restaurant stores its transaction data. There could be hundreds of SKUs, all recorded using a naming convention chosen by the restaurant’s owner. The data would most likely live in a proprietary database, or even in Microsoft Excel. When a cloud-based solution provider has to aggregate all this data across hundreds of restaurants in many different municipalities, the complexity of the task becomes apparent.

The legacy system our client had to contend with before approaching us creaked under the weight of the data it had to bear. Database timeouts were common, and compiling a single report took around fifteen minutes. The client also had to resign themselves to generating only daily reports, since the legacy system could not aggregate data into weekly or monthly reports.

So, how does one sanitize and unify all this data, so that actionable information can be gleaned at the click of a button?

Our Solution

In consultation with the client, we opted to conduct a pilot using the dataset of a single restaurant. Since unstructured data must first be sanitized, we chose Talend Cloud as the overall data integration and governance platform, primarily for its flexibility and speed. Talend’s support for integrating third-party business intelligence (BI) tools was another definite advantage: it allowed Calcey’s engineers to map the database structure to a set of API endpoints, through which the BI tool could access the dataset asynchronously.
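To illustrate the idea, here is a minimal sketch of what such a table-to-endpoint mapping looks like conceptually. The endpoint path, table, and column names are hypothetical, and Python's standard library stands in for the actual Talend-generated endpoints:

```python
import json
import sqlite3

def build_sales_endpoint_payload(conn: sqlite3.Connection) -> str:
    """Serialize a sales table into the JSON shape a BI tool would
    fetch from a hypothetical /sales API endpoint."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT sku, qty, total FROM sales ORDER BY sku"
    ).fetchall()
    return json.dumps({"endpoint": "/sales", "rows": [dict(r) for r in rows]})

# An in-memory table standing in for one restaurant's POS data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sku TEXT, qty INTEGER, total REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("BURGER-01", 2, 18.0), ("COLA-33", 1, 3.5)])
payload = json.loads(build_sales_endpoint_payload(conn))
```

The point of the mapping is decoupling: the BI tool only ever sees a stable JSON contract, regardless of how each restaurant named or stored its SKUs.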

The proposed system architecture

Next, we opted to use HSQLDB to improve query performance. With HSQLDB, our engineers created an in-memory cache of the dataset, which sped up both the API and the application as a whole while reducing the load on the back-end infrastructure. This structure allowed Calcey’s solution to deliver a welcome cost saving to the client.
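The caching pattern can be sketched as follows. This is an illustrative analogue using Python's built-in SQLite in-memory mode as a stand-in for HSQLDB; the table schema and data are invented for the example:

```python
import sqlite3

def warm_cache(source_rows):
    """Load transaction rows from the (slow) back-end store into an
    in-memory database, so subsequent queries never touch the back end."""
    cache = sqlite3.connect(":memory:")
    cache.execute("CREATE TABLE txns (day TEXT, sku TEXT, total REAL)")
    cache.executemany("INSERT INTO txns VALUES (?, ?, ?)", source_rows)
    cache.commit()
    return cache

# Back-end rows, fetched once per refresh cycle rather than once per query
rows = [("2020-01-01", "BURGER-01", 18.0),
        ("2020-01-01", "COLA-33", 3.5),
        ("2020-01-02", "BURGER-01", 9.0)]
cache = warm_cache(rows)

# Every report query now runs against memory, not the back end
(total,) = cache.execute("SELECT SUM(total) FROM txns").fetchone()
```

Because the expensive back-end read happens once per refresh instead of once per report, the back-end infrastructure can be sized for the refresh workload alone.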


How the caching works
The caching mechanism within Talend

The Results
By virtue of using an in-memory database to crunch the data, we managed to shorten the time it takes for our client to generate a report to mere seconds, compared to the fifteen minutes it took previously. The in-memory database structure also allows for real-time filtering of data. Additionally, we were able to integrate the database with Power BI through the Talend API, which granted our client the ability to generate deep, detailed, and actionable business insights.
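One concrete payoff of the in-memory structure is that aggregations the legacy system could not produce, such as weekly rollups of daily transactions, become a single query. Here is a sketch, again using SQLite's in-memory mode as a stand-in for HSQLDB, with invented schema and data:

```python
import sqlite3

# Cached daily transactions (illustrative data)
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE txns (day TEXT, total REAL)")
cache.executemany("INSERT INTO txns VALUES (?, ?)",
                  [("2020-01-01", 18.0), ("2020-01-02", 9.0),
                   ("2020-01-08", 12.0)])

# Rolling daily rows up into weekly report lines: grouping by a
# year-week key turns the daily-only limitation into a GROUP BY
weekly = cache.execute(
    "SELECT strftime('%Y-%W', day) AS week, SUM(total) "
    "FROM txns GROUP BY week ORDER BY week"
).fetchall()
```

Swapping the `strftime` format string for `'%Y-%m'` would give monthly rollups from the same cached rows.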

How the API works
The API within Talend

Since the API obtains data directly from the in-memory cache, we built a job within Talend (an updater module) that runs automatically on a predetermined schedule, refreshing the cache while saving time and reducing the system administrator’s workload.
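The Talend job definition itself is configuration rather than code, but the underlying pattern, a refresh task fired on a fixed schedule, can be sketched with Python's standard-library `sched` module (the refresh function here is a hypothetical stand-in for the updater):

```python
import sched
import time

def refresh_cache(state):
    """Stand-in for the Talend updater job: re-pull back-end data into
    the in-memory cache. Here we simply count the refresh cycles."""
    state["refreshes"] += 1

# Run the refresh on a fixed schedule, as the Talend job does in production
state = {"refreshes": 0}
scheduler = sched.scheduler(time.monotonic, time.sleep)
for i in range(3):                       # three cycles, 10 ms apart
    scheduler.enter(0.01 * i, 1, refresh_cache, argument=(state,))
scheduler.run()                          # blocks until all cycles complete
```

In production the interval would be tuned to how stale a report is allowed to be; the API keeps serving the previous cache while a refresh is in flight.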