May 21, 2024


As a network of social enterprises, NTUC Enterprise is on a mission to harness the capabilities of its multiple units to meet pressing social needs in areas like healthcare, childcare, daily essentials, cooked food, and financial services. Serving over two million customers annually, we seek to enable and empower everyone in Singapore to live better and more meaningful lives.

With so many lines of business, each running on different computing architectures, we found ourselves struggling to integrate data across our business ecosystem and to give internal stakeholders access to that data. We saw this as essential to our mission of empowering our staff to collaborate on digital business transformation in ways that enable tailored solutions for customers.

The central challenge was that our five main business lines, including retail, health, food, supply chain, and finance, were operating on different combinations of Google Cloud, on-premises, and Amazon Web Services (AWS) infrastructure. This complex setup drove us to create a unified data portal that could integrate data from across our ecosystem, so that business units could build inter-platform data solutions and analytics, and to democratize data access for more than 1,400 NTUC data citizens. In essence, we set out to create a one-stop platform where internal stakeholders can easily find any asset they need among more than 25,000 BigQuery tables and more than 10,000 Looker Studio dashboards.

Here’s a step-by-step summary of how we deployed DataHub, an open-source metadata platform, alongside Google Cloud solutions to establish a unified Data Portal that gives NTUC staff seamless access across business lines, while enabling secure data ingestion and robust data quality.

  • DataHub’s built-in data discovery function provides basic functionality for locating specific data assets from BigQuery tables and Looker Studio dashboards and storing their metadata on DataHub. However, we needed a more seamless way to ingest the metadata of all data assets automatically and systematically.

  • We therefore built customizations and enhancements on Cloud Composer, a fully managed workflow orchestration service built on Apache Airflow, and on Google Kubernetes Engine (GKE) Autopilot, which helps us scale out easily and efficiently based on our dynamic needs.

  • Next, we built data lineage, which traces the end-to-end flow of data across our tech stack: data is drawn from Cloud SQL into Cloud Storage, then channeled through BigQuery into Looker Studio dashboards for easy visibility. This was instrumental in enabling users across NTUC’s business lines to access data securely and intuitively in Looker Studio.
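
To make the lineage idea concrete, here is a minimal sketch, assuming lineage is kept as simple (upstream, downstream) edges between assets. The asset names are hypothetical and this is not how DataHub stores lineage internally; the `bigquery_dataset_urn` helper only mirrors the shape of DataHub's dataset URNs for BigQuery tables (`project.dataset.table` plus an environment).

```python
from collections import deque

# Hypothetical asset names; the post describes the flow
# Cloud SQL -> Cloud Storage -> BigQuery -> Looker Studio, not these exact names.
LINEAGE_EDGES = [
    ("cloudsql:retail.orders", "gcs:datalake/retail/orders"),
    ("gcs:datalake/retail/orders", "bigquery:retail.orders"),
    ("bigquery:retail.orders", "lookerstudio:retail-sales-dashboard"),
]


def bigquery_dataset_urn(table_id, env="PROD"):
    """Build a DataHub-style dataset URN for a BigQuery table ID (project.dataset.table)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:bigquery,{table_id},{env})"


def downstream(edges, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream(edges, start):
    """Return every asset that feeds into `start`, by walking the reversed edges."""
    return downstream([(dst, src) for src, dst in edges], start)
```

For example, `upstream(LINEAGE_EDGES, "lookerstudio:retail-sales-dashboard")` walks back from the dashboard through the BigQuery table and the Cloud Storage object to the Cloud SQL source, which is the question a user typically asks when a dashboard number looks wrong.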

Having set up the basic platform architecture, our next task was to enable secure data ingestion. Sensitive data needed to be encrypted and stored in Cloud Storage before populating BigQuery tables. The system also needed to be flexible enough to ingest data securely in a multi-cloud environment spanning Google Cloud, AWS, and our on-premises infrastructure.

Our solution was to build an in-house framework based on Python and YAML configuration, running on GKE and Cloud Composer. In effect, we built the equivalent of a Collibra data management platform, tailored to NTUC’s data flow (from Cloud Storage to BigQuery). The system also had to follow NTUC’s data principles, which are as follows:

  • All data in our Cloud Storage data lake must be stored in a compressed format such as Avro, a data serialization system

  • Sensitive columns must be hashed using Secure Hash Algorithm 256-bit (SHA-256)

  • The solution must be flexible enough to customize depending on needs

  • Connections must support username and password authentication

  • Connections must support certificates (public key and private key), with the ability to override settings in code

  • Connections must be able to present hundreds of physical tables (sharded MSSQL tables) as one logical table
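
Two of these principles lend themselves to a short illustration. The sketch below is an assumption-laden stand-in, not NTUC's framework: the `SOURCE_CONFIG` dictionary plays the role of a per-source YAML file, the column names are hypothetical, and the `<logical>_<digits>` shard naming convention is assumed. It hashes configured sensitive columns with SHA-256 and folds sharded physical tables into one logical table.

```python
import hashlib
import re
from collections import defaultdict
from typing import Optional

# Hypothetical per-source settings, standing in for the framework's YAML config.
SOURCE_CONFIG = {
    "table": "customers",
    "sensitive_columns": ["email", "phone"],
}

# Assumed naming convention for shards: "<logical name>_<digits>", e.g. "orders_001".
SHARD_PATTERN = re.compile(r"^(?P<logical>.+?)_(?P<shard>\d+)$")


def sha256_hex(value) -> Optional[str]:
    """Hash one value with SHA-256; NULLs stay NULL rather than hashing to a shared digest."""
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def mask_row(row, sensitive_columns):
    """Return a copy of `row` with every sensitive column replaced by its SHA-256 hex digest."""
    return {
        col: sha256_hex(val) if col in sensitive_columns else val
        for col, val in row.items()
    }


def group_shards(table_names):
    """Map each logical table name to the sorted list of its physical shard tables."""
    groups = defaultdict(list)
    for name in table_names:
        match = SHARD_PATTERN.match(name)
        logical = match.group("logical") if match else name
        groups[logical].append(name)
    return {logical: sorted(names) for logical, names in groups.items()}


def union_shards(rows_by_table, shard_tables):
    """Concatenate rows from every shard into one logical table, tagging each row's source."""
    merged = []
    for table in shard_tables:
        for row in rows_by_table.get(table, []):
            merged.append({**row, "_source_table": table})
    return merged
```

Hashing is deterministic, so hashed columns can still be joined across tables without exposing raw values. One design note: plain SHA-256 of low-entropy values such as phone numbers can be reversed by brute force, which is why keyed variants (for example HMAC-SHA-256 from Python's `hmac` module) are a common alternative.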

Our next task for the Data Portal was to create an automated Data Quality Control service that checks data in real time whenever a BigQuery table is updated or modified. This frees up our data engineers, who previously had to monitor hundreds of table columns manually for changes or anomalies while building BigQuery tables. That task used to take an entire day, but now takes just five minutes. We ensure data quality in the following way:

  • Activity in BigQuery tables is automatically written into Cloud Logging, a fully managed, real-time log management service with storage, search, analysis, and alerting

  • Cloud Logging then filters BigQuery events and routes them to Pub/Sub as data streams, which are channeled into Looker Studio, where users can easily access the specific data they need

  • In addition, the Data Quality Control service notifies users whenever someone updates a BigQuery table incorrectly or in violation of set rules, whether by deleting, changing, or adding columns. This enables automated data discovery, without engineers needing to go into BigQuery to look up tables
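
A rough sketch of the checking step follows, under several assumptions. It assumes a Cloud Logging sink forwards BigQuery audit log entries to Pub/Sub as base64-encoded JSON, and it reads `protoPayload.methodName` and `protoPayload.resourceName`; treat the exact method names and field paths as illustrative rather than a guaranteed schema. The `RULES` dictionary, column names, and alert wording are hypothetical, and fetching the before and after schemas is left out.

```python
import base64
import json
from typing import Optional

# Assumed audit-log method names for table-level changes (verify against real log entries).
TABLE_CHANGE_METHODS = {
    "google.cloud.bigquery.v2.TableService.InsertTable",
    "google.cloud.bigquery.v2.TableService.UpdateTable",
    "google.cloud.bigquery.v2.TableService.PatchTable",
    "google.cloud.bigquery.v2.TableService.DeleteTable",
}

# Hypothetical rules for one table: columns that must exist, with their expected types.
RULES = {
    "required_columns": {"customer_id": "STRING", "created_at": "TIMESTAMP"},
}


def decode_pubsub_log_entry(message_data_b64: str) -> dict:
    """Decode the base64 Pub/Sub payload back into a log-entry dictionary."""
    return json.loads(base64.b64decode(message_data_b64).decode("utf-8"))


def table_change(entry: dict) -> Optional[tuple]:
    """Return (short method name, resource name) for table changes, or None for other events."""
    payload = entry.get("protoPayload", {})
    method = payload.get("methodName")
    if method not in TABLE_CHANGE_METHODS:
        return None
    return method.rsplit(".", 1)[-1], payload.get("resourceName", "")


def check_schema(old_schema: dict, new_schema: dict, rules: dict) -> list:
    """Compare column->type maps before and after a change and list alerts to send."""
    alerts = []
    required = rules["required_columns"]
    for col, expected in required.items():
        if col not in new_schema:
            alerts.append(f"required column {col} was removed")
        elif new_schema[col] != expected:
            alerts.append(f"column {col} changed type to {new_schema[col]} (expected {expected})")
    for col in sorted(new_schema.keys() - old_schema.keys()):
        alerts.append(f"column {col} was added")
    for col in sorted(old_schema.keys() - new_schema.keys()):
        if col not in required:  # required columns were already reported above
            alerts.append(f"column {col} was removed")
    return alerts
```

Keeping `check_schema` a pure function of two schemas and a rule set makes it easy to test in isolation; the Pub/Sub handler only has to decode the event, filter it with `table_change`, look up the schemas, and forward any alerts.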

These steps enabled NTUC to create a flexible, dynamic, and user-friendly Data Portal that democratizes data access across business lines for more than 1,400 platform users, opening up vast potential for creative collaboration and digital solution development. In the future, we plan to look at how we can integrate even more data services into the Data Portal, and to leverage Google Cloud to help develop more in-house solutions.

