Gaurav Mishra


Recent Posts

Amazon SageMaker: What, Why and How

Posted by Gaurav Mishra on Dec 27, 2019 4:59:00 PM

According to IDC, the artificial intelligence market will grow at a compound annual rate of 37% through 2022. Owing to this popularity, several tools have emerged to make AI adoption easier, and one that stands out is Amazon SageMaker. In this blog, we take an in-depth look at what it is, why you should use it, and how it works.

What is Amazon SageMaker?

Amazon SageMaker is a fully managed AWS service that lets data scientists and developers quickly build, train, and deploy machine learning models. It is built around Amazon SageMaker Studio, an integrated development environment for machine learning that serves as the base for the rest of the SageMaker toolset.

You can build and train ML models from scratch or purchase pre-built algorithms that suit your project requirements. Tools are also available for debugging models and for adding manual review processes atop model predictions.
[Image: what Amazon SageMaker is. Image via Amazon]

Why Should You Use It?

The complexity of a machine learning project in any enterprise increases as it scales. This is because machine learning projects comprise three key stages - build, train and deploy - each of which can continuously loop back into the others as the project progresses. As the amount of data being dealt with increases, so does the complexity. And if you are planning to build an ML model that truly works, your training data sets will tend to be on the larger side.

Typically, different skill sets are required at different stages of a machine learning project. Data scientists are involved in researching and formulating the machine learning model, while developers are the ones taking the model and transforming it into a useful, scalable product or web-service API. But not every enterprise can put together a skilled team like that, or achieve the necessary coordination between data scientists and developers to roll out workable ML models at scale.

This is exactly where Amazon SageMaker steps in. As a fully managed machine learning platform, SageMaker abstracts away the software engineering work, enabling data scientists to build and train the machine learning models they want with an intuitive, easy-to-use set of tools. While they play to their core strengths of working with the data and crafting the ML models, Amazon SageMaker handles the heavy lifting needed to turn these into a ready-to-roll web-service API.

Amazon SageMaker packs all the components used for machine learning into a single shell, allowing data scientists to deliver end-to-end ML projects with reduced effort and at lower cost.

How Does It Work?

With a 3-step model of Build-Train-Deploy, Amazon SageMaker simplifies and streamlines your machine learning modeling. Let’s take a quick look at how it works.


Build

Amazon SageMaker offers a fully integrated development environment for machine learning that improves your productivity. With its one-click Jupyter notebooks, you can start building within seconds, and one-click sharing lets you pass those notebooks to colleagues. The underlying code structure is captured automatically, so others can pick up your work and collaborate without any hurdle.

Apart from this, Amazon SageMaker Autopilot is the industry’s first automated machine learning capability that gives you complete control of and visibility into your machine learning models. Traditional approaches to automated machine learning do not let you peek into the data or logic used to create a model. Autopilot, however, integrates with SageMaker Studio and gives you complete visibility into the raw data and information used in the model’s creation.

One of the highlights of Amazon SageMaker is its Ground Truth feature, which helps you build and manage accurate training datasets. Ground Truth gives you access to labelers through Amazon Mechanical Turk, along with pre-built workflows and interfaces for common labeling tasks. Amazon SageMaker also supports various deep learning frameworks, including PyTorch, TensorFlow, Apache MXNet, Chainer, Gluon, Keras, Scikit-learn, and Deep Graph Library.

Leveraging Amazon SageMaker, Srijan built a video analytics solution that can scrape video feed data to log asset performance.


Using AWS Lambda, Amazon SageMaker and Amazon S3, Srijan developed a video analytics solution for the client. The solution utilized a machine learning model to scrape video feed data and log asset performance over a given period of time and assigned location.

As a result, it helped in:

  • Claims validation against machines that were failing to clean the given sites
  • Insight-based behavior analysis of the assets, leading to improvement of the product
  • Enabling more proactive, instead of reactive, asset performance assessment and maintenance




Train

Using Amazon SageMaker Experiments, you can easily organize, track, and evaluate every iteration of your machine learning models. Training a machine learning model involves many iterations, run to isolate and measure the impact of changing algorithm versions, model parameters, and datasets. SageMaker Experiments helps you manage these iterations by automatically capturing the configurations, parameters, and results, and storing them as ‘experiments’.
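To make the idea of tracked iterations concrete, here is a minimal, library-free sketch. It does not use the SageMaker Experiments API; the `Trial` class, the parameter names, and the accuracy figures are all hypothetical and only illustrate what capturing configurations and results per iteration might look like.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """One training iteration: what was configured and what came out."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)


def best_trial(trials, metric):
    """Return the trial with the highest recorded value for the metric."""
    scored = [t for t in trials if metric in t.metrics]
    return max(scored, key=lambda t: t.metrics[metric])


# Hypothetical iterations that differ only in learning rate.
experiment = [
    Trial("run-1", {"learning_rate": 0.10}, {"accuracy": 0.81}),
    Trial("run-2", {"learning_rate": 0.01}, {"accuracy": 0.88}),
    Trial("run-3", {"learning_rate": 0.001}, {"accuracy": 0.85}),
]

winner = best_trial(experiment, "accuracy")
print(winner.name, winner.params)
```

Keeping parameters and metrics side by side is what makes it possible to compare iterations later without re-running them.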

SageMaker also comes with a Debugger that helps you analyze, debug, and fix problems in your machine learning model. Debugger makes the training process transparent by capturing real-time metrics while training runs. It can also generate warnings and remediation advice when it detects common problems during training.

Apart from this, AWS-optimized TensorFlow offers scaling efficiency of up to 90% across 256 GPUs, so you can train accurate, sophisticated models in much less time. Furthermore, Amazon SageMaker comes with Managed Spot Training, which can reduce training costs by up to 90%.
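As a rough illustration of how a Managed Spot Training saving can be worked out, the sketch below compares the time a job ran with the time it was billed for. The function name and the sample figures are hypothetical; the formula assumes savings are expressed as the share of training time that was not billed.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved, given total training time and billed time in seconds."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical job: trained for 1,000 seconds but was billed for only 300.
savings = spot_savings_percent(1000, 300)
print(f"{savings:.0f}% saved")
```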


Deploy

Amazon SageMaker offers one-click deployment so that you can easily generate predictions for batch or real-time data. You can deploy your model on auto-scaling Amazon machine learning instances across multiple availability zones for improved redundancy. You just need to specify the instance type and the desired minimum and maximum number of instances, and leave the rest to Amazon SageMaker.
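The minimum and maximum instance counts mentioned above can be sketched as a scaling target for an endpoint. The helper below only builds the request dictionary, so it runs without AWS access; the endpoint name `demo-endpoint` and variant name `AllTraffic` are placeholders, and we assume the dictionary would be passed to Application Auto Scaling's `register_scalable_target` call.

```python
def endpoint_scaling_target(endpoint_name, variant_name, min_instances, max_instances):
    """Build arguments for Application Auto Scaling's register_scalable_target."""
    if not 1 <= min_instances <= max_instances:
        raise ValueError("need 1 <= min_instances <= max_instances")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


# Placeholder names; replace with your own endpoint and variant.
target = endpoint_scaling_target("demo-endpoint", "AllTraffic", 1, 4)
print(target["ResourceId"])

# With AWS credentials configured, the dictionary could then be used as:
#   boto3.client("application-autoscaling").register_scalable_target(**target)
```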

A major problem that can affect the accuracy of your predictions is a difference between the data used to train a model and the data it sees when generating predictions. SageMaker Model Monitor helps here by automatically detecting concept drift in your deployed models and raising alerts that help you identify the source of the problem and remediate it.
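The gap between training data and live data can be pictured with a deliberately simple check. This is not how Model Monitor works internally; it is a sketch that flags a feature whose live mean has moved more than a chosen number of training standard deviations. The feature values and the threshold of 2.0 are assumptions made for illustration.

```python
from statistics import mean, stdev


def drift_score(training_values, live_values):
    """Shift of the live mean from the training mean, in training standard deviations."""
    spread = stdev(training_values)
    if spread == 0:
        raise ValueError("training values have no spread")
    return abs(mean(live_values) - mean(training_values)) / spread


def has_drifted(training_values, live_values, threshold=2.0):
    """True when the live data has moved further than the threshold allows."""
    return drift_score(training_values, live_values) > threshold


# Hypothetical feature: customer age at training time and in production.
training_ages = [30, 32, 34, 36, 38]
live_ages = [50, 52, 54, 56, 58]
print(has_drifted(training_ages, live_ages))
```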

Amazon SageMaker also includes Augmented AI, which lets human reviewers step in when the model is unable to make high-confidence predictions. Moreover, Amazon Elastic Inference can reduce your machine learning inference costs by up to 75%. Lastly, you can integrate SageMaker with Kubernetes to automate the deployment, scaling, and management of your applications.
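The human-review step can be sketched as a confidence threshold. The code below does not call Augmented AI; the function name, the 0.8 threshold, and the labels are hypothetical and only show the routing decision.

```python
def route_prediction(label, confidence, threshold=0.8):
    """Accept confident predictions; send the rest to a human reviewer."""
    if confidence >= threshold:
        return {"label": label, "decided_by": "model"}
    # Low confidence: keep the model's guess as a suggestion for the reviewer.
    return {"label": None, "decided_by": "human", "suggested": label}


print(route_prediction("invoice", 0.95))
print(route_prediction("receipt", 0.55))
```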

So there you have it, a look at how Amazon SageMaker can help build, train and deploy machine learning models to suit your project requirements.

Srijan is an advanced AWS Consulting Partner, and can help you utilize AWS solutions at your enterprise. To know more, drop us a line outlining your business requirements and our expert AWS team will get in touch.

Topics: AWS, Machine Learning & AI, Architecture

Digital Transformation Underpins Your Digital Customer Experience. Here's How

Posted by Gaurav Mishra on Dec 8, 2019 2:37:00 PM

- Two-thirds of global CEOs will be focusing on digital strategies to enhance their customer experience, by the end of 2019.


- 85% of enterprise stakeholders surveyed by IDC say that it’s imperative to achieve significant digital transformation by 2021, or risk losing business to competitors.

These two statistics do a great job of framing the criticality of digital customer experience (DCX) in the next decade, and the starring role digital transformation will play in driving successful DCX.

With that being a given, all that’s left for enterprises is to understand how exactly digital transformation underpins their digital customer experience.

We take a look at the four key pillars of digital customer experience and explore what technology transformations need to happen within enterprises, to steady those pillars. 


Your brand should be accessible through the channels that your customers use most often and are the most comfortable with. For enterprises, innovation lies in finding novel ways to enable interaction with customers, and allow them to get the job done with the least amount of friction.

However, new interaction channels mean engaging with new technology solutions. The key digital transformation required here is the ability to quickly understand emerging technologies and identify which of them can be leveraged to deliver phenomenal digital experiences.


While we cannot recommend which exact transformations will work for which enterprise, here are a few interaction channels that have delivered the most business value across industries.


Retail brands have innovated their shopping and customer interaction workflows to give buyers the flexibility to order items via Facebook Messenger bots. Several airline brands and travel OTAs have adopted a similar route. Hospitality brands have created phenomenal guest experiences on their properties, and even before they reach their properties, with chatbots. 

There is a diverse range of technology solutions that go into crafting a successful chatbot experience. From leveraging tools like Amazon Lex and other natural language processing platforms, to establishing content management systems that can deliver the right content into these chatbots, enterprise IT teams have to become well versed in these technology solutions very fast.

Virtual Try-on

Virtual makeup apps are transforming how customers sample and buy beauty products. People can sample a range of products and see how each shade and effect looks on their face and skin, from the comfort of their homes, using these apps. The in-store experience is also being digitized by having these apps on the counter, as a very persuasive aid for the beauty advisors. They can help customers try out practically the entire product range, or several different looks, in a matter of minutes.

Creating these virtual try-on experiences requires enterprises to adopt emerging technologies around Augmented Reality. Complementary to this will be technology transformations around the content management system that powers these experiences.


Customers expect to conduct their complete journey with a brand over multiple interaction channels. And you are expected to keep track of these channels and ensure that customers can pick up their journey on any channel, right where they left off last time. An integrated, omnichannel approach should be a given for your digital customer experience.

Ensuring that all your customer interaction channels are integrated and deliver a holistic experience is driven solely by technology.


Content as a Service

When your customers are interacting with your brand across multiple channels, you want your communications with them to be consistent. If your brand promotes a retail offer with a Twitter update, it needs to be showcased on your website as well. If a media and publishing brand delivers information on the website, the same should be available to users on the app, on a tablet, and on their smart home devices. And this needs to be done without your team having to manually reformat content for these different channels.

What you need to make this happen is Content as a Service (CaaS): essentially, a technology solution that ensures content is collated in a single repository where you can create, manage, categorize, and search it, and make it available to other systems. It involves creating a content platform architecture that keeps content separate from its presentation. That requires a lot of transformation in your technology architecture - from a decoupled content management system to component-based development - to empower your omnichannel delivery with CaaS.
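Separating content from presentation is easier to picture with a small example. The sketch below stores one content item as plain structured data and renders it for three channels; the item, its field names, and the renderer functions are hypothetical and stand in for whatever a real content platform would provide.

```python
import json
from html import escape

# A single content item, stored once and without any presentation markup.
offer = {
    "title": "Weekend Sale",
    "body": "20% off all shoes",
    "url": "https://example.com/sale",
}


def render_web(item):
    """Render the item as an HTML fragment for the website."""
    return f'<article><h2>{escape(item["title"])}</h2><p>{escape(item["body"])}</p></article>'


def render_chat(item):
    """Render the item as a one-line message for a chatbot or social update."""
    return f'{item["title"]}: {item["body"]} ({item["url"]})'


def render_api(item):
    """Expose the raw item as JSON for apps and other systems."""
    return json.dumps(item, sort_keys=True)


for render in (render_web, render_chat, render_api):
    print(render(offer))
```

Because each channel reads from the same item, a change to the offer is made once and appears everywhere, with no manual reformatting.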


Innovative, omnichannel digital customer experiences are complex solutions with a lot of moving parts. But for the customer, these have to be extremely intuitive. They should work exactly how the customer expects them to, and give them what they want, even before they know they want it.

But this doesn’t just happen magically. Intuitive digital customer experiences are driven by a mass of micro data points and technology solutions that can transform those into actionable insights. 

Transformations Needed

33% of consumers who ended their relationship with a company last year did so because the experience wasn’t personalized enough.

The fact that personalization is critical is not new to enterprises. But delivering personalization is dependent upon using the right technology elements:

Data Engineering Technology Stack

One of the key technology transformations required for personalization is the creation of an integrated data engineering tech stack. 

This would involve:

  • Identifying the disparate sources of data, and also classifying them as structured or unstructured data
  • Pulling this data from the different databases and cleaning it 
  • Bringing it all to a central repository where it can be processed into useful information and used to build detailed customer profiles
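The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the two sources, their field names, and the use of a lower-cased email address as the join key are all hypothetical, and every record is assumed to carry an email.

```python
from collections import defaultdict


def clean(record):
    """Drop empty fields and normalise the email used as the join key."""
    cleaned = {k: v for k, v in record.items() if v not in (None, "")}
    cleaned["email"] = cleaned["email"].strip().lower()
    return cleaned


def build_profiles(*sources):
    """Merge cleaned records from every source into one profile per customer."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            row = clean(record)
            profiles[row["email"]].update(row)
    return dict(profiles)


# Hypothetical extracts from a CRM and a web analytics tool.
crm = [{"email": " Ana@Example.com ", "name": "Ana", "city": ""}]
web = [{"email": "ana@example.com", "last_page": "/pricing"}]

profiles = build_profiles(crm, web)
print(profiles)
```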

Depending upon the existing technology stack of any enterprise, this could mean different things:

  • If you already have different tools and systems at play that collect, clean, and structure data, you would need to engineer them to work with a central repository.
  • If your disparate databases host a combination of structured and unstructured data, you would need to significantly overhaul your data collection, processing, and storage infrastructure.

Given how important data is in this whole personalization endeavour, you would want to get this part absolutely right.

After all, what’s worse than no personalization? Wrong personalization!

Flexible Content Management System

The first level of personalization always happens on the website or branded application. But if your current CMS is one where putting up a new page or changing the design is a week-long project, then you are definitely not ready to deliver personalized digital customer experience.

Beyond the content management system, you might also need to adopt additional personalization solutions like Acquia Lift, Acquia Journey, or other similar ones. These are solutions that work atop your existing CMS and allow you to personalize your digital properties and get better visibility into the customer journeys that happen across your brand touchpoints.


Your digital customer experience doesn’t become a self-sustaining and self-improving process unless customers interact with it. Successful DCX is heavily dependent on customers clicking, scrolling, asking or answering - performing actions on your digital channels. These interaction data points are what drive an increasingly improved digital customer experience.

But a prerequisite for this is a well-designed data architecture that is capable of collecting, processing and analyzing this data.

Currently, here’s what is possibly happening at your enterprise:

Customer interaction data - how they arrived on your website, which pages they browsed, what content they consumed, what they said about your brand on social media - is collected and stored by different systems. While page visits are usually sourced from the web analytics tool, you might have two or three different tools to gather user behavior data like page scrolls and CTA clicks. The CRM data is in a different database, the geo-location data is being picked up in real time, and the social media data is picked up by yet another tool.

Transformations Needed

Dedicated Data Lakes

If you are to make any sense of this vast volume of data, you have to be able to see it holistically, and that is not possible with customer data residing in disparate systems. So the first transformation here has to be a marketing data lake.

With a marketing data lake, enterprises can gather data from external and internal systems and drop it all in one place. The possibilities with this data can be at several levels:

  • Basic analytics for a comprehensive look into persona profiles and campaign performance
  • Leveraging data to form basic and advanced personalization and recommendation engines for users
  • Forming a 360-degree view of individual customers by pulling together information on customer journey, preferences, social media activity, sentiment analysis and more

All of this information feeds into your digital customer experience to make it easier for your customers to interact with you, find what they need, and receive hyper-personalized, low-friction experiences across your digital properties.

Artificial Intelligence and Machine Learning

Beyond using the prima facie insights, enterprises can have data scientists perform exploratory analysis on the wide spectrum of data available. They build statistical models based on machine learning algorithms to parse through this data and check whether any new customer behavior patterns and insights emerge. These can form the basis of creating new and more innovative experiences and bringing more intuitive workflows into the system.

Best Western’s AI-powered ads, on the other hand, reflect a different use case: leveraging artificial intelligence for travel personalization. The hotel chain is combining its CRM data with AI-based recommendation engines to predict which destinations customers will prefer. So whenever someone comes across one of its ads, they are asked a few questions about their travel plans, and the ad gives them personalized recommendations.

Here AI is being used to understand customer wants, and then look through a large database of recommendations to pick options that correspond to those wants. And all of this is happening in real time. This kind of scaled personalization, with recommendation engines capable of delivering contextual answers, is impossible to deliver without the right technology.

As you can see, enhanced digital customer experiences that consistently showcase your brand and your business to your customers do not happen without significant digital transformation. For enterprises, it is critical to start evaluating how they want to roll out new DCX strategies, and identifying the specific technology transformations and expertise needed to make those possible.

Srijan is working with global enterprises across industries to create rich, immersive, well-crafted digital experiences that work exactly as your customers expect them to. Our expert teams engage with your business goals and key stakeholders to craft digital experience solutions tailor-made to your requirements.

Going beyond the ordinary with your digital customer experience game? Let's start the conversation on how Srijan can help.

Topics: Customer Experience, Digital Experience

AWS Glue: Simple, Flexible, and Cost-effective ETL For Your Enterprise

Posted by Gaurav Mishra on Oct 31, 2019 6:28:00 PM

An Amazon solution, AWS Glue is a fully managed extract, transform, and load (ETL) service that allows you to prepare your data for analytics. Using the AWS Glue Data Catalog gives a unified view of your data, so that you can clean, enrich and catalog it properly. This further ensures that your data is immediately searchable, queryable, and available for ETL.

It offers the following benefits:

  • Less Hassle: Since AWS Glue is integrated across a wide range of AWS services, it natively supports data stored in Amazon Aurora, Amazon RDS engines, Amazon Redshift, and Amazon S3, as well as common database engines and databases in your Amazon VPC. This leads to reduced hassle while onboarding.
  • Cost Effectiveness: AWS Glue is serverless, so there are no compute resources to configure and manage. Additionally, it handles provisioning, configuration, and scaling of the resources required to run your ETL jobs on a fully managed, scale-out Apache Spark environment. This is quite cost effective as you pay only for the resources used while your jobs are running.
  • More Power: AWS Glue automates much of the effort spent in building, maintaining, and running ETL jobs. It crawls your data sources, identifies data formats, and suggests schemas and transformations. It even automatically generates the code to execute your data transformations and loading processes.

AWS Glue helps enterprises significantly reduce the cost, complexity, and time spent creating ETL jobs. Here’s a detailed look at why you should use AWS Glue:

Why Should You Use AWS Glue?

AWS Glue brings with it the following unmatched features that provide innumerable benefits to your enterprise:

Integrated Data Catalog

AWS Glue consists of an integrated Data Catalog which is a central metadata repository of all data assets, irrespective of where they are located. It contains table definitions, job definitions, and other control information that can help you manage your AWS Glue environment. 

Using the Data Catalog can help you automate much of the undifferentiated heavy lifting involved in cleaning, categorizing or enriching the data, so you can spend more time analyzing the data. It computes statistics and registers partitions automatically so as to make queries against your data both efficient and cost-effective.

Clean and Deduplicate Data

You can clean and prepare your data for analysis by using an AWS Glue Machine Learning Transform called FindMatches, which enables deduplication and finding matching records. And you don’t need to know machine learning to be able to do this. FindMatches will just ask you to label sets of records as either “matching” or “not matching”. Then the system will learn your criteria for calling a pair of records a “match” and will accordingly build an ML Transform. You can then use it to find duplicate records or matching records across databases.

Automatic Schema Discovery

AWS Glue crawlers connect to your source or target data store and progress through a prioritized list of classifiers to determine the schema for your data. They then create metadata and store it in tables in your AWS Glue Data Catalog. The metadata is used in the authoring process of your ETL jobs. To make sure that your metadata is up to date, you can run crawlers on a schedule, on demand, or trigger them based on an event.
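To give a feel for what determining a schema involves, here is a toy inference routine. It is not the logic of AWS Glue's classifiers; the type names, the fall-back-to-string rule, and the sample rows are assumptions chosen to keep the sketch short.

```python
def infer_type(value):
    """Map a Python value to a simple column type name."""
    if isinstance(value, bool):  # checked first because bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(records):
    """Infer one type per column, falling back to string on disagreement."""
    schema = {}
    for record in records:
        for column, value in record.items():
            found = infer_type(value)
            if schema.get(column, found) != found:
                found = "string"
            schema[column] = found
    return schema


# Hypothetical rows; the second one has a non-numeric price.
rows = [
    {"id": 1, "price": 9.5, "sku": "A1"},
    {"id": 2, "price": "n/a", "sku": "B2"},
]
print(infer_schema(rows))
```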

Code Generation

AWS Glue can automatically generate code to extract, transform, and load your data. You simply point AWS Glue to your data source and target, and it will create ETL scripts to transform, flatten, and enrich your data. The code is generated in Scala or Python and written for Apache Spark.

Developer Endpoints

AWS Glue development endpoints enable you to edit, debug, and test the code that it generates for you. You can use your favorite IDE (Integrated development environment) or notebook. Or write custom readers, writers, or transformations and import them into your AWS Glue ETL jobs as custom libraries. You can also use and share code with other developers using the GitHub repository.

Flexible Job Scheduler

You can easily invoke AWS Glue jobs on schedule, on-demand, or based on an event. Or start multiple parallel jobs and specify dependencies among them in order to build complex ETL pipelines. AWS Glue can handle all inter-job dependencies, filter bad data, and retry jobs if they fail. Also, all logs and notifications are pushed to Amazon CloudWatch so you can monitor and get alerts from a central service.
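Dependencies among jobs boil down to an ordering problem. The sketch below is not AWS Glue's scheduler; it uses Python's standard `graphlib` module (Python 3.9 or later) to show how a valid run order can be derived from a dependency map. The job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs; each job maps to the set of jobs it depends on.
dependencies = {
    "load_warehouse": {"clean_orders", "clean_customers"},
    "clean_orders": {"extract_orders"},
    "clean_customers": {"extract_customers"},
}


def run_order(deps):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Jobs with no dependency on each other, such as the two extract jobs here, are the ones that could run in parallel.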

How Does It Work?

You are now familiar with the features of AWS Glue and the benefits it brings to your enterprise. But how should you use it? Surprisingly, creating and running an ETL job is just a matter of a few clicks in the AWS Management Console.

All you need to do is point AWS Glue to your data stored on AWS, and AWS Glue will discover your data and store the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL.

Here’s how it works:

  • Define crawlers to scan data coming into S3 and populate the metadata catalog. You can schedule this scanning at a set frequency or to trigger at every event
  • Define the ETL pipeline, and AWS Glue will generate the ETL code in Python
  • Once the ETL job is set up, AWS Glue manages its running on a Spark cluster infrastructure, and you are charged only when the job runs
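The first step, defining a scheduled crawler over an S3 path, can be sketched as the request you would send to AWS Glue. The helper only builds the dictionary, so it runs without AWS access; the crawler name, role ARN, database, and bucket are placeholders, and we assume the result would be passed to the Glue client's `create_crawler` call.

```python
def crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Build arguments for the AWS Glue create_crawler call."""
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        # Cron expression for scheduled runs, e.g. every day at 02:00 UTC.
        request["Schedule"] = schedule
    return request


# Placeholder values throughout; replace with your own resources.
request = crawler_request(
    "sales-crawler",
    "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "sales_db",
    "s3://example-bucket/sales/",
    schedule="cron(0 2 * * ? *)",
)
print(request)

# With AWS credentials configured, this could then be used as:
#   boto3.client("glue").create_crawler(**request)
```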

The AWS Glue catalog lives outside your data processing engines and keeps the metadata decoupled, so different processing engines can simultaneously query the metadata for their individual use cases. The metadata can also be exposed through an API layer built with Amazon API Gateway, with all catalog queries routed through it.

When to Use It?

What good is all this information about AWS Glue if you do not know where to put it to use? Here’s a look at some use case scenarios and how AWS Glue can make your work easier:

1. Queries Against an Amazon S3 Data Lake

Looking to build your own custom Amazon S3 data lake architecture? AWS Glue can make it possible immediately, by making all your data available for analytics even without moving the data. 

2. Analyze Log Data in Your Data Warehouse

Using AWS Glue, you can easily process all the semi-structured data in your data warehouse for analytics. It generates the schema for your data sets, creates ETL code to transform, flatten, and enrich your data, and loads your data warehouse on a recurring basis.

3. Unified View of Your Data Across Multiple Data Stores

AWS Glue Data Catalog allows you to quickly discover and search across multiple AWS data sets without moving the data. It gives a unified view of your data, and makes cataloged data easily available for search and query using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

4. Event-driven ETL Pipelines

AWS Glue can run your ETL jobs based on an event, such as getting a new data set. For example, you can use an AWS Lambda function to trigger your ETL jobs to run as soon as new data becomes available in Amazon S3. You can also register this new dataset in the AWS Glue Data Catalog as part of your ETL jobs.
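A Lambda function of this kind might look like the sketch below. The job name `example-etl-job` and the `--source_path` job argument are hypothetical. The Glue client is passed in as a parameter so the handler can be exercised locally with the small `FakeGlue` stand-in; in a real deployment the default `boto3` client would be used instead.

```python
def handler(event, context, glue_client=None, job_name="example-etl-job"):
    """Start one Glue job run for every S3 object reported in the event."""
    if glue_client is None:
        import boto3  # imported lazily so the sketch also runs without AWS access
        glue_client = boto3.client("glue")
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        response = glue_client.start_job_run(
            JobName=job_name,
            # Hypothetical job argument telling the ETL script what to process.
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


class FakeGlue:
    """Local stand-in for the Glue client, used only to try the handler out."""

    def __init__(self):
        self.calls = []

    def start_job_run(self, **kwargs):
        self.calls.append(kwargs)
        return {"JobRunId": f"jr_{len(self.calls)}"}


# Trimmed-down S3 event with placeholder bucket and key.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/orders.csv"}}}
    ]
}
fake = FakeGlue()
run_ids = handler(sample_event, None, glue_client=fake)
print(run_ids)
```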

So there you have it, a look at how AWS Glue can help you manage your data cataloguing process and automate your ETL pipeline.

Srijan is an advanced AWS Consulting Partner, and can help you utilize AWS solutions at your enterprise. To know more, drop us a line outlining your business requirements and our expert AWS team will get in touch.

Topics: AWS, Architecture

Refactoring applications for cloud migration: What, when and how

Posted by Gaurav Mishra on Sep 27, 2019 3:54:00 PM

Enterprises migrating their applications to the cloud often face the difficulty of finalizing an approach that is in line with their migration goals. Here are a bunch of questions that can help you in this:

  • What are your business goals?
  • What are your application capacities?
  • What is the estimated cost for your cloud migration process?

Answering these questions, and then selecting the most suitable cloud migration path, will go a long way towards the long-term success of the migration approach you choose.

In this post, we take a look at one of the most popular methods of cloud migration: Refactoring, what is it and when should you choose it?

What is refactoring migration?

Refactoring is the process of re-architecting your applications so that they are better suited to running on your cloud provider's infrastructure. This approach involves modifying your existing applications, or a large chunk of the codebase, in order to take better advantage of cloud-based features and the extra flexibility that comes with them.

Refactoring migration is found to be more complex than the other cloud migration approaches because while making application code changes, you must also ensure that it does not affect the external behavior of the application.

For example, if your existing application is resource-intensive because it involves big data processing or image rendering, it may lead to a larger cloud bill. In that case, redesigning the application for better resource utilization is required before moving to the cloud.

This approach is the most time-consuming and resource-intensive of all approaches, yet it can offer the lowest monthly spend in comparison. Let's take a look at its benefits and limitations:

Benefits of Refactoring

Most benefits of refactoring are delivered in the future. They include:

  • Long-term cost reduction: The refactoring approach reduces costs over time by matching resource consumption with demand and eliminating waste. This results in a better and more lasting ROI compared to less cloud-native applications.

  • Increased resilience: By decoupling the application components and wiring together highly-available and managed services, the application inherits the resilience of the cloud.

  • Responsive to business events: Using this approach enables the applications to leverage the auto-scaling features of cloud services that scale up and down according to demand.

Limitations of Refactoring

The disadvantages of this approach include:

  • Vendor lock-in: The more cloud-native your application is, the more tightly it is coupled to the cloud you are in.

  • Skills: Refactoring is not for beginners. It requires the highest level of application, automation and cloud skills and experience.

  • Time-consuming: Because refactoring is resource-intensive, and much more complicated in terms of changing from a non-cloud application to a cloud-native application, it can take a lot of time to complete.

  • Getting it wrong: Refactoring requires changing everything about the application, so it has the maximum probability of things going sideways. Each mistake can cause delays, cost escalations and potential outages.


Refactoring is a complex process, but it is well worth the improvement that you get in return. Some companies even go as far as refactoring only parts of their business solutions to make the whole process more manageable. This compartmentalization, however, could also make the refactor longer and more resource-intensive.

When to choose refactoring?

Now that you are aware of the advantages and limitations associated with the refactoring approach, the next step is to identify when you should choose it. Take a look:

1. Enterprise wants to tap the cloud benefits

Does your business have a strong need to add features, scale, or performance? If so, refactoring is the best choice for you. Exploiting the cloud features will give you benefits that are otherwise difficult to achieve in an existing non-cloud environment. 

2. Scaling up or restructuring code

Is your organization looking to scale an existing application or restructure its code? You can take full advantage of cloud capabilities by migrating via the refactoring process.

3. Boost agility

If your organization is looking to boost agility or improve business continuity by moving to a service-oriented architecture, then this strategy may be worth pursuing. And that’s despite the fact that it is often the most expensive solution in the short-medium term.

4. Efficiency is a priority

Refactoring has the promise of being the most efficient cloud model because your application is cloud-native, and will exploit continuous cloud innovation to benefit from reducing costs and improvements in operations, resilience, responsiveness and security.

How to refactor?

Now that you know when to choose refactoring, the next question is how. There are, in general, four ways to refactor your applications for the cloud.

1. Complete Refactoring

In this type, 50% of the code is changed and the database is updated to utilize as many cloud-native features as required by the application. This strategy can improve performance, operations costs and IT teams' ability to meet the needs of the business. On the downside however, the process could be too costly or complex, and can introduce bugs.

2. Minimum Viable Refactoring

This requires only slight changes in the application, and is therefore, both quick and efficient. Users who take this approach often incorporate cloud-native security, management and perhaps a public cloud database into their migrated workload.

3. Containerization Refactoring

In this, applications are moved into containers with minimal modifications. The applications exist within the containers, which enables users to incorporate cloud-native features and improve portability. 

This approach tends to be more complex because of the learning involved in adapting to new tools. But that is offset by the popularity of containers and their growing ecosystems, which continue to bring down costs and refactoring times.

4. Serverless Application Refactoring

This approach has similar issues to containerization, as it changes the development and operations platform, which requires learning new tools and skills. Some modifications are required to make the application work effectively and take advantage of serverless systems on the public cloud.

Unlike containers however, serverless platforms don't provide portability, so lock-in is a major downside to this approach.

You can refactor your applications in any of these ways, but it is advisable to use Minimum Viable Refactoring for most of them. Refactoring is a highly variable activity that depends on the complexity of the current application, and during the discovery and assessment process it’s not possible to predict exactly how long an application refactor will take. It could be around three to six months per application, depending on complexity and previous experience.

Hence, a targeted timeline, refactoring in parts, and checking progress against collected data are some of the best practices to keep in mind when taking up the refactoring approach to cloud migration. For these reasons, this approach is chosen by the few enterprises that have the time, money, and resources for it.

Looking to shift business-critical applications to or even between clouds? Just drop us a line and our expert team will be in touch.

Topics: Cloud, Architecture

Setting up a Data Lake architecture with AWS

Posted by Gaurav Mishra on Aug 27, 2019 5:50:00 PM

We’ve talked quite a bit about data lakes in the past couple of blogs. We looked at what is a data lake, data lake implementation, and addressing the whole data lake vs. data warehouse question. And now that we have established why data lakes are crucial for enterprises, let’s take a look at a typical data lake architecture, and how to build one with AWS. 

Before we get down to brass tacks, it’s helpful to quickly list the specific benefits that we want an ideal data lake to deliver. These would be:

  • The ability to collect any form of data, from anywhere within an enterprise’s numerous data sources and silos. From revenue numbers to social media streams, and anything in between.
  • Reduced effort to analyze or process the same data set for different purposes by different applications.
  • A cost-efficient operation, with the ability to scale storage and compute capacities as required and independently of each other.


And with those requirements in mind, let’s see how to set up a data lake with AWS.

Data Lake Architecture

A typical data lake architecture is designed to:

  • take data from a variety of sources
  • move it through some sort of processing layer
  • make it available for consumption by different personas within the enterprise

So here, we have some key parts of the architecture to consider:

Landing zone: This is the area where all the raw data comes in, from all the different sources within the enterprise. The zone is strictly meant for data ingestion, and no modelling or extraction should be done at this stage.

Curation zone: Here’s where you get to play with the data. The entire extract-transform-load (ETL) process takes place at this stage, where the data is crawled to understand what it is and how it might be useful. The creation of metadata, or applying different modelling techniques to it to find potential uses, is all done here.

Production zone: This is where your data is ready to be consumed by different applications, or to be accessed by different personas.
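
One way to make the three zones concrete is to give each its own key prefix in object storage. The sketch below is only an illustration: the prefix names (`landing`, `curated`, `production`) and the date-partitioned key layout are assumptions made for this example, not something prescribed by AWS.

```python
from datetime import date

# Hypothetical zone prefixes; the names are assumptions for illustration.
ZONES = ("landing", "curated", "production")


def object_key(zone: str, source: str, day: date, filename: str) -> str:
    """Build a date-partitioned object key for a file in a given zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return f"{zone}/{source}/{day:%Y/%m/%d}/{filename}"


def promote(key: str, target_zone: str) -> str:
    """Return the key the same object would have in another zone."""
    if target_zone not in ZONES:
        raise ValueError(f"unknown zone: {target_zone!r}")
    _, rest = key.split("/", 1)  # drop the current zone prefix
    return f"{target_zone}/{rest}"
```

Keeping the rest of the key identical across zones makes it easy to trace a curated or production object back to the raw file it came from.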

Data Lake architecture with AWS

With our basic zones in place, let’s take a look at how to create a complete data lake architecture with the right AWS solutions. Throughout the rest of this post, we’ll try to bring in as many AWS products as are applicable in any scenario, but focus on a few key ones that we think bring the best results.

Landing Zone - Data Ingestion & Storage

For this zone, let’s first look at the available methods for data ingestion:

  • AWS Direct Connect: Establish a dedicated connection between your premises or data centre and the AWS cloud for secure data ingestion. Using industry-standard 802.1Q VLANs, AWS Direct Connect offers a more consistent network connection for transmitting data from your on-premise systems to your data lake.
  • S3 Accelerator: Another quick way to enable data ingestion into an S3 bucket is to use Amazon S3 Transfer Acceleration. With this, your data is sent to the nearest of the globally distributed edge locations, and then routed to your S3 bucket via an optimized and secure pathway.
  • AWS Snowball: You can securely transfer huge volumes of data onto the AWS cloud with AWS Snowball. It’s designed for large-scale data transport and can cost as little as one-fifth of transferring the same data via high-speed internet. It’s a great option for transferring voluminous data assets like genomics, analytics, image or video repositories.
  • Amazon Kinesis: Equipped to handle massive amounts of streaming data, Amazon Kinesis can ingest, process and analyze real-time data streams. The entire infrastructure is managed by AWS so that it’s highly efficient and cost-effective. You have:
    • Kinesis Data Streams: Ingest real-time data streams into AWS from different sources and create arbitrary binary data streams that are replicated across multiple Availability Zones by default.
    • Kinesis Firehose: You can capture, transform, and quickly load data onto Amazon S3, Redshift, or Elasticsearch with Kinesis Firehose. The AWS managed system autoscales to match your data throughput, and can batch, compress and encrypt data to minimize storage costs.
    • Kinesis Data Analytics: One of the easiest ways to analyze streaming data, Kinesis Data Analytics lets you pick any streaming source, analyze it, and push the results out to another data stream or Firehose.
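
To give a feel for streaming ingestion, here is a minimal sketch of sending one event to Kinesis Data Streams. It assumes the caller passes in a boto3 Kinesis client; the stream name, event fields, and the helper names `build_record` and `send_event` are hypothetical and exist only for this example.

```python
import json


def build_record(event: dict, partition_field: str) -> dict:
    """Build the Data and PartitionKey arguments for a PutRecord call."""
    return {
        # Kinesis treats the payload as opaque bytes.
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[partition_field]),
    }


def send_event(client, stream_name: str, event: dict, partition_field: str):
    """Send one event through a Kinesis client supplied by the caller."""
    return client.put_record(
        StreamName=stream_name, **build_record(event, partition_field)
    )
```

With boto3 installed and credentials configured, `client` would typically be `boto3.client("kinesis")`. Passing the client in keeps the sketch easy to try out with a stand-in object.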

Storage - Amazon S3

One of the most widely used cloud storage solutions, Amazon S3 is perfect for data storage in the landing zone. S3 is a region-level storage option that spans multiple Availability Zones. It’s a highly scalable object storage solution designed for 99.999999999% durability.

But capacity aside, Amazon S3 is suitable for a data lake because it allows you to set a lifecycle for data to move through different storage classes:

  • Amazon S3 Standard to store hot data that is being immediately used across different enterprise applications
  • Amazon S3 Infrequent Access to hold warm data that is accessed less often across the enterprise but needs to be available rapidly whenever required.
  • Amazon S3 Glacier to archive cold data at a very low cost as compared to on premise storage.
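
The hot, warm, and cold tiers above can be expressed as an S3 lifecycle configuration. The sketch below only builds the configuration dictionary; the rule ID, prefix, and day thresholds are assumptions for illustration and should be tuned to your own access patterns.

```python
def lifecycle_rules(rule_id: str, prefix: str,
                    warm_after_days: int, cold_after_days: int) -> dict:
    """Build a lifecycle configuration that tiers objects under a prefix."""
    # S3 expects objects to stay in Standard for at least 30 days
    # before a lifecycle rule moves them to Standard-IA.
    if warm_after_days < 30:
        raise ValueError("warm_after_days must be at least 30")
    if cold_after_days <= warm_after_days:
        raise ValueError("cold_after_days must be greater than warm_after_days")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

The resulting dictionary is in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.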

Curation Zone - Catalogue and Search

Because information in the data lake is in the raw format, it can be queried and utilized for multiple different purposes, by different applications. But to make that possible, usable metadata that reflects technical and business meaning also has to be stored alongside the data. This means you need to have a process to extract metadata, and properly catalogue it.

The metadata contains information on the data format, the security classification (sensitive, confidential, etc.), and additional tags (source of origin, department, ownership, and more). This allows different applications, and even data scientists running statistical models, to know what is being stored in the data lake.

Data Lake Architecture - populating metadata

Source: Screengrab from "Building Data Lake on AWS", Amazon Web Services, Youtube

The typical cataloguing process involves lambda functions written to extract metadata, which get triggered every time an object enters Amazon S3. This metadata is stored in a SQL database and uploaded to Amazon Elasticsearch Service to make it available for search.
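
As a rough illustration of the extraction step, the sketch below shows a Lambda-style handler that reads an S3 event notification and collects basic technical metadata for each new object. The output field names are assumptions for this example, and writing the result to a database or search index is deliberately left out.

```python
import posixpath
from urllib.parse import unquote_plus


def extract_metadata(event: dict) -> list:
    """Collect basic technical metadata from an S3 event notification."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        extension = posixpath.splitext(key)[1].lstrip(".").lower()
        items.append({
            "bucket": s3["bucket"]["name"],
            "key": key,
            "size_bytes": s3["object"].get("size", 0),
            "format": extension or "unknown",
            "ingested_at": record.get("eventTime"),
        })
    return items


def handler(event, context):
    """Lambda entry point; persisting the metadata is out of scope here."""
    return extract_metadata(event)
```

Business metadata such as department or ownership is not part of the event itself, so in practice it would have to come from object tags or a naming convention.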

AWS Glue is an Amazon solution that can manage this data cataloguing process and automate the extract-transform-load (ETL) pipeline. The solution runs on Apache Spark and maintains Hive-compatible metadata stores. Here’s how it works:

  • Define crawlers to scan data coming into S3 and populate the metadata catalog. You can schedule this scanning at a set frequency or trigger it at every event.
  • Define the ETL pipeline, and AWS Glue will generate the ETL code in Python.
  • Once the ETL job is set up, AWS Glue manages its running on a Spark cluster infrastructure, and you are charged only when the job runs.
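
The first step, defining a crawler, could look roughly like this. The sketch builds the request for the Glue `create_crawler` call; the crawler name, IAM role, database name, and bucket path used in the usage note are placeholders, and the daily schedule is just one possible choice.

```python
from typing import Optional


def crawler_request(name: str, role_arn: str, database: str,
                    s3_path: str, hour_utc: Optional[int] = None) -> dict:
    """Build keyword arguments for the Glue create_crawler call."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must look like s3://bucket/prefix/")
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if hour_utc is not None:
        if not 0 <= hour_utc <= 23:
            raise ValueError("hour_utc must be between 0 and 23")
        # Six-field cron expression: run once a day at the given UTC hour.
        request["Schedule"] = f"cron(0 {hour_utc} * * ? *)"
    return request
```

With boto3 available, the request would be sent as `boto3.client("glue").create_crawler(**crawler_request(...))`. Leaving out `hour_utc` creates an on-demand crawler that you start yourself.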

The AWS Glue catalog lives outside your data processing engines and keeps the metadata decoupled, so different processing engines can simultaneously query the metadata for their individual use cases. The metadata can also be exposed through an API layer built with Amazon API Gateway, with all catalog queries routed through it.

Curation Zone - Processing

Once cataloging is done, we can look at data processing solutions, which can be different based on what different stakeholders want from the data.

Amazon Elastic MapReduce (EMR)

Amazon’s EMR is a managed Hadoop cluster that can process a large amount of data at low cost.

Typical data processing involves setting up a Hadoop cluster on EC2, setting up the data and processing layers, setting up a VM infrastructure, and more. However, this entire process can be easily handled by EMR. Once configured, you can spin up new Hadoop clusters in minutes. You can point them to any S3 bucket to start processing, and the cluster can be shut down once the job is done.

Data Lake Architecture - Amazon EMR Benefits-1

Source: Screengrab from "Building Data Lake on AWS", Amazon Web Services, Youtube

The primary benefit of processing with EMR rather than Hadoop on EC2 is the cost savings. With the latter, your data lies within the Hadoop processing cluster, which means the cluster needs to stay up even after the processing job is done. So you are still paying for it. With EMR, however, your data and processing layers are decoupled, allowing you to scale them independently of each other. So while your data resides in S3, your Hadoop clusters on EMR can be set up and stopped as required, making the cost of processing completely flexible. Costs can be lowered further by running the clusters on Amazon EC2 Spot Instances.
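
A bit of back-of-the-envelope arithmetic shows why decoupling matters. The sketch below compares an always-on cluster with a transient one that only exists while jobs run. The node count, hourly rate, and job durations in the usage note are made-up numbers, and S3 storage costs are ignored for simplicity.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def always_on_cost(nodes: int, hourly_rate: float) -> float:
    """Monthly cost of a cluster that runs around the clock."""
    return nodes * hourly_rate * HOURS_PER_MONTH


def transient_cost(nodes: int, hourly_rate: float,
                   job_hours: float, runs_per_month: int) -> float:
    """Monthly cost of a cluster that is created per job and then terminated."""
    return nodes * hourly_rate * job_hours * runs_per_month
```

For example, with 10 nodes at a hypothetical 0.25 per node-hour, `always_on_cost(10, 0.25)` comes to 1825.0, while a two-hour job run 30 times a month costs `transient_cost(10, 0.25, 2, 30)`, which is 150.0.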

Amazon Elasticsearch

This is another scalable, managed search cluster service that can be easily integrated with other AWS services. It’s best for log analytics use cases.

Amazon Redshift

If you have a lot of BI dashboards and applications, Amazon Redshift is a great processing solution. It’s inexpensive, fully managed, and ensures security and compliance. With Redshift, you can spin up a cluster of compute nodes to simultaneously process queries.

This processing stage is also where enterprises can set up their sandbox. They can open up the data lake to data scientists to run preliminary experiments. Because data collection and acquisition is now taken care of, data scientists can focus on finding innovative ways to put the raw data to use. They can bring in open-source or commercial analytics tools to create required test beds, and work on creating new analytics models aligned with different business use cases.

Production Zone - Serve Processed Data

With processing done, the data lake is now ready to push out data to all necessary applications and stakeholders. So you can have data going out to legacy applications, data warehouses, BI applications and dashboards. This data can be accessed by analysts, data scientists, business users, and other automation and engagement platforms.

So there you have it, a complete data lake architecture and how it can be set up with the best-of-breed AWS solutions.

Looking to set up an optimal data lake infrastructure? Talk to our expert AWS team, and let’s find out how Srijan can help.

Topics: AWS, Architecture

3 Key challenges to AI adoption and how to solve them

Posted by Gaurav Mishra on Aug 21, 2019 3:13:00 PM

While the potential ROI from investing in artificial intelligence and machine learning is abundantly clear to enterprise leaders, the actual adoption of the technology has been slower than you would believe. Yes, some very prominent brands across industries are leveraging it to drive significant revenues. But a 2018 survey by IDG finds that only 1 out of 3 AI projects across organizations actually succeed.

Here's a look at the three key challenges that enterprises are faced with on the AI adoption curve and how to get past them.

1. The Strategic Challenges: Use Cases and Ownership

Choosing the Right "First" Problem to Solve

Investing in AI/ML is a big decision for most enterprises, and there is the expectation of seeing some significant ROI from it within the first six to eight months. For that to happen, it’s important to choose the right business use case to optimize with AI/ML.

A lot of enterprises in the initial stages make the mistake of leveraging the technology for smaller fringe projects or choose projects where one can see some ready data available that is clean and categorized. However, while choosing a "small" first project is a good idea, ROI can be showcased only if the project is a key part of the core business.

How to Solve

So your basic sample set of use cases to choose from should comprise only tasks that are part of your key enterprise revenue streams. Out of these, identifying the right use case to test and showcase the power of AI/ML solutions can be done by answering a few guiding questions:

  • Which tasks involve making decisions based on searching and analyzing a huge amount of data — historical or real time? For example, customer service, customer experience personalization, and analyzing sales data.
  • Which tasks have a low tolerance for errors? For example, sensitive manufacturing processes or verification workflows that ensure compliance with various industry standards.
  • Which tasks are repetitive and time and effort intensive, but have to be performed at scale nevertheless? For example, identifying and tracking counterfeit products.

Any task that meets one or more of the above criteria makes a great use case for testing out your first AI/ML solution. That’s because, in each of these cases, a successful solution deployment will lead to very tangible benefits: better decision making, near-zero errors, or a reduction in operating costs by eliminating resources employed in repetitive processes.

Yes, you will need to have the right type of data in place to even begin creating an AI-powered solution, and several other factors have to fall in place for it to be successful. However, you’ll at least have ensured that when the solution does succeed, it’s compelling enough for stakeholders to take notice.

Owning the AI/ML Adoption Piece

The PwC Digital IQ report points out that of late, 68 percent of an enterprise’s technology expenditure is outside the CTO's budget. This means that business stakeholders across marketing, sales, HR, accounting, etc. are independently investing in technology solutions. And there are high chances that some of these are AI/ML-based.

So, you have the emerging tech being used in some aspect of the business without the core IT team or the larger enterprise being aware of it. Any significant benefits accruing out of these solutions are also not being shared across the organization.

Additionally, new stakeholders within the enterprise who partly own certain technology pieces like the Chief Digital Officer or Head of Data and Analytics also have significant say in the AI/ML adoption conversation.

These factors combined mean that there is no single person owning the AI/ML adoption piece within the enterprise. Whether this is due to lack of any ownership or because of multiple deciding voices, the result is the same — a bottleneck in effectively rolling out solutions based on these emerging technologies.

2. The Technology Challenges: Data and IT Infrastructure

Where Is the Data?

AI/ML solutions cannot be created without data, a lot of data that is clear, correct, structured, and accessible. However, while enterprises do have volumes of data, it often falls short in terms of the other characteristics. A few common challenges seen in this regard are:

  • Data is collected in silos across different business functions — collected in various formats and stored across different databases. And while that’s ok, the problem is the absence of a single unified repository from where this data can be accessed.
  • Unstructured data, meaning a mass of data points with no explanation as to what they represent. This lack of context and categorization makes the data unusable for machine learning. If there are no markers for what an ML algorithm is supposed to learn from this data, there is no solution to be created.
  • Missing or incomplete data, i.e., information is available for all parameters in some cases but missing for certain parameters in other cases within the same data set. These inconsistencies result in skewed or faulty learning, which ultimately leads to failed solutions.
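
A quick way to find out how serious the missing-data problem is in a given data set is to measure it per field. The sketch below is a minimal example that treats `None` and empty strings as missing; the records and field names in the usage note are invented for illustration.

```python
def missing_rates(records: list, fields: list) -> dict:
    """Return the share of records in which each field is missing or empty."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    rates = {}
    for field in fields:
        # A field counts as missing if it is absent, None, or an empty string.
        missing = sum(1 for row in records if row.get(field) in (None, ""))
        rates[field] = missing / total
    return rates
```

For instance, given four customer records where `region` is blank in one and absent in another, `missing_rates(records, ["region"])` reports 0.5 for that field, which is a signal to fix collection before any training starts.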

How to Solve

AI/ML solutions designed specifically for decision making, analytics, predictive maintenance, etc. require complete and clean datasets for effective learning.

So, it is critical to map out the complete range of data required to create such a solution for the chosen business use case. If enterprises have this ready, great. If not, they should take the time to harness, clean, and prepare the right datasets so as to ensure success.

Besides this, implementing a data lake architecture, with well-designed landing, curation and processing stages, will go a long way in ensuring that you are leveraging the full gamut of your enterprise data.



The IT Infrastructure Is Outdated and Unprepared

A McKinsey report finds that organizations that are ahead in their digital transformation journey are also the ones successfully adopting AI solutions. This means that an outdated IT infrastructure with clunky legacy systems is a core challenge to AI/ML adoption.

How to Solve

Given how important data is in this whole scheme of things, the data engineering tech stack has to be the best you can get. This means putting solutions in place that can:

  • Identify the disparate sources of data and also classify them as structured or unstructured data
  • Pull this data from the different databases and clean it as per the requirements of the project
  • Bring it all to a central repository where it can be processed into useful information and training datasets for ML algorithms

Depending on the existing technology stack of any enterprise, this could mean different things:

  • If you already have different tools and systems at play that collect, clean, and structure data, you would need to engineer them to work with a central repository.
  • Solve challenges around storing massive volumes of training and generated data, with data lakes, cloud computing, and edge computing.
  • If your disparate databases host a combination of structured and unstructured data, you would need to significantly overhaul your data collection, processing, and storage infrastructure.

Going beyond that, enterprises also have to prepare hardware and software assets that can effectively deliver AI/ML-based solutions.

3. The Resource Challenge: Skilled Teams

Finally, successful AI/ML adoption depends on having a skilled team of professionals who can work to create the right solutions. But because the explosion of practical applications of AI is only a decade old, it’s difficult to find people with the right set of skills. According to the State of Artificial Intelligence report 2017 by Teradata, 34 percent of enterprises state that lack of talent is a key barrier to AI adoption.

How to Solve

The solution here would tie in with the need for a single stakeholder to own AI/ML adoption within the enterprise. They can identify the data science and allied skills gap within the organization with reference to the goals they are trying to achieve and then work with the relevant departments to hire or build the right skills into the organization.

In the meantime, it would be best to work with technology partners who offer skilled AI/ML teams. That’s because:

  • A skilled team taking on a first AI project creates a reliable process roadmap for the internal teams to follow once they are in place.
  • Any roadblocks that crop up in the course of the project are more easily resolved by an expert team, without resorting to too much trial and error.
  • An outside team will be able to get a bird’s-eye view of the complete technology infrastructure and suggest necessary changes in one go.

Almost every enterprise, irrespective of the industry, will have AI/ML solutions as a key part of their technology landscape over the next five years. However, to ensure that the emerging tech is actually delivering on its potential, a well-planned adoption plan will be critical. This will include tying in the strategy, technology, and resource perspectives to roll out a roadmap that an enterprise will follow. The omission of any one of these aspects could delay successful AI/ML adoption and could hurt the bottom line, especially considering the fact that your competitors could be doing it better and faster.

As an Advanced Consulting Partner in the Amazon Web Services Partner Network, Srijan has certified AWS professionals who work with AI solutions like Lex, SageMaker, DeepLens and more. Srijan teams are also adept at leveraging TensorFlow, Python, Hadoop and other associated technologies to engineer niche solutions.

Ready to explore opportunities with AI and machine learning? Book a consultation with our experts.

Topics: Machine Learning & AI

How to conduct AWS cost optimization of your workload

Posted by Gaurav Mishra on Jul 30, 2019 12:04:00 PM

Your enterprise operates on the consumption-based model of AWS, but is your setup fully cost-optimized? Are you able to best utilize your resources, achieve an outcome at the lowest possible price point, and meet your functional requirements?

If not, you are underutilizing the capabilities of your AWS cloud.

AWS offers several services and pricing options that give your enterprise the flexibility to manage costs while keeping performance on par. And while it is relatively easy to optimize costs in small environments, scaling successfully across large enterprises requires you to follow certain operational best practices and automate processes.

Here’s a look at the six AWS cost optimization pillars to follow, regardless of your workload or architecture:

Right size your services

AWS gives you the flexibility to adapt your services to meet your current business requirements. It also allows you to shift to new service options when your demands change, so you can address new business needs at any time without penalties or incidental fees.

Thus, through right sizing, you can:

  • use the lowest cost resource that still meets the technical specifications of a specific workload

  • adjust the size of your resources to optimize for costs

  • meet the exact capacity requirements you have without having to overprovision or compromise capacity. This allows you to optimize your AWS workload costs.

Amazon CloudWatch and Amazon CloudWatch Logs are key AWS services that support a right-sizing approach, and allow you to set up monitoring in order to understand your resource utilization.
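
Once utilization metrics are being collected, a simple rule can turn them into a right-sizing hint. The sketch below assumes you already have a list of CPU utilization percentages for an instance (for example, exported from CloudWatch). The 20% and 80% thresholds and the use of the 95th percentile are assumptions for illustration, not AWS recommendations.

```python
def right_size(cpu_samples: list, low: float = 20.0, high: float = 80.0) -> str:
    """Suggest 'downsize', 'upsize', or 'keep' from CPU utilization samples."""
    if not cpu_samples:
        raise ValueError("at least one utilization sample is required")
    ordered = sorted(cpu_samples)
    # Nearest-rank 95th percentile, using integer math to avoid rounding drift.
    index = (95 * len(ordered) + 99) // 100 - 1
    p95 = ordered[index]
    if p95 < low:
        return "downsize"
    if p95 > high:
        return "upsize"
    return "keep"
```

Looking at a high percentile rather than the average keeps the rule from recommending a smaller instance for a workload that is mostly idle but has regular peaks.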

Appropriately provisioned 

AWS Cloud gives you the ability to modify the attributes of your AWS managed services, in order to ensure there is sufficient capacity to meet your needs. You can turn off resources when they are not being used, and provision systems based on the requirements of your service capacity.

As a result, your excess capacity is kept to a minimum and performance is maximized for end users. This also helps optimize costs to meet your dynamic needs.

AWS Trusted Advisor helps monitor services such as Amazon Redshift and Amazon RDS for resource utilization and active connections, while the AWS Management Console lets you modify the attributes of AWS services and align resources with changing demand. Amazon CloudWatch is also a key AWS service that supports an appropriately provisioned approach, by enabling you to collect and track usage metrics.

Leverage the right pricing model

AWS provides a range of pricing models: On-Demand and Spot Instances for variable workloads, and Reserved Instances for predictable workloads. You can choose the right pricing model as per the nature of your workload to optimize your costs.

1. On Demand Instances

With On-Demand Instances, you pay for compute capacity per hour or per second, depending on which instances you run. No long-term commitments or upfront payments are needed. These instances are recommended for applications with short-term or unpredictable workloads that cannot be interrupted.

For example, in using resources like DynamoDB on demand, there is just the flat hourly rate, and no long-term commitments.

2. Spot Instances

A Spot Instance is spare EC2 capacity offered at a discount. You request it and, optionally, set the maximum price you are willing to pay; the instance launches when capacity is available and the current Spot price (which adjusts gradually based on longer-term supply and demand) is at or below that maximum. AWS can reclaim the instance, with a two-minute warning, when it needs the capacity back or when the Spot price rises above your maximum price.

Spot Instances are often available at a discount, and using them can lower your operating costs by up to 90% compared to On-Demand instances. They are ideal for use cases like batch processing, scientific research, image or video processing, financial analysis, and testing.

3. Reserved Instances

Reserved Instances enable you to commit to a period of usage (one or three years) and save up to 75% over equivalent On-Demand hourly rates. They suit applications with predictable usage, and the savings come without requiring any change to your workload.

AWS Cost Explorer is a free tool for analyzing your costs. It helps you identify your spending on AWS resources, spot areas that need further analysis, and see trends that give you a better understanding of your costs.
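
To see how the three pricing models trade off, here is a simplified comparison. It assumes a reserved commitment is paid for every hour of the month whether the instance is used or not, while On-Demand and Spot are paid only for hours actually used. The hourly rate and discount figures in the usage note are placeholders; real discounts vary by instance type, region, and term.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_options(rate: float, hours_used: float, reserved_discount: float,
                    spot_discount: float, interruptible: bool) -> dict:
    """Estimate the monthly cost of each pricing model for one instance."""
    options = {
        "on_demand": rate * hours_used,
        # The reserved commitment is billed for the whole month.
        "reserved": rate * (1.0 - reserved_discount) * HOURS_PER_MONTH,
    }
    if interruptible:
        # Spot only makes sense if the workload tolerates interruption.
        options["spot"] = rate * (1.0 - spot_discount) * hours_used
    return options


def cheapest(rate: float, hours_used: float, reserved_discount: float,
             spot_discount: float, interruptible: bool) -> str:
    """Return the name of the lowest-cost pricing model."""
    options = monthly_options(rate, hours_used, reserved_discount,
                              spot_discount, interruptible)
    return min(options, key=options.get)
```

With a hypothetical rate of 0.10 per hour, a 40% reserved discount, and a 70% Spot discount, a server that runs all 730 hours comes out cheapest as `reserved`, a non-interruptible job that runs 200 hours as `on_demand`, and an interruptible 200-hour batch job as `spot`.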

Geographic selection

Another best practice to architect your solutions is to place your computing resource close to your users. This ensures lower latency, strong data sovereignty and minimizes your costs.

Every AWS region operates within local market conditions, with resource pricing different for each region. It is up to you to make the right geographic selection so that you can run at the lowest possible price globally.

AWS Simple Monthly Calculator can help you estimate the cost to architect your solution in various regions around the world and compare the cost of each. Simultaneously, using AWS CloudFormation or AWS CodeDeploy can help you provision a proof of concept environment in different regions, run workloads, and analyze the exact and complete system costs for each region.
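
Once you have cost estimates and measured latencies for a few candidate regions, the selection itself is a small filtering problem. In the sketch below, the data structure and field names (`latency_ms`, `monthly_cost`) are assumptions for this example, and the numbers in the usage note are invented rather than real AWS prices.

```python
def pick_region(candidates: list, max_latency_ms: float) -> str:
    """Pick the cheapest region whose latency to users is acceptable."""
    eligible = [c for c in candidates if c["latency_ms"] <= max_latency_ms]
    if not eligible:
        raise ValueError("no region meets the latency requirement")
    return min(eligible, key=lambda c: c["monthly_cost"])["region"]
```

For example, given estimates of 1200 for `us-east-1` at 180 ms, 1350 for `eu-west-1` at 40 ms, and 1500 for `eu-central-1` at 25 ms, a latency budget of 50 ms selects `eu-west-1`, while relaxing it to 200 ms selects `us-east-1`. Data sovereignty requirements would be an additional filter applied before cost.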

Managed services

Using AWS managed services will not only help remove much of your administrative and operational overheads, but also reduce the cost of managing your infrastructure. Since they operate at cloud scale, the cost per transaction or service is efficiently lowered. And using managed services also helps you save on the license costs.

AWS databases such as Amazon RDS and Amazon DynamoDB, along with Amazon Elasticsearch Service and Amazon EMR, are some of the key AWS services that support a managed approach. These services reduce the cost of capabilities and also free up time for your developers and administrators.

Optimize data transfer

Lastly, architecting for data transfer can help you optimize costs. This involves using content delivery networks to locate data closer to users (done effectively with Amazon CloudFront), or using dedicated network links from your premises to AWS (as done with AWS Direct Connect).

Using AWS Direct Connect can help reduce network costs, increase bandwidth, and provide a more consistent network experience than internet based connections.

Starting with these best practices early in your journey will help you establish the right processes and ensure success when you hit scale.

AWS provides a set of cost management tools out of the box to help you manage, monitor, and, ultimately, optimize your costs. Srijan is an AWS Advanced Consulting Partner, with AWS certified teams that have the experience of working with a range of AWS products and delivering cost-effective solutions to global enterprises.

Ready to build cloud-native applications with AWS? Just drop us a line and our expert team will be in touch.

Topics: AWS, Cloud

Event Sourcing Microservices and Deploying with Docker

Posted by Gaurav Mishra on Jul 22, 2019 11:47:00 AM

The microservices architecture, while the right choice for enterprises looking to build scalable, future-ready applications, also comes with a set of challenges. Moving from monolithic applications to a microservices-based architecture means dealing with a set of independent services that could number anywhere from ten to hundreds, depending upon the complexity of the application. And managing this distributed system is naturally more nuanced than doing so for an application that is packaged as a single unit.

The key challenges with a microservices architecture are:

  • Complexity in developing and deploying the microservices architecture with all its moving parts
  • Testing is complex owing to inter-service dependencies
  • Managing inter-service communication
  • Programming each service to respond to failure in other services
  • Ensuring database consistency even as each service ideally uses an independent database
  • Complexity in developing functions that span multiple services


This blog will concentrate on the best solutions for two of the challenges listed above, namely

  • Complexity of deploying a microservices architecture
  • Ensuring database consistency

We will also take a look at how to do this using AWS services.

Docker for Deploying Microservices

Deploying and orchestrating a host of different microservices to cohesively deliver an application experience is extremely complex. Add to it the fact that there are a few specific prerequisites for deploying these services:

  • Services must be deployed independent of and isolated from each other
  • Deployment process must be fast, if the application is to be truly scalable
  • Deployment process has to be viable, easily repeatable, and cost effective


Once you take all this into account, it might begin to look like you were better off with the monolithic architecture. But while progressing with monoliths simply has you banging against a wall of problems, the challenges with microservice deployment actually have a solution.

It’s Docker

How Docker helps

Microservices that make up an application can be written in different languages, and each service can have multiple instances that need to be deployed. With Docker:

  • Each service instance is hosted in a separate Docker container
  • These containers are self-contained packages that have the exact environment needed for the service to run uninterrupted. They can be hosted on any EC2 instance, moved around at will, and would still run perfectly.
  • Because they are lightweight, several of these containers can be hosted on a single virtual machine, making them extremely resource-efficient
  • The whole cycle, from building a container image and pushing it to a registry to launching a Docker container in a production environment, can be completed in under a minute

All of this put together not only makes a microservices-based application simpler to deploy and manage, but also keeps it highly available with minimal downtime.

A few other advantages of Docker are:

  • Manually setting up a new development environment with the exact configurations of your application can be extremely difficult. But with Docker Compose, replicating the infrastructure is as easy as deploying a configuration file
  • End-to-end testing of the entire application can be automated and made faster, with a Jenkins pipeline that tests every single container image that’s created, to ensure it’s working as it’s supposed to.

Ensuring Database Consistency by Event Sourcing Microservices

An optimal microservice architecture is one where each service is designed to be completely independent of the others. That is what keeps the entire application agile, scalable, and fail resistant. A key element of ensuring this independence is that each microservice has its own separate database. This keeps the services loosely coupled and prevents any coordination nightmares between different microservice teams.

However, in any application, there are scenarios where services need to access common data or access information from other services’ databases to fulfil a task.

For example, if a social network application has a ‘user profile’ service and a ‘social post’ service, they would need to access each other’s databases. Sharing a post is handled by the ‘social post’ service, but the action also has to be reflected in the user’s profile, with an increase in the number of posts. And for that to happen, the ‘user profile’ service will need to access the ‘social post’ service’s database.

So, how do you ensure that both databases remain consistent while still keeping them isolated from each other?

The answer is event sourcing microservices.

How event sourcing works

With event sourcing, the process becomes something like this:

  1. Service 1 completes a certain task and updates its database, like when the ‘social post’ service publishes a new post
  2. This creates a particular event denoting a change in state of the application. For example: 1 new post has been created
  3. This event is consumed by Service 2, like the ‘user profile’ service
  4. Based on this event, the Service 2 database is updated, let’s say with the new number of posts published by this user

In applications, this whole process happens by way of an event table where every single change in the state of the application is logged sequentially, as a series of events. Each microservice has its own event stream, and all other services that depend on it can subscribe to this event stream. Each subscribing service can then consume one or more of the events in this stream and use the information to update its own database accordingly.
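The four steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the class names (`EventStream`, `SocialPostService`, `UserProfileService`) and the `PostCreated` event type are hypothetical, and plain dictionaries stand in for each service's separate database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    sequence: int   # position in the append-only stream
    type: str       # hypothetical event type, e.g. "PostCreated"
    payload: dict


class EventStream:
    """Append-only log of events that other services can subscribe to."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def append(self, type: str, payload: dict) -> Event:
        # Step 2: every state change is logged sequentially as an event.
        event = Event(len(self.events) + 1, type, payload)
        self.events.append(event)
        # Step 3: each subscribed service consumes the event.
        for handler in self._subscribers:
            handler(event)
        return event


class SocialPostService:
    def __init__(self, stream: EventStream) -> None:
        self.posts = {}  # this service's own "database"
        self.stream = stream

    def publish_post(self, post_id: str, user_id: str, text: str) -> None:
        # Step 1: complete the task and update the service's own database.
        self.posts[post_id] = {"user_id": user_id, "text": text}
        self.stream.append("PostCreated", {"post_id": post_id, "user_id": user_id})


class UserProfileService:
    def __init__(self, stream: EventStream) -> None:
        self.post_counts = defaultdict(int)  # this service's own "database"
        stream.subscribe(self.on_event)

    def on_event(self, event: Event) -> None:
        # Step 4: update the local database based on the consumed event.
        if event.type == "PostCreated":
            self.post_counts[event.payload["user_id"]] += 1
```

Note that the ‘user profile’ service never reads the ‘social post’ service's database; it only reacts to events, which is what keeps the two databases isolated yet consistent.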

And that’s how event sourcing can help all service databases maintain consistency at all times, with each state change in the application.

Besides this, there are a few other advantages to event sourcing:

  • The reliable list of events can be used to enable other functionalities in the application, like customer notification associated with certain events, or predictive analytics of application usage patterns based on historical event streams.
  • The event stream also becomes a reliable audit log of all state changes in the application. In case of an application failure, it gives valuable information for tracing the point of origin of an error, or for knowing the application state at any given time.
  • The preserved history of all state changes in the application can be fed into any new feature, and it will immediately be in sync with the rest of the application
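As a rough sketch of the last two points, replaying a logged history rebuilds state for a new feature, and stopping the replay early gives the state at an earlier point in time. The event log, its field names, and the `replay_post_counts` helper below are all hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical event history, in the order the events were logged.
EVENT_LOG = [
    {"sequence": 1, "type": "PostCreated", "user_id": "alice"},
    {"sequence": 2, "type": "PostCreated", "user_id": "bob"},
    {"sequence": 3, "type": "PostCreated", "user_id": "alice"},
    {"sequence": 4, "type": "PostCreated", "user_id": "alice"},
]


def replay_post_counts(
    events: Iterable[dict], up_to_sequence: Optional[int] = None
) -> Counter:
    """Rebuild per-user post counts by replaying events in order.

    If up_to_sequence is given, replay stops after that event, which
    yields the state as it was at that point in the history.
    """
    counts: Counter = Counter()
    for event in events:
        if up_to_sequence is not None and event["sequence"] > up_to_sequence:
            break
        if event["type"] == "PostCreated":
            counts[event["user_id"]] += 1
    return counts
```

Calling `replay_post_counts(EVENT_LOG)` brings a brand-new feature in sync with the full history, while `replay_post_counts(EVENT_LOG, up_to_sequence=2)` answers what the state looked like after the second event.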

How to achieve event sourcing with AWS

With AWS, you get a set of solutions to easily set up event sourcing for your microservices. Here’s a quick look:

  • Use Amazon Kinesis to set up event streams. Though it comes with certain limitations in terms of customization when compared to Kafka, Amazon Kinesis is extremely reliable for event streams. It’s capable of handling most enterprise application requirements while the limitations ensure that you don’t try to design something that’s very customized but ultimately too costly to maintain.
  • Set up Lambda Kinesis subscriptions to get services to tap into event streams. AWS can invoke Lambda functions to periodically pass records from the event stream to the interested services. It can also keep track of the record last read by a service, and initiate the next batch of records from that point onwards. 
  • Leverage Amazon Kinesis Data Firehose to load event data into data repositories, preferably Amazon S3 buckets. You can make a Kinesis Data Firehose delivery stream one of the subscribers to any event stream, and it can route the data to S3. The data can be stored there indefinitely and replayed whenever you need it.
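To make the Lambda subscription more concrete, here is a minimal sketch of a Python Lambda handler that consumes a batch of Kinesis records. Lambda delivers Kinesis records under `Records`, with each payload base64-encoded in `kinesis.data`; everything else here is an assumption for illustration: the JSON payload with `type` and `user_id` fields, the `PostCreated` event type, the in-memory `POST_COUNTS` dictionary standing in for the service's real database, and the `make_test_event` helper used to try the handler locally.

```python
import base64
import json

# Stand-in for the consuming service's database (assumption: a real
# service would write to its own datastore instead).
POST_COUNTS = {}


def handler(event, context):
    """Consume a batch of Kinesis records passed in by Lambda."""
    processed = 0
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(raw)  # assumed: producers publish JSON
        if message.get("type") == "PostCreated":
            user_id = message["user_id"]
            POST_COUNTS[user_id] = POST_COUNTS.get(user_id, 0) + 1
            processed += 1
    return {"processed": processed}


def make_test_event(messages):
    """Build a minimal Kinesis-shaped event for local testing (hypothetical helper)."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "kinesis": {
                    "sequenceNumber": str(index),
                    "data": base64.b64encode(
                        json.dumps(message).encode("utf-8")
                    ).decode("ascii"),
                },
            }
            for index, message in enumerate(messages, start=1)
        ]
    }
```

Because the handler only depends on the event shape, it can be exercised locally with `handler(make_test_event([...]), None)` before it is wired to a real stream.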


And that's how you address two of the most common challenges that enterprises face when shifting from monoliths to future-ready microservices.

Srijan is assisting enterprises in modernizing applications with microservices architecture, primarily leveraging Docker and Kubernetes. Srijan is also an AWS Advanced Consulting Partner, with AWS-certified teams experienced in working with a range of AWS products and delivering cost-effective solutions to global enterprises.

Ready to modernize your application architecture with microservices? Just drop us a line and our expert team will be in touch.

Topics: Microservices, Architecture

4 Advantages to building cloud native applications with AWS

Posted by Gaurav Mishra on Jul 16, 2019 11:18:00 AM

The State of Cloud Native Security report 2018 states that 62% of enterprises today choose to go for cloud-native applications for more than half of their new applications. And this number is set to grow by 80% over the next three years. This is no surprise given the fact that most organizations are already heavily invested in their chosen cloud platform, and would like to use it up to its full potential.

Cloud-native applications are essentially those created specifically for the cloud and designed to leverage the gamut of resources available on the cloud. Being ‘cloud-native’ means that an application has a vast operational landscape, capable of being available from wherever the cloud is instead of being tied down to a physical location. 

The three defining characteristics for cloud native applications are:

  • Built with a microservices-based architecture
  • Containerized development
  • Dynamic orchestration of network and database resources


Besides this, agile development methodologies and the CI/CD approach are also common to most cloud-native applications.

The current leaders in cloud services - Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) - offer a whole host of services to enable the creation of cloud-native applications. However, AWS is one of the top performing providers when it comes to cloud infrastructure as a service (IaaS). This holds both in terms of critical analysis, as shown by the 2018 Gartner Magic Quadrant for Cloud IaaS providers, and customer preference, as seen in Gartner Peer Insights.

AWS is an enterprise favourite on the strength of its global infrastructure network and exhaustive range of serverless computing, storage and database solutions. Supporting giants like Netflix, Airbnb, and Comcast, AWS brings in a set of significant advantages for enterprises creating cloud-native applications.

Here’s a look.


30% of on-premise server capacity is idle at any given time, and yet organizations continuously spend money on upkeep and maintenance. With cloud-native applications, this expenditure on unused resources is completely eliminated. 

AWS offers dynamic scaling, allowing you to increase or decrease resource consumption based on application usage. Tools like Auto Scaling and Elastic Load Balancing help manage consumption of resources, ensuring that the massive AWS infrastructure is available to you on demand. 

But what makes this cost-effective is AWS’ pay-as-you-go models for all their cloud services whether it relates to infrastructure, platform or software. You pay only for the amount of resources used, and for the time frame you used them. This results in massive reduction in cloud expenditure outlays as you no longer have to maintain idle resources in anticipation of a surge. 

There are also secondary cost savings generated with cloud-native applications in the form of multitenancy. Several different apps can dynamically share platform resources leading to reduced expenditure.


Kicking off cloud-native applications is in itself a huge paradigm shift for an organization, in terms of how they function and how application development takes place. On top of that, if your chosen cloud platform also calls for the use of unfamiliar technology in terms of operating systems, languages or databases, things can get really complicated, really fast. Not to mention the added cost of training your team in these new elements.

However, going cloud-native on AWS comes with a lot of flexibility as you can choose any combination of operating system, programming language, web application platform, database, and other services, as per your business needs. So your teams have the advantage of working with the technology tools that they are comfortable with, leaving them more time to focus on building better applications.

Besides that, the AWS platform is easy to use for your application development teams, with well documented resources and APIs, and the AWS Management Console. Once again, this gives a smooth learning curve for your teams, enabling you to start creating cloud-native apps in no time.

No Strings Attached

While AWS does have long-term contracts with several enterprises, all their solutions for serverless applications operate on a pay-as-you-go basis. There is no minimum usage requirement or even a fixed usage period, with all charges accruing on a per-hour basis. So on the off chance that you want to stop using AWS services, you can do so immediately. Without a lock-in period, your AWS billing stops immediately and you are free to move on to other solutions.


One of the key reasons why cloud-native applications are superior to applications simply migrated to the cloud is that they are built as containerized microservices. This means that:

  • Different business functions are built into independent microservices, only loosely coupled with each other, and failure in one does not cause failure of the whole application
  • The application as a whole, or even parts of it, can be easily moved around because the containers are designed to be self-sufficient and will work uninterrupted, no matter where they are hosted


This is what makes cloud-native applications more reliable and resilient. Whether a particular part of an application fails or an entire server region goes down, the applications will continue to function.

This reliability is further strengthened when backed by AWS’ global infrastructure. AWS Availability Zones (AZ) currently span five continents, with multiple isolated server locations. Each AZ is designed with physical redundancies to ensure uninterrupted performance. Even in the case of an entire AZ going down, AWS systems ensure that your cloud-native applications can seamlessly move to the next unaffected location.

Besides this, AWS has a wide network of technology partners that can help enterprises build resilient cloud-native applications. Owing to the fact that AWS Partners go through a stringent certification and verification process, you can rest assured that they bring in the best experience and expertise to your application development process. 

Cloud-native applications give enterprises the ability to get to market faster and offer improved customer experiences. Consequently, they gain a competitive advantage that’s hard to beat with applications that are just migrated to the cloud. And there seems to be no better cloud IaaS provider for your applications than AWS.

Srijan is an AWS Advanced Consulting Partner, with AWS-certified teams experienced in working with a range of AWS products and delivering solutions to global enterprises. From chatbots with Amazon Lex to creating an enterprise decision analytics platform, our teams have in-depth expertise in solving enterprise challenges.

Ready to build cloud-native applications with AWS? Just drop us a line and our expert team will be in touch.

Topics: AWS, Cloud

DIY Bot platforms or build Bots from scratch - What to choose for Your Enterprise?

Posted by Gaurav Mishra on Jun 28, 2019 5:30:00 PM

Enterprises are constantly investing in solutions that can help scale up their operations and automate their internal as well as customer-level interactions. Deploying chatbots across different enterprise use cases - accessing data from a repository, handling customer queries, collecting feedback, booking tickets, etc. - has emerged as one of the key ways to optimize operations. It is estimated that 80% of enterprises will be using chatbots by 2020 to solve a diverse range of business challenges.

While that’s a great number, there are things you need to consider before deploying bots for your enterprise. Here’s a look:

Why Do You Need a Chatbot

Before you start with what kind of chatbot to deploy, and which platform to use, it is important to answer the first basic question: why do you need a chatbot? What is the business problem that you are trying to solve? Is it to conduct research, answer queries, give reminders, or something else? Starting with a clear definition of your business problem will give you clarity on how chatbots can solve that problem for you.

Clearly defining the 'why' will involve specifying:

  • The exact use cases of your bot. This will help define the first set of features and capabilities your bot should have
  • The users of the bot. This will help define additional features that might be valuable for the intended users. It will also help create the right conversation flow for the bot.

Once the 'why' is answered, the next question is how. Based on your tech stack capabilities and the above factors, you can decide whether you want to build a DIY (do it yourself) drag-and-drop chatbot using any of the available bot platforms, or a customised bot from scratch.

We take a look at the two ways to build a chatbot, and which one you should choose.

Proprietary Vs Open Source Platforms

Chatbots make use of machine learning and natural language processing engines to perform enterprise tasks, and solve related business problems. While typically this would involve a skilled team of developers, there are a number of DIY chatbot platforms that are gaining popularity.

Understanding Proprietary DIY Bot Platforms 

Beginners and non-technical users can simply use platforms like Chatfuel, Aivo, Botsify, etc. to build and deploy bots without any coding. The key aspects of machine learning and natural language processing are incorporated into the platform, and all that you have to do is create the conversation flow and the tasks that you want the bot to perform. Designing these bots is as simple as dragging and dropping from a set of pre-defined functionalities, with some scope to modify and customize them for your specific business objectives.

For example, on Chatfuel, all you need to do is write use cases and user stories, follow tutorials, and run some testing. These kinds of chatbots can be built using a drag-and-drop interface, and also integrate easily with third-party services like Salesforce, Zendesk, WhatsApp, etc.

Using these platforms, you can create a basic bot in minutes and then tailor it for your use case. But even with these capabilities and ease of deployment, it may not always be the right choice for your business. Why, you ask?

DIY bot platforms come with certain challenges:

Limited Functionality: Building chatbots using these platforms means limiting your bot's capabilities to what the platform can do. There is a high chance of your bot missing out on elements like self-learning, responding based on user intent, or carrying out contextual conversations.

And this can severely affect your customer experience, especially when compared to competing organizations that deploy self-learning and intelligent bots.

Limited Extensibility: Most enterprise solutions need to take into account concerns around integration, scalability and extensibility. While your current chatbot use case might be a simple one, and adequately served by a DIY platform, is it scalable in the long run? Given that most DIY platforms offer only a specific set of functionalities, it becomes challenging to scale a DIY bot to perform tasks with greater complexity.

Compounding this is the fact that DIY platform bots also have limited integration options. In a scenario where an enterprise has used different DIY platforms to build bots for different tasks, the complete bot ecosystem becomes a jumble of different systems straining to work cohesively. Frequent integration challenges with each other as well as with the existing enterprise architecture will likely become a major drain on enterprise resources and productivity.

Building Intelligent Bots from Scratch

Companies like Google and Amazon are investing heavily to develop extraordinary capabilities in their voice assistants. Alongside, they have created products that bring in powerful machine learning and NLP capabilities for developers. AWS solutions like Amazon Lex and SageMaker, along with Alexa skills, give enterprise development teams a complete toolbox to conceptualize and design bots from scratch, with a wide range of features.

What's more important is that these solutions are focused on delivering capabilities like self-learning, understanding user intent, and advanced analytics, and can also be customized for people with speech disabilities. So the level of fine-tuned customer experience you can generate with these tools if you build your bot from scratch cannot be matched by DIY bot platforms.

Yes, building a chatbot from scratch can seem like a complex and time-consuming task upfront, but the gains for your business intelligence processes, operations, and user interactions are also higher. With code-based frameworks like AWS or Microsoft Bot, a skilled team of developers can help you create a bot that's tailored to your organization’s needs. It can work across multiple platforms, solve complex use cases, generate analytics, and extend in close collaboration with your enterprise IT infrastructure.

Summing up, here's a look at the proprietary DIY bot platforms vs. building bots from scratch

DIY Bot Platforms vs Building from scratch

What Should You Choose?

Choosing between the two depends largely on your enterprise requirements, team skills, and project limitations. If you need a chatbot for a simple task, like feedback collection or setting reminders, it might make sense to use a DIY platform. But its benefits are only short-term. In the long run, you cannot scale up your bots, support a growing number of use cases, integrate with other platforms, or solve complex enterprise problems with it.

There are also chances that in an effort to keep all bots interoperable, you create all of them on the same platform. But then again, you get locked within a walled garden in terms of functionality and hinder the scalability of your bot ecosystem.

So if you want to ensure that your bots are future ready, and create a foundation that can scale with your enterprise requirements, it makes sense to build your bots from scratch, using an advanced set of machine learning and NLP solutions. And if you do not have a team of developers who can do that for you, you can always get in touch with qualified third party development teams.

Srijan's expert team of certified AWS engineers is working with machine learning and NLP to create interesting enterprise chatbots for diverse industry use cases. We recently built chatbots to access asset performance data for a large cleaning and hygiene solutions enterprise. AWS solutions like Amazon Lex, Amazon Cognito, AWS Lambda, Amazon Translate and Amazon S3 were leveraged for the same, eventually leading the client to upsell to a business worth 90 million USD.

Looking to develop an effective enterprise bot ecosystem? Just drop us a line and our team will get in touch.

Topics: Machine Learning & AI, Enterprises

