Advanced enterprise applications for computer vision

Posted by Urvashi Melwani on Sep 10, 2019 5:55:00 PM

Computer vision is attracting attention from both the public and the commercial sector, making it one of the fastest-growing fields in technology. 

Given the diversity of potential applications, there may eventually be no industry left untouched by this digital development. 

Indeed, according to a report from Tractica, computer vision is gaining strong momentum as an industry in its own right and is expected to drive the hardware, software, and services market to $26.2 billion by 2025.


Why is computer vision essential for enterprises?

Whenever one looks at objects, people, or images, the brain immediately starts examining and identifying what it sees: familiar faces and strangers, gender, approximate age, and even a rough guess at ethnicity.

On the other hand, a computer can look at the same image and see nothing. 

An area of computer science, computer vision infuses artificial intelligence into computers, letting them see and comprehend what is in front of them. With that ability, a machine can recognize objects and faces, avoid obstacles, and help people navigate.

It is a fine amalgamation of machine learning, geometry, and applied mathematics. 

Using digital images from cameras and videos together with deep learning models, machines can accurately identify, classify, and extract insights from visual information, such as scanning a barcode.

History of computer vision


Experiments in computer vision began in the 1950s, using some of the first neural networks to detect the edges of an object and to sort simple objects into categories like circles and squares.

The 1970s marked the starting point for commercial use of computer vision, interpreting typed or handwritten text using optical character recognition. This advancement was applied to read written text for the blind.

The rise of the internet in the 1990s made large sets of images available for analysis, which in turn boosted facial recognition. These ever-growing data sets helped make machines capable of identifying specific people in photos and videos.

Multiple factors have converged today to bring about a revitalization in computer vision:

  • Mobile technology with built-in cameras has provided the world with an abundance of photos.
  • Computing power has become economical and easily accessible.
  • Hardware designed for computer vision and analysis is readily available.
  • New algorithms like convolutional neural networks can take advantage of these hardware and software capabilities.

The effect of these refinements and advancements in the computer vision field has been astounding.

Accuracy rates for object identification and classification have jumped from 50 percent to 90 percent in less than a decade, and today’s systems can quickly detect and react to visual inputs with a precision that often exceeds that of humans.

How does computer vision work?

Computer vision works in three fundamental steps:

  • Acquiring an image
    Batches of images can be easily acquired in real time through video, photos, or 3D technology for examination purposes.
  • Processing the image
    Deep learning models drive much of this process. However, the models are often trained by first being fed thousands of labeled or pre-identified images.
  • Understanding the image
    The final step involves interpreting the image, where an object is identified or classified.
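
The three steps can be sketched as a toy pipeline. The code below is purely illustrative, assuming a tiny hard-coded grayscale frame in place of a camera capture and a simple threshold in place of a trained deep learning model:

```python
# Toy computer vision pipeline: acquire -> process -> understand.
# Illustrative only; real systems feed camera frames to a trained model.

def acquire_image():
    # Step 1: acquire an image (here, a hard-coded 4x4 grayscale frame,
    # values 0-255, standing in for a camera capture).
    return [
        [ 12,  18, 200, 210],
        [ 15,  22, 205, 215],
        [ 10,  25, 198, 220],
        [ 14,  20, 202, 212],
    ]

def process_image(image):
    # Step 2: process the image. A deep learning model would extract
    # features; this toy version just thresholds pixels into 0/1.
    return [[1 if px > 128 else 0 for px in row] for row in image]

def understand_image(binary):
    # Step 3: interpret the result. Here we "classify" the frame as
    # mostly bright or mostly dark based on the thresholded pixels.
    total = sum(sum(row) for row in binary)
    pixels = sum(len(row) for row in binary)
    return "bright" if total * 2 >= pixels else "dark"

frame = acquire_image()
label = understand_image(process_image(frame))
print(label)  # half the pixels exceed the threshold, so "bright"
```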

AI systems hold enough potential to go a step beyond and take necessary actions based on their understanding of the images. There are several computer vision techniques that are used in different ways:

  • Image segmentation partitions an image into multiple sections or pieces so they can be analyzed separately.
  • Object detection singles out a specific object in an image. Advanced object detection techniques can identify many objects in a single image: a football field, an offensive player, a defensive player, a ball, and so on. These models use X and Y coordinates to build a bounding box and identify everything inside it.
  • Facial recognition is an advanced version of object detection wherein it not only identifies a human face in an image but identifies a specific individual too.
  • Edge detection is a technique that uses the outside edge of an object or landscape to exactly understand what is in the image.
  • Pattern detection is a process that involves the identification of repeated shapes, colors, and other visual indicators in images.
  • Image classification divides images into various categories.
  • Feature matching is a pattern detection feature that matches similarities in images to help classify them.

Simple applications of computer vision may use only one of these techniques, but more advanced ones, like computer vision for self-driving cars, rely on a combination of multiple techniques to accomplish their goals.
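
To make one of these techniques concrete, edge detection at its simplest looks for places where neighbouring pixel intensities change sharply. The sketch below is a deliberately minimal version using a one-directional gradient on a made-up image; production systems use operators such as Sobel or Canny on real images:

```python
# Minimal edge detection sketch: mark a pixel as an edge when the
# horizontal intensity change to its right neighbour exceeds a threshold.
# Real detectors (Sobel, Canny) use 2D gradients plus noise smoothing.

def detect_edges(image, threshold=100):
    edges = []
    for row in image:
        edge_row = []
        for x in range(len(row) - 1):
            gradient = abs(row[x + 1] - row[x])
            edge_row.append(1 if gradient > threshold else 0)
        edges.append(edge_row)
    return edges

# A dark region on the left and a bright region on the right: the
# boundary between columns 1 and 2 should light up as an edge.
image = [
    [10, 12, 220, 225],
    [11, 14, 218, 230],
    [ 9, 13, 221, 228],
]
print(detect_edges(image))  # each row marks the jump: [0, 1, 0]
```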

Applications of computer vision

Be it data extraction, facial recognition, monitoring machine performance, or detecting fraud, the usage of computer vision spans far and wide.


Versatility may be the key to the popularity of computer vision applications. This article highlights how they are used to improve business performance across industries.

Detection and Recognition

The ability of computer vision to identify the content of an image or live video has eliminated the need for humans to perform certain optical tasks, such as recognizing a person’s face, objects, and other patterns. In fact, image recognition software has proven even more effective, since it can check what it sees against an entire database of images and can even match partially obscured faces!

Object detection, facial recognition, and, to a large extent, product quality analysis are some of the tasks that computer vision handles.

Amazon Rekognition and the controversy 

Amazon Rekognition relies on deep learning technology and computer vision to analyze billions of images and videos in fractions of a second. 

It identifies objects, people, text, scenes, and activities, and detects inappropriate content in stored media. The tool can be deployed for facial analysis and facial recognition on images and videos across a wide variety of user verification and public safety use cases.
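
As a sketch of how a developer might call the service, the helper below only assembles the request payload for Rekognition's DetectLabels operation; the bucket and object key are hypothetical. With boto3 installed and AWS credentials configured, the payload would be passed to `boto3.client("rekognition").detect_labels(**request)`:

```python
# Sketch of preparing a call to Rekognition's DetectLabels API. This
# helper only builds the request payload; it makes no network call.

def build_detect_labels_request(bucket, key, max_labels=10, min_confidence=80):
    # Rekognition reads the image directly from S3; MinConfidence filters
    # out labels the service is less certain about.
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,
        "MinConfidence": min_confidence,
    }

request = build_detect_labels_request("example-media-bucket", "store-cam/frame-001.jpg")
print(sorted(request))  # ['Image', 'MaxLabels', 'MinConfidence']
```

A real response would include a `Labels` list, each entry carrying a label name and a confidence score.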

However, Rekognition brought forth a controversy when reports suggested that multiple US law enforcement agencies were using the tool contrary to its specified use. According to those reports, some police departments and other organizations had been using facial recognition technology for years, but the disclosure was still enough to raise questions about Rekognition’s capabilities, how it might be used, and who exactly was using it. In one widely cited test, it wrongly matched 28 lawmakers with people who had been arrested, amounting to a 5% error rate among legislators.

These practices put the rights of immigrants, communities of color, protesters, and others at risk, as Amazon continued providing the powerful surveillance system to government agencies.

Object Detection

The object detection principle can also be implemented in retail stores. Amazon’s Go store is a prime example of how computer vision can revolutionize retail. The Go store is packed with cameras whose video is fed to computer vision software to understand shopper behavior and suggest items based on their preferences. It also keeps a complete running track of each customer’s shopping basket.


Unlike the conventional checkout process, this advanced analysis of moving images enables Go shoppers to ‘simply walk out with their purchases, and pay for them online via their Amazon accounts’.

Tesla

Tesla’s Autopilot-equipped vehicles come armed with resources like ultrasonic sensors to help the car detect trees, buildings, other vehicles, and even pedestrians on the road. Its camera system, called Tesla Vision, uses vision processing tools to break the environment down into components (not literally!) and navigate the car smoothly and efficiently through complex roads.

Facial Recognition

A similar concept is used in smartphones and digital cameras. Whether tagging photos on Facebook or applying Snapchat filters, facial detection is one of the most extensively used applications of computer vision.


Alternatively, it can be adapted for surveillance, scanning the face, fingerprint, and other biometrics of security personnel or employees in an office building.

This facial recognition feature can also be introduced in logistics environments, where there is always a risk of warehouse break-ins and truck hijackings resulting in huge financial losses and operational failures.

Facial recognition can validate only authorized operators to release truck immobilizers, making it difficult for thieves to sneak off with goods, or for untrained operators to put themselves and others at risk.

Lolli & Pops

This candy retailer uses facial recognition technology in its store to identify frequent shoppers or visitors as they walk into the store.

Thus employees are able to provide a personalized shopping experience for them, giving product recommendations and occasional loyalty discounts.

Smart Mirrors: Improved Customer Experience

In fashion retail too, smart mirrors combine AR and computer vision with cameras to enable face and eye-tracking. These enhancements ensure that whenever shoppers try on outfits virtually, the image in the mirror is depicted accurately from every angle, increasing the authenticity of the experience.


Further, these smart mirrors also strengthen security by limiting shoplifters’ ability to conceal items on the pretext of trying them on in the fitting rooms.

IBM Watson Visual Recognition

The IBM Watson Visual Recognition service utilizes deep learning algorithms to evaluate images for scenes, objects, faces, and other content.

The service analyzes the content of images to give you insights into your visual content.

You can also create and train your custom image classifiers with your own image collections. Its use cases include manufacturing, visual auditing, insurance, social listening, social commerce, retail and education.

Fraud Detection

Srijan’s RPA solution enables an automated KYC process for the BFSI sector. It uses image processing technology to read and validate details on the respective documents, as well as scan and match photographs. Hence, it can detect fraud by verifying passports, ID cards, licenses, etc. It can also be incorporated into product cataloging and outlier detection.


Stoplift

Computer vision can reduce theft and other losses at retail chains. StopLift’s product ScanItAll does exactly this: it can detect checkout errors and scan-avoidance behaviors, such as hiding the barcode, stacking items on top of one another, skipping the scanner, and covering items directly.

Smart Verification Software

Mitek’s identity verification software is a cloud-based platform that uses AI, computer vision, machine learning, and deep learning. It ensures that government-issued identity documents from around the globe, such as passports, ID cards, and driver’s licenses, are authentic and valid.

It can complete the equivalent of hundreds of forensic check permutations in fractions of a second.

Product Defects & Quality Issues

Fujitsu’s Oyama factory uses computer vision to ensure the production of optimal-quality products as well as to scrutinize the assembly process. 


Shelton

This manufacturing firm has a surface inspection system called WebSPECTOR to identify defects, store their images, and flag the items that are negatively impacting the production line.

Computer vision plays multiple roles on the manufacturing line, from identifying quality issues in supplier parts and defects in leather for footwear manufacturing, to checking component presence and installation on electronic circuit boards.

Data Extraction & Analysis

Quickly sifting through a data repository and extracting useful information from images, videos, and documents is another important function of computer vision. This faster, more accurate analysis helps in making better decisions in healthcare, agriculture, and other sectors.

Image Processing

Gauss Surgical

This healthcare firm has designed blood monitoring solutions to estimate blood loss in real time during critical medical situations. It uses computer vision to inform blood transfusion decisions and identify hemorrhage better than the human eye.

Cainthus

Cainthus leverages imaging technology to identify cows based on hide patterns and facial recognition, and to track their food and water intake, heat detection, and behavior patterns. This information is then sent to farmers, who make predictions about milk production, reproduction management, and overall animal health.

Computer vision applications in agriculture

The practice of using image analysis technology and computer vision has eased the task of monitoring cattle and identifying health issues, such as lameness, that can impact milk yield.

This will notify farmers much earlier than traditional human visual inspections, enabling treatment to be administered way before animals start to suffer unduly and their milk yield reduces.

Computer vision applications have also found their place in arable agriculture and horticulture, where systems can visually detect produce, mainly fruits, vegetables, and nuts, and grade it by color, size, and condition.

Consequently, it will save enormously on the cost of such operations as they’ll become less labor-intensive.

Data Extraction from Images

Computer vision can also ease data extraction from PDFs. A good PDF extractor can distinguish between headings, subheadings, colors, font sizes, footnotes, and graphs. This lets it retrieve relevant, useful information that is free of human error and can feed into decision-making in different scenarios.
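
One common heuristic behind such extractors is classifying text spans by their font size relative to the body text. A minimal sketch, assuming the spans and their font sizes have already been pulled out of the PDF by a parsing library (the sample spans are made up):

```python
# Classify extracted PDF text spans as heading / subheading / body by
# font size relative to the most common (body) size. The spans below are
# hypothetical; a PDF library would supply the (text, font_size) pairs.
from collections import Counter

def classify_spans(spans):
    # The most frequent font size is assumed to be body text.
    body_size = Counter(size for _, size in spans).most_common(1)[0][0]
    labeled = []
    for text, size in spans:
        if size >= body_size * 1.5:
            role = "heading"
        elif size > body_size:
            role = "subheading"
        else:
            role = "body"
        labeled.append((role, text))
    return labeled

spans = [
    ("Model XJ-200 Overview", 18.0),
    ("Specifications", 13.0),
    ("Weight: 2.4 kg", 10.0),
    ("Voltage: 230 V", 10.0),
    ("Notes", 13.0),
    ("Ships with an EU power cord.", 10.0),
]
for role, text in classify_spans(spans):
    print(role, "->", text)
```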

Building a PDF extractor

Srijan built a PDF extractor with capabilities ranging from extracting the model number and description to identifying the language of a PDF file. It can also extract images and tables from the PDF and align them against their respective serial numbers. 

Video Analytics

Computer vision in video analytics can help determine the velocity of objects in a video, or of the camera itself. Alternatively, it can create a 3D model of a scene fed in through video, which makes it very useful for self-driving vehicles. It can also monitor a stream of real-time video to identify anomalies in a process, typically on complex manufacturing assembly lines or in fine mechanical processes.
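
Estimating an object's velocity, for example, reduces to tracking its position across frames and dividing the distance travelled by the elapsed time. A minimal sketch, assuming an upstream detector has already produced per-frame centroid coordinates in pixels at a known frame rate:

```python
# Estimate object speed from per-frame centroids. Assumes a detector has
# already located the object in each frame; positions are in pixels.
import math

def speed_pixels_per_second(centroids, fps):
    # Total path length between consecutive centroids, divided by the
    # elapsed time spanned by the frames.
    distance = 0.0
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        distance += math.hypot(x1 - x0, y1 - y0)
    elapsed = (len(centroids) - 1) / fps
    return distance / elapsed

# An object moving 4 pixels right per frame at 30 fps -> 120 px/s.
track = [(10, 50), (14, 50), (18, 50), (22, 50)]
print(speed_pixels_per_second(track, fps=30))
```

Converting pixels per second to real-world speed would additionally require camera calibration, which is outside this sketch.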


Asset monitoring for cleaning solutions enterprise

A large cleaning solutions enterprise incorporated video analytics to detect and verify machine performance at any given location. A real-time video of each machine was recorded, and the video feed data was then scraped and automatically analyzed to evaluate machine performance.

Merged Reality

When AR and VR are used in combination with computer vision, the result is called merged reality, the next stage of development, where:

  • External cameras and sensors map the environment
  • Eye-tracking solutions and gyroscopes position the user

 

This further helps AR & VR systems to:

  • Provide guidance and directions
  • Steer users away from obstacles
  • Detect the user’s eye and body movement, and adapt the VR environment accordingly

 

Sephora

Sephora’s Virtual Artist app, now integrated with live 3D facial recognition, lets customers see how different makeup products look on their faces under different lighting conditions.

Google Translate

Users can simply point their smartphone’s camera at any text, and the Google Translate app will translate it to another language on the screen instantaneously. This is a form of AR in association with computer vision to enable such accurate translation in an instant.

In combination with drones

Computer vision can be leveraged in combination with drones for tasks that are difficult or risky for humans. This could include:

  • Tracking vehicles and inventory at huge construction sites
  • Creating maps for navigation purposes
  • Conducting site surveys to get updates about a location for development purposes

Slantrange

They use computer-vision-equipped drones to measure and monitor the condition of crops. The images photographed by drones are forwarded to the SlantView analytics system, which analyzes the data and helps farmers make decisions accordingly.

Insurance Industry

Computer vision in the insurance industry can help analyze the damage to assets under policy and decide who should be offered coverage.

Drones can capture images of the damage and upload them to the cloud. If the images validate the customer’s claim, the customer receives the payment. This entire series of tasks can be automated with the help of computer vision.

Computer vision is highly efficient at providing direct benefits to users by reducing development times and creating end products that mesh with what users want and need to do. Developers can now rely on AI and ML to identify major patterns and deliver more tailored, user-friendly products.

It’s a huge step towards designing technology that adapts to users’ needs instantly and predicts their future needs with uncanny accuracy.

The potential of computer vision will only grow with time!

Topics: Machine Learning & AI

Building bots with Amazon Lex is the right solution for your business

Posted by Kimi Mahajan on Aug 23, 2019 2:22:00 PM

Did you know over 80% of businesses are in favour of using chatbots by 2020? Demand for chatbots doesn’t seem to be slowing down, since they cover a wide ecosystem of uses including automation, improving customer experience, and reducing latency.

With more than 65 million businesses using social media channels, chatbots have emerged as winners in grabbing marketing and sales opportunities by acquiring and engaging customers through messengers. 

Amazon Lex is an AWS solution, powered by deep learning functionalities like automatic speech recognition (ASR) and natural language understanding (NLU), to publish bots for use across different channels. 

Amazon Lex has emerged as a key player in the highly competitive chatbot market, but is it worth the investment?

Chatbots, but why?

Chatbots aren’t new: the first chatbot, Eliza, was built in 1966 at the MIT artificial intelligence laboratory, and they have been in commercial use since the late nineties.


With chatbots, businesses can reduce the need to hire assistants. With minimal initial costs, chatbots can serve customers 24x7 with a consistent experience, bringing in better RoI for the business.

With smartphones changing user behaviour, people have started to prefer chat over calls, which explains why businesses need a personalised chat feature.

Here's how chatbots can help solve user’s queries quickly and easily:

  • Chatbots work on scripts and carry out actions per a defined workflow, serving as many users as possible without a minute’s delay. If a chatbot cannot answer an unexpected question, the conversation can be channelled to your support team.
  • They are available 24x7, making them a sound investment for your business.
  • They can help automate routine, mundane tasks, reducing the burden on your team and letting you focus on areas like sales and marketing, where your expertise is required.

How does Amazon Lex help bot development?

Amazon Lex is a service which allows enterprises to create chatbots with voice and text inputs in minutes, without any coding knowledge, ensuring a highly engaging user experience. It enables you to build interfaces that embed into and integrate with a wide range of platforms, with the help of deep learning technologies such as:

  1. Automatic Speech Recognition (ASR) for converting speech to text
  2. Natural Language Understanding (NLU) to recognize the intent of the text

The platform helps in developing customized, highly specialized chatbots that interact with your customers efficiently, and comes with pay-as-you-go pricing.
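
To illustrate the intent-recognition idea at the heart of such services, the toy matcher below scores each intent by word overlap between the user's utterance and its sample phrases. Amazon Lex itself uses deep learning rather than this heuristic, and the intents shown are invented for the example:

```python
# Toy illustration of intent matching, the core idea behind NLU services
# like Amazon Lex: pick the intent whose sample utterances best overlap
# with the user's words. Lex uses deep learning, not this heuristic.

def match_intent(utterance, intents):
    words = set(utterance.lower().split())
    best_intent, best_score = None, 0
    for name, samples in intents.items():
        # Score an intent by its best-overlapping sample utterance.
        score = max(len(words & set(s.lower().split())) for s in samples)
        if score > best_score:
            best_intent, best_score = name, score
    return best_intent

intents = {
    "CheckBalance": ["what is my balance", "show my account balance"],
    "BookFlight": ["book a flight", "i want to fly to london"],
}
print(match_intent("can you show my balance please", intents))  # CheckBalance
```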

What makes Amazon Lex stand out?

Amazon Lex-powered bots can become a key competitive advantage for enterprises allowing them to optimize processes and enable cost savings. Let’s understand the benefits of leveraging Amazon Lex to build bots.

  • Automatic speech recognition
    This deep learning feature enables you to develop chatbots that offer highly engaging user experiences through lifelike conversations with your customers.
  • Natural language understanding
    This deep learning feature enables your chatbot to understand the intent of a conversation, delivering a highly interactive user experience based on human-like exchanges. Lex-powered chatbots have a built-in ability to process information, letting you quickly and easily build sophisticated natural language bots.
  • Versatility and automatic scaling
    With these two capabilities and automatic scaling handled for you, Lex frees you to define new product categories rather than manage capacity.

 

Amazon Lex relieves you of the responsibility of managing infrastructure: you pay only for the features you use. It allows you to build, test, and deploy chatbots directly from the Lex console and easily publish voice or text chatbots to mobile devices, web apps, and chat services. Once published, your Amazon Lex bot processes voice or text input in conversation with your end users and requires minimal to almost no human intervention post-deployment.

  • Seamless experience with an easy-to-use console
    Amazon Lex provides an easy-to-use, point-and-click console that guides almost anyone through building a chatbot in a matter of minutes. With a few example phrases, Amazon Lex builds a conversational interface model that can answer queries in text and audio format and complete sophisticated tasks.


 

  • Flawless integration with almost any platform
    Amazon Lex can be easily integrated with many other AWS services including Amazon Cognito, and Amazon DynamoDB. AWS platform takes care of the bot's security, monitoring, user authentication, business logic, storage and mobile app development.
  • Cost-effective solution for bot development
    Amazon Lex doesn't involve any upfront costs or minimum fees; you pay only for the text or speech requests made. With this pay-as-you-go pricing model and a minimal cost per request, it remains a cost-effective way to build conversational interfaces.

Srijan built an Amazon Lex-powered bot for a global cleaning solutions company

Srijan worked with a global cleaning solutions company to help onboard assets to the IoT ecosystem, and to collect, monitor, and analyze sensor data in real time with the help of data visualization dashboards. This helped them track equipment conditions across their customer sites and automatically offer servicing as and when required. 

Srijan then suggested building a chatbot in addition to the dashboards to surface relevant data. The chatbot worked on an ask-and-answer approach: on being asked a query, the bot, available via mobile apps, would analyze the necessary data and return an answer.

Amazon Lex was used as the base interface for building the chatbots. Deep learning functionalities and natural language understanding in Lex allowed creating conversations that accurately captured the business logic. The bot delivered information on equipment performance metrics, equipment health, and savings potential, and could advise on which equipment to use in which scenario.

Building bots using Lex helped the client upsell business worth 90 million USD and increased user retention in the beta user group from 8% to 42%, boosting product upselling. It also allowed the analytics team to automate many tasks that had earlier been handled over spreadsheets.

Read the case study - Developing Enterprise Chatbots for Instant Access to Asset Performance Data

Contact us

Srijan leverages its expertise and knowledge of Amazon Lex to create unique conversational interfaces for your enterprise. Our team of developers is skilled in creating world-class experiences for clients located globally.

Get in touch to partner with us for your chatbot requirements.

Topics: Machine Learning & AI, MarTech

Integrating Drupal with AWS Machine Learning

Posted by Kimi Mahajan on Aug 23, 2019 11:24:00 AM

With enterprises looking for ways to stay ahead of the curve in the growing digital age, machine learning is providing them with the needed boost for seamless digital customer experience.

Machine learning algorithms can transform your Drupal website into an interactive CMS, serving relevant recommendations targeted to each individual customer’s needs by understanding their behavioural patterns.


A machine-learning-integrated Drupal website ensures effortless content management and publishing and better targeting, empowering your enterprise to craft personalized experiences for your customers. It automates customer service tasks and frees up your customer support teams, subsequently improving RoI.

However, with various big names competing in the market, let’s look at how Amazon’s machine learning offerings stand out and provide customised capabilities by integrating with Drupal.

Benefits of Integrating AWS Machine Learning with Drupal

AWS offers the widest set of machine learning services ranging from pre-trained AI services for computer vision, language, recommendations, and forecasting. These capabilities are built on the most comprehensive cloud platform and are optimized without compromising security. Let’s look at the host of advantages it offers when integrated with Drupal.

Search Functionality

One of the major problems encountered while searching on a website is the need for exact keywords. If the content uses a related term instead, you will not be able to find it without typing the precise keyword it contains.

This problem can be solved by using machine learning to train the search algorithm to look for synonyms and display related results. Search can be further improved by automatically filtering results according to past reads, click-through rates, and similar signals.

Amazon CloudSearch is designed to help users improve the search capabilities of their applications and services by setting up a scalable, low-latency search domain that can handle high throughput.
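
The synonym idea can be illustrated with a toy search function: a query matches a document if any query word, or any of its synonyms, appears in the document. Here the synonym table is hard-coded; in practice it would be curated or learned:

```python
# Sketch of synonym-expanded search. A query matches a document when any
# query word, or a synonym of it, appears in the document's words.
# The synonym table is hard-coded for illustration only.

SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}

def expand(word):
    # A word always matches itself, plus any known synonyms.
    return {word} | SYNONYMS.get(word, set())

def search(query, documents):
    terms = set()
    for word in query.lower().split():
        terms |= expand(word)
    return [doc for doc in documents if terms & set(doc.lower().split())]

docs = [
    "How to purchase an automobile online",
    "Gardening tips for beginners",
]
print(search("buy car", docs))  # matches only the first document
```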

Image Captioning

Amazon Machine Learning helps automatically generate relevant captions for all images on the website by analyzing image content. The admin can configure whether captions are added automatically or only after manual approval, saving a lot of time for the website’s content curators and administrators.

Amazon Rekognition can search across many images to find content within them and segregate them almost effortlessly, with minimal human interaction.

Website Personalization

Machine learning ensures users see tailored website content based on their favourite reads and searches, by assigning each user a unique identifier (UID) and tracking their behaviour (clicks, searches, favourite reads, etc.) on the website for a personalized web experience.

Machine learning analyzes the data connected with the user’s UID and provides personalized website content.

Amazon Personalize is a machine learning service which makes it easy for developers to create individualized recommendations for their customers. It saves up to 60% of the time needed to set up and tune the infrastructure for machine learning models, compared with setting up your own environment.
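
The underlying idea can be illustrated with a toy recommender that simply counts a user's clicks per content category, keyed by UID. Amazon Personalize replaces this counting with trained models; the event data here is invented:

```python
# Toy personalization sketch: recommend the content categories a user
# (keyed by UID) has clicked most often. Real services use trained
# recommendation models rather than raw click counts.
from collections import Counter

def recommend(events, uid, top_n=2):
    # events: list of (uid, category) click records.
    clicks = Counter(cat for user, cat in events if user == uid)
    return [cat for cat, _ in clicks.most_common(top_n)]

events = [
    ("u42", "drupal"), ("u42", "aws"), ("u42", "drupal"),
    ("u42", "chatbots"), ("u99", "security"),
]
print(recommend(events, "u42"))  # "drupal" ranks first with two clicks
```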

Another natural language processing (NLP) service that uses machine learning to find insights and relationships in text is Amazon Comprehend. It can identify which topics are most prominent in a body of content, making recommendation easy. So, when you’re trying to add tags to an article, instead of searching through all possible options, you see suggested tags that sync up with the topic.

Vulnerability Scanning

A website is always exposed to potential threats, with a risk to lose customer confidential data.

Using machine learning, Drupal-based websites can be made more secure and resistant to data loss by automatically scanning themselves for vulnerabilities and notifying the administrator about them. This gives websites a great advantage and also helps them save the extra cost of external security software.

Amazon Inspector is an automated security assessment service, which helps improve the security and compliance of the website deployed on AWS and assesses it for exposure, vulnerabilities, and deviations from best practices.

Voice-Based Operations

With machine learning, it’s possible to control and navigate your website using your voice. With Drupal standing by its commitment to accessibility, integrating Amazon Machine Learning features promotes inclusion and makes web content more accessible to people.

Amazon Transcribe is an automatic speech recognition (ASR) service. When integrated with a Drupal website, it benefits the media industry with live subtitling of news or shows, helps video game companies stream transcriptions for hearing-impaired players, enables stenography in courtrooms and lets lawyers make legal annotations on top of live transcripts, and boosts business productivity by capturing meeting notes through real-time transcription.

The future of websites looks interesting and is predicted to benefit users through seamless experience by data and behavior analysis. The benefits of integrating Amazon Machine Learning with Drupal will clearly give it a greater advantage over other CMSs and will pave the way for a brighter future and better roadmap.

Srijan has certified AWS professionals and expertise across AWS competencies. Contact us to get the conversation started.

Topics: Drupal, AWS, Machine Learning & AI, Planet Drupal

3 Key challenges to AI adoption and how to solve them

Posted by Gaurav Mishra on Aug 21, 2019 3:13:00 PM

While the potential ROI from investing in artificial intelligence and machine learning is abundantly clear to enterprise leaders, actual adoption of the technology has been slower than you might expect. Yes, some very prominent brands across industries are leveraging it to drive significant revenues. But a 2018 survey by IDG finds that only 1 out of 3 AI projects across organizations actually succeeds.

Here's a look at the three key challenges that enterprises are faced with on the AI adoption curve and how to get past them.

1. The Strategic Challenges: Use Cases and Ownership

Choosing the Right "First" Problem to Solve

Investing in AI/ML is a big decision for most enterprises, and there is the expectation of seeing some significant ROI from it within the first six to eight months. For that to happen, it’s important to choose the right business use case to optimize with AI/ML.

A lot of enterprises in the initial stages make the mistake of leveraging the technology for smaller fringe projects, or of choosing projects simply because clean, categorized data happens to be readily available for them. However, while choosing a "small" first project is a good idea, ROI can be showcased only if the project is a key part of the core business.

How to Solve

So your basic sample set of use cases should comprise only tasks that are part of your key enterprise revenue streams. Out of these, identifying the right use case to test and showcase the power of AI/ML solutions can be done by answering a few guiding questions:

  • Which tasks involve making decisions based on searching and analyzing a huge amount of data — historical or real time? For example, customer service, customer experience personalization, and analyzing sales data.
  • Which tasks have a low tolerance for errors? For example, sensitive manufacturing processes or verification workflows that ensure compliance with various industry standards.
  • Which tasks are repetitive and time and effort intensive, but have to be performed at scale nevertheless? For example, identifying and tracking counterfeit products.

Picking a task that fulfils any one or more of the above criteria is a great use case to test out your first AI/ML solution. That’s because, in each of these cases, a successful solution deployment will lead to very tangible benefits — better decision making, near-zero errors, or reduction in operating costs by eliminating resources employed in repetitive processes.

Yes, you will need to have the right type of data in place to even begin creating an AI-powered solution, and several other factors have to fall in place for it to be successful. However, you’ll at least have ensured that when the solution does succeed, it’s compelling enough for stakeholders to take notice.

Owning the AI/ML Adoption Piece

The PwC Digital IQ report points out that of late, 68 percent of an enterprise’s technology expenditure is outside the CTO's budget. This means that business stakeholders across marketing, sales, HR, accounting, etc. are independently investing in technology solutions. And there are high chances that some of these are AI/ML-based.

So, you have the emerging tech being used in some aspect of the business without the core IT team or the larger enterprise being aware of it. Any significant benefits accruing out of these solutions are also not being shared across the organization.

Additionally, new stakeholders within the enterprise who partly own certain technology pieces like the Chief Digital Officer or Head of Data and Analytics also have significant say in the AI/ML adoption conversation.

These factors combined mean that there is no single person owning the AI/ML adoption piece within the enterprise. Whether this is due to lack of any ownership or because of multiple deciding voices, the result is the same — a bottleneck in effectively rolling out solutions based on these emerging technologies.

2. The Technology Challenges: Data and IT Infrastructure

Where Is the Data?

AI/ML solutions cannot be created without data, a lot of data that is clear, correct, structured, and accessible. However, while enterprises do have volumes of data, it often falls short in terms of the other characteristics. A few common challenges seen in this regard are:

  • Data is collected in silos across different business functions — collected in various formats and stored across different databases. And while that’s ok, the problem is the absence of a single unified repository from where this data can be accessed.
  • Unstructured data, meaning a mass of data points with no explanation as to what they represent. This lack of context and categorization makes the data redundant for machine learning. If there are no markers for what an ML algorithm is supposed to learn from this data, there is no solution to be created.
  • Missing or incomplete data, i.e., information is available for all parameters in some cases but missing for certain parameters in other cases within the same data set. These inconsistencies result in skewed or faulty learning, which ultimately leads to failed solutions.
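A quick completeness check along these lines can be sketched in plain Python (the record structure and field names below are invented for illustration):

```python
def missing_field_report(records, required_fields):
    """Count, per field, how many records are missing a value.

    Gaps like these are exactly what skew or derail ML training.
    """
    report = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            if record.get(field) in (None, ""):
                report[field] += 1
    return report

customers = [
    {"id": 1, "age": 34, "segment": "retail"},
    {"id": 2, "age": None, "segment": "corporate"},
    {"id": 3, "age": 51, "segment": ""},
]
print(missing_field_report(customers, ["id", "age", "segment"]))
# {'id': 0, 'age': 1, 'segment': 1}
```

Running a report like this per data source makes it easy to quantify how much cleaning a dataset needs before it can feed an ML pipeline.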

How to Solve

AI/ML solutions designed specifically for decision making, analytics, predictive maintenance, etc. require complete and clean datasets for effective learning.

So, it is critical to map out the complete range of data required to create such a solution for the chosen business use case. If enterprises have this ready, great. If not, they should take the time to harness, clean, and prepare the right datasets so as to ensure success.

Besides this, implementing a data lake architecture, with well-designed landing, curation, and processing zones, will go a long way in ensuring that you are leveraging the full gamut of your enterprise data.

 


The IT Infrastructure Is Outdated and Unprepared

A McKinsey report finds that organizations that are ahead in their digital transformation journey are also the ones successfully adopting AI solutions. This means that an outdated IT infrastructure with clunky legacy systems is a core challenge to AI/ML adoption.

How to Solve

Given how important data is in this whole scheme of things, the data engineering tech stack has to be the best you can get. This means putting solutions in place that can:

  • Identify the disparate sources of data and also classify them as structured or unstructured data
  • Pull this data from the different databases and clean it as per the requirements of the project
  • Bring it all to a central repository where it can be processed into useful information and training datasets for ML algorithms
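As a toy illustration of the pull-and-centralize step (the departmental exports, formats, and field names here are invented), disparate sources can be normalized into one tagged repository:

```python
import csv
import io
import json

# Two hypothetical departmental exports in different formats.
crm_csv = "email,spend\nana@example.com,120\nbo@example.com,80\n"
support_json = '[{"email": "ana@example.com", "tickets": 3}]'

def load_sources():
    """Pull records from disparate formats into one tagged list,
    so downstream processing sees a single unified repository."""
    unified = []
    for row in csv.DictReader(io.StringIO(crm_csv)):
        unified.append({"source": "crm", **row})
    for row in json.loads(support_json):
        unified.append({"source": "support", **row})
    return unified

repository = load_sources()
print(len(repository))  # 3
```

In practice the same pattern scales up via ETL tooling and a data lake, but the principle is identical: every record lands in one place, tagged with where it came from.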

Depending on the existing technology stack of any enterprise, this could mean different things:

  • If you already have different tools and systems at play that collect, clean, and structure data, you would need to engineer them to work with a central repository.
  • Solve challenges around storing massive volumes of training and generated data, with data lakes, cloud computing, and edge computing.
  • If your disparate databases host a combination of structured and unstructured data, you would need to significantly overhaul your data collection, processing, and storage infrastructure.

Going beyond that, enterprises also have to prepare hardware and software assets that can effectively deliver AI/ML-based solutions.

3. The Resource Challenge: Skilled Teams

Finally, successful AI/ML adoption depends on having a skilled team of professionals who can work to create the right solutions. But because the explosion of practical applications of AI is only a decade old, it’s difficult to find people with the right set of skills. According to the State of Artificial Intelligence report 2017 by Teradata, 34 percent of enterprises state that lack of talent is a key barrier to AI adoption.

How to Solve

The solution here would tie in with the need for a single stakeholder to own AI/ML adoption within the enterprise. They can identify the data science and allied skills gap within the organization with reference to the goals they are trying to achieve and then work with the relevant departments to hire or build the right skills into the organization.

In the meantime, it would be best to work with technology partners who offer skilled AI/ML teams. That’s because:

  • A skilled team taking on a first AI project creates a reliable process roadmap for the internal teams to follow once they are in place.
  • Any roadblocks that crop up in the course of the project are more easily resolved by an expert team, without resorting to too much trial and error.
  • An outside team will be able to get a bird’s-eye view of the complete technology infrastructure and suggest necessary changes in one go.

Almost every enterprise, irrespective of the industry, will have AI/ML solutions as a key part of their technology landscape over the next five years. However, to ensure that the emerging tech is actually delivering on its potential, a well-planned adoption plan will be critical. This will include tying in the strategy, technology, and resource perspectives to roll out a roadmap that an enterprise will follow. The omission of any one of these aspects could delay successful AI/ML adoption and could hurt the bottom line, especially considering the fact that your competitors could be doing it better and faster.

As an Advanced Consulting Partner in the Amazon Web Services Partner Network, Srijan has certified AWS professionals with expertise in AI solutions like Lex, SageMaker, DeepLens, and more. Srijan teams are also adept at leveraging TensorFlow, Python, Hadoop, and other associated technologies to engineer niche solutions.

Ready to explore opportunities with AI and machine learning? Book a consultation with our experts.



Topics: Machine Learning & AI

DIY Bot platforms or build Bots from scratch - What to choose for Your Enterprise?

Posted by Gaurav Mishra on Jun 28, 2019 5:30:00 PM

Enterprises are constantly investing in solutions that can help scale up their operations and automate their internal as well as customer-level interactions. Deploying chatbots across different enterprise use cases - accessing data from a repository, handling customer queries, collecting feedback, booking tickets, etc. - has emerged as one of the key ways to optimize operations. It is estimated that 80% of enterprises will be using chatbots by 2020 to solve a diverse range of business challenges.

While that’s a great number, there are things you need to consider before deploying bots for your enterprise. Here’s a look:

Why Do You Need a Chatbot

Before you start with what kind of chatbot to deploy, and which platform to use, it is important to answer the first basic question: why do you need a chatbot? What is the business problem that you are trying to solve? Is it to conduct research, answer queries, give reminders, or something else? Starting with a clear definition of your business problem will give you clarity on how chatbots can solve that problem for you.

Clearly defining the 'why' will involve specifying:

  • The exact use cases of your bot. This will help define the first set of features and capabilities your bot should have
  • The users of the bot. This will help define additional features that might be valuable for the intended users. It will also help create the right conversation flow for the bot.

Once the 'why' is answered, the next question is 'how'. Based on your tech stack capabilities and the above factors, you can decide whether you want to build a DIY (do-it-yourself) drag-and-drop chatbot using any of the available bot platforms, or a customised bot from scratch.

We take a look at the two ways to build a chatbot, and which one you should choose.

Proprietary Vs Open Source Platforms

Chatbots make use of machine learning and natural language processing engines to perform enterprise tasks, and solve related business problems. While typically this would involve a skilled team of developers, there are a number of DIY chatbot platforms that are gaining popularity.

Understanding Proprietary DIY Bot Platforms 

Beginners and non-technical users can simply use platforms like Chatfuel, Motion.ai, Aivo, Botsify etc to build and deploy bots without any coding. The key aspects of machine learning and natural language processing are incorporated into the platform, and all that you have to do is create the conversation flow and the tasks that you want the bot to perform. Designing these bots is as simple as dragging and dropping from a set of pre-defined functionalities, with some scope to modify and customize them for your specific business objectives.

For example, on Chatfuel, all you need to do is write use cases and user stories, follow tutorials, and run some testing. These kinds of chatbots can be built using a drag-and-drop interface, and also integrate easily with third-party services like Salesforce, Zendesk, and WhatsApp.

Using these platforms, you can create a basic bot in minutes and then tailor it for your use case. But even with these capabilities and ease of deployment, a DIY platform may not always be the right choice for your business. Why, you ask?

DIY bot platforms come with certain challenges:

Limited Functionality: Building chatbots using these platforms means limiting your bot's capabilities to what the platform can do. There are high chances of your bot missing out on elements like self-learning, responding based on user intent, or carrying out contextual conversations.

And this can severely affect your customer experience, especially if compared to competing organizations that deploy self-learning and intelligent bots.

Limited Extensibility: Most enterprise solutions need to take into account concerns around integration, scalability, and extensibility. While your current chatbot use case might be a simple one, and adequately served by a DIY platform, is it scalable in the long run? Given that most DIY platforms offer only a specific set of functionalities, it becomes challenging to scale a DIY bot to perform tasks with greater complexity.

Compounding this is the fact that DIY platform bots also have limited integration options. In a scenario where an enterprise has used different DIY platforms to build bots for different tasks, the complete bot ecosystem becomes a jumble of different systems straining to work cohesively. Frequent integration challenges with each other as well as with the existing enterprise architecture will likely become a major drain on enterprise resources and productivity.

Building Intelligent Bots from Scratch

Companies like Google and Amazon are investing heavily to develop extraordinary capabilities in their voice assistants. Alongside, they have created products that bring powerful machine learning and NLP capabilities to developers. AWS solutions like Amazon Lex and SageMaker, along with Alexa skills, give enterprise development teams a complete toolbox to conceptualize and design bots from scratch, with a wide range of features.

What's more important is that these solutions are focused on delivering capabilities like self-learning, understanding user intent, and advanced analytics, and can even be customized for people with speech disabilities. So the level of fine-tuned customer experience you can generate with these tools if you build your bot from scratch cannot be matched by DIY bot platforms.

Yes, building a chatbot from scratch can seem like a complex and time-consuming task upfront, but the gains for your business intelligence processes, operations, and user interactions are also higher. With code-based frameworks like AWS, Wit.ai, API.ai, or Microsoft Bot Framework, a skilled team of developers can help you create a bot that's tailored to your organization’s needs. It can work across multiple platforms, solve complex use cases, generate analytics, and extend in close collaboration with your enterprise IT infrastructure.

Summing up, here's a look at the proprietary DIY bot platforms vs. building bots from scratch

DIY Bot Platforms vs Building from scratch

What Should You Choose?

Choosing between these two depends largely on your enterprise requirements, team skills, and project limitations. So if you need a chatbot for a simple task, like feedback collection or setting reminders, it might make sense to use a DIY platform. But its benefits are only short-term. In the long run, you cannot scale up your bots, support a growing range of use cases, or integrate with other platforms, and you cannot solve complex enterprise problems with it.

There are also chances that in an effort to keep all bots interoperable, you create all of them on the same platform. But then again, you get locked within a walled garden in terms of functionality and hinder the scalability of your bot ecosystem.

So if you want to ensure that your bots are future ready, and create a foundation that can scale with your enterprise requirements, it makes sense to build your bots from scratch, using an advanced set of machine learning and NLP solutions. And if you do not have a team of developers who can do that for you, you can always get in touch with qualified third party development teams.

Srijan's expert team of certified AWS engineers are working with machine learning and NLP to create interesting enterprise chatbots for diverse industry use cases. We recently built chatbots to access asset performance data for a large cleaning and hygiene solutions enterprise. AWS solutions like Amazon Lex, Amazon Cognito, AWS Lambda, AWS Translate and Amazon S3 were leveraged for the same, eventually leading the client to upsell to a business worth 90 million USD. 

Looking to develop an effective enterprise bot ecosystem? Just drop us a line and our team will get in touch.

Topics: Machine Learning & AI, Enterprises

Creating an Amazon Lex Bot with Facebook Messenger

Posted by Ishan Yadav on May 2, 2019 4:51:00 PM

Here’s a blog on how you can create an Amazon Lex bot with the Facebook messenger platform. Take a look at the steps:

1. Publish the Bot

a. In the Amazon Lex console, choose one of the bots you created.

b. Verify that the console shows $LATEST as the bot version next to the bot’s name.

c. Choose Publish.

d. On the Publish botname wizard, specify the alias BETA, and then choose Publish.

e. Verify that the Amazon Lex console shows the new version next to the bot’s name.

publish-pizza-ordering-srijan-technologies

2. Create a Facebook Application

On the Facebook developer portal, create a Facebook application and a Facebook page. For instructions, refer to the Quick Start in the Facebook Messenger platform documentation. Jot down the following:

  • The App Secret for the Facebook App

  • The Page Access Token for the Facebook page

3. Integrate Facebook Messenger with the Amazon Lex Bot 

  1. To integrate Facebook Messenger with your bot

  • Sign in to the AWS Management Console and open the Amazon Lex console at https://console.aws.amazon.com/lex/.

  • Choose your Amazon Lex bot.

  • Go to Channels.

  • Select the category Facebook under Chatbots. The console will display the Facebook integration page.

  • On the Facebook integration page, do the following:

      • Type the following name: BotFacebookAssociation.

      • For KMS key, choose aws/lex.

      • For Alias, choose the bot alias.

      • For Verify token, type a token. This can be any string that you choose (for example, ExampleToken). This token can be used later in the Facebook developer portal when you set up the webhook.

      • For Page access token, type the token that you obtained from Facebook in Step 2.

      • For App secret key, type the key that you obtained from Facebook in Step 2.

for-app-secret-key-srijan-technologies

  • Choose Activate.

  • The console creates the bot channel association and returns a callback URL. Write down this URL.

2. On the Facebook developer portal, choose your app.

3. Choose the Messenger product, and select Setup webhooks in the Webhooks section of the page.

4. For instructions, refer to the Quick Start document on the Facebook Messenger platform.

5. On the webhook page of the subscription wizard, do the following:       

  • For Callback URL, type the callback URL provided in the Amazon Lex console earlier in the procedure.

  • For Verify Token, type the same token that you used in Amazon Lex.

  • Choose Subscription Fields (messages, messaging_postbacks, and messaging_optins).

  • Verify and Save. This initiates a handshake between Facebook and Amazon Lex.

6. Enable Webhooks integration. Select the page that you created, and then subscribe.

    Note: If you update or recreate a webhook, unsubscribe and then resubscribe to the page.
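Behind the scenes, this handshake is Facebook sending a GET request with hub.mode, hub.verify_token, and hub.challenge parameters to the callback URL (which Amazon Lex handles for you in this setup). A minimal sketch of that verification logic, with a made-up token, shows what a token match means:

```python
VERIFY_TOKEN = "ExampleToken"  # must match the token entered in the Lex console

def verify_webhook(params):
    """Return (status, body) for Facebook's webhook verification GET.

    Facebook sends hub.mode, hub.verify_token, and hub.challenge;
    the endpoint must echo the challenge back only on a token match.
    """
    if (params.get("hub.mode") == "subscribe"
            and params.get("hub.verify_token") == VERIFY_TOKEN):
        return 200, params.get("hub.challenge", "")
    return 403, "Verification failed"

status, body = verify_webhook({
    "hub.mode": "subscribe",
    "hub.verify_token": "ExampleToken",
    "hub.challenge": "1158201444",
})
print(status, body)  # 200 1158201444
```

If the token you typed in Facebook differs from the one in the Lex console, this check fails and the subscription wizard reports a verification error.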

4. Take your bot live for everyone

To allow your bot to send and receive messages with everyone, you need a special permission from Facebook: “pages_messaging.”

Follow the official Facebook documentation to obtain it.

And that's how you create a chatbot with Facebook Messenger. Let me know how it goes for you, or if you would have done anything differently. I would love to hear new tricks for doing this.

Topics: AWS, Machine Learning & AI, MarTech

Amazon Lex and the possibilities it holds for Enterprises

Posted by Sanjay Rohila on Feb 28, 2019 2:52:00 PM

Amazon Lex is an AWS solution that allows developers to publish voice or chat bots for use across different mobile, web and chat platforms. It can listen, understand user intent, and respond to context. Powered by deep learning functionalities like automatic speech recognition (ASR) and natural language understanding (NLU), Lex is also the technology behind Alexa devices. Now generally available, it can be easily leveraged by enterprises to build their own digital assistants.

Amazon Lex for Enterprises

For enterprises, Lex-powered applications can become a key competitive advantage, allowing them to optimize processes and enable cost savings. A few key aspects where Amazon Lex can assist are:

Performing User-based Applications

Lex can help build bots capable of providing information, or addressing user requests and queries. It can perform tasks like ordering food, booking tickets, and accessing bank accounts.

Made possible with the help of ASR and NLU, these capabilities can help create powerful interfaces for customer-facing mobile applications. Such a voice or text chat interface on mobile devices can help users perform tasks that involve a series of steps played out in a conversational format. Further, the integration of Lex with Amazon Cognito helps developers control user management, authentication, and sync across all devices.

For example, healthcare enterprises can enable patients to schedule appointments at their facility with Lex-powered bots. The patient can send a text request via the mobile application for “an appointment on Monday”.

  • Amazon Lex will recognize that an appointment has been requested, and will ask the user for a “preferred time on Monday”.
  • The user responds with a text, say, “1 pm”.
  • Lex will reserve this appointment time for the user once the account information is retrieved.
  • It will further notify the patient that “an appointment time of 1 pm has been finalised on Monday”.

 

Similarly, tasks like opening bank accounts, ordering food, or finding the right dress at a retail store can all be accomplished via Lex-powered bots.
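The back-and-forth above can be sketched as a toy slot-filling loop that mimics what Lex does under the hood (the slot names and prompts are illustrative, not an actual Lex bot schema):

```python
# Required slots for the hypothetical appointment intent, each with
# the prompt used to elicit it when missing.
REQUIRED_SLOTS = {
    "AppointmentDay": "Which day would you like?",
    "AppointmentTime": "What is your preferred time on that day?",
}

def next_action(slots):
    """Mimic Lex's dialog management: elicit missing slots one at a
    time, and fulfil the intent once every slot has a value."""
    for name, prompt in REQUIRED_SLOTS.items():
        if not slots.get(name):
            return {"dialogState": "ElicitSlot",
                    "slotToElicit": name,
                    "message": prompt}
    return {"dialogState": "Fulfilled",
            "message": "An appointment at {AppointmentTime} has been "
                       "booked for {AppointmentDay}.".format(**slots)}

turn1 = next_action({"AppointmentDay": "Monday"})
print(turn1["slotToElicit"])   # AppointmentTime
turn2 = next_action({"AppointmentDay": "Monday", "AppointmentTime": "1 pm"})
print(turn2["dialogState"])    # Fulfilled
```

In a real deployment, Lex's NLU extracts the slot values from free-form utterances; this sketch only shows the elicit-then-fulfil control flow.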

Enabling Device Interactions

Lex also helps you build highly interactive and conversational user experiences for connected devices ranging from vehicles, to wearables, and other appliances.

For example, a wearables company can have Lex powered bots installed on its products for providing information like day, date and weather. So when the user makes a request like, “temperature in California”, Amazon Lex on the device recognizes it and responds in an appropriate manner.

  • It can further inquire, “Celsius or Fahrenheit?” 
  • And on receiving an answer “Celsius”, it will retrieve the information with the help of other AWS services involved

This ability to imbue everyday accessories with an intelligent digital assistant allows brands to always exist in their customers' immediate environment. And that means an exponential rise in brand recall and customer retention.

Enhancing Enterprise Productivity

Whether it is checking your sales data from Salesforce, marketing performance from HubSpot, or customer service status from Zendesk, you can do it all and more, directly with your chatbots. Lex enables you to build bots that connect to a variety of such enterprise productivity tools via AWS Lambda functions.

So, if an employee wants to access the “sales numbers for the month of December”, they can simply ask the bot on their system. Lex will recognize this as a request, and pull data from relevant enterprise systems like Salesforce or proprietary BI dashboards. Once the data is received, it will deliver it to the executive on their device and platform of choice.

This helps enterprises streamline their operations, and improve organizational productivity. 
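A minimal sketch of the fulfilment side as an AWS Lambda function (the event and response shapes follow Lex's Lambda contract for v1 bots; the intent name, slot, and sales lookup are hypothetical stubs standing in for a Salesforce or BI query):

```python
def lambda_handler(event, context):
    """Fulfilment handler for a hypothetical GetSalesNumbers intent.

    Lex invokes this with the current intent and slots; the data
    lookup below is a stub standing in for a real enterprise query.
    """
    slots = event["currentIntent"]["slots"]
    month = slots.get("Month", "December")

    # Hypothetical stand-in for a Salesforce or BI dashboard call.
    sales_by_month = {"December": "USD 1.2M"}
    figure = sales_by_month.get(month, "no data on record")

    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": "Sales for {}: {}".format(month, figure),
            },
        }
    }

event = {"currentIntent": {"name": "GetSalesNumbers",
                           "slots": {"Month": "December"}}}
print(lambda_handler(event, None)["dialogAction"]["fulfillmentState"])
# Fulfilled
```

The `Close` dialog action tells Lex the conversation turn is complete, and the message text is what the employee sees or hears.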

Benefits of Deploying Lex for Your Enterprise

Ease of usage: Amazon Lex lets you build your own bot in minutes, no deep learning expertise required. Once you have the basic objective of the bot mapped out, you can specify the conversation flow, and Lex will build a natural language model to ask and respond to user queries.

Seamless deployment and integration: A Lex powered bot has native interoperability with other AWS services like Cognito, Lambda, and CloudWatch. It can scale automatically, and you need not worry about provisioning hardware or managing infrastructure to power your bot experience.

High quality ASR and NLU: Lex enables your bots to understand the intent behind the input. It can then subsequently fulfil the user intent by invoking the appropriate response.

Multi-turn conversations: With the help of Lex, you can build multi-turn conversations for your bots. This means that once an intent has been identified, users will be prompted with a series of follow-up questions to extract the information needed to give the right answer. For example, if “book hotel” is the intent, the user is prompted for the location, check-in date, number of nights, etc.

Cost effectiveness: Amazon Lex has no upfront costs or minimum fees. With a pay-as-you-go model, users are charged only for the text or speech requests made. And with the Amazon Lex free tier, you can try it without any initial investment.

How Srijan can Help

AWS has a broad range of AI and Deep Learning solutions to help enterprises build and deploy intelligent products and services. But you also need a skilled team that can evaluate your business requirements, and choose the right AWS deep learning solutions that fit the bill. That’s where Srijan teams get into the game.

Srijan teams are adept at leveraging Amazon Lex to deliver a range of services:

Ready to leverage conversational interface for your enterprise? Let's brainstorm to explore where your enterprise can best leverage Lex-powered bots.

Topics: Machine Learning & AI, Enterprises

How We Built an Intelligent Automation Solution for KYC Validation

Posted by Sriram Sitaraman on Feb 15, 2019 1:52:00 PM

Financial institutions sift through a huge volume of documents as a key part of their operational processes. More importantly, the need for regulatory compliance means there is very low tolerance for error in these tasks.

However, document verification and processing for KYC validation, insurance claims, customer onboarding, etc. are time-consuming processes across enterprises. By recent estimates, the average customer onboarding time for financial institutions is 26 days. Organizations are also spending a lot on these processes, as they retain large teams to do the work manually. And scaling up operations just means employing more people.

Is there a way around these challenges?

Intelligent Automation Solution

Robotic Process Automation (RPA) has a mainstream role in automating many of the manual processes in the BFSI sector. But this particular task requires AI with advanced machine learning algorithms to understand documents in context. That is what an Intelligent Automation solution is: blending AI with automation to create solutions that can read documents, understand the content in context, and find patterns in the data.

At Srijan, we created a POC for an Intelligent Automation solution for KYC validation that automates a key portion of the process. The solution employs a deep-learning algorithm to scan documents and images uploaded by end users, and classify them into pre-programmed categories.

Here’s a look.

 

 

The solution is designed using the following technologies:

  • Convolutional Neural Network (CNN) using Python and TensorFlow
  • OpenCV for Computer Vision
  • OCR and MRZ packages

How It Works

The solution uses a combination of deep-learning based image recognition and classification models as well as Optical Character Recognition (OCR). It is capable of:

  • understanding given text or image material

  • acting upon it according to a pre-trained set of rules

Let’s say we are working with passports submitted during the KYC process. Here’s what the solution does:

  • Scanning - to extract personal details and passport expiry dates

    • “Read” the passport and extract different sections of the main page, using OCR on the text sections

    • Computer Vision solutions leveraging OpenCV are used to read the machine-readable zones in the passport

    • Deep Learning algorithms leveraging Tensorflow framework and OpenCV extract the photograph from the passport, as well as identify any “Cancellation” or other stamps

  • Compare extracted information with information available in the database, to validate submitted proof document

  • Based on the above comparison and validation, the solution can classify the document submitted, in this case the passport, as verified, expired, cancelled, or a data mismatch.

  • Cases that cannot be categorized with an appropriate degree of accuracy or confidence are marked for manual classification

  • In case of manual intervention, a workflow is created where the operations team can validate manually and classify them

  • The model learns from manual classification, and over time can spot patterns and closely mirror the manual results. This is accomplished by automated retraining of the model including the newer data and manual classification data
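The classification step at the end of this pipeline can be sketched in isolation (the field names, threshold, and categories below are illustrative; in the real solution the extracted values come from the OCR/MRZ/OpenCV stages described above):

```python
from datetime import date

CONFIDENCE_THRESHOLD = 0.85  # illustrative cutoff for auto-classification

def classify_passport(extracted, on_record, confidence,
                      today=date(2019, 2, 15)):
    """Classify an extracted passport against database records.

    `extracted` would come from the OCR/MRZ/computer-vision pipeline;
    here it is a plain dict so the decision logic can be shown alone.
    """
    if confidence < CONFIDENCE_THRESHOLD:
        return "manual_review"          # routed to the operations team
    if extracted.get("cancelled_stamp"):
        return "cancelled"
    if extracted["expiry_date"] < today:
        return "expired"
    if extracted["name"] != on_record["name"]:
        return "data_mismatch"
    return "verified"

result = classify_passport(
    {"name": "A KUMAR", "expiry_date": date(2024, 6, 1),
     "cancelled_stamp": False},
    {"name": "A KUMAR"},
    confidence=0.93,
)
print(result)  # verified
```

The manual_review branch is what feeds the human-in-the-loop workflow, whose corrections in turn become retraining data for the model.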

How This Helps

With the KYC validation solution, enterprises can automate repetitive manual processes, achieving:

  • Speed: Faster turnaround at most stages of manual processes, to solve scalability challenges and time-critical needs. For example: document verification in 1/10th of the time taken manually

  • Accuracy: Rule-based algorithms executed by software ensure a near-zero margin of error in processes

  • Efficiency: Intelligent automation means tasks are done efficiently, compliant with standard processes, and with minimal need for manual intervention. For example: reduce manual efforts for KYC verification by 70%

  • Resource Management: As repetitive processes are automated, organizations have the freedom to utilize their human resources for more value-added tasks.

Automating just a segment of the KYC validation process can bring in a host of significant benefits, as outlined above. But the solution can be extended to other BFSI operations, or even other industry use cases, to deliver similar gains:

  • Passport checks at airports

  • Processing insurance claim documents

  • Reconciling financial statements

  • Resolving credit card disputes

  • Any other manual & repetitive processes that require documents to be validated or reviewed

Have repetitive manual processes that you think can be automated? Looking to increase cost saving on operations without compromising quality and productivity?

Let’s start the conversation on how Srijan’s expert teams can help identify key opportunities to deploy intelligent automation for your business.

Topics: Machine Learning & AI, Data Engineering & Analytics

APL: Alexa Presentation Language-basics

Posted by Sanjay Rohila on Feb 4, 2019 11:32:00 AM

I have been using display templates for quite a while now on screen-enabled Alexa devices. There are a few templates and directives we can use in responses for display devices, but there is not much customization we can do with those templates and layouts.

Alexa Presentation Language (APL) is a beta feature which gives a lot more power to developers. APL is a new directive type in DisplayRenderer. To give a perspective on what we can do, below is a simple view I have created:

image showing header and footer block

A basic APL document must have the type, version, and mainTemplate properties.

{
  "type": "APL",
  "version": "1.0",
  "mainTemplate": {
    "item": {
      "type": "Text",
      "text": "Hello, world"
    }
  }
}

There are lots of components (Text, Image, Video, etc.) that we can use in APL. We can also create a Sequence for a repeating set of components. For more details, see the list of components: https://developer.amazon.com/docs/alexa-presentation-language/apl-component.html

We can define various styles for components, somewhat similar to CSS: color, font, background, etc. Have a look at https://developer.amazon.com/docs/alexa-presentation-language/apl-styled-properties.html for details about styles.

Following is the document responsible for the above view.

To use this in an Alexa response, we have to add the Alexa.Presentation.APL.RenderDocument directive and put the above code in the document property. The result would be:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>This is a minimal APL document</speak>"
    },
    "directives": [{
      "type": "Alexa.Presentation.APL.RenderDocument",
      "token": "document1",
      "document": <content of above gist>
    }]
  }
}
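For skills with a Python backend, the same response can be assembled in the Lambda handler. This is a minimal sketch, not the article's exact code; the hello-world document stands in for the gist contents referenced above:

```python
# Hypothetical minimal handler returning an APL RenderDocument directive.
apl_document = {
    "type": "APL",
    "version": "1.0",
    "mainTemplate": {"item": {"type": "Text", "text": "Hello, world"}},
}

def handler(event, context):
    # Build the standard Alexa response envelope with the APL directive attached
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": "<speak>This is a minimal APL document</speak>",
            },
            "directives": [{
                "type": "Alexa.Presentation.APL.RenderDocument",
                "token": "document1",
                "document": apl_document,
            }],
        },
    }
```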

To play around with APL, try this new tool from the Alexa team: https://developer.amazon.com/alexa/console/ask/displays

Stay tuned for a follow-up post on advanced APL features, including dataSources, videos, sequences, and HTML5 (this is going to be huge; once supported by APL, it means we can run PWA apps on Alexa devices).

Topics: AWS, Machine Learning & AI, Architecture

Solving Alexa’s Accent Understanding Challenge, Using Support Vector Machines

Posted by Chirash Rupela on Feb 1, 2019 11:48:00 AM

Alexa is great, providing amazing features to control apps and services with just your voice. But its understanding of non-American accents leaves much to be desired. Case in point: using Alexa with my Indian accent brings out some serious problems. No matter how many times I try to say “sprint”, it only understands it as “spend”.

This is terrifying for Alexa developers like me who want to use the NLP power of Alexa to build solutions that cater primarily to the Indian population. Amazon does offer Alexa skill development in ‘en-IN’, but that does not solve the problem. This major flaw in transcribing the Indian accent results in failures in the skill flow and broken conversations.

But should it be a roadblock for you to develop an Alexa skill?

No, because we found a way to solve this problem.

Devising a Solution

The solution is to use the ability to add synonyms for slot values (in custom slot types).

In any Alexa skill, you can add intents, and each intent has different slots. You can choose pre-defined AMAZON slot types for your slots, or you can create custom slot types. The difference is that when you create a custom slot type, you can add synonyms for its slot values.

Using an example from our Alexa skill -

If we added “spend” as a synonym for the “sprint” slot value, it would solve our problem. The next time Alexa hears “spend”, it would send the slot value as “sprint”, which can be passed to the Lambda function, which then gives back an appropriate response.
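To make this concrete, here is a simplified Python sketch of the slot object Alexa sends once the synonym matches, and how a handler can read back the canonical value. The field shapes follow the Alexa entity-resolution request format; the helper function and slot name are illustrative:

```python
# Simplified slot object from an Alexa request, assuming "spend" was
# added as a synonym of the "sprint" slot value (structure per Alexa
# entity resolution; trimmed for brevity).
slot = {
    "name": "choose",
    "value": "spend",  # what Alexa heard
    "resolutions": {
        "resolutionsPerAuthority": [{
            "status": {"code": "ER_SUCCESS_MATCH"},
            "values": [{"value": {"name": "sprint"}}],
        }]
    },
}

def resolved_value(slot):
    """Return the canonical slot value if entity resolution matched, else the raw value."""
    for auth in slot.get("resolutions", {}).get("resolutionsPerAuthority", []):
        if auth["status"]["code"] == "ER_SUCCESS_MATCH":
            return auth["values"][0]["value"]["name"]
    return slot["value"]

print(resolved_value(slot))  # -> sprint
```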

Quick aside: our skill is now available for beta testing, so do try it out.

manage-projects-jira-assist

This was the exact solution we were looking for.

Now we had the solution, and two ways to make it happen:

  • Manually add synonyms for each slot value based on user data and customer reviews.

  • Predict synonyms for each slot value and automatically add them once or twice a week.

    The manual additions are quite easy to do, but not a scalable option. Consider a case where you have more than 50 slot values and you want to add synonyms to each one, or most of them. Doing it manually would be tedious.

    This is the reason we went with the Predictive approach and automated the addition of slot synonyms in our skill.

 
Implementing the Solution

To automate the prediction and addition of slot synonyms, we used the following AWS resources:

  • Lambda function

  • EC2 Instance

  • S3 bucket

  • Alexa developer account

 

Now that all the resources are ready, there are three main steps in the predictive approach:

       1. Capturing words like “spend” which are poorly transcribed by Alexa 

       2. Predicting the slot value the word “spend” belongs to. 

       3. Adding the word “spend” as a synonym to the predicted slot value.

I will explain steps 1 and 3 in a while, but let’s understand step 2 as it’s the most crucial step.

Prediction requires a machine learning algorithm. In our case, we have used Support Vector Machines (SVM) to predict the slot value. It’s one of the simplest yet most accurate ML algorithms used for text classification.

SVM is a supervised ML algorithm which finds the line or hyperplane with the maximum margin from the support vectors. Say you have two classes:

a. Words similar to “sprint”

b. Words similar to “release”

Using SVM, we can find the line which clearly separates these two classes based on the available training dataset. This line lies at the maximum distance from the words on the outermost edge of each cluster, the so-called support vectors.

SVM-alexa-srijan

You can learn more about SVM here.
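As a toy illustration of this classification step, here is a sketch using scikit-learn. The word lists are made-up stand-ins for real captured utterances, and character n-grams are used so that a misheard word still shares many fragments with its class:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative training data: words similar to "sprint" vs. words similar to "release"
words = ["sprint", "sprints", "sprinted", "spring",
         "release", "released", "releases", "releasing"]
labels = ["sprint"] * 4 + ["release"] * 4

# Character n-grams (2-4) let a new, unseen word match on shared fragments
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    SVC(kernel="linear", C=1),
)
clf.fit(words, labels)

# An unseen word lands in the class whose words it most resembles
print(clf.predict(["sprinting"])[0])
```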

The Architecture

svm-alexa-architecture-srijan

Step 1

To capture poorly transcribed words such as “spend”, we use our Lambda function to read the request JSON from Alexa, store each word along with its slot name in a CSV file, and save that file to an S3 bucket.

import json

import boto3

def checkutterance(data):
    # Collect slot values Alexa could not resolve (ER_SUCCESS_NO_MATCH)
    result = []
    for k, v in data.items():
        if "resolutions" in v:
            for i in v["resolutions"]["resolutionsPerAuthority"]:
                if i["status"]["code"] == "ER_SUCCESS_NO_MATCH":
                    result.append({"slot": v["name"], "utterance": v["value"]})
    # Append the missed words to the file stored in S3
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket="BUCKET_NAME", Key="FILE_NAME")
    data = response["Body"].read().decode("utf-8")
    for j in result:
        data += json.dumps(j) + "\n"
    s3.put_object(Body=data.encode("utf-8"), Bucket="BUCKET_NAME", Key="FILE_NAME")

Step 2 

Once the missed values are stored in an S3 bucket, we use our EC2 instance to read the file.

In our case, we have scheduled a cron job to do it every day.

The script deployed on the EC2 instance is responsible for training and predicting classes using SVM. It reads the missed values from the file and predicts the class for each value. In our case, it predicts “spend” as a synonym for the slot value “sprint”.

Here, we have also set a threshold value for cases where a word matches either class with low confidence. Such values are again stored in a CSV file and mailed to us, so that we can add them to the Alexa skill manually if required.

import boto3
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf-svm', svm.SVC(C=1, kernel='rbf', degree=2, gamma='auto',
                        decision_function_shape='ovr', class_weight=None,
                        coef0=0.0, probability=True, tol=0.001,
                        max_iter=-1, verbose=False)),
])
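The threshold check can be sketched as follows. This is a self-contained toy version: the training words and the 0.65 cutoff are illustrative assumptions, not our production values:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative stand-ins for the real captured utterances
words = ["sprint", "sprints", "sprinted", "spring",
         "release", "released", "releases", "releasing"]
labels = ["sprint"] * 4 + ["release"] * 4

# probability=True enables predict_proba for the confidence check
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    SVC(kernel="linear", probability=True),
)
clf.fit(words, labels)

THRESHOLD = 0.65  # assumed cutoff, not the article's exact number

def route(word):
    """Auto-add a confident prediction as a synonym; otherwise queue for manual review."""
    probs = clf.predict_proba([word])[0]
    best = int(np.argmax(probs))
    if probs[best] >= THRESHOLD:
        return "auto", clf.classes_[best]
    return "manual", None  # goes into the CSV that gets mailed out
```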
Step 3

Once the slot value is predicted for each word, we use the ASK CLI to add the word as a synonym for the respective slot in the Interaction Model JSON of our Alexa skill.

import json
import os
import time

# Fetch the current interaction model using the ASK CLI
os.system('ask api get-model -s ALEXA_SKILL_ID -l en-IN > alexamodel.json')
time.sleep(5)

with open('alexamodel.json', 'r') as f:
    data_alexa = json.load(f)

# `sprint` and `release` hold the words predicted for each class in Step 2
for i in data_alexa["interactionModel"]["languageModel"]["types"]:
    if i["name"] == "choose":
        for j in i["values"]:
            if j["name"]["value"] == "sprint":
                synonyms = j["name"]["synonyms"]
                for s in sprint:
                    if s["utterance"] not in synonyms:
                        synonyms.append(s["utterance"])
                print("new list of synonyms", synonyms)
                j["name"]["synonyms"] = synonyms
            if j["name"]["value"] == "release":
                synonyms = j["name"]["synonyms"]
                for r in release:
                    if r["utterance"] not in synonyms:
                        synonyms.append(r["utterance"])
                print("new list of synonyms", synonyms)
                j["name"]["synonyms"] = synonyms

# Write the updated model and push it back to the skill
with open('alexa.json', 'w') as fp:
    json.dump(data_alexa, fp, ensure_ascii=False)
os.system("ask api update-model -s ALEXA_SKILL_ID -f alexa.json -l en-IN")

The Alexa skill is then rebuilt with the updated interaction model, thus automating the process of updating synonyms in our Alexa skill.

svm-updating-synonims-alexa-skills-srijan

With this, the problem of transcribing the Indian accent in our Alexa skill has been solved to some extent. We are continuously updating our training dataset to improve the accuracy of our model.

If you have any suggestions on how to improve an Alexa skill for this particular problem, do let us know in the comments section below. 

Topics: AWS, Machine Learning & AI, Architecture
