Sanjay Rohila

Recent Posts

Powering the telecom transformation with APIs

Posted by Sanjay Rohila on Aug 16, 2019 4:03:00 PM

The concept of Telco 2.0 has been around for quite some time now. It is about moving away from competing with OTT players and concentrating on forming complementary ecosystems that benefit both. Telcos began moving away from being mere carriers and becoming more visible within the value chain, essentially transforming from a unidirectional to a bi-directional business model.

This meant becoming an ‘enabler’ in the communication supply chain. From providing singular services, telcos started to facilitate an ICT and application platform that could connect diverse businesses and service providers (upstream consumers) to consumers (downstream consumers). This model again saw telcos becoming the key link for a wider variety of consumers.

Telcos are uniquely qualified to become a central broker in this exchange economy because of the amount of data and network infrastructure already at their disposal:

  • Information and Identity: The telephone number is a key piece of information that telcos own, and it is linked to a host of other personal information on users. It also becomes a key identity marker, used for two-factor authentication by most OTTs to strengthen their security.
  • Business Intelligence: Telcos have a lot of data from both mass and business users, and the tools to leverage that information to better connect service providers to consumers.  
  • Distribution: CSPs own the networks that OTTs use; the fiber, copper, coaxial and wireless broadband networks.
  • Billing and Payments: Telcos have a well established payment relationship with the subscribers, and can become a secure payment broker, offering businesses the ability to seamlessly offer services to a large user base. And for customers, it means ease of payment, either through their mobile numbers or just by adding all payments to their monthly phone bill. 
  • Customer Care: CSPs already have a well established customer care network, with access to a large user base, and the experience to handle care. This makes them uniquely positioned to offer outsourced customer care services to businesses and service providers. 

 

Now, there are several different technology and organizational aspects that need to be addressed to make the telco transformation complete. But a key peg in the whole scheme of things is APIs, especially when you consider how the whole model depends on the ability to connect and deliver information from massively disparate systems. APIs are the very instrument of transformation across technology, network, operations and business services.

We have taken a broad look at how telecom enterprises can leverage APIs to create new revenue streams. Here we try to break that down into the specific telco APIs that can be turned into new products in the communications ecosystem.

Productizing Different Types of Telecom APIs

To begin with, APIs were seen as a solution to expose different types of data and services and allow disparate systems to communicate effectively. However, there has been a recent paradigm shift in how APIs are perceived. They have moved beyond being a mere solution, and are increasingly seen as products that can be leveraged to open new revenue streams. This productization fuels design thinking and a focus on who exactly the consumers of any given API are. And when these consumers are classified, telcos get three distinct types of APIs, with differing levels of complexity and RoI.

Telecom APIs - complexity of productization

Internal APIs

The first-level telco APIs are internal, leveraged by the telco's own applications that connect the telco and its various existing services to the consumer. These are typically low-complexity API productization models that revolve around the core services. The data and information that arise from the existing infrastructure and network are bundled into API products developed solely by the telco, and made available for usage.

Some of the common API products in this category are location, messaging and identity services. And these can be leveraged by the telco’s own-brand applications and OTT services like:

General information: Applications that share general information about the telco with the consumers - plan options, mobile phone options, accessories available, store locator, ratings, and reviews. All this contributes to the bottom line by ensuring that the telco’s offerings are accessible and available to the consumers within the digital economy that they are tuned in to.

Custom information and transactions: Applications that personalize the relationship between the telco and each of its consumers. The first strata of information and functionality being made available here are custom plans and packages for the users, usage data, upgrade eligibility, checking account balance, bill payments, changing account features, account maintenance (password, address, etc.). All of this can be easily made available by pulling information from different internal telco systems via APIs. 

Leveraging the phone hardware: Using telco applications on a phone also makes it easy to leverage mobile hardware features and functionalities - GPS tracking, camera, digital wallet etc. These facilities, in conjunction with the telco’s internal APIs, can be used to provide specialized services to their users. One example is showing nearby telco store locations on the app in combination with the phone’s GPS.

Besides this, internal APIs also streamline intra-organizational access to different data systems, and make it easier and faster to build new solutions and applications. There’s an estimated 5-10% savings in terms of development effort and lower QA testing requirements with the adoption of internal APIs.

Designing and Implementing an API Monetization Roadmap for Globe Telecom - Case Study

Partner APIs 

This set of APIs is all about creating a platform that other businesses and service providers can build upon. The key idea is to have telecom APIs access data from the telco’s own and connected partner systems and make it available for other providers to leverage. Some common possibilities here are:

  • Content services – streaming media, news, stock information  
  • Online services – social media, video chat, messaging, search, shopping  
  • Technology services – hosting (i.e. cloud), caching, payments  
  • Connectivity services – support for intermittent connectivity  
  • Device integration – smart phone, wired telephone, tablet, computer, television  
  • Business services – analytics, billing, accounting, coupons 

Other businesses can build on these services with their value additions and rely on the telecom to manage the infrastructure and selected business services. 

Specific industry partners can also use telecom APIs to build B2B offerings. They can rely on telcos to offer billing or messaging service APIs to act as key aspects of their service offerings. For example - OTAs can partner with telcos to provide log in/authentication to their portals, payments, notifications, and push custom phone plans for bookings to specific destinations. 

With the rise of connected cities and virtually anything as a service, the phone number can become a singular digital identity for consumers. And almost any service provider, in any industry, can partner with telcos to offer their services, whether that’s by leveraging the data telcos have, or the network infrastructure, or even the phone hardware. All this is possible if telcos focus on identifying the emerging opportunities and building the right API ecosystem.

In all these cases, the API products being designed are typically medium complexity and offered with a degree of federated services. The monetization opportunities here are largely volumetric: third-party service providers are charged according to the volume of API calls made in any given period.
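As a rough illustration of volumetric charging, the sketch below bills a partner for the API calls made in one billing period using tiered per-call rates. The tier sizes, the rates, and the function name are assumptions invented for this example; they are not figures from any telco.

```python
from decimal import Decimal

# Hypothetical pricing tiers: (calls covered by this tier, price per call).
# A size of None means the tier has no upper limit.
TIERS = [
    (10_000, Decimal("0.010")),  # first 10,000 calls
    (90_000, Decimal("0.005")),  # next 90,000 calls
    (None, Decimal("0.002")),    # everything beyond 100,000 calls
]


def volumetric_charge(calls: int) -> Decimal:
    """Return the charge for `calls` API calls in one billing period."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    remaining = calls
    total = Decimal("0")
    for size, rate in TIERS:
        # Bill as many calls as fit in this tier, then move to the next one.
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * rate
        remaining -= in_tier
        if remaining == 0:
            break
    return total
```

Decimal is used instead of float so that per-call rates add up without rounding drift on large volumes.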

Partner APIs also make it easier to onboard new partners onto the telco’s ecosystem and ensure quick time-to-market for new partner services. An estimated 5-20% savings accrue from partner APIs owing to the consistent and self-service processes, reduced onboarding time and improved partner experience.

Public APIs

Several of the APIs used for internal and partner platforms can be opened to the public for greater use by third-party developers. This allows telcos to further monetize their data and network, while leveraging new revenue streams. 

For example, open access to telco offers and plans data can be used by comparison applications and help drive more subscribers to the telco. Opening up APIs that securely expose users’ phone numbers and messaging can allow telcos to charge all third-party applications that use two-factor authentication via OTPs.

Several similar revenue options can open up for telcos with public APIs. However, in most cases, a single telco by itself may not have enough scale or access to data to become a truly game-changing resource for service providers. These are also high-complexity API products that could benefit from co-development models with a greater resource pool. And in such cases it makes sense for CSPs to come together on a federated API platform.

This is when several different telcos join together to offer a host of different APIs and share the cost of development and deployment. The federated platform, in turn, supports each participating telco with delivering their niche digital services, managing the full partner ecosystem, and providing common business processes as required. The federated platform also brings together the combined data of all telcos to offer more well-rounded and complete data sets for use by service providers, and manages revenue sharing for the participants.

Most telcos today are at the initial levels of API maturity, with a broad vision and some basic APIs at play. Some have an API platform, exposing internal APIs like messaging or payment for third-party providers. But the challenge is to scale their API program, unify different development tracks, and move them to a federated API platform. For this, telcos need to look for experienced teams that can take them from where they are to where they want to be.

Srijan is working with leading telecom enterprises in the US, Europe and APAC regions. We are aiding these enterprises’ digital transformation journeys, leveraging data science and analytics, APIs, AI, machine learning, and chatbots to create tailor-made business solutions.

Rolling out your telecom API initiative? Talk to our experts and let's explore how Srijan can help with API productization, monetization and governance. 

Topics: Telecom, API Management, Digital Experience

Leveraging AWS Solutions to solve High-Value Enterprise Challenges

Posted by Sanjay Rohila on Mar 29, 2019 4:31:00 PM

The AWS ecosystem is an invaluable asset for enterprises driving their digital transformation. While the AWS Cloud infrastructure is powering a huge slice of enterprises, there are several other AWS solutions, especially in the realm of computation and machine learning, that are enabling enterprises to leverage emerging technologies.

Here’s a look at some interesting projects and PoCs that Srijan has delivered for enterprise clients, using AWS solutions.

Chatbots powered by Amazon Lex and AWS Lambda

As a leading provider of intelligent cleaning solutions, the client wanted to be able to analyze and optimize the performance of their products. They had a set of data visualization dashboards that track this data in real time. However, these were not easily accessible and involved some effort before stakeholders could extract relevant insights.

The solution was to build enterprise chatbots that could deliver the same insights without taking up too much time or effort on the part of the client stakeholders. They could just type their query into the chatbot, and receive an appropriate response.

Srijan leveraged Amazon Lex as the core conversational interface framework to design the chatbot. Lex’s deep learning functionalities enabled the chatbot to identify the intent behind a particular question, understand the context, and give back an appropriate response.

The other key solution used was AWS Lambda, which handled the backend extraction of data from the client databases, and the computation to generate the correct response. The business logic defined atop Lambda determined how the raw data from various sources would be interpreted and presented to the user as a final answer.
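To make the Lex and Lambda split more concrete, here is a minimal sketch of a Lambda fulfillment function, assuming the Lex (V1) event and response format. The intent name `GetMachineUsage`, the slot `MachineId`, and the in-memory sample data are hypothetical stand-ins; the client's actual intents, databases and business logic are not shown in this post.

```python
def lookup_metric(machine_id):
    """Stand-in for a real database query (hypothetical sample data)."""
    sample = {"M-100": 42.5, "M-200": 37.0}
    return sample[machine_id]  # raises KeyError for unknown machines


def close(message, fulfilled=True):
    """Build a Lex (V1) 'Close' response that ends the conversation turn."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled" if fulfilled else "Failed",
            "message": {"contentType": "PlainText", "content": message},
        }
    }


def lambda_handler(event, context):
    """Entry point invoked by Lex once an intent is ready for fulfillment."""
    intent = event["currentIntent"]
    if intent["name"] != "GetMachineUsage":
        return close("Sorry, I can't help with that yet.", fulfilled=False)
    machine_id = intent["slots"].get("MachineId")
    try:
        hours = lookup_metric(machine_id)
    except KeyError:
        return close(f"I couldn't find machine {machine_id}.", fulfilled=False)
    return close(f"Machine {machine_id} ran for {hours} hours this week.")
```

Lex handles intent recognition and slot filling; the function only turns the recognized intent into a data lookup and a reply.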

Other AWS services used were:

  • Amazon Cognito for user authentication
  • Amazon Translate to ensure the chatbot could be used by client stakeholders in any location
  • Amazon S3 to store relevant asset images and performance graphs that could be accessed solely by the chatbot users.

 

Video Analytics powered by Amazon SageMaker

The cleaning solutions enterprise was also receiving increasing complaints about their floor cleaning machines not performing as expected. The client wanted to have detailed logs of machine performance across all locations, to validate or refute these customer claims, and prevent unwarranted expenditure on recalls and repairs.

Srijan proposed a video analytics algorithm capable of identifying the machine and verifying its performance at given locations. The approach was focussed on recording real-time footage of the machines operating at different customer locations and then automatically analyzing the video feed to identify and verify if the machines are performing as expected.

This was achieved with a deep learning model designed to analyze video feed data. The key objective of the model, built on a convolutional neural network, was to accurately identify the machine in a video stream at 5-second intervals. These sightings were then timestamped and put together in a JSON file. This created a continuous log of whether a machine was working or not in any given location.
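The logging step can be sketched as follows. The detection flags would come from the trained model, which is not shown here; the field names `timestamp` and `machine_detected` and the function name are assumptions made for this illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def build_sighting_log(detections, start, interval_seconds=5):
    """Turn per-sample detection flags into timestamped log entries.

    `detections` holds one boolean per sampled frame (True means the model
    saw the machine); samples are assumed to be `interval_seconds` apart.
    """
    log = []
    for i, seen in enumerate(detections):
        ts = start + timedelta(seconds=i * interval_seconds)
        log.append({"timestamp": ts.isoformat(), "machine_detected": bool(seen)})
    return log


# Example: three samples taken 5 seconds apart, starting at 09:00 UTC.
start = datetime(2019, 3, 1, 9, 0, 0, tzinfo=timezone.utc)
log = build_sighting_log([True, True, False], start)
print(json.dumps(log, indent=2))
```

A gap in the detections (a run of False entries) is what would flag a machine as not operating during that window.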

Amazon SageMaker was the core solution used for this model. As a managed platform, it allowed for:

  • Creating the deep learning algorithm, with TensorFlow
  • Data augmentation and training the algorithm to accurately recognize the machines in a video stream
  • Quick and efficient scaling of training data to create a more accurate machine learning model

 

Once the model was in place, Srijan used Amazon S3 and AWS Lambda to create the workflow for collecting video feed from various customer locations, analyzing them, and creating detailed logs of machine performance.

Enterprise Data Analytics Platform with AWS Lambda

OnCorps offers PaaS for insight into enterprise data, to make better decisions using predictive analytics, machine learning and peer comparison. They wanted to create a platform that can do a lot of the heavy lifting when it came to data - right from gathering, to processing, to analytics and visualization.

While the platform was built on Drupal, Srijan leveraged a host of AWS solutions to deliver some powerful functionalities:

Amazon EC2: This offered an easily scalable and cost-effective computation solution. It gave the ability to run data analysis, compute workloads to aggregate data, as well as deliver predictive insight.

AWS Lambda: The frontend interface of the platform needed structured data to work with, preferably in JSON format. Lambda was used to transform the data coming in from various sources into a standard format.

Amazon S3: This was used to host the single page application built on AngularJS. S3 was also used as storage for all files and content assets for the platform.

AWS Cost Explorer: One of the Srijan team’s primary objectives was to keep product development costs on track. AWS Cost Explorer was used to get a clear visualization of operation costs across all solutions, and optimize the budget as much as possible.
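The Lambda transformation step mentioned above might look roughly like the sketch below, which maps records from differently shaped sources onto one standard JSON structure. The standard field names, the alias table, and the `{"records": [...]}` event shape are all assumptions for illustration, not the platform's real schema.

```python
import json

# Hypothetical mapping from source-specific field names to the standard ones.
FIELD_ALIASES = {
    "id": ("id", "ID", "record_id"),
    "name": ("name", "Name", "title"),
    "value": ("value", "Value", "amount"),
}


def normalize(record):
    """Map one source record onto the standard field names."""
    out = {}
    for field, aliases in FIELD_ALIASES.items():
        # Take the first alias present in the record, or None if none match.
        out[field] = next((record[a] for a in aliases if a in record), None)
    return out


def lambda_handler(event, context):
    """Lambda entry point: expects {"records": [...]} and returns JSON text."""
    records = [normalize(r) for r in event.get("records", [])]
    return {"statusCode": 200, "body": json.dumps({"records": records})}
```

Keeping the alias table as data means a new source can be supported by adding field names rather than new code paths.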

With these solutions in place, OnCorps was able to roll out a scalable platform with >99% performance reliability across enterprise customers.

Cloud Native Storage with Amazon S3

TheRecordExchange (TRX) offers a SaaS-based workflow management application to capture and access recordings of legal proceedings. Their business model is dependent upon frictionless uploading and downloading of text and media files. The application frontend is designed so that transcription agents can easily upload files of legal proceedings, and customers can download them at will.

Given this, they needed a cloud-based server that could efficiently manage all file-related requests on the applications, and robustly render them on the frontend.

With Amazon S3, an object storage solution, Srijan was able to deliver flexible cloud-native storage for TRX. S3 enabled:

  • Addition of any number of files to the application, without worrying about capacity constraints. Since the application layer didn't have to handle file processing, it was lighter and delivered a better user experience.
  • Dynamic spacing, which allowed TRX to scale up or scale down space usage as and when required. With no minimum usage requirements and availability of on-demand usage, S3 proved to be a highly cost-effective solution for the client.

Srijan is an Advanced Consulting Partner for Amazon Web Services (AWS). It is currently working with enterprises across media, travel, retail, technology and telecom to drive their digital transformation, leveraging a host of AWS solutions.

Looking for an experienced AWS certified team to aid your digital growth strategy? Just drop us a line and our team will get in touch.

Topics: AWS, Cloud, Enterprises

Amazon Lex and the possibilities it holds for Enterprises

Posted by Sanjay Rohila on Feb 28, 2019 2:52:00 PM

Amazon Lex is an AWS solution that allows developers to publish voice or chat bots for use across different mobile, web and chat platforms. It can listen, understand user intent, and respond to context. Powered by deep learning functionalities like automatic speech recognition (ASR) and natural language understanding (NLU), Lex is also the technology behind Alexa devices. Available now in the open, it can be easily leveraged by enterprises to build their own digital assistants.

Amazon Lex for Enterprises

For enterprises, Lex-powered applications can become a key competitive advantage, allowing them to optimize processes and enable cost savings. A few key aspects where Amazon Lex can assist are:

Performing User-based Applications

Lex can help build bots capable of providing information, or addressing user requests and queries. It can handle tasks like ordering food, booking tickets, and accessing a bank account.

Made possible with the help of ASR and NLU, these capabilities can help create powerful interfaces for customer-facing mobile applications. Such a voice or text chat interface on mobile devices can help users perform tasks that involve a series of steps played out in a conversational format. Further, the integration of Lex with Amazon Cognito helps developers control user management, authentication, and sync across all devices.

For example, healthcare enterprises can enable patients to schedule appointments at their facility with Lex-powered bots. The patient can send a text request via their mobile application for “an appointment on Monday”.

  • Amazon Lex will recognize that an appointment has been requested, and will ask the user for a “preferred time on Monday”.
  • The user responds with a text, say, “1 pm”.
  • Lex will reserve this appointment time for the user once the account information is retrieved.
  • It will further notify the patient that “an appointment time of 1 pm has been finalised on Monday”.
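The appointment exchange above can be sketched as a Lambda code hook, assuming the Lex (V1) event and response format. The slot names `Day` and `Time` are hypothetical, and a real bot would also look up the patient's account before confirming. Lex can elicit missing slots by itself from prompts configured on the intent; a hook like this is only needed when the follow-up question has to be customized.

```python
def plain(text):
    """Wrap text in the message structure Lex (V1) expects."""
    return {"contentType": "PlainText", "content": text}


def lambda_handler(event, context):
    intent = event["currentIntent"]
    slots = intent["slots"]
    day, time = slots.get("Day"), slots.get("Time")

    if day and not time:
        # The day is known but the time is not: ask the follow-up question.
        return {
            "dialogAction": {
                "type": "ElicitSlot",
                "intentName": intent["name"],
                "slots": slots,
                "slotToElicit": "Time",
                "message": plain(f"What is your preferred time on {day}?"),
            }
        }
    if day and time:
        # Both values are present: confirm and end the conversation.
        return {
            "dialogAction": {
                "type": "Close",
                "fulfillmentState": "Fulfilled",
                "message": plain(
                    f"An appointment time of {time} has been finalised on {day}."
                ),
            }
        }
    # Nothing usable yet: let Lex decide the next step from the bot config.
    return {"dialogAction": {"type": "Delegate", "slots": slots}}
```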

 

Similarly, tasks like opening bank accounts, ordering food, or finding the right dress at a retail store can all be accomplished via Lex-powered bots.

Enabling Device Interactions

Lex also helps you build highly interactive and conversational user experiences for connected devices ranging from vehicles, to wearables, and other appliances.

For example, a wearables company can have Lex powered bots installed on its products for providing information like day, date and weather. So when the user makes a request like, “temperature in California”, Amazon Lex on the device recognizes it and responds in an appropriate manner.

  • It can further inquire, “Celsius or Fahrenheit?” 
  • And on receiving an answer “Celsius”, it will retrieve the information with the help of other AWS services involved

This ability to imbue everyday accessories with an intelligent digital assistant allows brands to always exist in their customers’ immediate environment. And that means an exponential rise in brand recall and customer retention.

Enhancing Enterprise Productivity

Whether it is checking your sales data from Salesforce, marketing performance from HubSpot, or customer service status from Zendesk, you can do it all and more, directly with your chatbots. Lex enables you to build bots that connect to a variety of such enterprise productivity tools via AWS Lambda functions.

So, if an employee wants to access the “sales numbers for the month of December”, they can simply ask the bot on their system. Lex will recognize this as a request, and pull data from relevant enterprise systems like Salesforce or proprietary BI dashboards. Once the data is received, it will deliver it to the employee on their device and platform of choice.

This helps enterprises streamline their operations, and improve organizational productivity. 

Benefits of Deploying Lex for Your Enterprise

Ease of usage: Amazon Lex lets you build your own bot in minutes, no deep learning expertise required. Once you have the basic objective of the bot mapped out, you can specify the conversation flow, and Lex will build a natural language model to ask and respond to user queries.

Seamless deployment and integration: A Lex powered bot has native interoperability with other AWS services like Cognito, Lambda, and CloudWatch. It can scale automatically, and you need not worry about provisioning hardware or managing infrastructure to power your bot experience.

High quality ASR and NLU: Lex enables your bots to understand the intent behind the input. It can then subsequently fulfil the user intent by invoking the appropriate response.

Multi-turn conversations: With the help of Lex, you can build multi-turn conversations for your bots. This means that once an intent has been identified, users are prompted with a series of follow-up questions to gather the information needed for giving the right answer. For example, if “book hotel” is the intent, the user is prompted for the location, check-in date, number of nights, etc.

Cost effectiveness: Amazon Lex has no upfront costs or minimum fees. With a pay-as-you-go model, users are charged only for the text or speech requests made. And with the Amazon Lex free tier, you can try it without any initial investment.

How Srijan can Help

AWS has a broad range of AI and Deep Learning solutions to help enterprises build and deploy intelligent products and services. But you also need a skilled team that can evaluate your business requirements, and choose the right AWS deep learning solutions that fit the bill. That’s where Srijan teams get into the game.

Srijan teams are adept at leveraging Amazon Lex to deliver a range of services:

Ready to leverage conversational interface for your enterprise? Let's brainstorm to explore where your enterprise can best leverage Lex-powered bots.

Topics: Machine Learning & AI, Enterprises

APL: Alexa Presentation Language-basics

Posted by Sanjay Rohila on Feb 4, 2019 11:32:00 AM

I have been using display templates for quite a while now for screen-enabled Alexa devices. There are a few templates and directives we can use in responses for display devices, but there is not much customization we can do to the templates and layout.

Alexa Presentation Language (APL) is a beta feature which gives a lot more power to developers. APL is a new directive type in DisplayRenderer. To give a perspective on what we can do, below is a simple view I have created (the code responsible for this view is given below):

image showing header and footer block

A basic APL document must have the type, version, and mainTemplate.

{
  "type": "APL",
  "version": "1.0",
  "mainTemplate": {
    "item": {
      "type": "Text",
      "text": "Hello, world"
    }
  }
}

There are lots of components (Text, Image, Video etc.) that we can use in APL. We can also create a sequence for a repeating set of components. For more details see the list of components: https://developer.amazon.com/docs/alexa-presentation-language/apl-component.html

We can define various styles for components, somewhat similar to CSS — color, font, background etc. Have a look at https://developer.amazon.com/docs/alexa-presentation-language/apl-styled-properties.html for details about styles.

Following is the document which is responsible for the above view. 

To use this in an Alexa response, we have to add an Alexa.Presentation.APL.RenderDocument directive and put the above code in its document property. The result would be:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>This is a minimal APL document</speak>"
    },
    "directives": [{
      "type": "Alexa.Presentation.APL.RenderDocument",
      "token": "document1",
      "document": <content of above gist>
    }]
  }
}
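If the skill backend is written in Python, the same response can be assembled programmatically. The sketch below is only an illustration: it uses the minimal "Hello, world" document from earlier in this post rather than the gist, and the helper name `build_response` is made up for this example.

```python
import json

# The minimal APL document shown earlier in this post.
apl_document = {
    "type": "APL",
    "version": "1.0",
    "mainTemplate": {"item": {"type": "Text", "text": "Hello, world"}},
}


def build_response(speech, document, token="document1"):
    """Wrap an APL document in a RenderDocument directive."""
    # A basic APL document must have these three top-level properties.
    missing = {"type", "version", "mainTemplate"} - set(document)
    if missing:
        raise ValueError(f"APL document is missing: {sorted(missing)}")
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{speech}</speak>"},
            "directives": [
                {
                    "type": "Alexa.Presentation.APL.RenderDocument",
                    "token": token,
                    "document": document,
                }
            ],
        },
    }


response = build_response("This is a minimal APL document", apl_document)
print(json.dumps(response, indent=2))
```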

To play around with APL, try this new tool from the Alexa team - https://developer.amazon.com/alexa/console/ask/displays

Stay tuned for a follow-up post where we are going to talk about advanced APL features including dataSources, videos, sequences, and HTML5 (this is gonna be huge once supported by APL; it means we can run PWAs on Alexa devices).

Topics: AWS, Machine Learning & AI, Architecture

Polly Voice-over for Web Recordings [Code]

Posted by Sanjay Rohila on Jan 16, 2019 3:05:00 PM

When we are recording a demo, sometimes we want to show textual information about what the user is doing or what is happening in the background. But it would be more useful if we could add a voice-over for that textual information.

One way is for the person recording the video to do the voice-over, but that does not sound like a professional voice-over. So the question was: can we use Polly to do this? In the end, it's text-to-speech, which is what Polly is built for. It turned out we can very well do that.

I have written a small script which uses the AWS JavaScript SDK and takes a Cognito pool ID, and then provides an API function which can be used to add text as a voice-over on any event or after a delayed timeout.

This script also has some configuration, so you can change the color of the caption (subtitle) we are adding and can change the voice to different Polly lexicons (Read more here)
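The script itself is JavaScript and runs in the browser; the sketch below is not that script, only a rough Python analogue of the same idea: turn each caption into speech with Polly's `synthesize_speech` call and keep it alongside the delay at which it should play. The function names and the cue structure are assumptions made for this illustration, and the Polly client is passed in so the logic can be tried without AWS credentials.

```python
def synthesize_voice_over(text, polly_client, voice_id="Joanna"):
    """Return MP3 bytes for `text` using an Amazon Polly client."""
    response = polly_client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    return response["AudioStream"].read()


def build_cues(captions, polly_client, voice_id="Joanna"):
    """Build voice-over cues from (delay_seconds, text) pairs.

    Cues are returned in playback order, each with its caption text and
    the synthesized audio.
    """
    cues = []
    for delay, text in sorted(captions, key=lambda c: c[0]):
        cues.append(
            {
                "delay": delay,
                "caption": text,
                "audio": synthesize_voice_over(text, polly_client, voice_id),
            }
        )
    return cues
```

With boto3 installed and credentials configured, the client would be created with `boto3.client("polly")` and passed as `polly_client`.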

 

Let's see the output of a demo (of the script itself) I have recorded:

polly-voice-over-demo

 

Now you may wish you could do these voice-overs in different languages, with a fluent accent in each of them. Worry not, this script has that part too: you can also use translation with voice-overs. Here is the video:

 

voice-over-with-translation

 

Go ahead, have a look at https://github.com/crazyrohila/polly-voice-over and use it, or tweak it as desired.

Topics: AWS, Architecture

How Telecom Enterprises can leverage APIs to create new revenue streams

Posted by Sanjay Rohila on Dec 28, 2018 12:39:00 PM

The telecom industry is said to be approaching a tipping point. While revenues might not exactly be falling, there is a definite slow down in growth. With little focus on introduction of new services, most communication service providers (CSPs) are competing over the same existing market, and experiencing high customer churn rates.

Current Challenges in Telecom

At the heart of this crawling growth are a few key challenges that telecom enterprises have to address:

Staying Relevant to the Customer

Everything that one could do over the phone can be done over the Internet today. Messaging, voice, and video calls are all being offered by a host of applications. OTT services have become the primary point of connection for consumers when it comes to communication; be it messaging, video calling, and increasingly voice calling as well. And they appropriate a large share of the revenues generated as well.

OTT revenue cannibalization - Telecom API blog

Source: www.mckinsey.com quoting Ovum; McKinsey analysis

In the absence of any new value-added services, telecom operators have become mere connectivity providers. And even if they focus on improving connectivity, say with the introduction of 5G, customers are likely to take it for granted.

The bottom line is customers hardly recognize the relevance of the telecom provider in their communications. One telecom provider is as good as the next one, because everything they care about is actually delivered by OTT services.

Lack of New Business Models

In order to be noticed by the customers, telecom operators have to develop and deliver new services. And that is proving to be an uphill battle because they are hardwired to build networks.

Within the existing structure and leadership, the dominant response is to drive down cost on network operation to remain profitable. And while that is a way to go, it’s not enough. Telecoms have access to a lot of network assets and data that could potentially give rise to new services. But change is always resisted, and hence CSPs have not been able to develop new business models for revenue.

Slow Technology Adoption

Along with a change in the business strategy, telecom enterprises will have to optimize operations to even begin to deliver new services. They will also need new technology expertise to create and deliver value-added services. But once again, this is an aspect that CSPs have been slow to adopt.

How Can APIs help Telecom Operators

Telecom APIs can be a key piece for telecom operators to transform their value propositions.

Developing New Services

API gateways are the most efficient way to access and use information assets stored across legacy systems. Telecom operators already have huge amounts of user data, and APIs are the most secure way to expose this data for developing new services at scale. This would be true for both B2C and B2B services.

Hybrid B2C Services: APIs give telecom operators the opportunity to share information assets with other third-party service providers. This could lead to the development of hybrid services that can compete with existing OTT services.

A key example of this would be customer authentication services. Since telecom operators offer connectivity to all manner of services, they can provide unified access to all of these services. A one-step login to all applications would be a major convenience for consumers, and also bolster data security. And this would strengthen telecom’s collaboration with other service providers.

Value-added B2B Services: B2B enterprises present a highly profitable market for CSPs because:

  • Enterprises are willing to pay more for value-added services
  • Lower chances of being usurped by OTT service providers, at least in the near future

 

Exposing APIs is one of the most convenient ways to productize existing telecom assets, and create custom solutions for B2B enterprises. The opportunities here could be in the form of:

New Solution Bundles: Given that CSPs already own and operate widespread networks, they can offer new solution offerings that provide the usual gateways for voice and video calls, messaging, bundled with payment gateways and location services enabled by APIs.

Data as a Service: A host of enterprise applications rely on communication databases to deliver services. CSPs have access to vast amounts of user data, which can be made available to enterprises via APIs, as they build new customer solutions. Telecom APIs can monetize this data sharing to add new revenue streams. They can also create APIs that allow the CSP’s billing systems to be used by enterprises, charging a transaction fee for the convenience.

Monetizing Network Assets

Telecom operators have a set of core assets that are being utilized by OTT services. APIs offer a way to commoditize these network assets and create new revenue models:

Flexible Charging Models for Network Usage by OTTs

The network established by CSPs is their biggest asset, and the key to OTT services being able to deliver value to customers. APIs can help efficiently monetize this asset and create variable charging models for different types of OTT services. They can be charged on the basis of volume of usage, number of transactions, or other custom models as applicable; and this is managed by telecom APIs.

IoT Ecosystems and Edge Augmentation

With the rise in interconnected device ecosystems, telecom operators have a huge market just waiting to be leveraged. There are B2C categories like GPS and other telematics devices, and also B2B use cases where machine-to-machine communications, both wired and wireless, are witnessing large-scale adoption. Since all of this data is transmitted over carrier networks, CSPs can create new pricing models for network usage.

Telecom operators can offer computing capabilities closer to the source of data generation, at the edge of the networks. Telecom APIs can be the key to transferring IoT data to computing applications and to the end user.

Srijan is working with leading telecom enterprises across the US, Europe, and the APAC region to drive their digital transformation via successful API management, monetization, and governance. Let's get the conversation started on how Srijan teams can help leverage APIs for your enterprise.

Topics: Telecom, API Management

Alexa skill  - Translation and Polly Voices (Lexicons)

Posted by Sanjay Rohila on Dec 11, 2018 3:28:00 PM

Managing translation in Alexa seems a bit easier than managing it in Lex. Alexa has language settings where we can have the same model in different languages. We can have a different awake word and different utterances for different languages. And the best part is that, based on the active/selected language, Alexa automatically selects the Polly voice.

Language settings and awake word

Language settings and a different awake word for different language models.

From the language selection settings above, we can create a different model for each language. To clone a model from one language to another, just use the JSON Editor and change the utterances for the target language.

[Screenshot: Alexa language settings]

[Screenshots: Alexa Spanish language awake word; Alexa English language awake word]


For example, I have an intent called book_car. I have English utterances such as:

"I want to book a car"
"book a car"

And in the Spanish model I have different utterances for the same intent:

"Quiero reservar un coche para"
"reservar un coche"

Alexa includes a locale parameter in the request that tells us which language the request is coming from. Based on that, we can translate our response, and the Polly voice will automatically handle the accent and pronunciation. As illustrated in the blog about a single controller for Alexa and Lex, when we are using one controller, it might be a good idea to leave translation to the big translation players and manage content in English only. So we keep only English responses in our lambda and translate at runtime based on the locale in the request.
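A minimal sketch of that runtime translation could look like the following. The function names and the translate callable are hypothetical; any translation backend could be plugged in (for example, Amazon Translate via boto3's translate_text), which is an assumption and not something the skill described here is confirmed to use.

```python
from typing import Callable, Dict


def language_from_locale(locale: str) -> str:
    """Return the language part of an Alexa locale, e.g. 'es-ES' -> 'es'."""
    return locale.split("-")[0].lower()


def build_localized_response(
    event: Dict,
    english_text: str,
    translate: Callable[[str, str], str],
) -> Dict:
    """Translate the English response at runtime based on request.locale.

    `translate(text, target_language)` is a hypothetical callable supplied
    by the caller; it is not part of the Alexa Skills Kit.
    """
    locale = event["request"].get("locale", "en-US")
    language = language_from_locale(locale)
    # English content is served as-is; other languages go through the translator.
    text = english_text if language == "en" else translate(english_text, language)
    return {
        "version": "1.0",
        "response": {"outputSpeech": {"type": "PlainText", "text": text}},
    }


# Usage with a stand-in translator (a real one would call a translation service).
fake_translate = lambda text, lang: {"es": "Reserva confirmada"}.get(lang, text)
spanish = build_localized_response(
    {"request": {"locale": "es-ES"}}, "Booking confirmed", fake_translate
)
```

Keeping the translator as a parameter means the lambda itself only ever holds English content, which matches the single-controller idea above.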

Controlling Polly Voice in Response

Above was an example where Alexa handles Polly voices for us, based on the language selected in the skill. But we can also control the voice ourselves by using SSML in outputSpeech. Let's say we have an introduction tour in our skill. Instead of explaining everything in a plain voice, we can use conversation-style responses and improve the user experience. Below is an example use case:

User: "Alexa, Ask my assistant how does it work?"
Alexa: "Well, you can rent a car with my assistant. This is how you can do so:
<Brian>: 'Alexa, launch my assistant'.
<Amy>: 'Hi, welcome to my assistant. How can I help you?'
<Brian>: 'Rent a car'.
<Amy>: 'Sure, How many days you want to rent it for?'
<Brian>: 'Today only'.
<Amy>: 'Great!, In a bit rental service will call you and get it delivered to you.'
"

To achieve something like this, we have to use SSML response:
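A minimal sketch, assuming the SSML voice tag with the Polly voice names Brian and Amy from the conversation above. The helper names conversation_ssml and ssml_response are hypothetical and only illustrate how such a response could be assembled.

```python
from xml.sax.saxutils import escape


def conversation_ssml(intro, turns):
    """Build SSML in which each (voice, line) turn is spoken by a named voice."""
    parts = [escape(intro)]
    for voice, line in turns:
        # escape() protects &, < and > so the spoken text cannot break the markup.
        parts.append(f'<voice name="{voice}">{escape(line)}</voice>')
    return "<speak>" + " ".join(parts) + "</speak>"


def ssml_response(ssml):
    """Wrap an SSML string in a voice-only Alexa response."""
    return {
        "version": "1.0",
        "response": {"outputSpeech": {"type": "SSML", "ssml": ssml}},
    }


tour = conversation_ssml(
    "Well, you can rent a car with my assistant. This is how you can do so:",
    [
        ("Brian", "Alexa, launch my assistant."),
        ("Amy", "Hi, welcome to my assistant. How can I help you?"),
        ("Brian", "Rent a car."),
        ("Amy", "Sure, how many days do you want to rent it for?"),
        ("Brian", "Today only."),
        ("Amy", "Great! In a bit, the rental service will call you and get it delivered to you."),
    ],
)
response = ssml_response(tour)
```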

This gives a bit more idea about how a conversation should go.

Alexa Skills Kit Sound Library (Fun Part)

The sound library provides a wide range of cool sounds which we can use in our SSML messages, so we don't have to manage mp3 files anywhere. Here is a fun intent I have with sounds:

order_status
"<speak>On the way <audio src='soundbank://soundlibrary/transportation/amzn_sfx_motorcycle_engine_idle_01'/></speak>"

Topics: AWS, Architecture

Lex - Lambda policy limitation [Solved]

Posted by Sanjay Rohila on Dec 11, 2018 3:19:00 PM
Error: Maximum policy size of xxx bytes exceeded for Lambda xxx.

We can assign a lambda function to Lex intents. This gives a lot of power to our bot, since we can do many things with lambda. But there is a problem: when we assign a lambda to an intent, Lex asks for invocation permission, and when we grant it, a statement is added to the lambda's function policy. This works fine until we have so many intents that we exceed the size limit of the policy document. Once the function policy document is at its limit, we can't assign that lambda to any more intents.

Debug:

The problem is that it's not easy to find where these policy statements go each time we grant permission for an intent. They are hidden under a small button in the console (screenshots below). This button is only visible if you have access to the lambda:GetPolicy action.

 

[Screenshots: view permissions; Lambda function policy]

Solution:

The solution is to add a single permission for all intents instead of adding a permission for every intent individually. That reduces the policy document size and we can live peacefully. But the console only lets us view the function policy document (if we have the lambda:GetPolicy permission); we can't modify it from the console interface. We have to do it via the APIs, using either the command-line interface or the SDKs. The API that does this is lambda:AddPermission. If you are a Python expert, just use the boto3 API and call add_permission (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/lambda.html#Lambda.Client.add_permission)
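As a rough sketch of that approach with boto3: the helper names below are hypothetical, the account ID is a placeholder, and removing the old per-intent statements with remove_permission is an assumption about how one might free up space in the policy. The client is passed in, so the code can be tried against a stub before touching a real function.

```python
def wildcard_permission(function_name, region, account_id, statement_id):
    """Keyword arguments for Lambda's add_permission covering every Lex intent."""
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        "Principal": "lex.amazonaws.com",
        # One wildcard ARN replaces the per-intent SourceArn conditions.
        "SourceArn": f"arn:aws:lex:{region}:{account_id}:intent:*",
    }


def consolidate_lex_permissions(
    client,
    function_name,
    region,
    account_id,
    old_statement_ids,
    statement_id="lex-us-east-1-my_bot",
):
    """Add one wildcard statement, then drop the per-intent statements."""
    # Add the wildcard first so the bot keeps working while old entries go away.
    client.add_permission(
        **wildcard_permission(function_name, region, account_id, statement_id)
    )
    for sid in old_statement_ids:
        client.remove_permission(FunctionName=function_name, StatementId=sid)


# Intended usage (requires AWS credentials, so it is left as a comment):
#   import boto3
#   client = boto3.client("lambda", region_name="us-east-1")
#   consolidate_lex_permissions(
#       client, "My_Lex_Lambda", "us-east-1", "<account-id>",
#       ["lex-us-east-1-my_first_intent", "lex-us-east-1-my_second_intent"],
#   )
```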

Function Policy Document (before):

"Statement": [
  {
    "Sid": "lex-us-east-1-my_first_intent",
    "Effect": "Allow",
    "Principal": {
      "Service": "lex.amazonaws.com"
    },
    "Action": "lambda:invokeFunction",
    "Resource": "arn:aws:lambda:us-east-1:xxx:function:My_Lex_Lambda",
    "Condition": {
      "ArnLike": {
        "AWS:SourceArn": "arn:aws:lex:us-east-1:xxx:intent:my_first_intent:*"
      }
    }
  },
  {
    "Sid": "lex-us-east-1-my_second_intent",
    "Effect": "Allow",
    "Principal": {
      "Service": "lex.amazonaws.com"
    },
    "Action": "lambda:invokeFunction",
    "Resource": "arn:aws:lambda:us-east-1:xxx:function:My_Lex_Lambda",
    "Condition": {
      "ArnLike": {
        "AWS:SourceArn": "arn:aws:lex:us-east-1:xxx:intent:my_second_intent:*"
      }
    }
  },
  # ... all other intents to which this lambda is assigned
]

Function Policy Document (after):

"Statement": [
  {
    "Sid": "lex-us-east-1-my_bot",
    "Effect": "Allow",
    "Principal": {
      "Service": "lex.amazonaws.com"
    },
    "Action": "lambda:invokeFunction",
    "Resource": "arn:aws:lambda:us-east-1:xxx:function:My_Lex_Lambda",
    "Condition": {
      "ArnLike": {
        "AWS:SourceArn": "arn:aws:lex:us-east-1:xxx:intent:*"
      }
    }
  }
]

Topics: AWS, Coding and Tutorial

The Listeners Alexa Skill -  Confirmation and Re-prompt

Posted by Sanjay Rohila on Dec 5, 2018 3:43:00 PM

The listeners Alexa skill session has a time frame of 8 seconds. If the user doesn't ask anything within this time, the session ends (ensuring that Alexa is not always listening ;) )

Often after a fulfillment, we would like to know if the user is satisfied with the answer or has another query, which is why we use re-prompt. After the initial 8 seconds, Alexa speaks the re-prompt message and the user is given another 8 seconds to respond in the session. If the user still doesn't respond, Alexa ends the session.

I used a FAQ skill, where users ask a question and Alexa answers. But closing the session immediately after the answer doesn't seem very user-friendly, so I added a re-prompt. The user is asked whether they have any further queries. If not, the session ends. Here is a sample of the conversation:

User: 'Alexa, ask faq assistant Can I claim expenses as contractor?'
Alexa: 'Yeah, sure. You can claim expenses through Keka.'
(Alexa waits for 8 seconds here, then asks)
Alexa: 'Do you want to know anything else?'
User: 'Yes'
Alexa: 'Go ahead, ask me'
OR
User: 'No'
Alexa: 'Thank you. It's a pleasure to help you.'

The following is the response I used:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Yeah, sure. You can claim expenses through Keka."
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": "Do you want to know anything else?"
      }
    }
  }
}

Mostly, the re-prompt is a means to confirm that the user got what they wanted. The user can simply respond with a Yes or No. To handle the re-prompt request, we can use in-built Amazon intents.

Alexa In-built Intents

While there are lots of in-built intents, we particularly use AMAZON.YesIntent and AMAZON.NoIntent for the re-prompt purpose. When the user responds with a Yes or No, these intents get triggered and we respond accordingly (close the conversation or keep it open).
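A minimal sketch of handling these two intents, assuming the response wording from the sample conversation above. The function name handle_reprompt_answer is hypothetical; shouldEndSession is the response field that closes the session or keeps it open.

```python
def handle_reprompt_answer(intent_name):
    """Map the built-in Yes/No intents to a reply and a session decision."""
    if intent_name == "AMAZON.YesIntent":
        # Keep the session open so the user can ask the next question.
        text, end_session = "Go ahead, ask me.", False
    elif intent_name == "AMAZON.NoIntent":
        # Nothing more to ask, so close the conversation.
        text, end_session = "Thank you. It's a pleasure to help you.", True
    else:
        raise ValueError(f"Not a re-prompt answer intent: {intent_name}")
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


yes_response = handle_reprompt_answer("AMAZON.YesIntent")
no_response = handle_reprompt_answer("AMAZON.NoIntent")
```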

Re-prompt in Echo Show display

As we saw with display directives in the earlier post, we can also add rich text in templates. We are going to use the action tag to mimic the re-prompt feature: in the case of re-prompt, we can add tertiaryText in textContent with action tags. The final response at the end of this post shows the display response for the same scenario.

The Display.ElementSelected request (see the Display Interface Reference) is triggered when a user selects an action element on the screen. So in our code, we will get this request with the token value close_session or open_session, and we can respond accordingly (close the conversation or keep it open).

This is what the Lambda pseudocode looks like (this is not the final code I use in production; I have a templateBuilder function between the final response and the handleIntent function, and handleIntent triggers the core controller to get templateType and content):
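A rough Python sketch of such a handler follows. It is an illustration under stated assumptions, not the production code: handle_intent is a hypothetical stand-in for the core controller, and mapping the Yes/No intents onto the same tokens as the on-screen action links is one possible design so that voice and touch share a single code path.

```python
# Voice answers are mapped onto the same tokens used by the on-screen action links.
TOKEN_FOR_INTENT = {
    "AMAZON.YesIntent": "open_session",
    "AMAZON.NoIntent": "close_session",
}


def speech(text, end_session):
    """Build a minimal voice response."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_token(token):
    """Respond to a confirmation, whether it came from voice or from a touch."""
    if token == "close_session":
        return speech("Thank you. It's a pleasure to help you.", True)
    if token == "open_session":
        return speech("Go ahead, ask me.", False)
    return speech("Sorry, I did not get that.", False)


def handle_intent(intent_name):
    """Hypothetical stand-in for the core controller that resolves content."""
    if intent_name in TOKEN_FOR_INTENT:
        return handle_token(TOKEN_FOR_INTENT[intent_name])
    return speech("Yeah, sure. You can claim expenses through Keka.", False)


def lambda_handler(event, context=None):
    request = event["request"]
    if request["type"] == "Display.ElementSelected":
        # Sent when the user taps an <action token='...'> element on the screen.
        return handle_token(request.get("token"))
    if request["type"] == "IntentRequest":
        return handle_intent(request["intent"]["name"])
    return speech("Sorry, I did not get that.", False)
```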

The final response for Alexa re-prompt/confirmation with voice and screen, including the prompt and action links, looks like this:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Yeah, sure. You can claim expenses through Keka."
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": "Do you want to know anything else?"
      }
    },
    "directives": [{
      "type": "Display.RenderTemplate",
      "template": {
        "type": "BodyTemplate1",
        "title": "Welcome",
        "textContent": {
          "primaryText": {
            "type": "RichText",
            "text": "Yeah, sure. You can claim expenses through Keka."
          },
          "tertiaryText": {
            "type": "RichText",
            "text": "<br/><br/>Do you want to know anything else? <br/><action token='close_session'><u>No</u></action> <action token='open_session'><u>Yes</u></action>"
          }
        }
      }
    }]
  }
}

Topics: AWS, Architecture

Alexa Skill Directives and Managing Different Responses

Posted by Sanjay Rohila on Nov 15, 2018 11:06:00 AM

While there's a lot of documentation around how we created our own custom Alexa skill, let's talk a bit about the implementation part of it. We'll focus on how to manage Alexa skill directives, different responses, and the code in itself.

Amazon has a range of devices which support skills: Echo Dot, Alexa, Echo Show, Fire TV, etc. But they do not all support the same kinds of response templates. So let's dive a little deeper into the different types:

Echo (Voice only)

The bare minimum response we can have in the Echo is:

{
"version": "1.0",
"response": {
"outputSpeech": {
"type": "PlainText",
"text": "Hey there!"
}
}
}

This is what the device will receive and give as a voice response to the user. The immediate next step is to add more control over the voice response by using SSML. We can have SSML in outputSpeech, which allows us to add emphasis, change the rate or pitch, or add a pause between lines.

{
"version": "1.0",
"response": {
"outputSpeech": {
"type": "SSML",
"ssml": "<speak>Hi, I am <emphasis level='strong'>Virtual Assistant.</emphasis></speak>"
}
}
}

For more information, you can also read in detail about SSML and the markup tags it supports.

Like Lex, Echo also has a dialog state, but in a different format. We have to use Alexa skill directives for that. Directives have a 'type' property which is similar to dialogState in Lex: elicit slot, delegate, fulfill, etc. Below is an elicit-slot response for Alexa:

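{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "What kind of pizza would you like?"
    },
    "directives": [{
      "type": "Dialog.ElicitSlot",
      "slotToElicit": "pizza_type",
      "updatedIntent": {
        "name": "order_pizza",
        "confirmationStatus": "NONE",
        "slots": {
          "pizza_type": {
            "name": "pizza_type",
            "confirmationStatus": "NONE"
          }
        }
      }
    }]
  }
}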

If the intents, slots, and response content are identical across both platforms, we should have a centralized place to manage content that can be served to both Lex and Alexa.
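One way to picture this is a single content store with a thin formatter per platform. The sketch below is an assumption about how it could be organized: the CONTENT dictionary and function names are hypothetical, and the Lex formatter uses the Lex (V1) Lambda response shape with a Close dialog action.

```python
# Hypothetical central content store shared by both platforms.
CONTENT = {
    "greeting": "Hey there!",
    "claim_expenses": "Yeah, sure. You can claim expenses through Keka.",
}


def to_alexa(text):
    """Format content as a voice-only Alexa response."""
    return {
        "version": "1.0",
        "response": {"outputSpeech": {"type": "PlainText", "text": text}},
    }


def to_lex(text):
    """Format content as a Lex (V1) Lambda response that closes the dialog."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": text},
        }
    }


FORMATTERS = {"alexa": to_alexa, "lex": to_lex}


def respond(content_key, platform):
    """Look up content once and render it for the requesting platform."""
    return FORMATTERS[platform](CONTENT[content_key])


alexa_reply = respond("greeting", "alexa")
lex_reply = respond("greeting", "lex")
```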

Echo Show (Voice and Screen)

Echo Show has a display screen in addition to the voice assistance. However, it has lots of limitations. As it is a display device, it has a 'Display.RenderTemplate' directive, which is awesome but has very few usable pre-defined templates.

Since we cannot elicit a slot in the 'RenderTemplate' directive, we can't prompt for the next slot on the Echo Show screen. Hence we will have to provide all inputs as intents. Below is an example of the basic Echo Show response:
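A minimal sketch of building such a response in Python, assuming the BodyTemplate1 template and the supportedInterfaces entry in the request context to detect a screen. The helper names are hypothetical, and the text is assumed to contain no characters that need escaping in rich text.

```python
def supports_display(event):
    """Return True if the requesting device reports the Display interface."""
    interfaces = (
        event.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Display" in interfaces


def build_response(event, title, text):
    """Voice response, plus a BodyTemplate1 screen when the device has one."""
    response = {"outputSpeech": {"type": "PlainText", "text": text}}
    if supports_display(event):
        # Only devices with a screen (e.g. Echo Show) get the display directive.
        response["directives"] = [{
            "type": "Display.RenderTemplate",
            "template": {
                "type": "BodyTemplate1",
                "title": title,
                "textContent": {
                    "primaryText": {"type": "RichText", "text": text}
                },
            },
        }]
    return {"version": "1.0", "response": response}


show_event = {"context": {"System": {"device": {"supportedInterfaces": {"Display": {}}}}}}
echo_event = {"context": {"System": {"device": {"supportedInterfaces": {}}}}}
show_reply = build_response(show_event, "Welcome", "Hey there!")
echo_reply = build_response(echo_event, "Welcome", "Hey there!")
```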

 

This will show title and body content on the screen, as well as a voice response. Text content can have rich text with limited markup.


Srijan is now an AWS Advanced Consulting Partner. Drop us a line if you want to get in touch!

Topics: AWS, Architecture
