8 Best Practices To Keep in Mind While Building a Data Lake

Posted by Kimi Mahajan on Sep 4, 2019 4:41:00 PM

Data lakes have brought new possibilities and transformational capabilities to enterprises, letting them represent their data in a uniform, consumable and readily available way.

However, with the increasing risk of data lakes turning into swamps and silos, it is important to define a usable data lake. One thing is clear when opting for a data lake for your enterprise: it’s all about how it’s managed.

To help data management professionals get the most from data lakes, let’s look into the best practices for building the efficient data lake they’re looking for.

Rising Era of Data Lakes

Challenges in storage flexibility, resource management and data protection gave rise to the use of cloud-based data lakes.

As already detailed in our blog, What is a Data Lake - The Basics, a data lake is a central repository for storing all structured, semi-structured and unstructured data in a single place.

The Hadoop Distributed File System (HDFS) powered the first generation of data lakes. With the increased popularity of data lakes, organizations face a bigger challenge: maintaining an ever-growing data lake. If the data in a lake is not well curated, it gets flooded with random information that is difficult to manage and consume, leading to a data swamp.

Keeping Data Lakes Relevant

Data lakes have to capture data from the Internet of Things (IoT), social media, customer channels, and external sources such as partners and data aggregators, in a single pool. There is a constant pressure to develop business value and organizational advantage from all these data collections.

Data swamps defeat the purpose of data lakes, making it difficult to retrieve and use data.

Here are the best practices for keeping a data lake efficient and relevant at all times.

 

1. Understand the Business Problem, Allow Only Relevant Data

First and foremost, start with an actual business problem and answer the question: why should a data lake be built?

Having a clear objective in mind as to why the data lake is required helps you stay focused and get the data job done quickly and easily.

A common misconception is that a data lake and a database are the same thing. The basics of a data lake should be clear, and it should be implemented only for the right use cases. It’s important to be sure about what a data lake can and cannot do.

Collecting data without a clear goal in mind can make its very existence irrelevant. A well-organized data lake can easily turn into a data swamp when companies don’t set parameters for the kinds of data they want to gather and why.

Data that is most important to one department in an organization might not be relevant to another. When such conflicts arise over what kinds of data are most useful to the company at a given time, bringing everyone onto the same page about when, why and how to acquire data becomes crucial.

Company leaders should adopt a future-oriented mindset for data collection.

Clearly defined goals about data usage help prevent overeagerness when collecting information.

 

2. Ensuring Correct Metadata For Search

It’s important for every piece of data in a data lake to carry information about itself (metadata). Creating metadata is a common way for enterprises to organize their data and prevent a data lake from turning into a data swamp.

It acts as a tagging system that helps people search for different kinds of data. Without metadata, people accessing the lake may not know how to search for the information they need.
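For illustration only, here is a minimal sketch of what such a metadata record and a simple keyword search over a catalog could look like; the field names and values are hypothetical, not a standard.

```python
# A hypothetical metadata record stored alongside a data set in the lake;
# the fields shown are illustrative, not a formal standard
metadata = {
    "dataset": "web_clickstream_raw",
    "source": "website event tracker",
    "owner": "marketing-analytics",
    "format": "json",
    "ingested_at": "2019-09-01T10:30:00Z",
    "sensitivity": "internal",
    "tags": ["clickstream", "web", "raw"],
}

def search_catalog(catalog, keyword):
    """Return data sets whose name or tags mention the keyword."""
    keyword = keyword.lower()
    return [
        m for m in catalog
        if keyword in m["dataset"].lower()
        or any(keyword in tag.lower() for tag in m["tags"])
    ]

print(search_catalog([metadata], "clickstream"))
```

In practice this tagging lives in a dedicated catalog service rather than in application code, but the principle is the same: every data set carries enough descriptive metadata to be found.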

3. Understand the Importance of Data Governance

A data lake should clearly define how data is treated and handled, how long it is retained, and more.

Excellent data governance is what equips your organisation to maintain a high level of data quality throughout the entire data lifecycle.

The absence of rules stipulating how to handle the data might lead to data getting dumped in one place with no thought on how long it is required and why. It is important to assign roles to give designated people access to and responsibility for data.

Access control permissions help users find data and optimize queries according to their roles, with designated people responsible for governing the data and reducing redundancies.

Making data governance a priority as soon as companies start collecting data is crucial, to ensure data has a systematic structure and management principles applied to it.
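As a purely illustrative sketch (real deployments would rely on IAM, Apache Ranger, or the platform's native access controls rather than application code), role-based access can be thought of as a mapping from roles to the data lake zones they may read:

```python
# Illustrative role-to-zone access map; the roles and path prefixes are
# hypothetical and would normally live in IAM/Ranger policies, not code
ROLE_ACCESS = {
    "data_engineer": ["raw/", "staging/", "curated/"],
    "data_analyst": ["curated/"],
    "data_scientist": ["staging/", "curated/"],
}

def can_read(role: str, path: str) -> bool:
    """Check whether a role is allowed to read a given data lake path."""
    return any(path.startswith(prefix) for prefix in ROLE_ACCESS.get(role, []))

assert can_read("data_analyst", "curated/sales/2019/")
assert not can_read("data_analyst", "raw/clickstream/")
```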

 

4. Make Automation Mandatory

An organization needs to apply automation to maintain a data lake before it turns into a data swamp. Automation is becoming increasingly crucial for data lakes and can help achieve the identified goals in all of the phases below:

  • Ingestion Phase

A data lake should not create development bottlenecks for data ingestion pipelines; rather, it should allow any type of data to be loaded seamlessly and consistently.

Early ingestion and late processing allow integrated data to become available quickly for operations, reporting and analytics. However, there may be a lag between data being updated and new insights being produced from the ingested data.

Change Data Capture (CDC) automates data ingestion by propagating only the records that have changed in a source database, rather than reloading entire tables. The changed records still need to be merged back into the main data set.
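As a rough illustration of that merge step (not a full CDC pipeline), a batch of changed records can be upserted into the main table. The pandas-based sketch below assumes each record carries a unique id column; deletes would need separate handling.

```python
import pandas as pd

def apply_cdc_batch(main: pd.DataFrame, changes: pd.DataFrame, key: str = "id") -> pd.DataFrame:
    """Upsert a batch of changed records (from CDC) into the main table."""
    combined = pd.concat([main, changes], ignore_index=True)
    # The changed records arrive after the existing ones, so keeping the last
    # occurrence of each key applies inserts and updates in one pass
    return combined.drop_duplicates(subset=[key], keep="last")

main = pd.DataFrame({"id": [1, 2], "status": ["open", "open"]})
changes = pd.DataFrame({"id": [2, 3], "status": ["closed", "open"]})
print(apply_cdc_batch(main, changes))
```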

  • Data Querying Phase

Query engines running on Hive or NoSQL stores need to be tuned to process data sets as large as the data lake might hold. Data visualization is also required for users to know what exactly to query.

A common workaround is to use OLAP cubes or in-memory data models that scale to the level of use a data lake sees.

  • Data Preparation Phase

When data in the cloud is not arranged or cleaned, and is lumped together with no one knowing what is linked to what or what insights the business is looking for, automating the processing of raw data becomes confusing and error-prone. Teams need clear goals for what the data lake is supposed to answer.

  • Uniform Operations Across Platforms

Data lakes must be able to generate insights through ad hoc analytics efficiently to make the business more competitive and to drive customer adoption. This can be achieved with the creation of data pipelines to allow data scientists to run their queries on data sets. They should be able to use different data sets, and compare the results over a series of iterations to make better judgment calls. The lake is likely to be accessing data from multiple cloud sources and hence these pipelines must be able to play well with these different source materials.

 

5. Data Cleaning Strategy

A data lake can become a data swamp unintentionally, unless enterprises adhere to a strict plan for regularly cleaning their data.

Data is of no use if it contains errors or redundancies. It loses its credibility, leads companies to incorrect conclusions, and it can take months or even years before anyone realizes the data is inaccurate, if they ever do.

Enterprises need to go a step further and decide what specific things they should do regularly to keep the data lake clean. Restoring a data lake that has already turned into a swamp can be overwhelming.

 

6. Flexibility & Discovery with Quick Data Transformation

A data lake should allow for flexible data refinement policies, auto data discovery and provide an agile development environment.

Many data lakes are deployed to handle large volumes of web data and can capture large data collections.

Out-of-the-box transformations should be ready for use in the native environment. You should also be able to capture accurate statistics and load-control data, which can feed an operational dashboard and provide better insight into your processes.

 

7. Enhancing Security and Operations Visibility

User authentication, user authorization, encryption of data in motion, and encryption of data at rest are all needed to keep data in the lake safe and securely managed.

The data lake solution should be able to provide real-time operations monitoring and debug capabilities and notify with real-time alerts on new data arrivals. In order to extract the most value out of your data, you need to be able to adapt quickly and integrate your data seamlessly.

 

8. Make Data Lake Multipurpose

A single lake should typically fulfill multiple architectural purposes, such as data landing and staging, archiving for detailed source data, sandboxing for analytics data sets, and managing operational data sets.

Being multipurpose, it may need to be distributed over multiple data platforms, each with unique storage or processing characteristics.

The data lake has come on strong in recent years because it fits today's data and the way many users want to organize and work with it. Its ability to ingest raw data keeps it useful for both operations and analytics as an enterprise’s requirements evolve.

Are you interested in exploring how data lakes can be best utilized for your enterprise? Contact us to get the conversation started.

Topics: Data Engineering & Analytics, Architecture

Data Lake Strategy:  6 Common Mistakes to Avoid During Implementation

Posted by Nilanjana on Aug 29, 2019 5:42:00 PM

While we have talked a lot about the rising need for data lakes, it’s probably as important to talk about how easily they can go wrong in the absence of a good data lake strategy. While most businesses expect phenomenal insights, not enough attention is paid to actually setting it up in the right manner. And that is where it can all start to unravel. 

It's not uncommon to see scenarios where businesses have invested a lot of time, money and resources into building a data lake but it’s actually not being used. It can be that people are slow to adopt it or it could be that faulty implementation actually made the data lake useless. 

So here, we take a brief look at six common data lake strategy pitfalls, and how to avoid them. 

Challenges involved in Loading Data 

There are two challenges involved when loading data into a data lake:

Managing big data file systems requires loading an entire file at a time. While this is no big deal for small files, doing the same for large tables and files becomes cumbersome. To minimize load times for large data sets, you can load the entire data set once and then load only the incremental changes: identify the source data rows that have changed, and merge those changes with the existing tables in the data lake, as sketched below.
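A minimal sketch of that incremental approach, assuming the source table has an updated_at column; the connection string and table names are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Hypothetical source database and table, for illustration only
engine = create_engine("postgresql://user:password@source-db:5432/sales")

def load_increment(last_watermark: str) -> pd.DataFrame:
    """Pull only the rows changed since the last successful load."""
    query = text("SELECT * FROM orders WHERE updated_at > :watermark")
    return pd.read_sql(query, engine, params={"watermark": last_watermark})

changed_rows = load_increment("2019-08-01 00:00:00")
# Merge changed_rows into the existing table in the lake, then record the
# new watermark (e.g. the max updated_at seen) for the next run
```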

Loading data from the same source into different parts of the data lake consumes too much capacity, and the data lake earns a bad reputation for interrupting the operational databases used to run the business. Strong governance processes are required to ensure this doesn’t happen.

Lack of Pre-planning

Data lakes can store an unfathomable amount of data, but not assessing the value of data before dumping it in is one major reason they fail. While the point of a data lake is to hold all of your company’s data, it is still important to build it in accordance with your specific needs. Balancing the kind of data you need with the amount of data you dump into the lake keeps the challenges of implementation to a minimum.

Uncatalogued Data

When you store data in a data lake, you also need to make sure it is easy for analysts to find. Merely storing all the data at once, without cataloguing it, is a big mistake for a few key reasons:

  • It can lead to accidentally loading the same data source more than once, eating into storage
  • It leaves analysts guessing; ensuring metadata storage is key to a data lake that’s actually useful. There are several technologies available to set up your data cataloguing process, and you can automate it within your data lake architecture with solutions like AWS Glue.
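For example, on an AWS-based lake a Glue crawler can populate the catalog automatically. The sketch below uses boto3 with hypothetical crawler, role, database and bucket names.

```python
import boto3

glue = boto3.client("glue")

# All names below (crawler, role ARN, database, bucket path) are hypothetical
glue.create_crawler(
    Name="clickstream-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="datalake_catalog",
    Targets={"S3Targets": [{"Path": "s3://my-datalake/raw/clickstream/"}]},
)
glue.start_crawler(Name="clickstream-crawler")

# Analysts can then discover what the lake holds by browsing the catalog
tables = glue.get_tables(DatabaseName="datalake_catalog")
for table in tables["TableList"]:
    print(table["Name"], table.get("StorageDescriptor", {}).get("Location"))
```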

Duplication of Data

When Hadoop distributions or clusters pop up all over the enterprise, there is a good chance you’re storing loads of duplicated data. This creates data silos, which inhibit big data analytics because employees can’t perform comprehensive analyses using all of the data.

All of this essentially re-creates the data proliferation problem data lakes were created to solve in the first place.

Inelastic Architecture

One of the most common mistakes organizations make is building their data lake on inelastic architecture. Many start out with one server at a time, slowly and organically growing their big data environment and adding high-performance servers to keep up with business demands. This decision is usually taken because data storage can be costly, but it proves to be a mistake in the long run: the growth of data storage outpaces the growth of computing needs, and maintaining such a large physical environment becomes cumbersome and problematic.

Not the Right Governance Process

Not using the right governance process can be another obstacle to your data lake implementation. 

  • Too much governance imposes so many restrictions on who can view, access, and work on the data that no one ends up being able to access the lake, rendering the data useless
  • Not enough governance means that organizations lack proper data stewards, tools, and policies to manage access to the data. Unorganized and mismanaged data lakes can lead to an accumulation of low quality data, which is polluted or tampered with. Eventually the business stops trusting this data, rendering the entire data lake useless

Implementing a good governance process and documenting your data lineage thoroughly helps illuminate the actions people took to ingest and transform data as it enters and moves through your data lake.

While this is by no means an exhaustive list, these are some of the most common mistakes businesses make. Plugging these holes in your data lake strategy sets you up for better returns from your initiative right out of the gate. It also ensures that your data lake does not become a data swamp where information and insights disappear without a trace.

Working on a data lake strategy for your enterprise? Or building the right data lake architecture to leverage and monetize your data?

Tell us a bit about your project and our experts will be in touch to explore how Srijan can help.

Topics: Project Management, Agile, Data Engineering & Analytics

Preparing For A Data Lake Implementation

Posted by Kimi Mahajan on Aug 29, 2019 5:39:00 PM

Data remains a giant value generator and reinforces your enterprise’s ability to stay ahead of the competition.

However, managing, securing and storing data for its continued relevance and using that voluminous information to your advantage is difficult at times, and requires a streamlined process flowchart.

So, how do you make data more useful to you and benefit from its infinite possibilities? What are the cutting-edge tools you need to keep your enterprise future-ready?

We have already discussed the basics of data lakes and the expected stages of data lake implementation. Let’s dig deeper into when and why to implement a data lake, and how to strategize the implementation process.

When Should You Opt for a Data Lake

Here are a few scenarios you could be looking at, when it comes to enterprise data:

  • You’re working with a growing amount of unstructured data
  • You want to leverage big data across your offerings
  • Your organization needs a unified view of information
  • You need to be able to perform real-time analysis on data
  • Your organization is moving towards a culture of democratized data access
  • You need access to data, analytics and applications
  • Your organization can benefit from elasticity of scale

If one or more of these look familiar, then it’s time to formulate a phased transformational process.

Traditionally, an Enterprise Data Warehouse (EDW) has served as the foundation for data discovery and functioned well in defining the data according to its quality. However, EDWs are restricted in scope and ability, and are unable to handle data complexities.

So a data lake is required to expand the possibilities of what you can do with your data. You can take a look at the whole data lake vs. data warehouse discussion and see how the two are actually complementary.

That said, you can take a call on whether now is the right time to start with a data lake, or whether you can invest in one a few months or years down the line. That depends mostly on your current business goals and challenges, and on the kind of data that’s currently most valuable to you.

Here’s a list of pointers to consider before preparing to implement data lake architecture:

Type of Data

Data lakes are best used to store constantly generated data, which often accumulates quickly.

Streaming workloads commonly run to tens of billions of records totalling hundreds of terabytes. If you’re handling such huge volumes, you should definitely consider a data lake, since the cost of structuring and storing that data in a relational database would be too high.

Staying with a data warehouse could be the better choice if you’re mostly working with traditional, tabular information, e.g. data generated by financial, CRM or HR systems.

Understanding the Intent

One of the great things about data lakes is the flexibility with which data can be ingested and eventually used, following the principle of ‘store now, analyze later’.

A data lake could be a good fit for a project where a higher level of flexibility is required.

Complexity of Data Acquisition Process

The process of adding newly acquired data to your warehouse is often resource-intensive. It gets even more complex with unstructured or semi-structured sources, which carry a serious ETL overhead to get the data into a format your data warehouse can work with.

If this complex process is making you consider giving up on some sources altogether, it’s time to consider a data lake – which will allow you to store all the data with minimal overhead, and then extract and transform the data when you want to actually do something with it.

Existing Tools and Skills

A data lake typically requires big data engineers, who are difficult to find. If those skills are lacking, consider sticking with your data warehouse until the prerequisite engineering talent is hired to manage your data lake.

Data Management and Governance

Both data lakes and data warehouses pose challenges when it comes to governance. Data warehouses pose the challenge of constantly maintaining and managing all the data, whereas data lakes are often quite difficult to effectively govern. Whichever approach you choose, make sure you have a good way to address these challenges as per your project.

The points above will help you decide whether or not to opt for a data lake.

Once you decide to go ahead with a data lake, blindly plunging into implementation won't necessarily benefit your organization. A big picture of what you want to achieve with your data, and a strategy for a cohesive data infrastructure, are crucial.

Strategy for Implementing Data Lake

A haphazard approach may lead to several challenges hampering the use of a data lake to support big data analytics applications.

In the absence of an overarching strategy, a lot of data handling best practices can get overlooked, causing challenges and bottlenecks further down the line. For example, not documenting the relevance of data objects stored in a data lake might make it difficult for data scientists to find relevant data, track who accesses which data sets, and determine what level of access privileges are needed on them.

So, here are seven steps to avoid such concerns when implementing a data lake.

  1. Create a taxonomy of data classifications
    Classification of data objects plays an important role in how they’re organized. Identify the key dimensions of the data such as data type, content, usage scenarios, groups of possible users and data sensitivity as part of your classifications.
  2. Design a proper data architecture
    Apply the defined classification taxonomy to direct how the data is organized. Include file hierarchy structures for data storage, file and folder naming conventions, access methods and controls for different data sets. 
  3. Employ data profiling tools
    Segregating the data going into a data lake is much easier when you analyze its content. Data profiling tools help by gathering information about what's in data objects, thereby providing insight for classifying them (a minimal profiling sketch follows this list). They can also help in identifying data quality issues, ensuring analysts work with accurate information.
  4. Standardize the data access process
    Using diverse data access methods to obtain different data sets often poses difficulties. Standardizing the procedure with a common, straightforward API can simplify data access and ultimately allow more users to take advantage of the data.
  5. Develop a searchable data catalog
    Prospective users might not be aware of what's in a data lake and where different data sets are located. A collaborative data catalog allows the users to know the details about each data asset and provides a forum for groups of users to share experiences, issues and advice on working with the data.
  6. Implement sufficient data protections
    Aside from the conventional aspects of IT security, utilize other methods to prevent the exposure of sensitive information contained in a data lake. This includes mechanisms like data encryption and data masking, along with automated monitoring to generate alerts about unauthorized data access or transfers.
  7. Raise data awareness internally
    Ensure the users of your data lake are aware of the need to actively manage and govern the data assets it contains with appropriate training. Knowledge of using the data catalog to find available data sets, and configuring analytics to access the data they need will help press upon them the importance of proper data usage.
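As referenced in step 3, here is a minimal profiling sketch in pandas. The S3 path is hypothetical, and reading it directly assumes s3fs and pyarrow are installed; the point is simply to collect basic facts about a data set before classifying it.

```python
import pandas as pd

# Hypothetical sample of one data set pulled from the lake for profiling;
# reading s3:// paths directly assumes s3fs and pyarrow are available
df = pd.read_parquet("s3://my-datalake/raw/orders/2019/08/")

profile = {
    "row_count": len(df),
    "column_types": df.dtypes.astype(str).to_dict(),
    "null_counts": df.isnull().sum().to_dict(),
    "duplicate_rows": int(df.duplicated().sum()),
}
print(profile)
```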

Organizations are increasingly attempting to innovate processes, driving heightened service excellence and delivery quality. Interested in knowing how data lakes represent a smarter opportunity for effective data management and usage for your organization?

Contact us and let our experts do the talking.

 

 

Topics: Project Management, Agile, Data Engineering & Analytics

Data Lake vs Data Warehouse: Do you need both?

Posted by Nilanjana on Jul 17, 2019 3:20:00 PM

Most enterprises today have a data warehouse in place that is accessed by a variety of BI tools to aid the decision-making process. Data warehouses have been in use for several decades now and have served enterprise data requirements quite well.

However, as the volume and types of data being collected expands, there’s also a lot more that can be done with it. Most of these are use cases that an enterprise might not even have identified yet. And they won’t be able to do that until they have had a chance to actually play around with the data. 

That is where the data lake makes an entrance. 

We took a brief look at the difference between a data warehouse and lake when defining what is a data lake. So in this blog, we’ll dig a little deeper into the data lake vs data warehouse aspect, and try to understand if it’s a case of the new replacing the old or if the two are actually complementary.

Data lake vs. Data Warehouse

The data warehouse and the data lake differ on three key aspects:

Data Structure

A data warehouse is much like an actual warehouse in terms of how data is stored. Everything is neatly labelled and categorized and stored in a particular order. Similarly, enterprise data is first processed and converted into a particular format before being accepted into the data warehouse. Also, the data comes in only from a select number of sources, and powers only a set of predetermined applications. 

On the other hand, a data lake is a vast and flexible repository where raw, unprocessed data can be stored. The data is mostly in unstructured or semi-structured format with the potential to be used by any existing business application, or ones that an enterprise could think of in the future.

The difference in data structure also translates into a critical cost advantage for the data lake. Cleaning and processing raw data to apply a particular schema on it is a time consuming process. And changing this schema at a later date is also laborious and expensive. But because the data lakes do not require a schema to be applied before ingesting the data, they can hold a larger quantity and wider variety of data, at a fraction of the cost of data warehouses.

Purpose

Data warehouses demand structured data because how that data is going to be used is already defined. As the cleaning and processing of data is already expensive, the aim with data warehouses is to be as efficient with storage space as possible. So the purpose of every piece of data is known, with regards to what will be delivered to which business applications. That ensures that space is optimized to the maximum.

The purpose of the data flowing into the data lake is not determined. It’s a place to collect and hold the data, and where and how it will be used is decided later on. It usually depends on how that data is being explored and experimented with, and the requirements that arise with innovations within the enterprise.

Accessibility

Data lakes are overall more accessible as compared to data warehouses. Data in a data lake can be easily accessed and changed because it’s stored in the raw format. On the other hand, data existing in the data warehouse takes a lot of time and effort to be transformed into a different format. Data manipulation is also expensive in this case.

Does the data lake replace the data warehouse?

No. A data lake does not replace the data warehouse, but rather complements it. 

The organized storage of information in data warehouses makes it very easy to get answers to predictable questions. When you know that business stakeholders need certain pieces of information, or analyze specific data sets or metrics regularly, the data warehouse is sufficient. It is built to ingest data in the schema that will quickly give the required answers. For example: revenue, sales in a particular region, YoY increase in sales, business performance trends - all can be handled by the data warehouse. 

But as enterprises begin to collect more types of data, and want to explore more possibilities from it, the data lake becomes a crucial addition.

As discussed, schema is applied to the data after it’s loaded into the data lake. This is usually done at the point when the data is about to be used for a particular purpose. How the data fits into a particular use case determines what schema will be projected onto it. This means that data, once loaded, can be used for a variety of purposes, and across different business applications. 

This flexibility makes it possible for data scientists to experiment with the data to figure out what it can be leveraged for. They can set up quick models to parse through the data, identify patterns, evaluate the potential business opportunities. The metadata created and stored alongside the raw data makes it possible to try out different schemas, view the data in different structured formats, to discover which ones are valuable to the enterprise. 
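To make that schema-on-read idea concrete, here is a small, purely illustrative sketch that projects two different "views" onto the same raw event data after loading; the field names are invented for the example.

```python
import pandas as pd

# Hypothetical raw events already sitting in the lake, loaded as-is
raw_events = pd.DataFrame([
    {"user_id": "u1", "event": "purchase", "amount": 40.0, "ts": "2019-07-01T10:00:00"},
    {"user_id": "u2", "event": "page_view", "amount": None, "ts": "2019-07-01T10:05:00"},
    {"user_id": "u1", "event": "page_view", "amount": None, "ts": "2019-07-02T09:00:00"},
])
raw_events["ts"] = pd.to_datetime(raw_events["ts"])

# "Schema" 1: a finance-style view of daily revenue
purchases = raw_events[raw_events["event"] == "purchase"]
daily_revenue = purchases.groupby(purchases["ts"].dt.date)["amount"].sum()

# "Schema" 2: a marketing-style view of activity per user
activity_per_user = raw_events.groupby("user_id")["event"].count()

print(daily_revenue, activity_per_user, sep="\n")
```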

Given these characteristics of the data lake, it can augment a data warehouse in a few different ways:

  • Start exploring the potential of the data you collect, beyond the structured capabilities of your current data warehouse. This could be around new products and services you can create with these data assets, or even enhance your current processes. For example: leverage data lake to gather information of site visitors and use that to drive more personalized buyer journeys and evolving marketing strategies.
  • Use the data lake as a preparatory environment to process large data sets before feeding them into your data warehouse
  • Easily work with streaming data, as the data lake is not limited to batch-based periodic updates.

The bottom line is, the data warehouse continues to be a key part of the enterprise data architecture. It keeps your BI tools running and allows different stakeholders to quickly access the data they need.

But the data lake implementation further strengthens your business because:

  • You have access to a greater amount of data that can be stored for use, irrespective of its structure or quality
  • Storage is cost effective because it eliminates the need for processing the data before storage
  • Data can be used for a larger variety of purposes without having to bear the cost of restructuring it into different formats
  • The flexibility to run the data through different models and applications makes it easier and faster to identify new use cases

In a market where the ability to leverage data in novel ways offers a critical competitive advantage, the focus should no longer be on data lake vs data warehouses. If enterprises want to stay ahead, they will have to realise the complementary functions of the data warehouse and the lake, and work towards a model that gets the best out of both.

Interested in exploring how a data lake fits into your enterprise infrastructure? Talk to our expert team, and let’s find out how Srijan can help.

Topics: Data Engineering & Analytics

Data Lake Implementation - Expected Stages and Key Considerations

Posted by Nilanjana on Jun 17, 2019 4:01:00 PM

 

Efficient data management is a key priority for enterprises today. And it’s not just to drive effective decision-making for business stakeholders, but also for a range of other business processes like personalization, IoT data monitoring, asset performance management and more.

 

Most enterprises are maturing out of their traditional data warehouses and moving to data lakes. In one of our recent posts, we covered what is a data lake, how it’s different from a data warehouse, and the exact advantages it brings to enterprises. Moving a step further, this post will focus on what enterprises can expect as they start their data lake implementation. This mostly centres around the typical data lake development and maturity path, as well as some key questions that enterprises will have to answer before and during the process.

Enterprise Data Lake Implementation - The Stages

Like all major technology overhauls in an enterprise, it makes sense to approach the data lake implementation in an agile manner. This basically means setting up a sort of MVP data lake that your teams can test out, in terms of data quality, storage, access and analytics processes. And then you can move on to adding more complexity with each advancing stage. 

Most companies go through four basic stages of data lake development and maturity.


Stage 1 - The Basic Data Lake

At this stage, you’ve just started putting the basic data storage functionality in place. The team setting up the data lake has made all the major choices in terms of using legacy or cloud-based technology, and has settled on the security and governance practices you want to bake into the infrastructure.

With a plan in place, the team builds a scalable but currently low-cost data lake, separate from the core IT systems. It’s a small addition to your core technology stack, with minimal impact on existing infrastructure. 

In terms of capability, the Stage 1 data lake can:

  • Store raw data coming in from different enterprise sources
  • Combine data from internal and external sources to provide enriched information

Stage 2 - The Sandbox

The next stage involves opening up the data lake to data scientists, as a sandbox to run preliminary experiments. Because data collection and acquisition are now taken care of, data scientists can focus on finding innovative ways to put the raw data to use. They can bring in open-source or commercial analytics tools to create the required test beds, and work on new analytics models aligned with different business use cases.

Stage 3 - Complement Data Warehouses

The third stage of data lake implementation is when enterprises use it as complementary to existing data warehouses. While data warehouses focus on high-intensity extraction from relational databases, low-intensity extraction and cold or rarely used data is moved to the data lakes. This ensures that the data warehouses don’t exceed storage limits, while low priority data sets still get stored. The data lake offers an opportunity to generate insights from this data, or query it to find information not indexed by traditional databases.

Stage 4 - Drive Data Operations

The final stage of maturity is when the data lake becomes a core part of the enterprise data architecture and actually drives all data operations. At this point, the data lake has replaced other data stores and warehouses, and is now the single source of all data flowing through the enterprise.

The data lake now enables the enterprise to:

  • Build complex data analytics programs that serve various business use cases
  • Create dashboard interfaces that combine insights from the data lake as well as other applications or sources
  • Deploy advanced analytics or machine learning algorithms, as the data lake manages compute-intensive tasks

This stage also means that the enterprise has put in place strong security and governance measures to optimally maintain the data lake. 

Points to Consider Before Data Lake Implementation

While the agile approach is a great way to get things off the ground, there are always roadblocks that can kill the momentum on the data lake initiative. In most cases, these blocks are in the form of some infrastructural and process decisions that need to be made, to proceed with the data lake implementation. 

Stopping to think about and answer these questions in the middle of the project can cause delays, because now you also have to consider the impact of these decisions on work that’s already been done. That puts too many constraints on the project.

So here’s a look at a few key considerations to get out of the way, before you embark on a data lake project:

Pin Down the Use Cases

Most teams jump to the technology considerations around a data lake as their first point of discussion. However, defining a few of the most impactful use cases for the data lake should take priority over deciding the technology involved. These defined use cases will help you showcase immediate returns and business impact from the data lake, and that is key to maintaining support from higher up the chain of command, as well as project momentum.

Physical Storage - Get It Right

The primary objective of the data lake is storing the vast amount of enterprise data generated, in their raw format. Most data lakes will have a core storage layer to hold raw or very lightly processed data. Additional processing layers are added on top of this core layer, to structure and process the raw data for consumption into different application and BI dashboards.

Now, you can have your data lake built on legacy data storage solutions like Hadoop or on cloud-based ones, as offered by AWS, Google or Microsoft. But given the amount of data being generated and leveraged by enterprises in recent times, the choice of data storage should consider:

  • Scalability: your data lake architecture should be able to scale with your needs, without running into unexpected capacity limits
  • Support for structured, semi-structured and unstructured data, all in a central repository
  • A core layer that can ingest raw data, so a diverse range of schemas can be applied as needed at the point of consumption
  • Ideally, decoupled storage and computation functions, so the two can scale independently

Handling Metadata

Because information in the data lake is in the raw format, it can be queried and utilized for multiple different purposes, by different applications. But to make that possible, usable metadata that reflects technical and business meaning also has to be stored alongside the data. The ideal way is to have a separate metadata layer that allows for different schema to be applied on the right data sets. 

A few important elements to consider while designing a metadata layer are:

  • Make metadata creation mandatory for all data being ingested into the data lake from all sources
  • You can also automate the creation of metadata by extracting information from the source material. This is possible if you are on a cloud-based data lake
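A minimal sketch of that automated approach on an AWS-based lake (the bucket and key names are hypothetical): basic technical metadata is derived from the object itself and stored as a sidecar file next to the data.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

def register_metadata(bucket: str, key: str, source: str) -> dict:
    """Derive basic technical metadata from a newly ingested object and
    store it next to the data as a sidecar JSON file."""
    head = s3.head_object(Bucket=bucket, Key=key)
    record = {
        "key": key,
        "source": source,
        "size_bytes": head["ContentLength"],
        "content_type": head.get("ContentType"),
        "last_modified": head["LastModified"].isoformat(),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    s3.put_object(
        Bucket=bucket,
        Key=f"{key}.metadata.json",
        Body=json.dumps(record).encode("utf-8"),
    )
    return record

# Hypothetical usage, e.g. triggered from an S3 event notification:
# register_metadata("my-datalake", "raw/clickstream/2019/08/01/events.json", "web tracker")
```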

Security and Governance

The security and governance of enterprise data should be baked into the design from the start, and be aligned with the overall security and compliance practices within the enterprise. Some key pointers to ensure here:

  • Data encryption, both for data in storage and in transit. Most cloud-based solutions provide encryption by default for core and processed data storage layers (a configuration sketch follows this list)
  • Implementing network level restrictions to block big chunks of inappropriate access paths
  • Create fine-grained access controls, in tandem with the organization-wide authentication and authorization protocols
  • Create a data lake architecture that enforces basic data governance rules like the compulsory addition of metadata, or defined data completeness, accuracy, consistency requirements.
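As a rough illustration of the first two pointers on an AWS-based data lake (the bucket name is hypothetical; other clouds offer equivalent settings), default encryption at rest and bucket-level blocking of public access can be enabled with a few calls:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-datalake"  # hypothetical bucket name

# Encrypt all new objects at rest by default
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Block the broad public access paths at the bucket level
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```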

With these questions answered in advance, your data lake implementation will move at a consistent pace. 

Interested in exploring how a data lake fits into your enterprise infrastructure? Talk to our expert team, and let’s find out how Srijan can help.

Topics: Data Engineering & Analytics, Architecture

What is a Data Lake - The basics

Posted by Nilanjana on May 31, 2019 3:49:00 PM


In the next 10 years, the global generation of data will grow from 16 zettabytes to 160 zettabytes, according to an estimate by IDC. In addition, a Deloitte forecast claims that unstructured data is set to grow at twice that rate, with the average financial institution accumulating 9 times more unstructured data than structured data by 2020. It stands to reason that data generation by enterprises in every industry will increase in a similar fashion.

All this data is crucial for businesses - for understanding trends, formulating strategies, understanding customer behaviour and preferences, catering to those requirements and building new products and services. But actually gathering, storing and working with data is never an easy task. Yes, the sheer volume of data seems intimidating, but that’s the least of our problems.

The fact that data is stored fragmented, in silos across the organization, or that a lot of enterprise data is never used because it’s not in the right format, are currently some of the biggest challenges for enterprises working with big data.

Solution? Data lake.

What is a Data Lake?

A data lake is a part of the data management system of an enterprise, designed to serve as a centralized repository for any data, of any size, in its raw and native format. The most important element to note here is that a data lake architecture can store unstructured and unorganized data in its natural form for later use. This data is tagged with multiple relevant markers so it’s easy to search with any related query.

Data lakes operate on the ELT strategy:

  • Extract data from various sources like websites, mobile apps, social media etc
  • Load data in the data lake, in its native format
  • Transform it later to derive meaningful insights as and when there is a specific business requirement.

Since it is raw, the data can be transformed into the format of choice and convenience. When a business question arises, the data lake can be searched for relevant data sets, which can then be analyzed to help answer it. This is possible because the schema of the stored data is not defined in the repository unless a business process requires it.
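As a toy illustration of that ELT flow (the file name and field names are hypothetical): the raw events are loaded untouched, and a transformation is applied only when a specific question comes up.

```python
import json

import pandas as pd

# Extract + Load: raw events land in the lake exactly as received
# (here read back from a hypothetical newline-delimited JSON export)
with open("raw_events.json") as f:
    events = [json.loads(line) for line in f]

# Transform, later, when a specific business question arises:
# "how many unique sessions do we see per day?"
df = pd.DataFrame(events)
sessions_per_day = (
    df.assign(day=pd.to_datetime(df["timestamp"]).dt.date)
      .groupby("day")["session_id"]
      .nunique()
)
print(sessions_per_day)
```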

 

This possibility of exploration and free association of unstructured data often leads to the discovery of more interesting insights than predicted.

How is a Data Lake Different from a Data Warehouse?

A data lake is often mistaken for a different version of a data warehouse. Though the basic function is the same, data storage, the two differ in how information is stored in them.

Storing information in a data warehouse requires properly defining the data, converting it into acceptable formats and defining its use case beforehand. In a warehouse, the ‘Transform’ step comes before the ‘Load’ step (ETL rather than ELT). With a data warehouse:

  • Data is always structured and organized before being stored

  • Sources of data collection are limited

  • Data usage may be limited to a few pre-defined operational purposes and it may not be possible to exploit it to its highest potential

What are the Advantages of a Data Lake Architecture

Given that enterprises collect huge volumes of data in different systems across the organization, a data lake can go a long way in helping leverage it all. Some of the key reasons to build a data lake are:


  • Diverse sources: Generally, data repositories can accept data from limited sources, after it has been cleaned and transformed. Unlike those, data lakes store data from a large range of data sources like social media, IoT devices, mobile apps etc. This is irrespective of the structure and format of the data, and ensures that data from any business system is available for usage, whenever required.
  • Ease of access to data: Not only does a data lake store information coming from various sources; it also makes it available for anyone in need of required data. Any business system can query the data lake for the right data, and define how it is processed and transformed to derive specific insights.
  • Security: Although anyone can freely access any data in the lake, access to the information about the source of that data can be restricted. This makes any data exploitation, beyond requirement, very difficult.
  • Ease of usage of data: The unprocessed data stored directly from the source allows greater freedom of usage to the information seeker. Data scientists and business systems working with the data do not need to adhere to a specific format while working with the data.
  • Cost effective: Data lakes are a single platform, cost effective solution for storing large data coming from various sources within and outside the organization. Because a data lake is capable of storing all kinds of data, and easily scalable to accommodate growing volumes, it is a one-time investment for enterprises to get it in place. Integrating a data lake with your cloud is another option which allows you to control your cost as you only pay for the space you actually use.
  • Analytics: Data lake architecture, when integrated with enterprise search and analytics techniques, can help firms derive insights from the vast structured and unstructured data stored. A data lake is capable of utilizing large quantities of coherent data along with deep learning algorithms to identify information that powers real-time advanced analytics. Processing raw data is very useful for machine learning, predictive analysis and data profiling.

Data Lake Use Cases

With the sheer variety and volume of data being stored, data lakes can be leveraged for a variety of use cases. A few of the most impactful ones would be:

Marketing Data Lake

The increasing focus on customer experience and personalization in marketing has data at the heart of it. Customer information, whether anonymized or personal, forms the base for understanding and personalizing for the user. Coupled with data on customer activity on the website, social media, transactions etc, it allows enterprise marketing teams to know and predict what their customers need.

With a marketing data lake, enterprises can gather data from external and internal systems and drop it all in one place. The possibilities with this data can be at several levels:

  • Basic analytics can help get a comprehensive look into persona profiles and campaign performance
  • Unstructured data coming from disparate sources can be queried and leveraged to build basic and advanced personalization and recommendation engines for users
  • Moving further, a 360 degree view of individual customers can be formed with a data lake, pulling together information on customer journey, preferences, social media activity, sentiment analysis and more. Because of the sheer diversity of data, it is possible to drill down into any aspect of the customer lifecycle
  • Beyond this, enterprises can have data scientists perform exploratory analysis, look at the wide spectrum of data available, build some statistical models and check if any new patterns and insights emerge.

Cyber Security

Securing business information and assets is a crucial requirement for enterprises. This means cyber security data collection and analysis has to be proactive and always on. All such data can be constantly collected in data lakes, given its ability to store undefined data. It can also be constantly or periodically analyzed in order to identify any anomalies and their causes, to spot and nullify cyber threats in time.

Log Analytics

A lot of enterprises today rely on IoT data streaming in from various devices. A data lake can be the perfect storage solution to house this continuously expanding data stream. Teams also run quick cleaning processes on it and make it available for analysis across different business functions.

So that was a quick look at what is a data lake and why enterprises should consider building one. Moving forward, we’ll dive into how exactly to set up a data lake and the different levels of maturity for enterprise data lakes.

Interested in exploring how a data lake fits into your enterprise infrastructure? Talk to our expert team, and let’s find out how Srijan can help.
 

Topics: Data Engineering & Analytics

How We Built an Intelligent Automation Solution for KYC Validation

Posted by Sriram Sitaraman on Feb 15, 2019 1:52:00 PM

Financial institutions sift through a huge volume of documents as a key part of their operational processes. More importantly, the need for regulatory compliance means there is very low tolerance for error in these tasks.

However, document verification and processing for KYC validation, insurance claims, customer onboarding etc. are time-consuming processes across enterprises. By recent estimates, 26 days is the average customer on-boarding time for financial institutions. Organizations are also spending a lot on these processes, as they retain large teams to do the work manually. And scaling up operations just means employing more people.

Is there a way around these challenges?

Intelligent Automation Solution

Robotic Process Automation (RPA) already has a mainstream role in automating many manual processes in the BFSI sector. But this particular task requires AI, with advanced machine learning algorithms, to understand documents in context. That is Intelligent Automation: blending AI with automation to create solutions that can read documents, understand the content in context, and find patterns in the data.

At Srijan, we created a POC for an Intelligent Automation solution for Know Your Customer (KYC) validation that automates a key portion of the process. The solution employs a deep-learning algorithm to scan documents and images uploaded by end users and classify them into pre-programmed categories.

Here’s a look.

 

 

The solution is designed using the following technologies:

  • Convolutional Neural Network (CNN) using Python and TensorFlow
  • OpenCV for Computer Vision
  • OCR and MRZ packages

How It Works

The solution uses a combination of deep-learning based image recognition and classification models as well as Optical Character Recognition (OCR).  It is capable of:

  • understanding given text or image material

  • acting upon it according to a pre-trained set of rules

Let’s say we are working with passports submitted during the KYC process. Here’s what the solution does (a simplified code sketch follows the steps below):

  • Scanning - to extract personal details and passport expiry dates

    • “Read” the passport, extract different sections of the main page, using OCR to read certain sections

    • Computer Vision solutions leveraging OpenCV are used to read the machine-readable zones in the passport

    • Deep Learning algorithms leveraging Tensorflow framework and OpenCV extract the photograph from the passport, as well as identify any “Cancellation” or other stamps

  • Compare extracted information with information available in the database, to validate submitted proof document

  • Based on the above comparison and validation, the solution can classify the document submitted, in this case the passport, as verified, expired, cancelled, or a data mismatch.

  • Cases that cannot be categorized with appropriate degree of accuracy or confidence are marked for manual classification

  • In case of manual intervention, a workflow is created where the operations team can validate manually and classify them

  • The model learns from the manual classifications, and over time can spot patterns and closely mirror the manual results. This is accomplished by automated retraining of the model on the newer data and the manual classification results
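A heavily simplified sketch of the OCR step only, using OpenCV and pytesseract; the file name is hypothetical, and the real solution adds MRZ parsing, photo extraction, stamp detection and the TensorFlow classification model on top.

```python
import cv2
import pytesseract

def extract_passport_text(image_path: str) -> str:
    """Read a scanned passport page and return the raw OCR text."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding generally improves OCR quality on scanned documents
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)

text = extract_passport_text("passport_scan.jpg")  # hypothetical input file
# Downstream steps would parse the expiry date and personal details from
# `text`, compare them with the customer record, and mark the document as
# verified, expired, cancelled, or a data mismatch.
```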

How This Helps

With the KYC validation solution, enterprises can automate repetitive manual processes, achieving:

  • Speed: Faster turnaround at most stages of manual processes, to solve scalability challenges and time-critical needs. For example: document verification in 1/10th of the time taken manually

  • Accuracy: Rule-based algorithms executed by software ensure there is a near-zero margin of error in processes

  • Efficiency: Intelligent automation means tasks are done efficiently, compliant to standard processes, and with minimal need for manual intervention. For example: reduce manual efforts for KYC verification by 70%

  • Resource Management: As repetitive processes are automated, organizations have the freedom to utilize their human resources for more value-added tasks.

Automating just a segment of KYC validation can bring a host of significant benefits, as outlined above. But the solution can be extended to other BFSI operations, or even other industry use cases, to deliver similar gains:

  • Passport checks at airports

  • Processing insurance claim documents

  • Reconciling financial statements

  • Resolving credit card disputes

  • Any other manual & repetitive processes that require documents to be validated or reviewed

Have repetitive manual processes that you think can be automated? Looking to increase cost saving on operations without compromising quality and productivity?

Let’s start the conversation on how Srijan’s experts teams can help identify key opportunities to deploy intelligent automation for your business.

Topics: Machine Learning & AI, Data Engineering & Analytics

Understanding Data Engineering - Part 1

Posted by Surya Akasam on Feb 6, 2019 1:20:00 PM

The need to use big data to move businesses towards data-driven decision making created data engineering, and the field is evolving at a rapid pace. “Data engineering” and “data engineer” are relatively new terms that have become extremely popular over the last decade.

Tracing the History to Data

 

Before we dive deeper into data engineering and how it is impacting centuries-old businesses and startups alike, let’s look at a brief history of events to see how it evolved over time. There’s a fascinating timeline by the World Economic Forum, and I am picking some critical moments from that list:

1958: Seed of Business Intelligence

IBM researcher Hans Peter Luhn defines Business Intelligence as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”

1965: Birth of First Data Center

The US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tapes.

1970 - The Birth of RDBMS

IBM mathematician Edgar F. Codd (father of the relational model) presents his framework for storing and retrieving data, called the “relational database”.

1976 - The Birth of ERP

Material Requirements Planning (MRP) systems become more commonly used across the business world; they later evolve into the Enterprise Resource Planning (ERP) applications we know today, such as SAP and Oracle Financials.

1991 - Birth of the Internet

Tim Berners-Lee posted a summary of the World Wide Web project on several internet newsgroups, including alt.hypertext, which was for hypertext enthusiasts. The move marked the debut of the web as a publicly available service on the internet.

1997

Michael Lesk publishes his paper “How Much Information Is There in the World?”, estimating around 12,000 petabytes of data in the world, growing at roughly 10x per year, and notes that most of this data is merely collected, never accessed, so no insights are being derived from it.

Foundations of Big Data

1999

The Association for Computing Machinery publishes the article “Visually Exploring Gigabyte Data Sets in Real Time”, in which the term “big data” appears for the first time, along with the observation that the “purpose of computing is insight, not numbers”.

2000

Peter Lyman and Hal Varian attempt to quantify the world’s digital information and its growth. They conclude that “the world’s total yearly production of print, film, optical and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on Earth.”

From 2005 onwards, Web 2.0 played a key role in increasing the quantum of data collected on a daily basis.

The world’s servers process 9.57 zettabytes (9.57 trillion gigabytes) of information, the equivalent of 12 gigabytes of information per person per day, according to the “How Much Information? 2010” report.

Working with Big Data

With that said, it is clear that big data is a future businesses cannot ignore, and it is fundamentally changing the way they operate. It’s time to accept that “data is the new fuel” that runs businesses. But this also means they have to figure out answers to some common challenges around big data:

How to Store Big Data?

Storing this data with traditional RDBMS methods is not feasible. An RDBMS may cope with the volume and velocity of the data, but it falls short on variety, because it is designed as a transactional (OLTP) store and expects data to arrive in a particular schema.

How to Process Big Data?

Traditionally, data warehouses were designed to mine large amounts of data for insights, and they work on the principle of ETL (Extract, Transform and Load). They too expect data to be in a particular schema, and hence also fall short on the variety of data they can handle.

How to Collect Real-Time Big Data?

Real-time data is crucial if businesses want to shift from instinct-driven to predictive and proactive decision making.

The need for businesses to leverage big data gave rise to NoSQL (document, key-value and graph-based) stores, the cloud, data lakes and more, which were unheard of a decade earlier, along with the need for professionals like data scientists, data engineers and cloud architects.

Understanding the Data Engineer

Data engineering is the practice of creating a structure for how big data is collected, maintained and shared with different stakeholders, so they can derive business value from it.

A data engineer is the person responsible for building these structures.

Data Engineer Vs Data Scientist

Data Scientist is a role that people often confuse with Data Engineer.

It is true that the two roles share some technologies, but they are very different roles serving different purposes.

Basic differences between Data Engineer & Scientist

1. A Data Engineer creates and provides the structure and process for how data will be collected, processed, and shared with stakeholders. A Data Scientist, on the other hand, is the stakeholder who uses that data to provide insights to the business by leveraging statistical models.

2. A Data Engineer's core competencies include distributed computing, data pipelines, and advanced programming. A Data Scientist's core competencies include machine learning, artificial intelligence, statistics, and mathematics.

The data scientist role is much sought after these days. But a data scientist is only as good as their data, and that is taken care of by the data engineer.

With some of the basic definitions and differences out of the way, the next part of this blog post will discuss how a data engineer can use the cloud to create data lakes, which is at the core of building a big data practice at any enterprise.

Topics: Financial Services, Data Engineering & Analytics

A Tableau Dashboard Lifecycle

Posted by Anil Saini on Oct 19, 2018 11:24:00 AM

A dashboard is a vital tool to understand the business performance of an organization. From a single interface, decision makers have access to key performance indicators (KPIs) of their business. The successful implementation of a dashboard is complex and requires a step-by-step process — a methodology that considers all aspects of the project lifecycle.

A basic dashboard development process would cover the following aspects:

 

Stage 1:- Functional Knowledge

Functional knowledge is where it all begins. In this stage, the business analyst (BA) works closely with business stakeholders to understand the current functionality and terminology of the business. This helps in chalking out what exactly the dashboard should be able to deliver.

 

Stage 2:- Requirement Analysis

Once we understand the functionality of the business, it’s time for requirement analysis. At this stage, the BA and architects analyze and pin down certain specifics before proceeding to develop a Tableau dashboard:

  • Dashboard requirement
  • How data flows in the existing system, and the environment where data is situated
  • Layout and blueprint/mock-ups of dashboards
  • The scope of the dashboard
  • Value added to the business
  • Required tools for development/testing etc and their cost

Also, this is the phase where the development team should ask themselves a mandatory question, “Is our team capable of fulfilling these requirements?”

 

Stage 3:- Plan

The planning phase revolves around creating a roadmap for end-to-end development and delivery. First, the project team members must be identified and their roles clearly defined. In this phase, the project manager and team lead are involved in determining the:

  • Timeline, the number of resources needed, and their roles (BA, developers, QA)
  • Allocation of work and leave plan (buffer resources)
  • Dependencies and challenges
  • Methodology to follow: Agile, Scrum, Waterfall, etc., and how work is divided accordingly

 

Stage 4:- Technical Specs

In this phase, we must understand the technical requirements of the project. These include the Tableau Desktop/Tableau Server setup on which the dashboards need to be developed, the data source setup and flow of data from the transactional DB to the reporting DB, the testing tools, etc.

Once these have been decided, the BA and technical architect need to understand (a hypothetical mapping sketch follows this list):

  • Data mapping - from the mock-up dashboard to the tables and fields present in the database
  • The relationship between different tables in the case of relational databases (RDBMS)
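One lightweight way to capture this mapping before development is a simple lookup document. Below is a hypothetical sketch in Python; the KPI, table, and column names are invented purely for illustration and are not from the original post.

```python
# Hypothetical data-mapping sketch: dashboard fields traced back to reporting tables/columns.
DASHBOARD_FIELD_MAP = {
    "Total Revenue":    {"table": "fact_orders",  "column": "order_amount", "aggregation": "SUM"},
    "Orders per Month": {"table": "fact_orders",  "column": "order_id",     "aggregation": "COUNT"},
    "Region":           {"table": "dim_customer", "column": "region",       "aggregation": None},
}

# Relationships (joins) the Tableau data source will rely on, again purely illustrative.
RELATIONSHIPS = [
    ("fact_orders.customer_id", "dim_customer.customer_id"),
]

for field, mapping in DASHBOARD_FIELD_MAP.items():
    kind = mapping["aggregation"] or "dimension"
    print(f"{field}: {mapping['table']}.{mapping['column']} ({kind})")
```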

 

The last step here is to document these specs and get them verified by the client or the technical team.

In short, this phase includes:

  • All the technical details
  • Joins, relations, and SQL
  • Credentials to access the database, reporting server credentials to publish them
  • KPIs  to be measured and business logic

 

Stage 5:- Development

With all the information and understanding in place, dashboard development starts with:

  • SQL developer to generate the query
  • BI developer to design and develop the reports/dashboards
  • front-end developer to embed them in web-portal

This phase involves:

  • Connecting databases and building dimension models
  • Developing sheets & dashboards
  • Publishing them to the server (a publishing sketch follows this list)
  • Applying the right look and feel and appropriate filters to reports/dashboards
  • Configuring scheduling, refresh, and security
  • If required, customization such as embedding in a web portal, passing filters from the web page, and a UI developer building the web page where the dashboard needs to be embedded
  • Unit testing
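For the publishing step, teams often script workbook deployment rather than uploading by hand. Here is a minimal sketch using the Tableau Server Client (TSC) Python library; the server URL, credentials, project name, and workbook file are placeholders, and your authentication setup may differ.

```python
# Minimal publishing sketch with the Tableau Server Client library (placeholders throughout).
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth("publisher_user", "publisher_password", site_id="analytics")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    # Find the target project on the server.
    all_projects, _ = server.projects.get()
    project = next(p for p in all_projects if p.name == "Sales Dashboards")

    # Publish (or overwrite) the packaged workbook into that project.
    workbook = TSC.WorkbookItem(project_id=project.id, name="Sales Overview")
    workbook = server.workbooks.publish(
        workbook, "sales_overview.twbx", mode=TSC.Server.PublishMode.Overwrite
    )
    print(f"Published workbook id: {workbook.id}")
```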

 

Stage 6:- QA and Testing

Once developed, it’s time for the QA to check:

  • UI and functionality testing as per mock-up
  • Data validation and SQL testing
  • Testing schedules, jobs, and security testing
  • Testing of customization applied
  • Performance testing:- Report opening time, with/without the webpage

 

Stage 7:- UAT

UAT (User Acceptance Testing) is a crucial part of any BI project. It is the first time when business and IT together see the results of the project. This is where any necessary changes can be made to ensure that the final product is actually valuable for the end users.

This phase mainly includes data validation and functionality testing by the business user.

 

Stage 8:- Production & Support

Once the dashboard has been built and tested by the user, it is deployed into production. Security requirements must be implemented in the production environment. Integration within a corporate network environment must be completed, including considerations for portal frameworks etc. And after the product goes live and gains actual user traffic, monitoring, support, and maintenance must be provided.

So that’s how we approach building Tableau dashboards at Srijan, and we’ve followed the process successfully for several of our clients. You can also explore some of our experiments with various dashboard use cases.

Thanks for reading; I hope you find this helpful.

I'd love to hear your feedback and any queries.

This post was originally published on Zyifers.

Topics: Data Engineering & Analytics, Architecture

Edge computing for IoT: what, why, & how

Posted by Sriram Sitaraman on Mar 8, 2018 12:25:00 PM

Gartner estimates that there will be around 8.4 billion connected devices installed worldwide by the end of 2017, up 31% on 2016, with roughly 37% of these devices set to be used by businesses and the rest by consumers. By 2020, it’s estimated that there will be more than 20 billion connected devices. Given this scale of IoT adoption, edge computing capabilities will have a significant impact on businesses, and their ability to compete in a dynamic market.

What is Edge Computing?

Edge computing refers to processing Internet of Things (IoT) data closer to where it is created, unlike cloud computing, where data is sent over longer routes to data centers. In this way, data is processed at the edge of the network, by performing analytics and knowledge generation at or near the source of the data.

Edge computing covers a wide range of technologies, from wireless sensor networks, mobile data acquisition, and mobile signature analysis to cooperative, distributed peer-to-peer ad-hoc networking and analysis.

What drives the need for it?

Cloud computing provides many benefits that today’s agile businesses can’t ignore, typically by transmitting data to a centralized computing location in the cloud. Cloud computing has made great strides in real-time data access, but latency is still an issue, which suggests universal centralization isn’t always the best idea for an organization.

IoT networks produce a huge amount of data. And even if it is possible to process this data, doing it on the cloud becomes impractical in terms of:

  • Cost-effectiveness

  • Computational capacity requirements

  • Relevancy

  • Network latency for critical actions

How will Edge computing power the future of IoT?

Irrespective of the industry you are in—whether it’s manufacturing, energy, transportation or any other—IoT will have a big impact on your business. Edge computing helps you manage and analyze all of this generated data faster, while reducing the load on the networks that would otherwise have to transmit it in bulk.

Edge computing use cases

Cameras, sensors, production-line machines, cars, and industrial equipment are a few examples of devices where edge computing is expected to play a larger role. Effective applications that enable computation on data output at the edge would not only help in real-time decision making by mitigating latency, but also save costs and improve RoI. A few use cases in various industry verticals are listed below:

  • Oil & Gas: Edge computing is being deployed at a leading oil and gas company to detect faults at the machinery level before they occur, using predictive analysis.

  • Chemical Industries: It is being used to build smart petroleum refineries where the process is well analyzed to increase productivity and workplace safety.

  • Energy Sector: It is being used smartly in various energy producing industries to reduce power loss and make energy equipment reliable and efficient. 

  • Commuting Sector: Various transportation companies are making use of edge powered IoT devices and computing services to help find the right parking area and reduce parking downtime. Various AI algorithms work with these edge devices to optimize parking spaces and to collect real-time traffic and navigation data. They use analytics to make well-informed decisions.

  • Industrial Uses: By leveraging information at the edge, operations like shutting down systems can be carried out without having to query central servers.

How it works

Edge computing works by pushing data, applications, and computing power away from the centralized network and out to its extremes, so that fragments of information lie scattered across a distributed network of servers. Its target users are any internet clients using commercial internet application services. Once available only to large organizations, it is now within reach of small and medium organizations because large-scale implementations have driven costs down.
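As a purely illustrative sketch (the device readings, threshold, and uplink function are invented), the pattern usually looks like this: an edge node filters and aggregates raw sensor readings locally, acts immediately on critical conditions, and forwards only a compact summary or alert to the cloud.

```python
# Hypothetical edge-node sketch: aggregate raw readings locally, forward only what matters.
from statistics import mean

ALERT_THRESHOLD_C = 85.0  # invented threshold, for illustration only

def summarise(readings):
    """Reduce a window of raw temperature readings to a compact summary."""
    return {
        "count": len(readings),
        "avg_c": round(mean(readings), 2),
        "max_c": max(readings),
    }

def send_to_cloud(payload):
    # Stand-in for a real uplink (MQTT, HTTPS, etc.); here we just print.
    print("uploading:", payload)

def process_window(readings):
    summary = summarise(readings)
    if summary["max_c"] >= ALERT_THRESHOLD_C:
        send_to_cloud({"alert": "overheating", **summary})  # act on critical events immediately
    else:
        send_to_cloud(summary)                              # send the summary, not the raw data

process_window([71.2, 73.5, 74.1, 72.8, 90.3])
```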

Technologies used to enable computing at the edge of the networks

  • Mobile Edge Computing: Mobile edge computing or multi-access edge computing is a network architecture that enables the placement of computational and storage resources within the radio access network (RAN) to improve network efficiency and the delivery of content to end users. 

  • Fog Computing: This is a term used to describe a decentralized computing infrastructure which both extends cloud computing to the edge of a network while also placing data, compute, storage and applications in the most logical and efficient place between the cloud and the origin of the data. This is sometimes known as being placed “out in the fog.” 

  • Cloudlets: These are mobility-enhanced, small-scale cloud data centers located at the edge of a network and represent the second tier in a three-tier hierarchy: Mobile or smart device, Cloudlet and Cloud. The purpose of cloudlets is to improve resource intensive and interactive mobile applications by providing more capable computing resources with lower latency to mobile devices within a close geographical proximity. 

  • Micro Data Centers: These are smaller, rack-level systems that provide all the essential components of a traditional data center. It is estimated that micro data centers will be most beneficial to SMEs that don’t have their own data centers, as larger corporations tend to have more resources and thus do not need such solutions.

 

For large enterprises with global operations, IoT, and consequently edge computing, are more a matter of 'when' than 'if'. The earlier businesses start evaluating their requirements, the faster they can zero in on the necessary technology and implementation strategies.

Srijan is already helping enterprises access and analyze their IoT data on interactive visualization dashboards. So how about we get that conversation started, and see how Srijan's expert teams can help with implementing IoT ecosystems and edge computing for your enterprise.

Topics: Machine Learning & AI, Data Engineering & Analytics
