How Edge Computing for IoT is set to impact Businesses

Posted by Sriram Sitaraman on Feb 14, 2018 4:29:00 PM

Connectivity has rapidly evolved from connecting people to vast networks of connected things (IoT), including industrial machines, equipment, controllers, and sensors. As data scientists work to harvest the benefits, machine data is typically transmitted to a centralized computing location in the cloud.

The volume of data produced in IoT networks is massive. For example, a Boeing 787 generates 40 TB per hour of flight, while a large retail store with IoT infrastructure gathers 10 GB per hour. Sending such volumes to the cloud, even if technically doable, is both cost-prohibitive and impractical, given the computational capacity required, the limited relevance of much of the data, and the network latency involved in critical actions.

This is where edge computing for IoT comes into play. Edge computing refers to creating computing capacity nearer to the source or consumer of the data. Its most familiar use is processing locally relevant data near the source or consumer, which reduces traffic to the central data center and mitigates latency where a real-time response is required.

[Figure: edge computing workflow]

How Edge Computing will impact businesses

The speed and agility benefits of edge computing are so great that in the coming years, IoT-created data will increasingly be stored and acted upon close to the edge of the network rather than in centralized data centers. It would not be surprising if equipment manufacturers started building edge-computing capabilities into devices and sensors themselves over the next few years.

Research firm IDC estimates that “by 2019, 40% of IoT data will be stored, processed, analyzed, and acted upon close to or at the edge of the network.” Here are some ways edge computing will impact businesses:

  • Real-time response times: Faster or real-time response is a necessity in applications that involve time-critical actions, such as artificial intelligence, robotic process automation, connected cars, and M2M communications. The inability to respond in real time can mean lost business, or functional and safety issues. By processing data at or near the source, edge computing enables immediate action. For example, a driverless car can process sensor data within the car to apply the brakes, accelerate, or maintain a certain speed, and a production-line machine can turn itself off when a part fails.

  • Dependable operations: Because most data is processed at or near the source, constant internet connectivity is not required. Edge computing enables equipment and other smart devices to operate normally, without disruption, even when they are offline or connectivity is intermittent. This makes it an ideal computing model for businesses that count on the ability to quickly analyze data in remote locations such as ships, airplanes, and rural areas—for instance, detecting equipment failures even when the equipment is not connected to the cloud.

  • Cost savings: Data with local relevance is processed at the edge of the network, reducing the amount of data sent to the cloud. Less traffic to the cloud data center saves data transfer costs (back and forth between sources and the cloud) and reduces the need for higher computing capacity in a centralized cloud.

  • Secure and compliant: Edge computing helps address the security and compliance requirements that have prevented some industries from using the cloud. With edge computing, companies can filter out sensitive personally identifiable information and process it locally, sending only non-sensitive information to the cloud for further processing (a minimal sketch follows this list).

  • Interoperability between new and legacy devices: Edge computing converts the communication protocols used by legacy devices into a language that modern smart devices and the cloud can understand, making it easier to connect legacy equipment with modern IoT platforms. As a result, businesses can get started with IoT without investing in expensive new equipment—and immediately capture advanced insights across their operations.
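To make the local-processing idea concrete, here is a minimal Python sketch of edge-side handling under assumed conditions: it acts on a time-critical reading locally, strips hypothetical PII fields, and forwards only an aggregated summary to the cloud. The field names, thresholds, and functions are illustrative, not part of any specific edge platform.

from statistics import mean

PII_FIELDS = {"operator_id", "operator_name"}   # assumed sensitive fields
TEMPERATURE_LIMIT_C = 90.0                      # assumed local shutdown threshold


def act_locally(reading: dict) -> None:
    # Immediate, latency-sensitive decision taken at the edge itself
    if reading["temperature_c"] > TEMPERATURE_LIMIT_C:
        print(f"Machine {reading['machine_id']}: over limit, issuing local stop")


def summarize_for_cloud(readings: list) -> dict:
    # Reduce raw readings to a small, PII-free summary for the central cloud
    clean = [{k: v for k, v in r.items() if k not in PII_FIELDS} for r in readings]
    temps = [r["temperature_c"] for r in clean]
    return {
        "machine_id": clean[0]["machine_id"],
        "samples": len(clean),
        "avg_temperature_c": round(mean(temps), 2),
        "max_temperature_c": max(temps),
    }


batch = [
    {"machine_id": "press-7", "operator_id": "E123", "temperature_c": t}
    for t in (78.2, 85.1, 91.4)
]
for reading in batch:
    act_locally(reading)
print(summarize_for_cloud(batch))   # only this summary leaves the site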

[Figure: cloud vs edge maintenance cost]

Cloud Computing and Edge Computing

It’s expected that by 2020, up to 65% of enterprises will use IoT. This means more complex networks of connected things, including multitudes of machines, equipment, and devices, producing a mind-boggling amount of data. Many elements in these networks will require faster processing and real-time responses that centralized processing in the cloud simply can’t deliver.

Edge computing will be able to meet these rapid processing requirements and mitigate the latency, intermittent connectivity, and computational challenges of processing such vast amounts of data. Cloud computing is here to stay, but it will act more like an off-site location for drawing patterns, forecasts, and reports, while handing over immediate processing to edge computing.

Given the current landscape of industrial IoT and the pace at which it is growing, edge computing looks set to take the lead in the near future, pushing cloud computing into a supporting role.

Edge Computing - A complement to the cloud

Many industry experts describe edge computing as the successor to cloud computing, yet one that will maintain a symbiotic relationship with it. We foresee a strong partnership between cloud and edge computing, in which each handles different computing tasks and data types while complementing the other. Edge computing will serve time-sensitive data for immediate intelligence within or close to the device itself, while the cloud will handle data intended for historical analysis. Edge computing will drive the next wave of business transformation, and the time to prepare for it is now, to avoid the risk of being left behind.

Take a closer look at how Edge computing is being implemented, and specific use cases across industries.

Looking to evaluate your IoT and edge computing requirements? Write to us and let's explore how our expert teams can help.

Topics: Data Engineering & Analytics, Enterprises

Data Visualization Dashboards - A must-have for lean enterprises

Posted by Sriram Sitaraman on Feb 14, 2018 12:53:00 PM

When your data assets are accessed by decision-makers in your organization, are they presentable enough to showcase critical insights and drive informed action? Traditional tools fail to show the high-level picture, cannot deal with the overwhelming volume of data, and hence cannot support effective decision-making.

Data visualization dashboards make your data understandable and easy to manipulate in an aesthetically pleasing format. They help enterprises build rich, interactive data visualizations so that business decisions can be taken when they matter most.

Why do we need data visualization dashboards?

Data visualization dashboards surface visually compelling patterns for instant clarity into trends, help draw out relationships across multiple parameters, and support analysis that yields prescriptive and predictive intelligence. Dashboards using tailored algorithms and real-time data significantly streamline internal and external processes and business decisions.

Visualizing meaningful patterns through varied sources

Enterprises today gather an unprecedented amount of data from various business processes, productivity tools, project management tools, and other sources.

Every department and level of your enterprise should be able to make well-informed decisions. To enable this, data coming from various sources has to be transformed into one common format and correlated in ways that make sense to decision makers. A few use cases are as follows:

  • Data coming from multiple sources such as business applications, production management tools, operations management applications, CRM, ERP, etc. presented on a common dashboard that shows correlations across multiple scenarios (a minimal integration sketch follows this list)
  • Viewing, monitoring, and analyzing key business metrics with interactive charts, graphs, and patterns helps in deriving trends, anomalies, correlations, etc. without technical assistance
  • Prescriptive actions drawing on historical data and past performance across business functions
  • Predictive intelligence derived using deep learning, enabled by past patterns and real-time data
  • Identifying statistically significant trends and patterns to enable comprehensive analysis and effective decision making
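As a minimal sketch of the first use case above, the snippet below pulls data from two hypothetical source exports (CRM and production management), harmonizes the join key, and produces one aggregated view a dashboard could plot. The file and column names are illustrative only.

import pandas as pd

crm = pd.read_csv("crm_export.csv", parse_dates=["order_date"])
production = pd.read_csv("production_export.csv", parse_dates=["ship_date"])

# Harmonise the join key before combining the two sources
crm["customer_id"] = crm["customer_id"].astype(str).str.strip()
production["customer_id"] = production["customer_id"].astype(str).str.strip()

combined = crm.merge(production, on="customer_id", how="inner")
combined["days_to_ship"] = (combined["ship_date"] - combined["order_date"]).dt.days

# One aggregated view a dashboard could plot per region
summary = combined.groupby("region", as_index=False)["days_to_ship"].mean()
print(summary)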

Using the right tools

A quick scatter plot, time series forecast, or cluster analysis can unfold interesting stories that lead analysts to further questions and statistical testing. While developing a visualization dashboard, it is important to have the right set of tools and algorithms for your business needs.
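For instance, a first-pass exploration might look like the sketch below, which draws a scatter plot with a simple k-means clustering over synthetic data (the data and segment labels are made up for illustration):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
spend = rng.normal(loc=[20, 60, 110], scale=8, size=(100, 3)).ravel()   # synthetic
visits = spend / 10 + rng.normal(scale=2, size=spend.size)

X = np.column_stack([spend, visits])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

plt.scatter(spend, visits, c=labels)
plt.xlabel("Monthly spend")
plt.ylabel("Site visits")
plt.title("Customer segments (illustrative)")
plt.show()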

Enterprises are opting for tailored data visualization dashboards that use custom algorithms built on top of modern tools such as:

  • Tableau
  • Highcharts
  • Power BI
  • D3.js
  • QlikView
  • Spotfire

It’s essential for large-scale companies and enterprises to have business intelligence dashboards that not only surface insights but also help optimize resources for higher profitability. You probably need a tailored dashboard if your organization matches any of the following:

  • An enterprise producing massive amounts of data coming from sensors, machines, people and processes across functions
  • A large-scale organization using multiple business applications - decision makers have to look at more than one place for insights
  • A growing organization consistently taking efforts to optimize resources for higher profitability
  • An organization unable to leverage its data assets to take quick, informed actions at the right time
  • A lean organization seeking efficiency and higher ROI

Srijan has helped several Fortune 500 companies and large enterprises gain transparency, derive intelligence, and convert it into action points through tailored data visualization dashboards. Srijan's Data Science and Analytics offerings cover end-to-end services including data integration, master data management, warehousing, data visualization with tailored dashboards, and advanced and predictive analytics for performance management. Let's get the conversation started on how Srijan's data science experts can help.

Topics: Data Engineering & Analytics, MarTech

Data, Power, & Decisions: Data Visualization Meetup at Srijan

Posted by Poonam Lata on Apr 5, 2017 6:16:00 PM

Data is one of the most talked about resources these days. It has the power to narrate insightful stories, which can lead to strategically important and competitively advantageous decisions.

Last Saturday, 25th March, we organised a data visualization meetup at Srijan. The objective of the meetup was to understand the complexity and the nuances of the stellar work done by individuals and corporations in the space of data visualization.

The meetup featured the following speakers:

Avinash has worked as a business journalist across a variety of national papers and magazines for the last 15 years. In the last couple of years he has learnt to code and now delivers his stories with visualizations. He also writes a personal blog at datastories.in.

Mr. Akhilesh is a statistician with 15 years of experience. He works as a senior technical manager and supports various data initiatives of Open Government Data. He has contributed to several research studies, conducted statistical analyses using a wide variety of tools, and published articles in Indian journals.

Mr. Shubhadip is a senior analyst with Open Government Data. He works in the areas of big data management and statistical models, and is involved in research studies on better governance, the socio-economic impact of open data, and data-driven decision-making.

Session 1

Avinash shared key case studies along with his methodologies, challenges, and guidelines for making effective visualizations. He emphasized the following principles:

  • Understanding of audience
  • Clear purpose of the visualization
  • Identifying the right metrics
  • Contextualization of the Data & the visualization
  • Identifying assumptions
  • Data cleaning and sanitization
  • Choice of type of visualization
  • Explicitly guiding the audience

The objective of any data visualization is to answer certain key questions. Data is the key ingredient of a visualization, and more often than not data preparation is the step that takes the longest (up to 80% of the time). He highlighted the following challenges:

  • Merging the data from two sources where the definitions of the same metrics could differ
  • Data types in a column could be inconsistent
  • NA/missing values
  • Treatment of missing values
  • Invisible characters (line breaks within fields, leading and trailing whitespace, stray spaces)
  • Differing record lengths, etc.

These issues need to be explicitly addressed, or they pose a serious challenge to the visualization and, in turn, to the insights drawn from it.
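As a rough illustration of how some of these issues might be tackled with pandas (the file and column names here are hypothetical, not from the talk):

import pandas as pd

df = pd.read_csv("survey_raw.csv", dtype=str)      # read everything as text first

# Invisible characters: collapse line breaks/extra whitespace and strip the ends
df = df.apply(lambda col: col.str.replace(r"\s+", " ", regex=True).str.strip())

# Inconsistent data types in a column: coerce, turning bad values into NaN
df["population"] = pd.to_numeric(df["population"], errors="coerce")

# Treatment of missing values: make the choice explicit rather than implicit
df["population"] = df["population"].fillna(df["population"].median())

# Merging two sources whose metric definitions differ: rename to a shared schema
other = pd.read_csv("census_raw.csv", dtype=str).rename(
    columns={"POP_TOTAL": "population", "STATE_NAME": "state"}
)
merged = df.merge(other, on="state", how="outer", suffixes=("_survey", "_census"))
print(merged.head())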

Tools and technologies make it easy to visualize data, but the real value from a visualization comes only after the challenges above have been addressed.

Visualization often helps us address key questions and solve interesting problems. For example, Avinash spoke about how his team helped a diaper maker identify opportunity areas and the right market. They did this by visualizing publicly available data on the percentage of babies born in private hospitals (data source: National Health Information Systems).

Again, the power of visualization rests on the assumption that the underlying data is cleaned, sanitized and free of the above-mentioned challenges.

Session 2

The team from NIC talked about the visualization engine on the open data platform. The engine allows users to perform the following tasks:

  • Use a new dataset or an existing dataset from https://data.gov.in/
  • Copy the data from the CSV or JSON URL
  • Select the visualization type
  • Make changes to the data like filtering, adding a field, etc.
  • Create a visualization

The team also demonstrated the working of this visualization engine and showcased how the users can create interesting maps and different visualizations with just a few clicks.

The NIC team has worked with more than 15 open source libraries and frameworks. These include D3.js, C3.js and NVD3; jVectorMap for the all-India map; and Python, Leaflet and OpenStreetMap for geolocation.

The NIC team is constantly adding features to this visualization engine and to the open data portal. Features like “Suggest a dataset” are certainly a boon for data analysts.

The open data platform hosts more than 74K datasets, and this visualization engine is certainly an added incentive for users to experiment with ideas and data on the portal.

The speakers once again emphasized the need to clean data before visualization. They had a checklist of steps for cleaning datasets, which included the following:

  • Remove any formulas from the Excel sheet
  • Unmerge the cells in the Excel sheet
  • Keep the header in the first row
  • Remove blank cells and replace NA cells with appropriate values
  • Remove special characters
  • Remove stray spaces
  • File name, dataset title, and dataset should not appear in the metadata file, etc.

This is just a glimpse of all the steps that the team at NIC undertakes to make the datasets ready for visualization.
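To illustrate, a few of these spreadsheet-cleaning steps could be scripted along the following lines (a hypothetical sketch, not NIC's actual tooling; the file name is made up):

import re
from openpyxl import load_workbook

# data_only=True reads the last computed value of each formula instead of the formula
wb = load_workbook("dataset.xlsx", data_only=True)
ws = wb.active

# Unmerge every merged range (the top-left value is kept; the freed cells become
# blank and can then be filled or reviewed)
for merged_range in list(ws.merged_cells.ranges):
    ws.unmerge_cells(str(merged_range))

# Strip special characters and surrounding spaces from text cells
for row in ws.iter_rows():
    for cell in row:
        if isinstance(cell.value, str):
            cell.value = re.sub(r"[^\w\s.,-]", "", cell.value).strip()

wb.save("dataset_clean.xlsx")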

We often read about the tools and techniques for making great visualizations. A great visualization is one that achieves its purpose of solving a problem or answering the pre-set questions. To ensure that a visualization is powerful, one needs to make certain that the underlying dataset carries the right, clean information.

Both speakers emphasized the need for clean datasets and outlined the steps to get there, but those steps lived in their own lists, on their local machines and discussion environments. We suggested that such steps be made publicly available so that every visualization experiment can yield powerful insights and decisions.

Srijan's data visualization team and all meetup attendees gained some valuable insights from our speakers. Working to create short experimental data stories, we have experienced first-hand the challenges the speakers talked about. And now we are better equipped to resolve them and deliver better solutions.

Topics: Community, Data Engineering & Analytics, Event

Data Visualization: Exploring female life expectancy in India

Posted by Poonam Lata on Jan 9, 2017 12:12:00 PM

This past year, we at Srijan tried our hands at several new tools and technologies. We carried out short exercises so our teams could learn and implement new skills. One of these skills was creating data visualizations using Tableau.

Here's a look at a short project we carried out:

Life expectancy is quite an interesting story to pursue and understand. We started with an objective to understand various factors that impact the life expectancy of females in India.

The visualization journey started with brainstorming and identifying variables thought to be associated with life expectancy. This was followed by researching information from government sources. Information organized in various Excel sheets was then used with Tableau Public to explore and analyze patterns in the data.

BI tools allow an analyst to quickly plot data in various ways with ease. The story depicted through this visualization follows a structured thought process. We created interactive data visualizations using Tableau, which succinctly captured the various correlations, making the data easier to process and analyze.

We started off by plotting information points across variables to explore relationships and trends. A large number of factors were shortlisted for analysis, but due to the paucity of data, many had to be dropped. The factors depicted in this study are therefore not necessarily exhaustive; they are the ones for which we either had data available or which were found to have significant relationships (correlations).
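The exploration step boils down to correlating each candidate factor with female life expectancy at the state level. A pandas sketch of that step might look like this (the file and column names are hypothetical stand-ins for the government data we compiled in Excel):

import pandas as pd

states = pd.read_csv("state_indicators.csv")   # one row per state, hypothetical file

factors = [
    "households_without_latrine_pct",
    "female_literacy_rate",
    "crude_birth_rate",
    "per_capita_income",
]

# Pearson correlation of each candidate factor with female life expectancy,
# ordered by strength of association
correlations = (
    states[factors]
    .corrwith(states["female_life_expectancy"])
    .sort_values(key=abs, ascending=False)
)
print(correlations)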

The Data Visualization

Here's a look at our findings, with interactive data visualizations built in Tableau:

 

Our Findings

Life Expectancy in India

To set the context, the first graph depicts life expectancy figures for males and females over the years. Life expectancy is growing at a steady rate for both males and females in India, thanks to improved access to healthcare, education, and sanitation. The increase in per capita income has also improved the average Indian's access to nutritious food and other essential resources.

Life Expectancy of Females across States

The second visualization shows how female life expectancy varies across the country. It is highest in Kerala and lowest in Assam. Southern and western India, along with certain pockets of the north, have better life expectancy than the rest of the country.

Factors Influencing Female Life Expectancy

The data exploration exercise threw up some significant correlations: 

Sanitation:  

Topping the list is the non-availability of latrine facilities within household premises. Although the correlation is not extremely high, it can be characterized as a moderately strong relationship. Several earlier studies cite the same trend. A campaign run by Unilever showed similar results in Thesgora, a village in Madhya Pradesh that is infamous for the highest rates of diarrhoea in the country: the simple practice of washing hands dramatically reduced the mortality rate for children below 5 years of age.

Literacy Rate: 

The second strongest association is with literacy rates. An even more interesting analysis could have been based on higher levels of education, not just literacy. Education brings awareness and favours the adoption of better practices such as eating nutritious food, regular hospital visits, and institutional deliveries.

Quick fact:

A recent analysis by the famous statistician and writer Nate Silver revealed that it was education, not income, that predicted who would vote for Trump.

Fertility Rate: 

The third strongest factor affecting life expectancy was found to be the crude birth rate, used here as a proxy for the number of children a woman is likely to have. The smaller the family size, the higher the life expectancy across states, and family size is in turn strongly related to education levels: the higher the education level, the fewer children born. This trend was observed in both the urban and rural datasets.

Per Capita Income: 

The last significant variable was per capita income, analyzed using net state domestic product across states. Decent income levels ensure access to resources, which is important for a healthy, good-quality life. What is more interesting, however, is that income levels came 4th (last) among the significant relationships, with sanitation facilities and education levels showing stronger associations than income.

We would like to add that some seemingly obvious correlations were not found to be statistically significant, for example the correlation between the availability of required hospital staff and life expectancy. This could be due to a lack of quality data.

Please let us know your comments and feedback on this exercise. We would love to hear about other ways this data could be analyzed, or other interesting insights that could emerge from data visualization with Tableau.

Topics: Data Engineering & Analytics, Architecture

Improving Content Strategy through Drupal and Adobe Analytics integration

Posted by Nilanjana on Jun 23, 2016 3:08:00 PM

Adobe Analytics, formerly SiteCatalyst (part of the Omniture suite before its acquisition by Adobe), is a leading industry solution that collects website statistics and applies real-time analytics and detailed segmentation across all of your marketing channels. It is used for the measurement, collection, and analysis of web traffic data. The data collated can be used to discover high-value audiences and provide customer intelligence for your business.

Adobe Analytics can be integrated with Drupal 8. Once the integration is in place, the client can log in to the Adobe Analytics site to view a dashboard of website statistics and gain insight into customer behavior. This data can then be used to optimize the website to improve user experience and conversions.

Srijan has successfully integrated Adobe Analytics with Drupal many times for its clients, allowing them to manage their content better.

Key Features of Adobe Analytics

The tool allows the capture of real-time data, giving an understanding of current and relevant business scenarios. (Source: Democratizing Insights with Analysis Workspace)

Apart from website traffic statistics such as the number of hits on each page, unique visitors per page, total hits, exit links, click maps, and the number of “page not found” instances, the tool also shows the devices used to visit the website, along with details such as screen size, screen height, screen width, and audio/video support on the device. It also gives details of referrals, the next URL visited, and so on.

Adobe Analytics provides drill-down capabilities for precise, comprehensive views of customers, helping identify the most valuable customer segments or segments that present business opportunities. The business can categorize customers into personas using the intelligence the tool provides on product preference, geo-demographics, and behavioral attributes.

It provides mobile analytics intelligence with which you can understand the mobile app user base and review the performance of mobile marketing campaigns. It also provides rules-based decision-making tools. Visitor statistics include the number of users by geography, language, and time zone, and the client can generate web traffic reports based on time parameters.

This data is represented visually as well. Data for multiple websites can be viewed on one Adobe Analytics/SiteCatalyst website. On the home page, the dashboard can be set up to show important/critical indicators from the different websites configured under that account.

Custom data specific to a particular use-case can be captured—this is of great help for enterprises. For example, if an organization wants to capture data on its employees visiting the site, variables to capture employee data such as Employee ID can be included if this data is available in the network. Also, if there are videos on the site, events like 'play video' can be recorded. All this data is valuable in helping clients gain insights into customer behavior and manage their content better.

Integration Challenges

Srijan did face some challenges while integrating Drupal with Adobe Analytics. Since Drupal 8 has a completely different architecture from Drupal 7, and since no module was available for Drupal 8, our teams had to port the existing module to Drupal 8. The implementation had to be modified to ensure there was no change in the functional behaviour of Adobe Analytics.

Srijan also had to be careful while handling data. Enterprises hold a lot of data, both private and public, which can include sensitive data that cannot be stored. The integration module had to manage data privacy effectively, wherever applicable, as per the enterprise's confidentiality policy.

Benefits

Clients who have opted for integration of Adobe Analytics with Drupal have benefited from the improved website analytics in various ways:

Improved content strategy and content management - The integration helps clients understand which pages are visited more often and which are not. Downloads and payment mechanisms can be tracked. The client can then devise a content strategy based on users' behavior on the website and provide them with the right information at the right time, driving more engagement and conversions in line with online goals.

Improved customer retention - The tool provides data on user navigation and access mechanisms, so the user journey and user map can be understood. This knowledge helps refine the flow of content on the website and allows the client to deliver content based on user needs. Better content strategy and improved information flow translate into better customer retention.

Better digital marketing strategy - The statistics provided by the tool help a client understand how web traffic has changed since the launch of a promotion campaign, and determine the success of their digital marketing campaigns.

By allowing the client to make better decisions around content and the flow of information on their website, the solution helps them achieve their marketing objectives.

Topics: Drupal, Data Engineering & Analytics, Framework and Libraries

Advantages of using Highcharts API

Posted by Nilanjana on Jul 22, 2013 4:12:00 PM

Highcharts is a pure JavaScript library that offers an easy way of adding interactive charts to your website or web application. Highcharts currently supports line, area, scatter, areaspline, column, bar, spline, pie, angular gauge, areasplinerange, arearange, columnrange, error bar, box plot, funnel, bubble, waterfall, and polar chart types.

Advantages of using Highcharts:

  • Compatibility - It is compatible with all modern browsers including the iPhone/iPad and Internet Explorer from version 6.
  • Free for non-commercial use - It is free for a personal website, a school site, or a non-profit organisation; there is no need for the authors' permission.
  • Open - Under any of the licenses, free or not, you are allowed to download the source code and make your own edits. This permits personal modifications and ample flexibility.
  • Pure JavaScript - Highcharts is based on native browser technologies. It does not depend on any client-side plugins like Java or Flash, and there is nothing to install on your server: no PHP or ASP.NET. It only requires two JS files to run: the highcharts.js core and one of the jQuery, MooTools, or Prototype frameworks.
  • Numerous Chart Types - Highcharts supports line, column, area, areaspline, spline, bar, scatter, pie, angular gauges, arearange, column range, areasplinerange and polar chart types.
  • Simple Configuration Syntax - No special programming skills are required for its configuration. The choices are given in a JavaScript object notation structure, which is usually a set of keys and values connected by colons, separated by commas and grouped by curly brackets.
  • Dynamic - Even after creation of chart, you can add, remove and modify series and points or even axes whenever you want with the help of full API.
  • Multiple Axes - If you want to compare variables that are not on the same scale, Highcharts lets you assign a y-axis to each series - or an x-axis if you want to compare data sets of different categories. Each axis can be placed to the right or left, top or bottom of the chart. All options can be set individually, including reversing, styling, and position.
  • Tooltip Labels - Highcharts can show a tooltip with information on each point and series. The tooltip follows the mouse as the user moves it over the graph, and effort has gone into making it stick to the nearest point, so that even a point lying below another point is easy to read.
  • Date-time Axis - Almost 75% of all charts with an X and Y axis have a date-time X axis, and Highcharts is really smart about time values. With axis units in milliseconds, Highcharts decides where to place the ticks so that they always mark the start of the month or the week, midnight and midday, the full hour, etc.
  • Export and print - After enabling the exporting module, users can export the chart to PDF, SVG, PNG or JPG format or can be printed directly from the web page.
  • Zooming - Zooming in provides a closer view of the data. It can be done along the X or Y dimension, or both.
  • External Data Loading - Highcharts takes its data from a JavaScript array, which can be defined in the local configuration object, in a separate file, or even on a different site. The data can also be handed over to Highcharts in any form, with a callback function used to parse it into an array.
  • Angular Gauges - Perfect for dashboards. Provides speedometer-like charts that are easy to read at a quick glance.
  • Inverted Chart or Reversed Axis - Use this when you need to flip your chart and make the X axis appear vertical, as in a bar chart. Reversing an axis, with the highest values appearing closest to the origin, is also supported.
  • Text Rotation for Labels - All text labels, including axis labels, data labels for points, and axis titles, can be rotated at any angle.

Implementation of Highcharts at Srijan:

  • At Srijan, Highcharts is used with Drupal 7. It has been used to build a student examination portal, where it displays student progress reports so that the current status is easy to see. The basic line chart from http://www.highcharts.com/demo/line-basic is used here (see the configuration sketch below).
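That basic line chart is driven by a small options structure. The sketch below builds an equivalent structure in Python and serialises it to JSON for embedding in a page; the subjects and scores are made up for illustration and are not the portal's actual data.

import json

chart_options = {
    "chart": {"type": "line"},
    "title": {"text": "Student progress report"},
    "xAxis": {"categories": ["Test 1", "Test 2", "Test 3", "Test 4"]},
    "yAxis": {"title": {"text": "Score (%)"}},
    "series": [
        {"name": "Mathematics", "data": [62, 70, 74, 81]},   # hypothetical scores
        {"name": "Science", "data": [58, 64, 71, 76]},
    ],
}

# The resulting JSON is the options object the Highcharts constructor consumes
print(json.dumps(chart_options, indent=2))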

Topics: API Management, Data Engineering & Analytics

Improve your database backup policy by using automated scripts

Posted by Nilanjana on May 17, 2011 2:58:00 PM

If you manage hosting infrastructure for Drupal websites as we do, you probably already have a process for taking backups of the databases of various websites at regular intervals - daily, weekly, or monthly. Databases grow over time, making it necessary to know which databases are active and need to be backed up. Ignoring this leads to backing up unwanted data, wasting expensive bandwidth and storage and delaying development life cycles.

 

As an example, in our case at Srijan all backups were pushed to late-night slots to ensure the performance of our hosted websites was not affected, which in turn delayed Staging/QA server deployments by the development team. The key issues were:

  • dead databases were getting backed-up
  • log related tables were getting backed-up (which in some cases formed 80%-90% of database size)

This is where my work came in to improve the backup process at Srijan; the scripts and approach defined here would be useful for you if you face similar challenges at your organization.

Which Drupal tables to exclude?


The list of tables to avoid in any backup includes watchdog, accesslog, search_index, search_total, search_dataset, search_node_links, sessions, and all cache tables. The data in these tables SHOULD NOT form part of the database backups (even though the structure of these tables SHOULD be backed up).

The script snippet below specifies the list of tables for which data is not dumped (structure only):

# Tables for which only the structure (no data) is dumped
STRUCTURE_ONLY="/^(prefix1_|prefix2_)?(watchdog|accesslog|search_index|search_total|search_dataset|search_node_links|sessions|cache(_.+)?)$/"

 

For sites running Apache Solr, you DO NOT need to back up the data in search_index and its secondary tables either.

Avoid taking a dump of the "information_schema" database; it does not need a backup. mysqldump does not dump it by default, but if it is mentioned explicitly, it does get backed up. So avoid this.

This takes a dump of the complete database, with only the structure for the specified tables. In other words, it backs up the data of all tables except the ones matched by the pattern above.

Depending upon the modules installed in your Drupal installation, this list of tables to avoid may grow. Additional tables can be added to the pattern above, separated by a single pipe operator (|).
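In practice, the approach amounts to two mysqldump passes per database: a data dump that skips the noisy tables, plus a structure-only dump of those tables so they can be recreated on restore. The sketch below shows the idea in Python; the database and table names are illustrative, and credentials are assumed to come from a MySQL option file such as ~/.my.cnf.

import subprocess

DB = "drupal_site"                       # hypothetical database name
STRUCTURE_ONLY_TABLES = ["watchdog", "accesslog", "sessions", "cache", "cache_page"]

# Pass 1: data dump that skips the structure-only tables entirely
ignore_flags = [f"--ignore-table={DB}.{t}" for t in STRUCTURE_ONLY_TABLES]
with open(f"{DB}_data.sql", "w") as out:
    subprocess.run(["mysqldump", *ignore_flags, DB], stdout=out, check=True)

# Pass 2: structure only (--no-data) for the skipped tables, so they can be
# recreated as empty tables on restore
with open(f"{DB}_structure.sql", "w") as out:
    subprocess.run(["mysqldump", "--no-data", DB, *STRUCTURE_ONLY_TABLES],
                   stdout=out, check=True)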

Skipping entire databases


Similarly, entire databases can be skipped. If you want to skip certain databases during this process, mention their names as well. In the snippet below, the "db_1" database would be skipped from the backup.

# Databases to skip entirely during backup
DBS_SKIP="/^(prefix1_|prefix2_)?(information_schema|db_1)$/"

If the list of databases to skip becomes too long, invert the logic instead: put the names of the databases that SHOULD be backed up in the pattern, and change the check as follows:

# The pattern now lists the databases to back up rather than the ones to skip
DBS_SKIP="/^(prefix1_|prefix2_)?(db_1|db_2|db_3)$/"
SKIP_DB=`echo "$DB" | gawk "$DBS_SKIP"`
# The original line skipped matching databases:
# if [ $SKIP_DB ]
# The inverted check skips any database NOT matched by the pattern above
if [ ! "$SKIP_DB" ]
  then continue
fi

Success story


You may like to use this enhanced automated script to take database backups, just as we do at Srijan. The script takes a dump of each database, compresses it into a bz2 file, and removes the raw SQL file, further reducing the backup size on the server.

The use of these scripts has helped us at Srijan reduce our 10GB daily backup data to only 180MB (bz2) - a reduction of over 55 times.

Topics: Data Engineering & Analytics
