BIG DATA JUNCTION – Converting Raw Data into Meaningful Insights Remains a Challenge for the Railway Industry

Railways are harvesting increasing volumes of data on the performance of their assets. But converting raw data into meaningful insights remains a challenge for many industry organizations. Meeting that challenge could also take Indian Railways to the new level it deserves.

MUNICH: A session on “Rise of IoT and Big Data in Railway Industry” was held at a recently concluded rail conference in Munich, sponsored by Rotaia Media. Suppliers, tech companies, train operators and infrastructure managers described how they are harnessing the power of data through advanced asset management systems.

The answer to the Great Question of Life, the Universe and Everything is 42. After seven-and-a-half million years of calculations, this was the conclusion delivered by Deep Thought, a supercomputer in Douglas Adams’ novel The Hitchhiker’s Guide to the Galaxy. When Deep Thought’s hyper-intelligent pan-dimensional creators contest her findings, she replies, “I think the problem, to be quite honest with you, is that you’ve never actually known what the question was.”

This visionary tale was written long before the advent of Big Data, but it carries a message pertinent to the modern railway: Data in isolation has no meaning, and the real value lies in how it is used. Infrastructure managers and train operators harvest increasing volumes of data about their assets, and this is the starting point for potentially transformative insights that could have profound implications for the way railways operate. But in a world where assets are becoming more and more digitally interconnected, extracting meaningful insight from the terabytes of data streaming in from billions of data points on trains, track and signalling systems is a hugely challenging task. Digitalizing asset management and digitalizing assets must therefore move forward in unison if the potential of data is to be unlocked.

“If rail wants to survive competition with driverless cars, we need to fundamentally change the way we operate, and this includes moving to CBM,” Gerhard Kress, Director of Mobility Data Services for Siemens, told conference delegates. “It’s essential to focus on creating value and not just harvesting huge amounts of data, because data alone is only cost. The real benefit comes when you can turn it into insights. Everything we do around data has no value unless it results in tangible benefits for the customer.”

According to Matthew Miller, Global Transportation Industry Principal for OSIsoft, market hype around the potential for Big Data is being driven by four megatrends:

  • Pervasive, cheap, and small sensors.
  • A decline in computing and data storage costs.
  • New abilities to process and analyze data.
  • Ubiquitous connectivity.

Miller says convergence of these key enablers is driving the adoption of IoT (Internet of Things) in rail, and with it demand for analytic applications to turn data into operational intelligence for better decision-making. However, he warns that organizations need to be able to cope with variation in data quality. “Real data isn’t perfect, and data quality is a killer issue for every project,” he says. “Communications failures, spiking sensors, calibration issues and out-of-sequence data can all cause massive problems, and you need to consider how you will deal with these issues.”
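The data-quality problems Miller lists can be made concrete with a small sketch. The Python below is illustrative only: the record layout, the function name `clean_readings`, and both thresholds are assumptions chosen for the example, not anything described by OSIsoft. It reorders out-of-sequence samples, drops values that sit far from the median of a five-sample window, and reports long gaps that might point to a communications failure.

```python
from statistics import median


def clean_readings(readings, spike_threshold=10.0, max_gap_s=60):
    """Clean a list of (timestamp_seconds, value) sensor samples.

    Returns (cleaned, gaps): cleaned is time-ordered with spikes removed,
    gaps lists (start, end) timestamps separated by more than max_gap_s.
    All thresholds are hypothetical and would need tuning per sensor.
    """
    # Out-of-sequence data: restore time order first.
    ordered = sorted(readings, key=lambda r: r[0])

    cleaned = []
    for i, (ts, value) in enumerate(ordered):
        # Median of a window of up to five samples centred on this one.
        window = [v for _, v in ordered[max(0, i - 2):i + 3]]
        if abs(value - median(window)) <= spike_threshold:
            cleaned.append((ts, value))

    # Communications failures show up as long silences between samples.
    gaps = [
        (a[0], b[0])
        for a, b in zip(cleaned, cleaned[1:])
        if b[0] - a[0] > max_gap_s
    ]
    return cleaned, gaps


readings = [(0, 20.0), (20, 20.5), (10, 20.2), (30, 95.0), (40, 20.4), (200, 20.6)]
cleaned, gaps = clean_readings(readings)
```

In this toy input the sample at 30 seconds is treated as a spike and the silence between 40 and 200 seconds is reported as a gap. A production system would also have to deal with calibration drift, which this sketch does not attempt.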

Another challenge is the increasing volume of data being generated. As IoT expands, the quantity of data from connected devices looks set to rise dramatically. Even with the advent of 5G, centralized processing of such massive volumes of data will place a huge strain on telecommunications networks. Edge computing—moving the processing of data closer to the IoT devices that created it—is currently a focus for IoT firms around the globe.

“Analytics is a challenge,” says Michael Thiel, CEO of Frauscher Sensor Technology. “With artificial intelligence (AI), a lot will be possible, but where should we place the analytics for that? A sensor network monitoring 40km of track can generate a terabyte of data in an hour, so we need to reduce it to the point where it can be handled easily, and that’s what we’re working on now, shifting intelligence closer to the track.”
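To put Thiel's figure in perspective, one terabyte an hour is roughly 278 megabytes a second (10^12 bytes divided by 3,600 seconds), or about 25 gigabytes per kilometre of track per hour. One common way to shrink such a stream at the edge is to send a short summary of each block of samples instead of the samples themselves. The sketch below shows that general idea in Python; the function name and the choice of minimum, maximum and mean are assumptions for illustration and do not describe Frauscher's approach.

```python
def summarize_windows(samples, window_size):
    """Collapse consecutive raw samples into one (min, max, mean) tuple per window."""
    summaries = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        summaries.append((min(window), max(window), sum(window) / len(window)))
    return summaries


# Eight raw samples become two summaries; the peak of 50 is still visible.
raw = [1, 3, 2, 4, 50, 2, 1, 3]
summaries = summarize_windows(raw, window_size=4)
```

Keeping the maximum alongside the mean is a deliberate choice here: averaging alone would hide exactly the short-lived peaks that condition monitoring is usually looking for.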


Condition-based maintenance (CBM) is a key business driver for digitalization in the rail sector. According to The Rail Sector’s Changing Maintenance Game, a report published by McKinsey & Co. in December 2017, CBM can reduce rolling stock manual diagnostics by at least 60%, and could lead to an overall reduction of at least 10-15% in maintenance costs—equivalent to an annual saving of up to $4.7 billion for train operators, $2.35 billion for rolling stock OEMs, and $4.7 billion for third parties.

According to Perpetuum, which develops asset management solutions for rolling stock, 9-12% of total vehicle operating costs are spent on truck maintenance, and lifecycles can be extended significantly with the aid of remote condition monitoring (RCM). “We are burning millions of [dollars] replacing assets that are not life-expired,” explains Perpetuum Global Sales Director Robert Mulder. “If the condition of every single bogie was known, we would be able to extend overhaul intervals by 25-75%. With vibration, the start of degradation is already picked up 6-7 months before the asset is replaced.”

Perpetuum has developed a “fit-and-forget solution,” which is self-powered by vibration and has a “maintenance-free” design life of 20 years. The technology enables remote monitoring of wheels, bearings, brakes, axle boxes, traction motors, and track quality in real time. Bespoke algorithms in the cloud provide the operator with real-time status updates.

“You have to do more than simply implement RCM,” says Mulder. “To gain the potential benefits, processes need to be in place. No technology project is going to generate a return on its own.”

For OEMs looking to build their aftermarket business, asset management has become a key focus in recent years, and suppliers are helping their customers to bridge the gap between raw data and meaningful insights into asset status and performance. Siemens uses its Railigent platform to remotely monitor gearboxes, bearings and traction motors on Velaro high-speed trains, as well as 5,000 doors on the fleet of Desiro City EMUs used on London’s Thameslink network. Siemens has also used machine learning to predict bearing failure on high-speed trains and high-end data analytics to predict point machine failure without the need for additional sensors.

Siemens is working with third-party suppliers to integrate their applications into the Railigent platform. Voith recently signed an agreement with Siemens to develop a monitoring solution for its Scharfenberg coupling, and Siemens is also working with SKF to integrate its Insight Rail CBM solution with Railigent to optimize bearing maintenance.

According to McKinsey, railway companies should be looking to initiate three short-term steps to begin preparing for CBM and predictive maintenance:

  1. Define a strategically appropriate target state and structure data partnerships accordingly, assessing current preparedness for CBM, setting goals, and establishing where data ownership lies.
  2. Create a physical space to bundle engineering and analytics know-how, pulling together relevant experts from the siloed functions of procurement, fleet management and maintenance planning.
  3. Commit to “value-chain-wide” digitalization, upgrading the entire maintenance process with digital capabilities to optimize the return on investment.

McKinsey also warns that companies should not underestimate the organizational challenges of cross-department or cross-company collaboration, culture clashes between data analysts and railway engineers, and the transformation of maintenance processes.

A Container Train leaves for Bangladesh from Majerhat station of Kolkata, India.

Dr. Burkhard Schulte-Werning, head of the maintenance business line for German Rail (DB), told delegates that optimization of conventional rolling stock maintenance practices is reaching its limits, and DB’s vision is to use continuous data transmission to link its assets to its InfraView open data platform as the basis for CBM.

In a step toward this goal, DB Cargo will equip 2,000 locomotives of 16 different types with telematics systems by 2020. DB Cargo’s TechLok system collects data from the locomotive, with up to 7,000 different diagnosis codes or status messages for each type. Data is transmitted in near real-time via a GSM connection for visualization and analytics, which is used to develop use cases and transform data into usable information and results. DB Cargo is using GE’s RailConnect 360 asset performance software as well as Siemens’ Railigent platform and MindSphere IoT system.

DB is also using scheduled trains for continuous track monitoring (CTM), with around 2,500 km of track now under CTM supervision using an ICE 2 high-speed train, a class 189 electric locomotive, and a class 424 EMU on the Hanover S-Bahn network. A $1.8 million research project has been launched as part of the Shift2Rail Joint Technology Initiative, which aims to optimize rail vehicle maintenance processes through the integration of predictive data analysis algorithms and online optimization tools within a CBM strategy.

The maintenance element of the Smart Maintenance and the Rail Traveler Experience (SMaRTE) project will focus on the use of information and modelling to reduce lifecycle costs and improve vehicle availability and performance through CBM.

Objectives include:

  • Review and benchmarking of current CBM practices in other sectors, including aviation.
  • Development and integration of reliability ontology.
  • Development and integration of predictive tools for current and future condition of rolling stock components.
  • Development of optimization tools to support decision making.
  • Application of a CBM model to two real-world case studies on rolling stock components.

The 28-month project is due to conclude in December 2019.

In 2014, Belgian rail freight operator Lineas began experimenting with asset data with the aim of generating a competitive advantage. One of its most successful initiatives was optimizing maintenance planning for its fleet of 110 class 77 diesel locomotives, which are assigned to a variety of duties from shunting to longer-distance main line operations. The IoT solution provides data every 10 seconds, and Lineas developed a dashboard to provide a complete overview of the fleet’s status. “Small and medium-sized companies were brought in to carry out analytics on the fleet and explain the problems to our bosses,” says Director of Assets and Network Operations Jeroen Spruyt. “We constructed a Know Your Locomotive view, which was used to build a predictive model of failures. This will lead to a reduction in capex of $1.2 million on a total annual fleet running cost of $29.4 million and will also stretch the life of the locomotives.

Lineas has used Big Data to optimize maintenance planning on its fleet of class 77 diesel locomotives. Quintus Vosman photo.

“We challenge people in our organization to come up with a business case for small projects. For Lineas, this is no longer a hobby. In 2014, we were playing around; today we’re building an IoT backbone and equipping our fleet with IoT solutions. To solve the questions of the future we need to start building models today.”

Netherlands Railway (NS) has established a competence center for advanced analytics that works closely with analytics teams in all other departments of the company to develop dashboards, data flows for use in business processes, and products that will benefit end users. Through this cross-department collaboration, NS has developed a model for monitoring air leakage from train brake and door systems, which reduces the need for an operator to walk alongside the train.

Another innovation is the Zitplaatszoeker seat finder app currently being tested on the Arnhem – Nijmegen – Den Bosch line. The app uses a color-coding system to display levels of seat occupancy throughout the train, helping customers to find a seat and ensuring a more even distribution of passengers. The seat searcher app uses data generated by weight sensors on the track that were originally installed to weigh freight trains as part of a plan for weight-based track access charges.

Netherlands Railways Zitplaatszoeker seat finder app.

Austrian Federal Railways (ÖBB) has harnessed Big Data and the company’s asset management system to support the development of its Target 2025+ long-term infrastructure strategy. ÖBB Infrastructure Senior Asset Manager Richard Mair told delegates that clearly defining desired outcomes at the beginning of the process was critical to successful interrogation of the data.

“The objectives [of the strategy] were so big that they needed to be broken down,” he explained. “We had a lot of ambiguous data, so we needed an Asset Management System (AMS) as a single point of truth providing us with a good description of the base and supporting a vision for the future. The AMS allows us to answer the questions of asset owner, and there must be a question, otherwise you cannot provide an answer. Big Data is not a magic wand: Before you start to search for a solution, you need to know what outcomes you want.”

For ÖBB, the AMS was the enabler between raw data and business insight. “Information is one of our most important assets today and it will be even more important in the future,” Mair said. “However, data doesn’t help you at all with information. You need IT systems to aggregate and present data in a meaningful way.”

In February 2017, Dutch infrastructure manager Prorail established DataLab, which harnesses Big Data to develop solutions to issues affecting the performance of the network, including switch and crossing failures, track defects, signalling and train detection faults, trespassers, and stray animals.

Predictive models are developed in four-week scrum cycles by a team that includes experts from relevant areas of the business with input from academic institutions, contractors, engineering firms and tech startups.

One of the first DataLab projects involved developing a predictive model for trespassing, a common cause of operational disruption on the Dutch network. This looked at key influencers (environment, hotspots, weather, school holidays, and ease of access to railway property) and used machine learning to build a dashboard that predicts the risk of trespass at key locations. Police have been using the dashboard since August 2017, and trespassing incidents have fallen by 50-100% at the locations covered by the system, with a doubling of the arrest rate.

The DataLab has also developed a predictive model for switch failures that is capable of detecting up to 20% of faults before they occur. Last year, Prorail awarded machine-to-machine technology company Dual Inventive a contract to supply 1,500 wireless sensors to remotely monitor the health of points and crossings, and a further 500 sensors to monitor other infrastructure systems such as level crossings.

Belgian infrastructure manager Infrabel has also embraced big data and built an analytics platform to find answers to key questions about its assets. Infrabel has developed a model to evaluate how frequently trains approach red signals in a bid to reduce Signal Passed at Danger (SPAD) incidents. The resulting model integrates data on signal aspects, train movement, infrastructure and track occupancy to identify red signal hotspots and build a SPAD risk index model. Big data has also been used to support a program to reduce the number of switches and crossings on the network, which is intended to improve reliability and cut maintenance costs.
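The first step in finding red signal hotspots, counting how often trains approach each signal at red, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not Infrabel's model, which according to the description above also integrates train movement, infrastructure and track occupancy data. The event layout and the function name `red_approach_rates` are assumptions.

```python
from collections import Counter


def red_approach_rates(events):
    """Rank signals by the share of train approaches made at a red aspect.

    events: iterable of (signal_id, aspect) pairs, one per train approach.
    Returns a list of (signal_id, red_share), highest share first.
    """
    total = Counter()
    red = Counter()
    for signal_id, aspect in events:
        total[signal_id] += 1
        if aspect == "red":
            red[signal_id] += 1
    rates = [(signal_id, red[signal_id] / total[signal_id]) for signal_id in total]
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Hypothetical approach log for three signals.
events = [
    ("S1", "red"), ("S1", "green"), ("S1", "red"), ("S1", "red"),
    ("S2", "green"), ("S2", "green"), ("S2", "red"), ("S2", "green"),
    ("S3", "green"), ("S3", "green"),
]
hotspots = red_approach_rates(events)
```

A share like this is only a starting point for a risk index: a signal approached at red by slow freight trains is not the same risk as one approached at line speed, which is why the additional data sources matter.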

French National Railways (SNCF) has begun fleet rollout of an IoT-based solution for monitoring the status of passenger train doors in France. The Avisé system has been developed as part of the Digital SNCF program to automatically notify traincrew if the doors of Corail coaches are not properly secured when the train is in motion.

Two sensors remotely monitor the status of each door, illuminating a lamp in a vestibule cabinet if the door is properly secured. The status of the lamp is transmitted via a communications module to SNCF’s IoT platform, which compares the position of all doors of the train and issues a smartphone alert to the train crew if it detects an anomaly. This enables unsecured doors to be quickly identified.
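The comparison step described above is simple enough to sketch. The Python below is a hypothetical illustration of the logic, not SNCF's implementation: the door identifiers, the dictionary layout and the function name `unsecured_doors` are all assumptions. A lit lamp means the door is secured, and an alert is only relevant while the train is moving.

```python
def unsecured_doors(door_lamps, train_in_motion):
    """Return the doors that should trigger an alert to the train crew.

    door_lamps: dict mapping a door identifier to True when its lamp is lit,
    meaning the door is properly secured.
    """
    if not train_in_motion:
        # Doors are expected to be open at stations, so nothing is anomalous.
        return []
    return sorted(door for door, lit in door_lamps.items() if not lit)


# Hypothetical three-door train with one door not secured.
lamps = {"coach1-A": True, "coach1-B": False, "coach2-A": True}
alerts = unsecured_doors(lamps, train_in_motion=True)
```

Returning the list of affected doors, not just a yes or no, reflects the stated aim of letting crew identify the unsecured door quickly.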

SNCF plans to install Avisé on 350 Corail vehicles, which are used on Intercités services across the country.

Swiss Federal Railways (SBB) has developed the Swiss Track Analysis & Maintenance Planning (swissTAMP) tool, which integrates data from various sources to provide information on asset condition, which in turn supports decision-making on maintenance measures. This means planning for infrastructure maintenance and renewals can be carried out in a traceable and needs-driven manner. The tool enables visualization and analysis of component and system data and provides site-specific prognoses of future maintenance requirements. SBB says swissTAMP is playing a key role in the transition toward preventative maintenance on the Swiss rail network.

All these initiatives demonstrate the central role of asset management systems in turning data into insight. Organizational factors are also key: Success depends on a company’s ability to fuse traditional railway disciplines with data science and adapt to new ways of working. Most important, Big Data can only provide an answer if there is a question, and a clear focus on the desired outcome is essential if the transformational potential of Big Data is to be realized.

The Big Data Junction – Indian Railways

The very first train journey in India took place in 1853, when a fourteen-carriage train drawn by three locomotives covered 21 miles between Bori Bunder (Bombay) and Thane. We have come a long way since then, and the huge strides that Indian Railways has made over the last 160 years make me proud to call it our own. The second-largest rail network in the world (115,000 km of track and 19,600-odd trains), Indian Railways employs over one million people and carries more than 23 million people daily.

But these stats are slowly losing their sheen. The rise of low-fare airlines, improved roads, and a preference for buses on shorter routes have intensified the competition. IR has also come under fire repeatedly for its e-ticketing system. The IRCTC website is one of the most frequented websites in the country, and yet the navigation is far from streamlined. The server errors and the slow speed are all indicators of a website still stuck in the early 2000s. And the less said about the Tatkal ticket booking system, the better. Procuring a Tatkal ticket is proving to be one of the most complicated and annoying tasks ever. The server gives up on many systems just seconds before 10 am, and comes back to life only when all the seats have gone.

The question is what can be done to arrest the slide downward, and get railways some of its sheen back? Big data can be one of the ways to answer it.

Thinking from the customers’ point of view, the most obvious concern is the ease (or the lack of it) of booking tickets. With reliable big data tools, the system can handle more people logging on to the website at once, and those people should also get a hassle-free experience of booking their tickets. Measures to ensure this include analyzing the routes a passenger travels most often and presenting those details in 10 seconds or less, and introducing an e-wallet for all account holders. Data analytics can crunch a customer’s historical transactions and anticipate the train and seat numbers he or she is likely to choose, saving time for frequent travelers who usually prefer the same trains and coaches.
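The idea of anticipating a frequent traveler's choice from past transactions can be illustrated with a simple frequency count. The Python below is a minimal sketch under stated assumptions: the booking tuple layout, the placeholder station and train names, and the function name `suggest_bookings` are hypothetical, and a real system would draw on far richer data than this.

```python
from collections import Counter


def suggest_bookings(history, top_n=1):
    """Return the passenger's most frequent past bookings, most common first.

    history: list of (origin, destination, train) tuples from past transactions.
    """
    counts = Counter(history)
    return [booking for booking, _ in counts.most_common(top_n)]


# Hypothetical booking history for one account holder.
history = [
    ("Delhi", "Mumbai", "TRAIN-A"),
    ("Delhi", "Mumbai", "TRAIN-A"),
    ("Delhi", "Kolkata", "TRAIN-B"),
    ("Delhi", "Mumbai", "TRAIN-A"),
]
suggestion = suggest_bookings(history)
```

Because the count can be precomputed and stored per account, the suggestion itself is a lookup, which fits the goal of showing it within a few seconds of login.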

The technology behind the Tatkal ticketing mechanism needs an overhaul. With thousands of people logging in on the website at the same time, it is imperative that the system withstands such load spikes without crashing. The in-memory capacity for such cases needs to be increased, so that transactions don’t falter during the critical Tatkal hours.

The good news is that Indian Railways has been working on this for some time now and has taken steps in the right direction by engaging CRIS (Centre for Railway Information Systems) to revamp the IRCTC website. CRIS based its new technology on Pivotal GemFire, a distributed in-memory database that is part of the Pivotal Big Data Suite. The resulting system was tested for a month and then officially launched in July 2014. The system has improved the stats significantly, with the load limit rising to 10,000 from 2,000 per minute. The time taken to book a ticket has come down considerably, and user authorization is done from in-memory data.

Yet there is still a lot of scope for improvement. Tatkal tickets haven’t been any easier to get, and people still struggle with the navigation on the IRCTC website. The onus is on the authorities not to let this drive fade away, and to get Indian Railways back to the heights that it deserves. Here’s hoping that this time when we hop on, the next stop would be Big Data Junction.

This large volume of data, often referred to as Big Data, generally refers to data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them, resulting in the need to use advanced data analytic tools. This is illustrated in Figure 1, which compares the traditional data analysis approach to data handling with the Big Data approach.

Figure 1: Traditional Data Analysis and Big Data.

This use of Big Data in the railroad industry, to include freight and passenger rail, cuts across traditional departmental lines, with applications in Engineering (Track and Structures), Equipment (Rolling Stock) and Transportation (Operations). These applications have been highlighted at the University of Delaware’s annual “Big Data in Railroad Maintenance Planning” conference, where railroad users, data science professionals, consultants, suppliers and academia come together to examine new and emerging uses of data science analytics in the railroad industry. The 2017 conference looked at Engineering (Track) and Rolling Stock applications on passenger and freight railroads, in the U.S. and worldwide.

Figure 2: Daily Transactional Volume reported by Railinc.

To illustrate the scope of this data, Figure 2 shows the daily transactional volume reported by Railinc at the 2017 Big Data conference. Railinc currently houses nearly 100 Terabytes of data and accommodates 2,500 business customers and 65,000 users. A significant portion of this data is railcar (rolling stock) data. When dealing with this large volume of data, it is necessary first to provide access to the data on multiple levels, to include the ability to “drill down” to an individual railcar.

However, access to data, no matter how complex and sophisticated, is not enough; railway managers want “information,” particularly “business information,” which allows railways to make intelligent decisions based on not just the data, but the information derived from this data. Figure 3 illustrates the process of defining the problem and the available data, developing the necessary tools (models), and then deploying these models through interactive data integration and “learning.” This is the “decision support/intelligent software” portion of data analytics illustrated in Figure 1.

Figure 3: Application of Data Analytics.

Figure 4 presents the process in a slightly different perspective, moving from data acquisition and management to machine learning and analytics, to obtaining real business outcomes that improve operations and provide real value. This increased value can encompass improved identification of “bad actor” railcars, improved equipment failure analysis and associated improved preventive maintenance.

Figure 4: Road Map for Advanced Analytics.

Another Big Data conference presentation illustrated this business benefit, specifically the business benefits of improved locomotive shop productivity associated with the application of data analytics in the locomotive shop environment as shown in Figure 5.

Figure 5: Application of Data Analytics to Improve productivity in Locomotive Maintenance Facility.

In the area of passenger equipment monitoring and maintenance, integrated planning and management systems are emerging that monitor condition and operational data from an entire fleet of equipment and provide specific actionable information to managers and maintenance personnel at multiple levels, as illustrated in Figure 6. This figure illustrates how analysis of large volumes of data (Big Data analysis or data analytics) generates usable information and specific actions at four different manager levels. Thus, the Maintenance Planner/Help Desk (second level in Figure 6) analyzes data from train events, rolling stock faults, mileage, inspection data and other individual car and train data and generates preventive maintenance information. An example of this is identification of a car that needs (or will soon need) maintenance. This in turn can lead to a specific action such as generation of a work order for this car.
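The step from per-car data to a work order can be sketched as a simple rule. The Python below is only an illustration of that data-to-action flow: the field names, the service interval, the warning fraction and the fault limit are all hypothetical values, and the systems described at the conference presumably use much richer models than two thresholds.

```python
def work_orders(cars, service_interval_km=100_000, warn_fraction=0.9, fault_limit=3):
    """Generate work orders for cars that need, or will soon need, maintenance.

    cars: list of dicts with keys "car_id", "km_since_service", "open_faults".
    A car qualifies when it is close to its service interval or has
    accumulated too many open faults. All limits are assumed values.
    """
    orders = []
    for car in cars:
        due_soon = car["km_since_service"] >= warn_fraction * service_interval_km
        faulty = car["open_faults"] >= fault_limit
        if due_soon or faulty:
            reason = "mileage" if due_soon else "faults"
            orders.append({"car_id": car["car_id"], "reason": reason})
    return orders


# Hypothetical fleet snapshot: car A is near its interval, car B has repeated faults.
fleet = [
    {"car_id": "A", "km_since_service": 95_000, "open_faults": 0},
    {"car_id": "B", "km_since_service": 20_000, "open_faults": 4},
    {"car_id": "C", "km_since_service": 10_000, "open_faults": 1},
]
orders = work_orders(fleet)
```

Recording the reason with each order keeps the action traceable back to the data that triggered it, which is the point of the multi-level view in Figure 6.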

Figure 6: Big Data Derived “Actions” at multiple management levels.

Again, as was noted in the May article, the use of Big Data or Data Analytics has only scratched the surface of the data now available. As railways learn how to more effectively “mine” their data, the data can be converted into information, actionable insights, and specific “actions” to help optimize maintenance management for not only rolling stock but also infrastructure across the entire spectrum of railway operations. The University of Delaware expects even more insightful information to be available in its 2018 Big Data in Railroad Maintenance Planning conference, December 13-14, 2018 at the University of Delaware’s Newark, Delaware campus.