Integrate SAP HANA and Hadoop for effective Big Data landscape
on 18th April, 2017
Technology has the ability to shape our world. Big Data is one of the most important technology trends that will affect our world in the coming years. By 2020, the total amount of data stored is expected to be 50 times larger than it is today.
Big Data holistically covers all data, whether its source is a machine sensor or social media, whereas SAP HANA is SAP's strategic platform for unifying and combining all this data. It is well suited to central data management for all applications because it is open and can handle not only transactional but also analytical workloads, all on one platform. Moreover, the integration capabilities of SAP HANA make it possible to combine it with other technologies (such as Hadoop and members of its ecosystem) to build the most suitable and effective Big Data landscape.
Power of Hadoop integrated with SAP HANA:
SAP recently introduced SAP HANA Vora. SAP HANA Vora allows customers to easily combine business data from SAP HANA with data from industrial sensors, telephone networks and other sources that is stored in Hadoop and processed with Apache Spark. This is an important new step for SAP in the Big Data market. SAP also offers more than 300 different applications in the Fiori landscape, many of which belong to the category of analytical apps.
SAP Lumira is SAP's self-service tool for data analysis and visualization. With SAP Lumira, business users can conduct advanced analytics and gain new insights without relying on IT, and can produce attractive data visualizations on their screens in an instant.
Datavard's Glue tool functions as middleware. It is built in ABAP but can reach into Hadoop to move data back and forth between SAP and Hadoop. Glue integrates the Big Data world with SAP technology, and it is described as the only solution that allows users to access Hadoop through ABAP and SAP GUI.
SAP HANA + Hadoop = Instant access + Infinite scale
ETL: Extract, Transform, Load | OLAP: Online Analytical Processing | OLTP: Online Transaction Processing
As discussed so far, Hadoop can store huge amounts of data. It is well suited to storing unstructured data, handles very large files well and is tolerant of hardware and software failures. The main challenge with Hadoop, however, is extracting information from such large volumes of data in real time. With SAP HANA and Hadoop integration, structured and unstructured data can be combined and then transferred to SAP HANA via a Hadoop/HANA connector.
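The following minimal Python sketch makes the idea of combining structured and unstructured data more concrete. It assumes a hypothetical product master table, which is structured data as it might sit in SAP HANA. It also assumes a handful of hypothetical customer comments, which are unstructured data as they might land in Hadoop. The code extracts product mentions from the free text and joins them with the master data. In a real landscape, the extraction would run inside Hadoop and the results would travel through the connector. The sketch shows no actual connector APIs.

```python
import re
from collections import Counter

# Hypothetical structured data, e.g. a product master table kept in SAP HANA.
products = {
    "P100": "Coffee Machine",
    "P200": "Espresso Grinder",
}

# Hypothetical unstructured data, e.g. customer comments stored in Hadoop.
comments = [
    "Love my new coffee machine! #P100",
    "The grinder (P200) is too loud",
    "P100 broke after a week",
    "Great service overall",
]

# Hypothetical convention: product IDs look like "P" followed by three digits.
PRODUCT_ID = re.compile(r"\bP\d{3}\b")


def extract_mentions(texts):
    """Turn free text into structured (product_id, comment) rows."""
    rows = []
    for text in texts:
        for pid in PRODUCT_ID.findall(text):
            rows.append((pid, text))
    return rows


def mentions_per_product(rows, master):
    """Join the extracted rows with the structured master data and count mentions."""
    counts = Counter(pid for pid, _ in rows)
    return {master[pid]: n for pid, n in counts.items() if pid in master}


rows = extract_mentions(comments)
print(mentions_per_product(rows, products))
```

The key design point is that the unstructured text is reduced to structured rows before it meets the relational data. A relational target such as SAP HANA expects data in that shape.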
Data Services: a simple GUI to build and run ETL processes
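The three ETL steps can be sketched in plain Python. This illustrates only the pattern and is not SAP Data Services. The CSV export, the `orders` table and the cleansing rules are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for a source file (e.g. one stored in HDFS).
RAW = """order_id,country,amount
1,de,100.50
2,US,
3,fr,42.00
"""


def extract(text):
    """Extract: read the raw CSV records as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: upper-case country codes and drop rows without an amount."""
    cleaned = []
    for r in records:
        if not r["amount"]:
            continue
        cleaned.append((int(r["order_id"]), r["country"].upper(), float(r["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the target database
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT country, amount FROM orders ORDER BY order_id").fetchall())
```

Keeping extract, transform and load as separate functions mirrors how ETL tools present them as distinct, reusable steps.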
SAP HANA Platform for managing Big Data
Big Data is a concept, whereas HANA is software, so the two cannot be compared directly. A Big Data platform performs more complex analysis across large and varied data sets. Even HANA uses Hadoop as its Big Data platform when it needs to process multiple types of data.
Simplification and Security:
Using SAP HANA as the unifying platform for all your data also simplifies system administration and software lifecycle management, thus helping to reduce the total cost of ownership.
HANA is a relational MPP (massively parallel processing) database that relies heavily on in-memory data storage for fast writes and retrieval. It is ACID compliant, follows ANSI SQL standards and has strict hardware specifications. Hadoop, by contrast, is an entire suite of open-source distributed computing tools centered on a file system (HDFS) and a processing API (Hadoop MapReduce). It is designed to run on commodity hardware of almost any specification.
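The classic word-count example illustrates Hadoop's MapReduce model. The sketch below runs the map, shuffle and reduce phases in a single Python process. It shows only the programming model and does not use the Hadoop Java API. On a real cluster, the framework would run many map and reduce tasks in parallel over HDFS blocks.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


lines = ["Hadoop stores data", "HANA analyzes data"]
print(word_count(lines))
```

Each map call sees only one line and each reduce call sees only one key. That independence is what lets Hadoop spread the work across cheap machines.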
HANA can connect to Hadoop using Smart Data Access (SDA), which lets it pull data from Hadoop, combine it with the organization's own data and deliver meaningful insights.
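Smart Data Access is built on federation: remote data is queried in place and joined with local tables. SQLite's `ATTACH DATABASE` can approximate this idea. This is an analogy only. In SAP HANA, the remote side is registered as a remote source and exposed through virtual tables, using statements such as `CREATE REMOTE SOURCE` and `CREATE VIRTUAL TABLE`. Consult the SAP HANA SQL reference for the exact syntax and adapters. The table names and rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second, separate database under the alias "remote"; here it stands
# in for the Hadoop side (in HANA this role is played by a remote source).
conn.execute("ATTACH DATABASE ':memory:' AS remote")

# "Remote" data: hypothetical web events collected outside the organization.
conn.execute("CREATE TABLE remote.web_events (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO remote.web_events VALUES (?, ?)",
    [(1, "/pricing"), (1, "/support"), (2, "/pricing")],
)

# Local organizational data: a hypothetical customer master table.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# A single query spans both databases, much as a HANA query can join a local
# table with an SDA virtual table.
result = conn.execute(
    """
    SELECT c.name, COUNT(*) AS visits
    FROM customers AS c
    JOIN remote.web_events AS e ON e.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(result)
```

The point of federation is that the remote rows are not copied into the local table first. The join happens at query time.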
Hadoop can be used to store, and possibly analyze, data that originates outside the organization. This is mostly unstructured customer data, such as comments, tweets and likes from social media or from the company website.
To conclude, HANA's main job is to remove the bottleneck of traditional databases, where data resides in secondary storage (disk) while processing happens in primary memory (RAM). In HANA, all data is kept in primary memory, and secondary storage acts merely as a persistence layer that keeps a backup of the data. This large amount of primary memory makes HANA extremely fast but also costly. Hadoop does none of this and instead stores data on disk across clusters of commodity machines.
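A toy key-value store can illustrate the split between primary memory and a persistence layer. Every read and write is served from RAM, and each write is also appended to a log file on disk. After a restart, the store rebuilds its in-memory state from that log. This is a simplified sketch under that assumption, not how SAP HANA actually implements its persistence layer. The class and file names are hypothetical.

```python
import json
import os
import tempfile


class InMemoryStore:
    """Hypothetical toy store: data lives in a dict (RAM); an append-only log
    file on disk acts as the persistence layer used only for recovery."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Recovery: rebuild the in-memory state from the on-disk log, if any.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Persist first (append to the log), then update the in-memory copy.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
        self.data[key] = value

    def get(self, key):
        # Reads are answered from memory and never touch the disk.
        return self.data.get(key)


log_file = os.path.join(tempfile.mkdtemp(), "store.log")
store = InMemoryStore(log_file)
store.put("customer:1", "Acme")
store.put("customer:1", "Acme Corp")

# Simulate a restart: a fresh instance recovers its state from the log.
restarted = InMemoryStore(log_file)
print(restarted.get("customer:1"))  # prints "Acme Corp"
```

Reads stay fast because they never wait on the disk. The log exists only so that nothing is lost when memory is cleared.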
People often confuse the two because both platforms deal with Big Data.
However, neither of the technologies replaces the other.