E-MapReduce

EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.


Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution that runs on the Alibaba Cloud platform. EMR is built on Alibaba Cloud ECS instances and is based on open-source Apache Hadoop and Apache Spark. EMR allows you to use Hadoop and Spark ecosystem components, such as Apache Hive, Apache Kafka, Flink, Druid, and TensorFlow, to analyze and process data. You can use EMR to process data stored in different Alibaba Cloud data storage services, such as Object Storage Service (OSS), Log Service (SLS), and Relational Database Service (RDS).



Benefits

Easy-to-use
You can quickly create clusters without configuring hardware or software. All maintenance operations are performed in the EMR web interface.
Cost-effectiveness
You can create clusters and dynamically scale the number of compute nodes in or out based on current computing needs.
Stability
EMR provides a deeply optimized cluster environment, automated background maintenance, and multiple online support channels.
Security
EMR supports Kerberos authentication and data encryption. You can use RAM users to refine the management of service permissions.

Features

Automated Cluster Deployment and Expansion

You can quickly deploy and expand clusters from a Web interface without the need to manage the hardware and software.

Cluster Creation

You can quickly deploy multiple types of clusters, such as Hadoop, Kafka, Druid, and ZooKeeper.

Cluster Expansion

You can quickly add any type of node to an existing cluster.

Scheduled Cluster Creation

You can execute plans to create clusters, execute jobs at the scheduled time, and release clusters after job execution.

Automatic Component Deployment

You can add, configure, and maintain components based on your needs.

Dynamic Expansion

You can scale cluster compute resources in or out at a specified time to reduce the total cost of ownership (TCO).

Workflow Scheduling

EMR offers simple job orchestration and scheduling.

Job Editing and Management

EMR supports graphical job editing and management for you to execute and orchestrate multiple types of jobs.

Workflow Scheduling

EMR supports job and dependency scheduling. You can orchestrate and schedule jobs as DAG-based workflows.
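As an illustration of what DAG-based scheduling means, the following minimal Python sketch orders a set of jobs so that each job runs only after the jobs it depends on. It uses the standard-library `graphlib` module (Python 3.9 or later). The job names and the `execution_order` helper are hypothetical, and the sketch does not represent how EMR implements its scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it depends on.
dependencies = {
    "extract_logs": set(),
    "clean_logs": {"extract_logs"},
    "aggregate_daily": {"clean_logs"},
    "load_dimensions": set(),
    "build_report": {"aggregate_daily", "load_dimensions"},
}


def execution_order(deps):
    """Return one valid run order in which every job follows its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Several run orders can be valid for the same graph. The only guarantee is that a job never appears before one of its dependencies.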

Dynamic Clusters

You can use EMR to start a temporary cluster to execute jobs at the scheduled time and stop the cluster after job execution.

Guaranteed Job Execution

When EMR fails to execute a job, it immediately sends an alarm. You can also set EMR to automatically re-execute the job.
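The alarm-then-retry behavior described above can be pictured with a short Python sketch. The `run_with_retries` function, its parameters, and the `flaky_job` example are hypothetical names chosen for illustration. They are not part of any EMR API, and EMR's actual retry policy may differ.

```python
import time


def run_with_retries(job, max_attempts=3, delay_seconds=0.0, on_failure=print):
    """Run job(); report each failure and retry until max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            # Stand-in for sending an alarm: report the failure right away.
            on_failure(f"attempt {attempt} failed: {exc}")
            if attempt == max_attempts:
                raise  # Give up and surface the last error to the caller.
            time.sleep(delay_seconds)


# Hypothetical job that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")
    return "done"


alerts = []
result = run_with_retries(flaky_job, on_failure=alerts.append)
print(result, alerts)
```

Passing the alert handler as a parameter keeps the sketch independent of any particular notification channel.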

Multiple Components

EMR provides multiple components.

Hadoop

A big data processing platform with petabyte-scale storage and compute capacity.

Spark

A new-generation, in-memory distributed computing framework that supports offline and real-time computing, SQL syntax, and machine learning.

Hive

An offline data processing system based on Hadoop. Hive supports structured table management based on Hadoop Distributed File System (HDFS) and provides query syntax that is similar to SQL for data analysis and processing.

Kafka

A high-throughput and reliable distributed message publication and subscription system.

Storm

A real-time compute engine that supports real-time data processing within milliseconds.

ZooKeeper

A distributed and open-source coordination service that can ensure the consistency of distributed applications.

Hue

A management tool and Web interface.

Oozie

An open-source job scheduling tool.

Druid

Open-source software for real-time big data analysis.

Flink

A distributed engine for batch processing and stream processing.

Complete Ecosystem Support

EMR is deeply integrated with Alibaba Cloud services.

Support for OSS

You can use Object Storage Service (OSS) as HDFS in most of the EMR components.

Support for SLS

EMR provides an SDK that allows you to ingest real-time data from Log Service (SLS).

Support for Elasticsearch

Hadoop includes a built-in ES-Hadoop plug-in that supports all Elasticsearch operations.

Support for MaxCompute

EMR supports reading and writing Alibaba Cloud MaxCompute data.

Support for Alibaba Cloud Message Services

EMR supports reading data from and writing data to Alibaba Cloud message services, such as Message Queue and Message Service, and supports SDK integration.

Scenarios

Separation of dynamic resources from static resources for websites or applications

OSS is a cost-effective and elastically scalable storage service.

You can manage all static website content, such as images, scripts, and videos, and store it in OSS as you would store files in folders. Nearby resources can then be accessed through Border Gateway Protocol (BGP) or Content Delivery Network (CDN) acceleration. In this way, OSS reduces the load on ECS instances and improves user experience.

Benefits

High performance

Prevents instances from being overloaded by increasing service data volumes.

Cost-effective

Supports the elastic scaling of resources and the Pay-As-You-Go billing method.


Multimedia data storage

Alibaba Cloud OSS provides mass data storage for multiple types of content, such as images, audio, videos, logs, and other files.

OSS supports multiple types of terminals, web applications, and mobile apps, which can read data from and write data to OSS directly, and supports both stream and file writes. OSS provides data reliability of 99.999999999% and integrates seamlessly with CDN and Media Transcoding Service (MTS).

Benefits

Flexible access

Provides easy access by using standard RESTful APIs, multiple SDKs, clients, and the OSS console.

Security and reliability

Multi-level security measures ensure data reliability.

Cost-effective

Supports the elastic scaling of resources and flexible billing methods.


Cloud data ETL

You can extract more value from your data, because data processing is simplified with Alibaba Cloud OSS.

You can use MTS, the image processing service, BatchCompute, and MaxCompute to fully extract the value of the data that you store in OSS.

Benefits

Rich media processing

Provides image processing, user-defined functions, and other value-added services, and works with MTS to provide video transcoding and frame capturing.

Storage and computing

Uses Alibaba Cloud data computing services to extract the value from your data.


Multiple storage types

Classified storage substantially reduces costs.

You can store all types of data in OSS, including popular data such as images, audio, and videos, less popular data such as backups, and archived data. Based on the lifecycle that you specify, OSS moves data to a lower-priced storage type to optimize storage costs.

Benefits

Standard storage

Features high performance, high reliability, and high availability.

Less popular data storage

Features cost-effective storage with real-time access.

Archives

Suitable for long-term archival storage at the lowest price.
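The lifecycle-based tiering described above can be pictured with a small Python sketch that picks a storage type from an object's age. The 30-day and 180-day thresholds, the class labels, and the `storage_class_for` function are assumptions made for illustration only. Real lifecycle rules are configured in OSS itself and may use different criteria.

```python
from datetime import date

# Hypothetical thresholds, in days since the object was last modified.
INFREQUENT_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def storage_class_for(last_modified: date, today: date) -> str:
    """Pick a storage type for an object based on its age in days."""
    age_days = (today - last_modified).days
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "Archive"            # long-term archival at the lowest price
    if age_days >= INFREQUENT_AFTER_DAYS:
        return "Infrequent Access"  # less popular data, still readable in real time
    return "Standard"               # popular data that is accessed often


today = date(2024, 3, 1)
for modified in (date(2024, 2, 20), date(2024, 1, 1), date(2023, 1, 1)):
    print(modified, storage_class_for(modified, today))
```

Checking the archive threshold first keeps the rules from overlapping: an object old enough to archive never falls through to the infrequent-access branch.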


Cross-region disaster recovery

OSS supports cross-region replication.

Therefore, you can synchronize data to a specified region in real time for remote disaster recovery. In this way, OSS secures important data from the impact of extreme disasters and ensures service stability.

Benefits

Remote disaster recovery

Maintains your service with data that has been replicated to the remote standby data center.

Data compliance

Replicates data between remote OSS data centers to meet data compliance requirements.


Upgraded Support For You

1 on 1 Presale Consultation, 24/7 Technical Support, Faster Response, and More Tickets.

1 on 1 Presale Consultation

Consulting by experienced cloud experts.

24/7 Technical Support

Service hours are extended from 10 hours a day, 5 days a week to 24/7.

6 Free Tickets per Quarter

The number of free tickets is doubled from 3 to 6 per quarter.

Faster Response

After-sales response time is shortened from 36 hours to 18 hours.