Amazon EMR

Easily Run and Scale Apache Spark, Hadoop, HBase, Presto, Hive, and other Big Data Frameworks

Get started with Amazon EMR

Amazon EMR is the industry-leading cloud-native big data platform, allowing teams to process vast amounts of data quickly and cost-effectively at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, and Presto, coupled with the dynamic scalability of Amazon EC2 and the scalable storage of Amazon S3, EMR gives analytical teams the engines and elasticity to run petabyte-scale analysis for a fraction of the cost of traditional on-premises clusters. Developers and analysts can use Jupyter-based EMR Notebooks for iterative development, collaboration, and access to data stored across AWS data products such as Amazon S3, Amazon DynamoDB, and Amazon Redshift to reduce time to insight and quickly operationalize analytics.

Customers across many industry verticals use EMR to securely and reliably handle broad sets of big data use cases, including machine learning, data transformations (ETL), financial and scientific simulation, bioinformatics, log analysis, and deep learning. EMR gives teams the flexibility to run use cases on single-purpose, short-lived clusters that automatically scale to meet demand, or on long-running, highly available clusters using the new multi-master deployment mode.



BENEFITS

EASY TO USE
EMR launches clusters in minutes. You don’t need to worry about node provisioning, infrastructure setup, Hadoop configuration, or cluster tuning. EMR takes care of these tasks so you can focus on analysis. Analysts, data engineers, and data scientists can launch a serverless Jupyter notebook in seconds using EMR Notebooks, allowing individuals and teams to collaborate and interactively explore, process, and visualize data in an easy-to-use notebook format.
LOW COST
EMR pricing is simple and predictable: you pay a per-instance rate for every second used, with a one-minute minimum charge. You can launch a 10-node EMR cluster with applications such as Apache Spark and Apache Hive for as little as $0.15 per hour. Because EMR has native support for Amazon EC2 Spot and Reserved Instances, you can also save 50-80% on the cost of the underlying instances.
FLEXIBLE
You have complete control over your cluster. You have root access to every instance, and you can easily install additional applications and customize every cluster with bootstrap actions. You can also launch EMR clusters with custom Amazon Linux AMIs and reconfigure running clusters on the fly without needing to relaunch them.
ELASTIC
With EMR, you can provision one, hundreds, or thousands of compute instances to process data at any scale. The number of instances can be increased or decreased manually or automatically using Auto Scaling (which manages cluster sizes based on utilization), and you only pay for what you use. Unlike the rigid infrastructure of on-premises clusters, EMR decouples compute and persistent storage, giving you the ability to scale each independently.
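
As a rough illustration of the per-second billing described under LOW COST, the Python sketch below estimates the EMR charge for a cluster run. The function names are hypothetical, and the $0.015 per-instance hourly rate is an assumption obtained by dividing the quoted $0.15 per hour by 10 nodes; actual rates vary by instance type and region, and the cost of the underlying EC2 instances is not included.

```python
MINIMUM_BILLED_SECONDS = 60  # one-minute minimum charge, as stated above


def billed_seconds(runtime_seconds: int) -> int:
    """Return the number of seconds billed for a run, applying the minimum."""
    return max(runtime_seconds, MINIMUM_BILLED_SECONDS)


def estimate_emr_charge(hourly_rate_per_instance: float,
                        instance_count: int,
                        runtime_seconds: int) -> float:
    """Estimate the EMR charge in dollars for one cluster run.

    hourly_rate_per_instance is an assumed, illustrative rate; it is not
    taken from a published price list.
    """
    seconds = billed_seconds(runtime_seconds)
    return hourly_rate_per_instance * instance_count * seconds / 3600


if __name__ == "__main__":
    # Assumed rate: $0.15 per hour for 10 nodes, so $0.015 per node-hour.
    one_hour = estimate_emr_charge(0.015, 10, 3600)
    thirty_seconds = estimate_emr_charge(0.015, 10, 30)
    print(f"1 hour on 10 nodes: ${one_hour:.4f}")
    print(f"30 seconds on 10 nodes (billed as 60 s): ${thirty_seconds:.4f}")
```

Under these assumptions, a run shorter than one minute is charged as a full minute, and anything longer is charged for the exact number of seconds used.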

Use cases

Get started with AWS

Sign up for an AWS account

Instantly get access to the AWS Free Tier.

Learn with 10-minute Tutorials

Explore and learn with our Getting Started documentation

Start building with AWS

Get started with Amazon EMR