Cloudera Impala

What is Cloudera Impala

Cloudera Impala is a massively parallel processing (MPP) engine for running interactive SQL queries against data stored in Apache Hadoop. It is written in C++ and distributed under the Apache 2.0 licence.

Advantages

High performance and low latency SQL queries

Impala uses its own set of daemons running on each of the data nodes, saving time by avoiding the MapReduce job startup latency, compiling the query code for optimal performance, and streaming intermediate results in memory.
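The idea of streaming intermediate results in memory can be illustrated with a small sketch. This is not Impala's implementation: the operator names (`scan`, `filter_rows`, `aggregate`) and the two-column schema are assumptions made up for illustration. It only shows how chained operators can pass rows along one at a time instead of writing each stage's output out before the next stage starts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # (country, amount): a made-up schema for illustration


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield rows one at a time, standing in for a scan over stored blocks."""
    for row in rows:
        yield row


def filter_rows(rows: Iterator[Row], min_amount: int) -> Iterator[Row]:
    """Pass through only rows whose amount is at least min_amount."""
    for country, amount in rows:
        if amount >= min_amount:
            yield country, amount


def aggregate(rows: Iterator[Row]) -> Dict[str, int]:
    """Sum amounts per country with a hash table; no sort step is involved."""
    totals: Dict[str, int] = defaultdict(int)
    for country, amount in rows:
        totals[country] += amount
    return dict(totals)


def run_query(rows: Iterable[Row], min_amount: int) -> Dict[str, int]:
    """Chain the operators so each row flows through without being staged."""
    return aggregate(filter_rows(scan(rows), min_amount))


sample = [("DE", 5), ("FR", 20), ("DE", 30), ("FR", 1)]
result = run_query(sample, min_amount=5)
print(result)
```

Because the operators are generators, a row is filtered and aggregated as soon as it is scanned, and no intermediate list is ever built.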

High speed

Impala processes data stored in HDFS directly and with low latency, so analysts can query it using the SQL they already know.

Improved performance

Impala can distribute a query plan across nodes more naturally than a framework that has to fit every plan into a pipeline of map and reduce jobs. This lets Impala run multiple steps of a query in parallel and avoid unnecessary overhead such as sorting and shuffling.

Prioritisation and the ability to manage the queue of requests

Impala schedules I/O with awareness of where blocks are located on the hard drives, and it plans the order in which they are processed so that all drives stay busy.
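A toy version of location-aware read planning might look like the following. It is a sketch under stated assumptions, not Impala's scheduler: the `Block` tuple, the `plan_reads` function, and the round-based model are all hypothetical. Blocks are grouped by the disk that holds them, and each round takes one pending block from every disk that still has work.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Block = Tuple[str, int]  # (block_id, disk_id): hypothetical block metadata


def plan_reads(blocks: List[Block]) -> List[List[str]]:
    """Group blocks by disk, then emit rounds with at most one read per disk.

    The reads in one round can be issued in parallel, so no disk sits idle
    while it still has queued blocks.
    """
    queues: Dict[int, Deque[str]] = defaultdict(deque)
    for block_id, disk_id in blocks:
        queues[disk_id].append(block_id)

    rounds: List[List[str]] = []
    while any(queues.values()):
        # Take the next pending block from every disk that still has work.
        rounds.append([q.popleft() for _, q in sorted(queues.items()) if q])
    return rounds


blocks = [("b1", 0), ("b2", 0), ("b3", 1), ("b4", 2), ("b5", 1)]
rounds = plan_reads(blocks)
print(rounds)
```

A scheduler that ignored block placement could send consecutive reads to the same disk and leave the others waiting; grouping by disk first avoids that in this simplified model.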

Support for popular big data formats

Impala can read most of the file formats commonly used with Hadoop, including Avro, RCFile, and Parquet.
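In SQL, the file format of a table is typically declared with a `STORED AS` clause. The helper below is a minimal sketch that builds such a statement as a string. The function name `create_table_sql`, the table `sales`, and its columns are hypothetical, and the helper does no identifier quoting or validation, so it should not be fed untrusted input.

```python
from typing import Dict

# Keywords for the three formats named above, as used in a STORED AS clause.
FORMATS = {"avro": "AVRO", "rcfile": "RCFILE", "parquet": "PARQUET"}


def create_table_sql(table: str, columns: Dict[str, str], fmt: str) -> str:
    """Build a CREATE TABLE statement with a STORED AS clause."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} ({cols}) STORED AS {FORMATS[fmt]}"


# Hypothetical table and columns, for illustration only.
ddl = create_table_sql("sales", {"country": "STRING", "amount": "BIGINT"}, "parquet")
print(ddl)
```

The resulting string could then be submitted through whatever SQL client is in use; sending it to a server is outside the scope of this sketch.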

Enterprise-grade authentication system (Kerberos)

Kerberos includes capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of impersonation by never sending a user’s credentials in cleartext over the network.

Cloudera Impala Alternatives

PostgreSQL

What is PostgreSQL?

PostgreSQL is an open-source object-relational database management system (ORDBMS). Its developers describe it as the world's most advanced open-source relational database.

ClickHouse

What is ClickHouse?

ClickHouse is an open-source column-oriented relational DBMS, originally developed at Yandex, for fast real-time processing of analytical SQL queries on large volumes of structured data.
