High-performance, low-latency SQL queries
Impala runs its own daemon (impalad) on each data node instead of launching MapReduce jobs, so queries avoid MapReduce job startup latency. Impala also compiles query code at runtime (using LLVM) for optimal performance and streams intermediate results in memory rather than writing them to disk.
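To illustrate what "streaming intermediate results in memory" means, here is a minimal Python sketch that chains query operators as generators. Each operator hands rows to the next one as they are produced, so no intermediate result set is materialised. The operator names (`scan`, `filter_min_amount`, `sum_by_region`) and the sample rows are hypothetical and stand in for real plan nodes; this is not Impala's execution engine.

```python
from typing import Iterable, Iterator

# Hypothetical row shape: (region, amount).
Row = tuple[str, int]

def scan(rows: Iterable[Row]) -> Iterator[Row]:
    # Stand-in for a table scan: yields rows one at a time
    # instead of building a full intermediate list.
    for row in rows:
        yield row

def filter_min_amount(rows: Iterable[Row], minimum: int) -> Iterator[Row]:
    # Passes through only rows whose amount meets the threshold.
    for region, amount in rows:
        if amount >= minimum:
            yield region, amount

def sum_by_region(rows: Iterable[Row]) -> dict[str, int]:
    # Final operator consumes the stream and keeps only running totals.
    totals: dict[str, int] = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

data = [("eu", 5), ("us", 12), ("eu", 20), ("us", 3)]
print(sum_by_region(filter_min_amount(scan(data), 10)))  # {'us': 12, 'eu': 20}
```

Because each stage pulls rows lazily, memory use is bounded by the aggregation state rather than by the size of the filtered data.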
High speed
Impala can query data stored in HDFS very quickly using standard SQL, so existing SQL knowledge carries over directly.
Improved performance
Rather than forcing query plans into a chain of map and reduce stages, Impala decentralises query plans more naturally across the cluster, which shortens job pipelines. This allows Impala to process multiple query steps in parallel and to avoid unnecessary overheads such as sorting and shuffling.
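The idea of running plan steps in parallel without an intermediate sort can be sketched as a two-phase hash aggregation: each partition builds its own hash table of counts, and a coordinator merges them. This is a simplified Python illustration, assuming hypothetical in-memory partitions in place of data on separate nodes; Python threads only model the structure, whereas a real engine would run the fragments on different machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions, e.g. the slices of a table held on three nodes.
partitions = [
    ["eu", "us", "eu"],
    ["us", "apac"],
    ["eu", "apac", "apac"],
]

def partial_count(rows: list[str]) -> Counter:
    # Each fragment builds a local hash table of counts; no sorting is needed.
    return Counter(rows)

def merge(partials: list[Counter]) -> Counter:
    # The coordinator combines the per-fragment hash tables by adding counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

with ThreadPoolExecutor() as pool:
    # Run the per-partition fragments concurrently.
    partials = list(pool.map(partial_count, partitions))

result = merge(partials)
print(dict(result))  # {'eu': 3, 'us': 2, 'apac': 3}
```

Hash-based grouping only needs one pass over each partition, which is why it can skip the sort that a classic MapReduce shuffle performs before reducing.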
Prioritisation and the ability to manage the queue of requests
Impala uses disk-aware I/O scheduling: it knows which hard drive stores each data block and plans the order in which blocks are read so that all hard drives are kept busy.
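A toy version of disk-aware scheduling is sketched below: blocks are queued per disk, and each scheduling round takes one pending block from every disk that still has work, so no disk sits idle while another has a backlog. The block IDs and the block-to-disk mapping are hypothetical, and this round-robin scheme is a simplified stand-in, not Impala's actual I/O manager.

```python
from collections import defaultdict, deque

# Hypothetical mapping of block IDs to the local disk that stores each block.
block_locations = {
    "blk_1": "disk0", "blk_2": "disk0", "blk_3": "disk0",
    "blk_4": "disk1", "blk_5": "disk2", "blk_6": "disk1",
}

def schedule_reads(locations: dict[str, str]) -> list[list[str]]:
    """Group blocks into rounds that read at most one block per disk."""
    queues: dict[str, deque] = defaultdict(deque)
    for block, disk in sorted(locations.items()):
        queues[disk].append(block)

    rounds: list[list[str]] = []
    while any(queues.values()):
        # Take the next pending block from every disk that still has work,
        # so each disk is busy in every round while it has blocks left.
        rounds.append([queues[disk].popleft() for disk in sorted(queues) if queues[disk]])
    return rounds

for i, blocks in enumerate(schedule_reads(block_locations)):
    print(i, blocks)
# 0 ['blk_1', 'blk_4', 'blk_5']
# 1 ['blk_2', 'blk_6']
# 2 ['blk_3']
```

A naive scheduler that read blocks in ID order would hit `disk0` three times in a row while `disk1` and `disk2` waited; interleaving by disk spreads the load.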
Support for popular big data formats
Impala can read most of the file formats used in Hadoop, such as Avro, RCFile, and Parquet.
Enterprise-grade authentication system (Kerberos)
Kerberos includes capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of impersonation by never sending a user’s credentials in cleartext over the network.
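The two properties described above (no cleartext credentials on the wire, and captured messages being useless for replay) can be illustrated with a simplified authenticator. The sketch below is NOT the Kerberos protocol: Kerberos encrypts its authenticator with a session key issued by a KDC, whereas this example substitutes an HMAC over a username and timestamp, checked against an allowed clock skew and a replay cache. The salt, password, and class names are hypothetical.

```python
import hashlib
import hmac
import time

def derive_key(password: str, salt: bytes) -> bytes:
    # Both sides derive the same long-term key from the password;
    # the password itself is never transmitted.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def make_authenticator(key: bytes, user: str, timestamp: int) -> tuple[str, int, str]:
    # The client sends its name, a timestamp and a MAC over both.
    mac = hmac.new(key, f"{user}|{timestamp}".encode(), hashlib.sha256).hexdigest()
    return user, timestamp, mac

class Verifier:
    def __init__(self, keys: dict[str, bytes], max_skew: int = 300):
        self.keys = keys
        self.max_skew = max_skew  # allowed clock skew in seconds
        self.seen: set[tuple[str, int, str]] = set()  # replay cache

    def check(self, auth: tuple[str, int, str], now: int) -> bool:
        user, timestamp, mac = auth
        key = self.keys.get(user)
        if key is None or abs(now - timestamp) > self.max_skew:
            return False  # unknown user or stale timestamp
        expected = hmac.new(key, f"{user}|{timestamp}".encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, mac) or auth in self.seen:
            return False  # wrong key or replayed message
        self.seen.add(auth)
        return True

salt = b"EXAMPLE.ORGalice"  # hypothetical salt
key = derive_key("correct horse", salt)
verifier = Verifier({"alice": key})
now = int(time.time())
auth = make_authenticator(key, "alice", now)
print(verifier.check(auth, now))  # True: fresh and valid
print(verifier.check(auth, now))  # False: the same captured message is rejected
```

An eavesdropper who records `auth` learns neither the password nor a reusable token: replaying it within the skew window hits the replay cache, and replaying it later fails the timestamp check.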
PostgreSQL
What is PostgreSQL?
PostgreSQL is an open-source object-relational database management system (ORDBMS), widely described as the most advanced open-source database management system in the world.
ClickHouse
What is ClickHouse?
ClickHouse is an open-source, column-oriented relational DBMS, originally developed at Yandex, for fast processing of analytical SQL queries on structured big data in real time.
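Why a columnar layout helps analytical queries can be shown with a small Python sketch: the same hypothetical table is stored row by row and column by column, and an average over one column only needs that column's values in the columnar layout. Python lists only model the layout here; in a real column store the saving comes from reading less data from disk and compressing each column well.

```python
# Hypothetical page-view table stored row by row.
rows = [
    {"user_id": 1, "country": "DE", "duration_ms": 120},
    {"user_id": 2, "country": "US", "duration_ms": 340},
    {"user_id": 3, "country": "DE", "duration_ms": 200},
]

# The same data in a column-oriented layout: one list per column.
columns = {name: [r[name] for r in rows] for name in ("user_id", "country", "duration_ms")}

def avg_duration_rows(table: list[dict]) -> float:
    # Row layout: every row record is visited, although only one field is used.
    # On disk, a row store would read all fields of each row.
    return sum(r["duration_ms"] for r in table) / len(table)

def avg_duration_columns(cols: dict[str, list]) -> float:
    # Column layout: only the single column the query needs is touched.
    col = cols["duration_ms"]
    return sum(col) / len(col)

print(avg_duration_rows(rows), avg_duration_columns(columns))  # 220.0 220.0
```

Both functions return the same answer; the difference is how much unrelated data must be scanned, which grows with the number of columns in the table.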