A sample docker setup for Trino (Presto) with Hive connector

hevoio/trino-hive

Trino (Presto) with Hive connector

The instructions here describe how to run Trino (Presto) against data stored in GCS. This setup has been adapted from this repository: Hive connector over MinIO file storage.

Stack

  • Trino (Presto)
  • Hive GCS connector
  • Hive Metastore with MariaDB persistence

Steps

  • Download the GCS connector and place the jar file in the lib directory.
  • Obtain a GCP service-account credentials JSON file, with the service account having admin permissions on GCS. Place this file in the creds directory.
  • Set the GCP project ID against fs.gs.project.id in metastore-site.xml.
  • Run docker-compose up -d.
  • It should start instances of the database, the metastore, and Trino:
Creating network "trino-hive_trino-network" with driver "bridge"
Creating trino-hive_mariadb_1           ... done
Creating trino-hive_trino-coordinator_1 ... done
Creating trino-hive_hive-metastore_1    ... done
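
The metastore-site.xml entry referenced in the steps above would look something like this (the project ID is a placeholder — substitute your own):

```xml
<property>
  <name>fs.gs.project.id</name>
  <!-- Replace with your GCP project ID -->
  <value>my-gcp-project</value>
</property>
```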

Access trino

docker container exec -it trino-hive_trino-coordinator_1 trino

# The following prompt should appear
trino>

Create Entities

View Catalogs

trino> show catalogs;
 Catalog
---------
 hive
 system
 tpcds
 tpch
(4 rows)

Create a Schema in the Hive Catalog

trino> CREATE SCHEMA hive.hive_gcs WITH (location = 'gs://bucket-test-tj-1/');
CREATE SCHEMA

Create some data in GCS. Ensure that partition directories like partition_name=foo are present in the structure and that each file has 5 columns with headers col_1, col_2, col_3, col_4, col_5.
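
The expected layout can be sketched locally before uploading to GCS. A minimal Java sketch (the partition names and values mirror the sample table below; the class name and file name are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SampleDataLayout {
    /** Writes one CSV file under a Hive-style key=value partition directory. */
    public static Path writeSample(Path root) throws IOException {
        // Partition directories follow the Hive convention: key=value
        Path partition = root.resolve("ping/xing=bar/fing=io");
        Files.createDirectories(partition);
        // Five data columns with a header row, as expected by the table definition
        List<String> lines = List.of(
                "col_1,col_2,col_3,col_4,col_5",
                "91,92,93,94,95",
                "81,82,83,84,85");
        Path file = partition.resolve("data.csv");
        Files.write(file, lines);
        return file;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeSample(Files.createTempDirectory("gcs-sample")));
    }
}
```

The resulting tree (ping/xing=bar/fing=io/data.csv) is what the partitioned table below maps onto.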

Create a partitioned Hive Table

USE hive.hive_gcs;

CREATE TABLE sample_table1 (
  col_1 varchar, col_2 varchar, col_3 varchar, col_4 varchar, col_5 varchar,
  xing varchar, fing varchar
)
WITH (
  format = 'CSV',
  partitioned_by = ARRAY['xing','fing'],
  external_location = 'gs://bucket-test-tj-1/ping',
  skip_header_line_count = 1
);
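
Each partitioned_by column becomes a key=value directory segment under external_location. A small sketch of how a relative object path decomposes into partition values (the path and class name are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPath {
    /** Decomposes Hive-style key=value path segments into partition column values. */
    public static Map<String, String> parse(String relativePath) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // A file under external_location gs://bucket-test-tj-1/ping
        System.out.println(parse("xing=bar/fing=io/data.csv"));
        // prints {xing=bar, fing=io}
    }
}
```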

Update partitions

CALL system.sync_partition_metadata(schema_name=>'hive_gcs', table_name=>'sample_table1', mode=>'FULL');

The FULL mode both registers partitions found in storage and drops partitions no longer present there; the ADD and DROP modes perform only one of the two.

Read Data


trino:hive_gcs> select * from sample_table1 where xing = 'bar' and fing = 'io';

 col_1 | col_2 | col_3 | col_4 | col_5 | xing | fing
-------+-------+-------+-------+-------+------+------
 91    | 92    | 93    | 94    | 95    | bar  | io
 81    | 82    | 83    | 84    | 85    | bar  | io

Sample code

Maven dependency

<dependency>
    <groupId>io.trino</groupId>
    <artifactId>trino-jdbc</artifactId>
    <version>373</version>
</dependency>

Java Code

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TrinoJdbcSample {
    public static void main(String[] args) throws Exception {
        // URL format: jdbc:trino://<Trino_Coordinator>/<Catalog>/<Schema>
        try (Connection conn = DriverManager.getConnection("jdbc:trino://localhost:9080/hive/hive_gcs?user=anything")) {
            try (Statement stmt = conn.createStatement()) {
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM sample_table1")) {
                    while (rs.next()) {
                        String col1 = rs.getString("col_1");
                        String col2 = rs.getString("col_2");
                        System.out.println(String.format("col_1=%s and col_2=%s", col1, col2));
                    }
                }
            }
        }
    }
}

References

MinIO

MinIO is a multi-cloud object storage abstraction for any cloud. It has S3-like semantics and is compatible with most clouds, so it can be a good alternative when a rich object-storage abstraction is needed. A sample experiment.
