Thrift 連接器

Thrift 連接器可讓您與外部儲存系統整合,而無需自訂 Presto 連接器實作。

為了將 Thrift 連接器與外部系統搭配使用,您需要實作 PrestoThriftService 介面,如下所示。接下來,您設定 Thrift 連接器以指向一組稱為 Thrift 伺服器的機器,這些機器實作該介面。作為介面實作的一部分,Thrift 伺服器將提供中繼資料、分割和資料。連接器將從可用執行個體中隨機選擇一個伺服器來進行中繼資料呼叫,或用於資料呼叫,除非分割包含位址清單。所有請求都假設是等冪的,並且可以在任何伺服器之間自由重試。

設定

若要設定 Thrift 連接器,請建立包含以下內容的目錄屬性檔案 etc/catalog/thrift.properties,並根據需要取代屬性

connector.name=presto-thrift
presto.thrift.client.addresses=host:port,host:port

多個 Thrift 系統

您可以視需要擁有任意數量的目錄,因此如果您有其他 Thrift 系統要連線,只需將另一個屬性檔案新增至 etc/catalog,並使用不同的名稱 (確保其結尾為 .properties)。

設定屬性

可以使用以下設定屬性

屬性名稱

描述

presto.thrift.client.addresses

Thrift 伺服器的位置

presto-thrift.max-response-size

從 Thrift 伺服器傳回的資料最大大小

presto-thrift.metadata-refresh-threads

中繼資料快取的重新整理執行緒數

presto.thrift.client.max-retries

失敗 Thrift 請求的最大重試次數

presto.thrift.client.max-backoff-delay

重試嘗試之間的最大間隔

presto.thrift.client.min-backoff-delay

重試嘗試之間的最小間隔

presto.thrift.client.max-retry-time

Thrift 請求所有嘗試的最大持續時間

presto.thrift.client.backoff-scale-factor

指數回退的縮放比例

presto.thrift.client.connect-timeout

連線逾時

presto.thrift.client.request-timeout

請求逾時

presto.thrift.client.socks-proxy

SOCKS Proxy 位址

presto.thrift.client.max-frame-size

原始 Thrift 回應的最大大小

presto.thrift.client.transport

Thrift 傳輸類型 (UNFRAMEDFRAMEDHEADER)

presto.thrift.client.protocol

Thrift 通訊協定類型 (BINARYCOMPACTFB_COMPACT)

presto.thrift.client.addresses

host:port 形式表示的 Thrift 伺服器逗號分隔清單。例如

presto.thrift.client.addresses=192.0.2.3:7777,192.0.2.4:7779

這是必要屬性,沒有預設值。

presto-thrift.max-response-size

連接器接受的資料回應最大大小。此值由連接器在要求資料時傳送至 Thrift 伺服器,讓伺服器可以適當調整回應的大小。

這是選用屬性;預設值為 16MB

presto-thrift.metadata-refresh-threads

中繼資料快取的重新整理執行緒數。

這是選用屬性;預設值為 1

Thrift IDL 檔案

下列 IDL 描述必須實作的 PrestoThriftService

enum PrestoThriftBound {
  BELOW = 1;
  EXACTLY = 2;
  ABOVE = 3;
}

exception PrestoThriftServiceException {
  1: string message;
  2: bool retryable;
}

struct PrestoThriftNullableSchemaName {
  1: optional string schemaName;
}

struct PrestoThriftSchemaTableName {
  1: string schemaName;
  2: string tableName;
}

struct PrestoThriftTableMetadata {
  1: PrestoThriftSchemaTableName schemaTableName;
  2: list<PrestoThriftColumnMetadata> columns;
  3: optional string comment;

  /**
   * Returns a list of key sets which can be used for index lookups.
   * The list is expected to have only unique key sets.
   * {@code set<set<string>>} is not used here because some languages (like php) don't support it.
   */
  4: optional list<set<string>> indexableKeys;
}

struct PrestoThriftColumnMetadata {
  1: string name;
  2: string type;
  3: optional string comment;
  4: bool hidden;
}

struct PrestoThriftNullableColumnSet {
  1: optional set<string> columns;
}

struct PrestoThriftTupleDomain {
  /**
   * Return a map of column names to constraints.
   */
  1: optional map<string, PrestoThriftDomain> domains;
}

/**
 * Set that either includes all values, or excludes all values.
 */
struct PrestoThriftAllOrNoneValueSet {
  1: bool all;
}

/**
 * A set containing values that are uniquely identifiable.
 * Assumes an infinite number of possible values. The values may be collectively included (aka whitelist)
 * or collectively excluded (aka !whitelist).
 * This structure is used with comparable, but not orderable types like "json", "map".
 */
struct PrestoThriftEquatableValueSet {
  1: bool whiteList;
  2: list<PrestoThriftBlock> values;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Elements of {@code ints} array are values for each row. If row is null then value is ignored.
 */
struct PrestoThriftInteger {
  1: optional list<bool> nulls;
  2: optional list<i32> ints;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Elements of {@code longs} array are values for each row. If row is null then value is ignored.
 */
struct PrestoThriftBigint {
  1: optional list<bool> nulls;
  2: optional list<i64> longs;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Elements of {@code doubles} array are values for each row. If row is null then value is ignored.
 */
struct PrestoThriftDouble {
  1: optional list<bool> nulls;
  2: optional list<double> doubles;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Each elements of {@code sizes} array contains the length in bytes for the corresponding element.
 * If row is null then the corresponding element in {@code sizes} is ignored.
 * {@code bytes} array contains UTF-8 encoded byte values.
 * Values for all rows are written to {@code bytes} array one after another.
 * The total number of bytes must be equal to the sum of all sizes.
 */
struct PrestoThriftVarchar {
  1: optional list<bool> nulls;
  2: optional list<i32> sizes;
  3: optional binary bytes;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Elements of {@code booleans} array are values for each row. If row is null then value is ignored.
 */
struct PrestoThriftBoolean {
  1: optional list<bool> nulls;
  2: optional list<bool> booleans;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Elements of {@code dates} array are date values for each row represented as the number
 * of days passed since 1970-01-01.
 * If row is null then value is ignored.
 */
struct PrestoThriftDate {
  1: optional list<bool> nulls;
  2: optional list<i32> dates;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Elements of {@code timestamps} array are values for each row represented as the number
 * of milliseconds passed since 1970-01-01T00:00:00 UTC.
 * If row is null then value is ignored.
 */
struct PrestoThriftTimestamp {
  1: optional list<bool> nulls;
  2: optional list<i64> timestamps;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Each elements of {@code sizes} array contains the length in bytes for the corresponding element.
 * If row is null then the corresponding element in {@code sizes} is ignored.
 * {@code bytes} array contains UTF-8 encoded byte values for string representation of json.
 * Values for all rows are written to {@code bytes} array one after another.
 * The total number of bytes must be equal to the sum of all sizes.
 */
struct PrestoThriftJson {
  1: optional list<bool> nulls;
  2: optional list<i32> sizes;
  3: optional binary bytes;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Each elements of {@code sizes} array contains the length in bytes for the corresponding element.
 * If row is null then the corresponding element in {@code sizes} is ignored.
 * {@code bytes} array contains encoded byte values for HyperLogLog representation as defined in
 * Airlift specification: href="https://github.com/airlift/airlift/blob/master/stats/docs/hll.md
 * Values for all rows are written to {@code bytes} array one after another.
 * The total number of bytes must be equal to the sum of all sizes.
 */
struct PrestoThriftHyperLogLog {
  1: optional list<bool> nulls;
  2: optional list<i32> sizes;
  3: optional binary bytes;
}

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Each elements of {@code sizes} array contains the number of elements in the corresponding values array.
 * If row is null then the corresponding element in {@code sizes} is ignored.
 * {@code values} is a bigint block containing array elements one after another for all rows.
 * The total number of elements in bigint block must be equal to the sum of all sizes.
 */
struct PrestoThriftBigintArray {
  1: optional list<bool> nulls;
  2: optional list<i32> sizes;
  3: optional PrestoThriftBigint values;
}

/**
 * A set containing zero or more Ranges of the same type over a continuous space of possible values.
 * Ranges are coalesced into the most compact representation of non-overlapping Ranges.
 * This structure is used with comparable and orderable types like bigint, integer, double, varchar, etc.
 */
struct PrestoThriftRangeValueSet {
  1: list<PrestoThriftRange> ranges;
}

struct PrestoThriftId {
  1: binary id;
}

struct PrestoThriftSplitBatch {
  1: list<PrestoThriftSplit> splits;
  2: optional PrestoThriftId nextToken;
}

struct PrestoThriftSplit {
  /**
   * Encodes all the information needed to identify a batch of rows to return to Presto.
   * For a basic scan, includes schema name, table name, and output constraint.
   * For an index scan, includes schema name, table name, set of keys to lookup and output constraint.
   */
  1: PrestoThriftId splitId;

  /**
   * Identifies the set of hosts on which the rows are available. If empty, then the rows
   * are expected to be available on any host. The hosts in this list may be independent
   * from the hosts used to serve metadata requests.
   */
  2: list<PrestoThriftHostAddress> hosts;
}

struct PrestoThriftHostAddress {
  1: string host;
  2: i32 port;
}

struct PrestoThriftPageResult {
  /**
   * Returns data in a columnar format.
   * Columns in this list must be in the order they were requested by the engine.
   */
  1: list<PrestoThriftBlock> columnBlocks;

  2: i32 rowCount;
  3: optional PrestoThriftId nextToken;
}

struct PrestoThriftNullableTableMetadata {
  1: optional PrestoThriftTableMetadata tableMetadata;
}

struct PrestoThriftValueSet {
  1: optional PrestoThriftAllOrNoneValueSet allOrNoneValueSet;
  2: optional PrestoThriftEquatableValueSet equatableValueSet;
  3: optional PrestoThriftRangeValueSet rangeValueSet;
}

struct PrestoThriftBlock {
  1: optional PrestoThriftInteger integerData;
  2: optional PrestoThriftBigint bigintData;
  3: optional PrestoThriftDouble doubleData;
  4: optional PrestoThriftVarchar varcharData;
  5: optional PrestoThriftBoolean booleanData;
  6: optional PrestoThriftDate dateData;
  7: optional PrestoThriftTimestamp timestampData;
  8: optional PrestoThriftJson jsonData;
  9: optional PrestoThriftHyperLogLog hyperLogLogData;
  10: optional PrestoThriftBigintArray bigintArrayData;
}

/**
 * LOWER UNBOUNDED is specified with an empty value and an ABOVE bound
 * UPPER UNBOUNDED is specified with an empty value and a BELOW bound
 */
struct PrestoThriftMarker {
  1: optional PrestoThriftBlock value;
  2: PrestoThriftBound bound;
}

struct PrestoThriftNullableToken {
  1: optional PrestoThriftId token;
}

struct PrestoThriftDomain {
  1: PrestoThriftValueSet valueSet;
  2: bool nullAllowed;
}

struct PrestoThriftRange {
  1: PrestoThriftMarker low;
  2: PrestoThriftMarker high;
}

/**
 * Presto Thrift service definition.
 * This thrift service needs to be implemented in order to be used with Thrift Connector.
 */
service PrestoThriftService {
  /**
   * Returns available schema names.
   */
  list<string> prestoListSchemaNames()
    throws (1: PrestoThriftServiceException ex1);

  /**
   * Returns tables for the given schema name.
   *
   * @param schemaNameOrNull a structure containing schema name or {@literal null}
   * @return a list of table names with corresponding schemas. If schema name is null then returns
   * a list of tables for all schemas. Returns an empty list if a schema does not exist
   */
  list<PrestoThriftSchemaTableName> prestoListTables(
      1: PrestoThriftNullableSchemaName schemaNameOrNull)
    throws (1: PrestoThriftServiceException ex1);

  /**
   * Returns metadata for a given table.
   *
   * @param schemaTableName schema and table name
   * @return metadata for a given table, or a {@literal null} value inside if it does not exist
   */
  PrestoThriftNullableTableMetadata prestoGetTableMetadata(
      1: PrestoThriftSchemaTableName schemaTableName)
    throws (1: PrestoThriftServiceException ex1);

  /**
   * Returns a batch of splits.
   *
   * @param schemaTableName schema and table name
   * @param desiredColumns a superset of columns to return; empty set means "no columns", {@literal null} set means "all columns"
   * @param outputConstraint constraint on the returned data
   * @param maxSplitCount maximum number of splits to return
   * @param nextToken token from a previous split batch or {@literal null} if it is the first call
   * @return a batch of splits
   */
  PrestoThriftSplitBatch prestoGetSplits(
      1: PrestoThriftSchemaTableName schemaTableName,
      2: PrestoThriftNullableColumnSet desiredColumns,
      3: PrestoThriftTupleDomain outputConstraint,
      4: i32 maxSplitCount,
      5: PrestoThriftNullableToken nextToken)
    throws (1: PrestoThriftServiceException ex1);

  /**
   * Returns a batch of index splits for the given batch of keys.
   * This method is called if index join strategy is chosen for a query.
   *
   * @param schemaTableName schema and table name
   * @param indexColumnNames specifies columns and their order for keys
   * @param outputColumnNames a list of column names to return
   * @param keys keys for which records need to be returned; includes only unique and non-null values
   * @param outputConstraint constraint on the returned data
   * @param maxSplitCount maximum number of splits to return
   * @param nextToken token from a previous split batch or {@literal null} if it is the first call
   * @return a batch of splits
   */
  PrestoThriftSplitBatch prestoGetIndexSplits(
      1: PrestoThriftSchemaTableName schemaTableName,
      2: list<string> indexColumnNames,
      3: list<string> outputColumnNames,
      4: PrestoThriftPageResult keys,
      5: PrestoThriftTupleDomain outputConstraint,
      6: i32 maxSplitCount,
      7: PrestoThriftNullableToken nextToken)
    throws (1: PrestoThriftServiceException ex1);

  /**
   * Returns a batch of rows for the given split.
   *
   * @param splitId split id as returned in split batch
   * @param columns a list of column names to return
   * @param maxBytes maximum size of returned data in bytes
   * @param nextToken token from a previous batch or {@literal null} if it is the first call
   * @return a batch of table data
   */
  PrestoThriftPageResult prestoGetRows(
      1: PrestoThriftId splitId,
      2: list<string> columns,
      3: i64 maxBytes,
      4: PrestoThriftNullableToken nextToken)
    throws (1: PrestoThriftServiceException ex1);
}