Utilities and Configuration

This section covers utility functions, retry configuration, and helper classes.

Retry Configuration

class pyathena.util.RetryConfig(exceptions: Iterable[str] = ('ThrottlingException', 'TooManyRequestsException'), attempt: int = 5, multiplier: int = 1, max_delay: int = 100, exponential_base: int = 2)

Configuration for automatic retry behavior on failed API calls.

This class configures how PyAthena handles transient failures when communicating with AWS services. It uses exponential backoff with customizable parameters to retry failed operations.

exceptions

List of AWS exception names to retry on.

attempt

Maximum number of retry attempts.

multiplier

Base multiplier for exponential backoff.

max_delay

Maximum delay between retries in seconds.

exponential_base

Base for exponential backoff calculation.

Example

>>> import pyathena
>>> from pyathena.util import RetryConfig
>>>
>>> # Default retry configuration
>>> retry_config = RetryConfig()
>>>
>>> # Custom retry configuration
>>> custom_retry = RetryConfig(
...     exceptions=["ThrottlingException", "ServiceUnavailableException"],
...     attempt=10,
...     max_delay=60
... )
>>>
>>> # Use with connection
>>> conn = pyathena.connect(
...     s3_staging_dir="s3://bucket/path/",
...     retry_config=custom_retry
... )

Note

Retries are applied to AWS API calls, not to SQL query execution. Query failures typically require manual intervention or query fixes.

__init__(exceptions: Iterable[str] = ('ThrottlingException', 'TooManyRequestsException'), attempt: int = 5, multiplier: int = 1, max_delay: int = 100, exponential_base: int = 2) -> None
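The parameters above combine into a capped exponential schedule. The sketch below is not PyAthena's implementation. It assumes the wait before retry n is multiplier * exponential_base ** n, capped at max_delay, and that attempt counts total calls, so there are attempt - 1 waits. The helper name backoff_delays is hypothetical.

```python
def backoff_delays(attempt=5, multiplier=1, max_delay=100, exponential_base=2):
    """Return the assumed waits, in seconds, between consecutive attempts."""
    # Assumption: `attempt` counts total calls, so there are attempt - 1 waits.
    return [
        min(multiplier * exponential_base ** n, max_delay)
        for n in range(attempt - 1)
    ]


default_delays = backoff_delays()
custom_delays = backoff_delays(attempt=10, max_delay=60)
print(default_delays)  # [1, 2, 4, 8]
print(custom_delays)   # [1, 2, 4, 8, 16, 32, 60, 60, 60]
```

Under these assumptions, lowering max_delay shortens only the tail of the schedule. The early retries are unaffected.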

Utility Functions

pyathena.util.retry_api_call(func: Callable[[...], Any], config: RetryConfig, logger: Logger | None = None, *args, **kwargs) -> Any

Execute a function with automatic retry logic for AWS API calls.

This function wraps AWS API calls with retry behavior based on the provided configuration. It uses exponential backoff and only retries on specific AWS exceptions that indicate transient failures.

Parameters:
  • func – The AWS API function to call.

  • config – RetryConfig instance specifying retry behavior.

  • logger – Optional logger for retry attempt logging.

  • *args – Positional arguments to pass to the function.

  • **kwargs – Keyword arguments to pass to the function.

Returns:

The result of the successful function call.

Raises:

The original exception if all retry attempts are exhausted.

Example

>>> from pyathena.util import RetryConfig, retry_api_call
>>> # Assumes `client` (an AWS service client) and `logger` already exist.
>>> config = RetryConfig(attempt=3, max_delay=30)
>>> result = retry_api_call(
...     client.describe_table,
...     config=config,
...     logger=logger,
...     TableName="my_table"
... )

Note

Only AWS exceptions listed in RetryConfig.exceptions are retried. Other client errors and non-AWS exceptions are raised immediately.
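The behaviour described above can be sketched as a plain retry loop. This is a simplified stand-in rather than PyAthena's code. FakeClientError, retry_sketch, and flaky are hypothetical names, and the error-code lookup assumes the exception carries a botocore-style response dictionary.

```python
import time


class FakeClientError(Exception):
    """Hypothetical stand-in for an AWS client error with an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def retry_sketch(func, *args, exceptions, attempt, base_delay=0.0, **kwargs):
    """Call func, retrying only when the error code is listed in `exceptions`."""
    for n in range(attempt):
        try:
            return func(*args, **kwargs)
        except FakeClientError as e:
            code = e.response["Error"]["Code"]
            # Re-raise non-retryable errors, and the last error once attempts run out.
            if code not in exceptions or n == attempt - 1:
                raise
            time.sleep(base_delay * 2 ** n)  # exponential backoff between tries


calls = {"n": 0}


def flaky(table_name):
    """Fail twice with a throttling error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeClientError("ThrottlingException")
    return f"metadata for {table_name}"


result = retry_sketch(
    flaky, "my_table", exceptions=("ThrottlingException",), attempt=5
)
print(result, calls["n"])  # metadata for my_table 3
```

The sketch makes the retry options keyword-only so that positional arguments reach func unchanged. The real retry_api_call instead takes config and logger ahead of *args.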

pyathena.util.parse_output_location(output_location: str) -> Tuple[str, str]

Parse an S3 output location URL into bucket and key components.

Parameters:

output_location – S3 URL in format ‘s3://bucket-name/path/to/object’

Returns:

Tuple of (bucket_name, object_key)

Raises:

DataError – If the output_location format is invalid.

Example

>>> from pyathena.util import parse_output_location
>>> bucket, key = parse_output_location("s3://my-bucket/results/query.csv")
>>> print(bucket)
my-bucket
>>> print(key)
results/query.csv
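For readers who want to see the parsing step in isolation, here is a minimal standalone sketch. The regular expression and the name parse_output_location_sketch are assumptions for illustration. The sketch raises ValueError where the function above is documented to raise DataError.

```python
import re

# Assumed pattern: "s3://", a bucket name, "/", then a non-empty object key.
_S3_URL = re.compile(r"^s3://(?P<bucket>[^/]+)/(?P<key>.+)$")


def parse_output_location_sketch(output_location):
    """Split an S3 URL into (bucket, key), or raise ValueError if malformed."""
    match = _S3_URL.match(output_location)
    if match is None:
        # PyAthena documents DataError here; ValueError keeps this sketch standalone.
        raise ValueError(f"Unknown output_location format: {output_location!r}")
    return match.group("bucket"), match.group("key")


bucket, key = parse_output_location_sketch("s3://my-bucket/results/query.csv")
print(bucket, key)  # my-bucket results/query.csv
```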

pyathena.util.strtobool(val)

Convert a string representation of truth to True or False.

This function replaces the deprecated distutils.util.strtobool method. It converts string representations of boolean values to actual boolean values.

Parameters:

val – String representation of a boolean value.

Returns:

1 for True values, 0 for False values.

Raises:

ValueError – If the input string is not a recognized boolean representation.

Example

>>> from pyathena.util import strtobool
>>> strtobool("yes")  # 1
>>> strtobool("false")  # 0
>>> strtobool("invalid")  # raises ValueError

Note

True values: y, yes, t, true, on, 1 (case-insensitive).
False values: n, no, f, false, off, 0 (case-insensitive).
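The mapping in the note can be sketched as two lookup sets. This is an illustrative re-implementation under the rules stated above, not the library's source. The name strtobool_sketch is hypothetical.

```python
_TRUE_VALUES = {"y", "yes", "t", "true", "on", "1"}
_FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def strtobool_sketch(val):
    """Return 1 or 0 for a recognized truth string, else raise ValueError."""
    lowered = val.lower()  # matching is case-insensitive
    if lowered in _TRUE_VALUES:
        return 1
    if lowered in _FALSE_VALUES:
        return 0
    raise ValueError(f"invalid truth value {val!r}")


print(strtobool_sketch("YES"), strtobool_sketch("off"))  # 1 0
```

Because the result is an int rather than a bool, wrap it in bool() when a strict boolean is needed, for example when reading a flag from an environment variable.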

Common Base Classes

class pyathena.common.CursorIterator(**kwargs)

Abstract base class providing iteration and result fetching capabilities for cursors.

This mixin class provides common functionality for iterating through query results and managing cursor state. It implements the iterator protocol and provides standard fetch methods that conform to the DB API 2.0 specification.

DEFAULT_FETCH_SIZE

Default number of rows to fetch per request (1000).

Type:

int

DEFAULT_RESULT_REUSE_MINUTES

Default minutes for Athena result reuse (60).

arraysize

Number of rows to fetch with fetchmany() if size not specified.

Note

This is an abstract base class used by concrete cursor implementations. It should not be instantiated directly.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
__init__(**kwargs) -> None
property arraysize: int
property rownumber: int | None
property rowcount: int
abstract fetchone()
abstract fetchmany()
abstract fetchall()
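To illustrate how the fetch methods and the iterator protocol fit together, here is a small in-memory sketch. ListCursor is a hypothetical class that does not inherit from CursorIterator. It only mirrors the DB API 2.0 behaviour described above, with fetchmany() falling back to arraysize when no size is given.

```python
class ListCursor:
    """Hypothetical in-memory cursor; it does not inherit from PyAthena."""

    DEFAULT_FETCH_SIZE = 1000

    def __init__(self, rows, arraysize=DEFAULT_FETCH_SIZE):
        self._rows = list(rows)
        self._pos = 0  # number of rows consumed so far
        self.arraysize = arraysize

    @property
    def rownumber(self):
        return self._pos

    def fetchone(self):
        """Return the next row, or None when the rows are exhausted."""
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def fetchmany(self, size=None):
        """Return up to `size` rows, defaulting to arraysize."""
        size = self.arraysize if size is None else size
        rows = []
        for _ in range(size):
            row = self.fetchone()
            if row is None:
                break
            rows.append(row)
        return rows

    def fetchall(self):
        """Return every remaining row."""
        return list(self)

    def __iter__(self):
        return self

    def __next__(self):
        row = self.fetchone()
        if row is None:
            raise StopIteration
        return row


cursor = ListCursor([(1,), (2,), (3,), (4,), (5,)], arraysize=2)
first = cursor.fetchone()
batch = cursor.fetchmany()
rest = cursor.fetchall()
print(first, batch, rest, cursor.rownumber)  # (1,) [(2,), (3,)] [(4,), (5,)] 5
```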

class pyathena.common.BaseCursor(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs)

Abstract base class for all PyAthena cursor implementations.

This class provides the foundational functionality for executing SQL queries and calculations on Amazon Athena. It handles AWS API interactions, query execution management, metadata operations, and result polling.

All concrete cursor implementations (Cursor, DictCursor, PandasCursor, ArrowCursor, SparkCursor, AsyncCursor) inherit from this base class and implement the abstract methods according to their specific use cases.

LIST_QUERY_EXECUTIONS_MAX_RESULTS

Maximum results per query listing API call (50).

LIST_TABLE_METADATA_MAX_RESULTS

Maximum results per table metadata API call (50).

LIST_DATABASES_MAX_RESULTS

Maximum results per database listing API call (50).

Key Features:
  • Query execution and polling with configurable retry logic

  • Table and database metadata operations

  • Result caching and reuse capabilities

  • Encryption and security configuration support

  • Workgroup and catalog management

  • Query cancellation and interruption handling

Example

This is an abstract base class and should not be instantiated directly. Use concrete implementations like Cursor or PandasCursor instead:

>>> cursor = connection.cursor()  # Creates default Cursor
>>> cursor.execute("SELECT * FROM my_table")
>>> results = cursor.fetchall()

Note

This class defines Amazon Athena service quotas as constants. The limits are enforced by the Athena service and should not be modified.
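Because each listing call returns at most 50 items, a caller that wants every item has to page through the results. The sketch below shows that pattern against a hypothetical in-memory service. list_all, fake_fetch_page, and the string token format are assumptions for illustration and are not part of PyAthena.

```python
def list_all(fetch_page, max_results=50):
    """Collect every item by following continuation tokens page by page."""
    items, token = [], None
    while True:
        page, token = fetch_page(max_results, token)
        items.extend(page)
        if token is None:  # no further pages
            return items


# Hypothetical in-memory "service" holding 120 table names.
TABLES = [f"table_{i}" for i in range(120)]


def fake_fetch_page(max_results, token):
    """Return one page of names plus the token for the next page, if any."""
    start = int(token) if token else 0
    end = start + max_results
    next_token = str(end) if end < len(TABLES) else None
    return TABLES[start:end], next_token


all_tables = list_all(fake_fetch_page)
print(len(all_tables))  # 120
```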

LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
LIST_DATABASES_MAX_RESULTS = 50
__init__(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs) -> None
static get_default_converter(unload: bool = False) -> DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

property connection: Connection[Any]
list_databases(catalog_name: str | None, max_results: int | None = None) -> List[AthenaDatabase]
get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) -> AthenaTableMetadata
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) -> List[AthenaTableMetadata]
abstract execute(operation: str, parameters: Dict[str, Any] | List[str] | None = None, **kwargs)
abstract executemany(operation: str, seq_of_parameters: List[Dict[str, Any] | List[str] | None], **kwargs) -> None
abstract close() -> None
setinputsizes(sizes)

Does nothing by default.

setoutputsize(size, column=None)

Does nothing by default.