Utilities and Configuration
This section covers utility functions, retry configuration, and helper classes.
Retry Configuration
- class pyathena.util.RetryConfig(exceptions: Iterable[str] = ('ThrottlingException', 'TooManyRequestsException'), attempt: int = 5, multiplier: int = 1, max_delay: int = 100, exponential_base: int = 2)
Configuration for automatic retry behavior on failed API calls.
This class configures how PyAthena handles transient failures when communicating with AWS services. It uses exponential backoff with customizable parameters to retry failed operations.
- exceptions
List of AWS exception names to retry on.
- attempt
Maximum number of retry attempts.
- multiplier
Base multiplier for exponential backoff.
- max_delay
Maximum delay between retries in seconds.
- exponential_base
Base for exponential backoff calculation.
Example
>>> import pyathena
>>> from pyathena.util import RetryConfig
>>>
>>> # Default retry configuration
>>> retry_config = RetryConfig()
>>>
>>> # Custom retry configuration
>>> custom_retry = RetryConfig(
...     exceptions=["ThrottlingException", "ServiceUnavailableException"],
...     attempt=10,
...     max_delay=60
... )
>>>
>>> # Use with connection
>>> conn = pyathena.connect(
...     s3_staging_dir="s3://bucket/path/",
...     retry_config=custom_retry
... )
Note
Retries are applied to AWS API calls, not to SQL query execution. Query failures typically require manual intervention or query fixes.
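The interaction between multiplier, exponential_base, and max_delay can be made concrete with a small calculation. The sketch below assumes the common capped formula multiplier * exponential_base ** (n - 1) for the n-th retry; the helper name backoff_delay is hypothetical, and the exact formula PyAthena applies internally is not stated in this section, so treat the numbers as illustrative.

```python
def backoff_delay(retry_number, multiplier=1, max_delay=100, exponential_base=2):
    """Return the wait in seconds before the given retry (1-based).

    Assumed formula: multiplier * exponential_base ** (retry_number - 1),
    capped at max_delay. This is an illustration, not PyAthena's own code.
    """
    if retry_number < 1:
        raise ValueError("retry_number must be >= 1")
    return min(multiplier * exponential_base ** (retry_number - 1), max_delay)


# Waits before the first eight retries under the default parameters.
default_schedule = [backoff_delay(n) for n in range(1, 9)]
print(default_schedule)  # [1, 2, 4, 8, 16, 32, 64, 100]

# The same schedule with max_delay=60, as in the custom configuration above.
capped_schedule = [backoff_delay(n, max_delay=60) for n in range(1, 9)]
print(capped_schedule)  # [1, 2, 4, 8, 16, 32, 60, 60]
```

Under this assumption, lowering max_delay only changes the tail of the schedule; the early retries stay short either way.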
Utility Functions
- pyathena.util.retry_api_call(func: Callable[[...], Any], config: RetryConfig, logger: Logger | None = None, *args, **kwargs) → Any
Execute a function with automatic retry logic for AWS API calls.
This function wraps AWS API calls with retry behavior based on the provided configuration. It uses exponential backoff and only retries on specific AWS exceptions that indicate transient failures.
- Parameters:
func – The AWS API function to call.
config – RetryConfig instance specifying retry behavior.
logger – Optional logger for retry attempt logging.
*args – Positional arguments to pass to the function.
**kwargs – Keyword arguments to pass to the function.
- Returns:
The result of the successful function call.
- Raises:
The original exception, if all retry attempts are exhausted.
Example
>>> from pyathena.util import RetryConfig, retry_api_call
>>> # `client` and `logger` are assumed to be defined elsewhere
>>> config = RetryConfig(attempt=3, max_delay=30)
>>> result = retry_api_call(
...     client.describe_table,
...     config=config,
...     logger=logger,
...     TableName="my_table"
... )
Note
Only retries on AWS exceptions listed in the RetryConfig.exceptions. Does not retry on client errors or non-AWS exceptions.
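The selective behavior described here (retry listed exceptions, re-raise everything else, re-raise the original error once attempts run out) can be sketched without any AWS dependency. Everything in this block is hypothetical: FakeAwsError stands in for an AWS client error, retry_call is not PyAthena's function, and the delay formula is the same assumed capped exponential as in the RetryConfig discussion.

```python
import time


class FakeAwsError(Exception):
    """Hypothetical stand-in for an AWS client error carrying an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def retry_call(func, retryable_codes, attempt=5, multiplier=1, max_delay=100,
               exponential_base=2, sleep=time.sleep, **kwargs):
    """Call func(**kwargs), retrying only on errors whose code is retryable."""
    for attempt_number in range(1, attempt + 1):
        try:
            return func(**kwargs)
        except FakeAwsError as error:
            out_of_attempts = attempt_number == attempt
            if error.code not in retryable_codes or out_of_attempts:
                raise  # Re-raise the original exception unchanged.
            delay = min(multiplier * exponential_base ** (attempt_number - 1), max_delay)
            sleep(delay)


calls = []
waits = []


def flaky_describe(TableName):
    """Fail twice with a throttling error, then succeed."""
    calls.append(TableName)
    if len(calls) < 3:
        raise FakeAwsError("ThrottlingException")
    return {"Table": TableName}


result = retry_call(
    flaky_describe,
    retryable_codes=("ThrottlingException", "TooManyRequestsException"),
    attempt=3,
    sleep=waits.append,  # Record the waits instead of actually sleeping.
    TableName="my_table",
)
print(result, len(calls), waits)
```

Passing the sleep function as a parameter is a design choice for this sketch only; it lets the example run instantly while still showing which delays would have been applied.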
- pyathena.util.parse_output_location(output_location: str) → Tuple[str, str]
Parse an S3 output location URL into bucket and key components.
- Parameters:
output_location – S3 URL in format ‘s3://bucket-name/path/to/object’
- Returns:
Tuple of (bucket_name, object_key)
- Raises:
DataError – If the output_location format is invalid.
Example
>>> from pyathena.util import parse_output_location
>>> bucket, key = parse_output_location("s3://my-bucket/results/query.csv")
>>> print(bucket)  # "my-bucket"
>>> print(key)  # "results/query.csv"
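For readers who want to see how such a split can work, here is a minimal standalone sketch using a regular expression. The function name split_s3_location and the pattern are assumptions for illustration, and the sketch raises ValueError where the documented function raises DataError, so that it runs without PyAthena installed.

```python
import re

# Assumed pattern: "s3://", a bucket name, "/", then a non-empty key.
S3_LOCATION = re.compile(r"^s3://(?P<bucket>[a-zA-Z0-9.\-_]+)/(?P<key>.+)$")


def split_s3_location(output_location):
    """Split an S3 URL into (bucket, key); raise ValueError if malformed."""
    match = S3_LOCATION.match(output_location)
    if match is None:
        raise ValueError(f"Unknown output location format: {output_location!r}")
    return match.group("bucket"), match.group("key")


bucket, key = split_s3_location("s3://my-bucket/results/query.csv")
print(bucket)  # my-bucket
print(key)     # results/query.csv
```

With this pattern, a URL that names a bucket but no object key (for example "s3://my-bucket") is rejected rather than returned with an empty key.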
- pyathena.util.strtobool(val)
Convert a string representation of truth to 1 (true) or 0 (false).
This function replaces the deprecated distutils.util.strtobool function. Like the original, it returns the integers 1 and 0 rather than the booleans True and False.
- Parameters:
val – String representation of a boolean value.
- Returns:
1 for True values, 0 for False values.
- Raises:
ValueError – If the input string is not a recognized boolean representation.
Example
>>> from pyathena.util import strtobool
>>> strtobool("yes")  # 1
>>> strtobool("false")  # 0
>>> strtobool("invalid")  # ValueError
Note
True values: y, yes, t, true, on, 1 (case-insensitive)
False values: n, no, f, false, off, 0 (case-insensitive)
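The rules in the note translate directly into a lookup against two sets. The following is a minimal sketch of that logic under the documented value lists; the name to_bool_int is hypothetical, and the error message wording is an assumption.

```python
TRUE_VALUES = {"y", "yes", "t", "true", "on", "1"}
FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def to_bool_int(val):
    """Return 1 for a true-like string, 0 for a false-like one."""
    normalized = val.lower()  # Matching is case-insensitive.
    if normalized in TRUE_VALUES:
        return 1
    if normalized in FALSE_VALUES:
        return 0
    raise ValueError(f"invalid truth value {val!r}")


print(to_bool_int("YES"))  # 1
print(to_bool_int("off"))  # 0
```

Because the result is an int, callers that need a real boolean can wrap the call in bool().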
Common Base Classes
- class pyathena.common.CursorIterator(**kwargs)
Abstract base class providing iteration and result fetching capabilities for cursors.
This mixin class provides common functionality for iterating through query results and managing cursor state. It implements the iterator protocol and provides standard fetch methods that conform to the DB API 2.0 specification.
- DEFAULT_RESULT_REUSE_MINUTES
Default minutes for Athena result reuse (60).
- arraysize
Number of rows to fetch with fetchmany() if size not specified.
Note
This is an abstract base class used by concrete cursor implementations. It should not be instantiated directly.
- DEFAULT_RESULT_REUSE_MINUTES = 60
- class pyathena.common.BaseCursor(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs)
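To illustrate what "implements the iterator protocol and provides standard fetch methods" means in DB API 2.0 terms, here is a toy cursor backed by an in-memory list. ListBackedCursor is a hypothetical class written for this example; it is not how CursorIterator is implemented, only a sketch of the same contract.

```python
class ListBackedCursor:
    """Hypothetical cursor that serves rows from an in-memory list."""

    def __init__(self, rows, arraysize=2):
        self._rows = iter(rows)
        self.arraysize = arraysize

    def fetchone(self):
        """Return the next row, or None when the rows are exhausted."""
        return next(self._rows, None)

    def fetchmany(self, size=None):
        """Return up to `size` rows, defaulting to self.arraysize."""
        limit = self.arraysize if size is None else size
        rows = []
        for _ in range(limit):
            row = self.fetchone()
            if row is None:
                break
            rows.append(row)
        return rows

    def fetchall(self):
        """Return all remaining rows."""
        return list(self._rows)

    def __iter__(self):
        return self

    def __next__(self):
        row = self.fetchone()
        if row is None:
            raise StopIteration
        return row


cursor = ListBackedCursor([(1,), (2,), (3,), (4,), (5,)])
first = cursor.fetchone()       # (1,)
batch = cursor.fetchmany()      # [(2,), (3,)] because arraysize is 2
rest = [row for row in cursor]  # [(4,), (5,)]
print(first, batch, rest)
```

All three fetch styles share one underlying position, so mixing them, as above, never returns the same row twice.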
Abstract base class for all PyAthena cursor implementations.
This class provides the foundational functionality for executing SQL queries and calculations on Amazon Athena. It handles AWS API interactions, query execution management, metadata operations, and result polling.
All concrete cursor implementations (Cursor, DictCursor, PandasCursor, ArrowCursor, SparkCursor, AsyncCursor) inherit from this base class and implement the abstract methods according to their specific use cases.
- LIST_QUERY_EXECUTIONS_MAX_RESULTS
Maximum results per query listing API call (50).
- LIST_TABLE_METADATA_MAX_RESULTS
Maximum results per table metadata API call (50).
- LIST_DATABASES_MAX_RESULTS
Maximum results per database listing API call (50).
- Key Features:
Query execution and polling with configurable retry logic
Table and database metadata operations
Result caching and reuse capabilities
Encryption and security configuration support
Workgroup and catalog management
Query cancellation and interruption handling
Example
This is an abstract base class and should not be instantiated directly. Use concrete implementations like Cursor or PandasCursor instead:
>>> cursor = connection.cursor()  # Creates default Cursor
>>> cursor.execute("SELECT * FROM my_table")
>>> results = cursor.fetchall()
Note
This class contains AWS service quotas as constants. These limits are enforced by the AWS Athena service and should not be modified.
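A per-call cap of 50 implies that listing more than 50 items takes several requests chained by a continuation token. The sketch below shows that pattern against a fake in-memory API; fake_list_page, list_all, and the integer token are all hypothetical, and whether a given PyAthena method pages automatically is not stated in this section.

```python
MAX_RESULTS_PER_CALL = 50  # Mirrors the documented per-call cap.


def fake_list_page(items, max_results, next_token=None):
    """Hypothetical paged API: return (page, next_token or None)."""
    start = next_token or 0
    end = start + max_results
    token = end if end < len(items) else None
    return items[start:end], token


def list_all(items, max_results=None):
    """Collect every item, requesting at most 50 per call."""
    page_size = min(max_results or MAX_RESULTS_PER_CALL, MAX_RESULTS_PER_CALL)
    collected, token, calls = [], None, 0
    while True:
        page, token = fake_list_page(items, page_size, token)
        collected.extend(page)
        calls += 1
        if token is None:
            return collected, calls


databases = [f"db_{i}" for i in range(120)]
names, call_count = list_all(databases)
print(len(names), call_count)  # 120 3
```

Requesting a larger page size than the cap has no effect in this sketch, since the size is clamped to 50 before the first call.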
- LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
- LIST_TABLE_METADATA_MAX_RESULTS = 50
- LIST_DATABASES_MAX_RESULTS = 50
- __init__(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs) → None
- static get_default_converter(unload: bool = False) → DefaultTypeConverter | Any
Get the default type converter for this cursor class.
- Parameters:
unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.
- Returns:
The default type converter instance for this cursor type.
- property connection: Connection[Any]
- list_databases(catalog_name: str | None, max_results: int | None = None) → List[AthenaDatabase]
- get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) → AthenaTableMetadata
- list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) → List[AthenaTableMetadata]
- abstract execute(operation: str, parameters: Dict[str, Any] | List[str] | None = None, **kwargs)