Connection and Cursors¶
This section covers the core connection functionality and basic cursor operations.
Connection¶
- class pyathena.DBAPITypeObject[source]¶
Type Objects and Constructors
https://www.python.org/dev/peps/pep-0249/#type-objects-and-constructors
- pyathena.connect(*args, cursor_class: None = ..., **kwargs) Connection[Cursor] [source]¶
- pyathena.connect(*args, cursor_class: Type[ConnectionCursor], **kwargs) Connection[ConnectionCursor]
Create a new database connection to Amazon Athena.
This function provides the main entry point for establishing connections to Amazon Athena. It follows the DB API 2.0 specification and returns a Connection object that can be used to create cursors for executing SQL queries.
- Parameters:
s3_staging_dir – S3 location to store query results. Required if not using workgroups or if the workgroup doesn’t have a result location.
region_name – AWS region name. If not specified, uses the default region from your AWS configuration.
schema_name – Athena database/schema name. Defaults to “default”.
catalog_name – Athena data catalog name. Defaults to “awsdatacatalog”.
work_group – Athena workgroup name. Can be used instead of s3_staging_dir if the workgroup has a result location configured.
poll_interval – Time in seconds between polling for query completion. Defaults to 1.0.
encryption_option – S3 encryption option for query results. Can be “SSE_S3”, “SSE_KMS”, or “CSE_KMS”.
kms_key – KMS key ID for encryption when using SSE_KMS or CSE_KMS.
profile_name – AWS profile name to use for authentication.
role_arn – ARN of IAM role to assume for authentication.
role_session_name – Session name when assuming a role.
cursor_class – Custom cursor class to use. If not specified, uses the default Cursor class.
kill_on_interrupt – Whether to cancel running queries when interrupted. Defaults to True.
**kwargs – Additional keyword arguments passed to the Connection constructor.
- Returns:
A Connection object that can be used to create cursors and execute queries.
- Raises:
AssertionError – If neither s3_staging_dir nor work_group is provided.
Example
>>> import pyathena >>> conn = pyathena.connect( ... s3_staging_dir='s3://my-bucket/staging/', ... region_name='us-east-1', ... schema_name='mydatabase' ... ) >>> cursor = conn.cursor() >>> cursor.execute("SELECT * FROM mytable LIMIT 10") >>> results = cursor.fetchall()
- class pyathena.connection.Connection(s3_staging_dir: str | None = ..., region_name: str | None = ..., schema_name: str | None = ..., catalog_name: str | None = ..., work_group: str | None = ..., poll_interval: float = ..., encryption_option: str | None = ..., kms_key: str | None = ..., profile_name: str | None = ..., role_arn: str | None = ..., role_session_name: str = ..., external_id: str | None = ..., serial_number: str | None = ..., duration_seconds: int = ..., converter: Converter | None = ..., formatter: Formatter | None = ..., retry_config: RetryConfig | None = ..., cursor_class: None = ..., cursor_kwargs: Dict[str, Any] | None = ..., kill_on_interrupt: bool = ..., session: Session | None = ..., config: Config | None = ..., result_reuse_enable: bool = ..., result_reuse_minutes: int = ..., on_start_query_execution: Callable[[str], None] | None = ..., **kwargs)[source]¶
- class pyathena.connection.Connection(s3_staging_dir: str | None = ..., region_name: str | None = ..., schema_name: str | None = ..., catalog_name: str | None = ..., work_group: str | None = ..., poll_interval: float = ..., encryption_option: str | None = ..., kms_key: str | None = ..., profile_name: str | None = ..., role_arn: str | None = ..., role_session_name: str = ..., external_id: str | None = ..., serial_number: str | None = ..., duration_seconds: int = ..., converter: Converter | None = ..., formatter: Formatter | None = ..., retry_config: RetryConfig | None = ..., cursor_class: Type[ConnectionCursor] = ..., cursor_kwargs: Dict[str, Any] | None = ..., kill_on_interrupt: bool = ..., session: Session | None = ..., config: Config | None = ..., result_reuse_enable: bool = ..., result_reuse_minutes: int = ..., on_start_query_execution: Callable[[str], None] | None = ..., **kwargs)
A DB API 2.0 compliant connection to Amazon Athena.
The Connection class represents a database session and provides methods to create cursors for executing SQL queries against Amazon Athena. It handles authentication, session management, and query result storage in S3.
This class follows the Python Database API Specification v2.0 (PEP 249) and provides a familiar interface for database operations.
- s3_staging_dir¶
S3 location where query results are stored.
- region_name¶
AWS region name.
- schema_name¶
Default database/schema name for queries.
- catalog_name¶
Data catalog name (typically “awsdatacatalog”).
- work_group¶
Athena workgroup name.
- poll_interval¶
Interval in seconds for polling query status.
- encryption_option¶
S3 encryption option for query results.
- kms_key¶
KMS key for encryption when applicable.
- kill_on_interrupt¶
Whether to cancel queries on interrupt signals.
- result_reuse_enable¶
Whether to enable Athena’s result reuse feature.
- result_reuse_minutes¶
Minutes to reuse cached results.
Example
>>> conn = Connection( ... s3_staging_dir='s3://my-bucket/staging/', ... region_name='us-east-1', ... schema_name='mydatabase' ... ) >>> with conn: ... cursor = conn.cursor() ... cursor.execute("SELECT COUNT(*) FROM mytable") ... result = cursor.fetchone()
Note
Either s3_staging_dir or work_group must be specified. If using a workgroup, it must have a result location configured unless s3_staging_dir is also provided.
- __init__(s3_staging_dir: str | None = None, region_name: str | None = None, schema_name: str | None = 'default', catalog_name: str | None = 'awsdatacatalog', work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, profile_name: str | None = None, role_arn: str | None = None, role_session_name: str = 'PyAthena-session-1755337635', external_id: str | None = None, serial_number: str | None = None, duration_seconds: int = 3600, converter: Converter | None = None, formatter: Formatter | None = None, retry_config: RetryConfig | None = None, cursor_class: None = None, cursor_kwargs: Dict[str, Any] | None = None, kill_on_interrupt: bool = True, session: Session | None = None, config: Config | None = None, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None [source]¶
- __init__(s3_staging_dir: str | None = None, region_name: str | None = None, schema_name: str | None = 'default', catalog_name: str | None = 'awsdatacatalog', work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, profile_name: str | None = None, role_arn: str | None = None, role_session_name: str = 'PyAthena-session-1755337635', external_id: str | None = None, serial_number: str | None = None, duration_seconds: int = 3600, converter: Converter | None = None, formatter: Formatter | None = None, retry_config: RetryConfig | None = None, cursor_class: Type[ConnectionCursor] = None, cursor_kwargs: Dict[str, Any] | None = None, kill_on_interrupt: bool = True, session: Session | None = None, config: Config | None = None, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None
Initialize a new Athena database connection.
- Parameters:
s3_staging_dir – S3 location to store query results. Required if not using workgroups or if workgroup doesn’t have result location.
region_name – AWS region name. Uses default region if not specified.
schema_name – Default database/schema name. Defaults to “default”.
catalog_name – Data catalog name. Defaults to “awsdatacatalog”.
work_group – Athena workgroup name. Can substitute for s3_staging_dir if workgroup has result location configured.
poll_interval – Seconds between query status polls. Defaults to 1.0.
encryption_option – S3 encryption for results (“SSE_S3”, “SSE_KMS”, “CSE_KMS”).
kms_key – KMS key ID when using SSE_KMS or CSE_KMS encryption.
profile_name – AWS profile name for authentication.
role_arn – IAM role ARN to assume for authentication.
role_session_name – Session name when assuming IAM role.
external_id – External ID for role assumption (if required by role).
serial_number – MFA device serial number for role assumption.
duration_seconds – Role session duration in seconds. Defaults to 3600.
converter – Custom type converter. Uses DefaultTypeConverter if None.
formatter – Custom parameter formatter. Uses DefaultParameterFormatter if None.
retry_config – Retry configuration for API calls. Uses default if None.
cursor_class – Default cursor class for this connection.
cursor_kwargs – Default keyword arguments for cursor creation.
kill_on_interrupt – Cancel running queries on interrupt. Defaults to True.
session – Pre-configured boto3 Session. Creates new session if None.
config – Boto3 Config object for client configuration.
result_reuse_enable – Enable Athena query result reuse. Defaults to False.
result_reuse_minutes – Minutes to reuse cached results.
on_start_query_execution – Callback function called when query starts.
**kwargs – Additional arguments passed to boto3 Session and client.
- Raises:
AssertionError – If neither s3_staging_dir nor work_group is provided.
Note
Either s3_staging_dir or work_group must be specified. Environment variables AWS_ATHENA_S3_STAGING_DIR and AWS_ATHENA_WORK_GROUP are checked if parameters are not provided.
- property session: Session¶
Get the boto3 session used for AWS API calls.
- Returns:
The configured boto3 Session object.
- property client: BaseClient¶
Get the boto3 Athena client used for query operations.
- Returns:
The configured boto3 Athena client.
- property retry_config: RetryConfig¶
Get the retry configuration for AWS API calls.
- Returns:
The RetryConfig object that controls retry behavior for failed requests.
- __enter__()[source]¶
Enter the runtime context for the connection.
- Returns:
Self for use in context manager protocol.
- __exit__(exc_type, exc_val, exc_tb)[source]¶
Exit the runtime context and close the connection.
- Parameters:
exc_type – Exception type if an exception occurred.
exc_val – Exception value if an exception occurred.
exc_tb – Exception traceback if an exception occurred.
- cursor(cursor: None = None, **kwargs) ConnectionCursor [source]¶
- cursor(cursor: Type[FunctionalCursor], **kwargs) FunctionalCursor
Create a new cursor object for executing queries.
Creates and returns a cursor object that can be used to execute SQL queries against Amazon Athena. The cursor inherits connection settings but can be customized with additional parameters.
- Parameters:
cursor – Custom cursor class to use. If not provided, uses the connection’s default cursor class.
**kwargs – Additional keyword arguments to pass to the cursor constructor. These override connection defaults.
- Returns:
A cursor object that can execute SQL queries.
Example
>>> cursor = connection.cursor() >>> cursor.execute("SELECT * FROM my_table LIMIT 10") >>> results = cursor.fetchall()
# Using a custom cursor type >>> from pyathena.pandas.cursor import PandasCursor >>> pandas_cursor = connection.cursor(PandasCursor) >>> df = pandas_cursor.execute(“SELECT * FROM my_table”).fetchall()
- close() None [source]¶
Close the connection.
Closes the database connection. This method is provided for DB API 2.0 compatibility. Since Athena connections are stateless, this method currently does not perform any actual cleanup operations.
Note
This method is called automatically when using the connection as a context manager (with statement).
- commit() None [source]¶
Commit any pending transaction.
This method is provided for DB API 2.0 compatibility. Since Athena does not support transactions, this method does nothing.
Note
Athena queries are auto-committed and cannot be rolled back.
- rollback() None [source]¶
Rollback any pending transaction.
This method is required by DB API 2.0 but is not supported by Athena since Athena does not support transactions.
- Raises:
NotSupportedError – Always raised since transactions are not supported.
Standard Cursors¶
- class pyathena.cursor.Cursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs)[source]¶
A DB API 2.0 compliant cursor for executing SQL queries on Amazon Athena.
The Cursor class provides methods for executing SQL queries against Amazon Athena and retrieving results. It follows the Python Database API Specification v2.0 (PEP 249) and provides familiar database cursor operations.
This cursor returns results as tuples by default. For other data formats, consider using specialized cursor classes like PandasCursor or ArrowCursor.
- description¶
Sequence of column descriptions for the last query.
- rowcount¶
Number of rows affected by the last query (-1 for SELECT queries).
- arraysize¶
Default number of rows to fetch with fetchmany().
Example
>>> cursor = connection.cursor() >>> cursor.execute("SELECT name, age FROM users WHERE age > %s", (18,)) >>> while True: ... row = cursor.fetchone() ... if not row: ... break ... print(f"Name: {row[0]}, Age: {row[1]}")
>>> cursor.execute("CREATE TABLE test AS SELECT 1 as id, 'test' as name") >>> print(f"Created table, rows affected: {cursor.rowcount}")
- __init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None [source]¶
- property result_set: AthenaResultSet | None¶
Get the result set from the last executed query.
- Returns:
The result set object containing query results, or None if no query has been executed or the query didn’t return results.
- property query_id: str | None¶
Get the Athena query execution ID of the last executed query.
- Returns:
The query execution ID assigned by Athena, or None if no query has been executed.
- property rownumber: int | None¶
Get the current row number within the result set.
- Returns:
The zero-based index of the current row, or None if no result set is available or no rows have been fetched.
- property rowcount: int¶
Get the number of rows affected by the last operation.
For SELECT statements, this returns the total number of rows in the result set. For other operations, behavior follows DB API 2.0 specification.
- Returns:
The number of rows, or -1 if not applicable or unknown.
- close() None [source]¶
Close the cursor and free any associated resources.
Closes the cursor and any associated result sets. This method is provided for DB API 2.0 compatibility and should be called when the cursor is no longer needed.
Note
After calling this method, the cursor should not be used for further database operations.
- execute(operation: str, parameters: Dict[str, Any] | List[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) Cursor [source]¶
Execute a SQL query.
- Parameters:
operation – SQL query string to execute
parameters – Query parameters (optional)
on_start_query_execution – Callback function called immediately after start_query_execution API is called. Function signature: (query_id: str) -> None This allows early access to query_id for monitoring/cancellation.
**kwargs – Additional execution parameters
- Returns:
Self reference for method chaining
- Return type:
- Example with callback for early query ID access:
- def on_execution_started(query_id):
print(f”Query execution started: {query_id}”) # Store query_id for potential cancellation from another thread global current_query_id current_query_id = query_id
- cursor.execute(“SELECT * FROM large_table”,
on_start_query_execution=on_execution_started)
- executemany(operation: str, seq_of_parameters: List[Dict[str, Any] | List[str] | None], **kwargs) None [source]¶
Execute a SQL query multiple times with different parameters.
This method executes the same SQL operation multiple times, once for each parameter set in the sequence. This is useful for bulk operations like inserting multiple rows.
- Parameters:
operation – SQL query string to execute.
seq_of_parameters – Sequence of parameter dictionaries or lists, one for each execution.
**kwargs – Additional keyword arguments passed to each execute() call.
Note
This method executes each query sequentially. For better performance with bulk operations, consider using batch operations where supported. Operations that return result sets are not allowed with executemany.
Example
>>> cursor.executemany( ... "INSERT INTO users (id, name) VALUES (%(id)s, %(name)s)", ... [ ... {"id": 1, "name": "Alice"}, ... {"id": 2, "name": "Bob"}, ... {"id": 3, "name": "Charlie"} ... ] ... )
- cancel() None [source]¶
Cancel the currently executing query.
Cancels the query execution on Amazon Athena. This method can be called from a different thread to interrupt a long-running query.
- Raises:
ProgrammingError – If no query is currently executing (query_id is None).
Example
>>> import threading >>> import time >>> >>> def cancel_after_delay(): ... time.sleep(5) # Wait 5 seconds ... cursor.cancel() >>> >>> # Start cancellation in separate thread >>> threading.Thread(target=cancel_after_delay).start() >>> cursor.execute("SELECT * FROM very_large_table")
- fetchone() Tuple[Any | None, ...] | Dict[Any, Any | None] | None [source]¶
Fetch the next row of a query result set.
Returns the next row of the query result as a tuple, or None when no more data is available. Column values are converted to appropriate Python types based on the Athena data types.
- Returns:
A tuple representing the next row, or None if no more rows.
- Raises:
ProgrammingError – If called before executing a query that returns results.
Example
>>> cursor.execute("SELECT id, name FROM users LIMIT 3") >>> while True: ... row = cursor.fetchone() ... if not row: ... break ... print(f"ID: {row[0]}, Name: {row[1]}")
- fetchmany(size: int | None = None) List[Tuple[Any | None, ...] | Dict[Any, Any | None]] [source]¶
Fetch multiple rows from a query result set.
Returns up to ‘size’ rows from the query result as a list of tuples. If size is not specified, uses the cursor’s arraysize attribute.
- Parameters:
size – Maximum number of rows to fetch. If None, uses arraysize.
- Returns:
List of tuples representing the fetched rows. May contain fewer rows than requested if fewer are available.
- Raises:
ProgrammingError – If called before executing a query that returns results.
Example
>>> cursor.execute("SELECT id, name FROM users") >>> rows = cursor.fetchmany(5) # Fetch up to 5 rows >>> for row in rows: ... print(f"ID: {row[0]}, Name: {row[1]}")
- fetchall() List[Tuple[Any | None, ...] | Dict[Any, Any | None]] [source]¶
Fetch all remaining rows from a query result set.
Returns all remaining rows from the query result as a list of tuples. For large result sets, consider using fetchmany() or iterating with fetchone() to avoid memory issues.
- Returns:
List of tuples representing all remaining rows in the result set.
- Raises:
ProgrammingError – If called before executing a query that returns results.
Example
>>> cursor.execute("SELECT id, name FROM users WHERE active = true") >>> all_rows = cursor.fetchall() >>> print(f"Found {len(all_rows)} active users") >>> for row in all_rows: ... print(f"ID: {row[0]}, Name: {row[1]}")
Warning
Be cautious with large result sets as this loads all data into memory.
- DEFAULT_RESULT_REUSE_MINUTES = 60¶
- LIST_DATABASES_MAX_RESULTS = 50¶
- LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50¶
- LIST_TABLE_METADATA_MAX_RESULTS = 50¶
- property connection: Connection[Any]¶
- static get_default_converter(unload: bool = False) DefaultTypeConverter | Any ¶
Get the default type converter for this cursor class.
- Parameters:
unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.
- Returns:
The default type converter instance for this cursor type.
- get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata ¶
- list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) List[AthenaTableMetadata] ¶
- setinputsizes(sizes)¶
Does nothing by default
- setoutputsize(size, column=None)¶
Does nothing by default
- class pyathena.cursor.DictCursor(**kwargs)[source]¶
A cursor that returns query results as dictionaries instead of tuples.
DictCursor provides the same functionality as the standard Cursor but returns rows as dictionaries where column names are keys. This makes it easier to access column values by name rather than position.
Example
>>> cursor = connection.cursor(DictCursor) >>> cursor.execute("SELECT id, name, email FROM users LIMIT 1") >>> row = cursor.fetchone() >>> print(f"User: {row['name']} ({row['email']})")
>>> cursor.execute("SELECT * FROM products") >>> for row in cursor.fetchall(): ... print(f"Product {row['id']}: {row['name']} - ${row['price']}")
- DEFAULT_RESULT_REUSE_MINUTES = 60¶
- LIST_DATABASES_MAX_RESULTS = 50¶
- LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50¶
- LIST_TABLE_METADATA_MAX_RESULTS = 50¶
- cancel() None ¶
Cancel the currently executing query.
Cancels the query execution on Amazon Athena. This method can be called from a different thread to interrupt a long-running query.
- Raises:
ProgrammingError – If no query is currently executing (query_id is None).
Example
>>> import threading >>> import time >>> >>> def cancel_after_delay(): ... time.sleep(5) # Wait 5 seconds ... cursor.cancel() >>> >>> # Start cancellation in separate thread >>> threading.Thread(target=cancel_after_delay).start() >>> cursor.execute("SELECT * FROM very_large_table")
- close() None ¶
Close the cursor and free any associated resources.
Closes the cursor and any associated result sets. This method is provided for DB API 2.0 compatibility and should be called when the cursor is no longer needed.
Note
After calling this method, the cursor should not be used for further database operations.
- property connection: Connection[Any]¶
- execute(operation: str, parameters: Dict[str, Any] | List[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) Cursor ¶
Execute a SQL query.
- Parameters:
operation – SQL query string to execute
parameters – Query parameters (optional)
on_start_query_execution – Callback function called immediately after start_query_execution API is called. Function signature: (query_id: str) -> None This allows early access to query_id for monitoring/cancellation.
**kwargs – Additional execution parameters
- Returns:
Self reference for method chaining
- Return type:
- Example with callback for early query ID access:
- def on_execution_started(query_id):
print(f”Query execution started: {query_id}”) # Store query_id for potential cancellation from another thread global current_query_id current_query_id = query_id
- cursor.execute(“SELECT * FROM large_table”,
on_start_query_execution=on_execution_started)
- executemany(operation: str, seq_of_parameters: List[Dict[str, Any] | List[str] | None], **kwargs) None ¶
Execute a SQL query multiple times with different parameters.
This method executes the same SQL operation multiple times, once for each parameter set in the sequence. This is useful for bulk operations like inserting multiple rows.
- Parameters:
operation – SQL query string to execute.
seq_of_parameters – Sequence of parameter dictionaries or lists, one for each execution.
**kwargs – Additional keyword arguments passed to each execute() call.
Note
This method executes each query sequentially. For better performance with bulk operations, consider using batch operations where supported. Operations that return result sets are not allowed with executemany.
Example
>>> cursor.executemany( ... "INSERT INTO users (id, name) VALUES (%(id)s, %(name)s)", ... [ ... {"id": 1, "name": "Alice"}, ... {"id": 2, "name": "Bob"}, ... {"id": 3, "name": "Charlie"} ... ] ... )
- fetchall() List[Tuple[Any | None, ...] | Dict[Any, Any | None]] ¶
Fetch all remaining rows from a query result set.
Returns all remaining rows from the query result as a list of tuples. For large result sets, consider using fetchmany() or iterating with fetchone() to avoid memory issues.
- Returns:
List of tuples representing all remaining rows in the result set.
- Raises:
ProgrammingError – If called before executing a query that returns results.
Example
>>> cursor.execute("SELECT id, name FROM users WHERE active = true") >>> all_rows = cursor.fetchall() >>> print(f"Found {len(all_rows)} active users") >>> for row in all_rows: ... print(f"ID: {row[0]}, Name: {row[1]}")
Warning
Be cautious with large result sets as this loads all data into memory.
- fetchmany(size: int | None = None) List[Tuple[Any | None, ...] | Dict[Any, Any | None]] ¶
Fetch multiple rows from a query result set.
Returns up to ‘size’ rows from the query result as a list of tuples. If size is not specified, uses the cursor’s arraysize attribute.
- Parameters:
size – Maximum number of rows to fetch. If None, uses arraysize.
- Returns:
List of tuples representing the fetched rows. May contain fewer rows than requested if fewer are available.
- Raises:
ProgrammingError – If called before executing a query that returns results.
Example
>>> cursor.execute("SELECT id, name FROM users") >>> rows = cursor.fetchmany(5) # Fetch up to 5 rows >>> for row in rows: ... print(f"ID: {row[0]}, Name: {row[1]}")
- fetchone() Tuple[Any | None, ...] | Dict[Any, Any | None] | None ¶
Fetch the next row of a query result set.
Returns the next row of the query result as a tuple, or None when no more data is available. Column values are converted to appropriate Python types based on the Athena data types.
- Returns:
A tuple representing the next row, or None if no more rows.
- Raises:
ProgrammingError – If called before executing a query that returns results.
Example
>>> cursor.execute("SELECT id, name FROM users LIMIT 3") >>> while True: ... row = cursor.fetchone() ... if not row: ... break ... print(f"ID: {row[0]}, Name: {row[1]}")
- static get_default_converter(unload: bool = False) DefaultTypeConverter | Any ¶
Get the default type converter for this cursor class.
- Parameters:
unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.
- Returns:
The default type converter instance for this cursor type.
- get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata ¶
- list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) List[AthenaTableMetadata] ¶
- property query_id: str | None¶
Get the Athena query execution ID of the last executed query.
- Returns:
The query execution ID assigned by Athena, or None if no query has been executed.
- property result_set: AthenaResultSet | None¶
Get the result set from the last executed query.
- Returns:
The result set object containing query results, or None if no query has been executed or the query didn’t return results.
- property rowcount: int¶
Get the number of rows affected by the last operation.
For SELECT statements, this returns the total number of rows in the result set. For other operations, behavior follows DB API 2.0 specification.
- Returns:
The number of rows, or -1 if not applicable or unknown.
- property rownumber: int | None¶
Get the current row number within the result set.
- Returns:
The zero-based index of the current row, or None if no result set is available or no rows have been fetched.
- setinputsizes(sizes)¶
Does nothing by default
- setoutputsize(size, column=None)¶
Does nothing by default
Asynchronous Cursors¶
- class pyathena.async_cursor.AsyncCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs)[source]¶
Asynchronous cursor for non-blocking Athena query execution.
This cursor allows multiple queries to be executed concurrently without blocking the main thread. It’s useful for applications that need to execute multiple queries in parallel or perform other work while queries are running.
The cursor maintains a thread pool for executing queries asynchronously and provides methods to check query status and retrieve results when ready.
- rowcount¶
Number of rows affected by the last query (-1 for SELECT queries).
- arraysize¶
Default number of rows to fetch with fetchmany().
- max_workers¶
Maximum number of worker threads for concurrent execution.
Example
>>> cursor = connection.cursor(AsyncCursor) >>> >>> # Execute multiple queries concurrently >>> future1 = cursor.execute("SELECT COUNT(*) FROM table1") >>> future2 = cursor.execute("SELECT COUNT(*) FROM table2") >>> future3 = cursor.execute("SELECT COUNT(*) FROM table3") >>> >>> # Check if queries are done and get results >>> if future1.done(): ... result1 = future1.result().fetchall() >>> >>> # Wait for all to complete >>> results = [f.result().fetchall() for f in [future1, future2, future3]]
Note
Each execute() call returns a Future object that can be used to check completion status and retrieve results.
- __init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs) None [source]¶
- description(query_id: str) Future[List[Tuple[str, str, None, None, int, int, str]] | None] [source]¶
- query_execution(query_id: str) Future[AthenaQueryExecution] [source]¶
Get query execution details asynchronously.
Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.
- Parameters:
query_id – The Athena query execution ID.
- Returns:
Future object containing AthenaQueryExecution with query details.
- poll(query_id: str) Future[AthenaQueryExecution] [source]¶
Poll for query completion asynchronously.
Waits for the query to complete (succeed, fail, or be cancelled) and returns the final execution status. This method blocks until completion but runs the polling in a background thread.
- Parameters:
query_id – The Athena query execution ID to poll.
- Returns:
Future object containing the final AthenaQueryExecution status.
Note
This method performs polling internally, so it will take time proportional to your query execution duration.
- execute(operation: str, parameters: Dict[str, Any] | List[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) Tuple[str, Future[AthenaResultSet | Any]] [source]¶
Execute a SQL query asynchronously.
Starts query execution on Amazon Athena and returns immediately without waiting for completion. The query runs in the background while your application can continue with other work.
- Parameters:
operation – SQL query string to execute.
parameters – Query parameters (optional).
work_group – Athena workgroup to use (optional).
s3_staging_dir – S3 location for query results (optional).
cache_size – Query result cache size in MB (optional).
cache_expiration_time – Cache expiration time in seconds (optional).
result_reuse_enable – Enable result reuse for identical queries (optional).
result_reuse_minutes – Result reuse duration in minutes (optional).
paramstyle – Parameter style to use (optional).
**kwargs – Additional execution parameters.
- Returns:
query_id: Athena query execution ID for tracking
future: Future object for result retrieval
- Return type:
Tuple of (query_id, future) where
Example
>>> query_id, future = cursor.execute("SELECT * FROM large_table") >>> print(f"Query started: {query_id}") >>> # Do other work while query runs... >>> result_set = future.result() # Wait for completion
- executemany(operation: str, seq_of_parameters: List[Dict[str, Any] | List[str] | None], **kwargs) None [source]¶
Execute multiple queries asynchronously (not supported).
This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.
- Parameters:
operation – SQL query string.
seq_of_parameters – Sequence of parameter sets.
**kwargs – Additional arguments.
- Raises:
NotSupportedError – Always raised as this operation is not supported.
Note
For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.
- cancel(query_id: str) Future[None] [source]¶
Cancel a running query asynchronously.
Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.
- Parameters:
query_id – The Athena query execution ID to cancel.
- Returns:
Future object that completes when the cancellation request finishes.
Example
>>> query_id, future = cursor.execute("SELECT * FROM huge_table") >>> # Later, cancel the query >>> cancel_future = cursor.cancel(query_id) >>> cancel_future.result() # Wait for cancellation to complete
- LIST_DATABASES_MAX_RESULTS = 50¶
- LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50¶
- LIST_TABLE_METADATA_MAX_RESULTS = 50¶
- property connection: Connection[Any]¶
- static get_default_converter(unload: bool = False) DefaultTypeConverter | Any ¶
Get the default type converter for this cursor class.
- Parameters:
unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.
- Returns:
The default type converter instance for this cursor type.
- get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata ¶
- list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) List[AthenaTableMetadata] ¶
- setinputsizes(sizes)¶
Does nothing by default
- setoutputsize(size, column=None)¶
Does nothing by default
- class pyathena.async_cursor.AsyncDictCursor(**kwargs)[source]¶
Asynchronous cursor that returns query results as dictionaries.
Combines the asynchronous execution capabilities of AsyncCursor with the dictionary-based result format of DictCursor. Results are returned as dictionaries where column names are keys, making it easier to access column values by name rather than position.
Example
>>> cursor = connection.cursor(AsyncDictCursor) >>> future = cursor.execute("SELECT id, name, email FROM users") >>> result_cursor = future.result() >>> row = result_cursor.fetchone() >>> print(f"User: {row['name']} ({row['email']})")
- LIST_DATABASES_MAX_RESULTS = 50¶
- LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50¶
- LIST_TABLE_METADATA_MAX_RESULTS = 50¶
- cancel(query_id: str) Future[None] ¶
Cancel a running query asynchronously.
Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.
- Parameters:
query_id – The Athena query execution ID to cancel.
- Returns:
Future object that completes when the cancellation request finishes.
Example
>>> query_id, future = cursor.execute("SELECT * FROM huge_table") >>> # Later, cancel the query >>> cancel_future = cursor.cancel(query_id) >>> cancel_future.result() # Wait for cancellation to complete
- property connection: Connection[Any]¶
- execute(operation: str, parameters: Dict[str, Any] | List[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) Tuple[str, Future[AthenaResultSet | Any]] ¶
Execute a SQL query asynchronously.
Starts query execution on Amazon Athena and returns immediately without waiting for completion. The query runs in the background while your application can continue with other work.
- Parameters:
operation – SQL query string to execute.
parameters – Query parameters (optional).
work_group – Athena workgroup to use (optional).
s3_staging_dir – S3 location for query results (optional).
cache_size – Query result cache size in MB (optional).
cache_expiration_time – Cache expiration time in seconds (optional).
result_reuse_enable – Enable result reuse for identical queries (optional).
result_reuse_minutes – Result reuse duration in minutes (optional).
paramstyle – Parameter style to use (optional).
**kwargs – Additional execution parameters.
- Returns:
query_id: Athena query execution ID for tracking
future: Future object for result retrieval
- Return type:
Tuple of (query_id, future) where
Example
>>> query_id, future = cursor.execute("SELECT * FROM large_table") >>> print(f"Query started: {query_id}") >>> # Do other work while query runs... >>> result_set = future.result() # Wait for completion
- executemany(operation: str, seq_of_parameters: List[Dict[str, Any] | List[str] | None], **kwargs) None ¶
Execute multiple queries asynchronously (not supported).
This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.
- Parameters:
operation – SQL query string.
seq_of_parameters – Sequence of parameter sets.
**kwargs – Additional arguments.
- Raises:
NotSupportedError – Always raised as this operation is not supported.
Note
For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.
- static get_default_converter(unload: bool = False) DefaultTypeConverter | Any ¶
Get the default type converter for this cursor class.
- Parameters:
unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.
- Returns:
The default type converter instance for this cursor type.
- get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata ¶
- list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) List[AthenaTableMetadata] ¶
- poll(query_id: str) Future[AthenaQueryExecution] ¶
Poll for query completion asynchronously.
Waits for the query to complete (succeed, fail, or be cancelled) and returns the final execution status. This method blocks until completion but runs the polling in a background thread.
- Parameters:
query_id – The Athena query execution ID to poll.
- Returns:
Future object containing the final AthenaQueryExecution status.
Note
This method performs polling internally, so it will take time proportional to your query execution duration.
- query_execution(query_id: str) Future[AthenaQueryExecution] ¶
Get query execution details asynchronously.
Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.
- Parameters:
query_id – The Athena query execution ID.
- Returns:
Future object containing AthenaQueryExecution with query details.
- setinputsizes(sizes)¶
Does nothing by default
- setoutputsize(size, column=None)¶
Does nothing by default