Data Conversion
This section covers data type converters and parameter formatters.
Type Converters
- class pyathena.converter.Converter(mappings: Dict[str, Callable[[str | None], Any | None]], default: Callable[[str | None], Any | None] = <function _to_default>, types: Dict[str, Type[Any]] | None = None)
Abstract base class for converting Athena data types to Python objects.
Converters handle the transformation of string values returned by Athena into appropriate Python data types. Different cursor implementations may use different converters to optimize for their specific use cases.
This class provides a framework for mapping Athena data type names to conversion functions and handles the conversion process during result set processing.
- mappings
Dictionary mapping Athena type names to conversion functions.
- default
Default conversion function for unmapped types.
- types
Optional dictionary mapping type names to Python type objects.
- __init__(mappings: Dict[str, Callable[[str | None], Any | None]], default: Callable[[str | None], Any | None] = <function _to_default>, types: Dict[str, Type[Any]] | None = None) → None
- property mappings: Dict[str, Callable[[str | None], Any | None]]
Get the current type conversion mappings.
- Returns:
Dictionary mapping Athena data types to conversion functions.
- property types: Dict[str, Type[Any]]
Get the current type mappings for result set descriptions.
- Returns:
Dictionary mapping Athena data types to Python types.
- get(type_: str) → Callable[[str | None], Any | None]
Get the conversion function for a specific Athena data type.
- Parameters:
type_ – The Athena data type name.
- Returns:
The conversion function for the type, or the default converter if not found.
- set(type_: str, converter: Callable[[str | None], Any | None]) → None
Set a custom conversion function for an Athena data type.
- Parameters:
type_ – The Athena data type name.
converter – The conversion function to use for this type.
- remove(type_: str) → None
Remove a custom conversion function for an Athena data type.
- Parameters:
type_ – The Athena data type name to remove.
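The lookup-with-fallback behaviour described above can be illustrated with a small stand-alone sketch. It does not import pyathena and is not the library's implementation: `SketchConverter`, `to_default`, and `to_int` are hypothetical names, and treating removal of an unmapped type as a no-op is an assumption.

```python
from typing import Any, Callable, Dict, Optional

# Same shape as the conversion callables in the signatures above.
ConvertFn = Callable[[Optional[str]], Optional[Any]]


def to_default(value: Optional[str]) -> Optional[str]:
    """Fallback used for unmapped types: return the raw string unchanged."""
    return value


def to_int(value: Optional[str]) -> Optional[int]:
    """Convert a string to int, passing None (SQL NULL) through."""
    return None if value is None else int(value)


class SketchConverter:
    """Hypothetical stand-in that mimics the get/set/remove interface."""

    def __init__(self, mappings: Dict[str, ConvertFn],
                 default: ConvertFn = to_default) -> None:
        self._mappings = dict(mappings)  # copy so the caller's dict is untouched
        self._default = default

    def get(self, type_: str) -> ConvertFn:
        # Fall back to the default function when the type is not mapped.
        return self._mappings.get(type_, self._default)

    def set(self, type_: str, converter: ConvertFn) -> None:
        self._mappings[type_] = converter

    def remove(self, type_: str) -> None:
        # Assumption: removing an unmapped type is silently ignored.
        self._mappings.pop(type_, None)

    def convert(self, type_: str, value: Optional[str]) -> Optional[Any]:
        return self.get(type_)(value)


converter = SketchConverter({"integer": to_int})
print(converter.convert("integer", "42"))   # mapped type
print(converter.convert("varchar", "abc"))  # unmapped type uses the default
```

Keeping the registry as a plain dictionary keyed by type name means a caller can override a single type without touching the others.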
- class pyathena.converter.DefaultTypeConverter
Default implementation of the Converter for standard Python types.
This converter provides mappings for all standard Athena data types to their corresponding Python types using built-in conversion functions. It’s used by the standard Cursor class by default.
- Supported conversions:
Numeric types: integer, bigint, real, double, decimal
String types: varchar, char
Date/time types: date, timestamp, time (with timezone support)
Boolean: boolean
Binary: varbinary
Complex types: array, map, row/struct
JSON: json
Example
>>> converter = DefaultTypeConverter()
>>> converter.convert('integer', '42')
42
>>> converter.convert('date', '2023-01-15')
datetime.date(2023, 1, 15)
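To make the string-to-Python step concrete, here is a minimal stand-alone sketch of a few conversion functions for some of the types listed above. It is an illustration only and does not reproduce pyathena's own functions: the names `SKETCH_MAPPINGS` and `convert_sketch`, the date format, and the error raised for an unrecognised boolean are all assumptions.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Callable, Dict, Optional


def to_int(value: Optional[str]) -> Optional[int]:
    return None if value is None else int(value)


def to_decimal(value: Optional[str]) -> Optional[Decimal]:
    return None if value is None else Decimal(value)


def to_date(value: Optional[str]) -> Optional[date]:
    # Assumes ISO-style YYYY-MM-DD strings.
    return None if value is None else datetime.strptime(value, "%Y-%m-%d").date()


def to_boolean(value: Optional[str]) -> Optional[bool]:
    if value is None:
        return None
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Assumption: reject anything else rather than guess.
    raise ValueError(f"not a boolean literal: {value!r}")


def to_json(value: Optional[str]) -> Optional[Any]:
    return None if value is None else json.loads(value)


# Hypothetical mapping from Athena type names to the functions above.
SKETCH_MAPPINGS: Dict[str, Callable[[Optional[str]], Optional[Any]]] = {
    "integer": to_int,
    "bigint": to_int,
    "decimal": to_decimal,
    "date": to_date,
    "boolean": to_boolean,
    "json": to_json,
}


def convert_sketch(type_: str, value: Optional[str]) -> Optional[Any]:
    """Convert a raw string; unmapped types are returned unchanged."""
    func = SKETCH_MAPPINGS.get(type_)
    return value if func is None else func(value)


print(convert_sketch("integer", "42"))
print(convert_sketch("date", "2023-01-15"))
```

Every function passes `None` through untouched, so a SQL NULL stays `None` regardless of the column type.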
Parameter Formatters
- class pyathena.formatter.Formatter(mappings: Dict[Type[Any], Callable[[Formatter, Callable[[str], str], Any], Any]], default: Callable[[Formatter, Callable[[str], str], Any], Any] | None = None)
Abstract base class for formatting Python values for SQL queries.
Formatters handle the conversion of Python objects to SQL-compatible string representations for use in parameterized queries. They ensure proper escaping and formatting of values based on their types.
This class provides a framework for mapping Python types to formatting functions and handles the formatting process during query preparation.
- mappings
Dictionary mapping Python types to formatting functions.
- default
Default formatting function for unmapped types.
- __init__(mappings: Dict[Type[Any], Callable[[Formatter, Callable[[str], str], Any], Any]], default: Callable[[Formatter, Callable[[str], str], Any], Any] | None = None) → None
- property mappings: Dict[Type[Any], Callable[[Formatter, Callable[[str], str], Any], Any]]
Get the current parameter formatting mappings.
- Returns:
Dictionary mapping Python types to formatting functions.
- get(type_) → Callable[[Formatter, Callable[[str], str], Any], Any] | None
Get the formatting function for a specific Python type.
- Parameters:
type_ – The Python value to get the formatter for.
- Returns:
The formatting function for the type, or the default formatter if not found.
- set(type_: Type[Any], formatter: Callable[[Formatter, Callable[[str], str], Any], Any]) → None
- update(mappings: Dict[Type[Any], Callable[[Formatter, Callable[[str], str], Any], Any]]) → None
- static wrap_unload(operation: str, s3_staging_dir: str, format_: str = 'PARQUET', compression: str = 'SNAPPY')
Wrap a SELECT query with UNLOAD statement for high-performance result retrieval.
Transforms SELECT or WITH queries into UNLOAD statements that export results directly to S3 in optimized formats (Parquet, ORC) with compression. This approach is significantly faster than standard CSV-based result retrieval for large datasets and preserves data types more accurately.
- Parameters:
operation – SQL query to wrap. Must be a SELECT or WITH statement.
s3_staging_dir – Base S3 directory for storing UNLOAD results.
format_ – Output file format. Defaults to Parquet for optimal performance.
compression – Compression algorithm. Defaults to Snappy for balanced compression ratio and speed.
- Returns:
Tuple containing:
- Modified UNLOAD query string
- S3 location where results will be stored (None if not SELECT/WITH)
Example
>>> query = "SELECT * FROM sales WHERE year = 2023"
>>> unload_query, location = Formatter.wrap_unload(
...     query, "s3://my-bucket/results/"
... )
>>> print(unload_query)
UNLOAD (
    SELECT * FROM sales WHERE year = 2023
)
TO 's3://my-bucket/results/unload/20231215/uuid//'
WITH (
    format = 'PARQUET',
    compression = 'SNAPPY'
)
Note
Only SELECT and WITH statements are wrapped. Other statement types are returned unchanged with location=None.
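The wrapping behaviour described for wrap_unload can be sketched without pyathena as follows. This is an approximation for illustration, not the library's code: `wrap_unload_sketch` is a hypothetical name, the `unload/<date>/<uuid>/` path layout is inferred from the example output above, and the prefix check is deliberately crude.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional, Tuple


def wrap_unload_sketch(
    operation: str,
    s3_staging_dir: str,
    format_: str = "PARQUET",
    compression: str = "SNAPPY",
) -> Tuple[str, Optional[str]]:
    """Wrap a SELECT/WITH query in UNLOAD; return other statements unchanged."""
    stripped = operation.strip()
    # Crude prefix check (assumption): a real parser would be stricter.
    if not stripped.upper().startswith(("SELECT", "WITH")):
        return operation, None

    # Assumed layout: <staging dir>/unload/<YYYYMMDD>/<random uuid>/
    today = datetime.now(timezone.utc).strftime("%Y%m%d")
    location = f"{s3_staging_dir.rstrip('/')}/unload/{today}/{uuid.uuid4()}/"

    query = (
        f"UNLOAD (\n    {stripped}\n)\n"
        f"TO '{location}'\n"
        f"WITH (\n    format = '{format_}',\n    compression = '{compression}'\n)"
    )
    return query, location


unload_query, location = wrap_unload_sketch(
    "SELECT * FROM sales WHERE year = 2023", "s3://my-bucket/results/"
)
print(unload_query)
```

Generating a fresh sub-directory per call keeps the output files of separate queries from mixing in the same S3 prefix.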
- class pyathena.formatter.DefaultParameterFormatter
Default implementation of the Formatter for SQL parameter formatting.
This formatter provides standard formatting for common Python types used in SQL parameters. It handles proper escaping and quoting to prevent SQL injection and ensure valid SQL syntax.
- Supported types:
None: Converts to SQL NULL
Strings: Properly escaped and quoted
Numbers: int, float, Decimal
Dates and times: date, datetime, time
Booleans: Converted to SQL boolean literals
Sequences: list, tuple, set (for IN clauses)
Example
>>> formatter = DefaultParameterFormatter()
>>> sql = formatter.format(
...     "SELECT * FROM users WHERE name = %(name)s AND age > %(age)s",
...     {"name": "John's Data", "age": 25}
... )
>>> print(sql)
SELECT * FROM users WHERE name = 'John''s Data' AND age > 25
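The escaping and quoting rules listed above can be approximated with a short stand-alone sketch. It is not pyathena's formatter: `escape_string`, `format_value`, and `format_sql` are hypothetical helpers, and the exact literals chosen for booleans, dates, and timestamps are assumptions.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Dict


def escape_string(value: str) -> str:
    """Quote a string, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def format_value(value: Any) -> str:
    """Render one Python value as a SQL literal (assumed literal forms)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, datetime):  # must precede date: datetime subclasses date
        # Trim microseconds to milliseconds.
        return "TIMESTAMP '" + value.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3] + "'"
    if isinstance(value, date):
        return "DATE '" + value.isoformat() + "'"
    if isinstance(value, str):
        return escape_string(value)
    if isinstance(value, (list, tuple, set)):
        # Parenthesised list suitable for an IN clause.
        return "(" + ", ".join(format_value(item) for item in value) + ")"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def format_sql(operation: str, parameters: Dict[str, Any]) -> str:
    """Substitute %(name)s placeholders with formatted literals."""
    return operation % {key: format_value(val) for key, val in parameters.items()}


sql = format_sql(
    "SELECT * FROM users WHERE name = %(name)s AND age > %(age)s",
    {"name": "John's Data", "age": 25},
)
print(sql)
```

The `bool` and `datetime` branches come before `int` and `date` because each is a subclass of the other type and would otherwise be formatted by the wrong branch.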