After using SQLAlchemy for over two years across various projects, I can say it has been an excellent tool despite its occasional quirks. Like any ORM, it comes with trade-offs, but what sets SQLAlchemy apart is its flexibility in handling those limitations: it offers everything from high-level ORM access down to low-level query handling, and lets the developer choose the approach that fits their problem. Let me share my benchmarks and insights from real-world usage.
Performance Benchmarks
Query Building Performance
Batch Size | SQLAlchemy Core (s) | Raw SQL (s) | SQLAlchemy ORM (s) |
---|---|---|---|
1,000 | 0.026 | 0.002 | 0.053 |
10,000 | 0.268 | 0.028 | 0.404 |
100,000 | 2.614 | 0.245 | 4.017 |
1,000,000 | 27.266 | 2.873 | 47.627 |
Note that these times cover only the SQL string building step: the query builder adds overhead while constructing the statement that will actually be executed. Converting result rows back into ORM objects adds further overhead on the way out. For maximum raw performance, it’s best to drop to a lower-level approach and write efficient queries directly.
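To make the query-building overhead concrete, here is a minimal sketch of how that step can be timed on its own. The `customers` table, its columns, and the helper functions are hypothetical stand-ins for illustration, not the actual benchmark code:

```python
import time

from sqlalchemy import Column, Integer, MetaData, String, Table, select

metadata = MetaData()

# Hypothetical table used only for this timing sketch
customers = Table(
    "customers",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(255)),
)


def time_core_build(n: int) -> float:
    """Time building and compiling n Core SELECT constructs into SQL strings."""
    start = time.perf_counter()
    for i in range(n):
        stmt = select(customers).where(customers.c.id == i)
        str(stmt)  # compile the construct into a SQL string
    return time.perf_counter() - start


def time_raw_build(n: int) -> float:
    """Time producing n equivalent raw SQL strings for comparison."""
    start = time.perf_counter()
    for _ in range(n):
        _sql = "SELECT id, name FROM customers WHERE id = :id"
    return time.perf_counter() - start


print(f"Core build: {time_core_build(1000):.4f}s, raw build: {time_raw_build(1000):.4f}s")
```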
Large Result Sets
Number of Iterations: 1,000,000
Description | MySQL (sec) | PostgreSQL (sec) |
---|---|---|
Load rows using DBAPI fetchall(), don’t make any objects | 5.23 | 1.31 |
Load rows using DBAPI fetchall(), generate an object for each row | 5.52 | 1.51 |
Load individual columns into named tuples using the ORM | 6.61 | 2.48 |
Load Core result rows using fetchall | 6.88 | 3.17 |
Load Core result rows using fetchmany/streaming | 6.96 | 3.39 |
Load lightweight “bundle” objects using the ORM | 7.61 | 3.53 |
Load Core result rows using Core / fetchmany | 6.82 | 3.59 |
Load Core result rows using fetchall | 6.99 | 3.86 |
Load fully tracked ORM objects a chunk at a time using yield_per() | 10.09 | 8.53 |
Load fully tracked ORM objects into one big list() | 13.77 | 15.05 |
The table above shows that the ORM struggles with large result sets, while raw DBAPI calls are the fastest: the ORM has to construct and track a Python object for every row, and that per-row overhead dominates as the result set grows.
Code Snippets for Loading Large Result Sets
- Load rows using DBAPI fetchall(), don’t make any objects
- Load rows using DBAPI fetchall(), generate an object for each row
- Load individual columns into named tuples using the ORM
- Load Core result rows using fetchall
- Load Core result rows using fetchmany/streaming
- Load lightweight “bundle” objects using the ORM
- Load Core result rows using Core / fetchmany
- Load Core result rows using fetchall
- Load fully tracked ORM objects a chunk at a time using yield_per()
- Load fully tracked ORM objects into one big list
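As a minimal sketch of a few of these approaches (assuming a hypothetical `Customer` model mapped to a `customer` table, a PostgreSQL `engine`, and SQLAlchemy 2.0-style APIs, none of which are the exact benchmark code):

```python
from sqlalchemy import create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

# Hypothetical connection URL and model; adjust to your own schema.
engine = create_engine("postgresql+psycopg2://user:password@localhost/benchdb")


class Base(DeclarativeBase):
    pass


class Customer(Base):
    __tablename__ = "customer"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    description: Mapped[str]


# 1. Raw DBAPI fetchall(), no objects created
conn = engine.raw_connection()
try:
    cursor = conn.cursor()
    cursor.execute("SELECT id, name, description FROM customer")
    rows = cursor.fetchall()  # plain tuples straight from the driver
finally:
    conn.close()

# 2. Core result rows via fetchall()
with engine.connect() as connection:
    rows = connection.execute(select(Customer.__table__)).fetchall()

# 3. Individual columns loaded into named tuples using the ORM
with Session(engine) as session:
    rows = session.execute(select(Customer.id, Customer.name)).all()

# 4. Fully tracked ORM objects, streamed a chunk at a time with yield_per
with Session(engine) as session:
    for customer in session.scalars(
        select(Customer).execution_options(yield_per=1_000)
    ):
        pass  # each row is a fully tracked ORM instance

# 5. Fully tracked ORM objects loaded into one big list
with Session(engine) as session:
    customers = session.scalars(select(Customer)).all()
```

The DBAPI path returns plain tuples with no identity-map bookkeeping, which is why it tops the table above.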
Bulk Inserts
Description | 1K (s) | 10K (s) | 100K (s) | 1M (s) |
---|---|---|---|---|
Batched INSERT statements via the ORM in “bulk”, returning new Customer objects | 0.036 | 0.757 | 8.212 | 91.571 |
A single Core INSERT construct inserting mappings in bulk | 0.112 | 0.393 | 2.876 | 34.276 |
The DBAPI’s API inserting rows in bulk | 0.115 | 0.356 | 3.728 | 35.063 |
Batched INSERT statements via the ORM in “bulk”, not returning rows | 0.073 | 0.382 | 3.887 | 41.563 |
INSERT statements via the ORM (batched with RETURNING if available), fetching generated row id | 0.045 | 0.817 | 8.452 | 101.481 |
Batched INSERT statements via the ORM, PKs already defined | 0.157 | 1.042 | 10.584 | 111.096 |
Inserting rows is a relatively expensive operation, and performance degrades significantly as the row count grows. I usually prefer batched inserts, which save memory and are faster than inserting rows one at a time.
Code Snippets for Bulk Inserts
- Batched INSERT statements via the ORM in “bulk”, returning new Customer objects
- A single Core INSERT construct inserting mappings in bulk
- The DBAPI’s API inserting rows in bulk
- Batched INSERT statements via the ORM in “bulk”, not returning rows
- INSERT statements via the ORM (batched with RETURNING if available), fetching generated row id
- Batched INSERT statements via the ORM, PKs already defined
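As with the loading examples, here is a rough sketch (not the exact benchmark code) of the main bulk insert strategies, reusing the hypothetical `Customer` model and `engine` from above; the `%s` placeholders in the DBAPI variant assume a psycopg2-style driver:

```python
from sqlalchemy import insert
from sqlalchemy.orm import Session

rows = [{"name": f"customer {i}", "description": "test"} for i in range(100_000)]

# 1. Batched ORM insert, not returning rows (runs as an executemany under the hood)
with Session(engine) as session:
    session.execute(insert(Customer), rows)
    session.commit()

# 2. Batched ORM insert returning the new Customer objects (requires RETURNING support)
with Session(engine) as session:
    new_customers = session.scalars(insert(Customer).returning(Customer), rows).all()
    session.commit()

# 3. A single Core INSERT construct inserting the mappings in bulk
with engine.begin() as connection:
    connection.execute(insert(Customer.__table__), rows)

# 4. The DBAPI's executemany(), bypassing SQLAlchemy's statement compilation entirely
conn = engine.raw_connection()
try:
    cursor = conn.cursor()
    cursor.executemany(
        "INSERT INTO customer (name, description) VALUES (%s, %s)",
        [(r["name"], r["description"]) for r in rows],
    )
    conn.commit()
finally:
    conn.close()
```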
Choosing the Appropriate Abstraction Level
Based on these benchmarks and the trade-offs they expose, here are the approaches I recommend for different scenarios:
- Raw SQL/DBAPI: Optimal for maximum performance in bulk operations
- SQLAlchemy Core: Recommended for bulk operations requiring SQL generation benefits
- ORM Bulk Operations: Suitable for batch operations requiring ORM features
- Full ORM: Best for complex operations requiring complete object functionality
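These levels are not mutually exclusive, which is where SQLAlchemy’s flexibility shows. A sketch of dropping from the ORM to textual SQL within the same session (again using the hypothetical `Customer` model from above):

```python
from sqlalchemy import text
from sqlalchemy.orm import Session

with Session(engine) as session:
    # Full ORM where object behaviour and change tracking matter
    customer = session.get(Customer, 1)
    customer.description = "updated via the ORM"

    # ...and plain textual SQL from the same session where raw speed matters
    total = session.execute(
        text("SELECT count(*) FROM customer WHERE description = :d"),
        {"d": "test"},
    ).scalar_one()

    session.commit()
```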
Data Loading Performance Analysis
PostgreSQL benchmark results demonstrate the following performance patterns:
- Raw DBAPI fetch: 1.31 seconds
- ORM Column Loading: 2.48 seconds
- Core Operations: ~3.5 seconds
- Full ORM Objects: 8.53-15.05 seconds
All of the analysis and benchmarks above were produced using SQLAlchemy’s example scripts.