SQLAlchemy Performance on Query Builder, Selects and Inserts

22 Nov 24

python, sqlalchemy, database


After using SQLAlchemy for over two years across various projects, I can say it’s been an amazing tool despite its occasional quirks. Like any ORM, it comes with trade-offs, but what sets SQLAlchemy apart is its flexibility in working around those limitations: it gives developers everything from high-level ORM access down to low-level Core and raw SQL, and lets them choose the right level for the problem at hand. Let me share my benchmarks and insights from real-world usage.
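For example, the same SELECT can be written at any of these levels. A minimal sketch, assuming a mapped Customer model whose customer table already exists (the connection URL is a placeholder):

    from sqlalchemy import create_engine, select, text
    from sqlalchemy.orm import Session

    engine = create_engine("postgresql://user:pass@localhost/test")  # placeholder URL

    with Session(engine) as session:
        # ORM level: fully tracked Customer objects
        customers = session.scalars(select(Customer).limit(10)).all()

        # Core level: plain Row tuples, no identity map or change tracking
        rows = session.execute(
            select(Customer.__table__.c.id, Customer.__table__.c.name).limit(10)
        ).all()

        # Raw SQL level: a hand-written statement on the same connection
        raw = session.execute(text("SELECT id, name FROM customer LIMIT 10")).all()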

Performance Benchmarks

Query Building Performance

| Batch Size | SQLAlchemy Core (sec) | Raw SQL (sec) | SQLAlchemy ORM (sec) |
| --- | --- | --- | --- |
| 1000 | 0.025565147 | 0.00173068 | 0.05320406 |
| 10000 | 0.267536163 | 0.027742147 | 0.404067755 |
| 100000 | 2.614121437 | 0.245116472 | 4.017444611 |
| 1000000 | 27.26590657 | 2.872803688 | 47.62686253 |

Note that these times represent only the SQL string-building step: the query builder adds overhead while constructing the SQL statement that will eventually be executed.

Converting results back to ORM objects also creates overhead. For maximum raw performance, it’s best to use lower-level approaches and write efficient queries.
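The string-building cost is easy to observe in isolation. Here is a rough sketch (not the exact script behind the table above) that times compiling a Core SELECT into SQL text against plain string formatting:

    import time

    from sqlalchemy import Column, Integer, MetaData, String, Table, select

    metadata = MetaData()
    customer = Table(
        "customer",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(255)),
    )

    n = 10_000

    # Query builder: compile the Core construct to a SQL string on each iteration
    start = time.perf_counter()
    for i in range(n):
        str(select(customer).where(customer.c.id == i))
    print("core build:", time.perf_counter() - start)

    # Raw SQL: plain string formatting (illustration only; never interpolate
    # untrusted values into SQL)
    start = time.perf_counter()
    for i in range(n):
        f"SELECT customer.id, customer.name FROM customer WHERE customer.id = {i}"
    print("raw build:", time.perf_counter() - start)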

Large Result Sets

Number of Iterations: 1,000,000

| Description | MySQL (sec) | PostgreSQL (sec) |
| --- | --- | --- |
| Load rows using DBAPI fetchall(), don’t make any objects | 5.23 | 1.31 |
| Load rows using DBAPI fetchall(), generate an object for each row | 5.52 | 1.51 |
| Load individual columns into named tuples using the ORM | 6.61 | 2.48 |
| Load Core result rows as mappings using fetchall() | 6.88 | 3.17 |
| Load Core result rows using fetchmany() with streaming | 6.96 | 3.39 |
| Load lightweight “bundle” objects using the ORM | 7.61 | 3.53 |
| Load Core result rows using fetchmany() | 6.82 | 3.59 |
| Load Core result rows using fetchall() | 6.99 | 3.86 |
| Load fully tracked ORM objects a chunk at a time using yield_per() | 10.09 | 8.53 |
| Load fully tracked ORM objects into one big list() | 13.77 | 15.05 |

From the table above, we can see that the ORM struggles with large result sets, while raw DBAPI calls are the fastest. The ORM has to construct, populate, and track a Python object for every row, and that bookkeeping adds significant overhead.

Code Snippets for Loading Large Result Sets

  1. Load rows using DBAPI fetchall(), don’t make any objects

    cursor.execute(sql)
    [row[0:3] for row in cursor.fetchall()]
  2. Load rows using DBAPI fetchall(), generate an object for each row

    # SimpleCustomer here is a plain Python class, not an ORM-mapped one
    cursor.execute(sql)
    [
       SimpleCustomer(id_=row[0], name=row[1], description=row[2])
       for row in cursor.fetchall()
    ]
  3. Load individual columns into named tuples using the ORM

    # iterating this query is what actually fetches the rows, 10,000 at a time
    (
       session.query(Customer.id, Customer.name, Customer.description)
          .yield_per(10000)
          .limit(n)
    )
  4. Load Core result rows as mappings using fetchall()

    (
       conn.execute(Customer.__table__.select().limit(n))
          .mappings()
          .fetchall()
    )
  5. Load Core result rows using fetchmany() with streaming

    # with stream_results=True the rows are then consumed via fetchmany()
    # in fixed-size batches instead of being buffered all at once
    (
       conn.execution_options(stream_results=True)
          .execute(Customer.__table__.select().limit(n))
    )
  6. Load lightweight “bundle” objects using the ORM

    # Bundle is imported from sqlalchemy.orm; rows carry lightweight namedtuple-like bundles
    (
       session.query(
          Bundle("customer", Customer.id, Customer.name, Customer.description)
       ).yield_per(10000)
    )
  7. Load Core result rows using fetchmany()

    result = conn.execute(Customer.__table__.select().limit(n))
    # fetchmany() is called repeatedly until it returns no more rows
    result.fetchmany(10000)
  8. Load Core result rows using fetchall()

    conn.execute(Customer.__table__.select().limit(n)).fetchall()
  9. Load fully tracked ORM objects a chunk at a time using yield_per()

    # iterating the query materializes ORM objects 1,000 at a time
    session.query(Customer).yield_per(1000).limit(n)
  10. Load fully tracked ORM objects into one big list

    list(session.query(Customer).limit(n))

Bulk Inserts

| Description | 1K rows (s) | 10K rows (s) | 100K rows (s) | 1M rows (s) |
| --- | --- | --- | --- | --- |
| Batched INSERT statements via the ORM in “bulk”, returning new Customer objects | 0.036 | 0.757 | 8.212 | 91.571 |
| A single Core INSERT construct inserting mappings in bulk | 0.112 | 0.393 | 2.876 | 34.276 |
| The DBAPI’s API inserting rows in bulk | 0.115 | 0.356 | 3.728 | 35.063 |
| Batched INSERT statements via the ORM in “bulk”, not returning rows | 0.073 | 0.382 | 3.887 | 41.563 |
| INSERT statements via the ORM (batched with RETURNING if available), fetching generated row id | 0.045 | 0.817 | 8.452 | 101.481 |
| Batched INSERT statements via the ORM, PKs already defined | 0.157 | 1.042 | 10.584 | 111.096 |

Inserting rows is a relatively expensive operation, and performance degrades significantly as the number of rows grows. I usually prefer batched inserts, since they keep memory usage bounded and are faster than inserting rows one at a time.
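In practice that means feeding a single insert() construct with the parameters in fixed-size chunks. A minimal sketch, assuming the Customer model used throughout this post and an open Session; the chunk size of 10,000 is an arbitrary choice:

    from sqlalchemy import insert

    def bulk_insert_customers(session, rows, chunk_size=10_000):
        """Insert an iterable of parameter dicts in fixed-size batches."""
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) >= chunk_size:
                # one executemany-style INSERT per batch
                session.execute(insert(Customer), batch)
                batch = []
        if batch:
            session.execute(insert(Customer), batch)
        session.commit()

    bulk_insert_customers(
        session,
        (
            {"name": "customer %d" % i, "description": "customer %d" % i}
            for i in range(1_000_000)
        ),
    )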

Code Snippets for Bulk Inserts

  1. Batched INSERT statements via the ORM in “bulk”, returning new Customer objects

    # insert() is sqlalchemy.insert(); returning(Customer) yields the new ORM objects
    session.scalars(
       insert(Customer).returning(Customer),
       [
          {"name": "customer %d" % i, "description": "customer %d" % i}
          for i in range(n)
       ],
    )
  2. A single Core INSERT construct inserting mappings in bulk

    conn.execute(
       Customer.__table__.insert(),
       [
          {"name": "customer %d" % i, "description": "customer %d" % i}
          for i in range(n)
       ],
    )
  3. The DBAPI’s API inserting rows in bulk

    # `compiled` is the Core INSERT statement pre-compiled for the target dialect
    cursor.executemany(
       str(compiled),
       [("customer %d" % i, "customer %d" % i) for i in range(n)],
    )
  4. Batched INSERT statements via the ORM in “bulk”, not returning rows

    session.execute(
       insert(Customer),
       [
          {"name": "customer %d" % i, "description": "customer %d" % i}
          for i in range(n)
       ],
    )
  5. INSERT statements via the ORM (batched with RETURNING if available), fetching generated row id

    # executed inside a loop over chunk = 0, 1000, 2000, ... up to n
    session.add_all(
       [
          Customer(name="customer %d" % i, description="customer %d" % i)
          for i in range(chunk, chunk + 1000)
       ]
    )
    session.flush()
  6. Batched INSERT statements via the ORM, PKs already defined

    # executed inside a loop over chunk = 0, 1000, 2000, ... up to n
    session.add_all(
       [
          Customer(
             id=i + 1,
             name="customer %d" % i,
             description="customer %d" % i,
          )
          for i in range(chunk, chunk + 1000)
       ]
    )
    session.flush()

Choosing the Appropriate Abstraction Level

Based on the benchmarks above and the trade-offs they expose, here are the recommended approaches for different scenarios (see the sketch after this list for how the levels combine):

  • Raw SQL/DBAPI: Optimal for maximum performance in bulk operations
  • SQLAlchemy Core: Recommended for bulk operations requiring SQL generation benefits
  • ORM Bulk Operations: Suitable for batch operations requiring ORM features
  • Full ORM: Best for complex operations requiring complete object functionality
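These levels also compose: the same Session can run ORM queries, Core constructs, and raw SQL, so you can drop down to a lower level only for the hot paths. A small sketch, again assuming the Customer model and an open session:

    from sqlalchemy import select, text

    # Hot path: skip ORM object construction and fetch plain rows
    rows = session.execute(
        select(Customer.id, Customer.name).where(Customer.id < 1000)
    ).all()

    # Everything else: keep full ORM convenience in the same transaction
    customer = session.get(Customer, 42)

    # Or hand-tune a single query as raw SQL where it matters
    total = session.execute(text("SELECT count(*) FROM customer")).scalar_one()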

Data Loading Performance Analysis

PostgreSQL benchmark results demonstrate the following performance patterns:

  • Raw DBAPI fetch: 1.31 seconds
  • ORM Column Loading: 2.48 seconds
  • Core Operations: 3.17-3.86 seconds
  • Full ORM Objects: 8.53-15.05 seconds

All of the analysis and benchmarks above were produced using SQLAlchemy’s performance example scripts.
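If you want to reproduce the numbers, the suites live under examples/performance in the SQLAlchemy source tree and can be run as modules; the flags below are the ones documented for recent releases and may differ on your version:

    # from a SQLAlchemy source checkout
    python -m examples.performance large_resultsets --dburl postgresql://user:pass@localhost/test
    python -m examples.performance bulk_inserts --num 100000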

© 2025 Written by Md Nahid Hasan