Pandas Series: to_sql() function
Series-to_sql() function
The to_sql() function is used to write records stored in a DataFrame to a SQL database.
Databases supported by SQLAlchemy [1] are supported. Tables can be newly created, appended to, or overwritten.
Syntax:
Series.to_sql(self, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Parameters:
Name | Description | Type/Default Value | Required / Optional |
---|---|---|---|
name | Name of SQL table. | string | Required |
con | Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects. | sqlalchemy.engine.Engine or sqlite3.Connection | Required |
schema | Specify the schema (if database flavor supports this). If None, use default schema. | string | Optional |
if_exists | How to behave if the table already exists.
|
{'fail', 'replace', 'append'}, default 'fail' | Required |
index | Write DataFrame index as a column. Uses index_label as the column name in the table. | bool, default True | Required |
index_lable | Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. | string or sequence, default None | Required |
chunksize | Rows will be written in batches of this size at a time. By default, all rows will be written at once. | int | Optional/td> |
dtype | Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. | dict | Optional |
method | Controls the SQL insertion clause used:
|
{None, ‘multi’, callable}, default None | Required |
dropna | If true, ALL nan rows will not be written to store. | bool, default False | Required |
errors | Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options. | str, default ‘strict’ | Required |
Raises: ValueError
When the table already exists and if_exists is 'fail' (the default).
Notes Timezone aware datetime columns will be written as Timestamp with timezone type with SQLAlchemy if supported by the database. Otherwise, the datetimes will be stored as timezone unaware timestamps local to the original timezone.
Example - Create an in-memory SQLite database:
Create a table from scratch with 3 rows:
Python-Pandas Code:
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df = pd.DataFrame({'name' : ['User A', 'User B', 'User C']})
df
Output:
name 0 User A 1 User B 2 User C
Python-Pandas Code:
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df = pd.DataFrame({'name' : ['User A', 'User B', 'User C']})
df.to_sql('users', con=engine)
engine.execute("SELECT * FROM users").fetchall()
Output:
[(0, 'User A'), (1, 'User B'), (2, 'User C')]
Python-Pandas Code:
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
df1 = pd.DataFrame({'name' : ['User D', 'User E']})
df1.to_sql('users', con=engine, if_exists='append')
engine.execute("SELECT * FROM users").fetchall()
Output:
[(0, 'User A'), (1, 'User B'), (2, 'User C'), (0, 'User D'), (1, 'User E')]
Example - Overwrite the table with just df1:
Python-Pandas Code:
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
df1 = pd.DataFrame({'name' : ['User D', 'User E']})
df1.to_sql('users', con=engine, if_exists='append')
df1.to_sql('users', con=engine, if_exists='replace',
index_label='id')
engine.execute("SELECT * FROM users").fetchall()
Output:
[(0, 'User D'), (1, 'User E')]
Example - Specify the dtype (especially useful for integers with missing values). Notice that while pandas is forced to store the data as floating point, the database supports nullable integers. When fetching the data with Python, we get back integer scalars:
Python-Pandas Code:
import numpy as np
import pandas as pd
df = pd.DataFrame({"X": [2, None, 3]})
df
Output:
X 0 2.0 1 NaN 2 3.0
Python-Pandas Code:
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import Integer
df.to_sql('integers', con=engine, index=False,
dtype={"X": Integer()})
engine.execute("SELECT * FROM integers").fetchall()
Output:
[(2,), (None,), (3,)]
Previous: Series-to_hdf() function
Next: Series-to_json() function
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics