As we do need to connect with Database using python when we do the data science or do data engineering, there are many ways that we could connect with them, for Mysql, there will be PyMysql, I have used it many times, but most time will combine with Pandas so that we could get a DataFrame for later step use case, but I find another excellent library called: Records named: SQL for humans, we are the humans 😄.
I have to say that for a Python developer, there are so many great modules that we could use for our daily life, how great the Python world is! No more words, let’s get started with Records library.
First of all, you have to install it with pip is recommended, but maybe create a virtual environment is a better way to not affect the global environment. There are many ways to create a virtual environment with python like conda create ** or python -m venv **, I use later one for now.
python -m venv records
Then you have to activate the environment, then we could use it. For now, I use Windows:
If you are in Mac or Linux, use this one, remember to enter the folder: records first, then type the command:
We could install the records library now:)
pip install records
After a moment, then we have already installed the records. Now let’s try to use it with records, I will just use ipython to interact with records, but in the real project, we could use other IDE like PyCharm(My favorite).
One word to provide with records is with the connection string, for example, with my local mysql server will just provide with: [mysql://root:password@localhost/person], the format is:
Then we need to instant the database object with the command:
db = records.Database("mysql://root:password@localhost/person")
That’s it. We have connected with Mysql, then we could use some queries with function query, you could write SQL here.
rows = db.query("select * from users limit 3")
We could inspect with rows object, it is a RecordCollection object that contains each row, for each row represent a Record object, in fact, we could make it as a Dictionary object, if we need the key to be ordered, we could just pass ordered=True, then will be an OrderedDictionary object ordered by key.
We could get some sample data for evidence that we do get records from the database, or even get the whole records list with function all().
The most important thing is that we could export the result into a DataFrame with just one line, then we could do anything with Pandas DataFrame as we can. Also sometimes we maybe need a JSON, YAML object from the Database or even with excel, that’s also really easy, with just a configuration, that’s it. Currently will support with csv tsv json yaml html xls xlsx dbf latex ods format for export, you could find it here.
There are many interesting functions that we could use, you could infer the official guide with records Github.
If you just need to use the records, then that’s it, you could try it yourself. But I also walk through the source code with records, please join me with the walk step with these functionalities of records.
First, we should init Database object first with db = records.Database(host), host object is just the connection string, default records use sqlalchemy to do the connection with the database, so if we provide with a connection string, records will first create the connection, one more thing, the connection isn’t checked when you provide with host, username, password and database, it’s lazy, will check when to execute the query, if you provide with any wrong parameter, then will get an error, you could find source code here.
Then we already get the connection, we could execute the query, then we will provide with query string for querying, like select * from table, will trigger the query function based on the connection:
Based on the connection class function query, will get the generator object of each row from the database, then will wrap the whole list into a RecordCollection, that’s a generator of Record class, whole logic to get sample data like first() or whole data all() happens here. The Record class is like a dictionary class with keys and values. It supports querying key with __getitem__ function, if we don’t find the key, it will get a key error, otherwise, we will get the value with the index key.
I think that’s all for the records library. In fact, records support many other connections like RedShift, Postgres, MySQL, SQLite, Oracle, and MS-SQL (drivers not included). Currently, I just put the MySQL database, others should be similar.
Happy coding with records. 😄