We are proud to announce the initial release of PyMongoArrow, a companion library to PyMongo that makes it easy to move data between MongoDB and Python’s scientific and numerical computing libraries such as pandas, PyArrow, and NumPy.
PyMongoArrow extends PyMongo’s APIs and makes it possible to materialize query result sets as pandas DataFrames:
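>>> from pymongoarrow.monkey import patch_all
>>> patch_all()  # adds the find_*_all methods to PyMongo's Collection
>>> from pymongoarrow.api import Schema
>>> schema = Schema({'_id': int, 'qty': float})
>>> data_frame = client.db.test.find_pandas_all({'qty': {'$gt': 5}}, schema=schema)
>>> data_frame
   _id   qty
0    1  25.4
1    2  16.9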
Similar APIs facilitate loading result sets as PyArrow Tables:
>>> arrow_table = client.db.test.find_arrow_all({'qty': {'$gt': 5}}, schema=schema)
>>> arrow_table
pyarrow.Table
_id: int64
qty: double
As well as NumPy ndarrays:
>>> ndarrays = client.db.test.find_numpy_all({'qty': {'$gt': 5}}, schema=schema)
>>> ndarrays
{'_id': array([1, 2]), 'qty': array([25.4, 16.9])}
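Conceptually, each of these calls does two things: it returns only the documents that match the query filter, and it lays the matching documents out column by column, one typed column per schema field. The pure-Python sketch below illustrates that row-to-column idea. The helper `to_columns`, the sample `documents`, and the plain-dict `schema` are hypothetical stand-ins for illustration only. This is not PyMongoArrow's implementation, and with the real library the filter is evaluated by the MongoDB server.

```python
# Illustrative sketch only: `to_columns`, `documents`, and this dict-based
# `schema` are hypothetical stand-ins, not PyMongoArrow's API or internals.
from typing import Any, Dict, List

# Made-up sample documents standing in for the contents of client.db.test.
documents = [
    {"_id": 1, "qty": 25.4},
    {"_id": 2, "qty": 16.9},
    {"_id": 3, "qty": 2.3},
]

# Field name -> Python type of the resulting column.
schema = {"_id": int, "qty": float}


def to_columns(docs: List[Dict[str, Any]], schema: Dict[str, type],
               field: str, threshold: float) -> Dict[str, List[Any]]:
    """Keep docs where doc[field] > threshold (like {field: {'$gt': threshold}}),
    then gather each schema field into its own column."""
    matching = [d for d in docs
                if d.get(field) is not None and d[field] > threshold]
    columns: Dict[str, List[Any]] = {name: [] for name in schema}
    for doc in matching:
        for name, typ in schema.items():
            value = doc.get(name)
            # Missing fields become None; present values are converted
            # to the column's type.
            columns[name].append(None if value is None else typ(value))
    return columns


print(to_columns(documents, schema, "qty", 5))
```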
Installation
Wheels are available on PyPI for macOS and Linux platforms on x86_64 architectures.
$ python -m pip install pymongoarrow
Links
- Documentation: mongo-arrow.readthedocs.io
- Source: github.com/mongodb-labs/mongo-arrow
- Running into problems? Open a GitHub Issue