This Week in Databend #91
April 30, 2023 · 4 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com.
What's On In Databend
Stay connected with the latest news about Databend.
Data Type: BITMAP
Databend has added support for the bitmap datatype.
BITMAP is a compressed data structure that efficiently stores and manipulates sets of integers, representing each member of the set as a single bit. It is often used to accelerate count distinct.
> CREATE TABLE IF NOT EXISTS t1(id Int, v Bitmap) Engine = Fuse;
> INSERT INTO t1 (id, v) VALUES(1, to_bitmap('0, 1')),(2, to_bitmap('1, 2')),(3, to_bitmap('3, 4'));
> SELECT id, to_string(v) FROM t1;
┌──────────────────────┐
│   id  │ to_string(v) │
│ Int32 │    String    │
├───────┼──────────────┤
│     1 │ 0,1          │
│     2 │ 1,2          │
│     3 │ 3,4          │
└──────────────────────┘
We used RoaringTreemap, a compressed bitmap that holds u64 values, to implement the BITMAP data type. By utilizing this data structure, we expect to achieve better performance and reduced memory usage compared to other bitmap implementations.
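To make the link between bitmaps and count distinct concrete, here is a minimal sketch in plain Rust. It is not Databend's code and does not use the real RoaringTreemap type: `ToyTreemap` and its methods are hypothetical names, and an uncompressed `BTreeSet` stands in for the compressed containers a Roaring implementation would use. It only illustrates the general idea of splitting a u64 into a high 32-bit key and a low 32-bit entry, so that a union of bitmaps followed by a cardinality lookup answers a count distinct query.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for a 64-bit "treemap" bitmap (hypothetical, for illustration).
/// Each u64 is split into a high 32-bit key and a low 32-bit entry.
/// A real Roaring implementation stores each container in compressed form;
/// a BTreeSet is used here only to keep the sketch short.
#[derive(Default, Clone, Debug)]
pub struct ToyTreemap {
    containers: BTreeMap<u32, BTreeSet<u32>>,
}

impl ToyTreemap {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert a value; returns true if it was not already present.
    pub fn insert(&mut self, value: u64) -> bool {
        let hi = (value >> 32) as u32;
        let lo = value as u32;
        self.containers.entry(hi).or_default().insert(lo)
    }

    /// Check whether a value is a member of the set.
    pub fn contains(&self, value: u64) -> bool {
        let hi = (value >> 32) as u32;
        let lo = value as u32;
        self.containers.get(&hi).map_or(false, |c| c.contains(&lo))
    }

    /// Number of distinct values stored, i.e. the "count distinct" answer.
    pub fn len(&self) -> u64 {
        self.containers.values().map(|c| c.len() as u64).sum()
    }

    /// In-place union: merge every container of `other` into `self`.
    pub fn union_with(&mut self, other: &ToyTreemap) {
        for (hi, container) in &other.containers {
            self.containers
                .entry(*hi)
                .or_default()
                .extend(container.iter().copied());
        }
    }
}
```

With the values from the table above, the union of the bitmaps for rows 1 and 2 ({0, 1} and {1, 2}) has a cardinality of 3, because the duplicate 1 is stored only once.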
If you are interested in learning more, please check out the resources listed below.
- PR #11097 | feat: add bitmap data type
- Website | Roaring Bitmaps
- Paper | Consistently faster and smaller compressed bitmaps with Roaring
Improving Hash Join Performance with a New Hash Table Design
Our previous hash table implementation was optimized for aggregation functions, but it significantly limited the performance of hash join operations.
To improve hash join performance, we implemented a dedicated hash table optimized for joins. The table is allocated once, at a fixed size based on the number of rows in the build stage, so it never needs to grow during insertion. We also changed the value type of the hash table from Vec to a pointer that can be updated with CAS operations, which keeps memory usage under control and eliminates the cost of growing a Vec.
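The design described above can be sketched as follows. This is a simplified illustration, not Databend's implementation: `JoinTable`, its fields, and the hash function are hypothetical, and row indices stored in `AtomicUsize` slots stand in for the raw pointers mentioned in the text. The sketch keeps the two properties that matter: the table is sized once from the build-side row count, and each bucket holds a single word that threads update with compare-and-swap instead of pushing into a growable Vec.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sentinel meaning "no row"; real row indices are always smaller.
const EMPTY: usize = usize::MAX;

/// Hypothetical build-side table for a hash join. Sized once, never grows.
pub struct JoinTable {
    /// heads[b] is the index of the most recently inserted row in bucket b.
    heads: Vec<AtomicUsize>,
    /// next[r] links row r to the row that was the bucket head before it.
    next: Vec<AtomicUsize>,
    /// Join key of each build-side row.
    keys: Vec<u64>,
    mask: usize,
}

impl JoinTable {
    /// Allocate everything up front from the known number of build rows.
    pub fn new(keys: Vec<u64>) -> Self {
        let buckets = keys.len().max(1).next_power_of_two();
        JoinTable {
            heads: (0..buckets).map(|_| AtomicUsize::new(EMPTY)).collect(),
            next: (0..keys.len()).map(|_| AtomicUsize::new(EMPTY)).collect(),
            keys,
            mask: buckets - 1,
        }
    }

    /// Placeholder multiplicative hash (an assumption, chosen for brevity).
    fn bucket(&self, key: u64) -> usize {
        (key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize & self.mask
    }

    /// Link `row` into its bucket. Takes `&self`, so several build threads
    /// can insert concurrently; conflicts are resolved by retrying the CAS.
    pub fn insert(&self, row: usize) {
        let head = &self.heads[self.bucket(self.keys[row])];
        let mut current = head.load(Ordering::Acquire);
        loop {
            // Point the new row at the head we observed, then try to publish it.
            self.next[row].store(current, Ordering::Relaxed);
            match head.compare_exchange_weak(current, row, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return,
                Err(observed) => current = observed,
            }
        }
    }

    /// Walk the bucket chain and collect the build rows whose key matches.
    pub fn probe(&self, key: u64) -> Vec<usize> {
        let mut matches = Vec::new();
        let mut row = self.heads[self.bucket(key)].load(Ordering::Acquire);
        while row != EMPTY {
            if self.keys[row] == key {
                matches.push(row);
            }
            row = self.next[row].load(Ordering::Acquire);
        }
        matches
    }
}
```

Because every slot is preallocated, an insert performs no allocation at all: it writes one link and retries one CAS until it wins. Build threads can each call `insert` on a disjoint range of rows, and `probe` is then run once the build stage has finished.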
The new implementation showed significant improvement in performance. If you are interested in learning more, please check out the resources listed below.
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Rust Compilation Challenges and Solutions
Compiling a medium to large Rust program is not a breeze due to the accumulation of complex project dependencies and boilerplate code.
To address these challenges, the Databend team implemented several measures, including observability tools, configuration adjustments, caching, linker optimization, compile-related profiles, and refactoring.
If you are interested in learning more, please check out the resources listed below.
Highlights
Here are some noteworthy items from this week. Perhaps you can find something that interests you.
- Databend has announced its participation in OSPP 2023 projects. For more information about OSPP, visit OSPP2023 - Databend.
- To develop applications with Databend using Rust, refer to Docs | Developing with Databend using Rust and utilize databend-driver.
- Learn to manage and query databases with ease using BendSQL, a powerful command-line tool for Databend. Check out Docs | BendSQL now!
- Check out Docs | Loading from a Stage and Docs | Loading from a Bucket to learn more about loading data from stages and object storage buckets.
- Added table-meta-inspector, a command-line tool for decoding new table metadata in Databend.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Call for Contributors to Help with Functions
We are currently working on improving our functions, and we need your help!
We have identified four areas that require attention, and we would be extremely grateful for any assistance that you can provide.
If you are interested in contributing to any of these areas, please refer to the following resources to learn more about how to write scalar and aggregate functions:
We appreciate any help that you can provide, and we look forward to working with you.
Issue #11220 | Tracking: functions
Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.14-nightly...v1.1.23-nightly
🎉 Contributors
Thanks a lot to the 25 contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!