
This Week in Databend #91

April 30, 2023 · 4 min read

PsiACE

Stay up to date with the latest weekly developments on Databend!


Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .

What's On In Databend

Stay connected with the latest news about Databend.

Data Type: BITMAP

Databend has added support for the BITMAP data type.

BITMAP is a compressed data structure that efficiently stores and manipulates sets of integers, with each bit marking whether a value is present. It is often used to accelerate COUNT DISTINCT queries.

> CREATE TABLE IF NOT EXISTS t1(id Int, v Bitmap) Engine = Fuse;
> INSERT INTO t1 (id, v) VALUES(1, to_bitmap('0, 1')),(2, to_bitmap('1, 2')),(3, to_bitmap('3, 4'));
> SELECT id, to_string(v) FROM t1;

┌──────────────────────┐
│   id  │ to_string(v) │
│ Int32 │    String    │
├───────┼──────────────┤
│     1 │ 0,1          │
│     2 │ 1,2          │
│     3 │ 3,4          │
└──────────────────────┘

We used RoaringTreemap to implement the BITMAP data type, which is a compressed bitmap with u64 values. By utilizing this data structure, we expect to achieve better performance and reduced memory usage compared to other bitmap implementations.
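To make the count-distinct use case concrete, here is a minimal, std-only Rust sketch. It is not the roaring crate's RoaringTreemap and not Databend's code: `SparseBitmap`, `from_values`, `union_with`, `cardinality`, and `count_distinct` are hypothetical names, and the layout (a `BTreeMap` of uncompressed 64-bit words) is a deliberate simplification. It only illustrates why bitmaps suit distinct counting: merging sets is a bitwise OR, and counting is a population count.

```rust
use std::collections::BTreeMap;

/// Simplified sparse bitmap over u64 values.
/// Illustrative only: a real RoaringTreemap uses compressed containers.
#[derive(Default, Clone, Debug)]
pub struct SparseBitmap {
    // Key: value / 64. Value: a 64-bit word whose set bits mark present values.
    words: BTreeMap<u64, u64>,
}

impl SparseBitmap {
    pub fn new() -> Self {
        Self::default()
    }

    /// Build a bitmap from a slice of values (duplicates collapse).
    pub fn from_values(values: &[u64]) -> Self {
        let mut bitmap = Self::new();
        for &value in values {
            bitmap.insert(value);
        }
        bitmap
    }

    /// Insert a value; returns true if it was not already present.
    pub fn insert(&mut self, value: u64) -> bool {
        let word = self.words.entry(value / 64).or_insert(0);
        let mask = 1u64 << (value % 64);
        let is_new = (*word & mask) == 0;
        *word |= mask;
        is_new
    }

    /// Check whether a value is present.
    pub fn contains(&self, value: u64) -> bool {
        let mask = 1u64 << (value % 64);
        self.words
            .get(&(value / 64))
            .map_or(false, |word| (*word & mask) != 0)
    }

    /// Merge another bitmap into this one (set union via bitwise OR).
    pub fn union_with(&mut self, other: &SparseBitmap) {
        for (&key, &word) in &other.words {
            *self.words.entry(key).or_insert(0) |= word;
        }
    }

    /// Number of distinct values stored (sum of per-word popcounts).
    pub fn cardinality(&self) -> u64 {
        self.words
            .values()
            .map(|word| u64::from(word.count_ones()))
            .sum()
    }
}

/// Count distinct values across several bitmaps by unioning them first.
pub fn count_distinct(bitmaps: &[SparseBitmap]) -> u64 {
    let mut merged = SparseBitmap::new();
    for bitmap in bitmaps {
        merged.union_with(bitmap);
    }
    merged.cardinality()
}
```

Applied to the three rows inserted above ({0, 1}, {1, 2}, and {3, 4}), `count_distinct` unions the sets into {0, 1, 2, 3, 4} and reports 5.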

If you are interested in learning more, please check out the resources listed below.

Improving Hash Join Performance with a New Hash Table Design

Our previous hash table implementation was optimized for aggregation functions, but it significantly limited the performance of hash join operations.

To speed up hash joins, we implemented a dedicated hash table for them. The table is allocated at a fixed size based on the number of rows in the build stage, so it never needs to grow during insertion. We also changed the hash table's value type from Vec to a pointer that can be updated with CAS (compare-and-swap) operations, which keeps memory usage under control and removes the cost of growing a Vec.
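The design described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Databend's implementation: `JoinHashTable`, `with_keys`, `insert`, `probe`, and `build` are hypothetical names, the keys are plain `u64` values, and 32-bit row indices stand in for the raw pointers mentioned in the text. What it does show is the two ideas from the paragraph: every array is sized once from the build-side row count, and each bucket holds a single atomic "head" that is swapped with CAS instead of a growable Vec.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Sentinel row id meaning "end of chain".
const EMPTY: u32 = u32::MAX;

/// Hypothetical build-side table for a hash join (illustrative only).
pub struct JoinHashTable {
    keys: Vec<u64>,        // build-side join keys, indexed by row id
    heads: Vec<AtomicU32>, // one chain head per bucket, allocated up front
    next: Vec<AtomicU32>,  // next[row] = following row in the same bucket chain
    mask: usize,           // bucket count minus one (bucket count is a power of two)
}

impl JoinHashTable {
    /// Allocate every array once, sized from the number of build rows.
    pub fn with_keys(keys: Vec<u64>) -> Self {
        assert!(keys.len() < EMPTY as usize, "row ids must fit below the sentinel");
        let buckets = (keys.len().max(1) * 2).next_power_of_two();
        Self {
            heads: (0..buckets).map(|_| AtomicU32::new(EMPTY)).collect(),
            next: (0..keys.len()).map(|_| AtomicU32::new(EMPTY)).collect(),
            mask: buckets - 1,
            keys,
        }
    }

    fn bucket(&self, key: u64) -> usize {
        // Multiplicative hash chosen for brevity; any reasonable hash works.
        ((key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize) & self.mask
    }

    /// Link `row` into its bucket chain. Takes `&self`, so several build
    /// threads may call it concurrently as long as each row id is inserted once.
    pub fn insert(&self, row: u32) {
        let head = &self.heads[self.bucket(self.keys[row as usize])];
        let mut current = head.load(Ordering::Acquire);
        loop {
            // Point this row at the current head, then try to become the head.
            self.next[row as usize].store(current, Ordering::Relaxed);
            match head.compare_exchange_weak(current, row, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return,
                Err(observed) => current = observed, // lost the race; retry
            }
        }
    }

    /// Return all build-side row ids whose key equals `key`.
    pub fn probe(&self, key: u64) -> Vec<u32> {
        let mut matches = Vec::new();
        let mut row = self.heads[self.bucket(key)].load(Ordering::Acquire);
        while row != EMPTY {
            if self.keys[row as usize] == key {
                matches.push(row);
            }
            row = self.next[row as usize].load(Ordering::Acquire);
        }
        matches
    }
}

/// Build the table, splitting the rows across up to `threads` scoped threads.
pub fn build(keys: Vec<u64>, threads: usize) -> JoinHashTable {
    let table = JoinHashTable::with_keys(keys);
    let rows = table.keys.len();
    let threads = threads.max(1);
    let chunk = ((rows + threads - 1) / threads).max(1);
    std::thread::scope(|scope| {
        for start in (0..rows).step_by(chunk) {
            let table = &table;
            scope.spawn(move || {
                for row in start..(start + chunk).min(rows) {
                    table.insert(row as u32);
                }
            });
        }
    });
    table
}
```

Because rows with equal keys land in the same chain, `probe` returns every matching build-side row. The order within a chain depends on which thread wins each CAS, so callers should not rely on it.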

The new implementation showed significant improvement in performance. If you are interested in learning more, please check out the resources listed below.

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Rust Compilation Challenges and Solutions

Compiling a medium to large Rust program is not a breeze due to the accumulation of complex project dependencies and boilerplate code.

To address these challenges, the Databend team implemented several measures, including observability tools, configuration adjustments, caching, linker optimization, compile-related profiles, and refactoring.

If you are interested in learning more, please check out the resources listed below.

Highlights

Here are some noteworthy items from this week. Perhaps you will find something that interests you.

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Call for Contributors to Help with Functions

We are currently working on improving our functions, and we need your help!

We have identified four areas that require attention, and we would be extremely grateful for any assistance that you can provide.

If you are interested in contributing to any of these areas, please refer to the following resources to learn more about how to write scalar and aggregate functions:

We appreciate any help that you can provide, and we look forward to working with you.

Issue #11220 | Tracking: functions

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.14-nightly...v1.1.23-nightly


🎉 Contributors
25 contributors

Thanks a lot to the contributors for their excellent work.

🎈 Connect With Us

Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.

Join the Databend Community to try, get help, and contribute!