STL杂谈：priority_queue

2023-07-31 C++

Intro

优先队列和队列的区别在与“优先”的概念上。对于普通队列而言，遵循的是先进先出（FIFO）的驱逐原则。而优先队列是默认会驱逐队列中最大的元素，因为插入元素和驱逐元素是会始终保持队内有序。优先队列以对数时间复杂度的插入和提取为代价，提供了O(1)时间的提取最大元素的操作。

priority_queue的定义为：

template<class T,
				 class Container = std::vector<T>,
				 class Compare = std::less<typename Container::value_type>
class priority_queue;

T：存储元素的类型（存储整数就是int，也可存储自定义的class类型）。如果T和Container中的存储值类型不符合，会导致undefined behavior。

Container：底层存储元素的container。必须是Sequence Container（std::vector or std::deque）。内部必须提供以下常用的函数：

front()
push_back()
pop_back()

Compare：用于比较每个元素。在push()和pop()阶段都会自动调用，维护顺序，并且必须遵循strict weak ordering。否则会导致undefined behaviour。

可以看到，默认情况下，在使用内建的类型时，会使用std::less方法作为比较器，在这个情况下，调用top()方法，获取的就是比较后最大的元素。

Functions

常用的函数方法如下：

名称	用途
top	获取顶部的元素
empty	判断优先队列是否为空
size	判断当前优先队列的大小
push	在队列中插入元素
emplace	原地在队列中插入元素
pop	移除顶部元素

值得注意的是，优先队列并不支持访问除去顶部元素之外的其他元素。

Use Case

使用时需要#include <queue>。

使用内建类型时，可以不定义Container和Compare，自动使用std::less的方法进行排序。

std::vector<int> data{1, 2, 4, 8, 5, 7};
// 初始化方式 1
std::priority_queue<int> queue;
for(auto val : data) {
  queue.push(val);
}

// 初始化方式 2
std::priority_queue<int> queue{data.begin(), data.end()}

// 默认top是最大的，若想让top是最小的
std::priority_queue min_queue{data.begin(), data.end(), std::greater<int>()};

由于priority_queue仅能访问队顶的元素，若想打印和访问所有元素，需要循环通过top获取后再pop的方式。注意，如果在原队列上进行遍历（如果仅仅想观察所有的元素），pop会将原队列的元素删除，因此需要对原队列做一个拷贝，然后在拷贝的对象上遍历。

auto queue_cpy = queue;
while(!queue.empty()) {
  std::cout << queue_cpy.top() << " ";
  queue_cpy.pop();
}

很多博客中也讨论了如何为自定义的类型使用自定义的比较函数。就目前而言，可以构建的方法有两种。第一种是使用functor，即将函数定义在结构体当中；第二种是使用c++20中新提供的lambda（unnamed function objects）。很多博客并没有提到如何在class内，作为private member时如何去定义这样的优先队列。

下面代码中，我们定义了一个Tuple类，包含vector数据。接着定义了一个Foo类，用于实现自定义类型的自定义比较函数。

queue_test.h

#include <iostream>
#include <queue>
#include <vector>
#include <concepts>
#include <functional>
#include <ranges>
#include <string_view>

class Tuple {
    public:
    explicit Tuple(std::vector<int> val) : val_(val) {}
    void ToString() const {
        for (auto val : val_) {
            std::cout << val << " ";
        }
        std::cout << std::endl;
    }
    auto GetValAt(int idx) const -> const int &  {
        return val_[idx];
    }
    private:
    std::vector<int> val_;
};

class Foo {
    public:
    struct compare {
        public:
        explicit compare(int idx) : idx_(idx) {};
        auto operator() (const Tuple &t1, const Tuple &t2) -> bool {
            return t1.GetValAt(idx_) > t2.GetValAt(idx_);
        }
        private:
        int idx_;
    };
    explicit Foo(std::vector<Tuple> tuple, int idx) : comp_(idx), heap_(tuple.begin(), tuple.end(), comp_) {}
    auto GetHeap() const -> const std::priority_queue<Tuple, std::vector<Tuple>, compare> & {
        return heap_;
    }
    void ToString() {
        auto heap_cpy = heap_;
        while(!heap_cpy.empty()) {
            heap_cpy.top().ToString();
            heap_cpy.pop();
        }
    }
    private:
    compare comp_;
    std::priority_queue<Tuple, std::vector<Tuple>, compare> heap_;
};

观察到，struct是定义在class Foo的public中的。为什要定义在public中而不是private中呢？因为我的Foo类中定义了一个叫做GetHeap()的函数，这个函数需要向外界返回带有struct compare类型的私有变量。如果这个compare类型的定义也在private中，那么编译器就会报错（实际上，在vscode使用clangd插件的话，会直接标红。原因就是compare定义到私有，外界对这个返回结果的compare类型不可见。如果没有GetHeap()这个函数，那么struct compare就可以成功定义在private中了。

还有值得注意的几个地方：

由于优先队列只接受二元函数，也就是仅有两个参数的函数（两个参数必须是Tuple类型）。因此如果比较的过程中需要用到其他参数时，就需要在结构体里声明一些私有变量，并且定义构造函数。
struct compare必须对()操作符进行重载。
heap_的初始化需要传递比较函数。
class Foo的构造函数也需要构造comp_。

定义好头文件之后，在调用端就可以这样：

queue_test.cpp

#include "./priority_queue_exp.h"

int main() {
    std::vector<Tuple> tuple_vec{
        Tuple{{1, 4, 2, 7, 3, 9}},
        Tuple{{11, 14, 12, 17, 13, 19}},
        Tuple{{9, 42, 22, 7, 32, 92}},
        Tuple{{8, 42, 22, 7, 32, 92}},
        Tuple{{12, 42, 22, 7, 32, 92}},
    };

    std::cout << "print" << std::endl;
    Foo foo{tuple_vec, 1};
    foo.ToString();
}

// results
1 4 2 7 3 9 
11 14 12 17 13 19 
8 42 22 7 32 92 
12 42 22 7 32 92 
9 42 22 7 32 92

要正确使用lambda的方式进行调用，需要在C++20的条件下才能成功。具体的答案在stackoverflow上。

Relative pitfalls and notes

Capturing only a specific member of a class inside lambda
What is a proper way to declare priority queue as a private member in class with custom compare function?
Why use decltype when using lambda? (ChatGPT answer)

In C++, decltype is a keyword used to query the type of an expression. It's often used in generic programming where the type of an expression depends on template parameters and cannot be explicitly stated.

When defining a custom comparator for a std::priority_queue or other standard library containers, decltype can be used to create a type alias for the comparator type. This can simplify the code and make it easier to maintain, especially if the comparator is a complex type or depends on template parameters.

Here's an example:
```
// A comparator that sorts integers in reverse order
auto reverse_comparator = [](int a, int b) { return a > b; };

// Use decltype to create a type alias for the comparator type
using ComparatorType = decltype(reverse_comparator);

// Declare a priority_queue with the custom comparator
std::priority_queue<int, std::vector<int>, ComparatorType> pq(reverse_comparator);
```
In this example, decltype(reverse_comparator) is the type of the reverse_comparator lambda function. ComparatorType is a type alias for this type, and is used to declare pq.

This is particularly useful if the comparator is defined inline where the priority queue is declared, because in that case there's no variable whose type can be used to instantiate the priority queue:
```
// Declare a priority_queue with an inline comparator
std::priority_queue<int, std::vector<int>, decltype([](int a, int b) { return a > b; })> pq([](int a, int b) { return a > b; });
```
In this case, decltype is necessary to refer to the type of the inline comparator.

Lambda的本质其实也是functors。当定义一个lambda函数时：

int a = 3;
auto lambda = [a](int x) { return a + x; };

C++会在编译的时候将它翻译为：

struct unnamed_lambda_type {
  public:
    int operator() (int x) const {
        return a + x;
    }
  private:
  	int a;
};

unnamed_lambda_type lambda;

使用lambda函数的目的是更简洁和高效。

Is it safe to reuse a std container after std::move?

在使用std::move对container的资源做转移后，std的container仍然可以在之后的scope中使用，只不过当前的container不会包含任何移动之前的信息。