2020-03-24

Language►C++

STL源码分析

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

从这开始我们进入《STL源码分析》的学习

STL分为6大组件: 空间配置器、容器、迭代器、算法、仿函数、配接器

空间配置器

STL的空间适配器事STL的基础，我们不能靠操作系统来为我们管理内存，那样的代价太大了，这不划算，作为一个c/c++开发人员，我们要完全控制我们程序的一切。

allocator

这是他的英文名字，我们的allocator定义了四个操作

alloc::allocate() 内存配置
alloc::dellocate() 内存释放
alloc::construct() 构造对象
alloc::destroy() 析构对象

type_traits<T>

一个模版元技术，他是一个类,能够萃取类型的相关信息，模版元详见C++笔记中的Boost源码分析

destroy

对于基本数据类型，我们啥都不用干，对于用户定义的数据类型，我们显示调用析构函数即可，这个使用模版特化即可。

construct

就是new，但是不用申请空间了，因为allocate已经干了

一级配置器、二级配置器

一级配置大空间(>128bytes)就是malloc和free，二级配置小空间，利用内存池。

一级配置器

直接new的，new失败以后调用out of memery的处理方式调用处理例程，让其释放内存，不断尝试,释放的时候直接free

二级配置器

维护16个链表，每个链表维护一种类型的内存，分别为8bytes、16bytes、24bytes、一直到128bytes。更加巧妙的地方是将维护的内存和链表的指针使用联合体组装。这就不浪费空间了。当需要配置内存的时候，向8字节对齐，然后再分配，当释放内存的时候，丢到链表里面就行了
当链表空了的时候，从内存池中取出20个新的内存块填充链表。
内存池是一个大块大内存，当内存池空了以后，申请更多的内存，保证每次都比上一次申请的多就行了，要是让我实现，我才不这样做，我要用计算机网络中的自适应rtt原理来做。

迭代器

说白了就是个指针，但是他比指针更强大，更灵活。

迭代器类型

input iterator 只读
output iterator 只写
forward iterator 单向迭代器
bidirectional iterator 双向移动一个单位
random access iterator 双向移动多个单位

graph TB
1((input)) --> 3((forward))
2((output)) --> 3((forward))
3((forward)) --> 4((bi))
4((bi)) --> 5((random))

类型

首先为了实现的容易，得设计iterator_category为迭代器自己的类型，value_type为迭代器维护的具体数据的类型，diference_type为两个迭代器之间的距离的类型，pointer为原生指针，reference为原生引用。

vector

不讲了，太简单了

vector 的迭代器

服了，居然就是指针，我一直以为他封装了一下，这也太懒了。

list

算了这都跳过算了，没啥意思，

deque

用分段连续来制造整体连续的假象。
两个迭代器维护首尾，一个二维指针维护一个二维数组，感觉很low，每一行被称为一个缓冲区,但是列的话，他前后都预留了一些指针位置。
当我们随机访问的时候，就可以根据每一行的长度来选择正确的缓冲区了。

deque的迭代器

这个就厉害一些了，他包含了4个地方，当前指针、当前缓冲区首尾指针，中控器上面当前缓冲区的指针。

代码我感觉也一般般，我写也这样

queue和stack

居然是deque实现的，明明有更好的实现方法，再见，看都不想看

heap

算法都是这样写的

priority heap

vector实现的，

slist

我还是不适合这个东西

关联式容器

这玩意就是红黑树啦，上一章的序列容器看的我难受死了，希望这个能爽一些

红黑树

翻了好几页，都是红黑树，并没有让我感到很吃惊的代码

set和map

set就是直接红黑树，map就把用pair分别存kv，然后自己定一个仿函数，难怪map找到的都是pair

multi

算了自己写过平衡树的都知道，和非multi没几行代码不一样。

hashtable

算法

分为质变算法和非质变算法，一个会改变操作对象，另一个不会。

accumulate

这个强，accmulate(first,last,init),将[first,last)的值累加到init上

accmulate(first,last,init,binary op),将[first,last)从左到右二元操作(init,*)到init上

adjacent_difference

666666666，adjacent_difference(first,last,result)差分都来了[first,last)差分到[result,*)

6666666,自己定义的差分adjacent_difference(first,last,result,binary_op); 这个能自定定义减法，

注意可以result设为first

inner_product

内积，inner_product(first1,last1,first2,init),加到init上然后返回。

参数在加上一个binary_op1和binary_op2,init=binary_op1(init,binary_op2(eme1,eme2))

太强了，佩服的五体投地，明天继续学,看java去

partial_sum

和前面的差分一样,partial_sum 为前缀和，partial_sum(first,last,result)为前缀和输出到result中

当然你也可以定义binary_op操作，加在最后面

power

快速幂算法了解一下，power(x,n)x的n次方，n为整数，要求满足乘法结合律。

power(x,n,op),这个同理

itoa

itoa(first,last,value);

while(first!=last) *first++=value++;

equal

equal(first1,last1,first2)

判断[first1,last1) 和[first2,…)是否相同

同样支持二元仿函数。

fill

fill(first,last,value)

把区间的值设为value

fill_n

fill(first,n,value)

把first开始的n个元素设为value

iter_swap

iter_swap(a,b)

交换迭代器的内容，这里就有意思了，如何获取迭代器内容的类型呢？还记得之前讲迭代器的时候，在迭代器内部定义的

value_type吗？对！就是那个。

lexicographical_compare

lexicographical_compare(first1,last1,first2,last2)

字典序比大小，需要支持小于号

max min

这俩也支持仿函数

mismatch

mismatch(first1,last1,first2)

用第一个去匹配第二个，你需要保证第二个不必第一个短，返回匹配尾部

支持仿函数==

swap

就是很普通的交换，

copy(first,last,result)

特化了char*和wchar_t*为memmove，特化了T*和const T*，通过萃取，若指向的数据为基本数据类型则调用memmove，

否则再分为随机迭代器和非随机迭代器，随机迭代器使用last-first这个值来控制，非随机迭代器用if(last==frist)来控制。

copy_backward

和上面一样，但是为逆序拷贝

set_union

set_union(first1,last1,first2,last2,result)

就是遍历两个有序容器，然后赋值到result中，注意到它在碰到相同元素的时候两个迭代器都自增了，导致若第一个中有3个1，第二个中有5个1，则输出的时候只有5个1

set_intersection

同上

交集，得到3个1

set_difference

代码越来越平庸了，这个是S1-S2，出现在S1中不出现在S2中

set_symmetric_difference

对称差，平庸的代码

adjacent_find(first,last)

找到第一个相等的相邻元素，允许自定义仿函数

count(first,last,value)

对value出现对次数计数

count_if(first,last,op) 对op(*it)为真计数

越看越无聊了

find(first,last,value)

找value出现的位置，这个函数应该挺有用的

find_if(first,last,op) 同上啦

find_end 和find_first_of

这个函数没用，我为啥不用kmp算法

for_each(first,last,op)

op(*i)

geterate(first,last,op)

*i=op()

generate_n 同上

transform(first,last,result,op)

*result=op(*first)

transform(first1,last1,first2,last2,result,op)

*result=op(*first1,*first2)

includes(first1,last1,first2,last2)

保证有序，然后判断2是不是1的子序列，显然On

max_element(first,last)

区间最大值

min_element(first,last)

同上

merge(first1,last1,first2,last2,result)

归并排序的时候应该用得到吧

partition(first,last,pred)

pred(*)为真在前，为假在后On

remove(first,last,value)

把value移到尾部
remove_copu(first,last,result,value),非质变算法

remove_if remove_copy_if

同上

replace(first,last,old_value,new_value)

看名字就知道怎么实现的

replace_copy,replace_if,replace_copy_if

revese

秀得我头皮发麻，这个。。。。。。。

1
2
3

while(true)
  if(first==last||first==--last) return;
  else iter_swap(first++,last);

随机迭代器的版本还好

1	while(first<last) iter_swap(first++,--last);

reverse_copy ，常见的代码

rotate

这个代码有点数学，大概率用不到，一般情况下我们写splay都用3次reverse来实现的，复杂度也不过On,他这个代码就一步到位了，使用了gcd，没啥意思，STL果然效率第一

search

子序列首次出现的位置，

search_n

好偏，算了，没啥用的代码

swap_ranges(first1,last1,first2)

区间交换，swap的增强版

unique

移除重复元素

unique_copy

lower_bound upper_bound binary_search

不多说了，就是二分，

next_permutation

一直想不明白这个函数怎么实现的，今天来看看，既然是下一个排列，显然是需要找到一个刚好比现在这个大大排列，简单分析……6532,如果一个后缀都是递减的，显然这个后缀不能更大了，如果一个后缀都不能变得更大，就必须调整更前的，所以我们要找到这样的非降序….16532,把最后一个放到1的位置，其他的从小到大排列好就行了。也即swap(1,2),reverse(6531)

prev_permutation

同理

random_shuffle

洗牌算法，从first往last遍历，每次从最后面随机选一个放到当前位置即可。

partial_sort

partial_sort(first,middle,last)
保证[first,middle)有序且均小于[middle,last)直接对后面元素使用堆上浮，这样保证了小的元素均在[first,middle)中，然后使用sort_heap?????
&ems; 为啥第一步不用线性时间选择，第二步不用快排？

sort

大一就听说了这个的大名，现在来学习学习

Median_of_three

__median(a,b,c) 返回中间那个值

Partitionining

这个大家都会写，就是按照枢轴，把小的放左边，大的放到右边

threshold

当只有很少很少的几个元素的时候，插入排序更快。

final insertion sort

我们不必在快速排序中进行插入排序，但是可以提前推出，保证序列基本有序，然后再对整体使用插入排序

SGI sort

先快速排序到基本有序，然后插入排序

快速排序

先排序右边，后左边，且将左边当作下一层，当迭代深度恶化的时候，即超过了lg(n)*2的时候，采取堆排序
枢轴的选择，首、尾、中间的中位数

RW sort

这个就少了堆排序了，其他的和SGI一样

equal_range

lower_bound和upper_bound的合体
比直接lower_bound+upper_bound应该快一些，大概这样，二分中值偏小，做缩左断点，偏大则缩右端点，若二分中值等于value，则对左边lower_bound,对右边upper_bound,然后就能直接返回了

inplace_merge

将两个相邻的有序序列合并为有序序列，他在分析卡空间的做法，再见。。。不缺空间，

nth_element

线性时间选择，三点中值，递归变迭代，长度很小以后直接插入排序，666666

mergesort

借助inplace_merge直接完成了，

总结

STL的算法还是很厉害的。

仿函数

c++的一大特色，通过重载()来实现像函数一样的功能

一元仿函数

template<class Arg,class Result>
struct unary_function{
  typedef Arg argument_type;
  typedef Result result_type;
};

看到上面那玩意没，你得继承它。

negeta 取反，返回-x
logical_not !x
identity x
select1st a.first
select2nd a.second

二元仿函数

template<class Arg1,class Arg2,class Result>
struct unary_function{
  typedef Arg1 first_argument_type;
  typedef Arg2 second_argument_type;
  typedef Result result_type;
};

plus a+b
minus a-b
multiplies a*b
divides a/b
modulus a%b
equal_to a==b
not_equal_to a!=b
greater a>b
greater_equal a>=b
less a<b
less_equal a<=b
logical_and a&&b
logical_or a||b
project1st a
project2nd b

仿函数单位元

你要为你的仿函数设置一个identity_element单位元，用于快速幂

配接器

本质上，配接器是一种设计模式，改变仿函数的接口，成为仿函数配接器，改变容器接口，称为容器配接器，改变迭代器接口，称为迭代器配接器

容器配接器

queue和stack就是修饰了deque的配接器

迭代器配接器

迭代器的配接器有3个，insert itertors,reverse iterators,iostream iterators.

哇塞这东西有点深，明天再看。

2020-03-25

Language►Python

爬虫

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

网页

当我们输入网址以后，会建立http(https算了)连接，我们给服务器请求，服务器给我们回应，我们不断发送request,服务器不断返回response,请求又很多种。

大量的response

我们要把这些数据存起来，数据库啊啥的都行。

简单的爬虫

import requests
res = requests.get("http://www.baidu.com")
res.encoding = 'utf-8'
print(res.text)

上面的代码能够得到百度网站

分析html

import requests
#import bs4
from bs4 import BeautifulSoup
res = requests.get("http://www.baidu.com")
res.encoding = 'utf-8'
print(res.text)
soup = BeautifulSoup(res.text,"html.parser")
print(soup.text)

得到连接

import requests
import bs4
res = requests.get("http://www.baidu.com")
res.encoding = 'utf-8'
print(res.text)
soup = bs4.BeautifulSoup(res.text, "html.parser")
print(soup.text)
for link in soup.select('a'):
    print(link.text,link['href'])

结束了，好简单，还准备写一套流程的，现在免了

2020-03-24

Language►Java►Spring

spring学习1-spring入门

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

学习

spring 是一个轻量级框架

他最重要的地方时AOP和IOC，他的目的是降低耦合度，减少代码量

AOP

面向切面编程，

IOC

控制反转，即将对象的创建交给spring,配置文件+注解

耦合问题

比方说我们要在B类中使用A类，就会在B类中A a=new A();然后这样就导致了B依赖A

工厂模式解决耦合

用工厂来控制A类，在B中就能 A a=factory.getA(); 这又导致了B和工厂耦合。

ioc方案

解析xml配置文件中的类，在工厂类中利用反射创建类的对象，这就降低了类的耦合度，我们想要换类的时候，只要将xml中的类名称改掉就可以了。

一站式框架

springMVC+ioc+jdbcTemplate

2020-03-24

大数据

hadoop

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

hadoop

hadoop = common+hdfs+mapreduce+yarn

common

工具、rpc通信

hdfs

分布式文件系统，一个文件分成多个128Mb的文件，存储在多个节点，为了保证分区容错性，存有备份，默认为3。主从架构。

2020-03-17

Language►C++

Boost

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

Boost 与c++

Boost是基于C++标准的现代库，他的源码按照Boost Software License 来发布，允许任何人自由使用、修改和分发。

Boost有哪些功能？

Boost强大到拥有超过90个库，但我们暂时只学习其中一部分

Any

boost::any是一个数据类型，他可以存放任意的类型，例如说一个类型为boost::any的变量可以先存放一个int类型的值，然后替换为一个std::string的类型字符串。

Array

好像是一个数组容器，和std::vector应该差别不大。

and more …

这个系列的博客来干嘛？

这个系列的博客用来介绍Boost提供的接口，不对Boost进行源码分析，关于Boost的源码，预计会在Boost源码分析笔记系列的博客中。

Boost.Any

Any在C++17被编入STL
C++是强类型语言，没有办法向Python那样把一个int赋值给一个double这种骚操作，而Boost.Any库为我们模拟了这个过程，使得我们可以实现弱类型语言的东西。

在基本数据类型中玩弱类型

#include <boost/any.hpp>
int main(){
  boost::any a = 1;
  a = 3.14;
  a = true;
}

这样的代码是允许编译的，大概是因为boost::any内部储存的是指针。

char数组不行了

#include <boost/any.hpp>
int main(){
  boost::any a = 1;
  a = "hello world";
}

上诉代码可以编译和运行，但是一定会碰到问题的，当我们把char数组弄过去的时候，就不太行了，原因是char[]不支持拷贝构造，但是我们可以通过把std::string来解决这个问题。

用std::string代替char数组

#include <boost/any.hpp>
int main(){
  boost::any a = 1;
  a = std::string("hello world");
}

可以见到我们用string完美地解决了这个问题。

写很容易，如何读呢？

我们已经学习了boost::any的写操作，但是应该如何读取呢？

#include <boost/any.hpp>
#include <iostream>
int main(){
  boost::any a = 1;
  std::cout << boost::any_cast<int>(a) << std::endl;
}

boost提供了一个模版函数any_cast<T>来对我们的any类进行读取

类型不对会抛出异常

有了any<T>的模版，看起来我们可以对boost进行任意读取，我们试试下这个

#include <boost/any.hpp>
#include <iostream>
int main() {
  boost::any a = 1;
  a = "hello world";
  std::cout << boost::any_cast<int>(a) << std::endl;
}

抛出了如下异常

1	libc++abi.dylib: terminating with uncaught exception of type boost::wrapexcept<boost::bad_any_cast>: boost::bad_any_cast: failed conversion using boost::any_cast

实际上上诉代码是永远无法成功的。因为你把一个char数组传了进去。

成员函数

boost的any是有很多成员函数的。比方说empty可以判断是否为空，type可以得到类型信息，

#include <boost/any.hpp>
#include <iostream>
#include <typeinfo>

int main() {
  boost::any a = std::string("asdf");
  if (!a.empty()) {
    std::cout << a.type().name() << std::endl;
    a = 1;
    std::cout << a.type().name() << std::endl;
  }
}

代码运行结果如下，表示首先是字符串，然后是整形。

1 2	NSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE i

拿到指针

当我们把一个any的地址传给any_cast的时候，我们会得到any内部数据的指针，

#include <boost/any.hpp>
#include <iostream>

int main()
{
  boost::any a = 1;
  int *i = boost::any_cast<int>(&a);
  std::cout << *i << std::endl;
}

Boost::Tuple

boost::tuple是一个元组。在c++11被编入STL

第六行无法通过编译，这说明tuple的长度最长只能是10

第9-12行定义了3个元组

第13行演示了如何通过make_tuple构造元组

第14行演示了如何通过get来访问元组里面的元素

第16行演示了get的返回值是引用

第19-20行演示了tuple的等号操作

第23-27行演示了tuple中可以储存引用

第28行通过tie构造了一个全引用元组

#include <boost/tuple/tuple.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <boost/tuple/tuple_io.hpp>
#include <bits/stdc++.h>

// boost::tuple<int, int, int, int, int, int, int, int, int, int, int>too_long;
int main() {
  // 基本操作
  boost::tuple<int, int, int, int, int, int, int, int, int, int> a(
      1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  boost::tuple<std::string, std::string> b("hello", "world");
  boost::tuple<std::string, std::string> c("hello", "world2");
  std::cout << boost::make_tuple(1, 2, 3, 4) << std::endl;
  std::cout << "a.get<0>() is " << a.get<0>() << std::endl;
  std::cout << "a is " << a << std::endl;
  a.get<0>() = -1;
  std::cout << "a is " << a << std::endl;
  std::cout << "b is " << b << std::endl;
  std::cout << "b==c is " << (b == c) << std::endl;
  std::cout << "b==b is " << (b == b) << std::endl;

  // 进阶操作
  int x = 1, y = 2;
  boost::tuple<int&, int> reference(boost::ref(x), y);
  // boost::tuple<int&, int> reference(x, y); 也可以
  x = 5;
  std::cout << "reference is " << reference << std::endl;
  auto reference2 = boost::tie(x, y);
  x = 10, y = 11;
  std::cout << "reference2 is " << reference2 << std::endl;
}

输出

(1 2 3 4)
a.get<0>() is 1
a is (1 2 3 4 5 6 7 8 9 10)
a is (-1 2 3 4 5 6 7 8 9 10)
b is (hello world)
b==c is 0
b==b is 1
reference is (5 2)
reference2 is (10 11)

Boost::Variant

boost::variant和any很像，variant和any一样在C++17中被编入STL

variant可以指定一部分数据类型，你可以在这一部分中随便赋值，就像下面写到的一样，另外和any的any_cast不一样的是variant使用get<T>来获得内容。

#include <boost/variant.hpp>
#include <iostream>
#include <string>

int main() {
  boost::variant<double, char, std::string> v;
  v = 3.14;
  std::cout << boost::get<double>(v) << std::endl;
  v = 'A';
  // std::cout << boost::get<double>(v) << std::endl; 这句现在会报错
  std::cout << boost::get<char>(v) << std::endl;
  v = "Hello, world!";
  std::cout << boost::get<std::string>(v) << std::endl;
}

访问者模式

variant允许我们使用访问者模式来访问其内部的成员，使用函数boost::apply_visitor来实现，访问者模式使用的时候重载仿函数。仿函数记得继承static_visitor即可。

#include <boost/variant.hpp>
#include <iostream>
#include <string>

struct visit : public boost::static_visitor<> {
  void operator()(double &d) const { std::cout << "double" << std::endl; }
  void operator()(char &c) const { std::cout << "char" << std::endl; }
  void operator()(std::string &s) const { std::cout << "string" << std::endl; }
};

int main() {
  boost::variant<double, char, std::string> v;
  v = 3.14;
  boost::apply_visitor(visit(), v);
  v = 'A';
  boost::apply_visitor(visit(), v);
  v = "Hello, world!";
  boost::apply_visitor(visit(), v);
}

输出

1
2
3

double
char
string

StringAlgorithms

我终于找到一个暂时没有被编入C++17的库了，听说在C++20中他也没进，哈哈哈。

大小写转化

首先上来的肯定就是大小写转化啦，使用函数to_upper_copy(string)就可以了。

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = "abcdefgABCDEFG";
  std::cout << boost::algorithm::to_upper_copy(s) << std::endl;
  std::cout << boost::algorithm::to_lower_copy(s) << std::endl;
}

删除子串

erase_all_copy就是说先copy一份，然后再将子串全部删除，如果不带copy就是说直接操作穿进去的母串。下面的代码都可以去掉_copy,erase_first指的是删除第一次出现的，last指的是删除最后一次出现的，nth指的是删除第n次出现的，n从0开始,erase_head值的是删除前n个字符，erase_tail指的是删除后n个字符。

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = "000111000111ababab000111000111";
  std::cout << s << std::endl;

  // boost::algorithm::erase_all(s,"ab");
  std::cout << boost::algorithm::erase_all_copy(s, "ab") << std::endl;
  std::cout << boost::algorithm::erase_first_copy(s, "111") << std::endl;
  std::cout << boost::algorithm::erase_last_copy(s, "111") << std::endl;
  std::cout << boost::algorithm::erase_nth_copy(s, "111",0) << std::endl;
  std::cout << boost::algorithm::erase_nth_copy(s, "111",100) << std::endl;
  std::cout << boost::algorithm::erase_head_copy(s, 4) << std::endl;
  std::cout << boost::algorithm::erase_tail_copy(s, 4) << std::endl;
}

查找子串

find一类的函数，同上,他将返回一个iterator_range的迭代器。这个迭代器可以操作子串。注意子串和母串共享空间。

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = "000111000111ababab000111000111";
  std::cout << s << std::endl;
  auto x = boost::algorithm::find_first(s,"000");
  x[0] = '2';
  std::cout << s << std::endl;
  //std::cout << boost::algorithm::find_last(s, "111") << std::endl;
}

又是一套代码下来了

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = "000111000111ababab000111000111";
  std::cout << s << std::endl;
  auto x = boost::algorithm::find_first(s,"000");
  x = boost::algorithm::find_last(s,"1");
  x = boost::algorithm::find_nth(s,"1",3);
  x = boost::algorithm::find_tail(s,3);
  x = boost::algorithm::find_head(s,3);
  std::cout << s << std::endl;
  //std::cout << boost::algorithm::find_last(s, "111") << std::endl;
}

替换子串

replace又是一套如下

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = "000111000111ababab000111000111";
  std::cout << s << std::endl;

  // boost::algorithm::replace_all(s,"ab");
  std::cout << boost::algorithm::replace_all_copy(s, "ab","all") << std::endl;
  std::cout << boost::algorithm::replace_first_copy(s, "111","first") << std::endl;
  std::cout << boost::algorithm::replace_last_copy(s, "111","last") << std::endl;
  std::cout << boost::algorithm::replace_nth_copy(s, "111", 0,"nth") << std::endl;
  std::cout << boost::algorithm::replace_nth_copy(s, "111", 100,"nth") << std::endl;
  std::cout << boost::algorithm::replace_head_copy(s, 2,"Head") << std::endl;
  std::cout << boost::algorithm::replace_tail_copy(s, 2,"Tail") << std::endl;
}

修剪字符串

trim_left_copy 指的是从左边开始修建，删掉空字符等，trim_right_copy是从右边开始修建，trim_copy是两边一起修剪。

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = "\t  ab  d d  d d d \t";
  std::cout << "|" << s << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_left_copy(s) << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_right_copy(s) << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_copy(s) << "|" << std::endl;
}

这个代码输出了

|      ab  d d  d d d     |
|ab  d d  d d d     |
|      ab  d d  d d d|
|ab  d d  d d d|

我们还可以通过指定谓词来修剪使用trim_left_copy_if

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = " 01 0 1 000ab  d d  d d d 11111111";
  std::cout << "|" << s << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_left_copy_if(s,boost::algorithm::is_any_of(" 01")) << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_right_copy_if(s,boost::algorithm::is_any_of(" 01")) << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_copy_if(s,boost::algorithm::is_any_of(" 01")) << "|" << std::endl;
}

更多的谓词,我们还有is_lower、is_upper、is_space等谓词。

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = " 01 0 1 000ab  d d  d d d 11111111";
  std::cout << "|" << s << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_copy_if(s,boost::algorithm::is_space()) << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_copy_if(s,boost::algorithm::is_digit()) << "|" << std::endl<<std::endl;
  s = "aaaBBBaBBaaa";
  std::cout << "|" << s << "|" << std::endl;
  std::cout << "|" << boost::algorithm::trim_copy_if(s,boost::algorithm::is_lower()) << "|" << std::endl;


}

字符串比较

starts_with(s,t)判断s是否以t开头，类似的有ends_with,contains,lexicographical_compare分别表示s是否以t结尾，s是否包含t，s与t的字典序比较。

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::cout << boost::algorithm::starts_with("abcde", "abc") << std::endl;
  std::cout << boost::algorithm::ends_with("abcde", "cde") << std::endl;
  std::cout << boost::algorithm::contains("abcde", "cde") << std::endl;
  std::cout << boost::algorithm::lexicographical_compare("abcde", "cde") << std::endl;
  std::cout << boost::algorithm::lexicographical_compare("abcde", "abcde") << std::endl;
  std::cout << boost::algorithm::lexicographical_compare("cde", "abcde") << std::endl;
}

字符串分割

这个就简单多了，直接split+谓词函数就行了

#include <boost/algorithm/string.hpp>
#include <iostream>

int main() {
  std::string s = "abc abc abc * abc ( abc )";
  std::vector<std::string> v;
  boost::algorithm::split(v, s, boost::algorithm::is_any_of(" *()"));
  for (auto x : v) std::cout << x << ".";
  std::cout << std::endl;
}

输出

1	abc.abc.abc...abc...abc...

我们注意看有些函数前面有个i，比如ierase_all, 这个说的是忽略大小写。

boost::regex

C++11的时候，被编入STL
明天接着整。。。

2020-03-17

Language►C++

Boost源码

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

remove_cv

remove_cv 这个模版类能够帮我们去掉类型的const，他的实现很简单，即使用模版元技术：

template <class T> struct remove_cv{ typedef T type; };
template <class T> struct remove_cv<T const>{ typedef T type;  };
template <class T> struct remove_cv<T volatile>{ typedef T type; };
template <class T> struct remove_cv<T const volatile>{ typedef T type; };

这个代码应该非常容易理解，remove_cv的模版是一个T,我们对他做模版偏特化，将const 和volatile分离，然后使用::value就可以得到没有const、volatile的类型了，所以这个类也叫remove_cv。

is array

is array的实现非常简单，我们先假设所有的都不是array，即如第四行所示，然后利用偏特化，特判掉所有的array即可，让他们继承true_type,这样我们在使用的时候用::value即可判断。

#if defined( __CODEGEARC__ )
   template <class T> struct is_array : public integral_constant<bool, __is_array(T)> {};
#else
   template <class T> struct is_array : public false_type {};
#if !defined(BOOST_NO_ARRAY_TYPE_SPECIALIZATIONS)
   template <class T, std::size_t N> struct is_array<T[N]> : public true_type {};
   template <class T, std::size_t N> struct is_array<T const[N]> : public true_type{};
   template <class T, std::size_t N> struct is_array<T volatile[N]> : public true_type{};
   template <class T, std::size_t N> struct is_array<T const volatile[N]> : public true_type{};
#if !BOOST_WORKAROUND(__BORLANDC__, < 0x600) && !defined(__IBMCPP__) &&  !BOOST_WORKAROUND(__DMC__, BOOST_TESTED_AT(0x840))
   template <class T> struct is_array<T[]> : public true_type{};
   template <class T> struct is_array<T const[]> : public true_type{};
   template <class T> struct is_array<T const volatile[]> : public true_type{};
   template <class T> struct is_array<T volatile[]> : public true_type{};
#endif
#endif

integral_consant

这也是一个模版元技术，他储存了自己的类型，模版的类型，模版的值的类型，他的实现如下

template <class T, T val>
  struct integral_constant
  {
     typedef mpl::integral_c_tag tag;
     typedef T value_type;
     typedef integral_constant<T, val> type;
     static const T value = val;

     operator const mpl::integral_c<T, val>& ()const
     {
        static const char data[sizeof(long)] = { 0 };
        static const void* pdata = data;
        return *(reinterpret_cast<const mpl::integral_c<T, val>*>(pdata));
     }
     BOOST_CONSTEXPR operator T()const { return val; }
  };

这里很明显了，value是值，value_type是value的类型，type是自己的类型。

true_type false_type

这里就很有意思了，看看就懂

1 2	typedef integral_constant<bool, true> true_type; typedef integral_constant<bool, false> false_type;

可能有人会问这个有什么用，其实这样的，很多时候我们需要为我们的类添加一个value，表示true或者false，正常的实现方法是写两遍，一遍处理全部，另一遍特化false，这样写的话，代码复用就太low了，这时候，其实我们只需要实现一遍基类，派生的时候一个继承true，另一个继承false就OK了。

is_function

这个代码就nb了，我还没看懂，先留个坑,我猜了一下，大概是用来判断一个类型是否是函数指针的。

remove_bounds

这个模版元我还真没猜出他的功能，话说怎么可能有人想得到这个bounds指的是数组的bounds呢？这个模版元的功能是传入一个数组，传出他的内容，即将T[]映射为T。注意： remove_bounds就是remove_extent。

template <class T> struct remove_extent{ typedef T type; };

#if !defined(BOOST_NO_ARRAY_TYPE_SPECIALIZATIONS)
template <typename T, std::size_t N> struct remove_extent<T[N]> { typedef T type; };
template <typename T, std::size_t N> struct remove_extent<T const[N]> { typedef T const type; };
template <typename T, std::size_t N> struct remove_extent<T volatile [N]> { typedef T volatile type; };
template <typename T, std::size_t N> struct remove_extent<T const volatile [N]> { typedef T const volatile type; };
#if !BOOST_WORKAROUND(__BORLANDC__, BOOST_TESTED_AT(0x610)) && !defined(__IBMCPP__) &&  !BOOST_WORKAROUND(__DMC__, BOOST_TESTED_AT(0x840))
template <typename T> struct remove_extent<T[]> { typedef T type; };
template <typename T> struct remove_extent<T const[]> { typedef T const type; };
template <typename T> struct remove_extent<T volatile[]> { typedef T volatile type; };
template <typename T> struct remove_extent<T const volatile[]> { typedef T const volatile type; };
#endif
#endif

还是老样子，数组就特判掉，然后返回其头，否则就返回他的本身。

remove_reference

这个名字就很棒，就是移除引用的意思。同样他也是模版元技术，他先将所有的类型映射为自己，然后通过模版偏特化的方式将那些引用映射为本身。这里有一个c++的特性即下面代码
这个代码看懂的人应该不多了。

#include <iostream>

void f(int& x) { std::cout << "&" << std::endl; }
void f(int&& x) { std::cout << "&&" << std::endl; }

int main() {
  int a = 1, b = 2, c = 3, d = 4;
  f(a);
  f(b);
  f(c);
  f(d);
  f(1);
  f(2);
  f(3);
  f(4);
}

这里的&&就是右值引用的意思，所以输出是

&
&
&
&
&&
&&
&&
&&

然后我们来看源码

namespace detail{
//
// We can't filter out rvalue_references at the same level as
// references or we get ambiguities from msvc:
//
template <class T>
struct remove_rvalue_ref
{
   typedef T type;
};
#ifndef BOOST_NO_CXX11_RVALUE_REFERENCES
template <class T>
struct remove_rvalue_ref<T&&>
{
   typedef T type;
};
#endif

} // namespace detail

template <class T> struct remove_reference{ typedef typename boost::detail::remove_rvalue_ref<T>::type type; };
template <class T> struct remove_reference<T&>{ typedef T type; };

#if defined(BOOST_ILLEGAL_CV_REFERENCES)
// these are illegal specialisations; cv-qualifies applied to
// references have no effect according to [8.3.2p1],
// C++ Builder requires them though as it treats cv-qualified
// references as distinct types...
template <class T> struct remove_reference<T&const>{ typedef T type; };
template <class T> struct remove_reference<T&volatile>{ typedef T type; };
template <class T> struct remove_reference<T&const volatile>{ typedef T type; };
#endif

同样的我们使用模版元技术，将引用就消除了。

decay

这个模版元的意思是移除引用、移除const、移除volatile、数组移除范围、函数变成指针。

namespace detail
{

   template <class T, bool Array, bool Function> struct decay_imp { typedef typename remove_cv<T>::type type; };
   template <class T> struct decay_imp<T, true, false> { typedef typename remove_bounds<T>::type* type; };
   template <class T> struct decay_imp<T, false, true> { typedef T* type; };

}

 template< class T >
 struct decay
 {
 private:
     typedef typename remove_reference<T>::type Ty;
 public:
    typedef typename boost::detail::decay_imp<Ty, boost::is_array<Ty>::value, boost::is_function<Ty>::value>::type type;
 };

实际上做起来的时候是先移除引用，最后移除cv的。

any

如{ post_link ‘Boost-学习笔记2-Boost-Any’ Any接口学习 }所示，any能够支持我们的c++向python一样，给一个变量瞎赋值，这也太爽了。

构造函数如下

template<typename ValueType>
any(const ValueType & value)
  : content(new holder<
        BOOST_DEDUCED_TYPENAME remove_cv<BOOST_DEDUCED_TYPENAME decay<const ValueType>::type>::type
    >(value))
{
}

这里是接受任意的类型，然后对这个类型使用decay得到他的基本类型，最后让holder来替我们管理。holder保持了一个输入参数的副本，我们发现这个holder类型的值放到了一个叫content的指针中。

holder

holder继承自placeholder，placeholder是一个接口，我们不去管他，holder内部的副本保存在held中。

        template<typename ValueType>
        class holder
#ifndef BOOST_NO_CXX11_FINAL
          final
#endif
          : public placeholder
        {
        public: // structors

            holder(const ValueType & value)
              : held(value)
            {
            }

#ifndef BOOST_NO_CXX11_RVALUE_REFERENCES
            holder(ValueType&& value)
              : held(static_cast< ValueType&& >(value))
            {
            }
#endif
        public: // queries

            virtual const boost::typeindex::type_info& type() const BOOST_NOEXCEPT
            {
                return boost::typeindex::type_id<ValueType>().type_info();
            }

            virtual placeholder * clone() const
            {
                return new holder(held);
            }

        public: // representation

            ValueType held;

        private: // intentionally left unimplemented
            holder & operator=(const holder &);
        };

any数据类型的读取

any数据有两种读取方式，一是指针，想要读取出里面的元素，显然元素是operand->content->held, 我们要得到他的指针的话，先构造出指针来： holder<remove_cv<ValueType>::type>*, 因为operand->content是placeholer,这也就是为什么下面的代码的括号在->held之前的原因。最后用boost::addressof取出地址就可以了。

template<typename ValueType>
ValueType * any_cast(any * operand) BOOST_NOEXCEPT
{
    return operand && operand->type() == boost::typeindex::type_id<ValueType>()
        ? boost::addressof(
            static_cast<any::holder<BOOST_DEDUCED_TYPENAME remove_cv<ValueType>::type> *>(operand->content)->held
          )
        : 0;
}

第二种方式是读取拷贝，先移除引用，调用上面的指针读取，最后指针取内容就可以返回了。

    template<typename ValueType>
    ValueType any_cast(any & operand)
    {
        typedef BOOST_DEDUCED_TYPENAME remove_reference<ValueType>::type nonref;


        nonref * result = any_cast<nonref>(boost::addressof(operand));
        if(!result)
            boost::throw_exception(bad_any_cast());

        // Attempt to avoid construction of a temporary object in cases when
        // `ValueType` is not a reference. Example:
        // `static_cast<std::string>(*result);`
        // which is equal to `std::string(*result);`
        typedef BOOST_DEDUCED_TYPENAME boost::conditional<
            boost::is_reference<ValueType>::value,
            ValueType,
            BOOST_DEDUCED_TYPENAME boost::add_reference<ValueType>::type
        >::type ref_type;
#ifdef BOOST_MSVC
#   pragma warning(push)
#   pragma warning(disable: 4172) // "returning address of local variable or temporary" but *result is not local!
#endif
        return static_cast<ref_type>(*result);
#ifdef BOOST_MSVC
#   pragma warning(pop)
#endif
    }

any的成员函数

前两个就不说了，直接说第三个，如果content存在，就调用content的type

bool empty() const BOOST_NOEXCEPT
{
    return !content;
}

void clear() BOOST_NOEXCEPT
{
    any().swap(*this);
}

const boost::typeindex::type_info& type() const BOOST_NOEXCEPT
{
    return content ? content->type() : boost::typeindex::type_id<void>().type_info();
}

type是这样实现的

virtual const boost::typeindex::type_info& type() const BOOST_NOEXCEPT
{
    return boost::typeindex::type_id<ValueType>().type_info();
}

2020-03-16

数据结构

Y快速前缀树

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

Y快速前缀树

继X快速前缀树以后，Dan Willard又提出了X快速前缀树的改进版本

改进X快速前缀树

我们还是继续考虑n个小于M的整数(n<M)，我们按照大小，从小到大分组,每组的元素的个数在区间$[\frac{lgM}{4},2lgM]$上,对每组构建一颗普通平衡树，这样我们一共会得到$\frac{n}{2lgM}$到$\frac{4n}{lgM}$颗树，我们在这些树中取一个随便的值r，r要在平衡树的最大和最小值之间,这样我们每棵树对应了一个r，把所有的r作为键,其对应的平衡树作为值放到X快速平衡树中，这就是Y快速平衡树。

Y快速前缀树的查找前驱后继

首先在上层X前缀树上查找前驱和后继，最终我们会定位到两个叶子节点，也就对应了两颗普通平衡树，我们在这两颗普通平衡树里面直接找前驱后继然后合并就结束了。总复杂度$lglg(\frac{n}{lgM})+2*lg(lgM)=lglgM$

Y快速前缀树的插入

先在X前缀树上查询前驱后继，然后在其对应的平衡树上插入真正要插入的值，总复杂度$lglg(\frac{n}{lgM})+lglgM=lglgM$，这里有可能导致插入的值太多，进行分裂，我们来看一下这次分裂前插入了多少元素，我们考虑最坏的情况，不妨设之前也是分裂的，那么他的大小为$lgM$，到这次分裂一共插入了lgM+1个元素，导致现在的大小超过了2lgM，于是这lgM次插入的均摊分裂复杂度为$\frac{lg(\frac{n}{lgM})}{lgM}\lt 1$,于是总复杂度为$lglgM$

Y快速前缀树的删除

同样的，我们还是先查询前驱后街，然后在对应的平衡树上删除真正要删除的值，总复杂度为$lglg(\frac{n}{lgM})+lglgM=lglgM$,这里有可能导致平衡树上剩下的值太少，我们考虑合并，合并后如果依然太大，比方说大于lgM，我们就再次分裂为两个平衡树即可。这里可以证明，均摊复杂度依然是O1，因为从$\frac{lgM}{2}$到$\frac{lgM}{4}$也要$\frac{lgM}{4}$次删除，均摊下来，依然是O1,为了懒惰删除，我们甚至可以不必要求合并后超过lgM猜分裂，到2lgM也行。懒惰总是优秀的。于是总体复杂度为$lglgM$

总结

至此，Y快速前缀树的所有操作均为lglgM了.

代码

欠着欠着，以后再补。

2020-03-16

数据结构

X快速前缀树

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

X快速前缀树

我以前就说过，当你的数据结构达到了一定的基础，就可以学习那些更加高级的数据结构了，往往那些更加高级的数据结构由基本数据结构组合而成。

先提出一个问题

现在要你维护一个多重集合，支持3种操作，1询问一个值(这个值不一定在集合中)的前驱和后继，2向集合中插入一个元素，3从集合中删掉一个元素，1操作$10^6$次，2操作$10^5$次，3操作$10^5$,要求你在1S内完成回答。保证集合元素都小于M=$10^6$

普通平衡树？

我们考虑到上面三种操作都是普通平衡树常见的操作，很多人可能就直接拿起他自己的平衡树上了。很遗憾，大概率是无法通过的。因为操作次数太多了。

观察，思考

我们的操作需要的是大量的查询，大量、大量，你个普通平衡树怎么操作的过来？

新的平衡树

现在我们提出一个新的平衡树，这个平衡树非常厉害，他支持$O(lgM)$的时间来删除和插入，支持$O(lglgM)$的时间来查询前驱后继。

X快速前缀树

字典树+哈希表+维护叶子节点的双向链表+二分
首先，我们先建立一颗普通的01字典树，这个树我们对他稍作修改，考虑到字典树的节点分3类：叶子节点、根节点、内部节点，我们让叶子节点间构造双向环状链表，其次，对于仅有左儿子的内部节点，让其右儿子指向子树的最小叶子节点，对于仅有右儿子的内部节点，让其左儿子指向子树的最大叶子节点。另一方面，我们对字典树的每一层都建立一个hash表，hash掉他们节点所代表的值是有还是没有，这样我们就构造出了一个X快速前缀树的模型了。

X快速前缀树的查找

假设我们要找一个数k，我们在树上寻找树和k的lca[最低公共祖先]，这个过程可以只要在hash表中二分即可知道lca在哪一个节点，可以证明，这个节点要么没有左儿子，要么没有右儿子。如果有的话，他的儿子和k的lcp[最长公共前缀]一定更长，但是这与他自己是lca的事实相悖。另一方面，由于我们的单儿子节点储存了最值信息，如果这个节点没有右儿子，则他的右儿子指向的值是k的前驱，至于后继，在叶子节点的链表上后移一个单位即可。这个过程总复杂度在二分lca上，树的高度为lgn，二分高度，所以总体复杂度为$O(lglgn)$

X快速前缀树的插入

找出lca，若lca没有右儿子，则说明当前节点要插入到右儿子里面。照着做就行了，同时注意向下递归的时候把值也插入到hash表里面，递归到叶子的时候重新连接双向环状链表(前驱和后继)，最后回溯的时候维护单儿子节点的信息，以及树根方面的值就行了。

X快速前缀树的删除

找到待删除节点，然后删掉，重新维护叶子链表，回溯的时候从hash表里面删掉自己，对于单儿子的节点也要根据子树回溯即可。

2020-03-16

数据结构

三叉搜索树

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

ternary search tree

字典树的缺点

不难想到，对于那些字符集特别大的字典树来说，他们的空间消耗特别大，因为每个节点都要储存大量的指针，而这些指针往往又是空的。

将BST与trie结合起来

考虑这样一种树，每个节点有三个儿子，左右儿子表示自己的左右兄弟，向下的儿子表示真正的儿子。这样的树，将极大的提高了空间利用率。

偷个图来放着

这里插入了as,at,be,by,he….

三叉搜索树的插入

考虑递归，假设我们插入到了某个节点，若下儿子与当前字符相等，则递归到下儿子并使用下一个字符来递归，如果当前字符小于下儿子，则递归到左儿子，保持当前字符不变，如果当前节点不存在了，则创建新节点，直接向下儿子走。

三叉搜索树的删除

我们还是用数字来记录终结节点的终结字符串有多少个，若找到待删除的节点以后，终止与此的只有一个字符串，则直接删掉，否则让终极节点的计数器减1，注意在回溯的时候，如果三个儿子都没有终结字符了，就删掉自己。

三叉搜索树的查找

递归递归。

三叉搜索树的缺点

树的平衡是一个很大的问题，这个我也没办法

三叉搜索树的本质

很多时候，很多数据结构像变戏法一样，我们从本质看带三叉搜索树，我们发现下儿子其实是字典树的边，在考虑左右儿子，其实这个就是bst，哈哈哈发现了没有，
我们考虑删掉所有下儿子，你会发现，剩下的是一个bst森林，就像lct删掉指向父亲没有指向自己的边以后，就是splay森林一样，很神奇。我们将这些bst森林转化为一个一个普通的数组，让这些数组从新充当节点，然后将下儿子连回去，
一切都清晰了，又变成普通字典树了。
所以三叉搜索树的本质是优化后的字典树，每个节点都是一个bst。即trie套bst，外层树为trie，内层树为bst。

三叉搜索树的优化？

我们让这里的bst用avl代替？用rbt代替？用sbt代替？都可以，但是我觉得这样编码太难了吧，若实在是真的差这点效率，可以去这样做，但我认为，把普通字典树的节点用avl-map、rbt-map、sbt-map直接范型编程或设为类中类不香吗。在这玩树套树确实高大上，效率也高，但编码难度也太高了。

代码

先欠着，以后再还。

2020-03-16

ACM►学习笔记►字符串

基数树

next hexonext butterfly volantis yearn yilia shoka indigo apollo landscape cactus matery icarus fluid material

基数树

基数树是一种更加节省空间的数据结构，他是字典树的升华，

字典树的缺陷

常常字典树会很深，而不胖，这会导致空间的浪费，因为里面的指针很多，往往我们发现，如下列字典树
稍等片刻！正在将字符数据转化为图形

graph LR
start((start))--a--> 1((1))
1((1))--b-->2((2))
2((2))--c-->3((end))
2((2))--d-->3((end))

用基数树改进字典树

我们可以通过压缩字符路径为字符串路径，即将长链压缩为一条边。

graph LR
start((start))--ab-->2((2))
2((2))--c-->3((end))
2((2))--d-->3((end))

当然你甚至还能这样

1
2
3

graph LR
start((start))--abc-->3((end))
start((start))--abd-->3((end))

这些都是合法的基数树。注意，基数树仍然是字典树，只不过他压缩了路径

用基数树加速IP路由检索

路由检索常常是检索一个01字符串，这时候我们可以通过压缩的方式，每两位压缩、或者三位、四位压缩，能够让查找次数更少，当然这样做可能会牺牲空间，但是他依然是基数树优化的字典树。