
[Algolia] Working Around the Record Size Limit

Oct 15, 2023 · Blog · 572 words in 4 min

Today I wanted to update the index data for my Algolia search, only to find that on the free plan every record in the uploaded JSON file is limited to 10 KB, so the upload fails with an error.

The free plan gives each index up to 1 GB, 10K requests per month, at most 1M records, and 3 queries per second, which is plenty for personal use. I don't feel like upgrading, so I'll make do with the free tier; paying is simply not going to happen.
The workaround is to reprocess the Hexo-generated JSON file and resize every record to stay under 10 KB, splitting oversized post content into several smaller records.

# -*- coding: utf-8 -*-
import json

MAXSIZE = 7500  # target chunk size in bytes, kept well under Algolia's 10 KB record limit
new_json = []


def cut_content(encoded_content):
    '''Split UTF-8 encoded bytes into a list of strings of at most MAXSIZE bytes each.'''
    unit_length = MAXSIZE  # byte length of each chunk
    decoded_list = []      # decoded string chunks
    start_index = 0
    end_index = start_index + unit_length

    while start_index < len(encoded_content):

        # make sure we do not run past the end of the byte data
        if end_index > len(encoded_content):
            end_index = len(encoded_content)

        # try to decode the current slice
        try:
            decoded_string = encoded_content[start_index:end_index].decode(
                'utf-8')
            decoded_list.append(decoded_string)
        except UnicodeDecodeError:
            # the slice ends in the middle of a multi-byte character:
            # shrink it by one byte and try again
            end_index -= 1
        else:
            # decoding succeeded, move on to the next chunk
            start_index = end_index
            end_index = start_index + unit_length
    return decoded_list


def add_new_entries(content_list, old_entry):
    '''Build one new record per chunk and append them to new_json.'''
    new_entry = old_entry.copy()
    del new_entry['content']
    for idx, s in enumerate(content_list):
        entry = new_entry.copy()
        entry["idx"] = idx
        entry["content"] = s
        new_json.append(entry)


def gene_file():
    '''Write the resized records to a new JSON file.'''
    with open('./data.json', 'w', encoding='utf-8') as file:
        json.dump(new_json, file, ensure_ascii=False, indent=4)


def main():
    with open('./c2VhcmNo.json', 'r', encoding='utf-8') as json_file:
        json_data = json.load(json_file)
    for i in range(len(json_data)):
        content = json_data[i]["content"]
        encoded_content = content.encode('utf-8')
        psize = len(encoded_content)
        if psize > MAXSIZE:
            # content is too large: split it and emit several smaller records
            content_list = cut_content(encoded_content)
            add_new_entries(content_list, json_data[i])
        else:
            # small enough already, keep the record as-is
            new_json.append(json_data[i])
    gene_file()


if __name__ == "__main__":
    main()
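
Before uploading, it is worth sanity-checking the output. The sketch below (standard library only, assuming the script above has already written ./data.json) prints any record whose serialized size is still over the 10 KB ceiling, plus the overall totals; how Algolia measures record size may differ slightly, so treat the numbers as an approximation.

# -*- coding: utf-8 -*-
import json

LIMIT = 10_000  # approximate free-plan record size ceiling, in bytes

with open('./data.json', 'r', encoding='utf-8') as f:
    records = json.load(f)

total = 0
for i, record in enumerate(records):
    # serialize the record the same way data.json stores it and measure the bytes
    size = len(json.dumps(record, ensure_ascii=False).encode('utf-8'))
    total += size
    if size > LIMIT:
        print(f'record {i}: {size} bytes, over the {LIMIT}-byte limit')

print(f'{len(records)} records, {total / 1024 / 1024:.2f} MB in total')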

After importing, the 194 records now come to only 1.54 MB; I doubt I will ever exhaust the 1M-record / 1 GB limit in this lifetime.
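
For reference, instead of uploading the JSON file by hand, the resized records could also be pushed with Algolia's official Python client. This is only a sketch, not the upload path used in this post; it assumes the algoliasearch v2/v3 client, and YOUR_APP_ID, YOUR_ADMIN_API_KEY and your_index_name are placeholders.

# -*- coding: utf-8 -*-
import json
from algoliasearch.search_client import SearchClient

with open('./data.json', 'r', encoding='utf-8') as f:
    records = json.load(f)

# placeholders: substitute your own application ID, admin API key and index name
client = SearchClient.create('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY')
index = client.init_index('your_index_name')

# save_objects adds or updates records; the request option asks Algolia to
# generate objectIDs for records that do not carry one
index.save_objects(records, {'autoGenerateObjectIDIfNotExist': True})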
