[Algolia] Working around the record size limit

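Today I wanted to update the index data for my Algolia search, but the free plan limits each record in the uploaded JSON file to 10 KB, so the upload failed with an error.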

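The free plan allows up to 1 GB per index, 10K requests per month, at most 1M records, and 3 queries per second, which is plenty for personal use. I don't want to upgrade and will make do with it; paying is out of the question.
The fix is to rewrite the JSON file generated by hexo so that every record stays under 10 KB.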

# -*- coding: utf-8 -*-
import json

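MAXSIZE = 7500  # max UTF-8 byte length of "content" per output record, kept under the 10 KB limit
new_json = []  # records that will be written to the new JSON file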


def cut_content(encoded_content):
    '''Split UTF-8 encoded bytes into a list of decoded strings of at most MAXSIZE bytes each.'''
    # Slicing at fixed byte offsets can cut a multi-byte character in half,
    # so the loop below shrinks the slice until it decodes cleanly.

    unit_length = MAXSIZE  # target length of each chunk, in bytes
    decoded_list = []  # decoded chunks
    start_index = 0
    end_index = start_index + unit_length

    while start_index < len(encoded_content):

        # Never slice past the end of the byte data
        if end_index > len(encoded_content):
            end_index = len(encoded_content)

        # Try to decode the current slice
        try:
            decoded_string = encoded_content[start_index:end_index].decode(
                'utf-8')
            decoded_list.append(decoded_string)
        except UnicodeDecodeError:
            # The slice ends in the middle of a character: shrink it by one byte and retry
            end_index -= 1
        else:
            # Decoding succeeded: move on to the next chunk
            start_index = end_index
            end_index = start_index + unit_length
    return decoded_list


def add_new_entries(content_list, old_entry):
    '''Build one new entry per content chunk and append each to new_json.'''
    new_entry = old_entry.copy()
    del new_entry['content']
    for idx, s in enumerate(content_list):
        entry = new_entry.copy()
        entry["idx"] = idx
        entry["content"] = s
        new_json.append(entry)


def gene_file():
    '''Write the new JSON file.'''
    with open('./data.json', 'w', encoding='utf-8') as file:
        json.dump(new_json, file, ensure_ascii=False, indent=4)


def main():
    with open('./c2VhcmNo.json', 'r', encoding='utf-8') as json_file:
        json_data = json.load(json_file)
    for entry in json_data:
        # Measure the content in UTF-8 bytes, not characters
        encoded_content = entry["content"].encode('utf-8')
        if len(encoded_content) > MAXSIZE:
            # Too large: split into several records that share the other fields
            content_list = cut_content(encoded_content)
            add_new_entries(content_list, entry)
        else:
            new_json.append(entry)
    gene_file()


if __name__ == "__main__":
    main()
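
The script above finds a safe cut point by retrying the decode until it succeeds. As a rough sketch of another way to do the same thing, the cut point can be found by looking at the bytes directly: in UTF-8, every continuation byte has the bit pattern 10xxxxxx, so stepping back past those bytes lands on the start of a character. The function name split_utf8 and its parameters are made up for this illustration, and it assumes the input is valid UTF-8.

```python
from typing import List


def split_utf8(data: bytes, max_bytes: int) -> List[str]:
    """Split valid UTF-8 bytes into decoded chunks of at most max_bytes bytes each."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes
        raise ValueError("max_bytes must be at least 4")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Continuation bytes look like 0b10xxxxxx. If the byte at `end` is one,
        # the slice would cut a character in half, so step back to its first byte.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


# Each Chinese character here takes 3 bytes, so a 7-byte budget fits two per chunk.
parts = split_utf8("你好世界".encode("utf-8"), 7)
print(parts)
```

Like the original loop, this never backs up more than three bytes per chunk, and joining the chunks gives back the original text.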

After the import, 194 records take up only 1.54 MB, so I will probably never hit the 1 GB or 1 million record limit in this lifetime.
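
One thing worth checking before uploading: MAXSIZE only caps the content field, while the other fields of a record also count toward its size. Below is a small sketch for measuring whole records. The helper names record_size and oversized are invented here, and measuring a record as compact UTF-8 JSON is my own approximation; Algolia's exact size calculation may differ.

```python
import json

LIMIT_BYTES = 10_000  # assumed per-record limit, taken from the "10K" mentioned in this post


def record_size(record):
    """Approximate the size of one record as compact, UTF-8 encoded JSON."""
    text = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def oversized(records, limit=LIMIT_BYTES):
    """Return (index, size) pairs for every record larger than limit bytes."""
    sizes = [(i, record_size(r)) for i, r in enumerate(records)]
    return [(i, size) for i, size in sizes if size > limit]


# In-memory demo; for the real file, pass in json.load() of ./data.json instead.
demo = [{"content": "a" * 20}, {"content": "a"}]
print(oversized(demo, limit=30))
```

If the returned list is empty for the generated file, every record is under the assumed limit by this measure.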

Author: slacr_
Published: October 15, 2023
Updated: October 15, 2023
