Package com.bytedesk.ai.ocr
Class OcrEventListener
java.lang.Object
com.bytedesk.ai.ocr.OcrEventListener
-
Field Summary
Fields
Modifier and Type    Field    Description
private final ChunkRestService    chunkRestService
private final FileRestService    fileRestService
private final RobotService    robotService
-
Constructor Summary
Constructors
OcrEventListener()
-
Method Summary
Modifier and Type    Method    Description
private String    cleanOcrText(String text)    Cleans the OCR text.
private List<org.springframework.ai.document.Document>    convertTextToDocuments(String extractedText, UploadEntity upload)    Converts the OCR-extracted text into a list of Documents.
private List<String>    createChunksFromDocuments(List<org.springframework.ai.document.Document> documents, FileResponse fileResponse)    Creates Chunk records from Documents.
private FileResponse    findOrCreateFileRecord(UploadEntity upload, String extractedText)    Finds or creates the file record.
private boolean    isImageFile(UploadEntity upload)    Determines whether the upload is an image file.
private boolean    isValidOcrContent(String content)    Checks whether the OCR content is valid.
void    onUploadCreateEvent    Listens for upload-creation events, checks whether the upload is an image, and runs OCR recognition on it.
private void    processOcrResultToChunks(String extractedText, UploadEntity upload)    Processes the OCR result: converts the extracted text into Documents and stores them as chunks.
private List<org.springframework.ai.document.Document>    simpleTextSplit(String text, Map<String, Object> metadata)    Simple text-splitting method (fallback).
private void    updateFileDocIdList(String fileUid, List<String> chunkUids)    Updates the file record's docIdList.
-
Field Details
-
robotService
-
fileRestService
-
chunkRestService
-
-
Constructor Details
-
OcrEventListener
public OcrEventListener()
-
-
Method Details
-
onUploadCreateEvent
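Listens for upload-creation events, checks whether the upload is an image, and runs OCR recognition on it.
Parameters:
event - the upload-creation event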
-
isImageFile
Determines whether the upload is an image file.
Parameters:
upload - the upload entity
Returns:
whether the upload is an image file
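As a rough illustration of the kind of check isImageFile might perform, the sketch below tests a file name against a set of common image extensions. The class name ImageFileCheck, the method isImageFileName, the extension list, and the use of a plain file name in place of UploadEntity are all assumptions made for this example; the actual method may inspect a MIME type or other fields instead.

```java
import java.util.Locale;
import java.util.Set;

class ImageFileCheck {

    // Hypothetical extension list; the real listener may accept a different set.
    private static final Set<String> IMAGE_EXTENSIONS =
            Set.of("jpg", "jpeg", "png", "gif", "bmp", "webp");

    // Returns true when the file name ends with one of the listed image extensions.
    static boolean isImageFileName(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension present
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return IMAGE_EXTENSIONS.contains(ext);
    }
}
```

Lower-casing with Locale.ROOT keeps the comparison stable regardless of the JVM's default locale.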
-
processOcrResultToChunks
Processes the OCR result: converts the extracted text into Documents and stores them as chunks.
Parameters:
extractedText - the text extracted by OCR
upload - the upload entity
-
findOrCreateFileRecord
Finds or creates the file record.
-
convertTextToDocuments
private List<org.springframework.ai.document.Document> convertTextToDocuments(String extractedText, UploadEntity upload)
Converts the OCR-extracted text into a list of Documents.
-
simpleTextSplit
private List<org.springframework.ai.document.Document> simpleTextSplit(String text, Map<String, Object> metadata)
Simple text-splitting method (fallback).
-
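The page does not say how the simpleTextSplit fallback divides text, so the following is only a minimal sketch of one plausible approach: fixed-size windows that prefer to break at whitespace. The class SimpleTextSplitSketch, the maxChars parameter, and the List<String> return type are assumptions for illustration; the documented method takes a metadata map and returns Spring AI Document objects.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleTextSplitSketch {

    // Splits text into pieces of at most maxChars characters. When a piece
    // would end mid-text, the split point is moved back to the nearest
    // whitespace found in the second half of the window, if there is one.
    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isBlank() || maxChars <= 0) {
            return chunks;
        }
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                // Search backwards for whitespace so words are not cut in half.
                for (int i = end; i > start + maxChars / 2; i--) {
                    if (Character.isWhitespace(text.charAt(i - 1))) {
                        end = i;
                        break;
                    }
                }
            }
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                chunks.add(piece);
            }
            start = end; // always advances, so the loop terminates
        }
        return chunks;
    }
}
```

A caller would then wrap each returned piece in a Document together with the shared metadata.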
cleanOcrText
Cleans the OCR text.
-
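The cleaning rules applied by cleanOcrText are not documented here. As a hedged sketch of typical OCR clean-up, the example below normalizes line endings, removes stray control characters, and collapses redundant whitespace. The class OcrTextCleanerSketch and every rule in it are assumptions, not the library's actual behavior.

```java
class OcrTextCleanerSketch {

    // Applies a few common normalization steps to raw OCR output.
    static String clean(String text) {
        if (text == null) {
            return "";
        }
        return text
                .replace("\r\n", "\n").replace('\r', '\n')   // unify line endings
                .replaceAll("[\\p{Cntrl}&&[^\\n\\t]]", "")   // drop control chars except newline and tab
                .replaceAll("[ \\t]+", " ")                  // collapse runs of spaces and tabs
                .replaceAll(" ?\\n ?", "\n")                 // strip spaces around newlines
                .replaceAll("\\n{3,}", "\n\n")               // keep at most one blank line
                .trim();
    }
}
```

The order matters: line endings are unified first so the later newline rules only have to handle a single form.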
createChunksFromDocuments
private List<String> createChunksFromDocuments(List<org.springframework.ai.document.Document> documents, FileResponse fileResponse)
Creates Chunk records from Documents.
-
isValidOcrContent
Checks whether the OCR content is valid.
-
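What isValidOcrContent counts as valid is not stated on this page. One plausible rule, shown below purely as an assumption, is to reject empty results and results that contain too few letters or digits to be worth indexing. The class OcrContentValidatorSketch and the threshold MIN_MEANINGFUL_CHARS are hypothetical.

```java
class OcrContentValidatorSketch {

    // Hypothetical threshold; the real method may use a different rule entirely.
    private static final int MIN_MEANINGFUL_CHARS = 5;

    // Returns true when the content has at least MIN_MEANINGFUL_CHARS letters or digits.
    static boolean isValid(String content) {
        if (content == null || content.isBlank()) {
            return false;
        }
        // Count letters and digits by code point; punctuation and whitespace are ignored.
        long meaningful = content.codePoints()
                .filter(Character::isLetterOrDigit)
                .count();
        return meaningful >= MIN_MEANINGFUL_CHARS;
    }
}
```

Counting by code point means CJK characters count as letters, which suits OCR output that mixes Chinese and Latin text.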
updateFileDocIdList
Updates the file record's docIdList.
-