Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd., a DeepSeek affiliate, today filed a patent for a new web data collection system designed to improve collection efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses downloaded content to predict the quality of undiscovered links, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek's proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
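
The patent text itself is not reproduced here, so the following is only a rough sketch of the general idea described above: score each downloaded page, then use that score to rank the links found on it, so that links from high-quality pages are fetched first and no URL is downloaded twice. The `crawl` function, its `fetch` and `score` parameters, and the rule that a link inherits its parent page's score are all illustrative assumptions, not details from the filing.

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Tuple[str, List[str]]],
    score: Callable[[str], float],
    budget: int,
) -> List[str]:
    """Download at most `budget` pages, visiting the most promising links first.

    `fetch(url)` returns (page_text, outgoing_links); `score(text)` returns a
    quality estimate where higher is better. Both are hypothetical callables
    supplied by the caller.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier: List[Tuple[float, int, str]] = []
    seen: Set[str] = set()
    counter = 0  # tie-breaker: equal priorities are served in insertion order

    for url in seeds:
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-1.0, counter, url))
            counter += 1

    visited: List[str] = []
    while frontier and len(visited) < budget:
        _, _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        visited.append(url)
        quality = score(text)  # assessed quality of the downloaded page
        for link in links:
            if link in seen:  # never queue the same URL twice
                continue
            seen.add(link)
            # Assumption: a not-yet-downloaded link is predicted to be
            # about as good as the page it was found on.
            heapq.heappush(frontier, (-quality, counter, link))
            counter += 1
    return visited
```

With a fixed download budget, a link found on a high-scoring page is fetched ahead of links found earlier on weaker pages, which is one simple way to reduce low-value downloads. A production crawler would also need per-site rate limiting to keep traffic impact low, which this sketch leaves out.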