In case you are wondering, here is a quick log of the Cheméo relaunch. The log is also on the Cheméo homepage, but some of you follow the news here. So, here it is:
30 Jan: Since the 28th, the new crawler has been performing pretty well. We will soon have a fresh dataset to start Cheméo again.
2 Feb: I guess we now have 90% of the data; the last 10% will most likely take as long as the first 90%. To avoid waiting too long, the parsing will start tonight or tomorrow and the remaining data will be integrated later.
3 Feb: The parsing is going well. The next step is merging the data to build a record for each compound. Analysis and computations are then performed from this record. The critical step will be refreshing the record with the data from the crawler.
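For the curious, the merging step can be pictured roughly like this. It is only a minimal sketch and not the actual Cheméo code: the field names (`compound_id`, `source`, `properties`), the sample values, and the "last document wins" rule are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_documents(parsed_docs):
    """Group parsed documents by compound and merge each group into one record.

    Hypothetical document layout: each parsed document is a dict with a
    "compound_id", a "source" and a "properties" dict.
    """
    records = defaultdict(lambda: {"sources": [], "properties": {}})
    for doc in parsed_docs:
        record = records[doc["compound_id"]]
        record["sources"].append(doc["source"])
        # Assumption: when two documents give the same property,
        # the one seen last overwrites the earlier value.
        record["properties"].update(doc["properties"])
    return dict(records)


# Made-up sample input, for illustration only.
parsed = [
    {"compound_id": "cmpd-1", "source": "site-a", "properties": {"formula": "X"}},
    {"compound_id": "cmpd-1", "source": "site-b", "properties": {"tb": 2.0}},
    {"compound_id": "cmpd-2", "source": "site-a", "properties": {"formula": "Y"}},
]
records = merge_documents(parsed)
```

Refreshing a record with newer crawler data would then amount to running the same merge again with the new documents appended, which is why the overwrite rule matters.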
4 Feb: Crawling, parsing and merging are operating very well together. The pipeline is running smoothly. The next step is performing the calculations and generating the indexing data. This is a bit harder. I will most likely start it tomorrow.
12 Feb: Some quick stats: parsing the crawled data takes about 5h, and merging all the documents referring to a single compound into a master document takes a bit over 1h. The extension and indexing of each master document, which generates the production-ready document, is not yet completed. It has a lot of control cases and I am clearing them as they come.
13 Feb: The first batch of production-ready documents is complete... Now the fun part is coming: insertion in the index and relaunch! Update: the first insert was OK, I will start the validation on the test server soon.