Features:
1. A dual-style split-frame layout unique to the Tonight Online (今晚在线) community.
2. Categories can be nested up to 3 levels deep, which should be enough for most forums.
3. Several post types: regular posts, points-gated posts (viewable only with enough points), paid posts (bought with forum currency), point-giveaway (help-request) posts, and reply-to-view posts.
4. Friendlier posting aids that automatically recognize URLs and images.
5. An admin back end modeled on the Tonight Online article system, with custom management levels in an unlimited hierarchy, well suited to multi-user administration.
6. UTF-8 encoding, which avoids garbled text when posting Traditional Chinese or other non-Chinese characters.
7. CSS controls the colors and style of the entire community.
8. Boards offer many optional attributes, making the community more flexible.
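Feature 4 (automatic recognition of URLs and images) could work roughly like the Python sketch below. It is an illustration under stated assumptions, not the Tonight Online code: the `autolink` name, the URL pattern, and the rule that a URL ending in a common image extension becomes an `<img>` tag are all hypothetical. The post text is HTML-escaped first so that user input cannot inject markup.

```python
import html
import re

# Hypothetical pattern: an http(s) URL runs until whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"""https?://[^\s<>"']+""")
# Hypothetical list of extensions treated as images.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def autolink(post: str) -> str:
    """Escapes a plain-text post and turns bare URLs into links or inline images."""
    out = []
    last = 0
    for m in URL_RE.finditer(post):
        # Trailing punctuation usually ends the sentence, not the URL.
        url = m.group(0).rstrip(".,;:!?)")
        end = m.start() + len(url)
        out.append(html.escape(post[last:m.start()]))
        safe = html.escape(url, quote=True)
        path = url.lower().split("?", 1)[0]  # ignore the query string
        if path.endswith(IMAGE_EXTS):
            out.append(f'<img src="{safe}" alt="">')
        else:
            out.append(f'<a href="{safe}">{safe}</a>')
        last = end  # any stripped punctuation is emitted with the following text
    out.append(html.escape(post[last:]))
    return "".join(out)
```

For example, `autolink("see http://x.test/a.png.")` yields `see <img src="http://x.test/a.png" alt="">.`, keeping the sentence-final period outside the tag.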
1. Crawl restricted to one chosen topic.
2. Produce a plain-text log file in the format: timestamp, URL.
3. Open at most 2 connections when fetching any one URL (note: the number of local threads used to parse pages is not limited).
4. Follow polite-spider rules: check robots.txt and the page's meta tags for restrictions, and have each thread sleep 2 seconds after it finishes fetching a page.
5. Parse HTML pages and extract link URLs; detect whether an extracted URL has already been processed, and never re-parse a page that has already been crawled.
6. Allow basic spider/crawler parameters to be set, including crawl depth and seed URLs.
7. Identify the crawler to servers with a User-agent string.
8. Produce crawl statistics: crawl speed, total time to complete, and total number of pages fetched. Comment important variables and all classes and methods.
9. Follow coding conventions, such as naming conventions for classes, methods, and files.
10. Optional: a GUI or web interface for managing the spider/crawler, including starting/stopping it and adding/removing URLs.
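The requirements above can be sketched as a small single-threaded Python crawler built only on the standard library. This is a hedged illustration, not a reference solution: the names `crawl`, `PageParser`, `make_robots_checker`, `fetch_http`, the `USER_AGENT` string, and the "page text contains the topic keyword" test used for item 1 are all hypothetical choices. Because it fetches one page at a time it never opens more than one connection per URL, so the 2-connection cap of item 3 holds trivially; a multi-threaded version would need something like a per-host `threading.BoundedSemaphore(2)`. The optional interface of item 10 is left out.

```python
import time
import urllib.request
import urllib.robotparser
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

USER_AGENT = "ExampleSpider/0.1"  # hypothetical identity sent to servers (item 7)


class PageParser(HTMLParser):
    """Collects absolute links, visible text, and the robots meta directive (items 4, 5)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text = []
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links and drop #fragments so duplicates compare equal.
            link, _ = urldefrag(urljoin(self.base_url, attrs["href"]))
            self.links.append(link)
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            directives = {d.strip() for d in (attrs.get("content") or "").lower().split(",")}
            self.nofollow = self.nofollow or bool(directives & {"nofollow", "none"})

    def handle_data(self, data):
        self.text.append(data)


def make_robots_checker(user_agent=USER_AGENT):
    """Returns allowed(url), caching one parsed robots.txt per host (item 4)."""
    cache = {}

    def allowed(url):
        parts = urlparse(url)
        root = f"{parts.scheme}://{parts.netloc}"
        if root not in cache:
            rp = urllib.robotparser.RobotFileParser(root + "/robots.txt")
            rp.read()  # network call; most 4xx answers are treated as "allow all"
            cache[root] = rp
        return cache[root].can_fetch(user_agent, url)

    return allowed


def fetch_http(url):
    """Downloads one page, sending the crawler's User-agent (item 7)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds, topic, fetch, allowed, log, max_depth=2, delay=2.0):
    """Breadth-first crawl from the seed URLs (item 6); returns the statistics of item 8."""
    seen = set(seeds)  # every URL ever queued, so no page is crawled twice (item 5)
    queue = deque((url, 0) for url in dict.fromkeys(seeds))
    pages = 0
    start = time.perf_counter()
    while queue:
        url, depth = queue.popleft()
        if not allowed(url):  # robots.txt forbids this URL (item 4)
            continue
        try:
            html = fetch(url)
        except OSError:  # unreachable page or HTTP error: skip it
            continue
        pages += 1
        log(f"{time.time():.3f}\t{url}\n")  # item 2: timestamp, URL
        time.sleep(delay)  # item 4: pause after each page
        parser = PageParser(url)
        parser.feed(html)
        parser.close()
        on_topic = topic.lower() in "".join(parser.text).lower()  # item 1 (keyword test)
        if parser.nofollow or not on_topic or depth >= max_depth:
            continue
        for link in parser.links:
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    elapsed = time.perf_counter() - start
    return {"pages": pages, "seconds": elapsed,
            "pages_per_second": pages / elapsed if elapsed else 0.0}
```

For a live run, pass `fetch_http`, `make_robots_checker()`, and the `write` method of a log file opened in append mode, e.g. `crawl(["http://example.com/"], "spider", fetch_http, make_robots_checker(), logfile.write)`. Injecting `fetch`, `allowed`, and `log` as parameters is a design choice that keeps the crawl loop testable offline with fake pages.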