Defeating Anti-Hotlinking by Setting the Referer Header
Published: 2019-06-21


package cn.searchphoto.util;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

/**
 * Downloads images from a remote site, setting the Referer header
 * to get past anti-hotlinking checks.
 *
 * @author JAVA世纪网 (java2000.net, laozizhu.com)
 */
public class ImageDownloader {
    /**
     * Downloads a file to the given location.
     *
     * @param imgurl URL to download
     * @param f      target file
     * @return the file on success, null on failure
     */
    public static File download(String imgurl, File f) {
        try {
            URL url = new URL(imgurl);
            URLConnection con = url.openConnection();
            // Take the host from the parsed URL instead of fixed string offsets,
            // so that https:// URLs and explicit ports are handled as well.
            con.setRequestProperty("Host",
                    url.getPort() == -1 ? url.getHost() : url.getHost() + ":" + url.getPort());
            // Send the image's own URL as the Referer so the request appears
            // to come from the site that hosts the image.
            con.setRequestProperty("Referer", imgurl);
            InputStream raw = con.getInputStream();
            // Unwrap the body if the server chose to gzip it.
            boolean gzipped = "gzip".equalsIgnoreCase(con.getContentEncoding());
            // try-with-resources closes both streams even if the copy fails.
            try (InputStream is = gzipped ? new GZIPInputStream(raw) : raw;
                 OutputStream os = new FileOutputStream(f)) {
                byte[] bs = new byte[1024];
                int len;
                while ((len = is.read(bs)) != -1) {
                    os.write(bs, 0, len);
                }
            }
            return f;
        } catch (Exception ex) {
            ex.printStackTrace();
            return null;
        }
    }
}
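For comparison, the same idea can be sketched in Python 3 with only the standard library. This is a minimal sketch, not the original author's code: `build_request` and `download` are illustrative names, and defaulting the Referer to the root of the image's own site is an assumption (the Java version sends the image URL itself). Some sites may check for a specific referring page instead.

```python
import shutil
import urllib.request
from urllib.parse import urlsplit


def build_request(img_url, referer=None):
    """Build a GET request that carries a Referer header.

    If no referer is given, the root of the image's own site is used
    (an assumption; pass an explicit page URL if the site requires one).
    """
    if referer is None:
        parts = urlsplit(img_url)
        referer = f"{parts.scheme}://{parts.netloc}/"
    return urllib.request.Request(img_url, headers={"Referer": referer})


def download(img_url, dest_path, timeout=10):
    """Stream the response body to dest_path and return dest_path.

    Unlike the Java version, failures raise an exception instead of
    returning null.
    """
    req = build_request(img_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```

Splitting request construction from the network call keeps the header logic easy to inspect without touching the network.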

 

#1 Handling cookies
import urllib2, cookielib
cookie_support = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
content = urllib2.urlopen('http://XXXX').read()

#2 Using a proxy together with cookies
# proxy_support is not defined in the original snippet; a urllib2.ProxyHandler
# with a placeholder address is assumed here.
proxy_support = urllib2.ProxyHandler({'http': 'http://XXXX'})
opener = urllib2.build_opener(proxy_support, cookie_support, urllib2.HTTPHandler)

#3 Handling forms
import urllib
# fk is a form value that must be obtained beforehand; the original does not
# show where it comes from.
postdata = urllib.urlencode({
    'username': 'XXXXX',
    'password': 'XXXXX',
    'continueURI': 'http://www.verycd.com/',
    'fk': fk,
    'login_submit': '登录'
})
req = urllib2.Request(
    url = 'http://secure.verycd.com/signin/*/http://www.verycd.com/',
    data = postdata
)
result = urllib2.urlopen(req).read()

#4 Masquerading as a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'
}
req = urllib2.Request(
    url = 'http://secure.verycd.com/signin/*/http://www.verycd.com/',
    data = postdata,
    headers = headers
)

#5 Defeating anti-hotlinking
headers = {
    'Referer': 'http://www.cnbeta.com/articles'
}
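The snippets above are written for Python 2, where `urllib2` and `cookielib` live; in Python 3 the same pieces are found in `urllib.request`, `urllib.parse`, and `http.cookiejar`. A rough Python 3 sketch of recipes #1 through #5 follows. The helper names `make_opener` and `make_request` are hypothetical and used only for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Browser-like User-Agent string taken from recipe #4.
DEFAULT_UA = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) "
              "Gecko/20091201 Firefox/3.5.6")


def make_opener(proxy=None):
    """Return (opener, cookie_jar); cookies persist across opener.open() calls."""
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy:
        # Route plain-HTTP traffic through the given proxy URL.
        handlers.append(urllib.request.ProxyHandler({"http": proxy}))
    return urllib.request.build_opener(*handlers), jar


def make_request(url, form=None, referer=None, user_agent=DEFAULT_UA):
    """Build a request with browser-like headers.

    Passing a form dict URL-encodes it into the body, which turns the
    request into a POST; a referer adds the anti-hotlinking header.
    """
    data = urllib.parse.urlencode(form).encode("utf-8") if form else None
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, data=data, headers=headers)
```

A request built this way would be sent with `opener.open(req)`, so that the cookie jar and any proxy apply to it.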

 

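#6 Concurrent fetching with multiple threads
from threading import Thread
from Queue import Queue
from time import sleep

# q is the task queue
# NUM is the total number of concurrent threads
# JOBS is the number of tasks
q = Queue()
NUM = 2
JOBS = 10

# The actual handler, which processes a single task
def do_somthing_using(arguments):
    print arguments

# The worker thread, which keeps pulling items off the queue and processing them
def working():
    while True:
        arguments = q.get()
        do_somthing_using(arguments)
        sleep(1)
        q.task_done()

# Start NUM threads that wait on the queue
for i in range(NUM):
    t = Thread(target=working)
    t.setDaemon(True)
    t.start()

# Put the JOBS onto the queue
for i in range(JOBS):
    q.put(i)

# Wait for all JOBS to finish
q.join()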
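The worker-queue pattern above can also be sketched in Python 3, where the module is named `queue` and `print` is a function. The wrapper `run_jobs` and its parameters are illustrative names, not part of the original. Unlike the original, this sketch collects results and catches handler exceptions, so that a failing task cannot kill a worker and leave `q.join()` waiting forever.

```python
import queue
import threading


def run_jobs(jobs, handler, num_workers=2):
    """Run handler(job) for every job on num_workers daemon threads.

    Returns a list of (job, outcome) pairs in completion order; if the
    handler raised, the outcome is the exception object.
    """
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            try:
                outcome = handler(job)
            except Exception as exc:
                outcome = exc
            with lock:
                results.append((job, outcome))
            # Mark the job finished so that q.join() can return.
            q.task_done()

    # Daemon threads exit together with the main program.
    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    for job in jobs:
        q.put(job)

    # Block until every queued job has been marked done.
    q.join()
    return results
```

For example, `run_jobs(range(10), fetch, num_workers=2)` would process ten jobs on two threads, where `fetch` stands for whatever per-item download function is in use.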

 

Reposted from: http://kggya.baihongyu.com/
