W.H.

Finally.. finally made it: a program written within 48 hours of first learning Python - Mugeta novel backup program

http://www.hackerschool.org/HS_Boards/zboard.php?AllArticle=true&no=19736

http://blog.naver.com/aaaa875/110102871580

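The source is really messy, but it gets the job done lol

To get into Mugeta, it spoofs the request headers a bit (to look like an iPhone),

then downloads the page and parses out the body text,

then parses again to find the next page and moves on to it.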
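
The three steps above can be sketched offline roughly like this. This is a Python 3 sketch, not the script from this post (which is Python 2). The function names, the shortened User-Agent string and the sample HTML are made up for illustration, and the real Mugeta markup may differ. The body pattern is a slight variation on the one in the post: it also consumes the rest of the opening tag so the captured text does not start with `>`.

```python
import re
import urllib.request

# Hypothetical, shortened iPhone User-Agent string (illustration only).
IPHONE_UA = "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) Mobile Safari"


def build_request(page_key, cookie):
    """Step 1: build (but do not send) a GET request that looks like an iPhone."""
    url = "http://wr.mugeta.com/?PAGEKEY=" + page_key
    return urllib.request.Request(url, headers={
        "User-Agent": IPHONE_UA,
        "Accept": "text/html",
        "Cookie": cookie,
    })


def extract_body(html):
    """Step 2: pull the text out of the table cell assumed to hold the story."""
    cells = re.findall(r'"90%" align="center"[^>]*>(.*?)</td>', html, re.DOTALL)
    return "".join(cells)


def extract_next_key(html):
    """Step 3: take the second single-quoted href as the next page, as the post does."""
    links = re.findall(r"<a href='(.*?)'", html, re.DOTALL)
    return links[1] if len(links) > 1 else None


# Invented sample page, so the sketch runs without touching the network.
sample = ('<td width="90%" align="center">Chapter 1<br>text</td>'
          "<a href='PREVKEY'>prev</a><a href='NEXTKEY'>next</a>")

req = build_request("STARTKEY", "user_no=0")
print(req.full_url)
print(extract_body(sample))
print(extract_next_key(sample))
```

Returning `None` when there is no second link gives the caller a clean way to stop, instead of crashing with an `IndexError` on the last page.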

Friday: borrowed a Python book
Saturday: finished this program
lol

# -*- coding: utf-8 -*-

import httplib
import re

next_page = raw_input('Enter the part of the novel start page address after /?PAGEKEY= ')
text_page = raw_input('How many pages? ')
text_name = raw_input('Name to save as? (include .txt) ')
site_cookie = raw_input('Enter the user_no value from your cookies ')

# Create (or empty) the output file
text_save = open(text_name, 'w')
text_save.close()
n = 1

while n <= int(text_page):
    print n
    print ' page\n'
    host = 'wr.mugeta.com'
    h = httplib.HTTP(host)
    h.putrequest('GET', '/?PAGEKEY=' + next_page)
    h.putheader('Host', host)
    # Pretend to be an iPhone
    h.putheader('User-Agent', 'Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543 Safari/419.3')
    h.putheader('Accept', 'text/html')
    h.putheader('Cookie', site_cookie)
    h.endheaders()
    errorCode, errorMessage, headers = h.getreply()
    a = h.getfile()
    text_tmp = a.read()
    h.close()
    print text_tmp

    #//regex_body_pass 1: grab the table cell that holds the text
    tmp_a = ''.join(re.findall('"90%" align="center" ?(.*?)</td>', text_tmp, re.DOTALL))
    print '================='
    print tmp_a
    print '================='

    #//regex_body_pass 2: remove junk & convert entities
    text_main = re.sub('&#39;', '\'', tmp_a)
    text_main = re.sub('<br>', '\n', text_main)
    text_main = re.sub('&quot;', '"', text_main)
    text_main = re.sub('', '', text_main)  # empty pattern as posted: leaves the text unchanged
    text_main = re.sub('    ', '', text_main)
    text_main = re.sub('<tr>', '', text_main)
    text_main = re.sub('<td>', '', text_main)
    print text_main
    print '================='

    #//save to the text file
    text_save = open(text_name, 'a')
    text_save.write(text_main)
    text_save.close()

    #//extract the next page address (the second single-quoted link on the page)
    m = re.findall('<a href=[\'](.*?)[\']', text_tmp, re.DOTALL)
    print m
    next_page = m[1]
    print next_page
    #next_page.replace('<a accesskey="9" href="http://wr.mugeta.com','')
    n = n + 1
print 'complete'
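
The second regex pass above turns entities and tags back into plain text one pattern at a time. As a hedged alternative (a Python 3 sketch, not what the script in this post does), the standard library's `html.unescape` decodes every entity in one call, and a single regex can drop the leftover table tags. The helper name `clean_body` and the sample string are invented for the example.

```python
import html
import re


def clean_body(fragment):
    """Turn a raw HTML fragment into plain text (illustrative helper, not from the post)."""
    # <br> and <br/> become newlines
    text = re.sub(r"<br\s*/?>", "\n", fragment, flags=re.IGNORECASE)
    # drop opening and closing <tr>/<td> tags
    text = re.sub(r"</?(?:tr|td)[^>]*>", "", text, flags=re.IGNORECASE)
    # decode entities only after the tags are gone, so escaped "&lt;td&gt;" text survives
    text = html.unescape(text)
    # &nbsp; decodes to U+00A0; turn it into an ordinary space
    return text.replace("\xa0", " ")


sample = "<tr><td>It&#39;s &quot;done&quot;<br>next&nbsp;line</td></tr>"
print(clean_body(sample))
```

Decoding entities last is a deliberate choice here: if the story text itself contains an escaped tag, it is kept as text rather than stripped.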

Hit : 8591 Date : 2011/02/13 01:10


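W.H.	FYI, I was too lazy to delete the comments... *thwack*	2011/02/13
두루뭉술	When will I ever get to make something like that?	2011/02/13
rkdgh0112	Oh, I use Mugeta a lot too, but the novel service has been shut down, hasn't it? lol..	2011/02/13
ganesha	I bought a book today to learn Python too lol	2011/02/13
xodnr631	I want to learn Python too, so I requested a book at the library, but who knows when it will come in... -_-	2011/02/14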