python-28: web scrapper

2022-12-30 2 분 소요

`from requests import get`

requests로 부터 get 함수 가져오기 (get은 웹사이트를 받아옴)

base_url = "https://weworkremotely.com/remote-jobs/search?utf8=%E2%9C%93&term="
search_term = "python"

response = get(f"{base_url}{search_term}")

print(response.text)

사이트 주소를 base_url에 저장,
검색어를 search_term에 저장
get안에 변수 base_url과 search_term을 붙여넣기 위해 f”“를 사용함
(문자열 안에 변수를 넣을 수 있음)
get()을 response로 저장하고, response의 text만 받아오면 데이터를 받아올 수있음

response만 받아오면 status_code를 반환함.(정상일 경우 200을 반환)

`if`문을 이용해서 웹사이트에 접근할때 오류에 대비 할 수 있음

if response.status_code != 200:
    print("Can't request website")
else:
    print(response)

만약 response의 status_code가 200이 아닐 경우
Can’t request website를 반환하고,
그렇지 않을경우(정상일경우) response를 보여주기

웹페이지를 더 쉽게 검색할 수 있게 해주는 beautiful soup 사용하기

soup = BeautifulSoup(response.text, "html.parser")

html_doc을 첫번째 인수로받고 ‘html.parser’라는 문자열을 두번째 인수로 받음
html_doc은 웹페이지의 text를 뜻하고, ‘html.parser은 beautifulsoup한테 html을 보내준다는 뜻

`find_all`

find_all('A', 'B' )

B라는 class를 가진 A tag를 모두 찾기
A라는 tag만 적을 수 있음

from requests import get
from bs4 import BeautifulSoup

base_url = "https://weworkremotely.com/remote-jobs/search?utf8=%E2%9C%93&term="
search_term = "python"

response = get(f"{base_url}{search_term}")

if response.status_code != 200:
    print("Can't request website")
else:
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup.find_all("section","class_=jobs"))

job라는 클래스의 section 태그 모두 찾기
find_all은 argument가 많기때문에, argument의 이름을 직접 적어줌(class_=)

len

len은 list나 tuple의 개수를 보여줌

from requests import get
from bs4 import BeautifulSoup

base_url = "https://weworkremotely.com/remote-jobs/search?utf8=%E2%9C%93&term="
search_term = "python"

response = get(f"{base_url}{search_term}")

if response.status_code != 200:
    print("Can't request website")
else:
    soup = BeautifulSoup(response.text, "html.parser")
    jobs = soup.find_all("section", "jobs")
    print(len(jobs))

for in

list의 요소들을 변수로 만들어줘서 검사 함

  for job_section in jobs:
    job_posts = job_section.find_all("li")

list.pop(n)

list의 요소 제거

    job_posts.pop(-1)

list[n]

list의 요소 불러오기

for post in job_posts:
      anchors = post.find_all("a")
      anchor = anchors[1]
      link = anchor['href']

list의 요소 불러올 때 숫자(1)로 표현할수도 있음
ahchors[1]이 가져온 데이터는 html인데 beautifulsoup는 html을 dictionary처럼 만들 수 있음.
dictionary를 가져 올 때 anchor[‘href’]로 표현할 수도 있음

object의 list를 가지고 있고 list의 길이를 알고 있으면 list에서 object를 가져올 수 있음

``py company, kind, region = anchor.find_all(‘span’, class_=”company”) print(company, kind, region) title = anchor.find(‘span’, class_=’title’)

- class가 company인 span의 개수가 3개 인 것을 알고 있으니 그에 맞게 이름을 지어주면 됨

![image](https://user-images.githubusercontent.com/111720411/209949601-13b7658e-2ce0-401d-9100-ecc64ce9816e.png)

### string (html요소를 제거하고 text만(문자열) 가져오기)

```py
      print(company.stirng, kind.stirng, region.stirng, title.stirng)

dictionary를 활용해 데이터 저장하기

      job_data = {
        'comapny': company.string,
        'region': region.string,
        'position': title.string
      }

새 list에 저장하고 dictionary로 보기

from requests import get
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=%E2%9C%93&term="
search_term = "python"

response = get(f"{url}{search_term}")

if response.status_code != 200:
  print("Can't login Website")
else:
  result = []
  soup = BeautifulSoup(response.text, 'html.parser')
  jobs = soup.find_all("section", class_="jobs")
  for job_section in jobs:
    job_posts = job_section.find_all("li")
    job_posts.pop(-1)
    for post in job_posts:
      anchors = post.find_all("a")
      anchor = anchors[1]
      link = anchor['href']
      company, kind, region = anchor.find_all('span', class_="company")
      title = anchor.find('span', class_='title')
      job_data = {
        'link': f"https://weworkremotely.com{link}/",
        'company': company.string,
        'region': kind.string,
        'position': title.string
      }
      result.append(job_data)

  for results in result:
    print(results)
    print("//////////")

새 list에 append로 추가해주고,
list를 for in문으로 돌려서
구분자와 함께 출력하기

refactor

extractors라는 폴더를 만들고
그 안에 wwr.py라는 파일을 만든다.
wwr.py안에 아래 코드를 입력한다

from requests import get
from bs4 import BeautifulSoup


def extract_wwr_jobs(keyword):
  url = "https://weworkremotely.com/remote-jobs/search?utf8=%E2%9C%93&term="
  response = get(f"{url}{keyword}")

  if response.status_code != 200:
    print("Can't login Website")
  else:
    result = []
    soup = BeautifulSoup(response.text, 'html.parser')
    jobs = soup.find_all("section", class_="jobs")
    for job_section in jobs:
      job_posts = job_section.find_all("li")
      job_posts.pop(-1)
      for post in job_posts:
        anchors = post.find_all("a")
        anchor = anchors[1]
        link = anchor['href']
        company, kind, region = anchor.find_all('span', class_="company")
        title = anchor.find('span', class_='title')
        job_data = {
          'link': f"https://weworkremotely.com{link}/",
          'company': company.string,
          'region': kind.string,
          'position': title.string
        }
        result.append(job_data)

    return result

main.py에는 아래 코드를 입력해서 다른 폴더의 함수를 호출한다

from requests import get
from bs4 import BeautifulSoup
from extractors.wwr import extract_wwr_jobs

jobs = extract_wwr_jobs("python")
print(jobs)

Twitter Facebook LinkedIn

python-28: web scrapper

`from requests import get`

`if`문을 이용해서 웹사이트에 접근할때 오류에 대비 할 수 있음

웹페이지를 더 쉽게 검색할 수 있게 해주는 beautiful soup 사용하기

`find_all`

len

for in

list.pop(n)

list[n]

object의 list를 가지고 있고 list의 길이를 알고 있으면 list에서 object를 가져올 수 있음

dictionary를 활용해 데이터 저장하기

새 list에 저장하고 dictionary로 보기

refactor

공유하기

댓글남기기

참고

among-us: 게임 맵 꾸미기

among-us: 캐릭터만들기: 에니메이션

among-us: 캐릭터만들기: 조이스틱

among-us: 캐릭터만들기: 터치이동

from requests import get

if문을 이용해서 웹사이트에 접근할때 오류에 대비 할 수 있음

웹페이지를 더 쉽게 검색할 수 있게 해주는 beautiful soup 사용하기

find_all

len

for in

list.pop(n)

list[n]

object의 list를 가지고 있고 list의 길이를 알고 있으면 list에서 object를 가져올 수 있음

dictionary를 활용해 데이터 저장하기

새 list에 저장하고 dictionary로 보기

refactor

공유하기

댓글남기기

참고

among-us: 게임 맵 꾸미기

among-us: 캐릭터만들기: 에니메이션

among-us: 캐릭터만들기: 조이스틱

among-us: 캐릭터만들기: 터치이동

`from requests import get`

`if`문을 이용해서 웹사이트에 접근할때 오류에 대비 할 수 있음

`find_all`