This is the web page for Text Analytics at the University of Oklahoma.
Due 02/07/2019
For this assignment, we will make a small Python program using the cloud resources that we will have for this class. You need to sign up for Google Cloud using the directions previously given. Sign up for GitHub as well, and register as a student so that you get unlimited private repositories.
After completing this assignment, you will have experience creating a Python project, retrieving documents from the web, reading and parsing semi-structured data, and running tests. Read this document in full before starting. Each section specifies steps you need to take to complete the homework.
Be sure to submit the form with the static IP address of your instance: https://goo.gl/forms/k862rsgaXLEJDsiP2
If you have not created your instance yet, the instructions are available here: https://oudalab.github.io/textanalytics/documents/cloud-config.pdf
Fill out this form to sign up for GitHub: https://goo.gl/forms/4GzEH6Rg5PN00c933
Create a private repository called cs5293sp19-assignment0
Add the collaborators cegme and chanukyalakamsani by going to Settings > Collaborators.
Then go to your instance, create a new folder /hw, and clone the repository into that folder. For example (replace cegme in the clone URL with your own GitHub username):
cd /
sudo mkdir /hw
sudo chown `whoami`:`whoami` /hw
chmod 777 /hw
cd hw
git clone https://github.com/cegme/cs5293sp19-assignment0
Your code should be in the /hw/cs5293sp19-assignment0/ directory of your instance. Your code structure should look like the structure below:
cs5293sp19-assignment0/
├── COLLABORATORS
├── LICENSE
├── Pipfile
├── Pipfile.lock
├── README.md
├── assignment0
│   ├── __init__.py
│   └── main.py
├── docs
├── setup.cfg
├── setup.py
└── tests
    ├── test_download.py
    └── test_random.py
You can create each initial file and folder using the touch and mkdir commands. The Python Packaging how-to page has more examples of proper Python packages: https://python-packaging.readthedocs.io/en/latest/minimal.html
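If you prefer to stay in Python, the following sketch creates the same empty files and folders that touch and mkdir would. It is only a convenience: the names FILES, DIRS, and create_skeleton are illustrative, and Pipfile and Pipfile.lock are left out on the assumption that pipenv will generate them for you.

```python
from pathlib import Path

# Directories from the layout above, relative to the repository root.
DIRS = ["assignment0", "docs", "tests"]

# Empty files from the layout above (Pipfile/Pipfile.lock are left to pipenv).
FILES = [
    "COLLABORATORS",
    "LICENSE",
    "README.md",
    "setup.cfg",
    "setup.py",
    "assignment0/__init__.py",
    "assignment0/main.py",
    "tests/test_download.py",
    "tests/test_random.py",
]


def create_skeleton(root: str) -> None:
    """Create the folders and empty files under root, like mkdir -p and touch."""
    base = Path(root)
    for d in DIRS:
        (base / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        path = base / f
        path.parent.mkdir(parents=True, exist_ok=True)
        # Creates the file if missing; an existing file keeps its contents.
        path.touch(exist_ok=True)
```

For example, create_skeleton("/hw/cs5293sp19-assignment0") would fill in the cloned repository without overwriting anything already there.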
Your setup.py should look similar to the following:
from setuptools import setup, find_packages

setup(
    name='assignment0',
    version='1.0',
    author='Your Name',
    author_email='your ou email',
    packages=find_packages(exclude=('tests', 'docs')),
    setup_requires=['pytest-runner'],
    tests_require=['pytest']
)
Note that the setup.cfg file is necessary to run pytest. It should contain at least the following text:
[aliases]
test=pytest
[tool:pytest]
norecursedirs = .*, CVS, _darcs, {arch}, *.egg, venv
The [aliases] section lets you run the tests with the python setup.py test command. The norecursedirs setting under [tool:pytest] tells pytest not to look for tests inside directories whose names match any of the listed patterns.
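pytest treats each norecursedirs entry as an fnmatch-style pattern matched against a directory's base name. The sketch below approximates that check with Python's fnmatch module, assuming the patterns from the setup.cfg above; is_skipped is a hypothetical helper for illustration, not part of pytest.

```python
from fnmatch import fnmatch

# Patterns copied from the norecursedirs line in setup.cfg.
NORECURSEDIRS = [".*", "CVS", "_darcs", "{arch}", "*.egg", "venv"]


def is_skipped(dirname: str) -> bool:
    """Return True if a directory base name matches any norecursedirs pattern."""
    return any(fnmatch(dirname, pattern) for pattern in NORECURSEDIRS)


# .git, venv and assignment0.egg are skipped; tests and docs are searched.
for name in [".git", "venv", "assignment0.egg", "tests", "docs"]:
    print(name, is_skipped(name))
```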
Create the Pipfile using the command pipenv install --python 3.7. This project uses Python 3.7, so be sure to install it using pyenv.
If you do not have Python 3.7 on your system, revisit the startup script referenced in the cloud configuration document: https://oudalab.github.io/textanalytics/instance/startup.sh
The assignment0/__init__.py file should contain the following:
from . import main

def promise():
    text = main.download()
    promises = main.extract_requests(text)
    titles = main.extract_titles(promises)
    randomtitle = main.random_title(titles)
    print(f"A promise: {randomtitle}")
This file allows the directory to be treated as a Python package. In most cases these files are empty, but here we use the package's __init__.py to make the program easy to run.
Here is an example output:
user@testinstance:/hw/cs5293sp19-assignment0$ pipenv run python
Python 3.7.2
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import assignment0
>>> assignment0.promise()
A promise: Tell Ford Motor Co.'s president that unless he cancels plans to build a massive plant in Mexico, the company will face a 35 percent tax on cars imported back into the United States. Trump is confident he can get this done before taking office. (Last year he incorrectly said this had already happened.)
>>> assignment0.promise()
A promise: Ensure that Americans can still afford to golf.
>>>
In the tests folder, add a set of files that test the different features of your code. Tests do not have to be creative, but they should show that your code works as expected. There are several testing frameworks for Python; for this project, use the py.test framework. For questions, use the message board, and see the pytest documentation for more examples: http://doc.pytest.org/en/latest/assert.html. This tutorial gives the best discussion of how to write tests: https://semaphoreci.com/community/tutorials/testing-python-applications-with-pytest
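pytest collects test files named test_*.py (or *_test.py) and, inside them, functions whose names start with test; a test passes when its plain assert statements hold. A minimal sketch of such a file follows. The pick function is a hypothetical stand-in so the example runs on its own; in your real tests you would import your functions from assignment0 instead.

```python
import random
from typing import List


def pick(titles: List[str]) -> str:
    """Hypothetical stand-in for a function that returns a random title."""
    return random.choice(titles)


def test_pick_returns_member():
    # Whatever comes back must be one of the inputs.
    titles = ["a", "b", "c"]
    assert pick(titles) in titles


def test_pick_is_reproducible_with_seed():
    # Seeding the generator makes a "random" result repeatable in a test.
    titles = ["a", "b", "c"]
    random.seed(0)
    first = pick(titles)
    random.seed(0)
    assert pick(titles) == first
```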
Install pytest in your current Pipfile using the command pipenv install pytest. To run the tests, use the command pipenv run python -m pytest. This runs pytest using the Python installed in your pipenv environment. Alternatively, you can use the command pipenv run python setup.py test.
Below are the supplied test files.
tests/test_download.py:
import pytest

import assignment0
from assignment0 import main

def test_download_sanity():
    assert main.download() is not None

def test_download_size():
    assert len(main.download()) == 135131
tests/test_random.py:
import pytest

import assignment0
from assignment0 import main

def test_extract_request():
    text = main.download()
    promises = main.extract_requests(text)
    assert len(promises) == 174

title0 = "Propose a Constitutional Amendment to impose term limits on all members of Congress"

def test_extract_titles():
    text = main.download()
    promises = main.extract_requests(text)
    titles = main.extract_titles(promises)
    assert titles[0] == title0

def test_random_title():
    text = main.download()
    promises = main.extract_requests(text)
    titles = main.extract_titles(promises)
    randomtitle = main.random_title(titles)
    assert type(randomtitle) == str
    assert len(randomtitle) > 0
README.md file: The README file should be named in all uppercase, with either no extension or a .md extension. You should write your name in it, an example of how to run it, and a list of any web or external resources that you used for help. Note that you should not copy code from any website not provided by the instructor.
COLLABORATORS file: This file should contain a comma-separated list describing who you worked with and a short description of the nature of the collaboration. This information should be listed in three fields (name, email, and description), as in the example below:
Katherine Johnson, kj@nasa.gov, Helped me understand calculations
Dorothy Vaughan, doro@dod.gov, Helped me with multiplexed time management
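As a quick check of the three-field format, the sketch below reads COLLABORATORS text with Python's csv module. The Collaborator type and parse_collaborators function are illustrative only. If a description itself contains a comma, wrapping that field in double quotes keeps it as a single field.

```python
import csv
import io
from typing import List, NamedTuple


class Collaborator(NamedTuple):
    name: str
    email: str
    description: str


def parse_collaborators(text: str) -> List[Collaborator]:
    """Parse 'name, email, description' rows; blank lines are ignored."""
    # skipinitialspace drops the space after each comma in the example format.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [Collaborator(*row) for row in reader if row]
```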
Create a Python package containing a function that downloads the campaign promises and prints a random one to the screen with each function call. The skeleton code for assignment0/main.py is given below. It is your job to write the supporting functions.
import json
import random
import urllib.request

# Python3 type hints
# https://mypy.readthedocs.io/en/latest/cheat_sheet_py3.html
from typing import List, Dict, Any

url = "https://raw.githubusercontent.com/TrumpTracker/trumptracker.github.io/master/_data/data.json"

def download() -> str:
    """ This function downloads the json data from the url."""
    # TODO add code here
    return ""

def extract_requests(text: str) -> List[Dict[str, Any]]:
    """
    This function turns the json data into a dict object and
    extracts and returns the array of promises.
    """
    # TODO add code here
    return []

def extract_titles(promises: List[Dict[str, Any]]) -> List[str]:
    """ Make a new array with just the titles. """
    # TODO add code here
    return []

def random_title(titles: List[str]) -> str:
    """ This function takes list of titles and returns one string at random. """
    # TODO add code here
    return ""
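The TODOs mostly rely on standard-library calls: urllib.request.urlopen(url) returns a response whose read() method gives bytes that you can decode to text, json.loads turns that text into Python dicts and lists, and random.choice picks one element of a list. The sketch below shows the last three steps on a made-up JSON string rather than the real download. It assumes the promises sit under a top-level "promises" key, each with a "title" field; verify that against the actual data.json before relying on it.

```python
import json
import random
from typing import Any, Dict, List

# Made-up sample; the "promises" key and "title" field are assumptions
# about the layout of data.json, so check the real file.
SAMPLE = '{"promises": [{"title": "First promise"}, {"title": "Second promise"}]}'

data: Dict[str, Any] = json.loads(SAMPLE)           # JSON text -> Python dict
promises: List[Dict[str, Any]] = data["promises"]   # the array of promise objects
titles: List[str] = [p["title"] for p in promises]  # keep only the titles
print(random.choice(titles))                        # one title chosen at random
```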
When ready to submit, create a tag on your repository using git tag:
git tag v1.0
git push origin v1.0
We will use this tag and the code on your instance to evaluate your assignment.
| Task | Percent |
|---|---|
| Cloud is configured | 20% |
| Code compiles | 30% |
| Code passes all test cases | 50% |
| Total | 100% |
If you are interested in adding continuous integration to your code, add a file named .travis.yml with the contents below:
language: python
dist: xenial
python:
- '3.7'
install:
- 'pip install pipenv'
- 'pipenv sync'
script: 'python -m pytest'
The Travis CI documentation has more information: https://docs.travis-ci.com/user/languages/python/