CS 5293 Spring 19


This is the web page for Text Analytics at the University of Oklahoma.

View the Project on GitHub oudalab/cs5293sp19

Assignment 0 - CS 5293 Spring 2019

Due 02/07/2019

For this assignment, we will make a small Python program using the cloud resources that we will have for this class. You need to sign up for Google Cloud using the directions previously given. Also sign up for GitHub, and register as a student so you get unlimited private repositories.

After completing this assignment, you will have experience creating a Python project, retrieving documents from the web, reading and parsing semi-structured data, and running tests. Read this document in full before starting. Each section specifies steps you need to take to complete the homework.

Sign up for a GCP instance

Be sure to submit the form with the static IP address of your instance: https://goo.gl/forms/k862rsgaXLEJDsiP2

If you have not created your instance, the instructions are available here: https://oudalab.github.io/textanalytics/documents/cloud-config.pdf

GitHub

Fill out this form to sign up for GitHub: https://goo.gl/forms/4GzEH6Rg5PN00c933

Create a repository for your project on your instance (10 pts)

Create a private repository called cs5293sp19-assignment0

Add collaborators cegme and chanukyalakamsani by going to Settings > Collaborators.

Then go to your instance, create a new folder /hw and clone the repository in that folder. For example:

cd /
sudo mkdir /hw
sudo chown `whoami`:`whoami` /hw
chmod 777 /hw
cd hw
git clone https://github.com/cegme/cs5293sp19-assignment0

Perform the tasks below

Create a python package

Your code should be in the /hw/cs5293sp19-assignment0/ directory of your instance. Your code structure should look like the structure below.

cs5293sp19-assignment0/
├── COLLABORATORS
├── LICENSE
├── Pipfile
├── Pipfile.lock
├── README.md
├── assignment0
│   ├── __init__.py
│   └── main.py
├── docs
├── setup.cfg
├── setup.py
└── tests
    ├── test_download.py
    └── test_random.py

You can create each initial file and folder using the touch and mkdir commands. The Python packaging how-to page has more examples of proper Python packages: https://python-packaging.readthedocs.io/en/latest/minimal.html
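If you prefer to script this step, the following is a minimal sketch that creates the same layout with Python's pathlib instead of touch and mkdir. The make_skeleton helper is a hypothetical name used only for illustration. It deliberately skips Pipfile and Pipfile.lock, on the assumption that pipenv will generate them later.

```python
import tempfile
from pathlib import Path

# Folders and empty files taken from the tree above.
# Pipfile and Pipfile.lock are left out: pipenv is assumed to create them.
DIRS = ["assignment0", "docs", "tests"]
FILES = [
    "COLLABORATORS",
    "LICENSE",
    "README.md",
    "assignment0/__init__.py",
    "assignment0/main.py",
    "setup.cfg",
    "setup.py",
    "tests/test_download.py",
    "tests/test_random.py",
]


def make_skeleton(root: Path) -> None:
    """Create the project folders and empty files under root."""
    for d in DIRS:
        # parents=True also creates root itself if it is missing.
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        # touch() leaves existing file contents untouched.
        (root / f).touch(exist_ok=True)


if __name__ == "__main__":
    # Try it in a throwaway directory rather than in /hw.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "cs5293sp19-assignment0"
        make_skeleton(root)
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```

Running it twice is safe because existing folders and files are kept as they are.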

The setup.py file

from setuptools import setup, find_packages

setup(
	name='assignment0',
	version='1.0',
	author='Your Name',
	author_email='your ou email',
	packages=find_packages(exclude=('tests', 'docs')),
	setup_requires=['pytest-runner'],
	tests_require=['pytest']
)

The setup.cfg file

Note that the setup.cfg file is necessary to run pytest through setup.py. It should have at least the following text inside:

[aliases]
test=pytest

[tool:pytest]
norecursedirs = .* CVS _darcs {arch} *.egg venv

The [aliases] section allows you to use the python setup.py test command. The norecursedirs setting under [tool:pytest] lists directory-name patterns; when pytest runs, it will not look for test files inside folders whose names match one of these patterns.
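To make the pattern matching concrete, here is a small sketch of how such patterns behave. It assumes, as the pytest documentation describes, that the patterns are fnmatch-style and are applied to the base name of each directory. The is_skipped helper is hypothetical and is not part of pytest.

```python
from fnmatch import fnmatch

# The patterns from the norecursedirs setting, written as a Python list.
PATTERNS = [".*", "CVS", "_darcs", "{arch}", "*.egg", "venv"]


def is_skipped(dirname: str) -> bool:
    """Return True if a directory base name matches any pattern."""
    return any(fnmatch(dirname, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    for name in [".git", "venv", "mypkg.egg", "tests", "assignment0"]:
        print(name, is_skipped(name))
```

Hidden folders such as .git match the .* pattern, while tests and assignment0 match nothing and are still searched.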

The Pipfile and Pipfile.lock

Create the Pipfile using the command pipenv install --python 3.7. Note that for this project we will use Python 3.7, so be sure to install it using pyenv.

If you do not have Python 3.7 on your system, revisit the startup script given in the configuration document: https://oudalab.github.io/textanalytics/instance/startup.sh

The __init__.py file

from . import main


def promise():
    text = main.download()
    promises = main.extract_requests(text)
    titles = main.extract_titles(promises)
    randomtitle = main.random_title(titles)
    print(f"A promise: {randomtitle}")

This file allows the directory to be treated as a Python package. In most cases these files are empty, but here we use it to make the program easy to run.

Here is an example output:

user@testinstance:/hw/cs5293sp19-assignment0$ pipenv run python
Python 3.7.2  
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import assignment0
>>> assignment0.promise()
A promise: Tell Ford Motor Co.'s president that unless he cancels plans to build a massive plant in Mexico, the company will face a 35 percent tax on cars imported back into the United States. Trump is confident he can get this done before taking office. (Last year he incorrectly said this had already happened.)
>>> assignment0.promise()
A promise: Ensure that Americans can still afford to golf.
>>> 

Test folder

In the tests folder, add a set of files that test the different features of your code. Tests do not have to be too creative, but they should show that your code works as expected. There are several testing frameworks for Python; for this project, use the pytest framework. For questions, use the message board and see the pytest documentation for more examples: http://doc.pytest.org/en/latest/assert.html . This tutorial gives the best discussion of how to write tests: https://semaphoreci.com/community/tutorials/testing-python-applications-with-pytest .

Install pytest into your current Pipfile using the command pipenv install pytest. To run the tests, use the command pipenv run python -m pytest. This will run pytest using the installed version of Python. Alternatively, you can use the command pipenv run python setup.py test.

The supplied test files are below.

test_download.py

import pytest

import assignment0
from assignment0 import main


def test_download_sanity():
    assert main.download() is not None


def test_download_size():
    assert len(main.download()) == 135131

test_random.py

import pytest

import assignment0
from assignment0 import main


def test_extract_request():
    text = main.download()
    promises = main.extract_requests(text)
    assert len(promises) == 174


title0 = "Propose a Constitutional Amendment to impose term limits on all members of Congress"
def test_extract_titles():
    text = main.download()
    promises = main.extract_requests(text)
    titles = main.extract_titles(promises) 

    assert titles[0] == title0


def test_random_title():
    text = main.download()
    promises = main.extract_requests(text)
    titles = main.extract_titles(promises) 
    randomtitle = main.random_title(titles)

    assert isinstance(randomtitle, str)
    assert len(randomtitle) > 0

Create a README.md file

The README file name should be all uppercase, with either no extension or a .md extension. Write your name in it, an example of how to run the program, and a list of any web or external resources that you used for help. Note that you should not be copying code from any website not provided by the instructor.

Create a COLLABORATORS file

This file should contain a comma-separated list describing who you worked with and a short description of the nature of the collaboration. This information should be listed in three fields, as in the example below:

Katherine Johnson, kj@nasa.gov, Helped me understand calculations
Dorothy Vaughan, doro@dod.gov, Helped me with multiplexed time management
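As a quick way to check that your file follows the three-field layout, the sketch below parses lines of this form. The Collaborator type and the parse_collaborators helper are hypothetical names, not part of the assignment. It assumes the first two commas on each line separate the fields, so commas inside the description are kept.

```python
from typing import List, NamedTuple


class Collaborator(NamedTuple):
    name: str
    email: str
    description: str


def parse_collaborators(text: str) -> List[Collaborator]:
    """Parse 'name, email, description' lines, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two commas only; the rest is the description.
        parts = [part.strip() for part in line.split(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"expected three fields, got: {line!r}")
        entries.append(Collaborator(*parts))
    return entries


if __name__ == "__main__":
    sample = (
        "Katherine Johnson, kj@nasa.gov, Helped me understand calculations\n"
        "Dorothy Vaughan, doro@dod.gov, Helped me with multiplexed time management\n"
    )
    for entry in parse_collaborators(sample):
        print(entry)
```

A line with fewer than three fields raises a ValueError, which makes formatting mistakes easy to spot.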

Write code

Create a Python package containing a function that downloads the campaign promises and prints a random one to the screen with each function call. The skeleton code is given below. It is your job to write the supporting functions.

main.py

import json
import random
import urllib.request

# Python3 type hints
# https://mypy.readthedocs.io/en/latest/cheat_sheet_py3.html
from typing import List, Dict, Any

url = "https://raw.githubusercontent.com/TrumpTracker/trumptracker.github.io/master/_data/data.json"


def download():
    """ This function downloads the json data from the url."""
    # TODO add code here
    return ""


def extract_requests(text: str) -> List[Dict[str, Any]]:
    """
        This function turns the json data into a dict object and
        extracts and returns the array of promises.
    """
    # TODO add code here
    return []


def extract_titles(promises: List[Dict[str, Any]]) -> List[str]:
    """ Make a new array with just the titles. """
    # TODO add code here
    return []


def random_title(titles: List[str]) -> str:
    """ This function takes a list of titles and returns one string at random. """
    # TODO add code here
    return ""
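The standard-library calls these steps are likely to need can be tried on toy data first. The sketch below is only an illustration under stated assumptions, not the expected solution: the helper names (fetch_text, parse_promises, titles_of, pick_one) are hypothetical, and the "promises" and "title" keys in the sample are guesses that you should confirm against the downloaded JSON.

```python
import json
import random
import urllib.request
from typing import Any, Dict, List

# Tiny stand-in for the real data. The "promises" and "title" keys are
# assumptions for illustration; inspect the real JSON to confirm them.
SAMPLE = json.dumps({
    "promises": [
        {"title": "First sample title"},
        {"title": "Second sample title"},
    ]
})


def fetch_text(url: str) -> str:
    """Read a URL and decode the response body as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_promises(text: str) -> List[Dict[str, Any]]:
    """Turn JSON text into Python objects and return the assumed list."""
    return json.loads(text)["promises"]


def titles_of(promises: List[Dict[str, Any]]) -> List[str]:
    """Collect the assumed "title" field of each entry."""
    return [promise["title"] for promise in promises]


def pick_one(titles: List[str]) -> str:
    """Return one title chosen uniformly at random."""
    return random.choice(titles)


if __name__ == "__main__":
    # Runs entirely on the inline sample, so no network access is needed.
    print(pick_one(titles_of(parse_promises(SAMPLE))))
```

The same fetch_text call should also work when given the url defined in main.py, although that is not exercised here because it needs network access.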

Add git tag

When ready to submit, create a tag on your repository using git tag:

git tag v1.0
git push origin v1.0

We will use this tag and the code on your instance to evaluate your assignment.

Grading Criteria:

Task                          Percent
Cloud is configured           20%
Code compiles                 30%
Code passes all test cases    50%
Total                         100%

(extra) travis continuous integration

If you are interested in adding continuous integration to your code, add a file named .travis.yml with the contents below:

language: python
dist: xenial
python:
  - '3.7'
install:
  - 'pip install pipenv'
  - 'pipenv sync'
script: 'python -m pytest'

The Travis CI website has more information: https://docs.travis-ci.com/user/languages/python/

Addendum

2019-02-04

