CS 5293 Spring 21


This is the web page for Text Analytics at the University of Oklahoma.


Syllabus CS 5293

Text Analytics (Spring 2021)

Class hours: Tuesday/Thursday 9:00 - 10:15am
Online, In


Dr. Christan Grant

Teaching Assistant

Oluwasijibomi A. Ajisegiri (SJ)

Note: Any email message to the professor or teaching assistant must include cs5293 in the subject line. Email without this string in the subject line will likely be filtered as junk.

External Tutors

The William Kerber Teaching Scholars will be available to answer questions and help with several CS topics.

Nathan Huffman, Jennifer Pham, Jack Schwarz, and Ethan Womack are available at the following times:

Day          Times
Mondays      12:00p - 4:00p
Tuesdays     9:00a - 10:45a, 12:30p - 4:30p
Wednesdays   12:00p - 4:00p
Thursdays    9:00a - 10:45a, 12:30p - 4:30p
Fridays      12:00p - 3:20p

The Zoom meeting ID is 930-4068-6553, and the passcode is cstutoring.

The above times are subject to change.


Prerequisites

Students must have a working understanding of statistics and data structures, in addition to a select set of software skills. The prerequisite courses and skills are:

- Statistics: MATH 4743 or MATH 4753 or ISE 3293 or ISE 5013
- Data Structures/Discrete Structures: CS 5005, or CS 2413 and CS 2813
- Software skills: Students should be well versed in Java and/or C++ and familiar with at least one scripting language, such as Python. Students should also be comfortable working with the GNU/Linux command line.

Undergraduate students with a 3.5 GPA or higher may enroll with permission from the instructor.

Course Description

Text analytics comprises the methods and techniques used to extract useful knowledge from text to support decision making. The field draws on research from the natural language processing, databases, data mining, and machine learning communities. This course aims to be a primer on text analytics theory and practice. After taking this course, students will understand how to independently obtain, parse, and analyze textual information for organizations that want to extract valuable insights.

Topics discussed in the course include obtaining data sets, understanding data formats, duplicate detection, cleaning data sets, tagging, indexing and search, evaluating algorithms, classification, clustering, topic modeling, and entity resolution. Time permitting, we may discuss advanced topics such as relation extraction, slot filling, knowledge graphs, knowledge base construction, the semantic web, question answering, or other cutting-edge topics.

Class sessions will be a mix of traditional lectures, class discussions, videos, and other activities. Participation is required to get the most out of the class.

Learning Management System

We will use the Canvas learning system. This course website can be reached through canvas.ou.edu. Please check this system regularly to keep informed of all announcements, updates, and changes. Important course information will also be distributed through the course website.

Course Materials

Required Textbooks:

Computer Accounts and Software

Increasingly, software is developed and executed in “the cloud”. This semester the class will make heavy use of a popular cloud infrastructure, on which students will be able to deploy virtual machines with various configurations on the fly. Credentials for this infrastructure will be distributed after the first week of class. For questions and issues with this software, students should use the course discussion board.

All students enrolled in the class should also have a CS account and access to the Linux-based systems in the CS department. For most computer science students, an account will be created automatically. All code written for this course MUST run using the compilers or interpreters specified for the assignments, and it is your responsibility to ensure that your code runs on these systems. For compatibility reasons, we recommend developing and testing on a Linux-based machine.

Course Policies

Proper Academic Conduct


Grading

Points for this class come from several sources, weighted as follows:

Activities 30%
Discussions 30%
Projects 40%
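To make the weighting concrete, here is a minimal Python sketch of how the three components combine into a course percentage. It assumes each component is summarized by a single average on a 0-100 scale; the function name `weighted_course_score` and the sample scores are hypothetical, and the Canvas grade book remains the authoritative calculation.

```python
from __future__ import annotations

# Weights from the grading breakdown: Activities 30%, Discussions 30%, Projects 40%.
WEIGHTS = {"activities": 0.30, "discussions": 0.30, "projects": 0.40}


def weighted_course_score(component_averages: dict[str, float]) -> float:
    """Combine per-component averages (each 0-100) into a course percentage.

    Hypothetical helper for illustration only; Canvas does the official calculation.
    """
    missing = WEIGHTS.keys() - component_averages.keys()
    if missing:
        raise ValueError(f"missing component averages: {sorted(missing)}")
    # Weighted sum: each component average scaled by its share of the grade.
    return sum(WEIGHTS[name] * component_averages[name] for name in WEIGHTS)


# Hypothetical averages: 0.30*85 + 0.30*90 + 0.40*80 = 25.5 + 27 + 32 = 84.5
print(round(weighted_course_score({"activities": 85, "discussions": 90, "projects": 80}), 2))  # prints 84.5
```

Because the weights sum to 1.0, the result stays on the same 0-100 scale as the component averages.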

Activities will be assigned approximately every week. These may be coding assignments, essay questions, or other activities.

Discussion topics will be assigned bi-weekly. Discussions will take place on the Canvas discussion boards, but students may also comment on the discussion topic in class to receive credit. A discussion may ask your opinion on a topic, or it may ask you to perform a task and “report back.”

Approximately four projects will be given over the course of the semester. These projects will require a substantial amount of planning, programming, and debugging, so we encourage you to budget your time well. Projects are due at 11:45 pm CST on the day indicated in the class schedule.

Submission Format

Unless otherwise specified, written submissions should be plain text (.txt), Portable Document Format (.pdf), or Markdown (.md) files. Files of type .doc, .docx, or .rtf will not be accepted. Compressed files should be of type .gz or .tar.gz; files in the .rar format will not be accepted. Other file types, particularly code files, may be used in the class, and the expected file type will be stated. Files packaged under non-Unix/Linux operating systems, such as Windows, often have compatibility issues with our grading systems. If the graders cannot open files for these reasons, the assignment will not receive credit.

Late Policy

We aim to have a flexible policy. We are in a pandemic, and we understand that students may have difficulty completing assignments by their designated time. It is also important that students make their best effort to complete each assignment before solutions are discussed. Therefore, deadlines for assignments will be posted, and we expect assignments to be completed before they are graded or before the topic is discussed. This variable lag time should give students room to handle last-minute emergencies.

Final Grade Scale

Grade cut-offs will be at or below the traditional 90, 80, 70, etc. cut-offs.
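Because the cut-offs will be at or below the traditional ones, the traditional scale gives the lowest letter a given percentage can earn. The Python sketch below assumes the traditional scale is 90/80/70/60 for A/B/C/D (our reading of "etc."); the function name `minimum_letter_grade` is hypothetical, and the actual cut-offs may be lower, so the real letter can only be the same or higher.

```python
# Assumed traditional cut-offs; the D cut-off at 60 is our reading of "etc."
TRADITIONAL_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def minimum_letter_grade(percentage: float) -> str:
    """Return the lowest letter grade a course percentage can receive.

    Since the course's cut-offs are at or below the traditional ones, meeting a
    traditional cut-off guarantees at least that letter; the actual grade may be higher.
    """
    # Cut-offs are checked from highest to lowest; the first one met wins.
    for cutoff, letter in TRADITIONAL_CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


for score in (95.0, 84.5, 69.9):
    print(score, minimum_letter_grade(score))
# prints:
# 95.0 A
# 84.5 B
# 69.9 D
```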

Grade questions

Please note that when an exam or assignment is brought to us with grading questions, we may re-examine the entire exam or assignment, and your grade may end up lower.

Canvas Grade Summary

Canvas has a grade book that stores the data used to calculate your course grade. It is the responsibility of each student in this class to check their grades on Canvas after each assignment is returned. If an error is found, bring the graded document to any of the instructors or TAs, and we will correct the Canvas record.


Specific Outcomes of Instruction

By the end of the semester, the students will increase their:

Course Evaluations

The College of Engineering utilizes student ratings as one of the bases for evaluating the teaching effectiveness of each of its faculty members. The results of these forms are important data used in the process of awarding tenure, making promotions, and giving salary increases. In addition, the faculty uses these forms to improve their own teaching effectiveness. The original request for the use of these forms came from students, and it is students who eventually benefit most from their use. Please take this task seriously and respond as honestly and precisely as possible, both to the machine-scored items and to the open-ended questions.

Reasonable Accommodation

The University of Oklahoma is committed to providing reasonable accommodation for all students with disabilities. Students with disabilities who require accommodations in this course are requested to speak with the professor as early in the semester as possible. Students with disabilities must be registered with the Office of Disability Services prior to receiving accommodations in this course. The Office of Disability Services is located in the University Community Center at 730 College Avenue; the phone number is 405-325-3852, and the TDD-only number is 405-325-4173.

Should you need modifications or adjustments to your course requirements because of documented pregnancy-related or childbirth-related issues, please contact me as soon as possible to discuss. Generally, modifications will be made where medically necessary and similar in scope to accommodations based on temporary disability. Please see http://www.ou.edu/eoo/faqs/pregnancy-faqs.html for commonly asked questions.

Title IX Resources

For any concerns regarding gender-based discrimination, sexual harassment, sexual misconduct, stalking, or intimate partner violence, the University offers a variety of resources, including advocates on call 24/7, counseling services, mutual no-contact orders, scheduling adjustments, and disciplinary sanctions against the perpetrator. Please contact the Sexual Misconduct Office at 405-325-2215 (8-5, M-F) or OU Advocates at 405-615-0013 (24/7) to learn more or to report an incident.

Technical Support

For OU IT support, please phone (405) 325-HELP. For help with issues pertaining to any CS department machines (in room DEH 115).

This syllabus is subject to change. Students are responsible for any changes/additions to this syllabus announced during the semester.


Key Class Resources

Dates and details in the syllabus and schedule are subject to frequent change; please check regularly. Major changes will be announced on Canvas.

External Resources




This page is available online at: https://oudalab.github.io/cs5293sp21