Static Visualization Project

For the first half of the course you will be building a suite of static visualizations on a topic of your choosing. This will be a small suite of 8-12 images connected by a narrative.

To get an idea of what this looks like, take a look at examples of some static visualizations from 2021. They had slightly different requirements, but these capture the spirit of what we're trying to create here.

The final deliverable will consist of a document (HTML or PDF) that weaves your 8-12 visualizations into a narrative of some kind. This can take the form of an article, blog post, infographic, poster, or anything else that allows you to combine your visualizations with your written explanations.

It may be helpful to know for planning purposes: In the second half of the quarter, you will work on interactive visualizations. You may decide to use the same data sources, or pick a different topic altogether.

Milestone 1: Proposal

For your proposal you will:

select a policy area of interest to you;
explore what data is available; and
decide on a goal. What do you hope to show with your suite of visualizations?

Some things that might be useful to consider:

Availability of data. The less time you have to spend cleaning & preparing your data the better. You will need a high-quality source of open data, something that does not require scraping or other time-consuming efforts. It is essential that you take some time to verify the assumptions you are making about data availability, because having to pivot mid-quarter will present major challenges.

Uniformity of sources. Be sure that your data sources have a common identifier / unit of analysis. Useful things to look for would be commonly accepted identifiers like zip code or census tract, or data that is provided via a relational API, and thus comes with built-in identifiers for each record.
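As a rough illustration of why a shared identifier matters, here is a minimal sketch that joins two sources on a common key using only the standard library. The column names (`zip`, `median_income`, `library_count`) and all values are made up for the example; they are not from any real dataset.

```python
import csv
import io

# Hypothetical sample data: two sources that share a "zip" column.
INCOME_CSV = """zip,median_income
60615,52000
60637,41000
60601,98000
"""

LIBRARIES_CSV = """zip,library_count
60615,2
60637,1
60699,1
"""


def read_by_key(text, key):
    """Read CSV text into a dict keyed by the shared identifier."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def inner_join(left, right):
    """Combine records whose identifier appears in both sources."""
    shared = left.keys() & right.keys()
    return {k: {**left[k], **right[k]} for k in sorted(shared)}


income = read_by_key(INCOME_CSV, "zip")
libraries = read_by_key(LIBRARIES_CSV, "zip")
joined = inner_join(income, libraries)

# Identifiers found in only one source are worth checking before you commit.
unmatched = sorted(income.keys() ^ libraries.keys())
print(f"{len(joined)} matched, unmatched zips: {unmatched}")
```

Keeping identifiers as strings (rather than converting them to integers) preserves leading zeros, which matters for zip codes and FIPS codes. A long list of unmatched identifiers is an early sign that two sources do not share a unit of analysis.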

Different units of observation. You will need different categories of data to create different visualizations. Consider looking for data that has some combination of:

geospatial features
time-series characteristics
data at different hierarchies (e.g. state/county/city)
networks/connections between people/organizations
variety of categorical and numeric variables

Avoiding too much data. There's no "right" number of records, but I would probably be most comfortable in the 10,000-1,000,000 range. Reach out if you feel you'll be significantly outside of that range.

Also consider that many datasets can be sliced by year. That can be a convenient way to scale the size of your work by picking an appropriately sized subset.
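One way to pick a year-based subset is to count records per year first and see which slices land in the suggested range. The sketch below assumes your data is a CSV with a column named `year`; that column name and the sample rows are assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Suggested range of record counts from this page.
MIN_ROWS, MAX_ROWS = 10_000, 1_000_000


def rows_per_year(lines, year_column="year"):
    """Count records per year from an open file or other iterable of CSV lines."""
    return Counter(row[year_column] for row in csv.DictReader(lines))


def in_suggested_range(n_rows):
    """Return True if a record count falls inside the suggested range."""
    return MIN_ROWS <= n_rows <= MAX_ROWS


# Tiny made-up sample standing in for a real file.
sample = io.StringIO("year,value\n2020,1\n2020,2\n2021,3\n")
counts = rows_per_year(sample)
for year, n in sorted(counts.items()):
    print(year, n, "ok" if in_suggested_range(n) else "outside suggested range")
```

With a real file you would pass the open file object instead of the sample, for example `with open(path, newline="") as f: counts = rows_per_year(f)`, where `path` is wherever you saved the download.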

Sources of Data

Some possible sources of data:

Data.gov - US Gov data portal
Data is Plural - a newsletter highlighting interesting data sets
Kaggle - curated datasets for ML tasks
City of Chicago Data Portal - most large cities have their own as well!
Census.gov Data - great for supplemental demographics and/or geospatial data
UNC Dataverse
Information is Beautiful
Google Dataset Search

Submission

A form will be shared via Ed which will ask you to submit a Git repository.

Your repository should include a 'proposal.md' file that contains:

Your name.
A description of your subject area of choice. (1-2 paragraphs)
Your likely data sources. For each, include:
- a URL
- short description (2-3 sentences)
- estimates of how many rows/columns the source will provide
A list of any questions you have for us.
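For the row/column estimates, a quick count is usually enough. This is a minimal sketch for a CSV file that has a header row; `csv_shape` is a hypothetical helper name, and other formats (JSON, APIs, shapefiles) would need a different approach.

```python
import csv


def csv_shape(path):
    """Return (rows, columns) for a CSV file, not counting the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # an empty file yields zero columns
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header)
```

Because the reader streams the file one record at a time, this works on files too large to load comfortably into memory.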

Grading Specifications

To earn an S on this assignment it must be complete. If any portion is missing (e.g. for data sources, you include URLs but no description of how you will use the data) you will receive an N.

You are not being graded on having the right answers to any of these questions. The purpose is to get you thinking about them early, and to let us give you feedback.

Milestone 2: Draft

You will have two weeks to go from your initial proposal to a working draft.

This milestone serves as a check-in to ensure you are on track to complete the assignment and as an opportunity to obtain valuable peer feedback that will enhance your final product.

Specifications

To receive an S you must:

Have at least 8 images with at least 5 truly distinct types (e.g. choropleth, scatter plot, network graph, bar chart)
At least half should be using real data. (Ideally, all data acquisition/cleaning would be done.)
Each image should:
- Use appropriate visual encodings.
- Include all essential elements (titles, labels, etc.)
- Show effort in visual design. (Do not rely on default themes.)
- Have a few sentences explaining the context; a draft of how the image will fit your final narrative.
- Generally, be in a state where specific feedback would be useful.
Include Python code that generates your graphs. This can be in Python files, Jupyter notebooks, or any other format for now.
Apply a theme to all images that is distinct from the defaults and appropriate to your intended usage.
- You may opt to use up to two themes, for the purpose of obtaining feedback.
Answer the following questions in static_draft/about.md:

# Project Name

Your Name

## What is your current goal? Has it changed since the proposal?

## Are there data challenges you are facing? Are you currently depending on mock data?

## Describe each of the provided images with 2-3 sentences to give the context and how it relates to your goal.

## What form do you envision your final deliverable taking? (An article incorporating the images? A poster? An infographic?)

Non-Factors

A few things that you will not be graded on at this point:

Quality of narrative, beyond the inclusion of some explanatory sentences as noted above.
Quality of design choices, beyond fulfillment of requirements stated above.
Cohesiveness, which is only a factor in the final submission.
Code quality, comments.

Milestone 3: Critique

You will provide feedback to a group of your peers, and receive feedback on your own work.

Understanding how other people see your work is essential to finding its flaws and strengthening it. Learning to give and receive critique is an important skill in itself.

Critique should:

Not be cruel. Follow guidelines given in class.
Include specific constructive suggestions.
Use context of the project to ground suggestions. (e.g. "Since you mentioned wanting to reach audience X, I think Y.")

Your group should plan to meet during Week 4 and exchange feedback. Your group will be asked to submit a document describing the meeting.

Your good faith participation in your group's critique process will earn you an S.

Reductions in this grade:

Group feedback that your participation in the critique process did not meet expectations.
If your Milestone 2 is late enough that you are not able to participate in critique, it will result in a reduced grade for this assignment.

Milestone 4: Final Submission

Your final submission will be a web page or PDF that incorporates your images into something that addresses your goal.

By now your project should use real data, incorporate feedback from the critique process, and be pulled together with a consistent visual theme.

You will receive feedback in three different areas:

Grading Specification: Visual Design

This portion of the grade looks at how well you apply the principles of visualization design discussed in class. To receive an S:

At least 8 images.
Of at least 5 distinct types.
No significant presentation errors: missing elements, misleading axes, improper use of graph type, etc.
Demonstration of critique feedback incorporated into final product.

Grading Specification: Narrative

This portion of the grade focuses on your incorporation of your images into a larger narrative. To receive an S, your narrative will be evaluated for:

Appropriate use of real data. (No mocked data at this stage.)
Writing and visuals are clear and relevant to stated goal.
Narrative is well-supported by appropriately chosen graphs.
Cohesive visual design between images and supplemental content.
Static format: the deliverable cannot be interactive or a notebook of any kind.

Grading Specification: Code Quality / Documentation

Finally, your code quality and documentation will be evaluated. An S requires:

Your repository should make use of directories to keep it manageable. I'd suggest:
data/ - All data (and optionally, data-fetching code).
- If your data will exceed 100MB, please reach out to discuss options on how to share your data with us.
scratch/ - Anything in this directory won't be graded, useful for experimentation.
src/ - Code that you would like us to include in final evaluation, should at minimum generate all graphs from included data.
static_draft/ - Files specific to the second milestone.
static_final/ - Files specific to the final milestone.
Code may be:
- One or more .py files with clear instructions on how to generate all graphs.
- A Jupyter (IPython) or marimo notebook.
In either case, code (non-scratch) must be:
- correct (without major error in how Python is used)
- appropriately commented
- in accordance with the Python Style Guide as amended by the Style Addendum.
A README.md file that contains:
- your project name
- your name
- a description of your project
- an embedded screenshot of your final project
- a ## Data Sources section that cites all sources and explains data usage/licenses

Example repository structure

├── LICENSE
├── README.md
├── data
│  ├── crown_exxon_ash.json
│  ├── download_census.py
│  └── fips.csv
├── scratch
│  ├── altair_theme_demo.py
│  └── exploration.ipynb
├── src
│  ├── graphs_marimo.py
│  └── theme.py
├── static_draft
│  ├── about.md
│  └── exploration-draft.ipynb
└── static_final
   ├── final.pdf
   └── zine_layout.svg
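If you want to set up this layout in one step, a minimal sketch using `pathlib` follows. The directory names come from the example structure above; `scaffold` is a hypothetical helper, not something provided by the course.

```python
from pathlib import Path

# Directory names taken from the example structure above.
PROJECT_DIRS = ["data", "scratch", "src", "static_draft", "static_final"]


def scaffold(root):
    """Create the suggested directories under root; existing ones are left alone."""
    root = Path(root)
    for name in PROJECT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # Report which directories now exist, sorted by name.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling `scaffold(".")` from the top of your repository creates any directories that are missing. Note that Git does not track empty directories, so each one will only appear in your repository once it contains at least one file.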

Style Addendum

The standards for code written in data science contexts, particularly in notebooks, are somewhat different from those for larger software applications, where code is expected to be maintained for longer periods.

Your final code submission should follow the UChicago Python Style Guide, with these two amendments:

You may use global variables in moderation. A common reason for this is to load & mutate a single master data set that is then used in the remainder of the file. That usage is acceptable, but be sure to use ample commenting to ensure code remains readable.
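As one possible shape for this, the sketch below loads and cleans a single module-level data set and documents how the rest of the file may use it. The column names and values are made up for illustration, and the inline CSV stands in for a file you would read from `data/`.

```python
import csv
import io

# Hypothetical stand-in for a file in data/; all values are made up.
RAW_CSV = """county_id,name,population
00001,County A,1000
00002,County B,2500
"""

# Global master data set.
# Loaded and cleaned once, here. Functions below read it but do not
# modify it, so they can rely on `population` already being an int.
MASTER = [
    {**row, "population": int(row["population"])}
    for row in csv.DictReader(io.StringIO(RAW_CSV))
]


def total_population():
    """Sum population across the master data set (reads the global only)."""
    return sum(row["population"] for row in MASTER)


def largest_county():
    """Return the name of the county with the largest population."""
    return max(MASTER, key=lambda row: row["population"])["name"]
```

Keeping every mutation next to the load, with a comment saying so, is what keeps this style readable: a reader never has to hunt through the file to learn what state the global is in.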

Notebooks must execute sequentially. Jupyter notebooks allow non-sequential execution of cells, which can be confusing and lead to different results during development vs. subsequent runs. If you are using Jupyter, be absolutely certain that your notebook can run by restarting the kernel and running cells in order.
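A saved `.ipynb` file is JSON, and each code cell records the order in which it was last run in its `execution_count` field. The sketch below uses that to flag notebooks whose saved cells were not run top to bottom. `ran_in_order` is a hypothetical helper, and it is only a quick sanity check: it does not replace restarting the kernel and running all cells.

```python
import json


def ran_in_order(notebook_json):
    """Return True if non-empty code cells are numbered 1, 2, 3, ... top to bottom."""
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        # Skip markdown cells and empty code cells, which are never executed.
        if cell["cell_type"] == "code" and cell.get("source")
    ]
    return counts == list(range(1, len(counts) + 1))
```

To use it, pass the text of a notebook file, for example `ran_in_order(Path(notebook_path).read_text(encoding="utf-8"))` with `Path` imported from `pathlib` and `notebook_path` pointing at your notebook. A result of `False` means at least one cell was skipped, re-run, or run out of order since the last kernel restart.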