[FEATURE] Create a blog post to explain the structure of an IPUMS extract #26

Open
6 tasks
00krishna opened this issue May 20, 2024 · 2 comments
Labels: documentation, enhancement, good first issue, help wanted

Comments

@00krishna
Collaborator

00krishna commented May 20, 2024

Issue Description

Write a blog post that explains the structure of basic IPUMS data extracts. IPUMS extracts come in different forms. The most basic form consists of a DDI (.xml) file and a compressed DAT data file. The DDI file contains metadata about the variables in the extract, such as variable names, data types, and data ranges. The DAT file contains only fixed-width numeric data, never text.
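To make the relationship between the two files concrete, here is a minimal Python sketch of reading one fixed-width record using a column layout. The layout, variable names, and sample line are all hypothetical stand-ins for what a DDI file would describe; this is not how IPUMS.jl does it.

```python
# Hypothetical column layout, standing in for the positions a DDI file describes.
# Each entry is (variable name, 1-based start column, width).
LAYOUT = [
    ("YEAR", 1, 4),
    ("SERIAL", 5, 6),
    ("AGE", 11, 3),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into named integer fields."""
    record = {}
    for name, start, width in layout:
        raw = line[start - 1 : start - 1 + width]  # convert 1-based start to 0-based slice
        record[name] = int(raw)
    return record


# A made-up 13-character record: 4 digits of YEAR, 6 of SERIAL, 3 of AGE.
sample_line = "2019000123045"
print(parse_record(sample_line))
```

Without the layout, the sample line is just a run of digits, which is why the metadata file is needed to read the data file.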

The second type of extract is the NHGIS extract, which contains a shapefile (.shp) holding both the GIS geometries of the selected units (city, state, county, etc.) and data variables, and a CSV file containing just the variable values for each geographic unit.
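As a rough illustration of how the two NHGIS components relate, the Python sketch below attaches CSV rows to geometry records through a shared key column. It assumes the key is named GISJOIN; the CSV contents, the other column names, and the placeholder geometry strings are invented for the example, and a real shapefile would be read with a GIS library rather than a list of dictionaries.

```python
import csv
import io

# Hypothetical CSV contents; a real table would have many more columns.
CSV_TEXT = """GISJOIN,STATE,TOTAL_POP
G010,Alabama,100
G020,Alaska,50
"""

# Hypothetical stand-in for shapefile records; geometry is a placeholder string.
SHAPES = [
    {"GISJOIN": "G010", "geometry": "<polygon 1>"},
    {"GISJOIN": "G020", "geometry": "<polygon 2>"},
]


def join_on_key(shapes, csv_text, key="GISJOIN"):
    """Merge each geometry record with the CSV row that shares its key."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    joined = []
    for shape in shapes:
        merged = dict(shape)  # copy so the input records are not modified
        merged.update(rows.get(shape[key], {}))
        joined.append(merged)
    return joined


for record in join_on_key(SHAPES, CSV_TEXT):
    print(record)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns would still need converting before analysis.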

The post should explain the format of the extracts and the information contained in each component. The intent is that in subsequent blog posts, the author can explain the code for extracting information from these files without having to explain the structure of the extract at the same time.

Difficulty: Beginner

Time: 6 - 8 hours

Requirements

  • Explain the different types of data downloads: DDI + DAT and NHGIS format
  • Explain the contents of the DDI file and what is meant by metadata, such as data types, names, and special characters used in the data.
  • Explain that the DAT file is a fixed-width numeric format that must be parsed using the metadata. This is why packages such as IPUMS.jl are necessary.
  • Explain the meaning of special characters, such as the missing-data or Not-In-The-Universe characters, that are part of the metadata.
  • Explain the components of an NHGIS extract, including the separate shapefile and CSV file.
  • Explain that these data extracts can be downloaded from the IPUMS website or through IPUMS.jl.
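
The special-value requirement can be pictured with a small Python sketch. The variable name, the code values, and their labels below are hypothetical; in a real extract the codes for each variable come from the metadata.

```python
# Hypothetical special codes for one variable; real codes come from the metadata.
SPECIAL_CODES = {
    "INCOME": {
        9999999: "not in universe",
        9999998: "missing",
    },
}


def decode(name, value, special_codes=SPECIAL_CODES):
    """Return (value, None) for ordinary data or (None, label) for a special code."""
    label = special_codes.get(name, {}).get(value)
    if label is not None:
        return None, label
    return value, None


print(decode("INCOME", 42000))
print(decode("INCOME", 9999999))
```

Treating a special code as an ordinary number would distort any summary statistic, which is why the post should explain these codes before any analysis code appears.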

Expected Outcomes

The anticipated outcome is a blog post, written in Markdown, that contains the elements listed above. This blog post is meant to be informative rather than technical, so there is no need to show much code. Using code with the IPUMS.jl package will be covered in a subsequent blog post.

Additional Notes

Additional information about the structure of IPUMS extracts is available on the IPUMS website. Some good sources of information include.

Other Resources

Julia Slack:

  • documentation channel - you should post here first
  • helpdesk channel - post here to get more attention on your issue, though the answers may be less precise than you need.
  • health-and-medicine channel - this is where most of JuliaHealth is located these days.

Julia Discourse - I would advise posting here if your issue is long or takes a lot of time to explain, as it might get lost in Julia Slack. Consider cross-posting your forum post to the Julia Slack helpdesk and/or documentation channels.

@TheCedarPrince
Member

This looks like a good write-up @00krishna. I actually think this might be better to put into the documentation of the package versus a separate blog post on the JuliaHealth website. What's your opinion Krishna?

@00krishna
Collaborator Author

@TheCedarPrince Sure, this can go directly into the package documentation as a tutorial. That sounds fine to me.
