photogrammar

Code for getting and exploring the photogrammar data.

For example, we first download a list of all the photo ids (these uniquely define the URLs for scraping the rest of the data) by running the following code:

python src/get_photo_ids.py

This creates pickle/all_urls.p, a Python pickle file. Next, we download the MARC records from the Library of Congress website for every photo id in all_urls.p:

python src/get_marc_records.py
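
Because the MARC download is driven entirely by pickle/all_urls.p, it can be useful to check what that file holds. The sketch below is not part of the repository: the helper names load_photo_ids and describe are hypothetical, and the structure of the pickled object (list, dict, or something else) is not stated in this README, so the code avoids assuming one. The encoding="latin1" argument is also an assumption, included only in case the pickle was written by Python 2. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import pickle
from itertools import islice
from pathlib import Path

# Path taken from the README; everything else here is an illustrative assumption.
PICKLE_PATH = Path("pickle/all_urls.p")


def load_photo_ids(path=PICKLE_PATH):
    """Load the object that src/get_photo_ids.py pickled."""
    with open(path, "rb") as handle:
        # encoding="latin1" lets Python 3 read string data pickled by Python 2.
        return pickle.load(handle, encoding="latin1")


def describe(obj, n=3):
    """Return (type name, length or None, first n items) for any object."""
    length = len(obj) if hasattr(obj, "__len__") else None
    try:
        sample = list(islice(obj, n))
    except TypeError:
        # The object is not iterable, so there is nothing to sample.
        sample = []
    return type(obj).__name__, length, sample


# Only runs when the pickle from the first step is present.
if PICKLE_PATH.exists():
    print(describe(load_photo_ids()))
```

If the reported length is far smaller than expected, the id scrape was probably interrupted and is worth re-running before the longer MARC download.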

When get_marc_records.py finishes, there should be files in the marc_records directory, such as 'marc_recordsfsa1997000988.csv'. To finish the first stage of the scrape, we download the image URLs in the same way:

python src/get_img_urls.py

This creates text files in the 'img_url' directory, such as 'img_url/fsa1997000987.txt', which contain the URLs of the photo images.
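
Once the image URL step has run, the per-photo text files can be gathered into a single mapping for later use. This is a minimal sketch under stated assumptions rather than code from the repository: it assumes each .txt file is named after its photo id and lists one URL per line, neither of which this README confirms, and the helper names read_img_urls and ids_without_urls are hypothetical.

```python
from pathlib import Path

# Directory name taken from the README.
IMG_URL_DIR = Path("img_url")


def read_img_urls(directory=IMG_URL_DIR):
    """Map each file stem (assumed to be the photo id) to its non-empty lines."""
    urls = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Assumption: one URL per line; blank lines are skipped.
        urls[path.stem] = [line.strip() for line in text.splitlines() if line.strip()]
    return urls


def ids_without_urls(urls):
    """Return the ids whose file exists but contains no URL."""
    return sorted(photo_id for photo_id, found in urls.items() if not found)


# Only runs when the img_url directory from the scrape is present.
if IMG_URL_DIR.is_dir():
    collected = read_img_urls()
    print(len(collected), "files read;", len(ids_without_urls(collected)), "empty")
```

Listing the empty files gives a quick way to spot photo ids whose image URLs may need to be fetched again.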
