Getting newspaper article data and saving it to a SQL database

This week, I decided to learn more about working with text, starting by building some word clouds based on newspaper data. At the same time, I wanted to improve my knowledge of collecting data and saving it to a database.

My main goal here was to get and save some interesting data I could work with. I thought newspaper articles from each month of 2020 would be a good way to see the main subjects in the news month by month.

Initially, I looked at different newspaper archives for articles. There are many great ones, but I could find few that offer an API. I discovered that the New York Times does have an API with data related to their articles.

This article documents my learning journey: getting the data from the API and saving it to a SQLite database.


The APIs are accessible at https://developer.nytimes.com/apis

I found the documentation for the APIs (including the FAQ at https://developer.nytimes.com/faq) very good for getting started.

There are a number of different APIs available from the New York Times, including the Books API, which has information on best-seller lists and book reviews, the Film API, for film reviews, and the Community API, which has information on user-generated content such as comments.

In this article, I am using the Archive API (https://developer.nytimes.com/docs/archive-product/1/overview). If you pass a year and month to the API, it returns that month’s article data. The article data tells us many things, including where the full article is available on the NYT website, when it was published, its headline and its lead paragraph.
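As a rough illustration of that request, here is a minimal Python sketch using only the standard library. The helper names (build_archive_url, fetch_month, summarize) are my own, not part of the API, and the response field names (response.docs, web_url, pub_date, headline.main, lead_paragraph) are assumptions based on the fields described above, so check them against a real response before relying on them. You also need your own API key from the developer site.

```python
import json
import urllib.request

# URL pattern for the Archive API; the key is passed as a query parameter.
ARCHIVE_URL = "https://api.nytimes.com/svc/archive/v1/{year}/{month}.json?api-key={key}"


def build_archive_url(year, month, api_key):
    """Return the Archive API URL for one month (month is 1-12, no zero padding)."""
    return ARCHIVE_URL.format(year=year, month=month, key=api_key)


def fetch_month(year, month, api_key):
    """Download one month of article metadata and return the list of article dicts."""
    with urllib.request.urlopen(build_archive_url(year, month, api_key)) as resp:
        payload = json.load(resp)
    # Assumption: articles are nested under response -> docs.
    return payload["response"]["docs"]


def summarize(doc):
    """Pick out the fields discussed above from one article dict."""
    return {
        "web_url": doc.get("web_url"),
        "pub_date": doc.get("pub_date"),
        # Assumption: the headline is an object whose "main" key holds the text.
        "headline": (doc.get("headline") or {}).get("main"),
        "lead_paragraph": doc.get("lead_paragraph"),
    }
```

For example, fetch_month(2020, 1, "YOUR_KEY") would request January 2020, and summarize can then be applied to each returned article to explore what is worth saving.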

I go through three steps here:

  1. Connect to the API to explore the data available and see what data I want to save.
  2. Build a database table based on the data I want to save.
  3. Get data for each month of the year and save it to the database.
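Steps 2 and 3 can be sketched with Python's built-in sqlite3 module. This is only one possible layout, under some assumptions: the table name (articles), its four columns, and the helper names are all hypothetical, and fetch_month stands for whatever function you write to call the API and return a list of dicts with those four keys for a given year and month.

```python
import sqlite3


def create_table(conn):
    """Create the articles table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               web_url        TEXT PRIMARY KEY,
               pub_date       TEXT,
               headline       TEXT,
               lead_paragraph TEXT
           )"""
    )


def save_articles(conn, rows):
    """Insert article dicts; rows whose web_url is already stored are skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            """INSERT OR IGNORE INTO articles
                   (web_url, pub_date, headline, lead_paragraph)
               VALUES (:web_url, :pub_date, :headline, :lead_paragraph)""",
            rows,
        )


def load_year(conn, year, fetch_month):
    """Fetch and save all twelve months of a year.

    fetch_month is a caller-supplied function (hypothetical here) that takes
    (year, month) and returns a list of dicts with the four column names as keys.
    """
    create_table(conn)
    for month in range(1, 13):
        save_articles(conn, fetch_month(year, month))
```

Using the article URL as the primary key together with INSERT OR IGNORE means the load can be re-run without creating duplicates. If the API enforces rate limits, a pause between the monthly calls may also be needed.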



