Tutorial III: Retrieving a large number of articles

Now that we have learned to ping several APIs for a single article, we will repeat the procedure for a large number of articles. In this example we retrieve 20 articles from each API.

Often we are looking for hundreds of articles. Rather than returning all the results at once, the APIs offer a paging mechanism through the start and records parameters, so that the result set can be received one chunk at a time. start is the index of the first returned article, and records is the number of articles returned by the query.
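To make the arithmetic concrete, the following minimal sketch lists the (start, records) pairs needed to cover a result set in chunks. The helper page_parameters is hypothetical and not part of Arcas; it only illustrates how the two values relate, and it assumes a zero-based start index, as in the example in this tutorial.

```python
def page_parameters(total, records):
    """Return (start, records) pairs covering `total` articles in chunks.

    Hypothetical helper for illustration only; not part of Arcas.
    """
    if total < 0 or records <= 0:
        raise ValueError("total must be >= 0 and records must be > 0")
    pages = []
    for start in range(0, total, records):
        # The final chunk may be smaller than `records`.
        pages.append((start, min(records, total - start)))
    return pages


# Two pages of ten cover the 20 articles used in this tutorial.
print(page_parameters(20, 10))  # [(0, 10), (10, 10)]
```

Each pair could then be passed as the start and records arguments of a query, one request per pair.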

>>> import arcas
>>> for p in [arcas.Ieee, arcas.Plos, arcas.Arxiv, arcas.Springer, arcas.Nature]:
...     # Two pages of 10 records each give 20 articles per API.
...     for start in range(2):
...
...         api = p()
...         parameters = api.parameters_fix(title='Game', abstract='Game',
...                                         records=10, start=(start * 10))
...         url = api.create_url_search(parameters)
...         request = api.make_request(url)
...         root = api.get_root(request)
...         raw_article = api.parse(root)
...
...         # Export inside the paging loop so that every page is processed,
...         # not only the last one retrieved.
...         for art in raw_article:
...             article = api.to_dataframe(art)
...             api.export(article, 'results_{}.json'.format(api.__class__.__name__))

In our example the difference might not seem important. But assume you were asking for a hundred articles. Some APIs limit the number of articles that can be returned by a single query, so requesting the results in chunks keeps each request within that limit and avoids overloading the API.
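As a rough sketch of that scenario, the loop below gathers up to 100 articles in chunks of 25 and stops early when a page comes back short. fetch_page is a hypothetical stand-in for the url, request and parse steps of a real query; it returns fake data so the example runs without network access, and it is not an Arcas function.

```python
def fetch_page(start, records, available=60):
    """Hypothetical stand-in for one API call; returns fake article titles.

    Pretends the service holds only `available` matching articles.
    """
    stop = min(start + records, available)
    return ["article {}".format(i) for i in range(start, stop)]


def collect(wanted, records):
    """Gather up to `wanted` articles, `records` at a time."""
    articles = []
    for start in range(0, wanted, records):
        page = fetch_page(start, min(records, wanted - start))
        articles.extend(page)  # accumulate the results of every page
        if len(page) < records:
            break  # a short page means the result set is exhausted
    return articles


results = collect(wanted=100, records=25)
print(len(results))  # 60
```

Here only 60 of the 100 requested articles exist in the fake service, so the loop ends after the third request instead of sending a fourth.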

Note that you need to obtain an API key before you can use arcas.Ieee and arcas.Springer.