Contents
We encourage you to watch our videos: they have been designed to guide you through Sparklis step by step and help you master it, so that you can build your own queries as quickly as possible. After watching a tutorial, you may want to read the text and repeat the steps at your own pace, which is why we reproduce it here.
Getting started with the interface
In the left-hand window, choose the type of entity you are interested in. They are displayed in orange: bibliographic record, book, collection, concept, document, image, issue, person and proceedings.
Once this entity has been chosen, you can qualify it by clicking on a property (shown in purple), still in the first window. To add further properties, click on the entity name again, then on a new property.
The second window is used to select or specify values.
The third window is used to sort and group your query results.
If you make a mistake, go back with the program's own back arrow, not with your browser's back button.
By clicking on the gearwheel, you can change some parameters, including the number of result lines.
Once the query is ready, you can copy a permalink that lets you share it or find it again. You can also view the Sparql query generated by the program by clicking on the Yasgui view button. From that screen, you can download your results table in CSV format and access the Google Charts menus.
Initial query
How to find articles by a particular author?
Of course, if we only wanted the list of an author's articles, it would be more logical to search on the Persee.fr portal, but this query will be the starting point for other, more complex queries, which can only be run on data.persee.fr.
I therefore request all documents in the left window, then click on “has an identifier” to obtain their identifiers. I want to know who the authors of these articles are: I click on “document”, then on “has an aut”; this author has a name, which I also request, still in the left window (the window of entities and properties). I then type the name of the author I am interested in, here Gaston Maspero, in the input field of the second window, and validate by pressing “Enter” and clicking “OK”.
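The filtering step above can be sketched in Python on a handful of hypothetical rows standing in for the “document / has an identifier / has an aut / that has a name” results. The identifiers and every name other than Gaston Maspero are invented placeholders. The exact string comparison is a simplification: this illustrates the logic of the query, not how Sparklis or data.persee.fr evaluate it.

```python
# Hypothetical result rows: (document identifier, author name).
# Only "Gaston Maspero" comes from the tutorial; everything else is placeholder data.
ROWS = [
    ("doc1", "Gaston Maspero"),
    ("doc2", "Gaston Maspero"),
    ("doc2", "Author B"),
    ("doc2", "Author C"),
    ("doc3", "Author B"),
]


def documents_by(author_name, rows):
    """Return the sorted identifiers of documents having an author with exactly this name."""
    return sorted({doc for doc, name in rows if name == author_name})


if __name__ == "__main__":
    print(documents_by("Gaston Maspero", ROWS))
```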
I now have this author's articles; next, I would like to know their co-authors.
So I add that these documents also have an author who is NOT Gaston Maspero: I click on “document” in the query being built, ask for the author and his name, type Gaston Maspero's name in the input field of the middle window, and add “not” by clicking on this word in the right-hand window. I can also make this criterion optional by clicking on “optionally” in the right-hand window; this way, I also keep the articles signed by Gaston Maspero alone.
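The “not” and “optionally” conditions can be illustrated on the same kind of hypothetical rows. In this sketch (placeholder data, hypothetical function name), each document by the reference author yields one row per co-author. With `optional=True`, single-author documents are kept with an empty (`None`) co-author, which mirrors the effect described for “optionally”. With `optional=False`, they disappear.

```python
# Hypothetical result rows: (document identifier, author name); placeholder data.
ROWS = [
    ("doc1", "Gaston Maspero"),
    ("doc2", "Gaston Maspero"),
    ("doc2", "Author B"),
    ("doc2", "Author C"),
    ("doc3", "Author B"),
]


def coauthor_rows(author_name, rows, optional=True):
    """One row per (document, author, co-author) for documents signed by author_name."""
    authors_by_doc = {}
    for doc, name in rows:
        authors_by_doc.setdefault(doc, []).append(name)

    result = []
    for doc in sorted(authors_by_doc):
        names = authors_by_doc[doc]
        if author_name not in names:
            continue  # keep only documents signed by the reference author
        others = [name for name in names if name != author_name]  # the "not" condition
        if others:
            result.extend((doc, author_name, other) for other in others)
        elif optional:
            # "optionally": the document is kept, with an empty co-author column
            result.append((doc, author_name, None))
    return result


if __name__ == "__main__":
    for row in coauthor_rows("Gaston Maspero", ROWS):
        print(row)
```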
Making lists
I now have a list of articles by Gaston Maspero and, possibly, other people. For each article, I get one result row per combination of first and second author, so I get a lot of rows. By clicking on the “2nd name” column in the results table, I can choose “the list of” in the third window; then, at the top of the options displayed in that third window, I indicate that I want the list for each identifier.
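The “the list of … for each identifier” aggregation behaves roughly like Sparql's `GROUP_CONCAT`: the many rows per article are collapsed into one row per identifier. Below is a minimal Python sketch on hypothetical rows. Whether the query generated by Sparklis removes duplicate names is an assumption here; this sketch drops them.

```python
# Hypothetical rows from the co-author query: (identifier, 1st name, 2nd name).
ROWS = [
    ("doc1", "Gaston Maspero", None),
    ("doc2", "Gaston Maspero", "Author B"),
    ("doc2", "Gaston Maspero", "Author C"),
]


def list_per_identifier(rows):
    """Collapse the rows into one entry per identifier, listing the 2nd names."""
    lists = {}
    for doc, _first, second in rows:
        names = lists.setdefault(doc, [])  # every identifier gets an entry, even without co-author
        if second is not None and second not in names:
            names.append(second)
    return lists


if __name__ == "__main__":
    for doc, names in list_per_identifier(ROWS).items():
        print(doc, ", ".join(names))
```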
Sorting results
Sparql is essentially a non-ordered language: unless you ask for sorting, results appear in the order in which they are found, which often reflects the order in which the data was entered into the database.
I have a list of articles with a list of their authors; I would like to sort them by number of authors.
In the query writing area, I click in the upper part of the query, before the list aggregation commands, to display all the columns. In the results table, I click on the “2nd name” column, then on “the number of” in the third window, and I indicate, still in that third window, that I want this number for each identifier. Then I click, again in the third window, on “Highest to lowest” to rank the results in descending order of number of authors.
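Counting per identifier and sorting from highest to lowest can be sketched as follows, again on hypothetical rows. Because result order is otherwise not guaranteed, the sketch breaks ties by identifier so that its output is reproducible; Sparklis may order ties differently.

```python
from collections import Counter

# Hypothetical result rows: (document identifier, author name); placeholder data.
ROWS = [
    ("doc1", "Gaston Maspero"),
    ("doc2", "Gaston Maspero"),
    ("doc2", "Author B"),
    ("doc2", "Author C"),
    ("doc3", "Author B"),
]


def documents_by_author_count(rows):
    """(identifier, number of authors) pairs, from highest to lowest."""
    counts = Counter(doc for doc, _name in set(rows))  # set() ignores duplicated rows
    # Descending count; ties broken by identifier for a reproducible order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(documents_by_author_count(ROWS))
```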
Obtaining a curve
I want to find all documents that have an identifier and a print publication date. By clicking on the “identifier” column in the results table and then on “the number of” in the third window, I ask for the number of identifiers for each year, by choosing “for each Date of print publication” in the upper part of the third window. By clicking on “lowest to highest”, I sort the years in ascending order.
Next, I can click on Yasgui view, then on Google Charts, then on Chart config, and choose a curve. After a few adjustments, I get a curve of the number of documents per year.
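Counting identifiers per year, from lowest to highest, amounts to the following Python sketch; the identifiers and years are hypothetical placeholders. The resulting (year, count) pairs are what the curve plots.

```python
from collections import Counter

# Hypothetical (identifier, year of print publication) rows; placeholder data.
DOCS = [("doc1", 1843), ("doc2", 1843), ("doc3", 1850), ("doc4", 1901)]


def documents_per_year(docs):
    """Number of identifiers for each year, years from lowest to highest."""
    counts = Counter(year for _doc, year in docs)
    return sorted(counts.items())


if __name__ == "__main__":
    for year, count in documents_per_year(DOCS):
        print(year, count)
```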
The query already built. (You will still have to click on Yasgui view, then at the bottom of the Sparql query, on Google Charts, on Chart config on the right-hand side, and choose a curve.)
How to reuse a sample query?
I clicked on a permalink leading to a query that gives all the co-authors of every author in the database, in descending order of number of co-authors.
If you want to know the co-authors of a given author, click on the first “that has a name” in the query being built, click on “that is” in the third window, type the name of the author you are interested in in the input field of the second window, and validate (Return key, then “OK”): you now have your author's co-authors. You can then delete the last part of the query, which is no longer needed since you have only one reference author. You can also request the titles or identifiers of the works they share, and possibly sort them, by clicking on “anything” in the query and asking for the title or identifier.
The query already built, with the list of co-authors for each identifier.
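The reasoning behind the sample query can be sketched in Python on a hypothetical mapping from document identifiers to author names (all placeholders except Gaston Maspero). `coauthors_by_author` ranks authors by number of distinct co-authors, and `common_works` returns the identifiers of the works two authors share; both function names are illustrative.

```python
# Hypothetical authorship data: document identifier -> author names (placeholders).
AUTHORS_BY_DOC = {
    "doc1": ["Gaston Maspero"],
    "doc2": ["Gaston Maspero", "Author B", "Author C"],
    "doc3": ["Author B", "Author D"],
}


def coauthors_by_author(authors_by_doc):
    """Each author with the sorted list of their co-authors, most co-authors first."""
    coauthors = {}
    for names in authors_by_doc.values():
        for author in names:
            coauthors.setdefault(author, set()).update(n for n in names if n != author)
    ranked = [(author, sorted(others)) for author, others in coauthors.items()]
    # Descending number of co-authors; ties broken by name for a stable order.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


def common_works(author, coauthor, authors_by_doc):
    """Identifiers of the documents signed by both authors."""
    return sorted(
        doc for doc, names in authors_by_doc.items() if author in names and coauthor in names
    )


if __name__ == "__main__":
    for author, others in coauthors_by_author(AUTHORS_BY_DOC):
        print(author, "->", ", ".join(others))
    print(common_works("Gaston Maspero", "Author B", AUTHORS_BY_DOC))
```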
Searching for several word families in titles
To build this query we will use regular expressions. If you are not familiar with this concept, look online for how to write these expressions, which let you search for different words derived from the same stem.
I start by searching for documents that have a title and a print publication date. To retrieve the documents themselves, I can add the identifier, and I can also add the authors if I want them.
I would like to specify that the documents I am looking for contain one of the words “agricole, agriculture, agriculteur or agricultrice”, in the singular or plural, in their title; this is where regular expressions come in. I choose “matches as regex” in the third window, type my regular expression, agric(ole?s?|ultures?|ulteurs?|ultrices?), in the input field of the second window, and press the Return key.
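A pattern like this can be checked before using it in Sparklis. The sketch below uses Python's `re` module on hypothetical titles. Simple patterns like this one (groups, alternation, `?`) behave the same way in Python and in Sparql's `REGEX` function, but more advanced syntax can differ between the two. The search is unanchored: `agric` alone would already match every word listed, so the alternation mainly documents which forms are intended. Python's `re.search` is also case-sensitive by default.

```python
import re

# The "agri" family pattern from the tutorial.
AGRI = re.compile(r"agric(ole?s?|ultures?|ulteurs?|ultrices?)")

# Hypothetical titles, used only to exercise the pattern.
TITLES = [
    "Les produits agricoles",
    "L'agriculture en Égypte",
    "Les agricultrices du Nord",
    "Une agrégation d'histoire",
]


def matching_titles(pattern, titles):
    """Titles in which the pattern occurs somewhere (an unanchored search)."""
    return [title for title in titles if pattern.search(title)]


if __name__ == "__main__":
    print(matching_titles(AGRI, TITLES))
```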
I now want to add that I am also searching for documents containing words from the “paysan” family (paysa(ns?|nnes?|nneries?)) and from the “rural” family (rura(le?s?|ux|lités?)). In the query being built, I select the regex and choose “a choice between” in the third window, click again on “matches as regex” and type my second regex; I repeat the process for the third: “a choice between”, “matches as regex”, then my regex.
I then give this column a name: in the query being built, I select “a choice between” and enter the name I have chosen, e.g. keyword.
I now make an aggregation, as in the previous queries: I select the “document” column, ask for “the number of” in the third window, and specify “for each date of print publication” and, for my keyword choices, “for each keyword”. Finally, I order the years by clicking on “date of print publication” in the query, then on “lowest to highest” in the third window. And here we are!
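The whole query can be imitated in Python to see what the aggregated rows look like. In this sketch the titles and years are hypothetical, and the keyword names agri, paysan and rural are the ones used in the spreadsheet later on. The paysan pattern is written as paysa(ns?|nnes?|nneries?) to cover paysan(s), paysanne(s) and paysannerie(s). A title matching several families is counted once per family, since each branch of a “choice between” (presumably a Sparql `UNION`) produces its own row.

```python
import re
from collections import Counter

# The three word families, keyed by the names used later in the spreadsheet.
FAMILIES = {
    "agri": re.compile(r"agric(ole?s?|ultures?|ulteurs?|ultrices?)"),
    "paysan": re.compile(r"paysa(ns?|nnes?|nneries?)"),
    "rural": re.compile(r"rura(le?s?|ux|lités?)"),
}

# Hypothetical (title, year of print publication) rows.
DOCS = [
    ("Les paysans et l'agriculture", 1843),
    ("La vie rurale", 1843),
    ("Agronomie antique", 1850),
    ("Les agriculteurs ruraux", 1901),
]


def counts_per_year_and_keyword(docs):
    """Number of documents for each (year, keyword), years from lowest to highest."""
    counts = Counter()
    for title, year in docs:
        for keyword, pattern in FAMILIES.items():
            if pattern.search(title):
                counts[(year, keyword)] += 1  # one row per matching family
    return sorted(counts.items())


if __name__ == "__main__":
    for (year, keyword), count in counts_per_year_and_keyword(DOCS):
        print(year, keyword, count)
```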
Building several curves from a query
We now leave Sparklis to see how to process the data downloaded after a complex query.
I saved the CSV file of the previous query on my drive and opened a spreadsheet.
I import my data (File > Import), specifying that it is comma-delimited.
In my spreadsheet, I create three new sheets and name them after my regular expressions (agri, paysan, rural).
In the first sheet (the one into which I imported my data), I sort the data by column B, which contains the regular expressions.
I select the data for each regular expression and copy it, one regular expression at a time, into the tab of the same name. Each of these sheets now contains the years, the name of the regular expression and the number of documents, in that order.
Now I clean the data: in each of the three sheets, I delete column B, which contains nothing but the name of the tab and is no longer needed.
In the first sheet, which I have now emptied, I enter the column names: Years, agri, paysan, rural.
In the first column, I enter a year just before the first year of my data: my data starts in 1843, so I enter 1840, and I drag the fill handle to get consecutive years from 1840 to 2016.
In the first cell of the Agri column, I paste this formula:
=IFERROR(vlookup($A2,agri!$A$1:$B,2,false),0)
This formula looks in the “agri” sheet for the year found in column A of the current row: if the year is present, it returns the corresponding number of documents; otherwise, it returns 0.
I paste the formula and drag the fill handle down to copy it into every row.
I do the same for the other columns, changing the tab name in the formula: I replace agri! with paysan!, then with rural!
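The spreadsheet steps (one sheet per keyword, a row for every year, and `IFERROR(VLOOKUP(...), 0)` to fill in the missing years) can also be sketched in Python with the standard `csv` module. The CSV column headers and values below are hypothetical placeholders, not the headers actually written by the Yasgui export. The resulting table has the same layout as the first sheet: Years, agri, paysan, rural.

```python
import csv
import io

# Hypothetical CSV export: one row per (year, keyword).
# The headers "year", "keyword" and "count" are placeholders.
CSV_TEXT = """year,keyword,count
1843,agri,2
1843,rural,1
1845,paysan,3
"""

KEYWORDS = ["agri", "paysan", "rural"]


def load_counts(csv_text):
    """Map (year, keyword) -> number of documents, like the per-keyword sheets."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[(int(row["year"]), row["keyword"])] = int(row["count"])
    return counts


def build_table(counts, first_year, last_year, keywords):
    """One row per year with a column per keyword; years without data get 0."""
    table = [["Years", *keywords]]
    for year in range(first_year, last_year + 1):
        # counts.get(key, 0) plays the role of IFERROR(VLOOKUP(...), 0).
        table.append([year, *(counts.get((year, keyword), 0) for keyword in keywords)])
    return table


if __name__ == "__main__":
    for line in build_table(load_counts(CSV_TEXT), 1840, 1846, KEYWORDS):
        print(line)
```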
Now I can click on the chart icon and choose curves.
I check “Use column A as labels” so that the years serve as the horizontal axis.
And here are my curves!
I get a graph with three curves representing the comparative trends in the use of these three word families in the titles of all the documents available in Persée.
Tips: best practices on Sparklis
- To go back, use the arrow to the left of the program – not the browser arrow.
- To remove a choice or a step, click on the red cross.
- If you do not want to browse the list, you can type the property you are looking for in the input field of the first window (type “aut”, “publication”, etc.).
- Reselect the resource if you want to apply several properties to it, e.g. an author who has a name and (after clicking on the author again) a date of birth.
- Always request the name of an author or the identifier of a document at the beginning of a query, unless you want to aggregate them into a number.
- To make a list, or to count the elements, click on the relevant column in the results table, then on the choice in the third window.
- Click at the top of the query to display all columns once you have made an aggregation.
- Click at the top of the query if the sorting requested in the third window gives you a 1 or 0 depending on whether the element meets the condition or not.
- In the third window, check where the focus is by looking at the highlighted part.
- Use the resource identifier as a unique means of identification and thus facilitate the selection, grouping and sorting of results.
- For illustrations, to select those for which we have the distribution rights, use the property “that has an access rights” with the value “http://creativecommons.org/licenses/by-nc-sa/3.0/fr/”.
- To sort authors effectively and handle homonyms, remember to request the family name and the given name rather than the name.
- A property can be optional thanks to “optionally” in the third window.
- For now, to search for a term containing an accented letter, it is better to use “matches as regex” in the third window (very soon, you will be able to enter it directly in the input field of the second window).
- You can sort the same entity by several criteria, for example the number of articles and the list of titles for each author; in that case, the “for each aut” option appears twice in the right-hand window: always click on the top one!