ItaliaNLP REST API Documentation

documents

POST

Inserts the document in the system

Once the text is loaded, it is split into sentences, tokenized according to the tokenization rules of the selected language, and finally PoS-tagged.

Parameters

text: String, the document to be inserted

lang: String, (optional) the language of the document. Allowed values: "IT" or "EN", default "IT"

async: String, (optional) performs the loading of the document in asynchronous mode. In async mode, the API returns immediately with the id assigned to the loaded document. Allowed values: "true" or "false", default "false".

id: Integer, (optional) the id to be assigned to the uploaded text

metadata: dictionary (optional), a key value list of attributes to be assigned to the document e.g.: {'attribute1': 1, 'attribute2': 2}

extra_tasks: list (optional), extra actions to be performed after the insertion of the document. Allowed values are: sentiment, sentiment_per_sentence, witness, hate, readability, named_entity, syntax

Output

Returns:

id: the id of the inserted text.

async: True if the insertion of the document was requested asynchronously, False otherwise.

already_existing: True if the document was already loaded, False otherwise.

Example: {'id': 121}
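
As a sketch, the request body can be assembled like this in Python. The base URL and the HTTP transport are assumptions (not part of the documented API); the field names mirror the parameter list above.

```python
# Hypothetical helper that assembles the parameters for POST /documents.
# BASE_URL and the transport (e.g. requests.post) are assumptions; the
# field names come from the parameter list above.
BASE_URL = "http://api.italianlp.it"  # hypothetical base URL

def build_insert_payload(text, lang="IT", async_mode=False,
                         doc_id=None, metadata=None, extra_tasks=None):
    payload = {"text": text, "lang": lang,
               "async": "true" if async_mode else "false"}
    if doc_id is not None:
        payload["id"] = doc_id
    if metadata is not None:
        payload["metadata"] = metadata
    if extra_tasks is not None:
        payload["extra_tasks"] = extra_tasks
    return payload

payload = build_insert_payload("Questo è un testo di prova.",
                               extra_tasks=["sentiment", "readability"])
# e.g. requests.post(BASE_URL + "/documents/", json=payload)
```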

GET

Returns the part-of-speech information on a previously loaded document with id pk.

Parameters

page: (Integer, optional), the page to be fetched

Output

A JSON response containing the following fields.

  • prev (String), the url pointing to the previous page, if available.
  • next (String), the url pointing to the next page, if available.
  • num_sentences (Integer), the number of sentences of the document.
  • data (List), each element of the list contains information for each sentence, represented by the following dictionary:

    • sequence (Integer), the number of the sentence with respect to the document
    • raw_text (String), the raw text related to the current sentence
    • tokens (List), each element (dictionary) of the list contains information for each token of the sentence. The dictionary contains the following fields:
      • sequence (Integer), the position of the token in the sentence
      • word (String), the word
      • lemma (String), the lemma
      • ten (String), if available, the grammatical tense of the part-of-speech
      • num (String), if available, represents the grammatical number of the part-of-speech
      • per (String), if available, represents the grammatical person of the part-of-speech
      • gen (String), if available, represents the grammatical gender of the part-of-speech
      • mod (String), if available, the grammatical mood of the part-of-speech
      • cpos (String), the coarse grained part-of-speech
      • pos (String), the fine grained part-of-speech

For a more detailed description of the part-of-speech tagset for Italian, please refer to: http://www.italianlp.it/docs/ISST-TANL-POStagset.pdf

Example:

{ "num_sentences": 2, "prev": null, "data": [ { "tokens": [ { "word": "Questo", "ten": null, "sequence": 1, "per": null, "lemma": "questo", "num": "s", "gen": "m", "mod": null }, { "word": "un", "ten": null, "sequence": 2, "per": null, "lemma": "uno", "num": "s", "gen": "m", "mod": null }, { "word": "testo", "ten": null, "sequence": 3, "per": null, "lemma": "testo", "num": "s", "gen": "m", "mod": null }, { "word": "di", "ten": null, "sequence": 4, "per": null, "lemma": "di", "num": null, "gen": null, "mod": null }, { "word": "prova", "ten": null, "sequence": 5, "per": null, "lemma": "prova", "num": "s", "gen": "f", "mod": null }, { "word": ".", "ten": null, "sequence": 6, "per": null, "lemma": ".", "num": null, "gen": null, "mod": null } ], "sequence": 1 }, { "tokens": [ { "word": "Anche", "ten": null, "sequence": 1, "per": null, "lemma": "anche", "num": null, "gen": null, "mod": null }, { "word": "questo", "ten": null, "sequence": 2, "per": null, "lemma": "questo", "num": "s", "gen": "m", "mod": null }, { "word": "!", "ten": null, "sequence": 3, "per": null, "lemma": "!", "num": null, "gen": null, "mod": null }, { "word": "!", "ten": null, "sequence": 4, "per": null, "lemma": "!", "num": null, "gen": null, "mod": null } ], "sequence": 2 } ], "next": null }
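
A minimal sketch of consuming this response: flatten the paginated data into (word, lemma) pairs. The JSON below is a trimmed excerpt of the example above.

```python
import json

# Trimmed excerpt of the example response above.
response = json.loads("""
{"num_sentences": 1, "prev": null, "next": null,
 "data": [{"sequence": 1,
           "tokens": [{"sequence": 1, "word": "Questo", "lemma": "questo"},
                      {"sequence": 2, "word": "testo", "lemma": "testo"}]}]}
""")

# Flatten the sentences into (word, lemma) pairs.
pairs = [(tok["word"], tok["lemma"])
         for sentence in response["data"]
         for tok in sentence["tokens"]]
```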

POST

Inserts multiple documents in the system.

Once the documents are loaded, they are split into sentences, tokenized according to the tokenization rules of the selected language, and finally PoS-tagged.

Note: documents whose id already exists in the system are ignored.

Parameters

  • documents: (list) the texts to be inserted in the system. Each element of the list must contain the following fields.
    • text: String, the document to be inserted
    • lang: String, (optional) the language of the document. Allowed values: "IT" or "EN", default "IT"
    • id: Integer, (optional) the id to be assigned to the uploaded text
    • metadata: dictionary (optional), a key value list of attributes to be assigned to the document e.g.: {'attribute1': 1, 'attribute2': 2}
    • extra_tasks: list (optional), extra actions to be performed after the insertion of the document. Allowed values are: sentiment, witness, hate, readability, named_entity, syntax

Output

{'status': 'OK'}

POST

Retrieves documents according to the filters specified in the request.

Parameters

  • page: (Integer, optional), the page to be fetched
  • page_size: (Integer, optional), the number of rows to be returned for each paginated result
  • doc_ids: (list of Integers, optional), limits the query to the documents matching the ids contained in the list
  • forms: (list of Strings, optional), limits the query to the documents containing one of the forms in the list (OR)
  • lemmas: (list of Strings, optional), limits the query to the documents containing one of the lemmas in the list (OR)
  • created_at_start_date: (Integer, optional), limits the query to the documents created after the specified date
  • created_at_end_date: (Integer, optional), limits the query to the documents created before the specified date

Output

  • count: the number of results
  • has_next: more results are available
  • data: the documents matching the query

Example: { "count": 569, "has_next": true, "data": [ { "sentiment_positive_negative_probability": 0.0183275410862281, "sentiment_value": "NEUTRAL", "named_entity_executed": false, "postagging_executed": false, "language": "IT", "sentiment_negative_probability": 0.000239243180282673, "created_at": "2017-06-01T09:19:42.886133Z", "parsing_executed": false, "sentiment_neutral_probability": 0.963801449262715, "sentiment_positive_probability": 0.017631766470774, "witness_yes_probability": null, "sentiment_executed": true, "doc_time": "2017-06-01T09:19:42.886159Z", "witness_no_probability": null, "witness_value": null, "witness_executed": false, "raw_text": "Il Presidente Silvio Berlusconi: Domenica 11 giugno scegliete i candidati di Forza Italia https://t.co/DQlFhmxxSf", "id": 4 }, { "sentiment_positive_negative_probability": 0.224048973755481, "sentiment_value": "NEUTRAL", "named_entity_executed": false, "postagging_executed": false, "language": "IT", "sentiment_negative_probability": 0.000230149411837146, "created_at": "2017-06-01T09:19:43.591283Z", "parsing_executed": false, "sentiment_neutral_probability": 0.409358290883985, "sentiment_positive_probability": 0.366362585948697, "witness_yes_probability": null, "sentiment_executed": true, "doc_time": "2017-06-01T09:19:43.591310Z", "witness_no_probability": null, "witness_value": null, "witness_executed": false, "raw_text": "RT @forza_italia: VIDEO | Berlusconi: l'11 giugno votate i candidati di Forza Italia, sono competenti, onesti e capaci. https://t.co/ySfixo\u2026", "id": 20 } ] }
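
Since results are paginated through count/has_next/data, a client typically loops until has_next turns false. A sketch, where post_fn stands in for the real HTTP call (an assumption); here it is replaced by a fake in-memory transport for illustration:

```python
def iter_documents(post_fn, body, max_pages=100):
    """Page through the retrieval endpoint until has_next is false.
    post_fn stands in for the actual HTTP POST (an assumption)."""
    page = 1
    while page <= max_pages:
        resp = post_fn(dict(body, page=page))
        yield from resp["data"]
        if not resp["has_next"]:
            break
        page += 1

# Fake transport for illustration: two pages of one document each.
fake_pages = {1: {"data": [{"id": 4}], "has_next": True},
              2: {"data": [{"id": 20}], "has_next": False}}
docs = list(iter_documents(lambda b: fake_pages[b["page"]],
                           {"page_size": 1}))
```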

GET

Calculates the similarity score between two documents with id doc_id_1 and doc_id_2.

Parameters

doc_id_1: The id of the first document.

doc_id_2: The id of the second document.

Output

result (Float, optional): The similarity score between the two documents, in the range [0, 1]. If the similarity of the documents is not defined, null is returned.

error (String, optional): If an error occurred during the score computation, this field reports it. Allowed value: nan_result_exception.

Example: {'result': 0.9999999999999999, 'error': null}

POST

Asynchronously performs clustering on a set of documents.

Parameters

doc_ids: (List of Integers) the ids of the documents on which the clustering will be performed.

Output

Returns the id of the clustering asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}

GET

Fetches the result of the clustering on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

Output

  • status: (String) Possible values are "OK" if the clustering process is completed, "IN_PROGRESS" otherwise.
  • result: (Dict), available only if status is "OK", otherwise null; contains a tree representation of the performed clustering.

Example: { "status": "OK", "id": 1, "result": { "centroid_doc_id": "3", "node_id": 8, "children": [ { "centroid_doc_id": "3", "node_id": 6, "children": [ { "centroid_doc_id": "3", "node_id": 2, "n_documents": 1, "document_id": "3" }, { "centroid_doc_id": "4", "node_id": 3, "n_documents": 1, "document_id": "4" } ], "n_documents": 2 }, { "centroid_doc_id": "5", "node_id": 7, "children": [ { "centroid_doc_id": "5", "node_id": 4, "n_documents": 1, "document_id": "5" }, { "centroid_doc_id": "1", "node_id": 5, "children": [ { "centroid_doc_id": "1", "node_id": 0, "n_documents": 1, "document_id": "1" }, { "centroid_doc_id": "2", "node_id": 1, "n_documents": 1, "document_id": "2" } ], "n_documents": 2 } ], "n_documents": 3 } ], "n_documents": 5 } }
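
The result is a tree whose leaves carry document_id and whose internal nodes carry children; a small recursive walk recovers the documents of any cluster. The sample tree below is a subtree of the example above.

```python
def leaf_documents(node):
    """Collect the document ids at the leaves of a clustering (sub)tree."""
    if "children" not in node:
        return [node["document_id"]]
    docs = []
    for child in node["children"]:
        docs.extend(leaf_documents(child))
    return docs

# Subtree taken from the example output above.
tree = {"node_id": 6, "n_documents": 2, "centroid_doc_id": "3", "children": [
    {"node_id": 2, "n_documents": 1, "centroid_doc_id": "3", "document_id": "3"},
    {"node_id": 3, "n_documents": 1, "centroid_doc_id": "4", "document_id": "4"}]}
```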

GET

Performs the syntactic analysis task on the selected document with primary id pk.

Output

Returns the status of the action. Example: {'status': "OK"}

GET

Performs the named entity extraction task on the selected document with primary id pk.

Output

Returns the status of the action. Example: {'status': "OK"}

GET

Performs the sentiment analysis task on the selected document with primary id pk.

Output

Returns the status of the action. Example: {'status': "OK"}

GET

Performs witness identification task on the selected document with primary id pk.

The following metadata are considered in classification:

tweet_source: the client used to write the tweet.

tweet_geo_dist: the spatial distance from the event expressed in km (e.g. 1.5).

tweet_time_dist: the temporal distance from the event expressed in seconds.

Output

Returns the status of the action. Example: {'status': "OK"}
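
Since the classifier reads these metadata from the document, they should be attached at insertion time. An illustrative metadata dictionary (values are made up):

```python
# Illustrative metadata to pass in the `metadata` parameter of
# POST /documents for a tweet to be classified by the witness task.
metadata = {
    "tweet_source": "Twitter for Android",  # client used to write the tweet
    "tweet_geo_dist": 1.5,                  # km from the event
    "tweet_time_dist": 3600,                # seconds from the event
}
```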

GET

Returns all the document-level and linguistic information available for a previously loaded document with id pk.

Parameters

page: (Integer, optional), the page to be fetched

Output

A JSON response containing the following fields.

  • postagging_executed (Bool): true if the PoS-tagging task was performed on the selected document, false otherwise.

  • sentiment_executed (Bool): true if the sentiment classifier was performed on the selected document, false otherwise.

  • sentiment_sentence_executed (Bool): true if the sentiment classifier was performed on the sentences of the selected document, false otherwise.

  • sentiment_positive_probability (float): if available, the probability assigned by the sentiment classifier that the document is positive.

  • sentiment_negative_probability (float): if available, the probability assigned by the sentiment classifier that the document is negative.

  • sentiment_neutral_probability (float): if available, the probability assigned by the sentiment classifier that the document is neutral.

  • sentiment_positive_negative_probability (float): if available, the probability assigned by the sentiment classifier that the document is both positive and negative.

  • sentiment_value (String): if available, the assigned sentiment class.

  • witness_executed (Bool): true if the witness classifier was performed on the selected document, false otherwise.

  • witness_yes_probability (float): if available, the probability assigned by the witness classifier that the document is `witness'.

  • witness_no_probability (float): if available, the probability assigned by the witness classifier that the document is `not witness'.

  • witness_value (String): if available, the witness class assigned to the document by the witness classifier.

  • hate_executed (Bool): true if the hate classifier was performed on the selected document, false otherwise.

  • hate_yes_probability (float): if available, the probability assigned by the hate classifier that the document is `hate'.

  • hate_no_probability (float): if available, the probability assigned by the hate classifier that the document is `not hate'.

  • hate_value (String): if available, the hate class assigned to the document by the hate classifier.

  • named_entity_executed (Bool): true if the named entity extraction process was performed on the selected document, false otherwise.

  • language (String): the language of the document, as selected through the language option of the /documents (POST) API.

  • parsing_executed (Bool): true if the syntactic parsing process was performed on the selected document, false otherwise.

  • readability_executed (Bool): true if the readability classifier was performed on the document, false otherwise.

  • readability_score_all (float): if available, a score in range [0, 100] representing the global complexity of the document.

  • readability_score_lexical (float): if available, a score in range [0, 100] representing the lexical complexity of the document.

  • readability_score_base (float): if available, a score in range [0, 100] representing the base complexity of the document.

  • readability_score_syntax (float): if available, a score in range [0, 100] representing the syntactic complexity of the document.

  • sentences (dictionary): contains linguistic and entity information on the sentences of the selected document. The information on the sentences is paginated. Fields:

    • prev (String), the url pointing to the previous page, if available.
    • next (String), the url pointing to the next page, if available.
    • count (Integer), the number of sentences of the document.
    • data (List), each element of the list contains information for each sentence, represented by the following dictionary:

      • sequence (Integer), the number of the sentence with respect to the document
      • raw_text (String), the raw text related to the current sentence
      • readability_score_all (float): if available, a score in range [0, 100] representing the global complexity of the sentence.

      • readability_score_lexical (float): if available, a score in range [0, 100] representing the lexical complexity of the sentence.

      • readability_score_base (float): if available, a score in range [0, 100] representing the base complexity of the sentence.

      • readability_score_syntax (float): if available, a score in range [0, 100] representing the syntactic complexity of the sentence.

      • sentiment_executed (Bool): true if the sentiment classifier was performed on the selected sentence, false otherwise.

      • sentiment_positive_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is positive.

      • sentiment_negative_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is negative.

      • sentiment_neutral_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is neutral.

      • sentiment_positive_negative_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is both positive and negative.

      • sentiment_value (String): if available, the assigned sentiment class.

      • tokens (List), each element (dictionary) of the list contains information for each token of the sentence. The dictionary contains the following fields:

        • sequence (Integer), the position of the token in the sentence
        • word (String), the word
        • lemma (String), the lemma
        • ten (String), if available, the grammatical tense of the part-of-speech
        • num (String), if available, represents the grammatical number of the part-of-speech
        • per (String), if available, represents the grammatical person of the part-of-speech
        • gen (String), if available, represents the grammatical gender of the part-of-speech
        • mod (String), if available, the grammatical mood of the part-of-speech
        • cpos (String), the coarse grained part-of-speech
        • pos (String), the fine grained part-of-speech
        • dep_type (String), if available, represents the type of dependency with respect to the parent token.
        • dep_parent (Integer), The sequence number of the parent token with respect to the current sentence.
        • named_entity_instance (Dictionary) , if the named entity extraction process was performed and this field is available, the token is part of a named entity instance. The dictionary contains the following fields:
          • id (Integer), the identifier of the named entity instance
          • entity_type (String), the type of the named entity {Person (PER), Organization (ORG), Location (LOC), Geopolitical Entity (GPE)}

For a more detailed description of the part-of-speech tagset for Italian, please refer to: http://www.italianlp.it/docs/ISST-TANL-POStagset.pdf

Example:

{ "created_at":"2017-06-19T09:46:09.851844Z", "doc_time":"2017-06-19T09:46:09.851876Z", "language":"IT", "named_entity_executed":true, "postagging_executed": true, "parsing_executed":true, "readability_executed": true, "readability_score_all": 60, "readability_score_base": 40, "readability_score_lexical": 30, "readability_score_syntax": 60, "sentiment_executed":true, "sentiment_sentence_executed":true, "sentiment_negative_probability":0.396968678319151, "sentiment_neutral_probability":0.568241364127122, "sentiment_positive_negative_probability":0.00327191886038508, "sentiment_positive_probability":0.0315180386933427, "sentiment_value":"NEUTRAL", "witness_executed":true, "witness_no_probability":0.597324983897493, "witness_value":"NO", "witness_yes_probability":0.402675016102507, "hate_executed":true, "hate_no_probability":0.597324983897493, "hate_value":"hate", "hate_yes_probability":0.402675016102507, "sentences":{ "count":1, "prev":null, "data":[ { "tokens":[ { "word":"Mario", "ten":null, "sequence":1, "pos":"SP", "named_entity_instance":{ "sentence":12, "id":34, "entity_type":"PER" }, "lemma":"Mario", "num":null, "per":null, "dep_type":"subj", "cpos":"S", "dep_parent":68, "gen":null, "mod":null }, { "word":"va", "ten":"p", "sequence":2, "pos":"V", "named_entity_instance":null, "lemma":"andare", "num":"s", "per":"3", "dep_type":"ROOT", "cpos":"V", "dep_parent":null, "gen":null, "mod":"i" }, { "word":"in", "ten":null, "sequence":3, "pos":"E", "named_entity_instance":null, "lemma":"in", "num":null, "per":null, "dep_type":"comp_loc", "cpos":"E", "dep_parent":68, "gen":null, "mod":null }, { "word":"Spagna", "ten":null, "sequence":4, "pos":"SP", "named_entity_instance":{ "sentence":12, "id":35, "entity_type":"GPE" }, "lemma":"Spagna", "num":null, "per":null, "dep_type":"prep", "cpos":"S", "dep_parent":69, "gen":null, "mod":null }, { "word":"con", "ten":null, "sequence":5, "pos":"E", "named_entity_instance":null, "lemma":"con", "num":null, "per":null,
"dep_type":"comp", "cpos":"E", "dep_parent":68, "gen":null, "mod":null }, { "word":"Luca", "ten":null, "sequence":6, "pos":"SP", "named_entity_instance":{ "sentence":12, "id":36, "entity_type":"PER" }, "lemma":"Luca", "num":null, "per":null, "dep_type":"prep", "cpos":"S", "dep_parent":71, "gen":null, "mod":null } ], "readability_score_all": 60, "readability_score_lexical": 30, "readability_score_syntax": 60, "readability_score_base": 40, "sentiment_executed":true, "sentiment_negative_probability":0.396968678319151, "sentiment_neutral_probability":0.568241364127122, "sentiment_positive_negative_probability":0.00327191886038508, "sentiment_positive_probability":0.0315180386933427, "sentiment_value":"NEUTRAL", "sequence":1 } ], "next":null } }
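
Multi-token named entities can be reconstructed from the token stream by grouping tokens that share the same named_entity_instance id. A sketch, using a sentence trimmed down from the example above:

```python
# Group tokens by named_entity_instance id and join their surface forms.
def collect_entities(sentence):
    entities = {}  # insertion-ordered in Python 3.7+
    for tok in sentence["tokens"]:
        inst = tok.get("named_entity_instance")
        if inst:
            ent = entities.setdefault(inst["id"],
                                      {"entity_type": inst["entity_type"],
                                       "words": []})
            ent["words"].append(tok["word"])
    return [{"entity_type": e["entity_type"], "text": " ".join(e["words"])}
            for e in entities.values()]

# Trimmed sentence from the example response above.
sentence = {"tokens": [
    {"word": "Mario", "named_entity_instance": {"id": 34, "entity_type": "PER"}},
    {"word": "va", "named_entity_instance": None},
    {"word": "in", "named_entity_instance": None},
    {"word": "Spagna", "named_entity_instance": {"id": 35, "entity_type": "GPE"}},
]}
```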

GET

Performs and returns the linguistic monitoring of a document with id pk.

Requirement: the document must be syntactically parsed before calling this API.

Output

{ "result": { "morpho_syntax": { "morpho_syntax_distribution": { "pos_num": { "A": 6, "VA": 4, "AP": 2, "B": 18, "E": 28, "DI": 1, "CC": 9, "BN": 1, "PR": 1, "EA": 14, "N": 1, "RD": 25, "PC": 12, "S": 58, "FS": 10, "T": 2, "FF": 6, "V": 39, "CS": 4, "SP": 1, "RI": 6 }, "cpos_distr": { "A": 0.03225806451612903, "C": 0.05241935483870968, "B": 0.07661290322580645, "E": 0.1693548387096774, "D": 0.004032258064516129, "F": 0.06451612903225806, "N": 0.004032258064516129, "P": 0.05241935483870968, "S": 0.23790322580645162, "R": 0.125, "T": 0.008064516129032258, "V": 0.17338709677419356 }, "pos_distr": { "A": 0.024193548387096774, "VA": 0.016129032258064516, "B": 0.07258064516129033, "E": 0.11290322580645161, "PC": 0.04838709677419355, "CC": 0.036290322580645164, "BN": 0.004032258064516129, "PR": 0.004032258064516129, "EA": 0.056451612903225805, "N": 0.004032258064516129, "RD": 0.10080645161290322, "AP": 0.008064516129032258, "S": 0.23387096774193547, "FS": 0.04032258064516129, "DI": 0.004032258064516129, "T": 0.008064516129032258, "FF": 0.024193548387096774, "V": 0.15725806451612903, "CS": 0.016129032258064516, "SP": 0.004032258064516129, "RI": 0.024193548387096774 }, "cpos_num": { "A": 8, "C": 13, "B": 19, "E": 42, "D": 1, "F": 16, "N": 1, "P": 13, "S": 59, "R": 31, "T": 2, "V": 43 }, "conj_distr": { "sub": 0.30769230769230765, "coord": 0.6923076923076923 } } }, "syntax": { "principals_vs_subordinates_ratio": { "subordinates_ratio": 0.17307692307692307, "principals_ratio": 0.8269230769230769 }, "average_number_of_tokens_per_proposition": 5.767441860465116, "average_max_tree_height": 6.7, "average_length_linear_dependency": 2.2927927927927927, "average_number_of_dependents_for_head_verb": { "avg": 1.794871794871795, "num_per_arity": { "1": 12, "0": 3, "3": 6, "2": 16, "4": 2 } }, "syntax_categories": { "num": { "clit": 11, "comp_temp": 2, "punc": 14, "sub": 4, "pred": 1, "comp": 30, "arg": 10, "det": 31, "comp_loc": 2, "mod_loc": 2, "mod_temp": 3, "ROOT": 10, "obj": 12, 
"mod": 33, "neg": 1, "aux": 3, "conj": 12, "mod_rel": 3, "subj": 11, "prep": 42, "con": 11 }, "distr": { "clit": 0.04435483870967742, "comp_temp": 0.008064516129032258, "obj": 0.04838709677419355, "sub": 0.016129032258064516, "subj": 0.04435483870967742, "pred": 0.004032258064516129, "arg": 0.04032258064516129, "det": 0.125, "comp_loc": 0.008064516129032258, "mod_temp": 0.012096774193548387, "ROOT": 0.04032258064516129, "aux": 0.012096774193548387, "mod": 0.13306451612903225, "neg": 0.004032258064516129, "punc": 0.056451612903225805, "comp": 0.12096774193548387, "mod_rel": 0.012096774193548387, "mod_loc": 0.008064516129032258, "prep": 0.1693548387096774, "conj": 0.04838709677419355, "con": 0.04435483870967742 } }, "avg_proposition_per_period": 3.9, "subordinate_chains_statistics": { "num_per_chain_length": { "1": 2 }, "avg": 1.0 } }, "lexical_info": { "lexical_density": 0.5387931034482759, "vdb_info": { "alta_disp_perc": 0.11981566820276497, "alto_uso_perc": 0.1382488479262673, "lessico_fondamentale_perc": 0.7419354838709677, "vdb_perc": 0.9434782608695652 } }, "basic_info": { "average_sentence_length": 24.8, "num_sentences": 10, "num_tokens": 248, "average_word_length": 4.149193548387097, "type_token_ratio": { "300": { "lemmas": 0.4475806451612903, "words": 0.5564516129032258 }, "200": { "lemmas": 0.47, "words": 0.575 }, "100": { "lemmas": 0.56, "words": 0.71 }, "500": { "lemmas": 0.4475806451612903, "words": 0.5564516129032258 }, "400": { "lemmas": 0.4475806451612903, "words": 0.5564516129032258 } } } } }
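
The *_num dictionaries hold raw tag counts and the *_distr dictionaries the corresponding relative frequencies. As a small consistency check on the example above, the coarse PoS counts sum to the document's num_tokens, and the dominant category can be read off directly:

```python
# cpos_num taken verbatim from the example output above.
cpos_num = {"A": 8, "C": 13, "B": 19, "E": 42, "D": 1, "F": 16,
            "N": 1, "P": 13, "S": 59, "R": 31, "T": 2, "V": 43}

total_tokens = sum(cpos_num.values())          # matches basic_info.num_tokens
most_common = max(cpos_num, key=cpos_num.get)  # dominant coarse PoS tag
```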

POST

Asynchronously performs the term extraction on a set of documents.

Parameters

  • doc_ids: (List of Integers) the ids of the documents on which the term extraction will be performed.

  • configuration: (Dictionary, optional): the Term Extractor configuration to be used. Fields:

    • pos_start_term (List), a list of accepted PoS tags for the start of the term (e.g.: ["c:S", "p:V", "p:VA"]). Syntax: "(c|f):POS", where 'c' and 'f' mean coarse and fine grained, respectively.
    • pos_internal_term (List), a list of accepted PoS tags for the internal part of the term (e.g.: ["p:V", "p:VA"]).
    • pos_end_term (List), a list of accepted PoS tags for the end of the term (e.g.: ["p:V", "p:VA"]).
    • statistical_threshold_single (Integer, optional, default: 30), the maximum number of extracted single-word terms.
    • statistical_threshold_multi (Integer, optional, default: 100), the maximum number of extracted multi-word terms.
    • statistical_frequency_threshold (Integer, optional, default: 0), the minimum frequency a term must have in order to be extracted.
    • max_length_term (Integer, optional, default: 5), the maximum length of an extracted term.
    • apply_contrast (bool, default: False), whether to apply a contrastive filter. By default, a journalistic corpus is used as the contrastive corpus.
    • contrast_doc_ids (List of Integers, optional), if not empty, the specified documents are used as the contrastive corpus. Requires `apply_contrast': True.

For a more detailed description of the part-of-speech tagset for Italian, please refer to: http://www.italianlp.it/docs/ISST-TANL-POStagset.pdf

Output

Returns the id of the term extraction asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}
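
An illustrative request body (the specific PoS choices and doc ids are made up, not recommendations): noun-started terms up to three tokens, contrasted against the default journalistic corpus.

```python
# Illustrative term-extractor configuration; field names follow the
# parameter list above, values are examples only.
configuration = {
    "pos_start_term": ["c:S"],                  # terms must start with a noun
    "pos_internal_term": ["c:S", "c:E", "c:A"],
    "pos_end_term": ["c:S", "c:A"],
    "statistical_threshold_single": 30,
    "statistical_threshold_multi": 100,
    "statistical_frequency_threshold": 2,
    "max_length_term": 3,
    "apply_contrast": True,
}
body = {"doc_ids": [1, 2, 3], "configuration": configuration}
```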

GET

Fetches the result of a term extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

Output

  • status: (String) Possible values are "OK" if the extraction process is completed, "IN_PROGRESS" otherwise.
  • terms: (List), available only if status is "OK", otherwise null
    • term: (String) the words which compose the term
    • domain_relevance: (Integer) the relevance of the term in the selected document collection
    • frequency: (Integer) the frequency of this entity in the selected documents

Example: { "status": "OK", "terms": [ { "term": "giornata", "frequency": 10, "domain_relevance": 100 } ] }
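
All the asynchronous endpoints follow the same pattern: POST returns an id, and the matching GET is polled until status flips from "IN_PROGRESS" to "OK". A generic polling sketch, where fetch_fn stands in for the real HTTP GET (an assumption), exercised here with a fake transport:

```python
import time

def wait_for_result(fetch_fn, op_id, poll_interval=0.0, max_polls=100):
    """Poll an asynchronous fetch endpoint until its status is OK.
    fetch_fn stands in for the actual HTTP GET (an assumption)."""
    for _ in range(max_polls):
        resp = fetch_fn(op_id)
        if resp["status"] == "OK":
            return resp
        time.sleep(poll_interval)
    raise TimeoutError("operation %s did not complete" % op_id)

# Fake transport for illustration: completes on the second poll.
replies = iter([
    {"status": "IN_PROGRESS"},
    {"status": "OK", "terms": [{"term": "giornata", "frequency": 10,
                                "domain_relevance": 100}]},
])
result = wait_for_result(lambda _id: next(replies), 121)
```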

POST

Asynchronously performs the named entity extraction on a set of documents.

Parameters

doc_ids: (List of Integers) the ids of the documents on which the named entity extraction will be performed.

Output

Returns the id of the named entity extraction asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}

GET

Fetches the result of a named entity extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

Output

  • status: (String) Possible values are "OK" if the extraction process is completed, "IN_PROGRESS" otherwise.
  • named_entities: (List), available only if status is "OK", otherwise null
    • words: (String) the words which compose the named entity
    • entity_type: (String) the type of the named entity {GPE, LOC, ORG, PER}
    • frequency: (Integer) the frequency of this entity in the selected documents

Example: { "status": "OK", "named_entities": [ { "frequency": 10, "words": "Roma", "entity_type": "GPE" } ] }

POST (content-type: application/json)

Asynchronously performs the relation extraction on a set of documents.

Parameters

doc_ids: (List of Integers) the ids of the documents on which the relation extraction will be performed.

selected_terms: (List of Strings, optional), the terms which will be selected in the relation graph. Example usage:

'selected_terms': ['carta di credito']

selected_named_entities: (Dictionary mapping entity types to Lists of Strings, optional), the named entities which will be selected in the relation graph. Example usage:

'selected_named_entities': {'GPE': ['Svezia'], 'PER': ['Luca']}

Note: at least one of selected_terms or selected_named_entities must be specified.

Output

Returns the id of the relation extraction asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}
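
Since this endpoint expects content-type application/json, the body is sent as a JSON document rather than form fields. A sketch of such a body (ids and selections are illustrative):

```python
import json

# Illustrative relation-extraction request body; this endpoint expects
# content-type application/json, so the body is serialized with json.dumps.
body = {
    "doc_ids": [1, 2, 3],
    "selected_terms": ["carta di credito"],
    "selected_named_entities": {"GPE": ["Svezia"], "PER": ["Luca"]},
}
raw = json.dumps(body)
# e.g. requests.post(url, data=raw,
#                    headers={"Content-Type": "application/json"})
```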

GET (DEPRECATED: use /documents/relation_extraction/fetch)

Fetches the result of a relation extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

output_format: (String, optional) the output format of the call to this API. If "gexf" is specified, a GEXF response representing the graph is returned. The type of the graph can be specified through the "gexf_matrix_type" parameter ("freq", "cosine", "log_likelihood").

Output

  • status: (String) Possible values are "OK" if the relation extraction process is completed, "IN_PROGRESS" otherwise.
  • graphs: (Dictionary), available only if status is "OK", otherwise null.
    • nodes: (Dictionary): the nodes of the graph. For each node, the frequency in the analyzed corpus, the entity type (TERM, ORG, PER, GPE, LOC) and the words representing the entity are reported.
    • freq: (Dictionary): the arcs of the graph calculated with respect to the frequency metric. For each arc, the frequency of the relation is reported.
    • cosine: (Dictionary): the arcs of the graph calculated with respect to the cosine metric. For each arc, the weight of the relation is reported.
    • log_likelihood: (Dictionary): the arcs of the graph calculated with respect to the log likelihood metric. For each arc, the weight of the relation is reported.

Example: { "status": "OK", "graphs": { "nodes": { "1": { "freq": 14.0, "type": "TERM", "words": "rilievi di Carabinieri" }, "0": { "freq": 14.0, "type": "ORG", "words": "Scientifica" }, "2": { "freq": 14.0, "type": "TERM", "words": "corso" } }, "relations": { "freq": { "1": { "1": 2.0, "0": 2.0, "2": 2.0 }, "0": { "1": 2.0, "0": 2.0, "2": 2.0 }, "2": { "1": 2.0, "0": 2.0, "2": 2.0 } }, "cosine": { "1": { "1": 1.0, "0": 0.4999999999999999, "2": 0.4999999999999999 }, "0": { "1": 0.4999999999999999, "0": 1.0, "2": 0.4999999999999999 }, "2": { "1": 0.4999999999999999, "0": 0.4999999999999999, "2": 1.0 } }, "log_likelihood": { "1": { "0": 0.3801404006531304, "2": 0.3801404006531304 }, "0": { "1": 0.3801404006531304, "2": 0.3801404006531304 }, "2": { "1": 0.3801404006531304, "0": 0.3801404006531304 } } } } }

POST

Fetches the result of a relation extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

output_format: (String, optional) the output format of the call to this API. If "gexf" is specified, a GEXF response representing the graph is returned. The type of the graph can be specified through the "gexf_matrix_type" parameter ("freq", "cosine", "log_likelihood").

filter_nodes: (Dictionary, optional): filters a subset of the nodes of the graph. Example: {"TERM": ["giorni"], "GPE": ["Stoccolma"]}

filter_nodes_frequency: (Integer, optional): selects the nodes of the graph whose frequency is greater than or equal to the value of the parameter.

filter_edge_threshold: (Float, optional): selects the edges of the graph whose weight is greater than or equal to the value of the parameter.

Output

  • status: (String) Possible values are "OK" if the relation extraction process is completed, "IN_PROGRESS" otherwise.
  • graphs: (Dictionary), available only if status is "OK", otherwise null.
    • nodes: (Dictionary): the nodes of the graph. For each node, the frequency in the analyzed corpus, the entity type (TERM, ORG, PER, GPE, LOC) and the words representing the entity are reported.
    • freq: (Dictionary): the arcs of the graph calculated with respect to the frequency metric. For each arc, the frequency of the relation is reported.
    • cosine: (Dictionary): the arcs of the graph calculated with respect to the cosine metric. For each arc, the weight of the relation is reported.
    • log_likelihood: (Dictionary): the arcs of the graph calculated with respect to the log likelihood metric. For each arc, the weight of the relation is reported.

Example: { "status": "OK", "graphs": { "nodes": { "1": { "freq": 14.0, "type": "TERM", "words": "rilievi di Carabinieri" }, "0": { "freq": 14.0, "type": "ORG", "words": "Scientifica" }, "2": { "freq": 14.0, "type": "TERM", "words": "corso" } }, "relations": { "freq": { "1": { "1": 2.0, "0": 2.0, "2": 2.0 }, "0": { "1": 2.0, "0": 2.0, "2": 2.0 }, "2": { "1": 2.0, "0": 2.0, "2": 2.0 } }, "cosine": { "1": { "1": 1.0, "0": 0.4999999999999999, "2": 0.4999999999999999 }, "0": { "1": 0.4999999999999999, "0": 1.0, "2": 0.4999999999999999 }, "2": { "1": 0.4999999999999999, "0": 0.4999999999999999, "2": 1.0 } }, "log_likelihood": { "1": { "0": 0.3801404006531304, "2": 0.3801404006531304 }, "0": { "1": 0.3801404006531304, "2": 0.3801404006531304 }, "2": { "1": 0.3801404006531304, "0": 0.3801404006531304 } } } } }
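
Client-side, the nested `relations` matrices can be flattened into an edge list. The sketch below is illustrative (the function name and the trimmed input are not part of the API) and applies the same cutoff semantics as the `filter_edge_threshold` parameter:

```python
def edges_from_graph(graphs, metric="cosine", threshold=0.0):
    """Flatten one of the relation matrices into (source, target, weight)
    triples, keeping only edges whose weight is greater than or equal to
    the threshold (the same cutoff applied by filter_edge_threshold)."""
    matrix = graphs["relations"][metric]
    edges = []
    for source, targets in matrix.items():
        for target, weight in targets.items():
            # Skip self-loops such as "1" -> "1" present in the matrices.
            if source != target and weight >= threshold:
                edges.append((source, target, weight))
    return edges

# A trimmed version of the example response above:
graphs = {
    "relations": {
        "cosine": {
            "1": {"1": 1.0, "0": 0.5, "2": 0.5},
            "0": {"1": 0.5, "0": 1.0, "2": 0.5},
            "2": {"1": 0.5, "0": 0.5, "2": 1.0},
        }
    }
}
edges = edges_from_graph(graphs, metric="cosine", threshold=0.5)
```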

twitter-monitor

POST

Creates a Twitter Monitor.

Parameters

  • name (String): The descriptive name assigned to the monitor.

  • sample_ratio (Float): A value in the range [0, 1] that represents the sampling ratio assigned to the monitor. Values close to 1 indicate that the monitor downloads most of the tweets, while values close to 0 indicate that the monitor discards the majority of the tweets.

  • query (String): The query that is used to extract tweets from Twitter. Syntax: (keyword)+ (OR (keyword)+)*. Example: "Matteo Renzi OR Silvio Berlusconi" fetches all the tweets containing Matteo Renzi or Silvio Berlusconi.

  • seconds_update (Integer): The number of seconds after which the monitor fetches a new batch of tweets from Twitter (default 3600).

  • until (ISO-8601 Date, optional): If provided, tweets written before the selected date are not fetched. (Example format: 2018-10-22T00:00:00.000Z)

  • custom_import (Boolean, default: False): If set to True, the monitor contents must be manually populated through the `populate_custom_import` API
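
As a sketch of how a client might assemble these creation parameters (the helper name and the commented endpoint URL are assumptions, not taken from this page):

```python
def build_monitor_payload(name, query, sample_ratio=1.0, seconds_update=3600,
                          until=None, custom_import=False):
    """Assemble the creation parameters for a Twitter Monitor, leaving
    optional fields out of the request when they are not provided."""
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be in the range [0, 1]")
    payload = {
        "name": name,
        "query": query,
        "sample_ratio": sample_ratio,
        "seconds_update": seconds_update,
        "custom_import": custom_import,
    }
    if until is not None:
        payload["until"] = until  # ISO-8601, e.g. "2018-10-22T00:00:00.000Z"
    return payload

payload = build_monitor_payload("Politica italiana",
                                "Matteo Renzi OR Silvio Berlusconi",
                                sample_ratio=0.5)
# The payload would then be sent with a POST, e.g. (hypothetical base URL):
# requests.post(BASE_URL + "/twitter-monitor/", data=payload)
```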

GET

Returns all the created Twitter Monitors.

Output

A JSON List containing the following fields.

  • id (Integer): The id of the monitor

  • name (String): The descriptive name assigned to the monitor.

  • created_at (ISO-8601 Date): The creation date of the monitor.

  • sample_ratio (Float): A value in the range [0, 1] that represents the sampling ratio assigned to the monitor. Values close to 0 indicate that the monitor discards most of the tweets, while values close to 1 indicate that the monitor retains the majority of the tweets.

  • query (String): The query which is used to extract tweets from Twitter.

  • enabled (Bool): True if the monitor is enabled, False otherwise.

[ { "name":"Matteo Renzi", "created_at":"2017-10-13T08:42:21.727533Z", "enabled":true, "query":"renzi", "sample_ratio":1.0, "id":1 } ]

DELETE

Deletes a Twitter Monitor.

Parameters

  • id (Integer): The monitor to be deleted.

POST

Manually populates the content of a Twitter Monitor with id `pk`

Parameters

  • posts (List): The array of posts to be imported. Each post must contain the following fields:
    • id (Integer): The id of the document
    • text (String) : The text of the document
    • date (ISO-8601 Date): The date of the document
    • username (String) : The username that wrote the document
    • permalink (String): A link to the post
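
Since every imported post must carry all five fields, a client may want to validate the batch before calling the API. A minimal sketch (the helper is illustrative, not part of the API):

```python
REQUIRED_POST_FIELDS = {"id", "text", "date", "username", "permalink"}

def validate_posts(posts):
    """Check that each post carries the fields required by the
    populate_custom_import API; raise ValueError naming the missing
    fields, otherwise return the posts unchanged."""
    for i, post in enumerate(posts):
        missing = REQUIRED_POST_FIELDS - post.keys()
        if missing:
            raise ValueError(f"post {i} is missing fields: {sorted(missing)}")
    return posts

posts = validate_posts([{
    "id": 1,
    "text": "Esempio di post importato manualmente",
    "date": "2018-10-22T00:00:00.000Z",
    "username": "an-example-username",
    "permalink": "https://example.org/posts/1",
}])
```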

GET

Retrieves the list of tweets fetched by a monitor with id pk.

The list of tweets is ordered in descending order of sentiment value, showing the most relevant positive, negative and neutral tweets.

Parameters

  • num_max_tweets (Integer), The max number of tweets which will be returned.
  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.

Output

A JSON Dictionary containing the following list:

  • results: JSON array composed by:
    • date (ISO-8601 Date), The date in which the tweet was written.
    • text (String), The text of the tweet.
    • sentiment_value (String), The sentiment of the tweet.
    • sentiment_probability (Float), A value in [0, 1] representing the probability of belonging to the sentiment_value class.
    • document_id (Integer), The document id of the tweet. All the details of the document can be fetched using /documents/details/ API.
    • username (String), The Twitter username of the tweet.
    • num_likes (Integer), The number of likes that the tweet has received.
    • num_retweets (Integer), The number of retweets of this tweet.

Example: { "results": [ { "date":"2017-10-13T08:10:00Z", "text":"Educare in situazioni difficili Alle 10.30 a @LaRadioNeParla parliamo del progetto...", "sentiment_value":"NEUTRAL", "sentiment_probability":0.909674979458854, "document_id":918750655363268608, "username": "an-example-username", "num_likes": 1, "num_retweets": 3 } ] }

GET

Performs a term extraction job on the monitor tweets according to the specified filter in the parameters.

Parameters

  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.

Output

A JSON Dictionary containing the following list:

  • term_extraction_id: the id that must be used in order to fetch the most relevant terms with the /documents/term_extraction API.

Example: {"term_extraction_id": 5}

Returns 400 status code if no tweets are available.

GET

Performs a named entity extraction job on the monitor tweets according to the specified filter in the parameters.

Parameters

  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.

Output

A JSON Dictionary containing the following list:

  • named_entity_extraction_id: the id that must be used in order to fetch the named entities with the /documents/named_entity_extraction API.

Example: {"named_entity_extraction_id": 5}

Returns 400 status code if no tweets are available.

GET

Performs a relation extraction job on the monitor tweets according to the specified filters in the parameters.

Parameters

  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.
  • selected_terms : (List of Strings, optional), the terms which will be selected in the relation graph. Example usage: 'selected_terms': ['carta di credito']
  • selected_named_entities : (List of Dictionaries of Strings, optional), the named entities which will be selected in the relation graph. Example usage: 'selected_named_entities': {'GPE': ['Svezia'], 'PER': ['Luca']}

Note: one of selected_terms or selected_named_entities must be specified.
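
A client can enforce the "one of" constraint before issuing the call. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def build_relation_filters(selected_terms=None, selected_named_entities=None,
                           sentiment=None, min_date=None, max_date=None):
    """Assemble the filter parameters for the relation extraction call,
    requiring at least one of selected_terms / selected_named_entities."""
    if not selected_terms and not selected_named_entities:
        raise ValueError("one of selected_terms or selected_named_entities "
                         "must be specified")
    params = {}
    if selected_terms:
        params["selected_terms"] = selected_terms
    if selected_named_entities:
        params["selected_named_entities"] = selected_named_entities
    if sentiment:
        params["sentiment"] = sentiment  # "POSITIVE", "NEGATIVE" or "NEUTRAL"
    if min_date:
        params["min_date"] = min_date
    if max_date:
        params["max_date"] = max_date
    return params

params = build_relation_filters(
    selected_named_entities={"GPE": ["Svezia"], "PER": ["Luca"]},
    sentiment="NEGATIVE",
)
```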

Output

A JSON Dictionary containing the following list:

  • relation_extraction_id: the id that must be used in order to fetch the results with the /documents/relation_extraction API.

Example: {"relation_extraction_id": 5}

Returns 400 status code if no tweets are available.

GET

Returns a summary of the sentiment of the tweets fetched by the tweet monitor.

Parameters

  • group_by (String), allowed values are day, week, month

Output

A JSON Dictionary containing the following fields.

  • data: An array of results. The array is ordered by time interval. For each time interval, an object containing the following fields is returned:
    • date (ISO-8601 Date), The date on which the selected interval begins.
    • avg_prob_neg (Float), The average sentiment negative probability.
    • avg_prob_neu (Float), The average sentiment neutral probability.
    • avg_prob_pos (Float), The average sentiment positive probability.
    • avg_prob_pos_neg (Float), The average sentiment positive/negative probability.
    • num_neg (Integer), The number of negative tweets.
    • num_neu (Integer), The number of neutral tweets.
    • num_pos (Integer), The number of positive tweets.
    • num_pos_neg (Integer), The number of positive/negative tweets.

Example: { "data":[ { "date":"2017-10-13T00:00:00Z", "avg_prob_neg":0.178653849125782, "avg_prob_neu":0.708800928388736, "avg_prob_pos":0.0982935014690109, "avg_prob_pos_neg":0.0142517210164714, "num_neg":4, "num_neu":26, "num_pos":3, "num_pos_neg":0 } ] }
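
Since the response is already grouped by time interval, whole-monitor totals must be computed client-side. A minimal sketch (the second interval in the sample input is invented for illustration):

```python
def overall_sentiment_counts(data):
    """Sum the per-interval tweet counts of a sentiment summary
    into totals for the whole monitor."""
    totals = {"num_neg": 0, "num_neu": 0, "num_pos": 0, "num_pos_neg": 0}
    for interval in data:
        for key in totals:
            totals[key] += interval[key]
    return totals

data = [
    {"date": "2017-10-13T00:00:00Z", "num_neg": 4, "num_neu": 26,
     "num_pos": 3, "num_pos_neg": 0},
    {"date": "2017-10-14T00:00:00Z", "num_neg": 6, "num_neu": 14,
     "num_pos": 7, "num_pos_neg": 1},
]
totals = overall_sentiment_counts(data)
```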

GET

Performs an action on the Twitter Monitor.

Parameters

  • id (Integer): The monitor on which the action will be performed.
  • action (String): The action to be performed on the monitor. Available actions are:
    • disable : disables the monitor from fetching new tweets.
    • enable : resumes a previously disabled monitor.

facebook-monitor

POST

Creates a Facebook Monitor, which fetches comments on posts written in specific Facebook pages according to the specified parameters. Each comment is classified with sentiment values and hate values.

Parameters

  • name (String): The descriptive name assigned to the monitor.

  • seconds_update (Integer, optional): The number of seconds after which the monitor fetches new comments from the monitored Facebook pages (default 3600).

  • until (ISO-8601 Date, optional): If provided, posts and comments written before the selected date are not fetched.

  • page_ids (List of Integers): the page ids that must be monitored. Page ids can be obtained through this service: https://findmyfbid.in/

IMPORTANT: a post is no longer monitored if it receives no new comments within 2 days of the last comment date.

GET

Returns all the created Facebook Monitors.

Output

A JSON List containing the following fields.

  • id (Integer): The id of the monitor

  • name (String): The descriptive name assigned to the monitor.

  • created_at (ISO-8601 Date): The creation date of the monitor.

  • enabled (Bool): True if the monitor is enabled, False otherwise.

  • fb_pages (List of dictionaries): the list of the monitored pages.

Example: [ { "id":1, "enabled":true, "fb_pages":[ { "fb_id":"56369076544", "name":"Beppe Grillo" } ], "name":"Pagine di Beppe Grillo", "created_at":"2017-11-23T15:22:36.193021Z" } ]

DELETE

Deletes a Facebook Monitor.

Parameters

  • id (Integer): The monitor to be deleted.

GET

Performs an action on the Facebook Monitor.

Parameters

  • id (Integer): The monitor on which the action will be performed.
  • action (String): The action to be performed on the monitor. Available actions are:
    • disable : disables the monitor from fetching new comments.
    • enable : resumes a previously disabled monitor.

GET

Returns a summary of the sentiment of the comments fetched by the Facebook monitor.

Parameters

  • group_by (String), allowed values are day, week, month
  • query (String, optional): the query that will be used to filter the comments. Query syntax: (keywords)+ ((OR (keywords)+) | (AND (keywords)+) )*. Parentheses can be used to disambiguate AND and OR. Example: keywords1 AND (keywords2 OR keywords3) is different from (keywords1 AND keywords2) OR keywords3 (AND has higher precedence than OR)
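
The query syntax can be mimicked client-side. The sketch below is an illustrative interpretation, not the server's implementation: it assumes each run of keywords is matched as a case-insensitive substring of the comment text, with AND binding tighter than OR and parentheses grouping subexpressions:

```python
import re

def matches_query(text, query):
    """Evaluate the monitor query syntax against a piece of text."""
    # Tokenize into parentheses and bare words (keywords, OR, AND).
    tokens = re.findall(r"\(|\)|[^()\s]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():                       # expr := term (OR term)*
        nonlocal pos
        result = parse_and()
        while peek() == "OR":
            pos += 1
            result = parse_and() or result
        return result

    def parse_and():                      # term := atom (AND atom)*
        nonlocal pos
        result = parse_atom()
        while peek() == "AND":
            pos += 1
            result = parse_atom() and result
        return result

    def parse_atom():                     # atom := '(' expr ')' | keywords
        nonlocal pos
        if peek() == "(":
            pos += 1
            result = parse_or()
            pos += 1  # consume ")"
            return result
        words = []
        while peek() not in (None, "OR", "AND", "(", ")"):
            words.append(tokens[pos])
            pos += 1
        return " ".join(words).lower() in text.lower()

    return parse_or()
```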

Output

A JSON Dictionary containing the following fields.

  • data: An array of results. The array is ordered by time interval. For each time interval, an object containing the following fields is returned:
    • date (ISO-8601 Date), The date on which the selected interval begins.
    • avg_prob_neg (Float), The average sentiment negative probability.
    • avg_prob_neu (Float), The average sentiment neutral probability.
    • avg_prob_pos (Float), The average sentiment positive probability.
    • avg_prob_pos_neg (Float), The average sentiment positive/negative probability.
    • num_neg (Integer), The number of negative comments.
    • num_neu (Integer), The number of neutral comments.
    • num_pos (Integer), The number of positive comments.
    • num_pos_neg (Integer), The number of positive/negative comments.
    • avg_prob_hate (Float), The average hate probability across all comments.
    • num_hate (Integer), The number of hate comments.
    • num_no_hate (Integer), The number of non-hate comments.

Example: { "data":[ { "avg_prob_neg":0.766765100847907, "avg_prob_neu":0.12048334014573, "avg_prob_pos":0.0409078342632469, "avg_prob_pos_neg":0.0718437247431161, "date":"2017-11-23T00:00:00Z", "num_neg":44, "num_neu":5, "num_pos":1, "num_pos_neg":0, "avg_prob_hate":0.614693956070406, "num_hate":33, "num_no_hate":33 } ] }

GET

Retrieves the list of comments fetched by a monitor with id pk.

The list of comments is ordered in descending order of sentiment value, showing the most relevant positive, negative and neutral comments.

Parameters

  • query (String, optional): the query that will be used to filter the comments. Query syntax: (keywords)+ ((OR (keywords)+) | (AND (keywords)+) )*. Parentheses can be used to disambiguate AND and OR. Example: keywords1 AND (keywords2 OR keywords3) is different from (keywords1 AND keywords2) OR keywords3 (AND has higher precedence than OR)
  • num_max_comments (Integer), The max number of comments which will be returned.
  • sentiment (String), filters the list, selecting only the comments that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date), Filters the list returning only the comments written after the specified date.
  • max_date (ISO-8601 Date), Filters the list returning only the comments written before the specified date.
  • launch_term_extraction (String, optional), if set to "true", the response will return a term_extraction_id that can be used to extract the most relevant terms in the comment collection

Output

A JSON List containing the following fields.

  • date (ISO-8601 Date), The date in which the comment was written.
  • text (String), The text of the comment.
  • sentiment_value (String), The sentiment of the comment.
  • sentiment_probability (Float), A value in [0, 1] representing the probability of belonging to the sentiment_value class.
  • document_id (Integer), The document id of the comment. All the details of the document can be fetched using /documents/details/ API.
  • term_extraction_id (Integer, optional), returned only if launch_term_extraction was set to "true"; it can be used to extract the most relevant terms in the comment collection

Example: { "results": [ { "date":"2017-10-13T08:10:00Z", "text":"Educare in situazioni difficili Alle 10.30 a @LaRadioNeParla parliamo del progetto...", "sentiment_value":"NEUTRAL", "sentiment_probability":0.909674979458854, "document_id":918750655363268608, "username": "an-example-username" } ], "term_extraction_id": 5 }

system_stats

GET

Returns information on the pending tasks over time on the main processing queue.

Example: { "celery_queue_stats": [ { "current_length": 11691, "current_date_time": "2018-03-06T13:51:35.573624" }, { "current_length": 11725, "current_date_time": "2018-03-06T13:51:45.579795" }, { "current_length": 11757, "current_date_time": "2018-03-06T13:51:55.581251" }, { "current_length": 11787, "current_date_time": "2018-03-06T13:52:05.587459" }, { "current_length": 11804, "current_date_time": "2018-03-06T13:52:15.588029" } ] }
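
From two samples, a client can estimate how fast the queue is growing before deciding to submit more work. A minimal sketch using the first and last samples of the example (the helper name is illustrative):

```python
from datetime import datetime

def queue_growth_per_minute(samples):
    """Estimate the growth of the main processing queue, in tasks per
    minute, from the first and last celery_queue_stats samples."""
    first, last = samples[0], samples[-1]
    t0 = datetime.fromisoformat(first["current_date_time"])
    t1 = datetime.fromisoformat(last["current_date_time"])
    elapsed_minutes = (t1 - t0).total_seconds() / 60.0
    return (last["current_length"] - first["current_length"]) / elapsed_minutes

samples = [
    {"current_length": 11691, "current_date_time": "2018-03-06T13:51:35.573624"},
    {"current_length": 11804, "current_date_time": "2018-03-06T13:52:15.588029"},
]
rate = queue_growth_per_minute(samples)  # roughly 113 tasks over ~40 seconds
```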

ontology

POST

Merges two ontologies in OWL format.

Parameters

ontology1: (OWL) The content of the first OWL file to be merged.

ontology2: (OWL) The content of the second OWL file to be merged.

NOTE: The two ontologies need to belong to different namespaces in order to be merged, otherwise an error is thrown.

Output

Returns an OWL document representing the union of the two ontologies given as input.

Example: {'merged_owl': <CONTENT OF THE MERGED OWL DOCUMENT>}