ItaliaNLP REST API Documentation

documents

POST

Inserts the document in the system

Once the text is loaded, it is split into sentences, tokenized according to the tokenization rules of the selected language, and finally PoS-tagged.

Parameters

text: String, the document to be inserted

lang: String, (optional) the language of the document. Allowed values: "IT" or "EN", default "IT"

async: String, (optional) performs the loading of the document in asynchronous mode. In async mode, the API returns immediately with the id assigned to the loaded document. Allowed values: "true" or "false", default "false".

id: Integer, (optional) the id to be assigned to the uploaded text

metadata: dictionary (optional), a key value list of attributes to be assigned to the document e.g.: {'attribute1': 1, 'attribute2': 2}

extra_tasks: list (optional), extra actions to be performed after the insertion of the document. Allowed values are: sentiment, sentiment_per_sentence, witness, hate, readability, named_entity, syntax

Output

Returns:

id: the id of the inserted text.

async: True if the insertion of the document was requested asynchronously, False otherwise.

already_existing: True if the document was already loaded, False otherwise.

Example: {'id': 121}
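
As a sketch, the request body can be assembled like this in Python. The base URL and the HTTP transport are assumptions (not part of the documented API); the field names mirror the parameter list above.

```python
# Hypothetical helper that assembles the parameters for POST /documents.
# BASE_URL and the transport (e.g. requests.post) are assumptions; the
# field names come from the parameter list above.
BASE_URL = "http://api.italianlp.it"  # hypothetical base URL

def build_insert_payload(text, lang="IT", async_mode=False,
                         doc_id=None, metadata=None, extra_tasks=None):
    payload = {"text": text, "lang": lang,
               "async": "true" if async_mode else "false"}
    if doc_id is not None:
        payload["id"] = doc_id
    if metadata is not None:
        payload["metadata"] = metadata
    if extra_tasks is not None:
        payload["extra_tasks"] = extra_tasks
    return payload

payload = build_insert_payload("Questo è un testo di prova.",
                               extra_tasks=["sentiment", "readability"])
# e.g. requests.post(BASE_URL + "/documents/", json=payload)
```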

GET

Returns the part-of-speech information on a previously loaded document with id pk.

Parameters

page: (Integer, optional), the page to be fetched

Output

A JSON response containing the following fields.

  • prev (String), the url pointing to the previous page, if available.
  • next (String), the url pointing to the next page, if available.
  • num_sentences (Integer), the number of sentences of the document.
  • data (List), each element of the list contains information for each sentence, represented by the following dictionary:

    • sequence (Integer), the number of the sentence with respect to the document
    • raw_text (String), the raw text related to the current sentence
    • tokens (List), each element (dictionary) of the list contains information for each token of the sentence. The dictionary contains the following fields:
      • sequence (Integer), the position of the token in the sentence
      • word (String), the word
      • lemma (String), the lemma
      • ten (String), if available, the grammatical tense of the part-of-speech
      • num (String), if available, represents the grammatical number of the part-of-speech
      • per (String), if available, represents the grammatical person of the part-of-speech
      • gen (String), if available, represents the grammatical gender of the part-of-speech
      • mod (String), if available, the grammatical mood of the part-of-speech
      • cpos (String), the coarse grained part-of-speech
      • pos (String), the fine grained part-of-speech

For a more detailed description of the part-of-speech tagset for Italian, please refer to: http://www.italianlp.it/docs/ISST-TANL-POStagset.pdf

Example:

{ "num_sentences": 2, "prev": null, "data": [ { "tokens": [ { "word": "Questo", "ten": null, "sequence": 1, "per": null, "lemma": "questo", "num": "s", "gen": "m", "mod": null }, { "word": "un", "ten": null, "sequence": 2, "per": null, "lemma": "uno", "num": "s", "gen": "m", "mod": null }, { "word": "testo", "ten": null, "sequence": 3, "per": null, "lemma": "testo", "num": "s", "gen": "m", "mod": null }, { "word": "di", "ten": null, "sequence": 4, "per": null, "lemma": "di", "num": null, "gen": null, "mod": null }, { "word": "prova", "ten": null, "sequence": 5, "per": null, "lemma": "prova", "num": "s", "gen": "f", "mod": null }, { "word": ".", "ten": null, "sequence": 6, "per": null, "lemma": ".", "num": null, "gen": null, "mod": null } ], "sequence": 1 }, { "tokens": [ { "word": "Anche", "ten": null, "sequence": 1, "per": null, "lemma": "anche", "num": null, "gen": null, "mod": null }, { "word": "questo", "ten": null, "sequence": 2, "per": null, "lemma": "questo", "num": "s", "gen": "m", "mod": null }, { "word": "!", "ten": null, "sequence": 3, "per": null, "lemma": "!", "num": null, "gen": null, "mod": null }, { "word": "!", "ten": null, "sequence": 4, "per": null, "lemma": "!", "num": null, "gen": null, "mod": null } ], "sequence": 2 } ], "next": null }
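
A minimal sketch of consuming this response: flatten the paginated data into (word, lemma) pairs. The JSON below is a trimmed excerpt of the example above.

```python
import json

# Trimmed excerpt of the example response above.
response = json.loads("""
{"num_sentences": 1, "prev": null, "next": null,
 "data": [{"sequence": 1,
           "tokens": [{"sequence": 1, "word": "Questo", "lemma": "questo"},
                      {"sequence": 2, "word": "testo", "lemma": "testo"}]}]}
""")

# Flatten the sentences into (word, lemma) pairs.
pairs = [(tok["word"], tok["lemma"])
         for sentence in response["data"]
         for tok in sentence["tokens"]]
```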

POST

Inserts multiple documents in the system.

Once the documents are loaded, they are split into sentences, tokenized according to the tokenization rules of the selected language, and finally PoS-tagged.

Note: documents whose id already exists in the system are ignored.

Parameters

  • documents: (list) the texts to be inserted in the system. Each element of the list must contain the following fields.
    • text: String, the document to be inserted
    • lang: String, (optional) the language of the document. Allowed values: "IT" or "EN", default "IT"
    • id: Integer, (optional) the id to be assigned to the uploaded text
    • metadata: dictionary (optional), a key value list of attributes to be assigned to the document e.g.: {'attribute1': 1, 'attribute2': 2}
    • extra_tasks: list (optional), extra actions to be performed after the insertion of the document. Allowed values are: sentiment, witness, hate, readability, named_entity, syntax

Output

{'status': 'OK'}

POST

Retrieves documents according to the filters specified in the request.

Parameters

  • page: (Integer, optional), the page to be fetched
  • page_size: (Integer, optional), the number of rows to be returned for each paginated result
  • doc_ids: (list of Integers, optional), limits the query to the documents matching the ids contained in the list
  • forms: (list of Strings, optional), limits the query to the documents containing one of the forms in the list (OR)
  • lemmas: (list of Strings, optional), limits the query to the documents containing one of the lemmas in the list (OR)
  • created_at_start_date: (Integer, optional), limits the query to the documents created after the specified date
  • created_at_end_date: (Integer, optional), limits the query to the documents created before the specified date

Output

  • count: the number of results
  • has_next: more results are available
  • data: the documents matching the query

Example: { "count": 569, "has_next": true, "data": [ { "sentiment_positive_negative_probability": 0.0183275410862281, "sentiment_value": "NEUTRAL", "named_entity_executed": false, "postagging_executed": false, "language": "IT", "sentiment_negative_probability": 0.000239243180282673, "created_at": "2017-06-01T09:19:42.886133Z", "parsing_executed": false, "sentiment_neutral_probability": 0.963801449262715, "sentiment_positive_probability": 0.017631766470774, "witness_yes_probability": null, "sentiment_executed": true, "doc_time": "2017-06-01T09:19:42.886159Z", "witness_no_probability": null, "witness_value": null, "witness_executed": false, "raw_text": "Il Presidente Silvio Berlusconi: Domenica 11 giugno scegliete i candidati di Forza Italia https://t.co/DQlFhmxxSf", "id": 4 }, { "sentiment_positive_negative_probability": 0.224048973755481, "sentiment_value": "NEUTRAL", "named_entity_executed": false, "postagging_executed": false, "language": "IT", "sentiment_negative_probability": 0.000230149411837146, "created_at": "2017-06-01T09:19:43.591283Z", "parsing_executed": false, "sentiment_neutral_probability": 0.409358290883985, "sentiment_positive_probability": 0.366362585948697, "witness_yes_probability": null, "sentiment_executed": true, "doc_time": "2017-06-01T09:19:43.591310Z", "witness_no_probability": null, "witness_value": null, "witness_executed": false, "raw_text": "RT @forza_italia: VIDEO | Berlusconi: l'11 giugno votate i candidati di Forza Italia, sono competenti, onesti e capaci. https://t.co/ySfixo\u2026", "id": 20 } ] }
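
Since results are paginated through count/has_next/data, a client typically loops until has_next turns false. A sketch, where post_fn stands in for the real HTTP call (an assumption); here it is replaced by a fake in-memory transport for illustration:

```python
def iter_documents(post_fn, body, max_pages=100):
    """Page through the retrieval endpoint until has_next is false.
    post_fn stands in for the actual HTTP POST (an assumption)."""
    page = 1
    while page <= max_pages:
        resp = post_fn(dict(body, page=page))
        yield from resp["data"]
        if not resp["has_next"]:
            break
        page += 1

# Fake transport for illustration: two pages of one document each.
fake_pages = {1: {"data": [{"id": 4}], "has_next": True},
              2: {"data": [{"id": 20}], "has_next": False}}
docs = list(iter_documents(lambda b: fake_pages[b["page"]],
                           {"page_size": 1}))
```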

GET

Calculates the similarity score between two documents with id doc_id_1 and doc_id_2.

Parameters

doc_id_1: The id of the first document.

doc_id_2: The id of the second document.

Output

result (Float, optional): The similarity score between the two documents, in the range [0, 1]. If the similarity of the documents is not defined, null is returned.

error (String, optional): If an error occurred during the score computation, this field reports it. Allowed value: nan_result_exception.

Example: {'result': 0.9999999999999999, 'error': null}

POST

Asynchronously performs clustering on a set of documents.

Parameters

doc_ids: (List of Integers) the ids of the documents on which the clustering will be performed.

Output

Returns the id of the clustering asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}

GET

Fetches the result of the clustering on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

Output

  • status: (String) Possible values are "OK" if the clustering process is completed, "IN_PROGRESS" otherwise.
  • result: (Dict), available only if status is "OK", otherwise null; contains a tree representation of the performed clustering.

Example: { "status": "OK", "id": 1, "result": { "centroid_doc_id": "3", "node_id": 8, "children": [ { "centroid_doc_id": "3", "node_id": 6, "children": [ { "centroid_doc_id": "3", "node_id": 2, "n_documents": 1, "document_id": "3" }, { "centroid_doc_id": "4", "node_id": 3, "n_documents": 1, "document_id": "4" } ], "n_documents": 2 }, { "centroid_doc_id": "5", "node_id": 7, "children": [ { "centroid_doc_id": "5", "node_id": 4, "n_documents": 1, "document_id": "5" }, { "centroid_doc_id": "1", "node_id": 5, "children": [ { "centroid_doc_id": "1", "node_id": 0, "n_documents": 1, "document_id": "1" }, { "centroid_doc_id": "2", "node_id": 1, "n_documents": 1, "document_id": "2" } ], "n_documents": 2 } ], "n_documents": 3 } ], "n_documents": 5 } }
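
The result is a tree whose leaves carry document_id and whose internal nodes carry children; a small recursive walk recovers the documents of any cluster. The sample tree below is a subtree of the example above.

```python
def leaf_documents(node):
    """Collect the document ids at the leaves of a clustering (sub)tree."""
    if "children" not in node:
        return [node["document_id"]]
    docs = []
    for child in node["children"]:
        docs.extend(leaf_documents(child))
    return docs

# Subtree taken from the example output above.
tree = {"node_id": 6, "n_documents": 2, "centroid_doc_id": "3", "children": [
    {"node_id": 2, "n_documents": 1, "centroid_doc_id": "3", "document_id": "3"},
    {"node_id": 3, "n_documents": 1, "centroid_doc_id": "4", "document_id": "4"}]}
```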

GET

Performs the syntactic analysis task on the selected document with primary id pk.

Output

Returns the status of the action. Example: {'status': "OK"}

GET

Performs the named entity extraction task on the selected document with primary id pk.

Output

Returns the status of the action. Example: {'status': "OK"}

GET

Performs the sentiment analysis task on the selected document with primary id pk.

Output

Returns the status of the action. Example: {'status': "OK"}

GET

Performs witness identification task on the selected document with primary id pk.

The following metadata are considered in classification:

tweet_source: the client used to write the tweet.

tweet_geo_dist: the spatial distance from the event expressed in km (e.g. 1.5).

tweet_time_dist: the temporal distance from the event expressed in seconds.

Output

Returns the status of the action. Example: {'status': "OK"}
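
Since the classifier reads these metadata from the document, they should be attached at insertion time. An illustrative metadata dictionary (values are made up):

```python
# Illustrative metadata to pass in the `metadata` parameter of
# POST /documents for a tweet to be classified by the witness task.
metadata = {
    "tweet_source": "Twitter for Android",  # client used to write the tweet
    "tweet_geo_dist": 1.5,                  # km from the event
    "tweet_time_dist": 3600,                # seconds from the event
}
```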

GET

Returns all the document-level and linguistic information available for a previously loaded document with id pk.

Parameters

page: (Integer, optional), the page to be fetched

Output

A JSON response containing the following fields.

  • postagging_executed (Bool): true if the PoS-tagging task was performed on the selected document, false otherwise.

  • sentiment_executed (Bool): true if the sentiment classifier was performed on the selected document, false otherwise.

  • sentiment_sentence_executed (Bool): true if the sentiment classifier was performed on the sentences of the selected document, false otherwise.

  • sentiment_positive_probability (float): if available, the probability assigned by the sentiment classifier that the document is positive.

  • sentiment_negative_probability (float): if available, the probability assigned by the sentiment classifier that the document is negative.

  • sentiment_neutral_probability (float): if available, the probability assigned by the sentiment classifier that the document is neutral.

  • sentiment_positive_negative_probability (float): if available, the probability assigned by the sentiment classifier that the document is both positive and negative.

  • sentiment_value (String): if available, the assigned sentiment class.

  • witness_executed (Bool): true if the witness classifier was performed on the selected document, false otherwise.

  • witness_yes_probability (float): if available, the probability assigned by the witness classifier that the document is `witness'.

  • witness_no_probability (float): if available, the probability assigned by the witness classifier that the document is `not witness'.

  • witness_value (String): if available, the witness class assigned to the document by the witness classifier.

  • hate_executed (Bool): true if the hate classifier was performed on the selected document, false otherwise.

  • hate_yes_probability (float): if available, the probability assigned by the hate classifier that the document is `hate'.

  • hate_no_probability (float): if available, the probability assigned by the hate classifier that the document is `not hate'.

  • hate_value (String): if available, the hate class assigned to the document by the hate classifier.

  • named_entity_executed (Bool): true if the named entity extraction process was performed on the selected document, false otherwise.

  • language (String): the language of the document, as selected through the language option of the /documents (POST) API.

  • parsing_executed (Bool): true if the syntactic parsing process was performed on the selected document, false otherwise.

  • readability_executed (Bool): true if the readability classifier was performed on the document, false otherwise.

  • readability_score_all (float): if available, a score in range [0, 100] representing the global complexity of the document.

  • readability_score_lexical (float): if available, a score in range [0, 100] representing the lexical complexity of the document.

  • readability_score_base (float): if available, a score in range [0, 100] representing the base complexity of the document.

  • readability_score_syntax (float): if available, a score in range [0, 100] representing the syntactic complexity of the document.

  • sentences (dictionary): contains linguistic and entity information on the sentences of the selected document. The information on the sentences is paginated. Fields:

    • prev (String), the url pointing to the previous page, if available.
    • next (String), the url pointing to the next page, if available.
    • count (Integer), the number of sentences of the document.
    • data (List), each element of the list contains information for each sentence, represented by the following dictionary:

      • sequence (Integer), the number of the sentence with respect to the document
      • raw_text (String), the raw text related to the current sentence
      • readability_score_all (float): if available, a score in range [0, 100] representing the global complexity of the sentence.

      • readability_score_lexical (float): if available, a score in range [0, 100] representing the lexical complexity of the sentence.

      • readability_score_base (float): if available, a score in range [0, 100] representing the base complexity of the sentence.

      • readability_score_syntax (float): if available, a score in range [0, 100] representing the syntactic complexity of the sentence.

      • sentiment_executed (Bool): true if the sentiment classifier was performed on the selected sentence, false otherwise.

      • sentiment_positive_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is positive.

      • sentiment_negative_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is negative.

      • sentiment_neutral_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is neutral.

      • sentiment_positive_negative_probability (float): if available, the probability assigned by the sentiment classifier that the sentence is both positive and negative.

      • sentiment_value (String): if available, the assigned sentiment class.

      • tokens (List), each element (dictionary) of the list contains information for each token of the sentence. The dictionary contains the following fields:

        • sequence (Integer), the position of the token in the sentence
        • word (String), the word
        • lemma (String), the lemma
        • ten (String), if available, the grammatical tense of the part-of-speech
        • num (String), if available, represents the grammatical number of the part-of-speech
        • per (String), if available, represents the grammatical person of the part-of-speech
        • gen (String), if available, represents the grammatical gender of the part-of-speech
        • mod (String), if available, the grammatical mood of the part-of-speech
        • cpos (String), the coarse grained part-of-speech
        • pos (String), the fine grained part-of-speech
        • dep_type (String), if available, represents the type of dependency with respect to the parent token.
        • dep_parent (Integer), The sequence number of the parent token with respect to the current sentence.
        • named_entity_instance (Dictionary) , if the named entity extraction process was performed and this field is available, the token is part of a named entity instance. The dictionary contains the following fields:
          • id (Integer), the identifier of the named entity instance
          • entity_type (String), the type of the named entity {Person (PER), Organization (ORG), Location (LOC), Geopolitical Entity (GPE)}

For a more detailed description of the part-of-speech tagset for Italian, please refer to: http://www.italianlp.it/docs/ISST-TANL-POStagset.pdf

Example:

{ "created_at":"2017-06-19T09:46:09.851844Z", "doc_time":"2017-06-19T09:46:09.851876Z", "language":"IT", "named_entity_executed":true, "postagging_executed": true, "parsing_executed":true, "readability_executed": true, "readability_score_all": 60, "readability_score_base": 40, "readability_score_lexical": 30, "readability_score_syntax": 60, "sentiment_executed":true, "sentiment_sentence_executed":true, "sentiment_negative_probability":0.396968678319151, "sentiment_neutral_probability":0.568241364127122, "sentiment_positive_negative_probability":0.00327191886038508, "sentiment_positive_probability":0.0315180386933427, "sentiment_value":"NEUTRAL", "witness_executed":true, "witness_no_probability":0.597324983897493, "witness_value":"NO", "witness_yes_probability":0.402675016102507, "hate_executed":true, "hate_no_probability":0.597324983897493, "hate_value":"hate", "hate_yes_probability":0.402675016102507, "sentences":{ "count":1, "prev":null, "data":[ { "tokens":[ { "word":"Mario", "ten":null, "sequence":1, "pos":"SP", "named_entity_instance":{ "sentence":12, "id":34, "entity_type":"PER" }, "lemma":"Mario", "num":null, "per":null, "dep_type":"subj", "cpos":"S", "dep_parent":68, "gen":null, "mod":null }, { "word":"va", "ten":"p", "sequence":2, "pos":"V", "named_entity_instance":null, "lemma":"andare", "num":"s", "per":"3", "dep_type":"ROOT", "cpos":"V", "dep_parent":null, "gen":null, "mod":"i" }, { "word":"in", "ten":null, "sequence":3, "pos":"E", "named_entity_instance":null, "lemma":"in", "num":null, "per":null, "dep_type":"comp_loc", "cpos":"E", "dep_parent":68, "gen":null, "mod":null }, { "word":"Spagna", "ten":null, "sequence":4, "pos":"SP", "named_entity_instance":{ "sentence":12, "id":35, "entity_type":"GPE" }, "lemma":"Spagna", "num":null, "per":null, "dep_type":"prep", "cpos":"S", "dep_parent":69, "gen":null, "mod":null }, { "word":"con", "ten":null, "sequence":5, "pos":"E", "named_entity_instance":null, "lemma":"con", "num":null, "per":null,
"dep_type":"comp", "cpos":"E", "dep_parent":68, "gen":null, "mod":null }, { "word":"Luca", "ten":null, "sequence":6, "pos":"SP", "named_entity_instance":{ "sentence":12, "id":36, "entity_type":"PER" }, "lemma":"Luca", "num":null, "per":null, "dep_type":"prep", "cpos":"S", "dep_parent":71, "gen":null, "mod":null } ], "readability_score_all": 60, "readability_score_lexical": 30, "readability_score_syntax": 60, "readability_score_base": 40, "sentiment_executed":true, "sentiment_negative_probability":0.396968678319151, "sentiment_neutral_probability":0.568241364127122, "sentiment_positive_negative_probability":0.00327191886038508, "sentiment_positive_probability":0.0315180386933427, "sentiment_value":"NEUTRAL", "sequence":1 } ], "next":null } }
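
Multi-token named entities can be reconstructed from the token stream by grouping tokens that share the same named_entity_instance id. A sketch, using a sentence trimmed down from the example above:

```python
# Group tokens by named_entity_instance id and join their surface forms.
def collect_entities(sentence):
    entities = {}  # insertion-ordered in Python 3.7+
    for tok in sentence["tokens"]:
        inst = tok.get("named_entity_instance")
        if inst:
            ent = entities.setdefault(inst["id"],
                                      {"entity_type": inst["entity_type"],
                                       "words": []})
            ent["words"].append(tok["word"])
    return [{"entity_type": e["entity_type"], "text": " ".join(e["words"])}
            for e in entities.values()]

# Trimmed sentence from the example response above.
sentence = {"tokens": [
    {"word": "Mario", "named_entity_instance": {"id": 34, "entity_type": "PER"}},
    {"word": "va", "named_entity_instance": None},
    {"word": "in", "named_entity_instance": None},
    {"word": "Spagna", "named_entity_instance": {"id": 35, "entity_type": "GPE"}},
]}
```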

GET

Performs and returns the linguistic monitoring of a document with id pk.

Requirement: the document must be syntactically parsed before calling this API.

Output

{ "result": { "morpho_syntax": { "morpho_syntax_distribution": { "pos_num": { "A": 6, "VA": 4, "AP": 2, "B": 18, "E": 28, "DI": 1, "CC": 9, "BN": 1, "PR": 1, "EA": 14, "N": 1, "RD": 25, "PC": 12, "S": 58, "FS": 10, "T": 2, "FF": 6, "V": 39, "CS": 4, "SP": 1, "RI": 6 }, "cpos_distr": { "A": 0.03225806451612903, "C": 0.05241935483870968, "B": 0.07661290322580645, "E": 0.1693548387096774, "D": 0.004032258064516129, "F": 0.06451612903225806, "N": 0.004032258064516129, "P": 0.05241935483870968, "S": 0.23790322580645162, "R": 0.125, "T": 0.008064516129032258, "V": 0.17338709677419356 }, "pos_distr": { "A": 0.024193548387096774, "VA": 0.016129032258064516, "B": 0.07258064516129033, "E": 0.11290322580645161, "PC": 0.04838709677419355, "CC": 0.036290322580645164, "BN": 0.004032258064516129, "PR": 0.004032258064516129, "EA": 0.056451612903225805, "N": 0.004032258064516129, "RD": 0.10080645161290322, "AP": 0.008064516129032258, "S": 0.23387096774193547, "FS": 0.04032258064516129, "DI": 0.004032258064516129, "T": 0.008064516129032258, "FF": 0.024193548387096774, "V": 0.15725806451612903, "CS": 0.016129032258064516, "SP": 0.004032258064516129, "RI": 0.024193548387096774 }, "cpos_num": { "A": 8, "C": 13, "B": 19, "E": 42, "D": 1, "F": 16, "N": 1, "P": 13, "S": 59, "R": 31, "T": 2, "V": 43 }, "conj_distr": { "sub": 0.30769230769230765, "coord": 0.6923076923076923 } } }, "syntax": { "principals_vs_subordinates_ratio": { "subordinates_ratio": 0.17307692307692307, "principals_ratio": 0.8269230769230769 }, "average_number_of_tokens_per_proposition": 5.767441860465116, "average_max_tree_height": 6.7, "average_length_linear_dependency": 2.2927927927927927, "average_number_of_dependents_for_head_verb": { "avg": 1.794871794871795, "num_per_arity": { "1": 12, "0": 3, "3": 6, "2": 16, "4": 2 } }, "syntax_categories": { "num": { "clit": 11, "comp_temp": 2, "punc": 14, "sub": 4, "pred": 1, "comp": 30, "arg": 10, "det": 31, "comp_loc": 2, "mod_loc": 2, "mod_temp": 3, "ROOT": 10, "obj": 12, 
"mod": 33, "neg": 1, "aux": 3, "conj": 12, "mod_rel": 3, "subj": 11, "prep": 42, "con": 11 }, "distr": { "clit": 0.04435483870967742, "comp_temp": 0.008064516129032258, "obj": 0.04838709677419355, "sub": 0.016129032258064516, "subj": 0.04435483870967742, "pred": 0.004032258064516129, "arg": 0.04032258064516129, "det": 0.125, "comp_loc": 0.008064516129032258, "mod_temp": 0.012096774193548387, "ROOT": 0.04032258064516129, "aux": 0.012096774193548387, "mod": 0.13306451612903225, "neg": 0.004032258064516129, "punc": 0.056451612903225805, "comp": 0.12096774193548387, "mod_rel": 0.012096774193548387, "mod_loc": 0.008064516129032258, "prep": 0.1693548387096774, "conj": 0.04838709677419355, "con": 0.04435483870967742 } }, "avg_proposition_per_period": 3.9, "subordinate_chains_statistics": { "num_per_chain_length": { "1": 2 }, "avg": 1.0 } }, "lexical_info": { "lexical_density": 0.5387931034482759, "vdb_info": { "alta_disp_perc": 0.11981566820276497, "alto_uso_perc": 0.1382488479262673, "lessico_fondamentale_perc": 0.7419354838709677, "vdb_perc": 0.9434782608695652 } }, "basic_info": { "average_sentence_length": 24.8, "num_sentences": 10, "num_tokens": 248, "average_word_length": 4.149193548387097, "type_token_ratio": { "300": { "lemmas": 0.4475806451612903, "words": 0.5564516129032258 }, "200": { "lemmas": 0.47, "words": 0.575 }, "100": { "lemmas": 0.56, "words": 0.71 }, "500": { "lemmas": 0.4475806451612903, "words": 0.5564516129032258 }, "400": { "lemmas": 0.4475806451612903, "words": 0.5564516129032258 } } } } }
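
The *_num dictionaries hold raw tag counts and the *_distr dictionaries the corresponding relative frequencies. As a small consistency check on the example above, the coarse PoS counts sum to the document's num_tokens, and the dominant category can be read off directly:

```python
# cpos_num taken verbatim from the example output above.
cpos_num = {"A": 8, "C": 13, "B": 19, "E": 42, "D": 1, "F": 16,
            "N": 1, "P": 13, "S": 59, "R": 31, "T": 2, "V": 43}

total_tokens = sum(cpos_num.values())          # matches basic_info.num_tokens
most_common = max(cpos_num, key=cpos_num.get)  # dominant coarse PoS tag
```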

POST

Asynchronously performs the term extraction on a set of documents.

Parameters

  • doc_ids: (List of Integers) the ids of the documents on which the term extraction will be performed.

  • configuration: (Dictionary, optional): the Term Extractor configuration to be used. Fields:

    • pos_start_term (List), a list of accepted PoS tags for the start of the term (e.g.: ["c:S", "p:V", "p:VA"]). Syntax: "(c|f):POS", where 'c' and 'f' mean coarse and fine grained, respectively.
    • pos_internal_term (List), a list of accepted PoS tags for the internal part of the term (e.g.: ["p:V", "p:VA"]).
    • pos_end_term (List), a list of accepted PoS tags for the end of the term (e.g.: ["p:V", "p:VA"]).
    • statistical_threshold_single (Integer, optional, default: 30), the maximum number of extracted single-word terms.
    • statistical_threshold_multi (Integer, optional, default: 100), the maximum number of extracted multi-word terms.
    • statistical_frequency_threshold (Integer, optional, default: 0), the minimum frequency a term must have in order to be extracted.
    • max_length_term (Integer, optional, default: 5), the maximum length of an extracted term.
    • apply_contrast (bool, default: False), whether to apply a contrastive filter. By default, a journalistic corpus is used as the contrastive corpus.
    • contrast_doc_ids (List of Integers, optional), if not empty, the specified documents are used as the contrastive corpus. Requires `apply_contrast': True.

For a more detailed description of the part-of-speech tagset for Italian, please refer to: http://www.italianlp.it/docs/ISST-TANL-POStagset.pdf

Output

Returns the id of the term extraction asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}
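
An illustrative request body (the specific PoS choices and doc ids are made up, not recommendations): noun-started terms up to three tokens, contrasted against the default journalistic corpus.

```python
# Illustrative term-extractor configuration; field names follow the
# parameter list above, values are examples only.
configuration = {
    "pos_start_term": ["c:S"],                  # terms must start with a noun
    "pos_internal_term": ["c:S", "c:E", "c:A"],
    "pos_end_term": ["c:S", "c:A"],
    "statistical_threshold_single": 30,
    "statistical_threshold_multi": 100,
    "statistical_frequency_threshold": 2,
    "max_length_term": 3,
    "apply_contrast": True,
}
body = {"doc_ids": [1, 2, 3], "configuration": configuration}
```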

GET

Fetches the result of a term extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

Output

  • status: (String) Possible values are "OK" if the extraction process is completed, "IN_PROGRESS" otherwise.
  • terms: (List), available only if status is "OK", otherwise null
    • term: (String) the words which compose the term
    • domain_relevance: (Integer) the relevance of the term in the selected document collection
    • frequency: (Integer) the frequency of this entity in the selected documents

Example: { "status": "OK", "terms": [ { "term": "giornata", "frequency": 10, "domain_relevance": 100 } ] }
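
All the asynchronous endpoints follow the same pattern: POST returns an id, and the matching GET is polled until status flips from "IN_PROGRESS" to "OK". A generic polling sketch, where fetch_fn stands in for the real HTTP GET (an assumption), exercised here with a fake transport:

```python
import time

def wait_for_result(fetch_fn, op_id, poll_interval=0.0, max_polls=100):
    """Poll an asynchronous fetch endpoint until its status is OK.
    fetch_fn stands in for the actual HTTP GET (an assumption)."""
    for _ in range(max_polls):
        resp = fetch_fn(op_id)
        if resp["status"] == "OK":
            return resp
        time.sleep(poll_interval)
    raise TimeoutError("operation %s did not complete" % op_id)

# Fake transport for illustration: completes on the second poll.
replies = iter([
    {"status": "IN_PROGRESS"},
    {"status": "OK", "terms": [{"term": "giornata", "frequency": 10,
                                "domain_relevance": 100}]},
])
result = wait_for_result(lambda _id: next(replies), 121)
```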

POST

Asynchronously performs the named entity extraction on a set of documents.

Parameters

doc_ids: (List of Integers) the ids of the documents on which the named entity extraction will be performed.

Output

Returns the id of the named entity extraction asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}

GET

Fetches the result of a named entity extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

Output

  • status: (String) Possible values are "OK" if the extraction process is completed, "IN_PROGRESS" otherwise.
  • named_entities: (List), available only if status is "OK", otherwise null
    • words: (String) the words which compose the named entity
    • entity_type: (String) the type of the named entity {GPE, LOC, ORG, PER}
    • frequency: (Integer) the frequency of this entity in the selected documents

Example: { "status": "OK", "named_entities": [ { "frequency": 10, "words": "Roma", "entity_type": "GPE" } ] }

POST (content-type: application/json)

Asynchronously performs the relation extraction on a set of documents.

Parameters

doc_ids: (List of Integers) the ids of the documents on which the relation extraction will be performed.

selected_terms: (List of Strings, optional), the terms which will be selected in the relation graph. Example usage:

'selected_terms': ['carta di credito']

selected_named_entities: (Dictionary mapping entity types to Lists of Strings, optional), the named entities which will be selected in the relation graph. Example usage:

'selected_named_entities': {'GPE': ['Svezia'], 'PER': ['Luca']}

Note: at least one of selected_terms or selected_named_entities must be specified.

Output

Returns the id of the relation extraction asynchronous operation, to be used to fetch the result when ready.

Example: {'id': 121}
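
Since this endpoint expects content-type application/json, the body is sent as a JSON document rather than form fields. A sketch of such a body (ids and selections are illustrative):

```python
import json

# Illustrative relation-extraction request body; this endpoint expects
# content-type application/json, so the body is serialized with json.dumps.
body = {
    "doc_ids": [1, 2, 3],
    "selected_terms": ["carta di credito"],
    "selected_named_entities": {"GPE": ["Svezia"], "PER": ["Luca"]},
}
raw = json.dumps(body)
# e.g. requests.post(url, data=raw,
#                    headers={"Content-Type": "application/json"})
```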

GET (DEPRECATED: use /documents/relation_extraction/fetch)

Fetches the result of a relation extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

output_format: (String, optional) the output format of the call to this API. If "gexf" is specified, a GEXF response representing the graph is returned. The type of the graph can be specified through the "gexf_matrix_type" parameter ("freq", "cosine", "log_likelihood").

Output

  • status: (String) Possible values are "OK" if the relation extraction process is completed, "IN_PROGRESS" otherwise.
  • graphs: (Dictionary), available only if status is "OK", otherwise null.
    • nodes: (Dictionary): the nodes of the graph. For each node, the frequency in the analyzed corpus, the entity type (TERM, ORG, PER, GPE, LOC) and the words representing the entity are reported.
    • freq: (Dictionary): the arcs of the graph calculated with respect to the frequency metric. For each arc, the frequency of the relation is reported.
    • cosine: (Dictionary): the arcs of the graph calculated with respect to the cosine metric. For each arc, the weight of the relation is reported.
    • log_likelihood: (Dictionary): the arcs of the graph calculated with respect to the log likelihood metric. For each arc, the weight of the relation is reported.

Example: { "status": "OK", "graphs": { "nodes": { "1": { "freq": 14.0, "type": "TERM", "words": "rilievi di Carabinieri" }, "0": { "freq": 14.0, "type": "ORG", "words": "Scientifica" }, "2": { "freq": 14.0, "type": "TERM", "words": "corso" } }, "relations": { "freq": { "1": { "1": 2.0, "0": 2.0, "2": 2.0 }, "0": { "1": 2.0, "0": 2.0, "2": 2.0 }, "2": { "1": 2.0, "0": 2.0, "2": 2.0 } }, "cosine": { "1": { "1": 1.0, "0": 0.4999999999999999, "2": 0.4999999999999999 }, "0": { "1": 0.4999999999999999, "0": 1.0, "2": 0.4999999999999999 }, "2": { "1": 0.4999999999999999, "0": 0.4999999999999999, "2": 1.0 } }, "log_likelihood": { "1": { "0": 0.3801404006531304, "2": 0.3801404006531304 }, "0": { "1": 0.3801404006531304, "2": 0.3801404006531304 }, "2": { "1": 0.3801404006531304, "0": 0.3801404006531304 } } } } }

POST

Fetches the result of a relation extraction on a set of documents.

Parameters

id: (Integer) The id obtained by a POST call to this API.

output_format: (String, optional) the output format of the call to this API. If "gexf" is specified, a GEXF response representing the graph is returned. The type of the graph can be specified through the "gexf_matrix_type" parameter ("freq", "cosine", "log_likelihood").

filter_nodes: (Dictionary, optional): filters a subset of the nodes of the graph. Example: {"TERM": ["giorni"], "GPE": ["Stoccolma"]}

filter_nodes_frequency: (Integer, optional): selects the nodes of the graph whose frequency is greater than or equal to the value of the parameter.

filter_edge_threshold: (Float, optional): selects the edges of the graph whose weight is greater than or equal to the value of the parameter.

Output

  • status: (String) Possible values are "OK" if the relation extraction process is completed, "IN_PROGRESS" otherwise.
  • graphs: (Dictionary), available only if status is "OK", otherwise null.
    • nodes: (Dictionary): the nodes of the graph. For each node, the frequency in the analyzed corpus, the entity type (TERM, ORG, PER, GPE, LOC) and the words representing the entity are reported.
    • freq: (Dictionary): the arcs of the graph calculated with respect to the frequency metric. For each arc, the frequency of the relation is reported.
    • cosine: (Dictionary): the arcs of the graph calculated with respect to the cosine metric. For each arc, the weight of the relation is reported.
    • log_likelihood: (Dictionary): the arcs of the graph calculated with respect to the log likelihood metric. For each arc, the weight of the relation is reported.

Example: { "status": "OK", "graphs": { "nodes": { "1": { "freq": 14.0, "type": "TERM", "words": "rilievi di Carabinieri" }, "0": { "freq": 14.0, "type": "ORG", "words": "Scientifica" }, "2": { "freq": 14.0, "type": "TERM", "words": "corso" } }, "relations": { "freq": { "1": { "1": 2.0, "0": 2.0, "2": 2.0 }, "0": { "1": 2.0, "0": 2.0, "2": 2.0 }, "2": { "1": 2.0, "0": 2.0, "2": 2.0 } }, "cosine": { "1": { "1": 1.0, "0": 0.4999999999999999, "2": 0.4999999999999999 }, "0": { "1": 0.4999999999999999, "0": 1.0, "2": 0.4999999999999999 }, "2": { "1": 0.4999999999999999, "0": 0.4999999999999999, "2": 1.0 } }, "log_likelihood": { "1": { "0": 0.3801404006531304, "2": 0.3801404006531304 }, "0": { "1": 0.3801404006531304, "2": 0.3801404006531304 }, "2": { "1": 0.3801404006531304, "0": 0.3801404006531304 } } } } }
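
Client-side, the nested `relations` matrices can be flattened into an edge list. The sketch below is illustrative (the function name and the trimmed input are not part of the API) and applies the same cutoff semantics as the `filter_edge_threshold` parameter:

```python
def edges_from_graph(graphs, metric="cosine", threshold=0.0):
    """Flatten one of the relation matrices into (source, target, weight)
    triples, keeping only edges whose weight is greater than or equal to
    the threshold (the same cutoff applied by filter_edge_threshold)."""
    matrix = graphs["relations"][metric]
    edges = []
    for source, targets in matrix.items():
        for target, weight in targets.items():
            # Skip self-loops such as "1" -> "1" present in the matrices.
            if source != target and weight >= threshold:
                edges.append((source, target, weight))
    return edges

# A trimmed version of the example response above:
graphs = {
    "relations": {
        "cosine": {
            "1": {"1": 1.0, "0": 0.5, "2": 0.5},
            "0": {"1": 0.5, "0": 1.0, "2": 0.5},
            "2": {"1": 0.5, "0": 0.5, "2": 1.0},
        }
    }
}
edges = edges_from_graph(graphs, metric="cosine", threshold=0.5)
```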

twitter-monitor

POST

Creates a Twitter Monitor.

Parameters

  • name (String): The descriptive name assigned to the monitor.

  • sample_ratio (Float): A value in the range [0, 1] that represents the sampling ratio assigned to the monitor. Values close to 1 indicate that the monitor downloads most of the tweets, while values close to 0 indicate that the monitor discards the majority of the tweets.

  • query (String): The query that is used to extract tweets from Twitter. Syntax: (keyword)+ (OR (keyword)+)*. Example: "Matteo Renzi OR Silvio Berlusconi" fetches all the tweets containing Matteo Renzi or Silvio Berlusconi.

  • seconds_update (Integer): The number of seconds after which the monitor fetches a new batch of tweets from Twitter (default 3600).

  • until (ISO-8601 Date, optional): If provided, tweets written before the selected date are not fetched. (Example format: 2018-10-22T00:00:00.000Z)

  • custom_import (Boolean, default: False): If set to True, the monitor contents must be manually populated through the `populate_custom_import` API
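
As a sketch of how a client might assemble these creation parameters (the helper name and the commented endpoint URL are assumptions, not taken from this page):

```python
def build_monitor_payload(name, query, sample_ratio=1.0, seconds_update=3600,
                          until=None, custom_import=False):
    """Assemble the creation parameters for a Twitter Monitor, leaving
    optional fields out of the request when they are not provided."""
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be in the range [0, 1]")
    payload = {
        "name": name,
        "query": query,
        "sample_ratio": sample_ratio,
        "seconds_update": seconds_update,
        "custom_import": custom_import,
    }
    if until is not None:
        payload["until"] = until  # ISO-8601, e.g. "2018-10-22T00:00:00.000Z"
    return payload

payload = build_monitor_payload("Politica italiana",
                                "Matteo Renzi OR Silvio Berlusconi",
                                sample_ratio=0.5)
# The payload would then be sent with a POST, e.g. (hypothetical base URL):
# requests.post(BASE_URL + "/twitter-monitor/", data=payload)
```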

GET

Returns all the created Twitter Monitors.

Output

A JSON List containing the following fields.

  • id (Integer): The id of the monitor

  • name (String): The descriptive name assigned to the monitor.

  • created_at (ISO-8601 Date): The creation date of the monitor.

  • sample_ratio (Float): A value in the range [0, 1] that represents the sampling ratio assigned to the monitor. Values close to 0 indicate that the monitor discards most of the tweets, while values close to 1 indicate that the monitor retains the majority of the tweets.

  • query (String): The query which is used to extract tweets from Twitter.

  • enabled (Bool): True if the monitor is enabled, False otherwise.

[ { "name":"Matteo Renzi", "created_at":"2017-10-13T08:42:21.727533Z", "enabled":true, "query":"renzi", "sample_ratio":1.0, "id":1 } ]

DELETE

Deletes a Twitter Monitor.

Parameters

  • id (Integer): The monitor to be deleted.

POST

Manually populates the content of a Twitter Monitor with id `pk`

Parameters

  • posts (List): The array of posts to be imported. Each post must contain the following fields:
    • id (Integer): The id of the document
    • text (String) : The text of the document
    • date (ISO-8601 Date): The date of the document
    • username (String) : The username that wrote the document
    • permalink (String): A link to the post
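
Since every imported post must carry all five fields, a client may want to validate the batch before calling the API. A minimal sketch (the helper is illustrative, not part of the API):

```python
REQUIRED_POST_FIELDS = {"id", "text", "date", "username", "permalink"}

def validate_posts(posts):
    """Check that each post carries the fields required by the
    populate_custom_import API; raise ValueError naming the missing
    fields, otherwise return the posts unchanged."""
    for i, post in enumerate(posts):
        missing = REQUIRED_POST_FIELDS - post.keys()
        if missing:
            raise ValueError(f"post {i} is missing fields: {sorted(missing)}")
    return posts

posts = validate_posts([{
    "id": 1,
    "text": "Esempio di post importato manualmente",
    "date": "2018-10-22T00:00:00.000Z",
    "username": "an-example-username",
    "permalink": "https://example.org/posts/1",
}])
```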

GET

Retrieves the list of tweets fetched by a monitor with id pk.

The list of tweets is ordered in descending order of sentiment value, showing the most relevant positive, negative and neutral tweets.

Parameters

  • num_max_tweets (Integer), The max number of tweets which will be returned.
  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.

Output

A JSON Dictionary containing the following list:

  • results: JSON array composed by:
    • date (ISO-8601 Date), The date in which the tweet was written.
    • text (String), The text of the tweet.
    • sentiment_value (String), The sentiment of the tweet.
    • sentiment_probability (Float), A value in [0, 1] representing the probability of belonging to the sentiment_value class.
    • document_id (Integer), The document id of the tweet. All the details of the document can be fetched using /documents/details/ API.
    • username (String), The Twitter username of the tweet.
    • num_likes (Integer), The number of likes that the tweet has received.
    • num_retweets (Integer), The number of retweets of this tweet.

Example: { "results": [ { "date":"2017-10-13T08:10:00Z", "text":"Educare in situazioni difficili Alle 10.30 a @LaRadioNeParla parliamo del progetto...", "sentiment_value":"NEUTRAL", "sentiment_probability":0.909674979458854, "document_id":918750655363268608, "username": "an-example-username", "num_likes": 1, "num_retweets": 3 } ] }

GET

Performs a term extraction job on the monitor tweets according to the specified filter in the parameters.

Parameters

  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.

Output

A JSON Dictionary containing the following list:

  • term_extraction_id: the id that must be used in order to fetch the most relevant terms with the /documents/term_extraction API.

Example: {"term_extraction_id": 5}

Returns 400 status code if no tweets are available.

GET

Performs a named entity extraction job on the monitor tweets according to the specified filter in the parameters.

Parameters

  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.

Output

A JSON Dictionary containing the following list:

  • named_entity_extraction_id: the id that must be used in order to fetch the named entities with the /documents/named_entity_extraction API.

Example: {"named_entity_extraction_id": 5}

Returns 400 status code if no tweets are available.

GET

Performs a relation extraction job on the monitor tweets according to the specified filters in the parameters.

Parameters

  • sentiment (String, optional), filters the list, selecting only the tweets that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date, optional), Filters the list returning only the tweets written after the specified date.
  • max_date (ISO-8601 Date, optional) , Filters the list returning only the tweets written before the specified date.
  • selected_terms : (List of Strings, optional), the terms which will be selected in the relation graph. Example usage: 'selected_terms': ['carta di credito']
  • selected_named_entities : (List of Dictionaries of Strings, optional), the named entities which will be selected in the relation graph. Example usage: 'selected_named_entities': {'GPE': ['Svezia'], 'PER': ['Luca']}

Note: one of selected_terms or selected_named_entities must be specified.
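
A client can enforce the "one of" constraint before issuing the call. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def build_relation_filters(selected_terms=None, selected_named_entities=None,
                           sentiment=None, min_date=None, max_date=None):
    """Assemble the filter parameters for the relation extraction call,
    requiring at least one of selected_terms / selected_named_entities."""
    if not selected_terms and not selected_named_entities:
        raise ValueError("one of selected_terms or selected_named_entities "
                         "must be specified")
    params = {}
    if selected_terms:
        params["selected_terms"] = selected_terms
    if selected_named_entities:
        params["selected_named_entities"] = selected_named_entities
    if sentiment:
        params["sentiment"] = sentiment  # "POSITIVE", "NEGATIVE" or "NEUTRAL"
    if min_date:
        params["min_date"] = min_date
    if max_date:
        params["max_date"] = max_date
    return params

params = build_relation_filters(
    selected_named_entities={"GPE": ["Svezia"], "PER": ["Luca"]},
    sentiment="NEGATIVE",
)
```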

Output

A JSON Dictionary containing the following list:

  • relation_extraction_id: the id that must be used in order to fetch the results with the /documents/relation_extraction API.

Example: {"relation_extraction_id": 5}

Returns 400 status code if no tweets are available.

GET

Returns a summary of the sentiment of the tweets fetched by the tweet monitor.

Parameters

  • group_by (String), allowed values are day, week, month

Output

A JSON Dictionary containing the following fields.

  • data: An array of results. The array is ordered by time interval. For each time interval, an object containing the following fields is returned:
    • date (ISO-8601 Date), The date on which the selected interval begins.
    • avg_prob_neg (Float), The average sentiment negative probability.
    • avg_prob_neu (Float), The average sentiment neutral probability.
    • avg_prob_pos (Float), The average sentiment positive probability.
    • avg_prob_pos_neg (Float), The average sentiment positive/negative probability.
    • num_neg (Integer), The number of negative tweets.
    • num_neu (Integer), The number of neutral tweets.
    • num_pos (Integer), The number of positive tweets.
    • num_pos_neg (Integer), The number of positive/negative tweets.

Example: { "data":[ { "date":"2017-10-13T00:00:00Z", "avg_prob_neg":0.178653849125782, "avg_prob_neu":0.708800928388736, "avg_prob_pos":0.0982935014690109, "avg_prob_pos_neg":0.0142517210164714, "num_neg":4, "num_neu":26, "num_pos":3, "num_pos_neg":0 } ] }
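
Since the response is already grouped by time interval, whole-monitor totals must be computed client-side. A minimal sketch (the second interval in the sample input is invented for illustration):

```python
def overall_sentiment_counts(data):
    """Sum the per-interval tweet counts of a sentiment summary
    into totals for the whole monitor."""
    totals = {"num_neg": 0, "num_neu": 0, "num_pos": 0, "num_pos_neg": 0}
    for interval in data:
        for key in totals:
            totals[key] += interval[key]
    return totals

data = [
    {"date": "2017-10-13T00:00:00Z", "num_neg": 4, "num_neu": 26,
     "num_pos": 3, "num_pos_neg": 0},
    {"date": "2017-10-14T00:00:00Z", "num_neg": 6, "num_neu": 14,
     "num_pos": 7, "num_pos_neg": 1},
]
totals = overall_sentiment_counts(data)
```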

GET

Performs an action on the Twitter Monitor.

Parameters

  • id (Integer): The monitor on which the action will be performed.
  • action (String): The action to be performed on the monitor. Available actions are:
    • disable : disables the monitor from fetching new tweets.
    • enable : resumes a previously disabled monitor.

facebook-monitor

POST

Creates a Facebook Monitor, which fetches comments on posts written in specific Facebook pages according to the specified parameters. Each comment is classified with sentiment values and hate values.

Parameters

  • name (String): The descriptive name assigned to the monitor.

  • seconds_update (Integer, optional): The number of seconds after which the monitor fetches new comments from the monitored Facebook pages (default 3600).

  • until (ISO-8601 Date, optional): If provided, posts and comments written before the selected date are not fetched.

  • page_ids (List of Integers): the page ids that must be monitored. Page ids can be obtained through this service: https://findmyfbid.in/

IMPORTANT: a post is no longer monitored if it receives no new comments within 2 days of the last comment date.

GET

Returns all the created Facebook Monitors.

Output

A JSON List containing the following fields.

  • id (Integer): The id of the monitor

  • name (String): The descriptive name assigned to the monitor.

  • created_at (ISO-8601 Date): The creation date of the monitor.

  • enabled (Bool): True if the monitor is enabled, False otherwise.

  • fb_pages (List of dictionaries): the list of the monitored pages.

Example: [ { "id":1, "enabled":true, "fb_pages":[ { "fb_id":"56369076544", "name":"Beppe Grillo" } ], "name":"Pagine di Beppe Grillo", "created_at":"2017-11-23T15:22:36.193021Z" } ]

DELETE

Deletes a Facebook Monitor.

Parameters

  • id (Integer): The monitor to be deleted.

GET

Performs an action on the Facebook Monitor.

Parameters

  • id (Integer): The monitor on which the action will be performed.
  • action (String): The action to be performed on the monitor. Available actions are:
    • disable : disables the monitor from fetching new comments.
    • enable : resumes a previously disabled monitor.

GET

Returns a summary of the sentiment of the comments fetched by the Facebook monitor.

Parameters

  • group_by (String), allowed values are day, week, month
  • query (String, optional): the query that will be used to filter the comments. Query syntax: (keywords)+ ((OR (keywords)+) | (AND (keywords)+) )*. Parentheses can be used to disambiguate AND and OR. Example: keywords1 AND (keywords2 OR keywords3) is different from (keywords1 AND keywords2) OR keywords3 (AND has higher precedence than OR)
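
The query syntax can be mimicked client-side. The sketch below is an illustrative interpretation, not the server's implementation: it assumes each run of keywords is matched as a case-insensitive substring of the comment text, with AND binding tighter than OR and parentheses grouping subexpressions:

```python
import re

def matches_query(text, query):
    """Evaluate the monitor query syntax against a piece of text."""
    # Tokenize into parentheses and bare words (keywords, OR, AND).
    tokens = re.findall(r"\(|\)|[^()\s]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():                       # expr := term (OR term)*
        nonlocal pos
        result = parse_and()
        while peek() == "OR":
            pos += 1
            result = parse_and() or result
        return result

    def parse_and():                      # term := atom (AND atom)*
        nonlocal pos
        result = parse_atom()
        while peek() == "AND":
            pos += 1
            result = parse_atom() and result
        return result

    def parse_atom():                     # atom := '(' expr ')' | keywords
        nonlocal pos
        if peek() == "(":
            pos += 1
            result = parse_or()
            pos += 1  # consume ")"
            return result
        words = []
        while peek() not in (None, "OR", "AND", "(", ")"):
            words.append(tokens[pos])
            pos += 1
        return " ".join(words).lower() in text.lower()

    return parse_or()
```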

Output

A JSON Dictionary containing the following fields.

  • data: An array of results. The array is ordered by time interval. For each time interval, an object containing the following fields is returned:
    • date (ISO-8601 Date), The date on which the selected interval begins.
    • avg_prob_neg (Float), The average sentiment negative probability.
    • avg_prob_neu (Float), The average sentiment neutral probability.
    • avg_prob_pos (Float), The average sentiment positive probability.
    • avg_prob_pos_neg (Float), The average sentiment positive/negative probability.
    • num_neg (Integer), The number of negative comments.
    • num_neu (Integer), The number of neutral comments.
    • num_pos (Integer), The number of positive comments.
    • num_pos_neg (Integer), The number of positive/negative comments.
    • avg_prob_hate (Float), The average hate probability across all comments.
    • num_hate (Integer), The number of hate comments.
    • num_no_hate (Integer), The number of non-hate comments.

Example: { "data":[ { "avg_prob_neg":0.766765100847907, "avg_prob_neu":0.12048334014573, "avg_prob_pos":0.0409078342632469, "avg_prob_pos_neg":0.0718437247431161, "date":"2017-11-23T00:00:00Z", "num_neg":44, "num_neu":5, "num_pos":1, "num_pos_neg":0, "avg_prob_hate":0.614693956070406, "num_hate":33, "num_no_hate":33 } ] }

GET

Retrieves the list of comments fetched by a monitor with id pk.

The list of comments is ordered in descending order of sentiment value, showing the most relevant positive, negative and neutral comments.

Parameters

  • query (String, optional): the query that will be used to filter the comments. Query syntax: (keywords)+ ((OR (keywords)+) | (AND (keywords)+) )*. Parentheses can be used to disambiguate AND and OR. Example: keywords1 AND (keywords2 OR keywords3) is different from (keywords1 AND keywords2) OR keywords3 (AND has higher precedence than OR)
  • num_max_comments (Integer), The max number of comments which will be returned.
  • sentiment (String), filters the list, selecting only the comments that match the specified sentiment value. Allowed values are "POSITIVE", "NEGATIVE" and "NEUTRAL".
  • min_date (ISO-8601 Date), Filters the list returning only the comments written after the specified date.
  • max_date (ISO-8601 Date), Filters the list returning only the comments written before the specified date.
  • launch_term_extraction (String, optional), if set to "true", the response will return a term_extraction_id that can be used to extract the most relevant terms in the comment collection

Output

A JSON List containing the following fields.

  • date (ISO-8601 Date), The date in which the comment was written.
  • text (String), The text of the comment.
  • sentiment_value (String), The sentiment of the comment.
  • sentiment_probability (Float), A value in [0, 1] representing the probability of belonging to the sentiment_value class.
  • document_id (Integer), The document id of the comment. All the details of the document can be fetched using /documents/details/ API.
  • term_extraction_id (Integer, optional), returned only if launch_term_extraction was set to "true"; it can be used to extract the most relevant terms in the comment collection

Example: { "results": [ { "date":"2017-10-13T08:10:00Z", "text":"Educare in situazioni difficili Alle 10.30 a @LaRadioNeParla parliamo del progetto...", "sentiment_value":"NEUTRAL", "sentiment_probability":0.909674979458854, "document_id":918750655363268608, "username": "an-example-username" } ], "term_extraction_id": 5 }

system_stats

GET

Returns information on the pending tasks over time on the main processing queue.

Example: { "celery_queue_stats": [ { "current_length": 11691, "current_date_time": "2018-03-06T13:51:35.573624" }, { "current_length": 11725, "current_date_time": "2018-03-06T13:51:45.579795" }, { "current_length": 11757, "current_date_time": "2018-03-06T13:51:55.581251" }, { "current_length": 11787, "current_date_time": "2018-03-06T13:52:05.587459" }, { "current_length": 11804, "current_date_time": "2018-03-06T13:52:15.588029" } ] }
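
From two samples, a client can estimate how fast the queue is growing before deciding to submit more work. A minimal sketch using the first and last samples of the example (the helper name is illustrative):

```python
from datetime import datetime

def queue_growth_per_minute(samples):
    """Estimate the growth of the main processing queue, in tasks per
    minute, from the first and last celery_queue_stats samples."""
    first, last = samples[0], samples[-1]
    t0 = datetime.fromisoformat(first["current_date_time"])
    t1 = datetime.fromisoformat(last["current_date_time"])
    elapsed_minutes = (t1 - t0).total_seconds() / 60.0
    return (last["current_length"] - first["current_length"]) / elapsed_minutes

samples = [
    {"current_length": 11691, "current_date_time": "2018-03-06T13:51:35.573624"},
    {"current_length": 11804, "current_date_time": "2018-03-06T13:52:15.588029"},
]
rate = queue_growth_per_minute(samples)  # roughly 113 tasks over ~40 seconds
```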

ontology

POST

Merges two ontologies in OWL format.

Parameters

ontology1: (OWL) The content of the first OWL file to be merged.

ontology2: (OWL) The content of the second OWL file to be merged.

NOTE: The two ontologies need to belong to different namespaces in order to be merged, otherwise an error is thrown.

Output

Returns an OWL document representing the union of the two ontologies given as input.

Example: {'merged_owl': <CONTENT OF THE MERGED OWL DOCUMENT>}