Newest 'nlp' Questions - Stack Overflow

Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

0
votes
0 answers
10 views

Natural Language Processing Error when building MLP model

I am building an MLP model to classify text comments with labels. I have a NumPy array of vectorized comments, and an array of text labels. I have tried vectorizing the labels and determined that ...
-1
votes
0 answers
7 views

How to create an algorithm that can learn from publicly available news articles and summarize them in 40-70 words using NLP

Using natural language processing (NLP), pick a news article on any topic (like sports, entertainment, etc.) and summarize it in 40-70 words or 4-7 sentences. Create an algorithm that can learn ...
0
votes
0 answers
15 views

Algorithm for translating MLB play-by-play records into descriptive text

I'm trying to collect a dataset that could be used for automatically generating baseball articles. I have play-by-play records of MLB games from retrosheet.org that I would like to be written out to ...
0
votes
1 answer
12 views

How to combine TF-IDF scores to be the equivalent of concatenating two strings

I have a corpus of 5000 book titles and I am trying to perform some clustering on these. I am using the sklearn TfidfVectorizer library to generate the TF-IDF matrix for each title. However, I now ...
0
votes
0 answers
7 views

Can I transform a text into CSAT or NPS?

I am still a beginner with NLP, and I want to build some client satisfaction indicators using NLP. The input data I have is some client reviews, and I would like to know if it's a good idea to ...
1
vote
1 answer
23 views

Twitter content classification problem --> How to extract words with high predictive power?

I am working on a classification problem with Twitter data. User-labeled tweets (relevant, not relevant) are used to train a machine learning classifier to predict if an unseen tweet is relevant or ...
0
votes
0 answers
12 views

How to train a pre-trained word2vec/GloVe model?

Suppose I train a word2vec model on Wikipedia text, then can I train this model further using my text document? Will it be useful? Please provide some code if it is possible. Actually, I have a very ...
0
votes
1 answer
19 views

Algorithm for finding highly frequent patterns followed by a set of text messages

I have a large amount of text messages. I want to find the usual patterns followed by these messages (say, the 20 most common patterns). Example messages: msg1 = "Rahul, Your New Delhi (NDLS) - Agra Cantt (AGC) ...
1
vote
1 answer
31 views

How to ignore null values in CSV columns with pandas while processing the text?

I have a CSV file in which each word of a sentence is represented in a cell, with a null cell between sentences. My problem is in the run_id column: after I load the CSV file using pandas, I separate each ...
-2
votes
2 answers
23 views

Checking for dictionary key based on dictionary values in a sentence

I have a dictionary where values correspond to words and their keys correspond to categories for those words. I want to check whether these words/values exist in a sentence, if yes return category/key ...
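The lookup described in this question can be sketched as follows. This is only a minimal sketch under assumptions: the excerpt is truncated, so the shape of the dictionary (category keys mapping to lists of single words) and the names `categories_in_sentence` and `category_words` are hypothetical, not taken from the original post.

```python
import re

def categories_in_sentence(sentence, category_words):
    """Return the categories (keys) whose words (values) occur in the sentence.

    Assumes category_words maps a category name to a list of single words.
    """
    # Lowercase and split into word tokens so "Apple," still matches "apple".
    tokens = set(re.findall(r"\w+", sentence.lower()))
    return [
        category
        for category, words in category_words.items()
        if any(word.lower() in tokens for word in words)
    ]

# Hypothetical example data, for illustration only.
category_words = {
    "fruit": ["apple", "banana"],
    "vehicle": ["car", "bus"],
    "animal": ["dog", "cat"],
}

print(categories_in_sentence("I ate an apple on the bus.", category_words))
```

Matching on whole tokens rather than substrings avoids false hits such as "car" inside "scar"; multi-word values would need a different matching step.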
0
votes
0 answers
15 views

Not able to load spacy 'en' module [duplicate]

I am facing an issue installing the spaCy en module. The following code: nlp = spacy.load('en', disable=['ner', 'parser']) Produces the following error: ~\AppData\Local\Continuum\anaconda3\lib\...
-2
votes
1 answer
26 views

Match strings which are similar but different in writing style

I want to know whether there is an easy way to match strings such as MBA with master in business administration, M.B.A, M-B-A, mba, or Ms in Cs with masters in computer science, or with M.s in Computer ...
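One simple starting point for this kind of matching is to normalize every variant to a canonical form before comparing. The sketch below is an assumption-laden illustration, not an answer from the thread: the `ALIASES` table and the `normalize` function are hypothetical, and the alias table would have to be filled in by hand for each abbreviation of interest.

```python
import re

# Hypothetical, hand-maintained table: collapsed abbreviation -> canonical phrase.
ALIASES = {
    "mba": "master in business administration",
}

def normalize(text):
    """Map writing-style variants of a degree name to one canonical string."""
    # Drop dots, hyphens and whitespace so "M.B.A" and "M-B-A" both become "mba".
    collapsed = re.sub(r"[.\-\s]", "", text).lower()
    if collapsed in ALIASES:
        return ALIASES[collapsed]
    # Not a known abbreviation: just lowercase and squeeze whitespace.
    return re.sub(r"\s+", " ", text.strip().lower())

variants = ["MBA", "M.B.A", "M-B-A", "mba", "Master in  Business Administration"]
print({normalize(v) for v in variants})
```

Spelling mistakes in the long form would still need a fuzzy comparison on top of this, which the sketch does not attempt.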
0
votes
0 answers
19 views

How to use Python to develop an Office add-in?

I'm trying to develop a web add-in for Office, but I have not managed to call a Python function from the JavaScript API. My question is whether there is any way to call the Python function from the ...
0
votes
0 answers
16 views

How to transform CoNLL2011 to CoNLL2003

I want to train an NER model with AllenNLP and it seems I either need a CoNLL2003 dataset or need to modify the reader. I have a dataset which is in CoNLL2011 format. Here is a part of it: #begin ...
0
votes
1 answer
17 views

How does Beam Search operate on the output of The Transformer?

According to my understanding (please correct me if I'm wrong), Beam Search is BFS where it only explores the "graph" of possibilities down the b most likely options, where b is the beam size. To ...
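The pruned breadth-first idea described in this excerpt can be sketched in a few lines. This is a toy illustration under assumptions: `beam_search`, `toy_next_log_probs`, and the `TABLE` of probabilities are hypothetical stand-ins, and a real Transformer decoder would supply the next-token log-probabilities conditioned on the whole prefix rather than on the last token only.

```python
import math

def beam_search(next_log_probs, start, end, beam_size, max_len):
    """Keep only the beam_size highest-scoring partial sequences at each step."""
    beams = [([start], 0.0)]  # (token sequence, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:
                # Finished hypotheses are carried over unchanged.
                candidates.append((seq, score))
                continue
            for token, log_prob in next_log_probs(seq).items():
                candidates.append((seq + [token], score + log_prob))
        # Prune: sort all expansions and keep the best beam_size of them.
        candidates.sort(key=lambda cand: cand[1], reverse=True)
        beams = candidates[:beam_size]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams

# Hypothetical toy "model": next-token probabilities depend on the last token only.
TABLE = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"</s>": math.log(0.5), "b": math.log(0.5)},
    "b": {"</s>": math.log(0.9), "a": math.log(0.1)},
}

def toy_next_log_probs(seq):
    return TABLE[seq[-1]]

best_seq, best_score = beam_search(toy_next_log_probs, "<s>", "</s>", beam_size=2, max_len=4)[0]
print(best_seq)  # ['<s>', 'b', '</s>']
```

In this toy table, greedy decoding would commit to "a" first (probability 0.6) and finish with total probability 0.3, while a beam of size 2 keeps "b" alive and finds the sequence with probability 0.36.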
0
votes
2 answers
30 views

Trying to convert text List to lower case but it turns everything to NaN

I am currently trying to work with text data and I am relatively new at this. The column I'm trying to work with is the cast column, as shown below: 0 [Sam Worthington, Zoe Saldana, Sigourney ...
0
votes
0 answers
30 views

How to update spaCy’s part-of-speech tagger with phrases which include more than 1 word but the whole phrase has only 1 tag?

How to update spaCy’s part-of-speech tagger with phrases which include more than 1 word but the whole phrase has only 1 tag? When I tokenize a phrase by word, Python throws an “IndexError: list index ...
0
votes
0 answers
22 views

How to specify a rule with word repetition?

I am working on a Mandarin Chinese parser for a school project. For the interrogative form, one syntax is to repeat the verb with negation (like "you are not are English?"). How can I specify a ...
2
votes
1 answer
38 views

Extract verbs from sentence in R?

Please note that I am aware of Extracting Nouns and Verbs from Text and it doesn't work for me because the function they use doesn't exist in openNLP package. Here is my column of strings: tibble(...
-3
votes
0 answers
20 views

Modern approaches to string segmentation, product identification [on hold]

I know this question has been asked a few times before, but not recently. So I would like to find a modern (most effective and efficient) approach to solving this problem. I have a quite accurate ...
0
votes
0 answers
22 views

Different space vector embedding mapping [on hold]

I want to build a model that can map the embedding from another embedding, so the embedding layer would be the output layer! Something like: Embedding_input=>Dense_Layers=>Embedding_output the ...
0
votes
0 answers
14 views

What is the better way to use tokenization to get more accuracy?

What extra features can I add for cleaning the text in this code to improve accuracy? Is it better to use nltk.sent_tokenize and then nltk.word_tokenize, or can we directly use nltk.word_tokenize and ...
-1
votes
0 answers
20 views

Is BERT supported natively out of Keras? [on hold]

I am using a BertLayer class from https://github.com/strongio/keras-bert/blob/master/keras-bert.ipynb for my NLP work, but I am wondering if there is anything more native on the Keras side or any ...
0
votes
0 answers
24 views

Stepping through Java build phases

I'm building a derivation for the Stanford CoreNLP language processing system. I've gotten as far as the derivation below; the system builds and exits correctly (save the commented installPhase lines)...
-1
votes
0 answers
16 views

Does BERT implicitly model for word count?

Given that BERT is bidirectional, does it implicitly model for word count in some given text as well? I am asking in the case of classifying data column descriptions as valid or not. I am looking to ...
0
votes
0 answers
9 views

Weighting Categorisation Categories in NLP

I would like to build an NLP text classifier in Python that categorises text into one of three categories (A, B or C). It is more important that the classifier classifies A's correctly than B's or C's. ...
0
votes
0 answers
11 views

Mallet stops working for large data sets?

I am trying to use LDA Mallet to assign my tweets to topics, and it works perfectly well when I feed it with up to 500,000 tweets, but it seems to stop working when I use my whole data set, which is ...
-6
votes
0 answers
32 views

Reconciliation using Artificial Intelligence [on hold]

I am new to Artificial Intelligence; I have done some basic study of Python, AI, ML, and TensorFlow. I am trying to build a project which can reconcile any given 2 Excel or CSV files with any number of columns ...
0
votes
0 answers
9 views

Training spacy model not working: running the train_ner script has no effect

I am writing a program that uses the spaCy model en_core_web_md for Named Entity Recognition. It was not identifying all my entities correctly: for instance, there were some names of people and ...
0
votes
0 answers
11 views

seq2seq for NMT: why does the decoder keep predicting repeated tokens?

I have built an encoder-decoder architecture for machine translation. During inference, I found that the decoder is generating repeated words, something like: tensor([[ 2, 6, 13, 5, 4, 6, 13, ...
0
votes
0 answers
14 views

'FileNotFoundError' while trying to run pyrouge

I've been trying to install the Python package pyrouge for a while. Finally, by following all these steps here, I installed it. It was the most helpful answer related to pyrouge I have seen so far. It does not ...
0
votes
0 answers
10 views

How to train a model for NLP in AutoML for images

The problem I'm stuck with is detecting some kind of information in an image using NLP. I have watched a talk from Google Next, link: https://www.youtube.com/watch?v=gspeD9Jdn_g&list=...
1
vote
1 answer
32 views

Spacy NER doesn't identify lowercase entities

I am having a problem detecting named entities which start with a lowercase letter. I have tried the solution provided at https://github.com/explosion/spaCy/issues/701. It does not seem to be working ...
0
votes
0 answers
10 views

No data available in CleanNLP entity table

I am using the CleanNLP package in R and I have two annotations - the first (x) represents a single document while the second (y) represents the first 1000 lines of a number of documents read into R (...
0
votes
1 answer
23 views

Efficient metrics evaluation in PyTorch

I am new to PyTorch and want to efficiently evaluate, among other metrics, F1 during my training and validation loops. So far, my approach was to calculate the predictions on GPU, then push them to CPU and ...
0
votes
0 answers
25 views

How to convert speech to text using the Google API

What solution would you suggest? I want to convert speech (not in English) to text, then translate the text to English, look for specific keywords, and save the data in a database.
2
votes
1 answer
18 views

ValueError: [E024] Could not find an optimal move to supervise the parser

I am getting the following error while training a spaCy NER model with my custom training data. ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the ...
0
votes
0 answers
11 views

How to get dummy columns from a scipy sparse csr_matrix

I have an NLP problem where I'm attempting to feature engineer dummy columns using vectorized text. I have a scipy.sparse.csr.csr_matrix called 'descriptions_test' descriptions_test.shape (54504, ...
1
vote
1 answer
26 views

What are the purposes of each step in train-evaluate-predict in tensorflow?

What does each of the stages do? I understand that for neural nets in NLP, the train step will find the best parameters for the word embedding. But what is the purpose of the evaluation step? What is it ...
-4
votes
0 answers
20 views

What are the Research Career Options for AI intermediate? [on hold]

I'm trying to explore which fields I can pursue for a PhD if I'm acceptably knowledgeable in AI and ML. I'm doing my thesis in Computer Vision, and I do have expertise in Natural Language Processing. ...
1
vote
1 answer
24 views

BERT output not deterministic

BERT output is not deterministic. I expect the output values to be deterministic when I feed in the same input, but with my BERT model the values keep changing. Awkwardly, the same value is returned twice, ...
0
votes
1 answer
24 views

Extract text from .txt file and save into .csv files with columns and header

I have approximately 100 text files with clinical notes that consist of 1-2 paragraphs. Each file is named doc_1.txt to doc_179.txt accordingly. I would like to save the text from each file into a ....
0
votes
0 answers
15 views

RuntimeError: input.size(-1) must be equal to input_size

from torch.utils.data import Dataset, DataLoader if __name__ == '__main__': x_y_data_set = LyricsDataSet(lyrics) data_loader = DataLoader(x_y_data_set, batch_size=1, shuffle=...
0
votes
1 answer
23 views

What's the best approach to extract data from text? [on hold]

So I have this situation where people send me emails and I need to identify specific data like kind of material, quantity, description, pressure, measurements, and other technical specifications. The ...
0
votes
0 answers
12 views

Extracting statements containing a specific phrase from a Text object file using NLTK

I have been working on creating a list that contains sentences that have a pattern in them using NLTK. My issue is that I'm not able to create a list of the output. I have been using findall(), which ...
-1
votes
0 answers
24 views

Optimal stop words removal method supporting R 3.3.0 [on hold]

I am trying to remove stop words in R version 3.3.0 or below. I tried using tm, but it was not able to install slam. Do spacyr etc. have a Python dependency? I tried using a custom function, but it is too slow.
0
votes
0 answers
10 views

How to categorize positive and negative features from top features

I have trained on user reviews through an average TF-IDF word2vec model and got the top features. I would like to tag the top features as positive and negative. Could you please suggest how? def top_tfidf_feats(row, ...
0
votes
1 answer
26 views

Topic modelling using LDA

While defining the corpus and dictionary for building the LDA model, how can we determine the keywords of the different topics? It works when I give an explicit topic number, but I want that to be ...
-1
votes
0 answers
16 views

How to extract text content that is aligned in different formats in a scanned document using Python?

I am working on text extraction from a scanned document, where I want to extract the contents by section. For example, consider a resume having different sections like 'objective', 'education', etc. Here I ...