Newest 'pandas' Questions - Stack Overflow

Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data-science libraries in Python.

0
votes
0answers
6 views

Using pandas, how to concatenate one column's values into the same row over a matching column from a csv file? [duplicate]

Using python pandas I want to combine second column values into same row for matching first column. input.csv --> IP, REMOTE_HOST ip1, host1 ip1, host2 ip3, host3 ip4, host1 ip4, host2 ip5, ...
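A minimal sketch of one common way to do this, assuming the goal is one row per IP with its hosts joined into a single string. The sample rows are a hypothetical stand-in for input.csv, and the ", " separator is an assumption:

```python
import io

import pandas as pd

# Hypothetical sample with the same two columns as input.csv in the question.
csv_text = """IP,REMOTE_HOST
ip1,host1
ip1,host2
ip3,host3
ip4,host1
ip4,host2
"""

df = pd.read_csv(io.StringIO(csv_text))

# Group on the first column and join the second column's values per group.
# sort=False keeps the groups in order of first appearance.
combined = (
    df.groupby("IP", sort=False)["REMOTE_HOST"]
    .agg(", ".join)
    .reset_index()
)

print(combined)
```

If the real file has spaces after the commas, passing `skipinitialspace=True` to `pd.read_csv` should strip them before grouping.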
1
vote
1answer
12 views

I'm getting key error: 'District' for the command df.groupby([“District”]).sum()[['TSP','TSPExp']].sort_values(['TSPExp'], ascending=[False]).head(3)

To find the effectiveness of fund utilization of TSPFund and TSPExp. df['percent']=df['TotExp']/df['Tot']*100 df['percent'].head(14) df.groupby(['District']).sum()[['percent']].sort_values(['percent']...
0
votes
0answers
10 views

Pasting pandas df as values into a styled excel sheet (maintaining the formatting - aesthetically as well)

I have a work process that I am automating. I believe this would be a very familiar scenario to many others. The task is somewhat like this: Download zip files from mail (csv files) extract values ...
1
vote
1answer
10 views

Passing a .loc[date] slice into altair chart has odd results depending on slice date

Was struggling with plotting a few layers of a chart before I realized the layer specification wasn't the problem, but that somehow the slice I pass the chart is acting (to me) oddly. If it's not ...
-5
votes
0answers
26 views

How to convert YYYYMMDD date format in data file to decimal year YYYY.YYYY

I have time series data in a file. The timestamps columns are in the form of "YYYYMMDD HHMMSS" (year month day hour min sec). I want to convert it to decimal year format i.e. YYYY.YYYY which is the ...
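A hedged sketch of one possible conversion, assuming "decimal year" means the year plus the fraction of that year elapsed (so leap years use 366 days). The helper name `to_decimal_year` and the sample timestamps are hypothetical:

```python
import pandas as pd


def to_decimal_year(ts: pd.Series) -> pd.Series:
    """Convert a datetime Series to year + fraction of the year elapsed."""
    year = ts.dt.year
    year_start = pd.to_datetime(year.astype(str) + "-01-01")
    next_year_start = pd.to_datetime((year + 1).astype(str) + "-01-01")
    elapsed = (ts - year_start).dt.total_seconds()
    # Length of each timestamp's own year, so leap years are handled.
    year_length = (next_year_start - year_start).dt.total_seconds()
    return year + elapsed / year_length


# Hypothetical values in the "YYYYMMDD HHMMSS" layout described above.
raw = pd.Series(["20200101 000000", "20200702 000000", "20190702 120000"])
stamps = pd.to_datetime(raw, format="%Y%m%d %H%M%S")
decimal_years = to_decimal_year(stamps)

print(decimal_years)
```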
0
votes
1answer
35 views

How can I extract a pattern from a text when it involves a new line?

Say I have following text in a cell of a dataset (csv file): I want to extract the words/phrase that appears after the keywords Decision and reason. I can do it like so: import pandas as pd text = '...
-1
votes
2answers
24 views

Calling a column name based on a different columns values? [duplicate]

I want to very simply call a column in a df based on a different column's value. Below is what I would use, but how can I add another method on top of this that says give me all the values in a column ...
0
votes
0answers
20 views

I am trying to normalize a pandas column--why does it keep crashing my computer?

Essentially, I am trying to use json_normalize to normalize a column in my pandas dataframe. The normalized portion only includes 5 additional columns which doesn't seem ludicrous to me--though the ...
1
vote
1answer
13 views

Pandas error on unix datetime conversion - OutOfBoundsDatetime: cannot convert input with unit 's'

I am getting this error File "pandas/_libs/tslib.pyx", line 356, in pandas._libs.tslib.array_with_unit_to_datetime pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: cannot convert input with ...
-1
votes
1answer
25 views

How to group column changes when merging 5 months of data with the same structure?

I have 5 tables, all of them are in same format with same rows and same columns, but just data from 5 different months. I want to see for each customer, whether one of his/her status has been changed. ...
0
votes
1answer
26 views

What does matrix[x] for different x indicate?

While using the MNIST dataset from Kaggle, I have noticed that all the tutorials use mnist[x] for different values of x to retrieve different pictures. import pandas as pd import numpy as np import ...
3
votes
2answers
24 views

Reading values within pandas.groupby

I have a dataframe like below name item 0 Jack A 1 Sarah B 2 Ross A 3 Sean C 4 Jack C 5 Ross B What I like to do is to produce a dictionary that connects people ...
0
votes
2answers
50 views

Selecting odd numbers of rows and columns

Hi, I have this MATLAB code which picks the odd-numbered rows and columns from a Residuals matrix. I would like to convert this piece of code from MATLAB to Python. Could you please let me know if my code ...
0
votes
2answers
17 views

Extract domain names from multiple email addresses in Data Frame

I am trying to extract multiple domain names from the following data frame: email 0 test1@gmail1.com; test1@gmail2.com 1 test3@gmail3.com; test4@gmail4.com 2 test5@gmail5.com I can split ...
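One possible approach, sketched under the assumption that each cell holds addresses separated by "; " and that a list of domains per row is an acceptable result. The `domains` column name is hypothetical:

```python
import pandas as pd

# The three rows shown in the question excerpt.
df = pd.DataFrame(
    {
        "email": [
            "test1@gmail1.com; test1@gmail2.com",
            "test3@gmail3.com; test4@gmail4.com",
            "test5@gmail5.com",
        ]
    }
)

# Capture everything after each "@" up to the next whitespace or semicolon.
# With a single capture group, str.findall returns a list of the captured parts.
df["domains"] = df["email"].str.findall(r"@([^\s;]+)")

print(df)
```

Calling `df.explode("domains")` afterwards would give one row per domain instead of one list per row.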
0
votes
1answer
25 views

Pandas Concatenation not working properly

So I've been setting up a label archive on my deep learning classifier and I wanted to concatenate the labels of an already existing 2D archive into one I just made. The one that exists is '...
0
votes
0answers
15 views

pandas - groupby identifier & date and compute cumulative returns across different horizons

Given a DataFrame that looks like this: ticker date return 0 AAPL 2012-12-31 0.032615 1 AAPL 2013-01-02 0.036938 2 AAPL 2013-01-...
0
votes
1answer
14 views

Python: Add aggregated columns to DataFrame based on the Key and additional conditions

I have 2 dataframes in the following view: dogs dataframe is: DogID PuppyName1 PuppyName2 PuppyName3 PuppyName4 DogWeight Dog1 Nick NaN NaN NaN 12.7 Dog2 ...
2
votes
2answers
34 views

Speed up a loop filtering a string [duplicate]

I want to filter a column containing tweets (3+million rows) in a pandas dataframe by dropping those tweets that do not contain a keyword/s. To do this, I'm running the following loop (sorry, I'm new ...
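A row-by-row loop can usually be replaced with a vectorized string match. A minimal sketch, assuming a case-insensitive substring match is what is wanted; the column name `text`, the keywords, and the sample tweets are all hypothetical:

```python
import re

import pandas as pd

# Hypothetical stand-in for the tweets column, including a missing value.
tweets = pd.DataFrame(
    {
        "text": [
            "Pandas makes this easy",
            "nothing to see here",
            "I love NumPy and pandas",
            None,
        ]
    }
)

keywords = ["pandas", "numpy"]

# Build one alternation pattern; re.escape guards against regex metacharacters.
pattern = "|".join(re.escape(word) for word in keywords)

# na=False treats missing tweets as non-matching instead of propagating NaN.
mask = tweets["text"].str.contains(pattern, case=False, na=False)
filtered = tweets[mask]

print(filtered)
```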
0
votes
0answers
14 views

How to display minutes:time in a plot in pandas jupyter notebook

My dataframe already has time in the Hours.Minutes format; however, when it is plotted it is treated as a number on the plot. How can I have it plotted as time? I am using matplotlib.pyplot and ...
0
votes
2answers
26 views

How can I convert to a tidy format in python?

My pandas dataframe has separate columns that are one-hot encoded and a total column at the end that sums them up (total = val1+val2). Some rows have 1s for multiple val columns: | name | val1 | ...
0
votes
1answer
32 views

Pandas version 0.22.0 - drop_duplicates() got an unexpected keyword argument 'keep'

I am trying to drop duplicates in my dataframe using drop_duplicates(subset=[''], keep=False). Apparently it works just fine in my Jupyter Notebook, but when I try to execute it through ...
1
vote
0answers
15 views

how to do collect_set over a moving window in Pandas?

I have a table that consists of 3 columns : merchant_id week_id customer_id For each merchant and each week, I would like to get a list of distinct customers from the previous 4 weeks in Pandas. ...
0
votes
0answers
29 views

How to run a function on each row of pandas dataframe in parallel

I have a function x: def xfunc (var_y, df_row): l=do_something(df_row[1]) m=do_something(df_row[3]) #write(l,m) to var_y I would like to pass a row to xfunc for each row in a df and ...
0
votes
0answers
9 views

TensorFlow data pipeline from remote JSON API

I'm trying to collect/format my data to train a TensorFlow model, and currently all my data reside in a JSON API that I control. After training the model, I'll want to use the API I've written to send ...
0
votes
1answer
20 views

KMeans scatter plot with centroids on timeseries data frame

I'm working on a pandas timeseries dataframe which contains 2 columns: timestamp and delta. An example is the following one: >> df.head() timestamp delta 0 2016-07-30 00:05:...
1
vote
0answers
31 views

Why is my Data frame empty after assigning values through a for loop?

I am trying to create a dataframe with the attributes I need from a json data object through a for loop. But when I try to print the dataframe contents, it shows as empty even though the loop ran successfully. ...
1
vote
1answer
19 views

picking values from columns [duplicate]

I have a pandas DataFrame with values in a number of columns, make it two for simplicity, and a column of column names I want to use to pick values from the other columns: import pandas as pd import ...
-1
votes
0answers
29 views

Problem due to classmethod in class: Producing same output repeatedly

I have created my_class and am applying its method to a dataframe. class my_class: def __init__(self, search_result): self.search_result = search_result self.search_result_json = ...
4
votes
1answer
43 views

Is there a more elegant way to read in CSV columns and merge with record IDs?

Forgive me if this is simple. I'm new to Python and self-taught. I have a folder full of CSV files. Each file represents one record and contains one column (among 5 total columns in each file with no ...
0
votes
1answer
21 views

To get data of points plotted in different area of scatter plots in python

I have created a scatter plot for a fantasy league for different players between 'players cost' and 'fantasy points'. Now I want to get the information (like Name,age,team,etc) of the data points on ...
0
votes
2answers
28 views

How to use pandas.DataFrame.apply with getattr function in Python

Suppose I'd like to remove '$' signs from my dataframe in Pandas. And I have created a class called TransformFunctions so that I can use getattr() to invoke function from that class (the reason being ...
2
votes
0answers
20 views

Getting all acceptable string arguments to DataFrameGroupby.aggregate

So, I have a piece of code that takes a groupby object and a dictionary mapping columns in the groupby to strings, indicating aggregation types. I want to validate that all the values in the ...
2
votes
2answers
31 views

Split the list by every 4th element if 5th element is “Name” or split by 5th element if 5th element is Address

I have a list of 8000 entries (name, company, address1, address2, address3 (optional)) in sequence, as shown below. This is a Python list [John It Tech 1243 mary drive florida-32006 mark Infotech 1245 ...
0
votes
0answers
34 views

Why would pandas not read my csv file using pandas.read_csv?

I have had my friend try to import the file on their computer and it worked. It would not work on mine. I have the csv file in the same folder as the script. I am not sure why it wouldn't open. ...
1
vote
0answers
8 views

Is there a function in Arcmapper or Python/Pandas that can connect the endpoints of two polylines to each other for an entire dataset?

I am currently trying to create a group of polygons based off of two layer files I have that both contain shape files of glacier terminus lines in them. These lines are separated spatially based on ...
0
votes
1answer
13 views

How to combine values in a row depending on value in a previous row in another column in pandas

I have a pandas dataframe with several columns (words, start time, stop time, speaker). I want to combine all values in the 'word' column while the values in the 'speaker' column do not change. In ...
0
votes
1answer
25 views

Python pandas: Setting index value of dataframe to another dataframe as a column using multiple column conditions

I have two dataframes: data_df and geo_dimension_df. I would like to take the index of geo_dimension_df, which I renamed to id, and make it a column on data_df called geo_id. I'll be inserting both ...
0
votes
0answers
29 views

Creating a Column for Bins in DF based on numerical column

I have a df that looks like, ID | Time_A | Time_B B 5.0 3.5 C 3.0 4.0 A 2.5 1.0 I want to create a new column that classifies Time_B column into Bins of 0-5, 5-10, 10-15, ...
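A short sketch using `pd.cut`, assuming the three bins named in the excerpt and that each bin includes its lower edge (so 5.0 falls in "5-10"). The new column name `Time_B_bin` and the label strings are hypothetical:

```python
import pandas as pd

# The rows shown in the question excerpt.
df = pd.DataFrame(
    {
        "ID": ["B", "C", "A"],
        "Time_A": [5.0, 3.0, 2.5],
        "Time_B": [3.5, 4.0, 1.0],
    }
)

edges = [0, 5, 10, 15]
labels = ["0-5", "5-10", "10-15"]

# right=False makes the intervals [0, 5), [5, 10), [10, 15).
df["Time_B_bin"] = pd.cut(df["Time_B"], bins=edges, labels=labels, right=False)

print(df)
```

Values outside the outermost edges come back as NaN, so the edge list would need extending to cover the real data range.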
1
vote
5answers
54 views

put the whole list in one dataframe column

I'm trying to create a dataframe from a dictionary: dict = {'foo': [1, 2, 3, 4], 'bar': [5, 6, 7, 8]} and I use the below command to create the dataframe: df = pd.DataFrame.from_dict(dict, ...
0
votes
0answers
17 views

rendering lists in bokeh

I have pandas data frame object that basically has three columns: 1. date object 2. list of prices in that date 3. list of quantities I want to plot a bar graph in bokeh that shows each date ...
1
vote
4answers
30 views

Pandas slicing with if condition

I am trying to create a column based on the simple logic but it does not work. I'd like to create a new column named 'Commodity' with a simple logic: if df['ID'].str[:3] = 'FWD': df['Commodity'] ...
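A Python `if` cannot branch on a whole column, and `=` is assignment rather than comparison. A minimal sketch of the vectorized form follows; the sample IDs and the values "Forward" and "Other" are hypothetical placeholders, since the excerpt is cut off before the assigned value:

```python
import numpy as np
import pandas as pd

# Hypothetical IDs; only the "FWD" prefix comes from the question.
df = pd.DataFrame({"ID": ["FWD001", "SPT002", "FWD003"]})

# Element-wise comparison yields a boolean Series, one entry per row.
is_fwd = df["ID"].str[:3] == "FWD"

# np.where picks the first value where the mask is True, else the second.
df["Commodity"] = np.where(is_fwd, "Forward", "Other")

print(df)
```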
0
votes
2answers
29 views

Is there a way to get value from adjacent column of data frame?

I have a list consisting of symbols like ['T.TO', 'VCB.TO'] I have a dataframe like this: 1 RIC Expected Return 2 T.TO 2 3 A.TO 1.1 4 VCB.TO 0.004 5 ASN.TO 3 6 00G.H 1.1 ...
-2
votes
0answers
34 views

Pandas convert int/string to string with no decimal values

I've a pandas column like below. |column| |---| |1| |2| |apple| How can I convert the column to string without making the integers to have any decimal points as below. This is the output I get when ...
-1
votes
0answers
18 views

Using pandas groupby I have an output. How do I create a new column based off the indexed column value and display it underneath, keeping the aggs?

I am using pandas groupby to form a DataFrame. I then use pd.pivot table for easier viewing. gby1 = df3.groupby(['Year', 'ENCOUNTER_TYPE']).agg({'ACCESSION': 'count', 'TDiff' : 'mean'}).rename(columns=...
0
votes
0answers
19 views

How to populate a dataframe column based on the value of another column

Suppose I have 3 dataframe variables: res_df_union is the main dataframe and df_res and df_vacant are subdataframes created from res_df_union. They all share 2 columns called uniqueid and ...
0
votes
3answers
36 views

use list to set function and name output in a loop

I'd like to use a loop to change the function applied to a DataFrame and name the output in python For example, I would like to calculate the mean, max,sum, min, etc of the same DataFrame and I'd ...
0
votes
0answers
21 views

Concat dataframe to multi index dataframe with gradient values

I have a Multi-index dataframe with multiple test result values. For further data analysis I want to add the derivation to the dataframe. I tried to either calculate it via a lambda function directly ...
0
votes
2answers
27 views

Trying to convert text List to lower case but it turns everything to NaN

I am currently trying to work with text data and I am relatively new at this. The column I'm trying to work with is the cast column, as shown below: 0 [Sam Worthington, Zoe Saldana, Sigourney ...
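A possible explanation, assuming each cell of the cast column holds a Python list rather than a string: the `.str` accessor's `lower` only handles string elements and returns NaN for anything else. A sketch of lowercasing each name inside the lists instead, with hypothetical sample rows:

```python
import pandas as pd

# Hypothetical rows; each cell is assumed to be a list of names.
df = pd.DataFrame(
    {
        "cast": [
            ["Sam Worthington", "Zoe Saldana"],
            ["Johnny Depp"],
        ]
    }
)

# Apply a per-cell function that lowercases every name in the list.
df["cast_lower"] = df["cast"].apply(lambda names: [name.lower() for name in names])

print(df["cast_lower"])
```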
1
vote
1answer
18 views

Pandas Group 2-D NumPy Data by Range of Values

I have a large data set in the form of a 2D array. The 2D array represents continuous intensity data and I want to use this to create another 2D array of the same size only this time, the values are ...
0
votes
1answer
26 views

How to get mean of matrix columns for a column of matrices inside dataframe?

I have a dataframe with two columns. The first column has the class number (either 1 or 0). The second column holds matrices that are (1999,13). I am trying to figure out how to convert the matrices ...