How to group based on a cumulative sum that resets on a condition

Question

I have a pandas DataFrame df with word counts corresponding to articles. I want to add another column, MERGED, that groups consecutive articles so that each group reaches a minimum cumulative sum of min_words.

import pandas as pd

df = pd.DataFrame([[  0,   6],
       [  1,  10],
       [  3,   5],
       [  4,   7],
       [  5,  26],
       [  6,   7],
       [  9,   4],
       [ 10, 133],
       [ 11,  42],
       [ 12,   1]], columns=['ARTICLE', 'WORD_COUNT'])

df
Out[15]: 
   ARTICLE  WORD_COUNT
0        0           6
1        1          10
2        3           5
3        4           7
4        5          26
5        6           7
6        9           4
7       10         133
8       11          42
9       12           1

With min_words = 20, this is the desired output:

df
Out[17]: 
   ARTICLE  WORD_COUNT  MERGED
0        0           6       0
1        1          10       0
2        3           5       0
3        4           7       1
4        5          26       1
5        6           7       2
6        9           4       2
7       10         133       2
8       11          42       3
9       12           1       4

As seen above, the final article(s) may not satisfy the min_words condition, and that's OK.
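
To make the reset rule concrete, here is a small sketch that traces the running sum for the word counts above. The helper name running_sum_with_reset is made up for illustration, and it assumes the sum restarts at zero right after the row that reaches min_words.

```python
def running_sum_with_reset(values, limit):
    """Return the running sum of `values`, restarting after it reaches `limit`."""
    sums = []
    total = 0
    for value in values:
        total += value
        sums.append(total)
        if total >= limit:
            total = 0  # the next row starts a new group
    return sums

word_counts = [6, 10, 5, 7, 26, 7, 4, 133, 42, 1]
trace = running_sum_with_reset(word_counts, 20)
print(trace)  # [6, 16, 21, 7, 33, 7, 11, 144, 42, 1]
```

Each value of 20 or more in the trace marks the last row of a group, which is where the MERGED label changes in the desired output.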

Recommended answer

As far as I know, this can only be done with a user-defined function:

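import numpy as np

def dymcumsum(v, limit):
    # Return the positions at which the running sum reaches `limit`;
    # the running sum restarts from zero after each such position.
    idx = []
    sums = 0
    for i, value in enumerate(v):
        sums += value
        if sums >= limit:
            idx.append(i)
            sums = 0
    return idx

# Mark the row that closes each group (assumes the default RangeIndex).
df['New'] = np.nan
df.loc[dymcumsum(df.WORD_COUNT, 20), 'New'] = 1
# Count the markers from the bottom up, then relabel the groups 1, 2, 3, ... from the top.
# Drop the "+ 1" to get 0-based labels like the MERGED column in the question.
df['New'] = df['New'].iloc[::-1].eq(1).cumsum()[::-1].factorize()[0] + 1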
 
df
   ARTICLE  WORD_COUNT  New
0        0           6    1
1        1          10    1
2        3           5    1
3        4           7    2
4        5          26    2
5        6           7    3
6        9           4    3
7       10         133    3
8       11          42    4
9       12           1    5
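
As an alternative sketch, the group labels can also be built in a single pass, without the marker column and the reversed cumsum. The function name merge_labels is hypothetical and not part of the answer above; it labels rows by position, so it does not rely on the DataFrame having a default RangeIndex, and it produces the 0-based labels shown in the question.

```python
import pandas as pd

def merge_labels(word_counts, min_words):
    """Label consecutive rows so each group's total reaches `min_words`.

    The last group may fall short if the data runs out first.
    """
    labels = []
    group = 0
    total = 0
    for count in word_counts:
        labels.append(group)
        total += count
        if total >= min_words:
            group += 1  # this row closes the current group
            total = 0
    return labels

df = pd.DataFrame({
    'ARTICLE': [0, 1, 3, 4, 5, 6, 9, 10, 11, 12],
    'WORD_COUNT': [6, 10, 5, 7, 26, 7, 4, 133, 42, 1],
})
df['MERGED'] = merge_labels(df['WORD_COUNT'], 20)
print(df['MERGED'].tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 3, 4]
```

Once the labels exist, df.groupby('MERGED')['WORD_COUNT'].sum() gives the total word count per merged group.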
