Newest 'cuda' Questions - Stack Overflow

Questions tagged [cuda]

CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.

1 vote, 0 answers, 25 views

Assigning threads equitably to columns in parallel processing

I'm currently working on a program that applies a Gaussian blur to an image. I assign each thread a certain number of columns of the image so that it can be processed in parallel. The problem I'm ...

-3 votes, 0 answers, 16 views

How to fix “cudnn status execution failed” error in specific environment?

I have a problem using a cuDNN function. I'm calling a CUDA function from the cuDNN library. In most environments it works without any problems, but in one specific environment it raises an error when ...

0 votes, 0 answers, 8 views

Using .cu source files in Qt Creator

I'm writing a simple program that generates random numbers in CUDA and C++ to use them in some calculations afterwards, but I could not figure out how to set up the project so that it is compiled by multiple compilers. ...

-3 votes, 0 answers, 40 views

What's the matter with this CUDA division?

In the code I wrote I have to deal with a trivial division in the kernel. After a lot of debugging I found out that the code works (divides successfully) only when the denominator is a power of 2. I ...

0 votes, 0 answers, 53 views

How to use multi-GPU cuFFT library in Fortran?

I could not find any multi-GPU features in the intrinsic cufft module, so I wrote a C-Fortran interface. I am using the PGI compiler v19.4 along with MVAPICH2. I used the following commands to compile ...

0 votes, 0 answers, 38 views

Why don't calls of malloc show up in nvprof's statistical profiler?

Is there a way to get CUDA's nvprof to include function calls like malloc in its statistical profiler? I've been trying to improve the performance of my application. Naturally, I've been using ...

-1 votes, 1 answer, 43 views

Wrong results using CUDA streams and cudaMemcpyAsync become correct after adding cudaDeviceSynchronize

I'm developing a CUDA matrix multiplication, but I made some modifications to observe how they affect performance. I'm trying to observe the behavior (and I'm measuring the changes in GPU events ...

-3 votes, 0 answers, 25 views

CUDA solution of sparse Ax=b for various b's: any source code or repository for this problem?

I'm trying to implement batched QR from the NVIDIA cuSOLVER documentation. Does anyone have source code or a repository link? My sparse matrix A is constant.

-1 votes, 0 answers, 21 views

How to print the value of a pointer assigned by cudaMemcpy [duplicate]

When trying to print the values of an array assigned by cudaMemcpy, I got a core dump. The following is the code. I run this code on Ubuntu with CUDA 10: #include <iostream> using ...

0 votes, 1 answer, 24 views

Where is the CUDA toolkit located on Ubuntu?

I installed Nvidia's 375 driver and CUDA 8.0 on Ubuntu 16.04 from Nvidia's .deb package. I want to build TensorFlow with GPU support. This is the output of TensorFlow's configure script: ./configure ...

1 vote, 2 answers, 61 views

Code runs, but the GPU function is not executed

I have two functions: the add_cpu function works fine, but the add_gpu function does not. I tried checking some options in my GPU driver software and read my code over and over again. I tried the exact ...

-2 votes, 0 answers, 40 views

CUDA float value problem: the value is strange when computed with float

I am using CUDA version 10.0. I calculate a float value, but the computed result is not a valid value. My code is: __global__ void ict_kernel(int *imgData_0, int *imgData_1, int *imgData_2, int range, ...

0 votes, 0 answers, 14 views

Is Folly library compatible with CUDA?

When I try to use Facebook's open-source library (Folly) with CUDA, I get the following error: error: allowing all exceptions is incompatible with previous function "malloc" A simplified version of ...

-1 votes, 0 answers, 16 views

Removing sequential duplicates from an unsorted thrust::vector

Background summary: I have a vector of daily stock prices. I want to remove any days where the price hasn't changed. Example: before [100, 100, 100, 95, 97, 100, 80, 80], after [100, 95, 97, 100, 80] ...

-2 votes, 1 answer, 48 views

Undefined symbol when trying to link with shared library built from CUDA objects

I'm experimenting with building a simple application from a couple of .cu source files and a very simple C++ main that calls a function from one of the .cu files. I'm making a shared library (.so file)...

0 votes, 1 answer, 43 views

CUDA compile problems on Windows, CMake error: No CUDA toolset found

I've been successfully working on my CUDA program on Linux, but I would like to support the Windows platform as well. However, I've been struggling to compile it correctly. I use: Windows 10 ...

0 votes, 0 answers, 35 views

Earliest CUDA version with certain libraries

What was the earliest version of CUDA to have (integrated or separately) the following libraries? nVIDIA Tools Extension (a.k.a. nvtx, nvToolsExt)? nVIDIA OpenCL support (a.k.a. OpenCL)?

-2 votes, 1 answer, 42 views

How to interpret these results for mean filter for both GPU and CPU serial versions?

I implemented an image mean filter as a serial CPU version and as a parallel NVIDIA GPU version. I measured the running times (please see the results of the test cases and the specs of the devices). Why does case 2 have ...

1 vote, 0 answers, 29 views

Processing and creation of array in numba.cuda device function

I want to pass a slice of an array to a device function, create a new array there, and return their combination. But this does not seem to be the usual way to solve such problems with CUDA, because numba ...

-2 votes, 0 answers, 15 views

Issue with calling PyCuda function (LogicError: cuFuncSetBlockShape failed: invalid resource handle)

Firstly, I'll say in advance that I've gone through all the threads on here as well as the PyCuda forums regarding the given error message, and have tried all the given solutions, and yet I continue ...

-2 votes, 0 answers, 21 views

Can a MacBook run the CUDA library? [duplicate]

I have been using a 2011 Mac Pro, on which running deep learning models is practically impossible due to its limited resources. So my question is that recently I started learning deep ...

1 vote, 0 answers, 87 views

Global memory access coalescing in CUDA - Maxwell architecture

I have code for matrix multiplication running on my GeForce 940M (Maxwell architecture) with CUDA compute capability 5.0. I have used the NVIDIA Visual Profiler to measure the number of global load ...

-3 votes, 0 answers, 29 views

Where is the bug

I'm just starting with GPU programming and I'm trying to implement a simple matrix multiplication, but it fails: the program returns a matrix of 0 instead of 6. Can someone point out where it fails? ...

0 votes, 1 answer, 28 views

Finding the nVIDIA Toolkit Extensions library with CMake

I'm using a recent version of CMake, with inherent support for CUDA as a language, to build a project. This project requires the nVIDIA Toolkit Extensions library. On a previous system, I had it under ...

0 votes, 2 answers, 41 views

Recipe to copy 1D strided data with cudaMemcpy2D

If one has two contiguous ranges of device memory, it is possible to copy memory from one to the other using cudaMemcpy. double* source = ... double* dest = ... cudaMemcpy(dest, source, ...

0 votes, 0 answers, 43 views

How do I avoid a race condition with this atomic operation? [duplicate]

Take the following code fragment example: __global__ void my_kernel(float *d_min, uint32_t *d_argmin, float *d_input, uint32_t N) { uint32_t ii = blockDim.x * blockIdx.x + threadIdx.x;...

-2 votes, 0 answers, 37 views

Persistent LNK1318 FORMAT(11) error after program crashed

I am writing a program in Visual Studio. The program uses C++ and CUDA. Things were working fine until I changed a piece of code that started causing the program to crash after a bit. I believe this ...

-1 votes, 0 answers, 23 views

cudaMemGetInfo from multiple processes behaves inconsistently on Windows 10

When running two (or more) programs utilizing CUDA (v10.1) at the same time, I am observing significant discrepancies in the behavior of cudaMemGetInfo. I have two GTX 2080 graphics cards (each with ...

1 vote, 0 answers, 28 views

How to set up MSVC++ 14.0 build tools for python copperhead? [duplicate]

I'd like to use python copperhead for CUDA C++ prototyping, which requires MSVC++14, and I want to make copperhead work first without CUDA. I've installed Microsoft Build Tools 2015 and tried to ...

1 vote, 2 answers, 71 views

CUDA per-thread arrays with different types

Each instance of my CUDA kernel (i.e. each thread) needs three private arrays, with different types. e.g. __global__ void mykernel() { type1 a[aLen]; type2 b[bLen]; type3 c[cLen]; .....

0 votes, 1 answer, 45 views

Is it possible to guarantee each different kernel stream cannot be interleaved?

If kernels are launched in different streams, can we guarantee that the streams do not interleave? It seems that kernels from different streams are interleaved together. What I want is that ...

-1 votes, 0 answers, 69 views

How can I improve the performance of this large CUDA kernel with a double nested loop?

I have a CUDA kernel for calculating symmetric matrices that happen to be very large (on the order of 16 million entries). Each entry in the matrix is independent, so the kernel uses each thread to ...

1 vote, 1 answer, 43 views

nvcc fatal : '--ptxas-options=-v': expected a number

I get the nvcc fatal : '--ptxas-options=-v': expected a number error when I try to build a Windows port of Faster-RCNN. You can reach the setup file (which is a Python script) directly from here. ...

-1 votes, 1 answer, 51 views

Passing GpuMat directly to cufftExecC2C function for doing fast Fourier transform

I am trying to optimize my code using OpenCV with CUDA and the cuFFT library. Every time I have to do a fast Fourier transform, I have to download a cv::Mat from the GpuMat and then run cuFFT. (Please see the code ...

-2 votes, 0 answers, 36 views

Tensorflow, CUDA, VS versions

According to some online research, it seems like no version of Tensorflow is compatible with CUDA 10.1 yet. Is this true? If that's the case, and I must use CUDA 10.0, can I do so with Visual Studio ...

3 votes, 1 answer, 45 views

Understanding indexing and how many threads there are in a block

I'm studying CUDA programming and I've found that there is more than one way to index a grid. What I don't understand is how those indexing techniques differ from each other. Those are my ...

0 votes, 0 answers, 43 views

Which constructor is called for the defined class? [duplicate]

I am working on a matrix class which has all the computation happening on the GPU using CUDA libraries. I have given a stripped-down version of the class to show the problem I am facing. The problem is ...

0 votes, 0 answers, 59 views

Losing data after successive CUDA kernel launches

I am trying to create an array of double elements where each element is a sum of elements. However, after a new value is added, this value is lost and the vector vet is filled again with zeros as if ...

1 vote, 1 answer, 59 views

Yocto for Nvidia Jetson fails because of GCC 7 - cannot compute suffix of object files

I am trying to use Yocto with meta-tegra (https://github.com/madisongh/meta-tegra) to build a minimal system for the Nvidia Jetson Nano. I need to use CUDA (current version 10 for Nano) with ...

-2 votes, 0 answers, 20 views

What is the meaning and use of ROWS_RESULT_STEPS, ROWS_HALO_STEPS, COLS_RESULT_STEPS, COLS_HALO_STEPS?

I'm studying the CUDA separable convolution sample and don't know why ROWS_RESULT_STEPS, ROWS_HALO_STEPS, COLS_RESULT_STEPS, and COLS_HALO_STEPS are used in the program. Is the kernel working for ...

0 votes, 0 answers, 31 views

How to switch CUDA version after installing two different versions of CUDA?

I already had CUDA V9.0, but now I have installed CUDA V10.0 with cuDNN 7.3. I have upgraded my tensorflow-gpu version to 1.13.1. But when I imported tensorflow I got the following error. When I searched the ...

0 votes, 0 answers, 79 views

How to approach implementing a GPU device-side sprintf?

I'm considering implementing sprintf() (and snprintf(), vsprintf(), vsnprintf()) - for use in CUDA code. Compilers' standard C (and C++) library is not available to GPU-side CUDA code - and can't be ...

2 votes, 1 answer, 43 views

How to use only one GPU for tensorflow session?

I have two GPUs. My program uses TensorRT and Tensorflow. When I run only the TensorRT part, it is fine. When I run it together with the Tensorflow part, I get the error: [TensorRT] ERROR: engine.cpp (370) - ...

1 vote, 0 answers, 34 views

Nvcc missing when installing cudatoolkit?

I have installed CUDA along with PyTorch using conda install pytorch torchvision cudatoolkit=10.0 -c pytorch. However, it seems like nvcc was not installed along with it. If I want to use for example nvcc -...

-1 votes, 0 answers, 69 views

A CUDA kernel for matrix multiplication of sparse gpuarray: sum only when the product exceeds some threshold

I would like to know if it is possible to write a user-defined version of Matlab's built-in matrix multiplication mtimes() (or a good equivalent in an open-source linear algebra library in C/C++). The goal is: adding ...

0 votes, 0 answers, 39 views

Standard way to call compiled CUDA code from Python

I am trying to figure out if there is a standard way to import compiled CUDA code within Python. I've done a bit of searching and it looks like you can import compiled C++ code with Cython and ...

0 votes, 0 answers, 68 views

Fastest way to read an image ROI with Thrust

I'm trying to calculate the mean of the ROI of an image using Thrust, but it is too slow (it's way faster on the CPU): { struct MeanTransform { thrust::device_vector<float4>::...

1 vote, 0 answers, 45 views

How to recover from CUDA errors when using cudaLaunchHostFunc instead of cudaStreamAddCallback

The documentation page for cudaStreamAddCallback says that it is "slated for eventual deprecation and removal" and to use cudaLaunchHostFunc instead. However, documentation for cudaLaunchHostFunc says ...

-1 votes, 0 answers, 47 views

Returning an output of variable size from the device to the host

I have a kernel operation that creates an output of unknown size and needs to "send" that result back to the CPU. I'm reluctant to pre-allocate a big enough space from the CPU because estimation of size ...

0 votes, 0 answers, 30 views

Calling the cuBLAS library from device code on CUDA 9.0 is slower than on CUDA 8.0

I called the cuBLAS library from device code. The device is a GTX 1060, and when using VS2015 and CUDA 8.0, it worked fine and fast. However, when I used VS2015 and CUDA 9.0 on the same computer, it ...